Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

build: INFENG-382: Release redesign #10002

Merged
merged 27 commits into from
Oct 18, 2024
Merged

Conversation

davidfluck-hpe
Copy link
Contributor

Ticket

INFENG-382 (epic)
INFENG-807 (epic)
See also: individual tickets from PRs linked in commits

Introduction

This changeset represents a significant overhaul to the Determined release process.

History

Early on in this work, I decided that it would be easier to maintain a long-running feature branch and merge relevant functionality into it over time, instead of attempting to fiddle with build/release functionality on main while somehow also gating it so it doesn't trigger before it was ready. I still believe this was prudent. As a result, though, there is now a feature branch with a significant amount of changes (and I apologize for the large diff).

I'm hoping that any big issues and questions were resolved in the individual pull requests I made along the way. Other than some last-minute model_hub removal due to a messy rebase, this diff should be a union of everything in those individual pull requests. Please consult the closed PRs before commenting to see if your issue was already addressed there.

I will still attempt to summarize the big changes here, and provide another opportunity for folks to look things over before merging what is a significant amount of work.

Description

I had the following goals when redesigning the release process:

  • Make it easier to tag commits to do releases.
  • Obviate the need for additional end-user release tooling.
  • Remove the bump2version dependency so releases are self-contained and not reliant on a static, on-disk version number. The tag represents the version, and everything downstream uses that instead.

I didn't quite achieve that last bullet point, mostly because of how release notes work now, but other than that, I think I've addressed all of these. I'll describe in detail the changes, and how Determined developers should expect to interact with the build and release process going forward.

No more bump2version and VERSION file

bump2version (colloquially: bumpversion, even though both bumpversion and bump2version are deprecated, replaced by bump-my-version) looks for version strings based on specified files or glob patterns and then modifies those files with new version strings, in a prescribed order. Part of the release process was a complicated state machine of bumpversion invocations to nudge different parts of the version string up depending on what we were doing.

Although tracking version numbers on disk is an option (the Git project, for example, does this1), it also has downsides. It means that every release needs at least one additional commit each time a version number is incremented, along with an increasing number of files across the repository modified each time. It's also possible for it to become out of sync with the version that's being built, depending on how tagging is performed.

I would rather have git tags be the source of truth for version strings, especially since that's how we trigger release jobs anyway, so that's what I've attempted here.

The VERSION file and bumpversion have been replaced with a shell script, version.sh, that, by default, returns the next local version number. Setting the VERSION environment variable will override this behavior. The script itself has extensive comments, but it tries to be sensitive to where you are within the git DAG. If, for example, your current branch was made off of main after 0.35.0 and before 0.36.0, then it would return something like 0.35.0+5df5142c38, where 5df5142c38 is a short hash of HEAD.

Instead of reading from VERSION, any builds that relied on that file instead run version.sh. If CircleCI is building from a feature branch, it will do the right thing with the local version string. If it's a proper release, then the version itself will be set from a tag further back in the process.

Going forward, if you need to work with a version string, attempt to get it from version.sh or via tag. Please talk to me if you need help designing around this constraint, I'm happy to help.

1 The Git project also tags main for their releases, which simplifies a lot of things. They also only have a single version string in one file.

version-switcher.json has been removed

Instead of storing version-switcher.json on disk and updating it every time, I wrote a script called gen-versions.py during a build to generate the existing static one up until this point, and then it will consult git for all subsequent versions. (We might need to talk about how this works with patch releases.)

CircleCI

.circleci/real_config.yml has several changes.

I consolidated some of the tag regular expressions. I also added a v prefix to the pattern that will trigger release builds. This is important.

I added a make-component job to somewhat modularize all of the various ways we were running make across the project. It also made it easier to add dry-run release job functionality, by having a single place to toggle a boolean dryrun variable. (Under the hood, if dryrun: true, it just appends -dryrun to the given make target. It's not perfect, but it works.)

I added a release-dryrun workflow, which works similarly to the release workflow, except it (mostly) pushes artifacts to non-production locations (GitHub, PyPI, Docker Hub, etc.). There's a small clerical issue with the determined package being pushed to test.pypi.org, which means one particular job will technically fail every time, but once we resolve the issue, that should be fixed. It doesn't particularly affect the efficacy of the workflow itself.

Dry-run build steps

Some of the more verbose changes are to support the previously-mentioned release-dryrun workflow. This necessitated entirely separate files for dry-run GoReleaser configurations, as well as separate -dryrun suffixed make targets for all build steps that were necessary for a full production release.

Going forward, assuming we want the release-dryrun workflow to continue to work, we'll need to keep in mind that we have to update the dry-run functionality that exists in various makefiles and GoReleaser configs along with actual production updates to the build. Otherwise, the two workflows will begin to drift.

determined Python package build changes

I moved the determined (harness/) package to PyProject. This was more relevant before I made some changes, but I believe PyProject is becoming the blessed way to configure Python projects going forward anyway, so I've left it.

The determined package also returns its own version differently now. Instead of opening VERSION and outputting it, it uses importlib.metadata to grab the version string from its own package metadata, which should always exist as long as the package has been built. (So it's effectively the same thing, it's just using a canonical file that setuptools generates during a build, instead of something we maintain ourselves.)

I've also removed all __version__.py files. Although the __version__ dunder is discouraged (and PEP-396 to standardize it was rejected), I've kept that in because that would cause some serious downstream breakage for anyone who relies on it, and migrating to something else would be unnecessary work.

Things to pay attention to

Some of these pieces are more critical than others. In general, I'd recommend giving a closer look at the following:

  • version.sh
  • docs/gen-versions.py and docs/gen-versions.sh
  • harness/setup.py
  • Anything that invokes version.sh and does something with that string.
  • harness/determined/__init__.py and, in particular, the importlib.metadata version-reading functionality.

Test Plan

I hope that the test plans I provided for the individual constituent pull requests were adequate. I've also done extensive testing over the last few months, including fixing a lot of broken tests and build steps along the way. Given the global and production-impacting nature of these changes, I'm not sure I can provide an overarching test plan for everything here. I might recommend consulting the individual linked PRs and following their test plans. Otherwise, I plan to help out with or captain the next release, which will be a good debut for everything here.

Checklist

  • Changes have been manually QA'd
  • New features have been approved by the corresponding PM
  • User-facing API changes have the "User-facing API Change" label
  • Release notes have been added as a separate file under docs/release-notes/
    See Release Note for details.
  • Licenses have been included for new code which was copied and/or modified from any external code

…#9442)

INFENG-704: Remove bumpversion

* Update determined Python package to fetch its own version dynamically
  from package metadata using importlib.metadata, which seems to be the
  canonical way to fetch the version string at runtime. In lieu of
  something more complicated, I assigned a module-level __version__
  variable in __init__.py to avoid touching the rest of the package code
  that relies on this behavior.

* Migrate all of the setup.py options into the existing pyproject.toml,
  except options that are no longer necessary, leaving the
  long_description in because pyproject.toml seems to be unable to
  reference README files above the pyproject.toml root.

* Remove harness/determined/__version__.py and
  model_hub/model_hub/__version__.py, as they are no longer
  necessary. The __version__ dunder is now provided by __init__.py. See
  PEP-396 <https://peps.python.org/pep-0396/> for more information about
  why this is no longer desirable.

* Add explicit dependencies on setuptools>=64 and setuptools_scm>=8 in
  harness/pyproject.toml and model_hub/pyproject.toml to allow tagging
  versions dynamically from git.

* Add setuptools==70.0.0 and setuptools-scm==8.1.0 as dependencies to
  support setuptools dynamically using git tag information.

* Set build-backend to setuptools.build_meta to use setuptools_scm.

* Move all setuptools.setup() args into pyproject.toml, because newer
  version of setuptools, if I understand correctly, have support for
  them.

* Remove bumpversion config for setup.py and __version__.py.
INFENG-709: Remove CloudFormation versions

* Remove hardcoded Determined versions from CloudFormation
  templates. Because we're already passing through an optional command
  line parameter anyway, we can instead set that default value to the
  existing determined.__version__ variable pulled from package metadata,
  instead of setting a default value in the `Version` parameter of
  CloudFormation templates.
…9515)

INFENG-726: Generate versions.json from git tags

This changeset introduces a new Python script, docs/gen-versions.py, to
generate the existing versions.json file from git tags, while
maintaining the previous structure of the file and incorporating new
tags as they get created.

My initial thought was to use the existing versions.json file as a
template, and then append subsequent entries to it from new git tags as
they get created and pushed. This felt brittle, though, and very special
case-y.

Instead, I use the GitPython version of git-rev-list to walk the DAG
from all tags back to right before 0.21.0. I then filter out all tags
that aren't a major, minor, or patch release, and exclude versions that
have tags, but without corresponding entries in the original
versions.json file. This gives us all relevant doc tags between now and
0.21.0 (inclusive), which we can use to generate versions.json. It also
works for any new tags we use going forward.

The resultant file still winds up where it normally would, and should
get packaged and released like it would if it were stored in git on disk.

The summary of changes is as follows:

* Add gen-versions.py, which generates versions.json from git tags.

* Add a new docs/Makefile target, gen-versions, to run gen-versions.py
  as part of the docs build process.

* Remove versions.json from .bumpversion.cfg.

* Remove versions.json.
INFENG-739: remove version string from Vite config

* Replace the process.env.VERSION variable explicitly defined in
  vite.config.mts with either the actual VERSION environment variable,
  if it's set, or the most recent git tag, followed by an optional
  commit counter and partial unique SHA hash. This shells out to git,
  and accounts for the rare but possible case that the config is being
  from outside of a git repository. In that situation, the version
  string is useless anyway, so we return "unknown" from the shell.

* During the release, we will set VERSION ourselves based on the tag, so
  we don't have to go over the top trying to parse it out here. The
  local version doesn't matter too much, either, as it's only referenced
  during development, and, as far as I can tell, is really only used to
  display the current version to the user. During the release, we can
  set VERSION from CircleCI directly, as it already provides the tag if
  the run was triggered by one.
INFENG-744: Remove VERSION file

* Remove dependency on static VERSION file and replace it with a bash
  script, version.sh, that either fetches the version from git tags,
  usually used for local builds, or returns an existing environment
  variable, mainly used in CI.

* The script uses a few different git tools to find the most recent
  previous tag for a given commit, a combination of git-describe,
  git-tag, and git-merge-base. Additional implementation details are
  available in the comments in that file. The resulting version, though,
  is <prev-tag>+<HEAD-short-SHA>. This ensures that certain Python
  dependency constraints are able to be satisfied when building
  Determined locally. Otherwise, if VERSION is already set, as will be
  the case in CI, the script simply returns that instead.

* Replace existing `cat`s of the VERSION file with invocations of this
  shell script; specifically, in the following files:

- .devcontainer/server.sh
- .github/workflows/lint-python.yml
- agent/Makefile
- docs/Makefile
- docs/deploy/Makefile
- helm/Makefile
- master/Makefile
- model_hub/Makefile

* Invoke version.sh in the Python Determined module Makefile:
  harness/Makefile. This was necessary to replace my previous attempt at
  fetching version string information from git using a Pythom setuptools
  plugin called setuptools_scm. Although that library works and is quite
  handy, and I sincerely thank Ronny Pfannschmidt and all other
  contributors for their work, configuring it was non-trivial, at times,
  and it was becoming difficult to cleanly get it to do what I wanted. I
  also decided that it makes more sense to have a single, canonical
  version string (not unlike the VERSION file itself) that we use for as
  many things as possible, notwithstanding certain potential
  incongruences between how different systems parse version strings.

  Now, instead of using setuptools_scm, we read the environment
  variable in harness/setup.py, and pass that to setuptools as a
  dynamic version. This also helps ensure that there aren't multiple
  ways to fetch a canonical version, which could become out of sync
  if we aren't careful. It does, however, mean that you need to use
  the Makefile to build the determined module; if you want to invoke
  `python -m build` directly, you'll have to set VERSION
  yourself. This is unlikely. If VERSION is unset, setup.py will
  raise an exception.

* Remove the --version-file flag from docs/deploy/upload.py. I don't
  think it's even used, but it's certainly unnecessary now.

* Replace reading the VERSION file in docs/deploy/upload.py with
  fetching the VERSION environment variable, which should be set by the
  docs makefile. I've done the same thing in docs/conf.py.

* Although I've described what version.sh does, the diff will show some
  minor tweaks, like editing comments and moving some tag-munging
  commands around. I've also removed `set -x`, because I forgot that
  executing the script locally, even in a subshell, will set shell
  options globally, and apply after the script exits. This is
  undesirable. There are ways to mitigate this, but it's not a pressing
  issue.

* Remove VERSION file from .bumpversion.cfg.
INFENG-743: Remove Helm chart bumpversion

* Update Chart.yaml version strings to dummy values: 0.0.0, and instead
  pass VERSION in through --version and --app-version flags during helm
  package in helm/Makefile.
* Remove bump2version ("bumpversion", colloquially) dependency, both
  literally from requirements.txt and also the last remaining reference
  in its config file, .bumpversion.cfg, along with the config file
  itself. The remaining reference was in .circleci/real_config.yml.
  Removing it, though, was somewhat tricky, and had a few knock-on
  effects that had to be solved.

  The version string itself was a default for the pipeline parameter
  det-version, which is threaded through the entirety of the build
  process as a source of truth for the Determined version
  information. Instead of updating the CircleCI config itself, we can
  dynamically generate it in .circleci/config.yml and set the parameter
  there. There's still some work left to do to get tag builds working
  correctly, but for now, I call version.sh to set det-version, which
  should work correctly for all branch builds, by returning the local
  version string, e.g. 0.34.0+2e616a3d7

* Update version.sh to fix a small bug that caused git tags with
  subsequent 'v's in them, e.g. v1.0.3-dev0, to be munged incorrectly
  into 1.0.3-de0. Now, we use ${parameter#word} shell expansion to
  remove the first singular 'v' from tags.

* Update two direct invocations of setup.py in CircleCI to instead use
  the build module. Invoking setup.py directly is deprecated.

* Update the agent creation code to URL-escape the query parameter
  that's used to set the agent version. Under the new versioning scheme,
  local build versions use a short SHA after a plus + for uniqueness,
  and the plus sign was being interpreted as being part of a URL, where
  it was then turned into a space. This broke tests that attempt to
  fetch the agent version (it's also just plain incorrect). Now, the
  version string is properly escaped, and the proper agent version is
  returned.

* Update how setup.py and the determined (harness/) Python package
  manage versions. I kept running into test errors that indicated that
  the correct version wasn't being set somewhere in setuptools. At
  first, I mistakenly assumed that pytest was, somehow, using the bare
  source for testing. This is incorrect, though (and makes no sense
  anyway). pytest still needs to import packages, and it does so after
  we pip install an editable install of determined. Editable installs
  still end up calling setuptools, which means the built package has
  package metadata, and so we can still use importlib.metadata locally
  to fetch version information.

  The only bit that really had to be changed was harness/setup.py and
  model_hub/setup.py. I did tweak harness/determined/__init__.py, but
  that's just to catch the possible
  importlib.metadata.PackageNotFoundError exception. I believe this will
  never happen, but I've been wrong before. And there really is no
  excuse for __init__.py to ever do anything that isn't successful. So
  now, and it feels mildly gross, but is not too different from how
  other large projects like numpy manage versions, setup.py simply
  shells out to version.sh to get a version number during package build,
  if VERSION isn't already set. I realize that I also check VERSION in
  version.sh, but a belt and suspenders won't hurt. If VERSION is set,
  we use that; otherwise, we get the local (+SHA) version.

  This enables us to do editable installations, which a bunch of things
  assume anyway, without having to ensure VERSION is set. Before,
  setup.py assumed VERSION was set, or else it would error out. But when
  you pip install -e harness directly, like in CI or even locally (which
  can happen with pip install -r requirements.txt, as -e harness is the
  first line), that variable won't necessarily be set. So instead,
  setup.py checks VERSION, and then, if it's empty, attempts to fetch
  the version itself.

  If setup.py and __init__.py can't retrieve versions correctly, they'll
  return default values. setup.py will return 1!0.0.0+unknown, whereas
  __init__.py will return 2!0.0.0+unknown. My hope is that using two
  different versions can help indicate where the error is, and using
  epochs should make package requirement resolution still pass, as both
  1 and 2 are greater than the implicit 0 epoch that our version numbers
  use now.

* Update/remove comments that reference bumpversion.

* Add additional VERSION environment variable to CircleCI config.

* Update Python file to comport with black.

* Add return types to functions in setup.py to satisfy mypy.
* Update package-and-push-system-dev and package-and-push-system-dev-ee
   CircleCI jobs to build dev Docker images correctly. Specifically, fix
   a bash if statement to check for if we're running the job from the
   "main" branch or a branch prefixed with "release-". The previous
   behavior would match on any git branch regardless.
Some of my changes from months ago included modifications to how we
build and release model_hub. Subsequently, model_hub was removed
entirely. This changeset removes the last vestiges of model_hub that
crept in during my rebase.

* Modify .circleci/real_config.yml to remove remaining references to
  model_hub.

* Remove model_hub directory and contents.
@determined-ci
Copy link
Collaborator

Docsite preview being generated for this PR.
You can (eventually) find the generated docsite here.

@determined-ci determined-ci added the documentation Improvements or additions to documentation label Oct 1, 2024
@determined-ci determined-ci requested a review from a team October 1, 2024 02:08
Copy link

netlify bot commented Oct 1, 2024

Deploy Preview for determined-ui canceled.

Name Link
🔨 Latest commit da9ca68
🔍 Latest deploy log https://app.netlify.com/sites/determined-ui/deploys/67080afe8949610008493382

Copy link

codecov bot commented Oct 1, 2024

Codecov Report

Attention: Patch coverage is 85.71429% with 1 line in your changes missing coverage. Please review.

Project coverage is 54.58%. Comparing base (25ca6d0) to head (da9ca68).
Report is 24 commits behind head on main.

Files with missing lines Patch % Lines
agent/internal/agent.go 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #10002      +/-   ##
==========================================
- Coverage   54.59%   54.58%   -0.02%     
==========================================
  Files        1258     1257       -1     
  Lines      157136   157140       +4     
  Branches     3617     3615       -2     
==========================================
- Hits        85795    85770      -25     
- Misses      71208    71237      +29     
  Partials      133      133              
Flag Coverage Δ
backend 45.32% <0.00%> (-0.06%) ⬇️
harness 72.75% <100.00%> (+<0.01%) ⬆️
web 54.34% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
harness/determined/__init__.py 100.00% <100.00%> (ø)
harness/determined/core/_log_shipper.py 35.24% <ø> (ø)
harness/determined/deploy/aws/cli.py 17.34% <100.00%> (+0.48%) ⬆️
master/pkg/tasks/copy.go 31.70% <ø> (ø)
agent/internal/agent.go 0.00% <0.00%> (ø)

... and 9 files with indirect coverage changes

@determined-ci
Copy link
Collaborator

Docsite preview being generated for this PR.
You can (eventually) find the generated docsite here.

docs/README.md Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -1,118 +0,0 @@
[
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So let's talk about what happens for patch releases, as said in your description: I think anytime we do a release, the version switcher should be updated to include it. And we probably shouldn't retire releases from docs as quickly as we will stop maintaining them. IMO, we should consider populating the list from the list of available versions in S3 plus the one currently being built. This requires checking right before publishing, so that if we do a couple of releases in short-ish succession, we don't overwrite one of them because it's docs weren't actually published yet, and the build process of the other release didn't know about it. And then we would just need to decide when (if ever) we retire a release from docs.

What are your thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I'm familiar enough with the docs publishing part of the automation to answer reliably as part of this review. What exactly would we be reading from S3? With this current changeset, though, every new minor or patch release will get a corresponding docs version published, which includes appending to the version picker versions.json file. This all happens without relying on external services, because the git tags should be the source of truth for which versions we've published. But there's extra docs automation happening that I didn't look at as part of this work (mostly because it never came up), and that might require alteration, ideally as a separate pull request. If we want to pick and choose versions for docs, I would argue that we should then decouple the docs entirely from the Determined release process.

I'm also not sure if the existing documentation automation will handle jumping around between versions well (e.g. if we do a patch release on an older, minor release branch), because of the assumptions that code still makes. I'd need to spend more time looking closely; the docs automation is doing a bunch of stuff that I haven't needed to look at yet, and I don't think any of those necessary changes would make sense to go in as part of this PR.

version.sh Show resolved Hide resolved
@determined-ci
Copy link
Collaborator

Docsite preview being generated for this PR.
You can (eventually) find the generated docsite here.

@determined-ci determined-ci requested a review from a team October 7, 2024 17:02
  same archive artifact naming scheme as the production artifacts use.
@determined-ci
Copy link
Collaborator

Docsite preview being generated for this PR.
You can (eventually) find the generated docsite here.

@determined-ci
Copy link
Collaborator

Docsite preview being generated for this PR.
You can (eventually) find the generated docsite here.

@determined-ci
Copy link
Collaborator

Docsite preview being generated for this PR.
You can (eventually) find the generated docsite here.

  harness/setup.py. This might fix a Windows test.
@determined-ci
Copy link
Collaborator

Docsite preview being generated for this PR.
You can (eventually) find the generated docsite here.

@determined-ci
Copy link
Collaborator

Docsite preview being generated for this PR.
You can (eventually) find the generated docsite here.

@determined-ci
Copy link
Collaborator

Docsite preview being generated for this PR.
You can (eventually) find the generated docsite here.

@determined-ci
Copy link
Collaborator

Docsite preview being generated for this PR.
You can (eventually) find the generated docsite here.

@azhou-determined
Copy link
Contributor

is there a reason why we're not deleting the VERSION file?

@determined-ci
Copy link
Collaborator

Docsite preview being generated for this PR.
You can (eventually) find the generated docsite here.

@davidfluck-hpe
Copy link
Contributor Author

Ha! Four-ish months of work and of course I forgot to delete the VERSION file. Thank you.

@davidfluck-hpe davidfluck-hpe merged commit 91e358a into main Oct 18, 2024
85 of 101 checks passed
@davidfluck-hpe davidfluck-hpe deleted the INFENG-382-release-redesign branch October 18, 2024 17:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants