-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
build: Add --exclude-from-upload
and --exclude-from-download
options
#370
Conversation
13064be
to
8bc1632
Compare
Ah, this has a bug around absolute vs. relative path handling and pattern specification. I'll address it and repush. Otherwise, though, still ready for 👀 (just not testing). |
Excluding files from upload is very handy for ad-hoc skipping of large ancillary files or previous output files in the build directory that the user wants to ignore for the remote build on AWS Batch (i.e. to start it "fresh"). Existing workarounds for the lack of an exclusion mechanism include git worktrees to obtain a clean state and moving files or directories temporarily out of the build directory. A future improvement would be adding support to also specify these patterns via a file, which can then be checked in to a build repo and shared by all users. As a broader improvement, I'd like to design away this issue by separating out the workflow-as-program (rules, code, etc.) from the workflow-as-state (config, inputs, outputs, etc.). This comes out of several other goals, but would apply to this "excluded files" need too. For example, instead of relying on the implicit state combined with the code like now in a single build directory, we'd instead explicitly pass an empty starting state separate from the workflow program. This change includes a small bug fix for --download to allow _only_ negated patterns. Resolves: <#219>
8bc1632
to
6d465f0
Compare
Bug squashed in repush. |
@lmoncla @jameshadfield @trvrb If you want to test this out before merge/release to make sure it works for you, you can temporarily install the CLI built from this PR with:
Ignore the shell setup instructions printed at the end. To test, run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good by inspection. Nice to see the implementation wasn't too far from how I thought.
@@ -97,13 +97,67 @@ options | |||
wildcards (``**``), extended globbing (``@(…)``, ``+(…)``, etc.), | |||
and negation (``!…``). | |||
|
|||
Patterns should be relative to the build directory. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potentially tangential to this PR
Reading this code my understanding is that only the build directory is uploaded, and so patterns are relative to that. How does this work when the build directory is (e.g.) ingest/
but within that the Snakefile refers back to a parent directory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, that's a great point. The directory here that matters—the one that's uploaded and one that patterns must be relative to—is influenced by the nextstrain-pathogen.yaml
file. For example, running nextstrain build ingest/
in a directory with nextstrain-pathogen.yaml
will require patterns be relative to the directory containing the marker file, not the ingest/
directory, i.e. to exclude something in ingest/
would require the pattern to be ingest/x/y/z
not just x/y/z
.
I'll update the docs here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I want to leave the option docs/help as-is for now for this PR. I'm realizing there's larger documentation reworking to do to clarify terminology here as a result of #355 (something I'd postponed in that PR). Writing a clear help snippet for these pattern options will be much easier and less inconsistent/confusing once the larger terminology shift and doc updates happen. I should prioritize that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 totally forgot about this subtlety. Is this behaviour mirrored in what directories are bind-mounted into the docker container (when running locally)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep.
Before #355 the directory given to nextstrain build
ended up as /nextstrain/build
in the container and /nextstrain/build
was the fixed working directory for the container process.
After #355, the /nextstrain/build
mount is either the directory given to nextstrain build
or, if present, the closest ancestor of the given directory which contains a nextstrain-pathogen.yaml
. The working directory for the container process is the /nextstrain/build
path corresponding to the directory given to nextstrain build
.
This feature will be in the being-released-as-we-speak 8.4.0. |
Excluding files from upload is very handy for ad-hoc skipping of large ancillary files or previous output files in the build directory that the user wants to ignore for the remote build on AWS Batch (i.e. to start it "fresh"). Existing workarounds for the lack of an exclusion mechanism include git worktrees to obtain a clean state and moving files or directories temporarily out of the build directory.
A future improvement would be adding support to also specify these patterns via a file, which can then be checked in to a build repo and shared by all users.
As a broader improvement, I'd like to design away this issue by separating out the workflow-as-program (rules, code, etc.) from the workflow-as-state (config, inputs, outputs, etc.). This comes out of several other goals, but would apply to this "excluded files" need too. For example, instead of relying on the implicit state combined with the code like now in a single build directory, we'd instead explicitly pass an empty starting state separate from the workflow program.
This change includes a small bug fix for --download to allow only negated patterns.
Resolves: #219
Checklist