-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[IBCDPE-946] Adds Optional Uploading #133
Conversation
Quality Gate passedIssues Measures |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! I'm less good at knowing what to look for in the testing functions so maybe someone else can weigh in on that part of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔥 LGTM! Excellent - just to confirm, the upload parameter just limits to the GX reports? Or is it also the output files?
@thomasyu888 It prevents both from being uploaded. |
Problem:
Every time the
agora-data-tools
processing pipeline is run, new output data files and GX report files are produced and uploaded to Synapse. This even happens when developers run the pipeline locally for testing. This causes an unnecessarily large number of files to be added to Synapse making it harder than it needs to be to find the output files from a particular run.Solution:
Add an
upload
option to the CLI command and make it so that files are not uploaded by default. Additionally, add aplatform
argument to the CLI command so that the user can indicate whether their run isLOCAL
or being executed onGITHUB
orNEXTFLOW
(Defaults toLOCAL
).The idea now is that when the pipeline is being run locally, it can be run exactly as before:
This command does not change at all because the default behavior is
platform = LOCAL
andupload = False
When we want file uploading to occur for testing purposes, we can make the following changes:
Note that I have provided a different configuration file this time. That is because in this instance I am a developer who needs to test the file uploading capabilities of the pipeline and I am providing a different configuration file with the
destination
andgx_folder
fields pointing to a testing Synapse project that I have so that my test runs do not create new versions of the files in the Agora Synapse project.Finally, for the Nextflow Tower execution, the command will look like (
nf-agora
PR):and the GitHub Action command is now (this PR):
Notes:
destination
argument where a developer could change the upload destination on Synapse without changing the configuration file, but this way is simpler to implement and sticks with our current paradigm where the configuration file dictates pipeline behavior.platform
argument is only used to throw a warning if you haveplatform
set toLOCAL
andupload
isTrue
to remind users to not upload files from a local run unnecessarily. This will be used more extensively in IBCDPE-947.CONTRIBUTING.md
. The actual change I made is lines 78-88.