[IBCDPE-947] GX Validation Record Keeping #135
Looks good, nice work!
Nice work here! Just some comments
Quality Gate passed
🔥 LGTM!
Problem:
As the agora-data-tools pipeline is run more and more, the versions of output files and GX reports are piling up. It is important that we are able to identify which files were produced during a particular run, and a record-keeping solution is therefore necessary.
Solution:
Design Doc
Implement a method of record keeping within ADT that will upload metadata about a processing run to a Synapse Table in the Agora Project. This way, we will keep a historical record of what files were produced during which ADT runs and we will be able to look up the specific GX report files for those runs.
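As a rough illustration of the record-keeping idea, the sketch below builds one table row per GX-enabled dataset for a single run. It borrows the class names DatasetReport and ADTGXReporter from this PR, but every field name, the `add_report`, `build_rows` and `update_table` methods, and the example IDs are assumptions made for illustration. The Synapse upload is stood in for by appending to an in-memory list, so this is not the implementation in the PR.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class DatasetReport:
    """Per-dataset values for one table row (field names are hypothetical)."""

    dataset_name: str
    output_file_version: int
    gx_report_file_id: str


@dataclass
class ADTGXReporter:
    """Values shared by every DatasetReport in a run (fields are hypothetical)."""

    run_id: str
    platform: str
    table_id: str
    reports: List[DatasetReport] = field(default_factory=list)

    def add_report(self, report: DatasetReport) -> None:
        """Register the report for one GX-enabled dataset."""
        self.reports.append(report)

    def build_rows(self) -> List[Dict[str, object]]:
        """Merge the run-level fields into each dataset-level row."""
        return [
            {"run_id": self.run_id, "platform": self.platform, **asdict(report)}
            for report in self.reports
        ]

    def update_table(self, store: List[Dict[str, object]]) -> int:
        """Append this run's rows to `store`, a stand-in for the Synapse table.

        Returns the number of rows appended.
        """
        rows = self.build_rows()
        store.extend(rows)
        return len(rows)


# Example run with placeholder identifiers.
reporter = ADTGXReporter(run_id="example-run-1", platform="LOCAL", table_id="syn000")
reporter.add_report(DatasetReport("example_dataset", 3, "syn111"))
table_rows: List[Dict[str, object]] = []
appended = reporter.update_table(table_rows)
```

Keeping the run-level fields on the reporter and merging them in only when rows are built means each dataset report stays small, and a lookup by `run_id` returns every file produced in that run.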
The majority of logic for this new feature is contained within two new classes:
DatasetReport: Contains all of the fields needed to populate one row of the Synapse Table. There will be one of these per GX-enabled dataset from each ADT run.
ADTGXReporter: Contains all of the fields common to all DatasetReports in a run, and also updates the Synapse table at the end.
Notes:
run_id has been added to help keep track of specific GH Actions and Nextflow Tower runs. nf-agora PR.
A gx_table parameter has been added to the config files so that testing runs and live runs have their records stored separately.
The Platform enum has been moved to its own module to prevent circular import issues.
Future Work:
While working on this feature, a couple of issues came up that I have created tickets to track: