Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

219 refactor workflow automation to use berkley metadata models #253

Merged
Show file tree
Hide file tree
Changes from 51 commits
Commits
Show all changes
57 commits
Select commit Hold shift + click to select a range
3237aee
Update workflows.yaml for berkeley
aclum Aug 19, 2024
8230b40
Berkeley compatible workflows-mt.yaml
aclum Aug 19, 2024
ad2f143
Berkeley compatibility for workflows.yaml
aclum Aug 19, 2024
afb2c48
Berkeley update import.yaml
aclum Aug 20, 2024
9fa3737
Berkeley pdate import_test.yaml
aclum Aug 20, 2024
a4b9cfd
Update test_activities.py
aclum Aug 20, 2024
5767100
Berkeley Update activity_mapper.py to match config changes
aclum Aug 20, 2024
2f47fbf
bump nmdc-schema verison to berkeley
aclum Aug 20, 2024
a58a9a5
Merge branch 'update-berkeley-configs' of https://github.com/microbio…
aclum Aug 20, 2024
99db9ee
updating poetry.lock file
aclum Aug 20, 2024
5fc6d51
update how nmdc package is imported
aclum Aug 20, 2024
05f7e8e
update nmdc import
aclum Aug 20, 2024
1194436
remove extra space
aclum Aug 20, 2024
e209d15
update some tests to berkeley
aclum Aug 20, 2024
88268df
updates to test data for berkeley
aclum Aug 20, 2024
cad7731
udpates to test_import.py
aclum Aug 20, 2024
af1e2e8
remove part_of on workflow executions
aclum Aug 20, 2024
08f9185
pinning to rc16 until release files issue is resolved
aclum Aug 20, 2024
58e9583
get most tests working for berkeley
aclum Aug 21, 2024
308d0c2
adding back data object reference to metagenome_annotation_activity_s…
aclum Aug 21, 2024
39a75fa
updating workflow endpoint from activites to workflow_executions
aclum Aug 29, 2024
e4bd43d
Merge pull request #243 from microbiomedata/221-update-workflow-autom…
mbthornton-lbl Sep 6, 2024
f0fb546
Merge in main after 195 -> main
mbthornton-lbl Sep 24, 2024
f774152
Merge branch '219-refactor-workflow_automation-to-use-berkley-metadat…
mbthornton-lbl Sep 24, 2024
12fca7e
Merge pull request #237 from microbiomedata/update-berkeley-configs
mbthornton-lbl Sep 24, 2024
24b01f4
Update test fixtures and fix failing rest_activities
mbthornton-lbl Sep 24, 2024
d00cb3d
Merge in update berkley configs and fix failing tests
mbthornton-lbl Sep 25, 2024
a79eaf8
remove commented-out assertions
mbthornton-lbl Sep 25, 2024
eca5dd7
clean up redundant test fixtures
mbthornton-lbl Sep 25, 2024
95fa281
Refactor Watcher class
mbthornton-lbl Sep 25, 2024
b59ce2d
Add WorkflowExecutionNode class. Extends nmdc_schema WorkflowExecutio…
mbthornton-lbl Sep 25, 2024
c7a2686
Update test_sched.py
mbthornton-lbl Sep 25, 2024
19d2e3d
Add models and workflow_execution_factory and unit tests and fixtures
mbthornton-lbl Sep 26, 2024
4e6122e
Add WorkflowProcessNode model, tests, and fixtures
mbthornton-lbl Sep 26, 2024
8248ed4
Update test fixtures to conform to Berkley schema standards
mbthornton-lbl Sep 26, 2024
77cbc49
Refactor to remove Activity class
mbthornton-lbl Sep 26, 2024
be22cb5
Refactor to use nmdc_schema data object
mbthornton-lbl Sep 26, 2024
87c1315
Clean up Data)bject.as_dict
mbthornton-lbl Sep 26, 2024
50d4fb5
Refactor workflows.Workflow to models.WorkflowConfig
mbthornton-lbl Sep 26, 2024
2772ede
refactor activities.py
mbthornton-lbl Sep 26, 2024
49c2e94
remove "./test_data" in tests replace with test_data fixture
mbthornton-lbl Sep 26, 2024
8530a72
refactor create_or_use_existing_job
mbthornton-lbl Sep 26, 2024
1614747
add type hints
mbthornton-lbl Sep 27, 2024
01235e5
update metaT rqc workflow version
mbthornton-lbl Sep 27, 2024
0bbf611
update tests and fixtures
mbthornton-lbl Sep 27, 2024
d1b6af5
Add beginning of Job model to represent jobs in the DB
mbthornton-lbl Sep 27, 2024
77ab904
add docstrings and minor cleanup
mbthornton-lbl Sep 27, 2024
3c1ee81
Udate MAGs version to 1.3.10
mbthornton-lbl Sep 27, 2024
bb546e3
add metaT types to activity_map
mbthornton-lbl Sep 27, 2024
dbd715f
delete unused mags fixture
mbthornton-lbl Sep 27, 2024
c8273f9
remove unused workflows_test.yaml files
mbthornton-lbl Sep 27, 2024
7a1c56d
updated workflow_process_node.git_url to return None if there is no p…
mbthornton-lbl Sep 30, 2024
cc1c385
Add Job dataclass and frame out workflow_job classes
mbthornton-lbl Sep 30, 2024
1ebc634
update a type hint
mbthornton-lbl Sep 30, 2024
dd8d78a
Delete workflow_job.py
mbthornton-lbl Sep 30, 2024
bdee984
Update workflows.yaml
mbthornton-lbl Sep 30, 2024
696c8f6
update HQMQ output definition
mbthornton-lbl Oct 1, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
168 changes: 84 additions & 84 deletions configs/import.yaml

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion nmdc_automation/__init__.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
from .api import nmdcapi
from .config import config
from .import_automation import activity_mapper
from .workflow_automation import watch_nmdc, wfutils, workflows, activities
from .workflow_automation import watch_nmdc, wfutils, workflows, workflow_process
35 changes: 21 additions & 14 deletions nmdc_automation/api/nmdcapi.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,26 +10,30 @@
import mimetypes
from pathlib import Path
from time import time
from typing import Union
from typing import Union, List
from datetime import datetime, timedelta, timezone
from nmdc_automation.config import Config, UserConfig
import logging


def _get_sha256(fn):
hashfn = fn + ".sha256"
if os.path.exists(hashfn):
with open(hashfn) as f:
def _get_sha256(fn: Union[str, Path]) -> str:
"""
Helper function to get the sha256 hash of a file if it exists.
"""
shahash = hashlib.sha256()
if isinstance(fn, str):
fn = Path(fn)
hash_fn = fn.with_suffix(".sha256")
if hash_fn.exists():
with hash_fn.open() as f:
sha = f.read().rstrip()
else:
logging.info("hashing %s" % (fn))
shahash = hashlib.sha256()
with open(fn, "rb") as f:
# Read and update hash string value in blocks of 4K
logging.info(f"hashing {fn}")
with fn.open("rb") as f:
for byte_block in iter(lambda: f.read(1048576), b""):
shahash.update(byte_block)
sha = shahash.hexdigest()
with open(hashfn, "w") as f:
with hash_fn.open("w") as f:
f.write(sha)
f.write("\n")
return sha
Expand All @@ -46,8 +50,10 @@ class NmdcRuntimeApi:
client_id = None
client_secret = None

def __init__(self, site_configuration: Union[str, Path]):
self.config = Config(site_configuration)
def __init__(self, site_configuration: Union[str, Path, Config]):
if isinstance(site_configuration, str) or isinstance(site_configuration, Path):
site_configuration = Config(site_configuration)
self.config = site_configuration
self._base_url = self.config.api_url
self.client_id = self.config.client_id
self.client_secret = self.config.client_secret
Expand Down Expand Up @@ -184,7 +190,7 @@ def create_object(self, fn, description, dataurl):

@refresh_token
def post_objects(self, obj_data):
url = self._base_url + "workflows/activities"
url = self._base_url + "workflows/workflow_executions"

resp = requests.post(url, headers=self.header, data=json.dumps(obj_data))
return resp.json()
Expand All @@ -206,7 +212,7 @@ def bump_time(self, obj):
return resp.json()

@refresh_token
def list_jobs(self, filt=None, max=100):
def list_jobs(self, filt=None, max=100) -> List[dict]:
url = "%sjobs?max_page_size=%s" % (self._base_url, max)
d = {}
if filt:
Expand Down Expand Up @@ -316,6 +322,7 @@ def run_query(self, query):
return resp.json()


# TODO - This is deprecated and should be removed along with the re_iding code that uses it
class NmdcRuntimeUserApi:
"""
Basic Runtime API Client with user/password authentication.
Expand Down
66 changes: 33 additions & 33 deletions nmdc_automation/config/workflows/workflows-mt.yaml
Original file line number Diff line number Diff line change
@@ -1,27 +1,27 @@
Workflows:
- Name: Sequencing Noninterleaved
Collection: omics_processing_set
Collection: data_generation_set
Enabled: True
Analyte Category: Metatranscriptome
Filter Output Objects:
- Metagenome Raw Read 1
- Metagenome Raw Read 2

- Name: Sequencing Interleaved
Collection: omics_processing_set
Collection: data_generation_set
Enabled: True
Analyte Category: Metatranscriptome
Filter Output Objects:
- Metagenome Raw Reads

- Name: Metatranscriptome Reads QC
Type: nmdc:ReadQcAnalysisActivity
Type: nmdc:ReadQcAnalysis
Enabled: True
Analyte Category: Metatranscriptome
Git_repo: https://github.com/microbiomedata/metaT_ReadsQC
Version: v0.0.7
WDL: rqcfilter.wdl
Collection: read_qc_analysis_activity_set
Collection: workflow_execution_set
Filter Input Objects:
- Metagenome Raw Reads
Predecessors:
Expand All @@ -30,14 +30,14 @@ Workflows:
Input_prefix: nmdc_rqcfilter
Inputs:
input_files: do:Metagenome Raw Reads
proj: "{activity_id}"
Activity:
name: "Read QC Activity for {id}"
proj: "{workflow_execution_id}"
Workflow Execution:
name: "Read QC for {id}"
input_read_bases: "{outputs.stats.input_read_bases}"
input_read_count: "{outputs.stats.input_read_count}"
output_read_bases: "{outputs.stats.output_read_bases}"
output_read_count: "{outputs.stats.output_read_count}"
type: nmdc:ReadQcAnalysisActivity
type: nmdc:ReadQcAnalysis
Outputs:
- output: filtered_final
name: Reads QC result fastq (clean data)
Expand All @@ -57,30 +57,30 @@ Workflows:
description: "rRNA fastq for {id}"

- Name: Metatranscriptome Reads QC Interleave
Type: nmdc:ReadQcAnalysisActivity
Type: nmdc:ReadQcAnalysis
Enabled: True
Analyte Category: Metatranscriptome
Git_repo: https://github.com/microbiomedata/metaT_ReadsQC
Version: v0.0.7
Collection: read_qc_analysis_activity_set
Collection: workflow_execution_set
WDL: interleave_rqcfilter.wdl
Input_prefix: nmdc_rqcfilter
Inputs:
proj: "{activity_id}"
proj: "{workflow_execution_id}"
input_fastq1: do:Metagenome Raw Read 1
input_fastq2: do:Metagenome Raw Read 2
Filter Input Objects:
- Metagenome Raw Read 1
- Metagenome Raw Read 2
Predecessors:
- Sequencing Noninterleaved
Activity:
name: "Read QC Activity for {id}"
Workflow Execution:
name: "Read QC for {id}"
input_read_bases: "{outputs.stats.input_read_bases}"
input_read_count: "{outputs.stats.input_read_count}"
output_read_bases: "{outputs.stats.output_read_bases}"
output_read_count: "{outputs.stats.output_read_count}"
type: nmdc:ReadQcAnalysisActivity
type: nmdc:ReadQcAnalysis
Outputs:
- output: filtered_final
name: Reads QC result fastq (clean data)
Expand All @@ -106,16 +106,16 @@ Workflows:
Git_repo: https://github.com/microbiomedata/metaT_Assembly
Version: v0.0.2
WDL: metaT_assembly.wdl
Collection: metatranscriptome_assembly_set
Collection: workflow_execution_set
Predecessors:
- Metatranscriptome Reads QC
- Metatranscriptome Reads QC Interleave
Input_prefix: jgi_metaASM
Inputs:
input_files: do:Filtered Sequencing Reads
proj: "{activity_id}"
Activity:
name: "Metatranscriptome Assembly Activity for {id}"
proj: "{workflow_execution_id}"
Workflow Execution:
name: "Metatranscriptome Assembly for {id}"
type: nmdc:MetatranscriptomeAssembly
asm_score: "{outputs.stats.asm_score}"
contig_bp: "{outputs.stats.contig_bp}"
Expand Down Expand Up @@ -153,23 +153,23 @@ Workflows:
description: "Alignment index file for {id}"

- Name: Metatranscriptome Annotation
Type: nmdc:MetatranscriptomeAnnotationActivity
Type: nmdc:MetatranscriptomeAnnotation
Enabled: True
Analyte Category: Metatranscriptome
Git_repo: https://github.com/microbiomedata/mg_annotation
Version: v1.1.4
WDL: annotation_full.wdl
Collection: metatranscriptome_annotation_set
Collection: workflow_execution_set
Predecessors:
- Metatranscriptome Assembly
Input_prefix: annotation
Inputs:
input_file: do:Assembly Contigs
imgap_project_id: "scaffold"
proj: "{activity_id}"
Activity:
name: "Metatranscriptome Annotation Analysis Activity for {id}"
type: nmdc:MetatranscriptomeAnnotationActivity
proj: "{workflow_execution_id}"
Workflow Execution:
name: "Metatranscriptome Annotation Analysis for {id}"
type: nmdc:MetatranscriptomeAnnotation
Outputs:
- output: proteins_faa
data_object_type: Annotation Amino Acid FASTA
Expand Down Expand Up @@ -282,7 +282,7 @@ Workflows:
Git_repo: https://github.com/microbiomedata/metaT_ReadCounts
Version: v0.0.5
WDL: readcount.wdl
Collection: metatranscriptome_expression_analysis_set
Collection: workflow_execution_set
Predecessors:
- Metatranscriptome Annotation
Input_prefix: nmdc_expression
Expand All @@ -291,8 +291,8 @@ Workflows:
map: do:Contig Mapping File
bam: do:Assembly Coverage BAM
rna_type: "aRNA"
proj: "{activity_id}"
Activity:
proj: "{workflow_execution_id}"
Workflow Execution:
name: "Metatranscriptome Expression Analysis for {id}"
type: nmdc:MetatranscriptomeExpressionAnalysis
Outputs:
Expand All @@ -312,16 +312,16 @@ Workflows:
Git_repo: https://github.com/microbiomedata/metaT_ReadCounts
Version: v0.0.5
WDL: readcount.wdl
Collection: metatranscriptome_expression_analysis_set
Collection: workflow_execution_set
Predecessors:
- Metatranscriptome Annotation
Input_prefix: nmdc_expression
Inputs:
gff_file: do:Functional Annotation GFF
map: do:Contig Mapping File
bam: do:Assembly Coverage BAM
proj: "{activity_id}"
Activity:
proj: "{workflow_execution_id}"
Workflow Execution:
name: "Metatranscriptome Expression Analysis for {id}"
type: nmdc:MetatranscriptomeExpressionAnalysis
Outputs:
Expand All @@ -341,7 +341,7 @@ Workflows:
Git_repo: https://github.com/microbiomedata/metaT_ReadCounts
Version: v0.0.5
WDL: readcount.wdl
Collection: metatranscriptome_expression_analysis_set
Collection: workflow_execution_set
Predecessors:
- Metatranscriptome Annotation
Input_prefix: nmdc_expression
Expand All @@ -350,8 +350,8 @@ Workflows:
map: do:Contig Mapping File
bam: do:Assembly Coverage BAM
rna_type: "non_stranded_RNA"
proj: "{activity_id}"
Activity:
proj: "{workflow_execution_id}"
Workflow Execution:
name: "Metatranscriptome Expression Analysis for {id}"
type: nmdc:MetatranscriptomeExpressionAnalysis
Outputs:
Expand Down
Loading
Loading