Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Install subworkflows with modules from different remotes #3083

Open
wants to merge 71 commits into
base: dev
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 66 commits
Commits
Show all changes
71 commits
Select commit Hold shift + click to select a range
9825d89
tests: Add test case for cross-organization subwf
jvfe Jun 21, 2024
568363a
tests: Check module on module list instead
jvfe Jun 28, 2024
534da92
chore: Dummy commit to test action
jvfe Jul 1, 2024
286ab86
Revert "chore: Dummy commit to test action"
jvfe Jul 1, 2024
423517e
tests: Change installed subworkflow
jvfe Jul 2, 2024
6816cc1
Merge pull request #6 from sanger-tol/cross-org-test
jvfe Jul 8, 2024
af816fd
feat: Read from meta.yml in get_components_to_install
jvfe Jul 8, 2024
d0e6031
refact: Change module return type to always be dict
jvfe Jul 15, 2024
1737c88
fix: Add correct return type
jvfe Jul 15, 2024
f7aaea2
fix: Use any for values
jvfe Jul 15, 2024
d1c490d
test: Fix module key in across orgs
jvfe Jul 15, 2024
6fa5cf4
refact: Reset modules repo when git_remote not defined
jvfe Jul 15, 2024
d903282
refact: Copy parent attribute
jvfe Jul 15, 2024
7f095fa
refact: Keep old strategy as fallback
jvfe Jul 15, 2024
012e9d6
refact: Check component not in subwf list
jvfe Jul 15, 2024
d77fccb
refact: Change return type to optional
jvfe Jul 24, 2024
faa7faf
refact: Change way of handling dicts in modulesjson
jvfe Jul 24, 2024
0f08bbe
refact: Handle dicts in meta_yml lint
jvfe Jul 24, 2024
6d052de
fix: Pass branch in install too
jvfe Jul 28, 2024
c6ce67f
fix: Check if module in meta.yml is imported
jvfe Aug 5, 2024
792f68b
Merge pull request #9 from sanger-tol/fix/duplicated-downloads
jvfe Aug 5, 2024
7d23f15
Merge branch 'dev' into fix/1927
jvfe Aug 5, 2024
6f9a60e
Merge branch 'dev' into review-round-1
jvfe Aug 10, 2024
fca6f27
refact: Define org path based on git remote
jvfe Aug 10, 2024
092bf91
feat: Allow defining branches in meta.yml
jvfe Aug 10, 2024
208d796
fix: Add empty branch in other dicts
jvfe Aug 10, 2024
71c43be
refact: Rework logic to use subwfs as well
jvfe Aug 12, 2024
8bcf737
refact: Support subwf dict in recreate deps
jvfe Aug 12, 2024
efde7b7
fix: Change modules to subwfs
jvfe Aug 12, 2024
8e25587
fix: Use name value in recreate deps
jvfe Aug 12, 2024
4947913
Revert "fix: Use name value in recreate deps"
jvfe Aug 12, 2024
4888d32
fix: Use subworkflow name in recreate deps
jvfe Aug 12, 2024
aa265d8
fix: Use sw_name in appends too
jvfe Aug 12, 2024
ab0f398
Merge branch 'dev' into fix/1927
jvfe Aug 12, 2024
33c36cf
Merge branch 'fix/1927' into review-round-1
jvfe Aug 12, 2024
9356cd3
Merge branch 'feat/use-subwf' into review-round-1
jvfe Aug 12, 2024
a764c76
fix: Only add module if it's in main.nf too
jvfe Aug 12, 2024
869d816
fix: Handle incomplete meta.yml
jvfe Aug 13, 2024
c3ac1b8
refact: Remove isinstance check in lint/meta.yml
jvfe Aug 13, 2024
4eb95e6
style: Format meta_yml.py
jvfe Aug 13, 2024
e9bf238
refact: Remove isinstance check in components/install.py
jvfe Aug 13, 2024
c8b8a83
Revert "refact: Remove isinstance check in components/install.py"
jvfe Aug 13, 2024
eba8391
refact: Remove isinstance check in recreate_deps
jvfe Aug 13, 2024
e3f7b3f
Merge pull request #10 from sanger-tol/review-round-1
jvfe Aug 13, 2024
ddb3dd0
Merge branch 'dev' into merge-dev
jvfe Aug 13, 2024
634ff8f
Merge pull request #11 from sanger-tol/merge-dev
jvfe Aug 13, 2024
c72b94a
refact: Change function structure to use dicts not lists
jvfe Aug 15, 2024
bb31d78
fix: Change access key in inner dicts
jvfe Aug 15, 2024
7f4cce8
refact: Use component_name every time
jvfe Aug 16, 2024
e342678
refact: Don't default to master
jvfe Aug 16, 2024
50216c4
refact: Move instance check up
jvfe Aug 20, 2024
0e83d9a
Refactoring of component_utils.py
muffato Aug 20, 2024
f8c8251
Revert "Refactoring of component_utils.py"
jvfe Aug 20, 2024
313853e
refact: Use get in components/install.py
jvfe Aug 21, 2024
e8e4da0
refact: Avoid redefining keys when possible
jvfe Aug 21, 2024
f783d58
refact: Use get in modules_json
jvfe Aug 21, 2024
eb59308
refact: Use dictionary input in check_up_to_date
jvfe Aug 21, 2024
be4d78f
Use the power of get to skip if tests
muffato Aug 21, 2024
3c493b9
Merge pull request #13 from sanger-tol/refact/change_structure_levera…
jvfe Aug 21, 2024
b13a406
refact: Raise error if org_path not found
jvfe Aug 22, 2024
e96341e
style: Run ruff format
jvfe Aug 22, 2024
2b9a573
refact: Change UserWarning message
jvfe Aug 22, 2024
e55c74d
style: Run ruff format
jvfe Aug 22, 2024
4b9fea5
Merge pull request #12 from sanger-tol/refact/change_structure
jvfe Aug 22, 2024
ea7d1ca
Merge remote-tracking branch 'upstream/dev' into merge-dev2
jvfe Aug 23, 2024
a40d5d9
Merge remote-tracking branch 'upstream/dev' into fix/1927
jvfe Sep 2, 2024
2491207
refact: Use walrus operator in meta.yml check
jvfe Sep 6, 2024
a870787
refact: Use a union type for component arg
jvfe Sep 6, 2024
7ad4a8c
Merge remote-tracking branch 'upstream/dev' into second-review
jvfe Sep 6, 2024
f3259bd
Merge pull request #14 from sanger-tol/second-review
jvfe Sep 9, 2024
1a8c3be
docs: Add comment for clarification about the feature
jvfe Sep 13, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
51 changes: 44 additions & 7 deletions nf_core/components/components_utils.py
Original file line number Diff line number Diff line change
@@ -1,10 +1,11 @@
import logging
import re
from pathlib import Path
from typing import TYPE_CHECKING, List, Optional, Tuple, Union
from typing import TYPE_CHECKING, Dict, List, Optional, Tuple, Union

import questionary
import rich.prompt
import yaml

if TYPE_CHECKING:
from nf_core.modules.modules_repo import ModulesRepo
Expand Down Expand Up @@ -142,12 +143,15 @@ def prompt_component_version_sha(
return git_sha


def get_components_to_install(subworkflow_dir: Union[str, Path]) -> Tuple[List[str], List[str]]:
def get_components_to_install(
subworkflow_dir: Union[str, Path],
) -> Tuple[List[Dict[str, Optional[str]]], List[Dict[str, Optional[str]]]]:
"""
Parse the subworkflow main.nf file to retrieve all imported modules and subworkflows.
"""
modules = []
subworkflows = []
modules: Dict[str, Dict[str, Optional[str]]] = {}
subworkflows: Dict[str, Dict[str, Optional[str]]] = {}

with open(Path(subworkflow_dir, "main.nf")) as fh:
for line in fh:
regex = re.compile(
Expand All @@ -158,7 +162,40 @@ def get_components_to_install(subworkflow_dir: Union[str, Path]) -> Tuple[List[s
name, link = match.groups()
if link.startswith("../../../"):
name_split = name.lower().split("_")
modules.append("/".join(name_split))
component_name = "/".join(name_split)
component_dict: Dict[str, Optional[str]] = {
"name": component_name,
}
modules[component_name] = component_dict
elif link.startswith("../"):
subworkflows.append(name.lower())
return modules, subworkflows
component_name = name.lower()
component_dict = {"name": component_name}
subworkflows[component_name] = component_dict

if Path(subworkflow_dir, "meta.yml").exists():
jvfe marked this conversation as resolved.
Show resolved Hide resolved
with open(Path(subworkflow_dir, "meta.yml")) as fh:
meta = yaml.safe_load(fh)
if "components" in meta:
components = meta["components"]
for component in components:
if isinstance(component, dict):
component_name = list(component.keys())[0].lower()
git_remote = component[component_name]["git_remote"]
org_path_match = re.search(r"(?:https://|git@)[\w\.]+[:/](.*?)/", git_remote)
if org_path_match:
org_path = org_path_match.group(1)
else:
raise UserWarning(
f"The organisation path of {component_name} could not be established from '{git_remote}'"
)
current_comp_dict = subworkflows if component_name in subworkflows else modules

component_dict = {
"org_path": org_path,
"git_remote": git_remote,
"branch": component[component_name].get("branch"),
}

current_comp_dict[component_name].update(component_dict)

return list(modules.values()), list(subworkflows.values())
9 changes: 9 additions & 0 deletions nf_core/components/install.py
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@
prompt_component_version_sha,
)
from nf_core.modules.modules_json import ModulesJson
from nf_core.modules.modules_repo import ModulesRepo

log = logging.getLogger(__name__)

Expand All @@ -38,6 +39,8 @@ def __init__(
installed_by: Optional[List[str]] = None,
):
super().__init__(component_type, pipeline_dir, remote_url, branch, no_pull)
self.current_remote = remote_url
self.branch = branch
self.force = force
self.prompt = prompt
self.sha = sha
Expand All @@ -47,6 +50,12 @@ def __init__(
self.installed_by = [self.component_type]

def install(self, component: str, silent: bool = False) -> bool:
if isinstance(component, dict):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the isinstance check is needed (and indeed even if it isn't - because this is always a dict now) then the type hint in the method signature is incorrect.

Copy link
Member Author

@jvfe jvfe Sep 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I could see, this is the single isinstance check that was actually needed. There are some places in the codebase where this function gets called with a string input, when removing the isinstance check I noticed quite a few test failures. So, to aim for minimal impact and changes to the codebase and current implementations, I've opted to change the argument type hint, as you suggested.

jvfe marked this conversation as resolved.
Show resolved Hide resolved
remote_url = component.get("git_remote", self.current_remote)
branch = component.get("branch", self.branch)
self.modules_repo = ModulesRepo(remote_url, branch)
component = component["name"]

if self.repo_type == "modules":
log.error(f"You cannot install a {component} in a clone of nf-core/modules")
return False
Expand Down
23 changes: 15 additions & 8 deletions nf_core/modules/modules_json.py
Original file line number Diff line number Diff line change
Expand Up @@ -674,7 +674,7 @@ def check_up_to_date(self):
dump_modules_json = True
for repo, subworkflows in subworkflows_dict.items():
for org, subworkflow in subworkflows:
self.recreate_dependencies(repo, org, subworkflow)
self.recreate_dependencies(repo, org, {"name": subworkflow})
self.pipeline_components = original_pipeline_components

if dump_modules_json:
Expand Down Expand Up @@ -1249,20 +1249,27 @@ def recreate_dependencies(self, repo, org, subworkflow):
i.e., no module or subworkflow has been installed by the user in the meantime
"""

sw_path = Path(self.subworkflows_dir, org, subworkflow)
sw_name = subworkflow["name"]
sw_path = Path(self.subworkflows_dir, org, sw_name)
dep_mods, dep_subwfs = get_components_to_install(sw_path)
assert self.modules_json is not None # mypy
for dep_mod in dep_mods:
installed_by = self.modules_json["repos"][repo]["modules"][org][dep_mod]["installed_by"]
name = dep_mod["name"]
current_repo = dep_mod.get("git_remote", repo)
current_org = dep_mod.get("org_path", org)
installed_by = self.modules_json["repos"][current_repo]["modules"][current_org][name]["installed_by"]
if installed_by == ["modules"]:
self.modules_json["repos"][repo]["modules"][org][dep_mod]["installed_by"] = []
if subworkflow not in installed_by:
self.modules_json["repos"][repo]["modules"][org][dep_mod]["installed_by"].append(subworkflow)
if sw_name not in installed_by:
self.modules_json["repos"][repo]["modules"][org][dep_mod]["installed_by"].append(sw_name)

for dep_subwf in dep_subwfs:
installed_by = self.modules_json["repos"][repo]["subworkflows"][org][dep_subwf]["installed_by"]
name = dep_subwf["name"]
current_repo = dep_subwf.get("git_remote", repo)
current_org = dep_subwf.get("org_path", org)
installed_by = self.modules_json["repos"][current_repo]["subworkflows"][current_org][name]["installed_by"]
if installed_by == ["subworkflows"]:
self.modules_json["repos"][repo]["subworkflows"][org][dep_subwf]["installed_by"] = []
if subworkflow not in installed_by:
self.modules_json["repos"][repo]["subworkflows"][org][dep_subwf]["installed_by"].append(subworkflow)
if sw_name not in installed_by:
self.modules_json["repos"][repo]["subworkflows"][org][dep_subwf]["installed_by"].append(sw_name)
self.recreate_dependencies(repo, org, dep_subwf)
1 change: 1 addition & 0 deletions nf_core/subworkflows/lint/meta_yml.py
Original file line number Diff line number Diff line change
Expand Up @@ -93,6 +93,7 @@ def meta_yml(subworkflow_lint_object, subworkflow):
included_components = (
included_components[0] + included_components[1]
) # join included modules and included subworkflows in a single list
included_components = [component["name"] for component in included_components]
if "components" in meta_yaml:
meta_components = [x for x in meta_yaml["components"]]
for component in set(included_components):
Expand Down
16 changes: 16 additions & 0 deletions tests/subworkflows/test_install.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,8 @@

from ..test_subworkflows import TestSubworkflows
from ..utils import (
CROSS_ORGANIZATION_BRANCH,
CROSS_ORGANIZATION_URL,
GITLAB_BRANCH_TEST_BRANCH,
GITLAB_REPO,
GITLAB_SUBWORKFLOWS_BRANCH,
Expand Down Expand Up @@ -83,6 +85,20 @@ def test_subworkflows_install_different_branch_fail(self):
install_obj.install("bam_stats_samtools")
assert "Subworkflow 'bam_stats_samtools' not found in available subworkflows" in str(excinfo.value)

def test_subworkflows_install_across_organizations(self):
"""Test installing a subworkflow with modules from different organizations"""
install_obj = SubworkflowInstall(
self.pipeline_dir, remote_url=CROSS_ORGANIZATION_URL, branch=CROSS_ORGANIZATION_BRANCH
)
# The hic_bwamem2 subworkflow contains modules from different organizations
install_obj.install("get_genome_annotation")
# Verify that the installed_by entry was added correctly
modules_json = ModulesJson(self.pipeline_dir)
mod_json = modules_json.get_modules_json()
assert mod_json["repos"][CROSS_ORGANIZATION_URL]["modules"]["jvfe"]["wget"]["installed_by"] == [
"get_genome_annotation"
]

def test_subworkflows_install_tracking(self):
"""Test installing a subworkflow and finding the correct entries in installed_by section of modules.json"""
assert self.subworkflow_install.install("bam_sort_stats_samtools")
Expand Down
2 changes: 2 additions & 0 deletions tests/utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,8 @@
OLD_TRIMGALORE_SHA = "9b7a3bdefeaad5d42324aa7dd50f87bea1b04386"
OLD_TRIMGALORE_BRANCH = "mimic-old-trimgalore"
GITLAB_URL = "https://gitlab.com/nf-core/modules-test.git"
CROSS_ORGANIZATION_URL = "https://github.com/jvfe/test-subworkflow-remote.git"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use the nf-core gitlab account for this test to avoid using external repos. I can give you access if you think it's a good idea.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure on this. I don't recommend using two remotes with the same org path as this can create major foot-guns (tools will just overwrite the ones in place and there can be weirdness in the modules json).

Might be OK for testing if you are careful but in general it is not something people should try to do.

Copy link
Member Author

@jvfe jvfe Aug 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do agree that using a different org_path is better for testing purposes. Although we don't have to keep it in my personal remote either, I was planning to make that repo read-only as soon as this feature is done, to ensure future stability for the test.

I don't know if it's simple to setup on your side, but maybe something like "nf_core_testing" could be a good alternative org_path? It's different enough but still under nf-core, instead of a personal user.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, I will have a loot at the options we have to create another nf-core owned organisation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/nf-core-test/modules I have added you both to the new organisation

CROSS_ORGANIZATION_BRANCH = "main"
GITLAB_REPO = "nf-core-test"
GITLAB_DEFAULT_BRANCH = "main"
GITLAB_SUBWORKFLOWS_BRANCH = "subworkflows"
Expand Down
Loading