feat: Install subworkflows with modules from different remotes #3083
base: dev
Conversation
fix: Avoid duplicate downloads in nested subworkflows
I'm very wary of increasing the idiosyncrasies in the code, and I feel that making the code return variably either strings or dicts necessitates special-casing everywhere the result is consumed. I understand the desire to simply get something working, but I think that if we are going to implement this and support it we need to do it right. I think this also indicates support only for cross-organisational …
Currently, yes
Once
It's a question of time management (@jvfe is working part time on this project). We're not sure about the handling of the default org vs the module org in …
Thanks for your contribution @jvfe!
I think this is a good start 😄 I agree with @awgymer that allowing `component` to be both a string or a dict is confusing and will cause problems. But I like the idea of using the information in the meta.yml to obtain the remote from which the component should be installed.
My suggestion would be to convert `component` to a dictionary in all cases from the beginning, with the default being the current remote.
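For what it's worth, a minimal sketch of what that normalisation could look like, assuming a flat dict with `name`, `git_remote` and `org_path` keys (the function name, keys and URLs here are illustrative, not the actual nf-core/tools API):

```python
from typing import Union


def normalise_component(
    component: Union[str, dict],
    current_remote: str,
    current_org_path: str,
) -> dict:
    """Return a component spec as a dict, whichever way it was declared.

    A plain string means "install from the current remote"; a dict may
    override git_remote and org_path (e.g. taken from a subworkflow's meta.yml).
    """
    if isinstance(component, str):
        component = {"name": component}
    component.setdefault("git_remote", current_remote)
    component.setdefault("org_path", current_org_path)
    return component


# Every caller then deals with a single shape:
print(normalise_component("fastqc", "https://github.com/nf-core/modules.git", "nf-core"))
print(normalise_component(
    {"name": "fastqc", "git_remote": "https://github.com/nf-core-test/modules.git"},
    "https://github.com/nf-core/modules.git",
    "nf-core",
))
```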
@@ -16,6 +16,7 @@
OLD_TRIMGALORE_SHA = "9b7a3bdefeaad5d42324aa7dd50f87bea1b04386"
OLD_TRIMGALORE_BRANCH = "mimic-old-trimgalore"
GITLAB_URL = "https://gitlab.com/nf-core/modules-test.git"
CROSS_ORGANIZATION_URL = "https://github.com/jvfe/test-subworkflow-remote.git"
We can use the nf-core gitlab account for this test to avoid using external repos. I can give you access if you think it's a good idea.
I am not sure on this. I don't recommend using two remotes with the same org path, as this can create major foot-guns (tools will just overwrite the ones already in place, and there can be weirdness in the modules.json).
Might be OK for testing if you are careful, but in general it is not something people should try to do.
I do agree that using a different org_path is better for testing purposes. We don't have to keep it in my personal remote either, although I was planning to make that repo read-only as soon as this feature is done, to ensure future stability for the test.
I don't know if it's simple to set up on your side, but maybe something like "nf_core_testing" could be a good alternative org_path? It's different enough but still under nf-core, rather than a personal user.
Sounds good, I will have a look at the options we have to create another nf-core owned organisation.
I have added you both to the new organisation: https://github.com/nf-core-test/modules
Use the power of get to skip if tests
Co-authored-by: Matthieu Muffato <mm49@sanger.ac.uk>
refact: Change function structure to use dicts not lists
In terms of the code implemented here this looks good.
I am unclear about the impact on the following commands:
- `subworkflows remove`
- `subworkflows update`
- `subworkflows lint`
I also still have concerns about this particular style of cross-repo (or indeed single-repo) subworkflow (although I am not intending to reignite a debate over the alternative model I support in this MR!). At the present stage I wouldn't recommend this as a true solution to anyone wanting cross-repo subworkflows, because I'm unclear what level of commitment there is from an nf-core standpoint to maintain and work around the issues this introduces. (This isn't so much something for the authors of this PR to address but rather the wider nf-core team.)
nf_core/components/install.py (outdated)
def install(self, component: str, silent: bool = False) -> bool:
    if isinstance(component, dict):
If the `isinstance` check is needed (and indeed even if it isn't, because this is always a dict now) then the type hint in the method signature is incorrect.
From what I could see, this is the single `isinstance` check that was actually needed. There are some places in the codebase where this function gets called with a string input; when removing the isinstance check I noticed quite a few test failures. So, to aim for minimal impact and changes to the codebase and current implementations, I've opted to change the argument type hint, as you suggested.
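For reference, a stripped-down sketch of what that signature change and check amount to; the class name and body here are placeholders, and the real method in nf_core/components/install.py does much more:

```python
from typing import Union


class ComponentInstallSketch:
    """Toy stand-in for the real installer class."""

    default_remote = "https://github.com/nf-core/modules.git"

    def install(self, component: Union[str, dict], silent: bool = False) -> bool:
        # Some call sites still pass a bare name, so accept both shapes and
        # normalise before doing any real work.
        if isinstance(component, dict):
            name = component["name"]
            remote = component.get("git_remote", self.default_remote)
        else:
            name, remote = component, self.default_remote
        if not silent:
            print(f"Installing {name} from {remote}")
        return True


ComponentInstallSketch().install(
    {"name": "gffread", "git_remote": "https://github.com/nf-core-test/modules.git"}
)
```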
refact: Add suggestions from second review
From what I could test (and from the unit tests themselves), there's no discernible impact on those commands. Also, I do understand your concerns and agree that there could be a more robust reworking of how cross-repo subworkflows work (and of how single-repo subworkflows are implemented themselves!). In this PR, though, I aimed for the simplest solution: one with minimal impact on the codebase that could serve as a starting point for people looking to build these components across orgs (as is the case at the Sanger Tree of Life), in a way that requires no changes to how subworkflows are currently implemented. Thank you for your input and your review!
I see a linting test failing; we should update the test here to add a valid fake version.
This is looking good! But it would be good to add a couple more test cases (see the sketch after this list):
- Installing a subworkflow which installs a module from another repo which is also installed from nf-core, for example fastqc which is already in the pipeline template.
- Removing nf-core fastqc to make sure the fastqc module from the other repo is kept.
- Installing the subworkflow at an old version (sha) and updating it, making sure that all modules from different repos are updated.
And if you can think of other situations that I forgot, let's add them too 🙂
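Not actual test code, but a self-contained toy illustration of the bookkeeping the second scenario should assert; the nested dict below is a stand-in for modules.json and the URLs are placeholders:

```python
# Toy stand-in for modules.json: remote URL -> org path -> module name -> entry.
NF_CORE_REMOTE = "https://github.com/nf-core/modules.git"
CROSS_ORG_REMOTE = "https://github.com/nf-core-test/modules.git"

modules_json = {
    NF_CORE_REMOTE: {"nf-core": {"fastqc": {"git_sha": "abc123"}}},
    CROSS_ORG_REMOTE: {"nf-core-test": {"fastqc": {"git_sha": "def456"}}},
}


def remove_module(tracking: dict, remote: str, org: str, name: str) -> None:
    """Drop a module entry for one remote only; other remotes keep their copy."""
    tracking[remote][org].pop(name, None)


remove_module(modules_json, NF_CORE_REMOTE, "nf-core", "fastqc")

# Removing the nf-core fastqc must leave the cross-org fastqc untouched.
assert "fastqc" not in modules_json[NF_CORE_REMOTE]["nf-core"]
assert "fastqc" in modules_json[CROSS_ORG_REMOTE]["nf-core-test"]
print("cross-org fastqc survives removal of the nf-core one")
```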
In addition to this PR we will need to add documentation on how to use this, specifying the format of `meta.yml`.
We should also modify the JSON schema that we use to lint subworkflows, as the elements of `components` are currently required to be strings. But I think this is not something we should add to the nf-core/modules repo; it should work if the non-nf-core repo adds a JSON schema file too. This should also be documented.
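One plausible way to express that relaxation, sketched here with the jsonschema package; the property names mirror the meta.yml keys discussed in this PR, but the actual nf-core schema layout may differ:

```python
from jsonschema import validate

# Each entry in "components" may be a plain module name or an object that
# additionally pins a remote and org path (illustrative shape only).
component_entry = {
    "oneOf": [
        {"type": "string"},
        {
            "type": "object",
            "required": ["name"],
            "properties": {
                "name": {"type": "string"},
                "git_remote": {"type": "string"},
                "org_path": {"type": "string"},
            },
            "additionalProperties": False,
        },
    ]
}

schema = {
    "type": "object",
    "properties": {"components": {"type": "array", "items": component_entry}},
}

validate(
    {
        "components": [
            "fastqc",
            {"name": "gffread", "git_remote": "https://github.com/nf-core-test/modules.git"},
        ]
    },
    schema,
)
print("components entries validated as string-or-object")
```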
Co-authored-by: Júlia Mir Pedrol <mirp.julia@gmail.com>
Hi Júlia, just wanted to let you know that we're still working on this feature! Just hit a few roadblocks that, due to the short time I can allocate to this every week, are taking a bit too long to solve. To sum up:
Other than that:
Thank you so much for your reviews and your patience while we work on the feature.
Hi @jvfe, that's great! Thanks for implementing the tests. The documentation can be added to the website. Here you have the docs for the subworkflow install command. And for a more detailed guide like the README you shared, I would add a page in the tutorials, under "external usage" like we do for the configs (see here)
This was fixed by another PR, so if you update your branch with the dev branch the test should pass.
PR checklist
- CHANGELOG.md is updated
- docs is updated

Addresses #1927 and #2497
Issue information
Currently, `get_components_to_install` is the function used to download a subworkflow's modules. It relies on simple text parsing to define where to get a subworkflow/module from and install it.
As outlined in the above issues (and in Slack conversations), it is currently not possible to install subworkflows which depend on modules that originate from different remotes. The current workaround is to copy the required modules over to the same remote in order to use them.
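For context, roughly what "simple text parsing" means here: scanning the subworkflow's main.nf for include statements and deriving component names from their paths. This is a simplified stand-in for illustration only, not the actual get_components_to_install implementation (which also reads meta.yml and handles more cases):

```python
import re
from pathlib import Path
from typing import List

# Matches lines like: include { FASTQC } from '../../../modules/nf-core/fastqc/main'
INCLUDE_RE = re.compile(r"include\s*\{\s*\w+\s*\}\s*from\s*'([^']+)'")


def naive_components_from_main_nf(main_nf: Path) -> List[str]:
    """Very rough sketch: list component names referenced by include statements."""
    components = []
    for include_path in INCLUDE_RE.findall(main_nf.read_text()):
        # '../../../modules/nf-core/fastqc/main' -> 'fastqc'
        parts = [p for p in include_path.split("/") if p not in ("..", "modules", "nf-core", "main")]
        components.append("/".join(parts))
    return components
```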
Implementation
At the moment the implementation is mostly a workaround/hack. I aimed to make the fewest possible modifications to the source code in order to make the intended behaviour functional. Here's how it works:
- It still uses the same text parsing for all modules and subworkflows that don't have a meta.yml file defined.
- If a subworkflow defines a meta.yml, it can optionally define the `git_remote` and `org_path` keys for a component (based on a suggestion given on the original issue; see the illustrative sketch below). This info is used to grab the respective modules from the specified remote.
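The original example isn't reproduced in this thread, so purely as an illustrative sketch (the subworkflow name is taken from the install command further down, but the exact key layout is an assumption and may differ from the dummy repository), a meta.yml entry and its parsed form could look like this:

```python
import yaml  # pyyaml

# Hypothetical meta.yml fragment for a subworkflow: one module comes from the
# subworkflow's own remote, the other pins git_remote and org_path explicitly.
META_YML = """
name: get_genome_annotation
components:
  - gunzip
  - name: gffread
    git_remote: https://github.com/nf-core-test/modules.git
    org_path: nf-core-test
"""

meta = yaml.safe_load(META_YML)
for component in meta["components"]:
    if isinstance(component, dict):
        print(f"{component['name']} <- {component['git_remote']}")
    else:
        print(f"{component} <- same remote as the subworkflow")
```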
I've set up a dummy repository with a subworkflow that showcases this. If no remote is specified, it assumes the module is from the same remote as the subworkflow (as it is currently).
Trying to run `nf-core subworkflows --git-remote https://github.com/jvfe/test-subworkflow-remote install get_genome_annotation` with this branch of nf-core/tools should work. There's a basic test I've set up to check for this.

Known issues
All in all, I'm eager to hear what everyone thinks. I'm tagging @muffato who has been supervising and helping me in this implementation.