-
Notifications
You must be signed in to change notification settings - Fork 196
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Install subworkflows with modules from different remotes #3083
base: dev
Are you sure you want to change the base?
Conversation
fix: Avoid duplicate downloads in nested subworkflows
* dev: (299 commits) Update nf_core/pipelines/create/utils.py [automated] Update CHANGELOG.md allow numbers in custom pipeline name [automated] Update CHANGELOG.md Update pre-commit hook pre-commit/mirrors-mypy to v1.11.1 Fixed linting Updated changelog Added process_high_memory to create/lint list update chagelog use pathlib instead of os.path Apply suggestions from code review add option to skip code linters Update gitpod/workspace-base Docker digest to f189a41 Update pre-commit hook pre-commit/mirrors-mypy to v1.11.0 [automated] Update CHANGELOG.md Update python:3.12-slim Docker digest to 740d94a Update .pre-commit-config.yaml Update nf_core/__main__.py Add default version Automatically add a default version in pipelines_create ...
I'm very wary of increasing the idiosyncrasies in the code and I feel that making it such that the code returns variably either strings or dicts and thus necessitates I understand the desire to simply get something working but I think that if we are going to implement this and support it we need to do it right. I think this also indicates support only for cross-organisational |
Currently, yes
Once
It's a question of time management (@jvfe is working part time on this project). We're not sure about about the handling of the default org vs the module org in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your contribution @jvfe!
I think this is a good start 😄 I agree with @awgymer that allowing component
to be both a string or a dict is confusing and it will cause problems. But I like the idea of using the information in the meta.yml
to obtain the remote form which the component should be installed.
My suggestion would be to convert component
to a dictionary in all cases since the beginning, with the default being the current remote.
* dev: (35 commits) don't remove creating a pipeline to test it 😶 update textual snapshots update textual snapshots fix indentation Update nf_core/pipelines/create/create.py [automated] Update CHANGELOG.md [automated] Update CHANGELOG.md add option to exclude changelog from custom pipeline template add option to exclude codespaces from pipeline template remove multiqc images from the template docs and the same for subworkflows run tests also after pytest-migrate update textual snapshots add multiqc potion to test data update textual snapshots fix tests [automated] Update CHANGELOG.md add option to exclude multiqc from pipeline template rename regenerae-snapshots to update-textual-snapshots Apply suggestions from code review ...
I had to open a new PR after updating the branch because I wasn't able to push to this branch, you can see it here #3456 |
should be fixed with 04b26bb |
Hi! Just saw your comment, sorry - though you fixed it way quicker than I would have 😄 . Also, just merged your changes to this branch. We can also give you access to this fork if necessary (@muffato I think you'd have to do that, since I don't have the permission). |
* dev: only add entry to files remove remaining print [automated] Update CHANGELOG.md fix edam BAM link on modules template add ontologies to meta.yml if not present [automated] Update CHANGELOG.md [automated] Update CHANGELOG.md docs: fix contributing link in the main README fix: linting with comments after the input directive
I've given you, @mirpedrol , write access to repo. I'd prefer we continue the review here rather than #3456 so that all the review history is in one place |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work! Does it also work with nf-core subworkflows patch
?
Co-authored-by: Matthias Hörtenhuber <mashehu@users.noreply.github.com>
I have tested that it works 👍 patch is independent of the included components so it should not be affected by these changes |
As I can see this PR is looking good and probably heading towards a merge I just want to gently raise my previous point
This PR adds functionality to make cross-repo subworkflows work but we are still lacking a serious CI testing solution and I'm worried about other footguns the functionality potentially opens up. I'm conscious that greater private repo support is something many (including myself!) want but I also want to ensure we achieve it with stability and long-term viability. |
It's a fundamental limitation of this style of subworkflows that you cannot test them easily in their remotes on CI right now and this tools functionality does nothing to address that. Once again my issue is not so much with the actual code of this PR but rather the wider implications this PR has. |
Ah right. Well, it sounds like it is a good idea for another internship/mentorship ;) a |
how about a CI test, where we create a test pipeline and build a mixed subworkflow there? |
It's there already: https://github.com/nf-core/tools/pull/3083/files#diff-60cea8a0cf3a284ef26219e6dc9b9b2b55ebb25214cb520b0ae38ef6179436d7R87 It uses modules/subworkflows across nf-core/modules and https://github.com/nf-core-test/modules |
PR checklist
CHANGELOG.md
is updateddocs
is updatedAddresses #1927 and #2497
Issue information
Currently,
get_components_to_install
is the function used to download a subworkflow's modules. It relies on simple text parsing to define where to get a subworkflow/module from and install it.As outlined in the above issues (and in slack conversations), it is currently not possible to install subworkflows which depend on modules that originate from different remotes. The solution is to copy over required modules to a same remote in order to use them.
Implementation
At this moment the implementation is mostly a work-around/hack. I aimed to make the least possible modifications to the source code in order to make the intended behavior functional. Here's how it works:
It still uses the same text parsing for all modules and subworkflows that don't have a meta.yml file defined.
If a subworkflow defines a meta.yml, they can optionally define the
git_remote
andorg_path
keys for a component. This is based on a suggestion given on the original issue. E.g.:This info is used to grab the respective modules from the specified remote.
I've setup a dummy repository with a subworkflow that showcases this. If no remote is specified, it assumes the module is from the same remote as the subworkflow (as it is currently).
Trying to run
nf-core subworkflows --git-remote https://github.com/jvfe/test-subworkflow-remote install get_genome_annotation
with this branch of nf-core/tools should work. There's a basic test I've setup to test for this.Known issues
All in all, I'm eager to hear what everyone thinks. I'm tagging @muffato who has been supervising and helping me in this implementation.