Allofus - Adapt targene for use on the All of Us Researcher Workbench #174

Open · wants to merge 24 commits into base: main

Conversation

@roskamsh (Collaborator) commented May 10, 2024

  [x] add a profile that allows execution of TarGene on the Researcher Workbench, which includes the subsequent steps (see the profile sketch below):
  [x] add containers which route through Google Container Registry for each container hosted on DockerHub
  [x] specify executor-specific instructions (Google Life Sciences API as well as resources)
  [x] run an end-to-end test using a test configuration (data in the test/ directory)
  [ ] run on All of Us data for the FTO variant and BMI trait
  [ ] update docs

Resources:
https://support.researchallofus.org/hc/en-us/articles/21179878475028-Using-Docker-Images-on-the-Workbench
https://workbench.researchallofus.org/workspaces/aou-rw-5b81a011/howtousenextflowintheresearcherworkbenchv7/data
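
[Editorial sketch, not code from this PR: one shape such a profile could take in nextflow.config. The profile name, Google project ID, bucket, and image paths are hypothetical placeholders.]

profiles {
    allofus {
        process.executor  = 'google-lifesciences'
        workDir           = 'gs://my-workspace-bucket/work'       // GCS bucket for task staging
        google.project    = 'my-terra-project-id'
        google.region     = 'us-central1'
        // route images through Google Container Registry rather than DockerHub
        process.container = 'gcr.io/my-terra-project-id/targene:latest'
        includeConfig 'conf/allofus_container.config'             // executor resources (this PR)
    }
}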

@roskamsh roskamsh self-assigned this Sep 18, 2024
@roskamsh roskamsh added the enhancement New feature or request label Sep 18, 2024
@roskamsh roskamsh linked an issue Sep 18, 2024 that may be closed by this pull request
@olivierlabayle (Member) left a comment

Thank you Breeshey.

On the two currently non-ticked boxes:

  • Did you manage to run the pipeline entirely?
  • What is the format of the input data, and does it require an additional mode besides the "custom" one? It might be good to explicitly state this in the docs, even if it can use the cohort mode.

conf/allofus_container.config (resolved)
@@ -0,0 +1,23 @@
process {
    memory = { 6.GB * task.attempt }
@olivierlabayle (Member):

On all memory/cpu requirements, why not simply include base.config? It would be a good opportunity to homogenise these configurations.
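
[Editorial sketch of this suggestion, assuming a base.config sits next to this file; not code from the PR.]

includeConfig 'base.config'

process {
    // override only what the All of Us Researcher Workbench requires
    memory = { 6.GB * task.attempt }
}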

@olivierlabayle (Member):

Similarly, max_retries is only 3 on Eddie.

@roskamsh (Collaborator, Author):

The only thing is that the resource allocation needs to be slightly different on the AOU RW, given that it has coupled memory-CPU constraints, which is why these had to be changed here for runs on the AOU RW. I can update max_retries, but there are additional error codes that need to be added when running on the Google Life Sciences API.

@olivierlabayle (Member):

Right, I remember. What is the exact coupling, is it a factor? If this is the case we could probably let the user choose memory only and infer cpu based on it, e.g. cpu = memory / 4 or something similar?

@roskamsh (Collaborator, Author):

It seems to be about 6 GB per 1 cpu. So we would have to make the memory a function of the number of CPUs.

For example:

cpu = 1 * task.attempt
memory = 6.GB * cpu

Or something like this?
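
[Editorial sketch of that coupling with the real Nextflow directive names (cpus, task.cpus); values are illustrative.]

process {
    cpus   = { 1 * task.attempt }
    memory = { 6.GB * task.cpus }   // ~6 GB per CPU, as estimated above
}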

@roskamsh (Collaborator, Author):

Additionally, when the number of CPUs requested is > 2, you must request an even number of cores, i.e. 4, 6, 8, etc. Another odd feature that I found when running the end-to-end test.
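
[Editorial sketch of one way to honour the even-core rule inside a directive closure; hypothetical, not from the PR.]

cpus = {
    int c = task.attempt + 1            // e.g. 2, 3, 4, ... across retries
    (c > 2 && c % 2 != 0) ? c + 1 : c   // round odd counts above 2 up to the next even number
}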

@roskamsh (Collaborator, Author):

I guess if you wanted the user to choose memory, you could just do the inverse operation. But CPUs would have to be an integer, so something like:

cpu = ceiling(memory/6)

Unsure exactly how this can be implemented at the nextflow configuration level. Would probably have to use the Math() operators.
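
[Editorial sketch, assuming a closure can be defined in nextflow.config: Math.ceil provides the ceiling, and Nextflow's MemoryUnit exposes toGiga() for whole gigabytes. The helper name is hypothetical.]

def memoryToCpus = { mem -> Math.ceil(mem.toGiga() / 6) as int }   // ~6 GB per CPU

process {
    memory = { 12.GB * task.attempt }
    cpus   = { memoryToCpus(task.memory) }   // e.g. 12 GB -> 2 CPUs
}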

@olivierlabayle (Member):

> It seems to be about 6 GB per 1 cpu. So we would have to make the memory a function of the number of CPUs.
>
> For example:
>
> cpu = 1 * task.attempt
> memory = 6.GB * cpu
>
> Or something like this?

Then the multithreaded label does not respect this factor, does it? I see 8GB for 2 CPUs; should it not be 12GB? Any chance you can get this info from the All of Us support team?
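
[Editorial sketch of a multithreaded label that would respect the 6 GB-per-CPU factor; the label name comes from the discussion, the values are illustrative.]

process {
    withLabel: multithreaded {
        cpus   = { 2 * task.attempt }
        memory = { 6.GB * task.cpus }   // 12 GB for 2 CPUs rather than 8 GB
    }
}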

@olivierlabayle (Member):

> I guess if you wanted the user to choose memory, you could just do the inverse operation. But CPUs would have to be an integer, so something like:
>
> cpu = ceiling(memory/6)
>
> Unsure exactly how this can be implemented at the nextflow configuration level. Would probably have to use the Math() operators.

Fair enough!

conf/allofus_container.config (outdated, resolved)
test.config (outdated, resolved)
@olivierlabayle (Member):

Have you successfully built this image? The problem is that at the moment I would be the only one able to push it to the current Docker tag, since it is not built within the CI process.

@roskamsh (Collaborator, Author):

Yes, I've built it and pushed it to my DockerHub. It is here: https://hub.docker.com/r/roskamsh/flashpca

Can we not just keep it as a stale file, as it only needs to be done once and not updated again?

@roskamsh (Collaborator, Author) commented Sep 24, 2024:

I've already built it, so there is no need to re-build. It is pointed at in conf/allofus_container.config. I can update container.config to point at the new Docker container.
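
[Editorial sketch of the shape such an override could take in conf/allofus_container.config; the process name is hypothetical, the image is the one linked above, tag omitted.]

process {
    withName: 'FlashPCA' {
        container = 'roskamsh/flashpca'   // defaults to the :latest tag
    }
}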

@olivierlabayle (Member) commented Sep 24, 2024:

As you say, the file would be unused, so that is not ideal. Ideally the repo should be kept as minimal as possible, with only useful code. It is probably best to use a container for which there is already a build process within the organisation. That's also one less download at runtime; all we need to do is add a label to the process that is missing one.

@olivierlabayle (Member):

In the long run I am happy to have some Dockerfiles within the pipeline repo, as long as they can be built within the CI process.

@roskamsh (Collaborator, Author):

I just thought it would make sense to have the Dockerfile stored here so that people knew how it was built and we had a record of it. Where else would this file be stored then?

@olivierlabayle (Member):

It is definitely the right place for it, but as part of another branch --> PR, if this is not addressed fully here.

@roskamsh (Collaborator, Author) commented:

> Thank you Breeshey.
>
> On the two currently non-ticked boxes:
>
> • Did you manage to run the pipeline entirely?
> • What is the format of the input data, and does it require an additional mode besides the "custom" one? It might be good to explicitly state this in the docs, even if it can use the cohort mode.

Currently I haven't run it with AOU data yet. That is the plan for this week! Then I should be able to add additional changes (if required) for a new COHORT mode. Once this is complete, I will update the docs. At this point, I have managed to run an end-to-end test using test.config (which is basically just https://github.com/TARGENE/targene-pipeline/blob/main/test/configs/custom_cohort_flat.config).

}

// Set a Google-appropriate error strategy
errorStrategy = { task.exitStatus in [143,137,104,134,139,14] ? 'retry' : 'finish' }
@joshua-slaughter (Collaborator):

Are the error codes the same as on Eddie? Just curious. And is there a place where we can view the docs for the platform's errors?
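
[Editorial note, not a reply from the thread: 143 and 137 are the 128+signal codes for SIGTERM and SIGKILL (the latter often the OOM killer), 134 is SIGABRT, 139 is SIGSEGV, 104 commonly indicates a connection reset, and 14 is the exit status Nextflow reports when a preemptible Google Cloud VM is reclaimed, per the Nextflow Google Cloud documentation.]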

@joshua-slaughter (Collaborator) left a comment:

Looks good to me!

Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

how to run this piepline in allofus workbench?
3 participants