Resource requirement tweaks #107

Open
wants to merge 4 commits into base: dev

Conversation

@mvanniekerkHartwig mvanniekerkHartwig commented Oct 25, 2024

Solves #106 .

When testing oncoanalyser against some more difficult samples I ran into fairly erratic behaviour, with stages running out of memory.
In general this was caused by the Java process allocating 95% of the available memory as heap and then invoking an external (usually R) script. This pushed the container over the memory limits set in base.config, and the container was killed by the supervisor (in my case k8s).
This can happen even without an external (non-Java) application being invoked. The Xmx parameter only sets the size of the heap; the GC operates outside of it. So when the heap fills up, the GC kicks in, consumes the remaining memory, and the whole container goes OOM.
Also, some stages, such as purple and sage, are assigned a resource class that is too low for the amount of work they need to do, especially for more difficult samples.

Therefore we no longer set the heap size higher than 75% of the container memory limit when running hmftools (see pipeline5, where the second argument is the heap memory and the third is the total memory). After this change the stability of the pipeline improved greatly: I went from sage and purple pretty much always going OOM to the pipeline completing successfully even for difficult samples.
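For illustration, here's a minimal sketch of how a module script can derive the JVM heap from the Nextflow task allocation; the process name, memory figure, and tool invocation are placeholders, not the actual oncoanalyser module code:

```groovy
process SAGE_SOMATIC {
    memory 48.GB

    script:
    // Hypothetical sketch: give the JVM roughly 75% of the container allocation,
    // leaving headroom for GC/JVM overhead and any external (e.g. R) subprocesses.
    def max_heap_gb = Math.floor(task.memory.toGiga() * 0.75) as int
    """
    java -Xmx${max_heap_gb}g -jar sage.jar ...  # remaining tool arguments elided
    """
}
```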

I also bumped the process requirements for purple, sage, and orange, which brings them more in line with the resource requirements set in pipeline5. These extra resources are necessary to finish more difficult samples in a reasonable amount of time and without going OOM.
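As a rough illustration of what such bumps look like, per-process resources in a Nextflow pipeline are typically overridden in conf/base.config; the process selectors and figures below are placeholders rather than the exact values introduced in this PR:

```groovy
// conf/base.config - illustrative per-process overrides only
process {
    withName: 'PURPLE' {
        cpus   = 8
        memory = 64.GB
        time   = 12.h
    }
    withName: 'SAGE_SOMATIC' {
        cpus   = 16
        memory = 64.GB
        time   = 12.h
    }
}
```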

@nf-core-bot
Member

Warning

Newer version of the nf-core template is available.

Your pipeline is using an old version of the nf-core template: 2.14.1.
Please update your pipeline to the latest version.

For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.

github-actions bot commented Oct 30, 2024

nf-core pipelines lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 9caa5b0

✅ 194 tests passed
❔ 5 tests were ignored
❗ 14 tests had warnings

❗ Test warnings:

  • files_exist - File not found: assets/multiqc_config.yml
  • readme - README contains the placeholder zenodo.XXXXXXX. This should be replaced with the zenodo doi (after the first release).
  • pipeline_todos - TODO string in nextflow.config: Optionally, you can add a pipeline-specific nf-core config at https://github.com/nf-core/configs
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
  • pipeline_todos - TODO string in awsfulltest.yml: You can customise AWS full pipeline tests as required
  • schema_params - Schema param panel not found from nextflow config
  • schema_params - Schema param genome_version not found from nextflow config
  • schema_params - Schema param genome_type not found from nextflow config
  • schema_params - Schema param ref_data_hmf_data_path not found from nextflow config
  • schema_params - Schema param ref_data_panel_data_path not found from nextflow config
  • schema_params - Schema param ref_data_virusbreakenddb_path not found from nextflow config
  • schema_params - Schema param ref_data_genome_gtf not found from nextflow config
  • schema_params - Schema param ref_data_hla_slice_bed not found from nextflow config

❔ Tests ignored:

✅ Tests passed:

Run details

  • nf-core/tools version 3.0.2
  • Run at 2024-10-30 00:28:14

@scwatts
Collaborator

scwatts commented Oct 30, 2024

I've made the modifier controlling the proportion of total available memory allocated to the JVM maximum heap configurable for users, with defaults defined at the module level.

For processes that require large amounts of memory I've increased the default proportion above the suggested 75%. Similarly, for processes with low memory allocations the default has also been increased, since the modifier doesn't scale well at the low end imo.

I'd rather not sacrifice usable memory to optimise where it's not strictly needed - please let me know if these defaults are a problem with your setup and we can adjust as necessary.

I'll run some tests to ensure these changes do indeed work; let's discuss any points you'd like reviewed in the meantime!
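As a hedged sketch of what a module-level, user-overridable heap fraction could look like (the ext.jvm_mem_fraction name and the values are hypothetical, not necessarily what this PR implements):

```groovy
// conf/modules.config - hypothetical per-process defaults that a user config could override
process {
    withName: 'PURPLE' {
        ext.jvm_mem_fraction = 0.80   // memory-hungry process: give the heap a larger share
    }
    withName: 'LILAC' {
        ext.jvm_mem_fraction = 0.90   // small allocation: a flat 75% would leave too little heap
    }
}
```

Inside a module the fraction would then be read when computing -Xmx, e.g. `task.ext.jvm_mem_fraction ?: 0.75`, falling back to the 75% suggested above when no override is set.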

@mvanniekerkHartwig
Author

Looks great, I think it's a good solution to make this setting configurable.
