How to reduce memory consumption during population calling? #448

tnguyengel · 2024-01-08T21:15:54Z

We would like to reduce memory consumption during population calling. Is it possible to split SNF files by chromosome or genomic region?

Alternatively, should we supply smaller bams to Sniffles2 by splitting bams such that each bam only contains the reads that align to a chromosome/genomic region?

Related to #282.

fritzsedlazeck · 2024-01-08T21:17:28Z

There will be a new release coming very soon (days away) that reduces memory usage and allows splitting. @hermannromanek is on it :)
Thanks
Fritz

tnguyengel · 2024-02-27T16:46:28Z

Has the feature to split up SNF files by chromosome already been released? If so, where can we find the new binaries?

hermannromanek · 2024-02-27T21:04:46Z

Hi,

Sorry for the delay - we encountered some issues which had to be fixed first and are in the process of re-testing.

I just pushed the current release candidate, feel free to give it a try. Bear in mind that it is not yet fully tested; there is one open bug we know of that causes Sniffles to report the same SVs twice. Please share any other issues you encounter with us.

To enable the improved population calling, please also make sure the library psutil is installed.

Thanks,
Hermann
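
A quick way to confirm that psutil is available before starting a large merge is sketched below. It assumes Sniffles runs under the same interpreter you pass in (`python3` by default); the helper name `check_psutil` is made up for this example and is not part of Sniffles.

```shell
# Report whether psutil is importable by a given Python interpreter.
# The interpreter defaults to python3; pass another one as the first argument.
# Assumption: this is the same interpreter/environment that Sniffles uses.
check_psutil() {
    local py="${1:-python3}"
    if "$py" -c 'import psutil' >/dev/null 2>&1; then
        echo "psutil found"
    else
        echo "psutil missing; install with: $py -m pip install psutil"
    fi
}

check_psutil
```

If Sniffles was installed into a conda or virtual environment, run the check from inside that environment.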

tnguyengel · 2024-04-24T09:33:32Z

I noticed that there is a new release: https://github.com/fritzsedlazeck/Sniffles/releases/tag/v2.3.2. Does it solve this issue of high RAM usage for many samples? (We estimated that Sniffles v2.2 would use ~500-600 GB of RAM for multisample calling on 5000 human ONT samples, with no way to parallelize the work across multiple machines to reduce RAM consumption.) If so, how does Sniffles v2.3+ handle many samples? Does it automatically throttle memory usage when it detects that usage is becoming too high? We can't seem to find a way to tell Sniffles v2.3+ to process the SNF files by chromosome (thereby increasing parallelism and reducing RAM usage on a single machine).

fritzsedlazeck · 2024-04-24T11:40:41Z

Hey @tnguyengel
as you can imagine, it's a bit tricky :) What @hermannromanek implemented is a window approach that lets you scale with multithreading and memory. Tight control of memory is tricky, but Hermann can explain how to run it.
Thanks
Fritz

hermannromanek · 2024-04-24T18:39:52Z

Hi @tnguyengel

Yes, Sniffles 2.3 should use much less memory for merging than 2.2 did. It does so by monitoring RAM usage and freeing memory once the footprint exceeds 2 GB per thread/worker process (a limit that is reached quickly when processing 5000 samples). Also, whereas in 2.2 each thread worked on its own chromosome, in 2.3 the threads work on the same chromosome in parallel, so you get better parallelization even when processing only one chromosome.

To process a single chromosome you can use the new parameter --contig CONTIG (or -c CONTIG) with CONTIG being the contig name you want to process.

What's the command you've been trying to run sniffles with?

Thanks for your feedback,
Hermann

tnguyengel · 2024-04-25T08:08:51Z

> What's the command you've been trying to run sniffles with?

For both Sniffles v2.3.2 and Sniffles v2.2, we were running

sniffles -t ${threads} --allow-overwrite --input "${snf_list}" --vcf "${out_merged_vcf}"

> To process a single chromosome you can use the new parameter --contig CONTIG (or -c CONTIG) with CONTIG being the contig name you want to process.

Facepalm! I missed that. My apologies. We'll try scaling tests again with the --contig option.
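
For anyone else running scaling tests this way, here is a minimal sketch of turning the command above into one job per contig. It only prints the commands (a dry run) so they can be reviewed or handed to a scheduler. The function name, the output naming scheme `PREFIX.CONTIG.vcf`, and the example file and contig names are assumptions for illustration; the Sniffles flags are the ones already mentioned in this thread. Paths containing whitespace would need extra quoting.

```shell
# Print one Sniffles merge command per contig (dry run); nothing is executed.
# Usage: print_contig_jobs THREADS SNF_LIST OUT_PREFIX CONTIG...
print_contig_jobs() {
    local threads="$1" snf_list="$2" out_prefix="$3"
    shift 3
    local contig
    for contig in "$@"; do
        # One output VCF per contig, named OUT_PREFIX.CONTIG.vcf (hypothetical scheme).
        printf 'sniffles -t %s --allow-overwrite --input %s --vcf %s.%s.vcf --contig %s\n' \
            "$threads" "$snf_list" "$out_prefix" "$contig" "$contig"
    done
}

# Hypothetical file and contig names; use the contig names from your reference.
print_contig_jobs 8 snf_list.tsv merged chr1 chr2
```

Each printed line can then run on its own machine, which spreads the memory load across contigs.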

lfpaulin · 2024-05-22T21:35:48Z

Dear tnguyengel, did you manage to run the 5000 samples?
We just released a new version (2.3.3) that helps with some issues, and we are improving the merging of large datasets. Your feedback is much appreciated.

tnguyengel · 2024-05-27T18:08:26Z

We don't have the full 5000 samples to run yet, but that will be the final set that we eventually run with. We will rerun scaling tests with v2.3.3, and report the results here.

fritzsedlazeck · 2024-05-27T18:20:12Z

Cool. We keep testing and optimizing. Keep us posted and we will push forward.
Thanks
Fritz

tnguyengel · 2024-06-07T15:15:43Z

> Dear tnguyengel, did you manage to run the 5000 samples?
> We just released a new version (2.3.3) that helps with some issues, and we are improving the merging of large datasets. Your feedback is much appreciated.

FYI, initial scaling tests with up to 35 samples indicate that v2.3.3 would theoretically use ~100 GB of RAM to aggregate a contig across a 5000-sample cohort. That is much more reasonable in terms of resource usage. I'll report more detailed results as we go along.
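
One possible way to collect peak-memory numbers for scaling tests like these is GNU time, which writes a report when called as `/usr/bin/time -v -o REPORT command...`. The sketch below only shows how to pull the peak resident set size out of such a report; the helper name `peak_rss_kb` is made up, and the report line in the example is hand-written in GNU time's format rather than a real measurement.

```shell
# Pull the peak resident set size (in kB) out of a GNU time -v report file.
peak_rss_kb() {
    awk -F': ' '/Maximum resident set size/ { print $2 }' "$1"
}

# Example with a hand-written report line in the format GNU time prints.
report="$(mktemp)"
printf '\tMaximum resident set size (kbytes): 1048576\n' > "$report"
peak_rss_kb "$report"   # prints 1048576
rm -f "$report"
```

Note that this figure reflects the largest single process rather than the sum over all worker processes, so it may underestimate the total footprint of a multi-process run.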

hermannromanek · 2024-11-04T21:04:21Z

While there are more improvements to come, v2.5 should already significantly improve multisample calling on larger data sets. Merging 35 samples should stay well below 10 GB of RAM.

fritzsedlazeck · 2024-12-17T12:51:14Z

Hey guys, the new version just went live and is much better in terms of memory consumption. Please test it out.
Cheers
Fritz

hermannromanek self-assigned this Feb 27, 2024

hermannromanek added this to the 2.5 milestone Sep 23, 2024
