19 of 20 parallelized workers always finish far before the whole job is done #88
Hi, thanks for the feedback. Certain threads can indeed get stuck for a long time when the read depth is very high. Unfortunately, it is difficult to change the framework of cellsnp-lite to allocate jobs by chunks of the BAM file, since htslib (the low-level library that cellsnp-lite depends on to perform pileup) does not support that yet, as far as I know.
To address this issue, we are thinking about two strategies: 1) split the SNP list (mode 1) or the chromosome regions (mode 2) into smaller batches and push the batches into the thread pool. However, this could add overhead for the per-batch initialization work (e.g., prepare […]). We may try to implement these two strategies (or others, if available) in the future. Thanks for your good question.
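Strategy (1) above can be sketched in a few lines of Python. This is a hypothetical illustration, not cellsnp-lite's actual code: `pileup_batch` and `SNP_LIST` are stand-ins for the real per-batch pileup work and SNP list. The point is that many small batches pulled dynamically from a pool keep all workers busy, so one deep region cannot pin a single worker while the rest sit idle.

```python
# Illustrative sketch of batch-based scheduling (not cellsnp-lite's API).
# Splitting the SNP list into small batches and feeding them to a thread
# pool lets idle workers pull the next batch instead of being assigned
# one large fixed region up front.
from concurrent.futures import ThreadPoolExecutor

def chunked(items, size):
    """Yield successive batches of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def pileup_batch(batch):
    # Placeholder for the real per-batch pileup work.
    return len(batch)

SNP_LIST = list(range(1000))   # stand-in for a real SNP list
BATCH_SIZE = 50                # small batches -> finer load balancing

with ThreadPoolExecutor(max_workers=20) as pool:
    results = list(pool.map(pileup_batch, chunked(SNP_LIST, BATCH_SIZE)))

print(sum(results))  # prints 1000: every SNP was processed exactly once
```

The trade-off the maintainer mentions is visible here: each batch repeats whatever setup `pileup_batch` needs (file handles, index lookups), so the batch size balances scheduling granularity against initialization overhead.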
Hi, I think my problem is somewhat similar to this. The code I use to run my job is shown below:
First of all, this is very nice work. Thanks to the author.
I'm analyzing 5' VDJ-enriched single-cell data from the 10X pipeline, using version 1.2.1 because micromamba has dependency problems with the newest 1.2.3 (lacking C++11 support?).
The problem is that after 19 of the 20 parallelized workers finish their work in about 1 hour, worker 7 is always left over and takes another 2~3 hours to finish. I checked the temporary VCF files generated by workers 6 and 8 and found that the genomic region allocated to worker 7 lies between chromosomes 5 and 6, which may be a region with biased enrichment in sequencing depth. I'm not familiar with C++, but from the Python version it seems the work is allocated once at the beginning, based on the regions of the reference file. Would it be possible to allocate the jobs by chunks of the BAM file, since all reads are aligned in coordinate order, to avoid this situation?
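The straggler effect described above can be demonstrated with a toy simulation. This is not cellsnp-lite code: region "costs" are synthetic, with a few very deep regions mimicking the biased enrichment around chromosomes 5 and 6, and the two schedulers are idealized models of static up-front assignment versus dynamic pulling from a shared queue.

```python
# Toy comparison of static region assignment (one contiguous slice per
# worker, fixed at startup) vs. dynamic assignment (each region goes to
# the currently least-loaded worker, approximating a work queue).
# Costs are synthetic; 5 "hot" regions model a depth-biased hotspot.
import heapq

costs = [1] * 95 + [40] * 5   # 100 regions, 5 of them very deep

def makespan_static(costs, workers):
    """Wall time when each worker gets one contiguous slice up front."""
    per = len(costs) // workers
    return max(sum(costs[i * per:(i + 1) * per]) for i in range(workers))

def makespan_dynamic(costs, workers):
    """Wall time when each region is handed to the least-loaded worker."""
    loads = [0] * workers
    heapq.heapify(loads)
    for c in costs:
        heapq.heappush(loads, heapq.heappop(loads) + c)
    return max(loads)

print(makespan_static(costs, 20))   # prints 200: one worker got all 5 hot regions
print(makespan_dynamic(costs, 20))  # prints 44: hot regions spread across workers
```

With static slicing, the unlucky worker holding the deep regions runs ~4-5x longer than everyone else, which matches the "one worker takes another 2~3 hours" symptom; dynamic assignment spreads the hot regions and cuts the makespan sharply.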
The bash line I use to run the job is as follows: