Choose the pileup call over the full-alignment call if the pileup call 1) has a much higher QUAL and 2) is an indel #318
Comments
Many thanks, Tom. I have some ideas for how to fix it but can't tell which one does less harm to other variant candidates. Is it possible for you to send me a mini-BAM containing only the reads covering …
I'll have to consult; as I indicated, it's a clinical sample. The smallest expansion would be to track … If you're keen, I would suggest refactoring the code to do a "classic" merge, iterating over the two VCFs at the same time. That way, the cases where things collide appear as a natural clause in the inner loop, and you don't need memory proportional to the size of the full-alignment VCF.
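A minimal sketch of the "classic" merge suggested above, assuming both VCFs have already been parsed into records sorted by contig order and position, with at most one record per position in each file. Record, merge_sorted, prefer_full_alignment and the resolve callback are hypothetical names for illustration, not part of Clair3, and the toy records use made-up values. The merge loop itself only holds the current record from each input, and a position collision shows up as the final else branch:

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass
class Record:
    contig: str
    pos: int      # 1-based VCF POS
    ref: str
    alt: str      # "." marks a RefCall in this sketch
    qual: float
    source: str   # "pileup" or "full_alignment" (hypothetical labels)


def merge_sorted(
    pileup: Iterable[Record],
    full_alignment: Iterable[Record],
    contig_order: List[str],
    resolve: Callable[[Record, Record], Record],
) -> Iterator[Record]:
    """Stream-merge two coordinate-sorted inputs, holding one record per input.

    Assumes each input is sorted by (contig_order, pos) and has at most one
    record per position. When both inputs share a position, `resolve`
    decides which record is emitted.
    """
    rank = {contig: i for i, contig in enumerate(contig_order)}

    def key(r: Record):
        return (rank[r.contig], r.pos)

    p_iter, f_iter = iter(pileup), iter(full_alignment)
    p, f = next(p_iter, None), next(f_iter, None)
    while p is not None or f is not None:
        if f is None or (p is not None and key(p) < key(f)):
            yield p                      # only pileup has this position
            p = next(p_iter, None)
        elif p is None or key(f) < key(p):
            yield f                      # only full alignment has this position
            f = next(f_iter, None)
        else:
            yield resolve(p, f)          # collision: both report this position
            p, f = next(p_iter, None), next(f_iter, None)


def prefer_full_alignment(p: Record, f: Record) -> Record:
    """Toy resolver that always keeps the full-alignment record."""
    return f


# Toy records (made-up values) with a collision at chr1:200.
pileup = [Record("chr1", 100, "A", "G", 20.0, "pileup"),
          Record("chr1", 200, "C", "CT", 25.0, "pileup")]
full = [Record("chr1", 150, "T", "C", 18.0, "full_alignment"),
        Record("chr1", 200, "C", ".", 3.0, "full_alignment")]

merged = list(merge_sorted(pileup, full, ["chr1"], prefer_full_alignment))
print([(r.pos, r.alt) for r in merged])  # [(100, 'G'), (150, 'C'), (200, '.')]
```

Because collisions are handled by a single callback, a different tie-breaking rule can be swapped in without touching the merge loop.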
Unfortunately, I can't give you a BAM with the reads.
Just to update: we have found more instances of this problem, unfortunately in clinical samples that we can't share. It's a significant problem for us because, in trios, it causes some variants to appear de novo in the proband when in fact they are inherited; the call has been "overwritten" in the parent.
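To make the trio failure mode concrete: a variant is flagged as apparently de novo when it is present in the proband but absent from both parents. The sketch below uses a hypothetical apparent_de_novo helper and made-up coordinates to show how losing a parent's insertion during merging turns an inherited variant into an apparent de novo one. It is illustrative only; real trio analyses usually also look at genotypes, depth and quality.

```python
from typing import Set, Tuple

Variant = Tuple[str, int, str, str]  # (contig, pos, ref, alt)


def apparent_de_novo(proband: Set[Variant], mother: Set[Variant],
                     father: Set[Variant]) -> Set[Variant]:
    """Variants called in the proband but in neither parent."""
    return proband - mother - father


ins = ("chr1", 200, "C", "CT")
proband = {ins}
mother_true = {ins}        # the mother genuinely carries the insertion
mother_after_merge = set() # her insertion was replaced by a RefCall during merging
father = set()

print(apparent_de_novo(proband, mother_true, father))         # set()
print(apparent_de_novo(proband, mother_after_merge, father))  # {('chr1', 200, 'C', 'CT')}
```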
Are all your instances called with good QUAL in pileup but not called in full-alignment? If yes, could you share with us the pileup QUAL and full-alignment QUAL of these instances? We need them to determine when to trust pileup indel calls over full-alignment calls.
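The rule proposed in the issue title (prefer the pileup call when it is an indel and its QUAL is much higher) could be written as a small resolver like the one below. This is a hypothetical sketch: Call, is_indel, choose_call and the min_qual_gap default of 10.0 are placeholders, since picking that threshold is exactly what the requested pileup and full-alignment QUAL values are needed for. It also assumes a single ALT allele per record.

```python
from dataclasses import dataclass


@dataclass
class Call:
    ref: str
    alt: str      # single ALT allele; "." marks a RefCall in this sketch
    qual: float


def is_indel(c: Call) -> bool:
    """True when REF and ALT differ in length (single-allele records only)."""
    return c.alt != "." and len(c.ref) != len(c.alt)


def choose_call(pileup: Call, full_alignment: Call,
                min_qual_gap: float = 10.0) -> Call:
    """Keep the pileup call if it is an indel whose QUAL exceeds the
    full-alignment QUAL by at least min_qual_gap; otherwise keep the
    full-alignment call. The threshold is a hypothetical placeholder."""
    if is_indel(pileup) and pileup.qual - full_alignment.qual >= min_qual_gap:
        return pileup
    return full_alignment


ref_call = Call("C", ".", 3.0)
strong_ins = Call("C", "CT", 25.0)   # gap 22 >= 10: pileup insertion kept
weak_ins = Call("C", "CT", 8.0)      # gap 5 < 10: full-alignment call kept
snv = Call("C", "T", 30.0)           # not an indel: full-alignment call kept

print(choose_call(strong_ins, ref_call) is strong_ins)  # True
print(choose_call(weak_ins, ref_call) is ref_call)      # True
print(choose_call(snv, ref_call) is ref_call)           # True
```

Keeping the rule in one function makes it easy to tune the threshold, or to add conditions such as the full-alignment call being a RefCall, once real QUAL values are available.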
Actually, the recent cases we encountered were even simpler. My colleague Jimmy is currently trying to reproduce the problem in HG002 (i.e. publicly available data). I'll keep you posted.
Hi!
Thanks for making a good tool!
We have been testing and benchmarking with (clinical) trio data and have found a corner case where the merge between the pileup calls and the full-alignment calls results in a real variant being discarded.
The pileup VCF file contains the row
And the full-alignment VCF has the row
The VCF produced by MergeVcf outputs the RefCall, which causes the insertion to be dropped. It's a bit of a corner case, since the position of an insertion in a VCF is the base before the inserted sequence, and that base is reference, but in this case we do want to report the insertion!
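To illustrate the collision: in VCF, an insertion is anchored at the base before the inserted sequence, so REF is that anchor base and ALT is the anchor base followed by the inserted bases. A RefCall at the anchor position therefore shares the insertion's (CHROM, POS). The sketch below uses a hypothetical position-keyed merge in which full-alignment rows overwrite pileup rows; it shows how such a design drops the insertion and is not meant to reproduce Clair3's MergeVcf code. The row values are made up.

```python
from typing import Dict, List, Tuple

# (contig, pos, ref, alt); alt "." marks a RefCall in this sketch
Row = Tuple[str, int, str, str]


def inserted_bases(ref: str, alt: str) -> str:
    """Return the inserted sequence of an anchored VCF insertion."""
    if not (len(alt) > len(ref) and alt.startswith(ref)):
        raise ValueError("not a simple anchored insertion")
    return alt[len(ref):]


def naive_merge(pileup: List[Row], full_alignment: List[Row]) -> List[Row]:
    """Hypothetical merge keyed on (contig, pos): a full-alignment row
    silently replaces any pileup row at the same position."""
    merged: Dict[Tuple[str, int], Row] = {(r[0], r[1]): r for r in pileup}
    for r in full_alignment:
        merged[(r[0], r[1])] = r
    # Sorting by contig name is lexicographic; fine for this single-contig toy.
    return sorted(merged.values(), key=lambda r: (r[0], r[1]))


pileup = [("chr1", 200, "C", "CTG")]  # inserts "TG" after the C at POS 200
full = [("chr1", 200, "C", ".")]      # RefCall at the same anchor base

print(inserted_bases("C", "CTG"))     # TG
print(naive_merge(pileup, full))      # [('chr1', 200, 'C', '.')]
```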
I don't fully understand how the positions of the RefCall variants are determined, so I can't tell if this is a freak event or if there is a systematic process that makes these collisions likely to occur elsewhere. If you could explain how these RefCall events are generated, that would help us understand how the caller works. Looking at the code of MergeVcf (e.g. Clair3/preprocess/MergeVcf.py, line 191 and line 228 at commit b975475) …
Again, thanks for making a good tool, and I hope this report helps you make it even better.
Tom.