Questions: Improving PROB_GERMLINE for better FDR filtering. #424
Comments
Oh, I did not realize I had closed this! I still have many questions while trying to improve my scenario grammar. Is there some detailed documentation that I am missing? I do not understand the strictness of "germline: 'normal:0.5 | normal:1.0'", and I think it is causing me grief. I am observing some known germline variants at a frequency as low as 0.40 due to sequencing coverage. It feels like I should drop somatic_normal. I want to understand what difference something like germline: 'normal:[0.4,0.6] | normal:1.0' would make. Since this is a tumor/normal study, I will have reads for germline variants in the tumor. Should I be including that in the event? I suppose I could guess and test, but I was hoping to gain some insight. Thank you.
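To put the 0.40 observation in context, here is a minimal Python sketch. It assumes that reads at a truly heterozygous site are sampled independently, so the alt-read count follows a binomial distribution with a true allele frequency of 0.5. The function name and the example depths are hypothetical, and the model ignores mapping bias and sequencing error. It only illustrates how often an observed fraction of 0.40 or lower arises by sampling alone, which is separate from the question of how the event formula treats the underlying allele frequency.

```python
from math import comb


def prob_alt_fraction_at_most(depth: int, max_fraction: float, true_af: float = 0.5) -> float:
    """P(observed alt-read fraction <= max_fraction) under Binomial(depth, true_af)."""
    # Largest alt-read count that still lies within the fraction (epsilon guards float rounding).
    max_alt = int(max_fraction * depth + 1e-9)
    return sum(
        comb(depth, k) * true_af**k * (1 - true_af) ** (depth - k)
        for k in range(max_alt + 1)
    )


# Hypothetical depths: the lower the coverage, the more often a true
# heterozygous variant shows an observed fraction of 0.40 or below.
for depth in (20, 50, 200):
    print(depth, round(prob_alt_fraction_at_most(depth, 0.40), 4))
```

Under these assumptions, an observed fraction near 0.40 is unremarkable at low depth and becomes increasingly unlikely as depth grows.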
Is there a way I can provide some prior knowledge to improve the model and potentially yield a more accurate posterior? I have some variants called by Mutect2 and passing its own filtering voodoo. I have some level of allele frequency information about them from the most recent ClinVar VCF, but I'm not sure how I can use it. I would want to do this for all pathogenic variants in my tumor type.
Very good point. We can indeed add functionality to Varlociraptor to consider a population-based prior per variant. So far, it only supports a global heterozygosity or somatic mutation rate.
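As a rough illustration of what a per-variant, population-based prior could do, the following Python sketch combines a Hardy-Weinberg genotype prior derived from a population allele frequency with a simple binomial read likelihood. This is not Varlociraptor's implementation: the function name, the three-genotype model, and the `error_rate` parameter are all assumptions made for the example.

```python
from math import comb


def genotype_posterior(alt_reads: int, depth: int, pop_af: float, error_rate: float = 0.01) -> dict:
    """Posterior over germline genotypes given read counts and a population allele frequency."""
    # Hardy-Weinberg prior over genotypes (assumption: random mating, biallelic site).
    prior = {
        "hom_ref": (1 - pop_af) ** 2,
        "het": 2 * pop_af * (1 - pop_af),
        "hom_alt": pop_af**2,
    }
    # Expected alt-read fraction per genotype, with a small hypothetical error rate.
    expected = {"hom_ref": error_rate, "het": 0.5, "hom_alt": 1 - error_rate}
    # Prior times binomial likelihood of the observed alt-read count.
    unnormalized = {
        g: prior[g]
        * comb(depth, alt_reads)
        * expected[g] ** alt_reads
        * (1 - expected[g]) ** (depth - alt_reads)
        for g in prior
    }
    total = sum(unnormalized.values())
    return {g: value / total for g, value in unnormalized.items()}


# Same weak evidence (2 alt reads out of 10), two hypothetical population frequencies.
print(genotype_posterior(2, 10, pop_af=0.001))
print(genotype_posterior(2, 10, pop_af=0.2))
```

With weak read evidence, the posterior for the heterozygous genotype depends strongly on the population frequency; with strong evidence, the prior matters much less.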
As for some more detailed documentation, I just want to make sure you are aware of these two sources:
There are some things in your above scenario, where I think changes might improve things:
Any germline variant should also be in the tumor, unless it is "mutated away", right? So if you have a germline variant of frequency
Yeah, a bit of systematic testing is probably a good idea. If you just want to play around with the one specific germline variant you mentioned, you might want to create a And then, you can just rerun the testcase lots of time, modifying the |
Here is a PR that starts to implement per-variant priors: #448
Thank you @dlaehnemann. The contamination is just a placeholder. It is opened and rewritten with the estimated values, per tumor/normal pair, during my pipeline execution. But thank you for the concern.
@johanneskoester Thank you, I look forward to this feature. If I'm understanding your comments correctly, you are aiming for a specific INFO field? Would this require any change to the scenario grammar?
Greetings,
I have a paired somatic pipeline, and two of the callers it uses are VarDict and VarScan2, which output germline calls as well. I wanted to include them in my VLR candidates because I figured it would distinguish them as PROB_GERMLINE and I could select and filter for that, even into a separate output BCF. Most of our tumor/normal pairs have some sort of known inherent variant, and we want to observe its presence and allele frequency in the tumor variant calling (usually for LOH purposes).
My scenario grammar looks like this (the contamination fraction is filled in accurately for each sample pair):
When it comes to the FDR filtering value, I need to set it very high (~0.2-0.3) in order to capture these known germline variants in the output for PROB_GERMLINE. I've repeated this in two separate datasets/tumor types, each with at least ~50 pairs. When I look at the probability values at these known variant sites, PROB_GERMLINE is the lowest value among all the evaluated events, but the values themselves seem very high.
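A note on reading these values, as a minimal Python sketch: assuming the PROB_* fields are PHRED-scaled (as the Varlociraptor output header describes), the lowest value corresponds to the highest posterior probability, and even a small PHRED value can mean a probability well below 1. The selection function shows one common way to control FDR from posterior probabilities, namely keeping the largest set of top-ranked calls whose mean posterior error stays at or below the threshold. Whether this matches Varlociraptor's filtering in every detail is an assumption, and the function names and example scores are hypothetical.

```python
def phred_to_prob(phred: float) -> float:
    """Convert a PHRED-scaled value Q to a probability via 10^(-Q/10)."""
    return 10 ** (-phred / 10)


def select_at_fdr(phred_scores, alpha: float):
    """Keep the top-ranked calls whose expected FDR (mean of 1 - p) is <= alpha."""
    probs = sorted((phred_to_prob(q) for q in phred_scores), reverse=True)
    selected, error_sum = [], 0.0
    for p in probs:
        # Expected FDR of the selection if this call were added.
        if (error_sum + (1 - p)) / (len(selected) + 1) > alpha:
            break
        error_sum += 1 - p
        selected.append(p)
    return selected


# Hypothetical PROB_GERMLINE values for three sites.
scores = [0.1, 0.5, 3.0]
print([round(phred_to_prob(q), 3) for q in scores])
print(len(select_at_fdr(scores, 0.05)), len(select_at_fdr(scores, 0.25)))
```

In this toy example, a site with a PHRED value of 3 has a posterior probability of only about 0.5, so it is kept only once the FDR threshold is relaxed considerably.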
I guess because they are "known to be real" and have good coverage, I was expecting the model to be more confident, so that I could drop the FDR value down to 0.05. I am trying to understand whether my grammar could be improved. Is VLR less suited for making GERMLINE calls in tumor/normal pairs? Maybe I am misunderstanding something else. Thank you.
-bwubb