Problems with --show-ref (gvcf mode) #6
Comments
We further observed that the sites output to the vcf file varied with depth. Using a ~1kb transcript at very high depth (>1000x) we find that we get a line in the vcf for all except the first/last 16 bases (as noted above), but at 50x depth a number of sites are missing from the output.
There doesn't seem to be an obvious pattern to which sites are included or excluded at lower depths. For example, positions 93 and 98 have the lowest and highest QUAL at high depth and are both missing from the low-depth data; similarly, positions 91 and 100 are included and represent the lowest and highest allelic depth. These outputs were generated with the following settings, with
|
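One way to make the "which sites are missing" comparison above systematic is to list every reference position that has no record in the output. The following is only a minimal sketch, not part of Clair3-RNA: the function names are hypothetical, it reads plain (uncompressed) VCF text, and it assumes that multi-base reference blocks, if present, are marked with an END= key in the INFO column as in the usual gVCF convention.

```python
import re


def covered_positions(vcf_lines, contig):
    """Return the set of 1-based positions on `contig` that have a VCF record."""
    covered = set()
    for line in vcf_lines:
        # Skip blank lines and header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8 or fields[0] != contig:
            continue
        start = int(fields[1])
        # Assumption: a reference block spanning several bases carries END=<pos> in INFO.
        match = re.search(r"(?:^|;)END=(\d+)", fields[7])
        end = int(match.group(1)) if match else start
        covered.update(range(start, end + 1))
    return covered


def missing_positions(vcf_lines, contig, contig_length):
    """Return the sorted 1-based positions of `contig` with no record at all."""
    covered = covered_positions(vcf_lines, contig)
    return [pos for pos in range(1, contig_length + 1) if pos not in covered]
```

Running missing_positions on the high-depth and the 50x outputs for the same transcript and diffing the two lists would show exactly which sites drop out as depth falls.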
The GVCF output module in Clair3-RNA was designed for Clair3 and has not yet been tuned for Clair3-RNA. To help me plan, may I ask how often GVCF output would be used for RNA-Seq-based variant calling? Or just fixing the logic of |
Well, I would think fixing the logic so it does not error on In our case getting an estimation of the QUAL of ref calls at each site is important, so I think it would be used very frequently. |
We have released an update, v0.2.1, to fix the bug that misses a few variants with the Missing variants in the head and tail 16 bp of each sequence were a separate issue, caused by the design of the Clair3 networks. Alignments are usually less reliable in these regions, and the Clair3 networks receive insufficient context for candidates there, which increases false positives (FP). In the new version, we added a new option Please re-pull the docker image and give the new version a try. Let us know if you have any questions. |
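To make the head/tail effect described above concrete, the sketch below computes which positions of a sequence have a full flank of context on both sides. It is an illustration only: the function name is hypothetical, the 16-base flank is the figure quoted in this thread, and it assumes the exclusion depends purely on distance from the sequence ends.

```python
def positions_with_full_context(contig_length, flank=16):
    """Return (first, last) 1-based positions with `flank` bases on each side.

    Returns None when the sequence is too short for any position to qualify.
    """
    first = flank + 1
    last = contig_length - flank
    if first > last:
        return None
    return first, last
```

For a 1,000-base transcript this gives positions 17 to 984, which matches the observation that the first and last 16 bases have no line in the output.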
Clair3-RNA skips considering those positions with only reference allele support as candidates, even when |
Thanks - I think we would expect, as you do with a gVCF, to have all positions regardless of the presence of an alternative allele. |
Thanks. We have modified the logic to output all positions with at least one read support when |
Hi @zhengzhenxian, Thanks for implementing these changes. I can confirm that the (built-locally from commit:
I tried a few different configurations including with/without |
We attempted to reproduce your error using different data and configurations but were unable to on our side. We are not sure whether the issue is related to the reference. Could you please send your reference and BAM files to my email at zxzheng@cs.hku.hk for our testing? Many thanks for your help. Zhenxian |
@mattdmem just in case the reference and bam cannot be shared, would you mind tweaking the code a bit and printing out |
Although I cannot share the original data I've been able to reproduce this error by generating a random pileup of data and running Clair3-RNA on that. |
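For anyone who needs a shareable test case in a similar situation, synthetic data can be generated along the following lines. This is a hypothetical sketch and not the script used in this thread: all function names are made up for illustration, reads are error-free forward-strand substrings of a random reference, and the output is SAM text that would still need converting to a sorted, indexed BAM (for example with samtools) before use.

```python
import random


def random_reference(length, rng):
    """Return a random DNA sequence of the given length."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def fasta_text(name, seq, width=60):
    """Format a sequence as FASTA text with fixed-width lines."""
    rows = [seq[i:i + width] for i in range(0, len(seq), width)]
    return f">{name}\n" + "\n".join(rows) + "\n"


def simulate_sam(ref_name, ref_seq, n_reads, read_len, rng):
    """Return SAM text with error-free reads sampled uniformly from ref_seq."""
    header = [
        "@HD\tVN:1.6\tSO:coordinate",
        f"@SQ\tSN:{ref_name}\tLN:{len(ref_seq)}",
    ]
    # Sort start offsets so the records are already in coordinate order.
    starts = sorted(
        rng.randrange(0, len(ref_seq) - read_len + 1) for _ in range(n_reads)
    )
    records = []
    for i, start in enumerate(starts):
        seq = ref_seq[start:start + read_len]
        records.append("\t".join([
            f"read{i}",        # QNAME
            "0",               # FLAG: mapped, forward strand
            ref_name,          # RNAME
            str(start + 1),    # POS (1-based)
            "60",              # MAPQ
            f"{read_len}M",    # CIGAR: full-length match
            "*", "0", "0",     # RNEXT, PNEXT, TLEN (unpaired)
            seq,               # SEQ
            "I" * read_len,    # QUAL: Phred 40 in Phred+33 encoding
        ]))
    return "\n".join(header + records) + "\n"
```

A seeded generator such as random.Random(0) keeps the data reproducible, so the same failing input can be regenerated by the maintainers without any files being exchanged.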
@davidnewman02 Many thanks for sharing the demo data. We have fixed the reference range issue; please try the latest code. Thanks. Zhenxian |
Hi @zhengzhenxian, |
Hello...
I'm attempting to run v0.2.0 of Clair3-RNA and have some observations:
Leads to the following error:
We can prevent this error by editing /opt/bin/run_clair3_rna to invoke the --gvcf option.
But when we get the gvcf, not all reference positions are present. You get only a handful of RefCall sites.
It seems that if you set --snp_min_af to 0, this outputs most of the sites in the reference as expected; however, those at the very start of the reference are omitted. Is this expected behaviour? Adding --indel_min_af 0 does not help any further.
They are well covered:
I do notice the following message:
If I index my reference there is no problem:
This may or may not be related to omission of positions at the start of the reference in the VCF?