Missed Variant #349

Open
leonschuetz opened this issue Dec 5, 2024 · 6 comments

Comments

@leonschuetz

Hi,

I have a sample with a variant that was not called by Clair3 but looks quite good in IGV:
image

I looked in the output VCFs and the variant is called in the pileup.vcf:

chr16	1568260	.	C	T	19.57	PASS	P	GT:GQ:DP:AD:AF:PL	0/1:19:37:25,12:0.3243:26,0,62

But in the full_alignment.vcf it is called as wildtype:

chr16	1568260	.	C	.	5.32	RefCall	F	GT:GQ:DP:AD:AF:PL	0/0:5:37:25:0.6757:990
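To make the comparison concrete, here is a minimal sketch that parses the two records above and compares their read support. It assumes single-sample, tab-separated VCF lines and that the first AD value is the reference-allele depth, which is what the numbers in these two records suggest; `parse_sample` and `ref_fraction` are illustrative helper names, not part of Clair3.

```python
def parse_sample(record: str) -> dict:
    """Map FORMAT keys to sample values for a single-sample VCF record."""
    fields = record.split("\t")
    return dict(zip(fields[8].split(":"), fields[9].split(":")))


def ref_fraction(sample: dict) -> float:
    """Fraction of reads supporting the reference allele (first AD value / DP)."""
    ref_depth = int(sample["AD"].split(",")[0])
    return ref_depth / int(sample["DP"])


# The two records quoted above, with tabs written out explicitly.
pileup = ("chr16\t1568260\t.\tC\tT\t19.57\tPASS\tP\t"
          "GT:GQ:DP:AD:AF:PL\t0/1:19:37:25,12:0.3243:26,0,62")
full_alignment = ("chr16\t1568260\t.\tC\t.\t5.32\tRefCall\tF\t"
                  "GT:GQ:DP:AD:AF:PL\t0/0:5:37:25:0.6757:990")

for name, record in [("pileup", pileup), ("full_alignment", full_alignment)]:
    sample = parse_sample(record)
    frac = ref_fraction(sample)
    print(f"{name}: GT={sample['GT']} GQ={sample['GQ']} "
          f"ref fraction={frac:.4f} non-ref fraction={1 - frac:.4f}")
```

Both records are built on the same 25 reference reads out of 37; only the genotype decision differs. Note that the AF value in the RefCall record matches the reference fraction (25/37), not an alternative-allele fraction.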

If I understand correctly, a variant in pileup.vcf needs a quality of at least 20 to make it into the merged.vcf? Is there an option to lower this threshold?

Best,
Leon

@aquaskyline
Member

Wet-lab validation might be needed to tell whether the variant is real. Clair3 uses more information than what is shown in your screen capture. Are the reads all primary alignments? Do they have good MQ? Can they be properly phased? Are the alternative alleles concentrated in one phase or spread across phases? We will need more information to tell what the culprit could be.

@leonschuetz
Author

In this case we also have short-read data in which the variant is called, so I think it is a true variant.
The phasing looks like this:
image
All reads have a mapping quality of 60, and only 2 reads are supplementary, both unphased (1 with the variant, 1 without).

@aquaskyline
Member

Interesting. Would you mind sending us a minibam of the variant so we can take a deeper look?

@leonschuetz
Author

I have to check whether I'm allowed to share the data. I'll get back to you soon.

Best,
Leon

@peterk87

Hi @aquaskyline, I am encountering a similar issue with Influenza A virus Nanopore sequencing data: a variant is not called when I use a very closely related reference, but it is called when I use a more distantly related one.

Below is the Clair3 pileup.vcf from using an internal reference sequence that is very closely related to the sample:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  SAMPLE
internal_ref   393     .       C       .       22.36   RefCall P       GT:GQ:DP:AD:AF  0/0:22:227:123:0.5419
internal_ref   823     .       C       .       3.33    RefCall P       GT:GQ:DP:AD:AF  0/0:3:179:15:0.0838
internal_ref   824     .       A       .       12.42   RefCall P       GT:GQ:DP:AD:AF  0/0:12:182:156:0.8571
internal_ref   825     .       C       .       19.70   RefCall P       GT:GQ:DP:AD:AF  0/0:19:182:156:0.8571
internal_ref   980     .       G       .       18.09   RefCall P       GT:GQ:DP:AD:AF  0/0:18:201:152:0.7562

No full-alignment analysis is triggered with the internal reference.

Here is the pileup.vcf using a more distantly related reference sequence:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  SAMPLE
... other variants
MT197130.1      393     .       C       .       21.10   RefCall P       GT:GQ:DP:AD:AF  0/0:21:226:138:0.6106
MT197130.1      426     .       C       T       23.07   PASS    P       GT:GQ:DP:AD:AF  1/1:23:226:3,220:0.9735
MT197130.1      504     .       C       A       13.42   PASS    P       GT:GQ:DP:AD:AF  1/1:13:214:1,200:0.9346
MT197130.1      588     .       C       T       23.39   PASS    P       GT:GQ:DP:AD:AF  1/1:23:239:3,218:0.9121
MT197130.1      814     .       C       T       14.39   PASS    P       GT:GQ:DP:AD:AF  1/1:14:186:6,162:0.8710
MT197130.1      823     .       C       T       4.33    PASS    P       GT:GQ:DP:AD:AF  1/1:4:178:14,145:0.8146
MT197130.1      824     .       A       .       13.29   RefCall P       GT:GQ:DP:AD:AF  0/0:13:181:156:0.8619
MT197130.1      825     .       C       .       19.07   RefCall P       GT:GQ:DP:AD:AF  0/0:19:181:155:0.8564
MT197130.1      980     .       G       .       18.09   RefCall P       GT:GQ:DP:AD:AF  0/0:18:201:152:0.7562

A full-alignment analysis is triggered for reference MT197130.1 with a number of low-quality variants detected in the pileup.

Both reference sequences have a length of 1410. Yet with the internal reference, position 823 (AD=15, AF=0.0838) is reported as RefCall, whereas with MT197130.1 the same position (AD=14,145, AF=0.8146) is called as C>T.
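As a quick sanity check on those numbers, here is a small sketch that recomputes the fraction of reads not supporting the reference at each RefCall site in the internal_ref pileup.vcf. It assumes that AD in a RefCall record is the reference-allele depth and that AF there is AD/DP, which the values in the table are consistent with; `non_ref_fraction` is just an illustrative helper, and the 0.5 cutoff is an arbitrary choice for flagging.

```python
# (position, DP, AD, AF) copied from the internal_ref pileup.vcf above
refcalls = [
    (393, 227, 123, 0.5419),
    (823, 179, 15, 0.0838),
    (824, 182, 156, 0.8571),
    (825, 182, 156, 0.8571),
    (980, 201, 152, 0.7562),
]


def non_ref_fraction(dp: int, ref_depth: int) -> float:
    """Fraction of reads at a site that do not support the reference allele."""
    return 1 - ref_depth / dp


for pos, dp, ad, af in refcalls:
    # Check the assumption that the reported AF is simply AD / DP.
    consistent = abs(ad / dp - af) < 1e-4
    frac = non_ref_fraction(dp, ad)
    flag = "  <-- mostly non-reference" if frac > 0.5 else ""
    print(f"pos {pos}: AF==AD/DP: {consistent}, non-ref fraction {frac:.4f}{flag}")
```

Under that assumption, only position 823 is flagged: roughly 92% of the reads there do not support the reference base, yet the site is reported as 0/0.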

It's clear from looking at the alignment against the internal reference that the variant is present.

image

The alignment against the other reference looks very similar:

image

I'll have to see if the BAM alignment and internal reference sequence could be shared.

@peterk87

Just to follow up, adding --var_pct_phasing=1 --var_pct_full=1 --ref_pct_full=1 to the run_clair3.sh command worked for my issue.
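For anyone scripting this, here is a rough sketch of how such a command line could be assembled. Only the three `--var_pct_phasing`, `--var_pct_full` and `--ref_pct_full` options come from the comment above; the remaining arguments are what I understand to be the usual run_clair3.sh inputs, and every path, the platform value and the thread count are hypothetical placeholders. `build_clair3_command` is an illustrative helper, and it is worth checking the options against `run_clair3.sh --help` for your version.

```python
import shlex


def build_clair3_command(bam, ref, model_path, output, platform="ont", threads=4):
    """Assemble a run_clair3.sh argument list; nothing is executed here."""
    return [
        "run_clair3.sh",
        f"--bam_fn={bam}",
        f"--ref_fn={ref}",
        f"--threads={threads}",
        f"--platform={platform}",
        f"--model_path={model_path}",
        f"--output={output}",
        # The three options reported above as resolving the missed call:
        "--var_pct_phasing=1",
        "--var_pct_full=1",
        "--ref_pct_full=1",
    ]


# Hypothetical file names, for illustration only.
cmd = build_clair3_command(
    bam="sample.bam",
    ref="internal_ref.fasta",
    model_path="/path/to/model",
    output="clair3_out",
)
print(shlex.join(cmd))
```

The list form can be passed to `subprocess.run(cmd, check=True)` once the paths point at real files.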
