[User Story] Record optical duplicates in QC report #1282
Comments
I think this statistic was already collected for TGA cases (panel and exome), which used Picard MarkDuplicates, but it was missing for WGS because the Sentieon Dedup report wasn't picked up by MultiQC. In the development version the alignment workflows are more unified: the Sentieon Dedup tool is used for deduplication in all workflows, and its report was modified so that MultiQC accepts it. As a result, the same duplication stats, including the optical duplicates, are reported by all workflows. However, for some reason no optical duplicates are detected in TGA. Below, the top two examples are from a TGA tumor and normal sample, and the bottom one is a WGS test sample run on the develop branch.
These values are also included in multiqc_data.json, together with some small calculations that MultiQC has done on the stats. Here's the example for the WGS case from multiqc_data.json:
The percentage of optical duplicates is not reported, but it can be calculated quite simply from these values. Is that sufficient to close this issue? @pbiology
First, however, I will investigate why there are no optical dups in TGA...
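As a rough sketch of that calculation: the snippet below assumes the dedup report uses Picard-style column names (`READ_PAIRS_EXAMINED`, `READ_PAIR_OPTICAL_DUPLICATES`) and defines the percentage as optical duplicate pairs over examined read pairs. Both the definition and the numbers are assumptions for illustration, not values from an actual BALSAMIC run.

```python
def percent_optical_duplicates(metrics: dict) -> float:
    """Percentage of examined read pairs that were flagged as optical duplicates.

    Assumes Picard-style keys; this denominator is one possible choice,
    not necessarily the one the lab wants to trend.
    """
    pairs_examined = metrics["READ_PAIRS_EXAMINED"]
    if pairs_examined == 0:
        # Avoid division by zero for empty or failed samples
        return 0.0
    return 100.0 * metrics["READ_PAIR_OPTICAL_DUPLICATES"] / pairs_examined


# Made-up numbers, for illustration only
example = {
    "READ_PAIRS_EXAMINED": 1_000_000,
    "READ_PAIR_DUPLICATES": 120_000,
    "READ_PAIR_OPTICAL_DUPLICATES": 5_000,
}
print(f"{percent_optical_duplicates(example):.2f}%")
```

If the metric should instead be relative to all duplicates, the denominator could be swapped for `READ_PAIR_DUPLICATES`.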
OK, I think I see the issue. For TGA we modify the read headers in the FASTQ by adding the extracted UMI sequence, which creates problems for how Picard MarkDuplicates extracts the flowcell tile information. https://gatk.broadinstitute.org/hc/en-us/articles/360037052812-MarkDuplicates-Picard-
So it seems that the problem is slightly worse than just not tracking the optical duplicates: we have probably not detected or marked them ever since the UMIs were added to TGA.
I think it would be good to fix the optical duplicate detection in TGA as well. Hopefully it's as easy as adding a new --READ_NAME_REGEX to Sentieon Dedup for the TGA cases to account for the UMI sequences in the read header.
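To illustrate the idea, here is a minimal sketch of a read-name pattern with three capture groups (tile, x, y) that tolerates a trailing UMI. It assumes standard seven-field Illumina read names and a UMI appended after `_` or `:`; the actual header format used in the TGA workflow, the example read names, and whether Sentieon Dedup accepts such a regex are all assumptions that would need checking.

```python
import re

# Hypothetical pattern: skip the four leading fields (instrument, run,
# flowcell, lane), capture tile, x and y, then allow an optional UMI suffix.
READ_NAME_REGEX = re.compile(
    r"^(?:[^:]+:){4}([0-9]+):([0-9]+):([0-9]+)(?:[_:][ACGTN+]+)?$"
)


def tile_xy(read_name: str):
    """Return (tile, x, y) as integers, or None if the name does not match."""
    match = READ_NAME_REGEX.match(read_name)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


# Made-up read names: without a UMI, and with a UMI appended in two ways
names = [
    "A00187:123:HXXXXXXXX:1:1101:1234:5678",
    "A00187:123:HXXXXXXXX:1:1101:1234:5678_ACGTACGT",
    "A00187:123:HXXXXXXXX:1:1101:1234:5678:ACGTACGT",
]
for name in names:
    print(name, tile_xy(name))
```

All three names yield the same tile and coordinates, which is what optical duplicate detection needs to compare read positions on the flowcell.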
Blocked by this: #1291
Updated the issue to the User Story format
This issue will be solved by PR #1358
Need
As the lab head of unit, I want to be able to trend the optical duplicates in samples processed by BALSAMIC so that we can see whether the levels change when we change methods and instruments in the lab, most recently with the new NovaSeq X.
Suggested approach
Start collecting the optical duplicates per case and record them in MultiQC. This should be done for panels, WES, and WGS.
Considered alternatives
None
Deviation
None
System requirements assessed
Requirements affected by this story
N/A
Risk assessment needed
Risk assessment
Gathering new metrics doesn't change the analysis in any way so no risk.
SOUPs
N/A
Can be closed when
Blockers
Fix detect optical duplicates in TGA workflow #1291
Anything else?