feat: add tga cnvkit to gens #1448

Open: mathiasbio wants to merge 57 commits into base: update_cnvkit_pons

@mathiasbio (Contributor) commented Jun 14, 2024

Description

This PR adds post-processing steps to CNVkit results from TGA to facilitate upload to GENS, which has previously only been possible for WGS via post-processing of the GATK CollectReadCounts output.

Because the gnomAD VCF is also required to create the BAF visualisation track in GENS, the config and the GENS rule assignment have been modified so that these rules and references can be used in TGA as well.
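For reviewers less familiar with the BAF track: a minimal sketch of how a B-allele frequency could be derived from one genotyped gnomAD site. This is not the code in this PR. It assumes a single-sample VCF with a GATK-style `AD` (allelic depths) FORMAT field, and the function name `baf_from_vcf_line` is hypothetical.

```python
from typing import Optional, Tuple


def baf_from_vcf_line(line: str) -> Optional[Tuple[str, int, float]]:
    """Return (chrom, pos, BAF) for the first sample, or None if unusable.

    Assumption: the FORMAT column contains an AD field holding
    comma-separated read depths, reference allele first.
    """
    if line.startswith("#"):
        return None  # header line
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return None  # no FORMAT/sample columns present
    keys = fields[8].split(":")
    values = fields[9].split(":")
    if "AD" not in keys or keys.index("AD") >= len(values):
        return None
    try:
        depths = [int(d) for d in values[keys.index("AD")].split(",")]
    except ValueError:
        return None  # missing depths such as "."
    if len(depths) < 2:
        return None
    ref, alt = depths[0], depths[1]  # only the first ALT allele is considered
    if ref + alt == 0:
        return None  # no coverage at this site
    return fields[0], int(fields[1]), alt / (ref + alt)
```

A site with 30 reference reads and 10 alternate reads would give a BAF of 0.25; sites without coverage are skipped rather than reported as 0.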

An additional small script was added to convert the CNVkit file tumor.merged.cnr into a GENS-accepted format at several resolutions.

This PR closes this issue:
#1385

Open question to discuss: purity adjusted log2 coverage values

In this GENS post-processing I have also decided to take the tumor purity from PureCN as input and use it to adjust the log2 coverage values, so that fold changes are more visible in low-purity samples. I don't know if this is recommended, but without it CNVs in low-purity samples would be quite difficult to see.
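As a basis for the discussion, here is one common way such an adjustment can be written. It is not necessarily the formula used in this PR. It assumes the normal contamination is diploid with a copy ratio of 1, ignores tumor ploidy, and clips the result so that the logarithm stays defined; the function name and the `min_ratio` floor are hypothetical.

```python
import math


def purity_adjust_log2(log2_ratio: float, purity: float, min_ratio: float = 0.01) -> float:
    """Rescale an observed log2 copy ratio to the tumor-only fraction.

    Model: observed = purity * tumor + (1 - purity) * 1.0
    Solving for the tumor ratio gives (observed - (1 - purity)) / purity.
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in the interval (0, 1]")
    observed = 2.0 ** log2_ratio
    tumor = (observed - (1.0 - purity)) / purity
    # Noise can push the estimate to zero or below; clip before taking log2.
    return math.log2(max(tumor, min_ratio))
```

Under this model a single-copy loss in a 50% pure sample is observed at a ratio of 0.75 (log2 of about -0.42) and is rescaled to -1.0. The same rescaling amplifies noise, which is part of the open question.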

This change requires further changes in CG

As far as I can tell at the moment, we need two changes:

  1. A new argument for TGA analyses, --gnomad-min-af5, to add the rules that create the GENS output
  2. The filter for GENS upload, which currently only allows WGS analyses from Balsamic to be uploaded, needs to be removed or rewritten

PR in CG: Clinical-Genomics/cg#3361

Added

  • Script to post-process CNVkit output to GENS-format
  • DNAscope gnomad calling to TGA for GENS

Changed

  • Parsing of GENS arguments changed to account for TGA

Documentation

  • N/A
  • Updated Balsamic documentation to reflect the changes as needed for this PR.
    • [Document Name]

Tests

Feature Tests

  • N/A
  • Test [Description]
    • [Screenshot]

Pipeline Integrity Tests

  • Report deliver (generation of the .hk file)
    • N/A
    • Verified
  • TGA T/O Workflow
    • N/A
    • Verified
  • TGA T/N Workflow
    • N/A
    • Verified
  • UMI T/O Workflow
    • N/A
    • Verified
  • UMI T/N Workflow
    • N/A
    • Verified
  • WGS T/O Workflow
    • N/A
    • Verified
  • WGS T/N Workflow
    • N/A
    • Verified
  • QC Workflow
    • N/A
    • Verified
  • PON Workflow
    • N/A
    • Verified

Clinical Genomics Stockholm

Documentation

  • Atlas documentation
    • N/A
    • Updated: [Link]
  • Web portal for Clinical Genomics
    • N/A
    • Updated: [Link]

User Changes

Infrastructure Changes

  • Stored files in Housekeeper
    • N/A
    • Updated: [Link]
  • CG (CLI and delivered/uploaded files)
    • N/A
    • Updated: [Link]
  • Servers (configuration files on Hasta)
    • N/A
    • Updated: [Link]
  • Scout interface
    • N/A
    • Updated: [Link]

Checklist

Important

Ensure that all checkboxes below are ticked before merging.

For Developers

  • PR Description
    • Provided a comprehensive description of the PR.
    • Linked relevant user stories or issues to the PR.
  • Documentation
    • Verified and updated documentation if necessary.
  • Tests
    • Described and tested the functionality addressed in the PR.
    • Ensured integration of the new code with existing workflows.
    • Confirmed that meaningful unit tests were added for the changes introduced.
    • Checked that the PR has successfully passed all relevant code smells and coverage checks.
  • Review
    • Addressed and resolved all the feedback provided during the code review process.
    • Obtained final approval from designated reviewers.

For Reviewers

  • Code
    • Code implements the intended features or fixes the reported issue.
    • Code follows the project's coding standards and style guide.
  • Documentation
    • Pipeline changes are well-documented in the CHANGELOG and relevant documentation.
  • Tests
    • The author provided a description of their manual testing, including consideration of edge cases and boundary
      conditions where applicable, with satisfactory results.
  • Review
    • Confirmed that the developer has addressed all the comments during the code review.

@mathiasbio mathiasbio changed the base branch from master to develop June 14, 2024 16:22
@mathiasbio mathiasbio linked an issue Jun 17, 2024 that may be closed by this pull request
3 tasks
@mathiasbio mathiasbio self-assigned this Jun 17, 2024
@mathiasbio mathiasbio marked this pull request as ready for review June 19, 2024 13:41
@mathiasbio mathiasbio requested a review from a team as a code owner June 19, 2024 13:41
@mathiasbio (Contributor, Author) commented:

At the moment the pipeline is working for the TGA workflows, but I haven't verified all workflows yet. So for now this review can be treated as a code review only.

codecov bot commented Jun 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.49%. Comparing base (55dd650) to head (9569ebd).

Additional details and impacted files
@@                 Coverage Diff                 @@
##           update_cnvkit_pons    #1448   +/-   ##
===================================================
  Coverage               99.48%   99.49%           
===================================================
  Files                      40       40           
  Lines                    1960     1976   +16     
===================================================
+ Hits                     1950     1966   +16     
  Misses                     10       10           
Flag Coverage Δ
unittests 99.49% <100.00%> (+<0.01%) ⬆️


@mathiasbio mathiasbio changed the base branch from deduplicate_with_umi to update_cnvkit_pons July 30, 2024 07:47
@mathiasbio mathiasbio mentioned this pull request Aug 2, 2024
58 tasks
@mathiasbio mathiasbio added this to the Release 16 milestone Aug 23, 2024
@mathiasbio mathiasbio mentioned this pull request Sep 2, 2024
66 tasks
sonarcloud bot commented Sep 10, 2024

mathiasbio added a commit to Clinical-Genomics/cg that referenced this pull request Sep 30, 2024
GENS has previously only been activated for WGS in Balsamic. Once Clinical-Genomics/BALSAMIC#1448 is in production, CNV and BAF profiles from TGA samples can be uploaded as well. This feature is planned for release 16.0.0 of Balsamic (somewhere around **August maybe**) and requires a couple of small changes in CG.

### Added

- gnomad-af argument to TGA samples

### Changed

- gens upload no longer filters out TGA samples
Labels
None yet
Projects
Status: In Testing
Development

Successfully merging this pull request may close these issues.

[User Story] Integrate TGA CNV results into GENS
2 participants