Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: extend bedfiles for CNV analysis #1469

Merged
merged 12 commits into from
Jul 30, 2024

Conversation

mathiasbio
Copy link
Contributor

@mathiasbio mathiasbio commented Jul 29, 2024

Description

During the work introducing GENS for TGA it was discovered that about half of all regions in the bedfile for GMCKsolid 4.1 was discarded by CNVkit during analysis. This was due to small regions in the bedfile, see issue: #1468

To fix this it was initially planned to update the bedfiles in the target capture repo: https://github.com/Clinical-Genomics/target_capture_bed/issues/133 however there was no easy solution to implement this alternate versions of the bedfiles, and in the end the best way to solve the issue seems to be to extend these short regions dynamically during runtime specifically for the CNV analysis.

This also has the benefit of leaving the SNV and InDel calling, and the QC metrics unchanged.

To achieve this a new rule was added with a extend bed-regions step, followed by bedtools sort and merge to merge potentially overlapping bed regions after the extension.

Added

  • padding of bed-regions for CNVkit to minimum 100 bases

Changed

  • [Description]

Fixed

  • [Description]

Removed

  • [Description]

Documentation

  • N/A
  • Updated Balsamic documentation to reflect the changes as needed for this PR.
    • balsamic_pon.rst updated with minimum region size information

Tests

Feature Tests

  • N/A
  • Test [Description]
    • [Screenshot]

Pipeline Integrity Tests

  • Report deliver (generation of the .hk file)
    • N/A
    • Verified
  • TGA T/O Workflow
    • N/A
    • Verified
  • TGA T/N Workflow
    • N/A
    • Verified
  • UMI T/O Workflow
    • N/A
    • Verified
  • UMI T/N Workflow
    • N/A
    • Verified
  • WGS T/O Workflow
    • N/A
    • Verified
  • WGS T/N Workflow
    • N/A
    • Verified
  • QC Workflow
    • N/A
    • Verified
  • PON Workflow
    • N/A
    • Verified

Clinical Genomics Stockholm

Documentation

Panel of Normal specific criteria

User Changes

  • N/A
  • This PR affects the output files or results.
    • User feedback is considered unnecessary because: the communication with customers is ongoing for the CNVkit to GENS PR: feat: add tga cnvkit to gens #1448 .
    • Affected users have been included in the development process and given a chance to provide feedback.

Infrastructure Changes

  • Stored files in Housekeeper
    • N/A
    • Updated: [Link]
  • CG (CLI and delivered/uploaded files)
    • N/A
    • Updated: [Link]
  • Servers (configuration files on Hasta)
    • N/A
    • Updated: [Link]
  • Scout interface
    • N/A
    • Updated: [Link]

Checklist

Important

Ensure that all checkboxes below are ticked before merging.

For Developers

  • PR Description
    • Provided a comprehensive description of the PR.
    • Linked relevant user stories or issues to the PR.
  • Documentation
    • Verified and updated documentation if necessary.
  • Tests
    • Described and tested the functionality addressed in the PR.
    • Ensured integration of the new code with existing workflows.
    • Confirmed that meaningful unit tests were added for the changes introduced.
    • Checked that the PR has successfully passed all relevant code smells and coverage checks.
  • Review
    • Addressed and resolved all the feedback provided during the code review process.
    • Obtained final approval from designated reviewers.

For Reviewers

  • Code
    • Code implements the intended features or fixes the reported issue.
    • Code follows the project's coding standards and style guide.
  • Documentation
    • Pipeline changes are well-documented in the CHANGELOG and relevant documentation.
  • Tests
    • The author provided a description of their manual testing, including consideration of edge cases and boundary
      conditions where applicable, with satisfactory results.
  • Review
    • Confirmed that the developer has addressed all the comments during the code review.

@mathiasbio mathiasbio changed the base branch from master to update_cnvkit_pons July 29, 2024 15:15
@mathiasbio mathiasbio self-assigned this Jul 29, 2024
@mathiasbio mathiasbio linked an issue Jul 29, 2024 that may be closed by this pull request
3 tasks
@mathiasbio mathiasbio added this to the Release 16 milestone Jul 29, 2024
@mathiasbio mathiasbio marked this pull request as ready for review July 29, 2024 16:21
@mathiasbio mathiasbio requested a review from a team as a code owner July 29, 2024 16:21
@mathiasbio mathiasbio requested a review from ivadym July 29, 2024 16:21
Copy link

codecov bot commented Jul 29, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Please upload report for BASE (update_cnvkit_pons@8d6736a). Learn more about missing BASE report.

Additional details and impacted files
@@                  Coverage Diff                  @@
##             update_cnvkit_pons    #1469   +/-   ##
=====================================================
  Coverage                      ?   99.48%           
=====================================================
  Files                         ?       40           
  Lines                         ?     1960           
  Branches                      ?        0           
=====================================================
  Hits                          ?     1950           
  Misses                        ?       10           
  Partials                      ?        0           
Flag Coverage Δ
unittests 99.48% <100.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@mathiasbio
Copy link
Contributor Author

As a note! I wonder if we could skip the workflow integrity checks in this PR since it is a nested PR which I plan to test more thoroughly in the add CNVkit to GENS PR #1448 since that is where I have access to a better way of interpreting the results. So I was thinking I could run all of my tests there. Additionally all PRs will eventually end up in the #1358 PR which I will run some final tests on when everything has been merged. So we're pretty far from the develop branch

Copy link

sonarcloud bot commented Jul 29, 2024

@mathiasbio mathiasbio merged commit 2a2dcfe into update_cnvkit_pons Jul 30, 2024
7 checks passed
@mathiasbio mathiasbio deleted the extend_beds_for_cnv branch July 30, 2024 07:43
@mathiasbio mathiasbio mentioned this pull request Jul 30, 2024
72 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[User Story] Fix issue with filtered regions in CNVkit
1 participant