feat: add tga cnvkit to gens #1448

Open
wants to merge 57 commits into
base: update_cnvkit_pons
e6516d6 add gens functionality to tga (mathiasbio, Jun 14, 2024)
7f6e521 add gens inputs (mathiasbio, Jun 14, 2024)
24e41a9 add gens gnomad af to tga (mathiasbio, Jun 14, 2024)
cdbe43f black (mathiasbio, Jun 14, 2024)
f262919 fix (mathiasbio, Jun 14, 2024)
c2aafb9 fix (mathiasbio, Jun 14, 2024)
cc9e506 fix (mathiasbio, Jun 17, 2024)
d471395 fix (mathiasbio, Jun 17, 2024)
05bd3ae fix (mathiasbio, Jun 17, 2024)
6868556 doc strings and named args (mathiasbio, Jun 17, 2024)
b03a9f0 black (mathiasbio, Jun 17, 2024)
582cfb0 changelog (mathiasbio, Jun 17, 2024)
e9c4c7f bug fix (mathiasbio, Jun 17, 2024)
8f88da7 fix (mathiasbio, Jun 17, 2024)
4a4671e lower padding (mathiasbio, Jun 17, 2024)
5bcf768 add tumor purity adjustment to gens cov file (mathiasbio, Jun 18, 2024)
acde38f typehints (mathiasbio, Jun 19, 2024)
ed4c018 black (mathiasbio, Jun 19, 2024)
a108a0e fix pytests (mathiasbio, Jun 19, 2024)
99d3d57 update purity adjustment formula (mathiasbio, Jun 28, 2024)
bf57f92 Merge branch 'develop' of github.com:Clinical-Genomics/BALSAMIC into … (mathiasbio, Jun 28, 2024)
d69c3c7 merge develop (mathiasbio, Jun 28, 2024)
44f25e1 m conflict (mathiasbio, Jun 28, 2024)
9335478 m conflict (mathiasbio, Jun 28, 2024)
5768116 adjust formula (mathiasbio, Jul 15, 2024)
6433887 add round (mathiasbio, Jul 15, 2024)
f3a2c66 code review (mathiasbio, Jul 23, 2024)
2de2692 fix merge conflicts (mathiasbio, Jul 23, 2024)
2b91865 fix bug (mathiasbio, Jul 24, 2024)
821bbd3 float (mathiasbio, Jul 24, 2024)
6aa5510 comment out purity and ploidy adjustment (mathiasbio, Jul 24, 2024)
f703189 adjust purity and ploidy (mathiasbio, Jul 25, 2024)
8994fc8 bug fix (mathiasbio, Jul 25, 2024)
11be7bf remove purity and ploidy adjustment (mathiasbio, Jul 25, 2024)
3d44b03 new threads (mathiasbio, Jul 29, 2024)
b484670 return purity adjustment (mathiasbio, Jul 31, 2024)
1fac876 code review (mathiasbio, Jul 31, 2024)
9688e3d new purity and ploidy formula from PureCN (mathiasbio, Aug 1, 2024)
8d80a75 merge pon repo (mathiasbio, Aug 2, 2024)
3abae1b fix pytests (mathiasbio, Aug 2, 2024)
ce95786 black (mathiasbio, Aug 2, 2024)
d736f7c Merge branch 'update_cnvkit_pons' of github.com:Clinical-Genomics/BAL… (mathiasbio, Aug 8, 2024)
4f7cf28 Merge branch 'update_cnvkit_pons' into cnvkit_to_gens (mathiasbio, Aug 8, 2024)
8f32eac Merge branch 'update_cnvkit_pons' of github.com:Clinical-Genomics/BAL… (mathiasbio, Aug 9, 2024)
ca32fea Merge branch 'update_cnvkit_pons' into cnvkit_to_gens (mathiasbio, Aug 9, 2024)
0c99f60 Merge branch 'update_cnvkit_pons' of github.com:Clinical-Genomics/BAL… (mathiasbio, Aug 9, 2024)
f5eebd6 Merge branch 'update_cnvkit_pons' into cnvkit_to_gens (mathiasbio, Aug 9, 2024)
8a40907 Merge branch 'update_cnvkit_pons' of github.com:Clinical-Genomics/BAL… (mathiasbio, Aug 9, 2024)
254c6a8 Merge branch 'update_cnvkit_pons' into cnvkit_to_gens (mathiasbio, Aug 9, 2024)
42c314a Merge branch 'update_cnvkit_pons' of github.com:Clinical-Genomics/BAL… (mathiasbio, Aug 9, 2024)
e3f54f7 Merge branch 'update_cnvkit_pons' into cnvkit_to_gens (mathiasbio, Aug 9, 2024)
6030b87 add io test (mathiasbio, Aug 13, 2024)
604f059 add new test (mathiasbio, Aug 13, 2024)
92b5c4d Merge branch 'update_cnvkit_pons' of github.com:Clinical-Genomics/BAL… (mathiasbio, Aug 15, 2024)
2a07bb1 Merge branch 'update_cnvkit_pons' into cnvkit_to_gens (mathiasbio, Aug 15, 2024)
3f6b529 Merge branch 'update_cnvkit_pons' of github.com:Clinical-Genomics/BAL… (mathiasbio, Sep 10, 2024)
9569ebd Merge branch 'update_cnvkit_pons' into cnvkit_to_gens (mathiasbio, Sep 10, 2024)
91 changes: 91 additions & 0 deletions BALSAMIC/assets/scripts/postprocess_gens_cnvkit.py
Contributor

Would be nice to have some tests for this script
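As a starting point for such tests, here is a minimal sketch of what unit tests for `calculate_log2_ratio` could check. The function body is copied from this file's diff so the snippet is self-contained; the test names and the `raises` helper are hypothetical and not part of the PR. One case worth covering: `purity = 0` passes the range check but makes the denominator zero.

```python
def calculate_log2_ratio(purity, log2_ratio, ploidy):
    """Copy of the purity/ploidy adjustment from postprocess_gens_cnvkit.py."""
    if not (0 <= purity <= 1):
        raise ValueError("Purity must be between 0 and 1")
    if ploidy <= 0:
        raise ValueError("Ploidy must be a positive number")
    return (
        purity * log2_ratio * ploidy + 2 * (1 - purity) * log2_ratio - 2 * (1 - purity)
    ) / (purity * ploidy)


def raises(exc_type, func, *args):
    """Hypothetical stand-in for pytest.raises so the sketch has no dependencies."""
    try:
        func(*args)
    except exc_type:
        return True
    return False


def test_pure_sample_is_unchanged():
    # With purity == 1 the formula reduces to log2_ratio itself.
    assert abs(calculate_log2_ratio(1.0, 0.37, 2.0) - 0.37) < 1e-12


def test_invalid_inputs_raise():
    assert raises(ValueError, calculate_log2_ratio, 1.5, 0.1, 2.0)
    assert raises(ValueError, calculate_log2_ratio, 0.5, 0.1, 0.0)


def test_zero_purity_divides_by_zero():
    # purity == 0 passes validation, but purity * ploidy is then 0.
    assert raises(ZeroDivisionError, calculate_log2_ratio, 0.0, 0.1, 2.0)


if __name__ == "__main__":
    test_pure_sample_is_unchanged()
    test_invalid_inputs_raise()
    test_zero_purity_divides_by_zero()
```

In the repository these would presumably use `pytest.raises` directly; the small helper is only there to keep the sketch runnable on its own.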

@@ -0,0 +1,91 @@
import click
from BALSAMIC.utils.io import read_csv, write_list_of_strings

def calculate_log2_ratio(purity, log2_ratio, ploidy):
    # Ensure that the inputs are within valid ranges
    if not (0 <= purity <= 1):
        raise ValueError("Purity must be between 0 and 1")

    if ploidy <= 0:
        raise ValueError("Ploidy must be a positive number")

    # Ploidy and purity adjustment formula reference to PureCN issue: https://github.com/lima1/PureCN/issues/40
    log2_adjusted = (
        purity * log2_ratio * ploidy + 2 * (1 - purity) * log2_ratio - 2 * (1 - purity)
    ) / (purity * ploidy)

    return log2_adjusted


@click.command()
@click.option(
    "-o",
    "--output-file",
    required=True,
    type=click.Path(exists=False),
    help="Name of output-file.",
)
@click.option(
    "-c",
    "--normalised-coverage-path",
    required=True,
    type=click.Path(exists=True),
    help="Input CNVkit cnr. result.",
)
@click.option(
    "-p",
    "--tumor-purity-path",
    required=True,
    type=click.Path(exists=True),
    help="Tumor purity file from PureCN",
)
def create_gens_cov_file(
    output_file: str, normalised_coverage_path: str, tumor_purity_path: str
):
    """Post-processes the CNVkit .cnr output for upload to GENS.

    Removes Antitarget regions, adjusts for purity and ploidy and outputs the coverages in multiple resolution-formats.

    Args:
        output_file: Path to GENS output.cov file
        normalised_coverage_path: Path to input CNVkit cnr file.
        tumor_purity_path: Path to PureCN purity estimate csv file
    """
    log2_data = []

    # Process CNVkit file
    cnr_dict_list: list[dict] = read_csv(
        csv_path=normalised_coverage_path, delimeter="\t"
    )

    # Process PureCN purity file
    purecn_dict_list: list[dict] = read_csv(csv_path=tumor_purity_path, delimeter=",")
    purity = float(purecn_dict_list[0]["Purity"])
    ploidy = float(purecn_dict_list[0]["Ploidy"])

    for row in cnr_dict_list:
        if row["gene"] == "Antitarget":
            continue

        # find midpoint
        start: int = int(row["start"])
        end: int = int(row["end"])
        region_size: int = end - start
        midpoint: int = start + int(region_size / 2)

        # adjust log2 ratio
        log2: float = float(row["log2"])
        log2: float = calculate_log2_ratio(purity, log2, ploidy)
        log2: float = round(log2, 4)

        # store values in list
        log2_data.append(f"{row['chromosome']}\t{midpoint - 1}\t{midpoint}\t{log2}")

    # Write log2 data to output file
    resolutions = ["o", "a", "b", "c", "d"]
    resolution_log2_lines = [f"{resolution}_{line}" for resolution in resolutions for line in log2_data]
    write_list_of_strings(resolution_log2_lines, output_file)


if __name__ == "__main__":
    create_gens_cov_file()
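To make the output format concrete, here is a small sketch of the per-row conversion the loop performs, with the purity adjustment left out. `cnr_row_to_line` and `expand_resolutions` are hypothetical helper names, and the example row is made up for illustration; only the midpoint arithmetic, the rounding and the resolution prefixes are taken from the script.

```python
RESOLUTIONS = ["o", "a", "b", "c", "d"]


def cnr_row_to_line(row: dict) -> str:
    """Turn one CNVkit .cnr row into a 1 bp tab-separated line at the bin midpoint."""
    start = int(row["start"])
    end = int(row["end"])
    midpoint = start + int((end - start) / 2)
    log2 = round(float(row["log2"]), 4)
    return f"{row['chromosome']}\t{midpoint - 1}\t{midpoint}\t{log2}"


def expand_resolutions(lines: list[str]) -> list[str]:
    """Repeat every line once per resolution prefix, grouped by resolution."""
    return [f"{resolution}_{line}" for resolution in RESOLUTIONS for line in lines]


# Made-up example row, not real data.
row = {"chromosome": "1", "start": "100", "end": "201", "log2": "0.123456", "gene": "TP53"}
lines = expand_resolutions([cnr_row_to_line(row)])
print(lines[0])
```

Each input row therefore produces five output lines that differ only in their prefix.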
2 changes: 1 addition & 1 deletion BALSAMIC/assets/scripts/preprocess_gens.py
@@ -26,7 +26,7 @@
     "-s",
     "--sequencing-type",
     required=True,
-    type=click.Choice([SequencingType.WGS]),
+    type=click.Choice([SequencingType.WGS, SequencingType.TARGETED]),
     help="Sequencing type used.",
 )
 @click.pass_context
38 changes: 16 additions & 22 deletions BALSAMIC/commands/config/case.py
@@ -54,6 +54,7 @@
     get_bioinfo_tools_version,
     get_panel_chrom,
     get_sample_list,
+    get_gens_references,
 )
 from BALSAMIC.utils.io import read_json, write_json
 from BALSAMIC.utils.utils import get_absolute_paths_dict
@@ -131,29 +132,22 @@ def case_config(
     if cadd_annotations:
         references.update(cadd_annotations_path)

-    if any([genome_interval, gens_coverage_pon, gnomad_min_af5]):
-        if panel_bed:
-            raise click.BadParameter(
-                "GENS is currently not compatible with TGA analysis, only WGS."
-            )
-        if not all([genome_interval, gens_coverage_pon, gnomad_min_af5]):
-            raise click.BadParameter(
-                "All three arguments (genome_interval gens_coverage_pon, gnomad_min_af5) are required for GENS."
-            )
-
-        gens_ref_files = {
-            "genome_interval": genome_interval,
-            "gens_coverage_pon": gens_coverage_pon,
-            "gnomad_min_af5": gnomad_min_af5,
-        }
-
-        references.update(
-            {
-                gens_file: path
-                for gens_file, path in gens_ref_files.items()
-                if path is not None
-            }
+    if analysis_workflow is not AnalysisWorkflow.BALSAMIC_QC:
+        gens_references: dict[str, str] = get_gens_references(
+            genome_interval=genome_interval,
+            gens_coverage_pon=gens_coverage_pon,
+            gnomad_min_af5=gnomad_min_af5,
+            panel_bed=panel_bed,
         )
+        if gens_references:
+            # Update references dictionary with GENS reference-files
+            references.update(
+                {
+                    gens_file: path
+                    for gens_file, path in gens_references.items()
+                    if path is not None
+                }
+            )

     variants_observations = {
         "clinical_snv_observations": clinical_snv_observations,
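The body of `get_gens_references` is not part of the files shown here, so the following is only a guess at its shape, written to illustrate how the removed inline validation could move into a helper. The WGS branch mirrors the removed any/all check. The TGA branch is an assumption: it supposes that a panel run only needs the gnomAD file because coverage comes from CNVkit. The name `get_gens_references_sketch` and the use of `ValueError` instead of `click.BadParameter` are choices made for this sketch.

```python
from typing import Optional


def get_gens_references_sketch(
    genome_interval: Optional[str],
    gens_coverage_pon: Optional[str],
    gnomad_min_af5: Optional[str],
    panel_bed: Optional[str],
) -> Optional[dict]:
    """Hypothetical stand-in; the real get_gens_references may behave differently."""
    if panel_bed:
        # Assumption: TGA needs only the gnomAD file, not the WGS coverage references.
        return {"gnomad_min_af5": gnomad_min_af5} if gnomad_min_af5 else None

    wgs_files = {
        "genome_interval": genome_interval,
        "gens_coverage_pon": gens_coverage_pon,
        "gnomad_min_af5": gnomad_min_af5,
    }
    if not any(wgs_files.values()):
        # No GENS arguments given: GENS is simply not configured.
        return None
    if not all(wgs_files.values()):
        # Same rule as the removed inline check: all three or none.
        raise ValueError(
            "All three arguments (genome_interval, gens_coverage_pon, "
            "gnomad_min_af5) are required for GENS."
        )
    return wgs_files
```

Returning `None` when GENS is not configured matches the `if gens_references:` guard in the new `case_config` code.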
8 changes: 6 additions & 2 deletions BALSAMIC/constants/cluster_analysis.json
@@ -11,7 +11,7 @@
     "time": "00:15:00",
     "n": 1
   },
-  "gens_preprocessing": {
+  "gens_preprocess": {
     "time": "01:00:00",
     "n": 4
   },
@@ -108,7 +108,7 @@
     "time": "10:00:00",
     "n": 5
   },
-  "gatk_denoisereadcounts":{
+  "gatk_denoise_read_counts":{
     "time": "10:00:00",
     "n": 10
   },
@@ -168,6 +168,10 @@
     "time": "24:00:00",
     "n": 36
   },
+  "sentieon_DNAscope_gnomad_tga": {
+    "time": "24:00:00",
+    "n": 12
+  },
   "sentieon_TNhaplotyper": {
     "time": "24:00:00",
     "n": 36
5 changes: 4 additions & 1 deletion BALSAMIC/constants/tools.py
@@ -40,6 +40,9 @@
                 "c": 1000,
                 "d": 100,
             },
-        }
+        },
+        SequencingType.TARGETED: {
+            "BAF_SKIP_N": {"o": 0, "a": 0, "b": 0, "c": 0, "d": 0},
+        },
     },
 }
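The diff does not show how `BAF_SKIP_N` is consumed. Assuming it gives the number of variants skipped between emitted ones at each resolution, a skip count of 0 would keep every variant, which seems plausible for the sparser targeted data. The sketch below illustrates that reading only; `thin_variants` is a hypothetical helper, not BALSAMIC code.

```python
# Values for targeted sequencing as added in this diff.
TARGETED_BAF_SKIP_N = {"o": 0, "a": 0, "b": 0, "c": 0, "d": 0}


def thin_variants(variants: list, skip_n: int) -> list:
    """Keep one variant, skip the next `skip_n`, and repeat (assumed semantics)."""
    return variants[:: skip_n + 1]


variants = list(range(10))  # stand-in for ten BAF data points

# With skip_n == 0 every resolution keeps the full list.
per_resolution = {
    resolution: thin_variants(variants, skip_n)
    for resolution, skip_n in TARGETED_BAF_SKIP_N.items()
}

# A non-zero skip count (hypothetical value) thins the data.
thinned = thin_variants(variants, 2)
```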
1 change: 0 additions & 1 deletion BALSAMIC/containers/varcall_py3/varcall_py3.yaml
@@ -163,7 +163,6 @@ dependencies:
   - svdb=2.8.1
   - sysroot_linux-64=2.12
   - tabix=1.11
-  - tiddit=3.3.2
   - tk=8.6.12
   - tktable=2.10
   - toolz=0.12.0