Add IBD Browser #86

Open
wants to merge 36 commits into base: main
36 commits
4661aa2
wip: add IBD home page
rileyhgrant Dec 6, 2023
d80da32
wip: add IBD terms page with placeholders
rileyhgrant Dec 6, 2023
f4007aa
wip: add IBD variant filter with placeholders
rileyhgrant Dec 6, 2023
4bf276b
wip: add IBDGeneResults with placeholders
rileyhgrant Dec 6, 2023
ecb4f52
wip: add IBDBrowser.js with placeholders and todos
rileyhgrant Dec 6, 2023
7c2a968
Update gitignore to include local dev data directory
rileyhgrant Dec 6, 2023
64fdcea
wip: update webpack config to allow local running of IBD browser
rileyhgrant Dec 6, 2023
83bf7c5
Bump Hail version in data pipeline requirements
rileyhgrant Jan 4, 2024
55f3af0
Add .tool-versions file and bump python for pipeline
rileyhgrant Feb 1, 2024
76de66e
wip: data pipeline working overall, variants may need addl work
rileyhgrant Feb 1, 2024
f0b941f
chore: Update caniuse-lite to remove warning
rileyhgrant Feb 1, 2024
bc41c14
wip: Update IBD browser frontend files for demo
rileyhgrant Feb 5, 2024
2926aad
fixup: lower python version in tool versions for hail compatibility
rileyhgrant Feb 28, 2024
0ce0e75
fixup: update homepage sample gene
rileyhgrant Feb 28, 2024
b9b3bf1
fixup: render p values in exponential notation
rileyhgrant Feb 28, 2024
781b501
fixup: update home page copy
rileyhgrant Feb 28, 2024
8cce2a6
fixup: update about.md content
rileyhgrant Feb 28, 2024
4eacb52
fixup: update IBD terms page copy
rileyhgrant Feb 28, 2024
a26df27
fixup: update IBD about page copy
rileyhgrant Feb 28, 2024
1358dd3
fixup: add IBD to other studies if not on dataset
rileyhgrant Feb 28, 2024
4277b10
Add about page to IBD header
rileyhgrant Feb 28, 2024
9ef3866
Update python version in CI to stay in sync with tool versions
rileyhgrant Feb 28, 2024
628b418
Bump version of setup-python in ci.yml
rileyhgrant Feb 28, 2024
6ab0ea2
fixup: appease js linter
rileyhgrant Feb 29, 2024
aef33ba
fixup: appease python black formatter
rileyhgrant Feb 29, 2024
36c5185
fixup: appease pylint
rileyhgrant Feb 29, 2024
d81c904
fixup: appease python black formatter again
rileyhgrant Feb 29, 2024
325a3ee
DONTMERGE: temporarily swap p value and chisq columns for demo
rileyhgrant Mar 25, 2024
840aa8e
wip: update about page with contributors pre-meta-analysis
rileyhgrant Mar 25, 2024
3c629b8
fixup: adjust columns on gene results page
rileyhgrant Mar 25, 2024
43e3dcf
fixup: appease eslint again
rileyhgrant Mar 25, 2024
6ac5d7a
Update IBD variant data input source
rileyhgrant Mar 25, 2024
331f4d9
Update IBD variant pipeline to rename result groups
rileyhgrant Mar 25, 2024
747f3ec
Update IBD gene pipeline to remap results terms
rileyhgrant Mar 25, 2024
52e0afd
fix(data-pipeline): fix casing in data pipeline
rileyhgrant Jun 21, 2024
f707b63
wip(frontend): temp edits to make demo work
rileyhgrant Jun 21, 2024
4 changes: 2 additions & 2 deletions .github/workflows/ci.yml
@@ -40,9 +40,9 @@ jobs:
       - name: Checkout
         uses: actions/checkout@v2
       - name: Setup Python
-        uses: actions/setup-python@v1
+        uses: actions/setup-python@v2
         with:
-          python-version: 3.7
+          python-version: 3.11.7
       - name: Use pip cache
         uses: actions/cache@v2
         with:
3 changes: 3 additions & 0 deletions .gitignore
@@ -17,3 +17,6 @@ __pycache__

 # Hail logs
 hail-*.log
+
+# dev data
+/data
3 changes: 3 additions & 0 deletions .tool-versions
@@ -0,0 +1,3 @@
+yarn 1.22.19
+node 16.13.1
+python 3.11.7
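
Commit 9ef3866 keeps the CI Python version in step with `.tool-versions` by hand. A minimal sketch of how that check could be automated, assuming the `name version` line format shown in the file above. The function name `read_tool_versions` and the inlined constants are hypothetical and not part of this PR:

```python
def read_tool_versions(text: str) -> dict[str, str]:
    """Parse `.tool-versions`-style content into {tool: version}."""
    versions = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # ignore comments and blank lines
        if not line:
            continue
        tool, version = line.split()[:2]
        versions[tool] = version
    return versions


# Contents copied from the .tool-versions file added in this PR.
TOOL_VERSIONS = """\
yarn 1.22.19
node 16.13.1
python 3.11.7
"""

# Value of `python-version` in .github/workflows/ci.yml after this PR.
CI_PYTHON_VERSION = "3.11.7"

versions = read_tool_versions(TOOL_VERSIONS)
assert versions["python"] == CI_PYTHON_VERSION
```

In a real check the two values would be read from the files rather than inlined.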
1 change: 1 addition & 0 deletions data_pipeline/data_pipeline/config.py
@@ -18,5 +18,6 @@
     for section, option in REQUIRED_CONFIGURATION:
         value = pipeline_config.get(section, option)
         assert value
+# pylint: disable=broad-exception-raised
 except (configparser.NoOptionError, AssertionError) as exc:
     raise Exception(f"Missing required configuration '{section}.{option}'") from exc
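
The hunk above silences pylint's `broad-exception-raised` rather than changing the check. For readers unfamiliar with the pattern, here is a small standalone sketch of a required-option check with `configparser`. It is an illustration only: `missing_options` is a hypothetical helper, the `REQUIRED_CONFIGURATION` pairs are an assumed subset, and unlike the pipeline code it collects every missing option instead of raising on the first one.

```python
import configparser

# Hypothetical subset of required (section, option) pairs.
REQUIRED_CONFIGURATION = [("output", "staging_path"), ("IBD", "variant_results_path")]


def missing_options(config: configparser.ConfigParser) -> list[str]:
    """Return 'section.option' for every required option that is absent or empty."""
    missing = []
    for section, option in REQUIRED_CONFIGURATION:
        # fallback=None covers both a missing section and a missing option.
        if not config.get(section, option, fallback=None):
            missing.append(f"{section}.{option}")
    return missing


config = configparser.ConfigParser()
config.read_string(
    """
[output]
staging_path = gs://exome-results-browsers/data/240325

[IBD]
gene_results_path = gs://ibd-browser/09-11-2023/gene_based_results.ht
"""
)

print(missing_options(config))  # ['IBD.variant_results_path']
```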
9 changes: 8 additions & 1 deletion data_pipeline/data_pipeline/datasets/ibd/ibd_gene_results.py
@@ -75,7 +75,14 @@ def prepare_gene_results():
     final_results = final_results.annotate(
         group_results=hl.dict(
             final_results.group_results.map(
-                lambda group_result: (group_result.analysis_group, group_result.drop("analysis_group"))
+                lambda group_result: (
+                    hl.switch(group_result.analysis_group)
+                    .when("ibd", "IBD")
+                    .when("cd", "CD")
+                    .when("uc", "UC")
+                    .or_missing(),
+                    group_result.drop("analysis_group"),
+                )
             )
         )
     )
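
The remapping in this hunk can be read as a lookup with a "missing" fallback. The sketch below is a plain-Python analogue, not Hail code: `GROUP_LABELS` and `remap_group_results` are hypothetical names, `pval` is a made-up field, and `None` stands in for Hail's missing value.

```python
# Plain-Python analogue; the PR itself uses hl.switch(...).when(...).or_missing().
GROUP_LABELS = {"ibd": "IBD", "cd": "CD", "uc": "UC"}


def remap_group_results(group_results: list[dict]) -> dict:
    """Key each result by its display label, dropping the analysis_group field."""
    remapped = {}
    for result in group_results:
        # None plays the role of "missing" for labels outside the mapping.
        label = GROUP_LABELS.get(result["analysis_group"])
        remapped[label] = {k: v for k, v in result.items() if k != "analysis_group"}
    return remapped


rows = [
    {"analysis_group": "ibd", "pval": 1e-6},
    {"analysis_group": "cd", "pval": 0.02},
    {"analysis_group": "meta", "pval": 0.5},  # hypothetical unmapped label
]
print(remap_group_results(rows))
```

In this analogue every unmapped label lands under the same `None` key, so later entries overwrite earlier ones. How `hl.dict` treats missing keys is not shown in this PR, so it may be worth confirming that the input only contains the three expected labels.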
25 changes: 18 additions & 7 deletions data_pipeline/data_pipeline/datasets/ibd/ibd_variant_results.py
@@ -9,25 +9,36 @@ def prepare_variant_results():
     # Get unique variants from results table
     variants = results.group_by(results.locus, results.alleles).aggregate()

-    # Looks like ac_control was mistakenly encoded as a string, e.g. "[83198, 0]"
+    # Select AC/AF numbers for the reference and alternate alleles
     results = results.annotate(
-        # pylint: disable-next=anomalous-backslash-in-string, unnecessary-lambda
-        ac_control=hl.map(lambda x: hl.int(x), results.ac_control.replace("\[", "").replace("\]", "").split(", "))
+        ac_case=results.ac_case[1],
+        ac_ctrl=results.ac_control[1],
+        an_case=results.ac_case[0],
+        an_ctrl=results.ac_control[0],
     )

-    # Select AC/AF numbers for the alternate allele
-    results = results.annotate(ac_case=results.ac_case[1], ac_ctrl=results.ac_control[1])
+    # pylint: disable=broad-exception-raised
+    # TODO: also, in gene results I should figure out what is going on with all the
+    # bajillion fields I'm returning (0_001_03, etc)
+    # need to check the input schema of something like Epi25 vs IBD

     results = results.drop("ac_control")

     results = results.filter((results.ac_case > 0) | (results.ac_ctrl > 0))

-    # Annotate variants with a struct for each analysis group
+    # Annotate variants with a struct for each analysis group, rename the analysis groups
     results = results.group_by("locus", "alleles").aggregate(group_results=hl.agg.collect(results.row_value))
     results = results.annotate(
         group_results=hl.dict(
             results.group_results.map(
-                lambda group_result: (group_result.analysis_group, group_result.drop("analysis_group"))
+                lambda group_result: (
+                    hl.switch(group_result.analysis_group)
+                    .when("ibd-control", "IBD")
+                    .when("cd-control", "CD")
+                    .when("uc-control", "UC")
+                    .or_missing(),
+                    group_result.drop("analysis_group"),
+                )
             )
         )
     )
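
To make the data flow of this hunk easier to follow, here is a plain-Python analogue of the select, filter, and group steps. It is a sketch under assumptions, not the pipeline's Hail code: the row layout (two-element `ac_case` and `ac_control` lists, string loci) and the name `group_variant_results` are hypothetical.

```python
from collections import defaultdict

# Plain-Python analogue of the annotate / filter / group_by steps; not Hail code.
GROUP_LABELS = {"ibd-control": "IBD", "cd-control": "CD", "uc-control": "UC"}


def group_variant_results(rows: list[dict]) -> dict:
    """Group per-analysis-group rows by (locus, alleles), keyed by display label."""
    grouped = defaultdict(dict)
    for row in rows:
        # Index 1 is the alternate allele; index 0 is stored under an_* as in the diff.
        ac_case, ac_ctrl = row["ac_case"][1], row["ac_control"][1]
        an_case, an_ctrl = row["ac_case"][0], row["ac_control"][0]
        if not (ac_case > 0 or ac_ctrl > 0):
            continue  # mirrors filter((ac_case > 0) | (ac_ctrl > 0))
        key = (row["locus"], tuple(row["alleles"]))
        label = GROUP_LABELS.get(row["analysis_group"])
        grouped[key][label] = {
            "ac_case": ac_case,
            "ac_ctrl": ac_ctrl,
            "an_case": an_case,
            "an_ctrl": an_ctrl,
        }
    return dict(grouped)


rows = [
    {"locus": "1:100", "alleles": ["A", "T"], "analysis_group": "ibd-control",
     "ac_case": [10, 2], "ac_control": [20, 0]},
    {"locus": "1:100", "alleles": ["A", "T"], "analysis_group": "cd-control",
     "ac_case": [8, 0], "ac_control": [20, 0]},  # dropped: no alternate alleles observed
]
print(group_variant_results(rows))
```

The PR does not state whether index 0 holds a reference-allele count or a total allele number, so the `an_*` names here simply follow the diff.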
17 changes: 10 additions & 7 deletions data_pipeline/data_pipeline/pipelines/prepare_datasets.py
@@ -12,20 +12,23 @@
 def prepare_dataset(dataset_id):
     output_path = pipeline_config.get("output", "staging_path")

-    gene_results_module = importlib.import_module(
-        f"data_pipeline.datasets.{dataset_id.lower()}.{dataset_id.lower()}_gene_results"
-    )
+    # gene_results_module = importlib.import_module(
+    #     f"data_pipeline.datasets.{dataset_id.lower()}.{dataset_id.lower()}_gene_results"
+    # )
     variant_results_module = importlib.import_module(
         f"data_pipeline.datasets.{dataset_id.lower()}.{dataset_id.lower()}_variant_results"
     )

-    gene_results = gene_results_module.prepare_gene_results()
-    validate_gene_results_table(gene_results)
-    gene_results.write(os.path.join(output_path, dataset_id.lower(), "gene_results.ht"), overwrite=True)
+    # gene_results = gene_results_module.prepare_gene_results()
+    # validate_gene_results_table(gene_results)
+    # gene_results.write(os.path.join(output_path, dataset_id.lower(), "gene_results.ht"), overwrite=True)

     variant_results = variant_results_module.prepare_variant_results()
     validate_variant_results_table(variant_results)
-    variant_results.write(os.path.join(output_path, dataset_id.lower(), "variant_results.ht"), overwrite=True)
+    variant_results.write(os.path.join(output_path, dataset_id.lower(), "variant_results.ht"), overwrite=True)
+
+    print("exiting!")
+    exit(0)


 def main():
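
`prepare_dataset` resolves each dataset's modules from a dotted path built out of the dataset ID. A minimal sketch of that naming convention follows; `results_module_name` is a hypothetical helper, and a standard-library module stands in for the real import because the `data_pipeline` package is only importable inside the repository.

```python
import importlib


def results_module_name(dataset_id: str, kind: str) -> str:
    """Build the dotted module path for a dataset's gene or variant results."""
    name = dataset_id.lower()
    return f"data_pipeline.datasets.{name}.{name}_{kind}_results"


print(results_module_name("IBD", "variant"))
# data_pipeline.datasets.ibd.ibd_variant_results

# importlib.import_module accepts any dotted path; json.decoder is a stand-in here.
module = importlib.import_module("json.decoder")
print(module.__name__)  # json.decoder
```

Taking the result kind as a parameter would let a caller run only the variant step without commenting out the gene step. That is a design suggestion, not something this PR does.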
6 changes: 3 additions & 3 deletions data_pipeline/pipeline_config.ini
@@ -27,7 +27,7 @@ variant_annotations_path = gs://schema-browser/200911/2020-09-11_schema-browser-

 [IBD]
 gene_results_path = gs://ibd-browser/09-11-2023/gene_based_results.ht
-variant_results_path = gs://ibd-browser/09-11-2023/variants_results.ht
+variant_results_path = gs://ibd-browser/03-01-2024/variants_results.ht
 variant_annotations_path = gs://ibd-browser/09-11-2023/variants_annotations.ht

 [reference_data]
@@ -43,10 +43,10 @@ exac_constraint_path = gs://gcp-public-data--gnomad/legacy/exac_browser/forweb_c
 [dataproc]
 project = exac-gnomad
 region = us-east1
-zone = us-east1-d
+zone = us-east1-c
 # Because the data buckets are in a different project, use a service account that has access to them.
 service-account = erb-data-pipeline@exac-gnomad.iam.gserviceaccount.com

 [output]
 # Path for intermediate Hail files.
-staging_path = gs://exome-results-browsers/data/231116
+staging_path = gs://exome-results-browsers/data/240325
45 changes: 16 additions & 29 deletions data_pipeline/requirements-dev.txt
@@ -1,57 +1,44 @@
 #
-# This file is autogenerated by pip-compile with python 3.7
-# To update, run:
+# This file is autogenerated by pip-compile with Python 3.9
+# by the following command:
 #
 #    pip-compile requirements-dev.in
 #
-astroid==2.12.12
+astroid==3.0.2
     # via pylint
 black==22.10.0
     # via -r requirements-dev.in
-click==8.1.3
-    # via black
-dill==0.3.6
+click==8.1.7
+    # via
+    #   -c requirements.txt
+    #   black
+dill==0.3.7
     # via
     #   -c requirements.txt
     #   pylint
-importlib-metadata==5.0.0
-    # via click
-isort==4.3.21
+isort==5.13.2
     # via pylint
-lazy-object-proxy==1.8.0
-    # via astroid
-mccabe==0.6.1
+mccabe==0.7.0
     # via pylint
-mypy-extensions==0.4.3
+mypy-extensions==1.0.0
     # via black
-pathspec==0.10.1
+pathspec==0.12.1
     # via black
-platformdirs==2.5.2
+platformdirs==4.1.0
     # via
     #   black
     #   pylint
-pylint==2.15.5
+pylint==3.0.3
     # via -r requirements-dev.in
 tomli==2.0.1
     # via
     #   black
     #   pylint
-tomlkit==0.11.6
+tomlkit==0.12.3
     # via pylint
-typed-ast==1.5.4
-    # via
-    #   astroid
-    #   black
-typing-extensions==4.4.0
+typing-extensions==4.9.0
     # via
     #   -c requirements.txt
     #   astroid
     #   black
-    #   importlib-metadata
     #   pylint
-wrapt==1.14.1
-    # via
-    #   -c requirements.txt
-    #   astroid
-zipp==3.10.0
-    # via importlib-metadata
2 changes: 1 addition & 1 deletion data_pipeline/requirements.in
@@ -1,2 +1,2 @@
-hail
+hail==0.2.126
 tqdm