SFARI Readviz Pipeline

This is the calling pipeline used to create the de-identified BAM files that allow variants in the SFARI browser to be visualized. It performs the following tasks:

  • Selects example heterozygous, homozygous or hemizygous variants from a Hail matrix table.
  • Creates per-sample input files for GATK HaplotypeCaller to produce BAM files. This is the read data that HaplotypeCaller "sees" before calling a variant.
  • De-identifies BAM files by stripping all identifying details from each read, then combines the individual BAM files into a grouped BAM file (a pysam sketch follows this list).
  • Creates an SQLite file so that variants can be searched and mapped to the corresponding grouped BAM file.
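
The de-identification step can be illustrated with a minimal pysam sketch. This is not the repository's actual script; the file names and the anonymous read-group ID are placeholders.

import pysam

with pysam.AlignmentFile("sample.bam", "rb") as inp:
    # Replace the read-group header with a single anonymous read group.
    header = inp.header.to_dict()
    header["RG"] = [{"ID": "readviz"}]
    with pysam.AlignmentFile("deidentified.bam", "wb", header=header) as out:
        for i, read in enumerate(inp):
            read.query_name = f"read{i}"        # drop the original read name
            read.set_tags([("RG", "readviz")])  # drop all other optional tags
            out.write(read)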

All code is based on the gnomAD readviz pipeline and scripts, which were developed and tested for Google Cloud Platform, and has been adapted to run on institutional clusters. The gnomAD pipeline was implemented as a collection of scripts, whereas this implementation consolidates them into a single WDL workflow.

Requirements

  • Java 1.8
  • Python
  • GATK (tested on 4.1.7.0)
  • Cromwell (tested on v56)
  • SQLite3

Python libraries required

  • hail
  • peewee
  • pysam
  • tqdm

Note on environment

To ensure that all of the requirements were in place, a conda environment was created. The command

conda activate hail_jupyter

was used in some WDL tasks before the Python script was called.

Running WDL using Cromwell

The Readviz WDL pipeline is run with Cromwell using the following command:

java -Dconfig.file=slurm.conf -jar \
/home/ml2529/shared/tools/jars/cromwell-56.jar run \
MultiSampleReadViz.wdl \
-i readviz_inputs.json  \
-o workflow.options

The launch.sh file is a batch script that can be used to launch the pipeline on Slurm with the following command:

sbatch launch.sh

Advanced Cromwell/WDL configuration

For details, please see the SFARI mito call repo.

Inputs

The main input file is readviz_inputs.json. In summary, this contains the locations of the following input files (a hypothetical example follows the list):

  • Hail matrix table of variant calls from which variants will be sampled
  • Sample IDs to include
  • Location of BAM/CRAM files for each sample
  • hg38 reference sequence and associated index files
  • Location of the required Python scripts
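
As a rough illustration, readviz_inputs.json might be generated as shown below. The key names here are hypothetical; the real ones are defined by MultiSampleReadViz.wdl.

import json

# Hypothetical input keys and paths -- the actual names are defined
# by the workflow in MultiSampleReadViz.wdl.
inputs = {
    "MultiSampleReadViz.variants_mt": "/path/to/variant_calls.mt",
    "MultiSampleReadViz.sample_ids":  "/path/to/sample_ids.txt",
    "MultiSampleReadViz.cram_paths":  "/path/to/sample_to_cram.tsv",
    "MultiSampleReadViz.ref_fasta":   "/path/to/hg38.fasta",
    "MultiSampleReadViz.scripts_dir": "/path/to/python/scripts",
}

with open("readviz_inputs.json", "w") as f:
    json.dump(inputs, f, indent=2)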

Outputs

There are two major outputs that can then be used by the SFARI browser (in the IGV window):

  • Grouped BAM files with read data for each variant. These can be served as static files by the web server.
  • An SQLite database file for each chromosome. This records which file(s) a variant's reads can be found in, and can be queried by the GraphQL API (a query sketch follows this list).
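
A lookup against the per-chromosome SQLite file might look like the sketch below. The database, table, and column names here are assumptions; the actual schema is defined by the pipeline's SQLite-generation step.

import sqlite3

# Hypothetical schema: a "variants" table mapping a variant to the
# grouped BAM file containing its reads.
conn = sqlite3.connect("variants_chr1.db")
row = conn.execute(
    "SELECT bam_path FROM variants "
    "WHERE chrom = ? AND pos = ? AND ref = ? AND alt = ?",
    ("chr1", 123456, "G", "A"),
).fetchone()
if row:
    print("Reads for this variant are in:", row[0])
conn.close()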

Connecting to the GraphQL API code - example

Will come back and write :o)
