WES-WGS

Pipelines and tools for WES and WGS data processing

pjob_bwa_gatk.sh

Script creates and submits PBS batch script

Script description

Create submission script for rsync of .fastq files from RDS to HPC
Creates submission script for bwa_gatk.sh
Creates submission script for rsync of bwa_katk.sh results from HPC to RDS. Script will remove -t folder after rsync results.
Limit of parallel job to run concurrently with -p option. Default is 1. I.e. job will run sequentially.
Option to define path for the result folder with -o option.
Option to define path for storing the fastq processed with -kfq option. This is useful for converting bam to fastq.gz for reference.
Script support for fastq to vcf and bam to fastq to vcf flow.

bwa_gatk.sh

Align WGS or WES using BWA-MEM and call variants using GATK

Script description

Index the reference FASTA sequence for use with BWA
Align FASTQ reads with BWA-MEM to hg38 with ATL-contigs
Add read group information, preprocess to make a clean BAM and call variants
Create unmapped uBAM- This step uses Picard tools RevertSam function to revert sam file produced during alignment to previous state. We clear read attributes that have prevented alignments (XA: Alternative hits; format: (chr,pos,CIGAR,NM;) & XS:Suboptimal alignment score)
Add read group information to uBAM
Merge uBAM with aligned BAMi) Performed with Picard MergeBamAlignment. Needs to have FASTA dictionary file (.dic) in same directory as reference sequence.ii) Set UNMAP_CONTAMINANT_READS (Boolean) to “true” which enables detection of reads originating from foreign organisms (e.g. bacterial DNA in a non-bacterial sample), and unmap + label those reads accordingly. iii) When UNMAP_CONTAMINANT_READS is set, then we must setMIN_UNCLIPPED_BASES (Integer), default = 32

7. Base Recalibrator Report & 8. Perform Base Quality Score Recalibration (BQSR) - Currently uncommented as previous studies by Tian et al 2016 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5048557/) found BQSR had virtually negligible effect on INDEL calling and generally reduced sensitivity for SNP calling and reduced the SNP calling sensitivity but improved the precision when the coverage is insufficient. However, in regions of high divergence (e.g., the HLA region), BQSR reduced the sensitivity of callers. There is currently not a good argument to support its use if we are using alt-aware alignment. # - This step attempts performs recalibration of base score qualities using machine learning. Read Group information is used to identify the same sample/ run therefore this must be performed prior to changing of read group information.

Flag duplicate reads- The step flags duplicate reads. OPTICAL_DUPLICATE_PIXEL_DISTANCE; The maximum offset between two duplicate clusters in order to consider them optical duplicates. The default is appropriate for unpatterned versions of the Illumina platform. For the patterned flowcell models, 2500 is moreappropriate. For other platforms and models, users should experiment to find what works best. Default value: 100. This option can be set to 'null' to clear the default value.
Coordinate sort, fix NM and UQ tags and index for clean BAMi) Picard.SortSam; Sort BAM file using coordinatesii) Picard. SetNmAndUqTags; - NM:i: Edit distance to the reference, including ambiguous bases but excluding clipping. i.e. mismatches to reference- UQ:i: Phred likelihood of the segment, conditional on the mapping being correct. i.e. quality score of the mapping- NOTE; Here we are using picard 2.7.1. With picard 2.10.6 and higher command is SetNmMDAndUqTags however seems to have piping issues at the moment https://gatkforums.broadinstitute.org/gatk/discussion/10104/picard-2-10-7-fails-pipelining-sortsam-and-setnmanduqtags
Call SNP and indel variants in reference confidence (ERC) mode per sample using HaplotypeCaller

annovar.sh

Annotates .vcf files from gatk using ANNOVAR

Script description

ANNOVAR .vcf file annotioans with currently implemented annnotations that include; refGene,cytoBand,exac03,exac03nontcga,exac03nonpsych,avsnp147,dbnsfp33a,dbscsnv11,cosmic70,esp6500siv2_ea,esp6500siv2_aa,esp6500siv2_all,gnomad_exome,gnomad_genome,AFR.sites.2015_08,ALL.sites.2015_08,AMR.sites.2015_08,mcap,revel,clinvar_20170130 NOTE: The hg38_avsnp147.txt annotations contained irregular lines. Lines were removed with following script;

cp hg38_avsnp147.txt old_hg38_avsnp147.txt

awk -F'\t' 'NF==6 {print}' hg38_avsnp147.txt > out_4tab_hg38_avsnp147.txt

cp out_4tab_hg38_avsnp147.txt hg38_avsnp147.txt

annotations to change - remove exac03,exac03nontcga,exac03nonpsych as gnomad is successor.

RUN SCRIPT

~/scripts/annovar.sh CDS0024.20K_ /project/RDS-SMS-FFbigdata-RW/Genetics/Genomes/whole_genome_sequencing_2012/FASTQ/bwagatk_test/test_271117

Name		Name	Last commit message	Last commit date
Latest commit History 32 Commits
README.md		README.md
annovar.sh		annovar.sh
bwa_gatk.sh		bwa_gatk.sh
pjob_bwa_gatk.sh		pjob_bwa_gatk.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

WES-WGS

pjob_bwa_gatk.sh

Script creates and submits PBS batch script

Script description

bwa_gatk.sh

Align WGS or WES using BWA-MEM and call variants using GATK

Script description

annovar.sh

Annotates .vcf files from gatk using ANNOVAR

Script description

cp hg38_avsnp147.txt old_hg38_avsnp147.txt

awk -F'\t' 'NF==6 {print}' hg38_avsnp147.txt > out_4tab_hg38_avsnp147.txt

cp out_4tab_hg38_avsnp147.txt hg38_avsnp147.txt

annotations to change - remove exac03,exac03nontcga,exac03nonpsych as gnomad is successor.

About

Releases

Packages

Contributors 4

Languages

zchatt/WES-WGS

Folders and files

Latest commit

History

Repository files navigation

WES-WGS

pjob_bwa_gatk.sh

Script creates and submits PBS batch script

Script description

bwa_gatk.sh

Align WGS or WES using BWA-MEM and call variants using GATK

Script description

annovar.sh

Annotates .vcf files from gatk using ANNOVAR

Script description

cp hg38_avsnp147.txt old_hg38_avsnp147.txt

awk -F'\t' 'NF==6 {print}' hg38_avsnp147.txt > out_4tab_hg38_avsnp147.txt

cp out_4tab_hg38_avsnp147.txt hg38_avsnp147.txt

annotations to change - remove exac03,exac03nontcga,exac03nonpsych as gnomad is successor.

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 4

Languages

Packages