Update vcf2maf to work with later Ensembl vep releases #339

Open
tamuanand opened this issue Mar 1, 2023 · 2 comments

@tamuanand

Hi team

Thanks for the great toolkit. Could vcf2maf be updated to work with later Ensembl VEP releases? The latest as of Feb 2023 is 109.3: https://github.com/Ensembl/ensembl-vep/releases/tag/release%2F109.3

Thanks

@coonya

coonya commented Apr 19, 2023

VEP 109.3
Ensembl's VEP (Variant Effect Predictor) is popular for how it picks a single effect per gene as detailed here, its CLIA-compliant HGVS variant format, and Sequence Ontology nomenclature for variant effects.

Instead of the official instructions, we will use mamba (conda, but faster) to install VEP and its dependencies. If you don't already have mamba, use these steps to download and install it into $HOME/mambaforge, then run a script that adds it to your $PATH:

curl -L https://github.com/conda-forge/miniforge/releases/download/4.12.0-0/Mambaforge-Linux-x86_64.sh -o /tmp/mambaforge.sh
sh /tmp/mambaforge.sh -bfp $HOME/mambaforge && rm -f /tmp/mambaforge.sh
. $HOME/mambaforge/etc/profile.d/conda.sh

You can add the following to your ~/.bashrc file to add mamba and conda to your $PATH whenever you log in:

if [ -f "$HOME/mambaforge/etc/profile.d/conda.sh" ]; then
    . $HOME/mambaforge/etc/profile.d/conda.sh
fi

Use mamba to create and activate a conda environment with VEP, its dependencies, and other related tools:

mamba create -n vep
conda activate vep
mamba install -y -c conda-forge -c bioconda -c defaults ensembl-vep==109.3 htslib==1.14 bcftools==1.14 samtools==1.14 ucsc-liftover==377
cd "$CONDA_PREFIX/share"   # conda sets $CONDA_PREFIX to the active environment's home
git clone https://github.com/mskcc/vcf2maf.git
cd vcf2maf
chmod 755 *.pl
cp *.pl ../../bin
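
Before going further, it can help to confirm that the environment actually put everything on your $PATH. The helper below is only a sketch: `check_tools` is a hypothetical name, not part of VEP or vcf2maf, and it uses nothing beyond the shell's built-in `command -v`.

```shell
# Print each command name that cannot be found on $PATH, one per line.
# Returns non-zero if at least one command is missing.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            missing=1
        fi
    done
    return "$missing"
}
```

With the vep environment active, a call such as `check_tools vep vcf2maf.pl samtools bcftools bgzip tabix liftOver` should print nothing. The list of names is my assumption about what the packages above install; adjust it to your setup.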

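Then change two things in vcf2maf.pl (original code first, replacement after each -->): drop --af_esp from the VEP command, and rename the gnomAD_* entries in @ann_cols to gnomADe_*: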

$vep_cmd .= ( $vep_script =~ m/vep$/ ? " --af_1kg --af_esp --af_gnomad" : " --maf_1kg --maf_esp" ) unless( $online );

-->

$vep_cmd .= ( $vep_script =~ m/vep$/ ? " --af_1kg --af_gnomad" : " --maf_1kg --maf_esp" ) unless( $online );


my @ann_cols = qw( Allele Gene Feature Feature_type Consequence cDNA_position CDS_position
    Protein_position Amino_acids Codons Existing_variation ALLELE_NUM DISTANCE STRAND_VEP SYMBOL
    SYMBOL_SOURCE HGNC_ID BIOTYPE CANONICAL CCDS ENSP SWISSPROT TREMBL UNIPARC RefSeq SIFT PolyPhen
    EXON INTRON DOMAINS AF AFR_AF AMR_AF ASN_AF EAS_AF EUR_AF SAS_AF AA_AF EA_AF CLIN_SIG SOMATIC
    PUBMED MOTIF_NAME MOTIF_POS HIGH_INF_POS MOTIF_SCORE_CHANGE IMPACT PICK VARIANT_CLASS TSL
    HGVS_OFFSET PHENO MINIMISED GENE_PHENO FILTER flanking_bps vcf_id vcf_qual gnomAD_AF gnomAD_AFR_AF
    gnomAD_AMR_AF gnomAD_ASJ_AF gnomAD_EAS_AF gnomAD_FIN_AF gnomAD_NFE_AF gnomAD_OTH_AF gnomAD_SAS_AF );

-->

my @ann_cols = qw( Allele Gene Feature Feature_type Consequence cDNA_position CDS_position
    Protein_position Amino_acids Codons Existing_variation ALLELE_NUM DISTANCE STRAND_VEP SYMBOL
    SYMBOL_SOURCE HGNC_ID BIOTYPE CANONICAL CCDS ENSP SWISSPROT TREMBL UNIPARC RefSeq SIFT PolyPhen
    EXON INTRON DOMAINS AF AFR_AF AMR_AF ASN_AF EAS_AF EUR_AF SAS_AF AA_AF EA_AF CLIN_SIG SOMATIC
    PUBMED MOTIF_NAME MOTIF_POS HIGH_INF_POS MOTIF_SCORE_CHANGE IMPACT PICK VARIANT_CLASS TSL
    HGVS_OFFSET PHENO MINIMISED GENE_PHENO FILTER flanking_bps vcf_id vcf_qual gnomADe_AF gnomADe_AFR_AF
    gnomADe_AMR_AF gnomADe_ASJ_AF gnomADe_EAS_AF gnomADe_FIN_AF gnomADe_NFE_AF gnomADe_OTH_AF gnomADe_SAS_AF );
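
If you would rather not edit the file by hand, the two changes above can be sketched as a sed one-off. This is an illustration under assumptions, not a tested patch against every vcf2maf release: `patch_vcf2maf` is a hypothetical helper name, and the patterns assume the two statements look exactly like the snippets shown above. Only the @ann_cols statement is touched by the rename, and a .bak copy of the original is kept.

```shell
# Apply both edits to the vcf2maf.pl given as $1, keeping a backup at $1.bak.
patch_vcf2maf() {
    # 1st expression: on the line that builds the --af_* flags, drop " --af_esp".
    # 2nd expression: from "my @ann_cols = qw(" through the closing ");",
    #                 rename every gnomAD_ prefix to gnomADe_.
    sed -i.bak \
        -e '/--af_gnomad/ s/ --af_esp//' \
        -e '/my @ann_cols = qw(/,/);/ s/gnomAD_/gnomADe_/g' \
        "$1"
}
```

Run it as `patch_vcf2maf vcf2maf.pl`, then compare the result against the backup with `diff vcf2maf.pl.bak vcf2maf.pl` before copying it into bin. Running it twice is harmless, because gnomADe_ no longer matches the gnomAD_ pattern.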

Download VEP's offline cache for GRCh38, and the reference FASTA:

mkdir -p $HOME/.vep/homo_sapiens/109_GRCh38/
rsync -avr --progress rsync://ftp.ensembl.org/ensembl/pub/release-109/variation/indexed_vep_cache/homo_sapiens_vep_109_GRCh38.tar.gz $HOME/.vep/
tar -zxf $HOME/.vep/homo_sapiens_vep_109_GRCh38.tar.gz -C $HOME/.vep/
rsync -avr --progress rsync://ftp.ensembl.org/ensembl/pub/release-109/fasta/homo_sapiens/dna_index/ $HOME/.vep/homo_sapiens/109_GRCh38/

(Optional) Download VEP's offline cache for GRCh37, and the reference FASTA which we must bgzip instead of gzip:

mkdir -p $HOME/.vep/homo_sapiens/109_GRCh37/
rsync -avr --progress rsync://ftp.ensembl.org/ensembl/pub/release-109/variation/indexed_vep_cache/homo_sapiens_vep_109_GRCh37.tar.gz $HOME/.vep/
tar -zxf $HOME/.vep/homo_sapiens_vep_109_GRCh37.tar.gz -C $HOME/.vep/
rsync -avr --progress rsync://ftp.ensembl.org/ensembl/pub/grch37/release-109/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.dna.toplevel.fa.gz $HOME/.vep/homo_sapiens/109_GRCh37/
gzip -d $HOME/.vep/homo_sapiens/109_GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa.gz
bgzip -i $HOME/.vep/homo_sapiens/109_GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa
samtools faidx $HOME/.vep/homo_sapiens/109_GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa.gz
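
After either download, a quick sanity check is to confirm that the bgzipped FASTA and both of its indexes (.fai and .gzi) are in place, since VEP is pointed at that file with --fasta. The function below is a sketch; `check_vep_fasta` is a hypothetical name, and the paths simply mirror the ones used in the commands above.

```shell
# Check that the FASTA and its two index files exist and are non-empty.
# $1 = VEP data directory (e.g. $HOME/.vep), $2 = assembly (GRCh38 or GRCh37).
# The "109" in the path matches the cache release downloaded above.
check_vep_fasta() {
    local dir="$1/homo_sapiens/109_$2"
    local fasta="$dir/Homo_sapiens.$2.dna.toplevel.fa.gz"
    local status=0 f
    for f in "$fasta" "$fasta.fai" "$fasta.gzi"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f"
            status=1
        fi
    done
    return "$status"
}
```

For example, `check_vep_fasta "$HOME/.vep" GRCh38` prints nothing and returns zero when all three files are present.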

Test running VEP in offline mode on a GRCh38 VCF:

curl -sLO https://raw.githubusercontent.com/Ensembl/ensembl-vep/release/109/examples/homo_sapiens_GRCh38.vcf
vep --species homo_sapiens --assembly GRCh38 --offline --no_progress --no_stats --sift b --ccds --uniprot --hgvs --symbol --numbers --domains --gene_phenotype --canonical --protein --biotype --tsl --pubmed --variant_class --shift_hgvs 1 --check_existing --total_length --allele_number --no_escape --xref_refseq --failed 1 --vcf --minimal --flag_pick_allele --pick_order canonical,tsl,biotype,rank,ccds,length --dir $HOME/.vep --fasta $HOME/.vep/homo_sapiens/109_GRCh38/Homo_sapiens.GRCh38.dna.toplevel.fa.gz --input_file homo_sapiens_GRCh38.vcf --output_file homo_sapiens_GRCh38.vep.vcf --polyphen b --af --af_1kg --regulatory
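
To see which annotation fields VEP actually wrote, you can read them back from the CSQ header line of the output VCF and compare them with the names in @ann_cols. A minimal sketch follows; `list_csq_fields` is a hypothetical helper, and it assumes the usual VEP header layout in which the field names follow "Format: " and are separated by "|".

```shell
# Print the CSQ field names declared in a VEP-annotated VCF ($1), one per line.
list_csq_fields() {
    grep -m 1 '^##INFO=<ID=CSQ,' "$1" \
        | sed -e 's/.*Format: //' -e 's/">$//' \
        | tr '|' '\n'
}
```

For instance, `list_csq_fields homo_sapiens_GRCh38.vep.vcf | grep gnomAD` shows which gnomAD column names are present when VEP is run with --af_gnomad, as vcf2maf.pl does.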

@tamuanand
Author

Hi @coonya

Thanks a lot for your suggestions to change vcf2maf.pl.

Would you know what other changes would be needed for, say, maf2maf.pl, maf2vcf.pl, and vcf2vcf.pl?

Thanks in advance.
