A fast and flexible program to annotate/interpret genetic variants in VCF/BCF file

shiquan/bcfanno

Getting started with BCFANNO

bcfanno annotates genetic variants against various biological databases and quickly predicts variant types and HGVS names. The instructions below get you up and running with the toy data.

Installing bcfanno

Download the source code or binary distribution from https://github.com/shiquan/bcfanno/releases

Or clone the repository from GitHub to get the latest version:

git clone https://github.com/shiquan/bcfanno
cd bcfanno && make

Test with toy data

Run bcfanno with the toy data:

# Write results to stdout
./bcfanno -c toy.json example/toy.vcf.gz

# Or save results to a compressed VCF:
./bcfanno -c toy.json example/toy.vcf.gz -O z -o results.vcf.gz

# Or save results as BCF:
./bcfanno -c toy.json example/toy.vcf.gz -O b -o results.bcf

Select annotations from the results

vcf2tsv is an additional program in the bcfanno package that converts a VCF/BCF file to a user-friendly tab-separated values (TSV) file.

./bcfanno -c toy.json example/toy.vcf.gz -q | ./vcf2tsv -f CHROM,POS,CytoBand,REF,ALT,GT,SAMPLE,RS,MolecularConsequence,Gene,HGVSnom,ExonIntron,AAlength,HGMD_disease,HGMD_tag,HGMD_pmid

# Another usage:
./bcfanno -c toy.json example/toy.vcf.gz -q | ./vcf2tsv -f BED,CytoBand,TGT,SAMPLE,RS,MolecularConsequence,Gene,HGVSnom,ExonIntron,AAlength,HGMD_disease,HGMD_tag,HGMD_pmid
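To pull a single annotation out of the resulting table, a small awk helper can select a column by name. This is only a sketch: it assumes the first line of the vcf2tsv output is a tab-separated header row (optionally starting with `#`), which should be checked against the actual output. `tsv_column` is a hypothetical helper name and is not part of the bcfanno package.

```shell
# tsv_column NAME: read a tab-separated table from stdin and print the
# values of the column whose header equals NAME.
# Assumption: the first input line is a header row; a leading '#' on a
# header field is ignored.
tsv_column() {
    awk -F'\t' -v name="$1" '
        NR == 1 {
            # Find the index of the requested column in the header row.
            for (i = 1; i <= NF; i++) {
                header = $i
                sub(/^#/, "", header)
                if (header == name) col = i
            }
            next
        }
        # Print the selected field for every data row, if the column exists.
        col { print $col }
    '
}
```

With the function defined in the current shell, it can be appended to either pipeline above, for example `... | ./vcf2tsv -f CHROM,POS,Gene | tsv_column Gene`. If the named column is not found, nothing is printed.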

(Optional) View annotations with Microsoft Excel

This step is optional and requires installing another of my programs, tsv2excel, first.

./bcfanno -c toy.json example/toy.vcf.gz -q | ./vcf2tsv -f BED,CytoBand,TGT,SAMPLE,RS,MolecularConsequence,Gene,HGVSnom,ExonIntron,AAlength,HGMD_disease,HGMD_tag,HGMD_pmid | tsv2excel -o toy.xlsx

Additional programs included in the bcfanno package

Besides the core program bcfanno, the following programs are also built as part of the package.

  • tsv2vcf, generates VCF databases from a tab-separated file
  • vcf2tsv, converts a VCF file to a tab-separated file with selected tags
  • vcf_rename_tags, renames tags or contig names in a VCF file, usually used to format the databases
  • GenePredExtGen, generates genePredExt format from genome annotation and reference databases

Notice

  • I recommend using UCSC-released genomes so that you can easily analyze your data later with the UCSC Genome Browser and its datasets. Please note that chr1 != Chr1 != 1. Because it may not be easy for users to generate annotation databases manually, I have built GEA databases for hg19 and GRCh38. Please find them in the section below.
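Because contig names must match exactly, it can help to check which naming style a VCF uses before annotating it. The following is a minimal sketch that uses only standard Unix tools; the function name `list_contigs` is hypothetical and is not part of bcfanno.

```shell
# list_contigs: read an uncompressed VCF from stdin and print each
# distinct value of the first (CHROM) column once.
list_contigs() {
    # Drop header lines (starting with '#'), keep column 1, deduplicate.
    grep -v '^#' | cut -f1 | sort -u
}
```

For a compressed file, decompress it first, for example `gzip -dc example/toy.vcf.gz | list_contigs`. If the output shows names such as `1` while the databases use `chr1`, the names need to be made consistent before annotation; the vcf_rename_tags program listed above is described as handling contig renaming.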

Download databases from server

Please download the GEA databases from

** Many databases can be downloaded freely from different institutes, and we do not plan to redistribute these datasets (and, of course, some licenses also require us not to). But if you have a database problem with bcfanno, please feel free to send me an email so that we can offer whatever assistance is possible. **

Update

  • 2018/09/07: Updated the mitochondrial records in the GEA database and the genetic codon map for mitochondria. A --mito parameter was also added.

Bug report or suggestions

Please report bugs and suggestions through GitHub.

LICENSE

The full bcfanno package is distributed under the MIT License, copyright 2016-2018 BGI Research.

The following packages and source code used in bcfanno are copyrighted by other institutions.

  • htslib 1.6: The MIT/Expat License, Copyright (C) 2012-2014 Genome Research Ltd.
  • thread_pool.[ch]: The MIT/Expat License, Copyright (c) 2013-2017 Genome Research Ltd.

How to cite bcfanno

Please cite https://github.com/shiquan/bcfanno.

Reference

  1. HGVS nomenclature
  2. VCF format
