medema-group/BiG-SCAPE

Note: BiG-SCAPE 2.0 is still in beta. Please submit an issue if you find anything wrong with this release!

BiG-SCAPE

BiG-SCAPE (Biosynthetic Gene Similarity Clustering and Prospecting Engine) is a software package, written in Python, that constructs sequence similarity networks of Biosynthetic Gene Clusters (BGCs) and groups them into Gene Cluster Families (GCFs). BiG-SCAPE does this by calculating pairwise distances between gene clusters based on a comparison of their protein domain content, order, copy number and sequence identity.
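The idea of turning pairwise distances into families can be illustrated with a small sketch. This is not BiG-SCAPE's implementation: it assumes a Jaccard distance on sets of domain names as a stand-in for the domain content comparison only (order, copy number and sequence identity are ignored), and it assumes families are the connected components of the network left after applying a distance cutoff. The cluster names, domain names, cutoff value and function names are all hypothetical.

```python
from itertools import combinations


def jaccard_distance(domains_a, domains_b):
    """Distance in [0, 1] based only on shared domain names (assumption)."""
    a, b = set(domains_a), set(domains_b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def group_into_families(clusters, cutoff):
    """Link clusters whose distance is <= cutoff; return connected components."""
    parent = {name: name for name in clusters}

    def find(x):
        # Union-find lookup with path compression.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(clusters, 2):
        if jaccard_distance(clusters[a], clusters[b]) <= cutoff:
            parent[find(a)] = find(b)

    families = {}
    for name in clusters:
        families.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in families.values())


# Hypothetical toy input: cluster name -> list of domain names.
clusters = {
    "bgc_a": ["PKS_KS", "PKS_AT", "ACP"],
    "bgc_b": ["PKS_KS", "PKS_AT", "ACP", "Thioesterase"],
    "bgc_c": ["Condensation", "AMP-binding", "PCP"],
}
families = group_into_families(clusters, cutoff=0.3)
print(families)
```

In this toy input, `bgc_a` and `bgc_b` share three of four domains and fall into one family, while `bgc_c` shares none and stays on its own.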

BiG-SCAPE takes antiSMASH-processed GenBank files (i.e. BGC predictions) as input, along with reference BGC GenBank files (user-defined and/or from the MIBiG repository). It produces tab-delimited files, a comprehensive SQLite database that stores all generated results, and a rich HTML visualization that includes the BGC similarity network and CORASON-like, multi-locus phylogenies of each Gene Cluster Family.
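Because the results are stored in a SQLite database, they can be inspected with standard tooling. The sketch below uses only Python's built-in `sqlite3` module to list the tables in a database file and count their rows. It makes no assumptions about BiG-SCAPE's schema; the function name `list_tables` and the file path in the usage note are hypothetical.

```python
import sqlite3


def list_tables(db_path):
    """Return {table_name: row_count} for every table in a SQLite file."""
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        counts = {}
        for (name,) in rows:
            # Table names come from sqlite_master itself; quote them
            # (doubling any embedded quotes) so unusual names are safe.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = con.execute(
                f"SELECT COUNT(*) FROM {quoted}"
            ).fetchone()[0]
        return counts
    finally:
        con.close()
```

For example, `list_tables("path/to/output.db")` (placeholder path) returns a dictionary that gives a quick overview of what a run stored before writing more specific queries.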

In principle, BiG-SCAPE can also be used on any other gene clusters, such as pathogenicity islands, secretion system-encoding gene clusters, or even whole viral genomes.

For installation instructions, see here.

Learn more about BiG-SCAPE in the wiki.

[Figure: BiG-SCAPE workflow]

If you find BiG-SCAPE useful, please cite us:

A computational framework to explore large-scale biosynthetic diversity