This is an MIT-licensed C and Python library with a CLI for reading, writing, and manipulating multiple sequence alignments in TAF (described below) and MAF formats. It converts between the two formats and provides a number of utilities for preparing alignments for different use cases. The Python library is built on top of the C library and is therefore quite fast.
See the TAF format page for a specification of the TAF format and an example.
See C/CLI Install for how to build and install from source in order to use the C library and CLI utilities.
See Python install for how to install the Python library.
See taffy utilities for a description of the many useful taffy utilities, including:
- view - MAF / TAF conversion and region extraction
- norm - normalize TAF blocks
- add-gap-bases - add sequences from HAL or FASTA files into TAF gaps
- index - create a .tai index (required for region extraction)
- sort - sort the rows of a TAF file to a desired order
- stats - print statistics of a TAF file
- coverage - print coverage statistics of a given genome in a TAF file
See taffy scripts for a description of useful Python scripts, including:
- alignment plot - A (relatively) fast MSA visualization, with coverage, copy-number, identity, and dotplot options.
See using the Python API for how to work with MAF/TAF alignments using a convenient Python API designed to complement the CLI.
See the example notebook for a quick worked example of using the Python API for machine learning with PyTorch.
There is also a simple C library for working with TAF/MAF files. See taf.h in the inc directory.