LinguaPhylo: Communicating and reproducing probabilistic models for phylogenetic analysis

A new paradigm for scientific computing and data science has begun to emerge in the last decade. A recent example is the publication of the first "computationally reproducible article" using eLife's Reproducible Document Stack, which blends features of a traditional manuscript with live code, data and interactive figures.

Although standard tools for statistical phylogenetics provide a degree of reproducibility and reusability through popular open-source software and computer-readable data file formats, there is still much to do. The ability to construct and accurately communicate probabilistic models in phylogenetics is frustratingly underdeveloped. There is low interoperability between different inference packages (e.g. BEAST1, BEAST2, MrBayes, RevBayes), and the file formats these packages use are hard for researchers to read.

This tool contains two related projects: LinguaPhylo (LPhy for short) and LPhyBEAST.

LinguaPhylo (LPhy for short - pronounced el-fee)

In this project we aim to develop a model specification language to concisely and precisely define probabilistic phylogenetic models. The aim is to work towards a lingua franca for probabilistic models of phylogenetic evolution. This language should be readable by both humans and computers. Here is a full example:

Each line in this model block expresses how a random variable (left of the tilde) is generated from a generative distribution.

The first line creates a random variable, λ, that is log-normally distributed. The second line creates a tree, ψ, with 16 taxa from the Yule process with a lineage birth rate equal to λ. The third line produces a multiple sequence alignment with a length of 200, by simulating a Jukes-Cantor model of sequence evolution down the branches of the tree ψ. Each random variable depends on the previous one, so this is a hierarchical model that ultimately defines a probability distribution over sequence alignments of size 16 x 200.
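To make the three generative steps concrete, here is a minimal Python sketch of the same hierarchy: a log-normal birth rate, a 16-taxon Yule tree, and a 200-site Jukes-Cantor alignment. It is an illustration only, not LPhy code and not how LPhy itself simulates. The log-normal parameters (3.0 and 1.0), the unit clock rate, the forward-in-time tree simulation that stops once 16 lineages exist, and all function names are assumptions made for this sketch.

```python
import math
import random


def simulate_yule(n_taxa, birth_rate, rng):
    """Grow a pure-birth tree forward in time until n_taxa lineages exist.

    Returns (children, length, leaves). length[node] is the branch above node.
    Simplification: the final interval is one more exponential waiting time.
    """
    children = {0: []}
    length = {0: 0.0}
    active = [0]  # lineages alive at the current time
    next_id = 1
    while True:
        wait = rng.expovariate(len(active) * birth_rate)
        for node in active:
            length[node] += wait
        if len(active) == n_taxa:
            break
        # A uniformly chosen lineage splits into two daughter lineages.
        parent = active.pop(rng.randrange(len(active)))
        for _ in range(2):
            children[parent].append(next_id)
            children[next_id] = []
            length[next_id] = 0.0
            active.append(next_id)
            next_id += 1
    return children, length, active


def evolve_jc(seq, t, rng):
    """Evolve a sequence along a branch of length t under Jukes-Cantor.

    With probability exp(-4t/3) a site is untouched; otherwise it is redrawn
    uniformly from the four nucleotides (possibly the same one).
    """
    p_keep = math.exp(-4.0 * t / 3.0)
    return [s if rng.random() < p_keep else rng.choice("ACGT") for s in seq]


def simulate_alignment(children, length, n_sites, rng):
    """Draw a root sequence and evolve it down every branch to the leaves."""
    seqs = {0: [rng.choice("ACGT") for _ in range(n_sites)]}
    stack = [0]
    while stack:
        node = stack.pop()
        for child in children[node]:
            seqs[child] = evolve_jc(seqs[node], length[child], rng)
            stack.append(child)
    # Keep only the leaf sequences: these form the alignment.
    return {n: "".join(s) for n, s in seqs.items() if not children[n]}


def simulate_model(seed, n_taxa=16, n_sites=200):
    rng = random.Random(seed)
    birth_rate = rng.lognormvariate(3.0, 1.0)  # assumed log-normal parameters
    children, length, leaves = simulate_yule(n_taxa, birth_rate, rng)
    alignment = simulate_alignment(children, length, n_sites, rng)
    return birth_rate, children, leaves, alignment


birth_rate, children, leaves, alignment = simulate_model(seed=1)
print(len(alignment), "sequences of length", len(next(iter(alignment.values()))))
```

Each call to simulate_model draws one alignment from the distribution the model defines; changing the seed gives a different draw from the same model.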

Language features

The LPhy language features are described at https://linguaphylo.github.io/features/.

ANTLR parse tree

The parse tree shows how the LPhy script above is parsed by the ANTLR grammar:

Tree generative distributions

More details on the available tree generative distributions can be found here:

Models of evolutionary rates and sequence evolution

You can read more details about the PhyloCTMC generative distribution and how to specify substitution models, site rates and branch rates here:

PhyloCTMC generative distribution

LinguaPhylo Studio

Along with the language definition, we also provide software to specify and visualise models as well as simulate data from models defined in LPhy.

This software will also allow models specified in the LPhy language to be applied to data using standard inference tools such as MrBayes, RevBayes, BEAST1 and BEAST2. This will require software that can convert an LPhy specification into an input file that these inference engines understand. The first such converter is LPhyBEAST, described below.

LPhyBEAST (pronounced el-fee-beast)

LPhyBEAST is a command-line program that takes an LPhy model specification and a data block, and produces a BEAST 2 XML input file. It therefore makes LPhy an alternative way to succinctly express and communicate BEAST2 analyses.

The source can be found here: https://github.com/LinguaPhylo/LPhyBeast

Homepage and tutorials

https://linguaphylo.github.io/

Developer note

License

This software is licensed under the GNU Lesser General Public License v3.0

The toolbar icon art is licensed under the Oracle Software Icon License

Also see https://www.oracle.com/a/tech/docs/software-icon-license-943-2012.html
