Skip to content

Implementation of algorithms in the RNNI space, e.g. FINDPATH

License

Notifications You must be signed in to change notification settings

bioDS/treeOclock

Repository files navigation

Computing nearest neighbour interchange distances between ranked phylogenetic trees - Code Repository

This repository contains the code for computing the Ranked Nearest Neighbour Interchange (RNNI) distance using the algorithm FindPath from the paper Computing nearest neighbour interchange distances between ranked phylogenetic trees by Lena Collienne and Alex Gavryushkin.

An R package for computing the RNNI distance can be found here.

Rooted trees with branch lengths (given in newick format) can be read through a Python interface and converted to ranked trees. Then distances and shortest paths between those ranked trees in the RNNI space can be computed.

Compile

Download repository and compile:

git clone https://github.com/bioDS/treeOclock.git
cd treeOclock
make

Functions executable from Python

Function Return value
Reading trees
read_nexus(filename) Tree_Array containing all trees from nexus file
read_newick(newick_string) Tree given by newick_string
Writing Trees
tree_to_cluster_string(tree) string of cluster representation of input Tree
RNNI
rnni_distance(tree1, tree2) RNNI distance between Trees tree1 and tree2
findpath(tree1, tree2) Tree_Array containing all trees on shortest path from Tree tree1 to tree2 computed by FindPath

Example

from tree_parser.tree_io import *
from tree_functions import *

tree1 = read_newick("((1:1,2:1):2,(3:2,4:2):1);")
tree2 = read_newick("(((3:1,4:1):1,2:2):1,1:3);")

print(rnni_distance(tree1, tree2))

path = findpath(tree1, tree2)
for i in range(0, path.num_trees):
    print(tree_to_cluster_string(path.trees[i]))

Cluster Representation

A cluster of a tree is a set of leaves descending from an internal node. A ranked tree can be represented as a list of all its clusters, ordered according to ranks. When using tree_to_cluster_string, the leaf labels in the returned cluster representation are integers from 1 to n. These integers are assigned as leaf labels according to the lexicographic order of leaf labels in the input newick string. Additionally, ranks of internal nodes are added to each cluster.

For example, (((A:1,B:1):2,(C:2,D:2):1):1,E:4); in cluster representation is [{1,2}:1,{3,4}:2,{1,2,3,4}:3,{1,2,3,4,5}:4].

Data Structures

We use our own data structure for ranked trees. The following is a summary of all structures defined in tree.h and rnni.h.

Struct Members Description
struct Node long parent parent (-1 if node root)
long children[2] two children (-1 if node leaf)
long time time of this node (=rank for ranked trees); leaves have time 0
struct Tree Node* node_array array containing 2 * num_leaves - 1 nodes;
first num_leaves nodes are leaves, last num_leaves - 1 nodes are internal nodes
long num_leaves number of leaves
struct Tree_Array Tree* trees array of trees
long num_trees number of trees
struct Path long** moves encoding RNNI moves in a matrix where each moves[i] is one move;
moves[i][0]: rank of lower node of interval on which move is performed
moves[i][1]: 0 -> rank move, 1 -> NNI move where children[0] moves up, 2-> NNI move where children[1] moves up
long length number of moves

Most important C functions

This is a list of the (probably) most important functions in C code.

Function Return value
tree.c
void print_tree(Tree* tree) prints parent, children, and time for every node in tree.node_array
int same_tree(Tree* tree1, Tree* tree2) returns 1 if tree1 and tree2 are isomorphic
rnni.c
Tree_Array rnni_neighbourhood(Tree* tree) returns Tree_Array containing all RNNI neighbours of tree
void uniform_neighbour(Tree* tree) performs RNNI move on tree, uniformly chosen from all possible moves
long rnni_distance(Tree* start_tree, Tree* dest_tree) returns RNNI distance between start_tree and dest_tree
Path findpath_moves(Tree* start_tree, Tree* dest_tree) returns FindPath path in matrix encoding (Path) -- preserves running time O(n^2) while saving all moves
Tree_Array findpath(Tree* start_tree, Tree* dest_tree) returns Tree_Array of all trees on FindPath path -- running time in O(n^3)
exploring_rnni.c
long random_walk(Tree* tree, long k) Performs k RNNI moves (uniformly chosen among all possible ones in each step) and returns RNNI distance between initial tree and tree after k moves

About

Implementation of algorithms in the RNNI space, e.g. FINDPATH

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published