
Data and analysis scripts to accompany BEAST phylonco


bioDS/beast-phylonco-paper


BEAST phylonco paper

This repository contains the data and analysis scripts that accompany the BEAST phylonco paper below.

Paper: Chen, K., Moravec, J. C., Gavryushkin, A., Welch, D., & Drummond, A. J. (2022). Accounting for errors in data improves divergence time estimates in single-cell cancer evolution. Molecular Biology and Evolution, 39(8), msac143.

BEAST phylonco software: A BEAST2 package for single-cell phylogenetic analysis of cancer evolution.

Software requirements

Java 8 and BEAST v2.5

We provide a bundled jar of BEAST 2.5 with Phylonco as beast-phylonco.jar; see the "Running the analysis" section.

Python 3 and packages:

DendroPy~=4.5.2
lxml~=4.8.0
matplotlib~=3.4.3
numpy~=1.21.2
seaborn~=0.11.2
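To check an existing Python environment against the pins above, something like the following sketch may help. It approximates the `~=` (compatible release) operator for plain X.Y.Z versions only: same major and minor version, patch at least as high as the pin. The helper names `is_compatible` and `check_environment` are made up for this example and are not part of the repository.

```python
from importlib import metadata

# Pins copied from the package list above.
PINS = {
    "DendroPy": "4.5.2",
    "lxml": "4.8.0",
    "matplotlib": "3.4.3",
    "numpy": "1.21.2",
    "seaborn": "0.11.2",
}


def is_compatible(installed: str, pinned: str) -> bool:
    """Approximate `~=X.Y.Z`: same X.Y and patch >= Z (plain numeric versions only)."""
    try:
        have = [int(part) for part in installed.split(".")[:3]]
        want = [int(part) for part in pinned.split(".")[:3]]
    except ValueError:
        return False  # pre-release or other non-numeric version strings
    have += [0] * (3 - len(have))  # treat "1.21" as "1.21.0"
    return have[:2] == want[:2] and have[2] >= want[2]


def check_environment(pins=PINS):
    """Return {package: (installed version or None, satisfies pin?)}."""
    report = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        ok = installed is not None and is_compatible(installed, pinned)
        report[name] = (installed, ok)
    return report


for name, (installed, ok) in check_environment().items():
    status = "ok" if ok else "check"
    print(f"{name}: pinned ~={PINS[name]}, installed {installed} [{status}]")
```

A "check" status only means the installed version falls outside the pinned range (or the package is missing); the scripts may still work with other versions.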

R language, tracerR and packages:

ape
expm
ggtree
ggplot2
tools
treeio
TreeSimGM

(Optional) Simulating new GT16 datasets additionally requires Java 16, LPhy, LPhyBeast, and Phylonco-LPhyBeast.

See the LPhy setup instructions at https://linguaphylo.github.io/setup/.

Datasets

Simulated datasets:

Simulated datasets and parameters are in the directories sim1/data to sim7/data.

True simulation parameters are stored in the files *_true.csv or *_true.log and true trees are stored in the files *._true.trees.

BEAST analysis XML files for each dataset are in sim1/data/*.xml to sim7/data/*.xml.

Real datasets:

Real datasets are available in FASTA format (with GT16 encoding) in E15/data and L86/data.

BEAST analysis XML files are in E15/data/*.xml and L86/data/*.xml.
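For a quick sanity check of a FASTA alignment before running an analysis, a minimal sketch such as the one below could be used. It treats the GT16 symbols as opaque characters and makes no assumption about which genotype each symbol encodes. The function names and the toy input are invented for illustration; they are not files or code from this repository.

```python
from collections import Counter


def read_fasta(lines):
    """Parse FASTA-formatted lines into {taxon name: sequence string}."""
    records = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line)  # sequences may span several lines
    return {taxon: "".join(chunks) for taxon, chunks in records.items()}


def summarize(records):
    """Report taxon count, distinct sequence lengths, and symbol frequencies."""
    lengths = sorted({len(seq) for seq in records.values()})
    symbols = Counter("".join(records.values()))
    return {"taxa": len(records), "site_lengths": lengths, "symbols": dict(symbols)}


# Toy input only; real alignments would be read with open(path) instead.
toy = [">cell_1", "0123", "45", ">cell_2", "012345"]
print(summarize(read_fasta(toy)))
```

An alignment is usable only if `site_lengths` contains a single value, i.e. all sequences have the same number of sites.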

Simulating new datasets

Binary datasets:

Go to the sim1/scripts sub-directory

Run simulate_binary.sh

Run python3 binary_xml_transformer.py

GT16 datasets:

Setup instructions: https://github.com/bioDS/beast-phylonco/releases/tag/v0.0.6 and https://linguaphylo.github.io/setup/

Run LPhyBeast with arguments -l <chain length> -r <num repeats> <path to lphy script>

  • chain length: length of the MCMC chain

  • num repeats: number of experimental repeats, e.g. -r 10 for 10 repeats

  • path to lphy script: lphy scripts are in sim3/scripts/*.lphy and sim7/scripts/*.lphy

Example command:

$BEAST_DIR/bin/lphybeast -l 10000000 -r 10 sim7/scripts/gt16_delta_0.lphy
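When several LPhy scripts need to be processed, the command above can be assembled programmatically. The sketch below only builds and prints the command line; it does not launch LPhyBeast. It assumes the `BEAST_DIR` environment variable from the example command, and the helper name `lphybeast_command` is hypothetical.

```python
import os
import shlex
from pathlib import Path


def lphybeast_command(script, chain_length, repeats, beast_dir=None):
    """Build the lphybeast argument list using the -l and -r options described above."""
    if beast_dir is None:
        beast_dir = os.environ.get("BEAST_DIR", "")
    executable = str(Path(beast_dir) / "bin" / "lphybeast")
    return [executable, "-l", str(chain_length), "-r", str(repeats), str(script)]


cmd = lphybeast_command("sim7/scripts/gt16_delta_0.lphy", 10_000_000, 10)
print(shlex.join(cmd))
```

To actually run the command, the list could be passed to `subprocess.run(cmd, check=True)`.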

Running the analysis

Running BEAST2:

We provide a bundled jar version of BEAST2 with Phylonco and related packages. This does not require a separate BEAST2 install.

To run the analysis, use java -jar beast-phylonco.jar <path to xml>.

Substitute <path to xml> with the file path to the Beast XML file.

Example command:

java -jar beast-phylonco.jar sim1/data/binary_yule_n30_L400_0.xml
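To run every XML file in a dataset directory, the single command above can be wrapped in a loop. The following is a sketch under the assumption that beast-phylonco.jar sits in the current working directory; `beast_commands` and `run_all` are illustrative names, and the default is a dry run that only prints the commands.

```python
import subprocess
from pathlib import Path


def beast_commands(data_dir, jar="beast-phylonco.jar"):
    """Return one `java -jar <jar> <xml>` command per XML file, in sorted order."""
    xml_files = sorted(Path(data_dir).glob("*.xml"))
    return [["java", "-jar", str(jar), str(xml)] for xml in xml_files]


def run_all(data_dir, jar="beast-phylonco.jar", dry_run=True):
    """Print each command; execute it too when dry_run is False."""
    for cmd in beast_commands(data_dir, jar):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stops at the first failing analysis


run_all("sim1/data")  # dry run: lists the commands without starting BEAST
```

Calling `run_all("sim1/data", dry_run=False)` would run the analyses one after another, which can take a long time for the larger datasets.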

Post-processing:

BEAST log stats: run mcmc_stats.r in R (edit "mcmc_path" to point to your BEAST logs directory).

BEAST log viewer: logs can be viewed using Tracer.

BEAST tree stats: trees can be summarized using TreeAnnotator, which is bundled with BEAST.

BEAST tree viewer: trees can be viewed using FigTree or any compatible BEAST tree visualization software.
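For a quick look at a log without R or Tracer, BEAST2 trace logs can also be read directly: they are tab-delimited text with `#` comment lines and a header row. The sketch below is not a replacement for mcmc_stats.r (whose exact output is not described here); it only computes a posterior mean after discarding a burn-in fraction. The column names in the toy log are invented for the example.

```python
import csv
import io
import statistics


def read_beast_log(handle):
    """Read a tab-delimited BEAST log into {column name: list of floats}."""
    rows = (line for line in handle if line.strip() and not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    columns = {}
    for row in reader:
        for key, value in row.items():
            if key:  # ignore the empty column produced by trailing tabs
                columns.setdefault(key, []).append(float(value))
    return columns


def posterior_mean(values, burnin=0.1):
    """Mean of the samples remaining after dropping the first `burnin` fraction."""
    start = int(len(values) * burnin)
    return statistics.fmean(values[start:])


# Toy log for illustration; a real log would be opened with open(path).
toy_log = io.StringIO(
    "# toy comment line\n"
    "Sample\tposterior\tdelta\t\n"
    "0\t-10.0\t0.10\t\n"
    "1000\t-8.0\t0.30\t\n"
    "2000\t-6.0\t0.20\t\n"
    "3000\t-4.0\t0.40\t\n"
)
columns = read_beast_log(toy_log)
print(posterior_mean(columns["posterior"], burnin=0.5))
```

The burn-in fraction is a choice left to the analyst; 10% is used as the default here purely as a common convention.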

Visualizing output

Beast logs:

BEAST logs for sim1 to sim6 are available in the sim1/beast to sim6/beast sub-directories on GitHub.

BEAST logs for sim7, E15 and L86 are available on Google Drive: https://drive.google.com/drive/folders/1vQ6xvs3qq4vJtiI7aDjqBP8xPF__VXAH?usp=sharing

Unzip the downloaded BEAST logs archive (.zip) inside the corresponding dataset directory (e.g., E15 or L86).
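The unzip step can be scripted if preferred. This is a minimal sketch using Python's standard `zipfile` module; the archive name in the usage note is a placeholder, since the actual file names on Google Drive are not listed here.

```python
import zipfile
from pathlib import Path


def extract_logs(archive, dataset_dir):
    """Extract a logs archive into the dataset directory and list its members."""
    dataset_dir = Path(dataset_dir)
    dataset_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dataset_dir)
        return sorted(zf.namelist())
```

For example, `extract_logs("E15_logs.zip", "E15")` would unpack a (hypothetically named) archive into the E15 directory.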

Generating figures:

Coverage plots: run python3 plot_coverage.py from the scripts sub-directory.

Tree statistics plots: run python3 plot_tree_stats.py from the scripts sub-directory.

Summary tree plots: run python3 plot_tree_*.py from the scripts sub-directory.

Extra supplementary plots: run python3 plot_*.py from the scripts sub-directory.
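As background for the coverage plots, coverage in simulation studies is usually defined as the fraction of replicates whose credible interval contains the true parameter value. The sketch below computes this with a 95% highest posterior density (HPD) interval. It is an assumption that plot_coverage.py uses the same definition, and the function names here are illustrative rather than taken from the scripts.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Shortest interval containing at least `mass` of the sorted samples."""
    ordered = sorted(samples)
    n = len(ordered)
    k = max(1, math.ceil(mass * n))  # number of samples the interval must span
    best = min(range(n - k + 1), key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[best], ordered[best + k - 1]


def coverage(true_values, posterior_samples, mass=0.95):
    """Fraction of replicates whose HPD interval contains the true value."""
    hits = 0
    for truth, samples in zip(true_values, posterior_samples):
        low, high = hpd_interval(samples, mass)
        hits += low <= truth <= high
    return hits / len(true_values)
```

Here `true_values` would come from the *_true.csv or *_true.log files and `posterior_samples` from the post-burn-in BEAST logs of the matching replicates. For a well-calibrated analysis the coverage of a 95% interval is expected to be close to 0.95.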

Citations

Software and models:

Datasets:
