Skip to content

Commit

Permalink
Version 1.4 release. This release includes major improvements to RNA …
Browse files Browse the repository at this point in the history
…signal processing (via resquiggle command), a major re-write of the Tombo python API, and use of the canonical model as a prior for control sample comparison modified base detection. The RNA updates fix a number of github issues (Thank you for all user reports! fixes #98, fixes #82, fixes #79, fixes #68, fixes #64, fixes #55). The API re-write addresses several github issues (fixes #99, fixes #76; fixes #66, fixes #25). A change in the statistics file format has been resolved (fixes #80). Several other issues resolved as well (fixes #96, fixes #91, fixes #73, fixes #70).
  • Loading branch information
marcus1487 committed Aug 1, 2018
1 parent 32f590b commit 7c768b7
Show file tree
Hide file tree
Showing 52 changed files with 7,450 additions and 5,275 deletions.
210 changes: 19 additions & 191 deletions README.rst
Original file line number Diff line number Diff line change
Expand Up @@ -38,7 +38,9 @@ Basic tombo installation (python 2.7 and 3.4+ support)
Quick Start
===========

Call 5mC and 6mA sites from raw nanopore read files. Then output genome browser `wiggle format file <https://genome.ucsc.edu/goldenpath/help/wiggle.html>`_ for 5mA calls and plot raw signal around most significant 6mA sites.
Re-squiggle raw nanopore read files and call 5mC and 6mA sites.

Then, for 5mA calls, output genome browser `wiggle format file <https://genome.ucsc.edu/goldenpath/help/wiggle.html>`_ and, for 6mA calls, plot raw signal around most significant locations.

::

Expand All @@ -47,22 +49,22 @@ Call 5mC and 6mA sites from raw nanopore read files. Then output genome browser
--fastq-filenames basecalls1.fastq basecalls2.fastq \
--sequencing-summary-filenames seq_summary1.txt seq_summary2.txt \
--processes 4
tombo resquiggle path/to/fast5s/ genome.fasta --processes 4

tombo resquiggle path/to/fast5s/ genome.fasta --processes 4 --num-most-common-errors 5
tombo detect_modifications alternative_model --fast5-basedirs path/to/fast5s/ \
--statistics-file-basename sample.alt_modified_base_detection \
--per-read-statistics-basename sample.alt_modified_base_detection \
--alternate-bases 5mC 6mA --processes 4

# produces "estimated fraction of modified reads" genome browser files
# for 5mC testing
tombo text_output browser_files --statistics-filename sample.alt_modified_base_detection.5mC.tombo.stats \
--file-types dampened_fraction --browser-file-basename sample.alt_modified_base_detection.5mC
# and 6mA testing (along with coverage bedgraphs)
tombo text_output browser_files --statistics-filename sample.alt_modified_base_detection.6mA.tombo.stats \
--fast5-basedirs path/to/fast5s/ --file-types dampened_fraction coverage\
--fast5-basedirs path/to/fast5s/ --file-types dampened_fraction coverage \
--browser-file-basename sample.alt_modified_base_detection.6mA

# plot raw signal at most significant 6mA locations
tombo plot most_significant --fast5-basedirs path/to/fast5s/ \
--statistics-filename sample.alt_modified_base_detection.6mA.tombo.stats \
Expand All @@ -73,19 +75,23 @@ Detect any deviations from expected signal levels for canonical bases to investi

::

tombo resquiggle path/to/fast5s/ genome.fasta --processes 4
tombo resquiggle path/to/fast5s/ genome.fasta --processes 4 --num-most-common-errors 5
tombo detect_modifications de_novo --fast5-basedirs path/to/fast5s/ \
--statistics-file-basename sample.de_novo_modified_base_detection \
--per-read-statistics-basename sample.de_novo_modified_base_detection \
--processes 4

# produces "estimated fraction of modified reads" genome browser files from de novo testing
tombo text_output browser_files --statistics-filename sample.de_novo_modified_base_detection.tombo.stats \
--browser-file-basename sample.de_novo_modified_base_detection --file-types dampened_fraction

..
All of these commands work for RNA data as well, but a transcriptome reference sequence must be provided for spliced transcripts.
===
RNA
===

All Tombo commands work for RNA data as well, but a transcriptome reference sequence must be provided for spliced transcripts.

The reasons for this decision and other tips for processing RNA data within the Tombo framework can be found in the `RNA section <https://nanoporetech.github.io/tombo/rna.html>`_ of the detailed Tombo documentation.

=====================
Further Documentation
Expand All @@ -95,182 +101,6 @@ Run ``tombo -h`` to see all Tombo command groups and run ``tombo [command-group]

Detailed documentation for all Tombo commands and algorithms can be found at https://nanoporetech.github.io/tombo/

==============
Tombo Commands
==============

Re-squiggle (Raw Data to Genome Alignment)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``resquiggle`` algorithm is the central point for the Tombo tookit. For each nanopore read, this command takes basecalled sequence and the raw nanopore signal values. The basecalled sequence is mapped to a genomic or transcriptomic reference. The raw nanopore signal is assigned to the mapped genomic or transcriptomic sequence based on expected signal levels from an included canonical base model. This anchors each raw signal observation from a read to a genomic position. This information is then leveraged to gain information about the potential location of modified nucleotides either within a single read or across a group of reads from a sample of interest.

::

tombo resquiggle path/to/fast5s/ reference.fasta --processes 4

..
- Only R9.4 and R9.5 data is supported at this time (including R9.*.1).
- DNA or RNA sample type is automatically detected from FAST5s (set explicitly with ``--dna`` or ``--rna``).
- FAST5 files need not contain ``Events`` data, but must contain ``Fastq`` slot containing basecalls. See ``preprocess annotate_raw_with_fastqs`` for pre-processing of raw FAST5s with basecalled reads.
- The reference sequence file can be a genome/transcriptome FASTA file or a minimap2 index file.
- The ``resquiggle`` command must be run before testing for modified bases.

Detect Modified Bases
^^^^^^^^^^^^^^^^^^^^^

There are three methods provided with Tombo to identify modified bases.

For more information on these methods see the `Tombo documentation here <https://nanoporetech.github.io/tombo/modified_base_detection.html>`_.

::

# Identify deviations from the canoncial expected signal levels that specifically match the
# expected levels from an alternative base e.g.5mC or 6mA (recommended method)
tombo detect_modifications alternative_model --fast5-basedirs path/to/native/dna/fast5s/ \
--alternate-bases 5mC 6mA --statistics-file-basename sample.alt_testing

# Identify any deviations from the canonical base model
tombo detect_modifications de_novo --fast5-basedirs path/to/native/dna/fast5s/ \
--statistics-file-basename sample.de_novo_testing --processes 4

# comparing to a control sample (e.g. PCR)
tombo detect_modifications sample_compare --fast5-basedirs path/to/native/dna/fast5s/ \
--control-fast5-basedirs path/to/amplified/dna/fast5s/ \
--statistics-file-basename sample.compare_testing

..
Must run ``resquiggle`` on reads before testing for modified bases.

All ``detect_modifications`` commands produce a binary Tombo statistics file. For use in text output or plotting region selection see ``text_output browser_files`` or ``plot most_significant`` Tombo commands.

Specify the ``--per-read-statistics-basename`` option to save per-read statistics for plotting or further processing (acces via the Tombo API).

Text Output
^^^^^^^^^^^

::

# output estimated fraction of reads modified at each genomic base and
# valid coverage (after failed reads, filters and testing threshold are applied) in wiggle format
tombo text_output browser_files --file-types dampened_fraction --statistics-filename sample.alt_testing.5mC.tombo.stats
# output read coverage depth (after failed reads and filters are applied) in bedgraph format
tombo text_output browser_files --file-types coverage --fast5-basedirs path/to/native/dna/fast5s/

..
For more text output commands see the `Tombo text output documentation here <https://nanoporetech.github.io/tombo/text_output.html>`_.

Raw Signal Plotting
^^^^^^^^^^^^^^^^^^^

::

# plot raw signal with standard model overlay at reions with maximal coverage
tombo plot max_coverage --fast5-basedirs path/to/native/rna/fast5s/ --plot-standard-model
# plot raw signal along with signal from a control (PCR) sample at locations with the AWC motif
tombo plot motif_centered --fast5-basedirs path/to/native/rna/fast5s/ \
--motif AWC --genome-fasta genome.fasta --control-fast5-basedirs path/to/amplified/dna/fast5s/
# plot raw signal at genome locations with the most significantly/consistently modified bases
tombo plot most_significant --fast5-basedirs path/to/native/rna/fast5s/ \
--statistics-filename sample.alt_testing.5mC.tombo.stats --plot-alternate-model 5mC
# plot per-read test statistics using the 6mA alternative model testing method
tombo plot per_read --per-read-statistics-filename sample.alt_testing.6mA.tombo.per_read_stats \
--genome-locations chromosome:1000 chromosome:2000:- --genome-fasta genome.fasta

..
For more plotting commands see the `Tombo plotting documentation here <https://nanoporetech.github.io/tombo/plotting.html>`_.

Read Filtering
^^^^^^^^^^^^^^

::

# filter reads to a specific genomic location
tombo filter genome_locations --fast5-basedirs path/to/native/rna/fast5s/ \
--include-regions chr1:0-10000000

# apply a more strigent raw signal matching threshold
tombo filter --fast5-basedirs path/to/native/rna/fast5s/ \
--signal-matching-score 1.0

..
For more read filtering commands see the `Tombo filter documentation here <https://nanoporetech.github.io/tombo/filtering.html>`_.

Hint: Save a set of filters for later use by copying the Tombo index file: ``cp path/to/native/rna/.fast5s.RawGenomeCorrected_000.tombo.index save.native.tombo.index``. To re-set to a set of saved filters after applying further filters simply replace the index file: ``cp save.native.tombo.index path/to/native/rna/.fast5s.RawGenomeCorrected_000.tombo.index``.

====================
Note on Tombo Models
====================

Tombo is currently provided with two canonical models (for DNA and RNA data) and three alternative models (DNA::5mC, DNA::6mA and RNA::5mC).

These models are used by default in the re-squiggle and modified base detection commands. The correct canonical model is automatically selected for DNA or RNA based on the contents of each FAST5 file and processed accordingly.

Additional models will be added in future releases.

=========================
Installation Requirements
=========================

python Requirements (handled by conda or pip):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- numpy
- scipy
- h5py
- cython
- mappy>=2.10
- tqdm

Optional packages (handled by conda, but not pip):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- Plotting Packages (R and rpy2 must be linked during installation)

+ R
+ rpy2
+ ggplot2
+ gridExtra (required for ``plot_motif_with_stats`` and ``plot_kmer`` subcommands)

- On-disk Random Fasta Access

+ pyfaidx

Advanced Installation Instructions
----------------------------------

Minimal tombo installation without optional dependencies (enables re-squiggle, all modified base testing methods and text output)

::

pip install ont-tombo

Install current github version of tombo

::

pip install git+https://github.com/nanoporetech/tombo.git

Download and install github version of tombo

::

git clone https://github.com/nanoporetech/tombo.git
cd tombo
pip install -e .

# to update, run:
git pull
pip install -I --no-deps -e .

========
Citation
========
Expand All @@ -283,15 +113,13 @@ http://biorxiv.org/content/early/2017/04/10/094672
Known Issues
============

- When running the ``detect_modifications`` commands on large genomes, the computational memory usage can become very high. It is currently recommended to processes smaller regions using the ``tombo filter genome_locations`` command (with saved Tombo index hint above). This problem is being addressed and will be resolved in a later release.

- The Tombo conda environment (especially with python 2.7) may have installation issues.

+ Tombo works best in python 3.4+, so many problems can be solved by upgrading python.
+ If installed using conda:

- Ensure the most recent version of conda is installed (``conda update -n root conda``).
- It is recommended to set conda channels as described for `bioconda <https://bioconda.github.io>`_.
- It is recommended to set conda channels as described for `bioconda <https://bioconda.github.io/#set-up-channels>`_.
- Run ``conda update --all``.
+ In python 2.7 there is an issue with the conda scipy.stats package. Down-grading to version 0.17 fixes this issue.
+ In python 2.7 there is an issue with the conda h5py package. Down-grading to version <=2.7.0 fixes this issue.
Binary file modified docs/_images/adaptive_forward_pass.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified docs/_images/adaptive_half_z_scores.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified docs/_images/alt_model_comp.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified docs/_images/begin_forward_pass.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified docs/_images/begin_half_z_scores.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified docs/_images/model_comp.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified docs/_images/per_read_do_novo.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified docs/_images/roc.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified docs/_images/sample_comp.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file modified docs/_images/testing_method_comparison.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
26 changes: 23 additions & 3 deletions docs/conf.py
Original file line number Diff line number Diff line change
Expand Up @@ -23,10 +23,29 @@

# Add any Sphinx extension module names here, as strings. They can be extensions
# coming with Sphinx (named 'sphinx.ext.*') or your custom ones.
extensions = ['sphinx.ext.autodoc', 'sphinx.ext.viewcode', 'sphinx.ext.intersphinx',
'sphinx.ext.mathjax', 'sphinxarg.ext']
extensions = ['sphinx.ext.autodoc', 'sphinx.ext.viewcode',
'sphinx.ext.intersphinx', 'sphinx.ext.mathjax', 'sphinxarg.ext',
'sphinx.ext.napoleon',]
mathjax_path = "https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"

# don't include class inheritence in docs: https://stackoverflow.com/questions/46279030/how-can-i-prevent-sphinx-from-listing-object-as-a-base-class
from sphinx.ext.autodoc import ClassDocumenter, _
add_line = ClassDocumenter.add_line
def add_line_no_bases(self, text, *args, **kwargs):
if text.strip().startswith('Bases: '):
return
add_line(self, text, *args, **kwargs)

add_directive_header = ClassDocumenter.add_directive_header
def add_directive_header_no_bases(self, *args, **kwargs):
self.add_line = add_line_no_bases.__get__(self)
result = add_directive_header(self, *args, **kwargs)
del self.add_line
return result

ClassDocumenter.add_directive_header = add_directive_header_no_bases


# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']

Expand All @@ -45,7 +64,8 @@
copyright = u'2017-18, Oxford Nanopore Technologies'

# Generate API documentation:
if subprocess.call(['sphinx-apidoc', '-o', './', "../{}".format(__pkg_name__)]) != 0:
if subprocess.call(['sphinx-apidoc', '--module-first', '--no-toc',
'-f', '-o', './', "../{}".format(__pkg_name__)]) != 0:
sys.stderr.write('Failed to generate API documentation!\n')

# The version info for the project you're documenting, acts as replacement for
Expand Down
Loading

0 comments on commit 7c768b7

Please sign in to comment.