Modena is a nanopore-based computational method for detecting a wide spectrum of epigenetic and epitranscriptomic modifications.
It uses an unsupervised learning approach: nanopore signals are resampled and then compared with the Kuiper test. Unlike other unsupervised tools, Modena performs classification by 1D clustering of the resulting scores into two groups.
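The minimal pure-Python sketch below illustrates the two ingredients named above. It is not Modena's implementation. The resampling step is left out. `kuiper_statistic` computes the standard two-sample Kuiper statistic V = D+ + D- on two lists of signal values. `split_two_groups` (a hypothetical name) uses an exact 1D two-means split as an assumed stand-in for Modena's clustering, and it labels the higher-scoring group `"pos"`.

```python
from bisect import bisect_right


def kuiper_statistic(x: list[float], y: list[float]) -> float:
    """Two-sample Kuiper statistic V = D+ + D-.

    D+ is the largest amount by which the empirical CDF of x exceeds that
    of y, and D- the largest amount by which it falls below it.
    """
    xs, ys = sorted(x), sorted(y)
    d_plus = d_minus = 0.0
    # Both empirical CDFs only change at observed values, so evaluating them
    # at every pooled observation is enough to find both extremes.
    for v in xs + ys:
        fx = bisect_right(xs, v) / len(xs)
        fy = bisect_right(ys, v) / len(ys)
        d_plus = max(d_plus, fx - fy)
        d_minus = max(d_minus, fy - fx)
    return d_plus + d_minus


def split_two_groups(scores: list[float]) -> list[str]:
    """Label scores "pos"/"neg" using the single cut that minimises the
    within-group sum of squares (an exact 2-means in one dimension).

    Assumption: a stand-in, not necessarily the clustering Modena uses.
    """
    vals = sorted(scores)
    n = len(vals)
    if n < 2:
        return ["neg"] * n
    # Prefix sums give each candidate group's sum of squares in O(1).
    pre, pre_sq = [0.0], [0.0]
    for v in vals:
        pre.append(pre[-1] + v)
        pre_sq.append(pre_sq[-1] + v * v)

    def sse(i: int, j: int) -> float:
        """Sum of squared deviations of vals[i:j] from their mean."""
        s = pre[j] - pre[i]
        return (pre_sq[j] - pre_sq[i]) - s * s / (j - i)

    # `cut` is the size of the low group; try every cut between neighbours.
    best_cut = min(range(1, n), key=lambda cut: sse(0, cut) + sse(cut, n))
    threshold = vals[best_cut]  # smallest score in the high group
    return ["pos" if s >= threshold else "neg" for s in scores]
```

For example, `split_two_groups([0.1, 0.12, 0.11, 0.9, 0.95])` labels the first three scores `"neg"` and the last two `"pos"`. For two groups in one dimension, an exhaustive scan over cut points is exact, so an iterative k-means loop is unnecessary.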
Important: This version of Modena is a v2 beta. For the stable v1 version, see the `v1.0.0` git tag.
To install and use Modena, you need at least Python 3.10 and the Poetry package manager. Then run the following commands:
```console
$ git clone https://github.com/sbidin/modena.git
$ cd modena
$ poetry install
$ poetry run python -m modena --help # See options.
$ poetry --directory path/to/modena/dir/ run python -m modena # Run outside modena dir.
```
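A small pre-flight check can catch missing prerequisites before you run the commands above. `check_prerequisites` is a hypothetical helper and is not part of Modena. It inspects the Python interpreter that runs it, which is not necessarily the interpreter Poetry will select for the project.

```python
from __future__ import annotations

import shutil
import sys


def check_prerequisites() -> list[str]:
    """Return human-readable problems that would block installing Modena."""
    problems = []
    # Modena needs Python 3.10 or newer.
    if sys.version_info < (3, 10):
        found = ".".join(map(str, sys.version_info[:3]))
        problems.append(f"Python 3.10+ is required, found {found}")
    # Poetry must be callable from the shell for the install commands.
    if shutil.which("poetry") is None:
        problems.append("the `poetry` executable was not found on PATH")
    return problems


if __name__ == "__main__":
    for issue in check_prerequisites():
        print("missing prerequisite:", issue)
```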
Modena compares two datasets, and both need to be supplied in `blow5` or `slow5` format, alongside their `f5c resquiggle` output `tsv` files. If your data is in single- or multi-`fast5` format, or in `pod5` format, you can convert it with one of the following tools:
- single-`fast5` to multi-`fast5`: ont_fast5_api
- multi-`fast5` to `blow5`/`slow5`: slow5tools
- `pod5` to `blow5`/`slow5`: blue-crab
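The conversion routes listed above form a short chain. The hypothetical helper below encodes that chain so a script can report which tools a given input format needs. It only names the tools, and each tool's own documentation covers the exact command-line invocations. The helper uses `blow5` as the target purely as an illustrative choice, since Modena also accepts `slow5`.

```python
# Each known input format maps to (next format, tool that performs the step).
# Tool names come from the conversion list; exact commands are not encoded.
NEXT_STEP: dict[str, tuple[str, str]] = {
    "single-fast5": ("multi-fast5", "ont_fast5_api"),
    "multi-fast5": ("blow5", "slow5tools"),
    "pod5": ("blow5", "blue-crab"),
}

READY_FORMATS = {"blow5", "slow5"}  # formats Modena accepts directly


def conversion_plan(fmt: str) -> list[tuple[str, str, str]]:
    """Return the (source, target, tool) steps that bring `fmt` to blow5."""
    plan = []
    while fmt not in READY_FORMATS:
        if fmt not in NEXT_STEP:
            raise ValueError(f"no known conversion route from {fmt!r}")
        target, tool = NEXT_STEP[fmt]
        plan.append((fmt, target, tool))
        fmt = target
    return plan
```

For example, `conversion_plan("single-fast5")` returns two steps: ont_fast5_api to reach multi-`fast5`, then slow5tools to reach `blow5`.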
To resquiggle your data with `f5c`, install f5c and run the `resquiggle` command:

```console
$ f5c resquiggle data.fastq data.blow5 > resquiggled.tsv
```
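Once `resquiggled.tsv` exists, you can inspect it from Python. The sketch below assumes the file is tab-separated with a header row naming the columns `read_id`, `kmer_idx`, `start_raw_idx` and `end_raw_idx`. Those column names are an assumption about f5c's output, so check the header your f5c version actually writes. `load_resquiggle` is a hypothetical helper and is not part of Modena.

```python
import csv
from collections import defaultdict
from collections.abc import Iterable

# Assumed column names of the f5c resquiggle TSV; adjust to your file's header.
READ, KMER, START, END = "read_id", "kmer_idx", "start_raw_idx", "end_raw_idx"


def load_resquiggle(lines: Iterable[str]) -> dict[str, list[tuple[int, int, int]]]:
    """Group rows by read: read_id -> [(kmer_idx, start_raw_idx, end_raw_idx), ...]."""
    reads: dict[str, list[tuple[int, int, int]]] = defaultdict(list)
    for row in csv.DictReader(lines, delimiter="\t"):
        try:
            segment = (int(row[KMER]), int(row[START]), int(row[END]))
        except ValueError:
            # Skip rows whose raw indices are not integers (for example a
            # placeholder for a k-mer with no assigned signal, if present).
            continue
        reads[row[READ]].append(segment)
    return dict(reads)
```

For example, `with open("resquiggled.tsv", newline="") as f: reads = load_resquiggle(f)` returns a per-read list of k-mer segments.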
Both datasets (in this case `a` and `b`) need a `blow5` or `slow5` file and a corresponding `f5c`-resquiggled `tsv` file.
```console
$ poetry run python -m modena -a a.blow5 -ax a.tsv -b b.blow5 -bx b.tsv -o out.tsv
$ poetry run python -m modena --help # See here for more options.
```
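When Modena is one step of a larger Python pipeline, you can assemble and launch the command above with `subprocess`. The helpers below are hypothetical wrappers and are not part of Modena. They use only the flags shown in this README (`-a`, `-ax`, `-b`, `-bx`, `-o`, and Poetry's `--directory`). They also convert every path to an absolute one, so the call does not depend on which working directory Poetry runs it from.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def _absolute(path: str) -> str:
    """Resolve a path so it stays valid whatever directory Poetry uses."""
    return str(Path(path).resolve())


def modena_command(
    a_blow5: str, a_tsv: str, b_blow5: str, b_tsv: str, out_tsv: str,
    modena_dir: str | None = None,
) -> list[str]:
    """Build the `poetry run python -m modena ...` argument list."""
    cmd = ["poetry"]
    if modena_dir is not None:
        # Lets the command run from outside the cloned modena directory.
        cmd += ["--directory", _absolute(modena_dir)]
    cmd += ["run", "python", "-m", "modena",
            "-a", _absolute(a_blow5), "-ax", _absolute(a_tsv),
            "-b", _absolute(b_blow5), "-bx", _absolute(b_tsv),
            "-o", _absolute(out_tsv)]
    return cmd


def run_modena(*paths: str, modena_dir: str | None = None) -> None:
    """Run Modena; raises CalledProcessError if it exits with an error."""
    subprocess.run(modena_command(*paths, modena_dir=modena_dir), check=True)
```

For example: `run_modena("a.blow5", "a.tsv", "b.blow5", "b.tsv", "out.tsv", modena_dir="path/to/modena/dir/")`.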
Modena outputs a simple `tsv` file with four columns:

- position, `int`, 1-based
- coverage, `int`, a count of all reads that contributed to the signal
- distance, `float`, a two-sample Kuiper-test-based measure (a distance sum)
- label, `str`, `"pos"` or `"neg"`, separating positions into two clusters