POLISHER UPDATE: P.E.P.P.E.R.

We have released a new polisher, PEPPER, that replaces MarginPolish-HELEN. If your data was basecalled with Guppy >= 3.0.5, please use PEPPER instead of MarginPolish-HELEN. PEPPER is fully supported by our team.

H.E.L.E.N.

H.E.L.E.N. (Homopolymer Encoded Long-read Error-corrector for Nanopore)

HELEN is published in Nature Biotechnology:

Nanopore sequencing and the Shasta toolkit enable efficient de novo assembly of eleven human genomes

Overview

HELEN uses a Recurrent-Neural-Network (RNN) based Multi-Task Learning (MTL) model that can predict a base and a run-length for each genomic position using the weights generated by MarginPolish.
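
To make the "base and run-length" representation concrete, the sketch below run-length encodes a short sequence in plain bash. It only illustrates the encoding idea and is not HELEN's code; the function name rle_encode is made up for this example.

```shell
#!/usr/bin/env bash
# Illustrative only: collapse each homopolymer run into "<base><run-length>".
# rle_encode is a hypothetical helper, not part of HELEN or MarginPolish.
rle_encode() {
    local seq=$1 prev="" count=0 out="" base i
    for ((i = 0; i < ${#seq}; i++)); do
        base=${seq:i:1}
        if [[ $base == "$prev" ]]; then
            count=$((count + 1))      # same base: extend the current run
        else
            # new base: flush the finished run (if any), then start a new run
            if [[ -n $prev ]]; then out+="${prev}${count} "; fi
            prev=$base
            count=1
        fi
    done
    if [[ -n $prev ]]; then out+="${prev}${count}"; fi
    printf '%s\n' "$out"
}

rle_encode "AAACCGTTTT"   # prints: A3 C2 G1 T4
```

Each pair in the output corresponds to one position in the run-length-compressed sequence: a base plus the length of its homopolymer run.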

© 2020 Kishwar Shafin, Trevor Pesout, Benedict Paten.
Computational Genomics Lab (CGL), University of California, Santa Cruz.

Why MarginPolish-HELEN?

MarginPolish-HELEN outperforms other graph-based and Neural-Network based polishing pipelines.
Simple installation steps.
HELEN can use multiple GPUs at the same time.
Highly optimized pipeline that is faster than any other available polishing tool.
We have sequenced, assembled, and polished 11 samples to ensure robustness, runtime consistency, and cost efficiency.
We tested GPU usage on Amazon Web Services (AWS) and Google Cloud Platform (GCP) to ensure scalability.
Open source (MIT License).

Walkthrough

Installation

MarginPolish-HELEN is supported on Ubuntu 16.10/18.04 or any other Linux-based system.

Install prerequisites

Before you follow any of the methods, make sure you install all the dependencies:

sudo apt-get -y install git cmake make gcc g++ autoconf bzip2 lzma-dev zlib1g-dev \
libcurl4-openssl-dev libpthread-stubs0-dev libbz2-dev liblzma-dev libhdf5-dev \
python3-pip python3-virtualenv virtualenv

Method 1: Install MarginPolish-HELEN from GitHub

You can install from the GitHub repository:

git clone https://github.com/kishwarshafin/helen.git
cd helen
make install
. ./venv/bin/activate

helen --help
marginpolish --help

Each time you want to use it, activate the virtualenv:

. <path/to/helen/venv/bin/activate>

Method 2: Install using PyPI

Install the prerequisites and then install MarginPolish-HELEN using pip:

python3 -m pip install helen --user

python3 -m helen.helen --help
python3 -m helen.marginpolish --help

Update the installed version:

python3 -m pip install --upgrade pip
python3 -m pip install helen --upgrade

You can also add the location of the installed scripts to your PATH:

echo 'export PATH="$(python3 -m site --user-base)/bin":$PATH' >> ~/.bashrc
source ~/.bashrc

marginpolish --help
helen --help
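
Running the echo command above more than once appends the same export line to ~/.bashrc each time. A minimal sketch of an idempotent variant follows; append_once is a hypothetical helper name, not something shipped with HELEN.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a line to a file only if that exact line
# is not already present (grep -x matches whole lines, -F matches literally).
append_once() {
    local line=$1 file=$2
    touch "$file"
    if ! grep -qxF -- "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}
```

It could be called as append_once 'export PATH="$(python3 -m site --user-base)/bin":$PATH' ~/.bashrc, after which repeated calls leave the file unchanged.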

Method 3: Use docker image

CPU based docker:

# SEE CONFIGURATION
docker run --rm -it --ipc=host kishwars/helen:latest helen --help
docker run --rm -it --ipc=host kishwars/helen:latest marginpolish --help

docker run -it --ipc=host --user=`id -u`:`id -g` --cpus="16" \
-v </directory/with/inputs_outputs>:/data kishwars/helen:latest \
helen --help

GPU based docker:

sudo apt-get install -y nvidia-docker2
# SEE CONFIGURATION
nvidia-docker run -it --ipc=host kishwars/helen:latest helen torch_stat
nvidia-docker run -it --ipc=host kishwars/helen:latest helen --help
nvidia-docker run -it --ipc=host kishwars/helen:latest marginpolish --help

# RUN HELEN
nvidia-docker run -it --ipc=host --user=`id -u`:`id -g` --cpus="16" \
-v </directory/with/inputs_outputs>:/data kishwars/helen:latest \
helen --help
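
Because the host directory is mounted at /data, any file you pass to helen or marginpolish inside the container has to be addressed by its path under /data rather than by its host path. The sketch below shows that translation; container_path is a hypothetical helper written for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: translate a host path into the path seen inside the
# container, assuming the host directory was mounted with -v <host_dir>:/data.
container_path() {
    local host_dir=${1%/} file=$2     # strip one trailing slash from the mount source
    case $file in
        "$host_dir"/*)
            printf '/data/%s\n' "${file#"$host_dir"/}"
            ;;
        *)
            echo "not under $host_dir: $file" >&2
            return 1
            ;;
    esac
}

container_path /home/user/run /home/user/run/images/chunk_1.h5
# prints: /data/images/chunk_1.h5
```

Files outside the mounted directory are not visible inside the container, which is why the helper returns an error for them.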

Usage

MarginPolish requires a draft assembly and a mapping of reads to the draft assembly. We recommend using Shasta as the initial assembler and MiniMap2 for the mapping.

Step 1: Generate an initial assembly

Generate a draft assembly using an ONT assembler such as Shasta.

Step 2: Create an alignment between reads and shasta assembly

We recommend using MiniMap2 to generate the mapping between the reads and the assembly. You don't have to follow these exact commands.

minimap2 -ax map-ont -t 32 shasta_assembly.fa reads.fq | samtools view -hb -F 0x904 > unsorted.bam;
samtools sort -@32 -o reads_2_assembly.0x904.bam unsorted.bam;
samtools index -@32 reads_2_assembly.0x904.bam
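
The -F 0x904 filter in the samtools view command above drops any read whose FLAG has at least one of three bits set: unmapped (0x4), secondary alignment (0x100), or supplementary alignment (0x800). Here is a small sketch of that arithmetic in bash; the helper name is_filtered is made up for illustration.

```shell
#!/usr/bin/env bash
# SAM FLAG bits, as defined in the SAM specification.
UNMAPPED=$((0x4))
SECONDARY=$((0x100))
SUPPLEMENTARY=$((0x800))

# The combined mask passed to `samtools view -F`.
FILTER=$((UNMAPPED | SECONDARY | SUPPLEMENTARY))
printf 'filter mask: 0x%X\n' "$FILTER"    # prints: filter mask: 0x904

# Hypothetical helper: succeeds if a read with this FLAG would be dropped.
is_filtered() {
    (( ($1 & FILTER) != 0 ))
}
```

For example, a primary alignment on the reverse strand (FLAG 16) is kept, while a supplementary alignment on the reverse strand (FLAG 2064) is removed.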

Step 3: Generate images using MarginPolish

Download Model

helen download_models \
--output_dir <path/to/mp_helen_models/>

Run MarginPolish

You can generate images using MarginPolish by running:

marginpolish reads_2_assembly.0x904.bam \
shasta_assembly.fa \
</path/to/model_name.json> \
-t <number_of_threads> \
-o <path/to/marginpolish_images> \
-f

The model files are obtained with the helen download_models command shown above.
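
MarginPolish runs can be long, so it may be worth confirming that every input exists before starting. The following is a minimal pre-flight sketch; require_files is a hypothetical helper, not a MarginPolish feature.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: every listed file must exist and be non-empty.
# Reports all missing files before returning a non-zero status.
require_files() {
    local f status=0
    for f in "$@"; do
        if [[ ! -s $f ]]; then
            echo "missing or empty: $f" >&2
            status=1
        fi
    done
    return "$status"
}
```

A typical call would list the BAM file, its .bai index (written by samtools index), the assembly FASTA, and the model JSON, and only run marginpolish if the check succeeds.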

Step 4: Run HELEN

Next, run HELEN to polish the assembly using an RNN.

helen polish \
--image_dir </path/to/marginpolish_images/> \
--model_path </path/to/model.pkl> \
--batch_size 256 \
--num_workers 4 \
--threads <num_of_threads> \
--output_dir </path/to/output_dir> \
--output_prefix <output_filename.fa> \
--gpu_mode

If you are using CPUs then remove the --gpu_mode argument.
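
If the same script has to run on both GPU and CPU machines, the --gpu_mode flag can be added conditionally. The sketch below builds the argument list from the options shown above; helen_polish_args is a hypothetical helper, and the batch size and worker count are simply the values from the example command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the `helen polish` arguments, one per line,
# appending --gpu_mode only when the first argument is "gpu".
helen_polish_args() {
    local mode=$1 image_dir=$2 model=$3 out_dir=$4 prefix=$5 threads=$6
    local args=(polish
        --image_dir "$image_dir"
        --model_path "$model"
        --batch_size 256
        --num_workers 4
        --threads "$threads"
        --output_dir "$out_dir"
        --output_prefix "$prefix")
    if [[ $mode == gpu ]]; then
        args+=(--gpu_mode)
    fi
    printf '%s\n' "${args[@]}"
}
```

With bash 4 or later, the output can be read back into an array with mapfile -t args < <(helen_polish_args ...) and passed on as helen "${args[@]}". One way to choose the mode is to test whether nvidia-smi is on the PATH; that is an assumption about your setup, not something HELEN requires.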

Help

Please open a GitHub issue if you face any difficulties.

Acknowledgement

We are thankful to Sergey Koren and Karen Miga for their help with CHM13 data and evaluation.

We downloaded our data from the Telomere-to-Telomere (T2T) consortium to evaluate our pipeline against CHM13.

We acknowledge the work of the developers of these packages:

Fun Fact

The name "HELEN" is inspired by the A.I. created by Tony Stark in the Marvel Comics (Earth-616). HELEN was created to control the city Tony was building, named "Troy", making the A.I. "HELEN of Troy".
