Please note:
Java version of HMMRATAC is no longer maintained or supported. Please use hmmratac
subcommand in macs3
: https://github.com/macs3-project/MACS
Quick Start
Assume that you have a BAM file from aligner such as bwa mem
named ATACseq.bam
.
-
Sort the BAM file to get a
ATACseq.sorted.bam
file:samtools sort ATACseq.bam -o ATACseq.sorted.bam
-
Make index from the BAM file to get a
ATACseq.sorted.bam.bai
file:samtools index ATACseq.sorted.bam ATACseq.sorted.bam.bai
-
Make genome information (chromosome sizes) from the BAM file to get a
genome.info
file:samtools view -H ATACseq.sorted.bam| perl -ne 'if(/^@SQ.*?SN:(\w+)\s+LN:(\d+)/){print $1,"\t",$2,"\n"}' > genome.info
-
Run HMMRATAC on the sorted BAM
ATACseq.sorted.bam
, the BAM index fileATACseq.sorted.bam.bai
, and the genome information filegenome.info
:java -jar HMMRATAC_V1.2.4_exe.jar -b ATACseq.sorted.bam -i ATACseq.sorted.bam.bai -g genome.info
-
Filter HMMRATAC output by the score, if desired. Score threshold will depend on dataset, score type and user preference. A threshold of 10 would be:
awk -v OFS="\t" '$13>=10 {print}' NAME_peaks.gappedPeak > NAME.filteredPeaks.gappedPeak
To filter the summit file by the same threshold:
awk -v OFS="\t" '$5>=10 {print}' NAME_summits.bed > NAME.filteredSummits.bed
NOTE: HMMRATAC will report all peaks that match the structure defined by the model, including weak peaks. Filtering by score can be used to retain stronger peaks. Lower score = higher sensitivity and lower precision, Higher score = lower sensitivity and higher precision.
Samtools can be downloaded here: http://www.htslib.org/download/
Be sure to run HMMRATAC using the executable file, found here: https://github.com/LiuLabUB/HMMRATAC/releases For details on HOW to run HMMRATAC, see HMMRATAC_Guide.md, which contains a thorough runthrough of all parameters, output files and input requirements and troubleshooting.
HMMRATAC requires paired-end data. Single-end data will not work. HMMRATAC is designed to process ATAC-seq data that hasn't undergone
any size selection, either physical or in silico. This should be standard practice for any ATAC-seq analysis. Size selected data can be
processed by HMMRATAC (see HMMRATAC_Guide.md on --trim
option).
If you use HMMRATAC in your research, please cite the following paper:
- Evan D. Tarbell and Tao Liu, HMMRATAC: a Hidden Markov ModeleR for ATAC-seq, Nucleic Acids Res. 2019 Jun 14. doi: 10.1093/nar/gkz533