Interactive Visualization Browser for Transposable Elements (TEs) community
Current Version: rb_v5.0
Last update: 2023.03.11
Maintained by Wang Lab at Washington University School of Medicine
For any question, please contact Dr. Daofeng Li dli23@wustl.edu 👈
For easy and reproducible usage, we have packaged the data processing pipeline into Docker and Singularity images. We highly recommend the Singularity image for non-root users.
The Singularity image must be run with Singularity version 3 or later. If you have not installed Singularity 3 yet, you can follow the installation instructions:
Please click here
(You will need sudo permission to install and configure it properly, but you can run it without sudo after installation.)
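To confirm that the installed Singularity meets the version 3+ requirement, a small check along the following lines may help. This is only a sketch: the helper names `singularity_major` and `is_singularity3_plus` are hypothetical, and it assumes the version string ends with the version number, as in "singularity version 3.8.0". Other builds may print a different format.

```shell
#!/bin/sh
# Extract the major version from a `singularity --version` style string.
# Assumed formats: "singularity version 3.8.0", "singularity-ce version 3.9.2".
singularity_major() {
    # Take the last whitespace-separated field, drop a leading "v",
    # then keep everything before the first dot.
    printf '%s\n' "$1" | awk '{print $NF}' | sed 's/^v//' | cut -d. -f1
}

# Succeed only when the reported major version is 3 or newer.
is_singularity3_plus() {
    major=$(singularity_major "$1")
    case "$major" in
        ''|*[!0-9]*) return 1 ;;   # not a plain number: treat as unsupported
    esac
    [ "$major" -ge 3 ]
}

if command -v singularity >/dev/null 2>&1; then
    if is_singularity3_plus "$(singularity --version)"; then
        echo "Singularity 3+ found"
    else
        echo "Singularity is older than version 3 (or the version was not recognized)"
    fi
else
    echo "singularity not found in PATH"
fi
```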
Step1 Download the Singularity image and reference files. You only need to download them ONCE and can then reuse them. When the pipeline is updated you may need to download a new image, but the reference files usually do NOT change:
- Download the singularity image:
wget https://wangftp.wustl.edu/~scheng/repeat_browser/rb_v5.0.simg
- Download the reference files for your genome (hg38 shown here):
wget https://wangftp.wustl.edu/~scheng/repeat_browser/Genome/hg38.tar.gz
More genome builds are available: click here. Currently we have mm9/10/39, hg19/38, danRer10/11, rn6 and dm6.
- Decompress the reference files and place them in a folder of your choice:
tar -zxvf hg38.tar.gz
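Since the image and reference files only need to be downloaded once, the three commands above can be wrapped so that files that already exist are skipped. The sketch below is one possible way to do that. The helper `fetch_once` and the `RUN_DOWNLOAD` switch are hypothetical names. It also assumes that the tarball for each genome build is named `<build>.tar.gz` under the same `Genome/` URL, and that it unpacks into a folder named after the build; neither is confirmed here for builds other than hg38.

```shell
#!/bin/sh
# Download a file only if a non-empty copy is not already present
# in the current directory (hypothetical helper).
fetch_once() {
    url=$1
    file=${url##*/}            # file name = last path component of the URL
    if [ -s "$file" ]; then
        echo "skip $file (already downloaded)"
    else
        wget "$url"
    fi
}

BASE="https://wangftp.wustl.edu/~scheng/repeat_browser"
GENOME=hg38                    # assumption: other builds follow the same naming

# Nothing is downloaded unless RUN_DOWNLOAD=1 is set in the environment.
if [ "${RUN_DOWNLOAD:-0}" = "1" ]; then
    fetch_once "$BASE/rb_v5.0.simg"
    fetch_once "$BASE/Genome/$GENOME.tar.gz"
    # Assumption: the archive unpacks into a folder named after the build.
    [ -d "$GENOME" ] || tar -zxvf "$GENOME.tar.gz"
fi
```

Saved under a name of your choice (for example `fetch_rb.sh`, a made-up name), it could be run as `RUN_DOWNLOAD=1 sh fetch_rb.sh` from the folder where the files should live.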
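Step2 Process data with the Singularity image:
Run the command from the directory that contains your data. For example, if your data is in /home/example, run
cd /home/example
first. The location of the image and reference files is up to you.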
singularity run -B ./:/home -B <path-to-parent-folder-of-ref-file>:/zarr_generation/Genome <path-to-downloaded-image> \
-d <fastq/BAM> -g <hg38/mm10 etc. > -r <PE/SE> \
--length <read length of fastq file> \
--assay <DNA-seq/CAGE-seq/Chip-seq> \
-o <experimental_read_file1/BAM_file> -O <experimental read file2> \
-i <IgG_control read file1/BAM file> -I <IgG_control read file2> \
--local (use this option when you want to generate the .zarr file locally) \
--s3_path <s3 path> (use this option when you want to upload the .zarr file to Amazon S3 bucket)
For example, if
a) the image was downloaded to /home/image/rb_v5.0.simg
b) the reference files are in /home/src/hg38
c) your data is paired-end Chip-seq FASTQ data with a read length of 50 bp
d) the experiment data is read1.fastq.gz and read2.fastq.gz in the folder /home/data
e) the input control data is igg_1.fastq.gz and igg_2.fastq.gz in the folder /home/data
f) you want to generate the .zarr file locally
Then you need to run:
cd /home/data
singularity run -B ./:/home -B /home/src:/zarr_generation/Genome /home/image/rb_v5.0.simg -d fastq -g hg38 -r PE --length 50 --assay Chip-seq -o read1.fastq.gz -O read2.fastq.gz -i igg_1.fastq.gz -I igg_2.fastq.gz --local
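A common stumbling block is that the second `-B` option binds the parent folder of the reference files, not the genome folder itself. The sketch below derives both the bind path and the genome name from the unpacked reference folder and prints the resulting command for review instead of running it. It assumes the folder name matches the genome build (as with /home/src/hg38 in the example); the variable names are illustrative only.

```shell
#!/bin/sh
# Values taken from the example above; adjust them to your own layout.
IMAGE=/home/image/rb_v5.0.simg
REF_DIR=/home/src/hg38               # unpacked reference folder

GENOME=$(basename "$REF_DIR")        # assumption: folder name == genome build
REF_PARENT=$(dirname "$REF_DIR")     # the folder that is bound into the image

# Assemble the command as one line; it is printed, not executed.
rb_cmd="singularity run -B ./:/home -B ${REF_PARENT}:/zarr_generation/Genome ${IMAGE} \
-d fastq -g ${GENOME} -r PE --length 50 --assay Chip-seq \
-o read1.fastq.gz -O read2.fastq.gz -i igg_1.fastq.gz -I igg_2.fastq.gz --local"

echo "$rb_cmd"
```

After checking the printed line, it can be pasted into a shell opened in the data folder (/home/data in the example).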
1. For CAGE-seq data, ONLY FASTQ files can be used as input to generate the .zarr file. For Chip-seq and DNA-seq (ATAC-seq, DNase-seq, etc.) data, both FASTQ and BAM files are supported as input.
2. For Chip-seq data, the experiment files and the input control files must be in the same format (either all FASTQ files or all BAM files).
3. Please note that you can choose only one of --s3_path and --local, and a local backup .zarr file is kept even if you select the --s3_path option.
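Notes 1 and 3 can be checked before launching a long run. The following is a minimal sketch with two hypothetical helper functions (`check_output_mode`, `check_assay_format`); it is not part of the pipeline and only mirrors the rules stated in the notes.

```shell
#!/bin/sh
# Succeed only if exactly one of --local / --s3_path appears in the arguments.
check_output_mode() {
    n=0
    for arg in "$@"; do
        case "$arg" in
            --local|--s3_path) n=$((n + 1)) ;;
        esac
    done
    [ "$n" -eq 1 ]
}

# Succeed unless CAGE-seq is combined with a non-FASTQ input format.
# $1 = value given to --assay, $2 = value given to -d
check_assay_format() {
    assay=$1
    format=$2
    if [ "$assay" = "CAGE-seq" ] && [ "$format" != "fastq" ]; then
        return 1
    fi
    return 0
}
```

For instance, `check_output_mode -d fastq -g hg38 --local && echo ok` prints `ok`, while passing both `--local` and `--s3_path` makes the check fail.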
-h : show help information
-d : input file format, fastq for FASTQ files or BAM for BAM files
-g : reference genome. Currently supported genomes: <mm39/mm10/mm9/hg38/hg19/danRer11/danRer10/rn6/dm6>
-r : SE for single-end, PE for paired-end
--length : read length of the raw FASTQ file, or mapped read length of the BAM file
--assay : DNA-seq for DNA-seq (ATAC-seq, DNase-seq etc.), CAGE-seq, or Chip-seq
-o : experiment fastq/BAM file 1, or the SE fastq/BAM file; the name must end with .fastq, .fastq.gz or .bam
-O : experiment fastq/BAM file 2 for PE data; the name must end with .fastq, .fastq.gz or .bam
-i : input control fastq/BAM file 1, or the SE fastq/BAM file (only used when the assay type is Chip-seq); the name must end with .fastq, .fastq.gz or .bam
-I : input control fastq/BAM file 2 (only used when the assay type is Chip-seq)
--local : create the .zarr file locally (please note that you can choose only one of --s3_path and --local)
--s3_path : S3 path of the AWS S3 bucket
--id : (optional) ID information (default: unknown)
--biosample : (optional) Biosample information, same as on the ENCODE website (default: unknown)
--tissue : (optional) Tissue information, same as on the ENCODE website (default: unknown)
--experiment : (optional) Experiment ID information (default: unknown)
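Because the read files must end with .fastq, .fastq.gz or .bam, and Chip-seq experiment and control files must share one format, a quick check of the file names can catch mistakes early. The sketch below uses two hypothetical helpers, `file_format` and `same_format`; it inspects file names only and does not look inside the files.

```shell
#!/bin/sh
# Print the value expected by -d (fastq or BAM) for a given file name,
# or fail if the suffix is not one of the accepted ones.
file_format() {
    case "$1" in
        *.fastq|*.fastq.gz) echo fastq ;;
        *.bam)              echo BAM ;;
        *)                  return 1 ;;
    esac
}

# Succeed only if every file name given maps to the same format.
same_format() {
    first=$(file_format "$1") || return 1
    shift
    for f in "$@"; do
        [ "$(file_format "$f")" = "$first" ] || return 1
    done
}
```

Usage could look like `same_format read1.fastq.gz read2.fastq.gz igg_1.fastq.gz igg_2.fastq.gz || echo "mixed or unsupported input formats"`.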
Step3 Upload the generated .zarr file to WashU Repeat Browser [sample image here]
One single-end hg38 Chip-seq dataset is provided for testing. It can be downloaded with:
wget https://wangftp.wustl.edu/~scheng/repeat_browser/sample_data/chip-seq/hg38_chipseq_signal_SE_50.fastq.gz
wget https://wangftp.wustl.edu/~scheng/repeat_browser/sample_data/chip-seq/hg38_chipseq_input_SE_50.fastq.gz
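A command for processing these two test files might look like the sketch below, which prints the command rather than running it. Several values are assumptions: the image and reference locations reuse the layout of the earlier example, and the read length of 50 is inferred from the "_50" in the file names rather than stated anywhere. Since the data is single-end, only -o and -i are used.

```shell
#!/bin/sh
# Assumed locations; change them to match your system.
IMAGE=/home/image/rb_v5.0.simg
REF_PARENT=/home/src                 # folder that contains hg38/

# --length 50 is an assumption based on the file names.
sample_cmd="singularity run -B ./:/home -B ${REF_PARENT}:/zarr_generation/Genome ${IMAGE} \
-d fastq -g hg38 -r SE --length 50 --assay Chip-seq \
-o hg38_chipseq_signal_SE_50.fastq.gz \
-i hg38_chipseq_input_SE_50.fastq.gz --local"

# Printed for review; run it from the folder holding the two FASTQ files.
echo "$sample_cmd"
```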
We also provide a compact Docker image as an alternative, which can be pulled with Docker:
docker pull sycheng99/repeatbrowser:5.0