Interactive Visualization Browser for Transposable Elements (TEs) community

jamesc99/WashU_RepeatBrowser

WashU Repetitive Element Browser


Current Version: rb_v5.0

Last update: 2023.03.11

Maintained by Wang Lab at Washington University School of Medicine

For any questions, please contact Dr. Daofeng Li: dli23@wustl.edu 👈


Data Processing Pipeline:

For easy and reproducible usage, we have packaged the data processing pipeline into Docker and Singularity images. We highly recommend the Singularity image for non-root users.

Singularity3 Installation

The Singularity image must be run with Singularity version 3 or later. If you haven't installed Singularity 3 yet, you can follow these instructions:
Please click here
(You will need sudo permission to properly install and configure it, but you can run it without sudo after installation 😃)
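
To confirm that an existing installation meets the version 3+ requirement, a small check like the one below may help. It is only a sketch: the function names are hypothetical, and it assumes that `singularity --version` prints text of the form `singularity version 3.x.y`, as Singularity 3 releases do.

```shell
# Extract the major version number from the text printed by `singularity --version`.
# ASSUMPTION: the text contains "version <major>.<minor>...", e.g. "singularity version 3.8.0".
rb_major_version() {
    printf '%s\n' "$1" | sed -n 's/.*version \([0-9][0-9]*\)\..*/\1/p'
}

# Report whether the installed Singularity is version 3 or later.
rb_check_singularity() {
    if ! command -v singularity >/dev/null 2>&1; then
        echo "singularity not found on PATH"
        return 1
    fi
    major=$(rb_major_version "$(singularity --version)")
    if [ -n "$major" ] && [ "$major" -ge 3 ]; then
        echo "Singularity major version $major: OK"
    else
        echo "Singularity 3+ is required"
        return 1
    fi
}

# Usage (not run here): rb_check_singularity
```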

Run Singularity image

Step1 Download the Singularity image and reference files. You only need to download them ONCE and can then reuse them. If there is an update you may need to download a new image, but the reference files usually do NOT change:

  1. Download the Singularity image:
wget https://wangftp.wustl.edu/~scheng/repeat_browser/rb_v5.0.simg
  2. Download the reference files for your genome:
wget https://wangftp.wustl.edu/~scheng/repeat_browser/Genome/hg38.tar.gz

More genome builds are also available: click here. Currently we provide mm9, mm10, mm39, hg19, hg38, danRer10, danRer11, rn6 and dm6.

  3. Decompress the reference files and put them in your own folder:
tar -zxvf hg38.tar.gz
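
The download steps above can be wrapped into a reusable helper, sketched below. The function names and directory arguments are hypothetical. It also assumes that every genome build follows the same `<genome>.tar.gz` naming as the hg38 archive and unpacks into a folder named after the genome, which this README only shows for hg38.

```shell
RB_BASE_URL="https://wangftp.wustl.edu/~scheng/repeat_browser"

# URL of the reference archive for a genome build.
# ASSUMPTION: every build follows the hg38 naming pattern <genome>.tar.gz.
rb_ref_url() {
    echo "$RB_BASE_URL/Genome/$1.tar.gz"
}

# rb_fetch <genome> <image_dir> <ref_dir>
# Download the image and one reference archive, skipping anything already present.
rb_fetch() {
    genome=$1
    image_dir=$2
    ref_dir=$3
    mkdir -p "$image_dir" "$ref_dir"
    if [ ! -f "$image_dir/rb_v5.0.simg" ]; then
        wget -P "$image_dir" "$RB_BASE_URL/rb_v5.0.simg"
    fi
    # ASSUMPTION: the archive unpacks into a folder named after the genome.
    if [ ! -d "$ref_dir/$genome" ]; then
        wget -P "$ref_dir" "$(rb_ref_url "$genome")"
        tar -zxvf "$ref_dir/$genome.tar.gz" -C "$ref_dir"
    fi
}

# Usage (not run here): rb_fetch hg38 /home/image /home/src
```

Because existing files are skipped, the helper can be re-run safely, which matches the advice that the image and references only need to be downloaded once.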

Step2 Process data by the singularity image:

‼️Please run the command from the directory that contains your data. For example, if your data is in /home/example, run cd /home/example first. The image and reference files can be stored anywhere.

singularity run -B ./:/home -B <path-to-parent-folder-of-ref-file>:/zarr_generation/Genome <path-to-downloaded-image> \
-d <fastq/BAM> -g <hg38/mm10 etc.> -r <PE/SE> \
--length <read length of fastq file> \
--assay <DNA-seq/CAGE-seq/Chip-seq> \
-o <experimental read file1/BAM file> -O <experimental read file2> \
-i <IgG control read file1/BAM file> -I <IgG control read file2> \
--local

Use --local to generate the .zarr file locally. To upload the .zarr file to an Amazon S3 bucket instead, replace --local with --s3_path <s3 path>.

For example, if
a) you downloaded the image to /home/image/rb_v5.0.simg
b) the reference files are in /home/src/hg38
c) your data is Chip-seq FASTQ data with a read length of 50 bp
d) the experiment data is read1.fastq.gz and read2.fastq.gz in the folder /home/data
e) the input control data is igg_1.fastq.gz and igg_2.fastq.gz in the folder /home/data
f) you want to generate the .zarr file locally

Then you need to:

  1. cd /home/data
  2. singularity run -B ./:/home -B /home/src:/zarr_generation/Genome /home/image/rb_v5.0.simg -d fastq -g hg38 -r PE --length 50 --assay Chip-seq -o read1.fastq.gz -O read2.fastq.gz -i igg_1.fastq.gz -I igg_2.fastq.gz --local

‼️Note:
1. For CAGE-seq data, you can ONLY use FASTQ files as input to generate the .zarr file. For Chip-seq and DNA-seq (ATAC-seq, DNase-seq etc.) data, we support both FASTQ and BAM files as input.
2. For Chip-seq data, the experiment files and the input control files must be in the same format (either all FASTQ files or all BAM files).
3. You can choose only one of --s3_path and --local. A local backup .zarr file is kept even if you select the --s3_path option.
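
Notes 1 and 2 can be checked before launching the pipeline. The pre-flight check below is a sketch that is not part of the pipeline itself: the function names are hypothetical, and the format is guessed purely from the file extension.

```shell
# Map a file name to the value expected by -d, based on its extension.
rb_file_format() {
    case "$1" in
        *.fastq|*.fastq.gz) echo fastq ;;
        *.bam)              echo BAM ;;
        *)                  echo unknown ;;
    esac
}

# rb_check_inputs <assay> <file>...
# Succeeds (and prints the shared format) only if all files have one known
# format and, for CAGE-seq, that format is FASTQ.
rb_check_inputs() {
    assay=$1
    shift
    first=""
    for f in "$@"; do
        fmt=$(rb_file_format "$f")
        [ "$fmt" != unknown ] || { echo "unsupported extension: $f"; return 1; }
        [ -n "$first" ] || first=$fmt
        [ "$fmt" = "$first" ] || { echo "mixed FASTQ and BAM inputs"; return 1; }
    done
    if [ "$assay" = "CAGE-seq" ] && [ "$first" != fastq ]; then
        echo "CAGE-seq accepts FASTQ input only"
        return 1
    fi
    echo "$first"
}

# Usage (not run here):
#   fmt=$(rb_check_inputs Chip-seq read1.fastq.gz read2.fastq.gz igg_1.fastq.gz igg_2.fastq.gz)
#   then pass "$fmt" as the value of -d
```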

Parameters of processing pipeline:

-h: help information
-d: input file format: fastq for FASTQ files, BAM for BAM files
-g: genome reference. Currently supported genomes: mm39, mm10, mm9, hg38, hg19, danRer11, danRer10, rn6, dm6
-r: SE for single-end, PE for paired-end
--length: read length of the raw FASTQ file, or mapped read length of the BAM file
--assay: assay type: DNA-seq (for ATAC-seq, DNase-seq etc.), CAGE-seq or Chip-seq
-o: experiment FASTQ/BAM file 1, or the SE FASTQ/BAM file; the name must end with .fastq, .fastq.gz or .bam
-O: experiment FASTQ/BAM file 2 for PE data; the name must end with .fastq, .fastq.gz or .bam
-i: input control FASTQ/BAM file 1, or the SE FASTQ/BAM file (used only when the assay type is Chip-seq); the name must end with .fastq, .fastq.gz or .bam
-I: input control FASTQ/BAM file 2 (used only when the assay type is Chip-seq)
--local: create the .zarr file locally (you can choose only one of --s3_path and --local)
--s3_path: S3 path of the AWS S3 bucket
--id: (optional) ID information (default: unknown)
--biosample: (optional) Biosample information, same as ENCODE website (default: unknown)
--tissue: (optional) Tissue information, same as ENCODE website (default: unknown)
--experiment: (optional) Experiment ID information (default: unknown)

Step3 Upload the generated .zarr file to WashU Repeat Browser [sample image here]

Test Chip-seq data

A single-end hg38 Chip-seq dataset (signal and input control) is provided for testing. It can be downloaded with:

wget https://wangftp.wustl.edu/~scheng/repeat_browser/sample_data/chip-seq/hg38_chipseq_signal_SE_50.fastq.gz
wget https://wangftp.wustl.edu/~scheng/repeat_browser/sample_data/chip-seq/hg38_chipseq_input_SE_50.fastq.gz
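
Once downloaded, the test data can be processed with a single-end run, sketched below. The image and reference locations are placeholders borrowed from the earlier example and should be replaced with your own paths. The read length of 50 is an assumption inferred from the `_SE_50` suffix of the file names. The `rb_run_test` function and the `DRY_RUN` variable are hypothetical helpers, not part of the pipeline.

```shell
# Placeholder locations: replace with where you stored the image and references.
IMAGE=${IMAGE:-/home/image/rb_v5.0.simg}
REF_PARENT=${REF_PARENT:-/home/src}

# Run the pipeline on the single-end test data from the current directory.
# With DRY_RUN set to a non-empty value, the command is printed instead of run.
rb_run_test() {
    # ASSUMPTION: --length 50 is inferred from the "_SE_50" file name suffix.
    ${DRY_RUN:+echo} singularity run -B ./:/home \
        -B "$REF_PARENT":/zarr_generation/Genome "$IMAGE" \
        -d fastq -g hg38 -r SE --length 50 --assay Chip-seq \
        -o hg38_chipseq_signal_SE_50.fastq.gz \
        -i hg38_chipseq_input_SE_50.fastq.gz \
        --local
}

# Usage (not run here), from the folder holding the two FASTQ files:
#   DRY_RUN=1 rb_run_test   # inspect the command first
#   rb_run_test             # run it
```

Since the data is single-end, only -o and -i are given; -O and -I are needed for paired-end data only.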

Docker image

We also provide a Docker image as an alternative, which can be pulled with Docker:

docker pull sycheng99/repeatbrowser:5.0
