Tsai lab RNA-Seq script
This script runs an RNA-Seq transcriptomic analysis. The input to the pipeline is a CSV file, so begin by preparing a table that follows the schema of sample_table.csv. The script then runs the RNA-Seq pipeline on the samples listed in the table, which may come from different species.
Here's how the pipeline flows. It creates a folder structure like this:
RNA-Seq/data
    |
    ----- {species_id}
              |
              ----- fastq (raw fastq files)
              ----- reference (genome reference from Phytozome or other sources)
                        |
                        ----- species_genome.fasta -> genome.fa
                        ----- species_annotation.gff3 -> gene.gtf
              ----- clean (filtered reads)
              ----- map (alignment maps)
              ----- count (quantification results for further analysis)
              ----- miscelaneous
                        |
                        ----- *.txt
                        ----- log....
              ----- file.list
              ----- start.sh
              ----- map.sh
              ----- get_trim_sum.py
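
For readers who want to see the layout in concrete terms, the per-species folders can be sketched in Python as below. This is only an illustration under stated assumptions, not the code pipeline.py actually uses: the function name make_species_layout and the example species ID are hypothetical, and the subfolder names are copied from the layout exactly as this README spells them.

```python
from pathlib import Path

# Subfolder names copied from the layout in this README, spelled as written there.
SPECIES_SUBDIRS = ["fastq", "reference", "clean", "map", "count", "miscelaneous"]


def make_species_layout(root, species_id):
    """Create data/{species_id}/ and its subfolders under root.

    Hypothetical helper for illustration only; returns the species folder.
    """
    species_dir = Path(root) / "data" / species_id
    for name in SPECIES_SUBDIRS:
        # exist_ok=True makes the call safe to repeat on an existing tree.
        (species_dir / name).mkdir(parents=True, exist_ok=True)
    return species_dir


if __name__ == "__main__":
    import tempfile

    # Build the tree in a throwaway directory so nothing real is touched.
    with tempfile.TemporaryDirectory() as tmp:
        species_dir = make_species_layout(tmp, "example_species")
        for path in sorted(species_dir.iterdir()):
            print(path.relative_to(tmp))
```

The sketch only creates empty folders; the reference files, scripts, and logs shown in the layout are produced by the pipeline itself.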
- clone this repo
git clone https://github.com/TsailabBioinformatics/RNA-Seq.git;cp RNA-Seq/* .
- if cloning returned "fatal: Authentication failed", try again using a GitHub personal access token. Instructions for creating one can be found here: https://docs.github.com/en/enterprise-server@3.1/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token
- create an input csv file (like sample_table.csv), name it input.csv, and add it to RNA-Seq/input/
- check to make sure input.csv exists within the folder RNA-Seq/input
- change directories into RNA-Seq
- load modules python-utils and pandas with commands:
ml python-utils
ml pandas
- run the pipeline
python3 pipeline.py
- browse specific species' data by changing directories into a species folder:
cd data/{species_id}
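
The input check and the browsing step above can also be done from Python. The sketch below is a convenience example under stated assumptions, not part of the repository: the function names input_table_ready and list_species are hypothetical, and the paths input/input.csv and data/ are taken from the steps above, relative to the RNA-Seq folder.

```python
from pathlib import Path


def input_table_ready(repo_dir="."):
    """Return True if input/input.csv exists under repo_dir and is not empty."""
    table = Path(repo_dir) / "input" / "input.csv"
    return table.is_file() and table.stat().st_size > 0


def list_species(repo_dir="."):
    """Return the sorted folder names under data/, or [] if data/ does not exist."""
    data_dir = Path(repo_dir) / "data"
    if not data_dir.is_dir():
        return []
    # Only directories count as species folders; stray files are ignored.
    return sorted(p.name for p in data_dir.iterdir() if p.is_dir())


if __name__ == "__main__":
    # Run from inside the RNA-Seq folder.
    if not input_table_ready():
        print("input/input.csv is missing or empty; create it before running pipeline.py")
    for species_id in list_species():
        print(f"data/{species_id}")
```

Before the pipeline has run, list_species returns an empty list because data/ has not been created yet.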
TODO - implement visualization --> add instructions to run visualization