
TsailabBioinformatics/RNA-Seq


RNA-Seq analysis on Sapelo2 at UGA

Tsai lab RNA-Seq script

Description

This pipeline runs transcriptomic (RNA-Seq) analysis. Its input is a CSV file, so to begin, prepare a table with the same schema as sample_table.csv. The script then runs the RNA-Seq pipeline on every sample listed in the table, across all of the species it contains.
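Because the sample table drives everything downstream, it can help to see how its rows map onto the per-species folders. This is a minimal sketch, assuming the table has columns named species_id and sample_id. Those names are hypothetical, so check sample_table.csv for the real headers. samples_by_species is an illustrative helper, not part of pipeline.py.

```python
import csv
from collections import defaultdict


def samples_by_species(csv_path, species_col="species_id", sample_col="sample_id"):
    """Group sample IDs by species from a sample table.

    species_col and sample_col are hypothetical column names; replace them
    with the actual headers used in sample_table.csv.
    """
    groups = defaultdict(list)
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            # Each row describes one sample; file it under its species,
            # keeping the order in which samples appear in the table.
            groups[row[species_col]].append(row[sample_col])
    return dict(groups)
```

For example, samples_by_species("input/input.csv") returns a dict with one key per species. Each value is the list of that species' sample IDs, which mirrors the one-folder-per-species layout under data/.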

Here's how the pipeline flows:

[Figure: pipeline workflow diagram]
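The per-species folder names suggest the order of the main stages: raw reads in fastq, filtered reads in clean, alignments in map, and quantification in count. This is a minimal sketch of that ordering. It assumes each stage reads the previous stage's folder, which is inferred from the layout and not taken from pipeline.py. The stage labels (trim, align, quantify) and the helper stage_paths are hypothetical.

```python
from pathlib import Path

# Hypothetical stage labels; the folder order is inferred from the
# data/{species_id} layout, where each stage reads the previous folder.
STAGES = [
    ("trim", "fastq", "clean"),    # raw reads -> filtered reads
    ("align", "clean", "map"),     # filtered reads -> alignment maps
    ("quantify", "map", "count"),  # alignments -> quantification results
]


def stage_paths(species_id, root="data"):
    """Return (stage, input_dir, output_dir) tuples for one species, in run order."""
    base = Path(root) / species_id
    return [(name, base / src, base / dst) for name, src, dst in STAGES]
```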

The pipeline creates a folder structure like this:

RNA-Seq/data
|
 ----- {species_id}
        |
         ----- fastq (raw fastq files)
         ----- reference (genome reference from Phytozome or other sources)
               |
                ----- species_genome.fasta -> genome.fa
                ----- species_annotation.gff3 -> gene.gtf
         ----- clean (filtered reads)
         ----- map (alignment maps)
         ----- count (quantification results for further analysis)
         ----- miscelaneous
               |
                ----- *.txt
                ----- log....
         ----- file.list
         ----- start.sh
         ----- map.sh
         ----- get_trim_sum.py
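You may want to set up a species folder by hand, for example to stage fastq and reference files before a run. The subfolders in the layout above can be created with pathlib. This is a minimal sketch, not the pipeline's own code, and make_species_layout is a hypothetical helper. It only creates the folders. Renaming or converting the reference files to genome.fa and gene.gtf, and writing the per-species scripts, are left to the pipeline.

```python
from pathlib import Path

# Per-species subfolders from the tree above ("miscelaneous" keeps the
# repository's own spelling).
SUBDIRS = ["fastq", "reference", "clean", "map", "count", "miscelaneous"]


def make_species_layout(species_id, root="data"):
    """Create {root}/{species_id}/ and its subfolders; safe to re-run."""
    base = Path(root) / species_id
    for name in SUBDIRS:
        # parents=True creates root/ and root/{species_id}/ as needed;
        # exist_ok=True leaves folders that already exist untouched.
        (base / name).mkdir(parents=True, exist_ok=True)
    return base
```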

Instructions to Run

  1. Clone this repository:
git clone https://github.com/TsailabBioinformatics/RNA-Seq.git;cp RNA-Seq/* .
  2. Create an input CSV file with the same schema as sample_table.csv, name it input.csv, and place it in RNA-Seq/input/.
  • Make sure input.csv exists in the RNA-Seq/input folder.
  3. Change into the RNA-Seq directory:
cd RNA-Seq
  4. Load the python-utils and pandas modules:
ml python-utils
ml pandas
  5. Run the pipeline:
python3 pipeline.py
  6. Browse a species' results by changing into its folder:
cd data/{species_id}
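The input check and the species-browsing step above can also be scripted in Python. You can confirm the sample table is in place before a run and list the species folders afterwards. This is a minimal sketch that assumes the default layout, with input/input.csv and data/{species_id} inside the RNA-Seq directory. check_input and list_species are hypothetical helpers, not part of the repository.

```python
from pathlib import Path


def check_input(repo_root="."):
    """Fail early if input/input.csv is missing; return its path otherwise."""
    csv_path = Path(repo_root) / "input" / "input.csv"
    if not csv_path.is_file():
        raise FileNotFoundError(f"expected sample table at {csv_path}")
    return csv_path


def list_species(repo_root="."):
    """Return the species folder names under data/, sorted (empty if none yet)."""
    data = Path(repo_root) / "data"
    if not data.is_dir():
        return []
    return sorted(p.name for p in data.iterdir() if p.is_dir())
```

Calling check_input() from the RNA-Seq directory before python3 pipeline.py surfaces a missing table immediately, instead of partway through a job.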

TODO: implement visualization and add instructions for running it.
