Suite of programs for initial analysis and QC of RNA-seq data

edsgard/rrnaseq

INTRODUCTION

rrnaseq provides a suite of programs for generating basic plots and QC-filtering RNA-seq data. The programs are written in R and are executable from the command line. A wrapper script, rqc, can run the whole suite of programs. All programs can be found in the 'bin' sub-directory.

Currently, the starting input is a tab-separated file with RPKM values and raw read counts, as output by rpkmforgenes.py. Most programs also require a file with meta-information about the samples, which can be generated by running 'make_summary_starlog.sh'; see the "HOW TO RUN" section below.

INSTALLATION

  1. The latest stable release can be found here.

  2. Install R dependencies with install.packages or via BiocManager (the successor to biocLite). In R:

     # 'graphics' ships with base R and needs no installation
     pkgs = c('DESeq2', 'genefilter', 'statmod', 'gplots',
     'RColorBrewer', 'impute', 'moduleColor', 'getopt')
     if (!requireNamespace('BiocManager', quietly = TRUE))
         install.packages('BiocManager')
     BiocManager::install(pkgs)
    
  3. Add the directory with binaries to your shell path (for example in .profile on OS X or .bashrc on Linux):

export PATH="/home/user/prg/rrnaseq/bin:$PATH"
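To confirm that the PATH change took effect, a small check along the following lines may help. This is only a sketch: 'check_on_path' is a hypothetical helper, not part of rrnaseq, and the program names passed to it are simply the ones mentioned in this README.

```shell
# Hypothetical helper (not part of rrnaseq): report which of the given
# program names can be found on the current PATH.
check_on_path() {
    missing=0
    for prog in "$@"; do
        if command -v "$prog" >/dev/null 2>&1; then
            echo "found:   $prog"
        else
            echo "missing: $prog"
            missing=1
        fi
    done
    # Non-zero exit status if at least one program was not found
    return "$missing"
}

# Check a few of the programs named in this README
check_on_path rqc get_expr gene_filter sample_filter sample2ngenes_expr pca \
    || echo "Some programs are not on PATH; check the export line above."
```

If any program is reported as missing, open a new shell (or source the edited profile file) and check that the exported directory really is the 'bin' sub-directory of your rrnaseq copy.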

HOW TO RUN

Below is an example of how to generate a script of rrnaseq commands. If the directory names under "#IN" are set correctly, it should all work. The program 'make_summary_starlog.sh' generates a matrix with sample annotation, one row per sample, based on the read alignment metrics output by STAR. The program 'get_expr' assumes input in exactly the format generated by 'rpkmforgenes.py' and produces two data matrices with expression values: one with RPKM values and one with raw read counts. All other programs use the sample meta-information matrix and the expression matrices output by those two programs.

#Define input and output dirs and files
#IN
projectdir='/path/to/your/PROJECT'
stardir=${projectdir}'/star_hg19'
rpkmforgenes_file=${projectdir}/rpkmforgenes_star_hg19/refseq_rpkms.txt

#OUT
datadir=${projectdir}/'rqc/refseq/data'
sample_meta_file=${datadir}/'mapstats.tab'
pdfdir=${projectdir}/'rqc/refseq/pdf'
brenneckedir=${projectdir}'/rqc/diffexp/brennecke'

#Create and change dir
mkdir -p $datadir
cd $datadir

#Get mapping statistics from STAR logs
make_summary_starlog.pl ${stardir} >$sample_meta_file

#Dry-run the program 'rqc' to generate a shell script with possible commands to execute
rqc -m $sample_meta_file -e $rpkmforgenes_file -d $datadir -p $pdfdir -b $brenneckedir -y

#Executable commands in the shell script generated by rqc
cat rqc.sh

Further examples

Above, the program 'rqc' was dry-run to generate a shell script (rqc.sh) with possible commands to execute. Look in rqc.sh and change or add input arguments as you wish.

See also test/rqc.sh for a complete list of available programs and example program calls; note that the directories there are set for the test directory.
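Since the generated commands depend on the paths set under "#IN", it can be worth checking those paths before running anything from rqc.sh. The following is a minimal sketch; 'check_inputs' is a hypothetical helper, not part of rrnaseq, and it assumes the variables 'stardir' and 'rpkmforgenes_file' are defined as in the "HOW TO RUN" example.

```shell
# Hypothetical helper (not part of rrnaseq): verify that the STAR output
# directory ($1) and the rpkmforgenes file ($2) exist.
check_inputs() {
    rc=0
    if [ ! -d "$1" ]; then
        echo "missing STAR directory: $1" >&2
        rc=1
    fi
    if [ ! -f "$2" ]; then
        echo "missing rpkmforgenes file: $2" >&2
        rc=1
    fi
    return "$rc"
}

# Uses the variables from the "HOW TO RUN" example (empty if unset)
if check_inputs "${stardir:-}" "${rpkmforgenes_file:-}"; then
    echo "inputs look fine"
else
    echo "fix the paths under #IN first"
fi
```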

TEST AND EXAMPLE OUTPUT

Example output can be found in the 'test/rqc' subdirectory. The file 'run.rqc.sh' in the 'test' subdirectory shows how to run the script 'rqc' with the dry-run flag, which generates a file (rqc.sh) with commands that call all of the available programs in the rrnaseq suite. For a test example, see 'run.rqc.sh' and the generated 'rqc.sh' file:

cd test
sh run.rqc.sh 
cat rqc.sh

QC-filter

To filter genes, use the program 'gene_filter'. To filter samples, use the program 'sample_filter'. The latter relies on an input file (default: qc.rds) containing a data matrix with all samples as rows and different QC-metrics as columns. An element of this qc-matrix is set to 1 if the sample failed QC for that particular QC-metric. Each QC-metric column is added to the qc-matrix by running the corresponding program. For example, to add a QC-column for the number of expressed genes per sample, run the program 'sample2ngenes_expr'. To then apply the filter, run 'sample_filter' with the column name of that QC-metric as an argument. See test/rqc.sh for an example.
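The logic of the qc-matrix can be illustrated with a toy example. Note the assumptions: the real matrix is stored in qc.rds and handled by the R programs, whereas the sketch below uses a plain tab-separated stand-in, made-up sample and column names, and a hypothetical 'passing_samples' function. It only shows the idea of keeping samples whose entry for a chosen QC-metric is not 1.

```shell
# Toy stand-in for the qc-matrix: samples as rows, QC-metrics as columns.
# 1 = sample failed that QC-metric, 0 = passed. All names are made up.
qc_tsv=$(mktemp)
printf 'sample\tngenes_expr\tmapped_reads\n' > "$qc_tsv"
printf 's1\t0\t0\n' >> "$qc_tsv"
printf 's2\t1\t0\n' >> "$qc_tsv"
printf 's3\t0\t1\n' >> "$qc_tsv"

# Hypothetical function: print the samples in file $1 that did not fail
# the QC-metric whose column name is given in $2.
passing_samples() {
    awk -F '\t' -v metric="$2" '
        # Header row: locate the column of the requested QC-metric
        NR == 1 { for (i = 2; i <= NF; i++) if ($i == metric) col = i; next }
        # Data rows: keep the sample unless it is flagged with 1
        col && $col != 1 { print $1 }
    ' "$1"
}

passing_samples "$qc_tsv" ngenes_expr
```

In this toy matrix only s2 is flagged for 'ngenes_expr', so filtering on that column keeps s1 and s3, while filtering on 'mapped_reads' would keep s1 and s2.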

GETTING HELP

Each program has several input arguments that should be considered. For a list of all available arguments for a program, use the -h flag, for example:

pca -h
