GitHub - edsgard/rrnaseq: Suite of programs for initial analysis and QC of RNA-seq data

INTRODUCTION

rrnaseq provides a suite of programs for generating basic plots and for QC-filtering of RNA-seq data. The programs are written in R and can be run from the command line. A wrapper script, rqc, runs the whole suite. All programs are in the 'bin' subdirectory.

Currently, the starting input is a tab-separated file with RPKM values and raw read counts, as output by rpkmforgenes.py. Most programs also require a file with meta-information about the samples, which can be generated by running 'make_summary_starlog.sh'. See the "HOW TO RUN" section below.

INSTALLATION

The latest stable release can be found here.

Install R dependencies with install.packages or via biocLite. In R:

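 pkgs <- c('DESeq2', 'genefilter', 'statmod', 'gplots',
           'RColorBrewer', 'impute', 'moduleColor', 'graphics', 'getopt')
 # biocLite is the installer for older Bioconductor releases;
 # since Bioconductor 3.8 it has been replaced by BiocManager::install(pkgs)
 source('http://www.bioconductor.org/biocLite.R')
 biocLite(pkgs)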

Add the directory containing the programs to your shell PATH, for example in .profile on OS X or .bashrc on Linux:

export PATH="/home/user/prg/rrnaseq/bin:$PATH"
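
After opening a new shell, it can be worth confirming that the programs are actually found. The following is a minimal sketch, not part of rrnaseq: the helper name check_on_path is hypothetical, and the program names are the ones mentioned in this README.

```shell
# Report which of the given program names cannot be found on PATH.
# Prints one line per missing program; the return value is the number missing.
check_on_path() {
    missing=0
    for prog in "$@"; do
        if ! command -v "$prog" >/dev/null 2>&1; then
            echo "not on PATH: $prog"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

# Program names taken from this README
check_on_path rqc get_expr gene_filter sample_filter pca \
    || echo "Check the PATH line in your shell start-up file"
```

If nothing is printed, all listed programs were found.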

HOW TO RUN

Below is an example of how to generate a script of rrnaseq commands. It should work as long as the directory names under "#IN" are set correctly. The program 'make_summary_starlog.sh' generates a matrix with sample annotation, one row per sample, based on the read alignment metrics output by STAR. The program 'get_expr' expects input in exactly the format produced by 'rpkmforgenes.py' and generates two expression matrices from it, one with RPKM values and one with raw read counts. All other programs use the sample meta-information matrix and the expression matrices output by those two programs.

#Define input and output dirs and files
#IN
projectdir='/path/to/your/PROJECT'
stardir="${projectdir}/star_hg19"
rpkmforgenes_file="${projectdir}/rpkmforgenes_star_hg19/refseq_rpkms.txt"

#OUT
datadir="${projectdir}/rqc/refseq/data"
sample_meta_file="${datadir}/mapstats.tab"
pdfdir="${projectdir}/rqc/refseq/pdf"
brenneckedir="${projectdir}/rqc/diffexp/brennecke"

#Create and change dir
mkdir -p "$datadir"
cd "$datadir"

#Get mapping statistics from STAR logs
make_summary_starlog.pl "${stardir}" >"$sample_meta_file"

#Dry-run the program 'rqc' to generate a shell script with possible commands to execute
rqc -m "$sample_meta_file" -e "$rpkmforgenes_file" -d "$datadir" -p "$pdfdir" -b "$brenneckedir" -y

#Executable commands in the shell script generated by rqc
cat rqc.sh
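
Since everything depends on the paths under "#IN", a small pre-flight check can catch a wrong directory name before any program runs. This is only a sketch: check_inputs is a hypothetical helper, not an rrnaseq program, and it tests nothing beyond the existence of the STAR directory and a non-empty rpkmforgenes file.

```shell
# Return 0 if the STAR directory exists and the rpkmforgenes file is non-empty.
# Problems are reported on stderr.
check_inputs() {
    star_dir=$1
    rpkm_file=$2
    status=0
    [ -d "$star_dir" ] || { echo "missing directory: $star_dir" >&2; status=1; }
    [ -s "$rpkm_file" ] || { echo "missing or empty file: $rpkm_file" >&2; status=1; }
    return "$status"
}

# Example use with the variables defined under #IN:
#   check_inputs "$stardir" "$rpkmforgenes_file" || exit 1
```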

Further examples

Above, the program 'rqc' was dry-run to generate a shell script (rqc.sh) with possible commands to execute. Look in rqc.sh and change or add input arguments as you wish.

See also test/rqc.sh for a complete list of available programs and example program calls. Note that the directories in that file are set for the test directory.

TEST AND EXAMPLE OUTPUT

Example output can be found in the 'test/rqc' subdirectory. The file 'run.rqc.sh' in the 'test' subdirectory shows how to run the script 'rqc' with the dry-run flag, which generates a file (rqc.sh) with commands that call all of the available programs in the rrnaseq suite. See 'run.rqc.sh' and the generated 'rqc.sh' file for a test example:

cd test
sh run.rqc.sh 
cat rqc.sh

QC-filter

To filter genes, use the program 'gene_filter'. To filter samples, use the program 'sample_filter'. 'sample_filter' relies on an input file (default: qc.rds) containing a data matrix with samples as rows and QC metrics as columns. An element of this QC matrix is set to 1 if the sample failed QC for that metric. Each QC-metric column is added by running the corresponding program. For example, to add a QC column for the number of expressed genes per sample, run 'sample2ngenes_expr'. To then apply the filter, run 'sample_filter' with the column name of that QC metric as an argument. See test/rqc.sh for an example.
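
To make the layout of the QC matrix concrete, here is a toy illustration of the filtering logic. It is not how 'sample_filter' is implemented: the real matrix is stored in qc.rds, whereas this sketch uses a tab-separated text table, and the sample names, the column names and the helper passing_samples are all invented for the example.

```shell
# Mock QC matrix as tab-separated text: samples as rows, QC metrics as columns.
# 1 = sample failed that metric, 0 = passed. All names here are invented.
qc_table=$(printf 'sample\tngenes_expr\tmapped_reads\nS1\t0\t0\nS2\t1\t0\nS3\t0\t1\n')

# Print the samples whose value in the named QC column is not 1.
# Prints nothing if the column name is not in the header.
passing_samples() {
    printf '%s\n' "$1" | awk -F '\t' -v col="$2" '
        NR == 1 { for (i = 2; i <= NF; i++) if ($i == col) c = i; next }
        c && $c != 1 { print $1 }
    '
}

# Keep the samples that passed the hypothetical "ngenes_expr" metric
passing_samples "$qc_table" ngenes_expr
```

With this mock table the call prints S1 and S3, because S2 is flagged with 1 in the chosen column.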

GETTING HELP

Each program has several input arguments that should be considered. For a list of all available arguments for a program, use the -h flag, for example:

pca -h
