🧮 CRISPRi-seq pipeline

scbirlab/nf-crispriseq is a Nextflow pipeline to count and annotate guide RNAs in demultiplexed FASTQ files, optionally with UMIs, and optionally modelling fitness changes.

Table of contents

Processing steps
Requirements
Quick start
Inputs
Outputs
Issues, problems, suggestions
Further help

Processing steps

Per genome:

If no guide RNAs are provided, generate all possible guide RNAs from the provided genome
For each input set of guide RNAs, map to the reference genome and GFF to get genomic feature annotations
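
As a rough illustration of what generating all possible guide RNAs can involve, the sketch below scans both strands of a sequence for protospacers that sit immediately 5' of a PAM. It is not the pipeline's own code: the function name `find_guides`, the 20-nt guide length, and the PAM-on-the-3'-side layout are assumptions made for the example.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_guides(genome, pam="NGG", guide_length=20):
    """Yield (strand, start, guide) for each protospacer followed by the PAM.

    `start` is 0-based and, for the "-" strand, counted along the
    reverse-complemented sequence. The guide length is an assumption.
    """
    pam_regex = "".join(IUPAC[base] for base in pam.upper())
    # A lookahead lets overlapping sites all be reported.
    site = re.compile(rf"(?=([ACGT]{{{guide_length}}}){pam_regex})")
    forward = genome.upper()
    for strand, seq in (("+", forward), ("-", reverse_complement(forward))):
        for match in site.finditer(seq):
            yield strand, match.start(), match.group(1)
```

A real design step would usually also filter guides by strand relative to the target gene and by off-target matches, which this sketch leaves out.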

Per FASTQ file:

Filter and trim reads to adapters using cutadapt. This ensures reads used downstream have the expected features and are trimmed so that the features are in predictable places.
(Optionally) Extract UMIs using umi_tools extract.
Find guide RNA matches using cutadapt.
Count UMIs (if using) and reads per guide RNA using umi_tools count_tab.
Plot histograms and correlations of UMI and read count distributions.
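
To make the counting step concrete, here is a minimal sketch of tallying reads and distinct UMIs per guide. It assumes each read has already been assigned a guide name and a UMI; the function name `count_guides` and the output keys are hypothetical. Unlike umi_tools, it treats every distinct UMI string as a separate molecule and makes no attempt to merge UMIs that differ by sequencing errors.

```python
from collections import defaultdict


def count_guides(records):
    """Count reads and distinct UMIs per guide.

    `records` is an iterable of (guide_name, umi) pairs, one per read.
    """
    reads = defaultdict(int)
    umis = defaultdict(set)
    for guide, umi in records:
        reads[guide] += 1
        umis[guide].add(umi)
    return {
        guide: {"read_count": reads[guide], "umi_count": len(umis[guide])}
        for guide in reads
    }
```

A guide whose read count is much higher than its UMI count was probably amplified heavily during PCR, which is the main reason to count UMIs at all.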

Optionally [work in progress]:

If the data are from a time-course, calculate fitness per guide RNA, per condition.
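
The README does not describe the fitness model, so the following is only one simple way such a value could be computed, not the pipeline's method. It assumes fitness is the least-squares slope of log2(guide count / reference count) over time, where the reference might be, for example, pooled non-targeting guides. The function name, arguments, and pseudocount are all hypothetical.

```python
import math


def relative_fitness(timepoints, guide_counts, reference_counts, pseudocount=1.0):
    """Slope of log2(guide / reference) against time, by least squares."""
    log_ratios = [
        math.log2((g + pseudocount) / (r + pseudocount))
        for g, r in zip(guide_counts, reference_counts)
    ]
    n = len(timepoints)
    mean_t = sum(timepoints) / n
    mean_y = sum(log_ratios) / n
    sxx = sum((t - mean_t) ** 2 for t in timepoints)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(timepoints, log_ratios))
    return sxy / sxx
```

Under this definition a guide that halves in relative abundance per unit time has a fitness of -1, and a neutral guide has a fitness of 0.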

Other steps

Get FASTQ quality metrics with fastqc.
Compile the logs of processing steps into an HTML report with multiqc.

Requirements

Software

You need to have Nextflow and conda installed on your system.

First time using Nextflow?

If you're at the Crick or your shared cluster has it already installed, try:

module load Nextflow

Otherwise, if it's your first time using Nextflow on your system, you can install it using conda:

conda install -c bioconda nextflow

You may need to set the NXF_HOME environment variable. For example,

mkdir -p ~/.nextflow
export NXF_HOME=~/.nextflow

To make this a permanent change, you can do something like the following:

mkdir -p ~/.nextflow
echo "export NXF_HOME=~/.nextflow" >> ~/.bash_profile
source ~/.bash_profile

Quick start

Make a sample sheet (see below) and, optionally, a nextflow.config file in the directory where you want the pipeline to run. Then run Nextflow.

nextflow run scbirlab/nf-crispriseq

Each time you run the pipeline after the first time, Nextflow will use a locally-cached version which will not be automatically updated. If you want to ensure that you're running a specific version of the pipeline, use the -r <version> flag. For example,

nextflow run scbirlab/nf-crispriseq -r v0.0.1

A list of versions can be found by running nextflow info scbirlab/nf-crispriseq.

For help, use nextflow run scbirlab/nf-crispriseq --help.

The first time you run the pipeline on your system, the software dependencies in environment.yml will be installed. This can take around 10 minutes.

If your run is unexpectedly interrupted, you can restart from the last completed step using the -resume flag.

nextflow run scbirlab/nf-crispriseq -resume

Inputs

The following parameters are required:

sample_sheet: path to a CSV containing sample IDs matched with FASTQ filenames, references, and adapter sequences
fastq_dir: path to directory containing the FASTQ files (optionally GZIPped)
inputs: path to directory containing files referenced in the sample_sheet, such as lists of guide RNAs.

The following parameters have default values that can be overridden if necessary.

output = "outputs": path to directory to put output files
sample_names = "sample_id": column of sample_sheet to take as a sample identifier. Use "Run" for SRA table inputs.
use_umis = false: Whether the reads include UMIs
from_sra = false: Whether the FASTQ files should be pulled from the SRA instead of provided as local files
guides = true: Whether the name of a CSV of guide sequences is provided in the sample_sheet
name_column = "Name": If using a CSV of guide RNA sequences (guides = true), the column containing the name of each guide
sequence_column = "guide_sequence": If using a CSV of guide RNA sequences (guides = true), the column containing the sequence of each guide
rc = false: Whether to reverse complement the guide sequences before mapping.
trim_qual = 10: For cutadapt, the minimum Phred score for trimming 3' calls
min_length = 105: For cutadapt, the minimum trimmed length of a read. Shorter reads will be discarded
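
For readers unsure what the rc option changes, this is a minimal sketch of reverse complementing a guide sequence. The function name is hypothetical and the pipeline may implement the step differently.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence, preserving case."""
    return sequence.translate(COMPLEMENT)[::-1]


print(reverse_complement("TCGACTGAGCTGAAAGAAT"))
```

Setting rc = true is likely to be needed when the guide table lists sequences on the opposite strand from the one that appears in the forward read.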

The parameters can be provided either in the nextflow.config file or on the nextflow run command.

Here is an example of the nextflow.config file:

params {
   
    sample_sheet = "/path/to/sample-sheet.csv"
    fastq_dir = "/path/to/fastqs"
    inputs = "/path/to/inputs"

    // Optional
    rc = true
    guides = true
    trim_qual = 15
    min_length = 90

}

Alternatively, you can provide these on the command line:

nextflow run scbirlab/nf-crispriseq -r v0.0.1 \
    --sample_sheet /path/to/sample_sheet.csv \
    --fastq_dir /path/to/fastqs \
    --inputs /path/to/inputs \
    --rc --guides \
    --trim_qual 15 --min_length 90

Sample sheet

The sample sheet is a CSV file providing information about each sample: which FASTQ files belong to it, the reference genome accession number, adapters to be trimmed off, (optionally) the UMI scheme, (optionally) the name of a table of known guide RNAs, and (optionally) experimental conditions if calculating fitness.

The file must have a header with the column names below, and one line per sample to be processed.

sample_id: the unique name of the sample
genome: The NCBI assembly accession number for the organism that the guide RNAs are targeting. This number starts with "GCF_" or "GCA_".
pam: The name (e.g. "Spy" or "Sth1") or sequence (e.g. "NGG" or "NGRVAN") of the dCas9 PAM used in the experiment
scaffold: The name of the sgRNA scaffold ("PerturbSeq" or "Sth1") used in the experiment.
adapter_5prime: the 5' adapter on the forward read to trim to in cutadapt format. Sequence to the left will be removed, but the adapters themselves will be retained.
adapter_3prime: the 3' adapter on the forward read to trim to in cutadapt format. Sequence matching the adapter and everything to the right will be removed.

If you have set from_sra = false (the default):

reads: the search glob to find FASTQ files for each sample in fastq_dir (see config). The pipeline will look for files matching <fastq_dir>/*<reads>*, and the glob should match only the forward read if you had paired-end sequencing.

Otherwise, with from_sra = true:

Run: the SRA Run ID

If you have set use_umis = true:

umi_pattern: the cell barcode and UMI pattern in umi_tools regex format for the forward read
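
To show how such a pattern carves up a read, the sketch below applies the pattern from the example sample sheet using Python's built-in re module. In umi_tools regex mode, groups named umi_* are taken as the UMI and groups named discard_* are dropped. The helper `extract_umi` is hypothetical and only mimics that behaviour for this one pattern.

```python
import re

# Pattern from the example sample sheet: an 8-base UMI, then 86 discarded bases.
UMI_PATTERN = re.compile(r"^(?P<umi_1>.{8})(?P<discard_1>.{86}).+$")


def extract_umi(read):
    """Return (umi, remaining_read), or None if the read does not match."""
    match = UMI_PATTERN.match(read)
    if match is None:
        return None
    # Drop the UMI and the discard group; keep whatever follows them.
    return match.group("umi_1"), read[match.end("discard_1"):]
```

Because the pattern ends in `.+`, a read must be at least 95 bases long to match; shorter reads return None here.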

If you have set guides = true (the default):

guides_filename: the name of a file in the inputs directory containing guide sequences.

Here is an example of the sample sheet:

sample_id	genome	reads	guides_filename	pam	scaffold	adapter_5prime	adapter_3prime	umi_pattern
lib001	GCA_003076915.1	FAU6865A42_*_R1	guides.csv	Spy	PerturbSeq	^N{8}TCGACTGAGCTGAAAGAAT	GTTTAAGAGCTATGCTGG	^(?P<umi_1>.{8})(?P<discard_1>.{86}).+$
lib002	GCA_003076915.1	FAU6865A43_*_R1	guides.csv	Spy	PerturbSeq	^N{8}TCGACTGAGCTGAAAGAAT	GTTTAAGAGCTATGCTGG	^(?P<umi_1>.{8})(?P<discard_1>.{86}).+$
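
Since the required columns depend on from_sra, use_umis, and guides, it can help to check a sample sheet's header before launching a run. The following is a small sketch of such a check, assuming a comma-separated file; the function names are hypothetical and the pipeline's own validation, if any, may differ.

```python
import csv
import io

BASE_COLUMNS = ["sample_id", "genome", "pam", "scaffold", "adapter_5prime", "adapter_3prime"]


def required_columns(from_sra=False, use_umis=False, guides=True):
    """List the columns the sample sheet needs for the given parameters."""
    columns = list(BASE_COLUMNS)
    columns.append("Run" if from_sra else "reads")
    if use_umis:
        columns.append("umi_pattern")
    if guides:
        columns.append("guides_filename")
    return columns


def missing_columns(handle, **params):
    """Return required columns absent from the header of an open CSV file."""
    header = next(csv.reader(handle), [])
    return [name for name in required_columns(**params) if name not in header]


# Example: a header without umi_pattern, checked with use_umis=True.
sheet = io.StringIO(
    "sample_id,genome,reads,guides_filename,pam,scaffold,adapter_5prime,adapter_3prime\n"
)
print(missing_columns(sheet, use_umis=True))
```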

And here is an example of the guides_filename (guides.csv in this example):

Name	guide_sequence
guide001	TCGACTGAGCTGAAAGAAT
guide002	GTTTAAGAGCTATGCTGGT

It is also possible to provide the guides as a fasta file:

>guide001
TCGACTGAGCTGAAAGAAT
>guide002
GTTTAAGAGCTATGCTGGT
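
The two guide formats carry the same information. As an illustration, the sketch below reads either one into a dictionary of guide names to sequences. It assumes a comma-separated guide table with the default name_column and sequence_column values; the function names are hypothetical.

```python
import csv
import io


def read_guides_csv(handle, name_column="Name", sequence_column="guide_sequence"):
    """Map guide names to sequences from a CSV with a header row."""
    return {row[name_column]: row[sequence_column] for row in csv.DictReader(handle)}


def read_guides_fasta(handle):
    """Map guide names to sequences from FASTA text."""
    guides, name = {}, None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # Use the first word of the header line as the guide name.
            name = (line[1:].split() or [""])[0]
            guides[name] = ""
        elif line and name is not None:
            guides[name] += line.upper()
    return guides


csv_text = "Name,guide_sequence\nguide001,TCGACTGAGCTGAAAGAAT\nguide002,GTTTAAGAGCTATGCTGGT\n"
fasta_text = ">guide001\nTCGACTGAGCTGAAAGAAT\n>guide002\nGTTTAAGAGCTATGCTGGT\n"
print(read_guides_csv(io.StringIO(csv_text)) == read_guides_fasta(io.StringIO(fasta_text)))
```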

You don't need to provide gene annotation information, because the pipeline will map these guides back to the genome and annotate the features for you.

Outputs

Outputs are saved in the output directory defined in the config file. They are organised under four directories:

processed: FASTQ files and logs resulting from trimming and UMI extraction
mapped: FASTQ files and logs resulting from mapping features
counts: tables and plots relating to UMI and read counts
multiqc: HTML report on processing steps

Issues, problems, suggestions

Add to the issue tracker.

Further help

Here are the help pages of the software used by this pipeline.
