Snakemake workflow: Demultiplex Fastq files

Given a set of index fastq files, read fastq files and a sample-to-barcode file, this workflow generates a directory structure in which each sample's forward and reverse reads are placed in that sample's own directory.

The pipeline requires the following programs

usearch
GNU parallel
seqkit
xlsx2csv
snakemake

Please do not forget to cite the authors of the tools used.

The pipeline does the following

Joins the index fastq files (index1 and index2) using usearch -fastq_join.
Either creates a 2-column sample-to-barcode file from an Excel file using xlsx2csv, or uses a user-supplied samples2barcode.tsv file.
Converts the tab-delimited samples2barcode.tsv file to a fasta file using awk.
Reformats the barcodes using Python. You might need to edit this step in the reformat_barcode.py script to make it specific to your barcodes.
Demultiplexes the reads using usearch -fastx_demux.
Splits the demultiplexed reads on a per-sample basis, with each sample's forward and reverse reads placed in a sample-specific directory. Uses GNU parallel for parallelization.
Counts the demultiplexed reads and generates useful statistics on them using seqkit.
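
The workflow does the TSV-to-fasta conversion in the list above with awk. To illustrate the idea, here is a minimal Python sketch, not the workflow's own code. It assumes that column 1 holds the sample name, column 2 holds the barcode, and that each fasta header is simply the sample name. The function name tsv_to_fasta and the example names and barcodes are made up for illustration.

```python
from typing import Iterable


def tsv_to_fasta(lines: Iterable[str]) -> str:
    """Turn 'sample<TAB>barcode' lines into fasta text, one record per sample.

    Assumption: column 1 is the sample name and column 2 is the barcode.
    """
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Only the first two tab-separated columns are used.
        sample, barcode = line.split("\t")[:2]
        records.append(f">{sample}\n{barcode}\n")
    return "".join(records)


if __name__ == "__main__":
    # Made-up sample names and barcodes, for illustration only.
    example = ["sampleA\tACGTACGT", "sampleB\tTTGACCAA"]
    print(tsv_to_fasta(example), end="")
```

A line with fewer than two columns raises a ValueError in this sketch, which makes a malformed sample-to-barcode file fail early rather than produce an incomplete fasta file.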

Authors

Olabiyi Obayomi (@olabiyi)

Before you start, make sure you have the programs listed above installed.

Steps

Step 1

Install the software listed above.

Step 2

Obtain a copy of this workflow

git clone https://github.com/olabiyi/snakemake-Demultiplex-fastq.git

Step 3

Replace the example reads, index files and sample-to-barcode file (01.raw_data/sample2barcode.tsv) with your own.

Step 4: Configure workflow

Configure the workflow according to your needs by editing the config.yaml file.

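# Get a list of samples to be pasted in the config.yaml file
# (sample names are read from the first column of the sample-to-barcode file)
SAMPLES=($(awk '{print $1}' 01.raw_data/sample2barcode.tsv))
# Quote each name, separate the names with ", " and print the bracketed list on one line
printf '[%s]\n' "$(echo "${SAMPLES[*]}" | sed -E 's/[^ ]+/"&"/g; s/ /, /g')"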
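
If you prefer Python to the shell, the same quoted list can be built with a few lines of Python. This is only a sketch under the same assumption as the command above, namely that the sample name is the first whitespace-separated field of each line. The function names sample_names and as_config_list are hypothetical and are not part of the workflow.

```python
import json
from pathlib import Path
from typing import Iterable, List


def sample_names(lines: Iterable[str]) -> List[str]:
    """Return the first whitespace-separated field of every non-empty line."""
    return [line.split()[0] for line in lines if line.strip()]


def as_config_list(names: List[str]) -> str:
    """Format the names as a quoted, comma-separated list in square brackets."""
    return json.dumps(names)


if __name__ == "__main__":
    tsv = Path("01.raw_data/sample2barcode.tsv")  # path used in the command above
    if tsv.is_file():
        with tsv.open() as handle:
            print(as_config_list(sample_names(handle)))
```

Using json.dumps keeps names that contain hyphens or dots intact, because each name is quoted as a whole.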

Step 5: Run the pipeline

snakemake -pr --cores 10 --keep-going

Upon successful completion, your demultiplexed reads will be in a folder named 06.Split/ and statistics on them in a folder named 07.Count_Seqs/
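
For a quick sanity check of the demultiplexed output, a read count per file can also be sketched in Python. This does not replace the seqkit statistics in 07.Count_Seqs/. It assumes standard four-line fastq records, and the file name pattern "*.fastq*" is a guess rather than something taken from the workflow. The function names are hypothetical.

```python
import gzip
from pathlib import Path
from typing import Iterable


def count_fastq_records(lines: Iterable[str]) -> int:
    """Count reads, assuming every fastq record spans exactly four lines."""
    n_lines = sum(1 for _ in lines)
    if n_lines % 4 != 0:
        raise ValueError(f"{n_lines} lines is not a multiple of 4")
    return n_lines // 4


def count_reads_in_file(path: Path) -> int:
    """Count reads in a plain or gzip-compressed fastq file."""
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        return count_fastq_records(handle)


if __name__ == "__main__":
    split_dir = Path("06.Split")  # output folder named in the text above
    if split_dir.is_dir():
        # Assumed file name pattern; adjust it to the actual output names.
        for fastq in sorted(split_dir.rglob("*.fastq*")):
            print(fastq, count_reads_in_file(fastq))
```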
