Given a set of index FASTQ files, read FASTQ files and a sample-to-barcode file, this workflow generates a directory structure in which each sample's forward and reverse reads are placed in their own sample directory.
- usearch
- GNU parallel
- seqkit
- xlsx2csv
- snakemake
Please do not forget to cite the authors of the tools used.
- Joins the index fastq files (index1 and index2) using usearch -fastq_join.
- Either creates a 2-column sample-to-barcode file from an Excel file using xlsx2csv, or uses a user-supplied samples2barcode.tsv file.
- Converts the tab-delimited samples2barcode.tsv file to a fasta file using awk.
- Uses Python to reformat the barcodes. You might need to edit this step in the reformat_barcodes.py script to match your barcodes.
- Demultiplexes the reads using usearch -fastx_demux.
- Splits the demultiplexed reads on a per-sample basis, with each sample's forward and reverse reads placed in sample-specific directories. Uses GNU parallel for parallelization.
- Counts the demultiplexed reads and generates useful statistics on them using seqkit.
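The awk conversion step in the list above can be sketched as follows. This is only a minimal illustration, assuming a two-column, tab-delimited file without a header line (sample name, then barcode). The file names and sample rows are made up for the example, and the workflow's actual rule may differ.

```shell
# Hypothetical two-column, tab-delimited input: sample name, then barcode
printf 'S1\tACGTACGT\nS2\tTTGCAAGC\n' > samples2barcode.example.tsv

# Emit one FASTA record per row: a ">sample" header line, then the barcode.
# Rows with fewer than two fields (e.g. blank lines) are skipped.
awk -F'\t' 'NF >= 2 {print ">" $1; print $2}' samples2barcode.example.tsv > barcodes.example.fasta

# Inspect the result
cat barcodes.example.fasta
```

Each row of the table becomes a two-line FASTA record, the kind of barcode file that a demultiplexing step can read.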
Author: Olabiyi Obayomi (@olabiyi)
Before you start, make sure you have the programs listed above installed.
Install the software listed above.
Obtain a copy of this workflow
git clone https://github.com/olabiyi/snakemake-Demultiplex-fastq.git
Replace the reads, index files and sample2barcodes.tsv file with your own.
Configure the workflow according to your needs by editing the config.yaml file.
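# Get a list of samples to be pasted in the config.yaml file
SAMPLES=($(awk '{print $1}' 01.raw_data/sample2barcode.tsv))
# Print the names as a quoted, comma-separated list, e.g. ["S1", "S2"].
# Quoting each whole name keeps names containing '-' or '.' intact.
echo "[$(printf '"%s", ' "${SAMPLES[@]}" | sed 's/, $//')]"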
snakemake -pr --cores 10 --keep-going
Upon successful completion, your demultiplexed reads will be in a folder named 06.Split/ and the statistics on them in a folder named 07.Count_Seqs/.