Home
Scasa is a single-cell transcript quantification tool designed for single-cell RNA-sequencing data. The software covers the steps from pseudo-alignment through quantification. Here we give detailed instructions and examples on how to use scasa as part of a single-cell RNA-seq workflow.
Scasa works either directly with raw FASTQ files, running both alignment and the subsequent quantification, or with alignment output files from Salmon Alevin or Kallisto Bustools, running quantification only.
Type scasa --help
in the terminal to see a list of available commands.
> scasa
Usage: scasa [options] [arguments]
List of options:
--help,-h 0. Help Page to Display All Options
--project,-p 1. Create a Project Name
--mapper,-m 2. Choose an Alignment Tool
--align,-a 3. Alignment Step
--quant,-q 4. Quantification Step
--in,-i 5. Provide an Input Directory
--fastq,-f 6. Input FASTQ Files
--samplesheet,-s 7. Provide a Samplesheet
--out,-o 8. Provide an Output Directory
--ref,-r 9. Reference Transcriptome Fasta File
--index,-e 10. To Index Reference Fasta File
--index_dir,-d 11. Index File Directory if Index was Prebuilt
--whitelist,-w 12. Provide a Whitelist for Barcode Correction
--tech,-t 13. Sequencing Technology Used
--cellthreshold,-c 14. Number of Cells to Retain
--nthreads,-n 15. Set the Number of Threads to Use
--postalign_dir,-g 16. Post-alignment Directory if Alignment was Done in Prior
--createxmatrix,-b 17. To Generate Xmatrix
--xmatrix,-x 18. X-Matrix directory
Use --help,-h
to view all options to scasa.
> scasa
Example: scasa --help
(List of options)
Use --project
to pass a project name to scasa <STRING, optional. If the option is not used, the default is My_Project. Spaces are not allowed; use '-', '_' or '.' instead. To rerun an existing project folder, provide the project folder name including its timestamp suffix (not the project directory path), for example: My_Project_202104241111>.
Usage: scasa --project [arguments]
Example: scasa --project SCRNASEQ_PROJECT
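Because spaces are not allowed in the project name, it can help to normalize a free-form label before passing it to --project. The following is a minimal sketch; the helper name sanitize_project_name is hypothetical and not part of scasa, and the final line only prints the command rather than running it.

```shell
# Hypothetical helper (not part of scasa): turn each run of spaces into a
# single underscore so the result can be used as a --project value.
# Note: tr -s also collapses repeated underscores already in the input.
sanitize_project_name() {
  printf '%s' "$1" | tr -s ' ' '_'
}

project=$(sanitize_project_name "My scRNA  project")
echo "scasa --project $project"
```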
Use --mapper
to choose the alignment tool used for the alignment step <STRING, optional. Currently scasa supports only two values for --mapper: salmon_alevin and kallisto_bus. Default is set to salmon_alevin>.
Usage: scasa --mapper [arguments]
Example: scasa --mapper salmon_alevin
Arguments available (Default is salmon_alevin):
--mapper salmon_alevin
--mapper kallisto_bus
Use --align
to run the pseudo-alignment step <STRING, optional. If set to YES, state in the --mapper option which pseudo-alignment tool to use. Currently, scasa supports only two alignment tools: salmon alevin and kallisto bus. Default is set to YES>.
Usage: scasa --align [arguments]
Arguments available (Default is YES):
--align YES
--align NO
Use --quant
to run scasa quantification step to produce transcript counts <STRING, optional, default is set to YES>
Usage: scasa --quant [arguments]
Arguments available (Default is YES):
--quant YES
--quant NO
Use --in
to provide an input directory containing the input FASTQ files <STRING, optional. Spaces are not allowed in the directory path. Default is set to the current directory>.
Usage: scasa --in [arguments]
Example: scasa --in /mnt/PROJECT/PROJECT_OUT/
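Several options on this page (--in, --out, --index_dir, --whitelist, --postalign_dir, --xmatrix) do not accept paths containing spaces. A small pre-flight check along the following lines can catch such paths early. This is only a sketch: has_no_space is a hypothetical helper, not a scasa feature, and the command is printed rather than executed.

```shell
# Hypothetical helper (not part of scasa): succeed only if the given
# path contains no whitespace characters.
has_no_space() {
  case $1 in
    *[[:space:]]*) return 1 ;;
    *) return 0 ;;
  esac
}

in_dir="/mnt/PROJECT/PROJECT_OUT/"
if has_no_space "$in_dir"; then
  echo "scasa --in $in_dir"
else
  echo "error: input directory path contains a space: $in_dir" >&2
fi
```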
Use --fastq
to provide FASTQ file names (file names only, not paths; the path is given with the --in option), separated by commas. Make sure R1 and R2 are labeled in each paired FASTQ file name and that both files of a pair share the same prefix <STRING, optional. This option is intended for users with only a few FASTQ files. Provide either --samplesheet or --fastq, but not both. If neither --samplesheet nor --fastq is provided, scasa looks for FASTQ files in the input directory given with --in>.
Usage: scasa --fastq [arguments]
Example 1: scasa --fastq Sample01_R1.fastq,Sample01_R2.fastq
Example 2: scasa --fastq Sample01_R1.fastq,Sample01_R2.fastq,Sample_02_S1_L001_R1_001.fastq,Sample_02_S1_L001_R2_001.fastq
Use --samplesheet
to provide the path to a comma- or tab-separated samplesheet file containing the input FASTQ file names (Download an example of samplesheet file here). Each row lists one pair of FASTQ files, separated by a comma. No header line. Make sure R1 and R2 are labeled in each paired FASTQ file name and that both files of a pair share the same prefix <STRING, optional. This option is intended for users with many FASTQ files; if it is not used, supply the FASTQ names with --fastq. If neither --samplesheet nor --fastq is provided, scasa looks for FASTQ files in the input directory given with --in. No default for this option>.
Usage: scasa --samplesheet [arguments]
Example: scasa --samplesheet /mnt/PROJECT/My_Samplesheet.csv
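For directories with many FASTQ files, the samplesheet described above (one R1,R2 pair per row, comma-separated, no header) can be generated rather than typed by hand. The sketch below rests on assumptions that are not guaranteed by scasa: file names end in .fastq, the R1 file name contains _R1, and its mate differs only by _R2 in the same position. The function name make_samplesheet is hypothetical.

```shell
# Hypothetical helper (not part of scasa): write one "R1,R2" row per
# FASTQ pair found in a directory. File names only, no header line.
make_samplesheet() {
  in_dir=$1
  out_csv=$2
  : > "$out_csv"                          # create or truncate the output
  for r1 in "$in_dir"/*_R1*.fastq; do
    [ -e "$r1" ] || continue              # glob matched nothing
    r1_name=$(basename "$r1")
    # Derive the mate name by swapping the first _R1 for _R2.
    r2_name=$(printf '%s' "$r1_name" | sed 's/_R1/_R2/')
    if [ -e "$in_dir/$r2_name" ]; then
      printf '%s,%s\n' "$r1_name" "$r2_name" >> "$out_csv"
    else
      printf 'warning: no R2 mate for %s\n' "$r1_name" >&2
    fi
  done
}

# Usage (paths are placeholders):
#   make_samplesheet /mnt/PROJECT/FASTQ /mnt/PROJECT/My_Samplesheet.csv
#   scasa --in /mnt/PROJECT/FASTQ --samplesheet /mnt/PROJECT/My_Samplesheet.csv
```

Unpaired R1 files are skipped with a warning so that every row of the sheet is a complete pair.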
Use --out
to provide an output directory for your analysis <STRING, optional. Spaces are not allowed in the directory path. Output is written to a new folder under the stated output directory, named after the user-provided project name with a timestamp suffix. Default is set to the current directory>.
Usage: scasa --out [arguments]
Example: scasa --out /mnt/PROJECT/SCASA_OUT/
Use --ref
to provide the reference fasta file path <STRING, required. Provide a fasta reference file; currently scasa supports hg38 with the prebuilt annotation. Users can download the UCSC hg38 reference fasta file here. No default for this option. However, users can run Scasa with other annotations and species by following the instructions here>.
Usage: scasa --ref [arguments]
Example: scasa --ref /mnt/PROJECT/refMrna.fa
Use --index
to run indexing for reference fasta file <STRING, optional, provide YES or NO to the option. Default is set to YES>.
Scasa uses Alevin or Kallisto-bustools to map reads to the reference transcript sequences and to extract the eqclasses. Currently, Scasa uses only transcript sequences for indexing; the decoy sequences recommended by Salmon are not used.
Usage: scasa --index [arguments]
Arguments available (Default is YES):
--index YES
--index NO
Use --index_dir
to provide a directory to reference fasta index file if --index
is set to NO <STRING, optional. Currently scasa supports only salmon or kallisto indexed fasta files. Spaces are not allowed in the directory path. No default>.
Usage: scasa --index_dir [arguments]
Example: scasa --index_dir /mnt/PROJECT/refMrna.fa.idx
Use --whitelist
to provide a whitelist file path for barcode correction <STRING, optional. Note that this option is required if --createxmatrix
is set to YES for Xmatrix generation. No space in the directory path is allowed, no default>. For more information on the white lists to be used for different versions, visit the following link to obtain the relevant whitelist from 10X Genomics: https://kb.10xgenomics.com/hc/en-us/articles/115004506263-What-is-a-barcode-whitelist
Usage: scasa --whitelist [arguments]
Example: scasa --whitelist /mnt/PROJECT/whitelist.txt
Use --tech
to provide the technology used for sequencing <STRING, optional. Currently, scasa only supports sequencing output from 10X 3' Chromium V1, V2 and V3 chemistries. Default is set to 10xv3>.
Usage: scasa --tech [arguments]
Arguments available (Default is 10xv3):
--tech 10xv1
--tech 10xv2
--tech 10xv3
Use --cellthreshold
to provide a threshold for number of expected cells to be produced <NUMERIC, optional, currently, this option is only valid for alevin alignment step, default is set to no expected cells>.
Usage: scasa --cellthreshold [arguments]
Example: scasa --cellthreshold 35000
Use --nthreads
to set the number of threads to be used for running scasa <NUMERIC, optional, set a higher number of threads for faster processing. Default is set to 4>.
Usage: scasa --nthreads [arguments]
Example: scasa --nthreads 16
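Rather than hard-coding a thread count, the value passed to --nthreads can be derived from the machine. The sketch below assumes a system where getconf reports _NPROCESSORS_ONLN (common on Linux and macOS) and otherwise falls back to the documented default of 4; detect_threads is a hypothetical helper, and the command is printed rather than executed.

```shell
# Hypothetical helper (not part of scasa): report the number of online
# CPUs, falling back to scasa's documented default of 4.
detect_threads() {
  n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=4
  case $n in
    ''|*[!0-9]*) n=4 ;;   # empty or non-numeric answer: use the default
  esac
  echo "$n"
}

echo "scasa --nthreads $(detect_threads)"
```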
Use --postalign_dir
to provide a directory of post-alignment files if alignment was done beforehand <STRING, optional. This option is only valid if --align
is set to NO. Spaces are not allowed in the directory path. No default. Currently only output from alevin or bustools is supported>.
Usage: scasa --postalign_dir [arguments]
Example: scasa --postalign_dir /mnt/PROJECT/ALIGNMENT_OUTPUT/
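Putting --align, --quant and --postalign_dir together, a quantification-only run on existing alevin output might look like the sketch below. It uses only options documented on this page, with placeholder paths borrowed from the examples; whether further options (for instance --index NO) are needed in this mode is not stated here, so treat the option set as an assumption. The run wrapper and the DRY_RUN variable are hypothetical conveniences that print the command by default instead of executing it.

```shell
# Print commands instead of running them unless DRY_RUN=0 is set, so the
# sketch can be inspected on a machine without scasa installed.
DRY_RUN=${DRY_RUN:-1}
run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Placeholder paths, taken from the examples on this page.
ref=/mnt/PROJECT/refMrna.fa
postalign=/mnt/PROJECT/ALIGNMENT_OUTPUT/

# Skip alignment and quantify from existing alevin output.
quant_only() {
  run scasa --align NO --quant YES \
    --postalign_dir "$postalign" \
    --ref "$ref" \
    --xmatrix alevin
}

quant_only
```

Setting DRY_RUN=0 in the environment makes the same script execute scasa instead of printing the command.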
Use --createxmatrix
to generate an X-matrix reference file <STRING, optional, provide YES or NO to the option. The X-matrix is a matrix containing the starting values for the EM algorithm. Prebuilt Xmatrix files ship with the scasa package, so users do not need to generate a reference matrix on their own. Currently scasa provides two Xmatrix reference files, one for alignment output from salmon alevin and one for alignment output from kallisto bus; choose between them with the option --xmatrix
. Default is set to NO>.
Usage: scasa --createxmatrix [arguments]
Arguments available (Default is NO):
--createxmatrix YES
--createxmatrix NO
Use --xmatrix
to provide a Xmatrix reference file if --createxmatrix
is set to NO <STRING, optional, two preset options: alevin or bustools. Give argument alevin to use Xmatrix for alevin alignment output or give bustools as argument to use Xmatrix for bustools alignment output data. If a directory is given, no space in directory path is allowed. If option is not set, default is set to use scasa prebuilt Xmatrix for alevin alignment output data>.
Usage: scasa --xmatrix [arguments]
Arguments available (Default is alevin):
--xmatrix alevin
--xmatrix bustools
Example 1: scasa --xmatrix alevin
Example 2: scasa --xmatrix /mnt/PROJECT/Self-Generated-Xmatrix-Using-Scasa.RData
An Example:
scasa --fastq Sample_01_S1_L001_R1_001.fastq,Sample_01_S1_L001_R2_001.fastq \
--ref <hg38_ref_file_path> \
--whitelist <test_dataset_whitelist_path> \
--nthreads 4
### Now, you are ready to go!