MASCARA (Multiomic Analysis of Single Cell ATAC-seq and RNA-seq)

A simple method for identifying transcription factor mediated regulatory networks from scRNA and scATAC data

Team Members

Goals and Aims

The advent of single-cell sequencing technologies has enabled the identification and characterization of rare cell types. Identifying the key transcription factors and their downstream target genes is important for understanding the biology of these rare populations. The goal of this project is to develop a workflow that identifies and ranks the transcriptional regulators that are important in the cell states identified by single-cell sequencing. Combining scRNA-seq and scATAC-seq increases our power to identify biologically meaningful gene regulatory networks.

The aim is to take a dataset that includes both single-cell RNA-seq and ATAC-seq data, identify cell type clusters in each, and then integrate the two in order to find cell type regulators at different levels. We take this opportunity to standardize and automate parts of this pipeline, especially the integration between data types, both for future use and to allow results to flow directly into other levels of analysis. Additionally, we hope to provide the user with information about the genes in the identified networks, to inform conclusions and decisions about future hypotheses and experiments.

Dependencies

The workflow is designed to be containerized, so that all packages and dependencies are included in the Docker image. Only Snakemake and Singularity are needed to run the pipeline, plus a few R packages to visualize the output from MASCARA.

Pipeline

Singularity - a container platform.

Snakemake - a workflow management tool.

Visualization

R - A software environment for statistical computing and graphics

Install shiny, networkD3, and dplyr within R

install.packages(c("shiny", "networkD3", "dplyr"))

Workflow


Deliverables

A workflow that outputs a table containing a ranked, transcription factor mediated network, together with an easy-to-use interactive platform for visualizing the gene regulatory networks.

Installation

Clone this repository using

git clone https://github.com/NCBI-Codeathons/MASCARA.git

Input Files

The pipeline requires as input:

scRNA-seq result - a SingleCellExperiment object as .Rds file
scATAC-seq result - a SingleCellExperiment object as .Rds file
transcriptome - a GTF file
chrom.sizes file
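
For reference, a chrom.sizes file is conventionally a two-column, tab-separated table giving each sequence name and its length in base pairs. The sketch below is not part of MASCARA. It is a small pre-flight check, with the hypothetical function name `read_chrom_sizes`, that could catch a malformed file before the pipeline starts.

```python
import io


def read_chrom_sizes(handle):
    """Parse a chrom.sizes stream into {sequence_name: length_in_bp}."""
    sizes = {}
    for line_no, line in enumerate(handle, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # tolerate blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(
                f"line {line_no}: expected 2 tab-separated fields, got {len(fields)}"
            )
        name, length = fields
        if not length.isdigit():
            raise ValueError(f"line {line_no}: length is not a non-negative integer")
        sizes[name] = int(length)
    return sizes


# Two illustrative rows in the hg19 naming style.
example = io.StringIO("chr1\t249250621\nchrM\t16571\n")
sizes = read_chrom_sizes(example)
print(sizes["chr1"])
```

For a real file, pass an open file handle instead of the in-memory example.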

Output

network.tsv - a tab-delimited file containing the cluster-specific transcription factors and their downstream target genes. Column IDs are Celltype, TF (transcription factor), TG (target gene), weight (interaction strength on a scale from -1 to 1), hgnc symbol, ensembl gene id, entrez gene id, gene description, chromosome, start, stop, and strand.
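
As a rough illustration of how this table might be consumed outside the Shiny app, the sketch below reads the file and ranks transcription factors within one cell type. It assumes the header row uses exactly the names `Celltype`, `TF`, `TG`, and `weight`. The scoring rule (summed absolute weight per TF) is an illustrative choice, not MASCARA's own ranking, and `top_regulators` is a hypothetical helper name.

```python
import csv
import io
from collections import defaultdict


def top_regulators(handle, celltype, n=5):
    """Rank TFs in one cell type by the summed absolute weight of their edges."""
    totals = defaultdict(float)
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["Celltype"] == celltype:
            totals[row["TF"]] += abs(float(row["weight"]))
    # Sort by descending score; break ties alphabetically for stable output.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Toy rows with made-up labels, only to show the assumed shape of the table.
toy = io.StringIO(
    "Celltype\tTF\tTG\tweight\n"
    "clusterA\tTF1\tGENE1\t0.5\n"
    "clusterA\tTF1\tGENE2\t-0.25\n"
    "clusterA\tTF2\tGENE3\t0.5\n"
    "clusterB\tTF2\tGENE1\t1.0\n"
)
ranked = top_regulators(toy, "clusterA")
print(ranked)
```

With real pipeline output, the in-memory example would be replaced by `open("data/output/network.tsv", newline="")`, after checking the actual header names.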

Tutorial

1(a): Running Example Data

The full example data can be downloaded by navigating to the data/ folder and running

snakemake

This will download two .Rds files, representing the scRNA-seq and scATAC-seq SingleCellExperiment objects from Granja et al., 2019, as well as GTF and chrom.sizes files for hg19.

Update 11/8/20: Sample data are now available for PBMC same-cell single-cell RNA-seq and ATAC-seq, which have been verified using Seurat pipelines:

ATAC: http://cf.10xgenomics.com/samples/cell-atac/1.0.1/atac_v1_pbmc_10k/atac_v1_pbmc_10k_filtered_peak_bc_matrix.h5

ATAC Meta Data: http://cf.10xgenomics.com/samples/cell-atac/1.0.1/atac_v1_pbmc_10k/atac_v1_pbmc_10k_singlecell.csv

RNA: http://cf.10xgenomics.com/samples/cell-exp/3.0.0/pbmc_10k_v3/pbmc_10k_v3_filtered_feature_bc_matrix.h5

The main pipeline is preconfigured (see config.yaml) to use these downloaded files. The full pipeline can then be run by navigating to the root of the repository (MASCARA/) and running

snakemake --use-singularity

1(b): Running Other Data

To run user-supplied data, edit the config.yaml file to specify the locations of the input files (scRNA-seq .Rds, scATAC-seq .Rds, transcriptome GTF, and chrom.sizes). Be sure to also change the genome: value to match the genome that was used in the alignments.
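
Before launching a long run, it can help to confirm that every configured input actually exists. The sketch below is one hedged way to do that on a configuration already loaded into a Python dict. The key names `rna_rds`, `atac_rds`, `gtf`, and `chrom_sizes` are hypothetical placeholders: only `genome` is named in this README, so the real keys must be taken from config.yaml.

```python
import os
import tempfile

# Hypothetical key names; replace them with the keys actually used in config.yaml.
REQUIRED_PATH_KEYS = ("rna_rds", "atac_rds", "gtf", "chrom_sizes")


def check_config(config):
    """Return a list of human-readable problems found in a config mapping."""
    problems = []
    for key in REQUIRED_PATH_KEYS:
        path = config.get(key)
        if not path:
            problems.append(f"missing entry: {key}")
        elif not os.path.isfile(path):
            problems.append(f"{key}: file not found: {path}")
    if not config.get("genome"):
        problems.append("missing entry: genome")
    return problems


# Demonstration: a temporary file stands in for one real input.
with tempfile.NamedTemporaryFile(suffix=".gtf") as tmp:
    demo = {"gtf": tmp.name, "genome": "hg19"}
    for problem in check_config(demo):
        print(problem)
```

An empty list means every expected entry is present and points at an existing file. The check says nothing about whether the file contents are valid.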

From the root of the repository (MASCARA/), run the pipeline using

snakemake --use-singularity

2: Visualization with Shiny

Once the pipeline has finished running, there will be a final output file data/output/network.tsv. These results can be visualized and explored interactively in a Shiny app by running the following from the command line

Rscript shinyapp/app.R data/output/network.tsv

This will automatically open the app in the default web browser.
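
The app relies on networkD3, whose force-directed network function takes a node table and a link table in which source and target are zero-based indices into the node table. The sketch below illustrates only that data shape, in Python, and is not the code in shinyapp/app.R. The function name `to_nodes_and_links` and the dictionary keys are illustrative assumptions.

```python
def to_nodes_and_links(edges):
    """Convert (tf, target, weight) triples into node and link tables.

    Links refer to nodes by zero-based position in the node list.
    A gene seen as both TF and target keeps the group of its first appearance.
    """
    index = {}
    nodes = []
    links = []
    for tf, target, weight in edges:
        for name, group in ((tf, "TF"), (target, "TG")):
            if name not in index:
                index[name] = len(nodes)
                nodes.append({"name": name, "group": group})
        links.append(
            {"source": index[tf], "target": index[target], "value": abs(weight)}
        )
    return nodes, links


# Toy edges with made-up labels.
edges = [("TF1", "GENE1", 0.5), ("TF1", "GENE2", -0.25), ("TF2", "GENE1", 1.0)]
nodes, links = to_nodes_and_links(edges)
print([n["name"] for n in nodes])
print(links[-1])
```

Using the absolute weight for link width drops the sign of the interaction, so a real visualization would likely encode the sign separately, for example as link color.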

Example Output

11/8/20 - Added information for the genes contained within the network, including Ensembl gene ID, gene description, and genomic location

Future Directions

In an upcoming update, we will add more information to the network visualization, including hover-over information for the genes and motif/tissue-specificity information for the transcription factors.

Longer-term goals include integrating pseudotime analysis to understand how regulatory networks change at different time points during cell type differentiation and/or disease progression. Incorporating trajectory inference may help to better characterize the evolution of, and divergence between, cell clusters.

Citations

Data used in tutorial:

Granja, J.M., Klemm, S., McGinnis, L.M. et al. Single-cell multiomic analysis identifies regulatory programs in mixed-phenotype acute leukemia. Nat Biotechnol 37, 1458–1465 (2019) doi:10.1038/s41587-019-0332-7

Packages/Applications

Docker - Dirk Merkel. 2014. Docker: lightweight Linux containers for consistent development and deployment. Linux J. 2014, 239, Article 2 (March 2014).
Cicero - Pliner, H. A., Packer, J. S., McFaline-Figueroa, J. L., Cusanovich, D. A., Daza, R. M., Aghamirzaie, D., … Trapnell, C. (2018). Cicero Predicts cis-Regulatory DNA Interactions from Single-Cell Chromatin Accessibility Data. Molecular Cell, 71(5), 858–871.e8. doi:10.1016/j.molcel.2018.06.044
chromVAR - Schep, A., Wu, B., Buenrostro, J. et al. chromVAR: inferring transcription-factor-associated accessibility from single-cell epigenomic data. Nat Methods 14, 975–978 (2017) doi:10.1038/nmeth.4401
ACTIONet - Mohammadi, S., Davila-Velderrain, J., Kellis, M. (2019) A multiresolution framework to characterize single-cell state landscapes. bioRxiv 746339; doi:10.1101/746339
Biomart - Durinck S, Spellman P, Birney E, Huber W (2009). “Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt.” Nature Protocols, 4, 1184–1191.
