Enjoy!
A SNEAK PEEK INTO OUR LAB
The Pasilla gene 🧬 encodes proteins that are most similar to the human proteins Nova-1 and Nova-2, nuclear RNA-binding proteins that are normally expressed in the CNS and regulate splicing directly. There are numerous applications for RNA sequencing data with a reference genome, and there is no optimal pipeline for all cases. We will go over all the major steps in reference-based RNA-seq data analysis, such as quality control (FastQC, MultiQC, Cutadapt), read alignment and inspection (RNA STAR, IGV), gene and transcript quantification (HTSeq-count), differential gene expression (DESeq2), functional profiling (goseq) and advanced analysis. Using this tutorial, we performed a differential gene expression analysis and identified the genes and pathways regulated by the Pasilla gene, that is, those affected by its depletion. To learn more about our adventures 🤯 with the analysis process, don't hesitate to check the following hyperlinks. You will explore a lot about the field of transcriptomics in the next few minutes.
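To make the differential expression step a little more concrete, here is a minimal Python sketch of two ideas behind it: median-of-ratios size factors (the normalization approach described in the DESeq2 paper) and a plain log2 fold change between Pasilla-depleted and untreated samples. This is only an illustration under stated assumptions, not what the Galaxy DESeq2 tool runs internally: the function names, the pseudocount, and the toy counts are all hypothetical, and DESeq2 additionally estimates dispersions and computes test statistics and adjusted p-values.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts maps a gene name to a list of raw counts, one per sample.
    Genes with a zero count in any sample are skipped, because their
    geometric mean across samples is zero.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        if any(c == 0 for c in row):
            continue
        # Log of the geometric mean of this gene across all samples.
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # One factor per sample: the median ratio to the per-gene geometric mean.
    return [math.exp(median(r)) for r in log_ratios]


def log2_fold_change(counts, factors, treated, untreated, pseudocount=1.0):
    """log2(treated / untreated) on size-factor-normalized mean counts.

    treated and untreated are lists of sample indices. The pseudocount
    (a hypothetical choice here) avoids division by zero.
    """
    result = {}
    for gene, row in counts.items():
        norm = [c / f for c, f in zip(row, factors)]
        mean_t = sum(norm[j] for j in treated) / len(treated)
        mean_u = sum(norm[j] for j in untreated) / len(untreated)
        result[gene] = math.log2((mean_t + pseudocount) / (mean_u + pseudocount))
    return result


# Made-up counts: samples 0-1 untreated, samples 2-3 Pasilla-depleted.
toy_counts = {
    "gene_a": [100, 120, 110, 130],
    "gene_b": [50, 60, 200, 240],
    "gene_c": [300, 360, 310, 372],
    "gene_d": [80, 90, 20, 25],
}
factors = size_factors(toy_counts)
for gene, lfc in log2_fold_change(toy_counts, factors, [2, 3], [0, 1]).items():
    print(f"{gene}\t{lfc:+.2f}")
```

A positive value means the gene has higher normalized counts after Pasilla depletion, a negative value means lower. In the real workflow, DESeq2 reports these fold changes together with adjusted p-values, which is what we used to call a gene differentially expressed.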
To check out the results of our workflow in Galaxy, click on the links below:
Quality_Control | Mapping | Analysis_Visualisation_Functional Enrichment
Click on each image to see the detailed procedure for that analysis step.
Finally, after finishing our analysis, our team decided to start an initiative to help all the members learn from each other and develop the biggest set of skills 🧰 during this stage! Hence, we organized a training on Friday, with a workshop for each step in the workflow. The highly passionate members 👨‍🔬 👩‍🔬 who volunteered to give the workshops are highlighted in the contributions list. In these workshops, the moderators explained the purpose of each step in the tutorial and how it benefits the analysis, and highlighted some points for improvement. They also ran the analysis live so that the other members could follow their steps. At the end, there was a troubleshooting and Q&A session. Getting to the end of our work, are you excited to meet our team members?! 🥳🥳
Team Sub-groups | Specific Task | Contributors | Slack IDs |
---|---|---|---|
Quality Control | Sub-samples | Yasmeen & Eman | @Sam & @Eman |
Quality Control | Full datasets | Bandana, Jaspreet, Pankaj | @Bandana, @Jaspreet, @Pankaj |
Quality Control | Markdown Documentation | Jaspreet, Bandana, Eman | @Eman, @Bandana & @Jaspreet |
Mapping | Inspection of Mapping Results | Yasmeen, Bandana, Dawoud, Nirvana | @Sam, @Bandana, @Dawoud, @Nirvana |
Mapping | Counting the number of reads per annotated gene | Yasmeen, Saket, Johny | @Sam, @Saket, @Johny |
Mapping | Estimation of strandness | Eman, Saket, Johny | @Saket, @Johny, @Eman |
Mapping | Counting reads per gene | Ankita, Favour, Nirvana | @Anku., @Nirvana, @OYEFAVOUR |
Mapping | Markdown Documentation | Eman, Yasmeen, Dawoud, Johny | @Sam & @Eman, @Dawoud, @Johny |
Differential Gene Expression Analysis | Identification of the differentially expressed features | Rana, Utkarsha, Osama, Nikita, Chigozie, Yasmeen, Johny, Shruti | @RanaSalah, @-Utkarsha12-, @Osama, @Nikita2Chimera, @GozieNkwocha, @Sam, @Johny, @ShrutiG |
Differential Gene Expression Analysis | Extraction of annotation of differentially expressed genes | Utkarsha, Osama, Rana, Jaspreet, Nikita, Chigozie | @RanaSalah, @-Utkarsha12-, @Osama, @Nikita2Chimera, @GozieNkwocha, @Jaspreet |
Differential Gene Expression Analysis | Markdown Documentation | Rana | @RanaSalah |
Visualization of the DE genes' expression | Visualization of the normalized counts | Osama, Utkarsha, Rana, Ankita, TosinA, Dawoud | @RanaSalah, @-Utkarsha12-, @Osama, @Anku., @TosinA, @Dawoud |
Visualization of the DE genes' expression | Computation & Visualization of the Z-score | Saket, Utkarsha, Osama, Rana, Ankita, TosinA, Diyar | @Saket, @RanaSalah, @-Utkarsha12-, @Osama, @Anku., @TosinA, @diyar |
Visualization of the DE genes' expression | Markdown Documentation | Osama, Jaspreet, Utkarsha | @Osama, @Jaspreet, @-Utkarsha12- |
Functional enrichment analysis of the DE genes | Gene Ontology analysis | Amira, TosinA, Bandana, Johny, Shruti | @Amira, @TosinA, @Bandana, @Johny, @ShrutiG |
Functional enrichment analysis of the DE genes | KEGG pathways analysis | Amira, TosinA, Chigozie, Rana | @Amira, @TosinA, @GozieNkwocha, @RanaSalah |
GitHub Markdown Development | Format & Organization | Main ReadMe: Utkarsha, Osama, Rana, Ankita, TosinA, Bandana. Quality control: Jaspreet. Mapping: Saket, Yasmeen, Johny, Dawoud. Differential Gene Expression Analysis: Rana. Visualization: Osama, Jaspreet, Utkarsha. Functional Enrichment Analysis: TosinA | @RanaSalah, @-Utkarsha12-, @Osama, @Anku., @TosinA, @Bandana, @Jaspreet, @Sam, @Saket, @Johny, @Dawoud |
Graphical Abstract Design | | Rana, Osama, Jaspreet, Ankita, Diyar | @RanaSalah, @Osama, @Jaspreet, @Anku., @diyar |
Advertisement | Writing post on transfer-market | Tosin | @TosinA |
Training | Moderated the training workshops & presented the workflow steps practically | Quality control: Yasmeen & Jaspreet. Mapping: Saket, Yasmeen. Differential Gene Expression Analysis: Rana & Osama. Visualization: Osama. Functional Enrichment Analysis: Amira | @Sam & @Jaspreet, @Saket, @RanaSalah, @Osama, @Amira |
Trapnell, C., L. Pachter, and S. L. Salzberg, 2009 TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25: 1105–1111. https://academic.oup.com/bioinformatics/article/25/9/1105/203994
Bérénice Batut, Mallory Freeberg, Mo Heydarian, Anika Erxleben, Pavankumar Videm, Clemens Blank, Maria Doyle, Nicola Soranzo, Peter van Heusden, 2021 Reference-based RNA-Seq data analysis (Galaxy Training Materials). https://training.galaxyproject.org/training-material/topics/transcriptomics/tutorials/ref-based/tutorial.html Online; accessed Sat Aug 21 2021
Batut et al., 2018 Community-Driven Data Analysis Training for Biology. Cell Systems. 10.1016/j.cels.2018.05.012
Levin, J. Z., M. Yassour, X. Adiconis, C. Nusbaum, D. A. Thompson et al., 2010 Comprehensive comparative analysis of strand-specific RNA sequencing methods. Nature Methods 7: 709. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3005310/
Young, M. D., M. J. Wakefield, G. K. Smyth, and A. Oshlack, 2010 Gene ontology analysis for RNA-seq: accounting for selection bias. Genome Biology 11: R14. https://genomebiology.biomedcentral.com/articles/10.1186/gb-2010-11-2-r14
Martin, M., 2011 Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal 17: http://journal.embnet.org/index.php/embnetjournal/article/view/200
Brooks, A. N., L. Yang, M. O. Duff, K. D. Hansen, J. W. Park et al., 2011 Conservation of an RNA regulatory map between Drosophila and mammals. Genome Research 21: 193–202. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3032923/
Robinson, J. T., H. Thorvaldsdóttir, W. Winckler, M. Guttman, E. S. Lander et al., 2011 Integrative genomics viewer. Nature Biotechnology 29: 24. https://www.nature.com/nbt/journal/v29/n1/abs/nbt.1754.html
Wang, L., S. Wang, and W. Li, 2012 RSeQC: quality control of RNA-seq experiments. Bioinformatics 28: 2184–2185. https://www.ncbi.nlm.nih.gov/pubmed/22743226
Dobin, A., C. A. Davis, F. Schlesinger, J. Drenkow, C. Zaleski et al., 2013 STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29: 15–21. https://academic.oup.com/bioinformatics/article/29/1/15/272537
Liao, Y., G. K. Smyth, and W. Shi, 2013 featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30: 923–930. https://academic.oup.com/bioinformatics/article/31/2/166/2366196
Kim, D., G. Pertea, C. Trapnell, H. Pimentel, R. Kelley et al., 2013 TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biology 14: R36. https://genomebiology.biomedcentral.com/articles/10.1186/gb-2013-14-4-r36
Luo, W., and C. Brouwer, 2013 Pathview: an R/Bioconductor package for pathway-based data integration and visualization. Bioinformatics 29: 1830–1831. https://academic.oup.com/bioinformatics/article-abstract/29/14/1830/232698
Love, M. I., W. Huber, and S. Anders, 2014 Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology 15: 550. https://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0550-8
Kim, D., B. Langmead, and S. L. Salzberg, 2015 HISAT: a fast spliced aligner with low memory requirements. Nature Methods 12: 357. https://www.nature.com/articles/nmeth.3317
Anders, S., P. T. Pyl, and W. Huber, 2015 HTSeq - a Python framework to work with high-throughput sequencing data. Bioinformatics 31: 166–169. https://academic.oup.com/bioinformatics/article/31/2/166/2366196
Ewels, P., M. Magnusson, S. Lundin, and M. Käller, 2016 MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32: 3047–3048. https://academic.oup.com/bioinformatics/article/32/19/3047/2196507
Thurmond, J., J. L. Goodman, V. B. Strelets, H. Attrill, L. S. Gramates et al., 2018 FlyBase 2.0: the next generation. Nucleic Acids Research 47: D759–D765. https://academic.oup.com/nar/article-abstract/47/D1/D759/5144957
Kim, D., J. M. Paggi, C. Park, C. Bennett, and S. L. Salzberg, 2019 Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nature Biotechnology 37: 907–915. https://www.nature.com/articles/s41587-019-0201-4
Used usegalaxy.org: "The sequencing data were uploaded to the Galaxy web platform, and we used the public server at usegalaxy.org to analyze the data (Brooks et al. 2011)."