This is the opening page for the HTS workshop. Content is divided into teaching streams according to user experience and proficiency.
- Introduction to NeSI
- Quality filtering Illumina data
- Quality filtering Nanopore data
- Annotating sequences with BLAST
- Level 2 - Advanced
The work covered in this training programme is run through the New Zealand eScience Infrastructure (NeSI) platform. Workshop participants are expected to have set up an account with the correct project access prior to attending this workshop. Please contact the workshop organisers to arrange access to the project accounts.
If you are a beginner to this work, keep in mind that the glossary of terms and the slurm module guide will be helpful as we progress through the materials.
An introduction to the Unix shell for people working with genomics data. This material is adapted from the Data Carpentry Genomics Workshop. Please see http://www.datacarpentry.org/shell-genomics/ for the original version of this material.
A command line interface (the OS shell) and a graphical user interface (GUI) are different ways of interacting with a computer's operating system. The shell is a program that presents a command line interface, allowing you to control your computer with commands typed at a keyboard rather than by operating a GUI with a mouse and keyboard.
There are quite a few reasons to start learning about the shell:
- For most bioinformatics tools, you have to use the shell as there is no graphical interface.
- The shell gives you power to do your work more efficiently and more quickly.
- When you need to do things tens to hundreds of times, knowing how to use the shell is transformative.
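As a rough illustration of the last point, the sketch below applies one command to every FASTQ file in a directory. The `reads/` directory and the `count_reads` helper are hypothetical names chosen for this example, and the read count assumes standard four-line FASTQ records.

```shell
# Hypothetical helper: count the reads in a FASTQ file.
# Assumes each record occupies exactly four lines.
count_reads() {
    local lines
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
}

# Run the same command on every FASTQ file in a (hypothetical) reads/ directory.
for fq in reads/*.fastq; do
    [ -e "$fq" ] || continue   # skip the loop body if no files match the pattern
    echo "$fq: $(count_reads "$fq") reads"
done
```

The same loop works unchanged whether the directory holds two files or two hundred, which is where the shell saves the most time over a graphical tool.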
Many of the exercises covered in this training programme are obtained from, or inspired by, the Data Carpentry initiative, particularly their Genomics Workshop[^1].
This workshop provides a basic introduction to working with the slurm scheduling system, and begins working with Illumina MiSeq and Oxford Nanopore Technologies sequence data. The data used in this workshop consist mostly of simulated reads, produced using InSilicoSeq[^2] from the Mycoplasma bovis 8790 reference genome (NZ_LAUS01000004.1). We also make use of publicly available sequencing data from the studies PRJNA813586, PRJEB38441, and PRJEB38523.
Additional teaching materials were sourced from:
- Genomics Aotearoa Metagenomics Summer School workshop[^3].
- Long-Read, long reach Bioinformatics Tutorial[^4].
- Galaxy Training! sequence analysis resources[^5].
Footnotes
[^1]: Erin Alison Becker, Anita Schürch, Tracy Teal, Sheldon John McKay, Jessica Elizabeth Mizzi, François Michonneau, et al. (2019, June). datacarpentry/shell-genomics: Data Carpentry: Introduction to the shell for genomics data, June 2019 (Version v2019.06.1). Zenodo. http://doi.org/10.5281/zenodo.3260560.
[^2]: Hadrien Gourlé, Oskar Karlsson-Lindsjö, Juliette Hayer, Erik Bongcam-Rudloff (2019). Simulating Illumina metagenomic data with InSilicoSeq. Bioinformatics 35(3), 521-522.
[^3]: Jian Sheng Boey, Dinindu Senanayake, Michael Hoggard et al. (2022). Metagenomics Summer School. https://github.com/GenomicsAotearoa/metagenomics_summer_school.
[^4]: Tim Kahlke (2021). Long-Read Data Analysis. https://timkahlke.github.io/LongRead_tutorials/.
[^5]: Joachim Wolff, Bérénice Batut, Helena Rasche (2023). Sequence Analysis (revision 96e01807afff10d6060ac0691d004f0469676534). https://training.galaxyproject.org/training-material/topics/sequence-analysis/.