Reproducing: Using Next Generation Sequencing to Study the Genetic Diversity of Candidate Live Attenuated Zika Vaccines

This reproduction was carried out as a project in the Applied Bioinformatics course at Chalmers University of Technology.

Study

The study reproduced is “Using Next Generation Sequencing to Study the Genetic Diversity of Candidate Live Attenuated Zika Vaccines” (DOI: https://doi.org/10.3390/vaccines8020161) by Natalie D Collins, Chao Shan, Bruno T D Nunes, Steven G Widen, Pei-Yong Shi, Alan D T Barrett, Vanessa V Sarathy, and is accessible here: https://www.mdpi.com/2076-393X/8/2/161/htm.

Data

The data were obtained from ArrayExpress (accession E-MTAB-8905, https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-8905/samples/?query=zika).

Repository contents

This file is a shorter version of the report found in the docs directory. See the provided project pipeline .png for a simplified overview of the data processing pipeline.

This repository provides all the scripts used to reproduce the original study. They must be run in the following order:

data_acquisition.Rmd: This notebook downloads and unzips the data files from ArrayExpress. It also prepares a test data set which is used by the following scripts for faster run times.
simplifying_metadata.Rmd: This optional notebook simplifies the metadata file generated by data_acquisition.Rmd by removing the strains or passages that are not wanted for the study. Use it to restrict the data set to one or several strains/passages. It generates a .txt file that delete_files_from_list.Rmd uses to delete the undesired files.
delete_files_from_list.Rmd: This notebook deletes files that were not used in the project. This includes the raw data files for the 3'_20 and 3'_30 deletion mutants.
study_pipeline.Rmd: This notebook executes the pipeline described in the “Material and Methods” section of the report. It is written for a test data set and can be run from start (QC preprocessing) to finish (Shannon entropy calculation). The test data set is generated by data_acquisition.Rmd and consists of a paired-end read set for WTZIKV P0. The notebook must be run with the data directory as its working directory.
result_vizualisation.Rmd: This notebook analyzes and visualizes the results produced by study_pipeline.Rmd. The generated figures are shown in the “Results” section of the report.
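
The pipeline ends with a Shannon entropy calculation. To give a rough idea of what that step measures, here is a minimal sketch in Python (not the repository's R Markdown code). It assumes entropy is computed per genome position from the frequencies of the bases observed there, using the natural logarithm. The function names and the toy reads are hypothetical and are not taken from the notebooks or the study data; the report in the docs directory describes the method actually used.

```python
import math
from collections import Counter


def shannon_entropy(counts):
    """Shannon entropy (natural log) of one position, given its base counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for n in counts.values():
        if n > 0:
            p = n / total  # frequency of this base at the position
            entropy -= p * math.log(p)
    return entropy


def per_position_entropy(reads):
    """Entropy at each column of equal-length, already-aligned read strings."""
    return [shannon_entropy(Counter(column)) for column in zip(*reads)]


# Toy example with hypothetical aligned reads (not data from the study).
toy_reads = ["ACGT", "ACGA", "ACCA", "ATCA"]
entropies = per_position_entropy(toy_reads)
```

A position where every read carries the same base has entropy 0, and the value grows as the bases at that position become more evenly mixed, which is why it serves as a per-site diversity measure.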

StyrbjornKall/Bioinformatics_NGS_Zika_virus

Folders and files

Latest commit

History

Repository files navigation

Reproducing: Using Next Generation Sequencing to Study the Genetic Diversity of Candidate Live Attenuated Zika Vaccines

Study

Data

Repository contents

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages