This repository provides data, sequences, and analyses of the metagenomic detection of mammalian wildlife at the Huanan Market in early 2020.
Article sections:
- (R1): SARS-CoV-2 genetic diversity linked to the Huanan market is consistent with market emergence
- (R2): Increased High SARS-CoV-2 positivity rate in and near a wildlife stall in the Huanan market
- (R3): Mammalian wildlife species detected in five SARS-CoV-2 positive samples from a wildlife stall and in other wildlife stalls
- (R4): Wildlife stalls and SARS-CoV-2 positive samples contain other mammalian viruses associated with the animal trade
- (R5): Reconstruction of mitochondrial genotypes of potential intermediate host species of SARS-CoV-2 within the Huanan market
metadata/
: sample and species metadata (R)mitochondrial_mappings/
: Read mapping counts and covered bases (breadth) for animal mtDNA. (R3, R5)mitochondrial_genotypes/
: Animal mtDNA consensus genotypes and SNV VCFs. (R3, R5)sars2_mappings/
: identification and quantification of SARS-CoV-2 reads. (R1, R2)sars2_phylogenetics/
: SARS-CoV-2 phylogeny. (R1)viruses
: identification and quantification of other viruses in the market. (R4)correlations/
: computations of correlations between viruses and animals (R3)fastq_processing/
: code for sample (pre)processing (R)figs/
: directory where figure outputs are stored (R)figures_R/
: R code to generate figures (R)summary_and_tables/
: Supplementary tables (R)
- Mammalian mtDNA consensus genomes:
./mitochondrial_genotypes/lowconfidence_50_percent_mt_genomes.fasta
andmediumconfidence_50_percent_mt_genomes.fasta
. - Script to generate mtDNA consensus genomes:
./mitochondrial_genotypes/mt_consensus_genomes.py
. - BAMs and SNVs of animal mtDNA mappings:
./mitochondrial_genotypes/bams_and_snvs/
. - mtDNA dereplicated databases:
./mitochondrial_mappings/mitochondria_dereplicated93_final.fasta.gz
and./mitochondrial_mappings/mitochondria_dereplicated98.fasta.gz
. - Scripts to count mapped reads and covered bases:
./mitochondrial_mappings/count_reads98.py
and./mitochondrial_mappings/count_reads93.py
. - Mapping mtDNA counts and covered bases:
./mitochondrial_mappings/mitochondrial_metazoa_counts_93.tsv
and./mitochondrial_mappings/mitochondrial_metazoa_coveredbases_93.tsv
.
- Viral sequences (bamboo rat CoV, raccoon dog amdovirus, civet kobuvirus, canine coronaviruses):
./viruses/market_mammalian_viruses.fna
- Influenza H9N2 fragmented genomes reconstructed from the market and trees:
./viruses/Influenza/*.fasta
and./viruses/Influenza/HA_tree.newick
.
./metadata/Liu_etal_2023_with_sequencing.csv
: Full metadata of all samples, linking samples to their primary metagenomic (untargeted) sequencing runs.
./metadata/Sequencing_run_info.tsv
: Information on the 184 sequencing runs in Liu et al.
./metadata/Liu_PCR_results.csv
: PCR CT values reported by Liu et al.
sars2_phylogenetics/20210903.WHO.masked.fasta
: Sequences used in Fig 1A tree.sars2_phylogenetics/HSM_sequences/*.fa
: consensus FASTA sequences for HSM SARS-CoV-2 genomes.sars2_phylogenetics/combined_tMRCA.csv
: tMRCA results (Fig 1C)R/Fig1C.R
: Script to plot tMRCA distributions (Fig 1C)
sars2_mappings/count_reads_sars2.py
: Script used to count the number of SARS-CoV-2 reads per sample.sars2_mappings/sars2_reads_post_trimming.tsv
: counts of SARS-CoV-2 reads in each sample.R/Fig2ABC.R
andR/Fig2DEFGH.R
: Scripts to plot the different panels of Figure 2.
mitochondrial_mappings/clustering98.py
: Script for first clustering the MT reference database at 98% identity.mitochondrial_mappings/clustering93.py
: Script for clustering the MT reference database at 93% identity.mitochondrial_mappings/count_reads98.py
: Script for counting mtDNA reads after mapping to the 98% clustered database.mitochondrial_mappings/count_reads93.py
: Script for counting mtDNA reads after mapping to the 93% clustered database (final species-level assignments).mitochondrial_mappings/mitochondria_dereplicated93_final.fasta.gz
: the final 93% ANI dereplicated database used in this study.mitochondrial_mappings/mitochondrial_metazoa_counts_93.tsv
: Read counts for animal mtDNA.mitochondrial_mappings/mitochondrial_metazoa_coveredbases_93.tsv
: Covered bases count for animal mtDNAmitochondrial_mappings/species_descriptions_with_common_name.csv
: Descriptions of species detected in this study.R/Fig3A.R
andR/FigBCDE.R
: Scripts to plot the different panels of Figure 3.
viruses/Influenza/*.fasta
: consensus Influenza H9N2 sequences identified in this study.viruses/Influenza/HA_tree.newick
: Influenza H9N2 HA tree constructed.viruses/count_reads_viral.py
: Script for filtering and counting reads mapped to the viral database.viruses/filtered_viral_counts_97_95_20_200.tsv
: counts of mapped viral reads.viruses/market_mammalian_viruses.fna
: FASTA files of the consensus genomes assembled for mammalian viruses from the market.viruses/plant_insect_phage_exclude.txt
: List of viruses to ignore due to being phage / plant / insect viruses.viruses/viral_metadata.tsv
: Metadata on various viruses.viruses/cluster_viral_references.py
: Script to cluster the viral reference database.viruses/get_viral_consensus.py
: Script to reconstruct viral consensus genomes from multiple samples.viruses/viral_db.list.txt
: a list of all viral sequences included in the dereplicated mapping database.R/Fig4ABC.R
: Script to plot panels A-C of Figure 4.
mitochondrial_genotypes/bams_and_snvs
: All mapped mammalian mtDNA BAMs from market samples.mitochondrial_genotypes/MT_Maps.ipynb
: Script to create Figure 5a mtDNA genome maps.mitochondrial_genotypes/mt_consensus_genomes.py
: Script to reconstruct mtDNA consensus genomes.mitochondrial_genotypes/lowconfidence_50_percent_mt_genomes.fasta
: MT genomes obtained in this study, requiring just 1x read coverage.mitochondrial_genotypes/mediumconfidence_50_percent_mt_genomes.fasta
: MT genomes obtained in this study, requiring 3x read coverage to call a base.mitochondrial_genotypes/mtDNA_RD_phylogenetics.clean.ipynb
: Raccoon dog phylogenetics analysis notebook.mitochondrial_genotypes/raccoon_dog_cytB_all.duplicated.fasta
: Raccon dog reference cytB alignment.
mitochondrial_mappings/rRNA/count_reads93_rRNAs.py
: Script to count reads excluding rRNA regions of the mitochondria.mitochondrial_mappings/rRNA/mt93_counts_rrna.tsv
: Read counts ignoring rRNA regions.mitochondrial_mappings/rRNA/rRNA_analysis.ipynb
: Script to identify rRNA regions.mitochondrial_mappings/rRNA/rrna_positions.tsv
: positions of rRNA genes in the MT genomes.
correlations/Correlation.ipynb
: Script to calculate correlations between animal sequence read abundances and SARS-CoV-2.
HeCell2022/
: location of these files
He2022_metadata.csv
: Metadata about all of the files from that studycell2022_lowconfidence_mitochondria.fasta
: FASTA file of consensus mitochondrial genomes with at least 1 read covering each position.cell2022_mediumconfidence_mitochondria.fasta
: FASTA file of consensus mitochondrial genomes with at least 3 reads covering each position.scripts/mt_consensus_genomes_He2022.py
: script used to generate Mitochondrial genomes abovegame_animals.xlsx
: Additional metadata from He et al. 2022.
fastq_processing/run_bbduk.sh
: BBDuk commands used for quality filtering.fastq_processing/frequent_adapter_summary.tsv
- Adapter and quality filtering summary per file.data/fastq_processing/
: data related to FASTQ filesall_adapters.fa
: list of adapters used for trimmingfrequent_adapter_summary.tsv
: results of adapter trimming, 5 most common adapters per library.run_bbduk.sh
: List of BBDuk commands run for each file for trimming.bbduk_results/*.txt
: Full adapter trimming output for each file.