Structure of the EAR pdf
Below, every section of the PDF report is described in detail, using a deliberately fabricated example that triggers every possible alert: a Hi-C-phased genome assembly with two (pseudo)haplotypes.
A. Header section. The version of the script used to generate the document is shown. Currently, only one tag is in use. Valid tags are: ERGA-BGE, ERGA-Pilot, and ERGA-Satellite.
B. Table showing species information. Class and Order are retrieved from GoaT based on the species name provided manually in the YAML file.
C. Summary table of genome traits showing expected and observed values. The Expected haploid size is obtained from the GenomeScope2 result, while the Observed value is the total assembly size (if more than one haplotype is available, the value corresponds to the largest Total bp). The Expected haploid number is obtained from GoaT together with its source, while the Observed value is provided manually in the YAML file. The Expected ploidy is obtained from GoaT together with its source, while the Observed value is derived from the Smudgeplot result (or from GenomeScope2 if the Smudgeplot result is unavailable). The Expected sample sex is the one recorded for the sample, while the Observed sex is the one determined during curation; both are provided manually in the YAML file.
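The two derived "Observed" values in section C can be sketched as follows. This is a minimal illustration, not the EAR script's own code: the function names and the dictionary of per-haplotype totals are hypothetical, and it only encodes the rules stated above (largest Total bp across haplotypes; Smudgeplot ploidy first, GenomeScope2 as the fallback).

```python
from typing import Optional


def observed_haploid_size(total_bp_per_haplotype: dict[str, int]) -> int:
    """Return the largest 'Total bp' among the available haplotypes."""
    if not total_bp_per_haplotype:
        raise ValueError("at least one haplotype is required")
    return max(total_bp_per_haplotype.values())


def observed_ploidy(smudgeplot: Optional[int], genomescope2: Optional[int]) -> Optional[int]:
    """Prefer the Smudgeplot ploidy; fall back to GenomeScope2 when it is missing."""
    if smudgeplot is not None:
        return smudgeplot
    return genomescope2


# Illustrative values only (hypothetical haplotype names and sizes).
sizes = {"hap1": 1_210_000_000, "hap2": 1_195_000_000}
print(observed_haploid_size(sizes))  # 1210000000
print(observed_ploidy(None, 2))      # 2
```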
D. EBP quality metric calculated for each available haplotype.
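As an illustration of what such a metric string can look like, the sketch below assumes the EBP "x.y.Qz" notation (as in the "6.C.Q40" reference standard): x is floor(log10(contig N50)), y is floor(log10(scaffold N50)) or "C" for chromosome-level scaffolds, and z is the floored consensus QV. Both the notation details and the function name are assumptions; the EAR script may compute or round the metric differently.

```python
import math


def ebp_quality_metric(contig_n50: int, scaffold_n50: int, qv: float,
                       chromosome_level: bool = False) -> str:
    """Build an 'x.y.Qz' string (assumed EBP notation, see lead-in)."""
    if contig_n50 <= 0 or scaffold_n50 <= 0:
        raise ValueError("N50 values must be positive")
    # For a positive integer n, floor(log10(n)) equals its digit count minus one;
    # counting digits avoids floating-point rounding at exact powers of 10.
    contig_part = len(str(int(contig_n50))) - 1
    scaffold_part = "C" if chromosome_level else str(len(str(int(scaffold_n50))) - 1)
    return f"{contig_part}.{scaffold_part}.Q{math.floor(qv)}"


print(ebp_quality_metric(2_500_000, 45_000_000, 52.3))                         # 6.7.Q52
print(ebp_quality_metric(2_500_000, 45_000_000, 52.3, chromosome_level=True))  # 6.C.Q52
```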
E. Automatic warnings based on the available assembly information and triggered by the EBP-recommended thresholds. Most warnings are self-explanatory. Assembly length loss is calculated by comparing the total length of the pre- and post-curation assemblies. The flag for having 90% of the assembly in chromosomes is triggered when the scaffold L90 is greater than the Observed haploid number.
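The two computed checks in section E can be sketched as below. This is a hedged illustration with hypothetical function names: it expresses the length loss as a percentage of the pre-curation total, and it does not encode any EBP threshold, since the exact cut-offs used by the report are not stated here.

```python
def assembly_length_loss_pct(pre_total_bp: int, post_total_bp: int) -> float:
    """Percentage of the pre-curation length that is missing after curation."""
    if pre_total_bp <= 0:
        raise ValueError("pre-curation length must be positive")
    return 100.0 * (pre_total_bp - post_total_bp) / pre_total_bp


def chromosome_l90_flag(scaffold_l90: int, observed_haploid_number: int) -> bool:
    """True when more scaffolds than chromosomes are needed to cover 90% of the assembly."""
    # Scaffold L90 = the smallest number of scaffolds whose lengths sum to >= 90% of the total.
    return scaffold_l90 > observed_haploid_number


print(assembly_length_loss_pct(1_000_000_000, 990_000_000))              # 1.0
print(chromosome_l90_flag(scaffold_l90=25, observed_haploid_number=20))  # True
```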
F. Curator notes are provided manually in the YAML file.
G. Quality metrics table for the pre- and post-curation assemblies. The number of columns depends on the number of haplotypes.
H. BUSCO version and database used in the analysis of all the assemblies. If the BUSCO version and/or database differ between assemblies, a warning is shown instead.
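The consistency rule in section H amounts to checking that every assembly shares a single (version, database) pair. A minimal sketch, assuming a hypothetical `BuscoRun` record per assembly; the output wording is illustrative and not the report's actual text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen => hashable, so runs can be collected in a set
class BuscoRun:
    version: str
    lineage: str


def busco_summary(runs: dict[str, BuscoRun]) -> str:
    """Report the shared BUSCO setup, or a warning if assemblies disagree."""
    if not runs:
        raise ValueError("no BUSCO runs provided")
    distinct = set(runs.values())
    if len(distinct) == 1:
        run = next(iter(distinct))
        return f"BUSCO {run.version} ({run.lineage})"
    return "Warning: BUSCO version and/or database differ between assemblies"


# Illustrative inputs only.
same = {"hap1": BuscoRun("5.4.3", "vertebrata_odb10"),
        "hap2": BuscoRun("5.4.3", "vertebrata_odb10")}
print(busco_summary(same))  # BUSCO 5.4.3 (vertebrata_odb10)
```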
I. Hi-C contact map of the curated assembly. If there is more than one curated haplotype, one PNG contact map is shown per haplotype. The hyperlink to the .pretext or .mcool file provided in the YAML file is accessible behind "[LINK]".
J. K-mer spectra section for the curated assembly, showing plots generated by Merqury. The type and number of plots depend on the number of haplotypes.
K. Post-curation contamination screening section. If there is more than one curated haplotype, one PNG blobplot is shown per haplotype.
L. Table showing data coverage based on the values manually provided in the YAML file.
M. Tree-like diagram of the assembly pipeline based on the values (name of the tool, version and important parameters to consider) manually provided in the YAML file.
N. Tree-like diagram of the curation pipeline based on the values (name of the tool, version and important parameters to consider) manually provided in the YAML file.
O. End section of the report. Submitter name and affiliation are manually provided in the YAML file. Date and time of the document's creation are shown.