About the quality standard

Diego De Panis edited this page Feb 18, 2024 · 12 revisions

As the European branch of the Earth BioGenome Project (EBP), ERGA follows the EBP genome assembly standards.

In brief, the following metrics and criteria are considered:

Contiguity

  • Contig N50
  • Scaffold N50

Correctness

  • Quality Value (QV)

Completeness

  • Presence of single-copy genes (BUSCO)
  • K-mer completeness

Other criteria

  • % Sequence assigned to candidate chromosomal sequences
  • Gaps/Gbp
  • Sex chromosome identification

Contiguity and Correctness metrics can be summarised with the EBP quality format: log10(Cont_N50).log10(Scaf_N50).Q(QV_value)

log10(10,000)=4, log10(100,000)=5, log10(1,000,000)=6, log10(10,000,000)=7 ...

If an N50 is at "chromosomal scale", the corresponding log10 value is replaced by a C (e.g., 6.C.Q40).
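The format above can be sketched in a few lines of Python. This is only an illustration, not an official ERGA or EBP tool: the function name `ebp_quality_code` and its parameters are hypothetical, and rounding fractional log10 and QV values down to the nearest integer is an assumption made here.

```python
import math


def ebp_quality_code(contig_n50, scaffold_n50, qv,
                     contig_chromosomal=False, scaffold_chromosomal=False):
    """Build an EBP-style quality string such as '6.7.Q40'.

    Hypothetical helper: N50 values are in base pairs, qv is the
    assembly Quality Value. The *_chromosomal flags mark an N50 as
    "chromosomal scale", which replaces its log10 value with 'C'.
    """
    if contig_n50 <= 0 or scaffold_n50 <= 0:
        raise ValueError("N50 values must be positive")

    def part(n50, chromosomal):
        # Assumption: fractional log10 values are rounded down.
        return "C" if chromosomal else str(math.floor(math.log10(n50)))

    contig_part = part(contig_n50, contig_chromosomal)
    scaffold_part = part(scaffold_n50, scaffold_chromosomal)
    # Assumption: the QV is also rounded down to an integer.
    return f"{contig_part}.{scaffold_part}.Q{math.floor(qv)}"


# Contig N50 of 1.5 Mbp, scaffold N50 of 12 Mbp, QV 40.3
print(ebp_quality_code(1_500_000, 12_000_000, 40.3))  # 6.7.Q40
# Same assembly with chromosome-scale scaffolds
print(ebp_quality_code(1_500_000, 12_000_000, 40.3,
                       scaffold_chromosomal=True))    # 6.C.Q40
```

Whether an N50 counts as chromosomal scale is passed in as a flag because that judgement depends on the species' karyotype, not on the N50 value alone.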


Standards for different scenarios

Minimum requirements differ depending on sample characteristics (sufficient tissue for DNA extraction) and genome size (chromosomal N50 < 1 Mbp). See the EBP Report on Assembly Standards for further details. The following table summarises the proposed metric values in each case:




Further notes

Sometimes not all of the metrics can be met (e.g., a low BUSCO score because the taxonomic group is not well represented).