
Reviewers section

Diego De Panis edited this page Oct 23, 2024 · 14 revisions

Important aspects to keep in mind:

  • Your task as a reviewer is to evaluate the assembly using mainly the information available in the EAR.

  • The reviewing process should not be time-consuming. We expect the assemblies reaching this point to be in good shape.

  • The bottom line is that if the assembly meets the EBP Assembly Standards (v5), considering the particularities of the case, and no errors are detected in the HiC contact map, it should be approved.

  • You can request all the clarifications, updates and corrections you deem necessary. Be kind and remember that, as in every other working space, we follow the ERGA Code of Conduct during the reviewing process.

  • It is the reviewer who decides the meaning of the metrics or warnings shown in the report. Everything happens in the context of each particular assembly. The PDF report does not itself assign an approved status.

  • The report guides the reviewer throughout the process and presents metrics and warning flags in a standardised way. Be prepared for edge cases or for warnings that fail to display. The PDF report is generated by rule-based code, but the rules are subject to refinement and exceptions, and the code can contain errors, so you must be vigilant.


Going through the EAR

In addition to what is explained in the example structure of the EAR, here you can find other details:

Tags: Check that a valid tag is being displayed. At the moment, the Tags field only displays the particular ERGA project to which the species belongs. Valid Tags are ERGA-BGE, ERGA-Pilot and ERGA-Community.

Species table: ToLID and Species name are manually provided. Class and Order are retrieved from GoaT (notice that sometimes only some of the data will be available on the site). Taxonomy ID is retrieved from NCBI.

Traits table: Summarises observed (from the assembly and the sample) and expected (from Genomescope, GoaT) data. Deviations should raise warnings in the section below.

Summary section: The EBP metric is calculated as floor(log10(Contig N50)).floor(log10(Scaffold N50)).QV(floor(QV)) on each haplotype of the curated assembly. When the standard requires a chromosome-scale assembly (C), the reviewer will check whether the scaffold N50 value corresponds to C. The warnings are designed to draw the reviewer's attention to a specific point. The reviewer should quickly double-check each warning in the section from which it comes.
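
As an illustration of the formula above, here is a minimal sketch of how the metric string could be derived. It assumes N50 values are given in base pairs and that the QV part is written with a `Q` prefix, as in the EBP `x.y.Qz` notation (e.g. 6.C.Q40). The function name `ebp_metric` and the input values are hypothetical; this is not the code that generates the EAR.

```python
import math


def ebp_metric(contig_n50_bp: int, scaffold_n50_bp: int, qv: float) -> str:
    """Build an EBP-style metric string from N50 values (in bp) and a QV."""
    # Order of magnitude of each N50, rounded down.
    contig_exp = math.floor(math.log10(contig_n50_bp))
    scaffold_exp = math.floor(math.log10(scaffold_n50_bp))
    # QV is truncated to its integer part.
    return f"{contig_exp}.{scaffold_exp}.Q{math.floor(qv)}"


# Hypothetical values for one haplotype of a curated assembly.
print(ebp_metric(contig_n50_bp=12_500_000, scaffold_n50_bp=98_000_000, qv=52.3))
# prints 7.7.Q52
```

Whether the numeric scaffold value can be read as "C" (chromosome-scale) is not decided by this calculation; it remains the reviewer's judgement.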

The following warnings will be automatically flagged based on expected/observed values:

  • Final assembly size differs by more than 20% from the estimate obtained with Genomescope
  • Observed Haploid number is different to the one retrieved from GoaT
  • Ploidy number obtained from Smudgeplot (or Genomescope) is different to the one retrieved from GoaT
  • Observed sex is different from the recorded Sample sex

The following warnings will be automatically flagged for each haplotype:

  • QV value is less than 40
  • Kmer completeness value is less than 90
  • BUSCO single copy value is less than 90%
  • BUSCO duplicated value is more than 5%
  • There is more than 3% loss in the size of the curated assembly in comparison with the pre-curation
  • More than 1000 gaps/Gbp
  • Less than 90% of the assembly is in chromosomes, inferred by comparing Scaffold L90 and the observed haploid number
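
The per-haplotype thresholds listed above can be sketched as a simple rule check. The metric names, the function `haplotype_warnings` and the example values are hypothetical. Two points are assumptions rather than documented behaviour: gaps per Gbp are computed against the curated assembly size, and the chromosome check is read as "Scaffold L90 greater than the observed haploid number".

```python
def haplotype_warnings(m: dict, haploid_number: int) -> list:
    """Return the threshold warnings raised by one haplotype's metrics."""
    warnings = []
    if m["qv"] < 40:
        warnings.append("QV below 40")
    if m["kmer_completeness"] < 90:
        warnings.append("Kmer completeness below 90")
    if m["busco_single"] < 90:
        warnings.append("BUSCO single copy below 90%")
    if m["busco_duplicated"] > 5:
        warnings.append("BUSCO duplicated above 5%")

    # Fraction of sequence lost between pre-curation and curated assemblies.
    size_loss = (m["size_pre"] - m["size_post"]) / m["size_pre"]
    if size_loss > 0.03:
        warnings.append("More than 3% size loss after curation")

    # Assumption: gap density is normalised by the curated assembly size.
    gaps_per_gbp = m["gaps"] / (m["size_post"] / 1e9)
    if gaps_per_gbp > 1000:
        warnings.append("More than 1000 gaps/Gbp")

    # Assumption: if more scaffolds than chromosomes are needed to reach 90%
    # of the assembly, then less than 90% of it sits in chromosomes.
    if m["scaffold_l90"] > haploid_number:
        warnings.append("Less than 90% of the assembly in chromosomes")
    return warnings


# Hypothetical metrics for a haplotype that raises no flags.
good = {
    "qv": 55.2, "kmer_completeness": 97.1,
    "busco_single": 96.0, "busco_duplicated": 1.2,
    "size_pre": 1_000_000_000, "size_post": 990_000_000,
    "gaps": 150, "scaffold_l90": 20,
}
print(haplotype_warnings(good, haploid_number=23))  # prints []
```

An empty list here does not mean the assembly is approved: the HiC map, Kmer spectra and blob plots are not covered by these rules and still need to be inspected.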

All the curation notes are manually provided. They should provide insight to help understand the assembly process.

Quality metrics table: The values are obtained from gfastats, Merqury and BUSCO. The epigraph below the table shows the BUSCO version, lineage and method used for all the assemblies. A warning will be printed if there are inconsistencies across versions or lineages.

HiC contact maps section: Shows PNG snapshots of the post-curation assembly and provides a link to .pretext (and .mcool if provided) files to properly load the map and swiftly check for issues. Important: HiC maps must be analysed by opening the .pretext/.mcool file through the link available in the PDF and walking the diagonal to spot issues (unfortunately, the hyperlink is not accessible from within GitHub's PDF viewer, so you must download it). Please check the Rapid Curation guide if you need to refresh the interpretation of HiC contact maps.

Kmer spectra section: Merqury Kmer plots are not automatically analysed. The reviewer should check them for signs of issues.

Contamination screening: Blob plots are not automatically analysed to raise flags. The reviewer should check them for signs of issues, also taking into consideration the curation notes. If contamination shows in the plot, a comment about it is expected in the curation notes section.

Data coverage table: As of today, no warning flags have been added for sequencing coverage. BGE-recommended sequencing recipes are HiFi 25x, HiC 50x, and ONT 60x plus Illumina 60x. Ultimately, it is the obtained quality of the assembly that determines if the sequencing is deep enough.

Pipelines sections: The tools and versions used for both assembly and curation are shown here to give context to the overall process.

Ending section: The EAR ends with information about the submitter and a timestamp of the creation of the document.


The Pull Request space during reviewing

After finishing the curation, the researcher will produce the EAR pdf and open a Pull Request (PR) in the ERGA-consortium/EARs repo to get the assembly reviewed.

1. When the PR is opened, a tag is assigned to it based on the ERGA project, and one assignee is appointed. The task of the assignee is to supervise the process.

2. If the PR is correct (required folder structure, one species per PR, etc.), the assignee will assign themselves as a reviewer of the overall process.

3. At this moment, you will be asked if you agree to be the reviewer of the EAR on this particular PR.

4. Once you accept the request (you can also reject it), you will be formally assigned as reviewer (your GitHub user will appear on the list of reviewers for the PR on the top right of the webpage). Here, your review process starts.

5. To start, click on Add your review.

6. To find the EAR PDF, click the three dots ... and select View file.

7. You can inspect the PDF from the browser, but to access the link with the HiC contact map, you must download it (unfortunately, the hyperlink is not accessible from within GitHub's PDF viewer).

8. To write a message, e.g., asking for clarification, click Review changes and select the Comment type. You can write all the comments that you deem necessary. If you would like to request a correction on the EAR.pdf, leave a message selecting the Request changes type. You can request all the changes that you deem necessary. To approve the Assembly and leave a comment about it, select the Approve option.

9. Once you have approved the EAR, the assignee will check that there are no issues preventing the merge of the PR, and will also give their approval.

10. When the PR is merged, a new row with your name, the EAR's species name and the PR link will be added to the reviews list, a review will be added to your total count, and your state will be changed to Not Busy.