Merge pull request #76 from nfdi4plants/status_quo
Include isa-xlsx for ARC-specification 1.2
HLWeil authored Nov 9, 2023
2 parents ddfa462 + c113ff9 commit 427935a
Showing 2 changed files with 779 additions and 64 deletions.
105 changes: 41 additions & 64 deletions ARC specification.md
@@ -1,4 +1,4 @@
# Annotated Research Context Specification, v1.1-rfc
# Annotated Research Context Specification, v1.2

Please provide feedback via GitHub issues or a pull request.

@@ -10,29 +10,30 @@ Licensed under the Creative Commons License CC BY, Version 4.0; you may not use

## Table of Contents

- [Introduction](#introduction)
- [Extensions](#extensions)
- [ARC Structure and Content](#arc-structure-and-content)
- [High-Level Schema](#high-level-schema)
- [Example ARC structure](#example-arc-structure)
- [ARC Representation](#arc-representation)
- [ISA-XLSX Format](#isa-xlsx-format)
- [Study and Resources](#study-and-resources)
- [Assay Data and Metadata](#assay-data-and-metadata)
- [Workflow Description](#workflow-description)
- [Run Description](#run-description)
- [Additional Payload](#additional-payload)
- [Top-level Metadata and Workflow Description](#top-level-metadata-and-workflow-description)
- [Investigation and Study Metadata](#investigation-and-study-metadata)
- [Top-Level Run Description](#top-level-run-description)
- [Shareable and Publishable ARCs](#shareable-and-publishable-arcs)
- [Reproducible ARCs](#reproducible-arcs)
- [Mechanism for Quality Control of ARCs](#mechanism-for-quality-control-of-arcs)
- [Best Practices](#best-practices)
- [Community Specific Data Formats](#community-specific-data-formats)
- [Compression and Encryption](#compression-and-encryption)
- [Directory and File Naming Conventions](#directory-and-file-naming-conventions)
- [Appendix: Conversion of ARCs to RO Crates](#appendix-conversion-of-arcs-to-ro-crates)
- [Annotated Research Context Specification, v1.2](#annotated-research-context-specification-v12)
- [Introduction](#introduction)
- [Extensions](#extensions)
- [ARC Structure and Content](#arc-structure-and-content)
- [High-Level Schema](#high-level-schema)
- [Example ARC structure](#example-arc-structure)
- [ARC Representation](#arc-representation)
- [ISA-XLSX Format](#isa-xlsx-format)
- [Study and Resources](#study-and-resources)
- [Assay Data and Metadata](#assay-data-and-metadata)
- [Workflow Description](#workflow-description)
- [Run Description](#run-description)
- [Additional Payload](#additional-payload)
- [Top-level Metadata and Workflow Description](#top-level-metadata-and-workflow-description)
- [Investigation and Study Metadata](#investigation-and-study-metadata)
- [Top-Level Run Description](#top-level-run-description)
- [Shareable and Publishable ARCs](#shareable-and-publishable-arcs)
- [Reproducible ARCs](#reproducible-arcs)
- [Mechanism for Quality Control of ARCs](#mechanism-for-quality-control-of-arcs)
- [Best Practices](#best-practices)
- [Community Specific Data Formats](#community-specific-data-formats)
- [Compression and Encryption](#compression-and-encryption)
- [Directory and File Naming Conventions](#directory-and-file-naming-conventions)
- [Appendix: Conversion of ARCs to RO Crates](#appendix-conversion-of-arcs-to-ro-crates)

## Introduction

@@ -127,61 +128,33 @@ Notes:

### ISA-XLSX Format

ISA-XLSX follows the ISA model specification (v1.0) saved in a XLSX format. The XLSX format uses the SpreadsheetML markup language and schema to represent a spreadsheet document. Conceptually, using the terminology of the Spreadsheet ML specification [ISO/IEC 29500-1](https://www.loc.gov/preservation/digital/formats/fdd/fdd000398.shtml#:~:text=The%20XLSX%20format%20uses%20the,a%20rectangular%20grid%20of%20cells.), the document comprises one or more worksheets in a workbook. Every worksheet MUST contain one table object storing the metadata. Comments or auxillary information MAY be stored alongside with table objects in a worksheet.
The ISA-XLSX specification is currently part of the ARC specification. Its version therefore follows the version of the ARC specification.

### Study and Resources

The characteristics of all material and resources used within the investigation must be specified in a study. Studies must be placed into a unique subdirectory of the top-level `studies` subdirectory. All ISA metadata specific to a single study MUST be annotated in the file `isa.study.xlsx` at the root of the study's subdirectory. This workbook MUST contain a single resources description that can be organized in one or multiple worksheets. Material or experimental samples can be stored in the form of virtual sample files (containing unique identifiers) in the resources directory. Each external data file can be interpreted as a virtual sample and stored accordingly under resources. External data refers to data that is neither originating within the investigation scope of the ARC nor can be referenced externally, but is required to ensure reproducibility.

Protocols that are necessary to describe the sample or material creating process can be placed under the protocols directory.

### Assay Data and Metadata
https://github.com/nfdi4plants/ARC-specfication/blob/main/ISA-XLSX.md

All measurement data sets are considered as assays and are considered immutable input data. Assay data MUST be placed into a unique subdirectory of the top-level `assays` subdirectory. All ISA metadata specific to a single assay MUST be annotated in the file `isa.assay.xlsx` at the root of the assay's subdirectory. This workbook MUST contain a single assay that can be organized in one or multiple worksheets. Worksheets MUST be named uniquely within the same workbook. A worksheet named `Assay` MUST store the STUDY ASSAYS section defined on investigation-level of the ISA model and is not required in the `isa.investigation.xlsx`. These include the terms `Study Assay Measurement Type`, `Study Assay Measurement Type Term Accession Number`, `Study Assay Measurement Type Term Source REF`, `Study Assay Technology Type`, `Study Assay Technology Type Term Accession Number`, `Study Assay Technology Type Term Source REF`, and `Study Assay Technology Platform`.
Additional worksheets MUST contain a table object with fields organized on a per-row basis. The first row of the table object MUST be used for column headers. A `Source` MUST be indicated with the column heading `Source Name`. Every table object MUST define one source per row and MUST contain at least one source. A `Sample` MUST be indicated with the column heading `Sample Name`. The source-sample-relation MUST follow a unique path in a directed acyclic graph, but MAY be distributed across different worksheets.

<table>
### Study and Resources

<tr><td>
The characteristics of all material and resources used within the investigation must be specified in a study. Studies must be placed into a unique subdirectory of the top-level `studies` subdirectory. All ISA metadata specific to a single study MUST be annotated in the file `isa.study.xlsx` at the root of the study's subdirectory. This workbook MUST contain a single resources description that can be organized in one or multiple worksheets.

| | |
|-|-|
| Study Assay Measurement Type | "value" |
| Study Assay Measurement Type Term Accession Number | "value" |
| Study Assay Measurement Type Term Source REF | "value" |
| ... | ... |
The `study` file MUST follow the [ISA-XLSX study file specification](ISA-XLSX.md#study-file).

</td><td>
Material or experimental samples can be stored in the form of virtual sample files (containing unique identifiers) in the resources directory. Each external data file can be interpreted as a virtual sample and stored accordingly under resources. External data refers to data that is neither originating within the investigation scope of the ARC nor can be referenced externally, but is required to ensure reproducibility.

| Source Name | building block* | Sample Name |
|-|-|-|
mrv1 | descriptorA | s1 |
mrv2 | descriptorB | s2 |
_
Protocols that are necessary to describe the sample or material creating process can be placed under the protocols directory.

</td><td>
### Assay Data and Metadata

| Source Name | building block* | Sample Name |
|-|-|-|
s1 | descriptorC | n1 |
s1 | descriptorD | n2 |
s2 | descriptorD | n3 |
All measurement data sets are considered as assays and are considered immutable input data. Assay data MUST be placed into a unique subdirectory of the top-level `assays` subdirectory. All ISA metadata specific to a single assay MUST be annotated in the file `isa.assay.xlsx` at the root of the assay's subdirectory. This workbook MUST contain a single assay that can be organized in one or multiple worksheets.

</td></tr>
<tr>
<th style="text-align:center">[assay]</th>
<th style="text-align:center">[worksheet1]</th>
<th style="text-align:center">[worksheet2]</th>
</tr>
</table>
The `assay` file MUST follow the [ISA-XLSX assay file specification](ISA-XLSX.md#assay-file).

Notes:

- There are no requirements on specific assay-level metadata per formal ARC definition. Conversion of ARCs into other repository or archival formats (e.g. PRIDE, GEO, ENA) may however mandate the presence of specific terms required in the destination format.

- To ensure reusability of assays, it is strongly RECOMMENDED to include necessary metadata mandated by typical metadata schemes necessary for reproduction. This process is facilitated by the use of templates that can be found [here](https://github.com/nfdi4plants/SWATE_templates).

- It is RECOMMENDED to order worksheets according to the source-sample-relation for readability.
- It is RECOMMENDED to order worksheets according to the input-output-relation for readability.

- It is RECOMMENDED to adopt the structure outlined [below](#best-practices) to organize assay data files and other supporting information.
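The study and assay layout rules in this hunk (one subdirectory per study or assay, with `isa.study.xlsx` or `isa.assay.xlsx` at its root) could be checked mechanically. The following is a minimal sketch, not part of the specification or of any ARC tooling: the function name `find_missing_isa_files` and the `REQUIRED_FILES` mapping are hypothetical, and the check only tests for file presence, not for conformance with the ISA-XLSX file specifications.

```python
from pathlib import Path

# Workbook expected at the root of each subdirectory, as stated in the text.
# The mapping itself is an illustrative assumption, not a specified structure.
REQUIRED_FILES = {"studies": "isa.study.xlsx", "assays": "isa.assay.xlsx"}


def find_missing_isa_files(arc_root):
    """Return ARC-relative paths of required ISA workbooks that are absent."""
    root = Path(arc_root)
    missing = []
    for top_level, workbook in REQUIRED_FILES.items():
        parent = root / top_level
        if not parent.is_dir():
            # This sketch only inspects subdirectories that exist.
            continue
        for sub in sorted(p for p in parent.iterdir() if p.is_dir()):
            candidate = sub / workbook
            if not candidate.is_file():
                missing.append(candidate.relative_to(root).as_posix())
    return missing
```

Calling `find_missing_isa_files("path/to/arc")` returns an empty list when every study and assay subdirectory carries its workbook. Validating the workbook contents against `ISA-XLSX.md` would need a separate step.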

@@ -240,7 +213,11 @@ Note:

### Top-level Metadata and Workflow Description

*Top-level metadata and workflow description* tie together the elements of an ARC in the contexts of investigation and associated studies (in the ISA definition), captured in the files `isa.investigation.xlsx` in [ISA-XLSX format](#isa-xlsx-format), which MUST be present. Furthermore, top-level reproducibility information MUST be provided in the CWL `arc.cwl`, which also MUST exist.
*Top-level metadata and workflow description* tie together the elements of an ARC in the contexts of an investigation captured in the `isa.investigation.xlsx` file, which MUST be present.

The `investigation` file MUST follow the [ISA-XLSX investigation file specification](ISA-XLSX.md#investigation-file).

Furthermore, top-level reproducibility information MUST be provided in the CWL `arc.cwl`, which also MUST exist.
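The two top-level presence requirements (`isa.investigation.xlsx` and `arc.cwl`) lend themselves to a simple check. The sketch below is an illustration under assumptions: the helper name `missing_top_level_files` is hypothetical, and it tests only that the files exist at the ARC root, not that they are valid ISA-XLSX or CWL.

```python
from pathlib import Path

# Files the text says MUST be present at the ARC root.
TOP_LEVEL_REQUIRED = ("isa.investigation.xlsx", "arc.cwl")


def missing_top_level_files(arc_root):
    """Return the names of required top-level files that are not present."""
    root = Path(arc_root)
    return [name for name in TOP_LEVEL_REQUIRED if not (root / name).is_file()]
```

An empty result means both files are present. Any returned name identifies a MUST-level requirement that the ARC does not yet meet.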

#### Investigation and Study Metadata
