Specifying the Organizational Structure of GeoZarr #34
Comments
The NCZarr convention link above is not the most current version. The most current version is in the netCDF-C docs at this very ugly URL [1]. The main difference is the change to storing NCZarr-specific information as extra keys within the Zarr JSON objects (e.g.
[1] Sorry for the multiple versions and ugly URL; we are working our way through a big clean-up/reorganization of our netCDF documentation.
Text edited. |
As reported by @ethanrd and agreed, we aim to align GeoZarr terminology with CF terminology wherever possible; CF terminology itself relies heavily on the NetCDF User Guide.
NetCDF
- About dataset
- About group
- About dimensions
- About variables
- About coordinate variables
- About attributes
CF definitions
- auxiliary coordinate variable
- coordinate variable
I'm a little confused by:
I think the |
@christine-e-smit Absolutely, the .zmetadata file does consolidate all metadata for groups and arrays within the specified store into a single resource. The statement in the definition does not contradict this; rather, it implies that having this consolidated metadata at the dataset level is mandatory, so that libraries (like xarray) can understand the structure without needing to read each object individually.
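To make the consolidation point concrete, here is a minimal sketch using only the Python standard library. It assumes the Zarr v2 consolidated-metadata layout (a `zarr_consolidated_format` key plus a `metadata` mapping keyed by relative path). The store name, the `temperature` array and both helper functions are hypothetical and only serve the illustration.

```python
import json
import os
import tempfile
from pathlib import Path

# Per-node metadata documents used by Zarr v2.
METADATA_FILES = {".zgroup", ".zarray", ".zattrs"}


def build_example_store(root):
    """Write a tiny, hypothetical Zarr v2 hierarchy (metadata only, no chunks)."""
    root = Path(root)
    array_dir = root / "temperature"
    array_dir.mkdir(parents=True)
    (root / ".zgroup").write_text(json.dumps({"zarr_format": 2}))
    (root / ".zattrs").write_text(json.dumps({"title": "hypothetical dataset"}))
    (array_dir / ".zarray").write_text(json.dumps({
        "zarr_format": 2, "shape": [2, 3, 4], "chunks": [1, 3, 4],
        "dtype": "<f4", "compressor": None, "fill_value": None,
        "order": "C", "filters": None,
    }))
    (array_dir / ".zattrs").write_text(
        json.dumps({"_ARRAY_DIMENSIONS": ["time", "lat", "lon"]})
    )


def consolidate(root):
    """Gather every metadata document under `root` into a single .zmetadata file."""
    root = Path(root)
    metadata = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename in METADATA_FILES:
                full = Path(dirpath) / filename
                # Keys are paths relative to the store root, with "/" separators.
                metadata[full.relative_to(root).as_posix()] = json.loads(full.read_text())
    document = {"zarr_consolidated_format": 1, "metadata": metadata}
    (root / ".zmetadata").write_text(json.dumps(document, indent=2))
    return document


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "example.zarr"
    build_example_store(store)
    consolidated = consolidate(store)
    print(sorted(consolidated["metadata"]))
```

A client that reads `.zmetadata` gets the group and array metadata in one request, which is the behaviour the definition relies on.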
Improvement:
|
I have not made much progress, but I would like to share some thoughts about the concept of a dataset (coming from xarray, itself based on NetCDF). The GeoZarr specification must balance two key objectives:
For this reason, I think that providing requirements around Dataset (a group with coordinates and variables) is essential. It identifies a minimal Zarr structure for interpreting a set of raster variables while still allowing (not excluding) other types of data (e.g., secondary or auxiliary data, point clouds, ...) in other Zarr groups. For example, the conformance class "http://www.opengis.net/spec/ogc-geozarr/1.0/conf/dataset" might include a requirement that defines the minimal aspects that are expected by a client. Following xarray encoding of NetCDF:
The relationship with metadata (which is key in cloud-native geospatial) is that I expect a STAC Item/STAC Collection to define asset objects (links) for each dataset, indicating a dedicated dataset media type that informs the client that it can be easily displayed on a map or used in a Jupyter Notebook.
--- Reminder ---
📂 GeoZarr Dataset: a collection of EO data arrays (one or more) that represents information about a measured or observed geospatial phenomenon captured at one or more locations and times. It can encompass various formats and types of data, such as granules (individual data points or images), geospatial time series (3D datasets capturing changes over time), or hyperspectral data (capturing a wide spectrum of light beyond visible light for each pixel).
📦 GeoZarr Group: like Zarr groups, which act as directories in a Unix file system, GeoZarr groups are hierarchically organized, to arbitrary depth. They can be used to organize large numbers of variables. Each group can have attributes, dimensions, variables, and other nested groups.
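As a rough illustration of what "group with coordinates and variables" could mean for a client, the sketch below splits the arrays of one group into coordinates and data variables. It assumes the xarray/CF rule that a coordinate variable is a one-dimensional array named after its own dimension. The function names and the example arrays are hypothetical, and reporting dimensions without a coordinate array is only one possible requirement, not something agreed in this thread.

```python
def split_variables(arrays):
    """Split a group into (coordinates, data variables).

    `arrays` maps each array name to its `_ARRAY_DIMENSIONS` list.
    Assumption: a coordinate variable is 1-D and named after its dimension.
    """
    coords = sorted(name for name, dims in arrays.items() if dims == [name])
    data_vars = sorted(name for name in arrays if name not in coords)
    return coords, data_vars


def missing_coordinates(arrays):
    """Return, per data variable, the dimensions that have no coordinate array."""
    coords, data_vars = split_variables(arrays)
    missing = {}
    for name in data_vars:
        absent = [dim for dim in arrays[name] if dim not in coords]
        if absent:
            missing[name] = absent
    return missing


# Hypothetical group content, as it could be read from `_ARRAY_DIMENSIONS`.
example_group = {
    "time": ["time"],
    "lat": ["lat"],
    "lon": ["lon"],
    "sea_surface_temperature": ["time", "lat", "lon"],
    "quality_flag": ["time", "band"],
}

print(split_variables(example_group))
print(missing_coordinates(example_group))
```

Here `quality_flag` is reported because no `band` coordinate array exists in the group; whether that should be an error or merely allowed is exactly the kind of decision the conformance class would have to make.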
ℹ️ Edit: This post has been updated to more accurately capture my original message's intent
One of the foundational steps in developing the GeoZarr specification should be detailing its organizational structure (typically based on the Zarr objects). The initial version of GeoZarr outlines the GeoZarr Classes but doesn't detail the storage structure and format of the data model.
GeoZarr conventions rely on XArray (including its terminology, which borrows from CF conventions), which itself does not explicitly document the format.
Implicit structure of GeoZarr / XArray Zarr
GeoZarr organizes data in a way that is compatible with the structure of Zarr. This structure should be clearly defined, similar to how it is done in the documentation for NCZarr.
Example of structure:
Here’s a simplified breakdown of how GeoZarr organizes its data, using XArray concepts as a foundation:
- … (.zattrs) and outlines the structure of its contents based on child items (data arrays, coordinates, etc.) in another file (.zmetadata).
Dataset .zattrs
- … (.zarray). Additionally, it holds geospatial information (e.g., type of observation, units, CF conventions) in another metadata file (.zattrs), including:
  - _ARRAY_DIMENSIONS: provides the names of the dimensions (whose siblings provide the coordinates)
  - grid_mapping: mapping of data to geographical projections (based on CF)
Data Array .zattrs
- … (.zattrs) specifying CF attributes and the dimensions it relates to (ensuring it matches the size of the dimensions of the data arrays it references).
An explicit explanation of how coordinates work within the GeoZarr context, especially their interaction with data arrays and how they enable spatial indexing, could provide clarity.
Coordinate .zattrs
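The interaction between data arrays and coordinates described above can be sketched as follows. This is an illustration under assumptions, not the agreed structure: the in-memory `example` group, its array names (`soil_moisture`, `crs`) and both helper functions are hypothetical. The only conventions relied on are `_ARRAY_DIMENSIONS` (xarray's Zarr encoding) and the CF rule that `grid_mapping` names a variable carrying `grid_mapping_name`.

```python
# Each entry stands for one array: its shape (from .zarray) and its .zattrs.
example = {
    "lat": {"shape": [3], "attrs": {
        "_ARRAY_DIMENSIONS": ["lat"],
        "standard_name": "latitude", "units": "degrees_north"}},
    "lon": {"shape": [4], "attrs": {
        "_ARRAY_DIMENSIONS": ["lon"],
        "standard_name": "longitude", "units": "degrees_east"}},
    "crs": {"shape": [], "attrs": {
        "_ARRAY_DIMENSIONS": [],
        "grid_mapping_name": "latitude_longitude"}},
    "soil_moisture": {"shape": [3, 4], "attrs": {
        "_ARRAY_DIMENSIONS": ["lat", "lon"],
        "grid_mapping": "crs", "units": "m3 m-3"}},
}


def coordinate_size_mismatches(group):
    """List data arrays whose dimension sizes disagree with their coordinates."""
    problems = []
    for name, node in group.items():
        dims = node["attrs"].get("_ARRAY_DIMENSIONS", [])
        for axis, dim in enumerate(dims):
            coord = group.get(dim)
            if coord is None or dim == name:
                continue  # no coordinate array for this dimension, or the coordinate itself
            size = node["shape"][axis]
            if coord["shape"] != [size]:
                problems.append(
                    f"{name}: dimension {dim!r} has size {size}, "
                    f"but coordinate {dim!r} has shape {coord['shape']}"
                )
    return problems


def grid_mapping_of(group, name):
    """Return the attributes of the grid mapping variable referenced by `name`."""
    target = group[name]["attrs"].get("grid_mapping")
    return None if target is None else group[target]["attrs"]


print(coordinate_size_mismatches(example))
print(grid_mapping_of(example, "soil_moisture"))
```

The size check mirrors the "ensuring it matches the size of the dimensions" remark: a coordinate is only usable for spatial indexing if its length equals the size of the axis it labels.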
Structure Overview
With SMOS dataset example:
Structure Specification
🔍 The new structure might differ from XArray's typical approach. For example, the following changes may be considered:
Original (geo) Zarr discussions
The following older discussions about the conventions initially created by xarray, NCZarr, etc. may help:
Early draft data model structure spec
🚧 List of statements to be assessed, improved and agreed:
Definitions of core elements:
Structure of Dataset:
- … zgeo set to Dataset.
Structure of DataArray:
- … zgeo set to DataArray.
- … TBD: exact list of recommended CF attributes ❓
- _ARRAY_DIMENSIONS shall provide the names of the dimension coordinates (whose siblings provide the coordinates) as defined in the Dataset indexes. ❓
Structure of Coordinate:
- … zgeo set to Coordinate.
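If the draft `zgeo` attribute were adopted, a client could classify nodes straight from consolidated metadata. The sketch below is only an assumption-laden illustration: the `classify_nodes` function is hypothetical, and the pairing of `Dataset` with Zarr groups and of `DataArray`/`Coordinate` with Zarr arrays is inferred from this discussion rather than stated in an agreed requirement.

```python
# Assumed pairing between zgeo values and the Zarr v2 node metadata they need.
REQUIRED_NODE_FILE = {
    "Dataset": ".zgroup",
    "DataArray": ".zarray",
    "Coordinate": ".zarray",
}


def classify_nodes(metadata):
    """Map each node path to its `zgeo` value.

    `metadata` is the "metadata" mapping of a consolidated .zmetadata document.
    Raises ValueError for unknown values or a mismatched node type.
    """
    nodes = {}
    for key, attrs in metadata.items():
        if key != ".zattrs" and not key.endswith("/.zattrs"):
            continue
        kind = attrs.get("zgeo")
        if kind is None:
            continue  # node not described by the draft convention
        if kind not in REQUIRED_NODE_FILE:
            raise ValueError(f"{key}: unknown zgeo value {kind!r}")
        node = key[: -len(".zattrs")].rstrip("/")
        required = (node + "/" if node else "") + REQUIRED_NODE_FILE[kind]
        if required not in metadata:
            raise ValueError(f"{key}: zgeo={kind!r} requires {required}")
        nodes[node or "/"] = kind
    return nodes


# Hypothetical consolidated metadata for one dataset with two arrays.
example_metadata = {
    ".zgroup": {"zarr_format": 2},
    ".zattrs": {"zgeo": "Dataset"},
    "sst/.zarray": {"zarr_format": 2, "shape": [3]},
    "sst/.zattrs": {"zgeo": "DataArray", "_ARRAY_DIMENSIONS": ["lat"]},
    "lat/.zarray": {"zarr_format": 2, "shape": [3]},
    "lat/.zattrs": {"zgeo": "Coordinate", "_ARRAY_DIMENSIONS": ["lat"]},
}

print(classify_nodes(example_metadata))
```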