My thoughts on coordinate #48

martindurant · 2024-05-29T17:03:01Z

Sorry for getting distracted at the end of the geo-zarr meeting we just had (for those that were there). Here is a summary of what I was getting at.

(@rabernat, yes, I know this has been discussed many times over - apologies)

There are two principal parts to the coordinates problem:

coordinate transform
parsing/reading coordinate definitions

Coordinate transform

A mechanism within zarr/xarray to find (each of) the coordinates of a given array position and the (fractional) array location of a given coordinate set. This should be a vectorized operation each way.

Currently, xarray supports explicit coordinate value arrays via the netCDF model well (and "flexible" indexes whose internals I don't understand well).

I suggest that this should be an extension point, with each transform type associated with its own internal representation (e.g., affine is usually a square matrix; explicit arrays are usually one- or two-dimensional arrays with sizes determined by the data)
on day 1, we want to support explicit values and affine (linear transform)
other transforms should be pluggable, and eventually include, for instance, the large number of earth curvature models built into grib
whether we should have a single affine matrix across all dimensions (lon, lat, time = f(x, y, z)), or if we should split dimensions (lon, lat = f1(x, y); time = f2(z)) is a decision to be taken early.
the coordinates interface must support slicing and might support units.
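
As a rough illustration of the interface described above, here is a minimal NumPy sketch, not an existing zarr or xarray API. The class names `AffineTransform` and `ExplicitTransform` and the methods `forward`, `inverse`, and `sliced` are hypothetical. It assumes an affine transform stored as a homogeneous square matrix and a 1-D explicit coordinate array with monotonically increasing values.

```python
import numpy as np


class AffineTransform:
    """Hypothetical affine transform stored as an (n+1) x (n+1) homogeneous matrix."""

    def __init__(self, matrix):
        self.matrix = np.asarray(matrix, dtype=float)
        self.ndim = self.matrix.shape[0] - 1

    def forward(self, index):
        """Array positions of shape (..., n) -> coordinates of shape (..., n)."""
        n = self.ndim
        index = np.asarray(index, dtype=float)
        return index @ self.matrix[:n, :n].T + self.matrix[:n, n]

    def inverse(self, coords):
        """Coordinates of shape (..., n) -> fractional array positions."""
        n = self.ndim
        inv = np.linalg.inv(self.matrix)
        coords = np.asarray(coords, dtype=float)
        return coords @ inv[:n, :n].T + inv[:n, n]

    def sliced(self, starts, steps):
        """Transform of the sub-array a[s0::k0, s1::k1, ...]: old index = start + step * new index."""
        n = self.ndim
        reindex = np.eye(n + 1)
        reindex[:n, :n] = np.diag(steps)
        reindex[:n, n] = starts
        return AffineTransform(self.matrix @ reindex)


class ExplicitTransform:
    """Hypothetical 1-D explicit coordinate array; values assumed monotonically increasing."""

    def __init__(self, values):
        self.values = np.asarray(values, dtype=float)

    def forward(self, index):
        # Linear interpolation between stored values; clamps outside the array.
        return np.interp(index, np.arange(self.values.size), self.values)

    def inverse(self, coords):
        return np.interp(coords, self.values, np.arange(self.values.size))

    def sliced(self, start, step):
        return ExplicitTransform(self.values[start::step])


# Both directions are vectorized over any number of points.
affine = AffineTransform([[0.5, 0.0, 10.0], [0.0, -0.5, 50.0], [0.0, 0.0, 1.0]])
idx = np.array([[0, 0], [2, 4]])
xy = affine.forward(idx)
back = affine.inverse(xy)
```

Slicing an affine transform only composes it with a reindexing matrix, so no coordinate arrays need to be materialized; an explicit transform slices its stored values instead.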

Crucially, I advocate that the transform mechanism is independent of the data domain, so that we don't treat "lon/lat" as special. This is because zarr and xarray are general purpose libraries, and we don't want to exclude microscopy, genetics and other fields with many users.

Coordinate definitions

In the meeting, a few specific (geo) coordinate definitions were mentioned:

gdal coefficients
tiff bounding box
CRS text/parameters

plus, of course, netCDF explicit arrays (with or without CF). I also mentioned astro WCS as a reference point (which supports explicit, affine, and various analytic forms for arbitrary dimensionality with no geo reference; interestingly, it also applies to fields of tables).

I would suggest that it is the job of geo-zarr to build the converters between these styles of definition and the transform's internal representation, such that you can round-trip coordinate information without losing accuracy.
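
To make the round-trip idea concrete, here is a small sketch of one such converter, assuming the internal representation is a 3x3 homogeneous matrix. The function names `gdal_to_matrix`, `matrix_to_gdal`, and `apply` are hypothetical; the coefficient order follows GDAL's geotransform convention, where `x = gt[0] + col * gt[1] + row * gt[2]` and `y = gt[3] + col * gt[4] + row * gt[5]`.

```python
def gdal_to_matrix(gt):
    """GDAL geotransform (6 coefficients) -> 3x3 matrix acting on (col, row, 1)."""
    c, a, b, f, d, e = gt
    return [[a, b, c], [d, e, f], [0.0, 0.0, 1.0]]


def matrix_to_gdal(m):
    """Inverse conversion: 3x3 homogeneous matrix -> GDAL geotransform tuple."""
    (a, b, c), (d, e, f), _ = m
    return (c, a, b, f, d, e)


def apply(m, col, row):
    """Map a pixel position (col, row) to georeferenced (x, y)."""
    x = m[0][0] * col + m[0][1] * row + m[0][2]
    y = m[1][0] * col + m[1][1] * row + m[1][2]
    return x, y


# Example coefficients: 60 m pixels, north-up raster (values chosen for illustration).
gt = (440720.0, 60.0, 0.0, 3751320.0, 0.0, -60.0)
m = gdal_to_matrix(gt)
roundtrip = matrix_to_gdal(m)
corner = apply(m, 0, 0)
```

Because the conversion only rearranges the six coefficients, the round trip is exact. Note that GDAL orders pixel positions as (column, row), whereas array indexing is usually (row, column), so a real converter would have to pick and document an axis order.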

dblodgett-usgs · 2024-05-29T17:23:22Z

Wish I had space to take part in this work more... sorry to pop into this issue out of the blue, but I can't resist.

I couldn't agree more, @martindurant.

A potential source for inspiration on this is the implementation of rectilinear, curvilinear, and discrete spatio-temporal array axes in the stars R package. @edzer may be able to weigh in / advise. https://r-spatial.github.io/stars/articles/stars4.html is probably a good place to start.

mdsumner · 2024-06-02T23:17:04Z

I'm also trying to find my feet in this Python-heavy space. Shouldn't this be a Zarr topic? Non-lonlat geography exists in "geo", and even xarray has recognized the need to move beyond degenerate rectilinear arrays as the most compact referencing model. Zarr itself needs these compact forms as well; it's more about graphics and model arrays than geo-anything. Ensuring and persisting the crs is more the geo part, in general terms the metadata and units of the coordinate system are crucial in any domain, independently of whether a NetCDF-style or more general framework is used. I just worry this tent isn't broad enough, but I appreciate the importance (and brilliance) of Zarr. If it can get this smarter referencing for regular or graphics arrays, and not mix up regular-grids-devolved-to-longlat with real curvilinear cases, it will truly be a general and future-proof framework.

martindurant · 2024-06-03T14:23:24Z

> Shouldn't this be a Zarr topic?

Yes, certainly it could be copied there; or maybe the coordinate-interpreting discussion belongs in xarray? Maybe zarr simply passes the attributes that define the coordinate mapping on to other libraries, but personally I'd be happy to see the f(x, y, z, ...) and its inverse(s) defined in zarr.
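
One way to picture the "attributes only" option is the sketch below, which uses only the Python standard library. Everything in it is an assumption for illustration: the attribute key `coordinate_transform`, its `type` and `matrix` fields, and the `TRANSFORMS` registry are hypothetical and not part of any zarr specification. The stored JSON names a transform type, and a consuming library looks up a pluggable factory for that type.

```python
import json

# Hypothetical registry: transform type name -> factory building f(index) from the stored spec.
TRANSFORMS = {}


def register(name):
    def wrap(factory):
        TRANSFORMS[name] = factory
        return factory
    return wrap


@register("affine")
def make_affine(spec):
    m = spec["matrix"]  # homogeneous (n+1) x (n+1) matrix as nested lists
    n = len(m) - 1

    def forward(index):
        # coords[i] = sum_j m[i][j] * index[j] + offset m[i][n]
        return tuple(
            sum(m[i][j] * index[j] for j in range(n)) + m[i][n] for i in range(n)
        )

    return forward


def transform_from_attrs(attrs):
    """Look up the factory named in the (hypothetical) attribute and build the transform."""
    spec = attrs["coordinate_transform"]
    return TRANSFORMS[spec["type"]](spec)


# JSON attributes as they might be stored alongside an array.
attrs_json = json.dumps(
    {
        "coordinate_transform": {
            "type": "affine",
            "matrix": [[0.5, 0.0, 10.0], [0.0, -0.5, 50.0], [0.0, 0.0, 1.0]],
        }
    }
)

forward = transform_from_attrs(json.loads(attrs_json))
point = forward((2, 4))
```

In this arrangement the storage layer never evaluates the transform; defining f and its inverse in zarr itself would instead move the registry and factories into the storage library.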

> Ensuring and persisting the crs is more the geo part, in general terms the metadata and units of the coordinate system are crucial in any domain

Exactly. I particularly have in mind medical ("device" and "patient" coordinates, normally affine transforms) and astro (curvilinear celestial coordinates and physical units like wavelength), because of my background.

christophenoel · 2024-06-24T10:57:28Z

I just want to drop this here: the OGC specification that deals with all types of coverage and their encoding is OGC Coverage Implementation Schema 1.1: https://docs.ogc.org/is/09-146r6/09-146r6.html#39

benbovy · 2024-09-18T12:57:07Z

> Currently, xarray supports explicit coordinate value arrays via the netCDF model well (and "flexible" indexes whose internals I don't understand well).

> Crucially, I advocate that the transform mechanism is independent of the data domain, so that we don't treat "lon/lat" as special. This is because zarr and xarray are general purpose libraries, and we don't want to exclude microscopy, genetics and other fields with many users.

This is outside of the geozarr scope and more specific to Xarray, but I'd be happy to help towards having some kind of CoordinateTransform class built in Xarray, which would provide an abstract interface + a minimal set of features (e.g., for data slicing) such that it can be easily reused in 3rd-party, domain-specific Xarray indexes.

(note: Xarray "flexible" indexes are still not well documented)

christophenoel · 2024-09-25T06:22:36Z

@benbovy: Do you have any information about these flexible indexes? How do they work?

benbovy · 2024-09-25T08:18:01Z

@christophenoel - If I had to summarize how it works in one sentence: Xarray coordinates are all about data (labels) and metadata (attributes) whereas Xarray Index provides an API that allows dealing with these data / metadata in a highly customizable way for most common Xarray operations (isel, sel, align, concat, stack...). Xarray indexes are also stateful objects that may hold and propagate additional information (arbitrarily structured) along with coordinate labels and attributes.

There's only very basic documentation here: https://docs.xarray.dev/en/stable/internals/how-to-create-custom-index.html

You can also have a look at the different examples collected in pydata/xarray#7041.

Also specific to this issue: pydata/xarray#9543

More to come soon(-ish), hopefully!

rabernat · 2024-09-25T12:51:45Z

Everyone on this repo should check out Benoit's PR above. It's exactly what we need to move this forward.

christophenoel changed the title ~~My thoughts~~ My thoughts on coordinate Jun 24, 2024

benbovy mentioned this issue Sep 24, 2024

Flexible coordinate transform pydata/xarray#9543
