Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use implicit fill values for zarr v2 #2274

Merged

Conversation

TomAugspurger
Copy link
Contributor

@TomAugspurger TomAugspurger commented Sep 30, 2024

Closes #2271

I'm not sure if this is a good idea, but this, consolidated metadata, and #2270 lets xarray round trip a dataset. The basic idea is to let ArrayV2Metadata represent the v2 metadata a bit more faithfully. Currently, we "eagerly" interpret the lack of fill_value in a V2 document as the default fill value for some dtype (e.g. 0 for int64). This PR changes that, to allow ArrayV2Metadata.fill_value to be None if it's unspecified in the metadata. But we do need some fill value when reading data. So that's where we finally substitute in the default.

This doesn't really affect Zarr directly, but it is useful for downstream libraries that want to inspect the array metadata before reading and make some decision based on whether or not fill_value is specified.

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/tutorial.rst
  • Changes documented in docs/release.rst
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

Copy link
Member

@jhamman jhamman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments. Seems reasonable though.

src/zarr/codecs/pipeline.py Outdated Show resolved Hide resolved
src/zarr/core/metadata/v2.py Outdated Show resolved Hide resolved
@TomAugspurger TomAugspurger marked this pull request as ready for review October 7, 2024 15:26
Copy link
Contributor

@rabernat rabernat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. My one suggestion would be to consider a special case for object / string dtypes.

src/zarr/core/metadata/v2.py Outdated Show resolved Hide resolved
src/zarr/core/metadata/v2.py Outdated Show resolved Hide resolved
tests/v3/test_v2.py Outdated Show resolved Hide resolved
Copy link
Contributor

@rabernat rabernat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, nice work Tom!

@TomAugspurger TomAugspurger added the downstream Downstream libraries using zarr label Oct 9, 2024
out[out_selection] = chunk_spec.fill_value
fill_value = chunk_spec.fill_value

if fill_value is None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something we may consider in the future is to parameterize the CodecPipeline class with the zarr_format of the calling Array. This would allow us feel more confident that workarounds like this one only apply to v2 data.

Copy link
Member

@jhamman jhamman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @TomAugspurger!

@jhamman jhamman added this to the 3.0.0.beta milestone Oct 9, 2024
@jhamman jhamman added the V3 Affects the v3 branch label Oct 9, 2024
@TomAugspurger TomAugspurger merged commit 81a87d6 into zarr-developers:v3 Oct 10, 2024
22 checks passed
@TomAugspurger TomAugspurger deleted the user/tom/fix/v2-no-fill-value branch October 10, 2024 01:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
downstream Downstream libraries using zarr V3 Affects the v3 branch
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Don't infer a default fill_value for zarr_format=2?
3 participants