Check datasets are identical. #162

Open
william-silversmith opened this issue Dec 11, 2023 · 0 comments
william-silversmith commented Dec 11, 2023

It would be useful to check whether two datasets are identical by comparing a single hash.

There should be two levels of equality:

  • exact equality, including the storage layout (e.g. chunking)
  • equality regardless of chunking, sharding, encoding, etc.
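A minimal sketch of the two levels, assuming a toy in-memory model of a chunked 2D volume. The names `chunk_volume`, `encode_chunk`, `exact_hash`, `content_hash`, and the `chunks` dict layout are hypothetical and not part of this project's API. The exact hash covers the chunk size and every chunk's encoded bytes, so re-chunking the same data changes it. The content hash covers only the shape, an assumed dtype tag, and the voxel values in global row-major order, so any chunking of the same volume gives the same digest. Chunks are held already decoded here; a real implementation would decode each chunk (and unpack any shards) before feeding the content hash, which is what would make it independent of encoding too.

```python
import hashlib
import struct

# Toy model (hypothetical, not a real library API):
#   shape      = (height, width) of the whole volume
#   chunk_size = (chunk_height, chunk_width); edge chunks may be smaller
#   chunks     = {(y0, x0): list of rows of decoded integer voxels}
DTYPE_TAG = b"uint32"  # assumed voxel type; packed as little-endian uint32


def chunk_volume(volume, chunk_size):
    """Split a full volume (a list of equal-length rows) into chunks."""
    ch, cw = chunk_size
    height, width = len(volume), len(volume[0])
    return {
        (y0, x0): [row[x0:x0 + cw] for row in volume[y0:y0 + ch]]
        for y0 in range(0, height, ch)
        for x0 in range(0, width, cw)
    }


def encode_chunk(rows):
    """Stand-in 'raw' encoding: row-major little-endian uint32 values."""
    return b"".join(struct.pack(f"<{len(r)}I", *r) for r in rows)


def exact_hash(shape, chunk_size, chunks):
    """Level 1: layout-dependent. Covers the chunk size plus each chunk's
    origin and encoded bytes, visited in sorted origin order."""
    h = hashlib.sha256(DTYPE_TAG)
    h.update(struct.pack("<4Q", *shape, *chunk_size))
    for origin in sorted(chunks):
        payload = encode_chunk(chunks[origin])
        # Length-prefix each chunk so chunk boundaries are unambiguous.
        h.update(struct.pack("<3Q", *origin, len(payload)))
        h.update(payload)
    return h.hexdigest()


def content_hash(shape, chunk_size, chunks):
    """Level 2: layout-independent. Covers only the shape and the voxel
    values, read in global row-major order across chunk boundaries."""
    height, width = shape
    ch, cw = chunk_size
    h = hashlib.sha256(DTYPE_TAG)
    h.update(struct.pack("<2Q", height, width))
    for y in range(height):
        row = []
        for x in range(width):
            # Locate the chunk containing voxel (y, x), then index into it.
            y0, x0 = (y // ch) * ch, (x // cw) * cw
            row.append(chunks[(y0, x0)][y - y0][x - x0])
        h.update(struct.pack(f"<{width}I", *row))
    return h.hexdigest()


# Same 4x6 volume stored with two different chunk sizes.
volume = [[10 * y + x for x in range(6)] for y in range(4)]
a = chunk_volume(volume, (2, 3))
b = chunk_volume(volume, (4, 2))
assert content_hash((4, 6), (2, 3), a) == content_hash((4, 6), (4, 2), b)
assert exact_hash((4, 6), (2, 3), a) != exact_hash((4, 6), (4, 2), b)
```

Reading the whole volume in one global order keeps the content hash simple, but it is serial: every voxel passes through a single hash object, which would be slow for large datasets.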

https://en.wikipedia.org/wiki/Merkle_tree
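A Merkle tree could be sketched as below, assuming each leaf is the serialized bytes of one block, listed in a canonical order. The function names are hypothetical. Leaf hashes can be computed independently (for example, one per worker) and combined into a single root that two datasets can compare. When the roots differ, descending only into mismatching subtrees localizes the changed blocks without rehashing everything. Using the stored chunks as leaves gives the first, layout-dependent level of equality. For the second level, the leaves would presumably need to be fixed-size blocks of decoded data on a grid that does not depend on the storage chunking; that is an assumption here, not something this issue specifies.

```python
import hashlib


def _leaf_hash(data: bytes) -> bytes:
    # Prefix 0x00 for leaves and 0x01 for internal nodes (the same domain
    # separation RFC 6962 uses) so a leaf can't collide with an inner node.
    return hashlib.sha256(b"\x00" + data).digest()


def _node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def merkle_levels(leaves):
    """All tree levels, from leaf hashes (levels[0]) up to the root
    (levels[-1][0]). An unpaired node is carried up unchanged."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [_leaf_hash(d) for d in leaves]
    levels = [level]
    while len(level) > 1:
        nxt = [_node_hash(level[i], level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        levels.append(nxt)
        level = nxt
    return levels


def merkle_root(leaves) -> str:
    return merkle_levels(leaves)[-1][0].hex()


def differing_leaves(levels_a, levels_b):
    """Leaf indices that differ, found by descending only into subtrees
    whose hashes differ. Both trees must have the same number of leaves."""
    assert len(levels_a[0]) == len(levels_b[0])
    found = []

    def walk(k, i):
        if levels_a[k][i] == levels_b[k][i]:
            return  # identical subtree: skip it entirely
        if k == 0:
            found.append(i)
            return
        for child in (2 * i, 2 * i + 1):
            if child < len(levels_a[k - 1]):
                walk(k - 1, child)

    walk(len(levels_a) - 1, 0)
    return found


chunks = [b"chunk-0", b"chunk-1", b"chunk-2", b"chunk-3", b"chunk-4"]
edited = chunks[:3] + [b"changed"] + chunks[4:]
assert merkle_root(chunks) != merkle_root(edited)
assert differing_leaves(merkle_levels(chunks), merkle_levels(edited)) == [3]
```

With five leaves, only the subtrees on the path to leaf 3 are visited; the identical pair (0, 1) and the carried-up leaf 4 are skipped after a single comparison each.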

william-silversmith changed the title from "Hash of a dataset" to "Check datasets are identical." on Dec 11, 2023