Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

array_api usage for Buffer #2199

Open
ilan-gold opened this issue Sep 17, 2024 Discussed in #2197 · 3 comments
Open

array_api usage for Buffer #2199

ilan-gold opened this issue Sep 17, 2024 Discussed in #2197 · 3 comments

Comments

@ilan-gold
Copy link

ilan-gold commented Sep 17, 2024

Discussed in #2197

Originally posted by ilan-gold September 17, 2024
Hello all,

I have been looking into the core code here (teehee) and I noticed the recent #1967. It made me wonder why array_api + dlpack is not used for the Buffer class (in general, and also specifically in that PR). I am guessing "no one has done it yet" is the answer, which is fair considering we all have other things going on, but then I am curious what #1967 gives that array_api + dlpack wouldn't.

I think cupy, jax (mentioned in the PR), and numpy all support array_api if memory serves correctly.

Sorry for the short question, with probably a long answer, but don't want to presume too much. I also did a quick search of the issues and found nothing relevant to the array api, but might have missed something: https://data-apis.org/array-api/latest/index.html

cc: @akshaysubr @madsbk @d-v-b @dcherian

@ilan-gold ilan-gold changed the title array_api usage array_api usage for Buffer Sep 17, 2024
@akshaysubr
Copy link
Contributor

@ilan-gold This is a good discussion worth having. #1967 was meant to be a first pass at adding device support and it exposed many areas of the code that were relying implicitly on numpy and CPU backed arrays in general. In #1751, I proposed supporting both the __cuda_array_interface__ as well as dlpack but left out the dlpack support in #1967 mainly for simplicity. It would certainly be useful to add a dlpack based Buffer implementation.

I was not aware of array_api and it does seem like a really useful standard that we can use for a generic Buffer implementation. The entrypoints based autodiscovery is a useful feature as well. From a quick search, here are the libraries that currently do/do not support the array_api specification:

  1. Numpy: supported
  2. CuPy: experimental
  3. PyTorch: not supported, but work in progress
  4. JAX: supported

Given the current state of array library compliance, might it make sense to add an array_api based Buffer implementation but keep the existing explicit NumPy and CuPy based implementations with a view to move to the array_api based implementation down the line?

@ilan-gold
Copy link
Author

ilan-gold commented Sep 18, 2024

@akshaysubr I would say that sounds fine, but perhaps not keep the explicit numpy one since it is officially supported as array_api. This would just leave cupy for which there is this: https://github.com/data-apis/array-api-compat if I am not mistaken. I think the goal should really just be one implementation. I imagine it's doable modulo one or two extra if-then checks for things that might be broken. In fact, it seems like we won't have to wait too long if we don't want the compat layer: cupy/cupy#8470

I really think we should strive for one implementation - simpler, and if there are bugs, maybe we can upstream the fixes since array api should be a one-stop-shop!

@ilan-gold
Copy link
Author

ilan-gold commented Sep 18, 2024

Also re: pytorch, I think it's ok if zarr doesn't support it at first. This could be a good impetus to get its developers back on board re: array_api. Also the data interchange with cupy is quite good and anyone using zarr with torch for deep learning would probably have to write a data loader of some sort, no? So custom code already there as a layer around every read. Just some quick thoughts on that, but not as familiar.

Alternatively we could make the buffer class customizable to the point where you have an array-api compliant array as the "base" buffer, and then we decorate all functions that return arrays to return a dlpack representation in a different, less-compliant library i.e., cupy as base and then torch on top. Same thing would probably enable tensorflow (experimentally, maybe, just from a quick google): https://www.tensorflow.org/api_docs/python/tf/experimental/dlpack/from_dlpack + https://docs.cupy.dev/en/stable/user_guide/interoperability.html#dlpack

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants