-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: extend limit on number of regular arrays to concatenate #3312
fix: extend limit on number of regular arrays to concatenate #3312
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jpivarski - I think, adding a few specializations to the kernels fixes the issue.
It would solve the issue without changing the implementation. We're still not allowing UnionArrays to have more than 127 types, right? Is the idea that we keep the |
These kernels are only used to make an
Then this index is used to create a simplified "UnionArray " and return it as a RegularArray :awkward/src/awkward/operations/ak_concatenate.py Lines 278 to 288 in 4db7973
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, this is fine—the helper functions were designed for UnionArrays, but here they're being used for concatenation and not UnionArrays.
Do you really need int32, uint32, and int64 versions? If it's just an intermediate step in the ak.concatenate
operation, why not make that intermediate step always int64? In general, the reason we have multiple dtypes of these nodes is to accept data with those dtypes from other sources without a copy-transformation. In fact, UnionArray's uint32 was motivated by a very early version of RNTuple—before that, Awkward didn't have any unsigned index types. 32-bit indexes in general were introduced for the sake of Arrow (along with BitMaskedArray).
Yes, initially I thought we do not need all of them. However, looking at the code again it seems, yes, we need all of them: these are specializations for the |
No description provided.