Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: extend limit on number of regular arrays to concatenate #3312

Conversation

ianna
Copy link
Collaborator

@ianna ianna commented Nov 21, 2024

No description provided.

@ianna ianna linked an issue Nov 21, 2024 that may be closed by this pull request
Copy link
Collaborator Author

@ianna ianna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jpivarski - I think, adding a few specializations to the kernels fixes the issue.

@ianna ianna requested a review from jpivarski November 21, 2024 17:04
@jpivarski
Copy link
Member

It would solve the issue without changing the implementation. We're still not allowing UnionArrays to have more than 127 types, right? Is the idea that we keep the axis != 0 RegularArray concatenation algorithm the way it is, allow bigger integer types in UnionArray.simplified, but raise an error if UnionArray.simplifed returns a UnionArray? (Which it shouldn't even be able to build because UnionArray constructors should continue to require tags to be int8.)

@ianna
Copy link
Collaborator Author

ianna commented Nov 21, 2024

It would solve the issue without changing the implementation. We're still not allowing UnionArrays to have more than 127 types, right? Is the idea that we keep the axis != 0 RegularArray concatenation algorithm the way it is, allow bigger integer types in UnionArray.simplified, but raise an error if UnionArray.simplifed returns a UnionArray? (Which it shouldn't even be able to build because UnionArray constructors should continue to require tags to be int8.)

These kernels are only used to make an index in:

index = ak.contents.UnionArray.regular_index(tags, backend=backend)

Then this index is used to create a simplified "UnionArray" and return it as a RegularArray:
index = ak.contents.UnionArray.regular_index(tags, backend=backend)
inner = ak.contents.UnionArray.simplified(
tags,
index,
[x._content for x in regulararrays],
mergebool=mergebool,
)
return (ak.contents.RegularArray(inner, prototype.size),)
elif all(

Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, this is fine—the helper functions were designed for UnionArrays, but here they're being used for concatenation and not UnionArrays.

Do you really need int32, uint32, and int64 versions? If it's just an intermediate step in the ak.concatenate operation, why not make that intermediate step always int64? In general, the reason we have multiple dtypes of these nodes is to accept data with those dtypes from other sources without a copy-transformation. In fact, UnionArray's uint32 was motivated by a very early version of RNTuple—before that, Awkward didn't have any unsigned index types. 32-bit indexes in general were introduced for the sake of Arrow (along with BitMaskedArray).

@ianna
Copy link
Collaborator Author

ianna commented Nov 22, 2024

Okay, this is fine—the helper functions were designed for UnionArrays, but here they're being used for concatenation and not UnionArrays.

Do you really need int32, uint32, and int64 versions? If it's just an intermediate step in the ak.concatenate operation, why not make that intermediate step always int64? In general, the reason we have multiple dtypes of these nodes is to accept data with those dtypes from other sources without a copy-transformation. In fact, UnionArray's uint32 was motivated by a very early version of RNTuple—before that, Awkward didn't have any unsigned index types. 32-bit indexes in general were introduced for the sake of Arrow (along with BitMaskedArray).

Yes, initially I thought we do not need all of them. However, looking at the code again it seems, yes, we need all of them: these are specializations for the tags defined as either int8 or int64 type.

@ianna ianna merged commit 4eebc08 into main Nov 22, 2024
41 checks passed
@ianna ianna deleted the ianna/3310-akconcatenate-with-axis=1-fails-for-more-than-127-regular-length-arrays branch November 22, 2024 16:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ak.concatenate with axis=1 fails for more than 127 regular-length arrays
2 participants