Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid adding a NullBuffer when decoding timestamp offsets #90

Merged
merged 1 commit into from
May 24, 2024

Conversation

progval
Copy link
Contributor

@progval progval commented May 23, 2024

Since 60288cd we applied an unary_opt kernel to decode timezones. This kernel always returns Some unless the date is outside the 1677-2262.

Unfortunately, even though the kernel is unlikely to return None, applying the kernel always causes the resulting array to get a NullBuffer, even if the source array did not have one.

In order to avoid unnecessarily adding a NullBuffer, this commit first tries to apply a non-nullable kernel; and only falls back to unary_opt in the rare case it fails.

An alternative implementation that does not risk running the kernel twice would be to check the NullBuffer's null_count after running the kernel then strip it if its null_count is zero; but it requires the unnecessary allocation of a NullBuffer.

Since 60288cd we applied an unary_opt
kernel to decode timezones. This kernel always returns `Some` unless
the date is outside the 1677-2262.

Unfortunately, even though the kernel is unlikely to return `None`,
applying the kernel always causes the resulting array to get a
`NullBuffer`, even if the source array did not have one.

In order to avoid unnecessarily adding a `NullBuffer`, this commit first
tries to apply a non-nullable kernel; and only falls back to `unary_opt`
in the rare case it fails.

An alternative implementation that does not risk running the kernel
twice would be to check the `NullBuffer`'s `null_count` after running
the kernel, then strip it if its `null_count` is zero; but it requires
the unnecessary allocation of a `NullBuffer`.
Copy link
Collaborator

@Jefffrey Jefffrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me, thanks 👍

@Jefffrey Jefffrey merged commit ac5a8ab into datafusion-contrib:main May 24, 2024
9 checks passed
waynexia pushed a commit that referenced this pull request Oct 24, 2024
Since 60288cd we applied an unary_opt
kernel to decode timezones. This kernel always returns `Some` unless
the date is outside the 1677-2262.

Unfortunately, even though the kernel is unlikely to return `None`,
applying the kernel always causes the resulting array to get a
`NullBuffer`, even if the source array did not have one.

In order to avoid unnecessarily adding a `NullBuffer`, this commit first
tries to apply a non-nullable kernel; and only falls back to `unary_opt`
in the rare case it fails.

An alternative implementation that does not risk running the kernel
twice would be to check the `NullBuffer`'s `null_count` after running
the kernel, then strip it if its `null_count` is zero; but it requires
the unnecessary allocation of a `NullBuffer`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants