Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make sure that the chunked packer bounce buffer is realease after the synchronize #11887

Merged
merged 1 commit into from
Dec 19, 2024

Conversation

abellina
Copy link
Collaborator

Closes #11885

The issue this fixes is that the bounce buffer seen here cannot be closed (released back to the pool) before the stream synchronize that happens at the caller of .next. We need to return the bounce buffer and allow the caller to close, which can be done in the right order.

… synchronize

Signed-off-by: Alessandro Bellina <abellina@nvidia.com>
@abellina
Copy link
Collaborator Author

build

@@ -1719,7 +1719,7 @@ class ChunkedPacker(table: Table,
}

override def next(): (DeviceBounceBuffer, Long) = {
withResource(bounceBufferPool.nextBuffer()) { bounceBuffer =>
closeOnExcept(bounceBufferPool.nextBuffer()) { bounceBuffer =>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So just to confirm that the bounce buffer will be closed by the caller of next? I am just confused because that should have caused a double free of some kind because we closed it and then returned it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah this is DeviceBounceBuffer, it is not DeviceMemoryBuffer. I can add a throw in DeviceBounceBuffer if we unacquire too many times causing the double free. The bug is that it returns the buffer back to the pool and now it is eligible for the next write.. doing bad bad things.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is fine, it is just confusing that we only "fixed" one side of it.

@sameerz sameerz added the bug Something isn't working label Dec 18, 2024
@abellina abellina merged commit 0e5b5cb into NVIDIA:branch-25.02 Dec 19, 2024
51 checks passed
@abellina abellina deleted the fix_spill_corruption branch December 19, 2024 17:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] data corruption with spill framework changes
3 participants