MLBuffer destruction timeline #716

Open
reillyeon opened this issue Jul 2, 2024 · 10 comments

@reillyeon
Contributor

While prototyping an implementation of the MLBuffer interface for Chromium's TFLite-based WebNN backend, I've encountered a question: what should happen if a developer executes the following code?

let buffer = context.createBuffer(...);
context.writeBuffer(buffer, ...);
let promise = context.readBuffer(buffer);
buffer.destroy();

As currently implemented, promise will never be resolved. This is likely undesirable. We probably want one of the following behaviors:

  1. promise resolves with the contents of the buffer. More generally, the buffer is not destroyed until all pending operations referencing it are completed.
  2. promise rejects (probably with an AbortError or InvalidStateError). More generally, any incomplete operation referencing the buffer is aborted (if possible).

Option 1 is consistent with the idea that readBuffer(), writeBuffer() and dispatch() all queue tasks to run on a "timeline" separate from JavaScript execution. destroy() would be one more such method. Option 2 seems useful for quickly releasing resources, but means the developer must take care to only call destroy() when they are truly done with the buffer (e.g. after awaiting the Promise returned by readBuffer()).
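
For illustration, the safe pattern under Option 2 would look like this (a sketch only; argument lists elided as above):

let buffer = context.createBuffer(...);
context.writeBuffer(buffer, ...);
const data = await context.readBuffer(buffer);  // wait for the readback to complete
buffer.destroy();  // safe: no pending operation references the buffer anymore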

@bbernhar

bbernhar commented Jul 9, 2024

I'm in favor of Option 1. This was the intent of the original proposal, #542:

Destroy() gets called on the context timeline but doesn't actually release until the device signals completion.

@huningxin
Contributor

let buffer = context.createBuffer(...);
context.writeBuffer(buffer, ...);
let promise = context.readBuffer(buffer);
buffer.destroy();

In the current Chromium prototype, the promise is never resolved nor rejected, because destroy() resets the mojo remote of the buffer. The WebNN service still reads the data from the GPU but silently fails to send it back to the renderer, so the promise in JavaScript is never resolved.

I guess destroy() can be fixed by rejecting the pending promises, which basically implements Option 2. The WebNN service still needs to wait until the GPU readback is done before releasing the readback buffer. The backing resource of the destroyed buffer would be released earlier, when handling the destroy() / disconnection message.

e.g. after awaiting the Promise returned by readBuffer()

I suppose this is the typical usage.
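
To make the observable behavior concrete, a caller racing destroy() against a pending read would then see something like this (a sketch, assuming rejection with AbortError as discussed above):

let promise = context.readBuffer(buffer);
buffer.destroy();
try {
  await promise;
} catch (e) {
  console.log(e.name);  // "AbortError": the pending read was rejected
}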

@bbernhar

Updated #543 with additional clarification on the content/script timeline: destroy() must reject pending readBuffer() promises.

@reillyeon
Contributor Author

The updated proposal defines what happens to the promises returned by readBuffer(), but what about pending dispatch operations? If only one buffer involved in a dispatch is destroyed, do we cancel the dispatch and reject any readBuffer() calls on any of the output buffers?

@bbernhar

If only one buffer involved in a dispatch is destroyed, do we cancel the dispatch and reject any readBuffer() calls on any of the output buffers?

Thanks for raising the question, Reilly. I don't believe it's possible to stop or cancel GPUs from executing in-flight commands without resetting the device. If a dispatch is pending (or still running on-device) then we can allow it to complete, which should (on the device timeline) safely release any destroyed buffers used as input/output.

@reillyeon
Contributor Author

Thanks for raising the question, Reilly. I don't believe it's possible to stop or cancel GPUs from executing in-flight commands without resetting the device. If a dispatch is pending (or still running on-device) then we can allow it to complete, which should (on the device timeline) safely release any destroyed buffers used as input/output.

In that case I'm not sure of the value of causing readBuffer() to reject, since the compute is still going to happen and we don't actually free any resources immediately.

@bbernhar

I don't feel strongly either way. However, I think it's atypical to call destroy() without first waiting on readBuffer(). Therefore, I'm okay with disallowing it and rejecting, per the current approach.

@reillyeon
Contributor Author

Thinking about this more, with the additional context of MLContext.destroy(), I think rejecting the promises immediately is the right behavior, as it will make these two methods consistent, even though it makes the overall behavior somewhat inconsistent with the concept that MLBuffer operations all occur on the device timeline.

@bbernhar

even though it makes the overall behavior somewhat inconsistent with the concept that MLBuffer operations all occur on the device timeline.

Could we have MLBuffer operations be defined across two timelines: script and device/queue? Then calling destroy() would reject pending promises and, as its last step on the script timeline, issue the release operation, which would run as the first step on the device/queue timeline.

MLContext.destroy() could effectively call MLBuffer.destroy()... But this leads me to wonder: why are MLGraph(s) not destroyable too? Like MLBuffer, MLGraph can hold copies of buffers (i.e., a device-internal representation) which could benefit from predictable release (and consistent destruction).
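
A minimal, runnable sketch of that split (FakeMLBuffer and deviceQueue are illustrative stand-ins, not real WebNN internals): destroy() rejects pending promises on the script timeline and defers the actual release to a queue standing in for the device/queue timeline.

const deviceQueue = [];  // stand-in for the device/queue timeline

class FakeMLBuffer {
  #pendingReads = new Set();
  #backing = new ArrayBuffer(16);

  readBuffer() {
    return new Promise((resolve, reject) => {
      const op = { resolve, reject };
      this.#pendingReads.add(op);
      // Simulate the device timeline completing the readback later.
      deviceQueue.push(() => {
        if (this.#pendingReads.delete(op)) resolve(this.#backing.slice(0));
      });
    });
  }

  destroy() {
    // Script timeline: reject anything still outstanding...
    for (const op of this.#pendingReads) {
      op.reject(new DOMException('Buffer was destroyed.', 'AbortError'));
    }
    this.#pendingReads.clear();
    // ...then, as the last script-timeline step, enqueue the release so it
    // runs as the first step on the device/queue timeline.
    deviceQueue.push(() => { this.#backing = null; });
  }
}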

@reillyeon
Contributor Author

Could we have MLBuffer operations be defined across two timelines: script and device/queue? Then calling destroy() would reject pending promises and, as its last step on the script timeline, issue the release operation, which would run as the first step on the device/queue timeline.

Yes, I think explicitly splitting the timelines in the method's algorithm will make this clear for developers and implementers.

MLContext.destroy() could effectively call MLBuffer.destroy()... But this leads me to wonder: why are MLGraph(s) not destroyable too? Like MLBuffer, MLGraph can hold copies of buffers (i.e., a device-internal representation) which could benefit from predictable release (and consistent destruction).

+1 to adding MLGraph.destroy() and MLGraphBuilder.destroy() methods.
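
If those methods were added, usage might look like this (hypothetical: neither destroy() exists in the spec yet; build() arguments elided):

const builder = new MLGraphBuilder(context);
// ... define operands ...
const graph = await builder.build(...);
builder.destroy();  // hypothetical: eagerly release builder-held resources
// ... dispatch work against the graph ...
graph.destroy();    // hypothetical: predictably release graph-held buffers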
