MLBuffer destruction timeline #716

Open
reillyeon opened this issue Jul 2, 2024 · 10 comments

@reillyeon
Contributor

While prototyping an implementation of the MLBuffer interface for Chromium's TFLite-based WebNN backend, I've encountered a question: what should happen if a developer executes the following code?

let buffer = context.createBuffer(...);
context.writeBuffer(buffer, ...);
let promise = context.readBuffer(buffer);
buffer.destroy();

As currently implemented, promise will never be resolved. This is likely undesirable. We probably want one of the following behaviors:

  1. promise resolves with the contents of the buffer. More generally, the buffer is not destroyed until all pending operations referencing it are completed.
  2. promise rejects (probably with an AbortError or InvalidStateError). More generally, any incomplete operation referencing the buffer is aborted (if possible).

Option 1 is consistent with the idea that readBuffer(), writeBuffer() and dispatch() all queue tasks to run on a "timeline" separate from JavaScript execution. destroy() would be one more such method. Option 2 seems useful for quickly releasing resources, but means the developer must take care to only call destroy() when they are truly done with the buffer (e.g. after awaiting the Promise returned by readBuffer()).
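
For illustration, the safe pattern under Option 2 would look like this (a sketch only; argument lists elided as above):

let buffer = context.createBuffer(...);
context.writeBuffer(buffer, ...);
const data = await context.readBuffer(buffer);  // wait for the readback to complete
buffer.destroy();  // safe: no pending operation references the buffer anymore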

@bbernhar

bbernhar commented Jul 9, 2024

I'm in favor of Option 1. This was the intent of the original proposal, #542:

Destroy() gets called on the context timeline but doesn't actually release until the device signals completion.

@huningxin
Contributor

let buffer = context.createBuffer(...);
context.writeBuffer(buffer, ...);
let promise = context.readBuffer(buffer);
buffer.destroy();

In the current Chromium prototype, the promise is never resolved nor rejected, because destroy() resets the mojo remote of the buffer. The WebNN service still reads the data from the GPU but silently fails to send it back to the renderer, so the promise in JavaScript is never resolved.

I guess destroy() can be fixed by rejecting the pending promises, which basically implements Option 2. The WebNN service still needs to wait until the GPU readback is done before releasing the readback buffer. The backing resource of the destroyed buffer would be released earlier, when handling the destroy() / disconnection message.

e.g. after awaiting the Promise returned by readBuffer()

I suppose this is the typical usage.
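
To make the observable behavior concrete, a caller racing destroy() against a pending read would then see something like this (a sketch, assuming rejection with AbortError as discussed above):

let promise = context.readBuffer(buffer);
buffer.destroy();
try {
  await promise;
} catch (e) {
  console.log(e.name);  // "AbortError": the pending read was rejected
}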

@bbernhar

Updated #543 with additional clarification on the content/script timeline: destroy() must reject pending readBuffer() promises.

@reillyeon
Contributor Author

The updated proposal defines what happens to the promises returned by readBuffer(), but what about pending dispatch operations? If only one buffer involved in a dispatch is destroyed, do we cancel the dispatch and reject any readBuffer() calls on any of the output buffers?

@bbernhar

If only one buffer involved in a dispatch is destroyed, do we cancel the dispatch and reject any readBuffer() calls on any of the output buffers?

Thanks for raising the question, Reilly. I don't believe it's possible to stop or cancel GPUs from executing in-flight commands without resetting the device. If a dispatch is pending (or still running on-device) then we can allow it to complete, which should (on the device timeline) safely release any destroyed buffers used as input/output.

@reillyeon
Contributor Author

Thanks for raising the question, Reilly. I don't believe it's possible to stop or cancel GPUs from executing in-flight commands without resetting the device. If a dispatch is pending (or still running on-device) then we can allow it to complete, which should (on the device timeline) safely release any destroyed buffers used as input/output.

In that case I'm not sure of the value of causing readBuffer() to reject, since the compute is still going to happen and we don't actually free any resources immediately.

@bbernhar

I don't feel strongly either way. However, I think it's atypical to call destroy() without first waiting on readBuffer(). Therefore, I'm okay with disallowing it and rejecting, per the current approach.

@reillyeon
Contributor Author

Thinking about this more, with the additional context of MLContext.destroy(), I think rejecting the promises immediately is the right behavior, as it will make these two methods consistent, even though it makes the overall behavior somewhat inconsistent with the concept that MLBuffer operations all occur on the device timeline.

@bbernhar

even though it makes the overall behavior somewhat inconsistent with the concept that MLBuffer operations all occur on the device timeline.

Could we have MLBuffer operations be defined across two timelines: script and device/queue? Then calling destroy() would reject pending promises and, as its last step on the script timeline, issue the release operation, which would run as the first step on the device/queue timeline.

MLContext.destroy() could effectively call MLBuffer.destroy()... But this leads me to wonder: why are MLGraph(s) not destroyable too? Like MLBuffer, MLGraph can hold copies of buffers (i.e., a device-internal representation) which could benefit from predictable release (and consistent destruction).
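
A minimal, runnable sketch of that split (FakeMLBuffer and deviceQueue are illustrative stand-ins, not real WebNN internals): destroy() rejects pending promises on the script timeline and defers the actual release to a queue standing in for the device/queue timeline.

const deviceQueue = [];  // stand-in for the device/queue timeline

class FakeMLBuffer {
  #pendingReads = new Set();
  #backing = new ArrayBuffer(16);

  readBuffer() {
    return new Promise((resolve, reject) => {
      const op = { resolve, reject };
      this.#pendingReads.add(op);
      // Simulate the device timeline completing the readback later.
      deviceQueue.push(() => {
        if (this.#pendingReads.delete(op)) resolve(this.#backing.slice(0));
      });
    });
  }

  destroy() {
    // Script timeline: reject anything still outstanding...
    for (const op of this.#pendingReads) {
      op.reject(new DOMException('Buffer was destroyed.', 'AbortError'));
    }
    this.#pendingReads.clear();
    // ...then, as the last script-timeline step, enqueue the release so it
    // runs as the first step on the device/queue timeline.
    deviceQueue.push(() => { this.#backing = null; });
  }
}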

@reillyeon
Contributor Author

Could we have MLBuffer operations be defined across two timelines: script and device/queue? Then calling destroy() would reject pending promises and, as its last step on the script timeline, issue the release operation, which would run as the first step on the device/queue timeline.

Yes, I think explicitly splitting the timelines in the method's algorithm will make this clear for developers and implementers.

MLContext.destroy() could effectively call MLBuffer.destroy()... But this leads me to wonder: why are MLGraph(s) not destroyable too? Like MLBuffer, MLGraph can hold copies of buffers (i.e., a device-internal representation) which could benefit from predictable release (and consistent destruction).

+1 to adding MLGraph.destroy() and MLGraphBuilder.destroy() methods.
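
If those methods were added, usage might look like this (hypothetical: neither destroy() exists in the spec yet; build() arguments elided):

const builder = new MLGraphBuilder(context);
// ... define operands ...
const graph = await builder.build(...);
builder.destroy();  // hypothetical: eagerly release builder-held resources
// ... dispatch work against the graph ...
graph.destroy();    // hypothetical: predictably release graph-held buffers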
