Raytracing Tests Failing on AMD GPU #6727

Open
Tracked by #6762
cwfitzgerald opened this issue Dec 13, 2024 · 14 comments
Labels
api: vulkan (Issues with Vulkan), feature: raytracing (Issues with the Ray Tracing Native Feature), type: bug (Something isn't working)

Comments

@cwfitzgerald
Member

The following tests all fail on both of my AMD GPUs, either with a device OOM during bind group creation or with a segfault:

     Summary [   4.082s] 54 tests run: 48 passed, 6 failed, 1305 skipped
       ABORT [   3.860s] wgpu-test::wgpu-test [Executed] [Vulkan/AMD Radeon(TM) 890M Graphics/0] wgpu_test::ray_tracing::as_build::out_of_order_as_build
     Message [         ] code 0xc0000005: Invalid access to memory location. (os error 998)
        FAIL [   0.787s] wgpu-test::wgpu-test [Executed] [Vulkan/AMD Radeon(TM) 890M Graphics/0] wgpu_test::ray_tracing::as_build::out_of_order_as_build_use
       ABORT [   3.986s] wgpu-test::wgpu-test [Executed] [Vulkan/AMD Radeon(TM) 890M Graphics/0] wgpu_test::ray_tracing::as_build::unbuilt_blas     
     Message [         ] code 0xc0000005: Invalid access to memory location. (os error 998)
        FAIL [   0.912s] wgpu-test::wgpu-test [Executed] [Vulkan/AMD Radeon(TM) 890M Graphics/0] wgpu_test::ray_tracing::as_use_after_free::acceleration_structure_use_after_free
       ABORT [   3.912s] wgpu-test::wgpu-test [Executed] [Vulkan/AMD Radeon(TM) 890M Graphics/0] wgpu_test::ray_tracing::scene::acceleration_structure_build_no_index
     Message [         ] code 0xc0000005: Invalid access to memory location. (os error 998)
       ABORT [   4.026s] wgpu-test::wgpu-test [Executed] [Vulkan/AMD Radeon(TM) 890M Graphics/0] wgpu_test::ray_tracing::scene::acceleration_structure_build_with_index
     Message [         ] code 0xc0000005: Invalid access to memory location. (os error 998)
@cwfitzgerald
Member Author

One stack trace:

>	wgpu_test-a51d5c185a8577b7.exe!ash::extensions_generated::khr::acceleration_structure::Device::get_acceleration_structure_build_sizes(ash::vk::enums::AccelerationStructureBuildTypeKHR self, ash::vk::definitions::AccelerationStructureBuildGeometryInfoKHR * build_type, ref$<slice2$<u32>> build_info, ash::vk::definitions::AccelerationStructureBuildSizesInfoKHR * size_info) Line 275	Rust
 	wgpu_test-a51d5c185a8577b7.exe!wgpu_hal::vulkan::device::impl$4::get_acceleration_structure_build_sizes(wgpu_hal::vulkan::Device * self, wgpu_hal::GetAccelerationStructureBuildSizesDescriptor<wgpu_hal::vulkan::Buffer> * desc) Line 2412	Rust
 	wgpu_test-a51d5c185a8577b7.exe!wgpu_hal::dynamic::device::impl$0::get_acceleration_structure_build_sizes<wgpu_hal::vulkan::Device>(wgpu_hal::vulkan::Device * self, wgpu_hal::GetAccelerationStructureBuildSizesDescriptor<dyn$<wgpu_hal::dynamic::DynBuffer>> * desc) Line 501	Rust
 	wgpu_test-a51d5c185a8577b7.exe!wgpu_core::device::resource::Device::create_blas(wgpu_types::CreateBlasDescriptor<enum2$<core::option::Option<enum2$<alloc::borrow::Cow<str$>>>>> * self, enum2$<wgpu_types::BlasGeometrySizeDescriptors> blas_desc) Line 69	Rust
 	wgpu_test-a51d5c185a8577b7.exe!wgpu_core::global::Global::device_create_blas(wgpu_core::id::Id<enum2$<wgpu_core::id::markers::Device>> self, wgpu_types::CreateBlasDescriptor<enum2$<core::option::Option<enum2$<alloc::borrow::Cow<str$>>>>> * device_id, enum2$<wgpu_types::BlasGeometrySizeDescriptors> desc, enum2$<core::option::Option<wgpu_core::id::Id<enum2$<wgpu_core::id::markers::Blas>>>> sizes) Line 204	Rust
 	wgpu_test-a51d5c185a8577b7.exe!wgpu::backend::wgpu_core::impl$13::create_blas(wgpu::backend::wgpu_core::CoreDevice * self, wgpu_types::CreateBlasDescriptor<enum2$<core::option::Option<ref$<str$>>>> * desc, enum2$<wgpu_types::BlasGeometrySizeDescriptors> sizes) Line 1447	Rust
 	wgpu_test-a51d5c185a8577b7.exe!wgpu::api::device::Device::create_blas(wgpu_types::CreateBlasDescriptor<enum2$<core::option::Option<ref$<str$>>>> * self, enum2$<wgpu_types::BlasGeometrySizeDescriptors> desc) Line 463	Rust
 	wgpu_test-a51d5c185a8577b7.exe!wgpu_test::ray_tracing::as_build::AsBuildContext::new() Line 33	Rust
 	wgpu_test-a51d5c185a8577b7.exe!wgpu_test::ray_tracing::as_build::out_of_order_as_build(wgpu_test::run::TestingContext ctx) Line 122	Rust

@Vecvec
Contributor

Vecvec commented Dec 14, 2024

Was this one crashing with an OOM or an access violation?

@cwfitzgerald
Member Author

Access violation. It crashed in amdvlk64.dll, but I left the driver frames out of the stack I posted.

@Vecvec
Contributor

Vecvec commented Dec 14, 2024

Could you try the HAL ray-traced triangle example? If it does crash, does it crash when getting the BLAS or the TLAS build sizes?

Never mind, I don't think that would help.

@Vecvec
Contributor

Vecvec commented Dec 14, 2024

I wonder if it's trying to read from the vertex buffer (we don't set it, and we aren't required to). Can you attach a debugger to a program that does this and see what address it's trying to dereference? It would likely be 0.
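To make the hypothesis concrete: in Vulkan, triangle vertex data is passed as a `VkDeviceOrHostAddressConstKHR`, a union of a 64-bit device address and a host pointer, and the Vulkan specification says `vkGetAccelerationStructureBuildSizesKHR` ignores these address members (apart from checking whether the transform data host address is NULL). The sketch below mirrors that union in plain Rust; `DeviceOrHostAddressConst`, `TrianglesGeometry` and `size_query_geometry` are hypothetical stand-ins, not ash or wgpu types. It shows that a field left unset reads back as device address 0 or a null host pointer, which is what a driver would dereference if it touched it anyway.

```rust
use std::ffi::c_void;

/// Hypothetical stand-in for Vulkan's `VkDeviceOrHostAddressConstKHR`:
/// the same bytes are read either as a GPU device address or a host pointer.
#[repr(C)]
#[derive(Clone, Copy)]
union DeviceOrHostAddressConst {
    device_address: u64,
    host_address: *const c_void,
}

/// Hypothetical subset of the triangle geometry passed to a size query.
#[repr(C)]
#[derive(Clone, Copy)]
struct TrianglesGeometry {
    vertex_stride: u64,
    max_vertex: u32,
    /// Never filled in for a size query, so it keeps its zero value.
    vertex_data: DeviceOrHostAddressConst,
}

/// Builds the geometry the way a size query would: shape only, no buffer.
fn size_query_geometry(max_vertex: u32, vertex_stride: u64) -> TrianglesGeometry {
    TrianglesGeometry {
        vertex_stride,
        max_vertex,
        vertex_data: DeviceOrHostAddressConst { device_address: 0 },
    }
}

fn main() {
    // One triangle: highest vertex index 2, three f32 components per vertex.
    let geometry = size_query_geometry(2, 12);
    // SAFETY: both union fields are plain data, and all-zero bytes are valid for each.
    let (address, host) = unsafe {
        (geometry.vertex_data.device_address, geometry.vertex_data.host_address)
    };
    println!(
        "stride = {}, max vertex = {}, device address = {address:#x}, host pointer null = {}",
        geometry.vertex_stride,
        geometry.max_vertex,
        host.is_null()
    );
}
```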

@cwfitzgerald added the feature: bindless and feature: raytracing labels and removed the feature: bindless label on Dec 14, 2024
@Vecvec
Contributor

Vecvec commented Dec 15, 2024

Since AMD's drivers are open source, is it possible to get debug symbols for them? Knowing where the problem occurs would help figure out what is going wrong.

@cwfitzgerald
Member Author

Not the Windows ones :)

@Vecvec
Contributor

Vecvec commented Dec 15, 2024

Oh, that's annoying. Is it possible to use a debugger to see what address the access violation occurs at?

@cwfitzgerald
Member Author

Yeah, I'll look when I have a minute.

@cwfitzgerald
Member Author

It's 0x8, but that's just adding 8 to a null pointer.


If it helps, this is the debugger dump of locals from ash's wrapper function:

@cwfitzgerald
Member Author

cwfitzgerald commented Dec 15, 2024

Ah, alright. It's first doing:

rax = *(self.handle + 8 + 43144) // 43144 == 0xA888; this load returns zero
// (other instructions)
rcx = *(rax + 8) // rax is zero, so this reads address 0x8 and faults
00007FF95CA652B4  mov         rax,qword ptr [r15+0A888h]  
00007FF95CA652BB  lea         r8,[rbp-1]  
00007FF95CA652BF  lea         rdx,[rbp-51h]  
00007FF95CA652C3  mov         rcx,qword ptr [rax+8] 

r15 is self.handle + 8
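The arithmetic behind the faulting address can be checked directly. This is a minimal sketch, assuming the pseudocode above is an accurate reading of the disassembly (`r15` holds `self.handle + 8`, the first load reads `r15 + 0xA888`, the second reads `rax + 8`). The handle value in `main` is made up for illustration, and the function names are hypothetical.

```rust
/// Offset used by `mov rax, qword ptr [r15+0A888h]`.
const FIELD_OFFSET: u64 = 0xA888;
/// Offset used by `mov rcx, qword ptr [rax+8]`.
const INNER_OFFSET: u64 = 0x8;

/// Address read by the first load, given the driver-side handle value.
fn first_load_address(handle: u64) -> u64 {
    let r15 = handle + 8; // r15 is self.handle + 8
    r15 + FIELD_OFFSET
}

/// Address read by the second load, given the value `rax` returned by the first.
fn second_load_address(rax: u64) -> u64 {
    rax + INNER_OFFSET
}

fn main() {
    // 0xA888 is the 43144 that appears in the pseudocode.
    assert_eq!(FIELD_OFFSET, 43144);
    let handle = 0x1000_0000; // hypothetical handle value
    println!("first load reads {:#x}", first_load_address(handle));
    // The first load returned zero, so the second load targets 0x8.
    println!("second load reads {:#x}", second_load_address(0));
}
```

Whatever the handle is, a zero result from the first load always sends the second load to 0x8, which matches the reported access violation address.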

@cwfitzgerald
Member Author


Bad AS? The handle seems to point at approximately nothing.

@Vecvec
Copy link
Contributor

Vecvec commented Dec 15, 2024

Bad AS? The handle seems to point at approximately nothing.

At get_acceleration_structure_build_sizes there is no acceleration structure yet; that call exists to get its size so that it can be allocated. There are also no buffers, as they should not affect the size. (I'm assuming AS means acceleration structure in this context.)
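The order described here (query sizes first, allocate afterwards) can be sketched as below. The three output fields mirror `VkAccelerationStructureBuildSizesInfoKHR` (`accelerationStructureSize`, `updateScratchSize`, `buildScratchSize`), but `query_build_sizes`, its per-triangle constants, `TriangleSizeDesc` and `Allocation` are hypothetical placeholders, not driver or wgpu behavior. The point is that the query takes only counts, never buffers, and nothing is allocated until after it returns.

```rust
/// Mirrors the three outputs of VkAccelerationStructureBuildSizesInfoKHR.
#[derive(Debug, Clone, Copy)]
struct BuildSizes {
    acceleration_structure_size: u64,
    update_scratch_size: u64,
    build_scratch_size: u64,
}

/// Hypothetical size-only description of triangle geometry: counts, no buffers.
struct TriangleSizeDesc {
    vertex_count: u32,
    index_count: Option<u32>,
}

/// Hypothetical size query. A real driver computes these numbers internally;
/// the per-triangle constants below are placeholders.
fn query_build_sizes(geometries: &[TriangleSizeDesc]) -> BuildSizes {
    let triangles: u64 = geometries
        .iter()
        .map(|g| u64::from(g.index_count.unwrap_or(g.vertex_count)) / 3)
        .sum();
    BuildSizes {
        acceleration_structure_size: 256 + triangles * 64,
        update_scratch_size: triangles * 16,
        build_scratch_size: triangles * 32,
    }
}

/// Hypothetical stand-in for a GPU buffer allocation.
#[derive(Debug)]
struct Allocation {
    size: u64,
}

fn create_blas(geometries: &[TriangleSizeDesc]) -> (Allocation, Allocation) {
    // Step 1: ask for sizes. No acceleration structure and no buffers exist yet.
    let sizes = query_build_sizes(geometries);
    // Step 2: allocate storage for the structure and scratch space for the build.
    let storage = Allocation { size: sizes.acceleration_structure_size };
    let scratch = Allocation { size: sizes.build_scratch_size };
    (storage, scratch)
}

fn main() {
    let geometry = [TriangleSizeDesc { vertex_count: 3, index_count: None }];
    let (storage, scratch) = create_blas(&geometry);
    println!("storage = {} bytes, scratch = {} bytes", storage.size, scratch.size);
}
```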

@nical
Contributor

nical commented Dec 16, 2024

We should probably not expose the ray tracing feature on AMD until this stuff is figured out.
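One way to act on this suggestion is to mask the feature out when the adapter reports AMD's PCI vendor ID (0x1002). The sketch below is hypothetical: `Features`, `RAY_TRACING`, `OTHER` and `supported_features` are illustrative names, not wgpu's actual feature flags or adapter code, and a real fix might key on the driver rather than on the vendor alone.

```rust
/// PCI vendor ID reported by AMD GPUs.
const VENDOR_AMD: u32 = 0x1002;
/// PCI vendor ID reported by NVIDIA GPUs (used here as a non-AMD example).
const VENDOR_NVIDIA: u32 = 0x10DE;

/// Hypothetical feature bitmask.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Features(u64);

impl Features {
    const RAY_TRACING: Features = Features(1 << 0);
    const OTHER: Features = Features(1 << 1);

    fn contains(self, other: Features) -> bool {
        self.0 & other.0 == other.0
    }

    fn without(self, other: Features) -> Features {
        Features(self.0 & !other.0)
    }
}

/// Hides ray tracing on AMD adapters until the driver crash is understood.
fn supported_features(vendor_id: u32, driver_features: Features) -> Features {
    if vendor_id == VENDOR_AMD {
        driver_features.without(Features::RAY_TRACING)
    } else {
        driver_features
    }
}

fn main() {
    let reported = Features(Features::RAY_TRACING.0 | Features::OTHER.0);
    let amd = supported_features(VENDOR_AMD, reported);
    let nvidia = supported_features(VENDOR_NVIDIA, reported);
    println!("AMD ray tracing: {}", amd.contains(Features::RAY_TRACING));
    println!("NVIDIA ray tracing: {}", nvidia.contains(Features::RAY_TRACING));
}
```

Masking at feature-report time means applications simply never see the feature on affected adapters, instead of hitting the crash later in `create_blas`.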

3 participants