Segfaults in hiprtCreateGeometry when using distribution packaged hip #18

Open
littlewu2508 opened this issue Aug 15, 2023 · 9 comments

Comments

@littlewu2508

I'm using a 6700 XT on Gentoo with dev-util/hip-5.6.0, upstream clang-16.0.6, and hiprt (buildID_linux.txt: 453).

00_context_creation passed.

When executing 01_geom_intersection64D, it segfaults. The stack trace:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/opt/gentoo/lib64/libthread_db.so.1".
[New Thread 0x7fffe8dff6c0 (LWP 3748500)]
[New Thread 0x7fffe3fff6c0 (LWP 3748501)]
[Thread 0x7fffe3fff6c0 (LWP 3748501) exited]
hiprt ver.02000
Executing on 'AMD Radeon RX 6700 XT'

Thread 1 "01_geom_interse" received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) thread apply all bt

Thread 2 (Thread 0x7fffe8dff6c0 (LWP 3748500) "01_geom_interse"):
#0  0x00007ffff7a47c1b in ioctl () from /opt/gentoo/lib64/libc.so.6
#1  0x00007ffff78f1df0 in ?? () from /opt/gentoo/usr/lib64/libhsakmt.so.1
#2  0x00007ffff78eb295 in hsaKmtWaitOnMultipleEvents () from /opt/gentoo/usr/lib64/libhsakmt.so.1
#3  0x00007ffff52dc285 in ?? () from /opt/gentoo/usr/lib64/libhsa-runtime64.so.1
#4  0x00007ffff52b859e in ?? () from /opt/gentoo/usr/lib64/libhsa-runtime64.so.1
#5  0x00007ffff52d1f6a in ?? () from /opt/gentoo/usr/lib64/libhsa-runtime64.so.1
#6  0x00007ffff527e537 in ?? () from /opt/gentoo/usr/lib64/libhsa-runtime64.so.1
#7  0x00007ffff79d0299 in ?? () from /opt/gentoo/lib64/libc.so.6
#8  0x00007ffff7a5332c in ?? () from /opt/gentoo/lib64/libc.so.6

Thread 1 (Thread 0x7ffff7ea5740 (LWP 3748497) "01_geom_interse"):
#0  0x0000000000000000 in ?? ()
#1  0x00007ffff7f3804a in ?? () from ../../hiprt/linux64/libhiprt0200064.so
#2  0x00007ffff7f621a4 in hiprtCreateGeometry () from ../../hiprt/linux64/libhiprt0200064.so
#3  0x000055555556e2eb in Tutorial::run (this=0x7fffffffc3e0) at ../01_geom_intersection/main.cpp:69
#4  0x000055555556deb6 in main (argc=1, argv=0x7fffffffc548) at ../01_geom_intersection/main.cpp:96

If I use AMD's ROCm distribution (at /opt/rocm), I get the same issue as in #15 (comment).

@meistdan
Collaborator

meistdan commented Nov 9, 2023

Hi @littlewu2508, we have released a new version at https://gpuopen.com/hiprt/. Could you please try it and let us know whether the issue persists?

@littlewu2508
Author

I can confirm that the issue persists with the newest hiprtsdk (2.1.c202dac).

Also, I am using the Orochi bundled with hiprtsdk-2.0.0, because the newest one causes errors even in 00_context_creation:

Starting program: /data/wuyy/hiprt-2.1/tutorials/dist/bin/Debug/00_context_creation64D 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/opt/gentoo/lib64/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) thread apply all bt

Thread 1 (Thread 0x7ffff7e9e740 (LWP 3483337) "00_context_crea"):
#0  0x0000000000000000 in ?? ()
#1  0x0000555555559cf2 in oroGetErrorString (error=4294967295, pStr=0x7fffffffbf50) at ../../contrib/Orochi/Orochi/Orochi.cpp:242
#2  0x0000555555572ed6 in checkOro (res=4294967295, file=0x55555557b4e8 "../00_context_creation/main.cpp", line=29) at ../common/TutorialBase.cpp:33
#3  0x000055555556dbb5 in main (argc=1, argv=0x7fffffffc418) at ../00_context_creation/main.cpp:29
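
A frame #0 at address 0x0000000000000000 directly under oroGetErrorString is what a call through a null function pointer typically looks like. The following is a minimal sketch of a guard against that, not Orochi's actual code: `GetErrorStringFn` and `describeError` are hypothetical names introduced for illustration, and the error code is modeled as a plain `int` (the 4294967295 that gdb prints is how -1 appears as an unsigned 32-bit value).

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for one entry of a dynamically loaded API table.
// A loader fills such pointers in at start-up; if the backend library
// was not found, they stay null.
using GetErrorStringFn = const char* (*)(int error);

// Returns a readable message even when the backend function was never loaded.
std::string describeError(GetErrorStringFn getErrorString, int error)
{
    if (getErrorString == nullptr)
    {
        // Calling a null function pointer would jump to address 0,
        // so report the missing function instead of calling it.
        return "error " + std::to_string(error) + " (error-string function not loaded)";
    }
    const char* msg = getErrorString(error);
    return msg != nullptr ? std::string(msg) : "unknown error " + std::to_string(error);
}
```

With a check of this kind, a missing backend would presumably show up as a readable message rather than as a segfault in the error-reporting path itself.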

@meistdan
Collaborator

Which version of ROCm do you use? The provided binaries are available with 5.7 (https://repo.radeon.com/amdgpu-install/23.20/ubuntu/focal/).

@littlewu2508
Author

> Which version of ROCm do you use? The provided binaries are available with 5.7 (https://repo.radeon.com/amdgpu-install/23.20/ubuntu/focal/).

I am using ROCm 5.7.1.

@LAKostis

Just confirming: neither 2.1-alt1.gc202dac nor v2.2.0e68f54 works with ROCm 5.7.1; I'm getting segfaults in all tutorials.

Example segfault backtrace with 2.1:

(gdb) run
Starting program: /opt/git/upstream/HIPRTSDK/tutorials/dist/bin/DebugGpu/01_geom_intersection64D 
Downloading separate debug info for system-supplied DSO at 0x7ffff7fc8000
Downloading separate debug info for /usr/lib64/libhiprt0200164.so
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Downloading separate debug info for /usr/lib64/libamdhip64.so
Missing separate debuginfo for /usr/lib64/libamdhip64.so.
Try to install the hash file /usr/lib/debug/.build-id/10/78f70f65ce207875e9f834533bc0763834fdf2.debug
Downloading separate debug info for /usr/lib64/libhiprtc.so
Missing separate debuginfo for /usr/lib64/libhiprtc.so.
Try to install the hash file /usr/lib/debug/.build-id/6a/dc9289b47bd759efbddd543d6361c8089e52d3.debug
[New Thread 0x7fffeda196c0 (LWP 150839)]
[New Thread 0x7ffeed1ff6c0 (LWP 150840)]
[Thread 0x7ffeed1ff6c0 (LWP 150840) exited]
hiprt ver.02001
Executing on 'AMD Radeon RX 6700 XT'

Thread 1 "01_geom_interse" received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) thread apply all bt

Thread 2 (Thread 0x7fffeda196c0 (LWP 150839) "01_geom_interse"):
#0  __GI___ioctl (fd=fd@entry=3, request=request@entry=3222817548) at ../sysdeps/unix/sysv/linux/ioctl.c:36
#1  0x00007ffff7d25f48 in kmtIoctl (fd=3, request=request@entry=3222817548, arg=arg@entry=0x7fffeda18bc0) at /usr/src/debug/roct-thunk-interface-5.7.1/src/libhsakmt.c:13
#2  0x00007ffff7d27150 in hsaKmtWaitOnMultipleEvents_Ext (event_age=0x7fffeda18c70, Milliseconds=4294967294, WaitOnAll=<optimized out>, NumEvents=3, Events=0x7fffeda18d00) at /usr/src/debug/roct-thunk-interface-5.7.1/src/events.c:407
#3  hsaKmtWaitOnMultipleEvents_Ext (Events=0x7fffeda18d00, NumEvents=3, WaitOnAll=<optimized out>, Milliseconds=4294967294, event_age=0x7fffeda18c70) at /usr/src/debug/roct-thunk-interface-5.7.1/src/events.c:378
#4  0x00007fffedc7d2be in rocr::core::Signal::WaitAny (signal_count=signal_count@entry=6, hsa_signals=hsa_signals@entry=0x7ffee8000de0, conds=conds@entry=0x7ffee8000be0, values=values@entry=0x7ffee8000e30, timeout=timeout@entry=18446744073709551615, wait_hint=<optimized out>, wait_hint@entry=HSA_WAIT_STATE_BLOCKED, satisfying_value=<optimized out>) at /usr/src/debug/rocr-runtime-5.7.1/src/core/runtime/signal.cpp:321
#5  0x00007fffedc5b21e in rocr::AMD::hsa_amd_signal_wait_any (signal_count=6, hsa_signals=0x7ffee8000de0, conds=0x7ffee8000be0, values=0x7ffee8000e30, timeout_hint=timeout_hint@entry=18446744073709551615, wait_hint=wait_hint@entry=HSA_WAIT_STATE_BLOCKED, satisfying_value=0x7fffeda18e38) at /usr/src/debug/rocr-runtime-5.7.1/src/core/runtime/hsa_ext_amd.cpp:572
#6  0x00007fffedc75bda in rocr::core::Runtime::AsyncEventsLoop () at /usr/src/debug/rocr-runtime-5.7.1/src/core/runtime/runtime.cpp:1125
#7  0x00007fffedc277b7 in rocr::os::ThreadTrampoline (arg=<optimized out>) at /usr/src/debug/rocr-runtime-5.7.1/src/core/util/lnx/os_linux.cpp:80
#8  0x00007ffff78a392b in start_thread (arg=<optimized out>) at pthread_create.c:444
#9  0x00007ffff7925cb8 in clone3 () from /lib64/libc.so.6

Thread 1 (Thread 0x7ffff7dab740 (LWP 150801) "01_geom_interse"):
#0  0x0000000000000000 in ?? ()
#1  0x00007ffff7f0306f in ?? () from /usr/lib64/libhiprt0200164.so
#2  0x00007ffff7f2afe7 in hiprtCreateGeometries () from /usr/lib64/libhiprt0200164.so
#3  0x00007ffff7f2b0af in hiprtCreateGeometry () from /usr/lib64/libhiprt0200164.so
#4  0x0000555555559ec2 in Tutorial::run (this=0x7fffffffdb00) at ../01_geom_intersection/main.cpp:69
#5  0x0000555555559a01 in main (argc=1, argv=0x7fffffffdc68) at ../01_geom_intersection/main.cpp:96

@meistdan
Collaborator

Sorry for the late reply. Could you try this particular version of 5.7, please? https://repo.radeon.com/amdgpu-install/23.20/ubuntu/focal/

@meistdan
Collaborator

meistdan commented Jan 19, 2024

> I can confirm that the issue persists with the newest hiprtsdk (2.1.c202dac).
>
> Also, I am using the Orochi bundled with hiprtsdk-2.0.0, because the newest one causes errors even in 00_context_creation:
>
> Starting program: /data/wuyy/hiprt-2.1/tutorials/dist/bin/Debug/00_context_creation64D
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/opt/gentoo/lib64/libthread_db.so.1".
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x0000000000000000 in ?? ()
> (gdb) thread apply all bt
>
> Thread 1 (Thread 0x7ffff7e9e740 (LWP 3483337) "00_context_crea"):
> #0  0x0000000000000000 in ?? ()
> #1  0x0000555555559cf2 in oroGetErrorString (error=4294967295, pStr=0x7fffffffbf50) at ../../contrib/Orochi/Orochi/Orochi.cpp:242
> #2  0x0000555555572ed6 in checkOro (res=4294967295, file=0x55555557b4e8 "../00_context_creation/main.cpp", line=29) at ../common/TutorialBase.cpp:33
> #3  0x000055555556dbb5 in main (argc=1, argv=0x7fffffffc418) at ../00_context_creation/main.cpp:29

It seems that Orochi did not load the function. Could you please check these paths on your system? https://github.com/amdadvtech/Orochi/blob/cdf5c7624dd826335c2d2022ddfb770178cad46a/contrib/hipew/src/hipew.cpp#L295-L298
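
One way to check whether a given path works is to try it directly with `dlopen` and `dlsym`. The sketch below only illustrates that idea and is not the hipew loader: `ProbeResult` and `probeLibrary` are hypothetical names, and the candidate paths in the usage note are assumptions apart from `/opt/gentoo/usr/lib64`, which is the install location reported in this thread. On glibc older than 2.34 it needs to be linked with `-ldl`.

```cpp
#include <cassert>
#include <dlfcn.h>
#include <string>
#include <vector>

// Outcome of probing one candidate path.
struct ProbeResult
{
    std::string path;
    bool libraryLoaded; // dlopen succeeded
    bool symbolFound;   // dlsym found the requested symbol
};

// Tries each candidate with dlopen and looks up `symbol` in the ones that load.
std::vector<ProbeResult> probeLibrary(const std::vector<std::string>& candidates, const char* symbol)
{
    std::vector<ProbeResult> results;
    for (const std::string& path : candidates)
    {
        void* handle = dlopen(path.c_str(), RTLD_NOW | RTLD_LOCAL);
        const bool loaded = handle != nullptr;
        bool found = false;
        if (loaded)
        {
            found = dlsym(handle, symbol) != nullptr;
            dlclose(handle);
        }
        results.push_back({path, loaded, found});
    }
    return results;
}
```

For example, `probeLibrary({"/opt/rocm/lib/libamdhip64.so", "/opt/gentoo/usr/lib64/libamdhip64.so"}, "hipGetErrorString")` would report, per path, whether the library opens and whether it exports that symbol. If every entry comes back with `libraryLoaded == false`, the loader's function pointers are likely to stay null.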

@littlewu2508
Author

> It seems that Orochi did not load the function. Could you please check these paths on your system? https://github.com/amdadvtech/Orochi/blob/cdf5c7624dd826335c2d2022ddfb770178cad46a/contrib/hipew/src/hipew.cpp#L295-L298

Oh, these locations do not exist on my system. My HIP libraries are installed in /opt/gentoo/usr/lib64.

After fixing this, I get an issue similar to the one @LAKostis reported:

Thread 2 (Thread 0x7fffe89ff6c0 (LWP 62143) "01_geom_interse"):
#0  0x00007ffff7a5627b in ioctl () from /opt/gentoo/lib64/libc.so.6
#1  0x00007ffff7909e80 in ?? () from /opt/gentoo/usr/lib64/libhsakmt.so.1
#2  0x00007ffff7902ce6 in hsaKmtWaitOnMultipleEvents_Ext () from /opt/gentoo/usr/lib64/libhsakmt.so.1
#3  0x00007ffff52e52ca in ?? () from /opt/gentoo/usr/lib64/libhsa-runtime64.so.1
#4  0x00007ffff52bd30e in ?? () from /opt/gentoo/usr/lib64/libhsa-runtime64.so.1
#5  0x00007ffff52dafea in ?? () from /opt/gentoo/usr/lib64/libhsa-runtime64.so.1
#6  0x00007ffff52825a7 in ?? () from /opt/gentoo/usr/lib64/libhsa-runtime64.so.1
#7  0x00007ffff79e7069 in ?? () from /opt/gentoo/lib64/libc.so.6
#8  0x00007ffff7a5a708 in ?? () from /opt/gentoo/lib64/libc.so.6

Thread 1 (Thread 0x7ffff7e99740 (LWP 62117) "01_geom_interse"):
#0  0x0000000000000000 in ?? ()
#1  0x00007ffff7f3706f in ?? () from ../../hiprt/linux64/libhiprt0200164.so
#2  0x00007ffff7f5efe7 in hiprtCreateGeometries () from ../../hiprt/linux64/libhiprt0200164.so
#3  0x00007ffff7f5f0af in hiprtCreateGeometry () from ../../hiprt/linux64/libhiprt0200164.so
#4  0x000055555556e05c in Tutorial::run (this=0x7fffffffc240) at ../01_geom_intersection/main.cpp:69
#5  0x000055555556dbd8 in main (argc=1, argv=0x7fffffffc3a8) at ../01_geom_intersection/main.cpp:96
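
Because the HIP libraries here live in a non-standard prefix, it may help to confirm which copies the process actually mapped before the crash. The sketch below lists loaded shared objects by name using glibc's `dl_iterate_phdr`; `collectObjectPath` and `loadedObjectsMatching` are hypothetical helper names, and the approach assumes a glibc-based Linux system. Inside gdb, `info sharedlibrary` gives similar information.

```cpp
#include <cassert>
#include <link.h>
#include <string>
#include <vector>

// Callback for dl_iterate_phdr: records the path of every loaded object
// that has a non-empty name (the main executable usually has none).
static int collectObjectPath(struct dl_phdr_info* info, size_t, void* data)
{
    auto* paths = static_cast<std::vector<std::string>*>(data);
    if (info->dlpi_name != nullptr && info->dlpi_name[0] != '\0')
        paths->push_back(info->dlpi_name);
    return 0; // returning zero keeps the iteration going
}

// Returns the paths of loaded shared objects whose name contains `needle`.
std::vector<std::string> loadedObjectsMatching(const std::string& needle)
{
    std::vector<std::string> all;
    dl_iterate_phdr(collectObjectPath, &all);

    std::vector<std::string> matches;
    for (const std::string& path : all)
    {
        if (path.find(needle) != std::string::npos)
            matches.push_back(path);
    }
    return matches;
}
```

Calling `loadedObjectsMatching("libamdhip64")` after the HIP runtime has been loaded would show whether it came from `/opt/gentoo/usr/lib64` or from another prefix; an empty result would suggest that the library was never loaded at all.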

@meistdan
Collaborator

Hello, in the meantime we have released the source code of HIPRT. I know it's not a perfect solution, but you can try compiling HIPRT for your system. The compilation should be straightforward: https://github.com/GPUOpen-LibrariesAndSDKs/HIPRT
