
Error on building ops, "CUDART_MAX_NORMAL_FP16" is undefined #7

Open
Zhuohao-Li opened this issue Aug 13, 2024 · 3 comments

Comments

@Zhuohao-Li

Hi,

I tried to build on my own, but something weird happens when I compile the kernels and build the end-to-end operators with PyBind.
The error occurs with both make -j and bash setup.sh, when linking the ops.

Here are the details to reproduce it:

CMD:

(1)

cd kernels
mkdir build && cd build
cmake ..
make -j

(2)

cd quest/ops
bash setup.sh

Log:

For (1), during compilation:

[ 71%] Building CUDA object 3rdparty/nvbench/exec/CMakeFiles/nvbench.ctl.dir/nvbench-ctl.cu.o
/home/ubuntu/zhuohao-dev-3/quest/kernels/include/decode/decode_page.cuh(425): error: identifier "CUDART_MAX_NORMAL_FP16" is undefined
    local_max.fill(-CUDART_MAX_NORMAL_FP16);
                    ^

/home/ubuntu/zhuohao-dev-3/quest/kernels/include/decode/decode_page.cuh(516): error: identifier "CUDART_MAX_NORMAL_FP16" is undefined
     local_max.fill(-CUDART_MAX_NORMAL_FP16);
                     ^

[ 73%] Linking CXX shared library ../../../lib/libgmock.so
[ 73%] Built target gmock
[ 74%] Building CXX object 3rdparty/googletest/googlemock/CMakeFiles/gmock_main.dir/src/gmock_main.cc.o
/home/ubuntu/zhuohao-dev-3/quest/kernels/include/decode/decode_page.cuh(425): error: identifier "CUDART_MAX_NORMAL_FP16" is undefined
    local_max.fill(-CUDART_MAX_NORMAL_FP16);
                    ^

For (2), during linking:

/home/ubuntu/zhuohao-dev-3/quest/quest/ops/../../kernels/include/decode/decode_page.cuh(425): error: identifier "CUDART_MAX_NORMAL_FP16" is undefined
    local_max.fill(-CUDART_MAX_NORMAL_FP16);
                    ^

/home/ubuntu/zhuohao-dev-3/quest/quest/ops/../../kernels/include/decode/decode_page.cuh(516): error: identifier "CUDART_MAX_NORMAL_FP16" is undefined
     local_max.fill(-CUDART_MAX_NORMAL_FP16);

I made sure that #include <cuda_fp16.h> is present in quest/kernels/include/decode/decode_page.cuh.

Devices

Description: Ubuntu 20.04.6 LTS
Release: 20.04
Codename: focal
NVIDIA Driver: 535.183.01
CUDA: 12.1
cmake: 3.26.4
GPU: A100-SXM4-40GB
Environment variables:

export PATH="/usr/local/cuda/bin:$PATH"
export PATH="/home/ubuntu/.local/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda/lib"
export CUDA_INSTALL_PATH="/usr/local/cuda"
export CUDA_HOME="/usr/local/cuda"
export CC="/usr/bin/gcc-11"
export CXX="/usr/bin/g++-11"

I could not find CUDART_MAX_NORMAL_FP16 in cuda_fp16.hpp. Can you please check that, or let me know if I am missing something? Thanks!

@yangqy1

yangqy1 commented Aug 16, 2024

I've encountered the same issue. Have you found a solution yet?

@happierpig
Collaborator

Hi @Zhuohao-Li and @yangqy1 ,

Thanks for your interest in our project!!

CUDART_MAX_NORMAL_FP16 seems to have been introduced in CUDA 12.4 (the version used in our experiments). Check the documentation for details. As a quick fix, it is also okay to replace uses of this macro directly with the correct constant value.
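
For example, a minimal shim along the following lines could be placed right after #include <cuda_fp16.h>. This is only a sketch, assuming newer toolkits provide the constant as a preprocessor macro; 0x7BFF is the bit pattern of the largest normal fp16 value, 65504.0:

// shim_check.cu -- hypothetical standalone check, e.g.: nvcc -arch=sm_80 shim_check.cu
#include <cstdio>
#include <cuda_fp16.h>

#ifndef CUDART_MAX_NORMAL_FP16
// Largest normal fp16 value: bit pattern 0x7BFF == 65504.0.
#define CUDART_MAX_NORMAL_FP16 __ushort_as_half((unsigned short)0x7BFFU)
#endif

// Self-check kernel: prints 65504.0 and the negated value used by local_max.fill().
__global__ void print_max_normal_fp16() {
    float v = __half2float(CUDART_MAX_NORMAL_FP16);
    printf("max normal fp16 = %f, negated = %f\n", v, -v);
}

int main() {
    print_max_normal_fp16<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}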

Hope this can solve your issues.

@yangqy1

yangqy1 commented Aug 20, 2024

Hi @happierpig and @Zhuohao-Li ,

Thank you for your prompt response and helpful suggestions!

I successfully ran quest/scripts/example_textgen.py using CUDA 11.8 with an A800 GPU. Besides the macro issue mentioned above, I encountered two additional problems; my solutions for all of them are as follows:

  1. Regarding the missing CUDART_MAX_NORMAL_FP16:

    • For the setup commands:
      cd quest/ops
      bash setup.sh
      
      I added #define CUDART_MAX_NORMAL_FP16 __ushort_as_half((unsigned short)0x7BFFU) right after #include <cuda_fp16.h> in quest/kernels/include/decode/decode_page.cuh.
    • For the optional build commands:
      cd kernels
      mkdir build && cd build
      cmake ..
      make -j
      
      In quest/kernels/src/test/test_page.cu, I inserted half fill_value = __float2half(-65504.0f); and replaced CUDART_MAX_NORMAL_FP16 with fill_value.
  2. When running the tests in quest/kernels/build, I encountered the error:

    Fail: Unexpected error: parallel_for failed: cudaErrorNoKernelImageForDevice: no kernel image is available for execution on the device.
    

    Since my GPU is an A800 and the original code was compiled for an RTX 4090, I changed the compile-time parameters:
    I modified set(CMAKE_CUDA_ARCHITECTURES 89) to set(CMAKE_CUDA_ARCHITECTURES 80) in both quest/kernels/CMakeLists.txt and quest/quest/ops/CMakeLists.txt to match my GPU's compute capability (A800/A100 are sm_80, while the RTX 4090 is sm_89; see the query sketch after this list). This resolved the issue after recompilation.

  3. When executing quest/scripts/example_textgen.py, I faced a CUDA error:

    RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling `cublasCreate(handle)`
    

    By adding torch.cuda.set_device("cuda:0") and specifying the device as device="cuda:0" during model.quest_init(), I resolved the issue.
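
As a side note on item 2: if you are unsure which value CMAKE_CUDA_ARCHITECTURES should take on your machine, a minimal query sketch like the one below (not part of the repository; plain CUDA runtime API) prints the compute capability of each visible GPU. An A800/A100 reports 8.0 (use 80), an RTX 4090 reports 8.9 (use 89).

// check_arch.cu -- hypothetical helper: nvcc check_arch.cu -o check_arch && ./check_arch
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        printf("No CUDA device found.\n");
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        // prop.major/prop.minor form the compute capability, e.g. 8.0 -> CMAKE_CUDA_ARCHITECTURES 80.
        printf("Device %d: %s, compute capability %d.%d -> use %d%d\n",
               i, prop.name, prop.major, prop.minor, prop.major, prop.minor);
    }
    return 0;
}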

I hope this detailed explanation can help others facing similar issues!
