
Error on building ops, "CUDART_MAX_NORMAL_FP16" is undefined #7

Open
Zhuohao-Li opened this issue Aug 13, 2024 · 3 comments

Comments

@Zhuohao-Li

Hi,

I tried to build on my own, but something weird happens when I compile the kernels and build the end-to-end operators with PyBind.
The error occurs with both make -j and bash setup.sh, when linking the ops.

Here are the details to reproduce it:

CMD:

(1)

cd kernels
mkdir build && cd build
cmake ..
make -j

(2)

cd quest/ops
bash setup.sh

Log:

For (1), during compilation:

[ 71%] Building CUDA object 3rdparty/nvbench/exec/CMakeFiles/nvbench.ctl.dir/nvbench-ctl.cu.o
/home/ubuntu/zhuohao-dev-3/quest/kernels/include/decode/decode_page.cuh(425): error: identifier "CUDART_MAX_NORMAL_FP16" is undefined
    local_max.fill(-CUDART_MAX_NORMAL_FP16);
                    ^

/home/ubuntu/zhuohao-dev-3/quest/kernels/include/decode/decode_page.cuh(516): error: identifier "CUDART_MAX_NORMAL_FP16" is undefined
     local_max.fill(-CUDART_MAX_NORMAL_FP16);
                     ^

[ 73%] Linking CXX shared library ../../../lib/libgmock.so
[ 73%] Built target gmock
[ 74%] Building CXX object 3rdparty/googletest/googlemock/CMakeFiles/gmock_main.dir/src/gmock_main.cc.o
/home/ubuntu/zhuohao-dev-3/quest/kernels/include/decode/decode_page.cuh(425): error: identifier "CUDART_MAX_NORMAL_FP16" is undefined
    local_max.fill(-CUDART_MAX_NORMAL_FP16);
                    ^

For (2), during linking:

/home/ubuntu/zhuohao-dev-3/quest/quest/ops/../../kernels/include/decode/decode_page.cuh(425): error: identifier "CUDART_MAX_NORMAL_FP16" is undefined
    local_max.fill(-CUDART_MAX_NORMAL_FP16);
                    ^

/home/ubuntu/zhuohao-dev-3/quest/quest/ops/../../kernels/include/decode/decode_page.cuh(516): error: identifier "CUDART_MAX_NORMAL_FP16" is undefined
     local_max.fill(-CUDART_MAX_NORMAL_FP16);

I made sure that #include <cuda_fp16.h> is present in quest/kernels/include/decode/decode_page.cuh.

Devices

Description: Ubuntu 20.04.6 LTS
Release: 20.04
Codename: focal
NVIDIA Driver: 535.183.01
CUDA: 12.1
cmake: 3.26.4
GPU: A100-SXM4-40GB
Environment variables:

export PATH="/usr/local/cuda/bin:$PATH"
export PATH="/home/ubuntu/.local/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda/lib"
export CUDA_INSTALL_PATH="/usr/local/cuda"
export CUDA_HOME="/usr/local/cuda"
export CC="/usr/bin/gcc-11"
export CXX="/usr/bin/g++-11"

I could not find CUDART_MAX_NORMAL_FP16 in cuda_fp16.hpp. Can you please check that, or let me know if I am missing something? Thanks!

@yangqy1

yangqy1 commented Aug 16, 2024

I've encountered the same issue. Have you found a solution yet?

@happierpig
Collaborator

Hi @Zhuohao-Li and @yangqy1 ,

Thanks for your interest in our project!!

CUDART_MAX_NORMAL_FP16 seems to have been introduced in CUDA 12.4 (the version used in our experiments). Check the documentation for details. As a quick fix, it is also okay to replace uses of this macro directly with the correct constant value.
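
For example, a minimal shim along the following lines could be placed right after #include <cuda_fp16.h>. This is only a sketch, assuming newer toolkits provide the constant as a preprocessor macro; 0x7BFF is the bit pattern of the largest normal fp16 value, 65504.0:

// shim_check.cu -- hypothetical standalone check, e.g.: nvcc -arch=sm_80 shim_check.cu
#include <cstdio>
#include <cuda_fp16.h>

#ifndef CUDART_MAX_NORMAL_FP16
// Largest normal fp16 value: bit pattern 0x7BFF == 65504.0.
#define CUDART_MAX_NORMAL_FP16 __ushort_as_half((unsigned short)0x7BFFU)
#endif

// Self-check kernel: prints 65504.0 and the negated value used by local_max.fill().
__global__ void print_max_normal_fp16() {
    float v = __half2float(CUDART_MAX_NORMAL_FP16);
    printf("max normal fp16 = %f, negated = %f\n", v, -v);
}

int main() {
    print_max_normal_fp16<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}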

Hope this can solve your issues.

@yangqy1

yangqy1 commented Aug 20, 2024

Hi @happierpig and @Zhuohao-Li ,

Thank you for your prompt response and helpful suggestions!

I successfully ran quest/scripts/example_textgen.py using CUDA 11.8 with an A800 GPU. Besides the macro issue mentioned above, I encountered two additional problems; my solutions for all of them are as follows:

  1. Regarding the missing CUDART_MAX_NORMAL_FP16:

    • For the setup commands:
      cd quest/ops
      bash setup.sh
      
      I added #define CUDART_MAX_NORMAL_FP16 __ushort_as_half((unsigned short)0x7BFFU) right after #include <cuda_fp16.h> in quest/kernels/include/decode/decode_page.cuh.
    • For the optional build commands:
      cd kernels
      mkdir build && cd build
      cmake ..
      make -j
      
      In quest/kernels/src/test/test_page.cu, I inserted half fill_value = __float2half(-65504.0f); and replaced CUDART_MAX_NORMAL_FP16 with fill_value.
  2. When running the tests in quest/kernels/build, I encountered the error:

    Fail: Unexpected error: parallel_for failed: cudaErrorNoKernelImageForDevice: no kernel image is available for execution on the device.
    

    Since my GPU is an A800 and the original code was compiled for an RTX 4090, I changed the compile-time parameters:
    I modified set(CMAKE_CUDA_ARCHITECTURES 89) to set(CMAKE_CUDA_ARCHITECTURES 80) in both quest/kernels/CMakeLists.txt and quest/quest/ops/CMakeLists.txt to match my GPU's compute capability (A800/A100 are sm_80, while the RTX 4090 is sm_89; see the query sketch after this list). This resolved the issue after recompilation.

  3. When executing quest/scripts/example_textgen.py, I faced a CUDA error:

    RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling `cublasCreate(handle)`
    

    By adding torch.cuda.set_device("cuda:0") and specifying the device as device="cuda:0" during model.quest_init(), I resolved the issue.
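
As a side note on item 2: if you are unsure which value CMAKE_CUDA_ARCHITECTURES should take on your machine, a minimal query sketch like the one below (not part of the repository; plain CUDA runtime API) prints the compute capability of each visible GPU. An A800/A100 reports 8.0 (use 80), an RTX 4090 reports 8.9 (use 89).

// check_arch.cu -- hypothetical helper: nvcc check_arch.cu -o check_arch && ./check_arch
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        printf("No CUDA device found.\n");
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        // prop.major/prop.minor form the compute capability, e.g. 8.0 -> CMAKE_CUDA_ARCHITECTURES 80.
        printf("Device %d: %s, compute capability %d.%d -> use %d%d\n",
               i, prop.name, prop.major, prop.minor, prop.major, prop.minor);
    }
    return 0;
}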

I hope this detailed explanation can help others facing similar issues!
