[DOC]: Environment installation failed #6066
Comments
You should use
I tried BUILD_EXT=1 pip install . and the build failed. Please check the log files I uploaded.
You should troubleshoot your issue from the following aspects (the provided log information is limited). First, check that there are no issues with your machine, for example by running nvidia-smi to confirm that the GPUs are available. Then check environment variables such as CUDA_VISIBLE_DEVICES, and make sure that LD_LIBRARY_PATH and CUDA_HOME point to the correct CUDA version.
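The checks suggested above can be sketched in a few lines of Python. This is only a rough heuristic, not part of ColossalAI: the function name `check_cuda_env` and the warning texts are made up for illustration, and it assumes that `nvcc` lives under `$CUDA_HOME/bin` and that the CUDA libraries are referenced through an `LD_LIBRARY_PATH` entry that starts with `CUDA_HOME`.

```python
import os
import shutil
from pathlib import Path


def check_cuda_env(env=None):
    """Return a list of human-readable warnings about CUDA-related variables.

    Hypothetical helper for illustration; it only inspects the environment
    and the filesystem and never touches the GPU itself.
    """
    env = os.environ if env is None else env
    warnings = []

    cuda_home = env.get("CUDA_HOME", "")
    if not cuda_home:
        warnings.append("CUDA_HOME is not set")
    elif not (Path(cuda_home) / "bin" / "nvcc").exists():
        warnings.append(f"no nvcc found under CUDA_HOME={cuda_home}")

    # Assumption: the CUDA libraries are listed via a path below CUDA_HOME.
    ld_paths = [p for p in env.get("LD_LIBRARY_PATH", "").split(":") if p]
    if cuda_home and not any(p.startswith(cuda_home) for p in ld_paths):
        warnings.append("LD_LIBRARY_PATH has no entry under CUDA_HOME")

    # An empty CUDA_VISIBLE_DEVICES hides every GPU from CUDA programs.
    if env.get("CUDA_VISIBLE_DEVICES") == "":
        warnings.append("CUDA_VISIBLE_DEVICES is set but empty, so no GPU is visible")

    if shutil.which("nvidia-smi", path=env.get("PATH")) is None:
        warnings.append("nvidia-smi is not on PATH")

    return warnings


if __name__ == "__main__":
    for warning in check_cuda_env():
        print("warning:", warning)
```

An empty result does not prove the setup is correct; it only means none of these simple checks found a problem.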
Oh, I see. It looks like it keeps compiling the JIT kernel ops, which takes quite some time, and the compilation never finished on your side.
Yesterday I ran " instead of "CUDA_EXT=1 pip install ." and it built successfully. Then I ran the benchmark with "bash gemini.sh", and it ran for a long time without responding.
(colo01) root@DESKTOP-5H0EB03:~/ColossalAI# echo $CUDA_HOME
(colo01) root@DESKTOP-5H0EB03:~/ColossalAI# echo $PATH
(colo01) root@DESKTOP-5H0EB03:~/ColossalAI# echo $LD_LIBRARY_PATH
(colo01) root@DESKTOP-5H0EB03:~/ColossalAI/examples/language/llama/scripts/benchmark_7B# bash gemini.sh
(colo01) root@DESKTOP-5H0EB03:~/ColossalAI/examples/language/llama/scripts/benchmark_7B# bash gemini.sh
[rank0]: During handling of the above exception, another exception occurred:
[rank0]: Traceback (most recent call last):
Exception ignored in: <function GeminiDDP.__del__ at 0x7f2fc9e3f640>
Hi, can you try out GCC (Ubuntu 9.4.0-1ubuntu1~20.04.2) version 9.4.0?
@flybird11111 Do you mean I should downgrade gcc from 12.3 to 9.4?
Yes.
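When switching compilers it helps to confirm which gcc the build will actually pick up from PATH. The sketch below parses the first line of `gcc --version`; it assumes the Ubuntu-style format quoted above, where the line ends with the version number, and both function names are hypothetical. Other distributions format this line differently and would need a different pattern.

```python
import re
import shutil
import subprocess


def parse_gcc_version(first_line):
    """Extract (major, minor, patch) from the first line of `gcc --version`.

    Assumes the line ends with the version, as on Ubuntu builds.
    """
    match = re.search(r"(\d+)\.(\d+)\.(\d+)\s*$", first_line.strip())
    if match is None:
        raise ValueError(f"unrecognized gcc version line: {first_line!r}")
    return tuple(int(part) for part in match.groups())


def installed_gcc_version():
    """Return the version of the gcc found on PATH, or None if gcc is missing."""
    if shutil.which("gcc") is None:
        return None
    result = subprocess.run(
        ["gcc", "--version"], capture_output=True, text=True, check=True
    )
    return parse_gcc_version(result.stdout.splitlines()[0])


if __name__ == "__main__":
    print("gcc on PATH:", installed_gcc_version())
```

Running it before and after `sudo update-alternatives --config gcc` shows whether the selection took effect in the current shell.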
@flybird11111
(colo01) root@DESKTOP-5H0EB03:~/ColossalAI/examples/language/llama/scripts/benchmark_7B# sudo update-alternatives --config gcc
Selection  Path             Priority  Status
0          /usr/bin/gcc-12  60        auto mode
Press <enter> to keep the current choice[*], or type selection number: 3
@flybird11111
(colo01) root@DESKTOP-5H0EB03:~/ColossalAI/examples/language/llama/scripts/benchmark_7B# gcc --version
(colo01) root@DESKTOP-5H0EB03:~/ColossalAI/examples/language/llama/scripts/benchmark_7B# bash gemini.sh
[rank0]: During handling of the above exception, another exception occurred:
[rank0]: Traceback (most recent call last):
Exception ignored in: <function GeminiDDP.__del__ at 0x7ff98200b640>
[rank0]: RuntimeError: CUDA error: out of memory. The issue appears to be insufficient GPU memory.
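For a sense of scale, here is a back-of-the-envelope estimate of the memory needed to hold the training state of a 7B-parameter model. It is a sketch under stated assumptions, not a measurement of this benchmark: it uses the commonly cited figure of 16 bytes per parameter for mixed-precision Adam training (2 for fp16 weights, 2 for fp16 gradients, 12 for fp32 master weights and the two optimizer moments), ignores activations entirely, and the function name is hypothetical. Gemini can place part of this state in CPU memory, so the real GPU footprint depends on the placement settings.

```python
def training_state_gib(num_params, bytes_per_param=16):
    """Rough size in GiB of weights, gradients and optimizer state.

    bytes_per_param=16 is the usual mixed-precision Adam estimate;
    bytes_per_param=2 gives the fp16 weights alone. Activations are ignored.
    """
    return num_params * bytes_per_param / 2**30


if __name__ == "__main__":
    params = 7e9  # assumed parameter count for a "7B" model
    print(f"fp16 weights only: {training_state_gib(params, 2):.1f} GiB")
    print(f"full training state: {training_state_gib(params):.1f} GiB")
```

Under these assumptions the fp16 weights alone take about 13 GiB and the full training state about 104 GiB, so a single consumer GPU would have to rely heavily on offloading or on a much smaller model.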
@flybird11111 I changed benchmark.py to shrink the model to 1b with num_hidden_layers=1 and set the cache to 256:
colossalai run --nproc_per_node 1 benchmark.py -g -x -b 16 -c 1b -l 256
MODEL_CONFIGS = {
📚 The doc issue
Windows 11 with the Ubuntu 24.04 subsystem installed under WSL2
I followed the guide at https://colossalai.org/zh-Hans/docs/get_started/installation
The exact installation steps were:
export CUDA_INSTALL_DIR=/usr/local/cuda-12.1
export CUDA_HOME=/usr/local/cuda-12.1
export LD_LIBRARY_PATH=$CUDA_HOME"/lib64:$LD_LIBRARY_PATH"
export PATH=$CUDA_HOME"/bin:$PATH"
conda create -n colo01 python=3.10
conda activate colo01
export PATH=~/miniconda3/envs/colo01/bin:$PATH
sudo apt update
sudo apt install gcc-10 g++-10
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-12 60
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-12 60
sudo update-alternatives --config gcc
gcc --version
wget https://developer.download.nvidia.com/compute/cuda/12.1.0/local_installers/cuda_12.1.0_530.30.02_linux.run
sudo sh cuda_12.1.0_530.30.02_linux.run
Verify the CUDA installation: nvidia-smi
conda install nvidia/label/cuda-12.1.0::cuda-toolkit
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
git clone https://github.com/hpcaitech/ColossalAI.git
cd ColossalAI
pip install -r requirements/requirements.txt
CUDA_EXT=1 pip install .
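After the install steps above, a quick sanity check is to ask PyTorch itself whether it can see a CUDA device before launching any benchmark. The sketch below uses `torch.cuda.is_available()`, `torch.version.cuda` and `torch.cuda.device_count()`; the wrapper function `describe_torch_cuda` is a hypothetical name, and the import is guarded so the script also runs in an environment where torch is missing.

```python
def describe_torch_cuda():
    """Return a one-line summary of what PyTorch reports about CUDA."""
    try:
        import torch  # guarded: torch may not be installed in this environment
    except ImportError:
        return "torch is not importable in this environment"

    if not torch.cuda.is_available():
        return f"torch {torch.__version__} imported, but CUDA is not available"

    return (
        f"torch {torch.__version__}, built for CUDA {torch.version.cuda}, "
        f"{torch.cuda.device_count()} GPU(s) visible"
    )


if __name__ == "__main__":
    print(describe_torch_cuda())
```

If the reported CUDA version differs from the toolkit under CUDA_HOME, that mismatch is worth resolving before building extensions.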
Install the related development libraries
pip install transformers
pip install xformers
pip install datasets tensorboard
Run the benchmark
Step 1: Change directory
cd examples/language/llama/scripts/benchmark_7B
Edit gemini.sh
bash gemini.sh
Running it reports this error:
[rank0]: ImportError: FlashAttention2 has been toggled on, but it cannot be used due to the following error: the package flash_attn seems to be not installed. Please refer to the documentation of https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2 to install Flash Attention 2.
I then installed flashattention-2 successfully:
pip install packaging
pip install ninja
ninja --version
echo $?
conda install -c conda-channel attention2
pip install flash-attn --no-build-isolation
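To confirm that the packages installed above are visible to the interpreter that runs the benchmark, one option is to look them up with `importlib.util.find_spec`. This is a minimal sketch with a hypothetical helper name; the import name `flash_attn` is taken from the error message above, and the other names are assumed to match their pip package names. A package being found does not guarantee that its compiled extension loads correctly.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of top-level import names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    wanted = ["flash_attn", "transformers", "xformers", "datasets"]
    missing = missing_packages(wanted)
    print("missing:", ", ".join(missing) if missing else "none")
```

Run it inside the activated conda environment, since a different Python on PATH would report on a different set of packages.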
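Running bash gemini.sh again still fails. Please help diagnose this based on the uploaded log file, and ideally improve the installation documentation. Thanks!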
log.txt