checkpoint conversion script (/llama/convert_checkpoint.py) for Llama-3.2-3B-Instruct is failing with the following error #2339

GaneshDoosa opened this issue Oct 15, 2024 · 5 comments

@GaneshDoosa

[TensorRT-LLM] TensorRT-LLM version: 0.13.0
0.13.0
201it [00:00, 1554.11it/s]
[1729020016.135793] [toyota-tom-buddy-ml-vm:879 :0] ucp_context.c:1774 UCX WARN UCP version is incompatible, required: 1.17, actual: 1.12 (release 1)
[1729020016.154083] [toyota-tom-buddy-ml-vm:879 :0] ucp_context.c:1774 UCX WARN UCP version is incompatible, required: 1.17, actual: 1.12 (release 1)
Traceback (most recent call last):
  File "/tensorrtllm_backend/convert_checkpoint_v0.13.0.py", line 503, in <module>
    main()
  File "/tensorrtllm_backend/convert_checkpoint_v0.13.0.py", line 495, in main
    convert_and_save_hf(args)
  File "/tensorrtllm_backend/convert_checkpoint_v0.13.0.py", line 437, in convert_and_save_hf
    execute(args.workers, [convert_and_save_rank] * world_size, args)
  File "/tensorrtllm_backend/convert_checkpoint_v0.13.0.py", line 444, in execute
    f(args, rank)
  File "/tensorrtllm_backend/convert_checkpoint_v0.13.0.py", line 423, in convert_and_save_rank
    llama = LLaMAForCausalLM.from_hugging_face(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/model.py", line 358, in from_hugging_face
    loader.generate_tllm_weights(model)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/model_weights_loader.py", line 357, in generate_tllm_weights
    self.load(tllm_key,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/model_weights_loader.py", line 278, in load
    v = sub_module.postprocess(tllm_key, v, **postprocess_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/layers/linear.py", line 391, in postprocess
    weights = weights.to(str_dtype_to_torch(self.dtype))
AttributeError: 'NoneType' object has no attribute 'to'

@Superjomn added the bug and triaged labels on Oct 16, 2024
@mrakgr commented Nov 3, 2024

I am getting this same error for meta-llama/Llama-3.2-3B. Here is the script I used to run it:

model=meta-llama/Llama-3.2-3B
suffix=bfloat16_1gpu_tp1
in_dir=/root/huggingface_local_models
out_conv_path=/root/trt_checkpoint_dir/$model/$suffix
out_build_path=/root/trt_build_dir/$model/$suffix

# Build LLaMA v3 3B TP=1 using HF checkpoints directly.
python3 convert_checkpoint.py --model_dir $in_dir/$model \
                            --output_dir $out_conv_path \
                            --dtype bfloat16 \
                            --tp_size 1
root@08478573f7bc:~/TensorRT-LLM/examples/llama# bash run.sh
[TensorRT-LLM] TensorRT-LLM version: 0.15.0.dev2024102900
0.15.0.dev2024102900
201it [00:00, 1037.89it/s]
Traceback (most recent call last):
  File "/root/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 529, in <module>
    main()
  File "/root/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 521, in main
    convert_and_save_hf(args)
  File "/root/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 463, in convert_and_save_hf
    execute(args.workers, [convert_and_save_rank] * world_size, args)
  File "/root/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 470, in execute
    f(args, rank)
  File "/root/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 447, in convert_and_save_rank
    llama = LLaMAForCausalLM.from_hugging_face(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/model.py", line 397, in from_hugging_face
    loader.generate_tllm_weights(model)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/model_weights_loader.py", line 403, in generate_tllm_weights
    self.load(tllm_key,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/model_weights_loader.py", line 291, in load
    v = sub_module.postprocess(tllm_key, v, **postprocess_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/layers/linear.py", line 390, in postprocess
    weights = weights.to(str_dtype_to_torch(self.dtype))
AttributeError: 'NoneType' object has no attribute 'to'

I got this same error yesterday when I ran the quickstart script, simply substituting the model with meta-llama/Llama-3.2-3B.

@JoJoLev commented Nov 17, 2024

Has anyone built an engine file for the unquantized Llama-3.2-3B? I can build an engine for the quantized model, but the checkpoint conversion for the unquantized one fails with the same error @GaneshDoosa and @mrakgr have.

@mrakgr commented Nov 17, 2024

I've since switched to LMDeploy.

@byshiue (Collaborator) commented Nov 21, 2024

Please note that Llama 3.2 is not supported in release 0.13; please try the main branch.
Also, please add --use_embedding_sharing when you convert the Llama 3.2 checkpoint.
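
For context: the Llama 3.2 1B/3B checkpoints tie the input embedding and lm_head weights (tie_word_embeddings=True in the HF config), so there is no standalone lm_head.weight tensor for the loader to read, which is likely why postprocess receives None and raises the AttributeError above. Applied to the script mrakgr posted, the fixed conversion step would look roughly like this (a sketch: everything except the added --use_embedding_sharing flag is taken from that script):

# Convert the HF checkpoint with embedding sharing enabled, since the
# Llama 3.2 checkpoint has no standalone lm_head.weight tensor.
python3 convert_checkpoint.py --model_dir $in_dir/$model \
                              --output_dir $out_conv_path \
                              --dtype bfloat16 \
                              --tp_size 1 \
                              --use_embedding_sharing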

@byshiue self-assigned this on Nov 21, 2024
@jingzhaoou commented Dec 2, 2024

I checked out the r24.10 branch of tensorrtllm_backend. Using the nvcr.io/nvidia/tritonserver:24.10-trtllm-python-py3 Docker image, I ran into the same error when converting checkpoints for meta-llama/Llama-3.2-3B-Instruct. I resolved the issue by adding --use_embedding_sharing. Everything works great for an unquantized model.
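
For anyone reproducing this, the sequence would look roughly like the following (a sketch: the host mount path and the location of convert_checkpoint.py inside the container are illustrative assumptions, not verified against the 24.10 image):

# Launch the 24.10 Triton + TRT-LLM container (host path is illustrative)
docker run --gpus all -it --rm \
    -v /path/to/models:/models \
    nvcr.io/nvidia/tritonserver:24.10-trtllm-python-py3

# Inside the container: convert with embedding sharing enabled; adjust the
# script path to wherever your tensorrtllm_backend checkout places it
python3 convert_checkpoint.py \
    --model_dir /models/Llama-3.2-3B-Instruct \
    --output_dir /models/trt_ckpt/Llama-3.2-3B-Instruct \
    --dtype bfloat16 \
    --use_embedding_sharing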
