TRT-LLM fails on GH200 node #2571

Open
4 tasks
ttim opened this issue Dec 12, 2024 · 1 comment
Assignees
Labels
bug Something isn't working

Comments

@ttim
Contributor

ttim commented Dec 12, 2024

System Info

  • 1x GH200 (Lambda Labs)
  • TensorRT-LLM 0.15.0
  • nvcr.io/nvidia/pytorch:24.10-py3 base image

pip install succeeds, but importing tensorrt_llm fails with:

  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/__init__.py", line 32, in <module>
    import tensorrt_llm.functional as functional
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/functional.py", line 25, in <module>
    import tensorrt as trt
ModuleNotFoundError: No module named 'tensorrt'
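
One way to narrow this down is to check whether Python can locate the modules named in the traceback without importing them. The following is only a minimal sketch: the helper name `missing_modules` is hypothetical, and it handles top-level module names only. GH200 uses an Arm-based Grace CPU, so the machine string is typically aarch64.

```python
import importlib.util
import platform


def missing_modules(names):
    """Return the top-level module names that Python cannot locate.

    Hypothetical helper for illustration; it does not import the modules.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # CPU architecture of the host; useful context for an Arm-based node.
    print("machine:", platform.machine())
    # Module names taken from the traceback above.
    print("missing:", missing_modules(["tensorrt", "tensorrt_llm"]))
```

If `tensorrt` is listed as missing while `tensorrt_llm` is not, that matches the traceback: the package itself installed, but its TensorRT Python dependency is absent from the environment.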

Who can help?

@byshiue

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

  1. Start a container from the Docker image nvcr.io/nvidia/pytorch:24.10-py3
  2. Install TensorRT-LLM 0.15.0 with pip
  3. Try to run any code that imports tensorrt_llm
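
The steps above can be reduced to a short script that attempts the import and reports which module is missing. This is a sketch, not the original reproduction code: the function name `try_import` is hypothetical, and it relies only on the standard `ModuleNotFoundError.name` attribute.

```python
import importlib


def try_import(module_name):
    """Import a module and return (ok, missing_name).

    Hypothetical helper: missing_name is the module the import system
    could not find, which may be a dependency rather than module_name.
    """
    try:
        importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        return False, exc.name
    return True, None


if __name__ == "__main__":
    ok, missing = try_import("tensorrt_llm")
    print("import ok" if ok else f"import failed: no module named {missing!r}")
```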

Expected behavior

  • Importing tensorrt_llm works

Actual behavior

  • Fails with ModuleNotFoundError: No module named 'tensorrt'

Additional notes

None

@ttim added the bug label Dec 12, 2024
@nv-guomingz
Collaborator

@Shixiaowei02 could you please answer this question?

3 participants