
[ONNXParser] TensorRT Fails to Load ONNX Checkpoints with Separated Weight and Bias Files #4257

theanh-ktmt opened this issue Nov 22, 2024 · 1 comment

@theanh-ktmt

Description

When loading an ONNX checkpoint whose weights and biases are stored in separate external data files (common for models larger than 2 GB), TensorRT's ONNX parser searches for those files in the current working directory instead of the directory that contains the ONNX checkpoint. This behavior is inconsistent with the ONNX export process, which places the model and all of its external data files in a single directory.
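A minimal stdlib-only sketch of the path mismatch (the file names mirror this issue's layout; nothing here calls TensorRT). ONNX-style loaders resolve an external data file relative to the model's own directory, while the failing behavior resolves it relative to the current working directory:

```python
import os

# Hypothetical paths matching the layout in this issue.
model_path = "save/stdit3_simplified.onnx"
data_name = "stdit3_simplified.onnx.data"  # relative name recorded in the model's initializers

# Where the external data should be resolved: next to the model file.
expected = os.path.join(os.path.dirname(model_path), data_name)
print(expected)  # save/stdit3_simplified.onnx.data

# Where TensorRT appears to resolve it instead: the current working directory.
observed = os.path.join(os.getcwd(), data_name)
print(observed)
```

If the process is not started from inside `save/`, the two paths differ and the lookup fails.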

Error Message

When trying to load the exported ONNX checkpoints, the following error message is encountered:

```
[11/20/2024-18:55:41] [TRT] [E] WeightsContext.cpp:178: Failed to open file: stdit3_simplified.onnx.data
[11/20/2024-18:55:41] [TRT] [E] In node -1 with name:  and operator:  (parseGraph): INVALID_GRAPH: Failed to import initializer
In node -1 with name:  and operator:  (parseGraph): INVALID_GRAPH: Failed to import initializer
```

Proposed Solution

The current workaround is to move all external data files into the current working directory. However, this quickly becomes cumbersome and disorganized, especially for large models that export hundreds of external data files.

```
# error: data files live next to the model, but TensorRT looks in the cwd
current-working-directory
└── save
    ├── stdit3.onnx
    ├── linear.1.bias
    ├── linear.1.weight
    └── ...

# proposed solution: data files moved into the cwd
current-working-directory
├── save
│   └── stdit3.onnx
├── linear.1.bias
├── linear.1.weight
└── ...
```

A more systematic solution might be needed to handle the organization of these files effectively.
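Short of a parser fix, one less intrusive workaround (a sketch only; not verified against TensorRT itself) is to temporarily change the working directory to the checkpoint's directory while parsing, so the parser's relative lookups resolve next to the model:

```python
import os
from contextlib import contextmanager


@contextmanager
def working_directory(path):
    """Temporarily switch the process's working directory, restoring it on exit."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)


# Usage sketch: parse from inside the checkpoint's directory so the
# external data files are found by their relative names.
model_path = "save/stdit3_simplified.onnx"
model_dir = os.path.dirname(os.path.abspath(model_path))
model_name = os.path.basename(model_path)

# with working_directory(model_dir):
#     with open(model_name, "rb") as f:
#         parser.parse(f.read())  # parser constructed as in the repro below
```

Note that `os.chdir` affects the whole process, so this is only safe when no other thread depends on the working directory.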

Environment

  • TensorRT Version: 10.5.0
  • NVIDIA GPU: NVIDIA A100 80GB
  • NVIDIA Driver Version: 545.23.06
  • CUDA Version: 12.1
  • CUDNN Version: 8.9.3
  • Operating System: Ubuntu 20.04.1 LTS
  • Python Version: 3.9.19
  • PyTorch Version: 2.2.2

Relevant Files

https://drive.google.com/drive/folders/1nLdYn8nDPs79ZKNx8TssdQSj4x-p4qfl?usp=sharing

  • stdit3_simplified.onnx: ONNX checkpoint
  • stdit3_simplified.onnx.data: its external weight data

Steps To Reproduce

  1. Download the ONNX checkpoint and its external data from the Google Drive link above, then structure the working directory like this:

```
current-working-directory
└── save
    ├── stdit3_simplified.onnx
    └── stdit3_simplified.onnx.data
```

  2. Load the model with ONNX (succeeds):

```python
import onnx

path = "save/stdit3_simplified.onnx"
model = onnx.load(path)
```

  3. Parse the model with TensorRT (fails with the error message above):

```python
import tensorrt as trt

path = "save/stdit3_simplified.onnx"
trt_logger = trt.Logger()
explicit_batch_flag = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

with trt.Builder(trt_logger) as builder, \
        builder.create_network(explicit_batch_flag) as network, \
        builder.create_builder_config() as config:
    parser = trt.OnnxParser(network, trt_logger)
    with open(path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise SystemExit("Failed to parse ONNX model")
    print("Completed parsing ONNX model")
```

  4. Restructure the working directory like this:

```
current-working-directory
├── save
│   └── stdit3_simplified.onnx
└── stdit3_simplified.onnx.data
```

  5. Parse again with the TensorRT ONNX parser (succeeds).
@lix19937

Can you try to use trtexec --onnx=spec ?
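For reference, a trtexec invocation along those lines might look like the following. Given the path behavior described above, running it from inside `save/` (a hedged suggestion, not something confirmed in this thread) would let the relative lookup of the `.data` file succeed:

```shell
# Run from the directory holding both files, so the relative
# lookup of stdit3_simplified.onnx.data succeeds.
cd save
trtexec --onnx=stdit3_simplified.onnx
```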
