
[ONNXParser] TensorRT Fails to Load ONNX Checkpoints with Separated Weight and Bias Files #4257

theanh-ktmt opened this issue Nov 22, 2024 · 1 comment

@theanh-ktmt

Description

When loading an ONNX checkpoint whose weights and biases are stored in separate external data files (common for models larger than 2 GB), TensorRT's ONNX parser searches for those files in the current working directory instead of the directory that contains the ONNX checkpoint. This behavior is inconsistent with the ONNX export process, which places the model and all of its external data files in a single directory.
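A minimal stdlib-only sketch of the path mismatch (the file names mirror this issue's layout; nothing here calls TensorRT). ONNX-style loaders resolve an external data file relative to the model's own directory, while the failing behavior resolves it relative to the current working directory:

```python
import os

# Hypothetical paths matching the layout in this issue.
model_path = "save/stdit3_simplified.onnx"
data_name = "stdit3_simplified.onnx.data"  # relative name recorded in the model's initializers

# Where the external data should be resolved: next to the model file.
expected = os.path.join(os.path.dirname(model_path), data_name)
print(expected)  # save/stdit3_simplified.onnx.data

# Where TensorRT appears to resolve it instead: the current working directory.
observed = os.path.join(os.getcwd(), data_name)
print(observed)
```

If the process is not started from inside `save/`, the two paths differ and the lookup fails.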

Error Message

When trying to load the exported ONNX checkpoints, the following error message is encountered:

```
[11/20/2024-18:55:41] [TRT] [E] WeightsContext.cpp:178: Failed to open file: stdit3_simplified.onnx.data
[11/20/2024-18:55:41] [TRT] [E] In node -1 with name:  and operator:  (parseGraph): INVALID_GRAPH: Failed to import initializer
In node -1 with name:  and operator:  (parseGraph): INVALID_GRAPH: Failed to import initializer
```

Proposed Solution

The current workaround is to move all external data files into the current working directory. However, this quickly becomes cumbersome and disorganized, especially for large models that export hundreds of external data files.

```
# error: data files live next to the model, but TensorRT looks in the cwd
current-working-directory
└── save
    ├── stdit3.onnx
    ├── linear.1.bias
    ├── linear.1.weight
    └── ...

# proposed solution: data files moved into the cwd
current-working-directory
├── save
│   └── stdit3.onnx
├── linear.1.bias
├── linear.1.weight
└── ...
```

A more systematic solution might be needed to handle the organization of these files effectively.
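Short of a parser fix, one less intrusive workaround (a sketch only; not verified against TensorRT itself) is to temporarily change the working directory to the checkpoint's directory while parsing, so the parser's relative lookups resolve next to the model:

```python
import os
from contextlib import contextmanager


@contextmanager
def working_directory(path):
    """Temporarily switch the process's working directory, restoring it on exit."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)


# Usage sketch: parse from inside the checkpoint's directory so the
# external data files are found by their relative names.
model_path = "save/stdit3_simplified.onnx"
model_dir = os.path.dirname(os.path.abspath(model_path))
model_name = os.path.basename(model_path)

# with working_directory(model_dir):
#     with open(model_name, "rb") as f:
#         parser.parse(f.read())  # parser constructed as in the repro below
```

Note that `os.chdir` affects the whole process, so this is only safe when no other thread depends on the working directory.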

Environment

  • TensorRT Version: 10.5.0
  • NVIDIA GPU: NVIDIA A100 80GB
  • NVIDIA Driver Version: 545.23.06
  • CUDA Version: 12.1
  • CUDNN Version: 8.9.3
  • Operating System: Ubuntu 20.04.1 LTS
  • Python Version: 3.9.19
  • PyTorch Version: 2.2.2

Relevant Files

https://drive.google.com/drive/folders/1nLdYn8nDPs79ZKNx8TssdQSj4x-p4qfl?usp=sharing

  • stdit3_simplified.onnx: ONNX checkpoint
  • stdit3_simplified.onnx.data: its external weight data

Steps To Reproduce

  1. Download the ONNX checkpoint and its external data from the Google Drive link above, then structure the working directory like this:

```
current-working-directory
└── save
    ├── stdit3_simplified.onnx
    └── stdit3_simplified.onnx.data
```

  2. Load the model with ONNX (succeeds):

```python
import onnx

path = "save/stdit3_simplified.onnx"
model = onnx.load(path)
```

  3. Parse the model with TensorRT (fails with the error message above):

```python
import tensorrt as trt

path = "save/stdit3_simplified.onnx"
trt_logger = trt.Logger()
explicit_batch_flag = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

with trt.Builder(trt_logger) as builder, \
        builder.create_network(explicit_batch_flag) as network, \
        builder.create_builder_config() as config:
    parser = trt.OnnxParser(network, trt_logger)
    with open(path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise SystemExit("Failed to parse ONNX model")
    print("Completed parsing ONNX model")
```

  4. Restructure the working directory like this:

```
current-working-directory
├── save
│   └── stdit3_simplified.onnx
└── stdit3_simplified.onnx.data
```

  5. Parse again with the TensorRT ONNX parser (succeeds).
@lix19937

Can you try to use trtexec --onnx=spec ?
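For reference, a trtexec invocation along those lines might look like the following. Given the path behavior described above, running it from inside `save/` (a hedged suggestion, not something confirmed in this thread) would let the relative lookup of the `.data` file succeed:

```shell
# Run from the directory holding both files, so the relative
# lookup of stdit3_simplified.onnx.data succeeds.
cd save
trtexec --onnx=stdit3_simplified.onnx
```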
