How to export a 4-bit pytorch_quantization model to a .engine model? #4262
Currently, the latest trtexec only supports what is documented at https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#trtexec
In TensorRT 10, I can pass --int4 and it doesn't report any errors, but the resulting engine is still FP32.
Can you upload the build log?
Sorry, I was wrong: per https://github.com/NVIDIA/TensorRT/blob/release/10.6/samples/common/sampleOptions.cpp#L1231, v10.6 already supports int4, subject to the requirements described there.
But when I use pytorch_quantization to add 4-bit Q/DQ layers, I can't use torch.onnx.export to export the model to ONNX. Is there any way to do that?
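For reference, a minimal sketch of the kind of path being attempted (the model, calibration data, and file name are placeholders; the final export call is where the 4-bit limitation reportedly surfaces):

```python
import torch
from pytorch_quantization import nn as quant_nn
from pytorch_quantization import quant_modules
from pytorch_quantization.tensor_quant import QuantDescriptor

# Request 4-bit quantizers for inputs and weights of the patched Quant* layers.
quant_desc = QuantDescriptor(num_bits=4, calib_method="max")
quant_nn.QuantConv2d.set_default_quant_desc_input(quant_desc)
quant_nn.QuantConv2d.set_default_quant_desc_weight(quant_desc)
quant_modules.initialize()  # monkey-patch torch.nn layers with their Quant* versions

model = torch.nn.Sequential(torch.nn.Conv2d(3, 16, 3), torch.nn.ReLU()).eval()  # placeholder
dummy = torch.randn(1, 3, 32, 32)

# Quick max calibration on the dummy batch (use real calibration data in practice).
for m in model.modules():
    if isinstance(m, quant_nn.TensorQuantizer):
        m.enable_calib()
        m.disable_quant()
with torch.no_grad():
    model(dummy)
for m in model.modules():
    if isinstance(m, quant_nn.TensorQuantizer):
        m.load_calib_amax()
        m.disable_calib()
        m.enable_quant()

# Export fake-quant nodes as ONNX Q/DQ; this is the step that reportedly breaks for
# num_bits=4, since the exporter only emits 8-bit QuantizeLinear/DequantizeLinear.
quant_nn.TensorQuantizer.use_fb_fake_quant = True
torch.onnx.export(model, dummy, "model_int4_qdq.onnx", opset_version=17)
```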
See https://nvidia.github.io/TensorRT-Model-Optimizer/guides/_choosing_quant_methods.html
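In case it helps, a minimal sketch of that route, assuming the `nvidia-modelopt` package; the model, calibration loop, and the choice of `INT4_AWQ_CFG` are placeholders for illustration:

```python
import torch
import modelopt.torch.quantization as mtq

# Placeholder model and calibration data; substitute your real network and loader.
model = torch.nn.Sequential(
    torch.nn.Linear(128, 256), torch.nn.ReLU(), torch.nn.Linear(256, 10)
)

def forward_loop(m):
    # Calibration pass: run a few representative batches through the model.
    for _ in range(8):
        m(torch.randn(4, 128))

# INT4_AWQ_CFG is one of Model Optimizer's predefined 4-bit weight-only configs.
model = mtq.quantize(model, mtq.INT4_AWQ_CFG, forward_loop)
mtq.print_quant_summary(model)  # show which layers ended up quantized
```

From there, the Model Optimizer documentation describes how to export the quantized model to ONNX and deploy it with TensorRT.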
pytorch_quantization supports 4-bit, ONNX supports 4-bit, but torch.onnx.export does not support 4-bit. How can a 4-bit pytorch_quantization .pt model be exported to a .engine model?
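For the ONNX-to-.engine step, a rough sketch with the TensorRT 10 Python API, assuming a Q/DQ ONNX file already exists (file names are placeholders, and `trt.BuilderFlag.INT4` is the Python-side counterpart of trtexec's --int4):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(0)  # TensorRT 10 networks are explicit-batch by default
parser = trt.OnnxParser(network, logger)

# Parse the Q/DQ ONNX produced by the quantization step (placeholder file name).
with open("model_int4_qdq.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("ONNX parse failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT4)  # counterpart of trtexec --int4 (TensorRT >= 10)

serialized = builder.build_serialized_network(network, config)
with open("model_int4.engine", "wb") as f:
    f.write(serialized)

# Check per-layer precisions so an engine that silently fell back to FP32 is visible.
runtime = trt.Runtime(logger)
engine = runtime.deserialize_cuda_engine(serialized)
inspector = engine.create_engine_inspector()
print(inspector.get_engine_information(trt.LayerInformationFormat.JSON))
```

The same engine inspection can also help confirm whether the --int4 build mentioned above really produced INT4 layers or fell back to FP32.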