[Question] Int8 Gemm's perf degraded in real models. #2351
Labels
- Low Precision: Issue about lower bit quantization, including int8, int4, fp8
- question: Further information is requested
- triaged: Issue has been triaged by maintainers
I have encountered a problem: when I benchmark the int8 GEMM kernel on its own
(TensorRT-LLM/cpp/tensorrt_llm/kernels/cutlass_kernels/int8_gemm/int8_gemm_template.h, lines 62 to 63 at a65dba7), its performance is as expected. But when it runs inside a real model, the same kernel's performance degrades a lot (and I did use the gemmprofilerplugin).
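For context, here is a minimal sketch of how such an isolated timing can be taken. This is not the TensorRT-LLM runner itself: it uses cublasGemmEx as a stand-in int8 GEMM, and the shape (M, N, K) = (16, 6144, 4096) is the one reported below. Note that looping the same GEMM back-to-back keeps the roughly 24 MB weight matrix resident in L2, which is one common reason a standalone benchmark can look faster than the same kernel surrounded by other kernels in a real model.

```cpp
// Minimal isolated-kernel timing sketch. Assumption: cublasGemmEx is used here
// as a stand-in for the TensorRT-LLM CUTLASS int8 kernel, only to illustrate
// the benchmarking methodology; the shape is the one reported in this issue.
#include <cstdint>
#include <cstdio>
#include <cublas_v2.h>
#include <cuda_runtime.h>

int main() {
    const int m = 16, n = 6144, k = 4096;  // shape reported in this issue
    const int32_t alpha = 1, beta = 0;     // int32 scaling for CUBLAS_COMPUTE_32I

    // Uninitialized device buffers are fine: we only measure time, not results.
    int8_t *A = nullptr, *B = nullptr;
    int32_t *C = nullptr;
    cudaMalloc(reinterpret_cast<void**>(&A), sizeof(int8_t) * m * k);   // activations
    cudaMalloc(reinterpret_cast<void**>(&B), sizeof(int8_t) * k * n);   // weights (~24 MB)
    cudaMalloc(reinterpret_cast<void**>(&C), sizeof(int32_t) * m * n);  // int32 output

    cublasHandle_t handle;
    cublasCreate(&handle);

    auto gemm = [&] {
        // Column-major, no transpose; m, k, and the leading dimensions are all
        // multiples of 4, which the cuBLAS int8 path requires.
        cublasGemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k,
                     &alpha, A, CUDA_R_8I, m, B, CUDA_R_8I, k,
                     &beta, C, CUDA_R_32I, m,
                     CUBLAS_COMPUTE_32I, CUBLAS_GEMM_DEFAULT);
    };

    // Warm up (clocks, heuristics), then time many back-to-back launches.
    for (int i = 0; i < 10; ++i) gemm();

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    const int iters = 100;
    cudaEventRecord(start);
    for (int i = 0; i < iters; ++i) gemm();
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("avg int8 GEMM time: %.1f us\n", 1000.0f * ms / iters);

    cudaFree(A); cudaFree(B); cudaFree(C);
    cublasDestroy(handle);
    cudaEventDestroy(start); cudaEventDestroy(stop);
    return 0;
}
```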
Two pictures from Nsight (screenshots not reproduced here): Int8 GEMM, above as a separate kernel benchmark, below inside the real model.
[screenshot: int8 in benchmark]
[screenshot: int8 in models]
Device: A100 SXM-80GB
For (M, N, K) = (16, 6144, 4096), the kernel time goes from 14 us to 24 us, almost doubled, even though the kernel configuration is exactly the same in both cases, as Nsight shows.
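A back-of-envelope check (my own estimate, not from the profile): with M = 16 this GEMM is dominated by reading the int8 weight matrix, N x K = 6144 x 4096 bytes, roughly 25.2 MB. At the A100 SXM's roughly 2 TB/s HBM bandwidth that gives a memory-bound floor of about 12.6 us, consistent with the 14 us standalone number; the 24 us in-model time would then mean the kernel achieves only about half of peak bandwidth there, for example because the weights are no longer hot in L2.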