[Question] Int8 Gemm's perf degraded in real models. #2351
Labels
- Low Precision: Issue about lower bit quantization, including int8, int4, fp8
- question: Further information is requested
- triaged: Issue has been triaged by maintainers
I have encountered a problem: when I benchmark the int8 GEMM kernel on its own
(TensorRT-LLM/cpp/tensorrt_llm/kernels/cutlass_kernels/int8_gemm/int8_gemm_template.h, lines 62 to 63 at a65dba7), its performance is as expected. But when it runs inside a real model, the same kernel's performance degrades a lot (and I did use the gemmprofilerplugin).
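For context, here is a minimal sketch of how such an isolated timing can be taken. This is not the TensorRT-LLM runner itself: it uses cublasGemmEx as a stand-in int8 GEMM, and the shape (M, N, K) = (16, 6144, 4096) is the one reported below. Note that looping the same GEMM back-to-back keeps the roughly 24 MB weight matrix resident in L2, which is one common reason a standalone benchmark can look faster than the same kernel surrounded by other kernels in a real model.

```cpp
// Minimal isolated-kernel timing sketch. Assumption: cublasGemmEx is used here
// as a stand-in for the TensorRT-LLM CUTLASS int8 kernel, only to illustrate
// the benchmarking methodology; the shape is the one reported in this issue.
#include <cstdint>
#include <cstdio>
#include <cublas_v2.h>
#include <cuda_runtime.h>

int main() {
    const int m = 16, n = 6144, k = 4096;  // shape reported in this issue
    const int32_t alpha = 1, beta = 0;     // int32 scaling for CUBLAS_COMPUTE_32I

    // Uninitialized device buffers are fine: we only measure time, not results.
    int8_t *A = nullptr, *B = nullptr;
    int32_t *C = nullptr;
    cudaMalloc(reinterpret_cast<void**>(&A), sizeof(int8_t) * m * k);   // activations
    cudaMalloc(reinterpret_cast<void**>(&B), sizeof(int8_t) * k * n);   // weights (~24 MB)
    cudaMalloc(reinterpret_cast<void**>(&C), sizeof(int32_t) * m * n);  // int32 output

    cublasHandle_t handle;
    cublasCreate(&handle);

    auto gemm = [&] {
        // Column-major, no transpose; m, k, and the leading dimensions are all
        // multiples of 4, which the cuBLAS int8 path requires.
        cublasGemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k,
                     &alpha, A, CUDA_R_8I, m, B, CUDA_R_8I, k,
                     &beta, C, CUDA_R_32I, m,
                     CUBLAS_COMPUTE_32I, CUBLAS_GEMM_DEFAULT);
    };

    // Warm up (clocks, heuristics), then time many back-to-back launches.
    for (int i = 0; i < 10; ++i) gemm();

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    const int iters = 100;
    cudaEventRecord(start);
    for (int i = 0; i < iters; ++i) gemm();
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("avg int8 GEMM time: %.1f us\n", 1000.0f * ms / iters);

    cudaFree(A); cudaFree(B); cudaFree(C);
    cublasDestroy(handle);
    cudaEventDestroy(start); cudaEventDestroy(stop);
    return 0;
}
```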
Two pictures from Nsight (screenshots not reproduced here): Int8 GEMM, above as a separate kernel benchmark, below inside the real model.
[screenshot: int8 in benchmark]
[screenshot: int8 in models]
Device: A100 SXM-80GB
For (M, N, K) = (16, 6144, 4096), the kernel time goes from 14 us to 24 us, almost doubled, even though the kernel configuration is exactly the same in both cases, as Nsight shows.
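A back-of-envelope check (my own estimate, not from the profile): with M = 16 this GEMM is dominated by reading the int8 weight matrix, N x K = 6144 x 4096 bytes, roughly 25.2 MB. At the A100 SXM's roughly 2 TB/s HBM bandwidth that gives a memory-bound floor of about 12.6 us, consistent with the 14 us standalone number; the 24 us in-model time would then mean the kernel achieves only about half of peak bandwidth there, for example because the weights are no longer hot in L2.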