
[Question] Int8 Gemm's perf degraded in real models. #2351

Open
foreverlms opened this issue Oct 18, 2024 · 2 comments
Labels: Low Precision (issue about lower-bit quantization, including int8, int4, fp8), question (further information is requested), triaged (issue has been triaged by maintainers)

Comments


foreverlms commented Oct 18, 2024

I have encountered a problem: when I benchmark the int8 CUTLASS kernel

```cpp
void genericInt8GemmKernelLauncher(int8_t const* A, int8_t const* B, tk::QuantMode quantOption, float const* alphaCol,
    float const* alphaRow, T* C, int m, int n, int k, tkc::CutlassGemmConfig gemmConfig, char* workspace,
```

separately, the kernel really beats the fp16 GEMM kernel. But inside a real model, the int8 GEMM kernel's performance degrades a lot. (I did use the `gemmprofilerplugin`.)

Two screenshots from Nsight, showing the int8 GEMM in a standalone kernel benchmark versus inside a real model:

[Image: int8 GEMM in the standalone benchmark]

[Image: int8 GEMM in the real model]

Device: A100 SXM-80GB

For (m, n, k) = (16, 6144, 4096), the kernel time goes from 14 us to 24 us, almost doubled. The kernel config is exactly the same in both cases, as Nsight shows.
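For context on these numbers, at m = 16 this shape is heavily memory-bound: the kernel mostly streams the n×k weight matrix, so a quick roofline estimate (a sketch, assuming approximate A100 80GB datasheet bandwidth, not a measurement) puts the int8 lower bound close to the 14 us measured in isolation. That the in-model time is well above this bound may point to something outside the GEMM math itself (e.g. cache state or concurrent work).

```python
# Rough roofline estimate for the GEMM shape reported above (m=16, n=6144, k=4096).
# The ~2 TB/s HBM bandwidth figure is an assumed A100 SXM-80GB datasheet value.
m, n, k = 16, 6144, 4096

flops = 2 * m * n * k            # multiply-accumulate counted as 2 ops
bytes_int8 = n * k * 1           # weight matrix dominates traffic at small m
bytes_fp16 = n * k * 2

hbm_bw = 2.0e12                  # assumed ~2 TB/s HBM bandwidth
t_mem_int8 = bytes_int8 / hbm_bw * 1e6   # microseconds
t_mem_fp16 = bytes_fp16 / hbm_bw * 1e6

print(f"int8 weight-read lower bound: {t_mem_int8:.1f} us")
print(f"fp16 weight-read lower bound: {t_mem_fp16:.1f} us")
```

With these assumptions the estimate gives roughly 12.6 us for int8 and 25.2 us for fp16 weight traffic, consistent with the 14 us standalone int8 measurement and with int8 roughly halving fp16 time at this shape.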

@Superjomn added the question, Low Precision, and triaged labels on Oct 25, 2024

github-actions bot commented Dec 2, 2024

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.

@github-actions github-actions bot added the stale label Dec 2, 2024
@foreverlms (Author) commented

> This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.

!

@github-actions github-actions bot removed the stale label Dec 3, 2024