What does "weights_scaling_factor_2" mean in safetensor results of awq_w4a8 #2561
For a linear layer with GEMM shape [M, N, K], we need these components in the TRT-LLM layer:
The calculation process is: However, the checkpoint will have more parameters; here is how they are converted when building the engine.
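If it helps to see the two-level weight scale concretely, here is a minimal NumPy sketch of how I understand `weights_scaling_factor` (per-group) and `weights_scaling_factor_2` (per-tensor) composing. The group size, the FP8-E4M3 max of 448, and the tensor shapes are all made-up toy values for illustration, not what the kernel actually uses:

```python
import numpy as np

# Toy two-level INT4 weight quantization (a sketch under assumed conventions:
# group size 4, FP8-E4M3 max of 448 for the stored scales; the real group
# size and formats may differ).
K, N, group = 8, 4, 4
rng = np.random.default_rng(0)
w_fp16 = rng.standard_normal((K, N)).astype(np.float32)

# s1: per-group scale, i.e. 'weights_scaling_factor'
w_groups = w_fp16.reshape(K // group, group, N)
s1 = np.abs(w_groups).max(axis=1) / 7.0            # map each group into INT4 range
w_int4 = np.clip(np.round(w_groups / s1[:, None, :]), -8, 7)

# s2: per-tensor scale of the scales, i.e. 'weights_scaling_factor_2'
s2 = s1.max() / 448.0                              # so s1 / s2 fits the FP8 range
s1_q = s1 / s2                                     # what the checkpoint stores for s1

# Dequantization recombines both levels: W ~= w_int4 * (s1_q * s2)
w_deq = (w_int4 * s1_q[:, None, :] * s2).reshape(K, N)
assert np.abs(w_deq - w_fp16).max() < 0.5          # close up to INT4 rounding error
```

The point of the second level is that the per-group scales themselves can be stored in a narrow format (divided by `s2`), and dequantization multiplies both levels back in.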
Thanks for your reply. However, in w4a8_awq, I found:

```python
if quant_algo == QuantAlgo.W4A8_AWQ:
    for name in list(weights):
        if name.endswith('weights_scaling_factor'):
            activation_scaling_factor = weights.pop(
                name.replace('weights_scaling_factor',
                             'activation_scaling_factor'))
            weights_scaling_factor_2 = weights.pop(
                name.replace('weights_scaling_factor',
                             'weights_scaling_factor_2'))
            weights[name] /= weights_scaling_factor_2
            weights[name] = weights[name].to(torch.float16).view(
                str_dtype_to_torch(model_config.dtype))
            weights[name.replace(
                'weights_scaling_factor',
                'prequant_scaling_factor')] /= activation_scaling_factor
            weights[name.replace(
                'weights_scaling_factor', 'alpha'
            )] = activation_scaling_factor * weights_scaling_factor_2
```

`alpha` seems to be computed as `activation_scaling_factor * weights_scaling_factor_2`. So the w4a8 calculation just composes these scales. Am I right?
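If I read the conversion right, the scale algebra can be checked numerically. This is a toy sketch with made-up values, not the real fused kernel; INT4 rounding is omitted so that the scales round-trip exactly:

```python
import numpy as np

# Toy GEMM mirroring my reading of the conversion above (my own sketch,
# not the real fused W4A8 kernel).
rng = np.random.default_rng(1)
M, K, N = 2, 8, 4
x = rng.standard_normal((M, K)).astype(np.float32)
w = rng.standard_normal((K, N)).astype(np.float32)

# Checkpoint-style scales (toy values standing in for the real tensors).
activation_scaling_factor = np.float32(0.05)
weights_scaling_factor = np.abs(w).max(axis=0) / 7.0   # per-channel for simplicity
weights_scaling_factor_2 = np.float32(weights_scaling_factor.max() / 448.0)

# Quantized weight; INT4 rounding omitted so the scale algebra is exact.
w_q = w / weights_scaling_factor

# Engine-build conversion, mirroring the snippet:
s1_stored = weights_scaling_factor / weights_scaling_factor_2  # weights[name] /= s2
alpha = activation_scaling_factor * weights_scaling_factor_2

# Runtime (conceptually): scale x into the activation-quant range, run the
# low-precision GEMM, then fold every scale back in via s1_stored and alpha.
x_q = x / activation_scaling_factor
y = (x_q @ (w_q * s1_stored)) * alpha

assert np.allclose(y, x @ w, atol=1e-4)   # all scales cancel: y == x @ w
```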
Exactly.
I followed these steps to quantize a qwen2 model.
Then I got safetensor results like:
What do 'prequant_scaling_factor', 'activation_scaling_factor', 'weights_scaling_factor', and 'weights_scaling_factor_2' mean? And how are they used in the w4a8 GEMM?