Due to licensing restrictions, we are unable to release the model. lm-eval 0.4.2 was used for all evaluations below.

To evaluate w4g128 without a quantized lm-head, run:

```bash
lm_eval --model hf \
    --model_args pretrained="./",autogptq=True,gptq_use_triton=True \
    --device cuda:0 \
    --tasks lambada_openai,hellaswag,piqa,winogrande,truthfulqa_mc1,openbookqa,boolq,rte,arc_easy,arc_challenge,mmlu \
    --batch_size 16
```
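The same run can also be launched from Python via lm-eval's `simple_evaluate` API; a minimal sketch of the equivalent call (the CLI command above is what was actually used):

```python
import lm_eval

# Mirrors the CLI invocation above (lm-eval 0.4.2).
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=./,autogptq=True,gptq_use_triton=True",
    tasks=["lambada_openai", "hellaswag", "piqa", "winogrande",
           "truthfulqa_mc1", "openbookqa", "boolq", "rte",
           "arc_easy", "arc_challenge", "mmlu"],
    device="cuda:0",
    batch_size=16,
)
print(results["results"])  # per-task metrics
```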

To assess the performance of w4g128 with a quantized lm-head, we evaluated the qdq model instead. AutoGPTQ does not support lm-head quantization, so loading such a checkpoint through it would leave the lm-head randomly initialized.
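Here "qdq" means the weights are quantized and immediately dequantized back to the original dtype, so the fake-quantized model (including the lm-head) can be evaluated with the standard BF16 pipeline. A minimal per-group weight qdq sketch, assuming 4-bit asymmetric quantization with group size 128; names are illustrative, not the repository's actual code:

```python
import torch

def qdq_weight(w: torch.Tensor, bits: int = 4, group_size: int = 128) -> torch.Tensor:
    """Quantize-dequantize a 2-D weight per group along the input dimension."""
    out_features, in_features = w.shape
    assert in_features % group_size == 0
    wg = w.float().reshape(out_features, in_features // group_size, group_size)
    wmin = wg.amin(dim=-1, keepdim=True)
    wmax = wg.amax(dim=-1, keepdim=True)
    scale = (wmax - wmin).clamp(min=1e-9) / (2**bits - 1)   # per-group scale
    zero = torch.round(-wmin / scale)                       # per-group zero point
    q = torch.clamp(torch.round(wg / scale) + zero, 0, 2**bits - 1)
    return ((q - zero) * scale).reshape(out_features, in_features).to(w.dtype)

# The lm-head can be fake-quantized the same way, which AutoGPTQ cannot represent:
# model.lm_head.weight.data = qdq_weight(model.lm_head.weight.data)
```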

| Metric | BF16 | w4g128 w/o lm-head | w4g128 with lm-head (qdq) |
| --- | --- | --- | --- |
| Avg. | 0.6352 | 0.6312 | 0.6303 |
| mmlu | 0.6386 | 0.6306 | 0.6318 |
| winogrande | 0.7143 | 0.7238 | 0.7269 |
| truthfulqa_mc1 | 0.3623 | 0.3537 | 0.3525 |
| rte | 0.6751 | 0.6859 | 0.6679 |
| piqa | 0.7867 | 0.7797 | 0.7802 |
| openbookqa | 0.3400 | 0.3300 | 0.3320 |
| lambada_openai | 0.7182 | 0.7200 | 0.7173 |
| hellaswag | 0.5769 | 0.5699 | 0.5701 |
| boolq | 0.8297 | 0.8309 | 0.8284 |
| arc_easy | 0.8152 | 0.8089 | 0.8106 |
| arc_challenge | 0.5299 | 0.5102 | 0.5154 |
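The Avg. row is the unweighted mean over the 11 tasks, which can be checked directly; for the BF16 column:

```python
# BF16 per-task scores from the table, in row order (mmlu .. arc_challenge).
bf16 = [0.6386, 0.7143, 0.3623, 0.6751, 0.7867, 0.3400,
        0.7182, 0.5769, 0.8297, 0.8152, 0.5299]
print(round(sum(bf16) / len(bf16), 4))  # 0.6352, matching the Avg. row
```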