Due to licensing restrictions, we are unable to release the model. lm-eval 0.4.2 is used for all evaluations below.
For evaluating w4g128 without the quantized lm-head:

```bash
lm_eval --model hf --model_args pretrained="./",autogptq=True,gptq_use_triton=True --device cuda:0 --tasks lambada_openai,hellaswag,piqa,winogrande,truthfulqa_mc1,openbookqa,boolq,rte,arc_easy,arc_challenge,mmlu --batch_size 16
```
To assess the performance of w4g128 with a quantized lm-head, we evaluate the qdq (quantize-dequantize) model instead, because AutoGPTQ does not support lm-head quantization and would leave the lm-head randomly initialized.
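Since the qdq model is stored as an ordinary transformers checkpoint (weights fake-quantized and dequantized back to high precision), it can be evaluated without any AutoGPTQ flags. A minimal sketch, assuming the qdq model has been exported to a local directory `./qdq` (hypothetical path):

```bash
# Sketch only: assumes the qdq (fake-quantized) model is saved as a standard
# transformers checkpoint at ./qdq; no autogptq flags are needed in that case.
lm_eval --model hf --model_args pretrained="./qdq" --device cuda:0 --tasks lambada_openai,hellaswag,piqa,winogrande,truthfulqa_mc1,openbookqa,boolq,rte,arc_easy,arc_challenge,mmlu --batch_size 16
```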
| Metric | BF16 | w4g128 w/o lm-head | w4g128 w/ lm-head (qdq) |
|---|---|---|---|
| Avg. | 0.6352 | 0.6312 | 0.6303 |
| mmlu | 0.6386 | 0.6306 | 0.6318 |
| winogrande | 0.7143 | 0.7238 | 0.7269 |
| truthfulqa_mc1 | 0.3623 | 0.3537 | 0.3525 |
| rte | 0.6751 | 0.6859 | 0.6679 |
| piqa | 0.7867 | 0.7797 | 0.7802 |
| openbookqa | 0.3400 | 0.3300 | 0.3320 |
| lambada_openai | 0.7182 | 0.7200 | 0.7173 |
| hellaswag | 0.5769 | 0.5699 | 0.5701 |
| boolq | 0.8297 | 0.8309 | 0.8284 |
| arc_easy | 0.8152 | 0.8089 | 0.8106 |
| arc_challenge | 0.5299 | 0.5102 | 0.5154 |
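The Avg. row is the plain arithmetic mean of the eleven task scores. A quick sanity check for the BF16 column (not part of the original recipe, just a verification sketch):

```bash
# Mean of the 11 BF16 task scores from the table above; prints 0.6352,
# matching the reported Avg. entry.
echo "0.6386 0.7143 0.3623 0.6751 0.7867 0.3400 0.7182 0.5769 0.8297 0.8152 0.5299" \
  | awk '{s=0; for(i=1;i<=NF;i++) s+=$i; printf "%.4f\n", s/NF}'
```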