I was trying out some WOQ algorithms and was curious to know the perplexity scores for the different algorithms (RTN/GPTQ/AWQ) and different configs (sym/asym, group size). If something like this already exists, can someone please point me to it?
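For context, this is roughly how I am measuring perplexity — a minimal sketch using plain `transformers`, averaging next-token NLL over non-overlapping chunks (the model and dataset below are just placeholders):

```python
# Rough perplexity measurement over non-overlapping chunks
# (model and dataset are placeholders, not from this repo).
import math
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/opt-125m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
ids = tokenizer(text, return_tensors="pt").input_ids

chunk = 2048
nll_sum, n_tokens = 0.0, 0
with torch.no_grad():
    for i in range(0, ids.size(1) - 1, chunk):
        window = ids[:, i : i + chunk]
        # labels == inputs: the model shifts them internally for next-token loss
        loss = model(window, labels=window).loss
        n = window.size(1) - 1  # number of predicted positions in this window
        nll_sum += loss.item() * n
        n_tokens += n

print(f"perplexity: {math.exp(nll_sum / n_tokens):.3f}")
```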
Answered by yiliu30 (Feb 26, 2024)
Hi @VishalX. Thanks for your interest in our project!
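For the configs you mentioned, a rough sketch of how they can be set up with the 2.x weight-only API is below. This is written from memory, so please double-check the option names against the current documentation; the model is a placeholder:

```python
# Rough sketch of a weight-only quantization config (2.x-style API, from memory;
# verify option names against the current documentation).
from transformers import AutoModelForCausalLM
from neural_compressor import PostTrainingQuantConfig, quantization

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")  # placeholder model

conf = PostTrainingQuantConfig(
    approach="weight_only",
    op_type_dict={
        ".*": {  # apply to every matched op type
            "weight": {
                "bits": 4,
                "group_size": 128,   # e.g. 32 or 128
                "scheme": "asym",    # "sym" or "asym"
                "algorithm": "RTN",  # "GPTQ"/"AWQ" additionally need calibration data
            },
        },
    },
)
q_model = quantization.fit(model, conf)  # RTN is data-free, so no dataloader needed
```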
Hi @VishalX, that is a very interesting question.
Let's take a closer look at the generate pipeline:
`input_word` (str) -> `tokenizer.encode(input_word)` -> `input_ids` (token) -> `model` or `q_model` -> `logits` (tensor) -> `predicted_ids` (token) -> `tokenizer.decode` -> `output_word` (str)

The quantization process unavoidably introduces errors, causing the distribution of the quantized model's output (`logits`) to shift slightly from the output of the float model. These shifted logits are then converted to `predicted_ids` (token), and the resulting tokens might construct words in another language. Let's encode the output back to tokens:
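(Below is a minimal sketch of the whole pipeline plus that re-encoding step; the model name and prompt are placeholders, and a quantized `q_model` would be used in exactly the same way.)

```python
# Minimal sketch of the pipeline above with greedy decoding
# (model name and prompt are placeholders).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/opt-125m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

input_word = "The weather today is"
input_ids = tokenizer.encode(input_word, return_tensors="pt")  # str -> tokens

with torch.no_grad():
    for _ in range(8):  # tokens -> logits -> next predicted id, greedily
        logits = model(input_ids).logits
        next_id = logits[0, -1].argmax().item()
        input_ids = torch.cat([input_ids, torch.tensor([[next_id]])], dim=1)

output_word = tokenizer.decode(input_ids[0])  # tokens -> str
print(output_word)

# ...and encode the output back to tokens to inspect the ids directly:
print(tokenizer.encode(output_word))
```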