AVX IQ Quants #7845
Conversation
Make sure the test-backend-ops passes.
Yeah it runs and passes with
Considering how it only takes a minute to run I think it's worth adding the CPU version of
The goal of
I compared the AVX CPU vs the GPU results on my Linux box and the tests are passing. Should be good to merge.
Update hv/matmul up to:

commit 557b653 (HEAD -> master, origin/master, origin/HEAD)
Author: k.h.lai <adrian.k.h.lai@outlook.com>
Date:   Fri Jun 21 16:28:20 2024 +0800

    vulkan: detect multiple devices by deviceUUID instead of deviceID (ggerganov#8022)

commit 7d5e877
Author: Eve <139727413+netrunnereve@users.noreply.github.com>
Date:   Fri Jun 21 05:57:36 2024 +0000

    ggml : AVX IQ quants (ggerganov#7845)

...
* initial iq4_xs
* fix ci
* iq4_nl
* iq1_m
* iq1_s
* iq2_xxs
* iq3_xxs
* iq2_s
* iq2_xs
* iq3_s before sllv
* iq3_s
* iq3_s small fix
* iq3_s sllv can be safely replaced with sse multiply
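The last commit note mentions replacing sllv with an SSE multiply. As a rough sketch of that idea (not the actual kernel code from this PR, and with made-up shift amounts): AVX2's per-lane variable shift `_mm256_sllv_epi32` has no direct AVX/SSE counterpart, but when the shift counts are known constants the same result can be obtained by multiplying each lane by the corresponding power of two.

```c
#include <smmintrin.h> // SSE4.1

// Sketch only: emulate a per-lane left shift by the constant amounts 0,2,4,6
// (hypothetical values) without AVX2's _mm256_sllv_epi32, by multiplying each
// 32-bit lane with the matching power of two.
static inline __m128i sllv_by_const_emulated(__m128i x) {
    const __m128i pow2 = _mm_set_epi32(1 << 6, 1 << 4, 1 << 2, 1 << 0);
    return _mm_mullo_epi32(x, pow2); // x[i] << s[i] == x[i] * (1 << s[i])
}
```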
I finally had the time to work on AVX (pre-AVX2) versions of the IQ quant ggml_vec_dot functions for Sandy Bridge and Ivy Bridge users.

Master:

PR:
Some example benchmarks:
The scalar IQ code is really slow on my computer, even with an 8B model. Pretty much any K quant of equivalent size running a 30B model can beat it! I mostly followed the original AVX2 implementation, converting the newer 256-bit instructions into two 128-bit ones where required.
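As a rough illustration of that 256-bit to 2x128-bit conversion (a sketch with a hypothetical helper, not code taken from this PR): a single AVX2 integer operation on a 256-bit register becomes two 128-bit operations on the low and high halves when only AVX is available.

```c
#include <immintrin.h>

// Sketch: the AVX2 path does one 256-bit integer add; the AVX-only path
// applies the 128-bit instruction to each half and recombines the results.
// The helper name add_i16x16 is made up for this example.
#if defined(__AVX2__)
static inline __m256i add_i16x16(__m256i a, __m256i b) {
    return _mm256_add_epi16(a, b);                              // one 256-bit instruction
}
#else
static inline __m256i add_i16x16(__m256i a, __m256i b) {
    __m128i lo = _mm_add_epi16(_mm256_castsi256_si128(a),       // low 128 bits
                               _mm256_castsi256_si128(b));
    __m128i hi = _mm_add_epi16(_mm256_extractf128_si256(a, 1),  // high 128 bits
                               _mm256_extractf128_si256(b, 1));
    return _mm256_insertf128_si256(_mm256_castsi128_si256(lo), hi, 1);
}
#endif
```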