ggml: Optimize Q4_0 into Q4_0_X_Y repack #10324

eddnjjn · 2024-11-15T21:56:50Z

This pull request optimizes the code for repacking Q4_0 into Q4_0_4_4, Q4_0_4_8 and Q4_0_8_8.
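As an illustration of what a repack of this kind involves, here is a minimal sketch, not the code from this pull request. It assumes a Q4_0-style block holds one 16-bit scale and 16 bytes of packed 4-bit quants, and that the repacked format interleaves the blocks of 4 rows in fixed-size chunks. The names `blk_t`, `blk_x4_t`, `repack_bytewise` and `repack_chunked` are hypothetical, and any bit-level transformation the real formats may apply to the quants is left out. The sketch contrasts a byte-at-a-time loop with a chunk-at-a-time copy that produces the same layout with fewer iterations.

```c
#include <stdint.h>
#include <string.h>

#define QK    32  /* weights per block (assumed) */
#define NROWS 4   /* rows interleaved into one repacked block (assumed) */

/* Hypothetical stand-ins for the real block types. */
typedef struct { uint16_t d; uint8_t qs[QK / 2]; } blk_t;                 /* one row's block */
typedef struct { uint16_t d[NROWS]; uint8_t qs[NROWS * QK / 2]; } blk_x4_t; /* 4 rows interleaved */

/* Reference version: compute the source row and offset for every output byte. */
static blk_x4_t repack_bytewise(const blk_t in[NROWS], int chunk) {
    blk_x4_t out;
    for (int r = 0; r < NROWS; r++) out.d[r] = in[r].d;
    for (int i = 0; i < NROWS * QK / 2; i++) {
        int row = (i / chunk) % NROWS;                       /* which input row */
        int src = (i / (chunk * NROWS)) * chunk + i % chunk; /* offset inside that row */
        out.qs[i] = in[row].qs[src];
    }
    return out;
}

/* Same layout, but each iteration copies a whole chunk of `chunk` bytes. */
static blk_x4_t repack_chunked(const blk_t in[NROWS], int chunk) {
    blk_x4_t out;
    for (int r = 0; r < NROWS; r++) out.d[r] = in[r].d;
    int n_chunks = NROWS * QK / 2 / chunk;
    for (int c = 0; c < n_chunks; c++) {
        int row = c % NROWS;           /* chunks rotate through the rows */
        int src = (c / NROWS) * chunk; /* start of the chunk inside that row */
        memcpy(&out.qs[c * chunk], &in[row].qs[src], (size_t) chunk);
    }
    return out;
}
```

With `chunk` set to 4 or 8, both functions produce byte-identical output. The chunked form does the index arithmetic once per chunk rather than once per byte, which is one plausible way a repack step like this could be made cheaper at load time.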

I have read the contributing guidelines
Self-reported review complexity:
- Low
- Medium
- High

slaren

Very nice. For me this cuts the load time by 2/3 on x86, and by even more on M3 Max.

ggml: Optimize Q4_0 into Q4_0_X_Y repack

8007cb0

github-actions bot added the "ggml" label (changes relating to the ggml tensor library for machine learning) on Nov 15, 2024

slaren approved these changes Nov 16, 2024

slaren merged commit 1e58ee1 into ggerganov:master Nov 16, 2024
54 checks passed

arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024

ggml : optimize Q4_0 into Q4_0_X_Y repack (ggerganov#10324)

4d50702
