ggml: Optimize Q4_0 into Q4_0_X_Y repack #10324

Merged: 1 commit merged into ggerganov:master on Nov 16, 2024


@eddnjjn (Contributor) commented Nov 15, 2024

This pull request optimizes the code for repacking Q4_0 into Q4_0_4_4, Q4_0_4_8 and Q4_0_8_8.
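The PR body does not describe the target layouts. As a rough illustration only, the C sketch below interleaves one Q4_0 block from each of four rows into a single 4-row block. It reads the Q4_0_X_Y suffixes as (rows interleaved, interleave width in bytes), so Q4_0_4_4 and Q4_0_4_8 would be 4 rows with 4- or 8-byte chunks. That reading is an assumption, and the 8-row Q4_0_8_8 case is not covered. The struct `block_q4_0` mirrors ggml's 32-weight Q4_0 block (one fp16 scale plus 16 bytes of packed nibbles), with the scale kept as raw `uint16_t` bits. `block_q4_0x4` and `interleave_q4_0x4` are hypothetical names. The real ggml repack may also transform the quantized values, and this sketch is not the PR's diff. It only shows the byte shuffling, copying whole chunks with `memcpy` rather than looping byte by byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define QK4_0 32 /* weights per Q4_0 block */

/* Mirrors the Q4_0 block: fp16 scale (raw bits here) + 32 packed 4-bit quants. */
typedef struct {
    uint16_t d;
    uint8_t  qs[QK4_0 / 2];
} block_q4_0;

/* Hypothetical 4-row interleaved block: 4 scales, then 64 bytes of quants. */
typedef struct {
    uint16_t d[4];
    uint8_t  qs[QK4_0 * 2];
} block_q4_0x4;

/* Interleave one block from each of 4 rows. Chunks of `blck` bytes are taken
 * round-robin from rows 0..3: row 0 chunk 0, row 1 chunk 0, ..., row 3 chunk 0,
 * row 0 chunk 1, and so on. Each chunk is copied with a single memcpy. */
static block_q4_0x4 interleave_q4_0x4(const block_q4_0 in[4], size_t blck) {
    assert(blck == 4 || blck == 8); /* assumed interleave widths */

    block_q4_0x4 out;
    for (int r = 0; r < 4; r++) {
        out.d[r] = in[r].d; /* scales are grouped, not interleaved */
    }

    const size_t nchunks = sizeof(out.qs) / blck; /* 16 chunks for 4, 8 for 8 */
    for (size_t i = 0; i < nchunks; i++) {
        size_t src_row    = i % 4;          /* which input row */
        size_t src_offset = (i / 4) * blck; /* chunk position inside that row */
        memcpy(&out.qs[i * blck], &in[src_row].qs[src_offset], blck);
    }
    return out;
}
```

With `blck = 4`, the output begins with bytes 0 to 3 of row 0, then bytes 0 to 3 of row 1, and so on through row 3, before moving on to bytes 4 to 7 of row 0. With `blck = 8`, the same pattern uses 8-byte chunks, so each row contributes two chunks.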

@github-actions bot added the "ggml" label (changes relating to the ggml tensor library for machine learning) on Nov 15, 2024
@slaren (Collaborator) left a comment


Very nice. For me this cuts the load time by 2/3 on x86, and even more on an M3 Max.

@slaren merged commit 1e58ee1 into ggerganov:master on Nov 16, 2024
54 checks passed
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024

2 participants