Releases: ngxson/llama.cpp
b4122
b4120
CUDA: fix MMV kernel being used for FP16 src1 (#10357)
b4118
llama : only use default buffer types for the KV cache (#10358)
b4114
CUDA: remove DMMV, consolidate F16 mult mat vec (#10318)
b4112
ggml : fix possible buffer use after free in sched reserve (#9930)
b4103
llama/ex: remove --logdir argument (#10339)
b4102
llamafile : fix include path (#0) ggml-ci
b4100
server: (web UI) Add samplers sequence customization (#10255)
* Samplers sequence: simplified and input field.
* Removed unused function
* Modify and use `settings-modal-short-input`
* rename "name" --> "label"
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
b4098
vulkan: Optimize some mat-vec mul quant shaders (#10296)
Compute two result elements per workgroup (for Q{4,5}_{0,1}). This reuses the B loads across the rows and also reuses some addressing calculations. This required manually partially unrolling the loop, since the compiler is less willing to unroll outer loops.
Add bounds-checking on the last iteration of the loop. I think this was at least partly broken before.
Optimize the Q4_K shader to vectorize most loads and reduce the number of bit twiddling instructions.
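The two-rows-per-workgroup idea in the vulkan entry above can be illustrated with a minimal sketch in plain Python. This is not the shader code from #10296: it skips quantization entirely, and the function name `matvec_two_rows` and its list-of-lists layout are assumptions made for illustration only. It shows only the general pattern of loading each B element once, using it for two output rows, and guarding the last iteration when the row count is odd.

```python
def matvec_two_rows(a, b):
    """Compute y = A @ b, handling two rows of A per outer step.

    Illustrative only: `a` is a list of rows, `b` is a list of floats.
    """
    num_rows = len(a)
    y = [0.0] * num_rows
    for row in range(0, num_rows, 2):
        # Bounds check: on the last step of an odd-sized matrix
        # there is no second row to process.
        has_second = row + 1 < num_rows
        acc0 = 0.0
        acc1 = 0.0
        for col, b_val in enumerate(b):
            # b_val is read once and reused for both rows.
            acc0 += a[row][col] * b_val
            if has_second:
                acc1 += a[row + 1][col] * b_val
        y[row] = acc0
        if has_second:
            y[row + 1] = acc1
    return y
```

Calling `matvec_two_rows([[1, 2], [3, 4], [5, 6]], [1, 1])` exercises the odd-row guard: the third row is processed alone on the final step.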
b4096
ggml : optimize Q4_0 into Q4_0_X_Y repack (#10324)