Releases: ngxson/llama.cpp

b4122

18 Nov 11:38
9b75f03
Vulkan: Fix device info output format specifiers (#10366)

* Vulkan: Fix device info output format specifiers

* Vulkan: Use the `%zu` printf specifier for `size_t` instead of `%ld`

b4120

17 Nov 23:32
76e9e58
CUDA: fix MMV kernel being used for FP16 src1 (#10357)

b4118

17 Nov 12:36
be5cacc
llama : only use default buffer types for the KV cache (#10358)

b4114

17 Nov 09:35
c3ea58a
CUDA: remove DMMV, consolidate F16 mult mat vec (#10318)

b4112

17 Nov 07:33
eda7e1d
ggml : fix possible buffer use after free in sched reserve (#9930)

b4103

16 Nov 23:44
4e54be0
llama/ex: remove --logdir argument (#10339)

b4102

16 Nov 20:39
llamafile : fix include path (#0)

ggml-ci

b4100

16 Nov 14:30
bcdb7a2
server: (web UI) Add samplers sequence customization (#10255)

* Samplers sequence: simplified and added an input field.

* Removed unused function

* Modify and use `settings-modal-short-input`

* rename "name" --> "label"

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>

b4098

16 Nov 07:50
772703c
vulkan: Optimize some mat-vec mul quant shaders (#10296)

Compute two result elements per workgroup (for Q{4,5}_{0,1}). This reuses
the B loads across the rows and also reuses some addressing calculations.
This required manually partially unrolling the loop, since the compiler
is less willing to unroll outer loops.

Add bounds-checking on the last iteration of the loop. I think this was at
least partly broken before.

Optimize the Q4_K shader to vectorize most loads and reduce the number of
bit twiddling instructions.
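
A hypothetical scalar sketch of the reuse idea described above (illustrative C, not the actual GLSL shader; all names are made up): computing two output rows per pass lets each load of B serve both dot products, and the manually unrolled loop ends with a bounds-checked tail for the last iteration.

```c
#include <stddef.h>

// Illustrative only: two result rows computed together so each load of
// b[k] is reused for both dot products, roughly halving B traffic.
static void matvec_two_rows(const float *a0, const float *a1,
                            const float *b, size_t n,
                            float *out0, float *out1) {
    float acc0 = 0.0f, acc1 = 0.0f;
    size_t k = 0;
    // Manually unrolled main loop: b[k], b[k+1] loaded once, used twice.
    for (; k + 1 < n; k += 2) {
        float b0 = b[k], b1 = b[k + 1];
        acc0 += a0[k] * b0 + a0[k + 1] * b1;
        acc1 += a1[k] * b0 + a1[k + 1] * b1;
    }
    // Bounds-checked tail for odd n (the "last iteration" case).
    if (k < n) {
        acc0 += a0[k] * b[k];
        acc1 += a1[k] * b[k];
    }
    *out0 = acc0;
    *out1 = acc1;
}
```

In the real shader the same structure applies per workgroup, with the shared addressing calculations hoisted out of the unrolled body.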

b4096

16 Nov 02:45
1e58ee1
ggml : optimize Q4_0 into Q4_0_X_Y repack (#10324)