vulkan : cmake integration #8119

Merged: 40 commits into ggerganov:master on Jul 13, 2024

Conversation

@bandoti (Contributor) commented Jun 25, 2024

This change introduces make and CMake build targets for Vulkan shaders per #5356. This ensures ggml-vulkan-shaders.hpp is generated at build time (instead of being stored in SCM). In addition, ggml-vulkan-shaders.cpp is added to move the compiled shaders into their own translation unit.

This change also updates the relocatable CMake package to link against the new ggml library.
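
For context, here is a minimal sketch of how such build-time generation can be wired up in CMake. The generator tool name, its flags (--glslc, --output-dir), and the target layout are illustrative assumptions rather than the exact contents of this PR, and it presumes find_package(Vulkan COMPONENTS glslc) has already run:

# Build the host-side shader generator first (name assumed for illustration).
add_executable(vulkan-shaders-gen vulkan-shaders-gen.cpp)

# Generate the shader sources at build time instead of committing them to SCM.
add_custom_command(
    OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/ggml-vulkan-shaders.hpp
           ${CMAKE_CURRENT_BINARY_DIR}/ggml-vulkan-shaders.cpp
    COMMAND vulkan-shaders-gen
            --glslc ${Vulkan_GLSLC_EXECUTABLE}
            --output-dir ${CMAKE_CURRENT_BINARY_DIR}
    DEPENDS vulkan-shaders-gen
    COMMENT "Generating Vulkan shader sources"
)

# Compile the generated shaders in their own translation unit.
target_sources(ggml PRIVATE ${CMAKE_CURRENT_BINARY_DIR}/ggml-vulkan-shaders.cpp)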

@bandoti changed the title from "Vulkan CMake integration (#5356)" to "Vulkan CMake integration" on Jun 25, 2024
@mofosyne added the "Review Complexity : Low" label on Jun 25, 2024
@github-actions bot added the build, script, Vulkan, and python labels on Jun 25, 2024
@bandoti (Contributor, Author) commented Jun 26, 2024

The ubuntu-22-cmake-vulkan build is failing due to a missing glslc executable. It looks like there are a couple of ways to get the dependency on Ubuntu.

@netrunnereve (Contributor) commented Jun 26, 2024

It's worth noting here that this will add glslc as a dependency on top of the Vulkan libs for anyone wishing to build this from scratch. This pretty much makes the Vulkan SDK a requirement unless they can get glslc from somewhere else.

@github-actions bot added the devops label on Jun 26, 2024
@0cc4m self-requested a review on Jun 26, 2024
@bandoti (Contributor, Author) commented Jun 26, 2024

I agree the Vulkan SDK is a somewhat heavy dependency and pulls in lots of graphics-related dependencies. For Arch and MSYS2 there are the vulkan-devel and shaderc packages. There appear to be similar packages for Ubuntu (newer than version 22), though I haven't been able to test those.

@github-actions bot added the nix label on Jun 27, 2024
@bandoti (Contributor, Author) commented Jul 3, 2024

@MaggotHATE You're correct, the change doesn't force either method. The user may put the libraries within w64devkit if they so choose! Nothing requires one or the other, because it depends on what is done in the vulkan.pc file.

To get it to build, though, I had to add the $(shell pkg-config --cflags) call because it couldn't find the Vulkan includes. This shouldn't affect your setup, as it's not required to have Cflags:...

@MaggotHATE (Contributor):

> To get it to build, though, I had to add the $(shell pkg-config --cflags) call because it couldn't find the Vulkan includes.

Yes, I understand it, and that's what I'm hesitant about: w64devkit is about portability, which is lost in the workflow currently suggested in the README. I think "copy folders + create a correct vulkan.pc file" should be preferred for that reason:

Windows (w64devkit)

1. Download w64devkit and unpack it into a preferable folder.
2. Download the Vulkan SDK installer and install it to a preferable path.
3. Copy the include and lib folders from VulkanSDK/*version of your SDK*/ into w64devkit/x86_64-w64-mingw32/.
4. Create a vulkan.pc file inside the w64devkit/x86_64-w64-mingw32/lib/pkgconfig/ folder.
5. Open it as a text file and add the following:

 Name: Vulkan-Loader
 Description: Vulkan Loader
 Version: 1.3.283
 Libs: -lvulkan-1

6. Run w64devkit.exe, switch into the llama.cpp directory, and build using the Makefile:

make GGML_VULKAN=1

@bandoti (Contributor, Author) commented Jul 3, 2024

Understood—I'll add those steps in the readme. 😊

@github-actions bot added the documentation label on Jul 5, 2024
@bandoti changed the title from "Vulkan CMake integration" to "vulkan : cmake integration" on Jul 6, 2024
@AndreasKunar (Contributor):

This is FYI only (for now):

I'm trying to get llama.cpp to work with Vulkan on Snapdragon X Copilot+ PCs on Windows for Arm.

I installed the Vulkan SDK for Windows (x64). Then I built Vulkan-Loader's vulkan1.lib as an MSVC Windows arm64 .lib and replaced the Vulkan SDK's x64 lib with it. The main llama.cpp branch seems to build OK with 'cmake -B build -DGGML_VULKAN=1' (on MSVC), but it does not work.

I will try to find and maybe fix the issue with the main branch before testing your PR. I will update you when I have better results. But any ideas are very welcome.

llama.cpp llama-cli log:

[1720599200] Log start
[1720599200] Cmd: build\bin\release\llama-cli.exe -i --temp 0.2 --n-predict -1 -m ..\models.llama.cpp\Llama-2-7b-Chat-GGUF\llama-2-7b-chat.Q4_0.gguf --interactive-first -cnv -ngl 99
[1720599200] main: build = 3353 (9925ca4)
[1720599200] main: built with MSVC 19.40.33811.0 for ARM64
[1720599200] main: seed = 1720599200
[1720599200] main: llama backend init
[1720599200] main: load the model and apply lora adapter, if any
[1720599200] llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from ..\models.llama.cpp\Llama-2-7b-Chat-GGUF\llama-2-7b-chat.Q4_0.gguf (version GGUF V2)
[1720599200] llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
[1720599200] llama_model_loader: - kv 0: general.architecture str = llama
[1720599200] llama_model_loader: - kv 1: general.name str = LLaMA v2
[1720599200] llama_model_loader: - kv 2: llama.context_length u32 = 4096
[1720599200] llama_model_loader: - kv 3: llama.embedding_length u32 = 4096
[1720599200] llama_model_loader: - kv 4: llama.block_count u32 = 32
[1720599200] llama_model_loader: - kv 5: llama.feed_forward_length u32 = 11008
[1720599200] llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128
[1720599200] llama_model_loader: - kv 7: llama.attention.head_count u32 = 32
[1720599200] llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 32
[1720599200] llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
[1720599200] llama_model_loader: - kv 10: general.file_type u32 = 2
[1720599200] llama_model_loader: - kv 11: tokenizer.ggml.model str = llama
[1720599200] llama_model_loader: - kv 12: tokenizer.ggml.tokens arr[str,32000] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
[1720599200] llama_model_loader: - kv 13: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
[1720599200] llama_model_loader: - kv 14: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
[1720599200] llama_model_loader: - kv 15: tokenizer.ggml.bos_token_id u32 = 1
[1720599200] llama_model_loader: - kv 16: tokenizer.ggml.eos_token_id u32 = 2
[1720599200] llama_model_loader: - kv 17: tokenizer.ggml.unknown_token_id u32 = 0
[1720599200] llama_model_loader: - kv 18: general.quantization_version u32 = 2
[1720599200] llama_model_loader: - type f32: 65 tensors
[1720599200] llama_model_loader: - type q4_0: 225 tensors
[1720599200] llama_model_loader: - type q6_K: 1 tensors
[1720599200] llm_load_vocab: special tokens cache size = 259
[1720599200] llm_load_vocab: token to piece cache size = 0.1684 MB
[1720599200] llm_load_print_meta: format = GGUF V2
[1720599200] llm_load_print_meta: arch = llama
[1720599200] llm_load_print_meta: vocab type = SPM
[1720599200] llm_load_print_meta: n_vocab = 32000
[1720599200] llm_load_print_meta: n_merges = 0
[1720599200] llm_load_print_meta: vocab_only = 0
[1720599200] llm_load_print_meta: n_ctx_train = 4096
[1720599200] llm_load_print_meta: n_embd = 4096
[1720599200] llm_load_print_meta: n_layer = 32
[1720599200] llm_load_print_meta: n_head = 32
[1720599200] llm_load_print_meta: n_head_kv = 32
[1720599200] llm_load_print_meta: n_rot = 128
[1720599200] llm_load_print_meta: n_swa = 0
[1720599200] llm_load_print_meta: n_embd_head_k = 128
[1720599200] llm_load_print_meta: n_embd_head_v = 128
[1720599200] llm_load_print_meta: n_gqa = 1
[1720599200] llm_load_print_meta: n_embd_k_gqa = 4096
[1720599200] llm_load_print_meta: n_embd_v_gqa = 4096
[1720599200] llm_load_print_meta: f_norm_eps = 0.0e+00
[1720599200] llm_load_print_meta: f_norm_rms_eps = 1.0e-05
[1720599200] llm_load_print_meta: f_clamp_kqv = 0.0e+00
[1720599200] llm_load_print_meta: f_max_alibi_bias = 0.0e+00
[1720599200] llm_load_print_meta: f_logit_scale = 0.0e+00
[1720599200] llm_load_print_meta: n_ff = 11008
[1720599200] llm_load_print_meta: n_expert = 0
[1720599200] llm_load_print_meta: n_expert_used = 0
[1720599200] llm_load_print_meta: causal attn = 1
[1720599200] llm_load_print_meta: pooling type = 0
[1720599200] llm_load_print_meta: rope type = 0
[1720599200] llm_load_print_meta: rope scaling = linear
[1720599200] llm_load_print_meta: freq_base_train = 10000.0
[1720599200] llm_load_print_meta: freq_scale_train = 1
[1720599200] llm_load_print_meta: n_ctx_orig_yarn = 4096
[1720599200] llm_load_print_meta: rope_finetuned = unknown
[1720599200] llm_load_print_meta: ssm_d_conv = 0
[1720599200] llm_load_print_meta: ssm_d_inner = 0
[1720599200] llm_load_print_meta: ssm_d_state = 0
[1720599200] llm_load_print_meta: ssm_dt_rank = 0
[1720599200] llm_load_print_meta: model type = 7B
[1720599200] llm_load_print_meta: model ftype = Q4_0
[1720599200] llm_load_print_meta: model params = 6.74 B
[1720599200] llm_load_print_meta: model size = 3.56 GiB (4.54 BPW)
[1720599200] llm_load_print_meta: general.name = LLaMA v2
[1720599200] llm_load_print_meta: BOS token = 1 '<s>'
[1720599200] llm_load_print_meta: EOS token = 2 '</s>'
[1720599200] llm_load_print_meta: UNK token = 0 '<unk>'
[1720599200] llm_load_print_meta: LF token = 13 '<0x0A>'
[1720599200] llm_load_print_meta: max token length = 48
[1720599202] llama_model_load: error loading model: vk::Device::createComputePipeline: ErrorUnknown
[1720599202] llama_load_model_from_file: failed to load model
[1720599202] main: error: unable to load model

Windows vulkaninfoSDK --summary:

Vulkan Instance Version: 1.3.277

Instance Extensions: count = 13
VK_EXT_debug_report : extension revision 10
VK_EXT_debug_utils : extension revision 2
VK_KHR_device_group_creation : extension revision 1
VK_KHR_external_fence_capabilities : extension revision 1
VK_KHR_external_memory_capabilities : extension revision 1
VK_KHR_external_semaphore_capabilities : extension revision 1
VK_KHR_get_physical_device_properties2 : extension revision 2
VK_KHR_get_surface_capabilities2 : extension revision 1
VK_KHR_portability_enumeration : extension revision 1
VK_KHR_surface : extension revision 25
VK_KHR_surface_protected_capabilities : extension revision 1
VK_KHR_win32_surface : extension revision 6
VK_LUNARG_direct_driver_loading : extension revision 1

Instance Layers: count = 10
VK_LAYER_KHRONOS_profiles Khronos Profiles layer 1.3.283 version 1
VK_LAYER_KHRONOS_shader_object Khronos Shader object layer 1.3.283 version 1
VK_LAYER_KHRONOS_synchronization2 Khronos Synchronization2 layer 1.3.283 version 1
VK_LAYER_KHRONOS_validation Khronos Validation Layer 1.3.283 version 1
VK_LAYER_LUNARG_api_dump LunarG API dump layer 1.3.283 version 2
VK_LAYER_LUNARG_gfxreconstruct GFXReconstruct Capture Layer Version 1.0.4 1.3.283 version 4194308
VK_LAYER_LUNARG_monitor Execution Monitoring Layer 1.3.283 version 1
VK_LAYER_LUNARG_screenshot LunarG image capture layer 1.3.283 version 1
VK_LAYER_MSFT_driver_sorting Microsoft driver sorting layer 1.3.275 version 1
VK_LAYER_MSFT_driver_sorting Microsoft driver sorting layer 1.3.275 version 1

Devices:

GPU0:
apiVersion = 1.3.276
driverVersion = 0.778.0
vendorID = 0x5143
deviceID = 0x43050c01
deviceType = PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU
deviceName = Snapdragon(R) X Plus - X1P64100 - Qualcomm(R) Adreno(TM) GPU
driverID = DRIVER_ID_QUALCOMM_PROPRIETARY
driverName = Qualcomm Technologies Inc. Adreno Vulkan Driver
driverInfo = Driver Build: , , 1661514817
Date: 08/26/2022
Compiler Version: E031.46.08.00
Driver Branch:
conformanceVersion = 1.3.6.0
deviceUUID = 43510000-0600-0000-12c5-6000140012c5
driverUUID = 04000000-0100-0000-0200-000000000000

GPU1:
apiVersion = 1.2.287
driverVersion = 24.1.99
vendorID = 0x4d4f4351
deviceID = 0x36334330
deviceType = PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU
deviceName = Microsoft Direct3D12 (Snapdragon(R) X Plus - X1P64100 - Qualcomm(R) Adreno(TM) GPU)
driverID = DRIVER_ID_MESA_DOZEN
driverName = Dozen
driverInfo = Mesa 24.2.0-devel (git-57f4f8520a)
conformanceVersion = 0.0.0.0
deviceUUID = 1ff76724-7b94-2593-0e2d-f4e24ac772d0
driverUUID = 4274c734-b178-8d51-5eb5-7af709d5e8d4

GPU2:
apiVersion = 1.2.287
driverVersion = 24.1.99
vendorID = 0x1414
deviceID = 0x008c
deviceType = PHYSICAL_DEVICE_TYPE_CPU
deviceName = Microsoft Direct3D12 (Microsoft Basic Render Driver)
driverID = DRIVER_ID_MESA_DOZEN
driverName = Dozen
driverInfo = Mesa 24.2.0-devel (git-57f4f8520a)
conformanceVersion = 0.0.0.0
deviceUUID = a3d805df-8b57-914f-fc75-bd5971402f6b
driverUUID = 4274c734-b178-8d51-5eb5-7af709d5e8d4

@0cc4m (Collaborator) commented Jul 10, 2024

@AndreasKunar That's great, can you open an issue about that and track your progress/problems there?

@bandoti I'm sorry I haven't managed to review this yet, I don't have much time at the moment. I'll try to do it this evening.

@AndreasKunar (Contributor):

> @AndreasKunar That's great, can you open an issue about that and track your progress/problems there?
>
> @bandoti I'm sorry I haven't managed to review this yet, I don't have much time at the moment. I'll try to do it this evening.

Thanks! I will first try to debug the vulkan1.lib to make sure that it works (with the Vulkan samples). Once I know that my WoA vulkan1.lib is OK, I will open an issue (probably not before late tomorrow).

@0cc4m (Collaborator) left a comment

Looks good to me. I tested it and it works perfectly on Linux.

What I noticed is that if you have Vulkan but not glslc, you get a lot of spam like this:

cannot compile matmul_f32_f16_fp32

Vulkan_GLSLC_EXECUTABLE-NOTFOUND -fshader-stage=compute --target-env=vulkan1.2 -O /home/user/llama.cpp/ggml/src/vulkan-shaders/mul_mm.comp -o /home/user/llama.cpp/build_vk/ggml/src/vulkan-shaders.spv/matmul_f32_f16_fp32.spv -DB_TYPE=float16_t -DDATA_A_F32=1 -DD_TYPE=float -DFLOAT_TYPE=float

sh: 1: Vulkan_GLSLC_EXECUTABLE-NOTFOUND: not found

It should be possible to have it throw an error earlier if the shader compiler isn't found, right?

@bandoti (Contributor, Author) commented Jul 10, 2024

Ah—yeah, in that case we should probably add those deps to find_package(Vulkan COMPONENTS glslc REQUIRED). I'll get that change in shortly.

In a future pull request we can add checks for MoltenVK to get it working on Apple too. I'd rather defer that, though, to get the broader change in first. 😊
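
A rough sketch of that fix, assuming CMake's FindVulkan module (its glslc component support requires a reasonably recent CMake, around 3.24):

# Fail at configure time if the Vulkan libs or glslc are missing, instead of
# producing Vulkan_GLSLC_EXECUTABLE-NOTFOUND failures at build time.
find_package(Vulkan COMPONENTS glslc REQUIRED)

# FindVulkan sets Vulkan_GLSLC_EXECUTABLE when the component is found.
message(STATUS "Using glslc: ${Vulkan_GLSLC_EXECUTABLE}")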

@0cc4m (Collaborator) commented Jul 10, 2024

Why would MoltenVK not work with these changes? It was working before.

@bandoti (Contributor, Author) commented Jul 10, 2024

I'd imagine it would still work! I just haven't tested it myself. More specifically, I meant that find_package can check for MoltenVK and raise an error if it's not found. However, I don't think that'll be an issue, because on macOS it's generally assumed that MoltenVK is installed with Vulkan (unless it's an Intel Mac, I guess).

Review comment on .devops/nix/package.nix (outdated, resolved)
@0cc4m (Collaborator) left a comment

Alright, I tested it in more detail. It works well with CMake and Make on Linux, and it reports issues and fails properly when compiling shaders. LGTM

@0cc4m (Collaborator) commented Jul 13, 2024

@ggerganov Considering this touches quite a few files beyond my Vulkan backend, can this be merged, or is there something we should wait for (like another PR or someone else's approval)?

@AndreasKunar (Contributor) commented Jul 13, 2024

I tested your scripts on Windows for Arm.

1. Without Vulkan: the llvm build (--preset arm64-windows-llvm-release) builds OK, but the MSVC builds (--preset arm64-windows-msvc-release and defaults) fail.

This is NOT because of your CMake/Vulkan changes but because of a recent issue with ggml-aarch64.c that I raised in #8446: it does not check for MSVC correctly and generates a gcc __asm__ directive, which aborts the compile.

If I fix ggml-aarch64.c with the correct MSVC-excluding compiler conditionals, your changes also build with MSVC.

2. With Vulkan support and MSVC (cmake -S . -B build -G "Visual Studio 17 2022" -A ARM64 -D GGML_VULKAN=1), plus the fix above, it builds. This is the only Vulkan build which worked for me on Windows for Arm in the main llama.cpp branch. But the Vulkan on Windows for Arm issue remains (not in the scope of your PR).

@ggerganov (Owner) left a comment

@0cc4m It's good to merge 👍

@AndreasKunar The Arm issue is noted and will be resolved in the scope of #8446

# Collect the compile definitions set on the ggml source directory and on the
# ggml target, so they can be re-exported through the relocatable CMake package.
get_directory_property(GGML_DIR_DEFINES DIRECTORY ggml/src COMPILE_DEFINITIONS)
get_target_property(GGML_TARGET_DEFINES ggml COMPILE_DEFINITIONS)
set(GGML_TRANSIENT_DEFINES ${GGML_TARGET_DEFINES} ${GGML_DIR_DEFINES})
get_target_property(GGML_LINK_LIBRARIES ggml LINK_LIBRARIES)
@ggerganov (Owner):

I don't understand why this is needed; an example with a specific flag that is not exported would help. It's OK to merge as it is, though.

@bandoti (Contributor, Author):

@ggerganov I will take a closer look at this. The flags that should be exported are all the interface defines (public interface only) on the ggml/llama libraries. I will follow up with a new pull request.
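
For illustration, the distinction in CMake terms (a sketch, not the actual patch; GGML_INTERNAL_FLAG is a hypothetical define):

# PRIVATE: used only while compiling ggml itself; not part of the exported interface.
target_compile_definitions(ggml PRIVATE GGML_INTERNAL_FLAG)

# PUBLIC: needed both by ggml and by anything linking against it; propagated to
# consumers via the target's INTERFACE_COMPILE_DEFINITIONS, so an installed
# package exporting the target carries it automatically.
target_compile_definitions(ggml PUBLIC GGML_USE_VULKAN)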

@0cc4m merged commit 17eb6aa into ggerganov:master on Jul 13, 2024
55 checks passed
arthw pushed a commit to arthw/llama.cpp that referenced this pull request on Jul 27, 2024
* Add Vulkan to CMake pkg

* Add Sycl to CMake pkg

* Add OpenMP to CMake pkg

* Split generated shader file into separate translation unit

* Add CMake target for Vulkan shaders

* Update README.md

* Add make target for Vulkan shaders

* Use pkg-config to locate vulkan library

* Add vulkan SDK dep to ubuntu-22-cmake-vulkan workflow

* Clean up tabs

* Move sudo to apt-key invocation

* Forward GGML_EXTRA_LIBS to CMake config pkg

* Update vulkan obj file paths

* Add shaderc to nix pkg

* Add python3 to Vulkan nix build

* Link against ggml in cmake pkg

* Remove Python dependency from Vulkan build

* code review changes

* Remove trailing newline

* Add cflags from pkg-config to fix w64devkit build

* Update README.md

* Remove trailing whitespace

* Update README.md

* Remove trailing whitespace

* Fix doc heading

* Make glslc required Vulkan component

* remove clblast from nix pkg
Labels: build, devops, documentation, nix, python, Review Complexity : Low, script, Vulkan
10 participants