[kernel] added half2 specialization for layernorm kernel #139
base: main
Conversation
}

template <>
void invoke_layernorm_kernel<half2>(half2* out,
It sounds like these template specializations are optional since they are covered by the general template, no?
    const float epsilon,
    int m,
    int n) {
  int half_n = n / 2;
what if n % 2 != 0?
It sounds like you didn't cover this case in the unit tests.
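To make the concern concrete, here is a hypothetical dispatcher sketch (the parameter list of invoke_layernorm_kernel is inferred from the diff and may not match the real header): the half2 path only works when n is even, so an odd n has to fall back to the scalar half kernel or handle the trailing column separately. A unit test with an odd size such as n = 2049 would exercise exactly that branch.

#include <cuda_fp16.h>

// Assumed general-template signature, inferred from the diff; check the
// real header before reusing this sketch.
template <typename T>
void invoke_layernorm_kernel(T* out, const T* input, const T* weight,
                             const T* bias, float epsilon, int m, int n);

// Hypothetical dispatcher: only reinterpret pairs of half as half2 when
// every row splits evenly; otherwise take the scalar path.
void layer_norm_half_dispatch(half* out, const half* input, const half* weight,
                              const half* bias, float epsilon, int m, int n) {
  if (n % 2 == 0) {
    invoke_layernorm_kernel<half2>(reinterpret_cast<half2*>(out),
                                   reinterpret_cast<const half2*>(input),
                                   reinterpret_cast<const half2*>(weight),
                                   reinterpret_cast<const half2*>(bias),
                                   epsilon, m, n);
  } else {
    invoke_layernorm_kernel<half>(out, input, weight, bias, epsilon, m, n);
  }
}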
float* dinput;
float* dweight;
float* dbias;
cudaMalloc((void**)&dout, sizeof(float) * m * n);
Use a torch::Tensor to allocate the memory instead.
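A sketch of that suggestion, assuming the test links against libtorch (tensor names and shapes are made up for illustration): letting torch::Tensor own the device buffers removes the cudaMalloc/cudaMemcpy/cudaFree bookkeeping, and raw pointers for a hand-written kernel are still available through data_ptr.

#include <torch/torch.h>

void allocate_with_torch(int64_t m, int64_t n) {
  auto options = torch::dtype(torch::kHalf).device(torch::kCUDA);
  torch::Tensor input  = torch::randn({m, n}, options);
  torch::Tensor weight = torch::randn({n}, options);
  torch::Tensor bias   = torch::randn({n}, options);
  torch::Tensor out    = torch::empty({m, n}, options);

  // Raw device pointers for the kernel under test; the tensors free their
  // memory automatically when they go out of scope.
  at::Half* out_ptr         = out.data_ptr<at::Half>();
  const at::Half* input_ptr = input.data_ptr<at::Half>();
  (void)out_ptr;
  (void)input_ptr;
}

at::Half has the same 16-bit layout as CUDA's half, so tests typically reinterpret_cast the pointer when calling the kernel.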
    torch::nn::functional::LayerNormFuncOptions({n}).weight(weight).bias(
        bias));

half* hout = (half*)malloc(m * n * sizeof(half));
same here.
cudaMemcpy(dweight, hweight, sizeof(half) * n, cudaMemcpyHostToDevice);
cudaMemcpy(dbias, hbias, sizeof(half) * n, cudaMemcpyHostToDevice);

llm::kernel::invoke_layernorm_kernel<half>(
Just test llm::kernel::layer_norm instead, but pass in inputs of different lengths to trigger the different kernels.
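A sketch of that test structure, assuming libtorch in the test (the call into the kernel under test is left as a commented placeholder since its exact signature is not shown in this diff): loop over several values of n, odd sizes included, and compare against torch's reference layer_norm.

#include <torch/torch.h>

void test_layer_norm_over_sizes() {
  namespace F = torch::nn::functional;
  const int64_t m = 32;
  const double epsilon = 1e-5;
  auto options = torch::dtype(torch::kHalf).device(torch::kCUDA);

  for (int64_t n : {127, 128, 1024, 2049}) {  // odd sizes on purpose
    auto input  = torch::randn({m, n}, options);
    auto weight = torch::randn({n}, options);
    auto bias   = torch::randn({n}, options);
    auto out    = torch::empty_like(input);

    // Placeholder: call the public entry point here, e.g. something like
    //   llm::kernel::layer_norm(out, input, weight, bias, epsilon);
    // so the value of n decides which kernel is launched internally.

    auto ref = F::layer_norm(
        input,
        F::LayerNormFuncOptions({n}).weight(weight).bias(bias).eps(epsilon));

    // Enable once the kernel call above is wired up:
    // TORCH_CHECK(torch::allclose(out.to(torch::kFloat), ref.to(torch::kFloat),
    //                             /*rtol=*/1e-2, /*atol=*/1e-2));
    (void)out;
    (void)ref;
  }
}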
Thanks for adding the optimization. Could you also add a benchmark to show the improvements? Thanks.
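For the benchmark, a small CUDA-event harness like the sketch below would be enough; the harness itself is generic, and the scalar half and half2 layernorm launches would be passed in as the callables on identical inputs.

#include <cuda_runtime.h>
#include <functional>

// Returns the average latency in milliseconds of `launch` over `iters` runs.
float time_kernel_ms(const std::function<void()>& launch, int iters = 100) {
  launch();                   // warm-up launch, excluded from the timing
  cudaDeviceSynchronize();

  cudaEvent_t start, stop;
  cudaEventCreate(&start);
  cudaEventCreate(&stop);
  cudaEventRecord(start);
  for (int i = 0; i < iters; ++i) {
    launch();
  }
  cudaEventRecord(stop);
  cudaEventSynchronize(stop);

  float total_ms = 0.0f;
  cudaEventElapsedTime(&total_ms, start, stop);
  cudaEventDestroy(start);
  cudaEventDestroy(stop);
  return total_ms / iters;
}

Reporting both averages for a few (m, n) shapes in the PR description would make the speedup easy to verify.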
e18e337 to bc9f7e2
…nitest and just test llm::kernel::layer_norm
optimize layernorm kernel using half2 type
test layernorm kernel