sycl-exp : Re-enabled mul_mat_batched_sycl Path for batched Q*K & KQ*V #8057

Merged

OuadiElfarouki
Collaborator

This PR re-enables the MUL_MAT path for the batched Q*K and KQ*V operations, which goes through MKL's gemm_batch via ggml_sycl_mul_mat_batched_sycl. That function was otherwise unused, because the condition src1->type() == GGML_TYPE_F16 is never satisfied.
Prompt processing performance is restored, e.g.:

  • Nvidia A100 + 70B Q4_K (b=p=512): 241 t/s -> 538 t/s
  • Intel Arc A770 + 13B Q4_K (b=p=512): 363 t/s -> 587 t/s
  • Nvidia A4000 + 7B Q4_K (b=p=512): 1710 t/s -> 2033 t/s

Text generation performance was unchanged or showed very small improvements.
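
To illustrate why the batched path was never reached, here is a minimal sketch of a dispatch gate. It does not reproduce the actual diff: `TensorType`, `Tensor`, `use_batched_before` and `use_batched_after` are hypothetical stand-ins, and both the non-transposed/batch-count checks and the relaxed form of the gate are assumptions made for illustration. Only the `src1` F16 requirement comes from the description above.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the ggml tensor types; not the real definitions.
enum class TensorType { F32, F16 };

struct Tensor {
    TensorType   type;
    std::int64_t ne[4];      // extents; ne[2] * ne[3] is the batch count
    bool         transposed;
};

// Gate as described above: it additionally requires src1 to be F16.
// If src1 is never F16 at this point, the batched path is dead code.
inline bool use_batched_before(const Tensor & src0, const Tensor & src1) {
    return src0.type == TensorType::F16 &&
           src1.type == TensorType::F16 &&   // never satisfied per the description
           !src0.transposed && !src1.transposed &&
           src1.ne[2] * src1.ne[3] > 1;
}

// Assumed relaxed gate: the src1 type check is dropped, so a batched
// F16 x F32 multiplication can reach the gemm_batch path again.
inline bool use_batched_after(const Tensor & src0, const Tensor & src1) {
    return src0.type == TensorType::F16 &&
           !src0.transposed && !src1.transposed &&
           src1.ne[2] * src1.ne[3] > 1;
}
```

With an F16 `src0` and an F32 `src1` spread over several batches, `use_batched_before` returns false while `use_batched_after` returns true. A single-batch `src1` fails both gates.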

@github-actions (bot) added the SYCL label (https://en.wikipedia.org/wiki/SYCL - GPU programming language) on Jun 21, 2024
@mofosyne added the "Review Complexity : Medium" label (Generally require more time to grok but manageable by beginner to medium expertise level) on Jun 21, 2024
@joeatodd
Collaborator

@airMeng this PR might fix the perf regression you describe here?

Collaborator

@airMeng airMeng left a comment

Outstanding to catch this small mistake!

@airMeng airMeng merged commit ea784c1 into ggerganov:codeplay/sycl-main Jun 24, 2024
65 checks passed
@joeatodd
Collaborator

> Outstanding to catch this small mistake!

Thanks for the feedback @airMeng. Be aware that this PR targets our temporary perf branch, codeplay/sycl-main. You will want it in master too, I guess!
