[AMD] Implement Flash Attention in Triton to enable transformers to run with Flash Attention on AMD GPUs. #126

ByronHsu · 2024-08-27T19:42:11Z

🚀 The feature, motivation and pitch

The official implementation of flash attention is in CUDA, so on AMD GPUs users cannot easily use flash attention with transformers to train LLMs. With this support, we can unlock many exciting use cases on AMD. The code is already available at https://triton-lang.org/main/getting-started/tutorials/06-fused-attention.html.

Another option is to use flex-attn from the PyTorch team, which uses torch.compile to optimize on top of existing handwritten Triton kernels.
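
For readers unfamiliar with what a fused attention kernel computes, here is a minimal pure-Python sketch of the tiled, online-softmax idea behind flash attention. It is not the Triton tutorial kernel and not code from this repository: the function names `naive_attention` and `tiled_attention` and the `block` parameter are hypothetical, there is no causal mask, and plain lists stand in for GPU tiles. It only illustrates that keys and values can be visited block by block, with a running max and running sum, without materializing the full score row.

```python
import math


def naive_attention(q, k, v):
    """Reference: softmax(q k^T / sqrt(d)) v, computing each full score row."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * vj[c] for w, vj in zip(weights, v)) / total
            for c in range(len(v[0]))
        ])
    return out


def tiled_attention(q, k, v, block=2):
    """Same result, but keys/values are consumed in blocks (hypothetical sketch)."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        running_max = -math.inf   # max score seen so far
        running_sum = 0.0         # softmax denominator under running_max
        acc = [0.0] * len(v[0])   # unnormalized output under running_max
        for start in range(0, len(k), block):
            k_blk = k[start:start + block]
            v_blk = v[start:start + block]
            scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k_blk]
            new_max = max(running_max, max(scores))
            # Rescale everything accumulated under the old max.
            correction = math.exp(running_max - new_max)
            weights = [math.exp(s - new_max) for s in scores]
            running_sum = running_sum * correction + sum(weights)
            acc = [
                a * correction + sum(w * vj[c] for w, vj in zip(weights, v_blk))
                for c, a in enumerate(acc)
            ]
            running_max = new_max
        out.append([a / running_sum for a in acc])
    return out
```

A real kernel applies the same rescaling to tiles held in on-chip memory and parallelizes over query blocks; the sketch only shows why the blockwise result matches the full softmax.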

Alternatives

No response

Additional context

No response

helloworld1 · 2024-08-27T20:20:12Z

The FA provided by https://github.com/Dao-AILab/flash-attention supports only MI200 or MI300 GPUs. With Triton 3.0, the kernel can work on a much broader range of AMD GPUs. I tested the kernels on the AMD 7000 series and they work great.

thevasudevgupta · 2024-08-28T06:56:10Z

I implemented flash attention v1 in Triton as well. Feel free to copy/adapt from here: https://github.com/thevasudevgupta/gpt-triton/blob/6a12b71e4e332a2077e6b7f742f97c7160fe0242/kernels.py#L376 (my repo is MIT licensed!!)

I might work on the v2/v3 versions in the future. Will let you know when I finish.

unclemusclez · 2024-08-29T12:01:29Z

Working Navi 31 / 7900 / gfx1100 support: https://github.com/ROCm/flash-attention/tree/howiejay/navi_support

remi-or · 2024-09-26T21:43:07Z

To resolve this issue using a triton kernel, I opened this PR: #275
While FlexAttention is still available only in the nightly releases, this seems to be the only way to add FA monkey-patching without adding a dependency on pytorch-nightly.

ByronHsu changed the title ~~Implement Flash Attention in Triton to enable transformers to run with Flash Attention on AMD GPUs.~~ [AMD] Implement Flash Attention in Triton to enable transformers to run with Flash Attention on AMD GPUs. Aug 27, 2024

ByronHsu added AMD feature labels Aug 28, 2024
