Incorrect inline PTX device assembly code usage #766

Open
zhiweij1 opened this issue Oct 13, 2023 · 0 comments
Labels
bug Something isn't working

Branch/Tag/Commit

main

Docker Image Version

N/A

GPU name

N/A

CUDA Driver

N/A

Reproduced Steps

https://github.com/NVIDIA/FasterTransformer/blob/afdf9a9eb86f15363c0249117d166d6b45dbb371/src/fastertransformer/kernels/decoder_masked_multihead_attention/decoder_masked_multihead_attention_template.hpp#L643

A comma should be inserted between `"r"(a.x)` and `"r"(a.y)`.

Although nvcc compiles the current code, clang rejects it with an error: https://godbolt.org/z/jxd483a8j
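
For illustration, here is a minimal standalone sketch of the pattern (using a simple `add.u32` instruction and a hypothetical `add_halves` helper rather than the actual instruction sequence at the linked line). The buggy form omits the comma between the two input operands; the corrected form below is accepted by both nvcc and clang:

```cuda
// Hypothetical minimal reproducer, not the FasterTransformer code itself.
// Buggy operand list (missing comma) that nvcc tolerates but clang rejects:
//   asm volatile("add.u32 %0, %1, %2;\n" : "=r"(d) : "r"(a.x) "r"(a.y));
// Corrected operand list, with the input operands comma-separated:
__device__ uint32_t add_halves(uint2 a)
{
    uint32_t d;
    asm volatile("add.u32 %0, %1, %2;\n" : "=r"(d) : "r"(a.x), "r"(a.y));
    return d;
}
```

Inline PTX `asm` statements follow the GCC extended-asm format, whose grammar requires a comma between operands; nvcc merely happens to tolerate its absence here.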

I think it can be fixed in the FasterTransformer source code.
zhiweij1 added the bug label on Oct 13, 2023