softmax kernel #106
Conversation
@SimonDanisch @MikeInnes What was the conclusion regarding this PR?
I have a working, reasonably fast, but not very generic CUDA softmax in https://github.com/jekbradbury/Transformer.jl/blob/master/src/kernels.jl
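For readers skimming the thread, the computation such a kernel fuses is just the numerically stable softmax below. This is a minimal, unfused sketch in plain broadcast Julia (the `softmax` name and the columns-as-samples layout are our assumptions, not the kernel from kernels.jl; `dims=` is the 0.7/1.0 reduction syntax); since it only uses broadcast and dims-reductions, it should already run with both CuArrays and CLArrays:

```julia
# Minimal, unfused softmax sketch (an illustration, not the fused kernel
# from kernels.jl). Works columnwise: each column is one sample.
function softmax(xs::AbstractMatrix)
    m = maximum(xs, dims=1)   # per-column max, for numerical stability
    e = exp.(xs .- m)         # shifted exponentials
    e ./ sum(e, dims=1)       # normalize each column
end
```

The point of a hand-written kernel is to fuse these three passes (max, exp, sum) into one, keeping the two reductions in shared memory instead of round-tripping through global memory.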
Yeah, looks like it's relatively CUDA-specific. I wonder if it would be easier to port James's kernel to OpenCL than to write the OpenCL softmax kernel from scratch.
We can just port it to Julia in a way that works with both CLArrays and CuArrays. I already took a look at it - the only thing holding us back is that @jekbradbury used dynamic shared memory, which behaves a bit peculiarly compared to CuStaticSharedMem (which is also supported by CLArrays when you use the GPUArrays version). I had a stab at supporting dynamic shared memory in GPUArrays vendor-independently, but couldn't implement it in the time frame I set myself... In theory it's quite straightforward, and I should make a PR out of what I had ;)
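For context on the static/dynamic distinction being discussed: static shared memory is declared with a compile-time constant size, while dynamic shared memory gets its size at launch time. A minimal sketch, assuming CUDAnative's macro names of that era (`@cuStaticSharedMem` / `@cuDynamicSharedMem`), a fixed 256-thread block, and illustrative kernel names; exact launch syntax varies across CUDAnative versions:

```julia
using CUDAnative, CuArrays

# Static shared memory: the length is a compile-time constant, which is
# what a vendor-neutral abstraction (GPUArrays' equivalent of
# CuStaticSharedMem) can lower for both CUDA and OpenCL.
function blocksum_static!(out, xs)
    buf = @cuStaticSharedMem(Float32, 256)  # size fixed at compile time
    i = threadIdx().x
    buf[i] = xs[i]
    sync_threads()
    # tree reduction within the block
    stride = 128
    while stride >= 1
        if i <= stride
            buf[i] += buf[i + stride]
        end
        sync_threads()
        stride ÷= 2
    end
    i == 1 && (out[1] = buf[1])
    return
end

# Dynamic shared memory: the size is only supplied at launch time, which
# is the part GPUArrays did not yet abstract over vendor-independently.
function blocksum_dynamic!(out, xs)
    buf = @cuDynamicSharedMem(Float32, blockDim().x)
    i = threadIdx().x
    buf[i] = xs[i]
    sync_threads()
    stride = blockDim().x ÷ 2
    while stride >= 1
        if i <= stride
            buf[i] += buf[i + stride]
        end
        sync_threads()
        stride ÷= 2
    end
    i == 1 && (out[1] = buf[1])
    return
end

# Illustrative launches (keyword syntax from later CUDAnative releases):
# @cuda threads=256 blocksum_static!(out, xs)
# @cuda threads=256 shmem=256 * sizeof(Float32) blocksum_dynamic!(out, xs)
```

Note that the dynamic variant needs the extra `shmem` byte count threaded through the launch, which is exactly the piece a backend-agnostic `gpu_call` would have to expose.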
I don't know if there's any particular reason Marian-NMT used dynamic shared memory for this rather than static. (Also, this kernel contains a reasonably fast mapreducedim implementation for reductions over the inner dim, so it would be useful to include that separately if someone works on porting.)
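For anyone unfamiliar with the primitive being referenced: an inner-dim `mapreducedim` applies a function elementwise and reduces along the first (fastest-varying) dimension, which is exactly the shape of the two reductions a columnwise softmax needs (a max and then a sum per column). A CPU-side sketch with made-up sizes, showing the 0.6-era `mapreducedim` spelling used in the comment and its 1.0 equivalent:

```julia
xs = rand(Float32, 512, 64)   # 512 features × 64 samples (made-up sizes)

# Julia 0.6 spelling, as referenced above:
#   colmax = mapreducedim(identity, max, xs, 1)
# Julia 1.0 equivalent:
colmax = mapreduce(identity, max, xs, dims=1)   # size (1, 64): per-column max
```

A softmax kernel performs two of these per column, so a fast inner-dim reduction is worth extracting on its own, as suggested above.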
Force-pushed from 39e7783 to fef2421.