Implement mapreduce #561
base: master
Could these things maybe live in https://github.com/anicusan/AcceleratedKernels.jl in the future?
There is a dependency ordering issue: GPUArrays is the common infrastructure, and this would be the fallback implementation behind a common interface. So GPUArrays would need to take a dependency on something like AcceleratedKernels.jl.
Of course JLArrays doesn't work. That uses the CPU backend and this is […]
I was considering it as "leave it to AcceleratedKernels" to implement these. Well, it's a very young package, but I was wondering if it could be a path towards the future ;)
Just to write down my current understanding of the JLArray issue:
[…] is not valid for the CPU in KA right now due to the synchronization within a […]. GPU execution on all vendors should still work, and Arrays should have their own implementation somewhere else. It's just that the JLArray tests will fail for a bit here.
# reduce_items = launch_configuration(kernel)
reduce_items = 512
- # reduce_items = launch_configuration(kernel)
- reduce_items = 512
+ # reduce_items = compute_items(launch_configuration(kernel))
+ reduce_items = compute_items(512)
But also has to become dynamic, of course.
# we need multiple steps to cover all values to reduce
partial = similar(R, (size(R)..., reduce_groups))
if init === nothing
    # without an explicit initializer we need to copy from the output container
    partial .= R
end
reduce_kernel(f, op, init, Val(items), Rreduce, Rother, partial, A; ndrange)

GPUArrays.mapreducedim!(identity, op, R′, partial; init=init)
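For readers following along, the two-step shape of this hunk (each group writes one partial result, then a second reduction folds the partials) can be sketched on the CPU with plain Julia. This is only an illustration under stated assumptions: `two_pass_mapreduce` and its `groups` keyword are hypothetical names, not part of GPUArrays or KernelAbstractions, and `init` is assumed to be a neutral element for `op` because it is applied once per group.

```julia
# CPU-only sketch of the two-pass pattern; no GPU packages required.
# `two_pass_mapreduce` and `groups` are hypothetical, not GPUArrays API.
# Assumes `init` is neutral for `op` and has the same type as the results.
function two_pass_mapreduce(f, op, A::AbstractVector; init, groups::Integer = 4)
    n = length(A)
    chunk = cld(n, groups)            # elements handled by each "group"
    partial = fill(init, groups)      # stands in for the `partial` buffer
    # Pass 1: every group reduces its own contiguous chunk into one slot.
    for g in 1:groups
        lo = (g - 1) * chunk + 1
        hi = min(g * chunk, n)
        acc = init
        for i in lo:hi                # empty range when the group has no work
            acc = op(acc, f(A[i]))
        end
        partial[g] = acc
    end
    # Pass 2: fold the per-group results, like the second mapreducedim! call.
    return reduce(op, partial; init = init)
end
```

Calling `two_pass_mapreduce(abs2, +, collect(1:10); init = 0)` sums the squares of 1 through 10. The real kernel additionally handles multi-dimensional `R` and the `init === nothing` case, which this sketch leaves out.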
This may be a good time to add support for grid stride loops to KA.jl and handle this with a single kernel launch + atomic writes to global memory?
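To make the suggestion concrete, here is a rough CPU emulation of a grid-stride loop combined with a single atomic write per worker, using only `Base.Threads`. It is a sketch, not KA.jl code: `grid_stride_sum` and `workers` are hypothetical names, the reduction is fixed to integer `+` because that is what `atomic_add!` supports here, and how KA.jl would expose grid-stride loops is left open, as in the comment above.

```julia
using Base.Threads: Atomic, atomic_add!, @threads

# Hypothetical helper for illustration; assumes `f` returns an `Int`.
function grid_stride_sum(f, A::AbstractVector{<:Integer}; workers::Integer = 4)
    total = Atomic{Int}(0)        # stands in for the result in global memory
    n = length(A)
    @threads for w in 1:workers   # each iteration plays the role of one work-item
        acc = 0
        i = w
        while i <= n              # grid-stride loop: advance by the worker count
            acc += f(A[i])
            i += workers
        end
        atomic_add!(total, acc)   # one atomic write per worker, no second pass
    end
    return total[]
end
```

Compared with the `partial` buffer plus a second `mapreducedim!` call, this needs one launch and no temporary array, but it only works for operators that have an atomic counterpart.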
7348bba to f418d7a
4974a5e to 2314e24
Ported from oneAPI.jl