We currently maximize block utilization (taking the maximum number of threads per block), which may leave SMs underutilized. We should consider first selecting an optimal number of blocks, and only then maximizing the thread count:
```julia
config = launch_configuration(kernel.fun)
threads = min(length(ps), config.threads)
# XXX: this kernel performs much better with all blocks active
blocks = max(cld(length(ps), threads), config.blocks)
threads = cld(length(ps), blocks)
```
I'm sure this will make some kernels perform worse, but it's probably worth testing.
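To make the difference concrete, here is a minimal, GPU-free sketch comparing the current threads-first heuristic with the blocks-first heuristic from the snippet above. The function names `threads_first` and `blocks_first` are hypothetical, and `max_threads` / `min_blocks` stand in for `config.threads` / `config.blocks`. `config.blocks` is assumed here to be the grid size the occupancy API suggests for a fully occupied device. The numbers in the example are illustrative, not measurements.

```julia
# Hypothetical helpers for illustration only; `max_threads` and `min_blocks`
# stand in for `config.threads` and `config.blocks` from `launch_configuration`.

# Current strategy: fill each block with as many threads as allowed,
# then launch just enough blocks to cover all `n` elements.
function threads_first(n::Integer, max_threads::Integer)
    n > 0 || throw(ArgumentError("n must be positive"))
    threads = min(n, max_threads)
    blocks = cld(n, threads)
    return (; blocks, threads)
end

# Proposed strategy: launch at least `min_blocks` blocks (assumed to be the
# count needed to keep every SM busy), then spread the work over them,
# which may shrink the number of threads per block.
function blocks_first(n::Integer, max_threads::Integer, min_blocks::Integer)
    n > 0 || throw(ArgumentError("n must be positive"))
    threads = min(n, max_threads)
    blocks = max(cld(n, threads), min_blocks)
    threads = cld(n, blocks)
    return (; blocks, threads)
end

# Example: 10_000 elements, at most 1024 threads per block, 80 blocks suggested
println(threads_first(10_000, 1024))     # (blocks = 10, threads = 1024)
println(blocks_first(10_000, 1024, 80))  # (blocks = 80, threads = 125)
```

With threads-first, only 10 blocks are launched, so on a device that can host many more resident blocks most SMs sit idle. With blocks-first, all 80 blocks are active, but each has only 125 threads, i.e. three full 32-thread warps plus one partial warp of 29. Smaller, partially filled blocks like this are one plausible reason some kernels could get slower. In both cases `blocks * threads` can exceed `n`, so the kernel still needs its usual bounds check.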