Consider maximizing grid utilization #1321

maleadt · 2022-01-13T16:52:11Z

We currently maximize block utilization (by taking the maximum number of threads per block), which may leave SMs underutilized. We should consider first selecting an optimal number of blocks, and only then maximizing the thread count:

    config = launch_configuration(kernel.fun)
    threads = min(length(ps), config.threads)
    # XXX: this kernel performs much better with all blocks active
    blocks = max(cld(length(ps), threads), config.blocks)
    threads = cld(length(ps), blocks)

I'm sure this will make some kernels perform worse, but it's probably worth testing.
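To make the arithmetic in the snippet above easier to check, here is a minimal sketch that isolates it in a pure function. The helper name `grid_first_config` and its arguments `max_threads` and `min_blocks` are hypothetical; they stand in for `config.threads` and `config.blocks` as returned by `launch_configuration`, and the numbers in the example are made up rather than taken from a real device.

```julia
# Hypothetical helper mirroring the snippet above, with no GPU dependency.
# `n` is the number of elements to process (`length(ps)` above),
# `max_threads` stands in for `config.threads`,
# `min_blocks` stands in for `config.blocks`.
function grid_first_config(n::Integer, max_threads::Integer, min_blocks::Integer)
    # Assumption: callers skip the launch entirely for empty inputs.
    n > 0 || throw(ArgumentError("n must be positive"))

    # Start from the block-first choice: as many threads per block as possible.
    threads = min(n, max_threads)

    # Use at least `min_blocks` blocks so that every block slot is active.
    blocks = max(cld(n, threads), min_blocks)

    # Shrink the block size so that `blocks` blocks just cover `n` elements.
    threads = cld(n, blocks)

    return (threads = threads, blocks = blocks)
end

# Illustrative numbers only: 10_000 elements, 1024 threads max, 80 blocks suggested.
cfg = grid_first_config(10_000, 1024, 80)
println(cfg)  # (threads = 125, blocks = 80)
```

With these inputs the block-first choice would launch 10 blocks of 1024 threads, while the grid-first choice launches 80 blocks of 125 threads. Note that 125 is not a multiple of the 32-thread warp size, and a very small input such as `n = 50` ends up with one thread per block, which may be among the cases where kernels perform worse.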

maleadt · 2022-01-21T14:45:06Z

Attempt in JuliaGPU/GPUArrays.jl#389

maleadt added the good first issue (Good for newcomers) and performance (How fast can we go?) labels on Jan 13, 2022