[Cuda Codegen] Emit launch bounds #526

thetheodor · 2018-06-19T12:53:55Z

CUDA functions can be annotated with launch bounds, that is, the maximum
number of threads per block (the minimum number of blocks per
multiprocessor can also be specified). nvrtc/nvcc use this information
during register allocation (and probably in other phases as well).
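
For illustration only, here is a minimal sketch of what emitting such an annotation could look like. The helper name `kernelPrefix` and its parameters are hypothetical and are not taken from this PR; the emitted text follows the `__global__ void __launch_bounds__(maxThreadsPerBlock, minBlocksPerMultiprocessor)` form.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (not the code from this PR): formats the start of a
// kernel declaration carrying a __launch_bounds__ annotation.
// minBlocksPerSM == 0 means "omit the optional second argument".
std::string kernelPrefix(const std::string& name,
                         unsigned maxThreadsPerBlock,
                         unsigned minBlocksPerSM = 0) {
  std::ostringstream ss;
  ss << "__global__ void __launch_bounds__(" << maxThreadsPerBlock;
  if (minBlocksPerSM > 0) {
    ss << ", " << minBlocksPerSM;
  }
  ss << ") " << name;
  return ss.str();
}
```

With a 32x8 thread mapping, `kernelPrefix("kernel", 256)` would produce the text `__global__ void __launch_bounds__(256) kernel`, to be followed by the parameter list and body.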

ftynse · 2018-06-19T15:09:15Z

Did you check what happens if somebody manually maps to `.mapToThreads(32,0,0)`?

thetheodor · 2018-06-20T07:12:02Z

Fixed.

skimo-openhub

I'm just putting a block on this because it fails for me with

[ RUN      ] TensorDot_32_512_8_2_28_28.BaseCorrect
unknown file: Failure
C++ exception with description "Error at: /home/skimo/git/c2isl/tc/core/cuda/cuda_rtc.cc:188: CUDA_ERROR_INVALID_VALUE" thrown in the test body.
[  FAILED  ] TensorDot_32_512_8_2_28_28.BaseCorrect (540 ms)

skimo-openhub · 2018-06-20T07:30:42Z

On Tue, Jun 19, 2018 at 08:09:18AM -0700, ftynse wrote:
> did you check what happens if somebody manually maps to `.mapToThreads(32,0,0)` ?

Does that make any sense? Surely, the kernel is not going to run at all in that case, so why bother with special cases for this situation?

skimo

thetheodor · 2018-06-20T07:32:10Z

Oh, this test is failing for me as well. However, if I dump the generated CUDA code and compile it with nvcc, I see no error.

skimo-openhub · 2018-06-20T07:42:25Z

tc/core/polyhedral/cuda/codegen.cc

+  auto b1 = block.view[1];
+  b1 = b1 == 0 ? 1 : b1;
+  auto b2 = block.view[2];
+  b1 = b2 == 0 ? 1 : b2;


The last assignment should be to b2 instead of b1.
However, I would suggest you remove this special handling of 0.
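
For reference, a minimal sketch of the clamping with the assignment target fixed. The `BlockSizes` alias and both function names are hypothetical stand-ins for `block.view` and the surrounding code, and clamping all three dimensions before multiplying is an assumption about how the bound is computed, not something shown in the diff.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for block.view: the three block sizes, where a 0
// entry is assumed to mark a dimension that was not mapped.
using BlockSizes = std::array<std::size_t, 3>;

// Treat a 0 entry as 1 so that it does not zero out the product.
inline std::size_t clampToOne(std::size_t n) {
  return n == 0 ? 1 : n;
}

// Thread count to pass to __launch_bounds__: the product of the clamped
// sizes. Each dimension is clamped into its own value, which is what the
// b1/b2 mix-up in the quoted lines gets wrong.
inline std::size_t launchBoundThreads(const BlockSizes& view) {
  return clampToOne(view[0]) * clampToOne(view[1]) * clampToOne(view[2]);
}
```

For `.mapToThreads(32,0,0)` this gives 32. With the lines as quoted, b2 would keep its value 0, so any product that includes b2 would come out as 0. If the special handling of 0 is removed as suggested, the helper reduces to a plain product.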

nicolasvasilache

trying to unsubscribe, don't see a way other than approving

thetheodor requested review from ftynse, nicolasvasilache and skimo-openhub June 19, 2018 12:53

facebook-github-bot added the CLA Signed label Jun 19, 2018

thetheodor force-pushed the launch_bounds branch from 4ec077e to f6a78dc Compare June 20, 2018 07:11

skimo-openhub suggested changes Jun 20, 2018

skimo-openhub reviewed Jun 20, 2018

nicolasvasilache approved these changes Dec 11, 2020
