Intel® Inspector Help
Occurs when the total size of the kernel arguments exceeds the number of registers available for argument storage.
ID | Code Location | Description |
---|---|---|
1 | Allocation site | Represents the source location where arguments are passed from the host to a kernel. |
By default, kernel arguments are promoted to registers if possible. The share of registers available for kernel arguments varies from device to device (usually 25-50%). Arguments that fall outside this limit are cut off, which results in logical errors in the kernel.
In OpenCL™, arguments are specified explicitly using the clSetKernelArg function. In this case, the problem is more obvious and may appear if a large number of arguments or wide structures are used.
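Because each clSetKernelArg call takes the argument size in bytes (its arg_size parameter), the total can be estimated on the host before the kernel is launched. The following is a minimal sketch of that bookkeeping in plain C++17, not part of Intel® Inspector or OpenCL. The names kAssumedArgLimitBytes, total_arg_bytes, fits_arg_limit, and WideParams are hypothetical, and the 1024-byte limit is an assumption chosen for illustration; the real limit depends on the device.

```cpp
#include <cstddef>

// Hypothetical limit for illustration only; the real value is device-specific.
constexpr std::size_t kAssumedArgLimitBytes = 1024;

// Total number of bytes occupied by a list of argument types, that is,
// the sum of the sizes that would be passed to clSetKernelArg.
template <typename... Args>
constexpr std::size_t total_arg_bytes() {
    return (std::size_t{0} + ... + sizeof(Args));
}

// True if the argument list fits into the assumed limit.
template <typename... Args>
constexpr bool fits_arg_limit() {
    return total_arg_bytes<Args...>() <= kAssumedArgLimitBytes;
}

// Example of a wide structure passed by value: 256 doubles x 8 bytes = 2048 bytes.
struct WideParams {
    double coefficients[256];
};
```

With these definitions, fits_arg_limit<int, float>() holds, while fits_arg_limit<WideParams, int>() does not, because the structure alone is larger than the assumed limit. The check can also be placed in a static_assert so that an oversized argument list fails at compile time.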
In Data Parallel C++ (DPC++), arguments may be specified implicitly through the lambda capture list. If the list is specified as [=] or as [&], every variable used in the lambda body is captured. In this case, a large number of arguments, wide structures, or classes may be passed to a kernel accidentally.
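The effect of an implicit capture can be seen in ordinary host C++, where the size of a closure grows with everything it captures by value. The following is a minimal sketch under that assumption; it uses no SYCL API, the function names by_value_closure_bytes and by_pointer_closure_bytes are hypothetical, and the host closure size is only a proxy for what a device compiler passes as kernel arguments.

```cpp
#include <cstddef>

constexpr int N = 1000;

struct Data {
    double numbers[N];  // 1000 doubles
};

// Size of a closure whose [=] capture silently copies the whole structure.
inline std::size_t by_value_closure_bytes() {
    Data data{};
    auto by_value = [=](int i) { return data.numbers[i]; };
    return sizeof(by_value);  // at least sizeof(Data)
}

// Size of a closure that captures only a pointer to the same data.
inline std::size_t by_pointer_closure_bytes() {
    Data data{};
    const double* numbers = data.numbers;
    auto by_pointer = [=](int i) { return numbers[i]; };
    return sizeof(by_pointer);  // roughly the size of one pointer
}
```

The first closure is at least as large as Data, while the second holds a single pointer. The pointer variant is shown for size comparison only: in a real kernel, the pointer would have to refer to memory that the device can access.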
In this diagnostic, Intel® Inspector displays the total size of the kernel arguments (in bytes) versus the device limit.
DPC++ Example
    const int N = 1000;

    struct Data {
        double numbers[N];
    };

    Data data;

    queue.submit([&](cl::sycl::handler &cgh) {
        cgh.parallel_for<class my_task>(cl::sycl::range<1> { N }, [=](cl::sycl::id<1> idx) {
            deviceData[0] += data.numbers[idx]; // Implicit usage of Data structure from host
        });
    });
    queue.wait();

    // The structure contains 1000 doubles x 8 bytes = 8000 bytes of data > available kernel arguments limit.
To avoid the problem, use the following hints: