A number of people on this forum have reported register spilling issues with CUDA 7.5 and have dutifully filed bugs.
It’s a real pain to deal with. Hopefully an updated Toolkit gets released soon.
Until then, here are a couple of observations and possible workarounds…
One workaround that works some of the time is to give a default value to any variable that is declared but might not be assigned on every path.
I’ve made this change and in some instances the spills went away.
There are two cases where I’ve seen this problem frequently appear:
- Initializing a variable with 1 thread and then broadcasting it to the rest of the warp:
    int x; // <--- if not initialized sometimes results in spills
    if (warp_lane_is_first())
        x = atomicAdd(y,z); // or some other operation
    x = __shfl(x,0); // broadcast lane 0 to rest of warp
This bug has been around a long time.
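For comparison, here is a minimal, self-contained sketch of this first case with the workaround applied. The names `reserve_kernel`, `run_broadcast_demo` and `WARP_BROADCAST` are made up for illustration, and `warp_lane_is_first()` is only a stand-in for the helper above, assuming a 1-D block. The macro is there because CUDA 9 and later provide `__shfl_sync()` and deprecate `__shfl()`. Whether the uninitialized version actually spills depends on the surrounding kernel, so treat this as a shape to compare against, not a repro.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// __shfl() is the CUDA 7.5-era intrinsic; newer toolkits expect __shfl_sync().
#if defined(CUDART_VERSION) && CUDART_VERSION >= 9000
#define WARP_BROADCAST(v, lane) __shfl_sync(0xffffffffu, (v), (lane))
#else
#define WARP_BROADCAST(v, lane) __shfl((v), (lane))
#endif

// Stand-in for the post's warp_lane_is_first(); assumes a 1-D block.
__device__ __forceinline__ bool warp_lane_is_first()
{
    return (threadIdx.x & 31) == 0;
}

// Lane 0 reserves `count` slots from a global counter, then every lane
// receives the base of that reservation.
__global__ void reserve_kernel(int* counter, int count, int* base_out)
{
    int x = 0; // workaround: default value for the lanes that skip the branch
    if (warp_lane_is_first())
        x = atomicAdd(counter, count);
    x = WARP_BROADCAST(x, 0); // broadcast lane 0 to rest of warp
    base_out[threadIdx.x] = x;
}

// Launches a single warp and checks that all 32 lanes got lane 0's value.
bool run_broadcast_demo()
{
    const int start = 100, count = 32;
    int  host_out[32];
    int  host_counter = start;
    int* counter = 0;
    int* out     = 0;

    if (cudaMalloc(&counter, sizeof(int)) != cudaSuccess) return false;
    if (cudaMalloc(&out, sizeof(host_out)) != cudaSuccess) return false;
    cudaMemcpy(counter, &host_counter, sizeof(int), cudaMemcpyHostToDevice);

    reserve_kernel<<<1, 32>>>(counter, count, out);

    bool ok = cudaMemcpy(host_out, out, sizeof(host_out),
                         cudaMemcpyDeviceToHost) == cudaSuccess;
    ok = ok && cudaMemcpy(&host_counter, counter, sizeof(int),
                          cudaMemcpyDeviceToHost) == cudaSuccess;
    for (int i = 0; i < 32 && ok; ++i)
        ok = (host_out[i] == start);             // old counter value in every lane
    ok = ok && (host_counter == start + count);  // exactly one atomicAdd happened

    cudaFree(counter);
    cudaFree(out);
    return ok;
}

int main()
{
    printf("broadcast demo %s\n", run_broadcast_demo() ? "passed" : "FAILED");
    return 0;
}
```

Building both variants (with and without the `= 0`) with `-Xptxas -v` is a quick way to see whether ptxas reports any spill stores or loads for your own kernel.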
- Declaring several variables, initializing the first N out of M with loads from global or shared memory and then jumping to process the first N:
    while (true)
    {
        int a,b,c,d,e,f,g,h; // <--- if not initialized on SM_5x + 7.5 RC results in spills
        a = mem_ptr[offset+WARP_SIZE*0];
        if (rem == 1)
            goto process;
        b = mem_ptr[offset+WARP_SIZE*1];
        if (rem == 2)
            goto process;
        ...
        g = mem_ptr[offset+WARP_SIZE*6];
        if (rem == 7)
            goto process;
        h = mem_ptr[offset+WARP_SIZE*7];
    process:
        // do something with a
        if (rem == 1)
            break;
        ...
        // do something with h
        if (rem == 8)
            break;
        offset += 8 * WARP_SIZE;
        rem -= 8;
    }
This Duff-like idiom is pretty common and, if properly implemented, doesn’t result in compiler warnings. Note that a switch statement generates similar SASS. [ When are we getting indirect branches? :) ]
Yet on Maxwell + 7.5 RC I’m seeing spills unless the declared variables are initialized. Although initializing them in this second case squelches the spills, the SASS then shows unnecessary initializations.
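Since the switch form was mentioned, here is a rough sketch of what that variant could look like with the default initializations in place. It is cut down to four registers, and `sum_rows_kernel`, `run_switch_demo` and the `WARP_SIZE` value of 32 are assumptions for illustration only. As in the goto version, a register that wasn’t loaded is never read, so the `= 0` is there purely as the workaround.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define WARP_SIZE 32 // assumed value of the WARP_SIZE constant used above

// Hypothetical kernel: one warp, each lane sums its column over `rows` rows.
__global__ void sum_rows_kernel(const int* mem_ptr, int rows, int* lane_sums)
{
    int offset = threadIdx.x;
    int rem    = rows;
    int sum    = 0;

    while (rem > 0)
    {
        int a = 0, b = 0, c = 0, d = 0; // workaround: defaults, even though
                                        // unloaded registers are never read
        const int n = (rem < 4) ? rem : 4;

        switch (n) // load the first n rows; cases fall through on purpose
        {
        case 4: d = mem_ptr[offset + WARP_SIZE * 3];
        case 3: c = mem_ptr[offset + WARP_SIZE * 2];
        case 2: b = mem_ptr[offset + WARP_SIZE * 1];
        case 1: a = mem_ptr[offset + WARP_SIZE * 0];
        }

        switch (n) // process only what was loaded
        {
        case 4: sum += d;
        case 3: sum += c;
        case 2: sum += b;
        case 1: sum += a;
        }

        offset += 4 * WARP_SIZE;
        rem    -= 4;
    }
    lane_sums[threadIdx.x] = sum;
}

// Runs the kernel on 6 rows (not a multiple of 4, so the tail path is hit)
// and compares each lane's sum against a host-side reference.
bool run_switch_demo()
{
    const int rows = 6;
    const int n    = rows * WARP_SIZE;
    int  host_in[n];
    int  host_sums[WARP_SIZE];
    int* dev_in   = 0;
    int* dev_sums = 0;

    for (int i = 0; i < n; ++i)
        host_in[i] = i;

    if (cudaMalloc(&dev_in, sizeof(host_in)) != cudaSuccess) return false;
    if (cudaMalloc(&dev_sums, sizeof(host_sums)) != cudaSuccess) return false;
    cudaMemcpy(dev_in, host_in, sizeof(host_in), cudaMemcpyHostToDevice);

    sum_rows_kernel<<<1, WARP_SIZE>>>(dev_in, rows, dev_sums);

    bool ok = cudaMemcpy(host_sums, dev_sums, sizeof(host_sums),
                         cudaMemcpyDeviceToHost) == cudaSuccess;
    for (int lane = 0; lane < WARP_SIZE && ok; ++lane)
    {
        int expected = 0;
        for (int r = 0; r < rows; ++r)
            expected += host_in[lane + WARP_SIZE * r];
        ok = (host_sums[lane] == expected);
    }

    cudaFree(dev_in);
    cudaFree(dev_sums);
    return ok;
}

int main()
{
    printf("switch demo %s\n", run_switch_demo() ? "passed" : "FAILED");
    return 0;
}
```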
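- Separately, if you’re using ‘pointer + constant offset’ addressing and you’re compiling for a 64-bit target, double-check the SASS and verify that the constant offsets show up as immediates in the address operands.
If they don’t, make sure you’re using a signed integer offset; otherwise you’ll see additional register pressure. An unsigned 32-bit offset has to wrap on overflow, which can stop the compiler from folding the constant into the 64-bit address.
You should be seeing SASS like this: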
    LDG.E.64 R36, [R28+0x100];
    LDG.E.64 R44, [R28+0x300];
    LDG.E.64 R42, [R28+0x200];
    LDG.E.64 R52, [R28+0x400];
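To check this on your own toolchain, a small pair of kernels like the following can be compiled and dumped side by side. This is only a sketch: `gather_signed`, `gather_unsigned` and `run_offset_demo` are hypothetical names, and the element offsets (32, 64, 96, 128 doubles, i.e. 0x100 to 0x400 bytes) were picked to resemble the listing above. Both kernels compute the same result; the interesting part is whether the SASS for each one keeps the constants as address immediates, which may vary with compiler version and target.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Signed index: signed overflow is undefined, so the compiler is free to
// fold the constants into the 64-bit address.
__global__ void gather_signed(const double* in, double* out, int idx_base)
{
    int i = idx_base + (int)threadIdx.x;
    out[threadIdx.x] = in[i + 32] + in[i + 64] + in[i + 96] + in[i + 128];
}

// Unsigned index: i + 32u is defined to wrap modulo 2^32, so each sum may
// have to be formed in a 32-bit register before it is widened.
__global__ void gather_unsigned(const double* in, double* out, unsigned idx_base)
{
    unsigned i = idx_base + threadIdx.x;
    out[threadIdx.x] = in[i + 32u] + in[i + 64u] + in[i + 96u] + in[i + 128u];
}

// Runs both kernels on the same input and checks them against a host reference.
bool run_offset_demo()
{
    const int n = 256, threads = 32;
    double  host_in[n], host_s[threads], host_u[threads];
    double* dev_in = 0;
    double* dev_s  = 0;
    double* dev_u  = 0;

    for (int k = 0; k < n; ++k)
        host_in[k] = (double)k;

    if (cudaMalloc(&dev_in, sizeof(host_in)) != cudaSuccess) return false;
    if (cudaMalloc(&dev_s, sizeof(host_s)) != cudaSuccess) return false;
    if (cudaMalloc(&dev_u, sizeof(host_u)) != cudaSuccess) return false;
    cudaMemcpy(dev_in, host_in, sizeof(host_in), cudaMemcpyHostToDevice);

    gather_signed<<<1, threads>>>(dev_in, dev_s, 0);
    gather_unsigned<<<1, threads>>>(dev_in, dev_u, 0u);

    bool ok = cudaMemcpy(host_s, dev_s, sizeof(host_s),
                         cudaMemcpyDeviceToHost) == cudaSuccess;
    ok = ok && cudaMemcpy(host_u, dev_u, sizeof(host_u),
                          cudaMemcpyDeviceToHost) == cudaSuccess;
    for (int t = 0; t < threads && ok; ++t)
    {
        // Small integers stored as doubles, so exact comparison is safe here.
        const double expected = host_in[t + 32] + host_in[t + 64]
                              + host_in[t + 96] + host_in[t + 128];
        ok = (host_s[t] == expected) && (host_u[t] == expected);
    }

    cudaFree(dev_in);
    cudaFree(dev_s);
    cudaFree(dev_u);
    return ok;
}

int main()
{
    printf("offset demo %s\n", run_offset_demo() ? "passed" : "FAILED");
    return 0;
}
```

Assuming the file is saved as `offsets.cu` (an arbitrary name), something like `nvcc -arch=sm_52 -Xptxas -v offsets.cu -o offsets` followed by `cuobjdump -sass offsets` shows the per-kernel register and spill counts and the generated loads. Swap in whatever `-arch` matches your GPU.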
Be safe out there. :)