CUDA 4.1 broke my kernel (it won't launch)
Hi,

My kernel stopped working after I updated CUDA from 4.0 to 4.1.

The original block dimension is 1024 threads. It works correctly under 4.0 but fails under 4.1 (the kernel is never executed by the GPU).
I checked the register usage and each thread needs 46 registers.
I'm using a GTX 580 as the platform, and there are 32768 registers available per block on the GTX 580.

When I use 512 threads per block instead, my kernel works correctly under 4.1.
So it is evidently a resource shortage that makes my kernel fail.

My questions are:
1- Why doesn't the register shortage problem show up under 4.0?
2- Based on my knowledge, the GPU will use global memory instead of registers if register usage is too high. Am I wrong?

Thanks in advance~

-----
I'm using:
GTX 580
driver: ver. 285.05.33 (64bit)
Toolkit: 4.1
OS: Linux Ubuntu 10.10 (kernel 2.6.35-30-generic)

<<this question is also posted on the developer forum>>

#1
Posted 02/07/2012 08:45 AM   
Use a __launch_bounds__ directive to tell the compiler about the intended execution configuration (so it can reduce register usage as appropriate). Check appendix B.18 of the Programming Guide.
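For illustration, a minimal sketch of such a kernel (the name and body are made up; only the attribute matters here):

```cuda
// Sketch only: __launch_bounds__(1024) tells ptxas this kernel will be
// launched with at most 1024 threads per block, so it must cap register
// usage accordingly, spilling to local memory if necessary.
__global__ void
__launch_bounds__(1024)   // optional 2nd argument: minBlocksPerMultiprocessor
myKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= 2.0f;
}
```

On a GTX 580 (32768 registers per block), a 1024-thread bound caps the kernel at 32768 / 1024 = 32 registers per thread; a kernel that would otherwise use 46 registers then spills the excess and becomes launchable.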

Always check return codes of CUDA calls for errors. Do not use __syncthreads() in conditional code unless the condition is guaranteed to evaluate identically for all threads of each block. Run your program under cuda-memcheck to detect stray memory accesses. If your kernel dies for larger problem sizes, it might exceed the runtime limit and trigger the watchdog timer.
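Checking the launch status makes this kind of failure visible instead of silent. A minimal fragment (kernel name and launch configuration are illustrative; assumes the usual runtime headers):

```cuda
// After a launch, query the error state: a block too large for the
// register file is reported as cudaErrorLaunchOutOfResources.
myKernel<<<numBlocks, 1024>>>(d_data, n);

cudaError_t err = cudaGetLastError();   // launch-configuration errors
if (err != cudaSuccess)
    fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));

err = cudaDeviceSynchronize();          // errors during kernel execution
if (err != cudaSuccess)
    fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
```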

#2
Posted 02/07/2012 12:52 PM   
As tera points out, programmers can control register usage via the __launch_bounds__ attribute, so give that a try. In any event, running a single thread block of 1024 threads per SM may not give the best performance, in particular if the code uses __syncthreads().

How significant is the increase in register use between CUDA 4.0 and CUDA 4.1? If it is a fairly large factor, it would probably make sense to file a bug so the compiler team can investigate. I have not personally encountered significant increases in register pressure between CUDA 4.0 and CUDA 4.1 (in fact, I have seen reductions in many cases), so I don't have any insights as to what the trigger might be.
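One way to confirm the per-block register limit on a given card is to query it at runtime (standard CUDA runtime API; sketch only):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);   // device 0
    // On a GTX 580 (compute capability 2.0) regsPerBlock is 32768.
    printf("%s: %d registers per block, %d max threads per block\n",
           prop.name, prop.regsPerBlock, prop.maxThreadsPerBlock);
    return 0;
}
```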

#3
Posted 02/07/2012 06:46 PM   