The latest profiler (nvvp 7.5) showed me hot spots caused by memory dependency. After checking the assembly code, I found that “LDL” instructions were used in some of the register variable operations.
I then compiled with nvcc using the options -Xptxas -v,-abi=no to print the local memory info, and got the following report:
ptxas warning : 'option -abi=no' might get deprecated in future
ptxas info : 0 bytes gmem, 18704 bytes cmem[2]
ptxas info : Compiling entry function '_Z13mcx_main_loopPhPfS0_PjP6float4S3_S3_S0_S1_S0_S0_S0_S0_' for 'sm_20'
ptxas info : Used 59 registers, 136 bytes cmem[0], 64 bytes cmem[16], 96 bytes lmem
Using only -Xptxas -v, I also see that there is no register spilling:
ptxas info : 0 bytes gmem, 18704 bytes cmem[2]
ptxas info : Compiling entry function '_Z13mcx_main_loopPhPfS0_PjP6float4S3_S3_S0_S1_S0_S0_S0_S0_' for 'sm_20'
ptxas info : Function properties for _Z13mcx_main_loopPhPfS0_PjP6float4S3_S3_S0_S1_S0_S0_S0_S0_
96 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 61 registers, 136 bytes cmem[0], 60 bytes cmem[16]
If I compile it for sm_52, the register count increases to 64.
From what I read online, the maximum registers per thread for sm_20 (63) and sm_52 (255) are higher than what my kernel uses in either case.
My question, then, is: why does nvcc use lmem to store some of the register variables in my kernel (hence the LDL instructions)? How can I find more details about this?
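In case it helps anyone reproduce this, here is a minimal sketch of how I would expect to cross-check the ptxas numbers at run time with cudaFuncGetAttributes, which reports the per-thread register count and local memory size of a compiled kernel. The kernel demo_kernel below is only a hypothetical stand-in, not my actual mcx_main_loop; the idea is to substitute the real kernel symbol.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in kernel (assumption: replace with the real kernel symbol).
__global__ void demo_kernel(float *out)
{
    out[threadIdx.x] = 2.0f * threadIdx.x;
}

int main()
{
    // Ask the runtime for the resource usage of the compiled kernel.
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, demo_kernel);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaFuncGetAttributes failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    // numRegs should match the "Used N registers" line from ptxas;
    // localSizeBytes is the per-thread local memory (lmem / stack frame).
    printf("registers per thread    : %d\n", attr.numRegs);
    printf("local memory per thread : %zu bytes\n", attr.localSizeBytes);
    printf("constant memory         : %zu bytes\n", attr.constSizeBytes);
    printf("max threads per block   : %d\n", attr.maxThreadsPerBlock);
    return 0;
}
```

This only confirms how much local memory the kernel uses; it does not tell me which variables were placed there, which is the part I am still trying to find out.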