If so, does that mean that with blockDim.x = 55 and 100 registers per thread, I will need 64 (NOT 55) * 100 = 6400 registers per block?
Thanks!
Add --ptxas-options -v when compiling with nvcc, and it will print the detailed register usage (the number of registers used per thread for each kernel). You can then work out the register allocation for a thread block with 55 threads where each thread uses 100 registers.
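As a rough sketch of the arithmetic in the question: the helpers below round the block size up to whole warps and multiply by the per-thread register count. The function names are made up for illustration, and the warp size of 32 is an assumption that holds on current NVIDIA GPUs. This is only a simplified estimate; the hardware may also round the per-warp allocation up to an architecture-specific granularity, so the real figure can be somewhat higher.

```cpp
#include <cassert>

// Assumption: 32 threads per warp (true for current NVIDIA GPUs).
constexpr int kWarpSize = 32;

// Hypothetical helper: number of warps needed for a block,
// rounding a partial warp up to a full one.
int warpsPerBlock(int threadsPerBlock) {
    assert(threadsPerBlock > 0);
    return (threadsPerBlock + kWarpSize - 1) / kWarpSize;
}

// Hypothetical helper: simplified register estimate for one block.
// Registers are counted for every lane of every warp, including
// lanes with no active thread. Ignores any architecture-specific
// allocation granularity.
int registersPerBlock(int threadsPerBlock, int regsPerThread) {
    assert(regsPerThread > 0);
    return warpsPerBlock(threadsPerBlock) * kWarpSize * regsPerThread;
}
```

With these definitions, registersPerBlock(55, 100) comes out to 2 warps * 32 lanes * 100 registers = 6400, the same as for a block of 64 threads.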