cudaErrorLaunchOutOfResources aka "too many resources requested for launch"
Greetings,
I am getting error 7 (cudaErrorLaunchOutOfResources or "too many resources requested for launch") for the following config:
GeForce GT 540M with 1GB.
Compute Capability 2.1 (so 1024 threads per block should be possible; I have other kernels working just fine)
block {1024, 1, 1}
grid {1, 1, 1}
No shared memory
No texture
Arguments to the kernel: (uint32_t, uint32_t, uint32_t, float2*, float2*)
Locals: 2 uint32_t, 4 float2, 1 float
How large is the register file anyway?
It runs fine when I drop the block.x to 512 and double the grid.x.
Many thanks in advance.

#1
Posted 07/29/2013 12:17 AM   
Add the command-line flag -Xptxas -v to the nvcc invocation to have ptxas report how many registers the kernel uses.

Note that simply multiplying the reported register count by the thread count can underestimate the total register usage, because registers are allocated with an architecture-specific granularity. The occupancy calculator that ships with CUDA takes this granularity into account.

#2
Posted 07/29/2013 12:26 AM   
From compilation:
ptxas info : Used 49 registers, 64 bytes cmem[0], 8 bytes cmem[14]

From deviceQuery:
Total number of registers available per block: 32768

49 x 1024 = 50176 > 32768, so: goto resize block :-)

Thank you very much njuffa!
BTW, which manual describes the compiler options?

#3
Posted 07/29/2013 12:48 AM   
The nvcc options are documented in CUDA_Compiler_Driver_NVCC.pdf (in the doc/ directory of the CUDA toolkit).

#4
Posted 07/29/2013 03:15 PM   