Always check return codes of CUDA calls for errors. Do not use __syncthreads() in conditional code unless the condition is guaranteed to evaluate identically for all threads of each block. Run your program under cuda-memcheck to detect stray memory accesses. If your kernel dies for larger problem sizes, it might exceed the runtime limit and trigger the watchdog timer.
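A minimal sketch of the first two points, assuming a hypothetical `CUDA_CHECK` macro and a toy kernel `reverseInBlock` (neither comes from the original question). The macro wraps every runtime call, and the kernel keeps `__syncthreads()` outside the bounds check so every thread of the block reaches it:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: abort with file/line if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s at %s:%d\n",              \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

#define BLOCK 256  // assumed block size for this sketch

// Toy kernel: reverses the elements handled by each block.
__global__ void reverseInBlock(float *data, int n)
{
    __shared__ float tile[BLOCK];
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    if (i < n)                      // only the load is conditional
        tile[threadIdx.x] = data[i];

    __syncthreads();                // outside the branch: all threads reach it

    int j   = blockDim.x - 1 - threadIdx.x;
    int src = blockIdx.x * blockDim.x + j;
    if (i < n && src < n)
        data[i] = tile[j];
}

int main()
{
    const int n = 1 << 20;          // multiple of BLOCK, so every tile is full
    float *d_data = nullptr;

    CUDA_CHECK(cudaMalloc(&d_data, n * sizeof(float)));
    CUDA_CHECK(cudaMemset(d_data, 0, n * sizeof(float)));

    reverseInBlock<<<n / BLOCK, BLOCK>>>(d_data, n);
    CUDA_CHECK(cudaGetLastError());       // reports launch errors
    CUDA_CHECK(cudaDeviceSynchronize());  // reports errors raised while the kernel ran

    CUDA_CHECK(cudaFree(d_data));
    return 0;
}
```

A kernel launch returns no status itself, which is why the sketch checks `cudaGetLastError()` right after the launch and then the result of `cudaDeviceSynchronize()`. A kernel killed by the watchdog timer would be reported at that second check.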
The only way static memory might not be large enough is if the static allocation plus the dynamic allocation made at kernel launch exceeds the shared memory per SM.
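To see how close a kernel is to that limit, something like the following could be used. It is only a sketch: the kernel `useShared` and its buffer sizes are made up for illustration, and device 0 is assumed. It adds the kernel's static shared memory (from `cudaFuncGetAttributes`) to the dynamic amount passed at launch and prints the result next to the limits the device reports:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel using both static and dynamic shared memory.
__global__ void useShared(float *out)
{
    __shared__ float fixedBuf[1024];     // static allocation: 4 KB
    extern __shared__ float dynBuf[];    // dynamic allocation: sized at launch

    fixedBuf[threadIdx.x] = (float)threadIdx.x;
    dynBuf[threadIdx.x]   = fixedBuf[threadIdx.x];
    __syncthreads();
    out[threadIdx.x] = dynBuf[threadIdx.x];
}

int main()
{
    const int threads = 256;
    const size_t dynBytes = threads * sizeof(float);  // dynamic shared memory request

    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return 1;  // device 0 assumed

    cudaFuncAttributes attr;
    if (cudaFuncGetAttributes(&attr, useShared) != cudaSuccess) return 1;

    size_t total = attr.sharedSizeBytes + dynBytes;   // static + dynamic
    printf("static %zu B + dynamic %zu B = %zu B\n",
           attr.sharedSizeBytes, dynBytes, total);
    printf("limit per block: %zu B, per SM: %zu B\n",
           prop.sharedMemPerBlock, prop.sharedMemPerMultiprocessor);

    if (total > prop.sharedMemPerBlock) {
        printf("request exceeds the per-block limit; the launch would fail\n");
        return 1;
    }

    float *d_out = nullptr;
    if (cudaMalloc(&d_out, threads * sizeof(float)) != cudaSuccess) return 1;

    useShared<<<1, threads, dynBytes>>>(d_out);       // third launch argument = dynamic bytes
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        printf("kernel failed: %s\n", cudaGetErrorString(err));

    cudaFree(d_out);
    return err == cudaSuccess ? 0 : 1;
}
```

The same static figure is also printed at compile time if you pass `-Xptxas -v` to nvcc.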