Always check the return codes of CUDA calls for errors. Do not use __syncthreads() in conditional code unless the condition is guaranteed to evaluate identically for all threads of each block. Run your program under cuda-memcheck to detect stray memory accesses. If your kernel dies for larger problem sizes, it might be exceeding the runtime limit and triggering the watchdog timer, which applies when the GPU is also driving a display.
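To make the first two points concrete, here is a minimal sketch. The `CUDA_CHECK` macro and the `blockSum` kernel are hypothetical names made up for illustration and are not part of the CUDA toolkit. The sketch assumes a block size of 256 threads (a power of two) and simply aborts on the first error, which may not be what a real application wants.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro (not provided by CUDA): wraps a runtime call,
// prints the error string with file and line, and exits on failure.
#define CUDA_CHECK(call)                                          \
    do {                                                          \
        cudaError_t err_ = (call);                                \
        if (err_ != cudaSuccess) {                                \
            std::fprintf(stderr, "CUDA error at %s:%d: %s\n",     \
                         __FILE__, __LINE__,                      \
                         cudaGetErrorString(err_));               \
            std::exit(EXIT_FAILURE);                              \
        }                                                         \
    } while (0)

// Hypothetical kernel: per-block sum of `in`, one result per block in `out`.
// Assumes blockDim.x is a power of two and at most 256.
__global__ void blockSum(const float *in, float *out, int n)
{
    __shared__ float tile[256];
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // Guard the load, not the barrier: out-of-range threads contribute 0
    // instead of returning early, so every thread reaches __syncthreads().
    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    // The loop bound depends only on blockDim.x, so it is uniform per block.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride)
            tile[threadIdx.x] += tile[threadIdx.x + stride];
        __syncthreads();  // outside the divergent if
    }

    if (threadIdx.x == 0)
        out[blockIdx.x] = tile[0];
}

int main()
{
    const int n = 1000;       // deliberately not a multiple of the block size
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    float *in = nullptr, *out = nullptr;
    CUDA_CHECK(cudaMalloc(&in, n * sizeof(float)));
    CUDA_CHECK(cudaMalloc(&out, blocks * sizeof(float)));
    CUDA_CHECK(cudaMemset(in, 0, n * sizeof(float)));

    blockSum<<<blocks, threads>>>(in, out, n);
    CUDA_CHECK(cudaGetLastError());       // catches launch/configuration errors
    CUDA_CHECK(cudaDeviceSynchronize());  // catches errors during execution

    CUDA_CHECK(cudaFree(in));
    CUDA_CHECK(cudaFree(out));
    return 0;
}
```

Kernel launches do not return an error code themselves, which is why the sketch checks `cudaGetLastError()` right after the launch and then checks the result of `cudaDeviceSynchronize()`. Assuming the file is built with something like `nvcc -o blocksum blocksum.cu` (file name made up here), running `cuda-memcheck ./blocksum` would report any out-of-bounds accesses.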
- Maybe the L2 cache has gotten faster or larger, so L1/shared memory is not as important?
Copyright © 2014 NVIDIA Corporation