Always check the return codes of CUDA calls for errors. Do not use __syncthreads() in conditional code unless the condition is guaranteed to evaluate identically for all threads of each block. Run your program under cuda-memcheck to detect out-of-bounds or otherwise invalid memory accesses. If your kernel only dies for larger problem sizes, it may be exceeding the runtime limit and triggering the watchdog timer.
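To make the first two points concrete, here is a minimal sketch. I haven't seen your kernel, so everything in it is made up for illustration: `CUDA_CHECK`, `addRightNeighbour`, `runDemo`, and the sizes are hypothetical names and values, not part of the CUDA API or of your code. Only the access to global memory is guarded by `i < n`; the barrier sits outside the conditional so every thread of the block reaches it.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper macro: print the error string and abort on any failed CUDA call.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error '%s' at %s:%d\n",             \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

constexpr int BLOCK = 256;  // assumed block size for this sketch

// Hypothetical kernel: out[i] = in[i] + the next element of the same block's tile.
__global__ void addRightNeighbour(const int *in, int *out, int n)
{
    __shared__ int tile[BLOCK];
    int t = threadIdx.x;
    int i = blockIdx.x * blockDim.x + t;

    // Guard the global load, not the barrier: out-of-range threads store 0.
    tile[t] = (i < n) ? in[i] : 0;

    // Unconditional, so all threads of the block reach the barrier.
    __syncthreads();

    if (i < n) out[i] = tile[t] + tile[(t + 1) % BLOCK];
}

// Runs the kernel and returns the number of elements that differ from a host reference.
int runDemo()
{
    const int n = 1000;  // deliberately not a multiple of BLOCK
    std::vector<int> h_in(n), h_out(n);
    for (int i = 0; i < n; ++i) h_in[i] = i;

    int *d_in = nullptr, *d_out = nullptr;
    CUDA_CHECK(cudaMalloc(&d_in, n * sizeof(int)));
    CUDA_CHECK(cudaMalloc(&d_out, n * sizeof(int)));
    CUDA_CHECK(cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice));

    int blocks = (n + BLOCK - 1) / BLOCK;
    addRightNeighbour<<<blocks, BLOCK>>>(d_in, d_out, n);
    CUDA_CHECK(cudaGetLastError());       // reports launch errors (bad configuration etc.)
    CUDA_CHECK(cudaDeviceSynchronize());  // reports errors raised while the kernel ran

    CUDA_CHECK(cudaMemcpy(h_out.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_in));
    CUDA_CHECK(cudaFree(d_out));

    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        int j = (i / BLOCK) * BLOCK + (i % BLOCK + 1) % BLOCK;  // neighbour within the block
        int expected = h_in[i] + (j < n ? h_in[j] : 0);
        if (h_out[i] != expected) ++mismatches;
    }
    return mismatches;
}

int main()
{
    int bad = runDemo();
    printf("mismatches: %d\n", bad);
    return bad != 0;
}
```

A kernel launch returns no status itself, which is why the sketch calls `cudaGetLastError()` right after the launch and then checks the result of `cudaDeviceSynchronize()`.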
I think the best approach is to declare your arrays in shared memory explicitly, so that you have full control over where they live and how they are laid out.
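As a rough sketch of what that could look like, assuming each thread needs a small scratch array that it indexes dynamically (I'm guessing at the use case, since I don't know your kernel): the array is declared `__shared__` with one column per thread. The names `pickPower`, `runSharedDemo`, `K`, and `THREADS` are hypothetical, chosen only for this example.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

constexpr int K = 4;          // assumed scratch elements per thread
constexpr int THREADS = 128;  // assumed block size

// Hypothetical helper: abort on any failed CUDA call.
static void check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

// Hypothetical kernel: each thread fills its own column with x, x^2, ..., x^K
// and then reads one entry chosen at run time by sel[i].
__global__ void pickPower(const float *x, const int *sel, float *out, int n)
{
    // Explicit shared-memory scratch space, laid out [k][thread] so that
    // neighbouring threads touch neighbouring addresses for a given k.
    __shared__ float scratch[K][THREADS];

    int t = threadIdx.x;
    int i = blockIdx.x * blockDim.x + t;
    if (i >= n) return;  // safe here: the kernel contains no __syncthreads()

    float p = 1.0f;
    for (int k = 0; k < K; ++k) {
        p *= x[i];
        scratch[k][t] = p;
    }
    // Each thread reads only its own column, so no barrier is required.
    out[i] = scratch[sel[i]][t];
}

// Runs the kernel and returns the number of results that differ from the host reference.
int runSharedDemo()
{
    const int n = 1000;
    std::vector<float> h_x(n), h_out(n);
    std::vector<int> h_sel(n);
    for (int i = 0; i < n; ++i) {
        h_x[i] = 1.0f + (i % 3);  // small integers, so the powers are exact in float
        h_sel[i] = i % K;
    }

    float *d_x = nullptr, *d_out = nullptr;
    int *d_sel = nullptr;
    check(cudaMalloc(&d_x, n * sizeof(float)), "cudaMalloc x");
    check(cudaMalloc(&d_out, n * sizeof(float)), "cudaMalloc out");
    check(cudaMalloc(&d_sel, n * sizeof(int)), "cudaMalloc sel");
    check(cudaMemcpy(d_x, h_x.data(), n * sizeof(float), cudaMemcpyHostToDevice), "copy x");
    check(cudaMemcpy(d_sel, h_sel.data(), n * sizeof(int), cudaMemcpyHostToDevice), "copy sel");

    pickPower<<<(n + THREADS - 1) / THREADS, THREADS>>>(d_x, d_sel, d_out, n);
    check(cudaGetLastError(), "kernel launch");
    check(cudaDeviceSynchronize(), "kernel execution");
    check(cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost), "copy out");

    check(cudaFree(d_x), "cudaFree x");
    check(cudaFree(d_out), "cudaFree out");
    check(cudaFree(d_sel), "cudaFree sel");

    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        float expected = 1.0f;
        for (int k = 0; k <= h_sel[i]; ++k) expected *= h_x[i];
        if (h_out[i] != expected) ++mismatches;
    }
    return mismatches;
}

int main()
{
    int bad = runSharedDemo();
    printf("mismatches: %d\n", bad);
    return bad != 0;
}
```

The point of the `[K][THREADS]` layout is that you decide it yourself. The trade-off is that shared memory is limited per block, so `K * THREADS * sizeof(float)` has to fit within what your device offers.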