Strange behavior with CUDA Visual Profiler

I’ve implemented a simple program that loads one slice from a 3D array and writes that slice into a 2D output buffer. To check the results, I copy the 2D array back to the CPU and compare it element by element with the corresponding slice on the CPU. The elements are int32_t.
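For readers who want to reproduce this, here is a minimal sketch of that kind of test. It is not the original program. It assumes the volume is stored contiguously with x varying fastest, then y, then z, and that the "slice" is the XY plane at a fixed depth z. The names `extractSlice`, `runSliceTest` and `CUDA_CHECK` are hypothetical. Every runtime call is checked, so a failed launch or copy cannot be mistaken for a data mismatch.

```cuda
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort on any CUDA runtime error so failures are never silent.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "CUDA error %s at %s:%d\n",              \
                         cudaGetErrorString(err_), __FILE__, __LINE__);   \
            std::exit(EXIT_FAILURE);                                      \
        }                                                                 \
    } while (0)

// Hypothetical kernel: copy the XY plane at depth z of an nx*ny*nz volume
// (x fastest, then y, then z) into a dense nx*ny 2D buffer.
__global__ void extractSlice(const int32_t* vol, int32_t* slice,
                             int nx, int ny, int z)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < nx && y < ny) {
        size_t src = ((size_t)z * ny + y) * nx + x;
        slice[(size_t)y * nx + x] = vol[src];
    }
}

// Runs the kernel once and returns the number of mismatching elements.
int runSliceTest(int nx, int ny, int nz, int z)
{
    size_t volCount = (size_t)nx * ny * nz;
    size_t sliceCount = (size_t)nx * ny;
    std::vector<int32_t> hVol(volCount), hSlice(sliceCount);
    for (size_t i = 0; i < volCount; ++i) hVol[i] = (int32_t)i;  // distinct values

    int32_t* dVol = nullptr;
    int32_t* dSlice = nullptr;
    CUDA_CHECK(cudaMalloc(&dVol, volCount * sizeof(int32_t)));
    CUDA_CHECK(cudaMalloc(&dSlice, sliceCount * sizeof(int32_t)));
    CUDA_CHECK(cudaMemcpy(dVol, hVol.data(), volCount * sizeof(int32_t),
                          cudaMemcpyHostToDevice));

    dim3 block(16, 16);
    dim3 grid((nx + block.x - 1) / block.x, (ny + block.y - 1) / block.y);
    extractSlice<<<grid, block>>>(dVol, dSlice, nx, ny, z);
    CUDA_CHECK(cudaGetLastError());
    // A blocking cudaMemcpy on the default stream waits for the kernel to finish.
    CUDA_CHECK(cudaMemcpy(hSlice.data(), dSlice, sliceCount * sizeof(int32_t),
                          cudaMemcpyDeviceToHost));

    // Compare element by element against the same slice of the host copy.
    int mismatches = 0;
    for (int y = 0; y < ny; ++y)
        for (int x = 0; x < nx; ++x)
            if (hSlice[(size_t)y * nx + x] != hVol[((size_t)z * ny + y) * nx + x])
                ++mismatches;

    CUDA_CHECK(cudaFree(dVol));
    CUDA_CHECK(cudaFree(dSlice));
    return mismatches;
}

int main()
{
    int bad = runSliceTest(100, 70, 30, 12);
    std::printf("%s (%d mismatches)\n", bad == 0 ? "PASSED" : "FAILED", bad);
    return bad == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

The dimensions in `main` are arbitrary. Non-multiples of the 16x16 block size are used on purpose so that the bounds check in the kernel is exercised.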

Over several days, I’ve run the program 100 separate times on each of a GTX480 and a GTX580 using CUDA 4.0, and the test always passed. I’ve also run it 100 times on both GPUs under cuda-memcheck. The test passed every time and cuda-memcheck reported no errors.

When I run the program under the Visual Profiler to analyze the kernel, the test sometimes fails! Any hints about this strange behavior with the Visual Profiler?

Thanks in advance