I have a program that repeatedly launches the same kernel and accumulates the results over time (a fairly typical Monte Carlo type simulation). On OS X and Linux, everything runs fine and the kernel executes successfully as many times as specified. On Windows, however, the kernel launches successfully anywhere from once to a few dozen times before the program crashes with an access violation. The number of successful launches before a crash seems to be random.
Does anyone know of some possible reasons why this may happen?
An “access violation” seems to hint at a problem in the host code rather than a failing kernel launch. Possible causes include a failed memory allocation, an out-of-bounds access, uninitialized data, or a race condition. Make sure the return status of every CUDA API call is checked. Run the app with valgrind (or an equivalent tool on Windows).
If I have misinterpreted the description and the problem is really a failing device kernel, run the app with cuda-memcheck and check the kernel execution status carefully. There might be a timeout that occurs only on Windows, because the operating system’s watchdog timer applies different time limits on each platform.
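For what it's worth, here is a minimal sketch of the status checking suggested above. The `CUDA_CHECK` macro, the `accumulate` kernel, and the `run_launches` function are hypothetical placeholders of mine, not the original poster's code; only the CUDA runtime calls themselves are real API. The idea is to check every API return value, then after each launch call `cudaGetLastError()` to catch launch errors and `cudaDeviceSynchronize()` to catch errors raised while the kernel runs, which is where a watchdog timeout would be expected to show up.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort with file/line context when a runtime call fails.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__,      \
                         __LINE__, #call, cudaGetErrorString(err_));      \
            std::exit(EXIT_FAILURE);                                      \
        }                                                                 \
    } while (0)

// Placeholder kernel: each launch adds 1.0f to every accumulator slot.
__global__ void accumulate(float *acc, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // guard against out-of-bounds writes
        acc[i] += 1.0f;
    }
}

// Launch the kernel `launches` times and return the sum of all accumulators.
float run_launches(int launches, int n)
{
    float *d_acc = nullptr;
    CUDA_CHECK(cudaMalloc(&d_acc, n * sizeof(float)));
    CUDA_CHECK(cudaMemset(d_acc, 0, n * sizeof(float)));

    const int threads = 128;
    const int blocks = (n + threads - 1) / threads;
    for (int k = 0; k < launches; ++k) {
        accumulate<<<blocks, threads>>>(d_acc, n);
        CUDA_CHECK(cudaGetLastError());       // launch or configuration errors
        CUDA_CHECK(cudaDeviceSynchronize());  // errors raised during execution
    }

    std::vector<float> h_acc(n);
    CUDA_CHECK(cudaMemcpy(h_acc.data(), d_acc, n * sizeof(float),
                          cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_acc));

    float sum = 0.0f;
    for (float v : h_acc) {
        sum += v;
    }
    return sum;
}
```

Calling `run_launches(4, 256)` from `main` should return 1024 when every launch succeeds. Synchronizing after each launch costs some performance, so this is meant for tracking down the failure rather than for production runs.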
Sounds like it could be fixed by disabling WDDM TDR in Windows. If you have Nsight installed, there is an option in Nsight Monitor to disable it. Otherwise, disable it in the registry, either by running this registry file or by navigating manually and adding/changing the TdrLevel value: