I have some additional information: another kernel in the same process does not have the crash, but it does not do much computation, only memory movement.
For anyone who comes across this problem: I needed to increase the timeout. The OS (Windows 10) was killing my kernel to keep the computer usable.
I fixed this by:
Running regedit from the search bar.
Navigating to "Computer\HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers"
Creating a new DWORD (32-bit) registry value called "TdrDelay" and setting it to 30 (the value is in seconds).
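If you would rather not click through regedit, the same change can be written as a .reg file and imported. Below is a minimal Python sketch that only renders the file text. The helper name render_tdr_reg is my own, not part of any tool, and the 30 second default is just the value used above. Importing the file needs administrator rights, and as far as I know a reboot is required before the new delay takes effect.

```python
# Sketch: render the text of a .reg file that sets TdrDelay.
# Assumption: TdrDelay is stored as a DWORD holding a number of seconds.
GRAPHICS_DRIVERS_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers"
)


def render_tdr_reg(delay_seconds: int = 30) -> str:
    """Return .reg file text that sets TdrDelay to delay_seconds."""
    if not 0 <= delay_seconds <= 0xFFFFFFFF:
        raise ValueError("delay_seconds must fit in an unsigned 32-bit DWORD")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{GRAPHICS_DRIVERS_KEY}]",
        # .reg files write DWORD values as 8 hexadecimal digits.
        f'"TdrDelay"=dword:{delay_seconds:08x}',
        "",
    ]
    return "\r\n".join(lines)
```

Calling print(render_tdr_reg(30)) and saving the output as, say, tdr.reg (a hypothetical file name) gives a file you can review before importing it.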
Are you sure you are not overrunning the generous 10 sec timeout? Do you have a way to try and reduce the amount of time spent in the kernel? Does your code run to completion outside of the profiler, and if so, how long does it take?
I am asking this because the error I got was related to the timeout on the GPU, so I am assuming that error code means that the GPU timed out.
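A rough way to answer the "how long does it take outside the profiler" question is to time the whole program from a small script. This is only a sketch: time_command is a name I made up, it measures the wall-clock time of the entire process rather than a single kernel, and the limit you compare against is whatever timeout applies on your machine. If the whole run finishes well under the limit, no single kernel can be exceeding it.

```python
import subprocess
import sys
import time


def time_command(cmd, limit_seconds):
    """Run cmd once and report (elapsed_seconds, return_code, over_limit).

    elapsed_seconds covers the whole process, so it is only an upper
    bound on the duration of any individual kernel launch.
    """
    start = time.perf_counter()
    completed = subprocess.run(cmd, check=False)
    elapsed = time.perf_counter() - start
    return elapsed, completed.returncode, elapsed > limit_seconds


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python time_app.py my_app.exe   (both names are hypothetical)
    elapsed, code, over = time_command(sys.argv[1:], limit_seconds=10.0)
    print(f"exit code {code}, {elapsed:.2f} s, over limit: {over}")
```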
Driver 384.98 shows the same behavior. I also tried 384.81, which is bundled with the CUDA 9.0 toolkit, and got the same issue. Is there any way to enable extended logging to understand what causes this 4168 error?
I actually tried the CUDA 9.0 toolkit with both drivers (the one bundled with the toolkit and the latest one), and both showed the same error.
Btw, if I use the API calls cudaProfilerInitialize / cudaProfilerStart / cudaProfilerStop directly, I get a valid output file, so it seems to be an issue with the profiling tool rather than the API as such (just my guess).
Win10, cuda9.1.85, drv 388.19, Visual Profiler works when running samples (tried simpleGL, transpose, freeImageInterop).
Win10, cuda9.1.85, drv 388.19, when trying to profile my custom application (with Visual Profiler) I get “Internal profiling error 4168:999”
EDIT:
Win10, cuda9.1.85, drv 388.19, when trying to profile my custom application (with nvprof from the command line, no additional options) I get “Internal profiling error 4168:999”