Can someone please let me know what this error means? I seem to get it when launching rtContextLaunch1D, and my display driver stops working.
OptiX Error: Unknown Error (Details: Function “_rtContextLaunch1D” caught exception: Encountered a CUDA error: Kernel launch returned (702): Launch Timeout, [6619200])
That means your OptiX/CUDA kernel did not finish before the operating system's timeout kicked in. Windows assumed the kernel driver had hung and restarted the display driver to prevent a bluescreen. Microsoft calls this mechanism “Timeout Detection and Recovery” (TDR).
Under the Windows Display Driver Model (WDDM) used by Windows Vista/7/8, that timeout is only two seconds!
Under Windows XP it was higher but then you got a bluescreen.
The possible solutions to this problem are:
Do less work more often. (E.g. shoot fewer rays per launch, launch more often and combine the results.)
Use a Tesla board running in Tesla Compute Cluster (TCC) driver mode which is not affected by the OS display driver timeout.
Use faster GPUs.
Use more GPUs.
Increase the TDR timeout. (Absolutely not recommended for shipping applications!)
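To make the first option ("do less work more often") concrete, here is a minimal host-side sketch of splitting one big 1D launch into several smaller ones. The names LaunchChunk, splitLaunch, totalRays and raysPerLaunch are made up for illustration and are not part of OptiX. The actual OptiX calls are left as comments so the snippet compiles on its own, and how you pass the offset to your ray generation program depends on your application.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One slice of the full 1D launch: first launch index and number of indices.
// (Hypothetical helper type, not an OptiX type.)
struct LaunchChunk {
    std::size_t offset;
    std::size_t count;
};

// Split `total` launch indices into chunks of at most `maxPerLaunch` indices,
// so that each individual launch stays well below the TDR limit.
std::vector<LaunchChunk> splitLaunch(std::size_t total, std::size_t maxPerLaunch) {
    assert(maxPerLaunch > 0);
    std::vector<LaunchChunk> chunks;
    for (std::size_t offset = 0; offset < total; offset += maxPerLaunch) {
        chunks.push_back({offset, std::min(maxPerLaunch, total - offset)});
    }
    return chunks;
}

// Host loop sketch (assumed usage, shown as comments only):
//   for (const LaunchChunk& c : splitLaunch(totalRays, raysPerLaunch)) {
//       // 1. Tell the ray generation program where this chunk starts,
//       //    e.g. through a context variable holding c.offset.
//       // 2. rtContextLaunch1D(context, entryPoint, c.count);
//       // 3. Accumulate or combine the results of this chunk.
//   }
```

For example, splitLaunch(10, 4) yields three chunks covering indices 0-3, 4-7 and 8-9. Picking raysPerLaunch is a matter of measuring how long one chunk takes on your slowest target GPU.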
I’ve had the same issue, and in most cases I’ve solved it using rtContextSetTimeoutCallback(). From what I understand, OptiX calls the specified callback function at the time interval specified, which prevents timeouts.
However, sometimes (especially if I launch the same OptiX program repeatedly from my application) the callback function doesn’t seem to get called in time to prevent a timeout. Can someone explain the mechanism that OptiX uses to call the callback function, and what I can do to guarantee that it gets called before Windows calls its TDR mechanism?
OptiX checks the elapsed time against the specified timeout when a warp starts. Problems arise when, for example, a single launch index can take 0.5 seconds, the timeout is 1.8 seconds and the current elapsed time is 1.6 seconds. OptiX decides it's OK to proceed (1.6 < 1.8), but that work then runs until 2.1 seconds, which is past the 2-second TDR limit, so you get a timeout.
Generally you want to set the timeout to TDR - MaxTimeForSingleLaunchIndex. In the above example, you would set the timeout to 1.5 seconds if your TDR is 2 seconds.
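The arithmetic can be written down as a small sketch. The function names safeTimeoutSeconds and canOverrunTdr are hypothetical, and the maximum time per launch index is something you would have to measure yourself. The result is the value you would then pass as the interval to rtContextSetTimeoutCallback().

```cpp
#include <algorithm>
#include <cassert>

// Timeout to give OptiX so that a launch index which starts right at the
// timeout can still finish inside the TDR window:
//   timeout = TDR - MaxTimeForSingleLaunchIndex
// Returns 0 if a single launch index already takes longer than the TDR,
// in which case no timeout value can help.
double safeTimeoutSeconds(double tdrSeconds, double maxSecondsPerLaunchIndex) {
    assert(tdrSeconds > 0.0 && maxSecondsPerLaunchIndex >= 0.0);
    return std::max(0.0, tdrSeconds - maxSecondsPerLaunchIndex);
}

// True if work that starts just before `timeoutSeconds` could still be
// running when the TDR fires.
bool canOverrunTdr(double timeoutSeconds, double maxSecondsPerLaunchIndex,
                   double tdrSeconds) {
    return timeoutSeconds + maxSecondsPerLaunchIndex > tdrSeconds;
}
```

With the numbers from the example, canOverrunTdr(1.8, 0.5, 2.0) is true, while a timeout of safeTimeoutSeconds(2.0, 0.5), i.e. 1.5 seconds, leaves just enough room.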
Note that you can also adjust the TDR in your Windows registry. There are instructions on how to do so in the CUDA Toolkit documentation.
In addition, a Tesla device or a Quadro device put in TCC mode will not have the TDR limitation. I use a Tesla device, and I love it for OptiX work.