OptiX 4.0 runs slow

With OptiX 4.0, my applications run over 100 times slower than they did with 3.9.1. The slowdown occurs mainly the first time each kernel runs. If a kernel runs multiple times, the slowdown on subsequent runs is pretty small, unless buffers and/or variables are updated, in which case it’s slow again. Any ideas what could be causing this? I’m not creating or destroying any OptiX variables between kernel calls.

I can also hear the GPU fans revving up (they’re at 65% according to nvsmi, much louder than I’m used to).

One step I took was to update my CMake code with the flag --gpu-architecture sm_30, which I hoped might give it a hint for more efficient PTX compilation. It didn’t seem to help.

I don’t see the same slowdown happening with the precompiled samples. The optixPathTracer example runs at 10 fps with OptiX 3.9.1 and 9 fps with OptiX 4.0 on my K4000.

I updated the drivers on both machines today - they’re the newest ones available.

Windows 7, Quadro K4000, driver 368.86, OptiX 4.0.0, CUDA Toolkit 7.5.
Windows 7, 2x Tesla K40, driver 354.92, OptiX 4.0.0, CUDA Toolkit 7.5.

Would you be able to provide a minimal reproducer OptiX API Capture (OAC) trace of that behaviour?

Instructions on how to do that are in this thread: https://devtalk.nvidia.com/default/topic/803116/?comment=4436953

I’ve just sent in two OAC traces from a minimal reproducer.

  • The first was created with OptiX 3.9.1 targeting sm_20. Median run time was under 1 second.
  • The second was created with OptiX 4.0.0 targeting sm_30 (although targeting sm_20 produces the same results). Median run time was 48 seconds.

I hope you can provide some guidance on how to speed up execution under the new OptiX version.
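
For anyone who wants to compare numbers like these, here is a rough sketch of how the first-launch cost can be separated from the median of the later launches. It is not the code used for the traces above. The names time_launches and median are made up for illustration, and launch stands in for whatever call starts the kernel and blocks until it finishes.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: times each call to `launch` separately, in seconds.
// `launch` is a placeholder for the code that starts the kernel; it is
// assumed to return only after the work has finished.
std::vector<double> time_launches(const std::function<void()>& launch, int runs) {
    std::vector<double> seconds;
    for (int i = 0; i < runs; ++i) {
        auto start = std::chrono::steady_clock::now();
        launch();
        auto stop = std::chrono::steady_clock::now();
        seconds.push_back(std::chrono::duration<double>(stop - start).count());
    }
    return seconds;
}

// Median of a sample. The argument is taken by value, so the caller's
// vector is not reordered. Returns 0.0 for an empty sample.
double median(std::vector<double> v) {
    if (v.empty()) return 0.0;
    std::sort(v.begin(), v.end());
    std::size_t mid = v.size() / 2;
    return (v.size() % 2 != 0) ? v[mid] : 0.5 * (v[mid - 1] + v[mid]);
}
```

Comparing the first element of the returned vector against the median of the remaining elements shows whether the cost is a one-time hit per kernel or paid on every launch. Updating a buffer or variable between two timed launches would show whether that triggers the slow path again.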

    It looks like most of the discrepancy in runtime can be attributed to your use of OptiX exceptions. Your traces have all OptiX exceptions enabled, which is costly in both OptiX 3.9.1 and 4.0.0, similar to running your code in debug mode. Unfortunately, at this time 4.0 takes a bigger performance hit for exceptions than 3.9. For performance runs you should always revert to no exceptions or only stack overflow exceptions enabled.

    We will work on lowering the cost of OptiX exceptions in 4.0.
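
A minimal sketch of the "only stack overflow exceptions" setup, assuming the OptiX C host API call rtContextSetExceptionEnabled with the RT_EXCEPTION_ALL and RT_EXCEPTION_STACK_OVERFLOW enum values. The helper name configure_exceptions_for_perf is made up for illustration. The #else branch holds placeholder stand-ins so the snippet compiles without the SDK; they are not the real header.

```cpp
#if __has_include(<optix.h>)
#include <optix.h>  // real OptiX host API, used when the SDK is installed
#else
// Stand-ins so the sketch compiles without the SDK. The names mirror
// optix.h, but the enum values and function body are placeholders only.
struct RTcontext_standin;
typedef RTcontext_standin* RTcontext;
enum RTresult { RT_SUCCESS = 0 };
enum RTexception { RT_EXCEPTION_STACK_OVERFLOW = 1, RT_EXCEPTION_ALL = 2 };
static bool g_standin_all = true;             // "everything else" enabled?
static bool g_standin_stack_overflow = true;  // stack overflow check enabled?
inline RTresult rtContextSetExceptionEnabled(RTcontext, RTexception e, int on) {
    if (e == RT_EXCEPTION_ALL) {
        g_standin_all = g_standin_stack_overflow = (on != 0);
    } else {
        g_standin_stack_overflow = (on != 0);
    }
    return RT_SUCCESS;
}
#endif

// Hypothetical helper: switch every exception off, then re-enable only the
// stack overflow check. Returns the first non-success result, if any.
RTresult configure_exceptions_for_perf(RTcontext context) {
    RTresult r = rtContextSetExceptionEnabled(context, RT_EXCEPTION_ALL, 0);
    if (r != RT_SUCCESS) return r;
    return rtContextSetExceptionEnabled(context, RT_EXCEPTION_STACK_OVERFLOW, 1);
}
```

Calling this once after creating the context and before the first launch should be enough. For debugging, passing 1 with RT_EXCEPTION_ALL turns everything back on.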