With OptiX 4.0, my applications are running over 100 times slower than they did with 3.9.1. The slowdown seems to occur mainly the first time each kernel runs. If a kernel runs multiple times, the slowdown on subsequent runs is fairly small, unless buffers and/or variables are updated between launches, in which case it's slow again. Any ideas what could be causing this? I'm not creating or destroying any OptiX variables between kernel calls; the per-frame updates look roughly like the sketch below.
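For context, here's a minimal sketch of the kind of per-launch updates I mean (variable and buffer names are placeholders, not my actual code):

#include <optixu/optixpp_namespace.h>
#include <cstring>

// Placeholder example; my real app does the equivalent of this each frame.
void renderFrame( optix::Context context, optix::Buffer paramsBuffer,
                  const float* hostParams, size_t numParams,
                  unsigned int frame, unsigned int width, unsigned int height )
{
    // Update an existing context variable between launches.
    context["frame_number"]->setUint( frame );

    // Refill an existing buffer between launches via map/unmap.
    void* dst = paramsBuffer->map();
    memcpy( dst, hostParams, numParams * sizeof( float ) );
    paramsBuffer->unmap();

    // With 3.9.1 only the very first launch was slow; with 4.0 this launch
    // is slow again whenever the updates above have happened.
    context->launch( 0, width, height );  // entry point 0
}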
I can also hear the GPU fans revving up (they're at 65% according to nvidia-smi, much louder than I'm used to).
One step I took was to update my CMake code to pass --gpu-architecture sm_30 to nvcc, hoping a hint about the target architecture would give more efficient PTX compilation. It didn't seem to help. The change was roughly the snippet below.
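Roughly what I added (my build is FindCUDA-based, so the exact place the flag goes may differ in your setup):

# Hint the target architecture for PTX generation.
list(APPEND CUDA_NVCC_FLAGS "--gpu-architecture=sm_30")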
I don’t see the same slowdown happening with the precompiled samples. The optixPathTracer example runs at 10 fps with OptiX 3.9.1 and 9 fps with OptiX 4.0 on my K4000.
I updated the drivers on both machines today - they’re the newest ones available.
Machine 1: Windows 7, Quadro K4000, driver 368.86, OptiX 4.0.0, CUDA Toolkit 7.5.
Machine 2: Windows 7, 2x Tesla K40, driver 354.92, OptiX 4.0.0, CUDA Toolkit 7.5.