Optix Prime Crash

I am seeing a crash when trying to cast a large number of rays using RTP_CONTEXT_TYPE_CPU. I can reproduce the crash with the unmodified primeSimple example and the command line argument -w 25000 (both the precompiled version and a version built locally). The example does not crash if I use -c cuda, nor does my application crash if I use RTP_CONTEXT_TYPE_CUDA. The debugger just reports an unhandled exception (access violation) in an optix_prime.1.dll thread.

My computer has 32 GB of RAM with a K4200 and a K5000. My driver version is 364.51. I am using CUDA 7.5. I have tried building against both OptiX 3.9 and OptiX 4.0 beta.

Please let me know if more information is needed.

Hi, we have not been able to repro this on Linux with 32 GB of memory. Can you insert printfs in primeSimple.cpp and get an idea of where it’s crashing? That file is organized into logical blocks marked with comments like “Create buffers for geometry data”.

Do you still see the crash if you search for occurrences of “LOCKED” in primeSimple.cpp, and change them to “UNLOCKED”, then rebuild? This sample uses locked (pinned) host memory by default for buffers of rays and hits. With the “-w 25000” flag, it uses about 18 GB of pinned memory. I want to make sure you’re not hitting an OS limit.

I don’t think you’re running out of RAM, since the peak memory is about 22 GB and you have 32 GB. Wouldn’t hurt to monitor memory usage just to be sure.

The ray buffer size is computed using 32-bit integer math, and if you go higher than “-w 25000” you may eventually overflow the integer range and crash even on a machine with enough memory. For example “-w 100000” (100k) will overflow while filling the ray buffer on the host. That’s something we could improve or warn about.
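To illustrate the overflow described above, here is a minimal sketch that does the ray-count arithmetic in 64 bits and checks whether the result still fits in a signed 32-bit integer. The function name is hypothetical, and it assumes a square width x height ray grid purely for illustration; the actual sample may derive the height differently.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper (not part of primeSimple or OptiX Prime):
// returns true if a width x height ray grid can be counted in a
// signed 32-bit integer without overflowing.
bool rayCountFitsInt32(std::int64_t width, std::int64_t height) {
    assert(width >= 0 && height >= 0);
    // Multiply in 64 bits so the product itself cannot wrap for these inputs.
    const std::int64_t count = width * height;
    return count <= std::numeric_limits<std::int32_t>::max();
}
```

Under the square-grid assumption, 25000 x 25000 = 625,000,000 rays still fits, while 100000 x 100000 = 10,000,000,000 rays does not.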

-Dylan

Thanks for the response! It’s crashing after the rtpQueryExecute call (line 182 in primeSimple.cpp) which matches where it crashes in my application.

LOCKED or UNLOCKED, the crash still occurs. I was able to narrow down the exact ray count that causes the crash. Using RTP_BUFFER_FORMAT_RAY_ORIGIN_TMIN_DIRECTION_TMAX, if I allocate 67,108,864 rays the query will succeed. If I try just one more (67,108,865), however, it will crash. I believe 67,108,864 rays * 8 floats/ray * 4 bytes/float = 2,147,483,648 bytes = exactly 2 GB. I tested this limit by switching my ray format to RTP_BUFFER_FORMAT_RAY_ORIGIN_DIRECTION and was able to increase the total rays to 89,478,486 (89,478,486 rays * 6 floats/ray * 4 bytes/float = 2,147,483,664 bytes, just 16 bytes over 2 GB; 89,478,487 rays crashes). Changing the hit format did not seem to affect the crash. And these amounts obviously barely touch the total RAM available.
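The arithmetic above can be checked with a short sketch. The helper names are hypothetical. The second function encodes one reading that happens to match both measured limits: the query succeeds as long as the byte offset of the last ray's first float stays below 2^31. That is only an inference from the numbers reported here, not a statement about how the library is implemented.

```cpp
#include <cassert>
#include <cstdint>

// 2 GB expressed in bytes (2^31).
const std::uint64_t kTwoGiB = 1ull << 31;

// Total size in bytes of a ray buffer with the given layout.
std::uint64_t rayBufferBytes(std::uint64_t rays, std::uint64_t floatsPerRay) {
    return rays * floatsPerRay * sizeof(float);
}

// Assumption for illustration: the limit is hit when the byte offset of
// the LAST ray's start no longer fits in a signed 32-bit integer.
bool lastRayOffsetFitsInt32(std::uint64_t rays, std::uint64_t floatsPerRay) {
    assert(rays > 0);
    const std::uint64_t offset = (rays - 1) * floatsPerRay * sizeof(float);
    return offset < kTwoGiB;
}
```

With 8 floats per ray, 67,108,864 rays passes this check and 67,108,865 fails; with 6 floats per ray, 89,478,486 passes and 89,478,487 fails, which lines up with the crashes observed in both formats.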

I am also on Windows 7 and building with Visual Studio 2013.

Ok, this reproduces on Windows consistently and it’s a bug.

The 2 GB address limit was a good clue. When using RTP_CONTEXT_TYPE_CPU we have an optional fast code path that uses 32-bit arithmetic for pointer math, in cases where we think it’s safe to do this. There are safety checks on the size of the mesh data, but we neglected to do safety checks on the ray and hit buffer sizes. I’ll file a bug against OptiX 4.0.

For now, you can work around this problem by splitting the ray and hit buffers into smaller batches. There are some other reasons to use batches:

  • avoids driver timeouts
  • controls device memory usage when using RTP_CONTEXT_TYPE_CUDA, for applications where the number of rays depends on user input.
  • allows overlapping device and host processing by doing asynchronous CUDA launches. The “primeMultiBuffering” example shows this.
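As a rough sketch of the batching workaround, the helper below splits a ray count into index ranges whose ray buffers each stay within a byte budget. The function name, its parameters, and the choice of budget are assumptions for illustration; it deliberately contains no OptiX Prime calls, so the per-batch buffer setup and query execution are left to the caller.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: split [0, totalRays) into half-open ranges
// [begin, end) so that (end - begin) * bytesPerRay <= maxBatchBytes.
std::vector<std::pair<std::uint64_t, std::uint64_t>>
makeRayBatches(std::uint64_t totalRays,
               std::uint64_t bytesPerRay,
               std::uint64_t maxBatchBytes) {
    assert(bytesPerRay > 0 && maxBatchBytes >= bytesPerRay);
    // Largest whole number of rays that fits in one batch.
    const std::uint64_t raysPerBatch = maxBatchBytes / bytesPerRay;
    std::vector<std::pair<std::uint64_t, std::uint64_t>> batches;
    for (std::uint64_t begin = 0; begin < totalRays; begin += raysPerBatch) {
        const std::uint64_t end = std::min(begin + raysPerBatch, totalRays);
        batches.emplace_back(begin, end);
    }
    return batches;
}
```

For each returned range, the application would describe the rays and hits starting at index begin with a count of end - begin, then run one query per batch. A budget comfortably below 2 GB (for example 1 GB) keeps every batch clear of the limit found earlier in this thread.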

The case you found still needs to be fixed, though. Thanks for reporting it!

-Dylan

Thanks for investigating. I look forward to the fix in 4.0. The ‘batches’ workaround is fine for my application.