I am running into memory issues and was hoping to get some feedback from others on whether what I’m seeing is expected behavior.
First, some background on the systems I’m working with. The main system has a GTX Titan (6GB of device memory) and 64GB of host RAM; the other system I have tried has a Tesla K80 (12GB of device memory) and 32GB of host RAM. In both cases I’m running OptiX 3.9 with CUDA 7.0.
I have been running some scaling tests, querying both host and device memory to get an idea of when and where I run out of memory for very large scenes. In both cases, the OS plus building all the geometry in my own API (pre-OptiX) uses roughly 3-4GB of host RAM.

On the K80/32GB system, I run out of host memory at about 4 million primitives, even though at that point I have only used about 3GB on the device (still 8-9GB free). I can watch host memory draw down as my OptiX program progresses: after adding the geometry in OptiX and compiling the context, I find that I have used about 20GB of host RAM. Then I do a dummy launch with a single ray to build the acceleration structures, and that is when the host runs out of memory. On this system the program errors out at that point because it is not allowed to go into swap space.

On the Titan/64GB system it keeps going, but with a considerable performance hit as it moves into swap. There I can keep running bigger and bigger problems until the device finally runs out of memory at about 6 million primitives (and for that case the program has used about 40GB of host RAM before even reaching the acceleration structure build!).

So I would conclude that more than half of the host memory the program uses is consumed between the time the OptiX context is initialized and when it is compiled, and a little less than half is used up during the ‘dummy’ launch. A back-of-envelope calculation of how much memory my buffers and variables should need comes out to only about 500MB, so the usage doesn’t seem to be coming from there.
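For reference, here is roughly where the measurements sit relative to the compile and the dummy launch. This is a simplified sketch, not my actual code: device memory is queried through cudaMemGetInfo, and the host-side query is platform-specific (e.g. /proc/self/status on Linux), so it is only noted in a comment:

#include <optixu/optixpp_namespace.h>
#include <cuda_runtime.h>
#include <cstdio>

// Print free/total memory on the current CUDA device. (Host-side usage is
// read separately, e.g. from /proc/self/status on Linux, so it isn't shown.)
static void reportDeviceMemory(const char* stage)
{
    size_t freeBytes = 0, totalBytes = 0;
    cudaMemGetInfo(&freeBytes, &totalBytes);
    printf("[%s] device: %zu MB free of %zu MB\n",
           stage, freeBytes >> 20, totalBytes >> 20);
}

// Called once all geometry has been attached to the context.
static void buildAndMeasure(optix::Context context)
{
    reportDeviceMemory("after geometry setup");

    context->compile();                  // host RAM is already ~20GB by here
    reportDeviceMemory("after compile");

    context->launch(0, 1);               // dummy 1-ray launch; triggers the accel build
    reportDeviceMemory("after accel build");
}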
So my question is: does this all sound like expected behavior, or could there be a memory problem somewhere? It seems crazy to me that OptiX would use so much memory on the host, especially before the acceleration structures have even been built. It also looks like host memory usage grows much faster than linearly with the number of primitives (roughly 20GB at 4 million primitives versus 40GB at 6 million).
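In case the answer depends on the setup, this is the shape of my geometry/acceleration wiring, trimmed down to a sketch (geometry creation and error handling elided; the builder shown is Trbvh, but the pattern is the same for the other builders, and "top_object" is just the variable name this sketch assumes):

#include <optixu/optixpp_namespace.h>

// Minimal sketch of attaching one geometry instance and its acceleration
// structure. The acceleration is built lazily, on the first launch.
static optix::GeometryGroup attachGeometry(optix::Context context,
                                           optix::GeometryInstance instance)
{
    optix::GeometryGroup group = context->createGeometryGroup();
    group->setChildCount(1);
    group->setChild(0, instance);

    optix::Acceleration accel = context->createAcceleration("Trbvh", "Bvh");
    group->setAcceleration(accel);

    context["top_object"]->set(group);   // top-level object the ray gen program traces against
    return group;
}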
Thank you in advance for any comments.