Thread count is being ignored (OptiX Prime 4.0.2)

Summary:
After setting the thread count for an OptiX Prime CPU context with rtpContextSetCpuThreads, rtpQueryExecute ignores that setting and uses far more threads than were specified.

How to reproduce this behavior:
This can be reproduced by editing two lines in the primeSimple sample that ships with the SDK.

  1. After the context is created (line 114 of the sample), explicitly set the number of threads:
    CHK_PRIME(rtpContextSetCpuThreads(context, 2)); // Max out at 2 threads
    
  2. Wrap the query execution (line 189) in an infinite loop:
    while(true) { CHK_PRIME( rtpQueryExecute( query, 0 /* hints */ ) ); }
    
  3. Recompile, then run the example with no additional arguments (i.e. ./primeSimple).
  4. Open a process monitor (e.g. top). The example process uses more than 2 logical CPUs (above 200% CPU in top).
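Step 4 relies on eyeballing top. A slightly more objective check is to compare the process CPU time (summed over all of its threads) with the elapsed wall-clock time around a bounded number of rtpQueryExecute calls instead of the infinite loop: a ratio well above 2 means more than 2 threads were running at once. This is a minimal sketch assuming Linux/glibc; the helpers usage_now and busy_cpus are hypothetical and not part of OptiX Prime.

```c
/* Request POSIX.1b clocks unless the build already chose a feature level. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <time.h>

/* Wall-clock and process CPU time captured at one instant. */
typedef struct {
    struct timespec wall; /* monotonic elapsed time */
    struct timespec cpu;  /* CPU time of the whole process, all threads summed */
} usage_snapshot;

static double ts_seconds(struct timespec t)
{
    return (double)t.tv_sec + (double)t.tv_nsec / 1e9;
}

/* Take a snapshot; call once before and once after the code under test. */
usage_snapshot usage_now(void)
{
    usage_snapshot s;
    clock_gettime(CLOCK_MONOTONIC, &s.wall);
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &s.cpu);
    return s;
}

/* Average number of CPUs the process kept busy between two snapshots.
 * About 1.0 means one busy thread; if a limit of 2 threads were honored,
 * the result should not exceed roughly 2.0. Returns 0.0 for a zero interval. */
double busy_cpus(usage_snapshot begin, usage_snapshot end)
{
    double wall = ts_seconds(end.wall) - ts_seconds(begin.wall);
    double cpu  = ts_seconds(end.cpu)  - ts_seconds(begin.cpu);
    return wall > 0.0 ? cpu / wall : 0.0;
}
```

For example, take one snapshot before a loop of a few hundred rtpQueryExecute calls and another after it, then print busy_cpus(begin, end). Any other threads in the process also count toward the CPU time, but primeSimple does no other concurrent work. On glibc older than 2.17, clock_gettime needs -lrt at link time.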

Environment:

  • Operating System = RHEL 7.3
  • Compiler = GNU 4.8.5
  • CPU = Intel Xeon E5-2683v4
  • OptiX Version = 4.0.2

Question:
Is this behavior a bug, or have I clearly misunderstood the purpose of rtpContextSetCpuThreads?

Thanks for the report. This does indeed look like a bug on our end; I was able to reproduce it. We will look into it.

This bug still exists in OptiX Prime 4.1.1. Do you know whether there are any plans to fix it in OptiX 5.0? I would like to benchmark OptiX Prime running with a single CPU thread, but this bug makes it impossible to force that.

The documentation for rtpContextSetCpuThreads() states that by default one ray tracing thread is created per CPU core. That normally appears to be the case, but when I run an OpenMPI program that calls OptiX, it seems to use at most 2 cores, even though 32 are available. If I run the same program outside of mpirun, it appears to use up to 32 cores.

What function does OptiX use on Linux to programmatically determine the number of available cores?
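Which call Prime itself uses was answered privately, so it is not known here. For context, though, two common ways a Linux program can count cores disagree in exactly the mpirun case: sysconf(_SC_NPROCESSORS_ONLN) reports every online CPU in the machine, while sched_getaffinity reports only the CPUs the process is allowed to run on, which a launcher that binds processes to cores can shrink. A minimal sketch assuming Linux/glibc; the function names are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes sched_getaffinity() and the CPU_* macros in glibc */
#endif
#include <sched.h>
#include <unistd.h>

/* Every CPU the kernel has online, regardless of any per-process restriction.
 * Returns -1 on error. */
long online_cpus(void)
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}

/* Only the CPUs this process may currently run on (its affinity mask).
 * A launcher that binds processes to cores can make this smaller than
 * online_cpus(). Returns -1 on error. */
int affinity_cpus(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof set, &set) != 0)
        return -1;
    return CPU_COUNT(&set);
}
```

Printing both values from the same program, once under mpirun and once without it, would show whether the launcher is narrowing the affinity mask.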

This bug is being tracked internally, but no promises on when it will be fixed.

I PM’d you some info about how Prime determines the number of available cores on Linux.

Stepping back a second, it sounds like you might be intending to use Prime as a pure CPU raytracer, given that the machine specs above don’t include a GPU. Prime is first and foremost a GPU ray tracer. That’s where most of our engineering effort goes. I would expect your benchmarks to reflect this.

We are using OptiX for ray tracing to generate optical and radar signatures. We perform the ray tracing on GPUs (falling back to CPUs when no GPU is available) and then do additional processing on CPUs. We are capturing processing timelines for the typical case where a GPU is available, and we would also like to capture them for the case where the ray tracing runs on a CPU.

The machine specs in the initial post were not ours; they belong to the original poster (AndrewHardin).

Thank you for your PM. By the way, I realized that OpenMPI was restricting each process it spawns to a certain number of cores, based on the mpirun options, so that behavior had nothing to do with OptiX.
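Since restricting the CPUs a process may use is what capped Prime under mpirun, the same kind of restriction can be applied from inside the program as a possible stopgap for the single-CPU benchmark mentioned earlier. This is a sketch assuming Linux/glibc; pin_to_first_cpus is a hypothetical helper. It does not change how many threads Prime creates, only which CPUs they can run on, and it must run before the Prime context is created, because only threads created afterwards inherit the calling thread's affinity.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes sched_{get,set}affinity() and the CPU_* macros */
#endif
#include <sched.h>

/* Hypothetical helper: restrict the calling thread to the first `n` CPUs it is
 * currently allowed to use. Threads it creates afterwards (e.g. any worker
 * threads Prime starts later) inherit this mask; threads that already exist
 * are not affected. Returns 0 on success, -1 on failure. */
int pin_to_first_cpus(int n)
{
    cpu_set_t allowed, wanted;
    if (n < 1 || sched_getaffinity(0, sizeof allowed, &allowed) != 0)
        return -1;

    CPU_ZERO(&wanted);
    int picked = 0;
    for (int cpu = 0; cpu < CPU_SETSIZE && picked < n; ++cpu) {
        if (CPU_ISSET(cpu, &allowed)) {
            CPU_SET(cpu, &wanted);
            ++picked;
        }
    }
    if (picked == 0)
        return -1;
    return sched_setaffinity(0, sizeof wanted, &wanted);
}
```

Calling pin_to_first_cpus(1) at the top of main, before rtpContextCreate, would confine Prime to one core; from the shell, taskset -c 0 ./primeSimple applies a similar restriction without code changes. Either way the extra threads still exist and compete for that core, so timings may differ from a true single-threaded run.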

Ah right, you weren’t the original poster, my mistake!