Parallel compilation with NVRTC

NVRTC seems to compile programs serially even when it is called from multiple threads. Is there a way to make runtime compilation with NVRTC run in parallel?

I am very much interested in this as well.

I did some tests with CUDA 11.8 (and 12.3), and there seems to be serialization happening between multiple threads calling nvrtcCompileProgram due to some shared resource contention.

In a PIX timing capture, we can see a lot of context switches happening during the calls to nvrtcCompileProgram:

[image: PIX timing capture]

I am using a single CUcontext that is set current on each thread, and I launch the compilation of multiple programs on multiple threads.
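One way to check for this kind of serialization is to time the same batch of compile jobs twice, once sequentially and once with one thread per job. The following is only a minimal sketch: `Job` and `time_batch` are hypothetical names, not part of NVRTC, and each job is assumed to wrap a full create/compile/destroy sequence around nvrtcCompileProgram.

```cpp
#include <chrono>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical job type (an assumption for this sketch): in a real test each
// job would create an NVRTC program, call nvrtcCompileProgram, and destroy it.
using Job = std::function<void()>;

// Runs every job and returns the elapsed wall-clock time in seconds.
// parallel == false: jobs run one after another on the calling thread.
// parallel == true:  each job runs on its own std::thread.
double time_batch(const std::vector<Job>& jobs, bool parallel) {
    const auto start = std::chrono::steady_clock::now();
    if (parallel) {
        std::vector<std::thread> workers;
        workers.reserve(jobs.size());
        for (const Job& job : jobs) workers.emplace_back(job);
        for (std::thread& w : workers) w.join();
    } else {
        for (const Job& job : jobs) job();
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

If the threaded time is close to the sequential time even though enough cores are free, the jobs are most likely waiting on a shared resource rather than running concurrently.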

I tried to use one CUcontext per thread (not recommended), and the same contention happens.

Am I doing something wrong?

Also, is there a way to get the NVRTC symbols? (I used the NVIDIA driver symbol server in the capture: https://driver-symbols.nvidia.com)

I doubt you are doing anything wrong, and I don’t know of anywhere that it is claimed or documented by NVIDIA that NVRTC will run compilation in parallel. You can also find other reports like this one on forums. Furthermore, the general possibility for runtime and driver API calls to interlock or serialize is published.

If you desire a particular capability in CUDA, one way to express that is by filing a bug.

Thanks for the quick answer!

I wrongly assumed that parallel compilation of kernels would be supported (like for shaders with FXC/DXC)…

I stumbled upon this article explaining why we could see serialization: https://developer.nvidia.com/blog/reducing-application-build-times-using-cuda-c-compilation-aids/

And the CUDA 12.3 Update 2 release notes have a section stating that NVVM concurrency was improved.

So I thought that maybe…

FWIW, I also found a “workaround” described in this paper, where they spawn daemons that perform the compilation and use inter-process communication to retrieve the compiled kernels: https://www.researchgate.net/publication/317061936_Parallel_and_in-process_compilation_of_individuals_for_genetic_programming_on_GPU
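The general idea of moving each compilation into its own process can be sketched as follows. This is not the paper's implementation: it is a simpler one-process-per-job variant using POSIX fork and pipe (on Windows, CreateProcess with pipes would be needed instead), and `CompileFn` and `compile_in_processes` are hypothetical names. The callback is assumed to wrap the NVRTC calls and return the compiled output as a string. It is probably safest to fork before the parent has initialized CUDA.

```cpp
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical callback (an assumption for this sketch): takes kernel source
// and returns the compiled output, e.g. PTX produced through NVRTC.
using CompileFn = std::function<std::string(const std::string&)>;

// Forks one child per source, runs `compile` in the child, and sends the
// result back to the parent through a pipe. Results keep the input order.
std::vector<std::string> compile_in_processes(
        const std::vector<std::string>& sources, const CompileFn& compile) {
    struct Worker { pid_t pid; int read_fd; };
    std::vector<Worker> workers;
    for (const std::string& src : sources) {
        int fds[2];
        if (pipe(fds) != 0) throw std::runtime_error("pipe failed");
        const pid_t pid = fork();
        if (pid < 0) throw std::runtime_error("fork failed");
        if (pid == 0) {                      // child: compile and write result
            close(fds[0]);
            try {
                const std::string out = compile(src);
                std::size_t off = 0;
                while (off < out.size()) {
                    const ssize_t n =
                        write(fds[1], out.data() + off, out.size() - off);
                    if (n <= 0) _exit(1);
                    off += static_cast<std::size_t>(n);
                }
            } catch (...) {
                _exit(1);                    // report failure via exit code
            }
            close(fds[1]);
            _exit(0);
        }
        close(fds[1]);                       // parent keeps only the read end
        workers.push_back({pid, fds[0]});
    }
    std::vector<std::string> results;
    for (const Worker& w : workers) {        // parent: read until EOF, then reap
        std::string out;
        char buf[4096];
        ssize_t n;
        while ((n = read(w.read_fd, buf, sizeof buf)) > 0)
            out.append(buf, static_cast<std::size_t>(n));
        close(w.read_fd);
        int status = 0;
        waitpid(w.pid, &status, 0);
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            throw std::runtime_error("compile worker failed");
        results.push_back(out);
    }
    return results;
}
```

Because each child has its own address space, any lock held inside the compiler library in one process cannot block the others. The cost is process startup plus copying the compiled output through the pipe.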

I will be filing a bug/feature request.