Can nvcc generate code for multiple architectures in parallel

CUDA on Linux. I have a CUDA file with a very large function. Compiling that CUDA code dominates my build time, even though I run “make -j 8” so make can run eight g++ compiles at a time for the rest of my program. I set up nvcc with the standard flags

-gencode=arch=compute_20,code=sm_20
-gencode=arch=compute_30,code=sm_30
-gencode=arch=compute_35,code=sm_35
-gencode=arch=compute_35,code=compute_35

to support multiple GPU architectures. But this results in a single nvcc invocation, which then generates the code for these four architectures one at a time. Is there any way to tell nvcc to run these four code generations in parallel, given that I have enough CPU cores available?

As far as I know, there is no way to parallelize the building of fat binaries within a single nvcc invocation, i.e., to have nvcc assign the build for each target architecture to a different thread. It strikes me as an excellent suggestion.

I would suggest filing an enhancement request via the bug reporting form linked from the registered developer website. Please prefix the synopsis with “RFE:” so it is readily recognizable as a request for enhancement rather than a true bug. Thanks!

In the meantime, you should be able to parallelize the build manually by looking at the output of nvcc --verbose or nvcc --dryrun and invoking those sub-commands directly instead of going through nvcc.
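As a rough illustration of that idea, here is a hypothetical Makefile sketch that builds one cubin per architecture as a separate rule, so “make -j” can run the per-architecture device compiles concurrently. The target names, the ARCHS list, and the final combining step are assumptions for illustration; the exact fatbinary/host-link commands are toolkit-version specific and should be copied from the nvcc --dryrun output rather than from this sketch.

```makefile
# Illustrative only: one independent rule per GPU architecture, so
# `make -j` can run the device compiles in parallel.
ARCHS  := 20 30 35
CUBINS := $(foreach a,$(ARCHS),kernel.sm_$(a).cubin)

# Compile the device code for a single architecture into a cubin.
kernel.sm_%.cubin: kernel.cu
	nvcc -cubin -gencode=arch=compute_$*,code=sm_$* -o $@ $<

# Combining the per-architecture cubins back into a single fat object
# is the toolkit-specific part; take the exact fatbinary and host
# compile/link commands from `nvcc --dryrun` output.
kernel.o: $(CUBINS)
	@echo "run the fatbinary/host-link steps shown by nvcc --dryrun"
```

This keeps each architecture's code generation in its own make job, which is essentially what a built-in parallel mode in nvcc would do internally.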

Njuffa: Bug 1504822 submitted as an enhancement request.

Tera: Thanks for the idea, but too ugly for my production environment.