CUDA on Linux. I have a CUDA file with a very large function. Compiling that CUDA code dominates my build time, even when I run "make -j 8" so that make can run eight g++ compiles at a time for the rest of my program. I set up nvcc with the standard flags
to support multiple GPU architectures, but this results in a single nvcc compile, which then generates the code for these four architectures one at a time. Is there any way to tell nvcc to run these four code generations in parallel, given that I have enough CPU cores available?
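For reference, "standard flags" here means the usual per-architecture -gencode list, along these lines (the compute capabilities shown are illustrative placeholders, not the actual ones from my build):

```sh
# Hypothetical example: one nvcc invocation building a fat binary for four
# architectures. Substitute your own compute capabilities and file names.
nvcc -c kernel.cu -o kernel.o \
     -gencode arch=compute_35,code=sm_35 \
     -gencode arch=compute_50,code=sm_50 \
     -gencode arch=compute_60,code=sm_60 \
     -gencode arch=compute_70,code=sm_70
# Each -gencode pair adds a separate device-code generation pass,
# and nvcc runs these passes one after another.
```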
As far as I know, there is no way to parallelize the building of fat binaries within a single nvcc invocation, with nvcc assigning the build for each target architecture to a different thread. It strikes me as an excellent suggestion, though.
I would suggest filing an enhancement request via the bug reporting form linked from the registered developer website. Please prefix the synopsis with “RFE:” so it is readily recognizable as a request for enhancement rather than a true bug. Thanks!
In the meantime, you should be able to parallelize the build manually by looking at the output of nvcc --verbose or nvcc --dryrun and invoking those commands directly yourself instead of going through nvcc.
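A rough sketch of that approach, assuming the --dryrun output has been captured (the file names and architectures below are placeholders; the exact internal commands, environment variables, and temporary file names vary by CUDA version, so copy what nvcc --dryrun actually prints for your build):

```sh
# Capture the commands nvcc would run, without executing them.
# nvcc prints the command list on stderr.
nvcc --dryrun -c kernel.cu -o kernel.o \
     -gencode arch=compute_50,code=sm_50 \
     -gencode arch=compute_60,code=sm_60 2> dryrun.txt

# The front-end steps (cudafe++, cicc) emit one PTX file per architecture.
# The expensive per-architecture ptxas steps can then be launched
# concurrently with the shell's background operator:
ptxas -arch=sm_50 -O3 kernel.compute_50.ptx -o kernel.sm_50.cubin &
ptxas -arch=sm_60 -O3 kernel.compute_60.ptx -o kernel.sm_60.cubin &
wait   # block until all background ptxas jobs finish

# Then continue with the remaining commands from dryrun.txt (fatbinary
# bundling and the host-side compile) exactly as nvcc printed them.
```

The key observation is that the ptxas invocations for different architectures are independent of one another, so once the PTX files exist they can run side by side, limited only by available cores and memory.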