nvcc - build with local card max compute capability

I posted the question here as well:

http://stackoverflow.com/questions/32995996/cuda-nvcc-build-with-local-card-max-compute-capablity

I can tell the CUDA nvcc compiler which compute capability to target, and the default is 2.0: -gencode=arch=compute_20,code="sm_20,compute_20".

I have two computers. One can do compute_20, the other can do compute_30. I am using Visual Studio. Is there a way to tell nvcc to use the maximum capability of the local card? Otherwise, I would need a separate project (.vcxproj) on each computer, with the max compute capability specified manually, which isn’t ideal.

Take a look at the CUDA sample projects. They demonstrate how to do it.

I see that in the samples they set the arch to everything:

compute_11,sm_11;compute_20,sm_20;compute_30,sm_30;compute_35,sm_35;compute_37,sm_37;compute_50,sm_50;

So the code would be compiled for all of those options. Is the driver smart enough to use the highest one?
Is there a way to verify which code is picked at runtime?

The driver is smart enough to pick the best one for your device (not necessarily the “highest one”). If you want to learn more about it, you could read the relevant sections of the nvcc manual, which describe the fatbinary system.
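One way to see what was selected is to ask the runtime. The following is only a minimal sketch, assuming a single GPU at device index 0 and a hypothetical empty kernel named probeKernel that exists purely so there is something to inspect. It prints the card's compute capability via cudaGetDeviceProperties, then the binary and PTX versions that cudaFuncGetAttributes reports for the kernel as loaded on that device.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical empty kernel; it exists only so there is a function to inspect.
__global__ void probeKernel() {}

int main() {
    const int device = 0;  // assumption: the GPU of interest is device 0

    // Query the compute capability of the local card.
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceProperties: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("Device %d (%s): compute capability %d.%d\n",
                device, prop.name, prop.major, prop.minor);

    err = cudaSetDevice(device);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaSetDevice: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Ask which compiled variant of the kernel is in use on this device.
    // Both fields are encoded as major * 10 + minor (for example, 30 for 3.0).
    cudaFuncAttributes attr;
    err = cudaFuncGetAttributes(&attr, probeKernel);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaFuncGetAttributes: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("probeKernel: binaryVersion = %d, ptxVersion = %d\n",
                attr.binaryVersion, attr.ptxVersion);
    return 0;
}
```

Building the same source with the multi-architecture settings from the samples and running it on each machine should show a different binaryVersion per card if the selection works as described.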

If you wanted to verify at runtime, it would be fairly tedious, but you could create separate code paths based on the __CUDA_ARCH__ macro, which is also discussed in the nvcc manual.
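A rough sketch of that idea is below. Rather than writing one path per architecture, the kernel simply stores the value of the macro so the host can print it. The kernel name reportArch and the variable names are made up for illustration. Note that the macro reflects the virtual architecture (compute_XX) the running device code was compiled for, so code JIT-compiled from compute_20 PTX would still report 200 on a newer card.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: records the architecture this device code was compiled for.
// __CUDA_ARCH__ is only defined during device compilation, e.g. 300 for compute_30.
__global__ void reportArch(int *arch) {
#ifdef __CUDA_ARCH__
    *arch = __CUDA_ARCH__;
#endif
}

int main() {
    int hostArch = 0;      // stays 0 if the kernel never ran
    int *devArch = NULL;

    cudaError_t err = cudaMalloc(&devArch, sizeof(int));
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaMemcpy(devArch, &hostArch, sizeof(int), cudaMemcpyHostToDevice);

    reportArch<<<1, 1>>>(devArch);

    // Launch errors (for example, no compatible code in the fatbinary) show up here.
    err = cudaGetLastError();
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();
    }
    if (err != cudaSuccess) {
        std::fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(devArch);
        return 1;
    }

    cudaMemcpy(&hostArch, devArch, sizeof(int), cudaMemcpyDeviceToHost);
    std::printf("Kernel was compiled for __CUDA_ARCH__ = %d\n", hostArch);

    cudaFree(devArch);
    return 0;
}
```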

http://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#abstract

Thanks