Is it safe to use CUBIN objects created with CUDA 4.0 on a system with CUDA 4.1?

My application uses CUDA kernels for the bulk of the computations. For good reasons (out of scope of this question) I use a shared object/linking model to dynamically load the object files, each of which contains one host function and one CUDA kernel. Since kernels cannot be declared extern, the basic structure of such a “kernel” is:

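// Device code: the actual computation.
__global__ void kernel() { ...code... }

// Host wrapper with C linkage so it can be found by name after dlopen().
extern "C" void call_kernel() {

  kernel<<<GRID,BLOCK,SHMEM>>>();

}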

I use a host function whose sole purpose is to call the kernel. To build the shared object I use:

nvcc -arch=sm_20 -m64 --compiler-options -fPIC,-shared -link -o kernel0.o kernel0.cu

The whole app uses lots of these kernels and they are loaded with dlopen(). The whole thing works fine if everything (building/loading/executing) stays on one machine A.
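For illustration, the loading side looks roughly like the sketch below. This is not my actual loader: `load_and_run` and its return codes are made-up names for this post, and the only thing taken from the real setup is that each shared object exports a `call_kernel` wrapper with C linkage.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Signature of the exported wrapper: extern "C" void call_kernel(). */
typedef void (*call_kernel_fn)(void);

/* Hypothetical helper: loads one shared object and runs its wrapper.
   Returns 0 on success, -1 if dlopen() fails, -2 if the symbol is missing. */
int load_and_run(const char *path)
{
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return -1;
    }

    dlerror(); /* clear any stale error state before dlsym() */
    void *sym = dlsym(handle, "call_kernel");
    if (sym == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "dlsym failed: %s\n", err ? err : "symbol is NULL");
        dlclose(handle);
        return -2;
    }

    /* POSIX-style conversion from the object pointer returned by dlsym()
       to a function pointer. */
    call_kernel_fn fn;
    *(void **)(&fn) = sym;
    fn(); /* launches the kernel through the host wrapper */

    dlclose(handle);
    return 0;
}
```

Calling `load_and_run("./kernel0.o")` would then load the object built with the command above and launch its kernel. On older glibc versions the program has to be linked with `-ldl`.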

But when I build the shared objects on, say, machine B (CUDA 4.1, NVIDIA C2050) and dlopen them later on machine A (CUDA 4.0, GTX 480), the computation does not yield the same result as when the shared objects are also built on machine A.

That sounds odd to me. Isn’t there a CUBIN object embedded in the .o file which contains instructions that are independent of the particular GPU architecture?

I know that it is advised to use the same compiler version for building and linking. Again, I have good reasons not to build the shared objects on the machine where they are executed.

Cross listed

Hi,
I can't say much about compatibility in your case, but I wanted to draw your attention to the fact that, starting with CUDA 4.1, the GPU compiler for compute capability 2.0 and later is based on LLVM. Your differences in results may come from this change of compiler rather than from mixing objects from different versions. I would encourage you to test your whole application in the 4.1 environment and compare the results with the ones you get from the 4.0 environment.
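When comparing the two environments, keep in mind that two compiler versions can legitimately produce floating-point results that differ in the last few bits, so an exact comparison may flag harmless differences. A minimal sketch of a tolerance-based check is below. `count_mismatches` and both tolerance parameters are names I made up for illustration, it assumes the results have been copied back to host arrays of `float`, and suitable tolerances depend on your computation.

```c
#include <stddef.h>

/* Absolute value without pulling in libm. */
static float abs_f(float x) { return x < 0.0f ? -x : x; }

/* Counts elements where a[i] and b[i] differ by more than
   abs_tol + rel_tol * max(|a[i]|, |b[i]|).
   NaN or infinite entries are counted as mismatches. */
size_t count_mismatches(const float *a, const float *b, size_t n,
                        float rel_tol, float abs_tol)
{
    size_t mismatches = 0;
    for (size_t i = 0; i < n; ++i) {
        float diff = abs_f(a[i] - b[i]);
        float mag_a = abs_f(a[i]);
        float mag_b = abs_f(b[i]);
        float scale = mag_a > mag_b ? mag_a : mag_b;
        /* Negated form: a NaN diff fails the <= test and is counted. */
        if (!(diff <= abs_tol + rel_tol * scale)) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

If the count is zero for a sensible tolerance, the differences are probably just rounding. If it is large, or the values are wildly off, that points to a real incompatibility rather than a compiler change.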