Thrust seems to cause a problem when performing multiple device links for a single host executable. I’m not sure whether this is a CUDA or a Thrust problem, so I’m posting it here.
What I’m doing roughly follows the scheme given in slide 12 of the NVIDIA presentation http://on-demand.gputechconf.com/gtc-express/2012/presentations/gpu-object-linking.pdf. Each intermediate device link creates the same symbol in the resulting object files, so when they’re linked together, a multiple definition error results. Here’s the code:
a.cu:
#include <thrust/device_vector.h>
void funcA(int n) {
    thrust::device_vector<float> devVec;
    devVec.resize(n);
}
b.cu:
#include <thrust/device_vector.h>
void funcB(int n) {
    thrust::device_vector<double> devVec;
    devVec.resize(n);
}
main.cxx:
void funcA(int n);
void funcB(int n);
int main(int argc, char** argv) {
    funcA(10);
    funcB(20);
    return 0;
}
And the build script, build.sh:
#!/bin/bash
ARCH_FLAGS="--generate-code arch=compute_61,code=sm_61"
nvcc $ARCH_FLAGS -dc a.cu
nvcc $ARCH_FLAGS a.o -dlink -o linkA.o
nvcc $ARCH_FLAGS -dc b.cu
nvcc $ARCH_FLAGS b.o -dlink -o linkB.o
g++ -c -o main.o main.cxx
echo "=== Main link ==="
g++ main.o -o main a.o b.o linkA.o linkB.o \
    -L/usr/local/cuda-8.0/lib64 -lcudart -lcudadevrt
The build output is as follows:
ubuntu16$ ./build.sh
=== Main link ===
linkB.o: In function `__cudaRegisterLinkedBinary_66_tmpxft_00001a84_00000000_17_cuda_device_runtime_compute_61_cpp1_ii_8b1a5d37':
link.stub:(.text+0x5b): multiple definition of `__cudaRegisterLinkedBinary_66_tmpxft_00001a84_00000000_17_cuda_device_runtime_compute_61_cpp1_ii_8b1a5d37'
linkA.o:link.stub:(.text+0x5b): first defined here
collect2: error: ld returned 1 exit status
Using nm, I’ve verified that the offending symbol __cudaRegisterLinkedBinary… is defined in both linkA.o and linkB.o. If the source .cu files don’t contain Thrust code, but instead contain only kernel or device function definitions, that symbol is not produced, and the link succeeds.
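For anyone who wants to repeat the nm check, it can be scripted roughly as follows. This is only a sketch: the helper name dup_text_symbols is made up for illustration, and it assumes the default GNU nm output format (address, type letter, symbol name) and that the clashing symbol appears with type T, which is what the (.text+0x5b) in the linker error suggests.

```shell
#!/bin/bash
# Hypothetical helper (not part of CUDA or binutils): reads nm output on
# stdin and prints every symbol that is defined in a text section (type T)
# more than once across the listed object files.
dup_text_symbols() {
    # Defined symbols have three fields: address, type, name.
    # Undefined ones ("U name") and per-file headers ("linkA.o:") are skipped
    # because their second field is not "T".
    awk '$2 == "T" { count[$3]++ }
         END { for (s in count) if (count[s] > 1) print s }' | sort
}

# Intended use, assuming linkA.o and linkB.o from build.sh are present:
#   nm linkA.o linkB.o | dup_text_symbols
```

Any symbol it prints is defined by more than one of the listed objects and would trigger the same multiple definition error at the final link.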
Is there some limitation of separable compilation I’m missing here?
This is with a GT 1030, CUDA 8.0.61, and Ubuntu 16.04. I’ve reproduced the problem with both gcc 4.9.3 and 5.4.0. Thanks in advance for any help.