Link failure with Thrust and separate compilation

Thrust seems to cause a problem when performing multiple device links for a single host executable. I’m not sure whether this is a CUDA problem or a Thrust problem, so I’m posting it here.

What I’m doing roughly follows the scheme given in slide 12 of the NVIDIA presentation http://on-demand.gputechconf.com/gtc-express/2012/presentations/gpu-object-linking.pdf. Each intermediate device link creates the same symbol in the resulting object files, so when they’re linked together, a multiple definition error results. Here’s the code:

a.cu:

#include <thrust/device_vector.h>
void funcA(int n) {
  thrust::device_vector<float> devVec;
  devVec.resize(n);
}

b.cu:

#include <thrust/device_vector.h>
void funcB(int n) {
  thrust::device_vector<double> devVec;
  devVec.resize(n);
}

main.cxx:

void funcA(int n);
void funcB(int n);

int main(int argc, char** argv) {
  funcA(10);
  funcB(20);
  return 0;
}

And the build script, build.sh:

#!/bin/bash

ARCH_FLAGS="--generate-code arch=compute_61,code=sm_61"

nvcc $ARCH_FLAGS -dc a.cu
nvcc $ARCH_FLAGS a.o -dlink -o linkA.o

nvcc $ARCH_FLAGS -dc b.cu
nvcc $ARCH_FLAGS b.o -dlink -o linkB.o

g++ -c -o main.o main.cxx
echo "=== Main link ==="
g++ main.o -o main a.o b.o linkA.o linkB.o \
    -L/usr/local/cuda-8.0/lib64 -lcudart -lcudadevrt

The build output is as follows:

ubuntu16$ ./build.sh 
=== Main link ===
linkB.o: In function `__cudaRegisterLinkedBinary_66_tmpxft_00001a84_00000000_17_cuda_device_runtime_compute_61_cpp1_ii_8b1a5d37':
link.stub:(.text+0x5b): multiple definition of `__cudaRegisterLinkedBinary_66_tmpxft_00001a84_00000000_17_cuda_device_runtime_compute_61_cpp1_ii_8b1a5d37'
linkA.o:link.stub:(.text+0x5b): first defined here
collect2: error: ld returned 1 exit status

Using nm, I’ve verified that the offending symbol __cudaRegisterLinkedBinary… is defined in both linkA.o and linkB.o. If the source .cu files don’t contain Thrust code, but instead have kernel or device function definitions, that symbol is not produced, and the link succeeds.
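For anyone who wants to repeat the nm check, something along these lines should work. It is a minimal sketch, not the exact command used above: find_duplicate_symbols is a hypothetical helper name, and it assumes GNU nm's `-A --defined-only` output format (file:address, type letter, symbol name).

```shell
# Hypothetical helper: reads `nm -A --defined-only` output on stdin and prints
# every strong global symbol (type T, D, B, or R) defined in more than one file.
find_duplicate_symbols() {
  awk '
    $2 ~ /^[TDBR]$/ {
      split($1, parts, ":")          # parts[1] is the object file name
      count[$3]++
      files[$3] = files[$3] " " parts[1]
    }
    END {
      for (sym in count)
        if (count[sym] > 1)
          print sym ":" files[sym]
    }
  '
}

# Usage against the intermediate link objects from build.sh:
#   nm -A --defined-only linkA.o linkB.o | find_duplicate_symbols
```

Local symbols (lowercase type letters) and weak symbols are skipped on purpose, since they do not trigger a multiple definition error at link time.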

Is there some limitation of separable compilation I’m missing here?

This is with a GT 1030, CUDA 8.0.61, and Ubuntu 16.04. I’ve reproduced the problem with both gcc 4.9.3 and 5.4.0. Thanks in advance for any help.

I’ve reproduced similar behavior with CUDA 9.1.

It looks like a defect of some sort to me.

I would suggest filing a bug at developer.nvidia.com. In the bug description, you can just include a link back to this posting.

As a workaround (I’m sure this occurred to you already), if you combine the device-link operations that involve Thrust code into a single step:

nvcc $ARCH_FLAGS a.o b.o -dlink -o link.o

g++ -c -o main.o main.cxx
echo "=== Main link ==="
g++ main.o -o main a.o b.o link.o \
    -L/usr/local/cuda-8.0/lib64 -lcudart -lcudadevrt

it seems to work around the issue for me.
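With more than two source files, the same single device link can be scripted by collecting the object names first. This is only a sketch: cu_to_objs is a hypothetical helper, and it assumes each .cu file is compiled with `nvcc -dc` into a same-named .o in the current directory, as in the build script above.

```shell
# Hypothetical helper: map each .cu source name to the .o that `nvcc -dc`
# writes for it in the current directory (a.cu -> a.o, src/b.cu -> b.o).
cu_to_objs() {
  for src in "$@"; do
    base=${src##*/}                  # strip any leading directory
    printf '%s\n' "${base%.cu}.o"    # swap the .cu suffix for .o
  done
}

# Sketch of the combined device link (commented out: needs nvcc and the sources):
#   for src in a.cu b.cu; do nvcc $ARCH_FLAGS -dc "$src"; done
#   nvcc $ARCH_FLAGS $(cu_to_objs a.cu b.cu) -dlink -o link.o
```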

Thanks txbob. I’ve filed the bug.

Yes, that workaround worked. Of course, this is a reduced example from a much larger code project that might benefit from multiple device links.

I also noticed in CUDA 9.1 that I get a “successful” link (albeit with warnings) if I just omit the $ARCH_FLAGS from the device link steps. The CUDA 9.1 problem I had was slightly different from yours, so I’m not sure what you’ll see with CUDA 8.

Yep, I get a successful link with CUDA 8 when I remove $ARCH_FLAGS from the device link steps. Here’s the output:

ubuntu16$ ./build.sh 
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
nvlink warning : SM Arch ('sm_20') not found in 'a.o'
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
nvlink warning : SM Arch ('sm_20') not found in 'b.o'
=== Main link ===

I verified that the offending symbol isn’t in the intermediate link objects. I’m not clear on what the linker does with the architecture specifications, but that seems to have something to do with it.