Linker error building CUDA example file for dynamic parallelism

Hi,

I’m using Ubuntu Linux (16.04.2 LTS, Xenial) with kernel 4.4.0-64-generic and CUDA 7.5.
Ordinary CUDA code builds and runs, but code that uses dynamic parallelism fails to link
against the device runtime, although every library is present.

This is the example “testNVIDIA.cu” file I’m using (from the NVIDIA reference site):

#include <stdio.h> 

__global__ void childKernel() 
{ 
    printf("Hello "); 
} 

__global__ void parentKernel() 
{ 
    // launch child 
    childKernel<<<1,1>>>(); 
    if (cudaSuccess != cudaGetLastError()) { 
        return; 
    }

    // wait for child to complete 
    if (cudaSuccess != cudaDeviceSynchronize()) { 
        return; 
    } 

    printf("World!\n"); 
} 

int main(int argc, char *argv[]) 
{ 
    // launch parent 
    parentKernel<<<1,1>>>(); 
    if (cudaSuccess != cudaGetLastError()) { 
        return 1; 
    } 

    // wait for parent to complete 
    if (cudaSuccess != cudaDeviceSynchronize()) { 
        return 2; 
    } 

    return 0; 
}

Every combination of compile and link options I tried failed. The linker evidently cannot find some of the device runtime functions.

A sanity check of the libraries with ld -lcudadevrt --verbose
suggested that the environment is healthy.

The compilation and linking output is this:

nvcc -arch=compute_53 -code=sm_53 --std=c++11 -rdc=true testNVIDIA.cu --library-path=/usr/local/cuda-7.5/lib64 -lcudadevrt --verbose -lcudart

#$ _SPACE_= 
#$ _CUDART_=cudart
#$ _HERE_=/usr/lib/nvidia-cuda-toolkit/bin
#$ _THERE_=/usr/lib/nvidia-cuda-toolkit/bin
#$ _TARGET_SIZE_=
#$ _TARGET_DIR_=
#$ _TARGET_SIZE_=64
#$ NVVMIR_LIBRARY_DIR=/usr/lib/nvidia-cuda-toolkit/libdevice
#$ PATH=/usr/lib/nvidia-cuda-toolkit/bin:/home/andreas/software/circos/current/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/local/cuda/bin
#$ LIBRARIES=  -L/usr/lib/x86_64-linux-gnu/stubs
#$ gcc -std=c++11 -D__CUDA_ARCH__=530 -E -x c++ -DCUDA_DOUBLE_MATH_FUNCTIONS  -D__CUDACC__ -D__NVCC__ -D__CUDACC_RDC__  -D"__CUDACC_VER__=70517" -D"__CUDACC_VER_BUILD__=17" -D"__CUDACC_VER_MINOR__=5" -D"__CUDACC_VER_MAJOR__=7" -include "cuda_runtime.h" -m64 "testNVIDIA.cu" > "/tmp/tmpxft_00000b49_00000000-9_testNVIDIA.cpp1.ii" 
#$ cudafe --allow_managed --m64 --gnu_version=50400 --c++11 -tused --no_remove_unneeded_entities --device-c --gen_c_file_name "/tmp/tmpxft_00000b49_00000000-4_testNVIDIA.cudafe1.c" --stub_file_name "/tmp/tmpxft_00000b49_00000000-4_testNVIDIA.cudafe1.stub.c" --gen_device_file_name "/tmp/tmpxft_00000b49_00000000-4_testNVIDIA.cudafe1.gpu" --nv_arch "compute_53" --gen_module_id_file --module_id_file_name "/tmp/tmpxft_00000b49_00000000-3_testNVIDIA.module_id" --include_file_name "tmpxft_00000b49_00000000-2_testNVIDIA.fatbin.c" "/tmp/tmpxft_00000b49_00000000-9_testNVIDIA.cpp1.ii" 
#$ gcc -D__CUDA_ARCH__=530 -E -x c -DCUDA_DOUBLE_MATH_FUNCTIONS  -D__CUDACC__ -D__NVCC__ -D__CUDACC_RDC__ -D__CUDANVVM__  -D__CUDA_PREC_DIV -D__CUDA_PREC_SQRT -m64 "/tmp/tmpxft_00000b49_00000000-4_testNVIDIA.cudafe1.gpu" > "/tmp/tmpxft_00000b49_00000000-10_testNVIDIA.cpp2.i" 
#$ cudafe -w --allow_managed --m64 --gnu_version=50400 --c --device-c --gen_c_file_name "/tmp/tmpxft_00000b49_00000000-11_testNVIDIA.cudafe2.c" --stub_file_name "/tmp/tmpxft_00000b49_00000000-11_testNVIDIA.cudafe2.stub.c" --gen_device_file_name "/tmp/tmpxft_00000b49_00000000-11_testNVIDIA.cudafe2.gpu" --nv_arch "compute_53" --module_id_file_name "/tmp/tmpxft_00000b49_00000000-3_testNVIDIA.module_id" --include_file_name "tmpxft_00000b49_00000000-2_testNVIDIA.fatbin.c" "/tmp/tmpxft_00000b49_00000000-10_testNVIDIA.cpp2.i" 
#$ gcc -D__CUDA_ARCH__=530 -E -x c -DCUDA_DOUBLE_MATH_FUNCTIONS  -D__CUDABE__ -D__CUDANVVM__  -D__CUDA_PREC_DIV -D__CUDA_PREC_SQRT -m64 "/tmp/tmpxft_00000b49_00000000-11_testNVIDIA.cudafe2.gpu" > "/tmp/tmpxft_00000b49_00000000-12_testNVIDIA.cpp3.i" 
#$ filehash -s "--compile-only " "/tmp/tmpxft_00000b49_00000000-12_testNVIDIA.cpp3.i" > "/tmp/tmpxft_00000b49_00000000-13_testNVIDIA.hash"
#$ gcc -std=c++11 -E -x c++ -D__CUDACC__ -D__NVCC__ -D__CUDACC_RDC__  -D"__CUDACC_VER__=70517" -D"__CUDACC_VER_BUILD__=17" -D"__CUDACC_VER_MINOR__=5" -D"__CUDACC_VER_MAJOR__=7" -include "cuda_runtime.h" -m64 "testNVIDIA.cu" > "/tmp/tmpxft_00000b49_00000000-5_testNVIDIA.cpp4.ii" 
#$ cudafe++ --allow_managed --m64 --gnu_version=50400 --c++11 --parse_templates --device-c --gen_c_file_name "/tmp/tmpxft_00000b49_00000000-4_testNVIDIA.cudafe1.cpp" --stub_file_name "tmpxft_00000b49_00000000-4_testNVIDIA.cudafe1.stub.c" --module_id_file_name "/tmp/tmpxft_00000b49_00000000-3_testNVIDIA.module_id" "/tmp/tmpxft_00000b49_00000000-5_testNVIDIA.cpp4.ii" 
#$ cicc  -arch compute_53 -m64 -ftz=0 -prec_div=1 -prec_sqrt=1 -fmad=1 -nvvmir-library "/usr/lib/nvidia-cuda-toolkit/libdevice/libdevice.compute_30.10.bc"  --device-c --orig_src_file_name "testNVIDIA.cu"  "/tmp/tmpxft_00000b49_00000000-12_testNVIDIA.cpp3.i" -o "/tmp/tmpxft_00000b49_00000000-6_testNVIDIA.ptx"
#$ ptxas  -arch=sm_53 -m64 --compile-only "/tmp/tmpxft_00000b49_00000000-6_testNVIDIA.ptx"  -o "/tmp/tmpxft_00000b49_00000000-15_testNVIDIA.cubin" 
#$ fatbinary --create="/tmp/tmpxft_00000b49_00000000-2_testNVIDIA.fatbin" -64 --key="cdd232fc63f58489" --cmdline="--compile-only " "--image=profile=sm_53,file=/tmp/tmpxft_00000b49_00000000-15_testNVIDIA.cubin" --embedded-fatbin="/tmp/tmpxft_00000b49_00000000-2_testNVIDIA.fatbin.c" --cuda --device-c
#$ rm /tmp/tmpxft_00000b49_00000000-2_testNVIDIA.fatbin
#$ gcc -std=c++11 -D__CUDA_ARCH__=530 -E -x c++ -DCUDA_DOUBLE_MATH_FUNCTIONS   -D__CUDA_PREC_DIV -D__CUDA_PREC_SQRT -m64 "/tmp/tmpxft_00000b49_00000000-4_testNVIDIA.cudafe1.cpp" > "/tmp/tmpxft_00000b49_00000000-16_testNVIDIA.ii"        
#$ gcc -std=c++11 -c -x c++ -fpreprocessed -m64 -o "/tmp/tmpxft_00000b49_00000000-17_testNVIDIA.o" "/tmp/tmpxft_00000b49_00000000-16_testNVIDIA.ii"                                                                                         
#$ nvlink --arch=sm_53 --register-link-binaries="/tmp/tmpxft_00000b49_00000000-7_a_dlink.reg.c" -m64 -L"/usr/local/cuda-7.5/lib64" -lcudadevrt -lcudart   -L/usr/lib/x86_64-linux-gnu/stubs -cpu-arch=X86_64 "/tmp/tmpxft_00000b49_00000000-17_testNVIDIA.o"  -lcudadevrt  -o "/tmp/tmpxft_00000b49_00000000-18_a_dlink.cubin"                                                                                                                                                          
nvlink error   : Undefined reference to 'cudaGetParameterBufferV2' in '/tmp/tmpxft_00000b49_00000000-17_testNVIDIA.o'                                                                                                                       
nvlink error   : Undefined reference to 'cudaLaunchDeviceV2' in '/tmp/tmpxft_00000b49_00000000-17_testNVIDIA.o'                                                                                                                             
nvlink error   : Undefined reference to 'cudaGetLastError' in '/tmp/tmpxft_00000b49_00000000-17_testNVIDIA.o'                                                                                                                               
nvlink error   : Undefined reference to 'cudaDeviceSynchronize' in '/tmp/tmpxft_00000b49_00000000-17_testNVIDIA.o'                                                                                                                          
# --error 0xff --

When I inspect the library libcudadevrt.a, the symbols the linker is missing are either absent or not visible with nm.

nm /usr/local/cuda-7.5/lib64/libcudadevrt.a printed:

cuda_device_runtime.o:

0000000000000010 B __CNPRT_VERSION_NUMBER__
00000000000000a0 b cpy_kernel32
00000000000000e0 b cpy_kernel64
                 U cudaLaunch
                 U __cudaRegisterFunction
                 U __cudaRegisterLinkedBinary_66_tmpxft_00007a5f_00000000_16_cuda_device_runtime_compute_52_cpp1_ii_8b1a5d37
                 U __cudaRegisterVar
0000000000000008 B cudartErrorCnpMap
0000000000000020 r cudartErrorCnpMapArr
0000000000000004 R cudartErrorCnpMapEntryCount
0000000000000000 B cudartErrorTable
0000000000000000 d cudartErrorTableArr
0000000000000000 R cudartErrorTableEntryCount
                 U cudaSetupArgument
0000000000000000 r fatbinData
0000000000000000 D __fatbinwrap_66_tmpxft_00007a5f_00000000_16_cuda_device_runtime_compute_52_cpp1_ii_8b1a5d37
00000000000055a0 t _GLOBAL__I_cudartErrorTable
                 U _GLOBAL_OFFSET_TABLE_
0000000000000000 r .LC0
00000000000000b0 r .LC1
00000000000006e0 r .LC10
0000000000000790 r .LC11
0000000000000840 r .LC12
00000000000008f0 r .LC13
00000000000009a0 r .LC14
0000000000000a50 r .LC15
0000000000000b00 r .LC16
0000000000000ba8 r .LC17
0000000000000c50 r .LC18
0000000000000cf8 r .LC19
0000000000000160 r .LC2
0000000000000da0 r .LC20
0000000000000e48 r .LC21
0000000000000ef0 r .LC22
0000000000000f98 r .LC23
0000000000001040 r .LC24
00000000000010e8 r .LC25
0000000000001190 r .LC26
0000000000001238 r .LC27
00000000000012e0 r .LC28
0000000000001388 r .LC29
0000000000000210 r .LC3
0000000000001430 r .LC30
00000000000014d8 r .LC31
0000000000001580 r .LC32
00000000000015f0 r .LC33
0000000000001660 r .LC34
00000000000016d0 r .LC35
0000000000001740 r .LC36
0000000000000000 r .LC37
0000000000000011 r .LC38
00000000000017b8 r .LC39
00000000000002c0 r .LC4
000000000000002c r .LC40
000000000000003e r .LC41
000000000000005a r .LC42
0000000000000370 r .LC5
0000000000000420 r .LC6
00000000000004d0 r .LC7
0000000000000580 r .LC8
0000000000000630 r .LC9
0000000000000000 r __module_id_str
0000000000000128 b __nv_fatbinhandle_for_managed_rt
0000000000000020 b set_kernel32
0000000000000060 b set_kernel64
0000000000000010 t _Z103__sti____cudaRegisterAll_66_tmpxft_00007a5f_00000000_16_cuda_device_runtime_compute_52_cpp1_ii_8b1a5d37v
0000000000003390 t _Z16memcpy_3d_deviceIjLi0ELi0ELi0EEvPKhPhT_S3_S3_S3_S3_S3_S3_jjjjjjjjS3_S1_S2_
0000000000003110 t _Z16memcpy_3d_deviceIjLi0ELi0ELi1EEvPKhPhT_S3_S3_S3_S3_S3_S3_jjjjjjjjS3_S1_S2_
0000000000002e90 t _Z16memcpy_3d_deviceIjLi0ELi1ELi0EEvPKhPhT_S3_S3_S3_S3_S3_S3_jjjjjjjjS3_S1_S2_
0000000000002c10 t _Z16memcpy_3d_deviceIjLi0ELi1ELi1EEvPKhPhT_S3_S3_S3_S3_S3_S3_jjjjjjjjS3_S1_S2_
0000000000002990 t _Z16memcpy_3d_deviceIjLi1ELi0ELi0EEvPKhPhT_S3_S3_S3_S3_S3_S3_jjjjjjjjS3_S1_S2_
0000000000002710 t _Z16memcpy_3d_deviceIjLi1ELi0ELi1EEvPKhPhT_S3_S3_S3_S3_S3_S3_jjjjjjjjS3_S1_S2_
0000000000002490 t _Z16memcpy_3d_deviceIjLi1ELi1ELi0EEvPKhPhT_S3_S3_S3_S3_S3_S3_jjjjjjjjS3_S1_S2_
0000000000002210 t _Z16memcpy_3d_deviceIjLi1ELi1ELi1EEvPKhPhT_S3_S3_S3_S3_S3_S3_jjjjjjjjS3_S1_S2_
0000000000001f90 t _Z16memcpy_3d_deviceImLi0ELi0ELi0EEvPKhPhT_S3_S3_S3_S3_S3_S3_jjjjjjjjS3_S1_S2_
0000000000001d00 t _Z16memcpy_3d_deviceImLi0ELi0ELi1EEvPKhPhT_S3_S3_S3_S3_S3_S3_jjjjjjjjS3_S1_S2_
0000000000001a70 t _Z16memcpy_3d_deviceImLi0ELi1ELi0EEvPKhPhT_S3_S3_S3_S3_S3_S3_jjjjjjjjS3_S1_S2_
00000000000017e0 t _Z16memcpy_3d_deviceImLi0ELi1ELi1EEvPKhPhT_S3_S3_S3_S3_S3_S3_jjjjjjjjS3_S1_S2_
0000000000001550 t _Z16memcpy_3d_deviceImLi1ELi0ELi0EEvPKhPhT_S3_S3_S3_S3_S3_S3_jjjjjjjjS3_S1_S2_
00000000000012c0 t _Z16memcpy_3d_deviceImLi1ELi0ELi1EEvPKhPhT_S3_S3_S3_S3_S3_S3_jjjjjjjjS3_S1_S2_
0000000000001030 t _Z16memcpy_3d_deviceImLi1ELi1ELi0EEvPKhPhT_S3_S3_S3_S3_S3_S3_jjjjjjjjS3_S1_S2_
0000000000000da0 t _Z16memcpy_3d_deviceImLi1ELi1ELi1EEvPKhPhT_S3_S3_S3_S3_S3_S3_jjjjjjjjS3_S1_S2_
0000000000005590 t _Z16memset_3d_deviceIjLi0ELi0ELi0EEvPhhjT_S1_S1_S1_S1_jjjjjjjS1_S0_
0000000000005370 t _Z16memset_3d_deviceIjLi0ELi0ELi1EEvPhhjT_S1_S1_S1_S1_jjjjjjjS1_S0_
0000000000005150 t _Z16memset_3d_deviceIjLi0ELi1ELi0EEvPhhjT_S1_S1_S1_S1_jjjjjjjS1_S0_
0000000000004f30 t _Z16memset_3d_deviceIjLi0ELi1ELi1EEvPhhjT_S1_S1_S1_S1_jjjjjjjS1_S0_
0000000000004d10 t _Z16memset_3d_deviceIjLi1ELi0ELi0EEvPhhjT_S1_S1_S1_S1_jjjjjjjS1_S0_
0000000000004af0 t _Z16memset_3d_deviceIjLi1ELi0ELi1EEvPhhjT_S1_S1_S1_S1_jjjjjjjS1_S0_
00000000000048d0 t _Z16memset_3d_deviceIjLi1ELi1ELi0EEvPhhjT_S1_S1_S1_S1_jjjjjjjS1_S0_
00000000000046b0 t _Z16memset_3d_deviceIjLi1ELi1ELi1EEvPhhjT_S1_S1_S1_S1_jjjjjjjS1_S0_
0000000000004490 t _Z16memset_3d_deviceImLi0ELi0ELi0EEvPhhjT_S1_S1_S1_S1_jjjjjjjS1_S0_
0000000000004270 t _Z16memset_3d_deviceImLi0ELi0ELi1EEvPhhjT_S1_S1_S1_S1_jjjjjjjS1_S0_
0000000000004050 t _Z16memset_3d_deviceImLi0ELi1ELi0EEvPhhjT_S1_S1_S1_S1_jjjjjjjS1_S0_
0000000000003e30 t _Z16memset_3d_deviceImLi0ELi1ELi1EEvPhhjT_S1_S1_S1_S1_jjjjjjjS1_S0_
0000000000003c10 t _Z16memset_3d_deviceImLi1ELi0ELi0EEvPhhjT_S1_S1_S1_S1_jjjjjjjS1_S0_
00000000000039f0 t _Z16memset_3d_deviceImLi1ELi0ELi1EEvPhhjT_S1_S1_S1_S1_jjjjjjjS1_S0_
00000000000037d0 t _Z16memset_3d_deviceImLi1ELi1ELi0EEvPhhjT_S1_S1_S1_S1_jjjjjjjS1_S0_
00000000000035b0 t _Z16memset_3d_deviceImLi1ELi1ELi1EEvPhhjT_S1_S1_S1_S1_jjjjjjjS1_S0_
0000000000000000 t _Z22____nv_dummy_param_refPv
0000000000000040 t _Z31__nv_cudaEntityRegisterCallbackPPv
0000000000005380 t _Z81__device_stub__Z16memset_3d_deviceIjLi0ELi0ELi0EEvPhhjT_S1_S1_S1_S1_jjjjjjjS1_S0_PhhjjjjjjjjjjjjjjS_
0000000000005160 t _Z81__device_stub__Z16memset_3d_deviceIjLi0ELi0ELi1EEvPhhjT_S1_S1_S1_S1_jjjjjjjS1_S0_PhhjjjjjjjjjjjjjjS_
0000000000004f40 t _Z81__device_stub__Z16memset_3d_deviceIjLi0ELi1ELi0EEvPhhjT_S1_S1_S1_S1_jjjjjjjS1_S0_PhhjjjjjjjjjjjjjjS_
0000000000004d20 t _Z81__device_stub__Z16memset_3d_deviceIjLi0ELi1ELi1EEvPhhjT_S1_S1_S1_S1_jjjjjjjS1_S0_PhhjjjjjjjjjjjjjjS_
0000000000004b00 t _Z81__device_stub__Z16memset_3d_deviceIjLi1ELi0ELi0EEvPhhjT_S1_S1_S1_S1_jjjjjjjS1_S0_PhhjjjjjjjjjjjjjjS_
00000000000048e0 t _Z81__device_stub__Z16memset_3d_deviceIjLi1ELi0ELi1EEvPhhjT_S1_S1_S1_S1_jjjjjjjS1_S0_PhhjjjjjjjjjjjjjjS_
00000000000046c0 t _Z81__device_stub__Z16memset_3d_deviceIjLi1ELi1ELi0EEvPhhjT_S1_S1_S1_S1_jjjjjjjS1_S0_PhhjjjjjjjjjjjjjjS_
00000000000044a0 t _Z81__device_stub__Z16memset_3d_deviceIjLi1ELi1ELi1EEvPhhjT_S1_S1_S1_S1_jjjjjjjS1_S0_PhhjjjjjjjjjjjjjjS_
0000000000004280 t _Z81__device_stub__Z16memset_3d_deviceImLi0ELi0ELi0EEvPhhjT_S1_S1_S1_S1_jjjjjjjS1_S0_PhhjmmmmmjjjjjjjmS_
0000000000004060 t _Z81__device_stub__Z16memset_3d_deviceImLi0ELi0ELi1EEvPhhjT_S1_S1_S1_S1_jjjjjjjS1_S0_PhhjmmmmmjjjjjjjmS_
0000000000003e40 t _Z81__device_stub__Z16memset_3d_deviceImLi0ELi1ELi0EEvPhhjT_S1_S1_S1_S1_jjjjjjjS1_S0_PhhjmmmmmjjjjjjjmS_
0000000000003c20 t _Z81__device_stub__Z16memset_3d_deviceImLi0ELi1ELi1EEvPhhjT_S1_S1_S1_S1_jjjjjjjS1_S0_PhhjmmmmmjjjjjjjmS_
0000000000003a00 t _Z81__device_stub__Z16memset_3d_deviceImLi1ELi0ELi0EEvPhhjT_S1_S1_S1_S1_jjjjjjjS1_S0_PhhjmmmmmjjjjjjjmS_
00000000000037e0 t _Z81__device_stub__Z16memset_3d_deviceImLi1ELi0ELi1EEvPhhjT_S1_S1_S1_S1_jjjjjjjS1_S0_PhhjmmmmmjjjjjjjmS_
00000000000035c0 t _Z81__device_stub__Z16memset_3d_deviceImLi1ELi1ELi0EEvPhhjT_S1_S1_S1_S1_jjjjjjjS1_S0_PhhjmmmmmjjjjjjjmS_
00000000000033a0 t _Z81__device_stub__Z16memset_3d_deviceImLi1ELi1ELi1EEvPhhjT_S1_S1_S1_S1_jjjjjjjS1_S0_PhhjmmmmmjjjjjjjmS_
0000000000003120 t _Z92__device_stub__Z16memcpy_3d_deviceIjLi0ELi0ELi0EEvPKhPhT_S3_S3_S3_S3_S3_S3_jjjjjjjjS3_S1_S2_PKhPhjjjjjjjjjjjjjjjjS0_S1_
0000000000002ea0 t _Z92__device_stub__Z16memcpy_3d_deviceIjLi0ELi0ELi1EEvPKhPhT_S3_S3_S3_S3_S3_S3_jjjjjjjjS3_S1_S2_PKhPhjjjjjjjjjjjjjjjjS0_S1_
0000000000002c20 t _Z92__device_stub__Z16memcpy_3d_deviceIjLi0ELi1ELi0EEvPKhPhT_S3_S3_S3_S3_S3_S3_jjjjjjjjS3_S1_S2_PKhPhjjjjjjjjjjjjjjjjS0_S1_
00000000000029a0 t _Z92__device_stub__Z16memcpy_3d_deviceIjLi0ELi1ELi1EEvPKhPhT_S3_S3_S3_S3_S3_S3_jjjjjjjjS3_S1_S2_PKhPhjjjjjjjjjjjjjjjjS0_S1_
0000000000002720 t _Z92__device_stub__Z16memcpy_3d_deviceIjLi1ELi0ELi0EEvPKhPhT_S3_S3_S3_S3_S3_S3_jjjjjjjjS3_S1_S2_PKhPhjjjjjjjjjjjjjjjjS0_S1_
00000000000024a0 t _Z92__device_stub__Z16memcpy_3d_deviceIjLi1ELi0ELi1EEvPKhPhT_S3_S3_S3_S3_S3_S3_jjjjjjjjS3_S1_S2_PKhPhjjjjjjjjjjjjjjjjS0_S1_
0000000000002220 t _Z92__device_stub__Z16memcpy_3d_deviceIjLi1ELi1ELi0EEvPKhPhT_S3_S3_S3_S3_S3_S3_jjjjjjjjS3_S1_S2_PKhPhjjjjjjjjjjjjjjjjS0_S1_
0000000000001fa0 t _Z92__device_stub__Z16memcpy_3d_deviceIjLi1ELi1ELi1EEvPKhPhT_S3_S3_S3_S3_S3_S3_jjjjjjjjS3_S1_S2_PKhPhjjjjjjjjjjjjjjjjS0_S1_
0000000000001d10 t _Z92__device_stub__Z16memcpy_3d_deviceImLi0ELi0ELi0EEvPKhPhT_S3_S3_S3_S3_S3_S3_jjjjjjjjS3_S1_S2_PKhPhmmmmmmmjjjjjjjjmS0_S1_
0000000000001a80 t _Z92__device_stub__Z16memcpy_3d_deviceImLi0ELi0ELi1EEvPKhPhT_S3_S3_S3_S3_S3_S3_jjjjjjjjS3_S1_S2_PKhPhmmmmmmmjjjjjjjjmS0_S1_
00000000000017f0 t _Z92__device_stub__Z16memcpy_3d_deviceImLi0ELi1ELi0EEvPKhPhT_S3_S3_S3_S3_S3_S3_jjjjjjjjS3_S1_S2_PKhPhmmmmmmmjjjjjjjjmS0_S1_
0000000000001560 t _Z92__device_stub__Z16memcpy_3d_deviceImLi0ELi1ELi1EEvPKhPhT_S3_S3_S3_S3_S3_S3_jjjjjjjjS3_S1_S2_PKhPhmmmmmmmjjjjjjjjmS0_S1_
00000000000012d0 t _Z92__device_stub__Z16memcpy_3d_deviceImLi1ELi0ELi0EEvPKhPhT_S3_S3_S3_S3_S3_S3_jjjjjjjjS3_S1_S2_PKhPhmmmmmmmjjjjjjjjmS0_S1_
0000000000001040 t _Z92__device_stub__Z16memcpy_3d_deviceImLi1ELi0ELi1EEvPKhPhT_S3_S3_S3_S3_S3_S3_jjjjjjjjS3_S1_S2_PKhPhmmmmmmmjjjjjjjjmS0_S1_
0000000000000db0 t _Z92__device_stub__Z16memcpy_3d_deviceImLi1ELi1ELi0EEvPKhPhT_S3_S3_S3_S3_S3_S3_jjjjjjjjS3_S1_S2_PKhPhmmmmmmmjjjjjjjjmS0_S1_
0000000000000b20 t _Z92__device_stub__Z16memcpy_3d_deviceImLi1ELi1ELi1EEvPKhPhT_S3_S3_S3_S3_S3_S3_jjjjjjjjS3_S1_S2_PKhPhmmmmmmmjjjjjjjjmS0_S1_
0000000000000018 b _ZZ22____nv_dummy_param_refPvE5__ref
0000000000000120 b _ZZ31__nv_cudaEntityRegisterCallbackPPvE5__ref
0000000000000130 b _ZZ81__device_stub__Z16memset_3d_deviceIjLi0ELi0ELi0EEvPhhjT_S1_S1_S1_S1_jjjjjjjS1_S0_PhhjjjjjjjjjjjjjjS_E3__f
0000000000000138 b _ZZ81__device_stub__Z16memset_3d_deviceIjLi0ELi0ELi1EEvPhhjT_S1_S1_S1_S1_jjjjjjjS1_S0_PhhjjjjjjjjjjjjjjS_E3__f
0000000000000140 b _ZZ81__device_stub__Z16memset_3d_deviceIjLi0ELi1ELi0EEvPhhjT_S1_S1_S1_S1_jjjjjjjS1_S0_PhhjjjjjjjjjjjjjjS_E3__f
0000000000000148 b _ZZ81__device_stub__Z16memset_3d_deviceIjLi0ELi1ELi1EEvPhhjT_S1_S1_S1_S1_jjjjjjjS1_S0_PhhjjjjjjjjjjjjjjS_E3__f
0000000000000150 b _ZZ81__device_stub__Z16memset_3d_deviceIjLi1ELi0ELi0EEvPhhjT_S1_S1_S1_S1_jjjjjjjS1_S0_PhhjjjjjjjjjjjjjjS_E3__f
0000000000000158 b _ZZ81__device_stub__Z16memset_3d_deviceIjLi1ELi0ELi1EEvPhhjT_S1_S1_S1_S1_jjjjjjjS1_S0_PhhjjjjjjjjjjjjjjS_E3__f
0000000000000160 b _ZZ81__device_stub__Z16memset_3d_deviceIjLi1ELi1ELi0EEvPhhjT_S1_S1_S1_S1_jjjjjjjS1_S0_PhhjjjjjjjjjjjjjjS_E3__f
0000000000000168 b _ZZ81__device_stub__Z16memset_3d_deviceIjLi1ELi1ELi1EEvPhhjT_S1_S1_S1_S1_jjjjjjjS1_S0_PhhjjjjjjjjjjjjjjS_E3__f
0000000000000170 b _ZZ81__device_stub__Z16memset_3d_deviceImLi0ELi0ELi0EEvPhhjT_S1_S1_S1_S1_jjjjjjjS1_S0_PhhjmmmmmjjjjjjjmS_E3__f
0000000000000178 b _ZZ81__device_stub__Z16memset_3d_deviceImLi0ELi0ELi1EEvPhhjT_S1_S1_S1_S1_jjjjjjjS1_S0_PhhjmmmmmjjjjjjjmS_E3__f
0000000000000180 b _ZZ81__device_stub__Z16memset_3d_deviceImLi0ELi1ELi0EEvPhhjT_S1_S1_S1_S1_jjjjjjjS1_S0_PhhjmmmmmjjjjjjjmS_E3__f
0000000000000188 b _ZZ81__device_stub__Z16memset_3d_deviceImLi0ELi1ELi1EEvPhhjT_S1_S1_S1_S1_jjjjjjjS1_S0_PhhjmmmmmjjjjjjjmS_E3__f
0000000000000190 b _ZZ81__device_stub__Z16memset_3d_deviceImLi1ELi0ELi0EEvPhhjT_S1_S1_S1_S1_jjjjjjjS1_S0_PhhjmmmmmjjjjjjjmS_E3__f
0000000000000198 b _ZZ81__device_stub__Z16memset_3d_deviceImLi1ELi0ELi1EEvPhhjT_S1_S1_S1_S1_jjjjjjjS1_S0_PhhjmmmmmjjjjjjjmS_E3__f
00000000000001a0 b _ZZ81__device_stub__Z16memset_3d_deviceImLi1ELi1ELi0EEvPhhjT_S1_S1_S1_S1_jjjjjjjS1_S0_PhhjmmmmmjjjjjjjmS_E3__f
00000000000001a8 b _ZZ81__device_stub__Z16memset_3d_deviceImLi1ELi1ELi1EEvPhhjT_S1_S1_S1_S1_jjjjjjjS1_S0_PhhjmmmmmjjjjjjjmS_E3__f
00000000000001b0 b _ZZ92__device_stub__Z16memcpy_3d_deviceIjLi0ELi0ELi0EEvPKhPhT_S3_S3_S3_S3_S3_S3_jjjjjjjjS3_S1_S2_PKhPhjjjjjjjjjjjjjjjjS0_S1_E3__f
00000000000001b8 b _ZZ92__device_stub__Z16memcpy_3d_deviceIjLi0ELi0ELi1EEvPKhPhT_S3_S3_S3_S3_S3_S3_jjjjjjjjS3_S1_S2_PKhPhjjjjjjjjjjjjjjjjS0_S1_E3__f
00000000000001c0 b _ZZ92__device_stub__Z16memcpy_3d_deviceIjLi0ELi1ELi0EEvPKhPhT_S3_S3_S3_S3_S3_S3_jjjjjjjjS3_S1_S2_PKhPhjjjjjjjjjjjjjjjjS0_S1_E3__f
00000000000001c8 b _ZZ92__device_stub__Z16memcpy_3d_deviceIjLi0ELi1ELi1EEvPKhPhT_S3_S3_S3_S3_S3_S3_jjjjjjjjS3_S1_S2_PKhPhjjjjjjjjjjjjjjjjS0_S1_E3__f
00000000000001d0 b _ZZ92__device_stub__Z16memcpy_3d_deviceIjLi1ELi0ELi0EEvPKhPhT_S3_S3_S3_S3_S3_S3_jjjjjjjjS3_S1_S2_PKhPhjjjjjjjjjjjjjjjjS0_S1_E3__f
00000000000001d8 b _ZZ92__device_stub__Z16memcpy_3d_deviceIjLi1ELi0ELi1EEvPKhPhT_S3_S3_S3_S3_S3_S3_jjjjjjjjS3_S1_S2_PKhPhjjjjjjjjjjjjjjjjS0_S1_E3__f
00000000000001e0 b _ZZ92__device_stub__Z16memcpy_3d_deviceIjLi1ELi1ELi0EEvPKhPhT_S3_S3_S3_S3_S3_S3_jjjjjjjjS3_S1_S2_PKhPhjjjjjjjjjjjjjjjjS0_S1_E3__f
00000000000001e8 b _ZZ92__device_stub__Z16memcpy_3d_deviceIjLi1ELi1ELi1EEvPKhPhT_S3_S3_S3_S3_S3_S3_jjjjjjjjS3_S1_S2_PKhPhjjjjjjjjjjjjjjjjS0_S1_E3__f
00000000000001f0 b _ZZ92__device_stub__Z16memcpy_3d_deviceImLi0ELi0ELi0EEvPKhPhT_S3_S3_S3_S3_S3_S3_jjjjjjjjS3_S1_S2_PKhPhmmmmmmmjjjjjjjjmS0_S1_E3__f
00000000000001f8 b _ZZ92__device_stub__Z16memcpy_3d_deviceImLi0ELi0ELi1EEvPKhPhT_S3_S3_S3_S3_S3_S3_jjjjjjjjS3_S1_S2_PKhPhmmmmmmmjjjjjjjjmS0_S1_E3__f
0000000000000200 b _ZZ92__device_stub__Z16memcpy_3d_deviceImLi0ELi1ELi0EEvPKhPhT_S3_S3_S3_S3_S3_S3_jjjjjjjjS3_S1_S2_PKhPhmmmmmmmjjjjjjjjmS0_S1_E3__f
0000000000000208 b _ZZ92__device_stub__Z16memcpy_3d_deviceImLi0ELi1ELi1EEvPKhPhT_S3_S3_S3_S3_S3_S3_jjjjjjjjS3_S1_S2_PKhPhmmmmmmmjjjjjjjjmS0_S1_E3__f
0000000000000210 b _ZZ92__device_stub__Z16memcpy_3d_deviceImLi1ELi0ELi0EEvPKhPhT_S3_S3_S3_S3_S3_S3_jjjjjjjjS3_S1_S2_PKhPhmmmmmmmjjjjjjjjmS0_S1_E3__f
0000000000000218 b _ZZ92__device_stub__Z16memcpy_3d_deviceImLi1ELi0ELi1EEvPKhPhT_S3_S3_S3_S3_S3_S3_jjjjjjjjS3_S1_S2_PKhPhmmmmmmmjjjjjjjjmS0_S1_E3__f
0000000000000220 b _ZZ92__device_stub__Z16memcpy_3d_deviceImLi1ELi1ELi0EEvPKhPhT_S3_S3_S3_S3_S3_S3_jjjjjjjjS3_S1_S2_PKhPhmmmmmmmjjjjjjjjmS0_S1_E3__f
0000000000000228 b _ZZ92__device_stub__Z16memcpy_3d_deviceImLi1ELi1ELi1EEvPKhPhT_S3_S3_S3_S3_S3_S3_jjjjjjjjS3_S1_S2_PKhPhmmmmmmmjjjjjjjjmS0_S1_E3__f

Does anybody have an idea how to check what is missing for successful linking here?

Thank you in advance,
Andreas
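
A note on the nm listing above: nm reads only the host-side ELF symbol table, so it would not be expected to show device functions such as cudaGetParameterBufferV2 even in a healthy libcudadevrt.a. The device code sits inside the embedded fat binary (the fatbinData and __fatbinwrap_... entries in the listing). Below is a minimal sketch of a check that looks at the device side instead. It assumes cuobjdump from the same toolkit is on the PATH and accepts the archive directly; the helper name missing_symbols is hypothetical.

```shell
# Print each requested symbol name that does NOT occur in the listing on stdin.
# (missing_symbols is a hypothetical helper, not part of the CUDA toolkit.)
missing_symbols() {
    local listing sym
    listing=$(cat)
    for sym in "$@"; do
        # -w matches whole words only, so cudaLaunchDeviceV2 does not match a longer name
        printf '%s\n' "$listing" | grep -w -- "$sym" >/dev/null || printf '%s\n' "$sym"
    done
}

lib=/usr/local/cuda-7.5/lib64/libcudadevrt.a   # path taken from the post above
if command -v cuobjdump >/dev/null 2>&1 && [ -f "$lib" ]; then
    # -symbols dumps the symbol tables of the embedded device ELF images
    cuobjdump -symbols "$lib" |
        missing_symbols cudaGetParameterBufferV2 cudaLaunchDeviceV2 \
                        cudaGetLastError cudaDeviceSynchronize
else
    echo "cuobjdump or $lib not found; skipping device symbol check"
fi
```

Any name this prints is one the device link cannot resolve from that archive; no output means all four are present in the device code.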

It looks to me like you have a strange machine config.

Where is CUDA installed on your machine? Do you have multiple installations? What is the output of:

which nvcc

?

When you built any of the CUDA samples that depend on dynamic parallelism, did they compile correctly?
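
One way to answer the question about multiple installations is to walk the PATH and print every nvcc found, not just the first one that which reports. A minimal sketch; the function name list_nvcc is hypothetical.

```shell
# Print every nvcc reachable through a PATH-style string, in search order,
# together with the real file each one resolves to.
# (list_nvcc is a hypothetical helper name.)
list_nvcc() {
    local path_string="${1:-$PATH}" dir
    local IFS=':'
    for dir in $path_string; do
        [ -n "$dir" ] || continue
        if [ -f "$dir/nvcc" ] && [ -x "$dir/nvcc" ]; then
            # readlink -f follows symlinks such as /usr/bin/nvcc to the actual binary
            printf '%s -> %s\n' "$dir/nvcc" "$(readlink -f "$dir/nvcc")"
        fi
    done
}

list_nvcc "$PATH"
```

More than one line of output, or a first line that resolves somewhere unexpected, would point to the kind of mixed installation suspected here.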

Hi txbob,

Following your hint about multiple installations, I removed everything I had built manually, used apt to remove every package with “cuda” in its name, and reinstalled the CUDA toolkit, debugger, etc.

Normal compilation works. The libcudadevrt library is present and reachable.

which nvcc gives
/usr/bin/nvcc as output

Building the advanced quicksort sample from the samples directory produces this output:

andreas@AntigoneLinux:~/CUDAsamples/NVIDIA_CUDA-7.5_Samples/6_Advanced/cdpAdvancedQuicksort$ make --trace
Makefile:233: update target 'cdpAdvancedQuicksort.o' due to: cdpAdvancedQuicksort.cu
"/usr/lib/nvidia-cuda-toolkit"/bin/nvcc -ccbin g++ -I../../common/inc  -m64    -dc -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_52,code=compute_52 -o cdpAdvancedQuicksort.o -c cdpAdvancedQuicksort.cu
Makefile:236: update target 'cdpBitonicSort.o' due to: cdpBitonicSort.cu
"/usr/lib/nvidia-cuda-toolkit"/bin/nvcc -ccbin g++ -I../../common/inc  -m64    -dc -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_52,code=compute_52 -o cdpBitonicSort.o -c cdpBitonicSort.cu
Makefile:239: update target 'cdpAdvancedQuicksort' due to: cdpAdvancedQuicksort.o cdpBitonicSort.o
"/usr/lib/nvidia-cuda-toolkit"/bin/nvcc -ccbin g++   -m64      -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_52,code=compute_52 -o cdpAdvancedQuicksort cdpAdvancedQuicksort.o cdpBitonicSort.o  -lcudadevrt
nvlink error   : Undefined reference to 'cudaStreamCreateWithFlags' in 'cdpAdvancedQuicksort.o' (target: sm_35)
nvlink error   : Undefined reference to 'cudaGetParameterBufferV2' in 'cdpAdvancedQuicksort.o' (target: sm_35)
nvlink error   : Undefined reference to 'cudaLaunchDeviceV2' in 'cdpAdvancedQuicksort.o' (target: sm_35)
nvlink error   : Undefined reference to 'cudaMemcpyAsync' in 'cdpAdvancedQuicksort.o' (target: sm_35)
nvlink error   : Undefined reference to 'cudaPeekAtLastError' in 'cdpAdvancedQuicksort.o' (target: sm_35)
nvlink error   : Undefined reference to 'cudaGetLastError' in 'cdpAdvancedQuicksort.o' (target: sm_35)
nvlink error   : Undefined reference to 'cudaGetErrorString' in 'cdpAdvancedQuicksort.o' (target: sm_35)
Makefile:239: recipe for target 'cdpAdvancedQuicksort' failed
make: *** [cdpAdvancedQuicksort] Error 255

When trying to do separate compilation and linking, compilation works, but linking fails.

Searching for files with locate produces

locate libcuda
/usr/lib/i386-linux-gnu/libcuda.so
/usr/lib/i386-linux-gnu/libcuda.so.1
/usr/lib/i386-linux-gnu/libcuda.so.370.28
/usr/lib/x86_64-linux-gnu/libcuda.so
/usr/lib/x86_64-linux-gnu/libcuda.so.1
/usr/lib/x86_64-linux-gnu/libcuda.so.370.28
/usr/lib/x86_64-linux-gnu/libcudadevrt.a
/usr/lib/x86_64-linux-gnu/libcudart.so
/usr/lib/x86_64-linux-gnu/libcudart.so.7.5
/usr/lib/x86_64-linux-gnu/libcudart.so.7.5.18
/usr/lib/x86_64-linux-gnu/libcudart_static.a
/usr/lib/x86_64-linux-gnu/stubs/libcuda.so
/usr/share/doc/libcuda1-370
/usr/share/doc/libcudart7.5
/usr/share/doc/libcuda1-370/changelog.Debian.gz
/usr/share/doc/libcuda1-370/copyright
/usr/share/doc/libcudart7.5/changelog.Debian.gz
/usr/share/doc/libcudart7.5/copyright
/usr/share/doc/nvidia-cuda-doc/examples/libcudacore.h
/usr/share/lintian/overrides/libcudart7.5
/usr/share/man/man7/libcuda.7.gz
/usr/share/man/man7/libcuda.so.7.gz
/usr/share/man/man7/libcudart.7.gz
/usr/share/man/man7/libcudart.so.7.gz
/var/cache/apt/archives/libcuda1-370_370.28-0ubuntu0~gpu16.04.3_amd64.deb
/var/cache/apt/archives/libcudart7.5_7.5.18-0ubuntu1_amd64.deb
/var/lib/dpkg/info/libcuda1-370.list
/var/lib/dpkg/info/libcuda1-370.md5sums
/var/lib/dpkg/info/libcuda1-370.shlibs
/var/lib/dpkg/info/libcuda1-370.triggers
/var/lib/dpkg/info/libcudart7.5:amd64.list
/var/lib/dpkg/info/libcudart7.5:amd64.md5sums
/var/lib/dpkg/info/libcudart7.5:amd64.shlibs
/var/lib/dpkg/info/libcudart7.5:amd64.symbols
/var/lib/dpkg/info/libcudart7.5:amd64.triggers

That doesn’t look right to me, but it may possibly be OK.

And your makefile is using:

"/usr/lib/nvidia-cuda-toolkit"/bin/nvcc

which doesn’t appear to be the same nvcc that which reported (/usr/bin/nvcc).

What is the output of:

nvcc --version

?

Normally I would expect CUDA to be installed at /usr/local/cuda, which is symlinked to your actual version e.g. /usr/local/cuda-7.5

which nvcc would then return:

/usr/local/cuda-7.5/bin/nvcc

I still suspect a conflicting install. Focus on getting your install to the point that the cdpQuickSort example builds correctly. If it does not, your CUDA install is (by definition) broken. If need be, start over with a clean OS load (perhaps in a VM, so you can convince yourself it works), install CUDA, and prove to yourself that you can build that sample code correctly.
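
The layout described above can be checked mechanically. Below is a rough sketch that assumes the standard NVIDIA layout, with nvcc in <root>/bin and libcudadevrt.a in <root>/lib64. Distribution packages may place files elsewhere (as the locate output earlier in the thread shows), so a "missing" result means only that the layout differs from that assumption. The function name check_toolkit_layout is hypothetical.

```shell
# Given a path to nvcc, derive the toolkit root (two directories up from the
# resolved binary) and report whether libcudadevrt.a sits in <root>/lib64.
# (check_toolkit_layout is a hypothetical helper name.)
check_toolkit_layout() {
    local nvcc_path root
    nvcc_path=$(readlink -f "$1") || return 1
    root=$(dirname "$(dirname "$nvcc_path")")
    if [ -f "$root/lib64/libcudadevrt.a" ]; then
        printf 'ok %s\n' "$root"
    else
        printf 'missing %s/lib64/libcudadevrt.a\n' "$root"
        return 1
    fi
}

# Run the check against whichever nvcc is first on the PATH, if there is one.
if command -v nvcc >/dev/null 2>&1; then
    check_toolkit_layout "$(command -v nvcc)" || true
fi
```

On a default NVIDIA install this would be expected to print "ok" followed by the versioned toolkit directory that /usr/local/cuda points to.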

Hi,

I think I found the solution: I uninstalled the CUDA packages that came from the Ubuntu Xenial repository,
added the NVIDIA repository as described on the developer site, and downloaded the whole
toolkit for CUDA 8.0.
Code that uses dynamic parallelism now compiles and links.

It appears that the Ubuntu packages are somehow flawed.

Thank you for your help!
Andreas

I would always recommend only getting your CUDA bits from http://www.nvidia.com/getcuda

And always carefully follow the instructions in the relevant CUDA installation guide at http://docs.nvidia.com