The program crashes in cudart::globalState::registerEntryFunction on DGX-1

Hi guys,

Greetings from me!

On my Arch Linux server, the CUDA version is:

$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

And the gcc version is:

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-linux-gnu/8.2.1/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /build/gcc/src/gcc/configure --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/ --enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++ --enable-shared --enable-threads=posix --enable-libmpx --with-system-zlib --with-isl --enable-__cxa_atexit --disable-libunwind-exceptions --enable-clocale=gnu --disable-libstdcxx-pch --disable-libssp --enable-gnu-unique-object --enable-linker-build-id --enable-lto --enable-plugin --enable-install-libiberty --with-linker-hash-style=gnu --enable-gnu-indirect-function --enable-multilib --disable-werror --enable-checking=release --enable-default-pie --enable-default-ssp --enable-cet=auto
Thread model: posix
gcc version 8.2.1 20180831 (GCC)

I have 2 projects that use CMake to control the build. The first project generates dynamic libraries which are fed into the second project. I simply copy the dynamic libraries and header files from the first project to the second, then build the second project. It works OK!

On my DGX-1 server, the CUDA version is:

$ /usr/local/cuda-9.0/bin/nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176

And the gcc version is:

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/6/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 6.4.0-17ubuntu1~16.04' --with-bugurl=file:///usr/share/doc/gcc-6/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --with-as=/usr/bin/x86_64-linux-gnu-as --with-ld=/usr/bin/x86_64-linux-gnu-ld --program-suffix=-6 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-6-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-6-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-6-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 6.4.0 20180424 (Ubuntu 6.4.0-17ubuntu1~16.04)

On the DGX-1, however, copying the libraries from the first project to the second doesn’t work. The program crashes in cudart::globalState::registerEntryFunction:

(gdb) bt
#0  0x00007ffff73fc559 in cudart::globalState::registerEntryFunction(void**, char const*, char*, char const*, int, uint3*, uint3*, dim3*, dim3*, int*) () from /home/xiaonan/dl2-he/3rdparty/libDSI_FV.so
#1  0x00007ffff73decbc in __cudaRegisterFunction () from /home/xiaonan/dl2-he/3rdparty/libDSI_FV.so
#2  0x00007ffff73d9098 in __nv_cudaEntityRegisterCallback(void**) () from /home/xiaonan/dl2-he/3rdparty/libDSI_FV.so
#3  0x00000000004283d6 in __cudaRegisterLinkedBinary(__fatBinC_Wrapper_t const*, void (*)(void**), void*) ()
#4  0x00000000004282e5 in __cudaRegisterLinkedBinary_66_tmpxft_00002dac_00000000_12_cuda_device_runtime_compute_70_cpp1_ii_8b1a5d37 ()
#5  0x00007ffff7de76ba in ?? () from /lib64/ld-linux-x86-64.so.2
#6  0x00007ffff7de77cb in ?? () from /lib64/ld-linux-x86-64.so.2
#7  0x00007ffff7dd7c6a in ?? () from /lib64/ld-linux-x86-64.so.2
#8  0x0000000000000001 in ?? ()
#9  0x00007fffffffe7e3 in ?? ()
#10 0x0000000000000000 in ?? ()

I checked the assembly code:

(gdb) disassemble
Dump of assembler code for function _ZN6cudart11globalState21registerEntryFunctionEPPvPKcPcS4_iP5uint3S7_P4dim3S9_Pi:
   0x00007ffff73fc520 <+0>:     mov    %rbp,-0x20(%rsp)
   0x00007ffff73fc525 <+5>:     mov    %r12,-0x18(%rsp)
   0x00007ffff73fc52a <+10>:    xor    %eax,%eax
   0x00007ffff73fc52c <+12>:    mov    %r13,-0x10(%rsp)
   0x00007ffff73fc531 <+17>:    mov    %r14,-0x8(%rsp)
   0x00007ffff73fc536 <+22>:    mov    %rcx,%r13
   0x00007ffff73fc539 <+25>:    mov    %rbx,-0x28(%rsp)
   0x00007ffff73fc53e <+30>:    sub    $0x38,%rsp
   0x00007ffff73fc542 <+34>:    mov    (%rdi),%ecx
   0x00007ffff73fc544 <+36>:    mov    %rdx,%r14
   0x00007ffff73fc547 <+39>:    mov    %r8,%r12
   0x00007ffff73fc54a <+42>:    mov    %r9d,%ebp
   0x00007ffff73fc54d <+45>:    mov    0x10(%rdi),%rdi
   0x00007ffff73fc551 <+49>:    test   %ecx,%ecx
   0x00007ffff73fc553 <+51>:    jne    0x7ffff73fc5f0 <_ZN6cudart11globalState21registerEntryFunctionEPPvPKcPcS4_iP5uint3S7_P4dim3S9_Pi+208>
=> 0x00007ffff73fc559 <+57>:    mov    0x10(%rax),%rbx

(gdb) i registers
rax            0x0                 0
rbx            0xb734366d          3073652333
rcx            0x11                17
rdx            0x0                 0
rsi            0x753170            7680368
rdi            0x7529a0            7678368
rbp            0xffffffff          0xffffffff
rsp            0x7fffffffe420      0x7fffffffe420
r8             0x0                 0
r9             0x867de7ff          2256398335
r10            0x0                 0
r11            0xa3f3365           171914085
r12            0x7ffff7435790      140737341773712
r13            0x7ffff7435790      140737341773712
r14            0x7ffff73d9da0      140737341398432
r15            0x7ffff73d9da0      140737341398432
rip            0x7ffff73fc559      0x7ffff73fc559 <cudart::globalState::registerEntryFunction(void**, char const*, char*, char const*, int, uint3*, uint3*, dim3*, dim3*, int*)+57>
eflags         0x10246             [ PF ZF IF RF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0

The faulting instruction, mov 0x10(%rax),%rbx, reads through rax, which is 0, so this is a null-pointer dereference. My guess is that the first parameter of cudart::globalState::registerEntryFunction is 0. If I add the second project as a sub-directory of the first project, the program runs fine. So I can’t figure out why the approach of copying the dynamic libraries doesn’t work on DGX-1. Is it because of CUDA, gcc, or something else?

Could someone give me a clue? Thanks very much in advance!

Best Regards
Nan Xiao

Renaming the *.cpp files to *.cu fixed this problem.
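
For anyone hitting the same crash, here is a minimal sketch of the rename step. nvcc picks the input language from the file suffix, so a .cpp file is handed to the host compiler as plain C++ while a .cu file is compiled as CUDA; presumably that difference is what mattered here, although the exact cause was not confirmed. The function name rename_cpp_to_cu and the directory argument are hypothetical, not part of either project.

```shell
#!/bin/sh
# Hypothetical helper: rename every *.cpp file directly inside the given
# directory to *.cu, so that nvcc treats it as CUDA source.
rename_cpp_to_cu() {
    dir="$1"
    for src in "$dir"/*.cpp; do
        # When nothing matches, the glob stays literal; skip it.
        [ -e "$src" ] || continue
        # Strip the .cpp suffix and append .cu.
        mv -- "$src" "${src%.cpp}.cu"
    done
}
```

It would be called as rename_cpp_to_cu followed by the source directory, and any file lists in CMakeLists.txt need the new names too. If renaming is not an option, nvcc also accepts -x cu to force CUDA compilation of a .cpp file; I have not tried that on this project.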