I have found the same problem but have narrowed it down a little. The problem only appears for me when compiling for 32-bit Linux. I tested with the 195.17 driver and the CUDA 3.0 Toolkit. Running the same code (equivalent to the original post) on Mac or 64-bit Linux does not cause a segmentation fault (SIGSEGV). However, if the application is compiled for 32-bit Linux, or as a 32-bit binary on 64-bit Linux, the program faults after main returns if the dynamic library has been unloaded.
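The load, call, unload pattern in question can be sketched in C as follows. This is an illustrative sketch only, not the actual main.c: the helper name `run_model_once` is hypothetical, and the `int (void)` signature assumed for the library entry point is a guess, since the real prototype of `simengine_runmodel` is not shown in this thread.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Assumed entry-point type. The real signature of simengine_runmodel is
 * not shown in the thread, so int(void) is a placeholder. */
typedef int (*model_fn)(void);

/* Hypothetical helper: load a shared library, call one entry point,
 * then unload the library again.
 * Returns the entry point's result, or -1 if loading or lookup fails. */
int run_model_once(const char *lib_path, const char *symbol)
{
    void *handle = dlopen(lib_path, RTLD_NOW);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return -1;
    }

    dlerror(); /* clear any stale error state before calling dlsym */

    /* Store the void* returned by dlsym into a function pointer without
     * a direct object-to-function pointer cast. */
    model_fn fn = NULL;
    *(void **)(&fn) = dlsym(handle, symbol);

    const char *err = dlerror();
    if (err != NULL || fn == NULL) {
        fprintf(stderr, "dlsym failed: %s\n", err ? err : "symbol is NULL");
        dlclose(handle);
        return -1;
    }

    /* In the failing case, the first CUDA call made inside the library
     * is what creates the CUDA context. */
    int result = fn();

    /* Unloading here is the step that precedes the crash at exit. */
    dlclose(handle);
    return result;
}
```

A host program would call something like `run_model_once("/tmp/cudalibtest.so", "simengine_runmodel")` from main and then return. On older glibc versions the program must be linked with `-ldl`, and gcc's `-m32` option produces the 32-bit build on a 64-bit system.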
Is there anyone from NVIDIA who can comment on what may be happening here? Simply omitting the call to dlclose() is not a viable workaround.
Using valgrind, I can see a suspicious ioctl warning about uninitialized memory on the first call to a CUDA API function that causes the creation of a CUDA context. Later, there is a jump into libcuda.so after it has been unloaded. I'm not sure whether the two are related.
********************* The ioctl warning ****************************
==22055== Syscall param ioctl(generic) points to uninitialised byte(s)
==22055== at 0xA4B869: ioctl (in /lib/libc-2.5.so)
==22055== by 0x4291BE2: (within /usr/lib/libcuda.so.195.17)
==22055== by 0x4274C4B: (within /usr/lib/libcuda.so.195.17)
==22055== by 0x4248CC8: (within /usr/lib/libcuda.so.195.17)
==22055== by 0x4241196: (within /usr/lib/libcuda.so.195.17)
==22055== by 0x42E65B0: cuCtxCreate (in /usr/lib/libcuda.so.195.17)
==22055== by 0x416DA19: (within /usr/local/cuda/lib/libcudart.so.3.0.8)
==22055== by 0x416E56B: (within /usr/local/cuda/lib/libcudart.so.3.0.8)
==22055== by 0x41504A8: cudaGetSymbolAddress (in /usr/local/cuda/lib/libcudart.so.3.0.8)
==22055== by 0x400BD69: cudaError cudaGetSymbolAddress(void**, int const&) (cuda_runtime.h:311)
==22055== by 0x400BCD8: simengine_runmodel (cudalibtest.cu:40)
==22055== by 0x804A13B: main (main.c:25)
**************************** Unloading of shared libraries followed by segfault **********************************
--22055-- Discarding syms at 0x400A000-0x400F000 in /tmp/cudalibtest.so due to munmap()
--22055-- Discarding syms at 0x4136000-0x417B000 in /usr/local/cuda/lib/libcudart.so.3.0.8 due to munmap()
--22055-- Discarding syms at 0x1C5000-0x2B0000 in /usr/lib/libstdc++.so.6.0.8 due to munmap()
--22055-- Discarding syms at 0x417B000-0x6C3B000 in /usr/lib/libcuda.so.195.17 due to munmap()
--22055-- Discarding syms at 0xAC8000-0xAEF000 in /lib/libm-2.5.so due to munmap()
--22055-- Discarding syms at 0xDD3000-0xDDF000 in /lib/libgcc_s-4.1.2-20080825.so.1 due to munmap()
==22055==
==22055== Jump to the invalid address stated on the next line
==22055== at 0x4251930: ??? <----------------------------------------- NOTE: This address is in the range for libcuda.so, the CUDA driver, above!!!
==22055== by 0x997E93: (below main) (in /lib/libc-2.5.so)
==22055== Address 0x4251930 is not stack'd, malloc'd or (recently) free'd
==22055==
==22055== Process terminating with default action of signal 11 (SIGSEGV)
==22055== Access not within mapped region at address 0x4251930
==22055== at 0x4251930: ???
==22055== by 0x997E93: (below main) (in /lib/libc-2.5.so)
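
The claim that the faulting address belongs to libcuda.so can be double-checked mechanically against the "Discarding syms" ranges in the log. The following is a small sketch in C; the names `map_range`, `discarded`, and `owner_of` are made up for illustration, and treating the end of each range as exclusive is an assumption about how valgrind reports it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One unmapped region, taken from a "Discarding syms" line in the log. */
struct map_range {
    uintptr_t start;    /* first address of the region */
    uintptr_t end;      /* treated as exclusive (assumption) */
    const char *name;   /* library that was mapped there */
};

/* The six ranges valgrind reported as discarded due to munmap(). */
static const struct map_range discarded[] = {
    { 0x400A000, 0x400F000, "/tmp/cudalibtest.so" },
    { 0x4136000, 0x417B000, "/usr/local/cuda/lib/libcudart.so.3.0.8" },
    { 0x1C5000,  0x2B0000,  "/usr/lib/libstdc++.so.6.0.8" },
    { 0x417B000, 0x6C3B000, "/usr/lib/libcuda.so.195.17" },
    { 0xAC8000,  0xAEF000,  "/lib/libm-2.5.so" },
    { 0xDD3000,  0xDDF000,  "/lib/libgcc_s-4.1.2-20080825.so.1" },
};

/* Return the name of the discarded library whose range contains addr,
 * or NULL if addr lies in none of the listed ranges. */
const char *owner_of(uintptr_t addr)
{
    size_t count = sizeof discarded / sizeof discarded[0];
    for (size_t i = 0; i < count; i++) {
        const struct map_range *r = &discarded[i];
        assert(r->start < r->end);  /* sanity check on the table itself */
        if (addr >= r->start && addr < r->end) {
            return r->name;
        }
    }
    return NULL;
}
```

Calling `owner_of(0x4251930)` with the jump target from the log selects the libcuda.so.195.17 entry, while the libc return address 0x997E93 falls in none of the unloaded ranges.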