Loading a dynamic library that uses CUDA at runtime causes a segfault
I am trying to compile a dynamic library with CUDA code and load it at runtime.

The code appears to run fine, but it crashes when I call dlclose(lib_handle);

The .so works fine when it is linked directly against the executable. Only dynamic loading of the library has this problem.

the .cu is like:

extern "C" int do_gpu_work()
{
    // allocating and freeing device memory is enough to reproduce the
    // crash when dlclose() is called
    size_t size = 1024 * sizeof(float); // size was undeclared in the original; any value will do
    float* d_A;

    cudaMalloc((void**)&d_A, size);
    cudaFree(d_A);
    return 0;
}

the loader code is like:

#include <dlfcn.h>

int main()
{
    void *lib_handle;
    int (*fn)();

    lib_handle = dlopen("/usr/local/lib/libcudasotest.so.0.0.0", RTLD_LAZY);

    fn = (int (*)()) dlsym(lib_handle, "do_gpu_work");

    (*fn)();

    // segfault with this call
    dlclose(lib_handle);
    return 0;
}
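
For completeness, here is one plausible way to build the library and the loader; the exact flags and file names are my guesses, not from the original post:

[code]
# build the CUDA code as a position-independent shared library
nvcc -Xcompiler -fPIC -shared cudasotest.cu -o libcudasotest.so.0.0.0

# build the loader; only libdl is needed here, since the CUDA
# libraries are pulled in indirectly through the .so
g++ main.cpp -ldl -o loadtest
[/code]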

#1
Posted 06/09/2009 04:01 AM   
I have found the same problem but have narrowed it down a little. The problem only appears for me when compiling for 32-bit Linux. I have tested using the 195.17 driver and the CUDA 3.0 Toolkit. When running the same code (equivalent to the original post) on Mac or 64-bit Linux, it does not cause a segmentation fault (SIGSEGV). However, if the application is compiled for 32-bit Linux, or as a 32-bit binary on 64-bit Linux, the program will fault after the return of main if the dynamic library is unloaded.

Is there anyone from NVIDIA who can comment on what may be happening here? Simply not including a call to dlclose() is not a viable workaround.

Using valgrind I can see a suspicious ioctl warning about uninitialized memory on the first call to a CUDA API function that causes the creation of a CUDA context. Then there is a call into libcuda.so after it has been unloaded. I'm not sure if the two are related.

********************* The ioctl warning ****************************
==22055== Syscall param ioctl(generic) points to uninitialised byte(s)
==22055== at 0xA4B869: ioctl (in /lib/libc-2.5.so)
==22055== by 0x4291BE2: (within /usr/lib/libcuda.so.195.17)
==22055== by 0x4274C4B: (within /usr/lib/libcuda.so.195.17)
==22055== by 0x4248CC8: (within /usr/lib/libcuda.so.195.17)
==22055== by 0x4241196: (within /usr/lib/libcuda.so.195.17)
==22055== by 0x42E65B0: cuCtxCreate (in /usr/lib/libcuda.so.195.17)
==22055== by 0x416DA19: (within /usr/local/cuda/lib/libcudart.so.3.0.8)
==22055== by 0x416E56B: (within /usr/local/cuda/lib/libcudart.so.3.0.8)
==22055== by 0x41504A8: cudaGetSymbolAddress (in /usr/local/cuda/lib/libcudart.so.3.0.8)
==22055== by 0x400BD69: cudaError cudaGetSymbolAddress<int>(void**, int const&) (cuda_runtime.h:311)
==22055== by 0x400BCD8: simengine_runmodel (cudalibtest.cu:40)
==22055== by 0x804A13B: main (main.c:25)

**************************** Unloading of shared libraries followed by segfault **********************************
--22055-- Discarding syms at 0x400A000-0x400F000 in /tmp/cudalibtest.so due to munmap()
--22055-- Discarding syms at 0x4136000-0x417B000 in /usr/local/cuda/lib/libcudart.so.3.0.8 due to munmap()
--22055-- Discarding syms at 0x1C5000-0x2B0000 in /usr/lib/libstdc++.so.6.0.8 due to munmap()
--22055-- Discarding syms at 0x417B000-0x6C3B000 in /usr/lib/libcuda.so.195.17 due to munmap()
--22055-- Discarding syms at 0xAC8000-0xAEF000 in /lib/libm-2.5.so due to munmap()
--22055-- Discarding syms at 0xDD3000-0xDDF000 in /lib/libgcc_s-4.1.2-20080825.so.1 due to munmap()
==22055==
==22055== Jump to the invalid address stated on the next line
==22055== at 0x4251930: ??? <----------------------------------------- NOTE: This address is in the range for libcuda.so, the CUDA driver, above!
==22055== by 0x997E93: (below main) (in /lib/libc-2.5.so)
==22055== Address 0x4251930 is not stack'd, malloc'd or (recently) free'd
==22055==
==22055== Process terminating with default action of signal 11 (SIGSEGV)
==22055== Access not within mapped region at address 0x4251930
==22055== at 0x4251930: ???
==22055== by 0x997E93: (below main) (in /lib/libc-2.5.so)
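
One possible mitigation, a sketch only and untested here (cudaThreadExit() is the CUDA 3.x runtime call that destroys the calling thread's context): export a teardown entry point from the library and call it through dlsym() before dlclose(), so the context is destroyed while libcuda is still mapped.

[code]
// exported from the .cu file alongside do_gpu_work()
extern "C" int shutdown_gpu_work()
{
    // destroy the runtime's CUDA context while libcudart/libcuda are
    // still mapped, so nothing is left to run after they are munmap()ed
    cudaThreadExit();
    return 0;
}
[/code]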

#2
Posted 01/14/2010 04:58 PM   
I have the same problem here.

I'm using CUDA in a dynamic library (opened by Scilab), and when I close Scilab I get a segfault in the exit function. But if I call exit() in my dynamic library, I get no error message (though that's not a viable way to do things...).

This problem only occurs on 32-bit builds. 64-bit builds exit without problems.

#3
Posted 05/05/2010 08:51 AM   
I have the same problem with Ubuntu 12.04 (64-bit) and CUDA 4.1. Has anyone fixed this issue?

[quote]

--3320-- Discarding syms at 0x294489e0-0x29489ef8 in /usr/local/cuda/lib64/libcudart.so.4.1.28 due to munmap()
--3320-- Discarding syms at 0x297541f0-0x29ccac18 in /usr/lib/nvidia-current-updates/libcuda.so.295.40 due to munmap()
==3320== Thread 7:
==3320== Jump to the invalid address stated on the next line
==3320== at 0x29CC849B: ???
==3320== by 0x25233E4B: ???
==3320== by 0x1E38021F0F: ???
==3320== Address 0x29cc849b is not stack'd, malloc'd or (recently) free'd
==3320==
==3320==
==3320== Process terminating with default action of signal 11 (SIGSEGV)
==3320== Access not within mapped region at address 0x29CC849B
==3320== at 0x29CC849B: ???
==3320== by 0x25233E4B: ???
==3320== by 0x1E38021F0F: ???
==3320== If you believe this happened as a result of a stack
==3320== overflow in your program's main thread (unlikely but
==3320== possible), you can try to increase the size of the
==3320== main thread stack using the --main-stacksize= flag.
==3320== The main thread stack size used in this run was 8388608.
--3320-- Discarding syms at 0x76215d0-0x7622fa8 in /usr/lib/x86_64-linux-gnu/gconv/UTF-16.so due to munmap()
--3320-- Discarding syms at 0x21d6e260-0x21d73bc8 in /lib/x86_64-linux-gnu/libnss_compat-2.15.so due to munmap()
--3320-- Discarding syms at 0x221920b0-0x22198718 in /lib/x86_64-linux-gnu/libnss_nis-2.15.so due to munmap()
--3320-- Discarding syms at 0x21f7a060-0x21f87878 in /lib/x86_64-linux-gnu/libnsl-2.15.so due to munmap()
--3320-- Discarding syms at 0x2239e140-0x223a5a08 in /lib/x86_64-linux-gnu/libnss_files-2.15.so due to munmap()
[/quote]

#4
Posted 05/09/2012 08:26 PM   
Hi,
I had a similar problem and finally managed to narrow it down to a linking error of mine. Here are the command lines I used to create the faulty library:

[code]
nvcc -c -arch=xxx -fpic *.cu
g++ -shared *.o -o libgpucode.so -L$CUDA_INSTALL_PATH/lib64 -lcudart
[/code]
The dynamic library itself was working perfectly; it was only at unloading time that I experienced crashes, with a cryptic message such as "pure virtual function called". A colleague of mine finally traced the cause to the lack of an explicit link to libcuda.so. Adding it to the above linking command line like this:

[code]
g++ -shared *.o -o libgpucode.so -L$CUDA_INSTALL_PATH/lib64 -lcudart -lcuda
[/code]
solved the issue entirely.
I'm not sure whether it applies to your own problem, but just in case...
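
If it helps to check: once -lcuda is added, libcuda.so should show up in the library's recorded dependencies, which can be verified with a plain

[code]
ldd libgpucode.so
[/code]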

#5
Posted 05/10/2012 03:31 AM   
Thanks Gilles for your answer. I did try your suggestion, but sadly it did not change anything for me. I don't experience any error message about a pure virtual function being called, though, so it might not be the same issue.

I also implemented the simple code given in the first message above, and it does not segfault for me. I guess that confirms that it does work in 64-bit. I think the issue I have is closest to lebsack's; the valgrind outputs are very similar.

lebsack> Any chance that you eventually managed to fix it? Do you happen to use Qt in your application?

I ask because valgrind gives me errors during the creation of a QApplication object. I have also analysed the memory with TotalView, and it finds the following error: "Allocator returned a misaligned block: heap may be corrupted". I even submitted a bug to Qt ([url="https://bugreports.qt-project.org/browse/QTBUG-25681"]bug report[/url]). If I remove the QApplication creation, I have no segmentation fault anymore. But I could just be lucky, or it's really unrelated.

Also, to go even further than the valgrind output, the TotalView debugger tells me that the crash seems to happen after the unload of libcuda, in a call to the function clGetExtensionFunctionAddress (located in libcuda.so). It's an OpenCL function and I don't use OpenCL. Any idea of what could call it?

#6
Posted 05/11/2012 09:53 PM   
Just to let you know: we found a "fix" for our problem (see the valgrind output above). We added a call to a CUDA driver API function in our main (we added cuInit(0), even though we use the CUDA runtime API in the rest of our code). What we think happens is that our main executable now depends directly on libcuda, so libcuda is not unloaded from memory and the crash does not occur. Without this explicit dependence on libcuda, libcuda is unloaded too early and it crashes at the end of our main (maybe during the teardown of a static allocation). Well, at least it fixed our issue in debug; in release we still have a segfault.

Maybe our build system is not good enough and we do something we shouldn't, but it's very difficult to pinpoint. Has anyone already seen this type of problem?
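
For anyone who wants to try the same workaround, this is roughly all it takes (a sketch of what is described above; it assumes the executable itself is linked with -lcuda so that the driver library becomes a direct dependency):

[code]
#include <cuda.h> // driver API; link the executable with -lcuda

int main()
{
    // Touch the driver API from the main binary itself so that libcuda
    // becomes a direct dependency of the executable and stays mapped
    // until process exit instead of being unloaded with the plugin.
    cuInit(0);

    // ... dlopen()/dlsym()/dlclose() the CUDA plugin as before ...
    return 0;
}
[/code]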

#7
Posted 06/22/2012 06:44 PM   