CUDA runtime library unload

Hi,

I am running into a cudaErrorCudartUnloading error in my application. I found that this occurs because I am calling cudaFree from the destructors of global static variables, and those destructors may be called after the CUDA runtime library has been unloaded.

Is there a robust way to force my application to keep the CUDA runtime library loaded until after I have made all the necessary cudaFree calls? Can I control or delay the unloading of the CUDA runtime library by any means? For instance, can my class hold certain variables or handles that will force the CUDA runtime library to stay loaded?

Thanks!

No. It is a bad design practice to put calls to the CUDA runtime API in constructors that may run before main and destructors that may run after main.
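One way to follow that advice is to give device buffers an owner whose lifetime ends before main returns, instead of a global static. The following is only a minimal sketch of that idea. To keep it buildable without the CUDA toolkit, device_alloc and device_free are hypothetical stand-ins (backed by malloc/free) for cudaMalloc and cudaFree, and DeviceBuffer and run_application are illustrative names, not anything from the original poster's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-ins so the sketch builds without the CUDA toolkit.
// In real code these would wrap cudaMalloc / cudaFree from <cuda_runtime.h>.
static int live_device_allocations = 0;

void* device_alloc(std::size_t bytes) {
    void* p = std::malloc(bytes);
    if (p != nullptr) ++live_device_allocations;
    return p;
}

void device_free(void* p) {
    assert(live_device_allocations > 0);
    --live_device_allocations;
    std::free(p);
}

// RAII owner for one allocation. Non-copyable so the pointer is freed once.
class DeviceBuffer {
public:
    explicit DeviceBuffer(std::size_t bytes) : ptr_(device_alloc(bytes)) {}
    ~DeviceBuffer() { if (ptr_ != nullptr) device_free(ptr_); }
    DeviceBuffer(const DeviceBuffer&) = delete;
    DeviceBuffer& operator=(const DeviceBuffer&) = delete;
    void* get() const { return ptr_; }
private:
    void* ptr_;
};

// All device work lives in a function called from main, so every
// DeviceBuffer is destroyed before main returns and therefore before
// any exit-time teardown begins.
int run_application() {
    DeviceBuffer scratch(1024);
    return scratch.get() != nullptr ? 0 : 1;
}
```

A program would then use int main() { return run_application(); } and keep no DeviceBuffer at namespace scope. The design choice is simply that the order of destruction becomes explicit: everything that frees device memory goes out of scope while main is still running.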

The runtime API call I am making is essentially a cudaFree. I want to deallocate the device memory pointers owned by my static object in that object’s destructor. And yes, I understand that the destructors of static variables may be called in any order during application exit.

Is there a way to check in my destructor whether the runtime module is already unloaded and, if so, suppress the calls to cudaFree? I guess that if the module is already unloaded and the context has been destroyed, any memory is implicitly freed anyway, and I don’t have to make explicit cudaFree calls.

You’re already doing that. The error code returned by the runtime API call is exactly that indication.
In my view, you’re exploring undefined behavior (UB) as far as CUDA is concerned. I don’t see explicit use of UB as a good design practice. You may think I’m fundamentally wrong or disagree with me. That’s fine; it’s the nature of community. Anyway, the error code you received is the only thing you should expect after the CUDA runtime shuts down. Any call into the CUDA runtime at that point would return that error code, even a call that tells you whether the CUDA runtime is available (there is no such call to my knowledge, but the entire CUDA runtime API is documented: http://docs.nvidia.com/cuda/cuda-runtime-api/index.html#axzz4nzXQGo3P )
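For readers who still have to free in a destructor, the "the error code is the indication" point can be sketched as a small helper that classifies the status returned by the free call. This is a hedged illustration, not a recommended pattern. The helper is generic over the status type, and the FakeStatus enum and fake_free_* functions are hypothetical stand-ins so the sketch runs without a GPU. With the real runtime one would presumably pass cudaFree, cudaSuccess and cudaErrorCudartUnloading, but that wiring is an assumption and is not exercised here.

```cpp
#include <cassert>
#include <cstdio>

// Outcome of trying to release a device pointer from a destructor.
enum class ReleaseResult { Freed, RuntimeGone, Failed };

// `free_fn` returns a status; `ok` and `unloading` are the status values that
// mean "success" and "runtime is shutting down". With the CUDA runtime the
// intended arguments would be cudaFree, cudaSuccess, cudaErrorCudartUnloading
// (assumed wiring, not tested in this sketch).
template <typename Status>
ReleaseResult release_in_destructor(void* ptr,
                                    Status (*free_fn)(void*),
                                    Status ok,
                                    Status unloading) {
    assert(free_fn != nullptr);
    if (ptr == nullptr) return ReleaseResult::Freed;  // nothing to release
    const Status status = free_fn(ptr);
    if (status == ok) return ReleaseResult::Freed;
    if (status == unloading) return ReleaseResult::RuntimeGone;  // expected at exit
    std::fprintf(stderr, "device free failed with status %d\n",
                 static_cast<int>(status));
    return ReleaseResult::Failed;
}

// Hypothetical stand-ins for the runtime; the numeric values are arbitrary.
enum FakeStatus { kFakeOk = 0, kFakeInvalidValue = 1, kFakeUnloading = 4 };
FakeStatus fake_free_ok(void*) { return kFakeOk; }
FakeStatus fake_free_unloading(void*) { return kFakeUnloading; }
FakeStatus fake_free_invalid(void*) { return kFakeInvalidValue; }
```

A destructor would call the helper and stay silent on RuntimeGone while still reporting Failed. Note that this only suppresses the symptom; it does not make the late call well-defined.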

Hi @Robert_Crovella, could you please elaborate why it is a bad design practice to put calls to the CUDA runtime API in constructors that may run before main and destructors that may run after main?

If I put cudaDeviceReset() in the constructor of a global object, the process seems to finish correctly, with no errors.

Here is a general write-up. Before deciding that everything is OK with your code, make sure you are doing proper CUDA error checking, including on the calls in the constructors and destructors. The easiest-to-demonstrate hazard that I have seen arises when CUDA calls are in the destructors of global objects.
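As a rough sketch of what "proper error checking" on every call can look like, here is a small checking helper that reports the failing expression and its location. It is written generically so it builds without the CUDA toolkit. CHECK_STATUS, check_status, DemoStatus and the demo_call_* functions are hypothetical names for illustration only. With the real runtime the intended use would be something like CHECK_STATUS(cudaFree(p), cudaSuccess), possibly also printing cudaGetErrorString(status); that usage is an assumption and is not shown running here.

```cpp
#include <cassert>
#include <cstdio>

// Returns true when `status` equals `ok`; otherwise logs the failing
// expression with its source location and returns false.
template <typename Status>
bool check_status(Status status, Status ok,
                  const char* expr, const char* file, int line) {
    if (status == ok) return true;
    std::fprintf(stderr, "%s:%d: '%s' returned status %d\n",
                 file, line, expr, static_cast<int>(status));
    return false;
}

// Wraps a call so the expression text and location are captured automatically.
#define CHECK_STATUS(call, ok) \
    check_status((call), (ok), #call, __FILE__, __LINE__)

// Hypothetical stand-ins for runtime calls, used only to exercise the macro.
enum DemoStatus { kDemoOk = 0, kDemoError = 1 };
DemoStatus demo_call_ok() { return kDemoOk; }
DemoStatus demo_call_fail() { return kDemoError; }
```

Wrapping the calls made in constructors and destructors this way makes a late failure visible on stderr instead of being silently dropped.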