Are the driver API and the runtime API mutually exclusive? (pyCUDA FAQ)

The PyCUDA FAQ ("Frequently Asked Questions about PyCUDA", on Andreas Klöckner's former wiki) states

On the other hand, the CUDA C Programming guide says

Is one of these wrong?

I think there is probably some validity to both statements. In general, CUDA applications tend to be written using either the driver API or the runtime API. For most purposes there is little need to use both. However, there are situations where it is desirable to use both, in which case some additional steps must normally be taken (for example, properly establishing and sharing the CUDA context). The two APIs are not "typically" used together, and they cannot simply be intermixed willy-nilly; appropriate steps must be followed. So it's not entirely off-base to say they are exclusive.

In general, I think the statement from the CUDA C Programming Guide should be considered authoritative: if the appropriate steps are taken and "sketchy stuff" still happens, that should be considered a bug.

My recommendation is that beginners and those without a specific contrary need should use the runtime API. PyCUDA does some pretty cool stuff (e.g. runtime compilation of code), and in order to work its magic it needs to use the driver API.

One thing to note is that with scikits.cuda, PyCUDA and cuBLAS play together just fine. That is in fact how I validate and benchmark my GEMM implementations:

https://github.com/NervanaSystems/nervanagpu/blob/master/benchmarks/cublas.py
https://github.com/NervanaSystems/nervanagpu/blob/master/benchmarks/cublas2.py
https://github.com/NervanaSystems/nervanagpu/blob/master/benchmarks/cublas_test.py

I believe the runtime API is implemented on top of the driver API. So you basically just need to make sure the resources that are automatically managed by the runtime are still live and active when you try to use them with the driver API, particularly the context handle. Use cuCtxGetCurrent to get the current context after the runtime has initialized it (which happens implicitly with any runtime API call). Or, if you first create the context with the driver API, the runtime will detect the active context and automatically bind itself to it.
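To make that concrete, here is a minimal, hedged sketch (assuming a CUDA-capable device and the toolkit installed; error checking omitted for brevity) of the first direction: a runtime call implicitly sets up the context, and cuCtxGetCurrent then retrieves it for driver API use on the same resources.

```cuda
#include <cuda.h>          // driver API
#include <cuda_runtime.h>  // runtime API
#include <stdio.h>

int main(void)
{
    // Any runtime API call implicitly initializes a context
    // on the current device -- no explicit cuInit/cuCtxCreate needed.
    float *d_buf = NULL;
    cudaMalloc((void **)&d_buf, 256 * sizeof(float));

    // Retrieve that context through the driver API.
    CUcontext ctx = NULL;
    cuCtxGetCurrent(&ctx);
    printf("current driver context: %p\n", (void *)ctx);

    // Driver API calls can now operate on runtime-allocated memory,
    // because both APIs are working within the same context.
    cuMemsetD8((CUdeviceptr)d_buf, 0, 256 * sizeof(float));

    cudaFree(d_buf);
    return 0;
}
```

The opposite direction works too: create a context with cuCtxCreate first, and subsequent runtime calls on that thread bind to it automatically.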

Chapter 3 of The CUDA Handbook is an excellent resource on this subject.
http://www.cudahandbook.com/

Thank you, Bob & Scott!

Perhaps the FAQ is out-of-date. It mentions CUDA 2.2 and pyCUDA 0.9 (not sure how old that is).