Multi-threading and Driver API

Hi.
I’m experimenting with the CUDA Driver API and I have some problems with multi-threaded applications.

As far as I know, each host thread needs to initialize the CUDA device itself (cuInit(0)).
I created global variables for the device, context, module and function.
The application doesn’t work correctly if the threads share the same variables. It works fine if each thread uses its own global variables for the module and function, even when the device and context are shared.
So I have 3 questions:

  1. Why does it work with the same context?

  2. If it works with the same context, why doesn’t it work with the same CUmodule and CUfunction? As far as I know, modules and functions are bound to the context.

  3. What is the best way to use the Driver API in a multi-threaded program?