CUDA context creation and association in CUDA runtime API applications

I want to understand how a CUDA context is created and associated with a kernel in CUDA runtime API applications.

I know this is done under the hood through the driver API, but I would like to understand the timeline of the context's creation.

For a start, I know __cudaRegisterFatBinary is the first CUDA API call made, and it registers a fatbin with the runtime. It is followed by a handful of function registration calls, which call cuModuleLoad in the driver layer.

But then, if my CUDA runtime API application invokes cudaMalloc, how is the pointer it returns associated with the context, which I believe must have been created beforehand? How does one get a handle to this already-created context and associate subsequent runtime API calls with it? Please demystify the internal workings.

Thank you!