For the architectures enumerated, kernel arguments are passed in a dedicated bank of constant memory. I don't know the size limit offhand, but it is probably listed in the CUDA documentation. Have you checked there? You could also determine it experimentally by building kernels with an increasing number of arguments.
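If you want a rough estimate before running the experiment, here is a minimal host-side sketch in plain C++ (no GPU needed) that adds up the bytes an argument list would occupy and compares the total with a limit. Everything in it is illustrative: the names `kAssumedParamLimit`, `ParamBytes`, `kernel_param_bytes`, `fits_in_param_space`, and `Oversized` are made up for this example, the 4 KB figure is the limit the CUDA C++ Programming Guide gives for `__global__` function parameters (newer toolkits raise it on recent architectures, so check the guide for your version), and the packing rule (each argument placed at its natural alignment) is an assumption about the layout rather than something taken from the driver.

```cpp
#include <cstddef>

// Assumed limit in bytes; verify against the documentation for your toolkit.
constexpr std::size_t kAssumedParamLimit = 4096;

// Round offset up to the next multiple of align (align must be a power of two).
constexpr std::size_t align_up(std::size_t offset, std::size_t align) {
    return (offset + align - 1) & ~(align - 1);
}

// Walks the argument types in order, placing each one at its natural
// alignment (an assumed layout), and returns the offset after the last one.
template <typename... Args>
struct ParamBytes;

// Base case: no arguments left, so the running offset is the total size.
template <>
struct ParamBytes<> {
    static constexpr std::size_t at(std::size_t offset) { return offset; }
};

// Recursive case: place T, then continue with the remaining argument types.
template <typename T, typename... Rest>
struct ParamBytes<T, Rest...> {
    static constexpr std::size_t at(std::size_t offset) {
        return ParamBytes<Rest...>::at(align_up(offset, alignof(T)) + sizeof(T));
    }
};

// Estimated total bytes occupied by an argument list of the given types.
template <typename... Args>
constexpr std::size_t kernel_param_bytes() {
    return ParamBytes<Args...>::at(0);
}

// True if the estimated total stays within the assumed limit.
template <typename... Args>
constexpr bool fits_in_param_space() {
    return kernel_param_bytes<Args...>() <= kAssumedParamLimit;
}

// Hypothetical by-value argument that is larger than the assumed limit.
struct Oversized {
    char bytes[5000];
};

// Compile-time checks: a char followed by an int occupies 8 bytes
// (1 byte, 3 bytes of padding, then 4 bytes).
static_assert(kernel_param_bytes<char, int>() == 8, "unexpected packing");
static_assert(fits_in_param_space<const float*, int, double>(), "should fit");
static_assert(!fits_in_param_space<Oversized, int>(), "should exceed the limit");
```

This only predicts the outcome under the stated assumptions. The experiment with real kernels is still the authoritative test, since the compiler will reject a kernel whose parameters exceed the actual limit.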
When registers get spilled, their data is stored to local memory, which is a thread-local mapping of a portion of global memory. This applies to all the architectures enumerated.