A question on concurrent kernel execution

Hi, all!

I have a question on concurrent kernel execution (3.2.5.3 in NVIDIA CUDA C Programming Guide Version 4.1).

Quote : “Some devices of compute capability 2.x can execute multiple kernels concurrently.”

What’s the exact meaning of “concurrently”? I have two alternatives.

a) In parallel: if one kernel cannot occupy all the compute resources on the GPU, other kernels can be scheduled onto the remaining resources at the same time.

b) Time-sliced, like multitasking on a single GPU: the kernels appear to run simultaneously, but their executions do not actually overlap on the timeline.

Which one is correct? Or neither?

Thank you in advance!

(a) is the correct interpretation. Note that compute capability 2.0 devices (and I assume CC 3.0 as well) can only execute kernels concurrently when they come from the same CUDA context, but the kernels must be launched into different CUDA streams.
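For what it's worth, here is a minimal sketch of what that looks like in practice. The kernel names and launch sizes are made up for illustration; whether the two kernels actually overlap depends on the device and on resources left free by the first kernel:

```cuda
#include <cstdio>

// Two trivial kernels; names and workloads are illustrative only.
__global__ void kernelA(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0f;
}

__global__ void kernelB(float *y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] += 1.0f;
}

int main() {
    const int n = 1 << 16;
    float *x, *y;
    cudaMalloc(&x, n * sizeof(float));
    cudaMalloc(&y, n * sizeof(float));

    // Kernels launched into different non-default streams within the
    // same context *may* overlap on a CC 2.x+ device, resources permitting.
    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    kernelA<<<(n + 255) / 256, 256, 0, s1>>>(x, n);
    kernelB<<<(n + 255) / 256, 256, 0, s2>>>(y, n);

    cudaDeviceSynchronize();

    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

If both kernels were launched into the same stream (or into the default stream), they would be serialized regardless of device capability.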

Thank you, Seibert!