Some questions on selecting a certain number of SMs to simulate.

Dear CUDA programmers,
I am a CUDA newbie, and in my project I want to test CUDA by simulating a heat conduction problem. More precisely, I want to select a certain number of SMs and run the same heat conduction problem on them each time. I remember that in MPI I can use a command like “mpirun -np 4 helloworld” to assign a certain number of processors to my program, so I guess CUDA has a similar feature. My GPU is a GeForce 210M with 2 SMs, and I want to run my heat conduction simulation on only 1 SM. Is this possible, and if so, how can I do it? Thank you in advance!

The most important part here is that you probably do not want to do that. That MPI command is more akin to having a multi-GPU system and choosing how many GPUs your application runs on, rather than having a single GPU and selecting how many SMs to use.
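To make the multi-GPU analogy concrete, here is a minimal sketch of picking how many GPUs to use, roughly the way "-np" picks a process count. The command-line argument and the writeId kernel are made up for illustration; cudaGetDeviceCount and cudaSetDevice are the runtime calls that do the actual selection.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical kernel: each GPU just writes its own index into a buffer.
__global__ void writeId(int* out, int id)
{
    *out = id;
}

int main(int argc, char** argv)
{
    int available = 0;
    if (cudaGetDeviceCount(&available) != cudaSuccess || available == 0) {
        printf("no CUDA device found\n");
        return 1;
    }

    // Assumed convention: the first argument is the number of GPUs to use,
    // clamped to the range [1, available].
    int requested = (argc > 1) ? atoi(argv[1]) : available;
    if (requested < 1) requested = 1;
    if (requested > available) requested = available;

    for (int dev = 0; dev < requested; ++dev) {
        cudaSetDevice(dev);  // later CUDA calls in this thread target GPU 'dev'

        int* d = NULL;
        int h = -1;
        cudaMalloc(&d, sizeof(int));
        writeId<<<1, 1>>>(d, dev);
        cudaMemcpy(&h, d, sizeof(int), cudaMemcpyDeviceToHost);
        printf("GPU %d of %d reported %d\n", dev, available, h);
        cudaFree(d);
    }
    return 0;
}
```

On a machine with a single GPU, like yours, this always ends up using device 0 whatever number you pass in.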

I do not know of a way to force multiple thread blocks to use a single SM (others will correct me on that one if I am wrong), so the only way that I know of to only use a single SM is to only launch a single thread block.
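As a rough sketch of the single-block approach: a thread block always runs on one SM, so a launch with a grid of 1 block keeps the work on a single SM. The 1D explicit heat step below is only a stand-in for your problem; the kernel name, grid size, coefficient r and boundary values are all assumptions on my part.

```cuda
#include <cstdio>
#include <utility>
#include <cuda_runtime.h>

// One explicit time step of 1D heat conduction (stand-in problem).
// The stride loop lets a single block cover all n grid points.
__global__ void heatStep(const float* in, float* out, int n, float r)
{
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += blockDim.x * gridDim.x) {
        if (i == 0 || i == n - 1)
            out[i] = in[i];  // fixed-temperature boundaries
        else
            out[i] = in[i] + r * (in[i - 1] - 2.0f * in[i] + in[i + 1]);
    }
}

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("%s has %d SMs\n", prop.name, prop.multiProcessorCount);

    const int n = 1024;      // assumed number of grid points
    const int steps = 100;   // assumed number of time steps
    const float r = 0.25f;   // alpha * dt / dx^2, kept below 0.5 for stability

    float h[n];
    for (int i = 0; i < n; ++i) h[i] = 0.0f;
    h[0] = 100.0f;           // hot left boundary

    float *a = NULL, *b = NULL;
    cudaMalloc(&a, n * sizeof(float));
    cudaMalloc(&b, n * sizeof(float));
    cudaMemcpy(a, h, n * sizeof(float), cudaMemcpyHostToDevice);

    for (int s = 0; s < steps; ++s) {
        heatStep<<<1, 256>>>(a, b, n, r);  // 1 block, so only 1 SM is used
        std::swap(a, b);                   // output becomes next step's input
    }

    cudaMemcpy(h, a, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("T[1] after %d steps: %f\n", steps, h[1]);

    cudaFree(a);
    cudaFree(b);
    return 0;
}
```

Launching one kernel per time step also acts as the synchronization point between steps, so nothing extra is needed inside the kernel.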

This does not scale, as I do not think it is absolutely guaranteed that if you launch 2 thread blocks they will be mapped to two different SMs (logic would say so, but I do not know that it is guaranteed).
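If you want to see where the blocks actually end up, one option is to read the PTX special register %smid from inside the kernel. This is only a sketch for observing the mapping; it does not control it, and as far as I know the PTX documentation describes %smid as volatile, so treat the output as a hint. The kernel and helper names are made up, and it assumes your toolkit supports inline PTX.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Read the ID of the SM the calling thread is currently running on.
__device__ unsigned int smId()
{
    unsigned int id;
    asm volatile("mov.u32 %0, %%smid;" : "=r"(id));
    return id;
}

// Thread 0 of every block records which SM its block was placed on.
__global__ void recordSm(unsigned int* smOfBlock)
{
    if (threadIdx.x == 0)
        smOfBlock[blockIdx.x] = smId();
}

int main()
{
    const int numBlocks = 4;  // assumed block count for the experiment
    unsigned int h[numBlocks];
    unsigned int* d = NULL;

    cudaMalloc(&d, numBlocks * sizeof(unsigned int));
    recordSm<<<numBlocks, 32>>>(d);
    cudaMemcpy(h, d, numBlocks * sizeof(unsigned int), cudaMemcpyDeviceToHost);

    for (int b = 0; b < numBlocks; ++b)
        printf("block %d ran on SM %u\n", b, h[b]);

    cudaFree(d);
    return 0;
}
```

Changing numBlocks to 1 or 2 lets you check on your own card whether two blocks really land on two different SMs.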

Dear Ailleur,
Please see the following picture,
[Picture: external media]
This is what I mean. I guess we can draw some analogy between CUDA programming and MPI programming. Is that correct? Or is there some API I can use to accomplish this?