Confusion with the concepts

What do the following terms mean? I have come across all of them, but I am confused about which means what.

  1. Thread. How different is it from Core?
  2. Block. How different is it from Tile, and how are block and Tile different from SM?
  3. Grid. Is it Blocks*Threads or is it any different? What is the max grid size for a K20?
  4. Why are threads/block and threads/SM different? The K20 allows 1024 threads/block and 2048 threads/SM. What does that mean? Can I assign 2048 threads to every SM? Then why do I need threads/block separately? Or, the other way round, if only 1024 threads can be assigned through one block, how can one SM deal with 2048 threads? (Assuming a block is logical and an SM is physical.)
  5. And what exactly is a kernel? If a kernel is one launch with given grid and block dimensions, I have heard that the K20 can run 2 million threads per kernel. But if threads/SM is only 2048 and there are only 13 SMs in a K20, 2048*13 is just about 26K threads. How does that add up?
  6. Finally, what are warps, and why do we get only 32 threads per warp when an entire block is at our service?

  1. Thread: think of it as an instruction pointer with its own index. It is a software concept; a core is the hardware that executes threads, so forget about cores for now.

  2. Block: a block of threads, which can be 1D, 2D or 3D. A block is a software grouping; an SM is the hardware unit that a block runs on.

  3. Grid: grid of blocks, can be 1D, 2D or 3D.

  4. Each GPU generation/model/architecture is different. Query the device to understand its limits, and use those limits to constrain/adapt/modify the software.

  5. Kernel: a program that is executed on the GPU.

  6. Warp: a design attempt to execute the same instruction for 32 threads at once. Sometimes that is possible (parallel execution); sometimes it is not, for example when the threads take different branches, and then the threads are serialized.
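
To make the arithmetic behind points 3 to 6 a bit more concrete, here is a rough sketch in plain host-side C++ (no CUDA runtime involved). The constants are the K20 figures quoted in the question; all function names are made up for illustration, and the per-SM estimate is a simplification that looks only at the thread limit.

```cpp
#include <cassert>
#include <cstdint>

// Figures quoted in the question for a K20. On real hardware, query the
// device instead of hard-coding them.
const std::int64_t kWarpSize = 32;
const std::int64_t kMaxThreadsPerBlock = 1024;
const std::int64_t kMaxThreadsPerSM = 2048;
const std::int64_t kNumSMs = 13;

// Round-up integer division (hypothetical helper).
std::int64_t ceilDiv(std::int64_t a, std::int64_t b) {
    assert(a >= 0 && b > 0);
    return (a + b - 1) / b;
}

// 1D global index of a thread: the value that
// blockIdx.x * blockDim.x + threadIdx.x yields inside a kernel.
std::int64_t globalThreadIndex(std::int64_t blockIdx, std::int64_t blockDim,
                               std::int64_t threadIdx) {
    assert(threadIdx >= 0 && threadIdx < blockDim);
    return blockIdx * blockDim + threadIdx;
}

// Number of blocks a 1D grid needs to cover totalThreads threads.
std::int64_t blocksInGrid(std::int64_t totalThreads,
                          std::int64_t threadsPerBlock) {
    assert(threadsPerBlock > 0 && threadsPerBlock <= kMaxThreadsPerBlock);
    return ceilDiv(totalThreads, threadsPerBlock);
}

// Number of warps a block is split into (the last warp may be partly empty).
std::int64_t warpsPerBlock(std::int64_t threadsPerBlock) {
    return ceilDiv(threadsPerBlock, kWarpSize);
}

// Upper bound on blocks resident on one SM at the same time.
// Simplification: only the threads/SM limit is considered; registers,
// shared memory and the per-SM block-count cap are ignored.
std::int64_t residentBlocksPerSM(std::int64_t threadsPerBlock) {
    assert(threadsPerBlock > 0 && threadsPerBlock <= kMaxThreadsPerBlock);
    return kMaxThreadsPerSM / threadsPerBlock;
}

// Threads that can be resident on the whole device at one moment,
// under the same simplification.
std::int64_t residentThreadsOnDevice(std::int64_t threadsPerBlock) {
    return residentBlocksPerSM(threadsPerBlock) * threadsPerBlock * kNumSMs;
}
```

With these numbers, `blocksInGrid(2000000, 1024)` gives 1954 blocks, while `residentBlocksPerSM(1024)` gives 2, so at most 26 blocks (26624 threads) are resident across the 13 SMs at once. The remaining blocks of the grid simply wait and are scheduled as running blocks finish, which is how one kernel launch can cover far more threads than 2048*13.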

For more details, consult the manuals ;) :)