Note: I’m assuming you are asking about comparing clusters to GPUs. A cluster is a set of compute nodes connected by a high-bandwidth, low-latency interconnect like InfiniBand or Myrinet. These interconnects typically offer bandwidths around 10 Gbps with latencies in the microsecond range. Clusters are for running one single calculation much faster than a single node can.
Grids of embarrassingly parallel tasks such as SETI@home or Folding@home are a completely different topic. They run many hundreds of independent calculations.
The short answer to the trade-offs between GPUs and clusters is that it depends on the application. Here is a breakdown for my application, molecular dynamics (MD).
On a cluster, MD breaks the problem up across a number of nodes. Each node updates a small set of particles and then communicates with its neighboring nodes. This inter-node communication occurs quite often (up to 700 times per second in my systems), so MD has a fairly high ratio of communication time to computation time. As more nodes are used in a computation, performance increases, but the communication overhead increases as well. For example, I typically run simulations on 64 or 128 processors and the communication overhead is about 40%. If I use any more nodes, I hit the point of diminishing returns and overall performance stops improving. Even with this, I’m still waiting days for sims to run.
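To make the pattern concrete, here is a minimal sketch of what one timestep looks like in a domain-decomposed MD code. This is not my actual code; the buffer sizes, neighbor ranks, and the compute_forces/integrate routines are placeholders just to show the compute-then-communicate cycle that repeats hundreds of times per second:

    #include <mpi.h>

    /* Placeholder routines standing in for the real physics. */
    void compute_forces(double *local, int nlocal, double *ghost, int nghost);
    void integrate(double *local, int nlocal, double dt);

    /* One node's share of the work: advance its own particles, then
       swap boundary ("ghost") particles with its neighbors. */
    void md_step_loop(double *local, int nlocal, double *ghost, int nghost,
                      int left_rank, int right_rank, int nsteps, double dt)
    {
        for (int step = 0; step < nsteps; step++) {
            compute_forces(local, nlocal, ghost, nghost);  /* local compute */
            integrate(local, nlocal, dt);

            /* The communication step: this is what fires up to ~700 times
               per second and whose cost grows with the node count. */
            MPI_Sendrecv(local, 3 * nlocal, MPI_DOUBLE, right_rank, 0,
                         ghost, 3 * nghost, MPI_DOUBLE, left_rank,  0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
    }

The point is simply that every step pays for a message exchange, so past a certain node count the exchange dominates and adding nodes stops helping.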
So, for my application, the biggest disadvantage of running on a cluster is the communication efficiency. To get a single job done in a reasonable time, I need to use almost twice as many compute nodes as the computation itself strictly needs. Other disadvantages of clusters include administration, cost, and downtime. The advantages are that clusters have been around long enough that there are plenty of places to buy them from, and most of the scientific software out there supports them, so one can get working software up and running quickly.
Running on a single GPU, the biggest advantage over the cluster would be the lack of communication overhead, which means more efficient calculations. That, and cost: a $500 GPU potentially has the performance of a 32-node cluster in my application.
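For contrast with the cluster loop above, here is a hypothetical sketch of the same kind of particle update on a single GPU. All the names and the trivial force model are made up; the point is that every particle is advanced by a thread in one kernel launch on one device, with no inter-node messages at all:

    // Assumed/illustrative kernel, not a real MD package's code.
    __global__ void md_step(float3 *pos, float3 *vel, const float3 *force,
                            int n, float dt, float inv_mass)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            // simple update; a real code would evaluate forces over
            // neighbor lists in another kernel
            vel[i].x += force[i].x * inv_mass * dt;
            vel[i].y += force[i].y * inv_mass * dt;
            vel[i].z += force[i].z * inv_mass * dt;
            pos[i].x += vel[i].x * dt;
            pos[i].y += vel[i].y * dt;
            pos[i].z += vel[i].z * dt;
        }
    }

    // Launched once per timestep, e.g.:
    // md_step<<<(n + 255) / 256, 256>>>(pos, vel, force, n, dt, inv_mass);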
Disadvantages of the GPU:
1) High development time. There isn't a well-established body of software out there for doing these calculations on the GPU yet.
2) Lack of double-precision math (though this will be changing in the near future).
3) Not every type of calculation is well suited to being done on the GPU.
Of course, I hope one day to take the good with the bad and build a cluster of GPUs for some insane performance :) Though I half expect the communication overhead to be so large that it isn't worthwhile. I'll need to do some testing first.