One powerful GPU vs. several low-end GPUs: which is better? For "embarrassingly" parallel

With all due respect to the high-end processors like the Tesla series,

Why doesn’t everyone just buy several low-end GPUs instead?
Tesla C1060: 240 scalar processors, cost: $1250.00; cost per SP: $5.21
Geforce GT 220: 96 scalar processors, cost: $80.00; cost per SP: $0.83
I believe the processor speed is nearly the same.
The Tesla is more than 6 times more expensive for the same computing power.
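For anyone who wants to check the per-SP arithmetic, here is a minimal Python sketch. The prices and SP counts are the ones quoted in the post; the function and variable names are made up for illustration.

```python
def cost_per_sp(price_usd, scalar_processors):
    """Price divided by the number of scalar processors (SPs)."""
    return price_usd / scalar_processors

# Figures as quoted in the post above.
tesla_c1060 = cost_per_sp(1250.00, 240)
gt_220 = cost_per_sp(80.00, 96)
ratio = tesla_c1060 / gt_220

print(f"Tesla C1060:    ${tesla_c1060:.2f} per SP")
print(f"GeForce GT 220: ${gt_220:.2f} per SP")
print(f"Ratio:          {ratio:.2f}x")
```

This only counts SPs per dollar. It says nothing about clock rate, device memory size, or memory bandwidth.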

I know PCI slots are limited, but there are boards around with plenty of slots.
I know that smaller device memory is an issue, but this is also an issue with the Tesla for extremely large data sets.

IMHO, it seems the price is not proportional to benefit. Could anyone prove me wrong (since I own a Tesla device)?

That’s not a fair comparison, because Tesla users pay a reliability premium.

I got a GTX 260 core 216 for $160 (used), so that would be $0.74 per SP.

It depends. Very often (most often) designing your app to intercommunicate and coordinate with multiple GPUs is a pretty significant coding issue… so in that sense, one big GPU is always better. For example, if you were cracking MD5 sums or something, there’s no intercommunication needed at all so lots of cheaper boards would work very very well.
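To illustrate the "no intercommunication" case, here is a rough CPU-only sketch in Python. It assumes a toy search space of integers and uses plain loops as stand-ins for separate cards; `split_range`, `search_chunk`, and the four-worker setup are hypothetical names and choices, not anything from a real multi-GPU API.

```python
import hashlib

def split_range(total, workers):
    """Split range(total) into contiguous, non-overlapping (start, stop) chunks."""
    base, extra = divmod(total, workers)
    chunks, start = [], 0
    for i in range(workers):
        size = base + (1 if i < extra else 0)
        chunks.append((start, start + size))
        start += size
    return chunks

def search_chunk(target_hex, start, stop):
    """Brute-force one chunk; it needs no data from any other worker."""
    for n in range(start, stop):
        if hashlib.md5(str(n).encode()).hexdigest() == target_hex:
            return n
    return None

# Toy target: the MD5 of the string "4242".
target = hashlib.md5(b"4242").hexdigest()

# Four hypothetical cards, each handed its own slice of the search space.
chunks = split_range(10_000, 4)
hits = [search_chunk(target, start, stop) for start, stop in chunks]
found = [h for h in hits if h is not None]
print(chunks)
print(found)
```

Each chunk is searched without reading or writing anything the other chunks use, which is why this kind of job spreads across several cheap boards so easily.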

But in the general case, it’s always easier to design solutions such that all the workers are on one board and therefore share the same device memory and can easily self-coordinate as necessary. Memory size is also often an issue, which is the 4GB Tesla’s strength… it can solve problems that even 20 GT220s would never solve because they just don’t have the memory to load large problems. Again, of course, this is always application dependent.

You’re confusing this product with the GT 240.

Note that peak memory bandwidth on the GT 240 with (G)DDR3 memory is just 25.6 GB/sec

and 32 GB/sec at best with GDDR5 memory.

EDIT: above numbers are actually for the GT220 product, argh ;)

Compare this to the 120 GB/sec or so that I get from my GTX 260

Christian

One small typo: GDDR5 @ 128 bits @ 1800 MHz is ~56 GB/sec of bandwidth, or about 28 GB/sec for a memcopy. Bumping it up to 2000 MHz gets you 62.5 GB/sec, or about 50% of the 448-bit GDDR3 GTX 260.
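As a rough check on these numbers, here is a small Python sketch of the usual peak-bandwidth estimate (bus width in bytes times transfers per second). It assumes two transfers per quoted memory clock and decimal GB (10^9 bytes); both are assumptions, and the function name is made up. Under those assumptions it gives 57.6 and 64.0 GB/sec, slightly above the ~56 and 62.5 quoted above, which may simply reflect a different unit convention.

```python
def peak_bandwidth_gb_s(bus_width_bits, memory_clock_mhz, transfers_per_clock=2):
    """Theoretical peak bandwidth in decimal GB/sec.

    transfers_per_clock=2 is an assumption about how the quoted clock
    relates to the effective data rate.
    """
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_sec = memory_clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_transfer * transfers_per_sec / 1e9

bw_1800 = peak_bandwidth_gb_s(128, 1800)
bw_2000 = peak_bandwidth_gb_s(128, 2000)

# A device-to-device copy reads and writes every byte, so the copy rate
# is roughly half the raw bandwidth.
copy_1800 = bw_1800 / 2

print(f"128-bit @ 1800 MHz: {bw_1800:.1f} GB/sec peak, {copy_1800:.1f} GB/sec copy")
print(f"128-bit @ 2000 MHz: {bw_2000:.1f} GB/sec peak")
```

These are theoretical peaks only; a measured figure from bandwidthTest will normally come in a bit lower.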

As you note, it’s nowhere near as powerful as the GTX 260 but it is surprisingly peppy and a fun/cheap card for CUDA. :)

bandwidthTest --device=2 --memory=pinned --dtod 

Device 2: GeForce GT 240

 Quick Mode

Device to Device Bandwidth, 1 Device(s)

   Transfer Size (Bytes)	Bandwidth(MB/s)

   33554432			 	26718.7

I would look at a GTX 285 (which has 2 GB versions), which is much more similar to the Tesla (and, if you are nitpicky, actually outperforms it by 20%-30%, depending on the vendor).

They are aimed at a different market, though. With the GeForce, NVIDIA is responsible only for the GPU, not the other electronics; with the Tesla, NVIDIA takes care of everything. GeForces are overclocked (my 285 by 30%) and throttle themselves when hot. The Tesla is better tested for optimal clock rate and is designed to run 24/7 without throttling, unlike the GeForce. On one of my machines that has a 285 and a Tesla, the 285 takes two minutes (probably even less) to go over 80 °C, while the Tesla runs for much longer and stays under 70 °C with the same cooling.

If you are building a desktop system where downtime doesn’t matter, put GeForces in. If you are building a production or server system and/or need the 4 GB of RAM, then you need a Tesla.

Not all GeForce cards are overclocked. I specifically purchase cards running at the standard NVIDIA reference clock rates for this reason. But yes, Tesla is supposed to get better testing. (Although I still wish I could see some kind of MTBF or annualized failure rate or some comparison to quantify this.)

Yes you’re right. I quoted peak bandwidths for the GT220 model here. This time it was me confusing the cards ;)

And I am biased because I have the GT220 for software development purposes.

Let me add something new to the discussion:

Four GT 240 cards have the potential to replace two older GTX 260 models (192 shaders each) if one does not need double precision support. And four GT 240 cards will not require any PCI Express +12V connectors from the PSU. But is it actually safe to draw 4x70 Watts peak from the PCB only, given that the mainboard offers four physical PCI Express x16 slots? Could the mainboard take damage?

This is a good point. I’ve seen some motherboards that have a supplemental PCI-Express power feed on the board that takes a 4-pin molex. I suspect that without some kind of extra power source, no motherboard can provide 280W to the PCI-Express slots.

It’s a really good question. Something that I was concerned about too since I’m running 3 at once and am tempted to put in a fourth in the PCIe x1 slot (e.g. ZOTAC ION PCIe x1).

First, there are motherboards with auxiliary Molex or 4-pin floppy connectors next to each PCIe slot (DFI motherboards are a good example).

Second, the total 12V current that you can source through the ATX and 8-pin EPS connector is actually rather large: ATX [ 12 amps / 144 Watts ] + 8-pin EPS [ 28 amps / 336 Watts ] = 480 Watts

The issue then becomes whether your power supply can deliver that current (via those particular connectors) and whether your motherboard’s traces can deliver that much current (I suspect they can).
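Put as a quick budget in Python, using only the current and wattage figures quoted in this thread (the variable names are made up, and the 12 V rail is assumed to sit at exactly 12.0 V):

```python
RAIL_VOLTS = 12.0  # nominal 12 V rail, assumed exact for this estimate

def watts(amps, volts=RAIL_VOLTS):
    """Power in watts from current on the 12 V rail."""
    return amps * volts

atx_w = watts(12)   # ATX connector, 12 A as quoted above
eps_w = watts(28)   # 8-pin EPS connector, 28 A as quoted above
available_w = atx_w + eps_w

demand_w = 4 * 70   # four cards at 70 W peak each, all drawn through the slots
headroom_w = available_w - demand_w

print(f"Available through connectors: {available_w:.0f} W")
print(f"Peak demand from four cards:  {demand_w} W")
print(f"Headroom on paper:            {headroom_w:.0f} W")
```

The headroom is on paper only: it ignores everything else the board draws from 12 V, and it does not tell you whether a particular power supply or a particular board's traces can actually carry that current.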

I think the ATX spec might have a total motherboard power recommendation but it doesn’t seem to be public and surely has been bent by some of the odder motherboards out there.

I haven’t smelled smoke yet.