The point is, I found no official statement that the gaming cards have reduced DP performance – they use the Fermi chip, the same one the Tesla and Quadro cards will use. And the Fermi architecture allows double-precision performance at half the speed of single precision. So it is never a question of whether the Tesla cards have increased DP performance, but whether the consumer cards have a reduced one. If they have less DP performance than the Tesla cards, it is because DP is either disabled through drivers or through some kind of hardware jumper. So the question remains whether NVIDIA decided to “cripple” the DP performance on the consumer cards in order to have one more advantage for the Tesla cards besides much more memory, higher reliability [I assume here that NVIDIA “hand-picks” the chips for its professional cards and tests them more thoroughly than the chips for consumer cards] and ECC support.
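As a back-of-the-envelope sketch of what “half speed” means in practice: theoretical peak is just cores × clock × 2 (a fused multiply-add counted as two flops), with DP at half that in the full Fermi design. The core count and clock below are placeholder figures for a Tesla C2050-like configuration, not official specs for any particular card.

```python
# Rough peak-throughput sketch for a Fermi-class chip.
# Core count and clock are illustrative assumptions, not quoted specs.

def peak_gflops(cuda_cores, clock_ghz, flops_per_core_per_cycle=2):
    """Theoretical peak, counting a fused multiply-add as 2 flops."""
    return cuda_cores * clock_ghz * flops_per_core_per_cycle

cores, clock = 448, 1.15          # assumed C2050-like configuration
sp = peak_gflops(cores, clock)    # single-precision peak
dp_full = sp / 2                  # full Fermi design: DP at half SP rate

print(f"SP peak: {sp:.0f} Gflops, DP peak (half rate): {dp_full:.0f} Gflops")
```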
The following is an answer I received a few months ago from James Wang, a technical marketing analyst at NVIDIA:
Q: In the GeForce family, double-precision throughput has been reduced to 25% of the full design. Was this decision made to discourage the use of these products for professional use (where Quadro and Tesla are targeted)? Considering the fused support of single- and double-precision calculations in the CUDA cores, how was this change even applied?
A: Yes, full-speed double precision performance is a feature we reserve for our professional customers. Consumer applications have little use for double precision, so this does not really affect GeForce users. Having differentiated features and pricing is actually fairer for all. Given the option of enabling all professional features on GeForce and having gamers pay for them, or disabling them on GeForce and offering a more compelling price, we feel the latter is the better choice.
Regarding the second part of the question, the architecture is designed to support this kind of configuration.
Argh, too bad. At least now there is a significant feature to drive individual Tesla sales aside from memory size (and ECC). I never saw any compelling reason to put a C1060 into a developer workstation unless you needed 4 GB of memory.
My code continues to avoid double precision (mostly because development started on compute 1.0 devices), and it looks like it will be profitable to continue that trend when possible, if only to target GeForce cards.
Oh no, I missed that part of their ‘strategy’. I hate them for that :-(, and especially for lying about the supposed additional costs of NOT disabling DP, for which the gamers would have to pay.
Still, for the list price of a C2050 you can have ~5 GTX 480s (each with 1/4 the DP and 1/2 the memory), so even for pure DP performance it doesn’t necessarily make sense to buy the overpriced C- and S-series products.
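The price arithmetic above, sketched out (the 1/4 DP rate and the 5-cards-per-Tesla price ratio are this post’s ballpark figures, not quoted specs):

```python
# Rough DP-per-dollar sketch. All ratios are the poster's ballpark
# assumptions: one Tesla's list price buys ~5 GeForce cards, and each
# GeForce runs DP at 1/4 of the Tesla's DP rate.

tesla_dp = 1.0                 # normalize Tesla DP throughput to 1
geforce_dp = tesla_dp / 4      # consumer card: 1/4 of the Tesla DP rate
cards_per_tesla_price = 5      # ~5 GTX 480s for one Tesla list price

aggregate_dp = cards_per_tesla_price * geforce_dp
print(f"DP per dollar, GeForce vs Tesla: {aggregate_dp:.2f}x")
# 5 * 0.25 = 1.25x, so raw DP per dollar still favors the consumer cards
```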
The only serious reason would be if the air-cooled GTX cards fail [more often than Teslas]. Do they???
I’m counting on a clever driver hack, some day, by someone.
On a separate note, if someone needs support for a summer job dealing with optimization of GPU drivers… :-)
True, but the card price is not the only cost, since you also need a PC to put it in.
That’s a point I would guess is true. First of all, I would definitely think that the professional cards (Tesla and Quadro) have “hand-picked” chips, and I guess they are tested more thoroughly.
I wouldn’t count on that, since this could probably be implemented with a “hardware jumper”. For the newer Quadro cards this is how they made sure that you cannot use the Quadro drivers (with much better performance in CAD etc.) with the consumer products.
Well, I don’t begrudge them for trying to make CUDA sustainable with non-gamer income. The gamer market is running out of steam to fund the R&D for better GPU Computing features, and there are not enough other consumer-aimed compute heavy tasks to pick up the slack. (I would argue the reception of the GTX 480/470 by the review sites is lukewarm for this reason.) The HPC community is much smaller, so you have to extract more $$$ per card to maintain the same income. If this is what it takes to keep CUDA alive, so be it. (Of course, I’d love for double precision to become a must-have feature for GeForce customers. Whoever can release those applications does all of us a favor.)
Tesla cards tend to run at lower clock rates than the top-of-the-line GeForce, probably in part for this reason. However, even if the GeForce is less reliable, its failure rate would need to be 5x that of the Tesla for the Tesla to be cost-effective in a workstation where you don’t need extremely high uptime.
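The 5x break-even figure can be sketched as a simple expected-cost model, where each failure means buying a replacement card. The prices below are illustrative assumptions, not quotes:

```python
# Break-even failure-rate sketch: if a GeForce costs ~1/5 of a Tesla,
# how many replacements can it need before the Tesla is cheaper?
# Prices are illustrative assumptions only.

tesla_price = 2500.0       # assumed Tesla list price
geforce_price = 500.0      # assumed ~1/5 of the Tesla

def expected_cost(price, expected_failures):
    """Total spend if every failure is replaced at full card price."""
    return price * (1 + expected_failures)

# GeForce stops being cheaper once geforce_price * (1 + f) > tesla_price,
# i.e. f > tesla_price / geforce_price - 1
break_even_failures = tesla_price / geforce_price - 1
print(f"GeForce wins until ~{break_even_failures:.0f} replacements "
      f"per Tesla lifetime")
```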
I read in one of the reviews that the HD 5870 has about 2700 Gflops of single-precision computational performance, while the GTX 480 has about 1300. If that were true, wouldn’t the 5870 beat the crap out of the GTX 480 in every game?
It is very difficult to get anywhere near peak performance from the Evergreen/Cypress architecture.
I’d bet on #2. Looking at the architecture, it seems that Cypress is designed explicitly for graphics (hence the 5-way VLIW execution units). And yet, even for games, Cypress is only about even with Fermi, though it does seem more efficient (in $ and watts) for graphics than Fermi.
Hard to tell though because ATI/AMD has no decent documentation.
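For what it’s worth, the ~2700 vs ~1300 Gflops figures can be reproduced from the commonly cited launch specs: theoretical SP peak = ALUs × clock × 2, counting a multiply-add as two flops. The shader counts and clocks below are those widely reported numbers, not measurements:

```python
# Where the ~2700 vs ~1300 Gflops figures come from: theoretical SP peak
# = ALUs x clock x 2 (multiply-add counted as 2 flops). Shader counts
# and clocks are the commonly cited launch specs.

def sp_peak_gflops(alus, clock_ghz):
    return alus * clock_ghz * 2

hd5870 = sp_peak_gflops(1600, 0.850)   # 1600 VLIW lanes @ 850 MHz
gtx480 = sp_peak_gflops(480, 1.401)    # 480 CUDA cores @ 1401 MHz shader clock

print(f"HD 5870: {hd5870:.0f} Gflops, GTX 480: {gtx480:.0f} Gflops")
# The 5870's peak assumes every VLIW slot is filled every cycle,
# which games and most compute kernels rarely achieve.
```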
Where is this in writing, other than a forum post?! After pushing how great Fermi would be for CUDA, NVidia needs to be honest about the capabilities of the consumer cards. I’m not overly upset by the decision (nor surprised), but this needs to be clear.
But if I read the relevant threads correctly, this is a very specific example, and you are not even comparing apples with apples, since in the ATI example the matrices are in a special order.
Also, peak performance in some examples is actually not that important from my point of view. A very important question is how easy it is to program, and how much effort you need to get close to peak.