GTX 480 / 470 Double Precision Reduced?
Hi, I know this question has come up before, but I hope that if we ask often enough, someone official from NVIDIA will tell us how it is ;)

Is the double precision performance of the consumer Fermi cards reduced (by 75%) compared to that of the Tesla line?

Best regards
Ceearem

#1
Posted 03/27/2010 07:23 PM   
[quote name='ceearem' post='1028373' date='Mar 27 2010, 08:23 PM']Is the double precision performance of the consumer fermi cards reduced (by 75%) compared to that of the Tesla Line?[/quote]

And if so, could it be re-enabled by flipping a bit in the driver? ;)

#2
Posted 03/27/2010 09:07 PM   
It seems that gaming cards such as GTX480 do not have increased dp performance. Are you sure that nvidia advertised such a feature in GTX480?

#3
Posted 03/28/2010 12:44 PM   
[quote name='TrekCZ' post='1028796' date='Mar 28 2010, 04:44 AM']It seems that gaming cards such as GTX480 do not have increased dp performance. Are you sure that nvidia advertised such a feature in GTX480?[/quote]
The point is that I have found no official statement that the gaming cards have reduced DP performance. They use the Fermi chip, the same one the Tesla and Quadro cards will use, and the Fermi architecture allows double precision at half the speed of single precision. So the question is never whether the Tesla cards have increased DP performance, but whether the consumer cards have a reduced one. If they have less DP performance than the Tesla cards, it will be because it is disabled either through the drivers or through some kind of hardware jumper. So the question remains whether NVIDIA decided to "cripple" DP performance on the consumer cards in order to give the Tesla cards one more advantage besides much more memory, higher reliability [I assume here that NVIDIA "hand picks" the chips for its professional cards and tests them more thoroughly than the chips for consumer cards], and ECC support.

Soooo ... tmurray any comment?? :stud:
Best regards
Ceearem

#4
Posted 03/28/2010 12:54 PM   
I'd like to know it too.

#5
Posted 03/28/2010 02:40 PM   
The following answer I received a few months ago from James Wang, a technical marketing analyst from NVIDIA:



[b]Q:[/b] In the GeForce family, double-precision throughput has been reduced to 25% of the full design. Was this decision made to discourage the use of these products for professional use (where Quadro and Tesla are targeted?) Considering the fused support of single- and double-precision calculations in the CUDA cores, how was this change even applied?

[b]A:[/b] Yes, full-speed double precision performance is a feature we reserve for our professional customers. Consumer applications have little use for double precision, so this does not really affect GeForce users. Having differentiated features and pricing is actually fairer for all. Given the option of enabling all professional features on GeForce and having gamers pay for them, or disabling them on GeForce and offering a more compelling price, we feel the latter is the better choice.

Regarding the second part of the question, the architecture is designed to support this kind of configuration.

GeForce Technical Marketing
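
To put the 25% figure in numbers, here is a rough sketch of the peak-throughput arithmetic. The core count and shader clock below are illustrative assumptions and do not come from this thread; the function name is made up for the example, and counting a fused multiply-add as two FLOPs is just the usual convention for quoting peak rates.

```python
def peak_gflops(cores, shader_clock_ghz, flops_per_core_per_clock=2.0):
    """Theoretical peak in GFLOPS, counting one FMA as 2 FLOPs per clock."""
    return cores * shader_clock_ghz * flops_per_core_per_clock


# Assumed example values (not from the thread): 480 cores at 1.401 GHz.
CORES = 480
CLOCK_GHZ = 1.401

sp_peak = peak_gflops(CORES, CLOCK_GHZ)
dp_full = sp_peak / 2          # Fermi design: DP at half the SP rate
dp_geforce = dp_full * 0.25    # GeForce: 25% of the full DP design rate

print(f"SP peak:            {sp_peak:7.1f} GFLOPS")
print(f"DP, full design:    {dp_full:7.1f} GFLOPS")
print(f"DP, 25% of design:  {dp_geforce:7.1f} GFLOPS")
```

Under these assumptions the reduced card ends up at one eighth of its own single precision peak, which is why sticking to float stays attractive on GeForce.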

#6
Posted 03/28/2010 02:54 PM   
Argh, too bad. At least now there is a significant feature to drive individual Tesla sales aside from memory size (and ECC). I never saw any compelling reason to put a C1060 into a developer workstation unless you needed 4 GB of memory.

My code continues to avoid double precision (mostly because development started on compute 1.0 devices), and it looks like it will be profitable to continue that trend when possible, if only to target GeForce cards.

#7
Posted 03/28/2010 03:59 PM   
[quote name='seibert' post='1028909' date='Mar 28 2010, 11:59 AM']Argh, too bad. At least now there is a significant feature to drive individual Tesla sales aside from memory size (and ECC)..[/quote]

oh shiit, I missed that part of their 'strategy'. i hate them for that :-(, and especially for lying about the supposed additional costs of NOT disabling DP, for which the gamers would have to pay.

still, for the list price of c2060 you can have ~5 gtx480 (with 1/4 DP and 1/2 memory) so even for pure dp performance it doesn't necessarily make sense to buy the overpriced C & S products.

the only serious reason would be if the air-cooled gtx cards fail [more than Teslas]. do they???

I'm counting on a clever hack of a driver, some day by someone.
On a separate note, if someone needs support for a summer job dealing with optimization of GPU drivers... :-)
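
The price argument above can be sketched as follows, using only the ratios quoted in this post (about 5 GeForce cards per Tesla list price, 1/4 of the DP rate, 1/2 of the memory). The function name is hypothetical, and the sketch ignores clock differences, host cost, and how many cards fit in one box.

```python
def geforce_vs_tesla(cards_per_tesla_price=5, dp_fraction=0.25, mem_fraction=0.5):
    """Compare a same-budget set of GeForce cards against one Tesla.

    Returns (aggregate DP throughput, memory per card), both expressed
    relative to a single Tesla card, which is normalized to 1.0.
    """
    aggregate_dp = cards_per_tesla_price * dp_fraction
    return aggregate_dp, mem_fraction


agg_dp, mem_per_card = geforce_vs_tesla()
print(f"Aggregate DP for the same money: {agg_dp:.2f}x one Tesla")
print(f"Memory available to each card:   {mem_per_card:.2f}x one Tesla")
```

The aggregate number only helps if the problem splits across several cards, since each card still sees half the memory.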

#8
Posted 03/28/2010 04:34 PM   
[quote]still, for the list price of c2060 you can have ~5 gtx480 (with 1/4 DP and 1/2 memory) so even for pure dp performance it doesn't necessarily make sense to buy the overpriced C & S products.[/quote]
True, but the card price is not the only cost, since you also need a PC to put the cards in.

[quote]the only serious reason would be if the air-cooled gtx cards fail [more than Teslas]. do they???[/quote]
That's a point I would guess is true. I would definitely think that the professional cards (Tesla and Quadro) have "hand picked" chips, and I guess they are better tested.

[quote]I'm counting on a clever hack of a driver, some day by someone.[/quote]
I wouldn't count on that, since this could probably be implemented by a "hardware jumper". For the newer Quadro cards, this is how they made sure that you cannot use the Quadro drivers (with much better performance in CAD etc.) with the consumer products.

#9
Posted 03/28/2010 05:20 PM   
I think the box with cpu etc. inside is ~$2k, so the multiple cards inside are by far more expensive (5+ times more for 4-card node).

#10
Posted 03/28/2010 05:27 PM   
[quote name='pawel_astro' post='1028928' date='Mar 28 2010, 10:34 AM']oh shiit, I missed that part of their 'strategy'. i hate them for that :-(, and especially for lying about the supposed additional costs of NOT disabling DP, for which the gamers would have to pay.[/quote]

Well, I don't begrudge them for trying to make CUDA sustainable with non-gamer income. The gamer market is running out of steam to fund the R&D for better GPU Computing features, and there are not enough other consumer-aimed compute heavy tasks to pick up the slack. (I would argue the reception of the GTX 480/470 by the review sites is lukewarm for this reason.) The HPC community is much smaller, so you have to extract more $$$ per card to maintain the same income. If this is what it takes to keep CUDA alive, so be it. (Of course, I'd love for double precision to become a must-have feature for GeForce customers. Whoever can release those applications does all of us a favor.)


[quote]the only serious reason would be if the air-cooled gtx cards fail [more than Teslas]. do they???[/quote]

Tesla cards tend to run at lower clock rates than the top-of-the-line GeForce, probably in part for this reason. However, even if GeForce is less reliable, its failure rate would need to be about 5x that of the Tesla for the Tesla to be the cost-effective choice in a workstation where you don't need extremely high uptime.
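
A rough sketch of that break-even argument, assuming the only cost of a failure is buying a replacement card (no downtime or lost results). Prices are in relative units using the ~5:1 ratio from earlier in the thread; the failure rate, the time span, and the function names are made-up illustration values, not data.

```python
def expected_replacement_cost(price, failures_per_year, years=3.0):
    """Expected money spent replacing failed cards over the given period."""
    return price * failures_per_year * years


def break_even_failure_ratio(tesla_price, geforce_price):
    """GeForce/Tesla failure-rate ratio at which replacement costs are equal."""
    return tesla_price / geforce_price


TESLA_PRICE, GEFORCE_PRICE = 5.0, 1.0   # relative units, ~5:1 as quoted above
TESLA_FAILS_PER_YEAR = 0.02             # assumed for illustration only

ratio = break_even_failure_ratio(TESLA_PRICE, GEFORCE_PRICE)
tesla_cost = expected_replacement_cost(TESLA_PRICE, TESLA_FAILS_PER_YEAR)
geforce_cost_3x = expected_replacement_cost(GEFORCE_PRICE, 3 * TESLA_FAILS_PER_YEAR)

print(f"Break-even failure ratio: {ratio:.1f}x")
print(f"Tesla replacement cost:   {tesla_cost:.2f}")
print(f"GeForce at 3x failures:   {geforce_cost_3x:.2f}")
```

With a GeForce failing three times as often as a Tesla, the expected replacement spend is still lower in this simple model; a cluster where downtime is expensive would need a different cost function.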

#11
Posted 03/28/2010 05:40 PM   
[quote name='seibert' post='1028909' date='Mar 29 2010, 02:59 AM']My code continues to avoid double precision (mostly because development started on compute 1.0 devices), and it looks like it will be profitable to continue that trend when possible, if only to target GeForce cards.[/quote]
Still getting good mileage with [b]float[/b] here. And must admit, getting a bit agitated by numbers like these: [url="http://www.anandtech.com/video/showdoc.aspx?i=3783&p=6"]folding on GTX 480[/url]. Those last two graphs, ray tracing and folding, for real? :omg:

#12
Posted 03/28/2010 07:30 PM   
[quote name='nnunn' post='1029033' date='Mar 28 2010, 01:30 PM']Still getting good mileage with [b]float[/b] here. And must admit, getting a bit agitated by numbers like these: [url="http://www.anandtech.com/video/showdoc.aspx?i=3783&p=6"]folding on GTX 480[/url]. Those last two graphs, ray tracing and folding, for real? :omg:[/quote]

Such is the magic of an L2 cache when your working set of data (or some part of it) can fit inside the cache.

#13
Posted 03/28/2010 09:49 PM   
[quote name='seibert' post='1029124' date='Mar 28 2010, 02:49 PM']Such is the magic of an L2 cache when your working set of data (or some part of it) can fit inside the cache.[/quote]
Not to forget the random access problem in main memory, and/or atomic functions.

#14
Posted 03/28/2010 10:02 PM   
So on highly streamed, non-branching double precision code, which is faster: Fermi or 5xxx?

Will we see benchmark results showing Tesla > 5xxx > GTX 480 for double precision GFLOPS?

It's a pity Anandtech didn't include some double precision compute benchmarks, both of raw performance and on more complex problems.

#15
Posted 03/28/2010 11:22 PM   