Faster card for SP: GTX 680 or GTX 580?
Hi all,
we are going to replace 4 old C1060 cards with less expensive and better-performing ones.
Which would you suggest: 4 GTX 680s or 4 GTX 580s?

I have heard that even in single precision (SP) the GTX 680 underperforms compared with the GTX 580. Is that
the case?


Regards

#1
Posted 04/30/2012 11:58 PM   
We [url="http://blog.accelereyes.com/blog/2012/04/26/benchmarking-kepler-gtx-680/"]posted some benchmarks[/url] that may be useful to you. Good luck!

John Melonakos ([email="john.melonakos@accelereyes.com"]john.melonakos@accelereyes.com[/email])

#2
Posted 05/01/2012 06:23 AM   
Did you use CUDA 4.2 to test the GTX 680? I'm really surprised that the GTX 680 does not perform that well compared to the GTX 580 (if not worse in some scenarios).

#3
Posted 05/01/2012 09:43 AM   
Yes, CUDA 4.2 was used.

John Melonakos ([email="john.melonakos@accelereyes.com"]john.melonakos@accelereyes.com[/email])

#4
Posted 05/01/2012 02:27 PM   
It's logical for the GTX 680: fewer registers per CUDA core, so more pressure on "local" memory, less cache per CUDA core, and finally, while peak SP performance is higher on the GTX 680, its DP performance is no better than the GTX 580's. So the GTX 680 could not really beat the GTX 580, even on these kinds of non-divergent, purely mathematical workloads, and it is naturally worse than the GTX 580 for all other kinds of processing, especially CPU code ported to CUDA or OpenCL.
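
To put rough numbers on the per-core argument, here is a small back-of-the-envelope sketch. The figures in `CARDS` are assumptions taken from NVIDIA's published specifications (core counts, shader clocks, SM counts, register file and configurable shared/L1 size per SM); they are not from the benchmarks linked above, and the helper names are just for illustration.

```python
# Assumed spec-sheet figures; verify against NVIDIA's documentation before relying on them.
CARDS = {
    "GTX 580": {"cores": 512, "shader_mhz": 1544, "sms": 16,
                "regs_per_sm": 32768, "onchip_kb_per_sm": 64},
    "GTX 680": {"cores": 1536, "shader_mhz": 1006, "sms": 8,
                "regs_per_sm": 65536, "onchip_kb_per_sm": 64},
}


def peak_sp_gflops(card):
    # One fused multiply-add per core per clock is counted as 2 FLOPs.
    return card["cores"] * 2 * card["shader_mhz"] / 1000.0


def regs_per_core(card):
    # Total 32-bit registers on the chip divided by the number of CUDA cores.
    return card["sms"] * card["regs_per_sm"] / card["cores"]


def onchip_bytes_per_core(card):
    # Shared memory + L1 (the configurable 64 KB block) per CUDA core.
    return card["sms"] * card["onchip_kb_per_sm"] * 1024 / card["cores"]


if __name__ == "__main__":
    for name, card in CARDS.items():
        print(name,
              round(peak_sp_gflops(card)), "GFLOPS SP peak,",
              round(regs_per_core(card)), "registers/core,",
              round(onchip_bytes_per_core(card)), "bytes shared+L1/core")
```

Under these assumptions the GTX 680 has about twice the peak SP rate, but only about a third of the registers and a sixth of the shared/L1 memory per core, which is the pressure described above.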

Parallelis.com, Parallel-computing technologies and benchmarks. Current Projects: OpenCL Chess & OpenCL Benchmark

#5
Posted 05/01/2012 03:51 PM   
[quote name='parallelis' date='01 May 2012 - 05:51 PM' timestamp='1335887467' post='1403052']
It's logical for the GTX 680: fewer registers per CUDA core, so more pressure on "local" memory, less cache per CUDA core, and finally, while peak SP performance is higher on the GTX 680, its DP performance is no better than the GTX 580's. So the GTX 680 could not really beat the GTX 580, even on these kinds of non-divergent, purely mathematical workloads, and it is naturally worse than the GTX 580 for all other kinds of processing, especially CPU code ported to CUDA or OpenCL.
[/quote]

You can usually rewrite things to use 32-bit integers, especially FFTs, and also matrix calculations etc. It's something like 25% slower than a floating-point FFT (double or single precision), but the advantage of integer transforms (NTTs) is that they are lossless, so for the really big and important matrix calculations one has no choice but to use integer-based FFTs, known in math as number-theoretic transforms.
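
To illustrate what "lossless" means here, below is a minimal sketch of a number-theoretic transform used for exact polynomial multiplication. The modulus `P = 998244353` and primitive root `G = 3` are a common textbook choice picked for this example, not something tied to any particular GPU library, and the recursive form is for readability rather than speed.

```python
P = 998244353  # prime of the form 119 * 2**23 + 1, fits in 32 bits
G = 3          # a primitive root modulo P


def ntt(a, invert=False):
    """Recursive radix-2 transform modulo P; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    # Primitive n-th root of unity modulo P (its inverse for the inverse transform).
    root = pow(G, (P - 1) // n, P)
    if invert:
        root = pow(root, P - 2, P)
    even = ntt(a[0::2], invert)
    odd = ntt(a[1::2], invert)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % P
        out[k] = (even[k] + t) % P
        out[k + n // 2] = (even[k] - t) % P
        w = w * root % P
    return out


def multiply(x, y):
    """Exact product of two integer polynomials (result coefficients must be < P)."""
    result_len = len(x) + len(y) - 1
    size = 1
    while size < result_len:
        size *= 2
    fx = ntt(x + [0] * (size - len(x)))
    fy = ntt(y + [0] * (size - len(y)))
    prod = ntt([u * v % P for u, v in zip(fx, fy)], invert=True)
    inv_size = pow(size, P - 2, P)  # the inverse transform needs a 1/size scaling
    return [c * inv_size % P for c in prod[:result_len]]


if __name__ == "__main__":
    print(multiply([1, 2, 3], [4, 5, 6]))
```

Every step is integer arithmetic modulo `P`, so there is no rounding error to accumulate, unlike a floating-point FFT.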

What matters for all this is the speed of the 32 x 32-bit multiply instruction that gives a 64-bit result. How fast is that one on the 680? The same speed as on Fermi?
If so, then the 680 of course annihilates everything there, especially the AMD cards, as they need 4 PEs (a core is a processing element in OpenCL) to form 1 compute core. So it's 1536 cores for the GTX 680, or 3072 for the GTX 690, versus 512 for the latest AMD video card. That would be a slam dunk.
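
For anyone unsure why the 32 x 32 -> 64-bit multiply matters: a modular multiplication of two 32-bit residues needs the full 64-bit product before reduction. The sketch below models that with plain Python integers; `mul_wide_u32` and `mulmod_u32` are hypothetical helper names. On CUDA the two halves would presumably come from an ordinary 32-bit multiply plus the `__umulhi` intrinsic, but the throughput of those on the 680 is exactly the open question.

```python
MASK32 = 0xFFFFFFFF


def mul_wide_u32(a, b):
    """Return the (high, low) 32-bit halves of the 64-bit product of two uint32 values."""
    assert 0 <= a <= MASK32 and 0 <= b <= MASK32
    full = a * b
    return (full >> 32) & MASK32, full & MASK32


def mulmod_u32(a, b, p):
    """Multiply two 32-bit residues modulo p using the full 64-bit product."""
    hi, lo = mul_wide_u32(a, b)
    return ((hi << 32) | lo) % p


if __name__ == "__main__":
    hi, lo = mul_wide_u32(0xFFFFFFFF, 0xFFFFFFFF)
    print(hex(hi), hex(lo))
    print(mulmod_u32(998244352, 998244352, 998244353))
```

If only the low 32 bits were kept, the reduction modulo `p` would be wrong for almost all inputs, which is why the speed of the high-half multiply decides how fast an integer transform can run.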

#6
Posted 05/01/2012 08:02 PM   