To test the GTX 680, did you use CUDA 4.2? I'm really surprised that the GTX 680 performs no better than the GTX 580 (if not worse in some scenarios).

It's logical for the GTX 680: fewer registers per CUDA core, so more pressure on "local" memory; less cache per CUDA core; and finally, while peak SP performance is better on the GTX 680, DP performance is only on a par with the GTX 580. So the GTX 680 cannot really beat the GTX 580 even on this kind of non-divergent, purely mathematical processing, and it is naturally worse than the GTX 580 for all other kinds of processing, especially CPU code ported to CUDA or OpenCL.

Parallelis.com, Parallel-computing technologies and benchmarks. Current Projects: OpenCL Chess & OpenCL Benchmark

[quote name='parallelis' date='01 May 2012 - 05:51 PM' timestamp='1335887467' post='1403052']
It's logical for the GTX 680: fewer registers per CUDA core, so more pressure on "local" memory; less cache per CUDA core; and finally, while peak SP performance is better on the GTX 680, DP performance is only on a par with the GTX 580. So the GTX 680 cannot really beat the GTX 580 even on this kind of non-divergent, purely mathematical processing, and it is naturally worse than the GTX 580 for all other kinds of processing, especially CPU code ported to CUDA or OpenCL.
[/quote]

You can usually rewrite things to 32-bit integers, especially FFTs, and also matrix calculations and the like. Such an integer FFT is roughly 25% slower than a single-precision floating-point FFT, but the advantage of integer transforms (NTTs) is that they are lossless; so for the really big and important matrix calculations one already has no choice but to use integer-based FFTs, usually called number-theoretic transforms in mathematics.
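To illustrate the losslessness claim: below is a minimal host-side sketch of a number-theoretic transform over the NTT-friendly prime 998244353 (a common choice, with primitive root 3 — these specific constants and all function names are my own, not from the thread). It uses a naive O(n²) transform for clarity rather than a radix-2 butterfly; every operation is exact modular integer arithmetic, so convolutions come out exact, with no floating-point rounding.

```python
# Minimal NTT sketch (illustrative constants: p = 998244353, root g = 3).
P = 998244353          # prime of the form 119 * 2**23 + 1
G = 3                  # a primitive root mod P

def ntt(a, invert=False):
    """Naive O(n^2) number-theoretic transform; len(a) must divide P - 1."""
    n = len(a)
    w = pow(G, (P - 1) // n, P)        # n-th root of unity mod P
    if invert:
        w = pow(w, P - 2, P)           # its modular inverse (Fermat)
    out = [sum(a[j] * pow(w, i * j, P) for j in range(n)) % P
           for i in range(n)]
    if invert:
        n_inv = pow(n, P - 2, P)       # divide by n via modular inverse
        out = [x * n_inv % P for x in out]
    return out

def convolve(a, b):
    """Exact (lossless) cyclic convolution of two integer sequences."""
    fa, fb = ntt(a), ntt(b)
    return ntt([x * y % P for x, y in zip(fa, fb)], invert=True)

if __name__ == "__main__":
    print(convolve([1, 2, 3, 4], [5, 6, 7, 8]))   # exact integer result
```

On a GPU the same structure applies; the inner modular multiplies are exactly where the 32×32→64-bit multiply discussed below dominates.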

Important for all this is the speed of the 32 × 32-bit multiply with a 64-bit result. How fast is that on the 680? The same speed as on Fermi?
If so, then the 680 of course annihilates everything there, especially the AMD cards, since they need 4 PEs (a "core" is a processing element in OpenCL) to form one compute core. That makes 1536 cores for the GTX 680, or 3072 for the GTX 690, versus 512 for the latest AMD video card. A big slam-dunk faster.
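For readers unfamiliar with the operation being discussed: a 32×32→64-bit multiply produces the full double-width product of two 32-bit operands. CUDA exposes this as an ordinary 32-bit multiply (low half) plus the `__umulhi()` intrinsic (high half). A small sketch emulating the pair on the host (the helper name `umulhi32` is my own):

```python
# Emulate the 32 x 32 -> 64-bit unsigned multiply that GPU integer
# FFT/NTT kernels lean on. CUDA splits it into mul.lo / mul.hi
# (the latter via __umulhi); here we compute both halves in Python.
MASK32 = 0xFFFFFFFF

def umulhi32(a, b):
    """High 32 bits of the 64-bit product, like CUDA's __umulhi(a, b)."""
    return ((a & MASK32) * (b & MASK32)) >> 32

def umullo32(a, b):
    """Low 32 bits of the 64-bit product (plain 32-bit multiply)."""
    return ((a & MASK32) * (b & MASK32)) & MASK32

if __name__ == "__main__":
    a = b = 0xFFFFFFFF                 # worst case: (2**32 - 1)**2
    print(hex(umulhi32(a, b)), hex(umullo32(a, b)))
```

The thread's question is precisely whether this instruction pair keeps full throughput on Kepler as it did on Fermi, since NTT inner loops are essentially chains of such multiplies followed by modular reductions.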

We are going to replace 4 old C1060s with less expensive and more performant cards.

What do you suggest: 4 GTX 680s or 4 GTX 580s?

I did hear that even in SP the GTX 680 underperforms compared with the GTX 580. Is that the case?

Regards

John Melonakos ([email="john.melonakos@accelereyes.com"]john.melonakos@accelereyes.com[/email])
