New 285 and 295 cards

Today’s the day the NDAs on the new NVidia boards are lifted. The 285 is a slightly boosted-clock, 55nm version of the 280: you get about a 10% speed boost and a little lower wattage.

The 295 is much more interesting: it’s like two 260s on one board, similar to the older 9800 GX2.
One review, though of course more about game performance, is at HardOCP.

The 295 is much more interesting to CUDA programmers, because we’ll take all the FLOPS we can get!
One nice comment in the above review is that the 295’s SP shader speed could easily be overclocked to full 280 speed.

The first one to build a FASTRA-like box with 3 or 4 of these puppies in it will win the CUDA forum’s official awe and respect. The price for such a box would probably still be only $3500.

does it count if I do it first?

Not really, since you have our awe and respect already.

On a somewhat related note: Does anyone know if these new 100M-series chips are the long-awaited mobile version of the GT200? If so, are they compute capability 1.2 or 1.3?

There’s vague info here and more specific info here, but when you compare them to the chips they’re replacing, it looks like these are all just 55nm speed-boosted G92 layouts. This isn’t a bad thing, though.

BTW, you can’t even trust NVidia’s tech documents for this: there are 112 SPs listed for the 9800M GTX here but 64 listed here. 112 is correct.

I actually do a lot of CUDA development on my laptop, so I’d love to get more power, too!

I suspect Apple may push NVidia harder on laptop GPUs, given their thin/portable/small ethic and new focus on OpenCL in OS X 10.6.

Building a machine is no problem - the problem may be whether the actual applications will be forgiving of the shared CPU-GPU bandwidth on today’s PCIe buses.

The SDK examples include some codes (nbody) which will likely be OK and some (fluidsGL) which are strongly bandwidth-dependent… they won’t like two GPUs on a 295 sharing one x16 slot, not to mention that when you try to install 3 x 295 cards, the 2 GPUs on the third (and, on some boards, the second) card will have to fight over the shared x8 bandwidth.
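A quick way to see what each GPU actually gets is to time a large pinned-memory copy yourself, same idea as the SDK’s bandwidthTest. A minimal sketch (error checking omitted), run once per device index:

```
// Time one big host->device copy and report effective PCIe bandwidth.
// Run it with each device index to see how the GPUs share the slot.
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int dev = (argc > 1) ? atoi(argv[1]) : 0;
    cudaSetDevice(dev);

    const size_t bytes = 64 << 20;              // 64 MB transfer
    float *h_buf, *d_buf;
    cudaMallocHost((void**)&h_buf, bytes);      // pinned, so we see full PCIe speed
    cudaMalloc((void**)&d_buf, bytes);

    cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);  // warm-up copy

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start, 0);
    cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("Device %d host->device: %.2f GB/s\n", dev, bytes / 1e9 / (ms / 1e3));

    cudaFree(d_buf);
    cudaFreeHost(h_buf);
    return 0;
}
```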

whaddya think?

Well, there’s at least one application: The FASTRA computer runs 3D tomography calculations, and according to their technical FAQ, the problem factorizes so well that they see linear speedup even with 8 CUDA devices (4 x 9800 GX2).

Assuming a computer with 4x GTX 295 cards didn’t melt or catch fire, it should be at least 2x faster than the FASTRA running their tomography applications.

NAMD also scales linearly in my testing; as far as I know it’s totally compute bound and bandwidth (either PCIe or within a device) doesn’t seem to have much impact on perf.

I was thinking of assembling a 3 x 295 machine and you’re now challenging me to build a 4 x 295… that’s a bit unfair :-)

But a challenge nonetheless ;)

On a topic more relevant to CUDA:

This may be something very well known; I just haven’t seen it stated anywhere. Anyway, I’m trying to figure out whether the GTX 295 cards require programming in CUDA as if there were 2 separate GPU boards, or whether code I’ve written for a single 280 will immediately scale to the ‘internal’ multi-GPU of the GTX 295. The latter would be significantly more pleasant.

The GTX295 will almost certainly appear as two separate CUDA devices, as this is how the 9800 GX2 worked. Making the two GPUs appear as one to CUDA would require a significant (but really awesome) change in the software and hardware.

(OpenGL is high enough level that the driver can do the SLI magic invisible to the programmer. It would be very difficult to seamlessly extend a kernel to multiple GPUs in CUDA.)
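For reference, a minimal sketch of how you’d check; if the 295 behaves like the GX2, each of its two GPUs should simply show up as its own device index:

```
// Enumerate the CUDA devices in the box. If the GTX 295 works like the
// 9800 GX2, each of its two GPUs appears as a separate device here.
#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("Device %d: %s, %d multiprocessors, compute %d.%d\n",
               i, prop.name, prop.multiProcessorCount, prop.major, prop.minor);
    }
    return 0;
}
```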

…but unfortunately it’s not that way.
CUDA sees them as two separate devices.

I would say thankfully. If this would be done automatically, it would almost certainly be dog-slow ;)

disclaimer:
“unfortunately” refers to the phrase “significantly more pleasant” used in the previous post and does not reflect the author’s opinion. ;-)
Yeah, I’m pleased it’s the way it is… even though it would be nice to have some (fast) way to communicate between those cards.

I think semi-automatic multi-GPU usage (in some cases) is definitely a solvable problem, but I suspect it will take another major software and hardware iteration or two before it can be implemented.
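Until then, the manual route is one host thread per GPU, each binding its own device with cudaSetDevice() and taking a slice of the work. A rough sketch, where my_kernel and the 50/50 split are just placeholders for whatever you actually run:

```
// One host thread per GPU: each thread binds its own device, copies its
// half of the data over, runs the kernel, and copies the result back.
#include <cuda_runtime.h>
#include <pthread.h>
#include <stdlib.h>

__global__ void my_kernel(float *data, int n)        // placeholder kernel
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

struct job { int device; const float *h_src; float *h_dst; int n; };

static void *worker(void *arg)
{
    struct job *j = (struct job*)arg;
    cudaSetDevice(j->device);                         // bind this thread to one GPU

    float *d;
    cudaMalloc((void**)&d, j->n * sizeof(float));
    cudaMemcpy(d, j->h_src, j->n * sizeof(float), cudaMemcpyHostToDevice);

    my_kernel<<<(j->n + 255) / 256, 256>>>(d, j->n);

    cudaMemcpy(j->h_dst, d, j->n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return NULL;
}

int main(void)
{
    const int N = 1 << 20;
    float *src = (float*)malloc(N * sizeof(float));
    float *dst = (float*)malloc(N * sizeof(float));
    for (int i = 0; i < N; ++i) src[i] = (float)i;

    // Split the array between the two halves of a GTX 295 (devices 0 and 1).
    struct job jobs[2] = {
        { 0, src,         dst,         N / 2 },
        { 1, src + N / 2, dst + N / 2, N / 2 },
    };

    pthread_t t[2];
    for (int g = 0; g < 2; ++g) pthread_create(&t[g], NULL, worker, &jobs[g]);
    for (int g = 0; g < 2; ++g) pthread_join(t[g], NULL);

    free(src); free(dst);
    return 0;
}
```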

Check this out…GT212:

http://www.guru3d.com/news/nvidias-40nm-gt…-384-sps-gddr5/

Can anyone here estimate the expected Teraflops based on these “rumours”?

Wikipedia estimates 2930 GFLOPS:

http://en.wikipedia.org/wiki/Comparison_of…X_2xx.29_series
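For the curious, the usual way these peak numbers are counted on G80/GT200-class parts is SPs x shader clock x 3 (a MAD plus a MUL per clock). A back-of-the-envelope check; the ~2.5 GHz shader clock below is only what the 2930 GFLOPS figure would imply for 384 SPs, not a confirmed spec:

```
// Back-of-the-envelope peak-FLOPS arithmetic, using NVIDIA's usual
// 3 flops per SP per clock (MAD + MUL) counting for this generation.
#include <stdio.h>

int main(void)
{
    // Sanity check against a known part: GTX 280, 240 SPs at 1.296 GHz.
    printf("GTX 280 peak: %.0f GFLOPS\n", 240 * 1.296 * 3.0);        // ~933

    // Rumoured GT212: 384 SPs. Shader clock implied by the 2930 GFLOPS estimate.
    printf("Implied GT212 shader clock: %.2f GHz\n", 2930.0 / (384 * 3.0));
    return 0;
}
```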

Hi,

I thought it’d be a nice first post to tell you I’ve had 4 of these (GTX 295) here since Saturday.

However, I have not had the chance to put all of them in one PC, yet. Will let you know, though.

Best regards,

joarf