New 285 and 295 cards
Today's the day the NDAs on the new NVidia boards are lifted. The 285 is a slightly higher-clocked 55nm version of the 280: you get about a 10% speed boost and slightly lower power draw.

The 295 is much more interesting: it's like two 260s on one board, similar to the older 9800GX2.
One review, though of course more about game performance, is at [url="http://enthusiast.hardocp.com/article.html?art=MTYwOCwxLCxoZW50aHVzaWFzdA=="]HardOCP.[/url]

The 295 is much more interesting to CUDA programmers, because we'll take all the FLOPS we can get!
One nice comment in the above review is that the 295's SP shader speed could easily be overclocked to full 280 speed.


The first one to build a [url="http://fastra.ua.ac.be/en/index.html"]FASTRA[/url]-like box with 3 or 4 of these puppies in it will win the CUDA forum's official awe and respect. The price for such a box would probably still be only $3500.

#1
Posted 01/08/2009 07:21 PM   
[quote name='SPWorley' post='488164' date='Jan 8 2009, 11:21 AM']The first one to build a [url="http://fastra.ua.ac.be/en/index.html"]FASTRA[/url]-like box with 3 or 4 of these puppies in it will win the CUDA forum's official awe and respect. The price for such a box would still be probably only $3500.[/quote]
does it count if I do it first?
#2
Posted 01/08/2009 07:43 PM   
[quote name='tmurray' post='488174' date='Jan 8 2009, 11:43 AM']does it count if I do it first?[/quote]

Not really, since you have our awe and respect already.
#3
Posted 01/09/2009 12:29 AM   
On a somewhat related note: Does anyone know if these new 100M-series chips are the long-awaited mobile version of the GT200? If so, are they compute capability 1.2 or 1.3?
#4
Posted 01/09/2009 02:09 AM   
[quote name='seibert' post='488347' date='Jan 8 2009, 06:09 PM']On a somewhat related note: Does anyone know if these new 100M-series chips are the long-awaited mobile version of the GT200? If so, are they compute capability 1.2 or 1.3?[/quote]

There's vague info [url="http://www.nvidia.com/object/io_1231412564434.html"]here[/url] and more specific info [url="http://www.nvidia.com/object/product_geforce_gt_130m_us.html"]here[/url], but when you compare them to the chips they're replacing, it looks like these are all just 55nm speed-boosted G92 layouts. This isn't a bad thing, though.
BTW, you can't even trust NVidia's tech documents for this: 112 SPs are listed for the 9800M GTX [url="http://www.nvidia.com/object/product_geforce_9800m_gtx_us.html"]here[/url] but 64 are listed [url="http://www.nvidia.com/object/geforce_m_series.html"]here[/url]. 112 is correct.

I actually do a lot of CUDA development on my laptop, so I'd love to get more power, too!
I suspect Apple may push NVidia harder on laptop GPUs, given their thin/portable/small ethic combined with their new focus on OpenCL in OS X 10.6.
#5
Posted 01/09/2009 02:23 AM   
[quote name='SPWorley' post='488164' date='Jan 8 2009, 02:21 PM']Today's the day the NDAs on the new NVidia boards are lifted. The 285 is a slightly boosted-clock, 55nm, version of the 280, you get about 10% speed boost and a little lower wattage.

The 295 is much more interesting, it's like two 260s on one board, similar to the older 9800GX2.
One review, though of course more about game performance, is at [url="http://enthusiast.hardocp.com/article.html?art=MTYwOCwxLCxoZW50aHVzaWFzdA=="]HardOCP.[/url]

The 295 is much more interesting to CUDA programmers, because we'll take all the FLOPS we can get!
One nice comment in the above review is that the 295's SP shader speed could easily be overclocked to full 280 speed.


The first one to build a [url="http://fastra.ua.ac.be/en/index.html"]FASTRA[/url]-like box with 3 or 4 of these puppies in it will win the CUDA forum's official awe and respect. The price for such a box would still be probably only $3500.[/quote]

Building the machine is no problem; the question is whether real applications will tolerate the shared CPU-GPU bandwidth on today's PCIe buses.
The SDK examples include some codes (nbody) that will likely be OK and some (fluidsGL) that are strongly bandwidth-dependent. Those won't like the two GPUs on a 295 sharing one x16 slot, not to mention what happens when you install three 295 cards: the two GPUs on the third card (and on the second, on some boards) will have to fight over a shared x8 link.
whaddya think?
#6
Posted 01/09/2009 05:28 PM   
[quote name='pawel_astro' post='488668' date='Jan 9 2009, 12:28 PM']to build a machine is no problem - the problem may be whether the actual applications will be forgiving towards the shared cpu-gpu bandwidth on today's PCIe buses.
the sdk examples have some codes (nbody) which will likely be ok and some (fluidsGL) which are strongly bandwidth-dependent.. they won't like two gpus on 295 sharing one x16 slot, not to mention when you try to install 3 295 cards and 2 gpus on the third (and second, on some boards) card will have to fight for the shared x8 bandwidth.
whaddya think?[/quote]

Well, there's at least one application: The FASTRA computer runs 3D tomography calculations, and according to their technical FAQ, the problem factorizes so well, they see linear speedup even with 8 CUDA devices. (4 x 9800 GX2)

Assuming a computer with 4x GTX 295 cards didn't melt or catch fire, it should be at least 2x faster than the FASTRA running their tomography applications.
#7
Posted 01/10/2009 03:01 AM   
NAMD also scales linearly in my testing; as far as I know it's totally compute bound and bandwidth (either PCIe or within a device) doesn't seem to have much impact on perf.
#8
Posted 01/10/2009 07:17 AM   
I was thinking of assembling a 3 x 295 machine and you're now challenging me to build a 4 x 295... that's a bit unfair :-)
#9
Posted 01/10/2009 07:38 AM   
[quote name='pawel_astro' post='488927' date='Jan 10 2009, 08:38 AM']I was thinking of assembling a 3 x 295 machine and you're now challenging me to build a 4 x 295... that's a bit unfair :-)[/quote]
But a challenge nonetheless ;)

greets,
Denis

#10
Posted 01/10/2009 02:04 PM   
On a topic more relevant to CUDA.

This may be something very well known, I just haven't seen it stated anywhere. Anyway, I'm trying to figure out if the gtx295 cards require programming in CUDA as if there were 2 separate GPU boards, or if code I've written for a single 280 will immediately scale to the 'internal' multi-gpu of the gtx295. The latter would be significantly more pleasant. :thumbup:
#11
Posted 01/10/2009 05:06 PM   
[quote name='ldura9t' post='489077' date='Jan 10 2009, 12:06 PM']On a topic more relevant to CUDA.

This may be something very well known, I just haven't seen it stated anywhere. Anyway, I'm trying to figure out if the gtx295 cards require programming in CUDA as if there were 2 separate GPU boards, or if code I've written for a single 280 will immediately scale to the 'internal' multi-gpu of the gtx295. The latter would be significantly more pleasant. :thumbup:[/quote]

The GTX295 will almost certainly appear as two separate CUDA devices, as this is how the 9800 GX2 worked. Making the two GPUs appear as one to CUDA would require a significant (but really awesome) change in the software and hardware.

(OpenGL is high enough level that the driver can do the SLI magic invisible to the programmer. It would be very difficult to seamlessly extend a kernel to multiple GPUs in CUDA.)
#12
Posted 01/10/2009 05:19 PM   
...but unfortunately it's not that way.
cuda sees them as two separate devices.
#13
Posted 01/10/2009 05:30 PM   
[quote name='Ocire' post='489087' date='Jan 10 2009, 06:30 PM']...but unfortunately it's not that way.
cuda sees them as two separate devices.[/quote]
I would say thankfully. If this were done automatically, it would almost certainly be dog-slow ;)

greets,
Denis

#14
Posted 01/11/2009 09:08 AM   
disclaimer:
"unfortunately" refers to the phrase "significantly more pleasant" used in the previous post and does not reflect the author's opinion. ;-)
yeah, i'm pleased it's the way it is... even though it would be nice to have some (fast) way to communicate between those cards.
#15
Posted 01/11/2009 12:00 PM   