Sounds like GK208 laptops/cards will support most sm_35 features
NVIDIA blog post: http://blogs.nvidia.com/2013/06/hey-developers-gpu-at-heart-of-worlds-fastest-supercomputer-hits-laptops/

That's great news for developers as 255 registers, Dynamic Parallelism and HyperQ are major features.

I assume 64-bit floating point will remain at 1/24th the throughput of single precision.

Perhaps this might fit into the "sm_32" compute capability category that was lurking in the CUDA 5.5 include files directory?

#1
Posted 06/06/2013 11:26 PM   
Unfortunately some laptops with GT 730M chips have been on the market for months, and it seems impossible to tell whether a model has the new GK208 chip or some earlier model.

NVIDIA deliberately does not list detailed tech specs on its 730M pages, apparently because of exactly this product relabeling.

Not only is it very confusing, it's not customer friendly: you won't know what you get.

#2
Posted 06/07/2013 09:10 AM   
Can someone from NVIDIA clarify what the compute capability of GK208 is? The Kayla dev kit is suggesting the purchase of a GeForce GT 640, which is apparently using a GK208 chip now. The CUDA on ARM presentation makes it sound like dynamic parallelism will be supported on Kayla, so does this mean that GK208 is really sm_35/sm_32 in both desktop and mobile parts?

Incidentally, if this is true, then it means that the model designator "GeForce GT 640" will have been sold with GPUs that have three different compute capabilities: 2.1, 3.0 and 3.2/3.5. That is absolutely crazy, to say the least.

There are more three-digit numbers that start with a 6, so please use them. :)

#3
Posted 06/11/2013 02:32 PM   
GK208 is sm35.

#4
Posted 06/11/2013 03:55 PM   
mfatica said: GK208 is sm35.

Table 2 in the CUDA 5.5 Programming Guide (http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#arithmetic-instructions__throughput-native-arithmetic-instructions) lists the arithmetic throughputs of different basic operations for different capabilities. sm_35 is listed as having 1/3 rate FP64 throughput, but GK208 has only 1/24 throughput.

allanmac spotted a set of sm_32 intrinsic header files in the CUDA 5.5 toolkit. I hypothesized that GK208 would be sm_32, with sm_32 having all the sm_35 features except FP64 rate, similar to how sm_12 is the same as sm_13 except FP64 support.
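
To put the two FP64 rates side by side, here is a rough sketch of the peak-throughput arithmetic. The 384 cores and 1046 MHz clock are figures quoted elsewhere in this thread for the GDDR5 GT 640 Rev. 2. The factor of 2 FLOPs per core per cycle (one fused multiply-add) and the function name are assumptions made for illustration, and the 1/24 ratio is the one assumed in this thread, not a confirmed spec.

```python
# Peak-throughput arithmetic for a 2-SMX GK208 board.
# Assumption: each CUDA core retires one fused multiply-add (2 FLOPs) per cycle.

def peak_sp_gflops(cuda_cores, clock_mhz, flops_per_cycle=2):
    """Theoretical single-precision peak in GFLOPS."""
    return cuda_cores * flops_per_cycle * clock_mhz / 1000.0

sp_peak = peak_sp_gflops(cuda_cores=384, clock_mhz=1046)

# FP64 peak under the two ratios discussed in this thread.
fp64_one_third = sp_peak / 3    # ratio listed for sm_35 in Table 2
fp64_one_24th = sp_peak / 24    # ratio assumed here for GK208

print(f"SP peak:        {sp_peak:6.1f} GFLOPS")
print(f"FP64 at 1/3:    {fp64_one_third:6.1f} GFLOPS")
print(f"FP64 at 1/24:   {fp64_one_24th:6.1f} GFLOPS")
```

Under these assumptions the gap is a factor of eight, which is why a single sm_35 row in the table cannot describe both chips.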


I assume the programming guide will be updated for CUDA 5.5?
And that leaves the question: what is sm_32?

#5
Posted 06/11/2013 04:58 PM   
So it also looks like you would be able to buy certain versions of the GT 630 & GT 640 (Rev. 2) as well as the GT 635 that all have the GK208 chip.

At least according to this: http://en.wikipedia.org/wiki/Comparison_of_Nvidia_graphics_processing_units


Hence you might want to go for one of those discrete cards instead of buying a new laptop.

#6
Posted 06/11/2013 05:47 PM   
@Jimmy, also notice that Wikipedia (http://en.wikipedia.org/wiki/GeForce_600_Series#GeForce_600_.286xx.29_series) reports that some of the GK208 discrete cards have PCIe 2.0 x8 while the single SMX GT 635 has PCIe 3.0 x16. Assuming Wikipedia is correct, I would be semi-disappointed in a PCIe 2.0 x8 board even though it probably doesn't matter at all.

Another telltale for GK208 might be the default graphics clock: 902 MHz for the DDR3 and 1046 MHz for the GDDR5 GT 640 Rev. 2 cards.

Of course all this is just speculation. Someone should buy one and report back to us if our hypotheses are correct. :)

#7
Posted 06/11/2013 07:12 PM   
I think I might just do that. :)

I've been wanting to try out dynamic parallelism for a while, but we don't have the budget in our lab for new GPUs at the moment. For $90, I can buy it myself. The device linked from the Kayla page is this one:


http://www.newegg.com/Product/Product.aspx?Item=N82E16814121771


which seems to be the only GeForce GT 640 with GDDR5 memory on Newegg. The clock rate matches the GK208 listing on Wikipedia, so I think it is the right one.

#8
Posted 06/11/2013 07:50 PM   
I was also perplexed by the specced PCIe 2.0. I'm wondering if they are using 2.0 to achieve insanely good performance/watt. Looking at [1], it has 697 GFLOPS @ 25 W => 27.88 GFLOPS/W, which is probably some of the best I've seen for an AMD/NVIDIA GPU. Seems too good to be true though...


[1] http://www.techpowerup.com/gpudb/2396/geforce-gt-630-rev-2-pcie-x8.html
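
For anyone who wants to repeat the comparison with other cards, the ratio above is a one-line calculation. This is only a sketch: the function name is made up for illustration, and the 25 W figure is a rated board power rather than a measured draw, so the result is a paper number.

```python
# Performance-per-watt from spec-sheet numbers (not measurements).

def gflops_per_watt(peak_gflops, board_power_watts):
    """Theoretical GFLOPS divided by rated board power."""
    if board_power_watts <= 0:
        raise ValueError("board power must be positive")
    return peak_gflops / board_power_watts

# Figures quoted from [1] for the GT 630 Rev. 2.
gt630_rev2 = gflops_per_watt(peak_gflops=697, board_power_watts=25)
print(f"{gt630_rev2:.2f} GFLOPS/W")  # 27.88 GFLOPS/W
```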

#9
Posted 06/11/2013 08:09 PM   
Here's an existence proof. From the PCI IDs database (http://pciids.sourceforge.net/v2.2/pci.ids):

10de  NVIDIA Corporation
    ...
    1280  GK208 [GeForce GT 635]
    1282  GK208 [GeForce GT 640 Rev. 2]
    1284  GK208 [GeForce GT 630 Rev. 2]
    1290  GK208M [GeForce GT 730M]
        103c 2afa  GeForce GT 730A
        103c 2b04  GeForce GT 730A
        1043 13ad  GeForce GT 730M
        1043 13cd  GeForce GT 730M
    1291  GK208M [GeForce GT 735M]
    1292  GK208M [GeForce GT 740M]
    1293  GK208M [GeForce GT 730M]
    1294  GK208M [GeForce GT 740M]
    12a0  GK208


Unfortunately it seems no one has run GPU-Z on a discrete GK208 in the wild as there is no record in the GPU-Z database (http://www.techpowerup.com/gpuz/search.php).
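
Since the product name alone doesn't identify the chip, the device ID is the more reliable check. Below is a small sketch that looks up a vendor:device pair (the form `lspci -nn` prints in square brackets, e.g. `[10de:1282]`) against the IDs listed above. The table holds only the entries from this post, and the function names are hypothetical helpers, not part of any tool.

```python
NVIDIA_VENDOR_ID = 0x10DE

# Device IDs copied from the pci.ids excerpt quoted in this post.
GK208_DEVICES = {
    0x1280: "GeForce GT 635",
    0x1282: "GeForce GT 640 Rev. 2",
    0x1284: "GeForce GT 630 Rev. 2",
    0x1290: "GeForce GT 730M",
    0x1291: "GeForce GT 735M",
    0x1292: "GeForce GT 740M",
    0x1293: "GeForce GT 730M",
    0x1294: "GeForce GT 740M",
    0x12A0: "GK208 (no product name listed)",
}

def parse_pci_id(text):
    """Split a 'vendor:device' hex pair such as '[10de:1282]' into two ints."""
    vendor, device = text.strip().strip("[]").split(":")
    return int(vendor, 16), int(device, 16)

def gk208_name(pci_id):
    """Return the listed product name if the ID is a known GK208, else None."""
    vendor, device = parse_pci_id(pci_id)
    if vendor != NVIDIA_VENDOR_ID:
        return None
    return GK208_DEVICES.get(device)

print(gk208_name("[10de:1282]"))  # GeForce GT 640 Rev. 2
```

An ID that is absent from this table only means it was not in the excerpt above, not that the card is definitely an older chip.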

#10
Posted 06/11/2013 10:48 PM   
Leave it to NVIDIA to make one card name spawn 3 different compute capabilities... *sigh* incredibly confusing. That being said, it's nice that dynamic parallelism is coming to new and cheap cards!

I just ordered a Lenovo Y410p, although it seems that the GT750m is a GK107 chip, at least according to notebookcheck.net. I'm going to give it a trial run regardless... Lenovo has a 30 day no questions asked return policy ;)

To add to the discussion, here are 2 more desktop GT630 cards by Zotac that should be GK208 based, given the core count of 384. Both are available on NewEgg.

http://www.newegg.com/Product/Product.aspx?Item=N82E16814500305 - 1 GB version
http://www.newegg.com/Product/Product.aspx?Item=N82E16814500304 - 2 GB version

#11
Posted 06/13/2013 01:27 AM   
That Zotac GT 630 is also rated at max 25 watt... Extremely good performance / watt! I understand why NV has used the GK208 for laptops!

#12
Posted 06/13/2013 09:40 AM   
That being said, does anyone know of any 14" laptops with GK208?

Answering my own question for now (albeit for 15"):
http://forum.notebookreview.com/hp-envy-hdx/717180-now-available-envy-15-jxxx-envy-17-jxxx-notebooks-2013-a-3.html#post9203097

Seems like HP ENVY 15t-j000 has a GK208 according to the post above. Link for sale:
http://slickdeals.net/f/6087052-HP-ENVY-15t-j000-Quad-Laptop-i7-4700MQ-Haswell-8GB-Ram-1TB-HD-2GB-Nvidia-GT740M-15-6-1080P-etc-769-ship-tax

The one above is either 1366x768 or 1920x1080 (in my opinion too high a resolution for a 15.6" screen).
I can't advocate HP laptops because they tend to whitelist their Wi-Fi cards, and I already have an Intel 7260 Dual Band AC card that I intend to use.

From browsing the news from Computex: Acer is releasing the S3-392 (perhaps sometime in July?), which sports a 1080p touchscreen in a 13.3" form factor with a GT735m (GK208) chip:
http://blog.laptopmag.com/acer-aspire-s7-s3-ultrabooks-haswell

There is also the VAIO Fit 14 (1600x900, Ivy Bridge) which can be configured with a GT735m chipset:
http://www.cnet.com/laptops/sony-vaio-fit-14/4505-3121_7-35757138-2.html
For what it's worth, I dropped by a Sony Store the other day and asked which SSDs are included in the models that can be configured with one -- apparently they use a proprietary interface, and according to the tech the motherboard does not have a regular 2.5" slot. I tend to believe it, given that the drive model reported by Device Manager was a Samsung-based SSD that did not show up on Google. The model with the "(5400rpm) + 8GB SSD hybrid hard drive" is a Toshiba MQ01ABD075H -- http://storage.toshiba.eu/cms/en/hdd/hard_disk_drives/product_detail.jsp?productid=525 (9.5mm height), so that gives plenty of options for upgrades (7.5mm w/ spacer and 9.5mm SSDs).

Also from Sony, the VAIO Fit 14E (1600x900, Ivy Bridge) can be configured with a GT740m (1 or 2GB VRAM); however, it's unclear whether that model uses the GK208 chip. Presumably it does: I cannot find information about the model number anywhere, and Sony chat support says the model has 'not released' yet, implying it is new and will most likely use GK208 instead of GK107. But buyer beware!

To me it doesn't make sense to upgrade *just* the video card. Ideally I'd like a GK208 in a 14" form factor, either 1366x768 or 1600x900, with a Haswell (4th Gen) Core i5 or i7 processor, but nothing like that exists so far.

Edit 1: I've confirmed another laptop model with GK208. This one is an Asus VivoBook S551LB -- http://www.ultrabookreview.com/3084-asus-vivobook-s551-review/ The author of the extensive review mentions it is sold in the US as the Asus VivoBook V551LB. It is available on Amazon and a few other retailers; BestBuy.com and Rakuten.com are the ideal choices, as they offer 15-day and 45-day return policies, respectively, by default with no restocking fees. After trying out one from Best Buy: besides lacking VT-d, it has an Elantech trackpad that jumps pretty badly, regardless of which drivers I used. Too bad, because otherwise the laptop was pretty decent, but it's going back because of that issue alone. The screen is decent enough despite the poor viewing angles, and I didn't see a problem with glare even though the reviewer mentioned it was quite reflective.

Edit 2: Another laptop that has GK208 is the TOSHIBA Satellite S55-A5279 -- http://www.rakuten.com/pr/product.aspx?sku=250975766 and probably also the S55-A5276 -- http://www.newegg.com/Product/Product.aspx?Item=N82E16834216541. I ordered the S55-A5279 from Rakuten because of the generous 45-day return policy, and so far the only gripe I have is the short battery life, given the battery specs of 14.4V, 2838 mAh. It's identified as a PEGA G71C000FP110 by the BatteryInfoView software. Other than that, it seems to meet my expectations. The fan does get a bit loud when you push the CPU, but that's normal for pretty much any laptop.

Compared to the VivoBook, this Toshiba S55 is thicker and of all-plastic construction, vs. an aluminum top on the VivoBook. That being said, under normal browsing it stays very cool -- CPUID HWMonitor sees about 4-5W of power use on the processor as I type this on the S55. The Toshiba's 4-core (8-thread) i7-4700MQ processor is not VT-d capable; however, it *might* be upgradeable down the line, as Intel does ship i7-4800MQ and i7-4900MQ processors as boxed units.

The Toshiba also has a VGA (RGB) port in addition to HDMI, which means it should be able to drive 2 monitors natively, but I will have to check this soon. On that note, I also want to see if I'm able to drive a 2560x1440 or 2560x1600 resolution via the HDMI port.

A plus I saw on the Asus vs. the Toshiba was much better battery life -- the Asus has a 3-cell, 11.1V, 4500 mAh, 50 Wh battery, vs. the S55's 4-cell, 14.4V, 2838 mAh, 43 Wh battery. On top of that, the max TDP of the Toshiba's 4700MQ is 47W vs. 15W for the Asus' 4500U. The Asus has a 65W power brick (19 VDC @ 3.42 A) vs. 120W on the Toshiba (19 VDC @ 6.32 A), so the Toshiba can definitely draw a lot more power due to the beefy processor. Needless to say, the Asus beats the Toshiba in battery life.
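
As a sanity check on those ratings, here is the arithmetic behind them as a short sketch. The helper names are made up for illustration; the voltages, capacities and currents are the ones quoted above.

```python
# Energy and power ratings from nominal label values.

def watt_hours(volts, milliamp_hours):
    """Battery energy in Wh from nominal voltage and capacity in mAh."""
    return volts * milliamp_hours / 1000.0

def watts(volts, amps):
    """Power-brick rating in W from output voltage and current."""
    return volts * amps

asus_battery_wh = watt_hours(11.1, 4500)      # about 50 Wh, matches the label
toshiba_battery_wh = watt_hours(14.4, 2838)   # about 40.9 Wh, label says 43 Wh
asus_brick_w = watts(19, 3.42)                # about 65 W
toshiba_brick_w = watts(19, 6.32)             # about 120 W

for name, value, unit in [
    ("Asus battery", asus_battery_wh, "Wh"),
    ("Toshiba battery", toshiba_battery_wh, "Wh"),
    ("Asus brick", asus_brick_w, "W"),
    ("Toshiba brick", toshiba_brick_w, "W"),
]:
    print(f"{name:16s} {value:7.2f} {unit}")
```

The Asus numbers and both power bricks line up with their labels. The Toshiba battery comes out a little under the 43 Wh I quoted, which may just mean the label uses a different nominal voltage or capacity than the values BatteryInfoView reports.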

#13
Posted 06/13/2013 10:31 AM   
My GK208-based GT 640 arrived today (apparently I'm super close to a Newegg warehouse)! After upgrading my Ubuntu 12.04 x86_64 system to the CUDA 5.5 RC, I get the following results:

deviceQuery:
Device 1: "GeForce GT 640"
CUDA Driver Version / Runtime Version 5.5 / 5.5
CUDA Capability Major/Minor version number: 3.5
Total amount of global memory: 1023 MBytes (1073020928 bytes)
( 2) Multiprocessors x (192) CUDA Cores/MP: 384 CUDA Cores
GPU Clock rate: 1046 MHz (1.05 GHz)
Memory Clock rate: 2505 Mhz
Memory Bus Width: 64-bit
L2 Cache Size: 524288 bytes
Max Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,65536), 3D=(4096,4096,4096)
Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 2147483647 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Bus ID / PCI location ID: 4 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >


bandwidthTest (my motherboard is PCI-E 2.0)
Device 1: GeForce GT 640
Quick Mode

Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 3184.9

Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 3198.7

Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 32036.5


... and the various dynamic parallelism demos (cdp* in the bin release directory) work too.

I am very surprised by the very low host/device memory bandwidth. That looks suspiciously like PCI-E 1.0 or PCI-E 2.0 with an x8 connection. I need to see if there is a good way to tell what PCI-E link settings were negotiated at bootup...

Update: Although lspci -vv is reporting some strange information (like wrong link rates for cards I can verify are going at full PCI-E 2.0 speeds), it does seem to indicate that this card negotiated an x8 link with the host. This workstation should be able to do x16 on the slot I used, so I'll need to investigate what's going on.

Update to the update: I just noticed that earlier in the thread allanmac mentioned rumors of these cards being PCI-E 2.0 x8. Although the ASUS card I bought doesn't explicitly say either way in the documentation I found (boo!), a very similar Gigabyte GT 640 does list the card as being PCI-E 2.0 x8. So, I think this is a real "feature" of these GK208 desktop cards.
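
For reference, here is a rough sketch of the link arithmetic behind that guess. It uses the standard per-lane signalling rates (2.5 GT/s for PCI-E 1.x, 5 GT/s for 2.0) and 8b/10b encoding, and ignores packet and protocol overhead as well as any MB vs. MiB difference in how bandwidthTest reports, so treat the percentages as approximate. The function and table names are made up for illustration.

```python
# Per-lane raw rate (GT/s) and encoding efficiency. PCI-E 1.x and 2.0 both
# use 8b/10b encoding, so 8 of every 10 transferred bits carry payload.
PCIE_GENERATIONS = {
    "1.0": (2.5, 8 / 10),
    "2.0": (5.0, 8 / 10),
}

def link_bandwidth_mb_s(generation, lanes):
    """Peak one-direction payload bandwidth of a link in MB/s."""
    gt_per_s, efficiency = PCIE_GENERATIONS[generation]
    # GT/s * efficiency = Gbit/s per lane; / 8 = GB/s; * 1000 = MB/s.
    return gt_per_s * efficiency / 8 * 1000 * lanes

measured_h2d = 3184.9  # host-to-device result from bandwidthTest above

for generation, lanes in [("1.0", 16), ("2.0", 8), ("2.0", 16)]:
    peak = link_bandwidth_mb_s(generation, lanes)
    print(f"PCI-E {generation} x{lanes:<2d}: {peak:6.0f} MB/s peak, "
          f"measured is {measured_h2d / peak:.0%} of that")
```

PCI-E 1.0 x16 and PCI-E 2.0 x8 have the same 4000 MB/s ceiling, so the bandwidth number alone cannot tell them apart; the measured ~3.2 GB/s is about 80% of either, and only about 40% of a full 2.0 x16 link.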

#14
Posted 06/13/2013 05:14 PM   

Thanks Seibert! Very interesting! It really seems to me that they've cut the D2H & H2D bandwidth, possibly in an effort to save on power consumption?

#15
Posted 06/13/2013 06:11 PM   