18 GPUs in a single rig and it works
I often hear people say that there is a 16-GPU limit for NVIDIA video cards. Is it true? I searched Google a bit, but it looks like no one has actually tested it.
Now, after successfully building a rig with more than 16 GPUs myself, I can tell you that this rumor is not true: the so-called 16-GPU limit (for NVIDIA cards) doesn't exist, at least not on this hardware.
The rig I built is a GPU monster with 11 NVIDIA cards (4x GTX 660 Ti, 5x GTX 295, and 2x 9800 GX2). I used some old cards to save money, and 7 of them are dual-GPU cards, so the total number of GPUs is 18 (4 + 7 x 2).
The motherboard is a Supermicro X9DRX+-F. It has 11 PCIe slots, but all of them are x8 slots. Using a method similar to the one FASTRA II used, I connected the cards to the motherboard through PCIe extenders.
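
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the count. The card numbers come from the description above; the dictionary layout and the function names `total_cards` and `total_gpus` are only illustrative.

```python
# Card inventory as described in the post: model -> (number of cards, GPUs per card).
# The GTX 295 and 9800 GX2 are dual-GPU boards; the GTX 660 Ti is single-GPU.
cards = {
    "GTX 660 Ti": (4, 1),
    "GTX 295": (5, 2),
    "9800 GX2": (2, 2),
}

def total_cards(inventory):
    """Number of physical cards in the inventory."""
    return sum(count for count, _ in inventory.values())

def total_gpus(inventory):
    """Number of GPUs, counting both GPUs on each dual-GPU card."""
    return sum(count * per_card for count, per_card in inventory.values())

print(total_cards(cards), "cards,", total_gpus(cards), "GPUs")
```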

Here is some detailed system information for the 18-GPU monster:

root@server:~# dmesg | grep "DMI:"
[ 0.000000] DMI: Supermicro X9DRX+-F/X9DRX+-F, BIOS 3.00 07/09/2013


root@server:~# lspci | grep NVIDIA | grep -v bridge | grep -v Audio
03:00.0 3D controller: NVIDIA Corporation G92 [GeForce 9800 GX2] (rev a2)
04:00.0 VGA compatible controller: NVIDIA Corporation G92 [GeForce 9800 GX2] (rev a2)
07:00.0 3D controller: NVIDIA Corporation GT200b [GeForce GTX 295] (rev a1)
08:00.0 VGA compatible controller: NVIDIA Corporation GT200b [GeForce GTX 295] (rev a1)
0b:00.0 3D controller: NVIDIA Corporation GT200b [GeForce GTX 295] (rev a1)
0c:00.0 VGA compatible controller: NVIDIA Corporation GT200b [GeForce GTX 295] (rev a1)
0f:00.0 3D controller: NVIDIA Corporation G92 [GeForce 9800 GX2] (rev a2)
10:00.0 VGA compatible controller: NVIDIA Corporation G92 [GeForce 9800 GX2] (rev a2)
11:00.0 VGA compatible controller: NVIDIA Corporation GK104 [GeForce GTX 660 Ti] (rev a1)
83:00.0 3D controller: NVIDIA Corporation GT200b [GeForce GTX 295] (rev a1)
84:00.0 VGA compatible controller: NVIDIA Corporation GT200b [GeForce GTX 295] (rev a1)
85:00.0 VGA compatible controller: NVIDIA Corporation GK104 [GeForce GTX 660 Ti] (rev a1)
88:00.0 3D controller: NVIDIA Corporation GT200b [GeForce GTX 295] (rev a1)
89:00.0 VGA compatible controller: NVIDIA Corporation GT200b [GeForce GTX 295] (rev a1)
8c:00.0 3D controller: NVIDIA Corporation GT200b [GeForce GTX 295] (rev a1)
8d:00.0 VGA compatible controller: NVIDIA Corporation GT200b [GeForce GTX 295] (rev a1)
8e:00.0 VGA compatible controller: NVIDIA Corporation GK104 [GeForce GTX 660 Ti] (rev a1)
8f:00.0 VGA compatible controller: NVIDIA Corporation GK104 [GeForce GTX 660 Ti] (rev a1)
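
The listing above can also be tallied automatically instead of by eye. The following is a rough Python sketch, assuming the lspci output has been saved as text in the short `bus:device.function` form shown here (no PCI domain prefix); the function name `count_nvidia_gpus` and the three-line `sample` excerpt are illustrative only.

```python
import re

# Matches lines such as:
#   03:00.0 3D controller: NVIDIA Corporation G92 [GeForce 9800 GX2] (rev a2)
GPU_LINE = re.compile(
    r"^(?P<bus>[0-9a-f]{2}):[0-9a-f]{2}\.[0-7] "
    r"(?P<cls>VGA compatible controller|3D controller): NVIDIA"
)

def count_nvidia_gpus(lspci_text):
    """Return (number of GPU functions, set of PCI bus numbers) found in lspci output."""
    buses = set()
    total = 0
    for line in lspci_text.splitlines():
        m = GPU_LINE.match(line.strip())
        if m:
            total += 1
            buses.add(int(m.group("bus"), 16))  # bus numbers are printed in hex
    return total, buses

# Short excerpt of the listing above, used only to exercise the function.
sample = """\
03:00.0 3D controller: NVIDIA Corporation G92 [GeForce 9800 GX2] (rev a2)
04:00.0 VGA compatible controller: NVIDIA Corporation G92 [GeForce 9800 GX2] (rev a2)
11:00.0 VGA compatible controller: NVIDIA Corporation GK104 [GeForce GTX 660 Ti] (rev a1)
"""
print(count_nvidia_gpus(sample))
```

Fed the full 18-line listing, the first value returned should be 18, matching what deviceQuery reports.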


root@server:~# nvidia-smi -pm 1
Persistence mode is already Enabled for GPU 0000:03:00.0.
Persistence mode is already Enabled for GPU 0000:04:00.0.
Persistence mode is already Enabled for GPU 0000:07:00.0.
Persistence mode is already Enabled for GPU 0000:08:00.0.
Persistence mode is already Enabled for GPU 0000:0B:00.0.
Persistence mode is already Enabled for GPU 0000:0C:00.0.
Persistence mode is already Enabled for GPU 0000:0F:00.0.
Persistence mode is already Enabled for GPU 0000:10:00.0.
Persistence mode is already Enabled for GPU 0000:11:00.0.
Persistence mode is already Enabled for GPU 0000:83:00.0.
Persistence mode is already Enabled for GPU 0000:84:00.0.
Persistence mode is already Enabled for GPU 0000:85:00.0.
Persistence mode is already Enabled for GPU 0000:88:00.0.
Persistence mode is already Enabled for GPU 0000:89:00.0.
Persistence mode is already Enabled for GPU 0000:8C:00.0.
Persistence mode is already Enabled for GPU 0000:8D:00.0.
Persistence mode is already Enabled for GPU 0000:8E:00.0.
Persistence mode is already Enabled for GPU 0000:8F:00.0.
All done.

root@server:~# nvidia-smi
Fri Nov 29 08:35:14 2013
+------------------------------------------------------+
| NVIDIA-SMI 5.319.37 Driver Version: 319.37 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce 9800 GX2 On | 0000:03:00.0 N/A | N/A |
| N/A 53C N/A N/A / N/A | 3MB / 511MB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce 9800 GX2 On | 0000:04:00.0 N/A | N/A |
| 80% 56C N/A N/A / N/A | 3MB / 511MB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 2 GeForce GTX 295 On | 0000:07:00.0 N/A | N/A |
| N/A 50C N/A N/A / N/A | 3MB / 895MB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 3 GeForce GTX 295 On | 0000:08:00.0 N/A | N/A |
| 41% 49C N/A N/A / N/A | 3MB / 895MB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 4 GeForce GTX 295 On | 0000:0B:00.0 N/A | N/A |
| N/A 51C N/A N/A / N/A | 3MB / 895MB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 5 GeForce GTX 295 On | 0000:0C:00.0 N/A | N/A |
| 41% 49C N/A N/A / N/A | 3MB / 895MB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 6 GeForce 9800 GX2 On | 0000:0F:00.0 N/A | N/A |
| N/A 55C N/A N/A / N/A | 3MB / 511MB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 7 GeForce 9800 GX2 On | 0000:10:00.0 N/A | N/A |
| 80% 53C N/A N/A / N/A | 3MB / 511MB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 8 GeForce GTX 660 Ti On | 0000:11:00.0 N/A | N/A |
| 30% 32C N/A N/A / N/A | 7MB / 2047MB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 9 GeForce GTX 295 On | 0000:83:00.0 N/A | N/A |
| N/A 51C N/A N/A / N/A | 3MB / 895MB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 10 GeForce GTX 295 On | 0000:84:00.0 N/A | N/A |
| 41% 49C N/A N/A / N/A | 3MB / 895MB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 11 GeForce GTX 660 Ti On | 0000:85:00.0 N/A | N/A |
| 30% 31C N/A N/A / N/A | 7MB / 2047MB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 12 GeForce GTX 295 On | 0000:88:00.0 N/A | N/A |
| N/A 53C N/A N/A / N/A | 3MB / 895MB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 13 GeForce GTX 295 On | 0000:89:00.0 N/A | N/A |
| 41% 51C N/A N/A / N/A | 3MB / 895MB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 14 GeForce GTX 295 On | 0000:8C:00.0 N/A | N/A |
| N/A 53C N/A N/A / N/A | 3MB / 895MB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 15 GeForce GTX 295 On | 0000:8D:00.0 N/A | N/A |
| 41% 50C N/A N/A / N/A | 3MB / 895MB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 16 GeForce GTX 660 Ti On | 0000:8E:00.0 N/A | N/A |
| 30% 31C N/A N/A / N/A | 7MB / 2047MB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 17 GeForce GTX 660 Ti On | 0000:8F:00.0 N/A | N/A |
| 30% 32C N/A N/A / N/A | 7MB / 2047MB | N/A Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Compute processes: GPU Memory |
| GPU PID Process name Usage |
|=============================================================================|
| 0 Not Supported |
| 1 Not Supported |
| 2 Not Supported |
| 3 Not Supported |
| 4 Not Supported |
| 5 Not Supported |
| 6 Not Supported |
| 7 Not Supported |
| 8 Not Supported |
| 9 Not Supported |
| 10 Not Supported |
| 11 Not Supported |
| 12 Not Supported |
| 13 Not Supported |
| 14 Not Supported |
| 15 Not Supported |
| 16 Not Supported |
| 17 Not Supported |
+-----------------------------------------------------------------------------+

root@server:~# deviceQuery | head -n39
deviceQuery Starting...

CUDA Device Query (Driver API) statically linked version
Detected 18 CUDA Capable device(s)

Device 0: "GeForce GTX 660 Ti"
CUDA Driver Version: 5.5
CUDA Capability Major/Minor version number: 3.0
Total amount of global memory: 2048 MBytes (2147287040 bytes)
( 7) Multiprocessors, (192) CUDA Cores/MP: 1344 CUDA Cores
GPU Clock rate: 1084 MHz (1.08 GHz)
Memory Clock rate: 3104 Mhz
Memory Bus Width: 192-bit
L2 Cache Size: 393216 bytes
Max Texture Dimension Sizes 1D=(65536) 2D=(65536, 65536) 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Texture alignment: 512 bytes
Maximum memory pitch: 2147483647 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Concurrent kernel execution: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Bus ID / PCI location ID: 17 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

root@server:~# deviceQuery | grep "Device [0-9]"
Device 0: "GeForce GTX 660 Ti"
Device 1: "GeForce 9800 GX2"
Device 2: "GeForce GTX 295"
Device 3: "GeForce GTX 295"
Device 4: "GeForce GTX 295"
Device 5: "GeForce GTX 295"
Device 6: "GeForce 9800 GX2"
Device 7: "GeForce 9800 GX2"
Device 8: "GeForce 9800 GX2"
Device 9: "GeForce GTX 295"
Device 10: "GeForce GTX 295"
Device 11: "GeForce GTX 660 Ti"
Device 12: "GeForce GTX 295"
Device 13: "GeForce GTX 295"
Device 14: "GeForce GTX 295"
Device 15: "GeForce GTX 295"
Device 16: "GeForce GTX 660 Ti"
Device 17: "GeForce GTX 660 Ti"
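
Note that the CUDA device numbering differs from the nvidia-smi numbering, so the PCI bus ID is the safer way to match the two listings. deviceQuery appears to print the bus ID in decimal (17 for device 0), while lspci and nvidia-smi print it in hexadecimal (11:00.0). A small Python sketch of the conversion; the helper name `pci_address` is hypothetical, and the domain and function default to 0 as an assumption.

```python
def pci_address(bus_id, device_id, domain=0, function=0):
    """Format decimal PCI IDs the way nvidia-smi prints them (hex, zero-padded)."""
    return f"{domain:04X}:{bus_id:02X}:{device_id:02X}.{function:X}"

# deviceQuery reported "Device PCI Bus ID / PCI location ID: 17 / 0" for device 0.
print(pci_address(17, 0))  # 0000:11:00.0
```

That address belongs to a GTX 660 Ti in the nvidia-smi table (index 8 there), which agrees with CUDA device 0 being a GTX 660 Ti.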

root@server:~# ./nbody --device=17 --numbodies=65536 --benchmark | tail -n 6
gpuDeviceInit() CUDA Device [17]: "GeForce GTX 660 Ti
> Compute 3.0 CUDA device: [GeForce GTX 660 Ti]
number of bodies = 65536
65536 bodies, total time for 10 iterations: 693.433 ms
= 61.938 billion interactions per second
= 1238.754 single-precision GFLOP/s at 20 flops per interaction
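
The two throughput figures follow from the body count and the elapsed time. This is a minimal Python sketch of that arithmetic, assuming the benchmark counts all N x N pairwise interactions per iteration; the function name `nbody_throughput` is illustrative.

```python
def nbody_throughput(num_bodies, iterations, total_ms, flops_per_interaction=20):
    """Return (billion interactions/s, GFLOP/s) for an all-pairs N-body benchmark."""
    interactions = num_bodies ** 2 * iterations  # every body interacts with every body
    seconds = total_ms / 1000.0
    giga_interactions = interactions / seconds / 1e9
    return giga_interactions, giga_interactions * flops_per_interaction

gi, gflops = nbody_throughput(65536, 10, 693.433)
print(f"{gi:.3f} billion interactions/s, {gflops:.3f} GFLOP/s")
```

The result agrees with the reported numbers to within rounding of the printed time.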

root@server:~# bandwidthTest --device=17
[CUDA Bandwidth Test] - Starting...
Running on...

Device 17: GeForce GTX 660 Ti
Quick Mode

Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 381.1

Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 398.5

Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 115168.9

Result = PASS

#1
Posted 11/29/2013 09:52 AM   
Thanks for sharing. As far as I am aware, limits on the number of GPUs in a system are usually due to the system BIOS and have nothing to do with the GPUs. In addition, power constraints often make a very large number of GPUs in a single system impractical. There is also the potential issue of a significant PCIe bottleneck.

This is the first I have heard of someone getting more than 16 GPUs to work in a single system. Are you at liberty to divulge what the purpose of this monster rig is, or was this simply an attempt to prove that there is no hard 16-GPU limit?

#2
Posted 11/29/2013 07:25 PM   
We are planning to use it for some CUDA software development under Linux; putting so many GPUs in the rig is mainly just for fun.
The monster has now been running stably for nearly 24 hours with no issues,
except for some warnings in dmesg that look harmless:

root@server:~# uptime
06:41:17 up 23:18, 1 user, load average: 0.96, 2.58, 1.55


root@server:~# dmesg | tail
[ 56.915953] nvidia 0000:8f:00.0: PCI INT A -> GSI 66 (level, low) -> IRQ 66
[ 56.916264] nvidia 0000:8f:00.0: setting latency timer to 64
[ 56.916268] vgaarb: device changed decodes: PCI:0000:8f:00.0,olddecodes=io+mem,decodes=none:owns=none
[ 56.917120] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 319.37 Wed Jul 3 17:08:50 PDT 2013
[ 3522.766116] NVRM: os_pci_init_handle: invalid context!
[ 3522.766356] NVRM: os_pci_init_handle: invalid context!
[ 4061.694390] NVRM: os_pci_init_handle: invalid context!
[ 4061.694616] NVRM: os_pci_init_handle: invalid context!
[83666.236316] NVRM: os_pci_init_handle: invalid context!
[83666.236543] NVRM: os_pci_init_handle: invalid context!

#3
Posted 11/30/2013 02:19 AM   
Eighteen GPUs in one rig?! Definitely the first time I have ever heard of a rig like yours.

I wonder how it could actually be implemented for effective use... Sounds like fun though!

#4
Posted 12/02/2013 05:10 PM   
zzz1000 said: We are planning to use it for some CUDA software development under Linux; putting so many GPUs in the rig is mainly just for fun.
The monster has now been running stably for nearly 24 hours with no issues,
except for some warnings in dmesg that look harmless:

root@server:~# uptime
06:41:17 up 23:18, 1 user, load average: 0.96, 2.58, 1.55


root@server:~# dmesg | tail
[ 56.915953] nvidia 0000:8f:00.0: PCI INT A -> GSI 66 (level, low) -> IRQ 66
[ 56.916264] nvidia 0000:8f:00.0: setting latency timer to 64
[ 56.916268] vgaarb: device changed decodes: PCI:0000:8f:00.0,olddecodes=io+mem,decodes=none:owns=none
[ 56.917120] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 319.37 Wed Jul 3 17:08:50 PDT 2013
[ 3522.766116] NVRM: os_pci_init_handle: invalid context!
[ 3522.766356] NVRM: os_pci_init_handle: invalid context!
[ 4061.694390] NVRM: os_pci_init_handle: invalid context!
[ 4061.694616] NVRM: os_pci_init_handle: invalid context!
[83666.236316] NVRM: os_pci_init_handle: invalid context!
[83666.236543] NVRM: os_pci_init_handle: invalid context!



Have you tried changing your Nvidia graphics drivers? If I recall correctly, I've seen a case where reverting to older drivers solved the problem of getting the os_pci_init_handle: invalid context! errors. Pretty sure 18 GPUs weren't used in that situation, so it might not help... :D

Edit: Perhaps look at this thread:

http://nvnews.net/vbulletin/showthread.php?p=2576162#post2576162

#5
Posted 12/02/2013 05:21 PM   
Thank you, realbigdreamer.
After some BIOS tweaking the issue seems to have been fixed, and everything is working perfectly now.

realbigdreamer said: Have you tried changing your Nvidia graphics drivers? If I recall correctly, I've seen a case where reverting to older drivers solved the problem of getting the os_pci_init_handle: invalid context! errors. Pretty sure 18 GPUs weren't used in that situation, so it might not help... :D

Edit: Perhaps look at this thread: http://nvnews.net/vbulletin/showthread.php?p=2576162#post2576162

#6
Posted 12/07/2013 04:46 PM   
(I'm probably replying to this too late, but if anyone wants to share their thoughts or experiences putting many GPUs into one system, I'm very interested!)

Thanks for posting the bandwidth tests!

Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 398.5


The GTX 660 Ti supports PCIe 3.0, so in an x8 slot it should get close to 8 GB/s of bandwidth rather than 0.4 GB/s, a nearly 20-fold loss! Even if you used x1-to-x16 risers instead of x8-to-x16 ones, a PCIe 3.0 x1 link should still manage close to 1 GB/s, so you are still seeing a very significant bandwidth loss.
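
To put rough numbers on this, here is a small Python sketch that compares the measured figure with the theoretical per-direction peak of a few link configurations. The line rates and encodings are the standard PCIe values; the helper name `pcie_peak_mb_s` and the choice of configurations to print are only illustrative, and packet overhead is ignored.

```python
# Per-lane line rate (GT/s) and encoding efficiency for each PCIe generation.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),     # 8b/10b encoding
    2: (5.0, 8 / 10),     # 8b/10b encoding
    3: (8.0, 128 / 130),  # 128b/130b encoding
}

def pcie_peak_mb_s(generation, lanes):
    """Theoretical one-direction data rate in MB/s, ignoring packet overhead."""
    gt_per_s, efficiency = PCIE_GENERATIONS[generation]
    return gt_per_s * 1e9 * efficiency / 8 / 1e6 * lanes

measured = 398.5  # device-to-host figure from the bandwidthTest output, in MB/s

for gen, lanes in [(3, 8), (3, 1), (2, 1), (1, 1)]:
    peak = pcie_peak_mb_s(gen, lanes)
    print(f"PCIe {gen}.0 x{lanes}: {peak:7.1f} MB/s peak, "
          f"{peak / measured:5.1f}x the measured rate")
```

The measured rate sits above the PCIe 1.x x1 peak and below the PCIe 2.0 x1 peak, so it would be compatible with a link that trained at 2.0 x1, although nothing in the thread confirms how the links actually negotiated.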

Were your PCIe extenders shielded or impedance matched?

#7
Posted 08/10/2014 11:55 PM   