18 GPUs in a single rig and it works

I often hear people say there is a 16-GPU limit for NVIDIA video cards. Is it true? I searched Google a bit, but it looks like no one has actually tested it.
Now, after successfully building a rig with more than 16 GPUs myself, I can tell you this rumor is not true: the so-called 16-GPU limit (for NVIDIA cards) doesn’t exist at all.
The rig I built is a GPU monster with 11 NVIDIA cards (4x GTX 660 Ti, 5x GTX 295, and 2x 9800 GX2). I used some old cards to save money, and 7 of them are dual-GPU cards, so the total GPU count is 18.
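
The total is simple arithmetic: the GTX 295 and the 9800 GX2 each carry two GPUs, and the GTX 660 Ti carries one. A minimal Python tally, using only the card counts given above:

```python
# Total GPU count for the rig: card model -> (number of cards, GPUs per card).
CARDS = {
    "GeForce GTX 660 Ti": (4, 1),  # single-GPU card
    "GeForce GTX 295": (5, 2),     # dual-GPU card
    "GeForce 9800 GX2": (2, 2),    # dual-GPU card
}

total_cards = sum(count for count, _ in CARDS.values())
dual_gpu_cards = sum(count for count, per_card in CARDS.values() if per_card == 2)
total_gpus = sum(count * per_card for count, per_card in CARDS.values())

print(f"{total_cards} cards, {dual_gpu_cards} of them dual-GPU, {total_gpus} GPUs")
# prints: 11 cards, 7 of them dual-GPU, 18 GPUs
```
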
The motherboard is a Supermicro X9DRX+-F, which has 11 PCIe slots, all of them x8. Following a method similar to the one used for FASTRA II, I used PCIe extenders to make it possible to connect all of these cards to the motherboard.

Here is some detailed system information for the 18-GPU monster:

root@server:~# dmesg | grep "DMI:"
[    0.000000] DMI: Supermicro X9DRX+-F/X9DRX+-F, BIOS 3.00 07/09/2013
root@server:~# lspci | grep NVIDIA | grep -v bridge | grep -v Audio
03:00.0 3D controller: NVIDIA Corporation G92 [GeForce 9800 GX2] (rev a2)
04:00.0 VGA compatible controller: NVIDIA Corporation G92 [GeForce 9800 GX2] (rev a2)
07:00.0 3D controller: NVIDIA Corporation GT200b [GeForce GTX 295] (rev a1)
08:00.0 VGA compatible controller: NVIDIA Corporation GT200b [GeForce GTX 295] (rev a1)
0b:00.0 3D controller: NVIDIA Corporation GT200b [GeForce GTX 295] (rev a1)
0c:00.0 VGA compatible controller: NVIDIA Corporation GT200b [GeForce GTX 295] (rev a1)
0f:00.0 3D controller: NVIDIA Corporation G92 [GeForce 9800 GX2] (rev a2)
10:00.0 VGA compatible controller: NVIDIA Corporation G92 [GeForce 9800 GX2] (rev a2)
11:00.0 VGA compatible controller: NVIDIA Corporation GK104 [GeForce GTX 660 Ti] (rev a1)
83:00.0 3D controller: NVIDIA Corporation GT200b [GeForce GTX 295] (rev a1)
84:00.0 VGA compatible controller: NVIDIA Corporation GT200b [GeForce GTX 295] (rev a1)
85:00.0 VGA compatible controller: NVIDIA Corporation GK104 [GeForce GTX 660 Ti] (rev a1)
88:00.0 3D controller: NVIDIA Corporation GT200b [GeForce GTX 295] (rev a1)
89:00.0 VGA compatible controller: NVIDIA Corporation GT200b [GeForce GTX 295] (rev a1)
8c:00.0 3D controller: NVIDIA Corporation GT200b [GeForce GTX 295] (rev a1)
8d:00.0 VGA compatible controller: NVIDIA Corporation GT200b [GeForce GTX 295] (rev a1)
8e:00.0 VGA compatible controller: NVIDIA Corporation GK104 [GeForce GTX 660 Ti] (rev a1)
8f:00.0 VGA compatible controller: NVIDIA Corporation GK104 [GeForce GTX 660 Ti] (rev a1)
root@server:~# nvidia-smi -pm 1
Persistence mode is already Enabled for GPU 0000:03:00.0.
Persistence mode is already Enabled for GPU 0000:04:00.0.
Persistence mode is already Enabled for GPU 0000:07:00.0.
Persistence mode is already Enabled for GPU 0000:08:00.0.
Persistence mode is already Enabled for GPU 0000:0B:00.0.
Persistence mode is already Enabled for GPU 0000:0C:00.0.
Persistence mode is already Enabled for GPU 0000:0F:00.0.
Persistence mode is already Enabled for GPU 0000:10:00.0.
Persistence mode is already Enabled for GPU 0000:11:00.0.
Persistence mode is already Enabled for GPU 0000:83:00.0.
Persistence mode is already Enabled for GPU 0000:84:00.0.
Persistence mode is already Enabled for GPU 0000:85:00.0.
Persistence mode is already Enabled for GPU 0000:88:00.0.
Persistence mode is already Enabled for GPU 0000:89:00.0.
Persistence mode is already Enabled for GPU 0000:8C:00.0.
Persistence mode is already Enabled for GPU 0000:8D:00.0.
Persistence mode is already Enabled for GPU 0000:8E:00.0.
Persistence mode is already Enabled for GPU 0000:8F:00.0.
All done.
root@server:~# nvidia-smi
Fri Nov 29 08:35:14 2013
+------------------------------------------------------+
| NVIDIA-SMI 5.319.37   Driver Version: 319.37         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce 9800 GX2    On   | 0000:03:00.0     N/A |                  N/A |
| N/A   53C  N/A     N/A /  N/A |        3MB /   511MB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce 9800 GX2    On   | 0000:04:00.0     N/A |                  N/A |
| 80%   56C  N/A     N/A /  N/A |        3MB /   511MB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 295     On   | 0000:07:00.0     N/A |                  N/A |
| N/A   50C  N/A     N/A /  N/A |        3MB /   895MB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 295     On   | 0000:08:00.0     N/A |                  N/A |
| 41%   49C  N/A     N/A /  N/A |        3MB /   895MB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|   4  GeForce GTX 295     On   | 0000:0B:00.0     N/A |                  N/A |
| N/A   51C  N/A     N/A /  N/A |        3MB /   895MB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|   5  GeForce GTX 295     On   | 0000:0C:00.0     N/A |                  N/A |
| 41%   49C  N/A     N/A /  N/A |        3MB /   895MB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|   6  GeForce 9800 GX2    On   | 0000:0F:00.0     N/A |                  N/A |
| N/A   55C  N/A     N/A /  N/A |        3MB /   511MB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|   7  GeForce 9800 GX2    On   | 0000:10:00.0     N/A |                  N/A |
| 80%   53C  N/A     N/A /  N/A |        3MB /   511MB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|   8  GeForce GTX 660 Ti  On   | 0000:11:00.0     N/A |                  N/A |
| 30%   32C  N/A     N/A /  N/A |        7MB /  2047MB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|   9  GeForce GTX 295     On   | 0000:83:00.0     N/A |                  N/A |
| N/A   51C  N/A     N/A /  N/A |        3MB /   895MB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|  10  GeForce GTX 295     On   | 0000:84:00.0     N/A |                  N/A |
| 41%   49C  N/A     N/A /  N/A |        3MB /   895MB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|  11  GeForce GTX 660 Ti  On   | 0000:85:00.0     N/A |                  N/A |
| 30%   31C  N/A     N/A /  N/A |        7MB /  2047MB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|  12  GeForce GTX 295     On   | 0000:88:00.0     N/A |                  N/A |
| N/A   53C  N/A     N/A /  N/A |        3MB /   895MB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|  13  GeForce GTX 295     On   | 0000:89:00.0     N/A |                  N/A |
| 41%   51C  N/A     N/A /  N/A |        3MB /   895MB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|  14  GeForce GTX 295     On   | 0000:8C:00.0     N/A |                  N/A |
| N/A   53C  N/A     N/A /  N/A |        3MB /   895MB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|  15  GeForce GTX 295     On   | 0000:8D:00.0     N/A |                  N/A |
| 41%   50C  N/A     N/A /  N/A |        3MB /   895MB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|  16  GeForce GTX 660 Ti  On   | 0000:8E:00.0     N/A |                  N/A |
| 30%   31C  N/A     N/A /  N/A |        7MB /  2047MB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|  17  GeForce GTX 660 Ti  On   | 0000:8F:00.0     N/A |                  N/A |
| 30%   32C  N/A     N/A /  N/A |        7MB /  2047MB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|    0            Not Supported                                               |
|    1            Not Supported                                               |
|    2            Not Supported                                               |
|    3            Not Supported                                               |
|    4            Not Supported                                               |
|    5            Not Supported                                               |
|    6            Not Supported                                               |
|    7            Not Supported                                               |
|    8            Not Supported                                               |
|    9            Not Supported                                               |
|   10            Not Supported                                               |
|   11            Not Supported                                               |
|   12            Not Supported                                               |
|   13            Not Supported                                               |
|   14            Not Supported                                               |
|   15            Not Supported                                               |
|   16            Not Supported                                               |
|   17            Not Supported                                               |
+-----------------------------------------------------------------------------+
root@server:~# deviceQuery | head -n39
deviceQuery Starting...

CUDA Device Query (Driver API) statically linked version
Detected 18 CUDA Capable device(s)

Device 0: "GeForce GTX 660 Ti"
  CUDA Driver Version:                           5.5
  CUDA Capability Major/Minor version number:    3.0
  Total amount of global memory:                 2048 MBytes (2147287040 bytes)
  ( 7) Multiprocessors, (192) CUDA Cores/MP:     1344 CUDA Cores
  GPU Clock rate:                                1084 MHz (1.08 GHz)
  Memory Clock rate:                             3104 Mhz
  Memory Bus Width:                              192-bit
  L2 Cache Size:                                 393216 bytes
  Max Texture Dimension Sizes                    1D=(65536) 2D=(65536, 65536) 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size (x,y,z):    (2147483647, 65535, 65535)
  Texture alignment:                             512 bytes
  Maximum memory pitch:                          2147483647 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Concurrent kernel execution:                   Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Bus ID / PCI location ID:           17 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
root@server:~# deviceQuery | grep "Device [0-9]"
Device 0: "GeForce GTX 660 Ti"
Device 1: "GeForce 9800 GX2"
Device 2: "GeForce GTX 295"
Device 3: "GeForce GTX 295"
Device 4: "GeForce GTX 295"
Device 5: "GeForce GTX 295"
Device 6: "GeForce 9800 GX2"
Device 7: "GeForce 9800 GX2"
Device 8: "GeForce 9800 GX2"
Device 9: "GeForce GTX 295"
Device 10: "GeForce GTX 295"
Device 11: "GeForce GTX 660 Ti"
Device 12: "GeForce GTX 295"
Device 13: "GeForce GTX 295"
Device 14: "GeForce GTX 295"
Device 15: "GeForce GTX 295"
Device 16: "GeForce GTX 660 Ti"
Device 17: "GeForce GTX 660 Ti"
root@server:~# ./nbody --device=17 --numbodies=65536 --benchmark | tail -n 6
gpuDeviceInit() CUDA Device [17]: "GeForce GTX 660 Ti
> Compute 3.0 CUDA device: [GeForce GTX 660 Ti]
number of bodies = 65536
65536 bodies, total time for 10 iterations: 693.433 ms
= 61.938 billion interactions per second
= 1238.754 single-precision GFLOP/s at 20 flops per interaction
root@server:~# bandwidthTest --device=17
[CUDA Bandwidth Test] - Starting...
Running on...

 Device 17: GeForce GTX 660 Ti
 Quick Mode

 Host to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     381.1

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     398.5

 Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     115168.9

Result = PASS
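
Two numbers in this transcript are easy to cross-check. First, lspci prints PCI bus numbers in hexadecimal while deviceQuery prints them in decimal, so deviceQuery's Device 0 (bus 17) is the GTX 660 Ti that lspci lists at 11:00.0. That differs from nvidia-smi's GPU 0 (the 9800 GX2 at 0000:03:00.0), so matching GPUs across tools by bus ID looks safer than matching by index. Second, nbody's throughput lines follow from its raw timing, assuming each iteration evaluates all N×N body-body interactions. A small Python sketch of both checks:

```python
# Cross-checks on two numbers from the transcript above.

# 1) PCI bus numbering: lspci prints the bus in hex, deviceQuery in decimal.
lspci_bus_hex = "11"                  # lspci: "11:00.0 ... GeForce GTX 660 Ti"
bus_decimal = int(lspci_bus_hex, 16)  # deviceQuery Device 0: "17 / 0"
print(f"lspci bus 0x{lspci_bus_hex} -> deviceQuery bus {bus_decimal}")

# 2) nbody throughput from its raw timing.
# ASSUMPTION: every iteration evaluates all N*N body-body interactions.
num_bodies = 65536
iterations = 10
total_time_s = 693.433e-3            # "total time for 10 iterations: 693.433 ms"
flops_per_interaction = 20           # as stated in the nbody output

interactions = num_bodies * num_bodies * iterations
interactions_per_s = interactions / total_time_s
gflops = interactions_per_s * flops_per_interaction / 1e9

print(f"{interactions_per_s / 1e9:.3f} billion interactions per second")
print(f"{gflops:.3f} single-precision GFLOP/s")
```

The computed rate matches nbody's 61.938 billion interactions per second, and the GFLOP/s figure agrees with its 1238.754 to within rounding in the last digit.
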

Thanks for sharing. As far as I am aware, limits on the number of GPUs in a system are usually due to the system BIOS and have nothing to do with the GPUs. In addition, power constraints often make a very large number of GPUs in a single system impractical. There is also the potential issue of a significant PCIe bottleneck.
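
To put the power point in rough numbers for the rig above, here is a hedged Python estimate. The per-card wattages are assumptions (approximate reference-board power ratings, not measurements from this rig), and CPUs, drives, fans and PSU losses are left out:

```python
# Rough full-load power estimate for the GPUs in the 18-GPU rig.
# ASSUMPTION: approximate reference-board power ratings in watts; actual
# boards, clocks and workloads differ, so this is only an estimate.
BOARD_POWER_W = {
    "GeForce GTX 295": 289,
    "GeForce 9800 GX2": 197,
    "GeForce GTX 660 Ti": 150,
}

# Card counts from the first post.
CARD_COUNTS = {
    "GeForce GTX 295": 5,
    "GeForce 9800 GX2": 2,
    "GeForce GTX 660 Ti": 4,
}

gpu_watts = sum(BOARD_POWER_W[model] * n for model, n in CARD_COUNTS.items())
print(f"GPUs alone: about {gpu_watts} W at full load "
      f"({gpu_watts / 1000:.1f} kW), excluding CPUs, drives and PSU losses")
```

With these assumed figures the GPUs alone could draw on the order of 2.4 kW under full load, more than a single typical desktop power supply provides.
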

This is the first I have heard of anyone getting more than 16 GPUs to work in a single system. Are you at liberty to divulge the purpose of this monster rig, or was it simply an attempt to prove that there is no hard 16-GPU limit?

We are planning to use it for CUDA software development under Linux; putting so many GPUs in the rig was mainly just for fun.
The monster has now been running stably for nearly 24 hours with no issues (apart from some warnings in dmesg, which look harmless):

root@server:~# uptime
 06:41:17 up 23:18,  1 user,  load average: 0.96, 2.58, 1.55
root@server:~# dmesg | tail
[   56.915953] nvidia 0000:8f:00.0: PCI INT A -> GSI 66 (level, low) -> IRQ 66
[   56.916264] nvidia 0000:8f:00.0: setting latency timer to 64
[   56.916268] vgaarb: device changed decodes: PCI:0000:8f:00.0,olddecodes=io+mem,decodes=none:owns=none
[   56.917120] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  319.37  Wed Jul  3 17:08:50 PDT 2013
[ 3522.766116] NVRM: os_pci_init_handle: invalid context!
[ 3522.766356] NVRM: os_pci_init_handle: invalid context!
[ 4061.694390] NVRM: os_pci_init_handle: invalid context!
[ 4061.694616] NVRM: os_pci_init_handle: invalid context!
[83666.236316] NVRM: os_pci_init_handle: invalid context!
[83666.236543] NVRM: os_pci_init_handle: invalid context!

Eighteen GPUs in one rig?! That is definitely the first time I’ve ever heard of a rig like yours.

I wonder how it could actually be put to effective use… Sounds like fun, though!

Have you tried changing your NVIDIA graphics drivers? If I recall correctly, I’ve seen a case where reverting to older drivers got rid of the "os_pci_init_handle: invalid context!" errors. I’m pretty sure 18 GPUs weren’t involved in that case, so it might not help… :D

Edit: Perhaps look at this thread:

http://nvnews.net/vbulletin/showthread.php?p=2576162#post2576162

Thank you, realbigdreamer.
After some BIOS tweaking, the issue seems to be fixed, and everything is working perfectly now.

(I’m probably replying to this too late, but if anyone wants to share their thoughts or experiences putting many GPUs into one system, I’m very interested!)

Thanks for posting the bandwidth tests!

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(MB/s)
   33554432                     398.5

The GTX 660 Ti uses PCIe 3.0, so in an x8 slot it should get close to 8 GB/s of bandwidth instead of 0.4 GB/s, a nearly 20-fold loss! Even if you used an x1-to-x16 riser instead of an x8-to-x16 one, you would still be seeing a very significant bandwidth loss, since a PCIe 3.0 x1 link alone is good for roughly 1 GB/s.
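
The expected figures can be worked out from the PCIe link parameters: PCIe 3.0 signals at 8 GT/s per lane with 128b/130b encoding, while PCIe 1.x and 2.0 use 2.5 and 5 GT/s with 8b/10b. A minimal Python sketch comparing theoretical one-direction peaks with the measured 398.5 MB/s; it ignores packet and protocol overhead, and it treats bandwidthTest's MB/s as 10^6 bytes/s, which is an assumption (if the tool means 2^20 bytes, the ratios shift by about 5%):

```python
# Theoretical one-direction PCIe payload bandwidth vs. the measured figure.
# Per generation: (transfer rate per lane in GT/s, line-code efficiency).
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),     # 8b/10b encoding
    2: (5.0, 8 / 10),     # 8b/10b encoding
    3: (8.0, 128 / 130),  # 128b/130b encoding
}


def pcie_bandwidth_mb_s(gen: int, lanes: int) -> float:
    """Peak bandwidth in MB/s (10**6 bytes/s) for one direction,
    ignoring packet and protocol overhead."""
    gt_per_s, efficiency = PCIE_GENERATIONS[gen]
    bits_per_s = gt_per_s * 1e9 * efficiency * lanes  # one bit per transfer per lane
    return bits_per_s / 8 / 1e6


# Device-to-host, pinned, 32 MiB transfers (bandwidthTest output above).
# ASSUMPTION: bandwidthTest's "MB/s" is treated as 10**6 bytes/s here.
measured_mb_s = 398.5

for gen, lanes in [(3, 8), (3, 1), (2, 1)]:
    peak = pcie_bandwidth_mb_s(gen, lanes)
    print(f"PCIe {gen}.0 x{lanes}: {peak:7.1f} MB/s peak, "
          f"{peak / measured_mb_s:4.1f}x the measured rate")
```

Under these assumptions the x8 peak is about 7.9 GB/s, roughly 20 times the measured rate, and even a PCIe 3.0 x1 link would be about 2.5 times faster than what was measured.
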

Were your PCIe extenders shielded or impedance-matched?