SLI use with CUDA programming
I was reviewing the NVIDIA website, and the stated specifications for the 8800 GTX read:
"NVIDIA® SLI™ Technology:
Delivers up to 2x the performance of a single graphics card configuration for unequaled gaming experiences by allowing two graphics cards to run in parallel. The must-have feature for performance PCI Express® graphics, SLI dramatically scales performance on today's hottest games." at location http://www.nvidia.com/page/8800_features.html

Similarly, for the Quadro FX 4600 and 5600:
"NVIDIA SLI Technology
NVIDIA® SLI™ technology enables dynamically scalable graphics performance, enhanced image quality, and expanded display real-estate." at location http://www.nvidia.com/docs/IO/40049/quadro_fx_5600_datasheet.pdf

Further documentation at location http://www.nvidia.com/object/quadro_sli.html

"SLI Frame Rendering: Combines two identical NVIDIA Quadro PCI Express graphics cards with an SLI connector to transparently scale application performance on a single display by presenting them as a single graphics card to the operating system."

Therefore, can I use SLI in conjunction with CUDA to have two identical cards on my machine (any 8800 or Quadro 5600 or 4600) and program 256 multiprocessors as though they were one GPU?

Please assume that I can (somehow) obtain compliant hardware that meets the requirements for mounting and running the two GPU cards.

#1
Posted 03/15/2007 02:18 PM   
SLI and CUDA are orthogonal concepts. The first is for automatic distribution of rasterization; the second is for direct execution of code on the GPU. CUDA is not used for rendering (on- or offscreen). That is, when using CUDA you can simply list all available cards in the machine and submit code directly to each of them for execution. This code has nothing to do with shader code; it is C-like. So you have a lot more control over what happens where and when.

Peter
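
As a rough illustration of "listing all available cards", here is a minimal sketch using the CUDA runtime API (`cudaGetDeviceCount`, `cudaGetDeviceProperties`). It assumes a current toolkit rather than the 2007 beta discussed in this thread, and the output format is only an example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Ask the runtime how many CUDA-capable devices are visible.
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    // Each card shows up as its own device, whether or not SLI is enabled.
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        std::printf("device %d: %s, %d multiprocessors, %zu MiB global memory\n",
                    dev, prop.name, prop.multiProcessorCount,
                    prop.totalGlobalMem >> 20);
    }
    return 0;
}
```

Each device is then addressed individually by its index; nothing in this enumeration merges two cards into one.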

#2
Posted 03/15/2007 02:29 PM   
Thanks Peter. I get it better now.

#3
Posted 03/15/2007 02:30 PM   
[quote name='gerdw' date='Mar 15 2007, 09:18 AM']Therefore, can I use SLI in conjunction with CUDA to have two identical cards on my machine (any 8800 or Quadro 5600 or 4600) and program 256 multiprocessors as though they were one GPU?
[/quote]
You cannot treat two 8800 cards as a single set of 256 processors. You can, however, treat them as two sets of 128 processors each (you would need two host threads, each of which copies the necessary data to its own card and launches a kernel there). Similarly, you can take advantage of three cards. One of the reasons could be that the cards do not really share memory in SLI mode: shared data must be copied from one card to the other via the bus. So, if a "unified" view of the two SLI'ed cards were allowed, accessing different global memory addresses could have very different latencies.

Paulius

P.S. The 8800 has 16 multiprocessors, each with 8 stream processors.
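
The two-threads-per-two-cards approach described above might look roughly like the sketch below. It is written against the current runtime API with `std::thread`; the `scale` kernel, the `run_on_device` helper, and the array size are all made up for illustration, and error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <thread>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: multiply every element of a chunk by a constant.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: runs on its own host thread and drives one device.
void run_on_device(int dev, float* host_chunk, int n) {
    cudaSetDevice(dev);  // bind this host thread to one card
    float* d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemcpy(d, host_chunk, n * sizeof(float), cudaMemcpyHostToDevice);
    scale<<<(n + 255) / 256, 256>>>(d, n, 2.0f);
    // The blocking copy back also waits for the kernel to finish.
    cudaMemcpy(host_chunk, d, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);
}

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    if (count == 0) {
        std::printf("no CUDA device found\n");
        return 0;
    }

    const int n = 1 << 20;
    std::vector<float> data(n, 1.0f);

    // Give each device one contiguous chunk of the array and one host thread.
    std::vector<std::thread> workers;
    for (int dev = 0; dev < count; ++dev) {
        int begin = static_cast<int>(static_cast<long long>(n) * dev / count);
        int end = static_cast<int>(static_cast<long long>(n) * (dev + 1) / count);
        workers.emplace_back(run_on_device, dev, data.data() + begin, end - begin);
    }
    for (auto& t : workers) t.join();

    std::printf("data[0] = %.1f, data[n-1] = %.1f\n", data[0], data[n - 1]);
    return 0;
}
```

The split is done by the programmer on the host, which is the point made above: each card works on its own copy of its own chunk, and nothing is shared between the cards implicitly.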

#4
Posted 03/15/2007 09:35 PM   
[quote name='prkipfer' date='Mar 15 2007, 07:29 AM']SLI and CUDA are orthogonal concepts. The first is for automatic distribution of rasterization, the second is for addressing direct execution of code on the GPU. CUDA is not used for rendering (on- or offscreen). That is when using CUDA you can simply list all available cards in the machine and directly submit code to execute. This code has nothing to do with shader code - it is C-like. So you have a lot more control of what happens where and when.

Peter
[/quote]

Can one ship data from one GPU to another without going through the host, or faster than going through the host?

#5
Posted 04/03/2007 03:22 PM   
Not in the current beta release of CUDA, but this is planned for a future release.

Mark

#6
Posted 04/03/2007 03:35 PM   
[quote name='Mark Harris' date='Apr 3 2007, 08:35 AM']Not in the current beta release of CUDA, but this is planned for a future release.

Mark
[/quote]

What sort of speeds (or speed-ups relative to PCI Express) are expected for data transfers on this channel?

#7
Posted 04/03/2007 09:48 PM   
[quote="Mark Harris"]Not in the current beta release of CUDA, but this is planned for a future release.

Mark
[/quote]

Very old thread, but any updates on this? Can CUDA 6 use SLI to move data between devices without going through PCIe and CPU?

#8
Posted 07/28/2014 09:36 AM   
Looks like the answer is no. CUDA 6.0 programming manual, search for "peer":
http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#peer-to-peer-memory-access
http://docs.nvidia.com/cuda/cuda-samples/index.html#new-cuda-code-samples-in-cuda-6-0

Do not use SLI in that case:
http://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#cuda-general-known-issues
>>Peer access is disabled between two devices if either of them is in SLI mode.<<
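
For reference, a minimal sketch of the peer-to-peer path from the programming guide section linked above, using `cudaDeviceCanAccessPeer`, `cudaDeviceEnablePeerAccess`, and `cudaMemcpyPeer`. It assumes devices 0 and 1 are the two cards of interest; the buffer size is arbitrary and error checking is omitted. As far as I know, these transfers travel over the PCIe bus rather than the SLI bridge.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    if (count < 2) {
        std::printf("need at least two CUDA devices\n");
        return 0;
    }

    // Ask whether each device can directly address the other's memory.
    // Per the release notes quoted above, expect 0 if either is in SLI mode.
    int can01 = 0, can10 = 0;
    cudaDeviceCanAccessPeer(&can01, 0, 1);
    cudaDeviceCanAccessPeer(&can10, 1, 0);
    std::printf("peer access 0->1: %d, 1->0: %d\n", can01, can10);

    // Allocate one buffer on each device.
    const size_t bytes = 1 << 20;
    float* src = nullptr;
    float* dst = nullptr;
    cudaSetDevice(0);
    cudaMalloc(&src, bytes);
    cudaMemset(src, 0, bytes);
    cudaSetDevice(1);
    cudaMalloc(&dst, bytes);

    if (can01 && can10) {
        // Enable direct access in both directions (the flags argument must be 0).
        cudaSetDevice(0);
        cudaDeviceEnablePeerAccess(1, 0);
        cudaSetDevice(1);
        cudaDeviceEnablePeerAccess(0, 0);
    }

    // Copy device 0 -> device 1. The call is valid with or without peer
    // access; without it, the copy is expected to be staged through the host.
    cudaError_t err = cudaMemcpyPeer(dst, 1, src, 0, bytes);
    std::printf("cudaMemcpyPeer: %s\n", cudaGetErrorString(err));

    cudaSetDevice(0);
    cudaFree(src);
    cudaSetDevice(1);
    cudaFree(dst);
    return 0;
}
```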

#9
Posted 07/28/2014 11:43 AM   
Thanks!

#10
Posted 07/29/2014 02:58 AM   