[SOLVED] Run CUDA on dedicated NVIDIA GPU while connecting monitors to Intel HD graphics, is this possible?

FangQ · February 3, 2017, 8:03pm

hi everyone

I have a number of CUDA development machines running Ubuntu Linux with NVIDIA GPUs (various GPUs, including GTX 980Ti, Titan X, 1080, 590 etc). In the past, I have always installed at least two graphics cards on each machine: a low-end card for display and a high-end card dedicated to computing.

I recently came across this post:

http://osdf.github.io/blog/intel-integrated-graphics-dedicated-gpu-for-cuda-and-ubuntu-1310.html

It looks like it is possible to use the Intel HD GPU to handle displays while reserving the NVIDIA devices for computing on the same machine. However, after following the procedure on two Ubuntu 14.04 boxes, I could not get this to work: either the Intel driver is used or the nvidia driver, but not both at the same time.

Is this approach feasible at all? Does it work only with hybrid graphics systems? Does anyone have experience with it?

thanks

Robert_Crovella · February 3, 2017, 8:49pm

Are you talking about laptops only?

On a desktop system with intel integrated HD graphics, it should definitely be possible to use the intel graphics as the display while using the CUDA GPU for compute.

This will require that the system BIOS allow the integrated graphics to be enabled even if an extra GPU (the CUDA GPU) is plugged in. Some system BIOSes automatically detect the presence of a VGA device in the add in card slot and automatically disable the integrated GPU.

For optimus laptops, the situation is more complicated, and it may not be possible to use the devices separately, depending on the laptop design.

First leave the NVIDIA GPU out of the desktop system while setting up Linux. Once you have a functional display on the integrated graphics, power down the system and add the NVIDIA GPU.

If, when you power up the system, things are still working normally with your display, and you can see both devices with lspci, then there’s a good chance you can use both.
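The lspci check mentioned above can be narrowed to display-class devices. This is a minimal sketch rather than part of the original reply: the helper name `list_display_devices` is made up for illustration, and the class strings are the ones lspci commonly prints for GPUs.

```shell
# Hypothetical helper: keep only display-class devices from lspci-style
# output on stdin (integrated and add-in GPUs both show up this way).
list_display_devices() {
    grep -Ei 'VGA compatible controller|3D controller|Display controller'
}

# Run the check only where lspci is available.
if command -v lspci >/dev/null 2>&1; then
    lspci | list_display_devices || echo "no display-class PCI devices reported"
fi
```

If both the Intel and the NVIDIA device are listed, the BIOS has left the integrated GPU enabled alongside the add-in card.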

When installing the NVIDIA display driver, be sure to:

not install the openGL libs (there are command line options with driver runfile installers or CUDA runfile installers to allow this)
make sure not to make any changes to the xorg.conf configuration.

FangQ · February 6, 2017, 11:19pm

thanks for the quick reply. I’d like to add some new findings.

first of all, yes, I was talking about desktop computers. The one being tested is a desktop with 980Ti and Intel 6700k (HD graphics). It has two monitors (1920x1200/HDMI-2 and 3440x1440/DP-1) connected to the motherboard’s video outputs.

My OS is Ubuntu 14.04. I installed both the nvidia/cuda driver (via NVIDIA’s apt repo) and the xorg Intel driver. The nvidia driver did not give me an option to disable the OpenGL libraries during installation.

In the BIOS, I did enable the onboard GPU.

here are some updated findings:

1. If I start nvidia-settings and set NVIDIA (Performance mode) as the GPU, I was able to get video output on the monitor (1920x1200) connected to the motherboard via HDMI. I could run my CUDA code, list the 980Ti, and run simulations on it. I could also run glxinfo and the glxgears/glmark2 benchmarks, all listing nvidia OpenGL. However, I cannot enable the 3440x1440 display, even though I can see it in the “xrandr -q” output and the “arandr” GUI. When I enable it, I get an error:

xrandr: can not find mode 3440x1440

Just to mention this again: no cable is connected to the 980Ti; the displays are only connected to the motherboard’s video connectors!

See my first attachment.

2. If I select Intel (Power Saving Mode) from the PRIME settings in nvidia-settings and restart X, I could use both displays. I could also run glxinfo/glxgears/glmark2, and they list Intel’s GPU properly. However, I could not run nvidia-smi or my CUDA program; both report that no NVIDIA GPU was found!

In either case, my mouse feels a bit sticky from time to time.

My questions are:

In case 1, which GPU is handling the graphics? My monitors are connected to the motherboard, but it looks like the nvidia driver thinks it is handling the graphics.
How can I get both monitors to work with the nvidia driver?

Robert_Crovella · February 7, 2017, 4:58am

To access the options I mentioned for not installing the OpenGL graphics files, you would need to use the driver runfile installer or the CUDA runfile installer. You will need to find the command line switches that disable installation of the OpenGL files; the command line help for these installers lists them.
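One way to find those switches is to filter each installer's help text for OpenGL. The sketch below assumes the driver runfile prints its full option list with --advanced-options and the CUDA runfile with --help; the file names are placeholders and `opengl_switches` is a made-up helper name. In installers of this era the switches should show up as --no-opengl-files (driver) and --no-opengl-libs (CUDA), but check the help output of the version you actually have.

```shell
# Hypothetical helper: keep only the lines of an installer help text
# (read from stdin) that mention OpenGL.
opengl_switches() {
    grep -i -- 'opengl'
}

# Placeholder file names; substitute the installers you downloaded:
#   sh ./NVIDIA-Linux-x86_64-XXX.run --advanced-options | opengl_switches
#   sh ./cuda_XXX_linux.run --help | opengl_switches
```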

FangQ · February 7, 2017, 5:33pm

@txbob. thanks again for the help. I am not sure if OpenGL library matters here.

two new findings:

When using the PRIME setting in nvidia-settings to enable the NVIDIA GPU, it looks like my 980Ti is actually handling the graphics, even though my display cables are connected to the motherboard. nvidia-smi shows Xorg running on the 980Ti. I also ran my CUDA code, and the screen freezes while it is running, just as it does when I connect my display to the 980Ti, as if it were the only GPU in the system. This is definitely not what I wanted. I want the GPU to be dedicated to compute, without being subject to the watchdog time limit.
When switching to the Intel GPU, all GL programs report Intel OpenGL. For example, glmark2 prints:

fangq@taote:~/space/git/Project$ glmark2
=======================================================
    glmark2 2012.08
=======================================================
    OpenGL Information
    GL_VENDOR:     Intel Open Source Technology Center
    GL_RENDERER:   Mesa DRI Intel(R) HD Graphics 530 (Skylake GT2) 
    GL_VERSION:    3.0 Mesa 11.2.0
=======================================================

So I think nvidia-prime switches all GL libraries accordingly.

Now, given my observations, what I really want to achieve is to use CUDA on the NVIDIA GPU while in the Intel GPU mode. However, in this case, nvidia-smi simply gives me
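The same check can be done without running a benchmark by reading the vendor line from glxinfo. This is a small sketch, assuming glxinfo is installed and an X session is running; `gl_vendor` is a hypothetical helper name.

```shell
# Hypothetical helper: extract the vendor from glxinfo output on stdin.
gl_vendor() {
    sed -n 's/^OpenGL vendor string: *//p'
}

# Only meaningful inside a running X session with glxinfo installed.
if command -v glxinfo >/dev/null 2>&1 && [ -n "${DISPLAY:-}" ]; then
    glxinfo 2>/dev/null | gl_vendor || true
fi
```

A vendor string naming Intel means the Mesa libraries are in use; one naming NVIDIA means the NVIDIA OpenGL libraries were picked up instead.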

fangq@taote:~/space/git/Project$ nvidia-smi
NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system.
Please also try adding directory that contains libnvidia-ml.so to your system PATH.

and my CUDA code (MCX) failed to list any valid GPU.
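When nvidia-smi reports that it cannot find libnvidia-ml.so, it helps to see where (or whether) the driver package placed that library. A minimal sketch follows; the search roots are an assumption about a typical Ubuntu layout, and `find_nvml_dirs` is a made-up helper name.

```shell
# Hypothetical helper: list the directories under the given roots that
# contain a libnvidia-ml.so* file, one directory per line.
find_nvml_dirs() {
    find "$@" -name 'libnvidia-ml.so*' 2>/dev/null \
        | while IFS= read -r lib; do dirname "$lib"; done \
        | sort -u
}

# Assumed library roots; adjust for your distribution.
find_nvml_dirs /usr/lib /usr/lib64 /usr/lib32 || true
```

Any directory printed here is a candidate for the dynamic loader's search path.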

Does anyone know how to access the NVIDIA hardware/CUDA library when PRIME sets Intel as the default GPU?

FangQ · February 8, 2017, 6:27pm

An update, problem solved.

All I needed to do was add the CUDA driver’s path (in my case /usr/lib/nvidia-375) to LD_LIBRARY_PATH.

A side problem is that the libGL.* in the nvidia driver folder will take priority over the mesa libGL. The fix is to remove or rename/relocate libGL.so*/libGLX.so*/libGLdispatch.so* to a different folder, or not to install them if you install from the .run file (I prefer to install from apt-get).
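Relocating the files rather than deleting them keeps the change reversible. The following is a sketch only: the function names and the backup folder are hypothetical, and on a real system the move would need root privileges and the driver folder from your own install (for example /usr/lib/nvidia-375 as above).

```shell
# Hypothetical helpers. Usage: stash_nvidia_gl <driver_dir> <backup_dir>
# Moves libGL.so*, libGLX.so* and libGLdispatch.so* out of the driver
# folder so the loader falls back to the Mesa copies.
stash_nvidia_gl() {
    driver_dir=$1
    backup_dir=$2
    mkdir -p "$backup_dir" || return 1
    for lib in "$driver_dir"/libGL.so* "$driver_dir"/libGLX.so* \
               "$driver_dir"/libGLdispatch.so*; do
        # An unmatched glob stays literal, so skip names that do not exist.
        if [ -e "$lib" ] || [ -L "$lib" ]; then
            mv -- "$lib" "$backup_dir"/ || return 1
        fi
    done
}

# Usage: restore_nvidia_gl <driver_dir> <backup_dir>
# Moves the same files back, e.g. before using the NVIDIA GPU for display.
restore_nvidia_gl() {
    stash_nvidia_gl "$2" "$1"
}
```

Other driver libraries such as libcuda.so* are left in place, so CUDA programs can still find them.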

In summary, to make this work, you need to

1. Make sure you have enabled onboard graphics in the BIOS settings (or set it as primary).
2. Install both the xorg Intel driver and the nvidia/cuda drivers.
3. Start nvidia-settings, go to the PRIME settings page, and set Intel (Power Saving Mode) as default.
4. Modify your .bashrc and set LD_LIBRARY_PATH to at least contain /usr/local/cuda/lib64:/usr/lib/nvidia-XXX, where XXX in my case is 375.
5. Log out to restart X, or reboot.
6. Run ldd $( which glxinfo ) to make sure your GL libraries point to mesa, or run glmark2 to confirm GL status.
7. (update) If the libGL printed in step 6 points to nvidia’s driver folder, you need to remove/rename the libGL.so*/libGLX.so*/libGLdispatch.so* under the nvidia driver folder so that your OS can pick up the mesa libGL library.
8. Run nvidia-smi to list your dedicated NVIDIA GPU, and run your CUDA program; you should not see any errors.
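The .bashrc change from the steps above could look like the sketch below. It assumes the two directories named in the post; `add_lib_dir` and `NVIDIA_DRIVER_VERSION` are hypothetical names, and the version number has to match your installed driver.

```shell
# Hypothetical helper: append a directory to LD_LIBRARY_PATH unless it is
# already listed. Avoids empty entries, which the loader would treat as
# the current directory.
add_lib_dir() {
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$1:"*) ;;  # already present, nothing to do
        *) LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$1" ;;
    esac
    export LD_LIBRARY_PATH
}

NVIDIA_DRIVER_VERSION=375  # assumption: the XXX from /usr/lib/nvidia-XXX
add_lib_dir /usr/local/cuda/lib64
add_lib_dir "/usr/lib/nvidia-${NVIDIA_DRIVER_VERSION}"
```

Because the helper skips directories that are already present, sourcing .bashrc repeatedly does not keep growing the variable.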

Chaostone · March 27, 2017, 9:47am

Hi FangQ,

Thanks for the info here. I encountered the same question.

I have another question though. If I want to switch back to using Nvidia GPU for display, rendering, and gaming etc., do I have to change the libGL.so* names back and reboot? Do I need to change the nvidia-settings also?

Thanks

FangQ · March 27, 2017, 1:45pm

yes, you need to launch nvidia-settings, change display to NVIDIA card in PRIME, and then reboot. I suspect you will also need to move the libGL* files back to the nvidia driver folder.

THChew · June 27, 2017, 9:37am

Thanks for the guide. I managed to use this to get Gromacs running on NVIDIA GPUs.

THChew · June 27, 2017, 9:42am

Thanks for the guide. I managed to use this to get Gromacs running on NVIDIA GPUs.

YAFU · July 17, 2017, 5:48pm

The last time I managed to run CUDA with the Intel iGPU as the primary display (Intel - Power Saving Mode) on Linux was with driver 361.42. Since that version, the drivers have been broken, or you need very difficult workarounds like those described in this thread.
When does NVIDIA plan to fix this?

Still broken in 384.47. Kubuntu 16.04 64bits

EMCP · September 5, 2017, 11:32am

When you installed the NVIDIA driver, I see the option to disable opengl file installation … but did you also disable nouveau checks?? I imagine we want to keep nouveau installation if it is to render the screen, correct?

as of now… when doing just --disable-opengl-files , I am failing the pre-install script checks… other options I imagine I could use are

--no-nouveau-check: don’t check if nouveau is running

--disable-nouveau: if nouveau is detected, offer to disable it…

It was not clear whether you followed all the modprobe instructions in the links you posted… but in lieu of an answer from you I will try the blacklist modprobe approach

Update :

I tried installing the latest 384 driver… I chose --no-opengl-files , installed cuda 8.0 and chose not to install 375…

when I got back out there was no /usr/lib/nvidia-XXX folder created … makes me think I will retry CUDA install with the 375 driver and recheck

EMCP · September 5, 2017, 3:02pm

[s]Omg, I don’t know how… but I ignored the 375 driver… it failed to install inside CUDA’s installer anyway…

I think the magic started when I ran nvidia-modprobe in the terminal… suddenly everything worked like magic… I’ve got glmark2 showing intel, meanwhile nvidia-smi is working too…

THANK YOU YAFU[/s]

Edit : I wrote a bit too soon. After rebooting the system a few times, I can no longer get nvidia-smi to work … it seems it worked fine for a while but rebooting did something… I feel like I need a desktop with 2 nvidia gpus now and this is just not worth the hassle, but it would be nice to understand where I went wrong…

one big difference is, I do not have the nvidia-XXX folder that you and many other instructions cite… I’m thinking of retrying with the 375 driver instead of getting fancy with 384, and reinstalling cuda
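The thread does not establish why nvidia-smi stopped working after the reboots. Since things started working right after the modprobe step, one thing that may be worth checking after each boot is whether the nvidia kernel module is loaded at all. A minimal sketch, with `module_loaded` as a hypothetical helper name:

```shell
# Hypothetical helper: succeed if the named kernel module is listed in the
# given modules file (defaults to /proc/modules, which is what lsmod reads).
module_loaded() {
    grep -q "^$1 " "${2:-/proc/modules}" 2>/dev/null
}

if module_loaded nvidia; then
    echo "nvidia kernel module is loaded"
else
    echo "nvidia kernel module is not loaded"
fi
```

The trailing space in the pattern keeps `nvidia` from matching related modules such as `nvidia_uvm`.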

nachitox · January 27, 2018, 3:39pm

This worked great!

But I have a problem with nvidia-settings. I’m trying to overclock my GTX 1070 running

sudo nvidia-settings -a GPUMemoryTransferRateOffset[3]=1300

but this is the output

ERROR: Error querying enabled displays on GPU 0 (Missing Extension).
ERROR: Error querying connected displays on GPU 0 (Missing Extension).
ERROR: Error resolving target specification '' (No targets match target specification), specified in assignment 'GPUMemoryTransferRateOffset[3]=1300'.

Any idea why?

joseph.r.crawford · April 12, 2018, 3:23pm

I was able to get the above working and run my displays via the internal Intel graphics. I was also able to use CUDA on the nvidia card for the purpose of running Tensorflow code.

However, I can’t seem to get docker to also work. When I run nvidia-smi I see:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.48                 Driver Version: 390.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro P5000        Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   50C    P0    28W /  N/A |      0MiB / 16278MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

But if I try to run the GPU enabled tensorflow docker container I get:

$ docker run -d --ipc=host --runtime=nvidia tensorflow/tensorflow:latest-devel-gpu-py3
ac17804a4bf0744c8a3615897d613487c860e9ca06e5c04db8fcbd5ba8032e2e
docker: Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused \"process_linux.go:385: running prestart hook 1 caused \\"error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods configure --ldconfig=@/sbin/ldconfig.real --device=all --compute --utility --require=cuda>=9.0 --pid=30067 /var/lib/docker/overlay2/9aa62f5fc81a92581ac2e233e578903e08aafc28e417fefeb4ee69055e6c1b2a/merged]\\nnvidia-container-cli: initialization error: driver error: failed to process request\\n\\"\"": unknown.

Is there a way to point docker to the right driver?

cjmcc · December 9, 2018, 7:27pm

Any updates on this? I am having the same issue as EMCP: there is no nvidia-XXX folder (410 in my case). I installed the nvidia drivers from the graphics ppa.