As CUDA 6.5 is officially released, I started a fresh AWS EC2 g2.2xlarge instance and installed Ubuntu 14.04.1 LTS.
I fully upgraded the OS, including the 4 kernel packages Ubuntu wanted to hold back from the upgrade (linux-virtual, linux-image-virtual, etc.)
Then I installed CUDA using the official .deb. Now, whatever I do (nvidia-modprobe, nvidia-smi), I get the error message:
modprobe: ERROR: could not insert 'nvidia_340': Unknown symbol in module, or unknown parameter (see dmesg)
I thus checked dmesg and found the cause: drm.ko was missing. I searched the web but couldn't find any solution. CUDA 6.0 works well with Ubuntu 12.04 on AWS EC2 because that OS was able to load both the nvidia and drm kernel modules.
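To confirm that diagnosis, one quick check (hypothetical, not part of the original post) is to search the running kernel's module tree for drm.ko and look at the kernel log:

```shell
# Look for drm.ko under the running kernel; on the lean
# linux-image-virtual kernels it is simply not shipped.
find /lib/modules/$(uname -r) -name 'drm.ko*'

# Try loading it directly; this fails if the module is absent.
sudo modprobe drm

# Check what the nvidia module complained about.
dmesg | grep -iE 'drm|nvidia' | tail -n 20
```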
I’m not even sure whether I should ask Ubuntu, AWS, or Nvidia for help.
It seems the runfile wants to access the kernel source; it gave me:
The driver installation is unable to locate the kernel source. Please make sure that the kernel source packages are installed and set up correctly.
If you know that the kernel source packages are installed and set up correctly, you may pass the location of the kernel source with the '--kernel-source-path' flag.
I did sudo apt-get source linux-image-$(uname -r) and it downloaded and unpacked the source into /home/ubuntu/Downloads/linux-3.13.0
I then ran sudo ./cuda_6.5.14_linux_64.run --kernel-source-path=/home/ubuntu/Downloads/linux-3.13.0
It gave me the same error. Could you help me, @txbob?
Select kernel source, or kernel development, as one of the things you want to do when installing Ubuntu. The kernel source packages have to be "installed and set up correctly", not just unpacked into a folder.
Alternatively, follow a proper method to install the kernel sources on Ubuntu, like this:
@txbob, I checked out the ubuntu-trusty repository and compiled all the flavours, but it seems the kernel.h under the generated linux-headers-xxxx-generic was symlinked ("ln"-ed) to a missing kernel.h, so the .run file would not accept it.
Then I found that if you apt-get install linux-headers, the files under /usr/src are acceptable to the .run file, so that's what I used to run it.
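A sketch of that approach (assuming the headers package matching the running kernel; the runfile name is the one used earlier in the thread):

```shell
# Install the kernel headers for the running kernel; they land in /usr/src
sudo apt-get install linux-headers-$(uname -r)

# Point the runfile at them explicitly
sudo ./cuda_6.5.14_linux_64.run \
    --kernel-source-path=/usr/src/linux-headers-$(uname -r)
```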
The compilation seems to have gone OK. However, during installation it first shuts down AppArmor and then tries to invoke drm.ko again, which fails. I have now posted the question on Stack Overflow.
Actually it seems it doesn't matter whether you use the .deb or the .run: the drm invocation always fails. If you find a way to install it successfully, could you let me know? Thanks, @txbob
@txbob it was a hell of an experience, but I solved it.
Right after the fresh launch of the instance, apt-get upgrade wanted to keep back 4 kernel packages such as linux-image-virtual. I installed them anyway, so that strictly nothing was left to upgrade.
The problem is that linux-image-virtual is a lean build shipped without drm.ko. I did apt-get install linux-image-extra-virtual and installed CUDA with the .deb (I reckoned the .deb and .run behaved similarly, so this was a fair test).
At this point the issue remained.
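Before rebuilding the kernel, it is worth checking what the extra package actually installs (a hypothetical check; linux-image-extra-virtual is a metapackage, so the modules live in the versioned package it pulls in):

```shell
# List drm-related files shipped by the versioned extra package
dpkg -L linux-image-extra-$(uname -r) | grep -i drm

# Or search the module tree directly
find /lib/modules/$(uname -r) -name 'drm.ko*'
```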
Let's compile a fresh kernel with drm built in:
sudo apt-get build-dep linux-image-$(uname -r)
apt-get source linux-image-$(uname -r)
cd linux-3.13.0
chmod a+x debian/scripts/*
chmod a+x debian/scripts/misc/*
fakeroot debian/rules clean
fakeroot debian/rules editconfigs
Edit the right config for your architecture (the default amd64 generic flavour).
Build drm into the kernel rather than as a module:
In Device Drivers > Graphics support, set [*] Direct Rendering Manager (XFree86 4.1.0 and higher DRI support)
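That menu entry corresponds to CONFIG_DRM=y. You can confirm the edit took effect (a hypothetical check on the Ubuntu kernel tree's config files) with:

```shell
# In the kernel source tree: DRM should now be built in (=y),
# not a module (=m), for the flavour you edited.
grep -r '^CONFIG_DRM=' debian.master/config/
```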
Then build kernel:
fakeroot debian/rules clean
fakeroot debian/rules binary-headers binary-generic
If the build is successful, a set of three .deb binary package files will be produced in the directory above the build root directory:
cd ..
ls *.deb
linux-headers-…_all.deb
linux-headers-…_amd64.deb
linux-image-…_amd64.deb
sudo dpkg -i linux*.deb
sudo reboot
Run sudo apt-get -f install to deal with the missing linux-cloud-tools dependency.
Verify it works:
sudo nvidia-smi
It should display the card and no running processes.
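A few extra checks (not in the original post) to confirm the rebuilt kernel and the driver are actually in place:

```shell
# The rebuilt kernel should be the one running
uname -r

# drm is built in now, so it is listed in modules.builtin
grep drm /lib/modules/$(uname -r)/modules.builtin

# The nvidia module should be loaded
lsmod | grep nvidia
sudo nvidia-smi
```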