I recently converted a Windows 8.1 laptop to a dual-boot with Ubuntu 14.04 in order to work on some Linux open-source neural net software.
I need CUDA and thus an nvidia driver. I have it working except for two inconveniences.
The screen occasionally freezes. I’ve found that I can get it working again by 1) Ctrl-Alt-F1 to drop into a tty, then 2) Ctrl-Alt-F7 to switch back to X, at which point the driver is responsive again.
The other problem is when I need to reboot. The nvidia driver built a new version of the kernel: 3.16.0-45-generic, where the old one is 30. If I try to boot directly into 45, the screen freezes before the logon screen. I boot first into 30, then restart into 45, at which point the logon screen comes up and I can log in.
It’s nice to have found the workarounds, and I don’t need to use them frequently, but it would be better to understand what’s going on and fix it.
“The nvidia driver built a new version of the kernel” is a little surprising. I’m not sure how this will be interpreted on this forum, but my best advice is to not use the .run file driver installer if at all possible. This is something that tends to trip up people coming to Linux from Windows, who are used to having separate installers for individual drivers and programs, and it’s really not a sensible way of doing things when you have a Linux package manager you can use instead. The only companies that even package Linux software with executable .run installers are generally those that maintain Windows code and are just used to doing it that way, and it almost always introduces more complications than necessary (like your new kernel).
Generally, Nvidia drivers, being proprietary, will not be in the default set of sources configured by a given distro’s package manager, so you’ll need to add some additional repos. In Ubuntu, these come in the form of additional PPAs, which you can add via apt-get at the command line or in the Synaptic package manager. For Nvidia drivers in Ubuntu, there’s this new semi-official PPA, which was created specifically in response to too much confusion about how they’re supposed to work:
As well as this other one I’ve been using for a while, that looks like it’s just been deprecated in favour of the new one:
I can see the argument for “I’d rather get something directly from the vendor rather than through some guy who happens to maintain a build pipeline for Ubuntu,” but trust me, Linux is all about the package manager. For example, I have no idea how to advise you on removing the driver version you installed via the .run file; if it were installed via the package manager, it’d be a simple apt-get remove xyz.
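For comparison, the package-manager route looks something like this. The PPA name below matches the graphics-drivers PPA mentioned above, and the driver version number is purely illustrative, so treat this as a sketch rather than exact instructions:

```shell
# Hypothetical sketch of the PPA route (driver version is illustrative --
# check the PPA page for what's current on your release):
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt-get install nvidia-355      # install a specific driver series
# ...and removal is the simple apt-get remove mentioned above:
sudo apt-get remove --purge nvidia-355
```

The point being: everything the install touched is tracked by dpkg, so it can be cleanly reversed, which a .run installer can’t promise.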
No, I didn’t install from the .run package. Or rather, I did, suffered the apparently predictable consequences and, shortly thereafter, re-installed Ubuntu. Live and learn.
I attempted many things before I got what’s working now. What finally did the trick was:
sudo apt-get install cuda-6.5
I tried CUDA 7.0 and it didn’t work either (frozen/hung machine). So I thought I’d back up a version, and 6.5 did work. It more or less works, with the inconveniences I mentioned.
I’m currently running 346.82. I see at the URL you’ve provided that there are driver versions up to 355; at some point I’ll try out more recent versions.
“rebuilt the kernel.”
To be more exact, apt-get install fired builds that created new initrd* and vmlinuz* files in /boot. I have the original version 30 group and, as I was mucking around trying to get CUDA installed, a 45 group got built and rebuilt several times. Very long ago (decades) I had to rebuild the kernel to install drivers, and I know that drivers now, for the most part, load dynamically. Given what’s become the very approximate state of my Linux systems knowledge, I called new files in /boot “rebuilding the kernel”, which is possibly incorrect.
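For the record, here’s how I check what’s actually in /boot versus which kernel is running (stock Ubuntu paths assumed; the glob is guarded in case it matches nothing):

```shell
# List kernel images and initrds in /boot (stock Ubuntu layout assumed);
# the glob may match nothing on a minimal system, hence the fallback:
ls -1 /boot/vmlinuz-* /boot/initrd.img-* 2>/dev/null || echo "no kernel images found"
# Which kernel is running right now:
uname -r
```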
Yup, it’s just building modules against your currently installed kernel(s), which is something that any functioning install-from-package-manager script would do too. Even that’s not so typical these days, given that the kernel itself contains pretty much all the GPL drivers you’d ever need; the only frequent exceptions I can think of are VirtualBox (which must be non-GPL or something?) and the Nvidia binary driver.
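On Ubuntu those per-kernel module builds are typically handled by DKMS, so you can see what got built against which kernel. A guarded one-liner, in case dkms isn’t installed on your setup:

```shell
# Show which out-of-tree modules (e.g. nvidia, virtualbox) have been built
# for which installed kernels; falls back gracefully if dkms is absent:
command -v dkms >/dev/null 2>&1 && dkms status || echo "dkms not installed"
```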
I made considerable progress by upgrading everything, in particular reinstalling the OS with Ubuntu 15.04 instead of 14.04. Where previously I had to install CUDA 6.5, I can now install CUDA 7.0, which results in nvidia driver 346.59.
All previous driver-related problems disappeared, with one exception. I have two kernel versions (or what I maybe mistakenly identify as kernel versions) in /boot: 3.19.0.15 and 3.19.0.26. It’s 26 that has CUDA and the nvidia driver installed. The problem is I can’t boot directly into 26: if I try, it hangs at the screen that shows just “Ubuntu” with 5 white/red dots below it. The dots are a progress bar, of course.
What I have to do is use the advanced boot menu to boot into 15; once the login screen arrives, I do a quick restart into 26, and it will then successfully proceed to the login screen.
So there’s a video driver hang prior to getting to the login screen. It seems to me that what I’m doing is resetting state somewhere when I temporarily back up to 15, which then allows a boot into 26. But state where, exactly?
I’d like to capture the state from the first, frozen boot into 26. I found a file, /var/log/boot.log, however that seems to cover only the current boot.
Any other pointers to tools that would give me low-level info about a previous boot would be appreciated.
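One avenue I’m considering: since 15.04 is systemd-based, the journal should be able to show the previous boot. This is a sketch under the assumption that persistent journalling has been enabled first (it isn’t by default; creating /var/log/journal is, as I understand it, enough to turn it on):

```shell
# Enable persistent journalling (assumption: not already enabled), so logs
# survive a reboot:
sudo mkdir -p /var/log/journal
sudo systemctl restart systemd-journald
# After the next bad boot plus a reboot, read the kernel messages from the
# previous boot (-b -1), which should include where the hang happened:
journalctl -b -1 -k
```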
menomnon, can you press the escape key at the Ubuntu screen with the dots and if so, do you see any messages that indicate what’s going wrong? If you can get a login prompt that way (you may also need to press Ctrl-Alt-F1), please run “sudo nvidia-bug-report.sh” and attach the resulting nvidia-bug-report.log.gz file here.
Running “sudo nvidia-uninstall” should take care of that.
Can you get into the GRUB boot menu? I know Ubuntu makes that annoyingly difficult but I’m sure there’s a way. If you can interrupt the boot process there and edit the boot entry, you can try removing the “quiet” and “splash” options before booting the kernel. That might provide more information about what’s going wrong.
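To spell out the persistent version of that edit (stock Ubuntu paths assumed; the sed is a blunt sketch, so back up the file first):

```shell
# One-off: at the GRUB menu, press 'e', remove "quiet splash" from the line
# beginning with "linux", then press Ctrl-X (or F10) to boot.
#
# Persistent version (stock Ubuntu path assumed); back up first:
sudo cp /etc/default/grub /etc/default/grub.bak
sudo sed -i 's/quiet splash//' /etc/default/grub   # GRUB_CMDLINE_LINUX_DEFAULT
sudo update-grub
```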
Successive reboots seem to overwrite /var/log/boot.log, so I realized I could do the hanging boot, then reboot into Windows (this is a dual-boot system) and, since I have an extfs driver, save off the bad boot’s files before they’re overwritten.
As it happens, /var/log/boot.log showed no differences between bad and good boots. So I thought: hmm, downstream of that? Which would probably mean X.
I went through various X files to see what I might find. Xorg.0.log may pinpoint the problem. In particular, in the bad case you don’t get to the second of these two lines:
[ 3.595] (II) Module "ramdac" already built-in
[ 3.596] (II) intel(G0): Using Kernel Mode Setting driver: i915, version 1.6.0 20141121
It would appear that in the bad case things hang just after ramdac.
I’ll attach a zip file with two Xorg.0.log: one good, one bad.
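For anyone repeating the comparison, the difference can be pulled out mechanically. A sketch using two tiny stand-in logs (the real inputs would be the saved good and bad Xorg.0.log copies):

```shell
# Stand-ins for the two saved logs; in the bad boot the log stops after the
# ramdac line and never reaches the KMS line:
printf '(II) Module "ramdac" already built-in\n(II) intel(G0): Using Kernel Mode Setting driver: i915\n' > good.log
printf '(II) Module "ramdac" already built-in\n' > bad.log
# Print the lines that appear only in the good log, i.e. everything past the
# point where the bad boot stalled:
grep -vxFf bad.log good.log
```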
Sounds like you have onboard Intel graphics that your machine might be trying to use instead of, or on top of, the nVidia graphics? You might want to try disabling those in the BIOS/UEFI.
Yes, it’s an Asus N550JV-DB71 and has 2 graphical systems: Intel and Nvidia. I’ll take a look to see if I can somehow turn off the Intel onboard graphics.
As best I can tell, both from researching it on the web and from going through the BIOS closely, I can’t simply turn off the Intel graphics. It was a nice idea. My guess is the laptop would have been more expensive had the two graphics systems been fully switchable in this sense.
So where does that leave me? I can continue to, shall we say, double-boot: boot first into 3.19.0.15 and only then into 26. Tedious, but I’ve pushed on it quite a bit and it seems reliable. Again, though, it’s telling me something about a state change, and if I can just run that down, I’ll be closer to a solution. The code for these drivers is open-sourced (I believe), so conceivably I could build the driver myself and figure out how to debug it.
Oh, it’s a notebook, eh? Right, I doubt you’d be able to outright disable one of the GPUs in that case. There’s a name for switchable Intel/nVidia setups – Bumblebee or Optimus or something – that you can read up on, and I’m sure someone maintains an Ubuntu PPA that will set everything up for you automatically. Good luck with that; it seems more complicated than necessary to me (my past several notebooks have all been exclusively Intel graphics because I don’t mind also having a desktop to put a discrete GPU into), but it’s not that uncommon.
The code is not open sourced, and it was probably naive to think it is. I believe this is one of the more important locations for nvidia Linux drivers?
And (duh) the page is marked as proprietary drivers.
That said, there is driver source code on the machine under /usr/src/nvidia-346-346.59, and it looks like the most recent install may have built and deployed from this code. I doubt, though, that this is the full driver; the more proprietary bits would be locked in libraries.
The driver remains an interesting research project, but I think the things I should really be concerned with lie elsewhere. So for the time being, not only is the machine a dual-boot, but the Linux part of it is a double-boot. I may or may not have a revelation at some point as to what the state change is that occurs between 15 and 26.
Even if it isn’t open sourced, I’d expect there to be a PPA that provides binary packages and/or automatic configuration scripts. Google around for a Bumblebee PPA or something like that.
I have the PPA configured on my system, so I can see it in apt-cache search.
I wondered: what is Bumblebee? Nvidia Optimus, it seems; in particular, Optimus is for laptops that have dual graphics systems, such as mine. Ah, that’s interesting: it’s precisely my dual graphics systems that the driver is currently stumbling over.
But is Bumblebee compatible with CUDA 7.0? I’m googling that now, but the answer is less obvious.
So it’s not a driver (probably) and I shouldn’t expect it to solve my driver problem.
It seems you can test it with the following command (except it fails in the way shown):
MyId@MyId-N550JV:/usr/local/cuda-7.0/samples/1_Utilities/bandwidthTest$ optirun glxgears -info
[ 1825.473926] [ERROR]Cannot access secondary GPU, secondary X is not active.
[ 1825.473974] [ERROR]Aborting because fallback start is disabled.
So it is there (the daemon), but it doesn’t work … and I’m not sure it’s what I need anyway.