Since November, I have spent roughly 500 hours (call it 80 hours on six separate occasions) getting Ubuntu 16.04 running on three machines, each with multiple NVIDIA GPUs. That would be perfectly understandable to go through once… well, no it wouldn’t. But let’s pretend it would, especially since two of the three machines have different mobos.
But this is ridiculous. Many of the problems I have chased down were caused by, for example, shutting the machine down and then … turning it back on.
Sometimes the problem is caused by … wait for it … NVIDIA changing the device driver. By a point release. Which they pretty much have to do every time a AAA game comes out. According to the Linux community, this is because NVIDIA doesn’t publish its source code on GitHub. Except that nothing falls as flat as the open-source community’s drivers for graphics and sound cards. And for some reason, all these point releases that make my “stable” Ubuntu release boot to a black screen don’t seem to have any such traumatic effect on my Windows 10 machine.
Back in late January I tried booting up a new machine with a couple of 1080Ti’s … after about 18 hours of pure futility it turned out that Ubuntu had pushed a new Linux kernel that didn’t work with anything, and NVIDIA had to push a beta driver that could deal with it.
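(For anyone hitting the same failure mode: the standard mitigation is to pin the driver and kernel versions that currently work, so a routine update can’t swap them out from under you. Below is a sketch using an APT preferences file; the file name, package patterns, and version numbers are placeholders to be replaced with whatever actually boots on your box:)

```
# /etc/apt/preferences.d/pin-gpu-stack   (hypothetical file name)

# Pin the NVIDIA driver packages at the release that currently boots.
Package: nvidia-*
Pin: version 384.111*
Pin-Priority: 1001

# Pin the kernel metapackage so a surprise kernel bump can't outrun the driver.
Package: linux-generic
Pin: version 4.13.0.32*
Pin-Priority: 1001
```

The same effect can be had more simply with `sudo apt-mark hold <package>`. Either way, kernel and driver updates then have to be opted into deliberately, after checking that the two agree with each other.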
Tonight, NVIDIA drivers aside, I’ve been up all night trying to make my cursor be not-invisible. It was not-invisible when I booted two days ago, but it’s not not-invisible now. I don’t recall anything this bad when Windows Vista came out, and that set a new standard for bad.
So, I can see that at some point it might have made sense to make CUDA development Linux-mandatory. But now it’s just tormenting people who want to do Deep Learning or HPC. Use containers, you say? You mean Docker CE containers? The ones that stopped working in January and still basically don’t?
So, aside from the obvious question of whether CUDA development for DL and HPC can be migrated to Windows - which seems to be improving rather than deteriorating, unlike Linux - is there a Linux distro that doesn’t suffer from the capricious instability of Ubuntu?
Seriously, I really want a powerful gang of SIMD GPUs to work on. But truth be told, the capital spent acquiring four 1080 Tis, four Titan Xs, and a Titan V should have bought me a LOT of GFLOPS of research computing. The reality is that the same money, spent on high-end Intel CPUs running MKL, would have delivered 10x - if not 100x - more effective GFLOPS, simply because the time required to write code that actually consumes a GPU’s FLOPS is enormously higher. Instead, those GPUs have sat idle most of the time while I futz with an OS whose developers seem dead set on preventing GPUs from being used at all, unless their drivers are written by unpaid volunteers.
So, any recommendations? From here it looks like I’ve wasted about $15,000 on hardware that is effectively unsupported for HPC on a working OS. Worse, I have wasted many times that in the opportunity cost of other things I could have been doing. I have lost time and money relative to what I could have made as a manager at McDonald’s. And I say that as someone with many years’ experience in software (Windows, Unix, Mac), advanced degrees in the computing field, and 15+ years of building my own machines.
Help. Or kill me. Just make the torture stop.
- B. Student