Yesterday my users started reporting an error when running nvidia-smi:
Failed to initialize NVML: Driver/library version mismatch
Additionally, users report the following error when trying to run scripts:
kernel version 367.57.0 does not match DSO version 375.39.0
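This message compares two versions. One is the version of the kernel module currently loaded in memory, and the other is the version of the user-space driver library (DSO) on disk. The sketch below runs the same comparison by hand and makes a few assumptions. It expects `/proc/driver/nvidia/version` to contain a line of the form "NVRM version: ... Kernel Module  <version> ...". It takes the library version from a versioned filename such as `libnvidia-ml.so.375.39`. The function names are made up for illustration.

```python
import re
import sys
from pathlib import Path
from typing import List, Optional

# Assumed layout of /proc/driver/nvidia/version, e.g.
# "NVRM version: NVIDIA UNIX x86_64 Kernel Module  367.57  Mon Oct  3 ..."
_KMOD_RE = re.compile(r"Kernel Module\s+(\d+(?:\.\d+)+)")
# Assumed versioned library filename, e.g. "libnvidia-ml.so.375.39"
_DSO_RE = re.compile(r"\.so\.(\d+(?:\.\d+)+)$")


def kernel_module_version(proc_text: str) -> Optional[str]:
    """Version of the kernel module that is currently loaded, if found."""
    match = _KMOD_RE.search(proc_text)
    return match.group(1) if match else None


def dso_version(library_path: str) -> Optional[str]:
    """Version encoded in a versioned user-space library filename, if any."""
    match = _DSO_RE.search(library_path)
    return match.group(1) if match else None


def check_mismatch(proc_text: str, library_path: str) -> Optional[str]:
    """Describe a kernel/library mismatch, or return None when they agree."""
    kmod = kernel_module_version(proc_text)
    lib = dso_version(library_path)
    if kmod is None or lib is None:
        return "could not determine one of the versions"
    if kmod != lib:
        return f"kernel module {kmod} does not match library {lib}"
    return None


def main(argv: List[str]) -> None:
    proc = Path("/proc/driver/nvidia/version")
    if len(argv) != 2 or not proc.exists():
        print("usage: check_nvidia.py /path/to/libnvidia-ml.so.<version> "
              "(requires a loaded nvidia module)")
        return
    print(check_mismatch(proc.read_text(), argv[1]) or "versions match")


if __name__ == "__main__":
    main(sys.argv)
```

Pass it the versioned `libnvidia-ml.so` file your system actually uses. Its location depends on how the driver was packaged. If the script reports a mismatch, the old module is still loaded even though newer libraries are installed.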
My apt logs show that yesterday morning an automated update installed the 375.39 NVIDIA drivers. Apparently the update was marked as a security update. The nvidia-367 and nvidia-375 packages are now both present in dpkg, but nvidia-367 is now described as “Transitional package for nvidia-375”.
I get this error (Failed to initialize NVML) only inside the nvidia-docker container. Outside the container, everything works fine.
Rebooting did not help. Driver version: 375.39. Any hints?
This happened to us again today. I assume the automatic package upgrade left something in memory (presumably the old kernel module) that conflicts with what is now on disk. It would be nice to fix what is in memory without rebooting, because these are servers with multiple users.
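One way to avoid a full reboot is to unload the old nvidia kernel modules and load the ones now installed on disk. This only works once nothing is using the GPU. Running containers, CUDA jobs, X, and any persistence daemon have to be stopped first (`lsof /dev/nvidia*` can help find them), otherwise `rmmod` refuses to unload the module.

The sketch below rests on two assumptions. First, `/proc/modules` lists each module as `name size refcount used_by state address`. Second, the relevant module names all start with `nvidia`. The load target `nvidia` is also an assumption, because some packaged drivers use versioned module names, so check `lsmod` and `modinfo` on your system. The sketch does a dry run and only prints the commands.

```python
from pathlib import Path
from typing import Dict, List, Set


def parse_modules(text: str) -> Dict[str, List[str]]:
    """Map each loaded module to the modules that use it (4th column of /proc/modules)."""
    used_by: Dict[str, List[str]] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        name, users = fields[0], fields[3]
        # "-" means no users; otherwise a comma-separated list with a trailing comma.
        # Markers such as "[permanent]" are not module names and are skipped.
        used_by[name] = [u for u in users.split(",")
                         if u and u != "-" and not u.startswith("[")]
    return used_by


def unload_order(used_by: Dict[str, List[str]], prefix: str = "nvidia") -> List[str]:
    """Order matching modules so each one is removed after every module using it."""
    order: List[str] = []
    seen: Set[str] = set()

    def visit(name: str) -> None:
        if name in seen:
            return
        seen.add(name)
        for user in used_by.get(name, []):
            visit(user)  # dependents must go first
        order.append(name)

    for name in sorted(used_by):
        if name.startswith(prefix):
            visit(name)
    return order


def reload_commands(order: List[str], load_module: str = "nvidia") -> List[str]:
    """Shell commands to unload the old modules, then load the installed driver.

    load_module is an assumption: the name may differ with some packaging.
    """
    commands = [f"sudo rmmod {name}" for name in order]
    commands.append(f"sudo modprobe {load_module}")
    return commands


if __name__ == "__main__":
    modules = Path("/proc/modules")
    if modules.exists():
        # Dry run: print the commands instead of executing them.
        for command in reload_commands(unload_order(parse_modules(modules.read_text()))):
            print(command)
```

The unload order comes from the `used_by` column rather than a hard-coded list. If the installed driver loads a different set of helper modules (for example no `nvidia_drm`), the order still comes out right.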