Driver 390.42 + CentOS 7.4 (kernel 3.10.0-693.21.1.el7.x86_64): nvidia-smi reports "No devices were found"

Some outputs:

# dmesg
[  301.390494] nvidia 0000:01:00.0: irq 145 for MSI/MSI-X
[  301.392867] ACPI Warning: \_SB_.PCI0.RP01.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20130517/nsarguments-95)
[  301.392892] ACPI Warning: \_SB_.PCI0.RP01.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20130517/nsarguments-95)
[  301.392905] ACPI Warning: \_SB_.PCI0.RP01.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20130517/nsarguments-95)
[  301.392923] ACPI Warning: \_SB_.PCI0.RP01.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20130517/nsarguments-95)
[  301.392935] ACPI Warning: \_SB_.PCI0.RP01.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20130517/nsarguments-95)
[  301.392956] ACPI Warning: \_SB_.PCI0.RP01.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20130517/nsarguments-95)
[  301.392968] ACPI Warning: \_SB_.PCI0.RP01.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20130517/nsarguments-95)
[  309.157112] NVRM: failed to copy vbios to system memory.
[  309.157661] NVRM: RmInitAdapter failed! (0x30:0xffff:664)
[  309.157727] NVRM: rm_init_adapter failed for device bearing minor number 0

Excerpt from strace of nvidia-smi:

open("/proc/driver/nvidia/params", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9a32fe0000
read(4, "Mobile: 4294967295\nResmanDebugLe"..., 1024) = 599
close(4)                                = 0
munmap(0x7f9a32fe0000, 4096)            = 0
stat("/dev/nvidia0", {st_mode=S_IFCHR|0666, st_rdev=makedev(195, 0), ...}) = 0
open("/dev/nvidia0", O_RDWR)            = -1 EIO (Input/output error)
ioctl(3, _IOC(_IOC_READ|_IOC_WRITE, 0x46, 0xd1, 0x0c), 0x7ffd8bdf6e00) = 0
fstat(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(136, 0), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9a32fe0000
write(1, "No devices were found\n", 22No devices were found
) = 22
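
In this trace, stat("/dev/nvidia0") succeeds (character device 195:0, mode 0666) but open() fails with EIO. So the device node exists and is accessible, and the kernel module itself rejects the open. This is consistent with the "rm_init_adapter failed for device bearing minor number 0" message in dmesg. Below is a minimal Python sketch that repeats only that open() call, so the result can be checked quickly after a driver or kernel change without running nvidia-smi under strace. probe_device is a hypothetical helper name.

```python
import errno
import os


def probe_device(path="/dev/nvidia0"):
    """Open a device node read/write, like nvidia-smi does in the strace above.

    Returns None on success, otherwise the errno name (e.g. "EIO", "ENOENT").
    """
    try:
        fd = os.open(path, os.O_RDWR)
    except OSError as exc:
        # Map the numeric errno to its symbolic name when one is known.
        return errno.errorcode.get(exc.errno, str(exc.errno))
    os.close(fd)
    return None
```

On the failing machine, probe_device() would be expected to return "EIO", matching the strace line.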
# cat /proc/driver/nvidia/params
Mobile: 4294967295
ResmanDebugLevel: 4294967295
RmLogonRC: 1
ModifyDeviceFiles: 1
DeviceFileUID: 0
DeviceFileGID: 0
DeviceFileMode: 438
UpdateMemoryTypes: 4294967295
InitializeSystemMemoryAllocations: 1
UsePageAttributeTable: 4294967295
EnableMSI: 1
MapRegistersEarly: 0
RegisterForACPIEvents: 1
CheckPCIConfigSpace: 1
EnablePCIeGen3: 0
MemoryPoolSize: 0
IgnoreMMIOCheck: 0
TCEBypassMode: 0
UseThreadedInterrupts: 1
EnableStreamMemOPs: 0
EnableBacklightHandler: 0
EnableUserNUMAManagement: 1
EnableIBMNPURelaxedOrderingMode: 0
RegistryDwords: ""
RegistryDwordsPerDevice: ""
RmMsg: ""
AssignGpus: ""
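
Some of these values are easier to read once decoded. 4294967295 is 2^32 - 1 (0xFFFFFFFF), an all-ones 32-bit value. The assumption here is that the driver uses it to mean "not set, use the default", so check NVIDIA's documentation before relying on that. DeviceFileMode is printed in decimal: 438 is octal 0666. That matches the crw-rw-rw- permissions on /dev/nvidia0 in the ls -l output further down, so device-file permissions are not the problem. Below is a minimal Python sketch of both decodings. The helper names are hypothetical.

```python
import stat

UNSET_U32 = 0xFFFFFFFF  # 4294967295, an all-ones unsigned 32-bit value


def parse_params(text):
    """Parse 'Key: value' lines from /proc/driver/nvidia/params into a dict of strings."""
    params = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            params[key.strip()] = value.strip()
    return params


def describe(params):
    """Flag all-ones values and render DeviceFileMode as permission bits."""
    notes = {}
    for key, value in params.items():
        if value.isdigit() and int(value) == UNSET_U32:
            notes[key] = "0xFFFFFFFF (assumed: unset, driver default)"
    if "DeviceFileMode" in params:
        mode = int(params["DeviceFileMode"])  # the file prints it in decimal
        # stat.filemode expects a full st_mode; S_IFCHR marks a character device.
        notes["DeviceFileMode"] = f"{oct(mode)} -> {stat.filemode(stat.S_IFCHR | mode)}"
    return notes
```

For the params above, describe reports DeviceFileMode as "0o666 -> crw-rw-rw-" and flags Mobile, ResmanDebugLevel, UpdateMemoryTypes and UsePageAttributeTable as all-ones values.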
# cat /proc/driver/nvidia/gpus/0000\:01\:00.0/information 
Model:           GeForce MX150
IRQ:             145
GPU UUID:        GPU-????????-????-????-????-????????????
Video BIOS:      ??.??.??.??.??
Bus Type:        PCIe
DMA Size:        47 bits
DMA Mask:        0x7fffffffffff
Bus Location:    0000:01:00.0
Device Minor:    0
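
The question marks in GPU UUID and Video BIOS fit the "failed to copy vbios to system memory" message. The driver knows the model name, but presumably could not read anything from the VBIOS itself. The DMA fields, by contrast, are self-consistent: 0x7fffffffffff is exactly a 47-bit mask. Below is a minimal Python sketch that runs both checks on this file. The helper names are hypothetical.

```python
def parse_info(text):
    """Parse 'Key: value' lines; only the first colon splits, so bus IDs stay intact."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def unreadable_fields(info):
    """Fields containing '?' placeholders, i.e. values the driver could not report."""
    return sorted(k for k, v in info.items() if "?" in v)


def dma_mask_matches(info):
    """Check that 'DMA Mask' is exactly (1 << bits) - 1 for the reported 'DMA Size'."""
    bits = int(info["DMA Size"].split()[0])
    mask = int(info["DMA Mask"], 16)
    return mask == (1 << bits) - 1
```

For the output above, unreadable_fields returns ['GPU UUID', 'Video BIOS'] and dma_mask_matches returns True.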
# ls -l /dev/nvidia*
crw-rw-rw-. 1 root root 195,   0 Mar 24 03:10 /dev/nvidia0
crw-rw-rw-. 1 root root 195, 255 Mar 24 03:10 /dev/nvidiactl
# lsmod | grep nvidia
nvidia_drm             39700  0 
nvidia_modeset       1104417  1 nvidia_drm
nvidia              14337655  1 nvidia_modeset
ipmi_msghandler        46608  2 ipmi_devintf,nvidia
drm_kms_helper        163265  2 i915,nvidia_drm
drm                   370825  5 i915,drm_kms_helper,nvidia_drm
i2c_core               40756  10 drm,i915,i2c_i801,i2c_hid,i2c_designware_core,i2c_designware_platform,drm_kms_helper,i2c_algo_bit,nvidia,videodev

No nouveau driver. All of the following commands return empty output:

# lsmod | grep nuve
# rpm -qa | grep nuve
# lsinitrd /boot/initramfs-`uname -r`.img | grep nuve
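
One caveat about these checks: the pattern nuve is not a substring of nouveau, so the three greps above would print nothing even if nouveau were installed or loaded. Grepping for nouveau is the reliable form. The earlier lsmod output does support the conclusion, because nouveau does not appear among the modules using drm. Below is a minimal Python sketch that checks saved lsmod output for the NVIDIA module stack and for nouveau, matching exact module names. The helper names are hypothetical.

```python
def parse_lsmod(text):
    """Map module name -> list of modules that use it, from `lsmod` output."""
    modules = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] == "Module":
            continue  # skip blank lines and the header
        users = fields[3].split(",") if len(fields) > 3 else []
        modules[fields[0]] = users
    return modules


def check_driver_stack(modules):
    """Return a list of problems with the NVIDIA module stack (empty = looks fine)."""
    problems = []
    for name in ("nvidia", "nvidia_modeset", "nvidia_drm"):
        if name not in modules:
            problems.append(f"{name} not loaded")
    if "nouveau" in modules:
        problems.append("nouveau is loaded and may claim the GPU")
    return problems
```

For the lsmod output above, check_driver_stack returns an empty list.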

Security settings

# getenforce 
Permissive
# mokutil --sb-state
SecureBoot disabled

I tried rolling back to kernel 3.10.0-693.17.1 and installing other driver versions (390.25 and 384.111), but the result is the same. Am I missing something?

Any help would be appreciated.

nvidia-bug-report.log.gz (74.2 KB)

Some users have previously reported the same error, without a solution:
https://devtalk.nvidia.com/default/topic/1028980/?comment=5236690
Did you have a working configuration previously? If so, which kernel/driver version?

Yes, I saw that thread and tried all the tips there without success, but I thought it was a different issue, since a different card is mentioned.
My device is brand new; it arrived yesterday. I assume the GPU worked on the preinstalled Windows, but I didn't test it intensively.

Please run nvidia-bug-report.sh as root and attach the resulting nvidia-bug-report.log.gz file to your post. Hovering the mouse over an existing post will reveal a paperclip icon.

Thanks for the tip. I've attached nvidia-bug-report.log.gz.

Nothing obvious. There are two kernel parameters you could try:
iommu=off
nvidia-drm.modeset=1
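
To try these on CentOS 7, the parameters have to reach the kernel command line before the next boot. One way is grubby --update-kernel=ALL --args="iommu=off nvidia-drm.modeset=1". Another is to edit GRUB_CMDLINE_LINUX in /etc/default/grub and regenerate the GRUB config with grub2-mkconfig. Below is a minimal Python sketch that checks whether they actually took effect by reading /proc/cmdline. missing_params is a hypothetical helper. It compares whole tokens and treats "-" and "_" in parameter names as equivalent, as the kernel does, so nvidia_drm.modeset=1 also counts.

```python
WANTED = ("iommu=off", "nvidia-drm.modeset=1")


def _normalize(token):
    # The kernel treats '-' and '_' in parameter names as equivalent.
    name, sep, value = token.partition("=")
    return name.replace("-", "_") + sep + value


def missing_params(cmdline, wanted=WANTED):
    """Return the wanted boot parameters not present in a kernel command line."""
    present = {_normalize(tok) for tok in cmdline.split()}
    return [p for p in wanted if _normalize(p) not in present]


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            print(missing_params(f.read()) or "all parameters present")
    except OSError:
        print("/proc/cmdline is not available on this system")
```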

Thanks for the tip, but I fixed it by switching to Fedora.
I have been using CentOS in production for a long time, serving clusters with NVIDIA GPUs, so I was sure I would be able to get another NVIDIA GPU working on my laptop. But apparently consumer GPUs are not like Teslas. I think I could have used kernel-rt from external repos for CentOS, but I decided to switch completely.

So now I have my own sandbox:

2018-03-26 15:45:04.757354: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce MX150, pci bus id: 0000:01:00.0)

Could you tell us which kernel and gcc versions you're running now? That would give other users hitting the same issue more information.

Yes, sure

$ uname -r
4.15.10-300.fc27.x86_64

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/7/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-redhat-linux
Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,objc,obj-c++,fortran,ada,go,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --with-isl --enable-libmpx --enable-offload-targets=nvptx-none --without-cuda-driver --enable-gnu-indirect-function --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 7.3.1 20180303 (Red Hat 7.3.1-5) (GCC)

Everything works out of the box.
The only packages I needed to install to meet the driver's requirements were elfutils-libelf-devel and libglvnd*.
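
For anyone reproducing this setup, below is a small Python sketch that checks for those prerequisites on an RPM-based system. It assumes the list of installed package names comes from rpm -qa --qf '%{NAME}\n'. The libglvnd* entry is treated as a glob that any matching package satisfies, because the exact libglvnd packages installed were not listed. The helper names are hypothetical.

```python
import fnmatch
import subprocess

# Prerequisites named in this thread; "libglvnd*" is a shell-style glob.
REQUIRED = ("elfutils-libelf-devel", "libglvnd*")


def installed_packages():
    """Return installed package names as reported by rpm (requires rpm on PATH)."""
    result = subprocess.run(
        ["rpm", "-qa", "--qf", "%{NAME}\n"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()


def missing_packages(installed, required=REQUIRED):
    """Return the required names/patterns that no installed package matches."""
    return [pat for pat in required if not fnmatch.filter(installed, pat)]
```

missing_packages(installed_packages()) returns an empty list when both prerequisites are present. Anything it returns can be installed with dnf on Fedora or yum on CentOS.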