Thanks a lot for your reply!
Indeed this is a system with bbswitch / bumblebee.
To rule this out, I have now removed the bumblebee service from my init system, and bbswitch is never loaded either, i.e. I get no output from:
$ dmesg | grep bbs
$ lsmod | grep bbs
The same holds for nvidia, since I load neither the driver nor nvidia-persistenced.
lspci says:
01:00.0 3D controller: NVIDIA Corporation GM107M [GeForce GTX 960M] (rev a2)
Subsystem: CLEVO/KAPOK Computer GM107M [GeForce GTX 960M]
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Interrupt: pin A routed to IRQ 16
Region 0: [virtual] Memory at f5000000 (32-bit, non-prefetchable)
Region 1: Memory at e0000000 (64-bit, prefetchable)
Region 3: Memory at f0000000 (64-bit, prefetchable)
Region 5: I/O ports at e000
[virtual] Expansion ROM at f6000000 [disabled]
Capabilities: [60] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [78] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <1us, L1 <4us
ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR+, OBFF Via message
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Kernel modules: nvidia_drm, nvidia
So the card is in D0 state.
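For anyone who wants to check the same thing without reading the whole dump, here is a minimal sketch that pulls the power state out of `lspci -vv` output. The helper name `pm_state` is my own, and it assumes the Power Management "Status:" line looks like the one above; on many systems lspci only shows the capabilities when run as root.

```shell
# Hypothetical helper: print the PCI power state (e.g. D0 or D3) found in
# `lspci -vv` output read from stdin. Only the Power Management line
# "Status: D<n> ..." matches; the generic "Status: Cap+ ..." line does not.
pm_state() {
    sed -n 's/^[[:space:]]*Status: \(D[0-3][a-z]*\) .*/\1/p' | head -n 1
}

# Example usage (assumes the card sits at 01:00.0, as in this post):
#   lspci -s 01:00.0 -vv | pm_state
```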
Now:
$ modprobe nvidia
modprobe: ERROR: could not insert 'nvidia': No such device
$ dmesg
...
[ 309.870280] nvidia 0000:01:00.0: enabling device (0000 -> 0003)
[ 309.870408] NVRM: The NVIDIA GPU 0000:01:00.0 (PCI ID: 10de:139b)
NVRM: installed in this system is not supported by the 370.28
NVRM: NVIDIA Linux driver release. Please see 'Appendix
NVRM: A - Supported NVIDIA GPU Products' in this release's
NVRM: README, available on the Linux driver download page
NVRM: at www.nvidia.com.
[ 309.870465] nvidia: probe of 0000:01:00.0 failed with error -1
[ 309.870508] nvidia-nvlink: Nvlink Core is being initialized, major device number 244
[ 309.870530] NVRM: The NVIDIA probe routine failed for 1 device(s).
[ 309.870531] NVRM: None of the NVIDIA graphics adapters were initialized!
[ 309.870533] nvidia-nvlink: Unregistered the Nvlink Core, major device number 244
[ 309.870714] NVRM: NVIDIA init module failed!
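The two facts that matter in this log are the PCI ID and the driver release that rejects it. As a rough sketch (the function name `nvrm_unsupported` is made up, and it assumes the NVRM message is worded exactly as above), they can be extracted from dmesg like this:

```shell
# Hypothetical helper: read kernel log text on stdin and print the PCI ID and
# driver release named in NVRM "not supported" messages.
nvrm_unsupported() {
    sed -n \
        -e 's/.*(PCI ID: \([0-9a-fA-F]\{4\}:[0-9a-fA-F]\{4\}\)).*/pci_id=\1/p' \
        -e 's/.*not supported by the \([0-9][0-9.]*\).*/driver=\1/p'
}

# Example usage:
#   dmesg | nvrm_unsupported
```

The resulting ID can then be compared by hand against Appendix A of the README for that driver release.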
Other ideas?
This is a Gentoo Linux system with kernel 4.8.0. Some months ago, with older kernels and drivers, the bumblebee setup did work. Sadly, I did not write down the exact versions it worked with…
EDIT: Just to confirm: after that test I loaded bbswitch manually (with its state options set to -1, so it should not touch the card's power state) and tried to load nvidia again. I get:
[ 300.875227] bbswitch: version 0.8
[ 300.875231] bbswitch: Found integrated VGA device 0000:00:02.0: \_SB_.PCI0.GFX0
[ 300.875235] bbswitch: Found discrete VGA device 0000:01:00.0: \_SB_.PCI0.PEG0.PEGP
[ 300.875244] ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160422/nsarguments-95)
[ 300.875322] bbswitch: detected an Optimus _DSM function
[ 301.331999] pci 0000:01:00.0: enabling device (0000 -> 0003)
[ 301.332040] bbswitch: Succesfully loaded. Discrete card 0000:01:00.0 is on
[ 311.763633] NVRM: The NVIDIA GPU 0000:01:00.0 (PCI ID: 10de:139b)
NVRM: installed in this system is not supported by the 370.28
NVRM: NVIDIA Linux driver release. Please see 'Appendix
NVRM: A - Supported NVIDIA GPU Products' in this release's
NVRM: README, available on the Linux driver download page
NVRM: at www.nvidia.com.
[ 311.763641] nvidia: probe of 0000:01:00.0 failed with error -1
[ 311.763666] nvidia-nvlink: Nvlink Core is being initialized, major device number 244
[ 311.763688] NVRM: The NVIDIA probe routine failed for 1 device(s).
[ 311.763688] NVRM: None of the NVIDIA graphics adapters were initialized!
[ 311.763690] nvidia-nvlink: Unregistered the Nvlink Core, major device number 244
[ 311.763781] NVRM: NVIDIA init module failed!