GPU fallen off the bus while running more demanding demos or games

When I try to run more demanding games or demos, the computer hangs with a “GPU has fallen off the bus” error in the kernel log. Total War: Warhammer from Steam in particular hangs every time I try to start gameplay. Unigine Valley hangs after some time. Other games hang at random. Less demanding games, like Minecraft, are fine.

My machine is a Eurocom Sky X7E2 laptop with a GTX 1080. I have watched nvidia-settings and xsensors for the various temperatures while running games or demos. Cooling does not seem to be the problem, because it hangs with the GPU temperature at or below 70 degrees Celsius.

I created an nvidia-bug-report over ssh after a hang (this was possible because the system stays alive underneath the crashed graphics).
nvidia-bug-report.log.gz (295 KB)

Which game/demo did you run before creating the bug report?
You already ruled out temperature problems from the CPU/GPU. XID 79 might also result from insufficient power. According to a review, Eurocom designed the PSU (330W) for the maximum power draw of the system, without reserves. So if the PSU heats up, its efficiency goes down and it might not be able to supply the GPU during power peaks. It’s just a guess, but maybe clean it and make sure it gets enough airflow.
Then there are also XID 16 entries visible in your logs from previous boots. Did those come from TW:WH, or does that game also result in XID 79?
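
To check the power theory, it can help to record power draw, temperature and performance state right up to the hang. Below is a minimal sketch in Python, assuming `nvidia-smi` is installed and that the GPU reports these query fields (some laptop GPUs return `[N/A]` for `power.draw`). The helper names `parse_sample` and `log_gpu` and the log file name are my own, not part of any NVIDIA tool.

```python
import csv
import io
import subprocess

# Query fields passed to `nvidia-smi --query-gpu`. Whether power.draw is
# actually reported depends on the GPU; this is an assumption to verify.
FIELDS = ["timestamp", "pstate", "temperature.gpu", "power.draw", "clocks.sm"]


def parse_sample(line):
    """Turn one CSV line from nvidia-smi into a dict; unreadable numbers become None."""
    values = [v.strip() for v in next(csv.reader(io.StringIO(line)))]
    sample = dict(zip(FIELDS, values))
    for key in ("temperature.gpu", "power.draw", "clocks.sm"):
        try:
            sample[key] = float(sample.get(key))
        except (TypeError, ValueError):
            sample[key] = None  # e.g. "[N/A]" or a truncated line
    return sample


def log_gpu(path="gpu_power.log", interval_s=1):
    """Append one raw sample per interval to `path` until nvidia-smi exits."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(FIELDS),
        "--format=csv,noheader,nounits",
        "-l", str(interval_s),
    ]
    # Line buffering, so the last samples before a hang reach the file.
    with open(path, "a", buffering=1) as out:
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
        for line in proc.stdout:
            out.write(line)
```

Calling `log_gpu()` from an ssh session before starting the game leaves a log whose last lines show the state just before the GPU drops off the bus. `parse_sample` can then be used to look for power peaks in that file.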

It was the mentioned TW:WH.

The laptop is almost new (bought less than three months ago). The PSU is clean and only moderately warm. The laptop ventilation is also clean. I do see a temperature of about 64 degrees Celsius on the pch_skylake sensor, however. I don’t know whether that is fine.

The XID 16 is not from any game; the problem with TW:WH is related to XID 79. I don’t know what caused those XID 16 errors.

I forgot to mention that I’m using it mostly for Blender Cycles rendering, and that works fine. There was maybe one strange hang in these three months, but I don’t remember it exactly. And rendering puts a heavy load on the GPU as well. One difference may be that there are no big data transfers during the heavy load: the whole scene is loaded into GPU memory before the render starts.

One more thing: I just noticed that Firefox froze for some time after I submitted my last post, and this XID 16 appeared in the kernel log:

[16168.908250] perf: interrupt took too long (2505 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[23435.807422] NVRM: GPU at PCI:0000:01:00: GPU-b1eb6e84-5bfd-82f6-e93f-1e22c22d0f69
[23435.807449] NVRM: GPU Board Serial Number: 
[23435.807454] NVRM: Xid (PCI:0000:01:00): 16, Head 00000000 Count 00469584
[23445.023469] NVRM: Xid (PCI:0000:01:00): 16, Head 00000000 Count 00469585
[23453.215581] NVRM: Xid (PCI:0000:01:00): 16, Head 00000000 Count 00469586
[23461.407619] NVRM: Xid (PCI:0000:01:00): 16, Head 00000000 Count 00469587
[23469.599697] NVRM: Xid (PCI:0000:01:00): 16, Head 00000000 Count 00469588
[23477.791757] NVRM: Xid (PCI:0000:01:00): 16, Head 00000000 Count 00469589
[23485.983859] NVRM: Xid (PCI:0000:01:00): 16, Head 00000000 Count 0046958a
[23494.175919] NVRM: Xid (PCI:0000:01:00): 16, Head 00000000 Count 0046958b

There was no heavy load on the GPU at that time; only Firefox was open.
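
To tell the XID 16 and XID 79 events apart across a long kernel log, the `NVRM: Xid` lines can be pulled out and counted per code. This is a minimal sketch in Python, assuming the line format shown in the excerpt above. The function names `parse_xids` and `summarize` are my own.

```python
import re
from collections import Counter

# Matches lines such as:
# [23435.807454] NVRM: Xid (PCI:0000:01:00): 16, Head 00000000 Count 00469584
XID_RE = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+NVRM: Xid \((?P<pci>[^)]+)\): (?P<xid>\d+)"
)


def parse_xids(log_text):
    """Return (seconds_since_boot, pci_address, xid_code) for every Xid line."""
    events = []
    for line in log_text.splitlines():
        m = XID_RE.search(line)
        if m:
            events.append((float(m["time"]), m["pci"], int(m["xid"])))
    return events


def summarize(events):
    """Count how often each Xid code occurs."""
    return Counter(xid for _, _, xid in events)
```

Feeding it the text of `dmesg` or of the kernel log inside the bug report gives a per-code count, which makes it easier to see whether the XID 79 events line up with the game sessions and the XID 16 events with the browser.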

I think the XID 16 errors are more likely related to the known problems with WebGL and the 390.25 driver.

So it is probably not related to my main problem (the “fallen off the bus” one).

Yes, I think it’s unrelated.
One thing you could try for the XID 79 would be to set PowerMizer from adaptive to maximum performance, to see whether switching states is part of the problem.
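
The PowerMizer switch can also be made from a script and then read back, which shows whether the driver accepted it. This is a minimal sketch in Python, assuming the `nvidia-settings` attributes `GPUPowerMizerMode` and `GPUCurrentPerfLevel` are available with this driver. The numeric mode values are my understanding of the nvidia-settings headers and should be checked locally, and the helper names are my own.

```python
import subprocess

# Assumed values of the GPUPowerMizerMode attribute; verify on your system
# with `nvidia-settings -q "[gpu:0]/GPUPowerMizerMode"`.
POWERMIZER_MODES = {"adaptive": 0, "max_performance": 1, "auto": 2}


def powermizer_command(mode, gpu_index=0):
    """Build the nvidia-settings call that assigns a PowerMizer mode."""
    value = POWERMIZER_MODES[mode]
    return ["nvidia-settings", "-a", f"[gpu:{gpu_index}]/GPUPowerMizerMode={value}"]


def query_command(attribute, gpu_index=0):
    """Build the nvidia-settings call that prints one attribute tersely."""
    return ["nvidia-settings", "-q", f"[gpu:{gpu_index}]/{attribute}", "-t"]


def run(cmd):
    """Run a command and return its stdout; errors are not raised here."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout.strip()
```

For example, `run(powermizer_command("max_performance"))` followed by `run(query_command("GPUPowerMizerMode"))` and `run(query_command("GPUCurrentPerfLevel"))` shows whether the mode was applied and which performance level the GPU reports. These calls need a running X session, so over ssh the `DISPLAY` variable has to point at it.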

I have tried it, but it behaves strangely. nvidia-settings shows me the maximum performance level (level 4) in all cases. But the fans are quiet without heavy load and become noisy once I run something on the GPU, as expected. If I change the mode to maximum performance, everything behaves the same way. If I close the settings and reopen them, “Auto” mode is back, even if I use “Save Current Configuration”.

It seems that those settings aren’t “wired” to the real GPU. And I have another problem: backlight control does not work for me, with similar symptoms. I can change the backlight level from the GUI, but the real backlight does not change. Is it possible that both problems are caused by some BIOS bug? My local Eurocom seller wrote to me that on Windows the backlight problem can be solved by a BIOS upgrade.

UPDATE> I installed Win10 and checked the 3D workloads there. Everything (TW:WH, Furmark, 3DMark) works without any problem. So the hardware is probably healthy and the problem is somewhere in the Linux driver.

I think we are having the same issue. My post is at:

We are playing different games, but we are both experiencing the same problem. This actually reminds me a lot of a problem with nvidia and Linux back in 2013/14. Although not the exact same problem, it also affected laptops and the error message was exactly the same (GPU has fallen off the bus).

I hope someone from Nvidia can have a look at both your bug report and mine.

As I mentioned in my post, I didn’t have this problem a few months back. So I am suspecting it might be a kernel update that has exposed this issue. In the coming days, I will see if I have the time to test older kernels to see if the problem appears there as well.

A good starting point to test for a driver regression would be 375.66 and kernel 4.9.x; a lot changed within the driver after that.

So it was a faulty GPU in my case as well. One year ago it ended up as “problem with Linux, unable to reproduce in Windows”. Over the Christmas holiday I tried to play a game under Windows and finally experienced the same problem there (I had been unable to reproduce it under Windows before). Last week I received a replacement GPU, and now it works without any problems.