reproducible: NVRM: GPU at 0000:01:00.0 has fallen off the bus. -- Both screens black, Xorg at 100%

What is the probability of getting a useful suggestion after posting the requested log files?

1.00000000000000000
Depends on the quantum flux in GPU 0.
Slim to nothin.
What are log files?
What are suggestions?
My laptop has an external monitor attached: ViewSonic VX2439 as DFP-1

The screens both go black simultaneously. Often this seems to be triggered
by the use of the scroll button on the mouse. I am not able to reproduce the
issue at will. Unlike other reports, I do not see the mouse cursor after the
problem occurs. The system does not respond to Ctrl-Alt-Fnum, so I am unable
to debug from a console vTTY. I am able to SSH into the system afterward.
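Since SSH still works after the hang, the "Xorg at 100%" symptom can be confirmed from the remote shell. Below is a minimal Python sketch. It assumes a Linux /proc filesystem and that you pass in the X server's PID, for example from "pidof X". The function names are hypothetical. The sketch samples utime + stime from /proc/<pid>/stat twice and converts the difference into a percentage of one CPU core.

```python
import os
import time


def cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # Field 2 (the command name) is wrapped in parentheses and may itself
    # contain spaces or parentheses, so only split after the last ")".
    rest = stat_line[stat_line.rindex(")") + 2:].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(rest[11]) + int(rest[12])


def cpu_percent(pid, interval=2.0):
    """Sample a process twice and return its CPU use as a percentage of one core."""
    ticks_per_second = os.sysconf("SC_CLK_TCK")
    path = "/proc/%d/stat" % pid
    with open(path) as f:
        before = cpu_ticks(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = cpu_ticks(f.read())
    return 100.0 * (after - before) / ticks_per_second / interval
```

A value near 100 from cpu_percent(pid) would match what top shows for an X server stuck spinning on a single core.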

When I upgraded to Ubuntu 12.04, I was using the "current" drivers from Canonical,
i.e. 295.40. However, those did not support my external monitor well, so I switched
to the 310.32 Linux drivers from NVIDIA. That is when I first encountered this
issue with both screens going blank. At that point, I decided to drop back to the
"experimental" NVIDIA drivers from Canonical, 310.14, where things stand today. I am still
seeing the problem on 310.14.


The Xorg log file shows:

[ 60.812] (EE) NVIDIA(GPU-0): Failed detecting connected display devices
[ 68.880] [mi] EQ overflowing. Additional events will be discarded until existing events are processed.
[ 68.880]
Backtrace:
[ 68.968] 0: /usr/bin/X (xorg_backtrace+0x26) [0x7ff5441a59e6]
[ 68.968] 1: /usr/bin/X (mieqEnqueue+0x263) [0x7ff5441860c3]
[ 68.968] 2: /usr/bin/X (0x7ff54401d000+0x62a34) [0x7ff54407fa34]
[ 68.968] 3: /usr/lib/xorg/modules/input/evdev_drv.so (0x7ff53c576000+0x5d88) [0x7ff53c57bd88]
[ 68.968] 4: /usr/bin/X (0x7ff54401d000+0x8af37) [0x7ff5440a7f37]
[ 68.968] 5: /usr/bin/X (0x7ff54401d000+0xb0d3a) [0x7ff5440cdd3a]
[ 68.968] 6: /lib/x86_64-linux-gnu/libpthread.so.0 (0x7ff543343000+0xfcb0) [0x7ff543352cb0]
[ 68.968] 7: (vdso) (0x7fff4af8f000+0x7dc) [0x7fff4af8f7dc]
[ 68.968] 8: (vdso) (__vdso_gettimeofday+0x2b) [0x7fff4af8fa1b]
[ 68.968] 9: /usr/lib/x86_64-linux-gnu/xorg/extra-modules/nvidia_drv.so (0x7ff53ce3b000+0xed1fe) [0x7ff53cf281fe]
[ 68.968] 10: /usr/lib/x86_64-linux-gnu/xorg/extra-modules/nvidia_drv.so (0x7ff53ce3b000+0x7c1ae) [0x7ff53ceb71ae]
[ 68.968] 11: /usr/lib/x86_64-linux-gnu/xorg/extra-modules/nvidia_drv.so (0x7ff53ce3b000+0xf3ce6) [0x7ff53cf2ece6]
[ 68.968] 12: /usr/lib/x86_64-linux-gnu/xorg/extra-modules/nvidia_drv.so (0x7ff53ce3b000+0x4a0952) [0x7ff53d2db952]
[ 68.968] 13: /usr/lib/x86_64-linux-gnu/xorg/extra-modules/nvidia_drv.so (0x7ff53ce3b000+0x4a0bcb) [0x7ff53d2dbbcb]
[ 68.968] 14: /usr/lib/x86_64-linux-gnu/xorg/extra-modules/nvidia_drv.so (0x7ff53ce3b000+0x49d03a) [0x7ff53d2d803a]
[ 68.968] 15: /usr/lib/x86_64-linux-gnu/xorg/extra-modules/nvidia_drv.so (0x7ff53ce3b000+0x49c584) [0x7ff53d2d7584]
[ 68.968] 16: /usr/bin/X (0x7ff54401d000+0xcd011) [0x7ff5440ea011]
[ 68.968] 17: /usr/lib/x86_64-linux-gnu/xorg/extra-modules/nvidia_drv.so (0x7ff53ce3b000+0x476b07) [0x7ff53d2b1b07]
[ 68.968] 18: /usr/lib/x86_64-linux-gnu/xorg/extra-modules/nvidia_drv.so (0x7ff53ce3b000+0xb6800) [0x7ff53cef1800]
[ 68.968] 19: /usr/lib/x86_64-linux-gnu/xorg/extra-modules/nvidia_drv.so (0x7ff53ce3b000+0xb6d95) [0x7ff53cef1d95]
[ 68.968] 20: /usr/bin/X (xf86Wakeup+0x192) [0x7ff5440a86f2]
[ 68.968] 21: /usr/bin/X (WakeupHandler+0x6b) [0x7ff54406f7eb]
[ 68.968] 22: /usr/bin/X (WaitForSomething+0x1b6) [0x7ff5441a2de6]
[ 68.968] 23: /usr/bin/X (0x7ff54401d000+0x4e5f2) [0x7ff54406b5f2]
[ 68.968] 24: /usr/bin/X (0x7ff54401d000+0x3d7ba) [0x7ff54405a7ba]
[ 68.968] 25: /lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main+0xed) [0x7ff5421d476d]
[ 68.968] 26: /usr/bin/X (0x7ff54401d000+0x3daad) [0x7ff54405aaad]
[ 68.968] [mi] These backtraces from mieqEnqueue may point to a culprit higher up the stack.
[ 68.968] [mi] mieq is *NOT* the cause. It is a victim.
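The last two log lines say the culprit is higher up the stack than mieqEnqueue. Below is a rough Python sketch for reading a backtrace like the one above. It parses each numbered frame and collapses consecutive frames from the same module, which makes the run of nvidia_drv.so frames (9 through 19 here) stand out. The regular expression was written against the lines shown in this thread, so treat it as an assumption about the general format. Feed it one backtrace at a time, because frame numbers restart with each new backtrace.

```python
import os
import re

# Matches frames such as:
# [ 68.968] 9: /usr/lib/.../nvidia_drv.so (0x7ff53ce3b000+0xed1fe) [0x7ff53cf281fe]
FRAME_RE = re.compile(r"^\[\s*[\d.]+\]\s+(\d+):\s+(\S+)\s+\(([^)]*)\)")


def parse_frames(lines):
    """Return (frame_number, module_basename, symbol_or_offset) for each frame line."""
    frames = []
    for line in lines:
        m = FRAME_RE.match(line.strip())
        if m:
            num, module, where = m.groups()
            frames.append((int(num), os.path.basename(module), where))
    return frames


def module_runs(frames):
    """Collapse consecutive frames from the same module into (module, first, last)."""
    runs = []
    for num, module, _ in frames:
        if runs and runs[-1][0] == module:
            runs[-1] = (module, runs[-1][1], num)
        else:
            runs.append((module, num, num))
    return runs
```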

#1
Posted 03/28/2013 01:52 PM   
More information...

My GPU is a Quadro FX 2800M with VBIOS 62.92.a2.00.09.

My problem seems most similar to this one:

https://devtalk.nvidia.com/default/topic/535519/linux/x-hangs-using-100-cpu-wait-and-mieq-overflowing-errors-in-logs/

#2
Posted 03/28/2013 01:55 PM   
nvidia-bug-report.log.gz and nvidia-installer.log attached to first post...

#3
Posted 03/28/2013 01:57 PM   
This "topic" was originally posted on 2013-Mar-28, making it exactly 14 days old now.
No reply... This is in spite of my having done my level best to follow Aaron's instructions
for constructively reporting an issue.

#4
Posted 04/11/2013 10:26 PM   
Apparently there are three other topics and four other forum users with a very
similar behavior pattern and/or Xorg.0.log entries.

Ahktenzero

https://devtalk.nvidia.com/default/topic/535519/linux/x-hangs-using-100-cpu-wait-and-mieq-overflowing-errors-in-logs/


Khertan and Franster

https://devtalk.nvidia.com/default/topic/524502/linux/frequent-freeze-crash-of-xorg-with-drivers-310-19-with-gts-250-on-3-2-0-4-amd64/


KenPDX

https://devtalk.nvidia.com/default/topic/534892/linux/x-freeze-with-eq-overflows/



So, if five of us on varying hardware bothered to log in and report the issue, how many others
are searching for answers (without the help of the forum search widget) and not finding any?


Regards,

Cryptor

#5
Posted 04/11/2013 10:49 PM   
Xorg hang/freeze yesterday with the same mieq overflow reported in Xorg.0.log.

I have dropped back to nvidia 304.88, which is now the recommended nvidia driver
for Ubuntu 12.04. Not very hopeful this will be a fix because ahktenzero reported
going back to 304.64 and still having the problem.

Post #5 here:
https://devtalk.nvidia.com/default/topic/535519/linux/x-hangs-using-100-cpu-wait-and-mieq-overflowing-errors-in-logs/

This in spite of the 304.51 release notes:

"Fixed a bug that caused the X server to sometimes hang in response to input events."
http://www.nvidia.com/object/linux-display-amd64-304.51-driver

Sounds a lot like the mieq overflow issue, but apparently not the same.


Cryptor

#6
Posted 04/12/2013 10:13 PM   
Approx. 2 weeks later. So far no hang (no black screens) on 304.88...

I should also mention that I switched to a much simpler "xorg.conf" two weeks ago.
The previous xorg.conf referred to DFP-0 and DFP-1 with layout settings. I renamed that
one, booted into failsafe graphics and then ran the NVIDIA settings tool to generate a new,
default xorg.conf. This was after switching to the recommended 304.88 driver (via
System Settings -- Additional Drivers).

Here is the new xorg.conf that has been stable with 304.88 for two weeks on my box.

Section "Screen"
    Identifier "Default Screen"
    DefaultDepth 24
EndSection

Section "Module"
    Load "glx"
EndSection

Section "Device"
    Identifier "Default Device"
    Driver "nvidia"
    Option "NoLogo" "True"
EndSection


From the looks of it, the Xorg server is now having to find the screens and determine
the layout on its own or through the nvidia driver. Does not seem to be related to the
problem at hand...


Cryptor

#7
Posted 04/25/2013 08:06 PM   
As expected, the Xorg hang with both screens black is still present with NVIDIA drivers 304.88.
Latest nvidia-bug-report.log.gz attached...


Cryptor

#8
Posted 04/28/2013 05:05 PM   
@Cryptor - I see you're on Ubuntu. I originally posted a bug for this on Launchpad back in November 2012, but no joy there either...

https://bugs.launchpad.net/ubuntu/+source/nvidia-graphics-drivers/+bug/1077616

#9
Posted 05/30/2013 08:38 PM   
After switching to 304.88, which is currently [Recommended] for Ubuntu 12.04, I have
not seen many "EQ overflowing" type freezes. My impression is that either not running
VMware Workstation or keeping it maximized on a separate virtual desktop improves the
odds. Of course, that is not a scientific observation in any way, shape or form.

However, a recent kernel update rendered the system fairly unstable. Generally, I
would get both screens black either just after login or within an hour or so. I was
still able to SSH into the system and run terminal commands, but there was nothing
on the displays and no access to the virtual consoles (Ctrl-Alt-Fn). In this situation,
Xorg.0.log did not show the "EQ overflowing" error. It seems to show no error until after
the problem has occurred.

[ 76.074] (**) NVIDIA(0): device ViewSonic VX2439 Series (DFP-1) (Using EDID
[ 76.074] (**) NVIDIA(0): frequencies has been enabled on all display devices.)
[ 8654.964] (WW) NVIDIA(0): WAIT (2, 6, 0x8000, 0xdfff2fff, 0x0000f518)
[ 8661.964] (WW) NVIDIA(0): WAIT (1, 6, 0x8000, 0xdfff2fff, 0x0000f518)
[ 8661.964] (EE) NVIDIA(GPU-0): Failed detecting connected display devices
[ 8672.967] (WW) NVIDIA(0): WAIT (2, 7, 0x8000, 0xdfff2fff, 0x0000f570)
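A small Python filter over Xorg.0.log can help line up these WAIT and (EE) entries against the time the screens went black. This sketch assumes the "[ seconds] message" layout shown above. The marker list is a hypothetical choice taken from the excerpts in this thread. Note that "(WW) NVIDIA" will also catch harmless warnings logged at startup.

```python
import re

# Xorg.0.log lines start with "[ <seconds>]" followed by the message.
LINE_RE = re.compile(r"^\[\s*([\d.]+)\]\s*(.*)$")

# Hypothetical selection of markers, taken from the log excerpts in this thread.
MARKERS = ("(EE)", "(WW) NVIDIA", "EQ overflowing")


def interesting(lines, markers=MARKERS):
    """Yield (timestamp, message) for every log line containing one of the markers."""
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m and any(marker in m.group(2) for marker in markers):
            yield float(m.group(1)), m.group(2)


def relative(hits):
    """Re-express timestamps as seconds since the first matching entry."""
    if not hits:
        return []
    t0 = hits[0][0]
    return [(round(t - t0, 3), message) for t, message in hits]
```

For example, open /var/log/Xorg.0.log and pass the file to interesting(), then pass the resulting list to relative(). In the excerpt above, the last WAIT comes about 18 seconds after the first one.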

I'll try to attach the latest "nvidia-bug-report.log.gz"...


A little searching led me to this thread.

http://ubuntuforums.org/showthread.php?t=2165400

It turns out that the scripts to relink the NVIDIA drivers with the new kernel are not
quite bulletproof.

So, sometimes the following can help after a kernel upgrade.


$ sudo dpkg-reconfigure nvidia-current-updates

or

$ sudo dpkg-reconfigure nvidia-current
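Before and after running dpkg-reconfigure, it may be worth confirming that an NVIDIA kernel module was actually built for the running kernel. This Python sketch assumes the module lands somewhere under /lib/modules/<release>/ with a file name that starts with "nvidia" and ends in ".ko". On Ubuntu, DKMS typically places it under updates/dkms/, but treat the exact path and file name as assumptions. An empty result after a kernel upgrade would be consistent with the relink problem described above.

```python
import os
import platform


def nvidia_modules(release=None, root="/lib/modules"):
    """Return paths of nvidia*.ko files built for a kernel release.

    Defaults to the running kernel (the same string "uname -r" prints).
    """
    release = release or platform.release()
    found = []
    # os.walk yields nothing if the release directory does not exist.
    for dirpath, _dirs, files in os.walk(os.path.join(root, release)):
        for name in files:
            if name.startswith("nvidia") and name.endswith(".ko"):
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```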


I hope that helps someone with a system that worked (better) before they ran software
updates...



Cryptor

#10
Posted 08/08/2013 02:28 PM   
The saga continues...

After "sudo dpkg-reconfigure nvidia-current-updates" brought some semblance
of stability back yesterday (see previous post), I now seem to be getting "[mi] EQ overflowing"
again.

Today I find out that the error:

[mi] EQ overflowing

can be caused by many, many issues in the Xorg server. That being the case,
the root cause for today's crash could conceivably be different from the earlier
crashes.

So, this latest variant shows up in "dmesg" or /var/log/syslog as:

[99101.294734] NVRM: GPU at 0000:01:00.0 has fallen off the bus.
[99101.294742] NVRM: GPU at 0000:01:00.0 has fallen off the bus.
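The two lines above record a single event that was logged twice. This Python sketch pulls these events out of saved dmesg or syslog text and collapses the near-duplicates. The one-second window is an arbitrary, hypothetical choice. The timestamp is in seconds since boot, so 99101 s corresponds to roughly 27.5 hours of uptime.

```python
import re

FALLEN_RE = re.compile(r"^\[\s*([\d.]+)\]\s+NVRM: GPU at (\S+) has fallen off the bus\.")


def fallen_off_bus(lines, window=1.0):
    """Return (seconds_since_boot, pci_address) for each distinct event."""
    events = []
    for line in lines:
        m = FALLEN_RE.match(line.strip())
        if not m:
            continue
        t, addr = float(m.group(1)), m.group(2)
        # In the excerpts in this thread the message appears twice within
        # microseconds; treat repeats for the same GPU inside the window as one.
        if events and events[-1][1] == addr and t - events[-1][0] < window:
            continue
        events.append((t, addr))
    return events
```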


There is discussion of a somewhat similar issue here:

https://devtalk.nvidia.com/default/topic/567297/linux/linux-3-10-driver-crash/1

However, I'm on Ubuntu rather than Arch, and my kernel is different:

Linux cryptor-m6500 3.2.0-51-generic #77-Ubuntu SMP Wed Jul 24 20:18:19 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

Here is another possibly related thread.

http://www.nvnews.net/vbulletin/showthread.php?p=2571522

However, that thread seems focused on games running under Wine, which is a different
workload from mine.

Frankly, given the various topics that match a search on this new error message,
it may not point to a single root cause either.


As far as mitigation goes, I am going to enable persistence mode on the GPU and switch to
"Ubuntu 2D" rather than the 3D session. I believe that Ubuntu 2D does not use
"compiz", which may provide a short-term workaround. My unscientific guess is that
it will put less load on the GPU and possibly avoid the scenario that triggers the
root cause. [Waves hands in the air...]

Here is one of the links suggesting that persistence be enabled. I will caution
that several posts indicate that persistence did not help. YMMV

http://www.cyberciti.biz/faq/debian-ubuntu-rhel-fedora-linux-nvidia-nvrm-gpu-fallen-off-bus/
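The posts above do not say how persistence was enabled. One way to do it on this generation of drivers is the "-pm" option of nvidia-smi, run as root. This Python sketch builds that command and runs it unless dry_run is set. The wrapper itself is hypothetical. The setting may also need to be re-applied after each reboot.

```python
import subprocess


def set_persistence_mode(enabled=True, gpu=None, dry_run=False):
    """Build (and unless dry_run, run) an nvidia-smi persistence-mode command.

    Must be run as root for the real call to succeed.
    """
    cmd = ["nvidia-smi"]
    if gpu is not None:
        cmd += ["-i", str(gpu)]  # restrict to one GPU index
    cmd += ["-pm", "1" if enabled else "0"]
    if not dry_run:
        subprocess.check_call(cmd)
    return cmd
```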


In summary:

- still getting random Xorg crashes with both screens black
- Xorg goes to 100% CPU
- can still login via SSH afterward
- still using ViewSonic VX2439 as a second, external monitor along with the LCD on the laptop
- not running any games, mostly just VMWare Workstation, Chrome, Firefox and Evolution

I understand that there are later drivers published for the "Quadro FX 2800M". I did try some
of those initially. However, I still have no comment, reply or suggestion from
"sandipt" or "aplattner" or anyone else at NVIDIA.


Cryptor

#11
Posted 08/09/2013 10:47 PM   
Both screens black again.
This time I was scrolling an Xterm up and down with the scrollbar widget.
Seems like scrolling makes it more likely on my box.

At any rate, note this from dmesg and Xorg.0.log...


dmesg
[71950.826197] NVRM: GPU at 0000:01:00.0 has fallen off the bus.
[71950.826208] NVRM: GPU at 0000:01:00.0 has fallen off the bus.


/var/log/Xorg.0.log
[ 72072.630] (WW) NVIDIA(0): WAIT (2, 6, 0x8000, 0xdfff2fff, 0x0000e41c)
[ 72076.202] [mi] EQ overflowing. Additional events will be discarded until existing events are processed.
[ 72076.202]
Backtrace:
[ 72076.257] 0: /usr/bin/X (xorg_backtrace+0x26) [0x7f4df0d939e6]
[ 72076.257] 1: /usr/bin/X (mieqEnqueue+0x263) [0x7f4df0d740c3]



Cryptor

#12
Posted 08/12/2013 04:56 PM   
I seem to have stumbled on a fairly quick way to generate or reproduce the

"NVRM: GPU at 0000:01:00.0 has fallen off the bus."

error on my system.

I log in to Ubuntu (either 3D/compiz or Ubuntu 2D) and then open a "Gnome Terminal".
On my system, this terminal has 50 lines.

$ echo $LINES
50

This terminal can be located either on my X screen's primary display (external DFP-1) or on my
non-primary display (internal LGD, DFP-0).

Next, I generate some directory listings, usually about 1000 lines in total.

$ ls -alt ~
$ ls -alt /
$ ls -alt /usr/lib

At this point, I have a scrollbar widget that pans back and forth through the listings.
If I scroll rapidly back and forth through these listings (by dragging the scrollbar widget
vigorously up and down) in the gnome-terminal window for 3 or 4 minutes, I always get
the blackout and frozen X session. Sometimes it happens in as little as 30 seconds, but
usually it takes a couple of minutes.
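The ls listings could be replaced by fixed content to make this reproduction easier to repeat across driver versions. This Python sketch assumes that any ~1000 lines of scrollback behave like the ls output, which has not been tested against this bug. It prints numbered lines of constant width into the terminal it runs in.

```python
import sys


def scrollback_lines(count=1000, width=70):
    """Return `count` numbered lines, each exactly `width` characters long."""
    # "%05d " takes 6 characters; pad the rest with a filler character.
    return ["%05d %s" % (i, "x" * (width - 6)) for i in range(count)]


def fill_scrollback(count=1000, out=sys.stdout):
    """Write the lines to the terminal so there is a long scrollback to drag through."""
    for line in scrollback_lines(count):
        out.write(line + "\n")
    out.flush()
```

Call fill_scrollback() from a script run inside the gnome-terminal window, then drag the scrollbar as described above. Because the content is identical on every run, results from different drivers or sessions are easier to compare.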

BTW, I have enabled persistence as recommended elsewhere and I have been using Ubuntu 2D. Still
the problem persists on my M6500 laptop with Quadro FX 2800M GPU.

Ubuntu 12.04
Linux box 3.2.0-51-generic #77-Ubuntu SMP Wed Jul 24 20:18:19 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
NVIDIA Driver Version: 304.88 [Additional Drivers: version current-updates]
Quadro FX 2800M (GPU 0)
Two displays: ViewSonic VX2439 Series (DFP-1), LGD (DFP-0)

#13
Posted 08/12/2013 07:11 PM   
GPU fell off the bus again today right after a cold boot. Temp does not seem to be the
problem because I checked and it was 37 C just before I ran the test. Both screens went
black within 5 seconds that time.

Typically, the GPU temp is around 51 C, which is one bar into the yellow region on the
NVIDIA settings widget.
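A single temperature reading taken before the test is a weak check. This Python sketch logs the temperature to disk at a fixed interval, so the last value recorded before the GPU drops off the bus can be read back over SSH. It assumes nvidia-settings accepts "-q GPUCoreTemp -t" on this driver and that the script runs inside the desktop session, since nvidia-settings needs the X display. The log format and function names are hypothetical. The entries use wall-clock time, whereas dmesg uses seconds since boot.

```python
import subprocess
import time


def parse_terse_temp(output):
    """Return the first integer found in terse nvidia-settings output, else None."""
    for token in output.split():
        if token.isdigit():
            return int(token)
    return None


def log_gpu_temp(path, interval=10.0, samples=None):
    """Append 'epoch_seconds temperature' lines to `path`, one per interval."""
    n = 0
    while samples is None or n < samples:
        try:
            out = subprocess.check_output(
                ["nvidia-settings", "-q", "GPUCoreTemp", "-t"], timeout=5)
            temp = parse_terse_temp(out.decode())
        except (OSError, subprocess.SubprocessError):
            temp = None  # query failed or hung, e.g. once the GPU has gone away
        # Reopen and close the file each time so every sample reaches the disk.
        with open(path, "a") as f:
            f.write("%d %s\n" % (time.time(), temp))
        n += 1
        time.sleep(interval)
```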

Also have "UseEvents" set to "false" now and that is not preventing the issue either.

Section "Device"
    Identifier "Device0"
    Option "UseEvents" "false"
EndSection

So, does anyone have any idea what *does* cause the GPU to fall off the bus?

Or does anyone have any suggestions for what logs to capture that might illuminate
the root cause?
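On the question of what to capture: because SSH keeps working, one option is to grab the evidence as soon as the screens go black, before a reboot overwrites anything. This Python sketch is meant to be run over SSH. It copies Xorg.0.log and saves the dmesg output into a timestamped directory. It can also run nvidia-bug-report.sh, which ships with the driver, needs root, and writes nvidia-bug-report.log.gz into its working directory. The directory naming and the defaults are assumptions.

```python
import os
import shutil
import subprocess
import time


def collect_snapshot(dest_root, xorg_log="/var/log/Xorg.0.log",
                     run_dmesg=True, run_bug_report=False):
    """Save hang evidence into a new timestamped directory and return its path."""
    dest = os.path.join(dest_root, time.strftime("hang-%Y%m%d-%H%M%S"))
    os.makedirs(dest)
    # Copy the X server log as it stood when the hang was noticed.
    if os.path.exists(xorg_log):
        shutil.copy2(xorg_log, dest)
    # The kernel ring buffer is where the NVRM messages appear.
    if run_dmesg:
        with open(os.path.join(dest, "dmesg.txt"), "wb") as f:
            f.write(subprocess.check_output(["dmesg"]))
    # Optional: the driver's own report script, run from inside dest.
    if run_bug_report:
        subprocess.call(["nvidia-bug-report.sh"], cwd=dest)
    return dest
```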

#14
Posted 08/14/2013 08:48 PM   
Switched to the 310.14 NVIDIA drivers. Same behavior.

Interestingly enough, it only happens with gnome-terminal. Running xterm does not
seem to produce the issue. Of course, scrolling is more difficult in xterm and I have
not spent much time testing there yet, but it is easily reproducible with gnome-terminal
on Quadro FX 2800M.

This looks very similar to this issue, which is labeled as NVIDIA bug "973068":

http://www.nvnews.net/vbulletin/showthread.php?t=174759

According to "sandipt":

We are reproduced this issue in house and investigating. Bug is :973068 Fedora 17: X freeze with flash HD video on firefox/chrome with NVRM: GPU at 0000:01:00.0 has fallen off the bus

However, I can find no further mention of bug "973068" in the release notes.

#15
Posted 08/15/2013 01:43 PM   