X Server 1.13.1 deadlocks randomly on GeForce GTX680

Detailed description

The X server crashes / deadlocks randomly, for example:

  1. When the PC is woken up after being idle for some time (no hibernate/sleep mode).
  2. When browsing the internet, using simple applications.
  3. When playing 3D accelerated games using wine.

I’m currently running Archlinux x64, kernel 3.6.10 with the latest Nvidia beta drivers 313.09 from the Arch User Repository.
My hardware: Intel 3770k, 16G mem, Asus Nvidia GTX680, Asus Sabertooth Z77 BIOS 1708.

What I’ve tried already
Of course I searched online for a fix or workaround, but sadly none of the following fixed the issue:

  1. Booted with iommu=pt or intel_iommu=igfx_off kernel options.
  2. Forcefully blacklisted the nouveau module and added gfxpayload=vga=0 to the kernel options.
  3. Downgraded to several older versions of the driver.
  4. Updated the BIOS of the motherboard.
  5. Added Option "UseEvents" "False" to Xorg config.
  6. Added Option "RegistryDwords" "PowerMizerEnable=0x1; PerfLevelSrc=0x2222; PowerMizerDefaultAC=0x1" to Xorg config.

I really can’t think of any steps to reproduce the issue due to the random behavior. I’m very happy to do anything to help identify the cause. Please let me know.

Bug reports
I couldn’t find a file attachment feature, so I put the files on Dropbox:

Bug report when just browsing, not doing anything special:
http://dl.dropbox.com/u/1076729/nvidia-bug-report.log.gz

Xorg log when activating the PC from idle:
http://dl.dropbox.com/u/1076729/Xorg.0.log.gz

Here’s a full debug log when activating the PC from idle:
http://dl.dropbox.com/u/1076729/nvidia-bug-report-idle.log.gz

And another one when just using simple apps on the computer:
http://dl.dropbox.com/u/1076729/nvidia-bug-report-simple-apps.log.gz

If the information is incomplete or just not enough to debug the issue, please give me some pointers so I can provide you with the information you require.

Am I the only one with this issue?
Could it be hardware related? Should I send the card in for RMA?

I really appreciate any help to solve this, thank you!

Bump. Anyone?

Hi,

It seems you are getting the same NVRM Xid 59 error (1) that I get on my Asus P8Z77 WS mainboard / 3770K / 4x GTX680 cards. So: same mainboard manufacturer, same CPU, same GPU, and similar dmesg logs (such as ACPI conflict warnings). At the moment ASUS is silent (2) on the problem, and so is Nvidia, by the way. Downgrading to an older BIOS might help get the hardware working, as suggested in (3).

(1) NVRM Xid error 59 with Kepler card (CUDA) on 4th PCIe 3.0 port - Linux - NVIDIA Developer Forums
(2) http://vip.asus.com/forum/view.aspx?id=20121214190414274&board_id=1&model=P8Z77+WS&page=1&SLanguage=en-us
(3) http://vip.asus.com/forum/view.aspx?id=20121228195225613&board_id=1&model=P8Z77+WS&page=1&SLanguage=en-us

Yes, this bug is “old”: it was reported at least a few weeks ago.

Just now for me,

[30808.533679] NVRM: Xid (0000:01:00): 59, 0098(1c34) 00000000 00000000
[30808.588192] NVRM: Xid (0000:01:00): 59, 0098(1c40) 00000000 00000000
[31322.700378] NVRM: Xid (0000:01:00): 59, 0098(1c34) 00000000 00000000
[31322.760727] NVRM: Xid (0000:01:00): 59, 0098(1c40) 00000000 00000000
[31479.704495] NVRM: Xid (0000:01:00): 59, 0098(1c34) 00000000 00000000
[31479.799678] NVRM: Xid (0000:01:00): 59, 0098(1c40) 00000000 00000000

The only workaround is to kill the X session with SysRq-K and then restart it. Reloading the driver does not help.

One reliable method of reproducing the error for me is running EveOnline (game) under Wine in a 32-bit chroot on a 64-bit OS (Linux 3.6.7). There is severe “halting behaviour” when warping around, and sometimes it causes the events above. No other application causes errors for me. Changing shader settings, graphics quality, or antialiasing does not affect this bug.

My system is AMD, with an MSI motherboard and 16GB RAM. So we have almost nothing in common except the nVidia card (GTX 650) and the driver.

Hopefully this will be sorted after the new year.

Thanks for your replies, much appreciated! I created a ticket on Nvidia Support (CustHelp) and they forwarded it to the Linux team. I will keep this topic up to date with the progress in the ticket.

Is there any indication whether this is a BIOS or driver regression? Has anyone noticed the issue appearing after a driver/BIOS upgrade a while ago?

I will cross-test older driver versions against older BIOS versions later this week. Again, if anyone has any other suggestion for me to test, I’m very happy to do so.

Maybe some useful additional info
I’m dual-booting Windows for now (because the Linux bug is just too annoying) and have played some games lately. On one occasion the game Unreal Tournament 3 hung for a few seconds and Windows displayed a warning that the display driver had recovered. UT3 continued flawlessly. A few days later I was playing Borderlands 2 with some friends and out of nowhere the game just hung. (Too bad I was hosting at the time.)

Update: I got a reply on the support ticket. They told me to send a mail to linux-bugs@nvidia.com with a bug report, so I just did.

[21686.153604] NVRM: Xid (0000:01:00): 59, 0098(1c40) 00000000 00000000
[21903.534805] NVRM: Xid (0000:01:00): 59, 0098(1c34) 00000000 00000000
[21903.629893] NVRM: Xid (0000:01:00): 59, 0098(1c40) 00000000 00000000
[21953.749928] NVRM: Xid (0000:01:00): 59, 0098(1c34) 00000000 00000000
[21953.805288] NVRM: Xid (0000:01:00): 59, 0098(1c40) 00000000 00000000
[22473.453603] NVRM: Xid (0000:01:00): 59, 0098(1c34) 00000000 00000000
[22473.519689] NVRM: Xid (0000:01:00): 59, 0098(1c40) 00000000 00000000
[22539.990059] NVRM: Xid (0000:01:00): 59, 0098(1c34) 00000000 00000000
[22585.533451] NVRM: Xid (0000:01:00): 59, 0098(1c34) 00000000 00000000
[22585.533797] NVRM: Xid (0000:01:00): 59, 0098(1c34) 00000000 00000000
[22585.628823] NVRM: Xid (0000:01:00): 59, 0098(1c40) 00000000 00000000
[22633.881652] NVRM: Xid (0000:01:00): 59, 0098(1c34) 00000000 00000000
[22793.052762] NVRM: Xid (0000:01:00): 59, 0098(1c34) 00000000 00000000
[22793.104973] NVRM: Xid (0000:01:00): 59, 0098(1c40) 00000000 00000000
[25497.015131] NVRM: Xid (0000:01:00): 59, 0098(1c34) 00000000 00000000
[25497.112577] NVRM: Xid (0000:01:00): 59, 0098(1c40) 00000000 00000000
[25509.485803] NVRM: Xid (0000:01:00): 59, 0098(1c34) 00000000 00000000
[25509.540257] NVRM: Xid (0000:01:00): 59, 0098(1c40) 00000000 00000000
[26194.080698] NVRM: Xid (0000:01:00): 59, 0098(1c34) 00000000 00000000
[26194.081404] NVRM: Xid (0000:01:00): 59, 0098(1c34) 00000000 00000000
[26194.083599] NVRM: Xid (0000:01:00): 59, 0098(1c40) 00000000 00000000
[26194.083749] NVRM: Xid (0000:01:00): 44, 0000 00000000 00000000 00000000 00000000 00000000
[26194.092213] NVRM: Xid (0000:01:00): 59, 0098(1c34) 00000000 00000000
[26194.137376] NVRM: Xid (0000:01:00): 59, 0098(1c40) 00000000 00000000
[26221.017712] NVRM: Xid (0000:01:00): 59, 0098(1c34) 00000000 00000000
[26221.073963] NVRM: Xid (0000:01:00): 59, 0098(1c40) 00000000 00000000
[26250.181167] NVRM: Xid (0000:01:00): 8, Channel 00000005
[26822.076214] NVRM: Xid (0000:01:00): 8, Channel 00000003
[27369.023791] NVRM: Xid (0000:01:00): 8, Channel 00000003

After a lot of Xid 59 errors, some other errors started to appear… The Xid 8 ones occurred while using Firefox.

Not sure what these mean. Reboot fixed the problems. PCIe Gen2 on this motherboard.
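
For anyone comparing logs in this thread, here is a rough sketch of how the NVRM Xid lines could be tallied per device and error code. It only assumes the line format visible in the dmesg excerpts above; the function name `tally_xids` and the regular expression are my own, not anything provided by the driver.

```python
import re
from collections import Counter

# Matches kernel log lines of the form seen in this thread, e.g.
# [26250.181167] NVRM: Xid (0000:01:00): 8, Channel 00000005
# (assumption: other driver versions may format these lines differently)
XID_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+NVRM: Xid \(([0-9a-fA-F:.]+)\): (\d+),")


def tally_xids(lines):
    """Count Xid events, keyed by (PCI device, Xid code)."""
    counts = Counter()
    for line in lines:
        match = XID_RE.search(line)
        if match:
            _timestamp, device, code = match.groups()
            counts[(device, int(code))] += 1
    return counts


# Small demo using three lines copied from the log above.
sample = [
    "[26194.083599] NVRM: Xid (0000:01:00): 59, 0098(1c40) 00000000 00000000",
    "[26194.083749] NVRM: Xid (0000:01:00): 44, 0000 00000000 00000000 00000000 00000000 00000000",
    "[26250.181167] NVRM: Xid (0000:01:00): 8, Channel 00000005",
]
for (device, code), n in sorted(tally_xids(sample).items()):
    print(f"{device}  Xid {code}: {n}")
```

Passing the lines of a saved dmesg dump to `tally_xids` instead of `sample` should show at a glance whether only Xid 59 appears or whether other codes (44, 8) follow it, as they did here.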