[REGRESSION] [300 series] Short freeze/FPS drop every 5 seconds

Hi, on my system, all of the 300 series drivers have the same problem:

Every 5 seconds, OpenGL games freeze for roughly 200 ms. That causes an FPS drop and a short audio glitch, and then the game continues as usual for another 5 seconds.

Versions tested and affected:

  • 302.17
  • 304.22
  • 304.48
  • 310.14
  • 313.09

Versions not affected:

  • anything 2xx. I’ve been running 295.75 for months now.
    nvidia-bug-report.log.gz.bmp (155 KB)

I’ve attached nvidia-bug-report.log files from different kernel/NVIDIA versions. The “.bmp” extension was added to make the forum accept it.

Can you connect from another computer with SSH while the game is running? Try running ‘top’ to see if any other processes are showing a CPU usage spike every 5 seconds.

There is nothing suspicious in top.

Even if it was some other application misbehaving with newer drivers, the skips are probably too short to push it high in top’s output.

Note: while I’m running a game, nothing else is supposed to use the GPU. I don’t run compositing window managers, two games at once, or number crunching while gaming.

Not to mention this is a quad core system, so a CPU-bound program shouldn’t stop others from running.

This game normally uses 12%-14% CPU and spends most of its time in the “S” state. It taxes neither the GPU nor the CPU, yet it still has those “twitches” every few seconds:


It might be throttling, check your GPU temperature.

It surely has nothing to do with throttling or temperature.

However, I’ve noticed (and you can see this in nvidia-bug-reports as well) that Xorg.0.log contains huge amounts of messages about 3D Vision, EDID and modes that contradict themselves.

At the beginning it spits those out frequently, but after some time the messages start appearing in bunches roughly every 5 seconds, and each “bunch” takes around 50 to 80 ms. That has to be it, but how do I stop this madness?
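One way to confirm the "bunches every 5 seconds" pattern is to group the log lines by their timestamps. The following is only a rough sketch: the function name `find_bursts` and the `max_gap` threshold are made up for illustration, and it assumes each log line starts with a bracketed timestamp in seconds, as in the Xorg.0.log excerpts quoted in this thread.

```python
import re
from typing import Iterable, List, Tuple

# Xorg log lines start with a timestamp in seconds, e.g. "[    42.095] (II) ..."
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def find_bursts(lines: Iterable[str], max_gap: float = 0.5) -> List[Tuple[float, float, int]]:
    """Group timestamped lines into bursts.

    Consecutive lines at most max_gap seconds apart belong to the same burst.
    Returns one (start_time, duration, line_count) tuple per burst.
    """
    bursts = []
    start = prev = None
    count = 0
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue  # skip lines without a timestamp
        t = float(match.group(1))
        if prev is not None and t - prev > max_gap:
            # The gap is too large: close the current burst and start a new one.
            bursts.append((start, prev - start, count))
            start, count = t, 0
        if start is None:
            start = t
        prev = t
        count += 1
    if start is not None:
        bursts.append((start, prev - start, count))
    return bursts
```

Feeding it the lines of /var/log/Xorg.0.log and comparing the start times of neighbouring bursts should show whether they really are about 5 seconds apart and how long each one lasts.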

Again: none of this takes place with 290 series drivers. No “contradicts” there at all, even on first start. I can run nvidia-bug-report on working 295.75 if it’s needed for comparison.

It sounds like something (either the GPU or one of the monitors attached to it) is constantly generating hotplug events. Try unplugging and firmly reconnecting the cables between the monitors and the GPU, and make sure the screws holding the connectors in are tight.

You can attempt to work around the problem by adding the ConnectedMonitor option:

Option "ConnectedMonitor" "CRT-0"

or

Option "ConnectedMonitor" "DFP-1"

or both

Option "ConnectedMonitor" "CRT-0, DFP-1"

We haven’t been able to reproduce this problem. Which desktop environment are you using? While the symptom is occurring, what other programs are using CPU time? I wonder if one of them is spamming the X server with RandR requests. (Your “top” window screenshot is obscured by Neverball).

I’ve tried that option and the server sees it:
[ 42.095] (**) NVIDIA(0): Option "ConnectedMonitor" "CRT-0, DFP-1"
(…)
[ 42.095] (**) NVIDIA(0): ConnectedMonitor string: "CRT-0, DFP-1"

But this doesn’t help. I still get the skips and the EDID spam in /var/log/Xorg.0.log.

HOWEVER, after even more testing, I finally found what’s causing this.

Turns out, having the “GPU” meter enabled in gkrellm (version 2.3.4) makes it run:
nvidia-settings -q '[gpu:0]/GPUCoreTemp'
every 5 seconds. And each time that happened, Xorg came to a full stop for a short while.

In fact, each time I ran nvidia-settings interactively, or ran nvidia-settings -q GPUCoreTemp, I got that spam in Xorg.0.log.
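For anyone who wants to reproduce this without gkrellm, here is a small sketch that runs the same query at a fixed interval. The helper names `time_command` and `repeat` are made up for this example. It only measures how long each nvidia-settings call takes, not the stall in Xorg itself, so the game and Xorg.0.log still need to be watched while it runs.

```python
import subprocess
import time
from typing import List, Sequence

# The query quoted above, which gkrellm's GPU meter was seen running every 5 seconds.
QUERY = ["nvidia-settings", "-q", "[gpu:0]/GPUCoreTemp"]


def time_command(argv: Sequence[str]) -> float:
    """Run argv once, discard its output, and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, check=False)
    return time.perf_counter() - start


def repeat(argv: Sequence[str], runs: int, interval: float) -> List[float]:
    """Run argv `runs` times, sleeping `interval` seconds between runs."""
    timings = []
    for i in range(runs):
        timings.append(time_command(argv))
        if i < runs - 1:
            time.sleep(interval)
    return timings
```

Calling `repeat(QUERY, runs=6, interval=5.0)` while a game is running should bring the skips back on an affected setup, assuming the query really is the trigger.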

So I checked nvidia-settings version:
nvidia-settings: version 280.11

So I’ve packaged a newer one:
nvidia-settings: version 313.18

…and the problem looks fixed. I’m going to reboot a few more times and test things now, wish me luck.
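If the stale nvidia-settings really is the culprit, a quick check for that kind of mismatch might look like the sketch below. The names `parse_version` and `tool_is_older` are invented for this example. It only parses a version line in the format quoted above and compares the major number against a driver version string supplied by hand.

```python
import re
from typing import Tuple

# Matches the "version 280.11" part of a line like "nvidia-settings: version 280.11"
VERSION = re.compile(r"version\s+(\d+)\.(\d+)")


def parse_version(text: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version line."""
    match = VERSION.search(text)
    if match is None:
        raise ValueError("no version number found in: %r" % text)
    return int(match.group(1)), int(match.group(2))


def tool_is_older(tool_output: str, driver_version: str) -> bool:
    """Return True when the tool's major version is lower than the driver's."""
    tool_major, _ = parse_version(tool_output)
    driver_major = int(driver_version.split(".")[0])
    return tool_major < driver_major
```

With the versions from this thread, `tool_is_older("nvidia-settings: version 280.11", "313.09")` flags the old package, while the repackaged 313.18 passes.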