[REGRESSION] [300 series] Short freeze/FPS drop every 5 seconds
Hi, on my system, all of the 300 series drivers have the same problem:

Every 5 seconds, OpenGL games freeze for roughly 200 ms. That causes an FPS drop and a short audio glitch, and then the game continues as usual for another 5 seconds.

Versions tested and affected:
- 302.17
- 304.22
- 304.48
- 310.14
- 313.09

Versions not affected:
- anything 2xx. I've been running 295.75 for months now.

#1
Posted 01/13/2013 08:39 PM   
Attached nvidia-bug-report.log files from different kernel/nvidia versions. The ".bmp" extension was added to make the forum accept them.

#2
Posted 01/13/2013 08:42 PM   
Can you connect from another computer with SSH while the game is running? Try running 'top' to see if any other processes are showing a CPU usage spike every 5 seconds.

Aaron Plattner
NVIDIA Linux Graphics

#3
Posted 01/14/2013 11:39 PM   
There is nothing suspicious in top.

Even if it was some other application misbehaving with newer drivers, the skips are probably too short to push it high in top's output.

Note: while I'm running a game, nothing else is supposed to use the GPU. I don't run compositing window managers, two games at once, or number-crunching jobs while gaming.

Not to mention this is a quad core system, so a CPU-bound program shouldn't stop others from running.

This game normally uses 12%-14% CPU. Most of the time it's in the "S" state. It taxes neither the GPU nor the CPU, yet it still has those "twitches" every few seconds:

Screenshot: http://i.imgur.com/reEgHEN.png (attachment: zrzut_ekranu-82.png)

#4
Posted 01/23/2013 12:42 PM   
It might be throttling, check your GPU temperature.

Artem S. Tashkinov
Linux and Open Source advocate

#5
Posted 01/23/2013 02:42 PM   
This surely has nothing to do with throttling or temperature.

However, I've noticed (and you can see this in the nvidia-bug-report logs as well) that Xorg.0.log contains a huge number of messages about 3D Vision, EDID and modes that contradict each other.

At the beginning it spits those out frequently, but after some time, the messages start appearing in bunches roughly every 5 seconds, and each "bunch" takes around 50 to 80 ms. That has to be it, but how do I stop this madness?

Again: none of this takes place with 290 series drivers. No "contradicts" there at all, even on first start. I can run nvidia-bug-report on working 295.75 if it's needed for comparison.
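
For anyone wanting to confirm the "bunch every 5 seconds" pattern from the log itself, here is a minimal sketch that groups timestamped Xorg.0.log lines into bursts and reports the spacing between them. The function names, the "EDID" keyword filter and the 0.5 s gap threshold are assumptions made for illustration, not anything taken from the driver or its tools.

```python
import re

# Matches the leading "[  1234.567]" timestamp that Xorg puts on each log line.
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def find_bursts(lines, keyword="EDID", max_gap=0.5):
    """Group log lines containing `keyword` into bursts.

    Returns a list of (start, end, count) tuples. A matching line joins the
    current burst when it is at most `max_gap` seconds after the previous
    matching line; otherwise it starts a new burst.
    """
    bursts = []
    for line in lines:
        match = TIMESTAMP.match(line)
        if match is None or keyword not in line:
            continue
        t = float(match.group(1))
        if bursts and t - bursts[-1][1] <= max_gap:
            start, _, count = bursts[-1]
            bursts[-1] = (start, t, count + 1)
        else:
            bursts.append((t, t, 1))
    return bursts


def burst_intervals(bursts):
    """Seconds between the start of each burst and the start of the next."""
    return [later[0] - earlier[0] for earlier, later in zip(bursts, bursts[1:])]
```

Passing an open handle on /var/log/Xorg.0.log to find_bursts and then feeding the result to burst_intervals should show values clustering near 5 if the messages really arrive on a 5 second cycle; end minus start of each tuple gives a rough duration per bunch.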

#6
Posted 01/23/2013 07:15 PM   
It sounds like something (either the GPU or one of the monitors attached to it) is constantly generating hotplug events. Try unplugging and firmly reconnecting the cables between the monitors and the GPU, and make sure the screws holding the connectors in are tight.

You can attempt to work around the problem by adding the ConnectedMonitor option:
Option "ConnectedMonitor" "CRT-0"

or
Option "ConnectedMonitor" "DFP-1"

or both
Option "ConnectedMonitor" "CRT-0, DFP-1"

Aaron Plattner
NVIDIA Linux Graphics

#7
Posted 01/29/2013 05:34 PM   
We haven't been able to reproduce this problem. Which desktop environment are you using? While the symptom is occurring, what other programs are using CPU time? I wonder if one of them is spamming the X server with RandR requests. (Your "top" window screenshot is obscured by Neverball).

Aaron Plattner
NVIDIA Linux Graphics

#8
Posted 01/29/2013 11:45 PM   
I've tried that option and the server sees it:
[ 42.095] (**) NVIDIA(0): Option "ConnectedMonitor" "CRT-0, DFP-1"
(...)
[ 42.095] (**) NVIDIA(0): ConnectedMonitor string: "CRT-0, DFP-1"

But this doesn't help. I still get the skips and the EDID spam in /var/log/Xorg.0.log.

HOWEVER, after even more testing, I finally found what's causing this.


Turns out, having the "GPU" meter enabled in gkrellm (version 2.3.4) makes it run:
nvidia-settings -q '[gpu:0]/GPUCoreTemp'
every 5 seconds. And whenever that happened, Xorg came to a full stop for a short while.

In fact, each time I run nvidia-settings interactively or nvidia-settings -q GPUCoreTemp, I get that spam in Xorg.0.log.
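
A rough way to see how long a single query takes is to time it from a script. The sketch below is only an illustration: time_command is a made-up helper name, and it measures how long the command itself runs, which is not necessarily the same as how long Xorg stalls.

```python
import subprocess
import time


def time_command(argv):
    """Run argv once, discarding its output.

    Returns (elapsed_seconds, return_code).
    """
    start = time.perf_counter()
    result = subprocess.run(
        argv,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    elapsed = time.perf_counter() - start
    return elapsed, result.returncode
```

For the case in this thread the call would be time_command(["nvidia-settings", "-q", "[gpu:0]/GPUCoreTemp"]), run while watching whether the game skips at the same moment.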


So I checked nvidia-settings version:
nvidia-settings: version 280.11

So I've packaged a newer one:
nvidia-settings: version 313.18

...and the problem appears to be fixed. I'm going to reboot a few more times and test things now, wish me luck.
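
Since the culprit here was an nvidia-settings binary much older than the installed driver, a small check like the following could flag that kind of mismatch. This is a sketch under assumptions: the helper names are invented, the version string format is taken from the output quoted above, and "tool older than driver" is just one plausible rule, not a documented compatibility requirement.

```python
import re

# Matches "version 280.11" as printed in the nvidia-settings output quoted above.
VERSION = re.compile(r"version\s+(\d+)\.(\d+)")


def parse_version(text):
    """Extract (major, minor) from a string such as 'nvidia-settings: version 280.11'."""
    match = VERSION.search(text)
    if match is None:
        raise ValueError("no version number found in: %r" % text)
    return int(match.group(1)), int(match.group(2))


def tool_older_than_driver(tool_output, driver_version):
    """True when the tool's version is lower than the driver's.

    `driver_version` is a plain string such as '313.09'.
    """
    major, minor = driver_version.split(".")[:2]
    return parse_version(tool_output) < (int(major), int(minor))
```

With the values from this thread, the 280.11 tool is reported as older than the 313.09 driver, while the repackaged 313.18 is not.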

#9
Posted 01/30/2013 05:35 PM   