Gigabyte GTX780: Fan running @100% at idle. (Solved).

Hello all,

I’ve just built a new Xeon-based development workstation (specs below).
The machine has a single Gigabyte GTX N780OC GD3 card.
My problem is simple: once X starts, the fan runs at full speed no matter what load is being generated.
E.g. I’m getting the same noise level when displaying my normal KDE desktop (GPUCoreTemp at ~27°C) and when running a Unigine benchmark (GPUCoreTemp at 55-57°C).
At all times, GPUCurrentFanSpeed is stuck at 17 (read-only… coolbits?) and GPUCurrentFanSpeedRPM is stuck at 0.
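For anyone who wants to compare the same readings at idle and under load, here is a rough sketch of how they could be grabbed in one line. It assumes nvidia-settings is installed, X is running, and that the card shows up as [gpu:0] with its fan as [fan:0]; those target names and the snapshot function name are my assumptions, not something from the original setup.

```shell
#!/bin/sh
# Sketch only: print GPU temperature and fan readings on a single line.
# Assumes one GPU ([gpu:0]) with one fan ([fan:0]); adjust the targets otherwise.
# "snapshot" is a made-up helper name.
snapshot() {
    # -t asks nvidia-settings for terse output (just the value), -q queries an attribute.
    temp=$(nvidia-settings -t -q "[gpu:0]/GPUCoreTemp")
    fan=$(nvidia-settings -t -q "[fan:0]/GPUCurrentFanSpeed")
    rpm=$(nvidia-settings -t -q "[fan:0]/GPUCurrentFanSpeedRPM")
    echo "temp=${temp}C fan=${fan} rpm=${rpm}"
}
```

Calling snapshot once on the idle desktop and once while the benchmark is running makes it easy to see whether the fan values ever change.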

Any ideas what I can do?
It’d be a shame if my wife threw my brand new workstation out of the window :/

Machine configuration:
MB: Intel S2600C0.
RAM: 32GB RAM.
GPU: 1 x Gigabyte NGTX780OC 3GD.
HD: 4 x 2TB in software RAID10.
OS: Fedora 19, x86_64.
DRV: v331.20 (RPMFusion).

EDIT: I should add that the performance is right on the mark (within ~10% of Phoronix’s Titan review).

Thanks in advance,
Gilboa

I think (?) I found the source of the problem. The GPUCurrentClockFreqs seem to be running between 954MHz (idle) and 1110MHz (Unigine).
I would imagine this is quite high for idle?
On the other hand, power draw seems to be OK during idle (~200W) and Unigine (~300W).

Ideas?

  • Gilboa

I should add that once I free up some time, I plan to install Windows on an external drive and check if the problem persists. Obviously, if I can reproduce this issue under Windows, this is a bad card. However, my gut feeling is that this is a video BIOS vs. driver issue (Gigabyte uses a non-stock cooling solution).

  • Gilboa

nVidia bug report:

I solved the issue (following a Phoronix forum suggestion to search for an updated GPU BIOS [1]).
I’ll write up the complete answer so people facing the same issue might stumble upon this solution.
Here goes:
In order to find which BIOS to flash, I first looked up the current GPU BIOS version with:
$ cat /proc/driver/nvidia/gpus/0/information | grep BIOS
Video BIOS: ??.??.??.??
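
If you want to script the same check, something along these lines should do. It is only a sketch: the default path is the one I used above, the check_vbios name is made up, and it simply treats any "?" in the version as "the driver couldn't read the BIOS".

```shell
#!/bin/sh
# Sketch only: report whether the driver could read the Video BIOS version.
# "check_vbios" is a made-up helper name; the default path is the one from this post.
# Returns 0 if a version was read, 1 if it shows up as ??.??.??.??, 2 if no line was found.
check_vbios() {
    info_file="${1:-/proc/driver/nvidia/gpus/0/information}"
    # Keep only the value after "Video BIOS:" and strip all whitespace from it.
    version=$(sed -n 's/^Video BIOS:[[:space:]]*//p' "$info_file" | tr -d '[:space:]')
    if [ -z "$version" ]; then
        echo "no Video BIOS line found in $info_file"
        return 2
    fi
    case "$version" in
        *\?*) echo "Video BIOS not detected ($version)"; return 1 ;;
        *)    echo "Video BIOS detected: $version"; return 0 ;;
    esac
}
```

Running check_vbios with no argument checks the first GPU; pass another information file as the first argument to check a different one.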

In short, the nVidia driver, beyond not being able to detect the fan speed, was also unable to detect the BIOS version. In other words, something was very wrong with the machine’s POST sequence.

I went into the Intel MB settings (BIOS) and found, under Advanced → PCI Configuration, a Legacy VGA Socket option that was configured to initialize the wrong slot (Slot 1 instead of Slot 2).
After setting it to Slot 2, followed by a complete power off with the PSU disconnected (a simple reboot didn’t help), I’ve got a silent machine.

Most likely this setting will prevent me from installing a second GPU, but to be honest, at least for now, I couldn’t care less.