nvidia-smi power limit on GTX 1060

Hello,

I’m trying to set the power limit of my two GTX 1060 cards running on an Ubuntu 16.04 machine.

nvidia-smi -pm 1
nvidia-smi -pl 85 (85W for example)

However the result is the following:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.35                 Driver Version: 367.35                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 106...  On   | 0000:01:00.0     Off |                  N/A |
| 82%   80C    P2   103W /  85W |   2161MiB /  6072MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 106...  On   | 0000:06:00.0     Off |                  N/A |
| 57%   77C    P2    98W /  85W |   2161MiB /  6072MiB |     98%      Default |
+-------------------------------+----------------------+----------------------+

The cards are indeed still drawing more than 85W each. I already tried different versions of the nvidia driver (381, 375, now 367) without success.
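
To check this without reading the table by eye, here is a minimal Python sketch that compares each GPU's draw against its configured limit. It assumes nvidia-smi's CSV query mode (`--query-gpu=index,power.draw,power.limit --format=csv,noheader,nounits`) works with the installed driver. The function names `parse_power_csv`, `over_limit` and `read_power` are made up for illustration.

```python
import subprocess

# Assumption: the installed nvidia-smi supports these query fields.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,power.draw,power.limit",
    "--format=csv,noheader,nounits",
]


def parse_power_csv(text):
    """Parse lines like '0, 103.00, 85.00' into (index, draw_w, limit_w)."""
    rows = []
    for line in text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 3:
            continue
        try:
            rows.append((int(parts[0]), float(parts[1]), float(parts[2])))
        except ValueError:
            # Non-numeric fields such as '[N/A]' are skipped.
            continue
    return rows


def over_limit(rows, tolerance_w=0.0):
    """Return the indices of GPUs drawing more than limit + tolerance."""
    return [idx for idx, draw, limit in rows if draw > limit + tolerance_w]


def read_power():
    """Run nvidia-smi once and return the parsed rows."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_power_csv(out.stdout)
```

With the readings from the table above (103W and 98W against an 85W cap), `over_limit` flags both GPU 0 and GPU 1. A small `tolerance_w` may be useful, since brief spikes above the cap can occur under load.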

Here’s my bug report log: nvidia-bug-report.log · GitHub

Thanks for any help

Hi Alex, thanks for reporting this issue. Is this operation supported for this GPU? Have you checked with the vendor? Also, do not use Option "RegistryDwords" "PowerMizerEnable=0x1; PerfLevelSrc=0x2222; PowerMizerLevel=0x3; PowerMizerDefault=0x3; PowerMizerDefaultAC=0x3" in xorg.conf.

Hello,
Yes, this operation is supported for my GPU, as it works on Windows using the vendor’s software (Palit Thunder Master).
I had already tried it without Option "RegistryDwords" "PowerMizerEnable=0x1; PerfLevelSrc=0x2222; PowerMizerLevel=0x3; PowerMizerDefault=0x3; PowerMizerDefaultAC=0x3" in xorg.conf. I only added it while troubleshooting.

Did you test with the latest 375.66 and 381.22 drivers? Is there any other driver version with which this issue is resolved?

Yes, I tested those latest drivers. It only works on Windows, not on Linux. I’m honestly out of ideas right now.

We are tracking this issue under Bug 200316825. We will keep you posted.

Hi flair666, are you running an application that makes each card draw more than 85W? Can you share that app, or tell me which app we can run to help reproduce this issue?

I’m pretty sure the app in question is EWBF: EWBF's CUDA Zcash miner

The application I’m using is Claymore’s Dual Miner, found here: Claymore's Dual Ethereum AMD+NVIDIA GPU Miner v15.0 (Windows/Linux)
Download here: MEGA

You may uncomment -epool, -ewal and -epsw in config.txt for testing purposes

Hi flair666,
Below is the update from our QA team. It looks like the issue is specific to the GPU or the system you are using. Which vendor did you buy this GPU from? Did you test the same GPU in any other system?

  1. Tried dual GTX 1060 cards with the default BIOS and ran Unigine Heaven and 10 glxgears instances → could not observe the issue.
  2. Flashed the vBIOS to match the customer’s (86.06.13.00.11 and 86.06.39.00.18) and ran Unigine Heaven and 10 glxgears instances → could not observe the issue.
  3. Also updated the game Deus Ex: Mankind Divided and played it on the dual GTX 1060 setup with a power limit of 85W → still could not observe the power limit being exceeded.

Downloaded the app below and tested with this config: (Dell Precision Tower 7910 + 2x GTX 1060 + driver 367.55 + Ubuntu 16.04 + same vBIOS as the customer + Samsung SyncMaster connected via DVI cable to a single GPU)

Untar Claymore’s Dual Ethereum+Decred_Siacoin_Lbry_Pascal AMD+NVIDIA GPU Miner v9.5 - LINUX.tar.gz
Uncommented -epool, -ewal and -epsw from config.txt
Run ./start.bash

In nvidia-smi I could not see much power utilization.

Power utilization (2x GTX 1060 with the default vBIOS 86.06.40.00.00; with the vBIOS matching the customer’s, power usage is lower than the figures below):

Running the customer’s app: 6W / 85W, occasionally 25W / 85W for a moment
Running Deus Ex: Mankind Divided: 56W / 85W
Running FurMark: 82W / 85W, occasionally 91W / 85W for a moment

The only difference is the system motherboard. The customer is using the board below, and we are looking for the same one …

Manufacturer: Gigabyte Technology Co., Ltd.
Product Name: H97-HD3

While I haven’t tested the cards in any other system, everything works just fine on Windows. Both cards are Palit graphics cards.
Meanwhile there’s another thread with people complaining about nvidia-smi compatibility with GTX 1060 cards: https://devtalk.nvidia.com/default/topic/1011804/linux/nvidia-smi-not-fully-supported-on-gtx-1060/
so there’s definitely something wrong.

Power limit and performance state are two different things. Can you share a URL showing the specification of your card?

Yeah, I can reproduce here reliably:

nvidia-smi -pm 1
nvidia-smi -pl 66

run ethminer # everything fine, 66 power limit is respected

startx # everything fine, 66 power limit is respected

stop the x server

66 power limit is respected

startx # f$!ck I can see the power limit as “98W / 66W” for one of my cards

The affected card is:

GPU UUID                        : GPU-52100543-8557-b8ae-4ac7-7d55def8f4c7
Minor Number                    : 0
VBIOS Version                   : 86.06.27.00.9B
MultiGPU Board                  : No

The card that works fine is:

GPU UUID                        : GPU-d163e003-0ead-7c8a-fbfa-e08f19ce189a
Minor Number                    : 1
VBIOS Version                   : 86.06.45.00.60
MultiGPU Board                  : No

And let me add that this is only one of many many issues with nvidia cards in my setup.

The problem is that he uses 2 cards!
You cannot stop the X server: nvidia does not know which card has which screen, so the driver goes nuts.

I have a similar problem:

https://ibb.co/fGZFja

I noticed that when I use

nvidia-xconfig --allow-empty-initial-configuration

then I get this:

https://ibb.co/cDd5Hv

which is OK, except that my MATE desktop shows only a black screen with empty white panels.

The web is full of problems with multi-screen setups, but none with multi-GPU and one screen.

Basically, your problem is that you shut down Xorg, and nvidia-settings depends on it. For some reason unknown to me, nvidia-smi somehow depends on nvidia-settings in Xorg, and you shut it down.
I had the same problem when running Xorg with this option:

nvidia-xconfig --enable-all-gpus

I got 150W/130W.
It has something to do with the monitors attached to the GPUs. If one or more GPUs have no monitor attached, you will have the same problem (or it is only me).

I think that it is a bug but…

P.S.
I also do not understand why overclocking can be done only through PowerMizer in nvidia-settings. I have gone from Windows to Linux, and I am still dependent on X in Linux.

And it’s not a vendor-specific problem! Just tried swapping several cards in my test install and it’s always the first card listed in nvidia-smi (lowest pci id?) that is affected by this. Also note that the issue is not simply incorrect power draw reported but the card is really taking more power (measuring at the wall).

I’m on 375.66 … and please don’t ask me to try with newer drivers I think I’ve done enough investigating work for you already.
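
To test the lowest-PCI-ID guess on other machines, a rough Python sketch like the following could help. It picks the GPU with the lowest PCI bus ID so that it can be compared with the card that ignores its limit. It assumes bus IDs in the hex `domain:bus:device.function` form that nvidia-smi prints, for example from `nvidia-smi --query-gpu=index,pci.bus_id --format=csv,noheader`. The names `bus_id_key`, `parse_bus_csv` and `lowest_bus_gpu` are hypothetical helpers, not part of any NVIDIA tool.

```python
def bus_id_key(bus_id):
    """Turn 'domain:bus:device.function' (hex fields) into a sortable tuple."""
    domain, bus, devfn = bus_id.strip().split(":")
    device, function = devfn.split(".")
    return tuple(int(field, 16) for field in (domain, bus, device, function))


def parse_bus_csv(text):
    """Parse lines like '0, 0000:01:00.0' into (index, bus_id) pairs."""
    rows = []
    for line in text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) == 2 and parts[0].isdigit():
            rows.append((int(parts[0]), parts[1]))
    return rows


def lowest_bus_gpu(rows):
    """Return the nvidia-smi index of the GPU with the lowest bus ID, or None."""
    if not rows:
        return None
    return min(rows, key=lambda row: bus_id_key(row[1]))[0]
```

For the two bus IDs in the first post (0000:01:00.0 and 0000:06:00.0), `lowest_bus_gpu` returns index 0. Comparing the parsed hex fields rather than the raw strings keeps the ordering correct even if the domain field is printed with a different width.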

I was able to work around this issue by ensuring that the integrated Intel GPU is the first GPU in the system as shown with lspci -k | grep -A 2 -E "(VGA|3D)". Some motherboards won’t enumerate the iGPU if there is no monitor connected, so I will use HDMI dummy plugs.
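
A minimal Python sketch of that check, for anyone who wants to script it: it scans lspci output for VGA and 3D controllers and reports whether the first one listed is an Intel device. It assumes the usual lspci line layout (`slot class: vendor device`) and that the vendor string starts with "Intel". `list_gpus` and `first_gpu_is_intel` are illustrative names only.

```python
import re
import subprocess

# Matches lines such as '00:02.0 VGA compatible controller: Intel Corporation ...'.
# Indented detail lines from 'lspci -k' do not match because they start with whitespace.
GPU_LINE = re.compile(r"^(\S+)\s+(VGA compatible controller|3D controller):\s+(.*)$")


def list_gpus(lspci_text):
    """Return (slot, description) for each VGA/3D controller, in listing order."""
    gpus = []
    for line in lspci_text.splitlines():
        match = GPU_LINE.match(line)
        if match:
            gpus.append((match.group(1), match.group(3)))
    return gpus


def first_gpu_is_intel(lspci_text):
    """True if the first listed GPU looks like an Intel device (assumed vendor prefix)."""
    gpus = list_gpus(lspci_text)
    return bool(gpus) and gpus[0][1].startswith("Intel")


def read_lspci():
    """Run 'lspci -k' and return its output as text."""
    return subprocess.run(["lspci", "-k"], capture_output=True, text=True, check=True).stdout
```

Calling `first_gpu_is_intel(read_lspci())` after a reboot shows whether the iGPU was enumerated first, which is the condition the workaround depends on.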

You can use a DVI-to-VGA adapter with a VGA cable. It is analog, so the card only knows that something (a CRT) is plugged in.

Thanks qxsnap!! I get exactly the same behaviour here!

Now we just have to wait a few years until nvidia devs fix that one “if” when iterating over the gpus :(.