GeForce/Quadro power management on a headless Linux machine (without X server)

Hi,

I’m struggling to find a way to control Nvidia power management on a Linux machine that runs headless (i.e. without an X server) with a Quadro K4000 or a GTX Titan.

Problem:
Benchmarks show that, starting from idle, CUDA compute performance is noticeably slow for tens of milliseconds before “normal” performance is reached. nvidia-settings or nvidia-smi shows that the cards are initially in the minimum-clock state and take some time to switch to the full-clock state.

This bothers me because I’m developing a CUDA application with real-time constraints and I want to have deterministic timing behaviour (as far as possible using CUDA…).

What I want to do:
Disable the clock reduction to have immediate full compute performance.

What I tried so far:
I know that this can be done e.g. with

nvidia-settings -a [gpu:n]/GpuPowerMizerMode=1

however nvidia-settings uses the NV-CONTROL X extension and hence requires a running X server.

Then I tried nvidia-smi, but without success. Maybe those commands are supported only on Tesla cards and not on desktop (Quadro, GeForce) cards?

$ nvidia-smi -i 0 --application-clocks=2808,810
Setting applications clocks is not supported for GPU 0000:42:00.0.
Treating as warning and moving on.
All done.
$ nvidia-smi -i 0 --gom=1
GOM mode cannot be changed on GPU 0000:42:00.0.
Treating as warning and moving on.
All done.
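
To see which performance state the card is actually in while a benchmark starts, the state and clocks can be polled from a second shell without X. The following is a minimal sketch, assuming the installed nvidia-smi supports `--query-gpu` with the fields `timestamp`, `pstate`, `clocks.sm` and `clocks.mem` (field availability depends on driver version and GPU; `nvidia-smi --help-query-gpu` lists what is supported). The helper name `gpu_poll_cmd` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print an nvidia-smi command that reports the
# performance state and current clocks of one GPU at a fixed interval.
#   $1 = GPU index, $2 = polling interval in seconds
gpu_poll_cmd() {
    printf 'nvidia-smi -i %s --query-gpu=timestamp,pstate,clocks.sm,clocks.mem --format=csv -l %s\n' "$1" "$2"
}

# Show the command for GPU 0 with a 1 s polling interval
gpu_poll_cmd 0 1
```

Running the printed command while the CUDA application goes from idle to busy makes the clock switch visible. A 1 s interval is too coarse to resolve the ramp of tens of milliseconds itself, but it is enough to compare the idle state with the loaded state.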

So I was looking for a way to influence power management via kernel module parameters. In /usr/src/nvidia-340-340.29/nv-reg.h I discovered:

/*
 * Option: RegistryDwords
 *
 * Description:
 *
 * This option accepts a semicolon-separated list of key=value pairs. Each
 * key name is checked against the table of static options; if a match is
 * found, the static option value is overridden, but invalid options remain
 * invalid. Pairs that do not match an entry in the static option table
 * are passed on to the RM directly.
 *
 * Format:
 *
 *  NVreg_RegistryDwords="<key=value>;<key=value>;..."
 */

#define __NV_REGISTRY_DWORDS RegistryDwords

One solution would be to set the following in xorg.conf:

Option "RegistryDwords" "PowerMizerEnable=0x1; PerfLevelSrc=0x2222; PowerMizerDefault=0x1;PowerMizerDefaultAC=0x1"

My idea was to set these registry dwords as module parameters on the kernel command line instead, so I appended the following to the linux line in the GRUB menu:

linux [...] NVreg_RegistryDwords="PowerMizerEnable=0x1;PerfLevelSrc=0x2222;PowerMizerDefault=0x1;PowerMizerDefaultAC=0x1"

But unfortunately, this had no effect.
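
One thing worth checking here: the kernel only hands a command-line parameter to a module when it is written with the module name as prefix (module.parameter=value), and the same setting can alternatively go into a modprobe configuration file as an `options` line. The sketch below prints both forms. The module name `nvidia` is an assumption (some distribution packages use a versioned name; `lsmod` shows the real one), and the helper names are hypothetical. Whether the RM honours the PowerMizer keys without X is not something this sketch can confirm.

```shell
#!/bin/sh
# Hypothetical helper: print a modprobe "options" line carrying the registry dwords.
#   $1 = kernel module name (assumed; verify with lsmod), $2 = key=value list
nvreg_options_line() {
    printf 'options %s NVreg_RegistryDwords="%s"\n' "$1" "$2"
}

# Hypothetical helper: print the same setting as a kernel command-line
# argument, prefixed with the module name.
nvreg_cmdline_arg() {
    printf '%s.NVreg_RegistryDwords="%s"\n' "$1" "$2"
}

# Key=value list taken from the xorg.conf option above, without spaces
DWORDS='PowerMizerEnable=0x1;PerfLevelSrc=0x2222;PowerMizerDefault=0x1;PowerMizerDefaultAC=0x1'

nvreg_options_line nvidia "$DWORDS"   # line for a file under /etc/modprobe.d/
nvreg_cmdline_arg nvidia "$DWORDS"    # argument for the linux line in GRUB
```

If the driver exposes /proc/driver/nvidia/params (an assumption for this driver version), reading that file after the module has loaded should show whether the RegistryDwords string actually arrived.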

Any more ideas on how to disable clock throttling without running an X server?

Try setting persistence mode using nvidia-smi.

I’m not sure if this is supported on GeForce/Quadro.

Hi,

I can activate persistence mode using

nvidia-persistenced --persistence-mode -u <myusername>

and I can see that driver persistence is activated using nvidia-smi.

I understand that the Nvidia driver being persistent might be a prerequisite for power management being active. However, I don’t see how this helps with resolving my problem, because I still can’t modify the power management settings…

As I set the persistence mode from the command line after system boot, I suppose that the nvidia kernel module is loaded at boot time, then unloaded again because it is unused without an active X server, and loaded again when I enable persistence mode. Might the driver and/or device be re-initialized then? Unfortunately, I didn’t find any Nvidia documentation on that topic.

If persistence mode is active, it should bring the card up to the P0 power state. That was intended as a suggestion to address the slow start from idle that you described.

It is not a full answer to all the questions you posed. However it should have a noticeable effect on the “starting from idle” issue that seemed to be your central focus.
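
Whether persistence mode really leaves an idle card in P0 can be checked directly. The following is a minimal sketch, assuming nvidia-smi accepts `-pm 1` on this card (it needs root) and supports the `pstate` query field; the helper names `is_p0` and `check_idle_pstate` are hypothetical, and the 5 s settle time is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the given performance-state string is P0.
is_p0() {
    [ "$1" = "P0" ]
}

# Hypothetical helper: enable persistence mode on one GPU (needs root), wait,
# then report which performance state the idle card sits in.
#   $1 = GPU index
check_idle_pstate() {
    nvidia-smi -i "$1" -pm 1 || return 1
    sleep 5   # arbitrary settle time so the card can fall back to its idle state
    state=$(nvidia-smi -i "$1" --query-gpu=pstate --format=csv,noheader)
    if is_p0 "$state"; then
        echo "GPU $1 idles in P0"
    else
        echo "GPU $1 idles in $state, not P0"
    fi
}
```

Calling `check_idle_pstate 0` as root with no CUDA work running shows whether the suggestion has the intended effect on GPU 0.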