I’m running CUDA jobs on a Titan X on a system with no monitor attached. I would like to manually increase the fan speed to maintain reasonable temperatures because I will be running calculations constantly for weeks at a time.
I got the dummy X server hack for setting the fan speed to work, following the instructions here: https://sites.google.com/site/akohlmey/random-hacks/nvidia-gpu-coolness
Basically, I started a dummy X server so that I could use the nvidia-settings command-line controls. Then I set the fan speed:
nvidia-settings -c :0 -a "[gpu:0]/GPUFanControlState=1"
nvidia-settings -c :0 -a "[fan:0]/GPUTargetFanSpeed=70"
and I put the driver in persistence mode:
nvidia-smi -pm 1
Then I quit the dummy X server. Things run great until my first CUDA job finishes. At that point PowerMizer kicks in and drops the GPU from P2 to P8, but it doesn’t go back up to P2 when I start the next job. If I restart the dummy X server, it goes up to P2 again. However, I don’t want to keep the dummy X server running, because it uses memory and I think it also slightly slows down the CUDA calculations.
If I don’t do the hack at all and just run my jobs on the headless machine, PowerMizer behaves sensibly (it throttles up and down appropriately). Of course, this is not good either, because the fan speed stays stuck around 37% and the GPU gets to 80+ C.
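For anyone who wants to reproduce the observations above, here is a minimal sketch of how the performance state, fan speed, and temperature can be checked without X running. It assumes the installed nvidia-smi supports --query-gpu with the pstate, fan.speed, and temperature.gpu fields (I have not confirmed this for every driver release), and summarize_gpu_line is a hypothetical helper name, not part of any NVIDIA tool.

```shell
# Hypothetical helper: turn one CSV line from nvidia-smi into a readable summary.
# Expected input looks like "P8, 37, 80" (performance state, fan %, temperature in C).
summarize_gpu_line() {
    printf '%s\n' "$1" | awk -F', *' '{ printf "pstate=%s fan=%s%% temp=%sC\n", $1, $2, $3 }'
}

# Take a single sample from GPU 0, but only if nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    summarize_gpu_line "$(nvidia-smi -i 0 \
        --query-gpu=pstate,fan.speed,temperature.gpu \
        --format=csv,noheader,nounits)"
fi
```

Running this before a job, during it, and after it finishes should show whether the card is sitting in P8 or has gone back up to P2.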
In a nutshell, it seems that manually setting the fan speed and then exiting X causes PowerMizer to behave incorrectly.
Any ideas? This is with driver version 349.16.