GPU clock stuck at 33 MHz after waking up from suspend

I have an MSI GE702PC with an NVIDIA GTX 850M graphics card, running Linux Mint (17.3, 18.1 and 18.2) with nvidia-384.69 (I have tried almost every earlier release as well). Everything works fine until the system wakes up from suspend. Normally the GPU clock is set to maximum via /etc/X11/xorg.conf with
Option "RegistryDwords" "PowerMizerEnable=0x1; PerfLevelSrc=0x2222; PowerMizerDefaultAC=0x1"

But after waking up from suspend the clock is stuck at 33 MHz (fps drops from around 3800 to around 78) and I can’t change it. My kernel version is 10 (lower versions cause flickering), and I also tried 4.4… and 4.8…; nothing changed.

When I dug deeper, I saw that this is a throttling problem. Before suspending, everything works great and the throttle reasons are:

Clocks Throttle Reasons
    Idle                        : Not Active
    Applications Clocks Setting : Not Active
    SW Power Cap                : Not Active
    HW Slowdown                 : Not Active
    Sync Boost                  : Not Active
    Unknown                     : Not Active

But after suspend they become:

Clocks Throttle Reasons
    Idle                        : Active
    Applications Clocks Setting : Not Active
    SW Power Cap                : Not Active
    HW Slowdown                 : Active
    Sync Boost                  : Not Active
    Unknown                     : Not Active
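
For anyone comparing the two states across several reports, here is a minimal Python sketch that lists the active throttle reasons from text in the format shown above. It assumes the text comes from `nvidia-smi -q -d PERFORMANCE`; the function name `active_throttle_reasons` and the `AFTER_SUSPEND` constant are illustrative, and the sample is copied from the post-suspend output above.

```python
from typing import List


def active_throttle_reasons(report: str) -> List[str]:
    """Return the throttle reasons marked 'Active' in nvidia-smi style text."""
    active = []
    in_section = False
    for line in report.splitlines():
        stripped = line.strip()
        if stripped.startswith("Clocks Throttle Reasons"):
            in_section = True
            continue
        if not in_section:
            continue
        if ":" not in stripped:
            # A blank line or a new heading ends the section (simplification).
            in_section = False
            continue
        name, state = stripped.split(":", 1)
        if state.strip() == "Active":
            active.append(name.strip())
    return active


# Sample copied from the post-suspend output in this thread.
AFTER_SUSPEND = """\
Clocks Throttle Reasons
    Idle                        : Active
    Applications Clocks Setting : Not Active
    SW Power Cap                : Not Active
    HW Slowdown                 : Active
    Sync Boost                  : Not Active
    Unknown                     : Not Active
"""

print(active_throttle_reasons(AFTER_SUSPEND))  # ['Idle', 'HW Slowdown']
```

To use it on a live system, save the command output to a file and pass the file contents to the function.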

I know there is no problem with my hardware: I switched from Windows to Linux about 3 years ago, and this problem started with that switch.

I don’t expect anyone to officially suggest tinkering with the NVIDIA BIOS, but I’m brave and open to all suggestions.

UPDATE

* This problem also occurs when I unplug the power cable. When I switch from AC to battery, the clock suddenly drops to 33 MHz with the same throttle issue, and it stays there even after I plug the AC cable back in.

* Everything is fine with the open-source driver and the Intel graphics card. But I don’t get a smooth experience with the open-source driver, which is why I insist on the NVIDIA driver.

nvidia-bug-report.log.gz (154 KB)

Please remove the RegistryDwords line, reboot, suspend/resume then run (as root) nvidia-bug-report.sh and attach the output file to your post.

Just edited the topic.

Attach the .tar.gz file it produces, not the text.

Done. Under the throttle section:

Clocks Throttle Reasons
    Idle                        : Not Active
    Applications Clocks Setting : Not Active
    SW Power Cap                : Active
    HW Slowdown                 : Active
    Sync Boost                  : Not Active
    SW Thermal Slowdown         : Not Active

Is there any power management problem?
nvidia-bug-report.log.gz (154 KB)

Nothing unusual, besides the GPU stuck at 33MHz.
Can you please reboot, then unplug power and run nvidia-bug-report.sh again while unplugged?

Attached the file. What is your opinion on the throttling?
nvidia-bug-report.log.gz (142 KB)

Looks like ACPI flaws in your BIOS that the NVIDIA driver doesn’t like, e.g.

[   60.442745] [Firmware Bug]: battery: (dis)charge rate invalid.

The question is still at which level.
For a start, you could try this in xorg.conf:

Option "ConnectToAcpid" "0"

and see if unplugging power still has an effect.

That’s interesting :). We still have the same problem after suspend. But when I unplug the cable and plug it in again, it is better, though not optimal: 215 MHz unplugged, 238 MHz plugged in. The bug report file is attached.

Now only the HW Slowdown throttle is active…

I really wonder how you decided to add this option when you saw the bug. Is there a list, or is it experience?
nvidia-bug-report.log.gz (147 KB)

There’s a list:
http://us.download.nvidia.com/XFree86/Linux-x86/304.43/README/xconfigoptions.html

Obviously, this problem goes a bit deeper.
Now, this might be a bit dangerous, so be sure you know how to boot into safe mode and revert:
In the folder
/etc/modprobe.d
look for nvidia.conf (or the like), edit it, find the line

options nvidia NVreg_Devic...

and add the option

NVreg_RegisterForACPIEvents=0

reboot, and check.

nvidia-bug-report.log.gz (146 KB)

No need to enter those DeviceFile options, those settings are to set the access rights to the device files in /dev. I think Mint/Ubuntu uses udev rules to accomplish that. It differs between distributions.
In your case, the line

options nvidia_384 NVreg_RegisterForACPIEvents=0

at the end of nvidia-graphics-drivers.conf would suffice.
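
To confirm after the reboot that the option was actually picked up, the loaded module's parameters can be inspected. This is a minimal Python sketch, assuming the driver exposes them as `Key: Value` lines in /proc/driver/nvidia/params and that the key is named `RegisterForACPIEvents`. Both are assumptions about this driver version, so check that the file exists on your system. The helper names are illustrative.

```python
from pathlib import Path
from typing import Dict, Optional

# Assumed location of the loaded module's parameters.
PARAMS_FILE = Path("/proc/driver/nvidia/params")


def parse_params(text: str) -> Dict[str, str]:
    """Parse 'Key: Value' lines into a dictionary."""
    params = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            params[key.strip()] = value.strip()
    return params


def acpi_events_setting(path: Path = PARAMS_FILE) -> Optional[str]:
    """Return the RegisterForACPIEvents value, or None if it is unavailable."""
    if not path.is_file():
        return None
    return parse_params(path.read_text()).get("RegisterForACPIEvents")


print("RegisterForACPIEvents =", acpi_events_setting())
```

If this prints None, either the module is not loaded or the file is laid out differently on your driver version.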

Same behavior after power switching. Adding the bug report.

What we have so far:
* We fixed one throttle problem after power switching. Now only one throttle reason is active (max 238 MHz).
* But we still have the same problem (33 MHz and two throttle reasons active) after waking up from suspend.

/etc/modprobe.d $ cat nvidia-graphics-drivers.conf 
# This file was installed by nvidia-384
# Do not edit this file manually

blacklist nouveau
blacklist lbm-nouveau
blacklist nvidia-current
blacklist nvidia-173
blacklist nvidia-96
blacklist nvidia-current-updates
blacklist nvidia-173-updates
blacklist nvidia-96-updates
blacklist nvidia-384-updates
alias nvidia nvidia_384
alias nvidia-uvm nvidia_384_uvm
alias nvidia-modeset nvidia_384_modeset
alias nvidia-drm nvidia_384_drm
alias nouveau off
alias lbm-nouveau off

options nvidia_384_drm modeset=0
options nvidia_384 NVreg_RegisterForACPIEvents=0

nvidia-bug-report.log.gz (147 KB)

Hi.

I’m noticing something similar on my Dell Precision M3520 with a Quadro M620: pclk gets stuck at 254 MHz after switching to battery or after sleep/wakeup.

# gpu   pwr  temp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     %     %     %     %   MHz   MHz
    0     -    67    75    25     0     0  2505  1019
    0     -    68    92    29     0     0  2505  1019
    0     -    69    99    31     0     0  2505  1019
    0     -    69    99    33     0     0  2505  1019
    0     -    69    97    31     0     0  2505  1019
    0     -    68    52    12     0     0  2505  1019
    0     -    69    84    27     0     0  2505  1019
    0     -    69    98    32     0     0  2505  1019
    0     -    69    96    33     0     0  2505  1019
    0     -    69   100    32     0     0  2505  1019
    0     -    69    99    34     0     0  2505  1019
    0     -    69    99    35     0     0  2505  1019
    0     -    70    94    30     0     0  2505  1019
    0     -    69    83    27     0     0  2505  1019
    0     -    70    85    27     0     0  2505  1019
    0     -    70    98    31     0     0  2505  1019
    0     -    69    83    20     0     0  2505  1019
    0     -    66     0     0     0     0  2505  1019
    0     -    67    84    20     0     0  2505  1019
    0     -    68    96    25     0     0  2505  1019
    0     -    69    63    21     0     0  2505  1019
    0     -    70   100    32     0     0  2505  1019
    0     -    70   100    35     0     0  2505  1019
    0     -    69    86    28     0     0  2505  1019
    0     -    68   100    16     0     0  2505   254 <-- AC disconnected
    0     -    67   100     8     0     0  2505   254
    0     -    67   100    15     0     0  2505   254
    0     -    67   100     8     0     0  2505   254
    0     -    67   100    15     0     0  2505   254
    0     -    67   100     8     0     0  2505   254
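
A drop like this can be picked out of a longer log automatically. Below is a minimal Python sketch, assuming the text is in the layout above (it looks like `nvidia-smi dmon` output). The function names and the 50% threshold are my own choices for illustration, and the sample rows are copied from the log above.

```python
from typing import Dict, List, Optional, Tuple


def parse_dmon(text: str) -> List[Dict[str, str]]:
    """Turn dmon-style text into one dict per sample row."""
    columns = None
    rows = []
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            if columns is None:
                # The first header line carries the column names.
                columns = line.lstrip().lstrip("#").split()
            continue
        fields = line.split()
        if columns is None or len(fields) < len(columns):
            continue
        # zip() stops at the last column, so trailing notes are ignored.
        rows.append(dict(zip(columns, fields)))
    return rows


def first_clock_drop(rows: List[Dict[str, str]], column: str = "pclk",
                     factor: float = 0.5) -> Optional[Tuple[int, int, int]]:
    """Return (row index, clock before, clock after) for the first sharp drop."""
    for i in range(1, len(rows)):
        before = int(rows[i - 1][column])
        after = int(rows[i][column])
        if after < before * factor:
            return i, before, after
    return None


SAMPLE = """\
# gpu   pwr  temp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     %     %     %     %   MHz   MHz
    0     -    70   100    35     0     0  2505  1019
    0     -    69    86    28     0     0  2505  1019
    0     -    68   100    16     0     0  2505   254 <-- AC disconnected
    0     -    67   100     8     0     0  2505   254
"""

print(first_clock_drop(parse_dmon(SAMPLE)))  # (2, 1019, 254)
```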

I’ve tried passing the NVreg_RegisterForACPIEvents=0 option, but journalctl says that the nvidia module is connecting to acpid. I’m using Bumblebee.
nvidia-bug-report-just-after-reboot.log.gz (140 KB)
nvidia-bug-report-when-app-is-working.log.gz (177 KB)
nvidia-bug-report-app-is-working-on-batt.log.gz (180 KB)
nvidia-bug-report-app-is-working-on-ac-clock-stuck.log.gz (183 KB)

It does not make any sense. Lots of MSI laptops have this problem, and nothing is being done to fix it.

https://superuser.com/questions/893482/gtx-860m-stuck-to-33-mhz-after-suspend-on-linux
https://askubuntu.com/questions/790535/nvidia-gt860m-gpu-clock-stuck-at-33mhz-after-resume-from-suspend-14-04-16-04

https://devtalk.nvidia.com/default/topic/831369/?comment=4530202

Can’t NVIDIA simply take an MSI laptop, reproduce this issue, and fix it?

Up?

Just for info, does restarting just the X server return the GPU to normal operation, as someone in those links mentioned?

No, it does not return the GPU to normal operation.

Could you attach the output of
sudo lspci -vvv
before and after pulling the plug?
You could also try kernel 4.17; it has some ACPI fixes that might be relevant.
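
For the lspci comparison, the two captures can be reduced to just the lines that differ. This is a minimal Python sketch using the standard difflib module. The function name `changed_lines` is illustrative, and the `BEFORE`/`AFTER` strings are placeholders, not real lspci output.

```python
import difflib
from typing import List


def changed_lines(before: str, after: str) -> List[str]:
    """Return only the removed ('- ') and added ('+ ') lines between two texts."""
    diff = difflib.ndiff(before.splitlines(), after.splitlines())
    # ndiff also emits unchanged ('  ') and hint ('? ') lines; drop those.
    return [line for line in diff if line.startswith(("- ", "+ "))]


# Placeholder text standing in for the two captured outputs.
BEFORE = "line one\nstate: A\nline three"
AFTER = "line one\nstate: B\nline three"

for line in changed_lines(BEFORE, AFTER):
    print(line)
```

In practice you would redirect each lspci run to its own file (for example lspci-ac.txt and lspci-battery.txt, names chosen here only for illustration), read both files, and pass their contents to the function.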