396.24 + 1070 Max-Q + External HDMI Monitor = 100% kworker thread
I just received a SAGER NP8851 (CLEVO P950ER) [http://www.xoticpc.com/sager-np8851-clevo-p950er.html].

After installing the latest drivers, everything works great, right up until I plug in my external HDMI monitor.

As soon as I plug in the monitor, one kworker process pegs at 100% of one core, and an 'irq/###-nvidia' process consistently uses more CPU than normal (>6% most of the time). The kworker process stays stuck at 100% even after I unplug the monitor, and it can't be killed. Other than this one pegged process, everything, including the external monitor, seems to be working fine.

I've used this same monitor with other systems without this issue.

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
   83 root      20   0       0      0      0 R  97.0  0.0   1:05.58 kworker/0:1
 2402 root      20   0  385660  93996  49080 S   2.0  0.3   0:09.14 Xorg
 2656 evil      20   0 3225828 102348  82324 S   2.0  0.3   0:08.00 kwin_x11
 2407 root     -51   0       0      0      0 S   1.3  0.0   0:04.39 irq/130-nvidia
 2680 evil      20   0 4242472 321128 155512 S   1.3  1.0   0:12.16 plasmashell
 3664 evil      20   0  790744  21756  18412 S   1.0  0.1   0:03.40 conky
 3993 evil      20   0  607524 162948  89496 S   1.0  0.5   0:05.00 steam
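
For anyone hitting something similar, you can peek at what a pegged kworker is doing by sampling its kernel stack from /proc. A rough sketch (assumes root; PID 83 is the stuck thread from the top output above, substitute your own):

# Sample the stuck kworker's kernel stack a few times, one second apart
for i in 1 2 3; do
    sudo cat /proc/83/stack
    echo ---
    sleep 1
done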


I tried rolling back to the 390 drivers from the official Ubuntu repos, but they produced the same issue. I haven't tried anything older yet, as I haven't researched how far back the 1070 has been supported by the Linux drivers.
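
For reference, the rollback was done with the stock Ubuntu packaging, roughly like this (package names are from memory and may differ on your release):

# See which driver packages Ubuntu offers for this GPU
ubuntu-drivers devices
# Swap the current NVIDIA packages for the 390 series
sudo apt purge 'nvidia-*'
sudo apt install nvidia-driver-390
sudo reboot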

I'm attaching a bug report. Nothing in dmesg or the other logs jumps out at me to explain the problem.

Also, here's what I see when I force a backtrace for the maxed CPU:

[ 1447.277376] NMI backtrace for cpu 0
[ 1447.277378] CPU: 0 PID: 83 Comm: kworker/0:1 Tainted: P OE 4.15.0-20-generic #21-Ubuntu
[ 1447.277378] Hardware name: Notebook P95xER /P95xER , BIOS 1.05.04dRLS2 04/25/2018
[ 1447.277381] Workqueue: kacpid acpi_os_execute_deferred
[ 1447.277382] RIP: 0010:_raw_spin_unlock_irqrestore+0x1b/0x20
[ 1447.277382] RSP: 0018:ffffa8cb034b7ba0 EFLAGS: 00000293
[ 1447.277383] RAX: 0000000000000293 RBX: ffff894d9884f2d0 RCX: 0000000180330029
[ 1447.277384] RDX: 0000000000000001 RSI: 0000000000000293 RDI: 0000000000000293
[ 1447.277384] RBP: ffffa8cb034b7ba8 R08: ffff894d9cd51550 R09: 0000000180330029
[ 1447.277385] R10: ffffa8cb034b7b90 R11: ffff894d9cd88000 R12: 0000000000000002
[ 1447.277385] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
[ 1447.277386] FS: 0000000000000000(0000) GS:ffff894d9d200000(0000) knlGS:0000000000000000
[ 1447.277386] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1447.277387] CR2: 00007f8ee7d9c000 CR3: 000000055540a005 CR4: 00000000003606f0
[ 1447.277387] Call Trace:
[ 1447.277389] ? acpi_os_release_lock+0xe/0x10
[ 1447.277390] acpi_ut_update_ref_count.part.1+0x51/0x6e1
[ 1447.277391] acpi_ut_update_object_reference+0x113/0x20e
[ 1447.277392] acpi_ut_add_reference+0x64/0x6a
[ 1447.277394] acpi_ex_resolve_node_to_value+0x23f/0x46c
[ 1447.277395] acpi_ex_resolve_to_value+0x391/0x43f
[ 1447.277397] acpi_ds_evaluate_name_path+0xb2/0x168
[ 1447.277398] ? acpi_db_single_step+0x1f/0x29d
[ 1447.277399] acpi_ds_exec_end_op+0x120/0x736
[ 1447.277401] acpi_ps_parse_loop+0x918/0x9c2
[ 1447.277425] ? acpi_ut_remove_reference+0x72/0x79
[ 1447.277426] acpi_ps_parse_aml+0x1ac/0x4bd
[ 1447.277427] acpi_ps_execute_method+0x1fa/0x2bc
[ 1447.277429] acpi_ns_evaluate+0x2ee/0x435
[ 1447.277430] acpi_ev_asynch_execute_gpe_method+0xbd/0x159
[ 1447.277431] acpi_os_execute_deferred+0x1a/0x30
[ 1447.277433] process_one_work+0x1de/0x410
[ 1447.277434] worker_thread+0x32/0x410
[ 1447.277435] kthread+0x121/0x140
[ 1447.277436] ? process_one_work+0x410/0x410
[ 1447.277437] ? kthread_create_worker_on_cpu+0x70/0x70
[ 1447.277438] ret_from_fork+0x35/0x40
[ 1447.277439] Code: 89 d0 5d c3 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 c6 07 00 0f 1f 40 00 48 89 f7 57 9d 0f 1f 44 00 00 5d <c3> 0f 1f 40 00 0f 1f 44 00 00 55 48 89 e5 c6 07 00 48 89 f7 57
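
In case anyone wants to capture the same kind of trace: the backtrace above came from the magic SysRq "show backtrace of active CPUs" trigger, roughly like this (needs root, and sysrq must be enabled):

# Enable the magic SysRq interface, then dump backtraces of active CPUs
echo 1 | sudo tee /proc/sys/kernel/sysrq
echo l | sudo tee /proc/sysrq-trigger
# The backtraces land in the kernel log
dmesg | tail -n 60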


Any help getting to the bottom of this is appreciated.

#1
Posted 05/11/2018 11:01 AM   
Answer Accepted by Original Poster

I found a fix: I upgraded to the mainline 4.17 kernel, and the problem seems to have gone away.
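
A rough sketch of one way to do that upgrade, assuming the prebuilt mainline debs from https://kernel.ubuntu.com/~kernel-ppa/mainline/ (download the amd64 headers, image, and modules .debs for v4.17; exact file names vary):

# Install all the downloaded v4.17 debs in one shot, then reboot into the new kernel
sudo dpkg -i linux-*4.17*.deb
sudo reboot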

#2
Posted 05/11/2018 07:55 PM   