Multi-GPU job fails: system hangs randomly

Hi,

My system has 8 GTX 1080 GPUs. When I run a multi-GPU job, the system randomly hangs.

Linux gauss 3.13.0-86-lowlatency #131-Ubuntu

Kernel log:

Aug 21 16:37:23 gauss kernel: [10112.892192] Hardware name: Supermicro SYS-4028GR-TR/X10DRG-O+-CPU, BIOS 2.0 12/28/2015
Aug 21 16:37:23 gauss kernel: [10112.892193] 0000000000000000 ffff88039b439d88 ffffffff8172e8b5 ffff881d50728000
Aug 21 16:37:23 gauss kernel: [10112.892199] 0000000000000000 ffff88039b439df8 ffffffffa02ed307 ffff881024ace080
Aug 21 16:37:23 gauss kernel: [10112.892203] ffffc900339a1008 00000001ffffffff 0000092b814852b6 000009327d6bfee2
Aug 21 16:37:23 gauss kernel: [10112.892206] Call Trace:
Aug 21 16:37:23 gauss kernel: [10112.892213] [] dump_stack+0x64/0x82
Aug 21 16:37:23 gauss kernel: [10112.892235] [] fetch_fault_buffer_entries+0x1e7/0x240 [nvidia_uvm]
Aug 21 16:37:23 gauss kernel: [10112.892246] [] uvm8_isr_bottom_half+0xaf/0x960 [nvidia_uvm]
Aug 21 16:37:23 gauss kernel: [10112.892253] [] _main_loop+0x92/0x180 [nvidia_uvm]
Aug 21 16:37:23 gauss kernel: [10112.892259] [] ? nvstatusToString+0x50/0x50 [nvidia_uvm]
Aug 21 16:37:23 gauss kernel: [10112.892264] [] kthread+0xd2/0xf0
Aug 21 16:37:23 gauss kernel: [10112.892267] [] ? kthread_create_on_node+0x1c0/0x1c0
Aug 21 16:37:23 gauss kernel: [10112.892270] [] ret_from_fork+0x58/0x90
Aug 21 16:37:23 gauss kernel: [10112.892272] [] ? kthread_create_on_node+0x1c0/0x1c0
Aug 21 16:37:53 gauss kernel: [10142.892189] CPU: 14 PID: 46725 Comm: ID 2: GeForce G Tainted: P OX 3.13.0-86-lowlatency #131-Ubuntu
Aug 21 16:37:53 gauss kernel: [10142.892192] Hardware name: Supermicro SYS-4028GR-TR/X10DRG-O+-CPU, BIOS 2.0 12/28/2015
Aug 21 16:37:53 gauss kernel: [10142.892194] 0000000000000000 ffff88039b439d88 ffffffff8172e8b5 ffff881d50728000
Aug 21 16:37:53 gauss kernel: [10142.892198] 0000000000000000 ffff88039b439df8 ffffffffa02ed307 ffff881024ace080
Aug 21 16:37:53 gauss kernel: [10142.892202] ffffc900339a1008 00000001ffffffff 0000092b814852b6 00000939798faaf6
Aug 21 16:37:53 gauss kernel: [10142.892205] Call Trace:
Aug 21 16:37:53 gauss kernel: [10142.892213] [] dump_stack+0x64/0x82
Aug 21 16:37:53 gauss kernel: [10142.892239] [] fetch_fault_buffer_entries+0x1e7/0x240 [nvidia_uvm]
Aug 21 16:37:53 gauss kernel: [10142.892248] [] uvm8_isr_bottom_half+0xaf/0x960 [nvidia_uvm]
Aug 21 16:37:53 gauss kernel: [10142.892255] [] _main_loop+0x92/0x180 [nvidia_uvm]
Aug 21 16:37:53 gauss kernel: [10142.892261] [] ? nvstatusToString+0x50/0x50 [nvidia_uvm]
Aug 21 16:37:53 gauss kernel: [10142.892267] [] kthread+0xd2/0xf0
Aug 21 16:37:53 gauss kernel: [10142.892269] [] ? kthread_create_on_node+0x1c0/0x1c0
Aug 21 16:37:53 gauss kernel: [10142.892272] [] ret_from_fork+0x58/0x90
Aug 21 16:37:53 gauss kernel: [10142.892274] [] ? kthread_create_on_node+0x1c0/0x1c0
Aug 21 16:38:23 gauss kernel: [10172.892188] CPU: 42 PID: 46725 Comm: ID 2: GeForce G Tainted: P OX 3.13.0-86-lowlatency #131-Ubuntu
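
The same nvidia_uvm trace seems to repeat about every 30 seconds. Below is a minimal sketch of how that interval could be checked from the log. The helper names `call_trace_times` and `intervals` are my own for illustration and not part of any NVIDIA tool; the two sample lines are copied from the log above.

```python
import re

# Two "Call Trace:" marker lines copied from the kernel log above.
LOG = """\
Aug 21 16:37:23 gauss kernel: [10112.892206] Call Trace:
Aug 21 16:37:53 gauss kernel: [10142.892205] Call Trace:
"""

def call_trace_times(log_text):
    """Return the kernel uptime stamp (seconds) of every 'Call Trace:' line."""
    pattern = re.compile(r"kernel: \[\s*(\d+\.\d+)\] Call Trace:")
    return [float(m.group(1)) for m in pattern.finditer(log_text)]

def intervals(times):
    """Return the gaps between consecutive timestamps, rounded to milliseconds."""
    return [round(later - earlier, 3) for earlier, later in zip(times, times[1:])]

times = call_trace_times(LOG)
print(intervals(times))
```

To run it against the full log, `LOG` could be replaced with the contents of the syslog file, assuming the lines keep the same format as above.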

nvidia-bug-report.log

Best!
feiteng