Titan V, Ubuntu 16.04LTS and 387.34 driver crashes badly
I would try updating the motherboard BIOS and using another PSU. Also, try disabling any overclocking you have enabled (CPU/RAM/PCI-E). I would even recommend resetting the BIOS settings to safe defaults.

Titan V is quite a beast, which warrants top-quality components.

Artem S. Tashkinov
Linux and Open Source advocate

#16
Posted 01/03/2018 11:19 AM   
Already updated BIOS, disabled all OC, cleared CMOS to basic settings. Tried swapping PSU. Symptoms remain the same. Windows is rock solid. (Mathworks has confirmed that the initial delay is to be expected - it takes them that long to compile and load a binary to the GPU when it's a model they don't have in their database). Linux is flaky. New datum - Linux driver frequently returns NaN as available GPU memory on first attach. Windows driver does not exhibit this behavior. I'm going to try swapping slots for the GPU. I can't use built-in graphics for display - the motherboard doesn't have it. Seems silly to do that when the Windows system works just fine. Switching slots may be a good thing to try. I may even swap it into a different computer just as a driver check.

#17
Posted 01/03/2018 03:57 PM   
390 beta driver is out renewing Titan V support.

#18
Posted 01/04/2018 05:14 PM   
I'm not finding that beta driver. Can you provide a link? It's not showing up on the Advanced Driver Search when looking for English language, Ubuntu 16.04 64-bit, Titan V, beta

#19
Posted 01/04/2018 07:19 PM   
https://devtalk.nvidia.com/default/topic/533434/linux/current-graphics-driver-releases/
Got it. Hate doing .run installs, but did it anyway. Exact same behavior. I'm torn between giving up and sticking with <ack> Windows, which at least works, or giving up and returning the board as unfit for use.

#21
Posted 01/04/2018 08:06 PM   
Can you check if Spread Spectrum is enabled in bios and disable it?
If that's not helping, try using kernel parameters
clocksource=hpet lapic=notscdeadline
Edit: https://communities.intel.com/thread/119716
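
For anyone unsure how to apply those kernel parameters on Ubuntu, here is a minimal sketch. It assumes the stock GRUB setup, where boot options live in /etc/default/grub and are applied with update-grub. The function names add_kernel_params and has_kernel_param are made up for this illustration.

```shell
# Sketch only: the helper names below are hypothetical, not part of any tool.

# Append extra parameters to GRUB_CMDLINE_LINUX_DEFAULT in a GRUB defaults file.
# A backup copy is kept next to the file as FILE.bak.
# usage: add_kernel_params FILE "param1 param2"
add_kernel_params() {
    cp "$1" "$1.bak" &&
        sed "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 $2\"/" "$1.bak" > "$1"
}

# Succeed if PARAM appears as a whole word on the kernel command line.
# usage: has_kernel_param PARAM [CMDLINE_FILE]   (defaults to /proc/cmdline)
has_kernel_param() {
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}

# Typical use, as root:
#   add_kernel_params /etc/default/grub "clocksource=hpet lapic=notscdeadline"
#   update-grub        # regenerate the GRUB configuration, then reboot
# After the reboot:
#   has_kernel_param clocksource=hpet && echo "parameter is active"
#   cat /sys/devices/system/clocksource/clocksource0/current_clocksource
```

Checking /proc/cmdline after the reboot confirms the parameters actually reached the kernel, which is worth doing before concluding that they make no difference.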

#22
Posted 01/04/2018 09:42 PM   
Basically the same problem here with Fedora 27, Titan V, and the 387.34 driver. It hangs randomly, and when I SSH into my desktop from my laptop and check top, Xorg is using 100% CPU. I attached to the process with gdb as root and got the following backtrace.

(gdb) bt
#0 0x00007f9bfc116877 in ioctl () from /lib64/libc.so.6
#1 0x00007f9bf691b7d1 in ?? () from /usr/lib64/xorg/modules/drivers/nvidia_drv.so
#2 0x00007f9bf691710a in ?? () from /usr/lib64/xorg/modules/drivers/nvidia_drv.so
#3 0x00007f9bf6919e79 in ?? () from /usr/lib64/xorg/modules/drivers/nvidia_drv.so
#4 0x00007f9bf68ad26b in ?? () from /usr/lib64/xorg/modules/drivers/nvidia_drv.so
#5 0x00007f9bf6e2ab44 in ?? () from /usr/lib64/xorg/modules/drivers/nvidia_drv.so
#6 0x00000000020e9660 in ?? ()
#7 0x00000000020f9278 in ?? ()
#8 0x00000000028bb720 in ?? ()
#9 0x00000020f6919e85 in ?? ()
#10 0x0000000000000080 in ?? ()
#11 0x000000000219f650 in ?? ()
#12 0x0000000000000553 in ?? ()
#13 0x000000000000049d in ?? ()
#14 0x000000000219b4c0 in ?? ()
#15 0x00000000020d85a0 in ?? ()
#16 0x00000000027219d0 in ?? ()
#17 0x00000000004ba187 in xf86ScreenSetCursor ()
#18 0x00000000004ba484 in xf86SetCursor ()
#19 0x00000000004b8ec0 in xf86CursorSetCursor ()
#20 0x000000000058580a in miPointerUpdateSprite ()
#21 0x0000000000585a5a in miPointerDisplayCursor ()
#22 0x00000000004c73f0 in CursorDisplayCursor ()
#23 0x0000000000516c06 in AnimCurTimerNotify ()
...


Beyond that it's just more Xorg and timer stuff and probably not too useful. Now that I think about it, it seems like more than a coincidence that this sometimes happens when the cursor starts to spin from some activity in Chrome. The cursor actually keeps spinning (animating) even though everything else seems frozen.
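
If you want to grab the same kind of backtrace over SSH without an interactive gdb session, something like the following should work. It is a rough sketch that assumes the X server process is named Xorg (as it shows up in top here) and that gdb and pgrep are installed. The function name xorg_backtrace and the default output file name are made up for the example.

```shell
# Sketch only: xorg_backtrace is a hypothetical helper name.

# Save a backtrace of the oldest running Xorg process to a file.
# usage: xorg_backtrace [OUTPUT_FILE]   (run as root)
xorg_backtrace() {
    out=${1:-xorg-bt.txt}
    pid=$(pgrep -o -x Xorg) || { echo "Xorg is not running" >&2; return 1; }
    # -batch makes gdb run the given commands and exit, which also
    # detaches from the process; -ex "bt" prints the backtrace.
    gdb -p "$pid" -batch -ex "bt" > "$out" 2>&1
}

# Typical use, as root, from an SSH session while the desktop is hung:
#   xorg_backtrace /tmp/xorg-hang.txt
```

Without debug symbols for the NVIDIA driver, the driver frames will still show up as ??, as in the trace above.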

#23
Posted 01/05/2018 08:59 AM   
@ework
What kind of system setup are you using?

#24
Posted 01/05/2018 09:02 AM   
Quote:
@ework
What kind of system setup are you using?


It's a custom built system using an X99 chipset. I'll attach the lshw output.
Attachments

lshw.txt

#25
Posted 01/05/2018 09:06 AM   
It just happened again and for the exact same reason. Nautilus was waiting to load a directory and it started to spin the mouse cursor. The strange thing is that when I attached with gdb and then quit (which detaches), everything came back. So it's waiting for something, and gdb halting the process for a second seems to unblock it.

#26
Posted 01/05/2018 09:18 AM   
Not overly familiar (yet) with the ASRock BIOS. Spread spectrum clock was set to "AUTO", which doesn't tell me much. Was able to set it to no spread spectrum. Tried that. No difference. Added the kernel parameters - no difference.

#27
Posted 01/05/2018 02:51 PM   
Answer Accepted by Original Poster
This hang is an X server bug. There's a thread about it from November but it looks like the patches never made it into the codebase. I'll ping Keith to try to get them merged. https://lists.x.org/archives/xorg-devel/2017-November/055144.html


As you noted, pausing the X server for a moment breaks it out of this infinite recursion loop. Attaching GDB is one way to do that. You could probably also do something like "pkill -STOP Xorg; sleep 0.5; pkill -CONT Xorg" from a script. I realize this isn't ideal, sorry.

The reason this shows up more on Titan V is simply because cursor updates take a hair longer than they do on earlier GPU architectures.
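
The STOP/CONT workaround above could in principle be automated so nobody has to run it by hand from another machine. The following is only an illustrative sketch of that idea, not a tested fix. It assumes the process is named Xorg, that a hang shows up as sustained near-100% CPU, and that the thresholds (95%, three samples in a row, 2-second interval) are reasonable. All of the function names are hypothetical.

```shell
# Sketch only: every function name and threshold here is an assumption.

# Total CPU ticks (user + system) used so far by a PID, from /proc/<pid>/stat.
# The text up to the last ")" is stripped first, so utime and stime
# (fields 14 and 15 of the full line) become fields 12 and 13.
proc_ticks() {
    sed 's/.*) //' "/proc/$1/stat" 2>/dev/null | awk '{ printf "%d\n", $12 + $13 }'
}

# Integer CPU percentage from two tick samples taken some seconds apart.
# usage: cpu_percent TICKS_BEFORE TICKS_AFTER INTERVAL_SECONDS CLK_TCK
cpu_percent() {
    echo $(( ($2 - $1) * 100 / ($3 * $4) ))
}

# Pause the X server for a moment, then let it continue.
nudge_xorg() {
    pkill -STOP -x Xorg; sleep 0.5; pkill -CONT -x Xorg
}

# Poll Xorg and nudge it after three consecutive samples at >= 95% CPU.
watch_xorg() {
    clk_tck=$(getconf CLK_TCK)
    strikes=0
    while :; do
        pid=$(pgrep -o -x Xorg) || { sleep 5; continue; }
        before=$(proc_ticks "$pid")
        sleep 2
        after=$(proc_ticks "$pid")
        # Skip this round if the process vanished between samples.
        [ -n "$before" ] && [ -n "$after" ] || continue
        if [ "$(cpu_percent "$before" "$after" 2 "$clk_tck")" -ge 95 ]; then
            strikes=$((strikes + 1))
        else
            strikes=0
        fi
        if [ "$strikes" -ge 3 ]; then
            nudge_xorg
            strikes=0
        fi
    done
}

# To start the loop (as root, e.g. from a system service): watch_xorg
```

Requiring several busy samples in a row is meant to avoid pausing the X server during ordinary bursts of load. A legitimately busy X server would still get paused for half a second now and then, so treat this as a stopgap until the X server patches land.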

Aaron Plattner
NVIDIA Linux Graphics

#28
Posted 01/05/2018 06:08 PM   
I had my doubts, but after testing, that does seem to be it. The script method isn't going to work for me - constantly running the script from another computer every time the thing hangs isn't practical - but it does give me hope for an eventual solution. This was a good catch!

#29
Posted 01/05/2018 06:40 PM   
Thanks Aaron. If I get a chance I'll try applying that patch from the mailing list and rebuilding the xorg rpm for Fedora. If I get around to doing that and I go a few days without any hangs I'll report back.

#30
Posted 01/05/2018 07:41 PM   