EDIT 2018-09-01: This bug was fixed in 390.x but was then reintroduced in the 396.x Beta :( … If you suffer from this bug, stick to 390.x (I’m using 390.48).
Ubuntu 16.04 here. I tried many kernels too, and it also happened with other distros. I’m taking the time to post this after many months of frustration.
I have motherboards with as many as 13 GPUs but also have mobos with 7 or 8 GPUs.
The nvidia driver seems to load the GPUs sequentially, and the delay grows exponentially with each additional card. For 13 GPUs it used to take about 40 seconds on the v384 drivers and older, which is already very long! Now with the release of the v387 and v390 drivers it takes a whopping 81 seconds!
Here’s a kernel log:
Jan 21 07:51:52 m8 kernel: [197026.537238] nvidia-modeset: Allocated GPU:0 (GPU-177a47dc-f3f8-b480-0f2a-e223c6874e91) @ PCI:0000:01:00.0
Jan 21 07:51:53 m8 kernel: [197027.475759] nvidia-modeset: Allocated GPU:1 (GPU-943a5e69-78c3-51a5-c0ed-d7f314655bab) @ PCI:0000:02:00.0
Jan 21 07:51:54 m8 kernel: [197028.379958] nvidia-modeset: Allocated GPU:2 (GPU-fb4e4bae-5192-590f-3151-b793f2aaaec6) @ PCI:0000:03:00.0
Jan 21 07:51:55 m8 kernel: [197029.338908] nvidia-modeset: Allocated GPU:3 (GPU-e7a64c57-e302-52d7-908a-3c10b3d33544) @ PCI:0000:04:00.0
Jan 21 07:51:56 m8 kernel: [197030.292160] nvidia-modeset: Allocated GPU:4 (GPU-af125524-a38d-aa30-849a-210db7f73f2f) @ PCI:0000:05:00.0
Jan 21 07:51:57 m8 kernel: [197031.325380] nvidia-modeset: Allocated GPU:5 (GPU-8b7a4614-8f29-9c2f-0f1c-aa15d87bb934) @ PCI:0000:06:00.0
Jan 21 07:51:59 m8 kernel: [197032.623103] nvidia-modeset: Allocated GPU:6 (GPU-5e72c389-98bc-9af3-c875-dda1baa09120) @ PCI:0000:09:00.0
Jan 21 07:52:00 m8 kernel: [197034.333221] nvidia-modeset: Allocated GPU:7 (GPU-095a5f5c-a8a9-f5a3-d55a-0153f8ed9e1f) @ PCI:0000:0a:00.0
Jan 21 07:52:03 m8 kernel: [197036.970865] nvidia-modeset: Allocated GPU:8 (GPU-d2c07689-13c9-fd1f-d3b3-f1c52e419114) @ PCI:0000:0b:00.0
Jan 21 07:52:08 m8 kernel: [197041.947771] nvidia-modeset: Allocated GPU:9 (GPU-b77a08ed-2317-f267-7555-cb2fcce58f81) @ PCI:0000:0c:00.0
Jan 21 07:52:17 m8 kernel: [197051.463534] nvidia-modeset: Allocated GPU:10 (GPU-4fd72b20-d503-e310-827c-3e5b5873e162) @ PCI:0000:0d:00.0
Jan 21 07:52:36 m8 kernel: [197070.371853] nvidia-modeset: Allocated GPU:11 (GPU-3143f61f-0f5d-1fde-17bf-9949bc461857) @ PCI:0000:0e:00.0
Jan 21 07:53:13 m8 kernel: [197107.221568] nvidia-modeset: Allocated GPU:12 (GPU-335ea0b6-06b6-29b7-4a67-da5d47324df7) @ PCI:0000:0f:00.0
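Note the increasing delay between consecutive allocations (going by the syslog timestamps): 1, 1, 1, 1, 1, 2, 1, 3, 5, 9, 19, 37 seconds … 81 seconds between allocating GPU0 and GPU12. From about GPU8 onwards the delay roughly doubles with each additional card.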
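For anyone who wants to check their own machine, here is a rough Python sketch that pulls the per-GPU delays out of log lines like the ones above. The function name allocation_delays and the SAMPLE variable are made up for illustration, and the regex assumes the log format shown in this post (kernel timestamp in square brackets followed by "nvidia-modeset: Allocated GPU:N"). It uses the kernel timestamps rather than the syslog wall-clock time, so the numbers are more precise than the rounded ones I listed.

```python
import re

# Assumed log format (taken from the excerpt in this post):
#   ... kernel: [197026.537238] nvidia-modeset: Allocated GPU:0 (...) @ PCI:...
PATTERN = re.compile(r"\[\s*(\d+\.\d+)\]\s+nvidia-modeset: Allocated GPU:(\d+)")


def allocation_delays(log_text):
    """Return (gpu_index, seconds_since_previous_allocation) for each GPU after the first."""
    events = [(int(m.group(2)), float(m.group(1))) for m in PATTERN.finditer(log_text)]
    return [
        (gpu, round(t - prev_t, 2))
        for (_, prev_t), (gpu, t) in zip(events, events[1:])
    ]


# Last three lines of the kernel log quoted in this post.
SAMPLE = """\
Jan 21 07:52:17 m8 kernel: [197051.463534] nvidia-modeset: Allocated GPU:10 (GPU-4fd72b20-d503-e310-827c-3e5b5873e162) @ PCI:0000:0d:00.0
Jan 21 07:52:36 m8 kernel: [197070.371853] nvidia-modeset: Allocated GPU:11 (GPU-3143f61f-0f5d-1fde-17bf-9949bc461857) @ PCI:0000:0e:00.0
Jan 21 07:53:13 m8 kernel: [197107.221568] nvidia-modeset: Allocated GPU:12 (GPU-335ea0b6-06b6-29b7-4a67-da5d47324df7) @ PCI:0000:0f:00.0
"""

delays = allocation_delays(SAMPLE)
for gpu, delta in delays:
    print(f"GPU:{gpu} allocated {delta:.2f}s after the previous one")
```

To run it on a full log, pass the whole kernel log text to allocation_delays instead of SAMPLE. If the delays roughly double from one card to the next, you are probably seeing the same behaviour.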
This is very painful because you can’t do anything with any of the GPUs until all of them are loaded. You have to wait those 80 seconds every time: X is delayed, nvidia-smi is delayed, etc. What’s worse, if you launch any GPU app while the driver is still loading the GPUs, it takes double the time: the app doesn’t see the driver as initialized and triggers a separate initialization of its own.
nvidia-persistenced doesn’t help … it just triggers the same process, which takes the same amount of time (while all other GPU apps are blocked). The machine boots in <10 seconds, but then I have to wait about 1.5 minutes before doing anything GPU-related, including starting X.
Can Nvidia devs PLEASE do something about this? I was hoping newer drivers would fix this, but they made it worse!
Also, could the driver allocate the GPUs in parallel, and without delays?
EDIT: REPORT GENERATED BY nvidia-bug-report.sh: https://dl.dropbox.com/s/s0y84x5933fxiuk/nvidia-bug-report.log.gz