FFmpeg cannot init CUDA for transcoding

Gendalph · May 17, 2019, 3:45pm

I was asked to set up a headless machine with a GTX 1080 for accelerated video transcoding. Under load, ffmpeg fails with a “Cannot init CUDA” error, nvidia-smi reports “No devices were found”, and the kernel log fills up with “rm_init_adapter failed” messages.

bug-report: nvidia-bug-report.log.gz

FFmpeg reports it cannot init CUDA (using verbose logging here):

[graph_1_in_0_1 @ 0x55ed1b4980c0] tb:1/48000 samplefmt:fltp samplerate:48000 chlayout:0x60f
[format_out_0_1 @ 0x55ed1b498c80] auto-inserting filter 'auto_resampler_0' between the filter 'Parsed_anull_0' and the filter 'format_out_0_1'
[auto_resampler_0 @ 0x55ed1b498a80] ch:6 chl:5.1(side) fmt:fltp r:48000Hz -> ch:2 chl:stereo fmt:fltp r:48000Hz
[graph 0 input from stream 0:0 @ 0x55ed1d2e2b40] w:1920 h:1080 pixfmt:yuv420p10le tb:1/1000 fr:24000/1001 sar:1/1 sws_param:flags=2
[scaler_out_0_0 @ 0x55ed1d2e3800] w:1280 h:720 flags:'bicubic' interl:0
[scaler_out_0_0 @ 0x55ed1d2e3800] w:1920 h:1080 fmt:yuv420p10le sar:1/1 -> w:1280 h:720 fmt:yuv420p sar:1/1 flags:0x4
[h264_nvenc @ 0x55ed1b42da40] Loaded Nvenc version 9.0
[h264_nvenc @ 0x55ed1b42da40] Nvenc initialized successfully
[h264_nvenc @ 0x55ed1b42da40] Cannot init CUDA
[h264_nvenc @ 0x55ed1b42da40] Nvenc unloaded
Error initializing output stream 0:0 -- Error while opening encoder for output stream #0:0 - maybe incorrect parameters such as bit_rate, rate, width or height

dmesg is basically flooded with these messages:

[608628.794213] NVRM: RmInitAdapter failed! (0x25:0x51:1084)
[608628.801279] NVRM: rm_init_adapter failed for device bearing minor number 0
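To get a feel for how often and since when the adapter init has been failing, the kernel log can be filtered for the message above. This is only a rough sketch: the function names are made up for illustration, and the match string is copied from the excerpt in this post, so it may need adjusting for other driver versions.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of any NVIDIA tool).
# Both read kernel log text from stdin.

# Print how many "RmInitAdapter failed" lines are present.
count_rm_init_failures() {
    # grep -c prints the count but exits 1 on zero matches, hence "|| true".
    grep -c 'NVRM: RmInitAdapter failed' || true
}

# Print the kernel timestamp (seconds since boot) of the first failure, if any.
first_rm_init_failure() {
    sed -n '/NVRM: RmInitAdapter failed/{s/^\[ *\([0-9.]*\)\].*/\1/p;q;}'
}

# Usage on a live system (reading the kernel log may require root):
#   dmesg | count_rm_init_failures
#   dmesg | first_rm_init_failure
```

Comparing the first timestamp with the time the encoding jobs were started can help tell whether the failures begin under load or only afterwards.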

Calling nvidia-smi in a loop does get through eventually, but most of the time I get “No devices were found”.
This starts within a day of starting encoding tasks and persists even after the load is gone. Resetting the GPU (using “nvidia-smi -r”) doesn’t really help. The only way to fix this is to reboot.
I didn’t have any issues during test runs (transcoding a single video in a loop over a weekend).
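The "nvidia-smi in a loop" observation can be turned into a number by counting how many probes succeed. A minimal sketch follows; it assumes nvidia-smi exits with a non-zero status when it prints “No devices were found”, and the function name probe_gpu is made up for illustration.

```shell
#!/bin/sh
# probe_gpu N [cmd ...]: run the probe command N times and print "ok/total".
# Defaults to "nvidia-smi -L" (list GPUs) when no command is given.
# Assumption: the probe exits non-zero when the GPU cannot be initialized.
probe_gpu() {
    total=$1
    shift
    [ "$#" -gt 0 ] || set -- nvidia-smi -L
    ok=0
    i=0
    while [ "$i" -lt "$total" ]; do
        # Only the exit status matters here, so discard the output.
        if "$@" >/dev/null 2>&1; then
            ok=$((ok + 1))
        fi
        i=$((i + 1))
    done
    echo "$ok/$total"
}

# Usage: probe_gpu 20
```

A result well below 20/20 on an otherwise idle machine would match the behaviour described above.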

Machine is running Ubuntu 16.04 (upgraded from 14.04), 4.16.3 mainline kernel, nvidia-418 drivers and FFmpeg 4.1.3 (from this PPA, rebuilt with “--enable-nvenc --enable-cuda”, mostly following this guide).

gendalph@d5528:~$ ffmpeg -version
ffmpeg version 4.1.3-0local~16.04 Copyright (c) 2000-2019 the FFmpeg developers
built with gcc 5.4.0 (Ubuntu 5.4.0-6ubuntu1~16.04.11) 20160609
configuration: --prefix=/usr --extra-version='0local~16.04' --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opengl --enable-sdl2 --enable-nonfree --enable-libfdk-aac --enable-nvenc --enable-cuda --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
libavutil      56. 22.100 / 56. 22.100
libavcodec     58. 35.100 / 58. 35.100
libavformat    58. 20.100 / 58. 20.100
libavdevice    58.  5.100 / 58.  5.100
libavfilter     7. 40.101 /  7. 40.101
libavresample   4.  0.  0 /  4.  0.  0
libswscale      5.  3.100 /  5.  3.100
libswresample   3.  3.100 /  3.  3.100
libpostproc    55.  3.100 / 55.  3.100

FFmpeg command line:

ffmpeg -y -i video.mkv -vcodec h264_nvenc -acodec aac -b:v 3000k -g 25 -threads 16 -vsync 2 -s 1920x1080 -movflags +faststart -map 0:0 -map 0:1 -b:a 192k -ac 2 video.mp4

Ubuntu 14.04 had the same issue, but much more severe. It had CUDA 8.0 installed, with 384.111 drivers, 4.4.0-121-generic kernel and ffmpeg 3.3.2.

generix · May 17, 2019, 9:37pm

Please configure nvidia-persistenced to run continuously. It is currently starting and stopping, so the GPU keeps getting initialized and deinitialized.

Gendalph · May 17, 2019, 10:18pm

Thanks!
Had to tweak the systemd unit a bit, mainly the ExecStart part, but now it seems to start and no longer shuts down immediately:

ExecStart=
ExecStart=/usr/bin/nvidia-persistenced --user nvidia-persistenced --persistence-mode --verbose
Restart=always

I will report on results next week.

username.murphy · May 19, 2019, 2:07am

FFmpeg build for NVIDIA PRIME-enabled systems on Ubuntu 18.04LTS+. NVENC, QuickSync and VAAPI hwaccels are enabled.

Says he's from NVIDIA:
https://gist.github.com/Brainiarc7/5fbecef51470d2d25a0747444abc2c53

Gendalph · May 20, 2019, 1:30pm

Yep, running nvidia-persistenced fixes the issue, but you have to configure it to run:

You need to change the parameters used to start the daemon, because by default it exits automatically. To do that you need to modify the systemd unit, and the best way to do that is through drop-ins.

Default unit can be found here:

/lib/systemd/system/nvidia-persistenced.service

And this is how it looks:

[Unit]
Description=NVIDIA Persistence Daemon
Wants=syslog.target

[Service]
Type=forking
ExecStart=/usr/bin/nvidia-persistenced --user nvidia-persistenced --no-persistence-mode --verbose
ExecStopPost=/bin/rm -rf /var/run/nvidia-persistenced

What we need to change:

Run in persistence mode
Start at boot

This is how it's done:

# default editor is nano, I prefer vim
export EDITOR=vim
sudo systemctl edit nvidia-persistenced.service

Copy/paste or type in this:

[Unit]
Wants=syslog.target
# Avoids certain startup errors
After=systemd-user-sessions.service

[Service]
# This line is required - it resets ExecStart so we can override it
ExecStart=
ExecStart=/usr/bin/nvidia-persistenced --user nvidia-persistenced --persistence-mode --verbose
Restart=always

[Install]
# This causes the unit to be started automatically during boot
WantedBy=multi-user.target
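If the machine is provisioned by a script rather than by hand, the same drop-in can be written without opening an editor. This is a sketch, not the method used in this thread: the function name is made up, the default directory is the one systemctl edit uses for this unit, and the optional argument exists only so the function can be tried against a scratch directory first.

```shell
#!/bin/sh
# Hypothetical helper: write the persistence-mode drop-in for nvidia-persistenced.
# $1 (optional) overrides the target directory, e.g. for a dry run.
write_persistenced_dropin() {
    dir=${1:-/etc/systemd/system/nvidia-persistenced.service.d}
    mkdir -p "$dir" || return 1
    # Quoted 'EOF' keeps the text literal (no variable expansion).
    cat > "$dir/override.conf" <<'EOF'
[Unit]
Wants=syslog.target
# Avoids certain startup errors
After=systemd-user-sessions.service

[Service]
# This line is required - it resets ExecStart so we can override it
ExecStart=
ExecStart=/usr/bin/nvidia-persistenced --user nvidia-persistenced --persistence-mode --verbose
Restart=always

[Install]
# This causes the unit to be started automatically during boot
WantedBy=multi-user.target
EOF
}

# Usage (as root):
#   write_persistenced_dropin && systemctl daemon-reload
```

Unlike systemctl edit, writing the file by hand does not reload systemd, so systemctl daemon-reload has to be run before the unit is enabled and restarted.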

Now enable and [re]start it:

sudo systemctl enable nvidia-persistenced.service
sudo systemctl restart nvidia-persistenced.service

And this is what the status output should look like:

$ systemctl status nvidia-persistenced.service

● nvidia-persistenced.service - NVIDIA Persistence Daemon
   Loaded: loaded (/lib/systemd/system/nvidia-persistenced.service; static; vendor preset: enabled)
  Drop-In: /etc/systemd/system/nvidia-persistenced.service.d
           └─override.conf
   Active: active (running) since Fri 2019-05-17 22:13:19 UTC; 2 days ago
  Process: 6894 ExecStopPost=/bin/rm -rf /var/run/nvidia-persistenced (code=exited, status=0/SUCCESS)
  Process: 6899 ExecStart=/usr/bin/nvidia-persistenced --user nvidia-persistenced --persistence-mode --verbose (code=exited, status=0/SUCCESS)
 Main PID: 6900 (nvidia-persiste)
   CGroup: /system.slice/nvidia-persistenced.service
           └─6900 /usr/bin/nvidia-persistenced --user nvidia-persistenced --persistence-mode --verbose

May 17 22:13:18 d5528 systemd[1]: Starting NVIDIA Persistence Daemon...
May 17 22:13:18 d5528 nvidia-persistenced[6900]: Verbose syslog connection opened
May 17 22:13:18 d5528 nvidia-persistenced[6900]: Now running with user ID 109 and group ID 119
May 17 22:13:18 d5528 nvidia-persistenced[6900]: Started (6900)
May 17 22:13:18 d5528 nvidia-persistenced[6900]: device 0000:82:00.0 - registered
May 17 22:13:19 d5528 nvidia-persistenced[6900]: device 0000:82:00.0 - persistence mode enabled.
May 17 22:13:19 d5528 nvidia-persistenced[6900]: device 0000:82:00.0 - NUMA memory onlined.
May 17 22:13:19 d5528 systemd[1]: Started NVIDIA Persistence Daemon.
May 17 22:13:19 d5528 nvidia-persistenced[6900]: Local RPC services initialized
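As a cross-check that does not depend on the daemon's own log, nvidia-smi can be asked for the persistence state of each GPU. The sketch below is only an illustration: the helper name is made up, and it assumes the query prints one “Enabled” or “Disabled” line per GPU.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if stdin holds at least one line
# and every non-empty line is exactly "Enabled" (one line per GPU).
all_persistent() {
    awk '
        NF == 0 { next }                      # ignore blank lines
        { gsub(/[ \t\r]/, ""); n++ }          # strip stray whitespace, count GPUs
        $0 != "Enabled" { bad++ }             # remember any GPU not in persistence mode
        END { if (n > 0 && bad == 0) exit 0; exit 1 }
    '
}

# Usage:
#   nvidia-smi --query-gpu=persistence_mode --format=csv,noheader | all_persistent \
#       && echo "persistence mode is on for all GPUs"
```

Note that the status output above still lists the unit as static, so it may be worth running systemctl is-enabled nvidia-persistenced.service after a reboot to confirm that the [Install] section from the drop-in is actually honored by your systemd version.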

This fixed the issue for me.