Kernel Panic On Running NvCamera Daemon Processes

I am trying to get a TX2 to boot reliably while launching GStreamer pipelines on startup. I have been doing hard and soft reboots and trying to track down the failures that inevitably occur when communicating with the daemon. Most recently, the daemon produces an error while GStreamer is being instantiated to read from the camera. The error persists through manual systemctl stops and starts. I have not rebooted yet because I wanted to share the output.

I currently have a script that runs the GStreamer pipelines. The script is:
gst-launch-1.0 nvcamerasrc sensor-id=$1 fpsRange="30.0 30.0" ! 'video/x-raw(memory:NVMM), width=(int)1280, height=(int)720, format=(string)I420, framerate=(fraction)30/1' ! nvtee ! nvvidconv ! 'video/x-raw, format=(string)I420, framerate=(fraction)30/1' ! tee ! v4l2sink device=/dev/video$2

It is executed like

./test_gst.sh 0 6

where 0 is the nvcamerasrc id and 6 is a loopback device used for viewing.

The syslog error is as follows:

nvcamera-daemon[4263]: unhandled level 2 translation fault (11) at 0x00000000, esr 0x92000006
Apr 18 14:56:55 tegra-ubuntu kernel: [  500.787493] nvcamera-daemon[4264]: unhandled level 2 translation fault (11) at 0x00000ce9, esr 0x92000046
Apr 18 14:56:55 tegra-ubuntu kernel: [  500.787496] pgd = ffffffc05dba6000
Apr 18 14:56:55 tegra-ubuntu kernel: [  500.787504] [00000ce9] *pgd=00000000fb035003, *pud=00000000fb035003, *pmd=0000000000000000
Apr 18 14:56:55 tegra-ubuntu kernel: [  500.787506] 
Apr 18 14:56:55 tegra-ubuntu kernel: [  500.787515] CPU: 4 PID: 4264 Comm: nvcamera-daemon Tainted: G           O    4.4.38 #1
Apr 18 14:56:55 tegra-ubuntu kernel: [  500.787518] Hardware name: quill (DT)
Apr 18 14:56:55 tegra-ubuntu kernel: [  500.787521] task: ffffffc1ec60b200 ti: ffffffc1e6108000 task.ti: ffffffc1e6108000
Apr 18 14:56:55 tegra-ubuntu kernel: [  500.787526] PC is at 0x7f9c86c10c
Apr 18 14:56:55 tegra-ubuntu kernel: [  500.787528] LR is at 0x7f9c86c100
Apr 18 14:56:55 tegra-ubuntu kernel: [  500.787532] pc : [<0000007f9c86c10c>] lr : [<0000007f9c86c100>] pstate: 60000000
Apr 18 14:56:55 tegra-ubuntu kernel: [  500.787533] sp : 0000007f9afb3840
Apr 18 14:56:55 tegra-ubuntu kernel: [  500.787537] x29: 0000007f9afc49d0 x28: 0000000000000000 
Apr 18 14:56:55 tegra-ubuntu kernel: [  500.787541] x27: 0000007f9c89c930 x26: 0000007f9cafa380 
Apr 18 14:56:55 tegra-ubuntu kernel: [  500.787544] x25: 0000000000000cf0 x24: 0000007f8c009ec8 
Apr 18 14:56:55 tegra-ubuntu kernel: [  500.787548] x23: 0000000000000005 x22: 0000007f8c00bc00 
Apr 18 14:56:55 tegra-ubuntu kernel: [  500.787551] x21: 0000007f9cd22c98 x20: 0000007f9cd22c98 
Apr 18 14:56:55 tegra-ubuntu kernel: [  500.787554] x19: 0000007f9cafa000 x18: 0000000000000014 
Apr 18 14:56:55 tegra-ubuntu kernel: [  500.787557] x17: 0000007f9cf09d30 x16: 0000007f9cd4f4a0 
Apr 18 14:56:55 tegra-ubuntu kernel: [  500.787560] x15: 00000000000001db x14: 0000000000000000 
Apr 18 14:56:55 tegra-ubuntu kernel: [  500.787563] x13: 0000000000000008 x12: a3d70a3d70a3d70b 
Apr 18 14:56:55 tegra-ubuntu kernel: [  500.787566] x11: 0000007f9afb3840 x10: 0000007f9afb3840 
Apr 18 14:56:55 tegra-ubuntu kernel: [  500.787569] x9 : ffffff80ffffffc8 x8 : 0000000000000040 
Apr 18 14:56:55 tegra-ubuntu kernel: [  500.787572] x7 : 0000000000000000 x6 : 0000007f9afb35fc 
Apr 18 14:56:55 tegra-ubuntu kernel: [  500.787575] x5 : 0000007f9afc52e8 x4 : 0000000000000000 
Apr 18 14:56:55 tegra-ubuntu kernel: [  500.787578] x3 : 0000000000000000 x2 : 0000000000000001 
Apr 18 14:56:55 tegra-ubuntu kernel: [  500.787581] x1 : 0000007f94b38340 x0 : 0000000000000000 
Apr 18 14:56:55 tegra-ubuntu kernel: [  500.787582] 
Apr 18 14:56:55 tegra-ubuntu kernel: [  500.787590] Library at 0x7f9c86c10c: 0x7f9c83a000 /usr/lib/aarch64-linux-gnu/tegra/libnvodm_imager.so
Apr 18 14:56:55 tegra-ubuntu kernel: [  500.787593] Library at 0x7f9c86c100: 0x7f9c83a000 /usr/lib/aarch64-linux-gnu/tegra/libnvodm_imager.so
Apr 18 14:56:55 tegra-ubuntu kernel: [  500.787595] vdso base = 0x7f9da0d000
Apr 18 14:56:55 tegra-ubuntu kernel: [  500.959488] pgd = ffffffc05dba6000
Apr 18 14:56:55 tegra-ubuntu kernel: [  500.964097] [00000000] *pgd=00000000fb035003, *pud=00000000fb035003, *pmd=0000000000000000
Apr 18 14:56:55 tegra-ubuntu kernel: [  500.972507] 
Apr 18 14:56:55 tegra-ubuntu kernel: [  500.974012] CPU: 0 PID: 4263 Comm: nvcamera-daemon Tainted: G           O    4.4.38 #1
Apr 18 14:56:55 tegra-ubuntu kernel: [  500.981939] Hardware name: quill (DT)
Apr 18 14:56:55 tegra-ubuntu kernel: [  500.985638] task: ffffffc1b0f42580 ti: ffffffc1c5db0000 task.ti: ffffffc1c5db0000
Apr 18 14:56:55 tegra-ubuntu kernel: [  500.993146] PC is at 0x403038
Apr 18 14:56:55 tegra-ubuntu kernel: [  500.996141] LR is at 0x403034
Apr 18 14:56:55 tegra-ubuntu kernel: [  500.999247] pc : [<0000000000403038>] lr : [<0000000000403034>] pstate: 20000000
Apr 18 14:56:55 tegra-ubuntu kernel: [  501.006725] sp : 0000007f9b7c0290
Apr 18 14:56:55 tegra-ubuntu kernel: [  501.010117] x29: 0000007f9b7c49d0 x28: 0000000000000000 
Apr 18 14:56:55 tegra-ubuntu kernel: [  501.015527] x27: 0000000000000006 x26: 0000007f9b7c4300 
Apr 18 14:56:55 tegra-ubuntu kernel: [  501.020895] x25: 0000000000404000 x24: 0000000000000334 
Apr 18 14:56:55 tegra-ubuntu kernel: [  501.026285] x23: 0000007f9b7c2300 x22: 0000007f9b7c1300 
Apr 18 14:56:55 tegra-ubuntu kernel: [  501.031681] x21: 0000007f9b7c2244 x20: 0000007f9b7c0310 
Apr 18 14:56:55 tegra-ubuntu kernel: [  501.037098] x19: 0000007f9b7c2930 x18: 0000000000000014 
Apr 18 14:56:55 tegra-ubuntu kernel: [  501.042473] x17: 0000007f9cffcfb0 x16: 0000007f9cd4f540 
Apr 18 14:56:55 tegra-ubuntu kernel: [  501.047873] x15: 0000007f9da0e000 x14: 7265766972446172 
Apr 18 14:56:55 tegra-ubuntu kernel: [  501.053232] x13: 656d61432f697061 x12: 2f637273206d6f72 
Apr 18 14:56:55 tegra-ubuntu kernel: [  501.058628] x11: 6620676e69746167 x10: 61706f7270282020 
Apr 18 14:56:55 tegra-ubuntu kernel: [  501.064014] x9 : 3a726574656d6172 x8 : 0000000000000062 
Apr 18 14:56:55 tegra-ubuntu kernel: [  501.069404] x7 : 0000007f946b3620 x6 : 0000000000000001 
Apr 18 14:56:55 tegra-ubuntu kernel: [  501.074773] x5 : 0000000000000000 x4 : 0000007f94000b10 
Apr 18 14:56:55 tegra-ubuntu kernel: [  501.080165] x3 : 0000000000000000 x2 : 0000000000000001 
Apr 18 14:56:55 tegra-ubuntu kernel: [  501.085536] x1 : 0000000000000081 x0 : 0000000000000000 
Apr 18 14:56:55 tegra-ubuntu kernel: [  501.090944] 
Apr 18 14:56:55 tegra-ubuntu kernel: [  501.092439] Library at 0x403038: 0x400000 /usr/sbin/nvcamera-daemon
Apr 18 14:56:55 tegra-ubuntu kernel: [  501.098712] Library at 0x403034: 0x400000 /usr/sbin/nvcamera-daemon
Apr 18 14:56:55 tegra-ubuntu kernel: [  501.105034] vdso base = 0x7f9da0d000
Apr 18 14:56:55 tegra-ubuntu systemd[1]: nvcamera-daemon.service: Main process exited, code=killed, status=11/SEGV
Apr 18 14:56:57 tegra-ubuntu systemd[1]: nvcamera-daemon.service: Unit entered failed state.
Apr 18 14:56:57 tegra-ubuntu systemd[1]: nvcamera-daemon.service: Failed with result 'signal'.
Apr 18 14:56:58 tegra-ubuntu systemd[1]: nvcamera-daemon.service: Service hold-off time over, scheduling restart.

Hi,
Please try running your script some time after startup rather than immediately. Please refer to:
https://devtalk.nvidia.com/default/topic/1048409/jetson-tx2/gstreamer-pipeline-doesn-t-return/post/5320560/#5320560

What is happening behind the scenes with the daemon that requires 60s after full boot to initialize? Is this called out in documentation somewhere? This seems like a rather large issue for industrial applications that require the system to boot every time from a hard power cycle.

Can you explain what the kernel panic is from?

Hi,
You may try 5 or 10 seconds instead of 60.

Does the crash happen with the default ov5693, or is it specific to your camera sensor?

I have tried, previously, to put a 20s delay in my program start that uses the camera daemon, but that did not resolve the issues. What leads you to believe it is a timing issue?

This is on a custom board with imx290 cameras.

Is there any way to diagnose why the nvcamera-daemon would be hanging or dumping? The only thing we can do is rely on Nvidia for support, correct?

Can nvidia provide an explanation as to why this dump can occur from a userspace daemon and not recover?

It looks like nvcamera-daemon is segfaulting. You said this is on a custom board so I imagine you also have a custom DTB. When nvcamera-daemon or argus_daemon segfault it’s almost always due to incorrect or unexpected values in the tegra-camera-platform device tree node. In particular check that proc-device-tree, devname, and the other attributes are correct.

It doesn’t always help, but if you run nvcamera-daemon from a shell you will see some additional output that may help you diagnose the problem.

# These environment variables are optional but will produce more verbose output
export enableCamPclLogs=5
export enableCamScfLogs=5
nvcamera-daemon

Do you get valid output from your camera when you run nvgstcapture with no arguments? This is the easiest test I know of for CSI cameras using the ISP.

I recently ran into a scenario that causes a segfault in argus_daemon. We added a boolean property to the device tree node for our camera (i.e. if it’s present we treat it as true). Doing so results in a segfault in argus_daemon. There’s a good chance the segfault is actually in a supporting library and will cause a segfault in nvcamera-daemon too, but I haven’t verified that.

One point may be to have the nvcamera socket created.
Using this as /etc/rc.local seems ok:

#!/bin/sh -e
#
# rc.local
#
# This script is executed at the end of each multiuser runlevel.
# Make sure that the script will "exit 0" on success or any other
# value on error.
#

LOGFILE=/home/nvidia/Desktop/rclocal.log
/bin/rm -f $LOGFILE
(while [ ! -S /tmp/nvcamera_socket ]; do cat /proc/uptime | cut -d '.' -f1 >> $LOGFILE; sleep 1; done; sleep 1;
/usr/bin/gst-launch-1.0 nvcamerasrc sensor-id=0 fpsRange="30.0 30.0" ! 'video/x-raw(memory:NVMM), width=(int)1280, height=(int)720, format=(string)I420, framerate=(fraction)30/1' ! nvvidconv ! 'video/x-raw, format=(string)I420, framerate=(fraction)30/1' ! tee ! v4l2sink device=/dev/video10 >> $LOGFILE) &

exit 0

Usually it is created when uptime is about 10s (when /etc/rc.local starts about 7-8s uptime).

D3_growe:
I was successfully able to run nvgstcapture for both Argus and NvCamera-Daemon from the command line. We had another company develop the bsp and drivers for our system, but if there is any information you can glean from the log that points to a device tree issue, please let me know as I will push that back on them.

Honey:
I will give this a try on Monday. Can you provide a little theory on waiting for the nvcamera_socket in /tmp? I’m not familiar with what the daemon is doing there.

In general, it seems to work most of the time. However, a few times out of ten it fails. I’m wondering whether there is a recommended way of starting up something that connects to the daemon (whether a GStreamer pipeline or an application). Is there a reason the daemon would require some sort of delay between the opening of multiple cameras (i.e. 4 separate GStreamer pipelines accessing different cameras, which is what I am doing)?

Hi Ben,

If nvgstcapture works from the command line then the issues I had in mind are not your problem.

I’m very curious to find out what the problem ultimately is.

Are you certain that your camera driver probes on every single boot? If the camera driver doesn’t load then nvcamera-daemon might segfault.


Greg

Hi,

We are deprecating nvcamerasrc. We suggest you also try nvarguscamerasrc. If you can reproduce the issue with nvarguscamerasrc + ov5693, please share the script and the steps for reproducing it. If it is an issue in the NVIDIA SW stack and not in the imx290 drivers, it should be reproducible with the default ov5693 camera.
Please help us reproduce the issue first so that we can check further and give an explanation.

D3: I believe it does. When I check dmesg I believe I see all of the messages for success from our driver. Is there a more automated and convenient way of checking for driver probe through reboots without manually scanning or grepping dmesg (or kern.log)?

DaneLL: I will try and reproduce this on the devkit. The only issue with doing so is that I am limited to 1 camera for testing. My current setup has a max of 6 imx290s. If it’s a race condition or a timing issue due to multiple instantiations of sockets on the daemon, you won’t see it.

Ben,

You could script a check for the existence of the video node(s) you are interested in, if these are the only video devices in the system. You could also use udev rules.

if [ -e /dev/video0 ]; then
    # video node exists
    echo "/dev/video0 exists"
fi
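With several cameras, the same check can be looped over every expected node. This is only a sketch: the function name missing_nodes is made up for illustration, and it assumes your cameras are the only /dev/video* devices on the system.

```shell
#!/bin/sh
# missing_nodes NODE...
# Prints one "missing: PATH" line for every path that does not exist.
# Returns 0 if all paths exist, 1 if at least one is missing.
# (Hypothetical helper; variables are global because plain sh has no "local".)
missing_nodes() {
    rc=0
    for node in "$@"; do
        if [ ! -e "$node" ]; then
            echo "missing: $node"
            rc=1
        fi
    done
    return "$rc"
}
```

Calling something like `missing_nodes /dev/video0 /dev/video1 /dev/video2` early in the startup script and appending the output to a log file would give one line per failed probe across reboots, instead of grepping dmesg by hand.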

Honey and DaneLL,

Can you provide a little background on the “/tmp/nvcamera_socket”? What is it, when is it instantiated and when is it destroyed? I think we may be on to something with that.

My understanding is that nvcamera-daemon creates this socket when it starts and then listens on it for requests from clients that open the socket. When the daemon crashes you’ll usually see a ‘socket read error’ message.

When starting a gstreamer pipeline with nvcamerasrc client before this socket exists, it cannot open the socket for sending requests, so it cannot work. In my case (devkit + R28.2.0), it doesn’t crash, though. It may be related to having several clients that I can’t test.

The script I’ve posted above just waits for this socket to be created. I’m not sure, but it might be safer to add one more second of delay to be sure the daemon is ready.

I haven’t used it yet, but note there is also a socket /tmp/argus_socket in case you move to nvarguscamerasrc.
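The wait on the socket can also be written as a small function with a timeout, so the boot script does not loop forever if the daemon never comes up. This is a sketch only: the name wait_for_socket, the timeout and the settle delay are my own assumptions, not anything documented by NVIDIA.

```shell
#!/bin/sh
# wait_for_socket PATH TIMEOUT_S [SETTLE_S]
# Polls once per second until PATH exists as a Unix socket, then sleeps
# SETTLE_S more seconds (default 1) to give the daemon time to start listening.
# Returns 0 once the socket exists, 1 if TIMEOUT_S seconds pass without it.
# (Hypothetical helper; variables are global because plain sh has no "local".)
wait_for_socket() {
    sock=$1
    timeout=$2
    settle=${3:-1}
    waited=0
    while [ ! -S "$sock" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    sleep "$settle"
    return 0
}
```

For example, `wait_for_socket /tmp/nvcamera_socket 30 2 || exit 1` before launching the pipeline; the same call with /tmp/argus_socket should apply if you move to nvarguscamerasrc.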

Hi,
The SW stack is:

nvcamerasrc - (/tmp/nvcamera_socket) - nvcamera-daemon - low level camera driver

It is initialized at

/etc/init/nvcamera-daemon.conf

Honey,

I’ve added the check for the socket and I also added additional 5s sleeps between each GStreamer pipeline launch. Each of these on its own still fails, but together they seem remarkably more stable. Those changes went 40/40 yesterday on hard reboots, whereas previously the errors were happening in about 3 out of 20 reboots. Unfortunately today, on the very first try, camera 0 failed with both “fixes” added in. Perhaps, like you said, we need longer than 1 second after confirming that the socket is available.
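For reference, the staggered launch looks roughly like the sketch below. The function name start_staggered and the ID:DEV argument format are made up for illustration; the launcher is assumed to take a sensor id and a loopback device number the way test_gst.sh does.

```shell
#!/bin/sh
# start_staggered DELAY_S LAUNCHER ID:DEV...
# Runs "LAUNCHER ID DEV" in the background for each ID:DEV pair, sleeping
# DELAY_S seconds between consecutive launches (no sleep before the first).
# (Hypothetical helper; variables are global because plain sh has no "local".)
start_staggered() {
    delay=$1
    launcher=$2
    shift 2
    first=1
    for pair in "$@"; do
        if [ "$first" -eq 0 ]; then
            sleep "$delay"
        fi
        first=0
        # ${pair%%:*} is the part before the colon, ${pair#*:} the part after.
        "$launcher" "${pair%%:*}" "${pair#*:}" &
    done
}
```

A call such as `start_staggered 5 ./test_gst.sh 0:6 1:7 2:8 3:9` would start four pipelines five seconds apart (the loopback numbers 7 to 9 are placeholders).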

Just so everyone is clear, when running the GStreamer pipelines from the scripts alone, the kernel messages seem to have subsided. What I get now, if a single camera fails to produce images, is that the daemon doesn’t crash; it simply delivers no frames to the pipeline and writes this in syslog:

tegra-ubuntu nvcamera-daemon[1645]: (548470317536) Error in calling capture= 86 result= 8
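To track how often this happens across reboots, the log can be scanned for that message. A minimal sketch, assuming the message text stays the same as in the line above; the function name count_capture_errors is made up for illustration.

```shell
#!/bin/sh
# count_capture_errors LOGFILE
# Prints the number of lines in LOGFILE where nvcamera-daemon reported
# "Error in calling capture". Note that grep -c prints 0 but exits with
# status 1 when nothing matches.
# (Hypothetical helper.)
count_capture_errors() {
    grep -c 'nvcamera-daemon.*Error in calling capture' "$1"
}
```

For example, `count_capture_errors /var/log/syslog` after each boot.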

The fact that it worked fine yesterday and fails this morning leads me to ask: did the board or one of the cameras move? Be sure you have good connections and, if possible, check that there is no noise on the I2C lines. Some cameras also generate noise before their power-up sequence, which leaves the I2C bus busy, but I can’t tell how you would prevent this in your case.