TX1 + JetPack 3.1 + /dev/ttyTHS1 - reliable hard lockup?

I decided to base the next version of APSync on JetPack 3.1.

Unfortunately, I’m having a few issues with obtaining a serial connection to the flight controller.

It seems any access to /dev/ttyTHS1 is sufficient to instantly lock the board up. What I’m assuming is a hardware watchdog reboots the board after a minute or so. If the access to the serial port is made on the first tty (ctrl-alt-f1) no further output occurs on that terminal until the board reboots.

“stty </dev/ttyTHS1” is sufficient to kill the board
“screen /dev/ttyTHS1” is sufficient to kill the board

I’ve poked around a little bit:

  • removed the console= line from the kernel commandline (expanded ${cbootargs} in /boot/extlinux/extlinux.conf and removed that specific parameter)
  • ensured the getty/login combo isn’t active on /dev/ttyS0
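
For anyone repeating the first step, here is a minimal sketch of the command-line edit. It assumes the expanded APPEND line is available as a plain string; the function name and the example arguments are hypothetical, and the real expansion of ${cbootargs} will look different.

```python
def strip_serial_console(cmdline, device="ttyS0"):
    """Return the kernel command line without console=<device>[,options] tokens.

    Other console= entries (for example console=tty0) are left alone so the
    local display console keeps working.
    """
    prefix = "console=" + device
    kept = []
    for token in cmdline.split():
        # Match "console=ttyS0" and "console=ttyS0,115200n8", but not "console=ttyS01".
        if token == prefix or token.startswith(prefix + ","):
            continue
        kept.append(token)
    return " ".join(kept)


# Hypothetical arguments, only to illustrate the edit.
example = "root=/dev/mmcblk0p1 rw rootwait console=ttyS0,115200n8 console=tty0 quiet"
print(strip_serial_console(example))
```

The result would then be pasted back into the APPEND line of /boot/extlinux/extlinux.conf by hand.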

This seems so easy to provoke that I’m somewhat surprised I can’t find evidence of others having similar problems.

Any suggestions?

This is a long way from being particularly helpful, but consider that ttyS# devices use the standard 16550 UART setup, while the ttyTHS# devices are designed to support a DMA transfer (they are different drivers…U-Boot does not support DMA and so sets up ttyS0 as a 16550 UART without DMA). If you were to use a ttyTHS device which did not have DMA set up, then I would expect an attempt to read/write an invalid DMA address. Can you use stty or screen on ttyS instead of ttyTHS?

Thanks!

Hmm. Website appears to have eaten my first response…

I did investigate /dev/ttyS0. The lack of DMA is going to be a problem, as we do need to run at 921600 baud.

However:
ubuntu@tegra-ubuntu:~$ stty </dev/ttyS0
speed 9600 baud; line = 0;
-brkint -imaxbel
ubuntu@tegra-ubuntu:~$
ubuntu@tegra-ubuntu:~$ screen /dev/ttyTHS1 115200
[screen is terminating]
ubuntu@tegra-ubuntu:~$ stty </dev/ttyS0
speed 115200 baud; line = 0;
kill = ^H; min = 100; time = 2;
-icrnl -imaxbel
-opost -onlcr
-isig -icanon -echo
ubuntu@tegra-ubuntu:~$

screen yields no data from /dev/ttyS0.

The same physical configuration is tested and known working with the older JetPack (against /dev/ttyTHS1).

Have you tried testing various settings while doing loopback? Connect RX to TX, and CTS to RTS. Then attempt screen or other terminal…typing in should echo back if it works. This could be tried at different speeds and with or without CTS/RTS flow control. It would help if you can get echo at lower speeds on one of these without crash, and then increase speed to find the limit. Knowing that ttyS0 or ttyTHS1 works at any speed in loopback would be a very strong hint as to which driver is active.
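
A rough sketch of that echo check, using only the Python standard library. The helper name echo_roundtrip is made up for illustration, and the demo uses a pipe as a stand-in for a jumpered port so it can be run without any hardware attached.

```python
import os
import select
import time


def echo_roundtrip(write_fd, read_fd, payload, timeout=1.0):
    """Write payload to write_fd and report whether the same bytes arrive on read_fd."""
    os.write(write_fd, payload)
    received = b""
    deadline = time.monotonic() + timeout
    while len(received) < len(payload):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        # Wait until data is readable or the remaining time runs out.
        ready, _, _ = select.select([read_fd], [], [], remaining)
        if not ready:
            break
        chunk = os.read(read_fd, len(payload) - len(received))
        if not chunk:
            break
        received += chunk
    return received == payload


if __name__ == "__main__":
    # Stand-in for RX wired to TX: whatever goes into the pipe comes back out.
    read_end, write_end = os.pipe()
    print(echo_roundtrip(write_end, read_end, b"loopback-test"))
    os.close(read_end)
    os.close(write_end)
```

On real hardware the same descriptor (from os.open on the device node with os.O_RDWR | os.O_NOCTTY) would be passed as both write_fd and read_fd, after setting raw mode and the baud rate, for example with stty. Bear in mind that on an affected board simply opening /dev/ttyTHS1 is enough to trigger the lockup described above.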

Btw, I’ve seen another case of higher speeds on the serial port (ttyTHS2, connector J17 on the dev board) being unreliable, but I believe the speeds tested were faster than 921600.

Have you done any kind of device tree change? I’m not sure what would be required to convert J21’s serial port from ttyS0 to ttyTHS1, but device tree configuration has changed between prior releases and R28.1. You might look at the device tree in the working version and compare with non-working…the two might not be identical anyway since there is a lot of change going from the 3.x kernel series to 4.x kernel series.

If you want to extract the exact current definition of device tree on a running Jetson you can do this:

dtc -I fs -O dts -o extracted.dts /proc/device-tree
# To find serial controllers:
gawk '/\tserial[@]/,/[}]/' extracted.dts
# Compare between older working and current non-working version...
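
The comparison step could be sketched roughly as below. Like the gawk range above, it cuts each serial node at the first closing brace, so nested child nodes would be truncated. The function names and the DTS snippets in the demo are hypothetical and do not reflect real register or phandle values.

```python
import difflib
import re


def serial_nodes(dts_text):
    """Map each serial@<addr> node name to its lines, up to the first closing brace."""
    nodes = {}
    current = None
    for line in dts_text.splitlines():
        if current is None:
            match = re.search(r"\b(serial@[0-9a-fA-F]+)\s*\{", line)
            if match:
                current = match.group(1)
                nodes[current] = [line.strip()]
        else:
            nodes[current].append(line.strip())
            if "}" in line:
                current = None
    return nodes


def diff_serial_nodes(old_dts, new_dts):
    """Return unified-diff lines for every serial node present in either tree."""
    old, new = serial_nodes(old_dts), serial_nodes(new_dts)
    out = []
    for name in sorted(set(old) | set(new)):
        out.extend(difflib.unified_diff(
            old.get(name, []), new.get(name, []),
            fromfile="old/" + name, tofile="new/" + name, lineterm=""))
    return out


# Hypothetical extracts, only to show the shape of the output.
OLD = '/ {\n\tserial@70006040 {\n\t\tresets = <0x1 0x7>;\n\t\tstatus = "okay";\n\t};\n};\n'
NEW = '/ {\n\tserial@70006040 {\n\t\tresets = <0x1 0xc0>;\n\t\tstatus = "okay";\n\t};\n};\n'
print("\n".join(diff_serial_nodes(OLD, NEW)))
```

In practice the two inputs would be the extracted.dts files read from the working and the non-working install.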

If possible please give an exact/detailed list of what was done to disable serial console (basically what needs to be done for someone else to repeat this).

@linuxdev Apologies for the slow response here. My cycle time on the TX1 is quite long due to the time taken to flash, which I haven’t gotten down to a fine art yet…

I have not tried /dev/ttyTHS1 with the serial port wired in any configuration other than “known working” and “unplugged”. The behaviour is the same in both cases.

I have done no device tree change on JetPack 3.1 - yet! I will dump the device tree from 3.0 and compare it to 3.1 (I note the kernel version is 3.10 in JetPack 3.0, 4+ in JetPack 3.1).

I took just two steps to disable serial console:

  • use systemctl to disable and stop the service
  • modify /boot/extlinux/extlinux.conf to (a) expand the kernel command line options usually filled in (by U-Boot?) and (b) remove the references to console=/dev/ttyS0 in that expansion

Note that any access to /dev/ttyTHS1 - even just to get the properties - kills the TX1. So “stty </dev/ttyTHS1” on a freshly booted image kills the TX1. That action should be independent of whatever baud rate the port is set to. I have also attempted to start screen with different baud rates - they all lead to the same thing, board lockup and (eventual) reboot.

My WIP instructions for creating the APSync image are here: companion/1_create_base_image.txt at apweb · peterbarker/companion · GitHub

Thanks for your help on this! It would be nice to release this next version of APSync on the more modern JetPack.

Addendum: you might have gleaned from the above - JetPack 3.0 (installed from the 3.1 JetPack installer) does not lock up.

Hi,
Please try the two changes below:

change#1 -
diff --git a/hardware/nvidia/soc/tegra/kernel-include/dt-bindings/reset/tegra210-car.h b/hardware/nvidia/soc/tegra/kernel-include/dt-bindings/res
index 296ec6e..b20bb71 100644
--- a/hardware/nvidia/soc/tegra/kernel-include/dt-bindings/reset/tegra210-car.h
+++ b/hardware/nvidia/soc/tegra/kernel-include/dt-bindings/reset/tegra210-car.h
@@ -9,5 +9,7 @@
#define TEGRA210_RESET(x) (7 * 32 + (x))
#define TEGRA210_RST_DFLL_DVCO TEGRA210_RESET(0)
#define TEGRA210_RST_ADSP TEGRA210_RESET(1)
+#define TEGRA210_RST_UARTB 7
+#define TEGRA210_RST_XUSB_DEV 95

#endif /* _DT_BINDINGS_RESET_TEGRA210_CAR_H */
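
As a quick check of the numbers in that header, the macro arithmetic can be mirrored in a few lines. This is only a sketch of the values the defines expand to; it makes no claim about how the kernel maps them to registers.

```python
def tegra210_reset(x):
    """Mirror of the header macro TEGRA210_RESET(x) = (7 * 32 + (x))."""
    return 7 * 32 + x


# Existing defines, computed through the macro.
TEGRA210_RST_DFLL_DVCO = tegra210_reset(0)
TEGRA210_RST_ADSP = tegra210_reset(1)

# Defines added by the patch: plain IDs, not routed through the macro.
TEGRA210_RST_UARTB = 7
TEGRA210_RST_XUSB_DEV = 95

print(TEGRA210_RST_DFLL_DVCO, TEGRA210_RST_ADSP, TEGRA210_RST_UARTB, TEGRA210_RST_XUSB_DEV)
```

So the two new reset IDs sit below the 224 base that the macro-generated ones start from.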

change#2 -
diff --git a/hardware/nvidia/soc/t210/kernel-dts/tegra210-soc/tegra210-soc-base.dtsi b/hardware/nvidia/soc/t210/kernel-dts/tegra210-soc/tegra210-
index 8322ffe..c800b80 100644
--- a/hardware/nvidia/soc/t210/kernel-dts/tegra210-soc/tegra210-soc-base.dtsi
+++ b/hardware/nvidia/soc/t210/kernel-dts/tegra210-soc/tegra210-soc-base.dtsi
@@ -615,7 +615,7 @@
               clocks = <&tegra_car TEGRA210_CLK_UARTB>,
                        <&tegra_car TEGRA210_CLK_PLL_P>;
               clock-names = "serial", "parent";
-              resets = <&tegra_car TEGRA210_CLK_UARTB>;
+              resets = <&tegra_car TEGRA210_RST_UARTB>;
               reset-names = "serial";
               nvidia,adjust-baud-rates = <115200 115200 100>;
               status = "disabled";

Thanks @zjuchi, we’ll organise a test here.

In case this is still relevant, we’ve just published a fix: redtail/tools/install/tx1-uart-patch at master · NVIDIA-AI-IOT/redtail · GitHub

Hey akamenev,

Can you also share the necessary changes to the .dts files for this fix? Or is it the same as zjuchi posted?

Thanks.

@zjuchi @akamenev This patch does the job for me - thanks, and apologies for the glacial response.