To reproduce this, all you need to do is configure any TX2 CAN controller with the restart-ms property set and then force it into the bus-off state. The restart-ms property is a standard parameter in the Linux CAN interface configuration options (see https://www.kernel.org/doc/Documentation/networking/can.txt for further details).
What restart-ms does is allow the CAN controller to recover from the bus-off condition, a fault isolation mode that a CAN controller may enter when an excessive number of errors is detected. The bus-off condition is defined by the Bosch CAN standard and is not specific to NVIDIA. Every CAN controller I’ve worked with gives you two options for dealing with it: 1) staying in the bus-off state permanently, which prevents all further transmissions from the affected controller until it is power cycled, or 2) automatically recovering to normal operation after enough error-free bus operation is observed. We want our design to use option 2 because our CAN buses are regularly exposed to the environment via connectors, and spurious errors are a realistic possibility in the field.
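To make the error levels concrete, here is a rough sketch of how the controller state follows from the transmit and receive error counters (TEC and REC). The thresholds are the ones I understand from the CAN error confinement rules and the Linux can_state definitions (warning at 96, passive at 128, bus-off at 256 on the transmit counter). The names CanState and classify_can_state are made up for illustration and are not part of any driver.

```python
from enum import Enum


class CanState(Enum):
    ERROR_ACTIVE = "error-active"
    ERROR_WARNING = "error-warning"
    ERROR_PASSIVE = "error-passive"
    BUS_OFF = "bus-off"


def classify_can_state(tec: int, rec: int) -> CanState:
    """Map error counter values to a controller state (illustrative only)."""
    # Bus-off is driven by the transmit error counter alone.
    if tec >= 256:
        return CanState.BUS_OFF
    worst = max(tec, rec)
    if worst >= 128:
        return CanState.ERROR_PASSIVE
    if worst >= 96:
        return CanState.ERROR_WARNING
    return CanState.ERROR_ACTIVE
```

These are the same three transitions (error warning, error passive, bus off) that the mttcan driver reports in the kernel log further down.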
The problem is that when a TX2 CAN controller enters the bus-off state (which, to be clear, is usually an exceptional event) with the restart-ms property set (which defines the delay after which the CAN controller should be restarted to recover), the recovery process generates a kernel error that requires a power cycle to recover from. So entering the bus-off state is always a permanent failure, when that does not need to be the case.
To test this, all that is required is to configure a TX2 CAN controller with the restart-ms property set and then get it to enter the bus-off state. For example, you could configure a restart-ms delay of 100 ms on can0 using commands like:
ip link set can0 type can bitrate 1000000 restart-ms 100
ip link set can0 up
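If the setup needs to be scripted as part of a test harness, the same two steps could be driven from Python along these lines. This is only a sketch: the function names are hypothetical, it simply shells out to the ip tool, and it assumes it is run with sufficient privileges on a system where the interface exists.

```python
import subprocess
from typing import List


def build_can_setup_commands(ifname: str, bitrate: int, restart_ms: int) -> List[List[str]]:
    """Return the ip commands that configure a CAN interface and bring it up."""
    return [
        ["ip", "link", "set", ifname, "type", "can",
         "bitrate", str(bitrate), "restart-ms", str(restart_ms)],
        ["ip", "link", "set", ifname, "up"],
    ]


def configure_can(ifname: str = "can0", bitrate: int = 1_000_000, restart_ms: int = 100) -> None:
    """Run the setup commands, raising CalledProcessError if one fails."""
    for cmd in build_can_setup_commands(ifname, bitrate, restart_ms):
        subprocess.run(cmd, check=True)
```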
To enter the bus-off state after configuring the controller, the easiest thing to do is transmit a CAN frame on a disconnected bus. Any frame will do, transmitted using any convenient method, such as a SocketCAN socket or the cansend tool from can-utils (linux-can/can-utils on GitHub). Here I would define a disconnected bus as a bus that contains only the TX2 CAN controller, a single external CAN transceiver connected to the TX2 CAN controller, and a single termination resistor (or even no termination resistor or CAN transceiver at all, to make things worse electrically). This is not an electrically valid CAN bus, but that is intentional because it’s the easiest way to force the bus-off recovery logic to run (other options would be using termination resistances that are too large, shorting CANH and CANL, etc.).
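For anyone who prefers a raw SocketCAN socket over cansend, a minimal sketch using only the Python standard library might look like the following. The helper names pack_can_frame and send_frame are made up for illustration; the frame layout (32-bit ID, length byte, three pad bytes, eight data bytes) is the classic struct can_frame layout, and the socket part only works on Linux.

```python
import socket
import struct

# Layout of a classic CAN frame: can_id, can_dlc, 3 pad bytes, 8 data bytes.
CAN_FRAME_FMT = "=IB3x8s"


def pack_can_frame(can_id: int, data: bytes) -> bytes:
    """Pack an ID and up to 8 data bytes into a 16-byte CAN frame."""
    if len(data) > 8:
        raise ValueError("classic CAN frames carry at most 8 data bytes")
    return struct.pack(CAN_FRAME_FMT, can_id, len(data), data.ljust(8, b"\x00"))


def send_frame(ifname: str, can_id: int, data: bytes) -> None:
    """Send a single frame on the given CAN interface (Linux only)."""
    with socket.socket(socket.AF_CAN, socket.SOCK_RAW, socket.CAN_RAW) as sock:
        sock.bind((ifname,))
        sock.send(pack_can_frame(can_id, data))
```

Calling send_frame("can0", 0x5A1, bytes.fromhex("1122334455667788")) would queue the same frame as the cansend invocation shown in the output below, just on can0.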
Hopefully that is enough information for you to reproduce the problem, since you should not require any externally provided code. You just need to explicitly test one of the documented features of the Linux CAN interface and the CAN controller itself.
For reference, when we experience the problem, typical output is as follows (where pld_can is just a convenient alias we have assigned to the c320000.mttcan device using udev):
[root@MKXXXXXXXXXXXXX ~]# cansend pld_can 5A1#11.22.33.44.55.66.77.88
[ 2160.290698] mttcan c320000.mttcan pld_can: entered error warning state
[ 2160.297399] mttcan c320000.mttcan pld_can: entered error passive state
[ 2160.304082] mttcan c320000.mttcan pld_can: entered bus off state
[root@MKXXXXXXXXXXXXX ~]# [ 2160.355457] mttcan_controller_config: ctrlmode 0
[ 2160.360194] mttcan c320000.mttcan pld_can: Bitrate set
[ 2160.365446] IPv6: ADDRCONF(NETDEV_CHANGE): pld_can: link becomes ready
[ 2160.415453] ------------[ cut here ]------------
[ 2160.420066] Kernel BUG at ffffffbffc0f66d0 [verbose debug info unavailable]
[ 2160.427016] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[ 2160.432490] Modules linked in: mttcan can_dev rffc5071mixer(O) bcmdhd ath_pktlog(PO) umac(O) can_raw ath_ds
[ 2160.455544] CPU: 0 PID: 12134 Comm: kworker/0:0 Tainted: P W O 4.4.38-aeryon #22
[ 2160.463879] Hardware name: aeryon-tx2-flyer (DT)
[ 2160.468495] Workqueue: events can_restart_work [can_dev]
[ 2160.473806] task: ffffffc1e5746400 ti: ffffffc1e9ea8000 task.ti: ffffffc1e9ea8000
[ 2160.481275] PC is at can_restart+0xc8/0xe8 [can_dev]
[ 2160.486230] LR is at can_restart_work+0x10/0x18 [can_dev]
[ 2160.491615] pc : [<ffffffbffc0f66d0>] lr : [<ffffffbffc0f6700>] pstate: 60000045
[ 2160.498994] sp : ffffffc1e9eabd30
[ 2160.502301] x29: ffffffc1e9eabd30 x28: 0000000000000000
[ 2160.507619] x27: 0000000000000000 x26: ffffffc001390000
[ 2160.512939] x25: 0000000000000000 x24: 0000000000000000
[ 2160.518257] x23: ffffffc1f5cd2400 x22: ffffffc0702f4830
[ 2160.523578] x21: ffffffc1f5cccc00 x20: ffffffc1e1c71908
[ 2160.528898] x19: ffffffc1e1c71000 x18: 0000000000000013
[ 2160.534219] x17: 0000007f79709490 x16: ffffffc0001e3240
[ 2160.539539] x15: 0019b52994000000 x14: 0000000000000000
[ 2160.544859] x13: 00000001f4000000 x12: 0000000000000017
[ 2160.550179] x11: 00000000000d298e x10: 00000000000008a0
[ 2160.555497] x9 : ffffffc1e9eabd20 x8 : ffffffc1e5746d00
[ 2160.560817] x7 : 00000000000003b2 x6 : 000000000059d3ba
[ 2160.566136] x5 : 0000000000000000 x4 : ffffffc1f5ccd000
[ 2160.571456] x3 : ffffffc1e5746400 x2 : ffffffc1f5cd2405
[ 2160.576774] x1 : 0000000000000003 x0 : ffffffc1e1c71000
[ 2160.582093]
[ 2160.583580] Process kworker/0:0 (pid: 12134, stack limit = 0xffffffc1e9ea8020)
[ 2160.590784] Call trace:
[ 2160.593228] [<ffffffbffc0f66d0>] can_restart+0xc8/0xe8 [can_dev]
[ 2160.599224] [<ffffffbffc0f6700>] can_restart_work+0x10/0x18 [can_dev]
[ 2160.605654] [<ffffffc0000bc1dc>] process_one_work+0x150/0x448
[ 2160.611388] [<ffffffc0000bc608>] worker_thread+0x134/0x40c
[ 2160.616862] [<ffffffc0000c1ea4>] kthread+0xe0/0xf4
[ 2160.621644] [<ffffffc000084f90>] ret_from_fork+0x10/0x40
[ 2160.626946] ---[ end trace 4491671ec513f65d ]---
[ 2160.632926] ------------[ cut here ]------------
[ 2160.637533] WARNING: at ffffffc0000a91c4 [verbose debug info unavailable]
[ 2160.644304] Modules linked in: mttcan can_dev rffc5071mixer(O) bcmdhd ath_pktlog(PO) umac(O) can_raw ath_ds
[ 2160.667343]
[ 2160.668831] CPU: 0 PID: 12134 Comm: kworker/0:0 Tainted: P D W O 4.4.38-aeryon #22
[ 2160.677163] Hardware name: aeryon-tx2-flyer (DT)
[ 2160.681776] task: ffffffc1e5746400 ti: ffffffc1e9ea8000 task.ti: ffffffc1e9ea8000
[ 2160.689245] PC is at __local_bh_enable_ip+0x68/0xb8
[ 2160.694115] LR is at _raw_spin_unlock_bh+0x20/0x28
[ 2160.698896] pc : [<ffffffc0000a91c4>] lr : [<ffffffc000b32b38>] pstate: 400003c5
[ 2160.706274] sp : ffffffc1e9eab9d0
[ 2160.709579] x29: ffffffc1e9eab9d0 x28: ffffffc1e9ea8000
[ 2160.714899] x27: 0000000000000000 x26: ffffffc1e5746400
[ 2160.720219] x25: ffffffc1e5746400 x24: 00000000000003c0
[ 2160.725537] x23: 0000000000000001 x22: 0000000000000000
[ 2160.730854] x21: ffffffc000f21a90 x20: ffffffc1e5746400
[ 2160.736174] x19: ffffffc001412530 x18: 0000000000000013
[ 2160.741491] x17: 0000007f79709490 x16: ffffffc0001e3240
[ 2160.746809] x15: 0019b52994000000 x14: 3534303030303036
[ 2160.752128] x13: 203a657461747370 x12: 0000000000000030
[ 2160.757445] x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f
[ 2160.762763] x9 : fefefefefefefeff x8 : ffffffc1e5746c20
[ 2160.768081] x7 : feff09432c313232 x6 : ffffffc00146e000
[ 2160.773401] x5 : ffffffc0012309c0 x4 : ffffffc001215048
[ 2160.778720] x3 : ffffffc001230920 x2 : 0000000000000000
[ 2160.784038] x1 : 0000000000000201 x0 : ffffffc00138f000
[ 2160.789355]
[ 2160.790842] ---[ end trace 4491671ec513f65e ]---
[ 2160.795448] Call trace:
[ 2160.797887] [<ffffffc0000a91c4>] __local_bh_enable_ip+0x68/0xb8
[ 2160.803793] [<ffffffc000b32b38>] _raw_spin_unlock_bh+0x20/0x28
[ 2160.809616] [<ffffffc00012c7f4>] cgroup_exit+0x58/0xe4
[ 2160.814743] [<ffffffc0000a6d8c>] do_exit+0x29c/0x9a0
[ 2160.819698] [<ffffffc000089c08>] bug_handler.part.3+0x0/0x7c
[ 2160.825344] [<ffffffc000089c48>] bug_handler.part.3+0x40/0x7c
[ 2160.831079] [<ffffffc000089ca0>] bug_handler+0x1c/0x2c
[ 2160.836206] [<ffffffc0000829b8>] brk_handler+0x8c/0xc8
[ 2160.841332] [<ffffffc000081518>] do_debug_exception+0x3c/0xa8
[ 2160.847066] [<ffffffc000084630>] el1_dbg+0x18/0x74
[ 2160.851849] [<ffffffbffc0f6700>] can_restart_work+0x10/0x18 [can_dev]
[ 2160.858275] [<ffffffc0000bc1dc>] process_one_work+0x150/0x448
[ 2160.864007] [<ffffffc0000bc608>] worker_thread+0x134/0x40c
[ 2160.869482] [<ffffffc0000c1ea4>] kthread+0xe0/0xf4
[ 2160.874262] [<ffffffc000084f90>] ret_from_fork+0x10/0x40
[ 2160.879863] Unable to handle kernel paging request at virtual address ffffffffffffffd8
[ 2160.887766] pgd = ffffffc1e7d70000
[ 2160.891160] [ffffffffffffffd8] *pgd=0000000267d76003, *pud=0000000267d76003, *pmd=0000000000000000
[ 2160.900134] Internal error: Oops: 96000005 [#2] PREEMPT SMP
[ 2160.905694] Modules linked in: mttcan can_dev rffc5071mixer(O) bcmdhd ath_pktlog(PO) umac(O) can_raw ath_ds
[ 2160.928741] CPU: 0 PID: 12134 Comm: kworker/0:0 Tainted: P D W O 4.4.38-aeryon #22
[ 2160.937075] Hardware name: aeryon-tx2-flyer (DT)
[ 2160.941685] task: ffffffc1e5746400 ti: ffffffc1e9ea8000 task.ti: ffffffc1e9ea8000
[ 2160.949156] PC is at kthread_data+0x4/0xc
[ 2160.953159] LR is at wq_worker_sleeping+0x10/0xc4
[ 2160.957852] pc : [<ffffffc0000c2574>] lr : [<ffffffc0000bd0b4>] pstate: 600002c5
[ 2160.965231] sp : ffffffc1e9eab9a0
[ 2160.968537] x29: ffffffc1e9eab9a0 x28: ffffffc1e9ea8000
[ 2160.973855] x27: 0000000000000000 x26: ffffffc001215000
[ 2160.979174] x25: 0000000000000000 x24: ffffffc000b2f178
[ 2160.984493] x23: 0000000000000000 x22: ffffffc1e5746990
[ 2160.989813] x21: ffffffc0011e9000 x20: ffffffc1e5746400
[ 2160.995133] x19: ffffffc1f5ccd500 x18: ffffffc000bb0038
[ 2161.000450] x17: 000000000000000e x16: 0000000000000007
[ 2161.005771] x15: ffffffc000b3da60 x14: 00000000fa83b2da
[ 2161.011091] x13: 0000000000000001 x12: 0000000001f9dbec
[ 2161.016410] x11: 0000000000000000 x10: 0000000000392d90
[ 2161.021729] x9 : 0000000000392d90 x8 : 00000000000003b2
[ 2161.027049] x7 : 0000000000000000 x6 : 0000000001ff6ab8
[ 2161.032367] x5 : ffffffc1f5ccd500 x4 : ffffffc1f5ccdee0
[ 2161.037687] x3 : 000000000001af3b x2 : ffffffc1ecc03000
[ 2161.043006] x1 : 0000000000000000 x0 : 0000000000000000
[ 2161.048324]
[ 2161.049811] Process kworker/0:0 (pid: 12134, stack limit = 0xffffffc1e9ea8020)
[ 2161.057016] Call trace:
[ 2161.059458] [<ffffffc0000c2574>] kthread_data+0x4/0xc
[ 2161.064502] [<ffffffc000b2eda0>] __schedule+0x348/0x6dc
[ 2161.069715] [<ffffffc000b2f178>] schedule+0x44/0xa8
[ 2161.074583] [<ffffffc0000a70a0>] do_exit+0x5b0/0x9a0
[ 2161.079538] [<ffffffc000089c08>] bug_handler.part.3+0x0/0x7c
[ 2161.085185] [<ffffffc000089c48>] bug_handler.part.3+0x40/0x7c
[ 2161.090918] [<ffffffc000089ca0>] bug_handler+0x1c/0x2c
[ 2161.096046] [<ffffffc0000829b8>] brk_handler+0x8c/0xc8
[ 2161.101173] [<ffffffc000081518>] do_debug_exception+0x3c/0xa8
[ 2161.106906] [<ffffffc000084630>] el1_dbg+0x18/0x74
[ 2161.111693] [<ffffffbffc0f6700>] can_restart_work+0x10/0x18 [can_dev]
[ 2161.118120] [<ffffffc0000bc1dc>] process_one_work+0x150/0x448
[ 2161.123853] [<ffffffc0000bc608>] worker_thread+0x134/0x40c
[ 2161.129328] [<ffffffc0000c1ea4>] kthread+0xe0/0xf4
[ 2161.134108] [<ffffffc000084f90>] ret_from_fork+0x10/0x40
[ 2161.139410] ---[ end trace 4491671ec513f65f ]---
[ 2161.145427] Fixing recursive fault but reboot is needed!
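If it helps when automating the reproduction, the failure signature in the output above could be detected with something like the following. It is only a sketch under the assumption that the kernel log lines look like the ones pasted here; the function name is hypothetical, and it just checks for the bus-off message followed by a crash whose PC is in can_restart.

```python
from typing import Iterable


def restart_crashed_after_bus_off(log_lines: Iterable[str]) -> bool:
    """Return True if a can_restart crash appears after a bus-off message."""
    seen_bus_off = False
    for line in log_lines:
        if "entered bus off state" in line:
            seen_bus_off = True
        elif seen_bus_off and "PC is at can_restart+" in line:
            return True
    return False
```

The lines could come from dmesg output or a captured serial console log, read into a list of strings.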