We are running strongSwan 5.3.5 with an IPsec tunnel between two Jetson nodes running R28.2. We added the necessary kernel modules as outlined in the strongSwan install instructions, and the tunnel comes up fine. However, after some time the tunnel becomes unstable and we see the kernel errors below in kern.log.
From what I can gather, tegra_se_aes_queue_req calls mutex_lock, which can sleep, while it is running in atomic context: the trace shows it being reached from tcp_delack_timer via __do_softirq, and sleeping is not allowed there. Has anyone else encountered this issue, or is this a genuine bug? We will be upgrading to strongSwan 5.6.2 to test whether that helps, but it appears to be a problem in Tegra-specific code.
[ 2168.298684] BUG: scheduling while atomic: swapper/5/0/0x00000103
[ 2168.304694] Modules linked in: xfrm6_mode_tunnel xfrm4_mode_tunnel xt_policy nfnetlink_queue nfnetlink_log nfnetlink bluetooth xfrm4_tunnel tunnel4 ipcomp xfrm_ipcomp esp4 ah4 pci_manager(O) dmadriver(PO) fuse ip6table_filter bcmdhd xt_conntrack iptable_filter pci_tegra ip_tables bluedroid_pm
[ 2168.330840] CPU: 5 PID: 0 Comm: swapper/5 Tainted: P O 4.4.38-tegra #2
[ 2168.338482] Hardware name: quill (DT)
[ 2168.342135] Call trace:
[ 2168.344582] [<ffffffc000089388>] dump_backtrace+0x0/0xe8
[ 2168.349882] [<ffffffc000089484>] show_stack+0x14/0x20
[ 2168.354926] [<ffffffc000379b18>] dump_stack+0xa0/0xc8
[ 2168.359970] [<ffffffc0000c9bd0>] __schedule_bug+0x48/0x60
[ 2168.365358] [<ffffffc000bd0a4c>] __schedule+0x614/0x750
[ 2168.370572] [<ffffffc000bd0bcc>] schedule+0x44/0xb8
[ 2168.375439] [<ffffffc000bd1098>] schedule_preempt_disabled+0x20/0x40
[ 2168.381781] [<ffffffc0000eb234>] mutex_optimistic_spin+0x1a4/0x1e8
[ 2168.387948] [<ffffffc000bd269c>] __mutex_lock_slowpath+0x3c/0x158
[ 2168.394027] [<ffffffc000bd2804>] mutex_lock+0x4c/0x68
[ 2168.399069] [<ffffffc00098316c>] tegra_se_aes_queue_req+0x34/0xa8
[ 2168.405150] [<ffffffc00098338c>] tegra_se_aes_cbc_encrypt+0x2c/0x38
[ 2168.411404] [<ffffffc0003415ec>] crypto_authenc_encrypt+0x114/0x148
[ 2168.417659] [<ffffffc000307ecc>] echainiv_encrypt+0x124/0x148
[ 2168.423396] [<ffffffbffcef3e28>] esp_output+0x320/0x490 [esp4]
[ 2168.429217] [<ffffffc000af35a0>] xfrm_output_resume+0x160/0x3a8
[ 2168.435124] [<ffffffc000af38d4>] xfrm_output+0x44/0xf8
[ 2168.440251] [<ffffffc000ae7858>] xfrm4_output_finish+0x20/0x28
[ 2168.446072] [<ffffffc000ae76ec>] __xfrm4_output+0x34/0x60
[ 2168.451458] [<ffffffc000ae78f0>] xfrm4_output+0x90/0xa0
[ 2168.456674] [<ffffffc000a9502c>] ip_local_out+0x44/0x58
[ 2168.461887] [<ffffffc000a95304>] ip_queue_xmit+0x124/0x388
[ 2168.467362] [<ffffffc000aac93c>] tcp_transmit_skb+0x424/0x920
[ 2168.473095] [<ffffffc000aae908>] tcp_send_ack+0x110/0x170
[ 2168.478483] [<ffffffc000ab0d84>] tcp_delack_timer_handler+0x104/0x210
[ 2168.484910] [<ffffffc000ab0ec4>] tcp_delack_timer+0x34/0xc0
[ 2168.490472] [<ffffffc000107e2c>] call_timer_fn+0x54/0x1d8
[ 2168.495860] [<ffffffc0001081ec>] run_timer_softirq+0x224/0x2a8
[ 2168.501682] [<ffffffc0000a837c>] __do_softirq+0x124/0x350
[ 2168.507069] [<ffffffc0000a8828>] irq_exit+0x88/0xe0
[ 2168.511937] [<ffffffc0000f6450>] __handle_domain_irq+0x60/0xb8
[ 2168.517756] [<ffffffc000081774>] gic_handle_irq+0x64/0xc0
[ 2168.523144] [<ffffffc000084740>] el1_irq+0x80/0xf8
[ 2168.527925] [<ffffffc000864b08>] cpuidle_enter+0x18/0x20
[ 2168.533227] [<ffffffc0000e907c>] call_cpuidle+0x24/0x50
[ 2168.538440] [<ffffffc0000e9318>] cpu_startup_entry+0x270/0x340
[ 2168.544262] [<ffffffc00008e10c>] secondary_start_kernel+0x12c/0x168
[ 2168.550514] [<0000000080081adc>] 0x80081adc
[ 2168.554807] timer: tcp_delack_timer+0x0/0xc0 preempt leak: 00000101 -> ffffffff
[ 2168.562168] ------------[ cut here ]------------
[ 2168.566778] WARNING: at ffffffc000107f94 [verbose debug info unavailable]
[ 2168.573550] Modules linked in: xfrm6_mode_tunnel xfrm4_mode_tunnel xt_policy nfnetlink_queue nfnetlink_log nfnetlink bluetooth xfrm4_tunnel tunnel4 ipcomp xfrm_ipcomp esp4 ah4 pci_manager(O) dmadriver(PO) fuse ip6table_filter bcmdhd xt_conntrack iptable_filter pci_tegra ip_tables bluedroid_pm
[ 2168.599637]
[ 2168.601125] CPU: 5 PID: 0 Comm: swapper/5 Tainted: P W O 4.4.38-tegra #2
[ 2168.608765] Hardware name: quill (DT)
[ 2168.612418] task: ffffffc1ece83e80 ti: ffffffc1ecea0000 task.ti: ffffffc1ecea0000
[ 2168.619889] PC is at call_timer_fn+0x1bc/0x1d8
[ 2168.624322] LR is at call_timer_fn+0x1bc/0x1d8
[ 2168.628755] pc : [<ffffffc000107f94>] lr : [<ffffffc000107f94>] pstate: 00000045
[ 2168.636135] sp : ffffffc1ecea3be0
[ 2168.639440] x29: ffffffc1ecea3be0 x28: ffffffc1dfd4e5a8
[ 2168.644758] x27: ffffffc001465060 x26: ffffffc1f5fefc38
[ 2168.650074] x25: ffffffc000bde000 x24: ffffffc1dfd4e180
[ 2168.655394] x23: ffffffc000ab0e90 x22: 0000000000000101
[ 2168.660711] x21: ffffffc1dfd4e5a8 x20: ffffffc0014656a0
[ 2168.666029] x19: ffffffc001464000 x18: 0000000000000000
[ 2168.671347] x17: 0000000000000004 x16: 00000000210b0001
[ 2168.676664] x15: 0000000000000010 x14: 3030203a6b61656c
[ 2168.681983] x13: 2074706d65657270 x12: 20306378302f3078
[ 2168.687301] x11: 302b72656d69745f x10: 6b63616c65645f70
[ 2168.692620] x9 : 000000000001abb2 x8 : ffffffc0002e2c00
[ 2168.697940] x7 : ffffffc00131fd08 x6 : 0000000000000053
[ 2168.703258] x5 : 0000000000000000 x4 : 0000000000000000
[ 2168.708575] x3 : 0000000000000000 x2 : ffffffc1ecea0000
[ 2168.713895] x1 : 00000000ffffffff x0 : 0000000000000043
[ 2168.719213]
[ 2168.720995] ---[ end trace c836e4164d6e79ad ]---
[ 2168.725603] Call trace:
[ 2168.728046] [<ffffffc000107f94>] call_timer_fn+0x1bc/0x1d8
[ 2168.733520] [<ffffffc0001081ec>] run_timer_softirq+0x224/0x2a8
[ 2168.739342] [<ffffffc0000a837c>] __do_softirq+0x124/0x350
[ 2168.744728] [<ffffffc0000a8828>] irq_exit+0x88/0xe0
[ 2168.749596] [<ffffffc0000f6450>] __handle_domain_irq+0x60/0xb8
[ 2168.755418] [<ffffffc000081774>] gic_handle_irq+0x64/0xc0
[ 2168.760806] [<ffffffc000084740>] el1_irq+0x80/0xf8
[ 2168.765589] [<ffffffc000864b08>] cpuidle_enter+0x18/0x20
[ 2168.770891] [<ffffffc0000e907c>] call_cpuidle+0x24/0x50
[ 2168.776104] [<ffffffc0000e9318>] cpu_startup_entry+0x270/0x340
[ 2168.781926] [<ffffffc00008e10c>] secondary_start_kernel+0x12c/0x168
[ 2168.788180] [<0000000080081adc>] 0x80081adc
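
In case it helps anyone check their own kern.log for the same pattern, here is a rough Python sketch of how I picked the offending caller out of the trace. It is only an illustration, not a tool from NVIDIA or strongSwan: the function names frames_after_bug and sleeping_callers and the SLEEPING_CALLS set are my own assumptions, and the regex only targets the frame format shown in the log above.

```python
import re

# Matches a stack frame such as:
#   [ 2168.399069] [<ffffffc00098316c>] tegra_se_aes_queue_req+0x34/0xa8
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")
BUG_MARKER = "BUG: scheduling while atomic"

# Assumption: a small, hand-picked set of kernel calls that may sleep.
SLEEPING_CALLS = {"mutex_lock", "mutex_lock_interruptible", "down", "msleep"}


def frames_after_bug(lines):
    """Yield one list of function names per 'scheduling while atomic' report."""
    frames, in_report = [], False
    for line in lines:
        if BUG_MARKER in line:
            if frames:
                yield frames
            frames, in_report = [], True
            continue
        if not in_report:
            continue
        match = FRAME_RE.search(line)
        if match:
            frames.append(match.group(1))
        elif frames:
            # First non-frame line after the frames ends this trace.
            yield frames
            frames, in_report = [], False
    if frames:
        yield frames


def sleeping_callers(frames):
    """Return (sleeping_call, caller) pairs.

    The trace is printed innermost-first, so the caller of a frame is the
    frame listed directly after it.
    """
    return [
        (name, frames[i + 1])
        for i, name in enumerate(frames[:-1])
        if name in SLEEPING_CALLS
    ]
```

Feeding the lines of kern.log to frames_after_bug and passing each resulting list to sleeping_callers should point at the function that took the sleeping lock; checking whether "__do_softirq" also appears in the same list is a quick way to confirm the call happened in softirq context.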