Jetson TX2 boot failure after reaching 142F / 61C

Hi, when the TX2 module reaches 142F / 61C, I get the following errors in the boot messages after a soft reset (using the reset button):

[ 4.004884] tegra18-bridge 2390000.axi2apb: axi_cbb clk rate = 115 MHZ, timeout = 97000 useconds
[ 4.004887] tegra18-bridge 2390000.axi2apb: enabled timeout = 11155000
[ 4.004890] tegra18-bridge 2390000.axi2apb: bridge probed OK
[ 4.004968] tegra18-bridge 23a0000.axi2apb: axi_cbb clk rate = 115 MHZ, timeout = 97000 useconds
[ 4.004970] tegra18-bridge 23a0000.axi2apb: enabled timeout = 11155000
[ 4.004972] tegra18-bridge 23a0000.axi2apb: bridge probed OK
[ 4.005040] tegra18-bridge 23b0000.axi2apb: axi_cbb clk rate = 115 MHZ, timeout = 97000 useconds
[ 4.005043] tegra18-bridge 23b0000.axi2apb: enabled timeout = 11155000
[ 4.005044] tegra18-bridge 23b0000.axi2apb: bridge probed OK
[ 4.005115] tegra18-bridge 23c0000.axi2apb: axi_cbb clk rate = 115 MHZ, timeout = 97000 useconds
[ 4.005117] tegra18-bridge 23c0000.axi2apb: enabled timeout = 11155000
[ 4.005119] tegra18-bridge 23c0000.axi2apb: bridge probed OK
[ 4.005188] tegra18-bridge 23d0000.axi2apb: axi_cbb clk rate = 115 MHZ, timeout = 97000 useconds
[ 4.005191] tegra18-bridge 23d0000.axi2apb: enabled timeout = 11155000
[ 4.005192] tegra18-bridge 23d0000.axi2apb: bridge probed OK
[ 4.005259] tegra18-bridge 2100000.axip2p: axi_cbb clk rate = 115 MHZ, timeout = 97000 useconds
[ 4.005264] tegra18-bridge 2100000.axip2p: enabled timeout = 11155000
[ 4.005266] tegra18-bridge 2100000.axip2p: bridge probed OK
[ 4.005328] tegra18-bridge 2110000.axip2p: axi_cbb clk rate = 115 MHZ, timeout = 97000 useconds
[ 4.005333] tegra18-bridge 2110000.axip2p: enabled timeout = 11155000
[ 4.005334] tegra18-bridge 2110000.axip2p: bridge probed OK
[ 4.005433] tegra18-bridge 2120000.axip2p: axi_cbb clk rate = 115 MHZ, timeout = 97000 useconds
[ 4.005437] tegra18-bridge 2120000.axip2p: enabled timeout = 11155000
[ 4.005440] tegra18-bridge 2120000.axip2p: bridge probed OK
[ 4.005507] tegra18-bridge 2130000.axip2p: axi_cbb clk rate = 115 MHZ, timeout = 97000 useconds
[ 4.005513] tegra18-bridge 2130000.axip2p: enabled timeout = 11155000
[ 4.005514] tegra18-bridge 2130000.axip2p: bridge probed OK
[ 4.005576] tegra18-bridge 2140000.axip2p: axi_cbb clk rate = 115 MHZ, timeout = 97000 useconds
[ 4.005580] tegra18-bridge 2140000.axip2p: enabled timeout = 11155000
[ 4.005582] tegra18-bridge 2140000.axip2p: bridge probed OK
[ 4.005642] tegra18-bridge 2150000.axip2p: axi_cbb clk rate = 115 MHZ, timeout = 97000 useconds
[ 4.005649] tegra18-bridge 2150000.axip2p: enabled timeout = 11155000
[ 4.005650] tegra18-bridge 2150000.axip2p: bridge probed OK
[ 4.005733] tegra18-bridge 2160000.axip2p: axi_cbb clk rate = 115 MHZ, timeout = 97000 useconds
[ 4.005739] tegra18-bridge 2160000.axip2p: enabled timeout = 11155000
[ 4.005741] tegra18-bridge 2160000.axip2p: bridge probed OK
[ 4.005815] tegra18-bridge 2170000.axip2p: axi_cbb clk rate = 115 MHZ, timeout = 97000 useconds
[ 4.005821] tegra18-bridge 2170000.axip2p: enabled timeout = 11155000
[ 4.005822] tegra18-bridge 2170000.axip2p: bridge probed OK
[ 4.005888] tegra18-bridge 2180000.axip2p: axi_cbb clk rate = 115 MHZ, timeout = 97000 useconds
[ 4.005894] tegra18-bridge 2180000.axip2p: enabled timeout = 11155000
[ 4.005895] tegra18-bridge 2180000.axip2p: bridge probed OK
[ 4.005966] tegra18-bridge 2190000.axip2p: axi_cbb clk rate = 115 MHZ, timeout = 97000 useconds
[ 4.005971] tegra18-bridge 2190000.axip2p: enabled timeout = 11155000
[ 4.005973] tegra18-bridge 2190000.axip2p: bridge probed OK
[ 4.006068] **** A57 ECC: Enabled
[ 4.006072] tegra18_a57_serr_init: on CPU 4 a A57 Core
[ 4.006185] tegra18x_actmon d230000.actmon: in actmon_register()…
[ 4.006769] tegra18x_actmon d230000.actmon: initialization Completed for the device mc_all
[ 4.007728] hw perfevents: enabled with denver15_uncore_pmu PMU driver, 3 counters available
[ 4.007903] denver_knobs_init:MTS_VERSION:43068234
[ 4.014642] nvpmodel: initialized successfully
[ 4.018947] usbcore: registered new interface driver snd-usb-audio
[ 4.020413] mmc0: mmc_decode_ext_csd: CMDQ supported: depth: 31, cmdq_support: 1
[ 4.026901] pre_t19x_iso_plat_register(): iso bandwidth 24576KB is not available, client ape_adma
[ 4.026905] tegra_isomgr_adma_register: Failed to register adma isomgr client. err=-22
[ 4.028368] input: tegra-hda HDMI/DP,pcm=3 as /devices/3510000.hda/sound/card0/input0
[ 4.028724] input: tegra-hda HDMI/DP,pcm=7 as /devices/3510000.hda/sound/card0/input1
[ 4.036487] mmc0: periodic cache flush enabled
[ 4.036525] mmc0: new HS400 Enhanced strobe MMC card at address 0001
[ 4.041897] mmcblk0: mmc0:0001 032G34 29.1 GiB
[ 4.042713] mmcblk0boot0: mmc0:0001 032G34 partition 1 4.00 MiB
[ 4.043125] mmcblk0boot1: mmc0:0001 032G34 partition 2 4.00 MiB
[ 4.047520] mmcblk0rpmb: mmc0:0001 032G34 partition 3 4.00 MiB
[ 4.052332] mmcblk0: p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 p12 p13 p14 p15 p16 p17 p18 p19 p20 p21 p22 p23 p24 p25 p26 p27 p28 p29 p30 p31
[ 4.054271] OPE platform probe
[ 4.054437] OPE platform probe successful
[ 4.062939] tegra-asoc: sound: This is a dummy codec
[ 4.065885] tegra-pcie 10003000.pcie-controller: link 2 down, retrying
[ 4.067919] tegra-pcie 10003000.pcie-controller: link 2 down, ignoring
[ 4.067940] tegra-pcie 10003000.pcie-controller: PCIE: no end points detected
[ 4.068627] tegra-pcie 10003000.pcie-controller: PCIE: Disable power rails
[ 4.132148] tegra-asoc: sound: ADMAIF1 <-> ADMAIF1 mapping ok
[ 4.132318] tegra-asoc: sound: ADMAIF2 <-> ADMAIF2 mapping ok
[ 4.132453] tegra-asoc: sound: ADMAIF3 <-> ADMAIF3 mapping ok
[ 4.132605] tegra-asoc: sound: ADMAIF4 <-> ADMAIF4 mapping ok
[ 4.132749] tegra-asoc: sound: ADMAIF5 <-> ADMAIF5 mapping ok
[ 4.132900] tegra-asoc: sound: ADMAIF6 <-> ADMAIF6 mapping ok
[ 4.133046] tegra-asoc: sound: ADMAIF7 <-> ADMAIF7 mapping ok
[ 4.133188] tegra-asoc: sound: ADMAIF8 <-> ADMAIF8 mapping ok
[ 4.133335] tegra-asoc: sound: ADMAIF9 <-> ADMAIF9 mapping ok
[ 4.133475] tegra-asoc: sound: ADMAIF10 <-> ADMAIF10 mapping ok
[ 4.133631] tegra-asoc: sound: ADMAIF11 <-> ADMAIF11 mapping ok
[ 4.133764] tegra-asoc: sound: ADMAIF12 <-> ADMAIF12 mapping ok
[ 4.133924] tegra-asoc: sound: ADMAIF13 <-> ADMAIF13 mapping ok
[ 4.134067] tegra-asoc: sound: ADMAIF14 <-> ADMAIF14 mapping ok
[ 4.134220] tegra-asoc: sound: ADMAIF15 <-> ADMAIF15 mapping ok
[ 4.134357] tegra-asoc: sound: ADMAIF16 <-> ADMAIF16 mapping ok
[ 4.134504] tegra-asoc: sound: ADMAIF17 <-> ADMAIF17 mapping ok
[ 4.134730] tegra-asoc: sound: ADMAIF18 <-> ADMAIF18 mapping ok
[ 4.134880] tegra-asoc: sound: ADMAIF19 <-> ADMAIF19 mapping ok
[ 4.135018] tegra-asoc: sound: ADMAIF20 <-> ADMAIF20 mapping ok
[ 4.189629] u32 classifier
[ 4.189632] Actions configured
[ 4.189764] Initializing XFRM netlink socket
[ 4.190680] NET: Registered protocol family 10
[ 4.191972] NET: Registered protocol family 17
[ 4.191995] NET: Registered protocol family 15
[ 4.192076] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
[ 4.192606] Bluetooth: RFCOMM socket layer initialized
[ 4.192637] Bluetooth: RFCOMM ver 1.11
[ 4.192647] Bluetooth: HIDP (Human Interface Emulation) ver 1.2
[ 4.192654] Bluetooth: HIDP socket layer initialized
[ 4.192677] 9pnet: Installing 9P2000 support
[ 4.192765] Key type dns_resolver registered
[ 4.198725] Registered cp15_barrier emulation handler
[ 4.198771] Registered setend emulation handler
[ 4.200348] registered taskstats version 1
[ 4.208820] isp 15600000.isp: initialized
[ 4.209063] isp 15600000.isp: isp_probe: failed
[ 4.209181] isp: probe of 15600000.isp failed with error -22
[ 4.218317] nvcsi 150c0000.nvcsi: initialized
[ 4.218666] nvcsi: probe of 150c0000.nvcsi failed with error -22
[ 4.228580] gpio tegra-gpio-aon wake29 for gpio=56(FF:0)
[ 4.228691] gpio tegra-gpio-aon wake67 for gpio=57(FF:1)
[ 4.228770] gpio tegra-gpio-aon wake68 for gpio=58(FF:2)
[ 4.229036] input: gpio-keys as /devices/gpio-keys/input/input2
[ 4.232222] tegra-vi4 15700000.vi: initialized
[ 4.234126] Unable to handle kernel read from unreadable memory at virtual address 00000000
[ 4.234128] Mem abort info:
[ 4.234129] ESR = 0x96000005
[ 4.234134] Exception class = DABT (current EL), IL = 32 bits
[ 4.234136] SET = 0, FnV = 0
[ 4.234136] EA = 0, S1PTW = 0
[ 4.234138] Data abort info:
[ 4.234139] ISV = 0, ISS = 0x00000005
[ 4.234140] CM = 0, WnR = 0
[ 4.234141] [0000000000000000] user address but active_mm is swapper
[ 4.234150] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[ 4.234158] Modules linked in:
[ 4.234168] CPU: 0 PID: 63 Comm: kworker/u12:3 Not tainted 4.9.140 #1
[ 4.234170] Hardware name: quill (DT)
[ 4.234209] Workqueue: events_unbound async_run_entry_fn
[ 4.234211] task: ffffffc1ebae3800 task.stack: ffffffc1ebbf4000
[ 4.234231] PC is at v4l2_async_notifier_register+0x134/0x1a0
[ 4.234233] LR is at v4l2_async_notifier_register+0x118/0x1a0
[ 4.234236] pc : [] lr : [] pstate: 60400045
[ 4.234237] sp : ffffffc1ebbf7a70
[ 4.234240] x29: ffffffc1ebbf7a70 x28: 0000000000000000
[ 4.234242] x27: ffffffc1ec9d9490 x26: 0000000000000000
[ 4.234244] x25: 00000000024080c0 x24: ffffff800a0c6f68
[ 4.234246] x23: ffffff8009f80000 x22: ffffff8009f800a0
[ 4.234247] x21: ffffffc1e3f84028 x20: fffffffffffffef0
[ 4.234249] x19: ffffffc1eb2a4ef8 x18: 0000000000000000
[ 4.234250] x17: 0000000000000000 x16: 0000000000000000
[ 4.234252] x15: ffffffffffffffff x14: ffffffc1ea3d2538
[ 4.234253] x13: ffffffc1ea3d252c x12: 0000000000000028
[ 4.234255] x11: 0000000000000038 x10: 0101010101010101
[ 4.234256] x9 : 0000000000000008 x8 : ffffffc1e3cadb40
[ 4.234258] x7 : 0000000000000000 x6 : 0000000000000020
[ 4.234260] x5 : 0000000000000020 x4 : ffffffc1eb2a4f10
[ 4.234261] x3 : ffffff8009f80090 x2 : 0000000000000000
[ 4.234262] x1 : ffffffc1e3cb9338 x0 : 0000000000000000
[ 4.234264]
[ 4.234265] Process kworker/u12:3 (pid: 63, stack limit = 0xffffffc1ebbf4000)
[ 4.234268] Call trace:
[ 4.234272] [] v4l2_async_notifier_register+0x134/0x1a0
[ 4.234281] [] tegra_vi_graph_init+0x210/0x290
[ 4.234286] [] tegra_vi_media_controller_init+0x180/0x1b8
[ 4.234299] [] tegra_vi4_probe+0x240/0x360
[ 4.234310] [] platform_drv_probe+0x60/0xc8
[ 4.234314] [] driver_probe_device+0xd0/0x3f8
[ 4.234317] [] __driver_attach+0x124/0x128
[ 4.234321] [] bus_for_each_dev+0x74/0xb0
[ 4.234323] [] driver_attach+0x30/0x40
[ 4.234325] [] driver_attach_async+0x20/0x60
[ 4.234327] [] async_run_entry_fn+0x48/0x160
[ 4.234336] [] process_one_work+0x1e8/0x490
[ 4.234338] [] worker_thread+0x58/0x4c0
[ 4.234342] [] kthread+0xd8/0xf0
[ 4.234350] [] ret_from_fork+0x10/0x40
[ 4.234358] ---[ end trace 36bfa45299e4fe4b ]---
[ 4.238777] Unable to handle kernel paging request at virtual address ffffffffffffffd8
[ 4.238777] Mem abort info:
[ 4.238778] ESR = 0x96000005
[ 4.238781] Exception class = DABT (current EL), IL = 32 bits
[ 4.238781] SET = 0, FnV = 0
[ 4.238783] EA = 0, S1PTW = 0
[ 4.238784] Data abort info:
[ 4.238786] ISV = 0, ISS = 0x00000005
[ 4.238787] CM = 0, WnR = 0
[ 4.238789] swapper pgtable: 4k pages, 39-bit VAs, pgd = ffffff800a183000
[ 4.238793] [ffffffffffffffd8] *pgd=0000000000000000, *pud=0000000000000000
[ 4.238796] Internal error: Oops: 96000005 [#2] PREEMPT SMP
[ 4.238799] Modules linked in:
[ 4.238807] CPU: 0 PID: 63 Comm: kworker/u12:3 Tainted: G D 4.9.140 #1
[ 4.238809] Hardware name: quill (DT)
[ 4.238822] task: ffffffc1ebae3800 task.stack: ffffffc1ebbf4000
[ 4.238832] PC is at kthread_data+0x24/0x30
[ 4.238836] LR is at wq_worker_sleeping+0x20/0xd0
[ 4.238837] pc : [] lr : [] pstate: 804000c5
[ 4.238839] sp : ffffffc1ebbf7670
[ 4.238842] x29: ffffffc1ebbf7670 x28: ffffffc1ebae3800
[ 4.238844] x27: ffffff80080eb93c x26: ffffffc1f703ccc0
[ 4.238845] x25: ffffffc1ebae3ed0 x24: 0000000000000000
[ 4.238847] x23: ffffff80097e4000 x22: ffffffc1ebae3800
[ 4.238849] x21: ffffff8009e18000 x20: ffffff80097ef000
[ 4.238850] x19: ffffffc1ebae3800 x18: 000000000000000e
[ 4.238852] x17: 0000000000000000 x16: 0000000000000000
[ 4.238853] x15: 00000000002f5252 x14: 0000000000000001
[ 4.238855] x13: 0000000000e6099a x12: 0000000000001400
[ 4.238857] x11: 00000000013ed302 x10: 0000000000e61d9a
[ 4.238858] x9 : 000000000138a2d9 x8 : 0000000000000400
[ 4.238860] x7 : 0000000000000001 x6 : 00000000000002f0
[ 4.238862] x5 : ffffffc1ebae38c0 x4 : 0000000000000eb0
[ 4.238863] x3 : 0000000000000000 x2 : 0000000000000000
[ 4.238865] x1 : ffffffc1f703ccc0 x0 : 0000000000000000
[ 4.238865]
[ 4.238866] Process kworker/u12:3 (pid: 63, stack limit = 0xffffffc1ebbf4000)
[ 4.238868] Call trace:
[ 4.238871] [] kthread_data+0x24/0x30
[ 4.238887] [] __schedule+0x4f8/0x770
[ 4.238896] [] do_task_dead+0x74/0x78
[ 4.238905] [] do_exit+0x5c4/0x9d8
[ 4.238912] [] die+0x188/0x1a0
[ 4.238925] [] __do_kernel_fault.isra.1+0x140/0x218
[ 4.238927] [] do_page_fault+0x1d4/0x4c0
[ 4.238931] [] do_translation_fault+0x6c/0x80
[ 4.238934] [] do_mem_abort+0x54/0xb0
[ 4.238936] [] el1_da+0x24/0xb4
[ 4.238941] [] tegra_vi_graph_init+0x210/0x290
[ 4.238943] [] tegra_vi_media_controller_init+0x180/0x1b8
[ 4.238946] [] tegra_vi4_probe+0x240/0x360
[ 4.238953] [] platform_drv_probe+0x60/0xc8
[ 4.238957] [] driver_probe_device+0xd0/0x3f8
[ 4.238959] [] __driver_attach+0x124/0x128
[ 4.238962] [] bus_for_each_dev+0x74/0xb0
[ 4.238964] [] driver_attach+0x30/0x40
[ 4.238965] [] driver_attach_async+0x20/0x60
[ 4.238967] [] async_run_entry_fn+0x48/0x160
[ 4.238970] [] process_one_work+0x1e8/0x490
[ 4.238972] [] worker_thread+0x58/0x4c0
[ 4.238974] [] kthread+0xd8/0xf0
[ 4.238978] [] ret_from_fork+0x10/0x40
[ 4.238981] ---[ end trace 36bfa45299e4fe4c ]---
[ 4.243347] Fixing recursive fault but reboot is needed!
[ 5.900005] nct1008_nct72 7-004c: !!!Found deprecated property!!!
[ 5.906123] nct1008_nct72 7-004c: success parsing dt
[ 5.911522] nct1008_nct72 7-004c: success in enabling tmp451 VDD rail
[ 16.026622] tegra-i2c c250000.i2c: pio timed out addr: 0x4c tlen:16 rlen:0
[ 16.033520] tegra-i2c c250000.i2c: --- register dump for debugging ----
[ 16.040151] tegra-i2c c250000.i2c: I2C_CNFG - 0x22c00
[ 16.045219] tegra-i2c c250000.i2c: I2C_PACKET_TRANSFER_STATUS - 0x1010001
[ 16.052021] tegra-i2c c250000.i2c: I2C_FIFO_CONTROL - 0xe0
[ 16.057523] tegra-i2c c250000.i2c: I2C_FIFO_STATUS - 0x800080
[ 16.063284] tegra-i2c c250000.i2c: I2C_INT_MASK - 0x7c
[ 16.068442] tegra-i2c c250000.i2c: I2C_INT_STATUS - 0xc2
[ 16.073772] tegra-i2c c250000.i2c: i2c transfer timed out addr: 0x4c
[ 16.080199] nct1008_nct72 7-004c: write reg err -110
[ 16.085184] nct1008_nct72 7-004c: sensor init failed 0xffffff92
[ 16.091122] nct1008_nct72 7-004c:
[ 16.091122] exit nct1008_probe, err=-110
[ 16.098625] nct1008_nct72 7-004c: success in disabling tmp451 VDD rail
[ 16.105211] nct1008_nct72: probe of 7-004c failed with error -110
[ 24.346562] Watchdog detected hard LOCKUP on cpu 3[ 24.351247] ------------[ cut here ]------------
[ 24.355937] WARNING: CPU: 2 PID: 0 at /home/cary/Cornet2/Projects/4chrolan/SVN/L4T_32_1/os/Build/Kernel/source/kernel/kernel-4.9/kernel/watchdog_hld.c:143 watchdog_check_hardlockup_other_cpu+0x108/0x128
[ 24.373910] Modules linked in:
[ 24.376992]
[ 24.378503] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G D 4.9.140 #1
[ 24.385724] Hardware name: quill (DT)
[ 24.389391] task: ffffffc1ecb7b800 task.stack: ffffffc1ecb90000
[ 24.395334] PC is at watchdog_check_hardlockup_other_cpu+0x108/0x128
[ 24.401707] LR is at watchdog_check_hardlockup_other_cpu+0x108/0x128
[ 24.408093] pc : [] lr : [] pstate: 604001c5
[ 24.415492] sp : ffffffc1f7062d10
[ 24.418816] x29: ffffffc1f7062d10 x28: ffffff800a124db0
[ 24.424177] x27: ffffff80097eb000 x26: 0000000000000001
[ 24.429525] x25: 0000000000000000 x24: 0000000000000000
[ 24.434876] x23: ffffff8009e17000 x22: ffffff8009e17fa8
[ 24.440225] x21: 0000000000000003 x20: ffffff8009e18c30
[ 24.445581] x19: ffffff80097eb760 x18: 0000000000000000
[ 24.450926] x17: 0000000000000f81 x16: 0000000000000000
[ 24.456270] x15: ffffffffffffffff x14: ffffffc2770626df
[ 24.461624] x13: ffffffc1f70626e2 x12: 0000000000000018
[ 24.466973] x11: ffffffc1f70626a0 x10: ffffffc1f70626a0
[ 24.472327] x9 : 0000000000000326 x8 : 206e6f2050554b43
[ 24.477705] x7 : 4f4c206472616820 x6 : ffffffc1f7062718
[ 24.484319] x5 : 0000000000000001 x4 : 0000000000000000
[ 24.489667] x3 : 0000000000000000 x2 : 0000000000000000
[ 24.495034] x1 : ffffffc1ecb7b800 x0 : 0000000000000026
[ 24.500383]
[ 24.501886] ---[ end trace 36bfa45299e4fe4d ]---
[ 24.506576] Call trace:
[ 24.509092] [] watchdog_check_hardlockup_other_cpu+0x108/0x128
[ 24.516519] [] watchdog_timer_fn+0x9c/0x288
[ 24.522303] [] __hrtimer_run_queues+0xd0/0x348
[ 24.528333] [] hrtimer_interrupt+0xa8/0x1e0
[ 24.534136] [] tegra186_timer_isr+0x34/0x48
[ 24.539905] [] __handle_irq_event_percpu+0x60/0x258
[ 24.546350] [] handle_irq_event_percpu+0x28/0x60
[ 24.552540] [] handle_irq_event+0x50/0x80
[ 24.558121] [] handle_fasteoi_irq+0xc0/0x1b8
[ 24.563961] [] generic_handle_irq+0x34/0x50
[ 24.569711] [] __handle_domain_irq+0x68/0xc0
[ 24.575565] [] gic_handle_irq+0x5c/0xb0
[ 24.580972] [] el1_irq+0xe8/0x18c
[ 24.585869] [] cpuidle_enter_state+0xb8/0x378
[ 24.591793] [] cpuidle_enter+0x34/0x48
[ 24.597117] [] call_cpuidle+0x40/0x70
[ 24.602359] [] cpu_startup_entry+0x1a0/0x1f0
[ 24.608203] [] secondary_start_kernel+0x18c/0x200
[ 24.614472] [<00000000811211a4>] 0x811211a4

The TX2 module resets correctly when the temperature is lower than 61C. However, 61C is still below the specified operating temperature limit (80C). Can you point out from the boot messages above where the problem is on the TX2 and what might cause this failure?

The first error comes from csi and vi. Do you run any use case while the temperature ramps up?

Also, if you have more than one module, does this issue happen on only one of them?

Hi Wayne,

Thanks for the reply. We are running a multi-channel video capture application, sending
video data to one or more CSI channels from a parallel-to-CSI converter.
The test is to capture multiple streams for a while
so the temperature ramps up to around 61C, then soft reset the system.
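
For anyone repeating this test, a minimal sketch of how the temperature could be logged before triggering the reset. It assumes the standard Linux thermal sysfs layout (`/sys/class/thermal/thermal_zone*/type` and `temp`, with `temp` in millidegrees Celsius); the function names and the 60C default are only illustrative and are not taken from our test setup.

```python
from pathlib import Path


def read_thermal_zones(base="/sys/class/thermal"):
    """Return {zone_type: temperature_in_celsius} for every readable zone."""
    readings = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            name = (zone / "type").read_text().strip()
            millideg = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            continue  # skip zones that cannot be read or parsed
        readings[name] = millideg / 1000.0  # sysfs reports millidegrees C
    return readings


def zones_at_or_above(readings, threshold_c=60.0):
    """Return only the zones whose temperature has reached the threshold."""
    return {name: temp for name, temp in readings.items() if temp >= threshold_c}


if __name__ == "__main__":
    current = read_thermal_zones()
    for name, temp in current.items():
        print(f"{name}: {temp:.1f} C")
    print("at or above threshold:", zones_at_or_above(current))
```

Running this in a loop while the capture application is active would show when the module crosses the temperature at which the soft reset starts to fail.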

This seems to happen with all TX2 modules, but only with some
of our carrier boards. We design and make our own carrier boards. Also, it happens after a soft reset (pulling RESET_IN# (A47) low). It works fine if I power cycle the board (power cycle VDD_IN).

I am helping klam to debug this problem.

The most common failure we see begins with this error during kernel boot:

[    5.661118] registered taskstats version 1
[    5.674892] isp 15600000.isp: initialized
[    5.679245] isp 15600000.isp: isp_probe: failed
[    5.684043] isp: probe of 15600000.isp failed with error -22
[    5.699595] nvcsi 150c0000.nvcsi: initialized
[    5.704433] nvcsi: probe of 150c0000.nvcsi failed with error -22
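
A small sketch for pulling these probe failures out of a captured boot log and naming the error codes. The errno table only covers the Linux values that appear in the logs in this thread (-12, -22, -110); the function name and regular expression are illustrative.

```python
import re

# Linux errno values seen in the logs in this thread.
ERRNO_NAMES = {12: "ENOMEM", 22: "EINVAL", 110: "ETIMEDOUT"}

# Matches the driver core message "probe of <device> failed with error <code>".
PROBE_FAIL = re.compile(r"probe of (\S+) failed with error (-?\d+)")


def find_probe_failures(log_text):
    """Return a list of (device, errno_name) for each failed probe in the log."""
    failures = []
    for line in log_text.splitlines():
        match = PROBE_FAIL.search(line)
        if match:
            code = abs(int(match.group(2)))
            failures.append((match.group(1), ERRNO_NAMES.get(code, f"errno {code}")))
    return failures


if __name__ == "__main__":
    sample = (
        "[    5.684043] isp: probe of 15600000.isp failed with error -22\n"
        "[    5.704433] nvcsi: probe of 150c0000.nvcsi failed with error -22\n"
    )
    for device, name in find_probe_failures(sample):
        print(device, name)
```

Both failures here decode to EINVAL, which only says an argument was rejected somewhere in the probe path; it does not by itself identify the cause.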

After this, the initialization of the video capture endpoints fails partway through:

[    5.719437] gpio tegra-gpio-aon wake29 for gpio=56(FF:0)
[    5.722933] tegra-vi4 15700000.vi: initialized
[    5.725155] tegra-vi4 15700000.vi: handling endpoint /host1x/vi@15700000/ports/port@0/endpoint
[    5.725244] tegra-vi4 15700000.vi: parsing node /host1x/nvcsi@150c0000/channel@0
[    5.725253] tegra-vi4 15700000.vi: handling endpoint /host1x/nvcsi@150c0000/channel@0/ports/port@0/endpoint@0
[    5.725270] tegra-vi4 15700000.vi: parsing node /i2c@3180000/cti_4chrolan_hd_in@01
[    5.725280] tegra-vi4 15700000.vi: handling endpoint /i2c@3180000/cti_4chrolan_hd_in@01/ports/port@0/endpoint
[    5.725299] tegra-vi4 15700000.vi: handling endpoint /host1x/nvcsi@150c0000/channel@0/ports/port@1/endpoint@1
[    5.725373] Unable to handle kernel read from unreadable memory at virtual address 00000000
[    5.725374] Mem abort info:
[    5.725377]   ESR = 0x96000005
[    5.725382]   Exception class = DABT (current EL), IL = 32 bits
[    5.725384]   SET = 0, FnV = 0
[    5.725386]   EA = 0, S1PTW = 0
[    5.725387] Data abort info:
[    5.725389]   ISV = 0, ISS = 0x00000005
[    5.725390]   CM = 0, WnR = 0
[    5.725395] [0000000000000000] user address but active_mm is swapper
[    5.725406] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[    5.725417] Modules linked in:
[    5.725436] CPU: 2 PID: 1427 Comm: kworker/u12:6 Not tainted 4.9.140 #1
[    5.725438] Hardware name: quill (DT)
[    5.725517] Workqueue: events_unbound async_run_entry_fn
[    5.725519] task: ffffffc1eb390000 task.stack: ffffffc1ea338000
[    5.725540] PC is at v4l2_async_notifier_register+0x134/0x1a0
[    5.725545] LR is at v4l2_async_notifier_register+0x118/0x1a0
[    5.725550] pc : [<ffffff8008aeff94>] lr : [<ffffff8008aeff78>] pstate: 60400045
[    5.725553] sp : ffffffc1ea33ba70
[    5.725557] x29: ffffffc1ea33ba70 x28: 0000000000000000 
[    5.725563] x27: 0000000000000000 x26: 0000000000000000 
[    5.725566] x25: 00000000024080c0 x24: ffffff800a0c6f68 
[    5.725570] x23: ffffff8009f80000 x22: ffffff8009f800a0 
[    5.725575] x21: ffffffc1e3fa4028 x20: fffffffffffffef0 
[    5.725579] x19: ffffffc1e579aef8 x18: 0000000000000000 
[    5.725583] x17: 0000000000000000 x16: 0000000000000000 
[    5.725588] x15: ffffffffffffffff x14: 2f30303030633035 
[    5.725591] x13: 3140697363766e2f x12: 783174736f682f20 
[    5.725595] x11: 0000000000000038 x10: 0101010101010101 
[    5.725599] x9 : 0000000000000008 x8 : ffffffc1e3f22780 
[    5.725603] x7 : 0000000000000000 x6 : 0000000000000020 
[    5.725606] x5 : 0000000000000020 x4 : ffffffc1e579af10 
[    5.725610] x3 : ffffff8009f80090 x2 : 0000000000000000 
[    5.725614] x1 : ffffffc1e3f0d638 x0 : 0000000000000000 
[    5.725616] 
[    5.725618] Process kworker/u12:6 (pid: 1427, stack limit = 0xffffffc1ea338000)
[    5.725623] Call trace:
[    5.725632] [<ffffff8008aeff94>] v4l2_async_notifier_register+0x134/0x1a0
[    5.725681] [<ffffff8008b0cb80>] tegra_vi_graph_init+0x210/0x290
[    5.725688] [<ffffff8008b069e8>] tegra_vi_media_controller_init+0x180/0x1b8
[    5.725705] [<ffffff800854e830>] tegra_vi4_probe+0x240/0x360
[    5.725721] [<ffffff8008759780>] platform_drv_probe+0x60/0xc8
[    5.725729] [<ffffff8008756d48>] driver_probe_device+0xd0/0x3f8
[    5.725741] [<ffffff8008757194>] __driver_attach+0x124/0x128
[    5.725746] [<ffffff800875487c>] bus_for_each_dev+0x74/0xb0
[    5.725749] [<ffffff8008756540>] driver_attach+0x30/0x40
[    5.725752] [<ffffff8008754e40>] driver_attach_async+0x20/0x60
[    5.725758] [<ffffff80080df8a0>] async_run_entry_fn+0x48/0x160
[    5.725777] [<ffffff80080d4e58>] process_one_work+0x1e8/0x490

Note that

a) This only happens when the TX2 module is above about 140F/60C.

b) Our camera driver is implemented as a loadable kernel module, and is not yet
loaded at this point.

c) The video capture routing IS enabled in the Device Tree. If we go back to
the standard device tree we don’t see this problem.

Any ideas?

Thanks,

Cary

Hello klamhl3o9, cobrien,

According to your description, you cannot reproduce the issue with the default settings.
May I know what additional configuration you have applied?
Thanks

We have 8 video capture channels enabled, 5 2-lane and 3 4-lane.

I.e., entries in the device tree like this:

tegra-camera-platform {

		compatible = "nvidia, tegra-camera-platform";

		modules {
			/* 5 2-lane inputs.  */
			cam_module0: module0 {
				badge = "cti_4chrolan_hd_in_1";
				position = "topleft";
				orientation = "1";
				drivernode0 {
					pcl_id = "v4l2_sensor";
					proc-device-tree = "/proc/device-tree/i2c@3180000/cti_4chrolan_hd_in@01";
				};
			};	
...

	host1x {
		vi_base: vi@15700000 {
			num-channels = <8>;
			ports {
				#address-cells = <1>;
				#size-cells = <0>;
				port@0 {
					reg = <0>;
					vi_in0: endpoint {
						port-index = <0>;
						bus-width = <2>;
						remote-endpoint = <&csi_out0>;
					};
				};	

		csi_base: nvcsi@150c0000 {
			/* the rest is filled in from hardware/nvidia/soc/t18x/kernel-dts/tegra186-soc/tegra186-soc-base.dtsi */
			num-channels = <8>;
			#address-cells = <1>;
			#size-cells = <0>;
			status = "okay";
			/*  5 (of a possible 6) 2-lane inputs */
			csi_chan0: channel@0 {
				reg = <0>;
				ports {
					#address-cells = <1>;
					#size-cells = <0>;
					port@0 {
						reg = <0>;
						csi_in0: endpoint@0 {
							port-index = <0>;
							bus-width = <2>;
							remote-endpoint = <&cti_4chrolan_hd_in_0>;
						};
					};
					port@1 {
						reg = <1>;
						csi_out0: endpoint@1 {
							remote-endpoint = <&vi_in0>;
						};
					};	
				};
			};

Note that the fault (a read from virtual address 0, i.e. a NULL pointer dereference) occurs in v4l2_async_notifier_register, called from the tegra_vi_graph_init routine.
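
As a cross-check on the oops, the ESR value it prints (0x96000005) can be decoded by hand. The sketch below assumes the ARMv8-A ESR_ELx field layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0, and for data aborts WnR in ISS bit 6 and the fault status code in ISS bits 5:0); the function name is illustrative.

```python
def decode_esr(esr):
    """Split an AArch64 ESR_ELx value into its main fields."""
    iss = esr & 0x1FFFFFF              # instruction specific syndrome, bits 24:0
    return {
        "EC": (esr >> 26) & 0x3F,      # exception class, bits 31:26
        "IL": (esr >> 25) & 0x1,       # 1 = trapped instruction was 32 bits wide
        "ISS": iss,
        "WnR": (iss >> 6) & 0x1,       # data aborts: 0 = read, 1 = write
        "DFSC": iss & 0x3F,            # data aborts: fault status code
    }


if __name__ == "__main__":
    fields = decode_esr(0x96000005)
    print(", ".join(f"{key}=0x{value:x}" for key, value in fields.items()))
```

For 0x96000005 this gives EC=0x25 (data abort taken at the current exception level), WnR=0 (a read) and DFSC=0x05 (level 1 translation fault), which agrees with the "DABT (current EL)" and "WnR = 0" lines in the log and with a read through an unmapped (NULL) pointer.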

Some additional information from testing…

a) The problem does NOT occur on the NVIDIA TX2 Eval carrier board.

b) The problem does NOT occur with the default device tree.

c) If we switch our camera driver from a loadable module to being compiled
in, the problem still exists.

d) The cutoff for the problem seems very sharp at 60 deg C.

Just to add some more information…

I was able to capture the boot sequence between normal operation and the error condition.
(This is sdiff output: | marks a line that differs, and < or > marks a line present on only one side.)
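
For reference, a rough Python equivalent of this comparison, in case sdiff is not at hand. It strips the kernel timestamps first, since those differ on every boot, and then reports lines that appear on only one side. This is a sketch built on the standard `difflib` module, not the procedure used to produce the output below; the function names are illustrative.

```python
import difflib
import re

# Leading kernel timestamp such as "[    5.661118] " or "[ 4.004884] ".
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s?")


def strip_timestamps(lines):
    """Remove the leading timestamp and trailing newline from each log line."""
    return [TIMESTAMP.sub("", line.rstrip("\n")) for line in lines]


def diff_boot_logs(good_lines, bad_lines):
    """Return (side, line) pairs for lines present in only one of the two logs."""
    good = strip_timestamps(good_lines)
    bad = strip_timestamps(bad_lines)
    matcher = difflib.SequenceMatcher(a=good, b=bad, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        changes.extend(("good only", line) for line in good[i1:i2])
        changes.extend(("bad only", line) for line in bad[j1:j2])
    return changes


if __name__ == "__main__":
    good = ["[ 1.000000] la/ptsa driver initialized.\n",
            "[ 1.000100] pre_t19x_iso_plat_init(): iso emc max clk=1866000KHz\n"]
    bad = ["[ 1.200000] la/ptsa driver initialized.\n",
           "[ 1.200100] pre_t19x_iso_plat_init(): iso emc max clk=0KHz\n"]
    for side, line in diff_boot_logs(good, bad):
        print(f"{side}: {line}")
```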

The first difference (good on the left, bad on the right) is that the iso emc max clk
and bw aren’t calculated properly.

la/ptsa driver initialized.                                                                             la/ptsa driver initialized.
pre_t19x_iso_plat_init(): iso emc max clk=1866000KHz                                               |    pre_t19x_iso_plat_init(): iso emc max clk=0KHz
pre_t19x_iso_plat_init(): max_iso_bw=26870400KB                                                    |    pre_t19x_iso_plat_init(): max_iso_bw=0KB
NET: Registered protocol family 2                                                                       NET: Registered protocol family 2

It’s possible this causes some of the camera subsystem to fail initialization.

misc tegra_camera_ctrl: tegra_camera_isomgr_register: some fields not in DT.                            misc tegra_camera_ctrl: tegra_camera_isomgr_register: some fields not in DT.
misc tegra_camera_ctrl: tegra_camera_isomgr_register tpg_max_iso = 3916800KBs                           misc tegra_camera_ctrl: tegra_camera_isomgr_register tpg_max_iso = 3916800KBs
misc tegra_camera_ctrl: tegra_camera_isomgr_register isp_iso_bw=0, vi_iso_bw=2250000, max_bw=391        misc tegra_camera_ctrl: tegra_camera_isomgr_register isp_iso_bw=0, vi_iso_bw=2250000, max_bw=391
                                                                                                   >    pre_t19x_iso_plat_register(): iso bandwidth 3916800KB is not available, client tegra_camera_ctrl
                                                                                                   >    misc tegra_camera_ctrl: tegra_camera_isomgr_register: unable to register to isomgr
                                                                                                   >    misc tegra_camera_ctrl: tegra_camera_probe: failed to register CAMERA as isomgr client
                                                                                                   >    tegra_camera_platform: probe of tegra-camera-platform failed with error -12
tegra-pcie 10003000.pcie-controller: probing port 2, using 1 lanes                                      tegra-pcie 10003000.pcie-controller: probing port 2, using 1 lanes
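
A quick way to check a captured log for this condition is sketched below. It only parses the two `pre_t19x_iso_plat_init()` lines shown above and flags a boot where either value is zero; the function names are illustrative, and treating zero as "bad" is an assumption based on the good/bad comparison in this post.

```python
import re

# Matches the two values printed by pre_t19x_iso_plat_init() in the logs above.
ISO_VALUE = re.compile(
    r"pre_t19x_iso_plat_init\(\): (iso emc max clk|max_iso_bw)=(\d+)(?:KHz|KB)"
)


def iso_init_values(log_text):
    """Return {field_name: value} for the isomgr init lines found in the log."""
    return {m.group(1): int(m.group(2)) for m in ISO_VALUE.finditer(log_text)}


def iso_values_are_zero(log_text):
    """True if the init lines are present and at least one value is zero."""
    values = iso_init_values(log_text)
    return bool(values) and any(value == 0 for value in values.values())


if __name__ == "__main__":
    bad_boot = (
        "pre_t19x_iso_plat_init(): iso emc max clk=0KHz\n"
        "pre_t19x_iso_plat_init(): max_iso_bw=0KB\n"
    )
    print(iso_init_values(bad_boot), iso_values_are_zero(bad_boot))
```

A check like this could run automatically after every soft reset in the thermal test, so a failing boot is flagged even when the later oops is missed.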

And later

input: tegra-hda HDMI/DP,pcm=7 as /devices/3510000.hda/sound/card0/input1                               input: tegra-hda HDMI/DP,pcm=7 as /devices/3510000.hda/sound/card0/input1
                                                                                                   >    pre_t19x_iso_plat_register(): iso bandwidth 24576KB is not available, client ape_adma
                                                                                                   >    tegra_isomgr_adma_register: Failed to register adma isomgr client. err=-22
OPE platform probe                                                                                      OPE platform probe

Finally, the probes of isp and nvcsi fail, which I believe leads to a bad pointer and the abort:

isp 15600000.isp: initialized                                                                           isp 15600000.isp: initialized
                                                                                                   >    isp 15600000.isp: isp_probe: failed
                                                                                                   >    isp: probe of 15600000.isp failed with error -22
nvcsi 150c0000.nvcsi: initialized                                                                       nvcsi 150c0000.nvcsi: initialized
                                                                                                   >    nvcsi: probe of 150c0000.nvcsi failed with error -22
gpio tegra-gpio-aon wake29 for gpio=56(FF:0)                                                            gpio tegra-gpio-aon wake29 for gpio=56(FF:0)
gpio tegra-gpio-aon wake67 for gpio=57(FF:1)                                                            gpio tegra-gpio-aon wake67 for gpio=57(FF:1)
gpio tegra-gpio-aon wake68 for gpio=58(FF:2)                                                            gpio tegra-gpio-aon wake68 for gpio=58(FF:2)
input: gpio-keys as /devices/gpio-keys/input/input2                                                     input: gpio-keys as /devices/gpio-keys/input/input2
tegra-vi4 15700000.vi: initialized                                                                      tegra-vi4 15700000.vi: initialized
tegra-vi4 15700000.vi: subdev 150c0000.nvcsi--8 bound                                              |    Unable to handle kernel read from unreadable memory at virtual address 00000000
tegra-vi4 15700000.vi: subdev 150c0000.nvcsi--7 bound                                              |    Mem abort info:
tegra-vi4 15700000.vi: subdev 150c0000.nvcsi--6 bound                                              |      ESR = 0x96000005
tegra-vi4 15700000.vi: subdev 150c0000.nvcsi--5 bound                                              |      Exception class = DABT (current EL), IL = 32 bits
tegra-vi4 15700000.vi: subdev 150c0000.nvcsi--4 bound                                              |      SET = 0, FnV = 0
tegra-vi4 15700000.vi: subdev 150c0000.nvcsi--3 bound                                              |      EA = 0, S1PTW = 0
tegra-vi4 15700000.vi: subdev 150c0000.nvcsi--2 bound                                              |    Data abort info:
tegra-vi4 15700000.vi: subdev 150c0000.nvcsi--1 bound                                              |      ISV = 0, ISS = 0x00000005
tegra_rtc c2a0000.rtc: setting system clock to 2000-01-01 18:23:24 UTC (946751004)                 |      CM = 0, WnR = 0
mmcblk mmc0:0001: Card claimed for testing.                                                        |    [0000000000000000] user address but active_mm is swapper
bpmp: mounted debugfs mirror                                                                       |    Internal error: Oops: 96000005 [#1] PREEMPT SMP
bwmgr: missing cdev-type property                                                                  |    Modules linked in:
spmic-ldo0: disabling                                                                              |    CPU: 5 PID: 1455 Comm: kworker/u12:6 Not tainted 4.9.140 #2
spmic-ldo1: disabling                                                                              |    Hardware name: quill (DT)
en-vdd-sd: disabling                                                                               |    Workqueue: events_unbound async_run_entry_fn
en-vdd-cam: disabling                                                                              |    task: ffffffc1ebb0aa00 task.stack: ffffffc1ea3e0000
vdd-usb0-5v: disabling                                                                             |    PC is at v4l2_async_notifier_register+0x134/0x1a0
vdd-usb1-5v: disabling                                                                             |    LR is at v4l2_async_notifier_register+0x118/0x1a0
en-vdd-disp-3v3: disabling                                                                         |    pc : [<ffffff8008aeff94>] lr : [<ffffff8008aeff78>] pstate: 60400045
en-mdm-pwr-3v7: disabling                                                                          |    sp : ffffffc1ea3e3a70
en-vdd-disp-1v8: disabling                                                                         |    x29: ffffffc1ea3e3a70 x28: 0000000000000000 
en-vdd-cam-hv-2v8: disabling                                                                       |    x27: 0000000000000000 x26: 0000000000000000 
en-vdd-cam-1v2: disabling                                                                          |    x25: 00000000024080c0 x24: ffffff800a0c6f68 
vdd-fan: disabling                                                                                 |    x23: ffffff8009f80000 x22: ffffff8009f800a0 
vdd-3v3: disabling                                                                                 |    x21: ffffffc1e3ec4028 x20: fffffffffffffef0 
en-vdd-vcm-2v8: disabling                                                                          |    x19: ffffffc1ebb02ef8 x18: 0000000000000000 
vdd-usb2-5v: disabling                                                                             |    x17: 0000000000000000 x16: 0000000000000000 
vdd-sys-bl: disabling                                                                              |    x15: ffffffffffffffff x14: ffffffc1ebb07538 
en-vdd-sys: disabling                                                                              |    x13: ffffffc1ebb0752c x12: 0000000000000028 
ALSA device list:                                                                                  |    x11: 0000000000000038 x10: 0101010101010101 
  #0: tegra-hda at 0x3518000 irq 383                                                               |    x9 : 0000000000000008 x8 : ffffffc1e5631980 
  #1: tegra-snd-t186ref-mobile-rt565x                                                              |    x7 : 0000000000000000 x6 : 0000000000000020 
nct1008_nct72 7-004c: !!!Found deprecated property!!!                                              |    x5 : 0000000000000020 x4 : ffffffc1ebb02f10 
nct1008_nct72 7-004c: success parsing dt                                                           |    x3 : ffffff8009f80090 x2 : 0000000000000000 
nct1008_nct72 7-004c: success in enabling tmp451 VDD rail                                          |    x1 : ffffffc1e3edc138 x0 : 0000000000000000 
TS cti-4chrolan-encoder ttyS0                                                                     |    Process kworker/u12:6 (pid: 1455, stack limit = 0xffffffc1ea3e0000)
                                                                                                   >    Call trace:
                                                                                                   >    [<ffffff8008aeff94>] v4l2_async_notifier_register+0x134/0x1a0
                                                                                                   >    [<ffffff8008b0cb80>] tegra_vi_graph_init+0x210/0x290
                                                                                                   >    [<ffffff8008b069e8>] tegra_vi_media_controller_init+0x180/0x1b8
                                                                                                   >    [<ffffff800854e830>] tegra_vi4_probe+0x240/0x360
                                                                                                   >    [<ffffff8008759780>] platform_drv_probe+0x60/0xc8
                                                                                                   >    [<ffffff8008756d48>] driver_probe_device+0xd0/0x3f8
                                                                                                   >    [<ffffff8008757194>] __driver_attach+0x124/0x128
                                                                                                   >    [<ffffff800875487c>] bus_for_each_dev+0x74/0xb0
                                                                                                   >    [<ffffff8008756540>] driver_attach+0x30/0x40
                                                                                                   >    [<ffffff8008754e40>] driver_attach_async+0x20/0x60
                                                                                                   >    [<ffffff80080df8a0>] async_run_entry_fn+0x48/0x160
                                                                                                   >    [<ffffff80080d4e58>] process_one_work+0x1e8/0x490
                                                                                                   >    [<ffffff80080d5158>] worker_thread+0x58/0x4c0
                                                                                                   >    [<ffffff80080dbb28>] kthread+0xd8/0xf0
                                                                                                   >    [<ffffff8008083850>] ret_from_fork+0x10/0x40
                                                                                                   >    ---[ end trace 073ca4df96fc2449 ]---

Is there any reason why these clock and bandwidth values would be calculated as 0
once the temperature exceeds a specific value?

Thanks,

Cary
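
A side note on the oops in the right-hand column of the log above: the number in "Internal error: Oops: 96000005" is the exception syndrome (ESR), and decoding it with the ARMv8-A ESR_ELx field layout reproduces the fields the kernel printed. This is only a minimal sketch for reading the log; decode_esr is a helper name made up for illustration, not a kernel or NVIDIA API.

```python
# Decode the ESR value from the oops line ("Internal error: Oops: 96000005").
# Field positions follow the ARMv8-A ESR_ELx layout: EC[31:26], IL[25], ISS[24:0].
# For a data abort, WnR is ISS bit 6 and DFSC is ISS bits [5:0].
# decode_esr is a hypothetical helper, not a kernel or NVIDIA API.

def decode_esr(esr):
    iss = esr & 0x1FFFFFF
    return {
        "ec": (esr >> 26) & 0x3F,                  # exception class
        "il_bits": 32 if (esr >> 25) & 1 else 16,  # instruction length
        "iss": iss,                                # instruction specific syndrome
        "wnr": (iss >> 6) & 1,                     # 0 = read, 1 = write
        "dfsc": iss & 0x3F,                        # data fault status code
    }

fields = decode_esr(0x96000005)
print(fields)
```

EC 0x25 is a data abort taken at the current exception level, and DFSC 0x05 is a level 1 translation fault, which agrees with "DABT (current EL)", "ISS = 0x00000005" and "WnR = 0" in the log. Combined with the fault address of 0, the oops is a NULL pointer read at v4l2_async_notifier_register+0x134.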

hello cobrien,

There are 12 lanes in total across the 6 CSI ports.
Please refer to the Sensor Driver Programming Guide and check the [Port Index] section.

Also, your device tree configuration looks incorrect.
I suggest checking the 6-camera device tree below as a multi-camera reference.

$TOP/kernel_src/hardware/nvidia/platform/t18x/common/kernel-dts/t18x-common-modules/tegra186-camera-e3333-a00.dtsi

The carrier supports 4 video inputs configurable from 1080p (which requires 4 lanes) to Composite (which requires 2 lanes). To support an arbitrary combination of devices, there are 5 2-lane inputs and 3 4-lane inputs defined. The required input device (/dev/videoN)
is selected based on the resolution detected by the input circuitry. This was the only
way to handle multiple arbitrarily configurable inputs, since the CSI lane mapping didn’t
seem to be easily reconfigurable at run time. Note that this configuration works
fine if the temperature of the TX2 is below 60 degrees C.

I believe the underlying problem is in t19x_iso_plat_init() where clk and max bw are
both detected as 0.

hello cobrien,

The carrier supports 4 video inputs configurable from 1080p (which requires 4 lanes) to Composite (which requires 2 lanes).
I’m still unclear about your use case. Could you describe the intended usage in more detail?

In order to support an arbitrary combination of devices there are 5 2-lane inputs defined, and 3 4-lane inputs defined. The required input device (/dev/videoN)
May I have more details about your design?
Since there are only 12 lanes in total across the 6 CSI ports on Jetson TX2, your setup of five 2-lane and three 4-lane inputs (22 lanes) is beyond the hardware capability.
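
The lane count behind this objection can be checked with a few lines of arithmetic. This is just a sketch using the numbers quoted in this thread (6 CSI ports with 2 lanes each, five 2-lane and three 4-lane inputs defined); the names are made up for illustration and nothing here reads the actual device tree.

```python
# Lane budget check using only the figures quoted in this thread.
TOTAL_CSI_LANES = 6 * 2  # 6 CSI ports x 2 lanes each = 12

# lane width -> number of inputs defined in the device tree
defined_inputs = {2: 5, 4: 3}

def lanes_required(inputs):
    """Sum the lanes needed if every defined input were active at once."""
    return sum(width * count for width, count in inputs.items())

required = lanes_required(defined_inputs)
print(f"defined: {required} lanes, available: {TOTAL_CSI_LANES}")
print("fits" if required <= TOTAL_CSI_LANES else "exceeds the lane budget")
```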

The application is a 4-channel video encoder.
Unfortunately, due to the limited number of
CSI inputs, we have to run it either in 4-input mode,
where the inputs can range from Composite to 720p (2 lanes),
or in 3-input mode, where 3 inputs can run up to
1080p (4 lanes). Each of the 6 CSI inputs supports 2
lanes, so the maximum is six 2-lane inputs or three 4-lane inputs.
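
As a quick sanity check of the two operating modes described above, here is a small sketch that counts the lanes in use per mode. It is arithmetic only, based on the numbers in this post, and says nothing about whether the drivers accept a device tree that defines inputs for both modes at the same time; the names are hypothetical.

```python
# Per-mode lane usage, using the figures from this post.
TOTAL_CSI_LANES = 6 * 2  # 6 CSI inputs x 2 lanes each = 12

# mode name -> (number of active inputs, lanes per input)
modes = {
    "4-input mode (2 lanes each)": (4, 2),
    "3-input mode (4 lanes each)": (3, 4),
}

def lanes_in_use(num_inputs, lanes_per_input):
    """Lanes occupied when only the inputs of one mode are active."""
    return num_inputs * lanes_per_input

usage = {name: lanes_in_use(*cfg) for name, cfg in modes.items()}
for name, lanes in usage.items():
    print(f"{name}: {lanes} of {TOTAL_CSI_LANES} lanes")
```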

Since it is not obvious how to adjust the CSI mapping
on the fly, we have devices configured for all possible
input configurations in the device tree, 5 2-lane inputs,
and 3 4-lane inputs. The proper /dev/videoN to use depends
on the mode, the input, and the data rate for that
input.

Note that the underlying problem, the startup crash caused by
the clock/bandwidth being calculated as 0, also occurs with
only a single 2-lane input configured in the
device tree.

Could you please also check the strapping settings to confirm that your design matches the reference board? You can verify this against the Strapping chapter in the OEM Design Guide (DG).