Linux crashes/locks up shortly after kernel boot

I have a custom Tegra K1 board based on the Jetson reference design. I am able to flash the board and get it to boot, but shortly after boot (around 24 seconds) I get a series of errors/interrupts and then the OS locks up. The output below is what I see right before the lock-up. Does anyone know what this error could be pointing to? Some sort of PMU interrupt? VDD GPU going low?

Ubuntu 14.04.1 LTS tegra-ubuntu ttyS0

tegra-ubuntu login: ubuntu (automatic login)

Welcome to Ubuntu 14.04.1 LTS (GNU/Linux 3.10.40-gfb7a554-dirty armv7l)

0 packages can be updated.
0 updates are security updates.

[ 15.043859] init: plymouth-upstart-bridge main process ended, respawning
[ 15.692622] init: Failed to obtain startpar-bridge instance: Unknown parameter: INSTANCE
ubuntu@tegra-ubuntu:~$
ubuntu@tegra-ubuntu:~$
ubuntu@tegra-ubuntu:~$
ubuntu@tegra-ubuntu:~$
ubuntu@tegra-ubuntu:~$
ubuntu@tegra-ubuntu:~$
ubuntu@tegra-ubuntu:~$
ubuntu@tegra-ubuntu:~$
ubuntu@tegra-ubuntu:~$
ubuntu@tegra-ubuntu:~$ [ 24.761697] gk20a gk20a.0: gk20a_pmu_isr: pmu exterr intr not implemented. Clearing interrupt.
[ 24.774918] gk20a gk20a.0: pmu_dump_falcon_stats: pwr_falcon_os_r : 17997577
[ 24.782307] gk20a gk20a.0: pmu_dump_falcon_stats: pwr_falcon_cpuctl_r : 0x0
[ 24.789394] gk20a gk20a.0: pmu_dump_falcon_stats: pwr_falcon_idlestate_r : 0x5
[ 24.796809] gk20a gk20a.0: pmu_dump_falcon_stats: pwr_falcon_mailbox0_r : 0x0
[ 24.804169] gk20a gk20a.0: pmu_dump_falcon_stats: pwr_falcon_mailbox1_r : 0x0
[ 24.811421] gk20a gk20a.0: pmu_dump_falcon_stats: pwr_falcon_irqstat_r : 0x60
[ 24.818683] gk20a gk20a.0: pmu_dump_falcon_stats: pwr_falcon_irqmode_r : 0xfc24
[ 24.826126] gk20a gk20a.0: pmu_dump_falcon_stats: pwr_falcon_irqmask_r : 0x7879
[ 24.833670] gk20a gk20a.0: pmu_dump_falcon_stats: pwr_falcon_irqdest_r : 0x90372
[ 24.841306] gk20a gk20a.0: pmu_dump_falcon_stats: pwr_pmu_mailbox_r(0) : 0x0
[ 24.848493] gk20a gk20a.0: pmu_dump_falcon_stats: pwr_pmu_mailbox_r(1) : 0x0
[ 24.848610] gk20a gk20a.0: pmu_dump_falcon_stats: pwr_pmu_mailbox_r(2) : 0x0
[ 24.848727] gk20a gk20a.0: pmu_dump_falcon_stats: pwr_pmu_mailbox_r(3) : 0x0
[ 24.848845] gk20a gk20a.0: pmu_dump_falcon_stats: pwr_pmu_mailbox_r(4) : 0x0
[ 24.849061] gk20a gk20a.0: pmu_dump_falcon_stats: pwr_pmu_mailbox_r(5) : 0x0
[ 24.849192] gk20a gk20a.0: pmu_dump_falcon_stats: pwr_pmu_mailbox_r(6) : 0x0
[ 24.849311] gk20a gk20a.0: pmu_dump_falcon_stats: pwr_pmu_mailbox_r(7) : 0x0
[ 24.849429] gk20a gk20a.0: pmu_dump_falcon_stats: pwr_pmu_mailbox_r(8) : 0x0
[ 24.849548] gk20a gk20a.0: pmu_dump_falcon_stats: pwr_pmu_mailbox_r(9) : 0x0
[ 24.849667] gk20a gk20a.0: pmu_dump_falcon_stats: pwr_pmu_mailbox_r(10) : 0x0
[ 24.849786] gk20a gk20a.0: pmu_dump_falcon_stats: pwr_pmu_mailbox_r(11) : 0x0
[ 24.849900] gk20a gk20a.0: pmu_dump_falcon_stats: pwr_pmu_debug_r(0) : 0x0
[ 24.850120] gk20a gk20a.0: pmu_dump_falcon_stats: pwr_pmu_debug_r(1) : 0x0
[ 24.850246] gk20a gk20a.0: pmu_dump_falcon_stats: pwr_pmu_debug_r(2) : 0x20
[ 24.850365] gk20a gk20a.0: pmu_dump_falcon_stats: pwr_pmu_debug_r(3) : 0x0
[ 24.850530] gk20a gk20a.0: pmu_dump_falcon_stats: pmu_rstat (0) : 0x20002000
[ 24.850696] gk20a gk20a.0: pmu_dump_falcon_stats: pmu_rstat (1) : 0x129aaf0
[ 24.850861] gk20a gk20a.0: pmu_dump_falcon_stats: pmu_rstat (2) : 0x21300010
[ 24.851026] gk20a gk20a.0: pmu_dump_falcon_stats: pmu_rstat (3) : 0xe00ff
[ 24.851297] gk20a gk20a.0: pmu_dump_falcon_stats: pmu_rstat (4) : 0x0
[ 24.851460] gk20a gk20a.0: pmu_dump_falcon_stats: pmu_rstat (5) : 0x3f
[ 24.851574] gk20a gk20a.0: pmu_dump_falcon_stats: pwr_pmu_bar0_error_status_r : 0x0
[ 24.851692] gk20a gk20a.0: pmu_dump_falcon_stats: pwr_pmu_bar0_fecs_error_r : 0x0
[ 24.851806] gk20a gk20a.0: pmu_dump_falcon_stats: pwr_falcon_exterrstat_r : 0x810017d2
[ 24.851919] gk20a gk20a.0: pmu_dump_falcon_stats: pwr_falcon_exterraddr_r : 0x10022500
[ 24.851956] gk20a gk20a.0: pmu_dump_falcon_stats: pmc_enable : 0xf831312c
[ 24.852181] gk20a gk20a.0: pmu_dump_falcon_stats: pwr_falcon_engctl_r : 0x0
[ 24.852297] gk20a gk20a.0: pmu_dump_falcon_stats: pwr_falcon_curctx_r : 0x600f04ca
[ 24.852418] gk20a gk20a.0: pmu_dump_falcon_stats: pwr_falcon_nxtctx_r : 0x600f04ca
[ 24.852581] gk20a gk20a.0: pmu_dump_falcon_stats: PMU_FALCON_REG_IMB : 0x80001
[ 24.852741] gk20a gk20a.0: pmu_dump_falcon_stats: PMU_FALCON_REG_DMB : 0x8006f
[ 24.852903] gk20a gk20a.0: pmu_dump_falcon_stats: PMU_FALCON_REG_CSW : 0x300001
[ 24.853174] gk20a gk20a.0: pmu_dump_falcon_stats: PMU_FALCON_REG_CTX : 0x0
[ 24.853336] gk20a gk20a.0: pmu_dump_falcon_stats: PMU_FALCON_REG_EXCI : 0x693
[ 24.853502] gk20a gk20a.0: pmu_dump_falcon_stats: PMU_FALCON_REG_PC : 0x9b0
[ 24.853667] gk20a gk20a.0: pmu_dump_falcon_stats: PMU_FALCON_REG_SP : 0x1c20
[ 24.853831] gk20a gk20a.0: pmu_dump_falcon_stats: PMU_FALCON_REG_PC : 0x27a
[ 24.854093] gk20a gk20a.0: pmu_dump_falcon_stats: PMU_FALCON_REG_SP : 0x1c20
[ 24.854262] gk20a gk20a.0: pmu_dump_falcon_stats: PMU_FALCON_REG_PC : 0xf18
[ 24.854427] gk20a gk20a.0: pmu_dump_falcon_stats: PMU_FALCON_REG_SP : 0x1c20
[ 24.854592] gk20a gk20a.0: pmu_dump_falcon_stats: PMU_FALCON_REG_PC : 0xf18
[ 24.854752] gk20a gk20a.0: pmu_dump_falcon_stats: PMU_FALCON_REG_SP : 0x1c20
[ 24.854859] gk20a gk20a.0: pmu_dump_falcon_stats: elpg stat: 0
[ 24.854859]
[ 24.855073] gk20a gk20a.0: gk20a_fecs_dump_falcon_stats: gr_fecs_os_r : 0
[ 24.855289] gk20a gk20a.0: gk20a_fecs_dump_falcon_stats: gr_fecs_cpuctl_r : 0x0
[ 24.855602] gk20a gk20a.0: gk20a_fecs_dump_falcon_stats: gr_fecs_idlestate_r : 0x1
[ 24.855810] gk20a gk20a.0: gk20a_fecs_dump_falcon_stats: gr_fecs_mailbox0_r : 0x0
[ 24.856017] gk20a gk20a.0: gk20a_fecs_dump_falcon_stats: gr_fecs_mailbox1_r : 0x0
[ 24.856228] gk20a gk20a.0: gk20a_fecs_dump_falcon_stats: gr_fecs_irqstat_r : 0x0
[ 24.856446] gk20a gk20a.0: gk20a_fecs_dump_falcon_stats: gr_fecs_irqmode_r : 0x4
[ 24.856756] gk20a gk20a.0: gk20a_fecs_dump_falcon_stats: gr_fecs_irqmask_r : 0x8704
[ 24.856963] gk20a gk20a.0: gk20a_fecs_dump_falcon_stats: gr_fecs_irqdest_r : 0x0
[ 24.857181] gk20a gk20a.0: gk20a_fecs_dump_falcon_stats: gr_fecs_debug1_r : 0x40
[ 24.857499] gk20a gk20a.0: gk20a_fecs_dump_falcon_stats: gr_fecs_debuginfo_r : 0x84288011
[ 24.857603] gk20a gk20a.0: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(0) : 0x0
[ 24.857707] gk20a gk20a.0: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(1) : 0x0
[ 24.857811] gk20a gk20a.0: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(2) : 0x209
[ 24.857918] gk20a gk20a.0: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(3) : 0x0
[ 24.858021] gk20a gk20a.0: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(4) : 0x0
[ 24.858231] gk20a gk20a.0: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(5) : 0x0
[ 24.858338] gk20a gk20a.0: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(6) : 0x0
[ 24.858446] gk20a gk20a.0: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(7) : 0x0
[ 24.858657] gk20a gk20a.0: gk20a_fecs_dump_falcon_stats: gr_fecs_engctl_r : 0x0
[ 24.858868] gk20a gk20a.0: gk20a_fecs_dump_falcon_stats: gr_fecs_curctx_r : 0x0
[ 24.859190] gk20a gk20a.0: gk20a_fecs_dump_falcon_stats: gr_fecs_nxtctx_r : 0x0
[ 24.859499] gk20a gk20a.0: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_IMB : 0x80411
[ 24.859810] gk20a gk20a.0: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_DMB : 0x8044d
[ 24.860130] gk20a gk20a.0: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_CSW : 0x110800
[ 24.860543] gk20a gk20a.0: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_CTX : 0x0
[ 24.860853] gk20a gk20a.0: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_EXCI : 0x0
[ 24.861172] gk20a gk20a.0: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_PC : 0x1ac2
[ 24.861478] gk20a gk20a.0: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_SP : 0xf7c
[ 24.861788] gk20a gk20a.0: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_PC : 0x1ad7
[ 24.862107] gk20a gk20a.0: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_SP : 0xf7c
[ 24.862413] gk20a gk20a.0: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_PC : 0x1acf
[ 24.862825] gk20a gk20a.0: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_SP : 0xf7c
[ 24.863143] gk20a gk20a.0: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_PC : 0x1ad7
[ 24.863457] gk20a gk20a.0: gk20a_fecs_dump_falcon_stats: FECS_FALCON_REG_SP : 0xf7c

ubuntu@tegra-ubuntu:~$
ubuntu@tegra-ubuntu:~$
ubuntu@tegra-ubuntu:~$
ubuntu@tegra-ubuntu:~$
ubuntu@tegra-ubuntu:~$ [ 34.804961] gk20a gk20a.0: gr_gk20a_ctx_wait_ucode: timeout waiting on ucode response
[ 34.813249] gk20a gk20a.0: gk20a_fecs_dump_falcon_stats: gr_fecs_os_r : 0
[ 34.821425] gk20a gk20a.0: gk20a_fecs_dump_falcon_stats: gr_fecs_cpuctl_r : 0x0
[ 34.829074] gk20a gk20a.0: gk20a_fecs_dump_falcon_stats: gr_fecs_idlestate_r : 0x1
[ 34.836885] gk20a gk20a.0: gk20a_fecs_dump_falcon_stats: gr_fecs_mailbox0_r : 0x0
[ 34.845322] gk20a gk20a.0: gk20a_fecs_dump_falcon_stats: gr_fecs_mailbox1_r : 0x0
[ 34.853031] gk20a gk20a.0: gk20a_fecs_dump_falcon_stats: gr_fecs_irqstat_r : 0x0
[ 34.861971] gk20a gk20a.0: gk20a_fecs_dump_falcon_stats: gr_fecs_irqmode_r : 0x4
[ 34.869880] gk20a gk20a.0: gk20a_fecs_dump_falcon_stats: gr_fecs_irqmask_r : 0x8704
[ 34.877798] gk20a gk20a.0: gk20a_fecs_dump_falcon_stats: gr_fecs_irqdest_r : 0x0
[ 34.885917] gk20a gk20a.0: gk20a_fecs_dump_falcon_stats: gr_fecs_debug1_r : 0x40
[ 34.893639] gk20a gk20a.0: gk20a_fecs_dump_falcon_stats: gr_fecs_debuginfo_r : 0x84288011
[ 34.901937] gk20a gk20a.0: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(0) : 0x0
[ 34.910224] gk20a gk20a.0: gk20a_fecs_dump_falcon_stats: gr_fecs_ctxsw_mailbox_r(1) : 0x0
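
For anyone comparing register dumps like the one above across boots or boards, the following is a minimal sketch that pulls the register/value pairs out of a captured console log. The function name `parse_falcon_dump` and the regular expression are illustrative assumptions, not part of any NVIDIA tool, and values printed without a `0x` prefix are assumed to be decimal.

```python
import re

# Matches dump lines in the format quoted in this thread, for example:
# [ 24.851806] gk20a gk20a.0: pmu_dump_falcon_stats: pwr_falcon_exterrstat_r : 0x810017d2
DUMP_LINE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+gk20a gk20a\.0:\s+"
    r"(?P<func>\w+):\s+(?P<reg>[\w() ]+?)\s*:\s*"
    r"(?P<val>0x[0-9a-fA-F]+|\d+)\s*$"
)


def parse_falcon_dump(log_text):
    """Return a list of (timestamp, function, register, value) tuples.

    A list is used instead of a dict because some registers
    (PMU_FALCON_REG_PC, PMU_FALCON_REG_SP) are printed several times.
    """
    rows = []
    for line in log_text.splitlines():
        match = DUMP_LINE.search(line)
        if match is None:
            continue  # prompt lines, blank lines, free-text messages
        raw = match.group("val")
        # Assumption: values without a 0x prefix are decimal.
        value = int(raw, 16) if raw.lower().startswith("0x") else int(raw)
        rows.append(
            (float(match.group("ts")), match.group("func"),
             match.group("reg").strip(), value)
        )
    return rows
```

Running this over the logs from a good boot and a bad boot and comparing the two lists makes it easier to spot which registers actually differ, such as `pwr_falcon_exterrstat_r` and `pwr_falcon_exterraddr_r` here.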

Hello, rccola75:
Would you please provide more details about this issue?

  1. Which version of the Jetson SDK are you using?
  2. Does it happen randomly or every time?
  3. Does it happen on specific devices or on all the devices you’ve made?

That error can have many causes.

With the latest TK1 Jetson SDK (R21.4, I think), if it happens frequently, it is best to check the voltage/power supply.

br
Chenjian

I fixed the original issue that I posted about. I will give some details on the issue for people that may encounter this in the future.

The problem ended up being the JTAG_TRST_N pin on the Tegra. When I designed the board, I originally had this pin pulled down with a 100K resistor for “normal” operation. The JTAG chain is connected to the processor first and then to a Cyclone V FPGA. During board bring-up I was able to see the FPGA in the chain and successfully programmed it; however, I didn’t see the processor in the JTAG chain. After reading the design guide, I saw that the JTAG_TRST_N pin must be pulled high for boundary scan purposes. I removed the 100K pull-down and pulled the pin high to verify that the processor could be seen in the JTAG chain, even though the Altera programmer doesn’t recognize the Tegra.

I had forgotten to pull the pin back down during initial flashing and bring-up of the processor, which is what caused the problem I posted above. The odd thing is that the processor was able to run when I put it in low-power mode during boot. When I tried to bring up the second core, it would crash/freeze. I found another post on the forums saying that pulling JTAG_TRST_N up causes all sorts of problems, so I tried pulling the pin back down to see if that would fix the problem (which it did, though it doesn’t make sense to me).

Does anyone know why the processor functions with one core when JTAG_TRST_N is pulled up? I would expect nothing to work if it was pulled up, so it is strange that it half-works.

Thanks.
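
For readers trying to confirm the single-core behaviour described above, here is a small sketch that reports which CPU cores Linux currently has online. It assumes the standard sysfs file `/sys/devices/system/cpu/online` is present on the board; the helper names `parse_cpu_list` and `online_cpus` are made up for illustration.

```python
def parse_cpu_list(text):
    """Expand a kernel CPU list such as '0,2-3' into [0, 2, 3]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # empty input or stray comma
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def online_cpus(path="/sys/devices/system/cpu/online"):
    """Read the list of online cores from sysfs (path is an assumption)."""
    with open(path) as handle:
        return parse_cpu_list(handle.read())
```

Logging the result of `online_cpus()` periodically over the serial console would show how many cores were up at the moment of a freeze.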

Could you explain how you solved it? Thanks.

Hi Jachen,
We are also facing a similar issue. It happens intermittently on all boards, and we have also tested on a Jetson TK1 with R21.5. We wrote a script that reboots the board after boot-up completes; testing overnight, we were able to reproduce the issue after ~50 boots. Is there any software fix/patch available for this issue?

gk20a gk20a.0: gk20a_pbus_isr: pmc_enable : 0xf831312c
[ 15.301642] gk20a gk20a.0: gk20a_pbus_isr: NV_PTIMER_PRI_TIMEOUT_SAVE_0: 0x80400701
[ 15.301642]
[ 15.310807] gk20a gk20a.0: gk20a_pbus_isr: NV_PTIMER_PRI_TIMEOUT_SAVE_1: 0x0
[ 15.310807]
[ 15.315571] ar0330_v4l2 2-003d: Chip ID: 0 not supported!
[ 15.324744] gk20a gk20a.0: gk20a_pbus_isr: NV_PTIMER_PRI_TIMEOUT_FECS_ERRCODE: 0xbadf1301
[ 15.324744]
[ 15.334415] gk20a gk20a.0: gk20a_pbus_isr: Unhandled pending pbus interrupt
[ 15.334415]
[ 15.342907] gk20a gk20a.0: gk20a_pbus_isr: pmc_enable : 0xf831312c
[ 15.349113] gk20a gk20a.0: gk20a_pbus_isr: NV_PTIMER_PRI_TIMEOUT_SAVE_0: 0x80400701
[ 15.349113]
[ 15.358252] gk20a gk20a.0: gk20a_pbus_isr: NV_PTIMER_PRI_TIMEOUT_SAVE_1: 0x0
[ 15.358252]
[ 15.366779] gk20a gk20a.0: gk20a_pbus_isr: NV_PTIMER_PRI_TIMEOUT_FECS_ERRCODE: 0xbadf1301
[ 15.366779]
[ 15.376429] gk20a gk20a.0: gk20a_pbus_isr: Unhandled pending pbus interrupt
[ 15.376429]
[ 15.384914] gk20a gk20a.0: gk20a_pbus_isr: pmc_enable : 0xf831312c
[ 15.391108] gk20a gk20a.0: gk20a_pbus_isr: NV_PTIMER_PRI_TIMEOUT_SAVE_0: 0x80400701
[ 15.391108]
[ 15.400255] gk20a gk20a.0: gk20a_pbus_isr: NV_PTIMER_PRI_TIMEOUT_SAVE_1: 0x0
[ 15.400255]
[ 15.408780] gk20a gk20a.0: gk20a_pbus_isr: NV_PTIMER_PRI_TIMEOUT_FECS_ERRCODE: 0xbadf1301
[ 15.408780]
[ 15.418437] gk20a gk20a.0: gk20a_pbus_isr: Unhandled pending pbus interrupt
[ 15.418437]
[ 15.426916] gk20a gk20a.0: gk20a_pbus_isr: pmc_enable : 0xf831312c
[ 15.433100] gk20a gk20a.0: gk20a_pbus_isr: NV_PTIMER_PRI_TIMEOUT_SAVE_0: 0x80400701
[ 15.433100]
[ 15.442239] gk20a gk20a.0: gk20a_pbus_isr: NV_PTIMER_PRI_TIMEOUT_SAVE_1: 0x0
[ 15.442239]
[ 15.450763] gk20a gk20a.0: gk20a_pbus_isr: NV_PTIMER_PRI_TIMEOUT_FECS_ERRCODE: 0xbadf1301
[ 15.450763]
[ 15.460407] gk20a gk20a.0: gk20a_pbus_isr: Unhandled pending pbus interrupt
[ 15.460407]
[ 15.468867] gk20a gk20a.0: gk20a_pbus_isr: pmc_enable : 0xf831312c
[ 15.475049] gk20a gk20a.0: gk20a_pbus_isr: NV_PTIMER_PRI_TIMEOUT_SAVE_0: 0x80400701
[ 15.475049]
[ 15.484183] gk20a gk20a.0: gk20a_pbus_isr: NV_PTIMER_PRI_TIMEOUT_SAVE_1: 0x0
[ 15.484183]
[ 15.492722] gk20a gk20a.0: gk20a_pbus_isr: NV_PTIMER_PRI_TIMEOUT_FECS_ERRCODE: 0xbadf1301
[ 15.492722]
[ 15.502376] gk20a gk20a.0: gk20a_pbus_isr: Unhandled pending pbus interrupt
[ 15.502376]
[ 15.510866] gk20a gk20a.0: gk20a_pbus_isr: pmc_enable : 0xf831312c
[ 15.517051] gk20a gk20a.0: gk20a_pbus_isr: NV_PTIMER_PRI_TIMEOUT_SAVE_0: 0x80400701
[ 15.517051]
[ 15.526191] gk20a gk20a.0: gk20a_pbus_isr: NV_PTIMER_PRI_TIMEOUT_SAVE_1: 0x0
[ 15.526191]
[ 15.534723] gk20a gk20a.0: gk20a_pbus_isr: NV_PTIMER_PRI_TIMEOUT_FECS_ERRCODE: 0xbadf1301
[ 15.534723]
[ 15.544377] gk20a gk20a.0: gk20a_pbus_isr: Unhandled pending pbus interrupt
[ 15.544377]
[ 15.552854] gk20a gk20a.0: gk20a_pbus_isr: pmc_enable : 0xf831312c
[ 15.559042] gk20a gk20a.0: gk20a_pbus_isr: NV_PTIMER_PRI_TIMEOUT_SAVE_0: 0x80400701
[ 15.559042]
[ 15.568175] gk20a gk20a.0: gk20a_pbus_isr: NV_PTIMER_PRI_TIMEOUT_SAVE_1: 0x0
[ 15.568175]
[ 15.576698] gk20a gk20a.0: gk20a_pbus_isr: NV_PTIMER_PRI_TIMEOUT_FECS_ERRCODE: 0xbadf1301
[ 15.576698]
[ 15.586345] gk20a gk20a.0: gk20a_pbus_isr: Unhandled pending pbus interrupt

Thanks and Regards,
Jeslin Paul
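
A reboot stress test like the one described above needs some way to decide whether a given boot hit the problem. The following is a minimal sketch that counts the gk20a failure messages quoted in this thread in a captured boot log. The signature strings are copied from the logs above; the function names and the idea of feeding it `dmesg` output or a serial capture are assumptions, not the poster's actual script.

```python
# Substrings copied from the kernel messages quoted in this thread.
SIGNATURES = {
    "pmu_exterr": "pmu exterr intr not implemented",
    "ucode_timeout": "timeout waiting on ucode response",
    "pbus_unhandled": "Unhandled pending pbus interrupt",
}


def count_gk20a_failures(log_text):
    """Count how often each known failure message appears in the log."""
    counts = {name: 0 for name in SIGNATURES}
    for line in log_text.splitlines():
        if "gk20a" not in line:
            continue  # ignore messages from other drivers
        for name, needle in SIGNATURES.items():
            if needle in line:
                counts[name] += 1
    return counts


def boot_failed(log_text):
    """True if any known gk20a failure message is present."""
    return any(count_gk20a_failures(log_text).values())
```

Saving the log of every boot alongside the result of `boot_failed` gives a failure rate per number of boots, which is more useful for comparing boards or SDK releases than a single overnight observation.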

Any comments from any active users of R21.5?

Hi Jeslin,

Please check your TK1 status:
https://devtalk.nvidia.com/default/topic/898734/jetson-tk1/tk1-boot-failure-and-debug-serial-terminal-stops-working-as-well/post/4738098/#4738098

Thanks!

Hi Carolyuu,
Thanks for the link.
The “JTAG_TRST_N” pin is already pulled down, but we still get the same issue. Moreover, it is reproduced on the Jetson TK1 with R21.5.

Thanks and Regards,
Jeslin Paul