No core dumps on tx1

Hi everybody,

I have a problem generating core dumps on a TX1 board. No matter what I do, no core dumps are generated. To make a long story short, here is what I have already tried (with reboots, service restarts, etc.; hopefully I didn't forget any steps):

  1. set the core dump limits (using ulimit with options -c, -Sc and -Hc) to unlimited
  2. checked /var/crash, where Ubuntu's default apport is supposed to put core dumps
  3. checked whether core dump support is built into the kernel by issuing

    zgrep COREDUMP /proc/config.gz

    (got CONFIG_COREDUMP=y)

  4. disabled apport by setting enabled=0 in /etc/default/apport and putting "manual" in a newly created /etc/init/apport.override file
  5. changed kernel.core_uses_pid, kernel.core_pattern and fs.suid_dumpable, both through /etc/sysctl.conf (reloading with sysctl -p) and by writing directly to the corresponding /proc files. For core_pattern I tried several variants: absolute paths starting with "/" as well as just "core" with % options. I tried paths on the internal eMMC, on the SD card and on a CIFS-mounted network drive, all with full write permissions.

For each setting I tried one of the following (as a normal user as well as root):

sleep 5 & sudo killall -SEGV sleep

(the result was "Segmentation fault", but without the "(core dumped)" note),

running a program that segfaults:

int main(void) {
    int a = *(volatile int *)0;   /* read from address 0 to force SIGSEGV */
    (void)a;
    return 0;
}

or running a program in an infinite loop (e.g. yes) and pressing CTRL+\.

The only output I get is in dmesg (and only after running my segfaulting program):

root@tegra-ubuntu:~# ./a.out 
Segmentation fault
root@tegra-ubuntu:~# dmesg -c
[ 1159.881097] a.out[2564]: unhandled level 3 translation fault (11) at 0x00000000, esr 0x92000007
[ 1159.881106] pgd = ffffffc0ca074000
[ 1159.881116] [00000000] *pgd=000000014ad7c003, *pmd=000000014a003003, *pte=0000000000000000
[ 1159.881132]
[ 1159.881139] CPU: 2 PID: 2564 Comm: a.out Not tainted 3.10.96-tegra #1
[ 1159.881145] task: ffffffc0ce192040 ti: ffffffc0ca060000 task.ti: ffffffc0ca060000
[ 1159.881153] PC is at 0x83c4
[ 1159.881158] LR is at 0xf75e3633
[ 1159.881163] pc : [<00000000000083c4>] lr : [<00000000f75e3633>] pstate: 60000030
[ 1159.881167] sp : 00000000ff955bc0
[ 1159.881171] x12: 00000000ff955c50
[ 1159.881177] x11: 0000000000000000 x10: 00000000f76f0000
[ 1159.881185] x9 : 0000000000000000 x8 : 0000000000000000
[ 1159.881193] x7 : 00000000ff955bc0 x6 : 00000000f76af000
[ 1159.881200] x5 : 0000000000000000 x4 : 0000000000000000
[ 1159.881207] x3 : 0000000000000000 x2 : 00000000ff955d2c
[ 1159.881214] x1 : 00000000ff955d24 x0 : 0000000000000001
[ 1159.881221]
[ 1159.881229] Library at 0x83c4: 0x8000 /root/a.out
[ 1159.881238] Library at 0xf75e3633: 0xf75cc000 /lib/arm-linux-gnueabihf/libc-2.19.so
[ 1159.881245] vdso base = 0xf76ed000

but no sign of core dumps…

I’m running Ubuntu 14.04.1 LTS with the 3.10.96-tegra kernel (what a surprise).

Also, everything mentioned above was issued through an ssh session. It shouldn’t matter, but at this point any information seems relevant.

Thanks in advance for any help.

For reference, I compiled my test case with “-O0 -g” (the printf is only there to show that the program started normally):

#include <stdio.h>

int
main (void)
{
        int foo = 0;
        printf("foo: %d\n", foo);

        int *bar = NULL;        /* deliberately a null pointer */
        int baz = *bar;         /* dereferencing it raises SIGSEGV */

        (void)baz;
        return 0;
}

Side note, not really what you’re asking: You can temporarily disable apport with “sudo systemctl stop apport.service”. I think permanent disable is “sudo systemctl disable apport.service” (but can later be re-enabled).

Another side note: If you want a core from a running program, check out “man gcore” (I haven’t tried it for your case).

The interesting thing is that each intentional crash is being logged in dmesg. This is unusual because it is giving registers as if the process were a crashing driver, not a user space program. Here’s what I get (my test program should generate SIGSEGV, and is named “main”):

[  781.543101] main[1984]: unhandled level 2 translation fault (11) at 0x00000000, esr 0x92000006
[  781.543149] pgd = ffffffc0f1571000
[  781.546604] [00000000] *pgd=00000001589f7003, *pmd=0000000000000000

[  781.553579] CPU: 1 PID: 1984 Comm: main Not tainted 3.10.96-debug1_crosstool-ng-4.8.2 #1
[  781.553625] task: ffffffc0ffea0080 ti: ffffffc0a9d80000 task.ti: ffffffc0a9d80000
[  781.553667] PC is at 0x4005e4
[  781.553698] LR is at 0x4005dc
[  781.553732] pc : [<00000000004005e4>] lr : [<00000000004005dc>] pstate: 60000000
[  781.553759] sp : 0000007fea2f98f0
[  781.553787] x29: 0000007fea2f98f0 x28: 0000000000000000 
[  781.553835] x27: 0000000000000000 x26: 0000000000000000 
[  781.553877] x25: 0000000000000000 x24: 0000000000000000 
[  781.553918] x23: 0000000000000000 x22: 0000000000000000 
[  781.553958] x21: 0000000000000000 x20: 0000000000000000 
[  781.553997] x19: 00000000004005f8 x18: 0000000000000a03 
[  781.554037] x17: 0000007fb28c9018 x16: 0000000000000000 
[  781.554077] x15: 0000007fb2917000 x14: 00000000000003f3 
[  781.554116] x13: 0000000000000000 x12: 00000000000003f3 
[  781.554156] x11: 0000000000000018 x10: 0000000000000000 
[  781.554196] x9 : 0000007fb28c99f8 x8 : 0000000000000040 
[  781.554236] x7 : 0000000000000000 x6 : 0000000000412016 
[  781.554277] x5 : 5555040055445500 x4 : 0000000040100401 
[  781.554318] x3 : 0000000000000000 x2 : 0000000000000001 
[  781.554358] x1 : 0000000000000000 x0 : 0000000000000000 

[  781.554446] Library at 0x4005e4: 0x400000 /home/ubuntu/tmp/main
[  781.560577] Library at 0x4005dc: 0x400000 /home/ubuntu/tmp/main
[  781.566535] vdso base = 0x7fb2915000

Pay particular attention to this line:

unhandled level 2 translation fault

The core dump does not succeed because of a bug in the kernel.

There is an interesting thread on this:
https://patchwork.kernel.org/patch/8120651/

Here is a short excerpt:

>> I remember that I met the same problem on the A57 and fix it by enable
>> the [bit6] of the CPUECTLR_EL1 and enable MN,

I don’t have Trace32 completely working yet, but I set a break point at “fault_name()” (“arch/arm64/mm/fault.c”), and the debugger gave this error rather than stopping on the break point:

Warning: MMU translation inconsistency (2)! Check MMU/TRANSlation settings!

…perhaps this is a debugger issue, but it may also be related to the dmesg register dump and the missing core file (I’ve gone through every possibility I can think of to make sure a core should be dumped, and I agree that a core should appear even though none does).

Thanks for a very descriptive answer.

I applied the patch from patchwork. The funny thing is that the patch is incomplete: it contains only a function definition and no call to it. Luckily the mailing-list emails are archived, so you can see where the call should be placed. Also, asm/tlbflush.h needs to be included for flush_tlb_mm.

Either way, that didn’t do the trick - still not getting core dumps.

I have not figured it out yet, but I’m trying to understand how the Trace32 JTAG debugger can be used to locate the problem when the debugger itself prints this warning and fails. I think the debugger depends on the MMU and by default assumes the MMU code itself is correct. Otherwise a breakpoint could be set and execution continued, but instead "continue" locks up the target. Without the JTAG debugger the target does not lock up when this happens, so the debugger turns a non-fatal error into a fatal one:

Warning: MMU translation inconsistency (2)! Check MMU/TRANSlation settings!

The earlier excerpt I saw on this is something better checked by someone at NVIDIA (perhaps, if bit 6 of CPUECTLR_EL1 is not enabled, a patch could be created to test it):

>> I remember that I met the same problem on the A57 and fix it by enable
>> the [bit6] of the CPUECTLR_EL1 and enable MN,

Just refreshing the topic timestamp so that maybe somebody from NVIDIA sees this post and provides an answer :)

And again :) Anybody from NVIDIA? Even if you are still working on it, or don’t know/care how to fix it, just let me know :)

Hi rogus,

Sorry for the late reply; we are still investigating the root cause of this issue.
I will share the status with you as soon as there is an update, so please stay tuned.

Thanks

Is CONFIG_ELF_CORE enabled in the kernel? If not, could you try enabling it?

Hi KayCCC,

Can you please give an update on this? We are also having problems generating core dumps.

Thanks
Pal

This is a kernel config you need to enable. See “CONFIG_COREDUMP” and “CONFIG_ELF_CORE”.

Here’s a nice reference:
High performance, low power Embedded Computing Systems | Toradex Developer Center

Any update on this? I am also facing the same issue.

Thanks
Saurabh Srivastava

See:
https://devtalk.nvidia.com/default/topic/990012/jetson-tx1/no-core-dumps-on-tx1/post/5218156/#5218156

Are those enabled? They are not enabled by default in every L4T version.

zcat /proc/config.gz | egrep '(CONFIG_COREDUMP|CONFIG_ELF_CORE)'

If they are enabled and there is still no dump, which L4T version are you on?


# CONFIG_ELF_CORE is not set

CONFIG_COREDUMP=y

CONFIG_ELF_CORE is not enabled. Can you help me enable it?

My L4T version is:
R24 (release), REVISION: 2.1,

It’s just a kernel feature. The build is mostly standard; the only complications are cross-compiling and fixing a couple of issues before building.

Note that these kernel options are both designed to be integrated into the Image, and not built as modules.

Do you have a serial console? It makes it much easier to test new kernel installs safely. In R24.2.1 everything is still determined by the “/boot/extlinux/extlinux.conf” file, so you can add each new kernel install as a new entry and pick the one you want at boot time. Otherwise, if the new kernel fails, you’ll probably have to flash to get the old system back (hint: you can clone now if you want to back up the rootfs).

Here is some information on cross-compiling the R24.2.1 kernel (there were a couple of spots where source had to be changed for compile to succeed):
https://devtalk.nvidia.com/default/topic/936880/jetson-tx1/jetson-tx1-24-1-release-need-help-with-complier-directions-can-not-complie/post/4885136/#4885136

There are a number of places to get information on kernel builds, e.g., https://elinux.org/Jetson_TX1. It’s a lot of information though, so just ask if you need more details.

If you are using an Ubuntu host, you can install the cross-compile tools with apt-get (I’m not sure of the exact package names). To get the configuration interfaces for “make menuconfig” or “make nconfig” you’d also want to install the package “libncurses5-dev”.