CPU cores offline, had to manually enable?

After reflashing R19.3, I noticed my Jetson was very slow and that only one CPU core was actually online.

(top showed 1 CPU; /sys/devices/system/cpu/online contained “0” and /sys/devices/system/cpu/offline contained “1-3”.)

I enabled each core individually with “echo 1 | sudo tee /sys/devices/system/cpu/cpuN/online”, and that did the trick. Now my code is faster, and top shows all four cores chugging away.

My question is: why did I have to enable the cores manually? Is that bad? What is the best way to configure this to happen after every boot?

Hi, I don't know how to help you, but I have an off-topic question: I'd like to update my Jetson TK1 to L4T 19.3 (I'm on stock 19.2 now), but I'm not sure how to do it (I'm asking before I break something). It seems like you know how, so could you please help?

Thanks!

Is this the result of the default CPU scaling? The eLinux power page has some details, plus suggestions on how to run commands at boot: Jetson/Jetson TK1 Power - eLinux.org

MiFx: strange that a forum search didn't turn anything up. The flashing instructions can be found at:
Jetson TK1 support page > Linux for Tegra Rel-19 > Quick Start Guide
Further discussion about this should go into its own thread and not hijack this one any more ;)

That link was very helpful; thank you!

I’d still like someone to chime in if they know why the CPUs didn’t self-enable.

The scaling_governor defaults to “ondemand”, and the demo I ran was pushing one CPU at >95% load for almost 30 seconds, so I'd expect the other cores to come online (the demo is multi-threaded).
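For what it's worth, a cpufreq governor such as ondemand only adjusts the clock of cores that are already online; taking cores online and offline is a separate mechanism (on L4T this is cpuquiet, as explained further down the thread). A small sketch that prints the governor and current frequency of every core that currently exposes a cpufreq directory, assuming the standard cpufreq sysfs layout. The function name show_cpufreq and the CPU_DIR override (added only so the snippet can be tested) are hypothetical.

```shell
#!/bin/sh
# Sysfs root for CPU devices; CPU_DIR may be overridden (e.g. for testing).
CPU_DIR="${CPU_DIR:-/sys/devices/system/cpu}"

# Hypothetical helper: print "cpuN governor frequency_kHz" for every core
# that has a cpufreq directory (standard cpufreq sysfs layout assumed).
show_cpufreq() {
    for d in "$CPU_DIR"/cpu[0-9]*/cpufreq; do
        [ -d "$d" ] || continue                  # glob matched nothing
        cpu=$(basename "$(dirname "$d")")        # e.g. "cpu0"
        gov=$(cat "$d/scaling_governor")         # e.g. "ondemand"
        khz=$(cat "$d/scaling_cur_freq")         # current clock in kHz
        echo "$cpu $gov $khz"
    done
}
```

After sourcing the snippet, running show_cpufreq on a board with one core online would print a single line for cpu0 only.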

I wish I had checked “online” on R19.2, but I never looked at it, and I'm now on R19.3. The fast way to check:

cat `find /sys/devices/system/cpu -name 'online'`

Running this on my quad-core Phenom II (x86_64) shows:

cat `find /sys/devices/system/cpu -name 'online'`
1
1
1
0-3

I'm just guessing, but the layout seems to be a single CPU used to boot the system, plus 3 listed as enabled in SMP after boot. My R19.3 shows the same behavior you described:

cat `find . -name 'online'`
1
0
0
0
0

So it does indeed look like flashing is leaving cores unused. What is interesting is that there is one extra core, corresponding to the jetson-tk1 “k1”, along with the 4 ARM Cortex A15 cores. Because core 0 is listed as the ARMv7, we know jetson-tk1 is not the boot core (if and only if my assumption above is correct).

So it seems we do need to do something to use all 4 ARM cores. Could anyone who still has R19.2 installed post the output of this?

cat `find /sys/devices/system/cpu -name 'online'`
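Plain cat prints only the values, which makes these listings hard to compare. A sketch that labels each value with the file it came from, assuming the standard sysfs layout (show_cpu_online and the CPU_DIR test override are hypothetical names). Labelled this way, one of the lines appears to come from the system-wide /sys/devices/system/cpu/online list (a range such as “0” or “0-3”) rather than from an extra core; the x86 listing above fits that reading, with per-core values for cpu1 to cpu3 (x86 kernels usually expose no hotplug file for cpu0) plus the range “0-3”.

```shell
#!/bin/sh
# Sysfs root for CPU devices; CPU_DIR may be overridden (e.g. for testing).
CPU_DIR="${CPU_DIR:-/sys/devices/system/cpu}"

# Hypothetical helper: print "path:value" for the system-wide online list
# first, then for every per-core online file. grep -H forces the filename
# prefix; the pattern "." matches any non-empty line.
show_cpu_online() {
    grep -H . "$CPU_DIR/online" "$CPU_DIR"/cpu[0-9]*/online 2>/dev/null
}
```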

I don't know if anyone from NVIDIA will see this, but my R19.3 /proc/cmdline shows this:

console=ttyS0,115200n8 console=tty1 no_console_suspend=1 lp0_vec=2064@0xf46ff000 video=tegrafb mem=1862M@2048M memtype=255 ddr_die=2048M@2048M section=256M pmuboard=0x0177:0x0000:0x02:0x43:0x00 vpr=151M@3945M tsec=32M@3913M otf_key=c75e5bb91eb3bd947560357b64422f85 usbcore.old_scheme_first=1 core_edp_mv=1150 core_edp_ma=4000 tegraid=40.1.1.0.0 debug_uartport=lsport,3 power_supply=Adapter audio_codec=rt5640 modem_id=0 android.kerneltype=normal usb_port_owner_info=0 fbcon=map:1 commchip_id=0 usb_port_owner_info=0 lane_owner_info=6 emc_max_dvfs=0 touch_id=0@0 tegra_fbmem=32899072@0xad012000 board_info=0x0177:0x0000:0x02:0x43:0x00 root=/dev/mmcblk0p1 rw rootwait tegraboot=sdmmc gpt

I flashed my R19.3 kernel with the same settings, plus network bridging and CD-ROM filesystem support (under U-Boot instead of fastboot). If the problem is related to the boot command line, perhaps it is an oversight in the U-Boot parameters, something that was not carried over from fastboot.

So what is the correct fix for this to have the cores available after flash without manually echoing 1 to the online files?

I think you're right, at least in part: this is the typical way it is done on ARM, and I tried an experiment that backs this up.

I downloaded, ran ./configure on, and installed htop (“top” on steroids). See:

When I ran htop, it showed only the first core active. Everything else was at 0% usage. I then ran a kernel compile with make -j4, and all cores pretty much maxed out. During the compile I ran this:
cat `find /sys/devices/system/cpu -name 'online'`

The result was that cores which previously were not listed as online were now online. So I think this is solved, except for the original post's problem with sluggishness. Can the automatic enabling/disabling of CPU cores be tuned? Or is this a bug in the mechanism that triggers it?

The Linux for Tegra kernel implements cpuquiet, a mechanism for dynamically hot-plugging CPU cores based upon workload/policy. This mechanism can be disabled:

# check state of cpuquiet auto-hotplug
root@tegra-ubuntu:~# cat /sys/devices/system/cpu/cpuquiet/tegra_cpuquiet/enable
1

# disable cpuquiet and verify
root@tegra-ubuntu:~# echo 0 >  /sys/devices/system/cpu/cpuquiet/tegra_cpuquiet/enable
root@tegra-ubuntu:~# cat /sys/devices/system/cpu/cpuquiet/tegra_cpuquiet/enable
0

# online CPU cores manually
root@tegra-ubuntu:~# echo 1 > /sys/devices/system/cpu/cpu0/online
bash: echo: write error: Invalid argument
root@tegra-ubuntu:~# echo 1 > /sys/devices/system/cpu/cpu1/online
root@tegra-ubuntu:~# echo 1 > /sys/devices/system/cpu/cpu2/online
root@tegra-ubuntu:~# echo 1 > /sys/devices/system/cpu/cpu3/online

The ‘Invalid argument’ error above can be ignored; it occurs when a core is asked to switch to the online/offline state it is already in.
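To address the original “after every boot” question, one option is to run the two steps above (disable cpuquiet, then online the cores) from a boot script such as /etc/rc.local, which Ubuntu-based L4T images run as root near the end of boot (an assumption worth verifying on your image). A sketch under that assumption; force_all_cpus_online and the CPU_DIR test override are hypothetical names. It skips cores that already read 1, which avoids the ‘Invalid argument’ write error. Keep in mind that holding all cores online gives up the power saving cpuquiet exists to provide.

```shell
#!/bin/sh
# Sysfs root for CPU hotplug; CPU_DIR may be overridden (e.g. for testing).
CPU_DIR="${CPU_DIR:-/sys/devices/system/cpu}"

# Hypothetical helper: disable Tegra cpuquiet auto-hotplug, then bring every
# core online. Must run as root (for example from /etc/rc.local).
force_all_cpus_online() {
    quiet="$CPU_DIR/cpuquiet/tegra_cpuquiet/enable"
    # Step 1: stop cpuquiet from taking cores offline again
    # (sysfs path as shown in this thread; skipped if absent).
    if [ -w "$quiet" ]; then
        echo 0 > "$quiet"
    fi
    # Step 2: online each core that is currently offline; cores already
    # reading 1 are skipped to avoid the "Invalid argument" write error.
    for f in "$CPU_DIR"/cpu[0-9]*/online; do
        [ -e "$f" ] || continue
        if [ "$(cat "$f")" != "1" ]; then
            echo 1 > "$f"
        fi
    done
}
```

For example, save it as /usr/local/sbin/cpus-online.sh (hypothetical path) and add “. /usr/local/sbin/cpus-online.sh; force_all_cpus_online” to /etc/rc.local above its final “exit 0” line.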

Hi, I'm running stock R19.2 (g6a2d13a), and the output shows:

cat `find /sys/devices/system/cpu -name 'online'`
1
0
0
0
0

However, I wrote a quick multi-threaded test program, and while it was running:

cat `find /sys/devices/system/cpu -name 'online'`
1
1
1
1
0-3
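The test program itself was not posted. As a rough, hypothetical stand-in, CPU-bound load can be generated from the shell by starting several busy loops in the background; while they run, cpuquiet should bring more cores online, though the exact behavior depends on the kernel's policy. start_busy_loops is an illustrative name, not an existing tool.

```shell
#!/bin/sh
# Hypothetical load generator: start N background busy loops (each keeps one
# core close to 100% busy) and print their PIDs, one per line.
start_busy_loops() {
    n="$1"
    i=0
    while [ "$i" -lt "$n" ]; do
        # stdout goes to /dev/null so a caller using $(...) does not
        # block waiting for the never-ending loops to close the pipe.
        ( while :; do :; done ) >/dev/null &
        echo "$!"
        i=$((i + 1))
    done
}
```

Usage: pids=$(start_busy_loops 4), then check cat /sys/devices/system/cpu/online a few seconds later, and stop the load with kill $pids.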

The L4T kernel switches CPU cores on and off based on load (how CPU-bound the workload is). When there is not enough load to keep 4 cores busy, its scheduler migrates the tasks to a single core and lets all the other cores go offline (or power off) to save power. When the load grows beyond what a single core can handle, it switches the other cores on and runs them at a clock speed proportional to the load, up to a maximum of 2.3 GHz.

If most of the CPUs are offline, that means there is not enough load on the CPU (distinguish CPU-bound load from I/O-bound load). Before concluding that something is wrong, look at this example:

root@tegra-ubuntu:~# . jetson-tk1-monitor.sh
CPU_CLUSTER     CLOCK(MHz)      CPU0    CPU1    CPU2    CPU3    CPU
G               204             1       0       0       0       0
LP              51              1       0       0       0       0
G               204             1       0       0       0       0
G               204             1       0       0       0       0
G               204             1       0       0       0       0
^C
root@tegra-ubuntu:~# cpu_loop_hfp -s2 -t 2 &
[1] 5400
root@tegra-ubuntu:~# Number of threads  = 2
Severity Level          = 2
thread_id = 1    thread_addr = 0x13008
thread_id = 2    thread_addr = 0x1300c

root@tegra-ubuntu:~# . jetson-tk1-monitor.sh
CPU_CLUSTER     CLOCK(MHz)      CPU0    CPU1    CPU2    CPU3    CPU
G               2320            1       1       0       0       0-1
G               2320            1       1       0       0       0-1
G               2320            1       1       0       0       0-1
G               2320            1       1       0       0       0-1
G               2320            1       1       0       0       0-1
G               2320            1       1       0       0       0-1
G               2320            1       1       0       0       0-1
G               2320            1       1       0       0       0-1
G               2320            1       1       0       0       0-1
G               2320            1       1       0       0       0-1
^C
root@tegra-ubuntu:~# cpu_loop_hfp -s2 -t 3 &
[2] 5533
root@tegra-ubuntu:~# Number of threads  = 3
Severity Level          = 2
thread_id = 1    thread_addr = 0x13008
thread_id = 2    thread_addr = 0x1300c
thread_id = 3    thread_addr = 0x13010

root@tegra-ubuntu:~# . jetson-tk1-monitor.sh
CPU_CLUSTER     CLOCK(MHz)      CPU0    CPU1    CPU2    CPU3    CPU
G               2320            1       1       1       1       0-3
G               2320            1       1       1       1       0-3
G               2320            1       1       1       1       0-3
G               2320            1       1       1       1       0-3
G               2320            1       1       1       1       0-3
G               2320            1       1       1       1       0-3
^C

I hope this answers your questions.
jetson-tk1-tools.zip (8 KB)
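The contents of jetson-tk1-monitor.sh are not shown in the thread. A rough, hypothetical equivalent that prints cpu0's current clock in MHz, the per-core online flags and the system-wide online list once per second is sketched below. It leaves out the CPU_CLUSTER (G/LP) column, because the thread doesn't say where the original script reads it from, and it assumes the standard cpufreq file scaling_cur_freq (reported in kHz). Function names and the CPU_DIR test override are illustrative.

```shell
#!/bin/sh
# Sysfs root for CPU devices; CPU_DIR may be overridden (e.g. for testing).
CPU_DIR="${CPU_DIR:-/sys/devices/system/cpu}"

# One status line: "<cpu0 clock MHz> <online flag per core...> <online list>".
# scaling_cur_freq is in kHz; 0 is printed if cpufreq is unavailable.
cpu_status_line() {
    khz=$(cat "$CPU_DIR/cpu0/cpufreq/scaling_cur_freq" 2>/dev/null || echo 0)
    line=$((khz / 1000))
    for f in "$CPU_DIR"/cpu[0-9]*/online; do
        [ -e "$f" ] || continue
        line="$line $(cat "$f")"
    done
    echo "$line $(cat "$CPU_DIR/online")"
}

# Print a status line every second until interrupted with Ctrl-C.
monitor_cpus() {
    while :; do
        cpu_status_line
        sleep 1
    done
}
```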

OK, it seems possible that the demo I ran was (1) querying the number of online CPUs, which defaulted to one, and (2) launching an equal number of threads. (I don't have time to dig through all of the OpenBLAS source code.)

That might explain why bringing the cores online manually first resulted in a 4x speedup, whereas otherwise the demo maxed out a single core. The demo was CPU-bound: with the cores enabled, all four were heavily loaded and the demo completed in about 1/4 of the time.
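One way to check for the mismatch the demo may have hit is to compare the number of configured processors with the number currently online; glibc's getconf reports both (_NPROCESSORS_CONF and _NPROCESSORS_ONLN). This is a sketch with a hypothetical function name, and whether OpenBLAS sizes its thread pool from the online count remains the hypothesis above, not something verified here. If it does, setting the OPENBLAS_NUM_THREADS environment variable may be a way to request a fixed thread count instead.

```shell
#!/bin/sh
# Hypothetical helper: compare configured processors with those online now.
report_cpu_counts() {
    conf=$(getconf _NPROCESSORS_CONF)   # processors the kernel knows about
    onln=$(getconf _NPROCESSORS_ONLN)   # processors currently online
    echo "configured=$conf online=$onln"
    if [ "$onln" -lt "$conf" ]; then
        # A program that sizes its thread pool from the online count
        # would start fewer threads than there are cores.
        echo "note: only $onln of $conf processors are online"
    fi
}
```

On a Jetson with cpuquiet idling three cores, this would be expected to report 4 configured and 1 online.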

Thanks for the insight, folks!