[Jetson-TK1, TegraK1] System failure when setting EMC clock manually

I am running a set of experiments in which I control the individual clock frequencies on the Tegra K1 (for example the GPU, the CPU complexes and the EMC (RAM)) to model power usage. To do this, I execute the following commands:

root@jetson: echo userspace > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
root@jetson: echo 0 > /sys/devices/system/cpu/cpuquiet/tegra_cpuquiet/enable
root@jetson: echo G > /sys/kernel/cluster/active
root@jetson: echo N > /sys/module/cpuidle/parameters/power_down_in_idle
root@jetson: echo 1 > /sys/kernel/debug/clock/override.gbus/state
root@jetson: echo 72000000 > /sys/kernel/debug/clock/override.gbus/rate
root@jetson: echo 1 > /sys/kernel/debug/clock/override.emc/state
root@jetson: echo 12750000 > /sys/kernel/debug/clock/override.emc/rate

After that, I run some benchmarks that quickly make the platform hang completely. It has to be restarted by pressing the reset button manually (in this state it cannot be used at all, neither directly with keyboard and display nor over SSH).

The benchmarks are simple CUDA-accelerated C programs that read video files from a RAM filesystem, process them and write the output to /dev/null.

I did some further experiments, and this issue seems to occur when the system is operating at lower EMC clock speeds. Starting from 396 MHz, I worked my way down through the following frequencies (in kHz):

12750 20400 40800 68000 102000 204000 300000 396000

For each frequency I ran my benchmark.

At 20.4 MHz the system became unresponsive upon TCP transmission (I transmit some power logs from an external logger machine to the Tegra over TCP when the benchmark is complete).
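For reference, the sweep can be sketched roughly as below. This is only a minimal sketch, not the exact script I ran: it assumes the override.emc node shown above accepts the rate in Hz, and BENCH (./benchmark) is a hypothetical placeholder for the benchmark command. EMC_DIR can be overridden to dry-run it against an ordinary directory.

```shell
#!/bin/sh
# Sketch: sweep the EMC override rate and run a benchmark at each step.
# EMC_DIR defaults to the debugfs node used above; BENCH is a hypothetical
# placeholder for the actual benchmark command.
EMC_DIR="${EMC_DIR:-/sys/kernel/debug/clock/override.emc}"
BENCH="${BENCH:-./benchmark}"

# Enable the override and set the rate. Argument is in kHz, the node takes Hz.
set_emc_khz() {
    echo 1 > "$EMC_DIR/state" || return 1
    echo "$(($1 * 1000))" > "$EMC_DIR/rate"
}

# Walk down from the highest rate. sync first so that anything already
# written to disk survives if the board hangs at the next step.
sweep() {
    for khz in 396000 300000 204000 102000 68000 40800 20400 12750; do
        sync
        set_emc_khz "$khz" || return 1
        echo "EMC override set to ${khz} kHz"
        $BENCH || echo "benchmark failed at ${khz} kHz"
    done
}
```

Running `sweep` as root after sourcing the file steps through all eight rates. The last "EMC override set to" line on the console shows which rate was active when the board stopped responding.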

Hi krisrst,

Could you share your benchmark program?

By the way, please check the debugfs:

cat /sys/kernel/debug/clock/clock_tree

In the low-frequency case, the clock rate requirements of some devices might not be satisfied.
The values in brackets are the clock rate requirements.
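To narrow the output down to the EMC-related rows, something like the following may help. It is only a sketch: it assumes the relevant rows contain the string "emc", and CLOCK_TREE is just a variable pointing at the debugfs file above, so that a saved copy can be inspected instead.

```shell
#!/bin/sh
# Sketch: print only the clock tree rows that mention the EMC.
# CLOCK_TREE defaults to the debugfs file mentioned above; point it at a
# saved copy to inspect a dump taken earlier.
CLOCK_TREE="${CLOCK_TREE:-/sys/kernel/debug/clock/clock_tree}"

emc_rows() {
    # Case-insensitive match, since the exact naming of the rows is an assumption.
    grep -i 'emc' "$CLOCK_TREE"
}
```

Saving the result of `emc_rows` once at 396 MHz and once at a low rate makes it easy to diff the two and see which requirements are no longer met.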

Thank you

Hi krisrst,

Could you use the UART console to check whether there are any timeout messages such as:

BUG: soft lockup - CPU#0 stuck for 22s!

This message is printed by the watchdog, which monitors whether the lowest-priority task can be served within 22 seconds.
In the low EMC frequency case, it is possible that a memory-intensive process occupies the resource for more than 22 seconds, which may explain why the system becomes unresponsive during the TCP transmission.
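If you capture the UART output to a file on the host, a small helper like this can count the reports for you. It is a sketch only: the log file name in the usage note is a hypothetical example, and the pattern matches just the fixed part of the message quoted above.

```shell
#!/bin/sh
# Sketch: count soft-lockup reports in a saved console log.
# $1 is the path to the captured UART log.
count_lockups() {
    # grep -c prints the number of matching lines (0 if there are none).
    grep -c 'BUG: soft lockup' "$1"
}
```

For example, `count_lockups uart.log` prints how many times the watchdog fired during the run. A result of 0 would suggest the hang is not a soft lockup.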

Thanks.