Thermal control system is unstable

Dear All,

When I started working with the Jetson TX1, I used JetPack 2.0.
The fan ran according to the current heat dissipation.

After running for some time on a system flashed with JetPack 2.1, I found my Jetson overheating because the fan had stopped.

I checked a freshly flashed system: the fan works.

Now I have installed the new JetPack 2.2.1 and started a sample… The fan does not work. The system overheats and slows down…

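The only way to make it work is to set the PWM by hand. (A plain "sudo echo 255 > file" fails from a normal user shell, because the redirection is performed without root privileges; piping through sudo tee works.)

echo 255 | sudo tee /sys/kernel/debug/tegra_fan/target_pwm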
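A slightly more defensive version of the manual override, as a sketch: it assumes the tegra_fan debugfs node used in this thread (JetPack 2.x) and must itself run as root, since the write is done by the script's own shell. The helper name set_fan_pwm is hypothetical; it validates the value and reads it back afterwards.

```bash
#!/usr/bin/env bash
# Sketch: force the fan to a fixed PWM duty (0-255) through the debugfs
# node used in this thread (JetPack 2.x). The write is done by this shell,
# so the script itself must run as root.
FAN_NODE_DEFAULT=/sys/kernel/debug/tegra_fan/target_pwm

set_fan_pwm() {
  local pwm=$1
  local node=${2:-$FAN_NODE_DEFAULT}
  # Accept only an integer in 0..255 (10# avoids octal parsing of "08").
  if ! [[ $pwm =~ ^[0-9]+$ ]] || (( 10#$pwm > 255 )); then
    echo "invalid PWM value: $pwm" >&2
    return 1
  fi
  if [[ ! -w $node ]]; then
    echo "cannot write $node (node missing, or not running as root?)" >&2
    return 1
  fi
  printf '%s\n' "$pwm" > "$node"
  # Read the value back to show what the node now reports.
  cat "$node"
}
```

Usage, assuming the function is saved in a file called fan.sh (hypothetical name): sudo bash -c 'source ./fan.sh && set_fan_pwm 255'. The optional second argument lets the same function target a different node path.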

Please comment.

What is the value of /sys/kernel/debug/tegra_fan/target_pwm when the board overheats?
Does the fan work with values other than 255?

1> Measure the FAN_PWM_Q* signal (pin 3 of the J11 fan connector).
2> Check the temperature of the node that triggers the fan.

I’ve checked. The hardware works perfectly: the fan runs at any value from about 60 up to 255.

At the moment of overheating,
cat /sys/kernel/debug/tegra_fan/target_pwm
returns 0.
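To catch the value at the moment of overheating without watching the console, the node can be sampled into a log. This is a minimal sketch; log_pwm is a hypothetical helper, and reading debugfs normally requires root.

```bash
#!/usr/bin/env bash
# Sketch: sample a fan PWM node at a fixed interval so that the value at
# the moment the board overheats ends up in a log.
# Arguments: node path, number of samples, seconds between samples.
log_pwm() {
  local node=$1 samples=$2 interval=$3 i
  for (( i = 0; i < samples; i++ )); do
    # One line per sample: UNIX time, then the raw node value.
    printf '%s %s\n' "$(date +%s)" "$(cat "$node")"
    # Skip the sleep after the last sample.
    if (( i + 1 < samples )); then
      sleep "$interval"
    fi
  done
}
```

For example, sudo bash -c 'source ./log_pwm.sh; log_pwm /sys/kernel/debug/tegra_fan/target_pwm 600 1' > fan.log records ten minutes of one-second samples (log_pwm.sh is a hypothetical file name).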

I think it is a problem with the software thermal control system, but I don’t know why…

Again, I see this on the same hardware (I have several Jetsons).
A fresh system flashed with JetPack 2.1 is OK (though once it broke after a few weeks; I cannot guarantee that our experiments had no effect on it…).

Now I have a fresh system flashed with JetPack 2.2.1 and see no thermal control at all.
The fan itself works well…

How can I check a thermal sensor?
Somewhere in sysfs?

See my posts here for info on thermal sensors: Thermal sensor - Jetson TX1 - NVIDIA Developer Forums
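As a sketch of where to look in sysfs: the generic Linux thermal framework exposes each sensor as /sys/class/thermal/thermal_zone*/ with a type file and a temp file in millidegrees Celsius. The helper below (list_thermal_zones, a hypothetical name) prints them all; the sysfs root is a parameter only so it can be tested.

```bash
#!/usr/bin/env bash
# Sketch: print every thermal zone the kernel exposes, with its type and
# temperature. "temp" is reported in millidegrees Celsius.
list_thermal_zones() {
  local root=${1:-/sys/class/thermal} zone ztype temp
  for zone in "$root"/thermal_zone*; do
    # Skip the unexpanded glob and zones without a readable sensor.
    [[ -r $zone/temp ]] || continue
    ztype=$(cat "$zone/type")
    temp=$(cat "$zone/temp")
    # Convert millidegrees to degrees with one decimal place.
    printf '%s %s %s C\n' "${zone##*/}" "$ztype" \
      "$(awk -v t="$temp" 'BEGIN { printf "%.1f", t / 1000 }')"
  done
}
```

Calling list_thermal_zones with no argument reads the live /sys/class/thermal; no lm_sensors installation is needed for this.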

Thanks AlexP312!

I know this method.

Maybe you know how the thermal control system in Linux for Tegra works.
I mean: if you need to install lm_sensors just to read the sensors, the thermal control must be using something else…

I just want to understand how it works so that I can check and repair it in my product.

I would appreciate your help.

Update

We are talking about the x64 version of the rootfs.
It seems the x64 version has a lot of problems…

Update

I have reflashed my Jetson with 24.1 x32 from JetPack 2.2.1.
It works fine.

The environment again looks like the previous version, unlike the x64 version.

x64 is buggy. Even X11 fails if a MIPI DSI display is connected… :((((

Does anyone know how the fan control loop is implemented in L4T?

I found the thermal control loop inside the kernel, with its parameters in the device tree.
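To see which parameters the running kernel actually got, the live device tree can be decompiled and the fan table pulled out of it. A sketch, assuming the dtc tool (device-tree-compiler package) is installed; print_fan_table is a hypothetical helper that keeps multi-line properties intact by printing each match up to its terminating semicolon.

```bash
#!/usr/bin/env bash
# Sketch: print the fan table properties (active_trip_temps, active_pwm)
# from DTS text on stdin. A property may span several lines, so printing
# continues until the line that contains the closing ';'.
print_fan_table() {
  awk '/active_trip_temps|active_pwm/ { on = 1 }
       on { print; if (/;/) on = 0 }'
}
```

Usage: dtc -I fs -O dts /proc/device-tree 2>/dev/null | print_fan_table. When decompiling the live tree, dtc prints cell values in hex (for example 0xc738 for 51000), so convert before comparing with the source .dts.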

But I still cannot understand why:

  • my custom kernel has the same thermal options, but shows no THERMAL messages in dmesg, and the fan works anyway
  • sometimes the system overheats even though the fan starts
  • my kernel boots on an overheated system, while the original one shuts the system off… (the DT has the same thermal limits)
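One way to check the first point is to compare the enabled thermal options of the stock and custom kernels side by side. A sketch: enabled_thermal_opts is a hypothetical helper that reads a kernel config on stdin. /proc/config.gz only exists if the kernel was built with CONFIG_IKCONFIG_PROC.

```bash
#!/usr/bin/env bash
# Sketch: list thermal-related options that are built in (=y) or built as
# modules (=m) in a kernel config read from stdin, sorted for diffing.
enabled_thermal_opts() {
  grep -E '^CONFIG_[A-Z0-9_]*THERMAL[A-Z0-9_]*=(y|m)' | sort
}
```

Usage: diff <(zcat /proc/config.gz | enabled_thermal_opts) <(enabled_thermal_opts < path/to/custom/.config), then dmesg | grep -i thermal on each kernel to compare the boot messages.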

To be continued…

Hello, Alex_Sharapov:
You can check the fan control table in the DTS.
Search for ‘active_trip_temps’; it should show a table like the following:

active_trip_temps = <0 51000 61000 71000 82000
				140000 150000 160000 170000 180000>;

and ‘active_pwm’:

active_pwm = <0 80 120 160 255 255 255 255 255 255>;
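Read together, the two arrays form a stair-step: each trip temperature selects the PWM at the same index. A sketch of that lookup under a simplifying assumption: the highest trip point reached wins, and any hysteresis the real driver applies is ignored. pwm_for_temp is a hypothetical name; temperatures are in millidegrees Celsius, as in the table.

```bash
#!/usr/bin/env bash
# Sketch: stair-step lookup implied by the DTS table above.
# Temperatures in millidegrees C; hysteresis deliberately ignored.
FAN_TRIPS=(0 51000 61000 71000 82000 140000 150000 160000 170000 180000)
FAN_PWMS=(0 80 120 160 255 255 255 255 255 255)

pwm_for_temp() {
  local t=$1 i pwm=0
  # Walk the trips in ascending order; the last one reached wins.
  for i in "${!FAN_TRIPS[@]}"; do
    if (( t >= FAN_TRIPS[i] )); then
      pwm=${FAN_PWMS[i]}
    fi
  done
  echo "$pwm"
}
```

With this table, pwm_for_temp 55000 prints 80 and pwm_for_temp 90000 prints 255: the fan stays off below 51 °C and is at full duty from 82 °C.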

While the system is running, you can check the CPU/GPU temperatures with:
cat /sys/kernel/debug/tegra_soctherm/cputemp
cat /sys/kernel/debug/tegra_soctherm/gputemp

Once a temperature reaches a trip point, the corresponding PWM value is set and the fan should run. (The trip temperatures are in millidegrees Celsius, so 51000 means 51 °C.)
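The failure reported earlier in this thread (hot SoC, target_pwm stuck at 0) can be flagged automatically from these readings. A sketch under an explicit assumption: cputemp and gputemp each print a plain integer in millidegrees Celsius, the same unit as active_trip_temps; please verify that on your board. check_fan is a hypothetical helper; the default threshold is the first non-zero trip from the table (51000).

```bash
#!/usr/bin/env bash
# Sketch: report a stalled fan, i.e. the hotter of CPU/GPU is at or above
# the first non-zero trip point while target_pwm is still 0.
# ASSUMPTION: temperatures are integers in millidegrees C.
# Arguments: cpu_temp gpu_temp target_pwm [trip]
check_fan() {
  local cpu=$1 gpu=$2 pwm=$3 trip=${4:-51000}
  local hottest=$(( cpu > gpu ? cpu : gpu ))
  if (( hottest >= trip && pwm == 0 )); then
    echo "STALLED: ${hottest} >= ${trip} but target_pwm=0"
    return 1
  fi
  echo "OK: temp=${hottest} target_pwm=${pwm}"
}
```

Usage (as root): check_fan "$(cat /sys/kernel/debug/tegra_soctherm/cputemp)" "$(cat /sys/kernel/debug/tegra_soctherm/gputemp)" "$(cat /sys/kernel/debug/tegra_fan/target_pwm)". Because it takes the maximum of both readings, it also catches a GPU-only load.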

That works well on my platform. Let me know your findings if it does not work on yours.

br
Chenjian

Hi Jachen,

Thank you for the info.

cat /sys/kernel/debug/tegra_soctherm/cputemp
cat /sys/kernel/debug/tegra_soctherm/gputemp

Very useful.

I have almost found the reason for the strange behavior of the thermal control loop.
After I changed the kernel config, the problem disappeared.
Perhaps some debug option affects it.

Best regards,
Alex

Hi ChenJian,

I’ve found out an interesting thing…

  1. Some combination of debug options in the kernel config paralyzes the thermal control loop. I’m still trying to narrow down the possible causes, but at least I have one kernel config with a working thermal subsystem and one with an unstable one.

  2. An even more interesting thing. I’m currently debugging PCIe DMA. I have a data source in a PCIe device and draw its data in a simple OpenGL app. The app just transfers the data to a texture in the OpenGL context, so it needs little CPU.
    Unlike the Particles sample app, my test app loads, I’d say, only the GPU. The fan does not run!
    I mean: on the same system, with the same kernel and DT, the Particles sample app makes the fan start, but the app that only draws DMA data does not.
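To narrow point 1 down, it can help to list only the debug options that differ between the working and the unstable config. A sketch built on the standard comm tool; debug_opts and diff_debug_opts are hypothetical helpers, and options commented out as "is not set" simply appear as present on one side only. (The kernel source tree also ships scripts/diffconfig for a full comparison.)

```bash
#!/usr/bin/env bash
# Sketch: show the *DEBUG* options that differ between two kernel configs.
# "<" = set only in the first file, ">" = set only in the second.
debug_opts() {
  grep -E '^CONFIG_[A-Z0-9_]*DEBUG[A-Z0-9_]*=' "$1" | sort
}

diff_debug_opts() {
  # comm -3 drops lines common to both; second-file-only lines start
  # with a tab, which awk turns into an empty first field.
  comm -3 <(debug_opts "$1") <(debug_opts "$2") |
    awk -F'\t' '{ if ($1 == "") print "> " $2; else print "< " $1 }'
}
```

Usage: diff_debug_opts working.config unstable.config. Toggling the listed options one at a time (or bisecting the list) should isolate the culprit.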

PS

I have tested the same on my other units…

So it seems one of my units is broken. Perhaps it is a result of overheating…

I have a similar issue on a TX1, although with JetPack 3.0.

My fan won’t turn on despite the system getting hot.

On JetPack 3.0 there seems to be no way to manually control or override the fan:

ls /sys/kernel/debug/tegra_fan

ls: cannot access ‘/sys/kernel/debug/tegra_fan’: No such file or directory
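When this directory is missing, a quick check is whether the running kernel exposes a target_pwm node anywhere else, since its location may differ between L4T releases and carrier-board device trees. A sketch; find_fan_nodes is a hypothetical helper, and it should run as root so that debugfs is searched too.

```bash
#!/usr/bin/env bash
# Sketch: search a sysfs tree for any node named target_pwm.
# Errors from unreadable entries are discarded; symlinks are not followed.
find_fan_nodes() {
  local root=${1:-/sys}
  find "$root" -name target_pwm 2>/dev/null
}
```

Usage: sudo bash -c 'source ./find_fan.sh; find_fan_nodes' (find_fan.sh is a hypothetical file name). Empty output suggests the fan driver did not bind on this board, which points at the device tree rather than the thermal settings.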

Hi Corvus,
Could you try JetPack 2.3.2 for the TX1? JetPack 2.3.2 should be no different from JetPack 3.0 here, and I just tried 2.3.2: the fan can be controlled through /sys/kernel/debug/tegra_fan.

Thanks for the reply. The issue seems to be with the breakout board. I have the module running on an Auvidea J120, whereas previously I had it running on the Jetson Developer Kit.

Apparently the fan is connected differently on the J120 and the PWM fan driver doesn’t detect it properly, so the /sys/kernel/debug/tegra_fan entry is missing.

I have the same issue with both JetPack 2.3.1 and 3.0.

I have contacted Auvidea for support or an updated .dtb, but haven’t received an answer yet. My workaround for now is a separate, always-on fan wired directly to the power supply.

With the Jetson dev kit everything seems to be working fine.