Temperature Throttling Information

Hey. I am trying to get more information on how the Tegra X1 throttles at higher temperatures.
In my device tree I found the following trigger limits.

102500 cpu_critical tegra-shutdown
98500 cpu_heavy tegra-heavy
89000 cpu_throttle cpu_balanced

103000 gpu_critical tegra-shutdown
100000 gpu_heavy tegra-heavy
90500 gpu_throttle gpu-balanced
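For anyone reading the numbers above: a minimal sketch that converts the posted trip values to degrees Celsius. It assumes the first column is in millidegrees Celsius, which is the usual Linux thermal framework convention and matches the 89°C and 90.5°C figures quoted later in this thread. The names `TRIP_TABLE` and `parse_trips` are made up for illustration.

```python
# Trip points as posted above; the first column is ASSUMED to be millidegrees Celsius.
TRIP_TABLE = """\
102500 cpu_critical tegra-shutdown
98500 cpu_heavy tegra-heavy
89000 cpu_throttle cpu_balanced
103000 gpu_critical tegra-shutdown
100000 gpu_heavy tegra-heavy
90500 gpu_throttle gpu-balanced
"""


def parse_trips(text):
    """Return (trip_name, temperature_in_C, cooling_device) tuples."""
    trips = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        milli_c, trip_name, cooling_device = parts
        trips.append((trip_name, int(milli_c) / 1000.0, cooling_device))
    return trips


if __name__ == "__main__":
    for trip_name, temp_c, cooling_device in parse_trips(TRIP_TABLE):
        print(f"{trip_name:<13} {temp_c:6.1f} C -> {cooling_device}")
```

Read this way, the lowest trip in the list is cpu_throttle at 89.0 C, so throttling at 75C is not explained by these limits alone.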

Yet I am seeing throttling still occur when hitting 75C.

Also is there any way to disable throttling so I can confirm this?

Which thermal zone’s value is this 75C from?

CPU-thermal 1.
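To check which zone a reading comes from, something like the following sketch can list every thermal zone with its type and temperature. It assumes the standard Linux sysfs layout (`/sys/class/thermal/thermal_zone*/type` and `temp`, with `temp` in millidegrees Celsius); the zone names printed on a given board may differ from the ones mentioned in this thread. `read_zones` is a hypothetical helper name.

```python
from pathlib import Path


def read_zones(base="/sys/class/thermal"):
    """Map each thermal zone directory name to (type, temperature in C)."""
    zones = {}
    for zone_dir in sorted(Path(base).glob("thermal_zone*")):
        try:
            zone_type = (zone_dir / "type").read_text().strip()
            # ASSUMPTION: the kernel reports temp in millidegrees Celsius.
            milli_c = int((zone_dir / "temp").read_text().strip())
        except (OSError, ValueError):
            continue  # zone is unreadable or reports no valid value
        zones[zone_dir.name] = (zone_type, milli_c / 1000.0)
    return zones


if __name__ == "__main__":
    for name, (zone_type, temp_c) in read_zones().items():
        print(f"{name}: {zone_type} = {temp_c:.1f} C")
```

Running it while the workload is active shows all zones side by side, which helps rule out a different zone being the one that is close to its trip point.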

Seems it might not be exactly 75C and the throttling happens at varying degrees.

Just some more detail: I am running a GPU-intensive CUDA program that drives the GPU to 100% utilization.
My CPU clock frequency normally runs at 1734000, which is the max.

When I initialize my worker program, the CPU frequency starts to drop, depending on how hot the chip is:
At 75-85C it drops to 1326000.
At 45-70C it drops to 1555500 or 1428000.

Anything below that and the CPU frequency stays at 1734000.
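To put the frequency steps above in perspective, here is a rough sketch that expresses each one as a fraction of the maximum and reads the current frequency of one core. It assumes the numbers are in kHz, as reported by the standard cpufreq sysfs file `scaling_cur_freq`; the post itself does not state the unit. `fraction_of_max` and `read_cpu_khz` are illustrative names.

```python
from pathlib import Path

MAX_KHZ = 1734000  # maximum CPU frequency quoted in the post (ASSUMED to be kHz)


def fraction_of_max(cur_khz, max_khz=MAX_KHZ):
    """Return the current frequency as a fraction of the maximum."""
    return cur_khz / max_khz


def read_cpu_khz(cpu=0, base="/sys/devices/system/cpu"):
    """Read the current cpufreq value (kHz) for one core, or None if unavailable."""
    path = Path(base) / f"cpu{cpu}" / "cpufreq" / "scaling_cur_freq"
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    for khz in (1555500, 1428000, 1326000):
        print(f"{khz} kHz = {fraction_of_max(khz):.1%} of max")
    print("cpu0 now:", read_cpu_khz())
```

The three reduced steps work out to roughly 90%, 82% and 76% of the maximum. Logging `read_cpu_khz()` next to the thermal zone readings once per second would show whether the drops track temperature or start as soon as the GPU load does.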

Hence why I want to disable all throttling if possible to see if this is the system trying to restrict performance.
Also this may be power related since in dmesg I often see “WARNING - Battery Over Current Limit hit, please refer to the Jetson Power management application note” while executing the program (though my hardware does not run on a battery).

OK, per your details, it might not be caused by temperature. What is your input voltage? Is it 19V? If not, please try 19V to see if this throttling still happens.

Hi x1tester62,

Has this issue been clarified and resolved?
Is there any further information you can share?

Thanks

Hi,

I am seeing a similar issue. The CPU throttles at the expected 89°C, but the GPU seems to throttle back at 79°C.
For the CPU I am using onboard thermal zone1 (CPU-therm), and for the GPU I am using onboard thermal zone2 (GPU-therm).

A secondary issue is that there seems to be a 10°C temperature drop across the silicon; I would expect these readings to be similar.

I am using CPU and GPU stress software that keeps all cores running at 100%. The VDD input rail is at 12V.

Thanks

The maximum operating temperatures are: T. cpu = 89°C, T. gpu = 90.5°C. What was the value of zone1 when you saw zone2 reach 79°C? In your case the throttling might be caused by T. cpu, not T. gpu.

EDIT:
Thermal zones are in different areas of the chip. Between which zones did you observe the 10C gap? How long does it keep working when you see that? Is it in a stable working state? The temperature gap is usually less than 3C, but it can peak at 10C, depending on the system workload and heat dissipation.

When zone2 started throttling at 79°C, zone1 was at ~89°C. In these conditions, the CPU temperature stabilised at 91°C with some throttling and the GPU at 84°C. This was left running for an hour.

In a second test, again left running for an hour, the GPU throttled at zone2=86°C and zone1=89°C. This was a stable condition.

Does the GPU throttle back at the hottest silicon temperature then, regardless of GPU reading?

Per your info, all throttling took place based on zone1, not zone2. Throttling only starts when zone1 reaches 89°C OR zone2 reaches 90.5°C.
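The OR condition in this reply can be written out as a small check. This is only a simplified model of the rule as stated here, not the actual kernel thermal governor, and `throttle_sources` is a made-up name. The thresholds are the 89°C and 90.5°C values given in this thread.

```python
CPU_THROTTLE_C = 89.0  # zone1 (CPU-therm) threshold stated in the thread
GPU_THROTTLE_C = 90.5  # zone2 (GPU-therm) threshold stated in the thread


def throttle_sources(zone1_c, zone2_c):
    """Return the zones whose reading is at or above its throttle threshold."""
    sources = []
    if zone1_c >= CPU_THROTTLE_C:
        sources.append("zone1")
    if zone2_c >= GPU_THROTTLE_C:
        sources.append("zone2")
    return sources


if __name__ == "__main__":
    # (zone1, zone2) readings reported in the two tests in this thread
    for zone1_c, zone2_c in [(89.0, 79.0), (89.0, 86.0)]:
        print(zone1_c, zone2_c, throttle_sources(zone1_c, zone2_c))
```

With the readings from both tests, only zone1 is at its threshold, which fits the explanation that zone1 triggered the throttling while zone2 was still below 90.5°C.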

Yes, essentially. What still doesn’t make a huge amount of sense is that zone 1 only reached 89°C when the GPU throttled. Unless zone 2 is not actually reading GPU temp and there is a 1.5°C temperature drop across the silicon?? Zone 2 never rose above 85°C with high throttling.

Generally zone 1 reaches 89C before zone 2 reaches 90.5C, but if the GPU load is much higher, the throttling could be triggered by zone 2. This also depends on the thermal design.

My concern is just that zone 2 is controlling the GPU speed and the GPU is throttling back well below 90.5°C. I have now noticed in the TX2 thermal design guide that the GPU temp is measured by AO-Therm (zone0). This would explain the temperature discrepancy. Can you confirm that this is the same for the TX1 and that the diode naming convention is just a bit misleading?

Hi max, the TX1 is different. On the TX2, NV characterization showed AO-Therm to provide the most effective temperature control for the GPU.