Jetson Nano shows 100% CPU Usage after 30 minutes with Deepstream-app demo

Hi,
I originally reproduced this issue using my own live streams.
However, to make it easy for anyone to reproduce, I have repeated it with the deepstream-app demo. While running the demo with some minor tweaks (details below), the video output & performance start off fine. However, in less than 30 minutes, the CPU (4 cores) pegs at 100% & the video performance of all 8 streams drops dramatically.
The following is a graph of how the 1st video stream degrades over time - note that each sample is 5s:
https://i.imgur.com/qSB9Pi1.png

The following are the details for reproducing the issue with deepstream-app:
-The latest Deepstream 4.0.1 is being used & the Jetson Nano is jumpered for high power mode with a 4A power supply. A fan is also attached & continuously running.
-The following modifications were done to the demo configuration file, source8_1080p_dec_infer-resnet_tracker_tiled_display_fp16_nano.txt:

  • In [sink0], "sync" was changed from 1 to 0
  • In [tests], "file-loop" was changed from 0 to 1

With the above settings, the test runs without any issue. The overall CPU usage is approx. 45% (measured with top or jtop). The reported FPS in the terminal is approximately the same at the end of an overnight run as it was at the beginning, i.e. approx. 30 FPS average.

However, we made one change to the configuration file before retesting:
In [source0], “drop-frame-interval”, which was originally commented out, was enabled & set to 3 as shown:
drop-frame-interval=3

The test was restarted & it seemed to start up correctly. The sample mp4 was repeated 8 times in the 8 windows & object detection/tracking seemed to work well. The performance was shown as follows:

**PERF: FPS 0 (Avg)   FPS 1 (Avg)   FPS 2 (Avg)   FPS 3 (Avg)   FPS 4 (Avg)   FPS 5 (Avg)   FPS 6 (Avg)   FPS 7 (Avg)
**PERF: 12.70 (12.70)   25.38 (25.38)   11.90 (11.90)   12.45 (12.45)   8.41 (8.41)   26.85 (26.85)   17.34 (17.34)   8.99 (8.99)
**PERF: 15.97 (15.74)   14.91 (15.74)   10.57 (10.67)   12.18 (12.21)   7.38 (7.44)   19.88 (20.53)   14.90 (15.10)   8.03 (8.10)
However, after 8 minutes, the CPU cores started getting pegged, as shown by the jtop utility:
https://i.imgur.com/DvcQnSa.png

After 30 minutes, the cars/buses in the videos shown on the output are “slowing down”.
The performance output on the terminal has been greatly reduced:
**PERF: 4.13 (7.80)   4.59 (7.77)   4.41 (7.76)   4.43 (7.71)   4.19 (7.72)   4.53 (7.72)   4.13 (7.80)   4.25 (7.73)
**PERF: 4.44 (7.79)   4.36 (7.76)   4.06 (7.75)   4.50 (7.70)   4.56 (7.71)   4.26 (7.71)   4.36 (7.79)   4.45 (7.72)

The top utility confirms that it is deepstream that is using all of the CPU:


Here are the details shown by the jtop utility:
https://i.imgur.com/7bbWmev.png

Previous experimentation prompted us to enable drop-frame-interval & set it to 2 or 3 (depending on the bandwidth of the attached IP camera).

-With this setting disabled, high latency was seen between actual movement & the output seen on screen. Changing it to 2 or 3 (for a 30 fps camera) improved this latency & didn't seem to affect object detection/tracking.
-This variable seemed like a way to "normalize" the inputs when cameras with different bandwidths are used.
-Our understanding from the documentation is that this setting determines which frames the hardware decoder outputs, e.g. 3 would mean that the decoder outputs every 3rd frame (see the minimal decoder-only pipeline sketched below).
-If that understanding is correct, we would expect the CPU usage to actually decrease as this value increases. Or is that an incorrect assumption?
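
For anyone who wants to isolate the decoder from the rest of the app, a minimal decoder-only pipeline along the following lines should exercise the same property (this is only a sketch based on our DS 4.0.1 / JetPack 4.2.2 install; the element names and the relative sample path are assumptions taken from the demo layout):

gst-launch-1.0 filesrc location=../../streams/sample_1080p_h264.mp4 ! \
  qtdemux ! h264parse ! \
  nvv4l2decoder drop-frame-interval=3 ! \
  fakesink sync=false

Watching top or tegrastats while this runs, with & without drop-frame-interval, should show whether the decoder alone reproduces the CPU climb.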

We appreciate any help that you can offer with this.
I have attached the configuration file below.

Thanks,
Vince

# Copyright (c) 2019 NVIDIA Corporation.  All rights reserved.
#
# NVIDIA Corporation and its licensors retain all intellectual property
# and proprietary rights in and to this software, related documentation
# and any modifications thereto.  Any use, reproduction, disclosure or
# distribution of this software and related documentation without an express
# license agreement from NVIDIA Corporation is strictly prohibited.

[application]
enable-perf-measurement=1
perf-measurement-interval-sec=5
#gie-kitti-output-dir=streamscl

[tiled-display]
enable=1
rows=2
columns=4
##Orig was width=1280 & height=720
width=1280
height=720
gpu-id=0
#(0): nvbuf-mem-default - Default memory allocated, specific to particular platform
#(1): nvbuf-mem-cuda-pinned - Allocate Pinned/Host cuda memory, applicable for Tesla
#(2): nvbuf-mem-cuda-device - Allocate Device cuda memory, applicable for Tesla
#(3): nvbuf-mem-cuda-unified - Allocate Unified cuda memory, applicable for Tesla
#(4): nvbuf-mem-surface-array - Allocate Surface Array memory, applicable for Jetson
nvbuf-memory-type=0

[source0]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI 4=RTSP
type=3
uri=file://../../streams/sample_1080p_h264.mp4
num-sources=8
##Orig was commented out; 3 causes high CPU usage, 1 worked with demo mp4
drop-frame-interval=3
gpu-id=0
# (0): memtype_device   - Memory type Device
# (1): memtype_pinned   - Memory type Host Pinned
# (2): memtype_unified  - Memory type Unified
cudadec-memtype=0

[sink0]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File 4=RTSPStreaming 5=Overlay
type=5
##Orig was 1 but we have seen stuttering with it
sync=0
source-id=0
gpu-id=0
qos=0
nvbuf-memory-type=0
overlay-id=1

[sink1]
enable=0
type=3
#1=mp4 2=mkv
container=1
#1=h264 2=h265
codec=1
sync=0
#iframeinterval=10
bitrate=2000000
output-file=out.mp4
source-id=0

[sink2]
enable=0
#Type - 1=FakeSink 2=EglSink 3=File 4=RTSPStreaming
type=4
#1=h264 2=h265
codec=1
sync=0
bitrate=4000000
# set below properties in case of RTSPStreaming
rtsp-port=8554
udp-port=5400

[osd]
enable=1
gpu-id=0
border-width=1
text-size=15
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Serif
show-clock=0
clock-x-offset=800
clock-y-offset=820
clock-text-size=12
clock-color=1;0;0;0
nvbuf-memory-type=0

[streammux]
gpu-id=0
##Boolean property to inform muxer that sources are live
live-source=0
batch-size=8
##time out in usec, to wait after the first buffer is available
##to push the batch even if the complete batch is not formed
batched-push-timeout=40000
## Set muxer output width and height
width=1920
height=1080
##Enable to maintain aspect ratio wrt source, and allow black borders, works
##along with width, height properties
enable-padding=0
nvbuf-memory-type=0

# config-file property is mandatory for any gie section.
# Other properties are optional and if set will override the properties set in
# the infer config file.
[primary-gie]
enable=1
gpu-id=0
model-engine-file=../../models/Primary_Detector_Nano/resnet10.caffemodel_b8_fp16.engine
batch-size=8
#Required by the app for OSD, not a plugin property
bbox-border-color0=1;0;0;1
bbox-border-color1=0;1;1;1
bbox-border-color2=0;0;1;1
bbox-border-color3=0;1;0;1
interval=4
gie-unique-id=1
nvbuf-memory-type=0
config-file=config_infer_primary_nano.txt

[tracker]
enable=1
tracker-width=480
tracker-height=272
#ll-lib-file=/opt/nvidia/deepstream/deepstream-4.0/lib/libnvds_mot_iou.so
ll-lib-file=/opt/nvidia/deepstream/deepstream-4.0/lib/libnvds_mot_klt.so
#ll-config-file required for IOU only
#ll-config-file=iou_config.txt
gpu-id=0

[tests]
##Orig was 0
file-loop=1

Hi vincent.mcgarry,

We can’t reproduce your issue with JetPack-4.2.2 + DS-4.0.1 on Jetson Nano.
After running for 30 mins, the CPUs are at about 40%:

RAM 2570/3956MB (lfb 70x4MB) SWAP 3/1978MB (cached 0MB) IRAM 0/252kB(lfb 252kB) CPU [42%@1428,40%@1428,34%@1428,39%@1428] EMC_FREQ 46%@1600 GR3D_FREQ 99%@921 NVDEC 716 APE 25 PLL@80C CPU@84C PMIC@100C GPU@81C AO@91.5C thermal@82.5C POM_5V_IN 10651/8963 POM_5V_GPU 3775/2633 POM_5V_CPU 1659/1615
RAM 2570/3956MB (lfb 70x4MB) SWAP 3/1978MB (cached 0MB) IRAM 0/252kB(lfb 252kB) CPU [40%@1428,39%@1428,40%@1428,40%@1428] EMC_FREQ 46%@1600 GR3D_FREQ 99%@921 NVDEC 716 APE 25 PLL@79.5C CPU@84C PMIC@100C GPU@81C AO@92C thermal@82.5C POM_5V_IN 10553/8964 POM_5V_GPU 3859/2634 POM_5V_CPU 1579/1615

FPS stays at 30:

**PERF: FPS 0 (Avg)	FPS 1 (Avg)	FPS 2 (Avg)	FPS 3 (Avg)	FPS 4 (Avg)	FPS 5 (Avg)	FPS 6 (Avg)	FPS 7 (Avg)	
**PERF: 30.06 (30.00)	30.06 (30.00)	30.06 (30.00)	30.06 (30.00)	30.06 (30.00)	30.06 (30.00)	30.06 (30.00)	30.06 (30.00)	
**PERF: 29.98 (30.00)	29.98 (30.00)	29.98 (30.00)	29.98 (30.00)	29.98 (30.00)	29.98 (30.00)	29.98 (30.00)	29.98 (30.00)	
**PERF: 30.00 (30.00)	30.00 (30.00)	30.00 (30.00)	30.00 (30.00)	30.00 (30.00)	30.00 (30.00)	30.00 (30.00)	30.00 (30.00)	
**PERF: 29.99 (30.00)	29.99 (30.00)	29.99 (30.00)	29.99 (30.00)	29.99 (30.00)	29.99 (30.00)	29.99 (30.00)	29.99 (30.00)	
**PERF: 29.99 (30.00)	29.99 (30.00)	29.99 (30.00)	29.99 (30.00)	29.99 (30.00)	29.99 (30.00)	29.99 (30.00)	29.99 (30.00)

Could you run the two commands below to set max performance mode and run again?

sudo nvpmodel -m 0
sudo jetson_clocks
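
To confirm the settings took effect, you can also query the current state (flags as on JetPack 4.2.x; adjust if your release differs):

sudo nvpmodel -q           # should report the MAXN (10W) power mode
sudo jetson_clocks --show  # CPU/GPU/EMC clocks should be pinned at their maximums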

Which Jetson Nano are you using: eMMC or SD card?
And which JetPack and DeepStream versions are you testing?

Thanks!

Hi,

Thanks very much for trying to repeat this.

I am using the original Jetson Nano Developer kit with a 64GB Class 10 sdcard (no eMMC).
I created my sdcard image from your latest updates on 9/24/19…Jetpack 4.2.2 and Deepstream 4.0.1.

Did you make the changes that I suggested - especially the following?

–In [sink0], “sync” was changed from 1 to 0
–drop-frame-interval=3

From your 30 FPS performance, I am guessing that you don’t have drop-frame-interval set to 3.

Anyhow, I power cycled my Nano this morning & decided to repeat the experiment, but using the configuration file that I attached yesterday, which includes the 2 changes above.
First of all I ran the two commands you suggested:

hduser@NVNano4:~/LENS$ sudo nvpmodel -m 0
hduser@NVNano4:~/LENS$ sudo jetson_clocks
hduser@NVNano4:~/LENS$ date
Thu Oct 24 08:27:59 MDT 2019

Everything started off OK & performance was as expected at approx. 10 FPS (because of drop-frame-interval=3).

**PERF: 10.66 (11.25)   16.46 (16.33)   18.40 (15.31)   17.19 (12.46)   8.29 (11.83)       16.26 (16.20)   7.54 (10.66)    9.43 (9.83)
**PERF: 11.59 (11.27)   15.10 (16.24)   14.33 (15.24)   16.92 (12.79)   8.51 (11.59)       15.64 (16.16)   7.63 (10.44)    13.84 (10.12)

I will use the same utility that you used, tegrastats. As shown below, the CPUs also started off OK.

hduser@NVNano4:~/LENS$ sudo tegrastats
RAM 3160/3956MB (lfb 110x4MB) SWAP 62/1978MB (cached 1MB) IRAM 0/252kB(lfb 252kB) CPU [26%@1428,26%@1428,29%@1428,27%@1428] EMC_FREQ 39%@1600 GR3D_FREQ 0%@921 NVDEC 716 APE 25 PLL@26C CPU@29.5C PMIC@100C GPU@27.5C AO@34C thermal@28.75C POM_5V_IN 4971/4971 POM_5V_GPU 328/328 POM_5V_CPU 820/820
RAM 3160/3956MB (lfb 110x4MB) SWAP 62/1978MB (cached 1MB) IRAM 0/252kB(lfb 252kB) CPU [29%@1428,28%@1428,25%@1428,27%@1428] EMC_FREQ 39%@1600 GR3D_FREQ 99%@921 NVDEC 716 APE 25 PLL@26C CPU@29.5C PMIC@100C GPU@28C AO@34C thermal@28.75C POM_5V_IN 6279/5625 POM_5V_GPU 1779/1053 POM_5V_CPU 728/774

After a few minutes, the CPU usage started going up & FPS going down.

In <30 minutes, the performance had dropped dramatically, as shown:

hduser@NVNano4:~/LENS$ date;tail log_ds4_wip.log
Thu Oct 24 08:54:33 MDT 2019
**PERF: 5.04 (8.58)     5.57 (8.62)     4.72 (8.68)     5.40 (8.59)     5.40 (8.49)     5.11 (8.68)     4.80 (8.67)     4.80 (8.59)

Tegrastats confirms that the 4 CPUs are pegged at 100%:

hduser@NVNano4:~/LENS$ date;sudo tegrastats --interval 5000
Thu Oct 24 08:55:53 MDT 2019
RAM 3295/3956MB (lfb 107x4MB) SWAP 168/1978MB (cached 3MB) IRAM 0/252kB(lfb 252kB) CPU [100%@1428,100%@1428,100%@1428,100%@1428] EMC_FREQ 27%@1600 GR3D_FREQ 0%@921 NVDEC 716 APE 25 PLL@30C CPU@33C PMIC@100C GPU@31.5C AO@38C thermal@32.25C POM_5V_IN 6033/6033 POM_5V_GPU 815/815 POM_5V_CPU 1510/1510
RAM 3290/3956MB (lfb 107x4MB) SWAP 175/1978MB (cached 3MB) IRAM 0/252kB(lfb 252kB) CPU [100%@1428,100%@1428,100%@1428,100%@1428] EMC_FREQ 27%@1600 GR3D_FREQ 0%@921 NVDEC 716 APE 25 PLL@30C CPU@33.5C PMIC@100C GPU@31.5C AO@38C thermal@32.5C POM_5V_IN 5879/5956 POM_5V_GPU 694/754 POM_5V_CPU 1469/1489

Please let me know if you need any more details.

Thanks,
Vince

Hi,

This issue may be the same as mine.

https://devtalk.nvidia.com/default/topic/1065188/deepstream-sdk/why-nvv4l2decoder-use-too-much-cpu-/post/5395132/#5395132

Thanks.

Hi ClancyLian,

Thanks for your update.

So I read your post & my take-away from it is that you had normal performance if you don’t set the drop-frame-interval attribute.
Since you are using different hardware than me (a Tesla P4 GPU) & we have narrowed down our extremely high CPU usage to this one variable, it sure smells like a bug to me.

Hopefully NVIDIA will give some priority to this issue & provide a solution!

Thanks,
Vince

Hi vincent.mcgarry,

We will investigate this issue and update you. Thanks!

Hi Vince, carolyuu,

Is there a temporary solution to drop frames before the decoder?

Thanks.

Hi ClancyLian,

Our internal team is still investigating the issue.
Sorry for the late response and the inconvenience.

Hi,

I have had the same problem for several days. I set “drop-frame-interval” too, and the CPU usage increases over time.

Has this issue been solved?

I also see this issue on a Jetson TX2. Once drop-frame-interval is enabled, the problem appears within one hour. JetPack 4.2.2 and DeepStream 4.0.1. Can you help me? My project will be deployed in December.

Hi,

Does anyone from Nvidia have any update on this issue?

It has been over a month & it appears from this thread that several others are experiencing the same issue.

Thanks,
Vince

Hi,
We can recreate your issue. Our dev team is looking into it; sorry for the late response.

Hi,
Please apply the attached patch to gst-v4l2, rebuild, and replace libgstnvvideo4linux2.so.

The gst-v4l2 source is in:
https://developer.nvidia.com/embedded/r32-2-3_Release_v1.0/Sources/T186/public_sources.tbz2
https://developer.nvidia.com/embedded/r32-2-3_Release_v1.0/Sources/T210/public_sources.tbz2
DS401_JETSON_TEST_0001-gstv4l2dec-Fix-high-CPU-usage-in-drop-frame.zip (1.08 KB)
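
For reference, the rebuild/replace flow is roughly as below (a sketch assuming the T210 package layout of r32.2.3 and default JetPack 4.2.2 paths; the exact tarball and patch file names inside the archives may differ, and the usual GStreamer build dependencies need to be installed first):

tar -xjf public_sources.tbz2
cd Linux_for_Tegra/source/public
tar -xjf gst-nvvideo4linux2_src.tbz2          # gst-v4l2 plugin sources
cd gst-v4l2
# apply the patch extracted from the attached zip (adjust the -p level to the patch paths)
patch -p1 < /path/to/0001-gstv4l2dec-Fix-high-CPU-usage-in-drop-frame.patch
make
sudo cp libgstnvvideo4linux2.so /usr/lib/aarch64-linux-gnu/gstreamer-1.0/
rm -rf ~/.cache/gstreamer-1.0                 # clear the GStreamer plugin cache so the new .so is picked up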

Hi, DaneLLL

Is there any solution for T4 and V100?

Hi,

You can also build it for an x86 PC with an NVIDIA dGPU. Please check the information about gst-v4l2 under “Plugin and Library Source Details” in:
https://docs.nvidia.com/metropolis/deepstream/dev-guide/index.html

Hi DaneLLL,

Thanks very much for the fix to my issue.
I followed your instructions & applied the patch to my Jetson Nano test.

I ran the deepstream-app demo test for at least 16 hours without any failure or increase in CPU usage (actually using 8 internet cameras).
The tegrastats utility shows the following:

RAM 3094/3956MB (lfb 105x4MB) SWAP 75/1978MB (cached 4MB) IRAM 0/252kB(lfb 252kB) CPU [14%@1428,10%@1428,8%@1428,11%@1428] EMC_FREQ 19%@1600 GR3D_FREQ 0%@921 NVDEC 716 APE 25 PLL@29.5C CPU@33C PMIC@100C GPU@31.5C AO@36.5C thermal@32.75C POM_5V_IN 3886/3844 POM_5V_GPU 250/208 POM_5V_CPU 500/521

(Previously the 4 cores would be pegged at 100% within 30 minutes).

Do you know when this “fix” will be formally released?

Thanks,
Vince

Hi Vincent, can you share the .so file? I have no environment to build it. Thanks very much.

Hi Mike_deng,

I actually built it on the Nano using the instructions from Post #13 (I used the T210 source for the Nano).

Since I couldn’t attach the .so directly, I have compressed it & attached it.
(Initially I changed the name & compressed it to a .zip, but when attached, the forum detected it as “Infected”. So I compressed it to a .7z without changing the name & it was accepted.)

Thanks,
Vince
libgstnvvideo4linux2.7z (67.6 KB)

To Vincent: thanks for sharing.

Hi,
The patch in #13 is a quick fix. We have reviewed it and updated it to a new patch set. Please check the attachment. Thanks.
DS40X_TEST_0001-gstv4l2dec-Fix-high-CPU-usage-in-drop-frame.zip (1.87 KB)
