I have tested the YoloV3 example of DS 4.0 and ran into some issues.
1. I’m running with the default config file deepstream_app_config_yoloV3.txt. The overall frame rate is about 1~2 fps. Is that expected?
2. “Building the TensorRT Engine…” takes several minutes. Is there a way to do it ONLY on the very first run?
Below are the procedure and the result.
$ bzip2 -dc deepstream_sdk_v4.0_jetson.tbz2 | tar xvf -
$ cd deepstream_sdk_v4.0_jetson
$ sudo apt-get install \
libssl1.0.0 \
libgstreamer1.0-0 \
gstreamer1.0-tools \
gstreamer1.0-plugins-good \
gstreamer1.0-plugins-bad \
gstreamer1.0-plugins-ugly \
gstreamer1.0-libav \
libgstrtspserver-1.0-0 \
libjansson4
$ sudo tar -xvf binaries.tbz2 -C /
$ sudo ./install.sh
$ sudo ldconfig
$ rm ${HOME}/.cache/gstreamer-1.0/registry.aarch64.bin
$ cd sources/objectDetector_Yolo
$ ./prebuild.sh
$ cd nvdsinfer_custom_impl_Yolo
$ export CUDA_VER=10.0
$ make
g++ -c -o nvdsinfer_yolo_engine.o -Wall -std=c++11 -shared -fPIC -I../../includes -I/usr/local/cuda-10.0/include nvdsinfer_yolo_engine.cpp
g++ -c -o nvdsparsebbox_Yolo.o -Wall -std=c++11 -shared -fPIC -I../../includes -I/usr/local/cuda-10.0/include nvdsparsebbox_Yolo.cpp
g++ -c -o yoloPlugins.o -Wall -std=c++11 -shared -fPIC -I../../includes -I/usr/local/cuda-10.0/include yoloPlugins.cpp
g++ -c -o trt_utils.o -Wall -std=c++11 -shared -fPIC -I../../includes -I/usr/local/cuda-10.0/include trt_utils.cpp
g++ -c -o yolo.o -Wall -std=c++11 -shared -fPIC -I../../includes -I/usr/local/cuda-10.0/include yolo.cpp
/usr/local/cuda-10.0/bin/nvcc -c -o kernels.o --compiler-options '-fPIC' kernels.cu
g++ -o libnvdsinfer_custom_impl_Yolo.so nvdsinfer_yolo_engine.o nvdsparsebbox_Yolo.o yoloPlugins.o trt_utils.o yolo.o kernels.o -shared -Wl,--start-group -lnvinfer_plugin -lnvinfer -lnvparsers -L/usr/local/cuda-10.0/lib64 -lcudart -lcublas -lstdc++fs -Wl,--end-group
$ cd ..
$ deepstream-app -c deepstream_app_config_yoloV3.txt
Using winsys: x11
Creating LL OSD context new
0:00:00.777696804 8123 0x36493010 INFO nvinfer gstnvinfer.cpp:519:gst_nvinfer_logger:<primary_gie_classifier> NvDsInferContext[UID 1]:initialize(): Trying to create engine from model files
0:00:01.032222953 8123 0x36493010 WARN nvinfer gstnvinfer.cpp:515:gst_nvinfer_logger:<primary_gie_classifier> NvDsInferContext[UID 1]:generateTRTModel(): INT8 not supported by platform. Trying FP16 mode.
Loading pre-trained weights...
Loading complete!
Total Number of weights read : 62001757
layer inp_size out_size weightPtr
(1) conv-bn-leaky 3 x 608 x 608 32 x 608 x 608 992
(2) conv-bn-leaky 32 x 608 x 608 64 x 304 x 304 19680
(3) conv-bn-leaky 64 x 304 x 304 32 x 304 x 304 21856
(4) conv-bn-leaky 32 x 304 x 304 64 x 304 x 304 40544
(5) skip 64 x 304 x 304 64 x 304 x 304 -
(6) conv-bn-leaky 64 x 304 x 304 128 x 152 x 152 114784
(7) conv-bn-leaky 128 x 152 x 152 64 x 152 x 152 123232
(8) conv-bn-leaky 64 x 152 x 152 128 x 152 x 152 197472
(9) skip 128 x 152 x 152 128 x 152 x 152 -
(10) conv-bn-leaky 128 x 152 x 152 64 x 152 x 152 205920
(11) conv-bn-leaky 64 x 152 x 152 128 x 152 x 152 280160
(12) skip 128 x 152 x 152 128 x 152 x 152 -
(13) conv-bn-leaky 128 x 152 x 152 256 x 76 x 76 576096
(14) conv-bn-leaky 256 x 76 x 76 128 x 76 x 76 609376
(15) conv-bn-leaky 128 x 76 x 76 256 x 76 x 76 905312
(16) skip 256 x 76 x 76 256 x 76 x 76 -
(17) conv-bn-leaky 256 x 76 x 76 128 x 76 x 76 938592
(18) conv-bn-leaky 128 x 76 x 76 256 x 76 x 76 1234528
(19) skip 256 x 76 x 76 256 x 76 x 76 -
(20) conv-bn-leaky 256 x 76 x 76 128 x 76 x 76 1267808
(21) conv-bn-leaky 128 x 76 x 76 256 x 76 x 76 1563744
(22) skip 256 x 76 x 76 256 x 76 x 76 -
(23) conv-bn-leaky 256 x 76 x 76 128 x 76 x 76 1597024
(24) conv-bn-leaky 128 x 76 x 76 256 x 76 x 76 1892960
(25) skip 256 x 76 x 76 256 x 76 x 76 -
(26) conv-bn-leaky 256 x 76 x 76 128 x 76 x 76 1926240
(27) conv-bn-leaky 128 x 76 x 76 256 x 76 x 76 2222176
(28) skip 256 x 76 x 76 256 x 76 x 76 -
(29) conv-bn-leaky 256 x 76 x 76 128 x 76 x 76 2255456
(30) conv-bn-leaky 128 x 76 x 76 256 x 76 x 76 2551392
(31) skip 256 x 76 x 76 256 x 76 x 76 -
(32) conv-bn-leaky 256 x 76 x 76 128 x 76 x 76 2584672
(33) conv-bn-leaky 128 x 76 x 76 256 x 76 x 76 2880608
(34) skip 256 x 76 x 76 256 x 76 x 76 -
(35) conv-bn-leaky 256 x 76 x 76 128 x 76 x 76 2913888
(36) conv-bn-leaky 128 x 76 x 76 256 x 76 x 76 3209824
(37) skip 256 x 76 x 76 256 x 76 x 76 -
(38) conv-bn-leaky 256 x 76 x 76 512 x 38 x 38 4391520
(39) conv-bn-leaky 512 x 38 x 38 256 x 38 x 38 4523616
(40) conv-bn-leaky 256 x 38 x 38 512 x 38 x 38 5705312
(41) skip 512 x 38 x 38 512 x 38 x 38 -
(42) conv-bn-leaky 512 x 38 x 38 256 x 38 x 38 5837408
(43) conv-bn-leaky 256 x 38 x 38 512 x 38 x 38 7019104
(44) skip 512 x 38 x 38 512 x 38 x 38 -
(45) conv-bn-leaky 512 x 38 x 38 256 x 38 x 38 7151200
(46) conv-bn-leaky 256 x 38 x 38 512 x 38 x 38 8332896
(47) skip 512 x 38 x 38 512 x 38 x 38 -
(48) conv-bn-leaky 512 x 38 x 38 256 x 38 x 38 8464992
(49) conv-bn-leaky 256 x 38 x 38 512 x 38 x 38 9646688
(50) skip 512 x 38 x 38 512 x 38 x 38 -
(51) conv-bn-leaky 512 x 38 x 38 256 x 38 x 38 9778784
(52) conv-bn-leaky 256 x 38 x 38 512 x 38 x 38 10960480
(53) skip 512 x 38 x 38 512 x 38 x 38 -
(54) conv-bn-leaky 512 x 38 x 38 256 x 38 x 38 11092576
(55) conv-bn-leaky 256 x 38 x 38 512 x 38 x 38 12274272
(56) skip 512 x 38 x 38 512 x 38 x 38 -
(57) conv-bn-leaky 512 x 38 x 38 256 x 38 x 38 12406368
(58) conv-bn-leaky 256 x 38 x 38 512 x 38 x 38 13588064
(59) skip 512 x 38 x 38 512 x 38 x 38 -
(60) conv-bn-leaky 512 x 38 x 38 256 x 38 x 38 13720160
(61) conv-bn-leaky 256 x 38 x 38 512 x 38 x 38 14901856
(62) skip 512 x 38 x 38 512 x 38 x 38 -
(63) conv-bn-leaky 512 x 38 x 38 1024 x 19 x 19 19624544
(64) conv-bn-leaky 1024 x 19 x 19 512 x 19 x 19 20150880
(65) conv-bn-leaky 512 x 19 x 19 1024 x 19 x 19 24873568
(66) skip 1024 x 19 x 19 1024 x 19 x 19 -
(67) conv-bn-leaky 1024 x 19 x 19 512 x 19 x 19 25399904
(68) conv-bn-leaky 512 x 19 x 19 1024 x 19 x 19 30122592
(69) skip 1024 x 19 x 19 1024 x 19 x 19 -
(70) conv-bn-leaky 1024 x 19 x 19 512 x 19 x 19 30648928
(71) conv-bn-leaky 512 x 19 x 19 1024 x 19 x 19 35371616
(72) skip 1024 x 19 x 19 1024 x 19 x 19 -
(73) conv-bn-leaky 1024 x 19 x 19 512 x 19 x 19 35897952
(74) conv-bn-leaky 512 x 19 x 19 1024 x 19 x 19 40620640
(75) skip 1024 x 19 x 19 1024 x 19 x 19 -
(76) conv-bn-leaky 1024 x 19 x 19 512 x 19 x 19 41146976
(77) conv-bn-leaky 512 x 19 x 19 1024 x 19 x 19 45869664
(78) conv-bn-leaky 1024 x 19 x 19 512 x 19 x 19 46396000
(79) conv-bn-leaky 512 x 19 x 19 1024 x 19 x 19 51118688
(80) conv-bn-leaky 1024 x 19 x 19 512 x 19 x 19 51645024
(81) conv-bn-leaky 512 x 19 x 19 1024 x 19 x 19 56367712
(82) conv-linear 1024 x 19 x 19 255 x 19 x 19 56629087
(83) yolo 255 x 19 x 19 255 x 19 x 19 56629087
(84) route - 512 x 19 x 19 56629087
(85) conv-bn-leaky 512 x 19 x 19 256 x 19 x 19 56761183
(86) upsample 256 x 19 x 19 256 x 38 x 38 -
(87) route - 768 x 38 x 38 56761183
(88) conv-bn-leaky 768 x 38 x 38 256 x 38 x 38 56958815
(89) conv-bn-leaky 256 x 38 x 38 512 x 38 x 38 58140511
(90) conv-bn-leaky 512 x 38 x 38 256 x 38 x 38 58272607
(91) conv-bn-leaky 256 x 38 x 38 512 x 38 x 38 59454303
(92) conv-bn-leaky 512 x 38 x 38 256 x 38 x 38 59586399
(93) conv-bn-leaky 256 x 38 x 38 512 x 38 x 38 60768095
(94) conv-linear 512 x 38 x 38 255 x 38 x 38 60898910
(95) yolo 255 x 38 x 38 255 x 38 x 38 60898910
(96) route - 256 x 38 x 38 60898910
(97) conv-bn-leaky 256 x 38 x 38 128 x 38 x 38 60932190
(98) upsample 128 x 38 x 38 128 x 76 x 76 -
(99) route - 384 x 76 x 76 60932190
(100) conv-bn-leaky 384 x 76 x 76 128 x 76 x 76 60981854
(101) conv-bn-leaky 128 x 76 x 76 256 x 76 x 76 61277790
(102) conv-bn-leaky 256 x 76 x 76 128 x 76 x 76 61311070
(103) conv-bn-leaky 128 x 76 x 76 256 x 76 x 76 61607006
(104) conv-bn-leaky 256 x 76 x 76 128 x 76 x 76 61640286
(105) conv-bn-leaky 128 x 76 x 76 256 x 76 x 76 61936222
(106) conv-linear 256 x 76 x 76 255 x 76 x 76 62001757
(107) yolo 255 x 76 x 76 255 x 76 x 76 62001757
Output blob names :
yolo_83
yolo_95
yolo_107
Total number of layers: 257
Total number of layers on DLA: 0
Building the TensorRT Engine...
Building complete!
0:18:18.114447752 8123 0x36493010 INFO nvinfer gstnvinfer.cpp:519:gst_nvinfer_logger:<primary_gie_classifier> NvDsInferContext[UID 1]:generateTRTModel(): Storing the serialized cuda engine to file at /home/gigijoe/deepstream_sdk_v4.0_jetson/sources/objectDetector_Yolo/model_b1_fp16.engine
Deserialize yoloLayerV3 plugin: yolo_83
Deserialize yoloLayerV3 plugin: yolo_95
Deserialize yoloLayerV3 plugin: yolo_107
Runtime commands:
h: Print this help
q: Quit
p: Pause
r: Resume
NOTE: To expand a source in the 2D tiled display and view object details, left-click on the source.
To go back to the tiled display, right-click anywhere on the window.
**PERF: FPS 0 (Avg)
**PERF: 0.00 (0.00)
** INFO: <bus_callback:163>: Pipeline ready
**PERF: 0.00 (0.00)
**PERF: 0.00 (0.00)
Opening in BLOCKING MODE
NvMMLiteOpen : Block : BlockType = 261
NVMEDIA: Reading vendor.tegra.display-size : status: 6
NvMMLiteBlockCreate : Block : BlockType = 261
** INFO: <bus_callback:149>: Pipeline running
Creating LL OSD context new
**PERF: 1.90 (1.90)
**PERF: 1.90 (1.90)
**PERF: 1.91 (1.90)
...
...
...
**PERF: 1.89 (1.91)
**PERF: 1.90 (1.91)
** INFO: <bus_callback:186>: Received EOS. Exiting ...
Quitting
App run successful
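For reference, both numbers in question can be read straight off the log above. The following is only an illustrative sketch (the helper names `gst_seconds` and `perf_fps` are made up for this example): it takes the GStreamer timestamps printed just before and just after the engine build, plus the `**PERF` lines shown, and reports the build time and the mean FPS.

```python
import re

def gst_seconds(stamp):
    """Convert a GStreamer debug timestamp 'H:MM:SS.fraction' to seconds."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)

def perf_fps(log_text):
    """Return the per-interval FPS values from '**PERF: x (avg)' lines."""
    return [float(m) for m in re.findall(r"\*\*PERF:\s+([0-9.]+)\s+\(", log_text)]

# Timestamps copied from the log: the INT8 -> FP16 warning printed right before
# the build, and the "Storing the serialized cuda engine" line printed after it.
build_start = gst_seconds("0:00:01.032222953")
build_end = gst_seconds("0:18:18.114447752")
build_minutes = (build_end - build_start) / 60

# The non-zero PERF lines visible in the log (the elided ones are not included).
sample = """**PERF: 1.90 (1.90)
**PERF: 1.90 (1.90)
**PERF: 1.91 (1.90)
**PERF: 1.89 (1.91)
**PERF: 1.90 (1.91)"""
fps = perf_fps(sample)
mean_fps = sum(fps) / len(fps)

print(f"engine build: about {build_minutes:.1f} minutes")
print(f"mean FPS over sampled lines: {mean_fps:.2f}")
```

So in this run the build took roughly 18 minutes rather than a few, and the pipeline settled at about 1.9 fps.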
I am also getting around the same fps. I believe this sample does not make use of a tracker. It would be great if we could figure out how to use a tracker with this sample so that the performance improves significantly.
Hello, I’m currently experiencing the same problem with reference YoloV3 app.
FPS 1.91
GR3D_FREQ 99%@921
Jetson Nano
DeepStream 4.0
TensorRT 5.1.6.1
CUDA 10.0
Do you have any tips?
Thanks
UPDATE
After some testing and reading the docs and the DeepStream SDK page, I think the YoloV3 example app is not meant for, or optimized for, the Nano platform. YoloV3_tiny, however, is. With the Tiny version I get a constant ~25 fps on a 1080p stream.
I am trying to test DeepStream with yolo-tiny. I have followed the steps in the README, but whenever I run it I get a black window.
Here is the terminal output:
Using winsys: x11
Creating LL OSD context new
Deserialize yoloLayerV3 plugin: yolo_17
Deserialize yoloLayerV3 plugin: yolo_24
Runtime commands:
h: Print this help
q: Quit
p: Pause
r: Resume
NOTE: To expand a source in the 2D tiled display and view object details, left-click on the source.
To go back to the tiled display, right-click anywhere on the window.
** INFO: <bus_callback:163>: Pipeline ready
Opening in BLOCKING MODE
NvMMLiteOpen : Block : BlockType = 261
NVMEDIA: Reading vendor.tegra.display-size : status: 6
NvMMLiteBlockCreate : Block : BlockType = 261
**PERF: FPS 0 (Avg)
**PERF: 0.00 (0.00)
**PERF: 0.00 (0.00)
**PERF: 0.00 (0.00)
**PERF: 0.00 (0.00)
**PERF: 0.00 (0.00)
**PERF: 0.00 (0.00)
Please note that I am getting “** INFO: <bus_callback:163>: Pipeline ready”
but not “** INFO: <bus_callback:149>: Pipeline running”. Perhaps this is related to the issue?
Can you resubmit the YoloV3 example for DS 4.0 together with a video that really works? Many of us will hit the same problem with this example.
There are some prerequisites for enabling the YOLOv3 model with DeepStream 4.0.
We can see the YOLO results when following the instructions in /opt/nvidia/deepstream/deepstream-4.0/sources/objectDetector_Yolo/README.
Could you follow those steps and try again?
If you still run into issues, please share the error log or a description of the issue with us.
The 1-2 fps you are seeing is the expected performance for yolov3 on Nano. You can switch the inference mode to fp16, or try yolov3-tiny for higher throughput.
Yes, yolov3 is a compute-intensive model and it takes time to build the engine on a Nano. Once built, the engine is saved in the same directory as the model file. You can set the “model-engine-file” config param in config_infer_primary_yolov3.txt to point at this engine file, so that subsequent runs load it instead of rebuilding it.
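As a rough sketch of that change, the helper below sets `model-engine-file` in the `[property]` group of an nvinfer config while leaving every other line alone. The function name `set_config_key` is made up for this example, and the sample text is a small stand-in, not the contents of the shipped config file. The engine name comes from the log earlier in the thread. Giving it as a bare file name assumes the path is resolved relative to the config file. If in doubt, use the absolute path printed in the log.

```python
def set_config_key(config_text, section, key, value):
    """Set key=value in [section] of a key-file style config, keeping other lines."""
    out = []
    in_section = False
    found_section = False
    for line in config_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            in_section = stripped == f"[{section}]"
            out.append(line)
            if in_section:
                # Write the new value directly under the section header.
                found_section = True
                out.append(f"{key}={value}")
            continue
        if in_section and not stripped.startswith("#") and "=" in stripped:
            # Drop an older active value of the same key; comments are kept.
            if stripped.split("=", 1)[0].strip() == key:
                continue
        out.append(line)
    if not found_section:
        raise ValueError(f"section [{section}] not found")
    return "\n".join(out) + "\n"

# Stand-in config text for illustration only.
sample = """[property]
gpu-id=0
model-engine-file=old.engine
network-mode=2

[class-attrs-all]
threshold=0.7
"""

updated = set_config_key(sample, "property", "model-engine-file", "model_b1_fp16.engine")
print(updated)
```

To apply it for real, back up the config file, read its text, pass it through the function, and write the result back. Editing the one line by hand works just as well.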
If the tracking results are bad for your test video, you can reduce the interval to improve accuracy, but the FPS will drop as well. It’s a trade-off that needs to be tuned for your use case.
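To make the trade-off concrete, here is a back-of-the-envelope model, not a measurement. It assumes that `interval` is the number of consecutive frames skipped between inferences (so the detector runs on one frame in every `interval + 1`), that the detector alone sustains the ~1.9 inferences per second reported earlier in the thread, and that the tracker and the rest of the pipeline cost nothing. The function name and the 30 fps source rate are made up for the example.

```python
def estimated_fps(detector_ips, interval, source_fps):
    """Upper-bound pipeline FPS when inference runs on 1 of every (interval + 1) frames."""
    if interval < 0:
        raise ValueError("interval must be >= 0")
    # The pipeline cannot run faster than the source delivers frames.
    return min(source_fps, detector_ips * (interval + 1))

for interval in (0, 2, 4, 8, 15):
    fps = estimated_fps(detector_ips=1.9, interval=interval, source_fps=30.0)
    print(f"interval={interval:2d}: up to ~{fps:.1f} fps, "
          f"detections refreshed every {interval + 1} frames")
```

Under these assumptions a larger interval raises the ceiling on FPS, but objects are re-detected less often, so the tracker has to carry them for longer between detections. Real numbers will be lower because tracking is not free.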