deepstream-yolo-app performance vs Tensor-Core optimized yolo-darknet

I’m in the process of researching real-time object detection and tracking, ideally at 30 FPS or more on HD footage.

I came across deepstream-yolo-app, and the memory/CPU optimizations it provides seemed promising for improving performance over ‘plain’ YOLO (GitHub - pjreddie/darknet: Convolutional Neural Networks). However, when I run deepstream-yolo-app, the results aren’t much better than plain YOLO.

I wanted to ask what kind of FPS you are getting with deepstream-yolo-app for yolov3 (full, non-tiny) on the Xavier.

I’ve also tried AlexeyAB’s darknet fork (GitHub - AlexeyAB/darknet: YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)), which has support for Tensor Cores and CUDA cores. On the Jetson Xavier I’m able to get about 20 FPS using AlexeyAB’s darknet. The deepstream-yolo-app seems much slower (I’m using the same file, the 720p DeepStream sample stream), but I don’t know how to measure it accurately (I probably need to modify the code to measure FPS?).
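For measuring throughput without relying on the on-screen preview, one option is a small rolling-window FPS counter called once per processed frame. This is a generic sketch, not part of deepstream-yolo-app or darknet; in a DeepStream pipeline you could call `tick()` from a buffer-probe callback, and in darknet from the per-frame loop:

```python
import time
from collections import deque

class FpsMeter:
    """Rolling-window FPS estimator; call tick() once per processed frame."""

    def __init__(self, window=30):
        # Keep timestamps of the last `window` frames only.
        self.stamps = deque(maxlen=window)

    def tick(self, now=None):
        # `now` is overridable for testing; defaults to a monotonic clock.
        self.stamps.append(time.monotonic() if now is None else now)

    def fps(self):
        if len(self.stamps) < 2:
            return 0.0
        span = self.stamps[-1] - self.stamps[0]
        # N timestamps bracket N-1 frame intervals.
        return (len(self.stamps) - 1) / span if span > 0 else 0.0
```

Averaging over a window rather than a single frame interval smooths out the stutter you would otherwise see from per-frame timing jitter.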

Because deepstream-yolo-app uses all of the Xavier’s hardware blocks and is optimized to keep CPU usage low, I figured I would get better results than with AlexeyAB’s code.

Small sidenote: when using AlexeyAB’s yolov3, FPS improves significantly when I minimize the video preview window, which leads me to believe they render it on the CPU or something similar. Once minimized, I get the 20 FPS I mentioned above. Maybe the same issue is happening with deepstream-yolo-app?
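One way to check whether the preview window is the bottleneck is to time the inference call in isolation, so any rendering cost is excluded from the measurement. A sketch, where `detect` is a placeholder for whatever per-frame inference function is actually being used:

```python
import time

def timed_infer(detect, frame):
    """Run one inference and return (detections, seconds spent in inference only).

    Drawing/display time is deliberately excluded from the measurement,
    so this isolates the network's cost from the renderer's cost.
    """
    t0 = time.monotonic()
    dets = detect(frame)  # placeholder for the actual model call
    return dets, time.monotonic() - t0
```

If the per-frame inference time stays constant while the on-screen FPS drops as the window gets larger, the slowdown is in rendering rather than in the network.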

Any help measuring the FPS or optimizing the deepstream app further would be much appreciated! I’ve been experimenting for weeks and I’m starting to get a little stuck.

Hi,

AFAIK, YOLOv3 can reach 50fps with TensorRT on Xavier.

The main problem is that we don’t officially support YOLOv3 in the DeepStream SDK.

There are some inference samples inside the DeepStream SDK (e.g. deepstream-app).
They are well optimized for the Tegra architecture but don’t support the YOLO model.

For deepstream-yolo-app, which is designed as a plugin demonstration, there is some room for improvement (e.g. memcpy between CPU/GPU).
To get an optimized YOLOv3 pipeline on Jetson, it’s recommended to port the nvyolo plugin into deepstream-app.
You can find more information here:
https://github.com/NVIDIA-AI-IOT/deepstream_reference_apps/tree/master/yolo/samples/objectDetector_YoloV3

Thanks.

Hi,

Any update on YOLOv3 support with Deepstream SDK?

I would also like to know if DeepStream will be integrating support for YOLOv3. Currently a customized YOLOv3 network works best for our needs, and if it’s not possible to get YOLO working well with DeepStream, then we need to look at implementing without DeepStream, which is something I’d prefer not to do.

Hi,

I’m planning a similar project using YOLOv3 with DeepStream SDK 4 on the Jetson Xavier. Since DeepStream SDK 4 has only recently been released, I would like to know whether NVIDIA has fixed this issue (DeepStream’s YOLO not being optimized for the Jetson Xavier) in this new version of the DeepStream SDK.

Thanks

DeepStream 4.0.1:

YOLOv3 at 416×416: 52 FPS

Hi 8519120,

Could you please share some details on how you reached this FPS rate?

On the Xavier I reach up to 30 FPS at 416×416.

Did you ever find a solution to your own question? I’m in a similar situation at the moment. With CUDA and Tensor Cores enabled I reach a maximum of 15 FPS at 416×416, and not stably: the output speeds up and hangs constantly, so the average might be 15 FPS, but that doesn’t translate to a real-time experience. If I shrink the output window from full screen to a little box, the FPS increases to a steady 19.

Hi ruudscheenen,

Please open a new topic for your issue. Thanks.
