Hi everyone.
I am developing an app using DeepStream SDK 4. I have a single image of a vehicle's front view, and by repeating that single image I can make a 30-second video stream. My goal is to use YOLOv2-tiny and YOLOv3-tiny in cascade: first detect the license plate, and then find all characters on that plate. So I use the following setup:
[application]
enable-perf-measurement=1
perf-measurement-interval-sec=5
#gie-kitti-output-dir=streamscl
[tiled-display]
enable=1
rows=1
columns=1
width=1280
height=720
gpu-id=0
#(0): nvbuf-mem-default - Default memory allocated, specific to particular platform
#(1): nvbuf-mem-cuda-pinned - Allocate Pinned/Host cuda memory, applicable for Tesla
#(2): nvbuf-mem-cuda-device - Allocate Device cuda memory, applicable for Tesla
#(3): nvbuf-mem-cuda-unified - Allocate Unified cuda memory, applicable for Tesla
#(4): nvbuf-mem-surface-array - Allocate Surface Array memory, applicable for Jetson
nvbuf-memory-type=4
[source0]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI
type=3
#uri=file:/opt/nvidia/deepstream/deepstream-4.0/samples/streams/out1.h264
uri=file:/opt/nvidia/deepstream/deepstream-4.0/samples/streams/Grill-Haydari1519.h264
num-sources=1
gpu-id=0
# (0): memtype_device - Memory type Device
# (1): memtype_pinned - Memory type Host Pinned
# (2): memtype_unified - Memory type Unified
cudadec-memtype=0
[sink0]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File
type=2
sync=0
source-id=0
gpu-id=0
nvbuf-memory-type=0
[osd]
enable=1
gpu-id=0
border-width=1
text-size=15
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Serif
show-clock=0
clock-x-offset=800
clock-y-offset=820
clock-text-size=12
clock-color=1;0;0;0
nvbuf-memory-type=0
[streammux]
gpu-id=0
##Boolean property to inform muxer that sources are live
live-source=0
batch-size=1
##time out in usec, to wait after the first buffer is available
##to push the batch even if the complete batch is not formed
batched-push-timeout=40000
## Set muxer output width and height
width=1920
height=1080
##Enable to maintain aspect ratio wrt source, and allow black borders, works
##along with width, height properties
enable-padding=0
nvbuf-memory-type=0
# config-file property is mandatory for any gie section.
# Other properties are optional and if set will override the properties set in
# the infer config file.
[primary-gie]
enable=1
gpu-id=0
model-engine-file=model_b1_fp16_plate_alone.engine
labelfile-path=Labels_plate_alone.txt
batch-size=1
#Required by the app for OSD, not a plugin property
bbox-border-color0=1;0;1;1
bbox-border-color1=0;0;1;1
bbox-border-color2=0;1;1;1
bbox-border-color3=0;1;1;1
gie-unique-id=2
#operate-on-gie-id=1
#process-mode=2
#gie-mode=2
#is-classifier=1
#classifier-async-mode=1
nvbuf-memory-type=0
config-file=config_infer_primary_yoloV2_tiny_plate_alone.txt
[secondary-gie]
enable=1
gpu-id=0
model-engine-file=model_b1_fp32_ocr.engine
labelfile-path=labels_ocr.txt
batch-size=1
#Required by the app for OSD, not a plugin property
bbox-border-color0=1;0;1;1
bbox-border-color1=0;0;1;1
bbox-border-color2=0;1;1;1
bbox-border-color3=0;1;1;1
gie-unique-id=3
operate-on-gie-id=2
#process-mode=2
#gie-mode=2
#is-classifier=1
#classifier-async-mode=1
nvbuf-memory-type=0
config-file=config_infer_primary_yoloV3_tiny.txt
[ds-example]
enable=1
processing-width=10
processing-height=10
full-frame=0
unique-id=15
gpu-id=0
[tests]
file-loop=0
This setup works well: it detects the license plate and draws a bounding box around it, and then finds the characters. However, it misclassifies the characters, as shown in the image below:
https://drive.google.com/open?id=1KOBvgk0fltikCW7GH_44qEYSRGlyyvBG
Only the numbers "2" and "7" have been correctly classified, and the result is frustrating! (In case you are not familiar with Arabic numerals, they are all correctly classified in the next image.)
Now I test another scenario. By modifying GstDsExample in the above setup, I save the detected license plate (the plate detected by the YOLOv2-tiny network in the video of the car's front-view image) as a JPG file. Then I make a 30-second video by repeating that single image across multiple frames, so now I have a 30-second video of a license plate.
Then I use the setup below to detect characters in this license-plate video. Importantly, note that I use exactly the same YOLOv3-tiny network that I used in the previous setup:
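The cropping logic I added to GstDsExample amounts to cutting the detector's bounding box out of the full frame. Below is a minimal, hypothetical sketch of that step in plain NumPy; the real plugin reads the bbox from the object metadata attached by the primary GIE and works on NvBufSurface buffers, so the function name and array-based frame here are illustration only:

```python
import numpy as np

def crop_detection(frame, left, top, width, height):
    """Crop a detected object (e.g. the license plate) out of a full frame.

    frame: H x W x C image array; left/top/width/height are the bbox
    values the primary detector reported for the object.
    """
    h, w = frame.shape[:2]
    # Clamp the bbox to the frame so rounding errors cannot produce
    # an out-of-range slice.
    x0 = max(0, int(left))
    y0 = max(0, int(top))
    x1 = min(w, int(left + width))
    y1 = min(h, int(top + height))
    return frame[y0:y1, x0:x1].copy()

# Example: cut a 200x80 plate region out of a 1080p frame.
frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
plate = crop_detection(frame, left=860.0, top=700.0, width=200.0, height=80.0)
print(plate.shape)  # (80, 200, 3)
```

The crop can then be written out as a JPG (e.g. with OpenCV, which GstDsExample already links against) and repeated frame-by-frame to build the test video.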
[application]
enable-perf-measurement=1
perf-measurement-interval-sec=5
#gie-kitti-output-dir=streamscl
[tiled-display]
enable=1
rows=1
columns=1
width=1280
height=720
#width=640
#height=480
gpu-id=0
#(0): nvbuf-mem-default - Default memory allocated, specific to particular platform
#(1): nvbuf-mem-cuda-pinned - Allocate Pinned/Host cuda memory, applicable for Tesla
#(2): nvbuf-mem-cuda-device - Allocate Device cuda memory, applicable for Tesla
#(3): nvbuf-mem-cuda-unified - Allocate Unified cuda memory, applicable for Tesla
#(4): nvbuf-mem-surface-array - Allocate Surface Array memory, applicable for Jetson
nvbuf-memory-type=0
[source0]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI
type=3
uri=file://../../samples/streams/image006.h264
#uri=file://../../samples/streams/out1.h264
num-sources=1
gpu-id=0
# (0): memtype_device - Memory type Device
# (1): memtype_pinned - Memory type Host Pinned
# (2): memtype_unified - Memory type Unified
cudadec-memtype=0
[sink0]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File
type=2
sync=0
source-id=0
gpu-id=0
nvbuf-memory-type=0
[osd]
enable=1
gpu-id=0
border-width=1
text-size=15
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Serif
show-clock=0
clock-x-offset=800
clock-y-offset=820
clock-text-size=12
clock-color=1;0;0;0
nvbuf-memory-type=0
[streammux]
gpu-id=0
##Boolean property to inform muxer that sources are live
live-source=0
batch-size=1
##time out in usec, to wait after the first buffer is available
##to push the batch even if the complete batch is not formed
batched-push-timeout=40000
## Set muxer output width and height
width=1280
height=720
##Enable to maintain aspect ratio wrt source, and allow black borders, works
##along with width, height properties
enable-padding=0
nvbuf-memory-type=0
# config-file property is mandatory for any gie section.
# Other properties are optional and if set will override the properties set in
# the infer config file.
[primary-gie]
enable=1
gpu-id=0
model-engine-file=model_b1_fp32_ocr.engine
labelfile-path=labels_ocr.txt
batch-size=1
#Required by the app for OSD, not a plugin property
bbox-border-color0=1;0;0;1
bbox-border-color1=0;1;1;1
bbox-border-color2=0;0;1;1
bbox-border-color3=0;1;0;1
gie-unique-id=1
nvbuf-memory-type=0
config-file=config_infer_primary_yoloV3_tiny.txt
[tests]
file-loop=0
However, this time all characters (all numerals we trained the network on) are correctly detected, as shown in the following image:
https://drive.google.com/open?id=1sj3NUy0NW3hgJAZaBQNhb7H48x1H5vW8
So basically, it seems like cascading two networks in the DeepStream SDK is hurting detection accuracy!
In my first setup I detected the license plate using YOLOv2-tiny as the primary GIE, then detected the characters in it using YOLOv3-tiny as the secondary GIE, and the characters were classified with a high error rate.
In the second scenario I used the same YOLOv3-tiny for character recognition, and the license plate used was the one detected in the first setup (I just saved the detected plate by modifying GstDsExample and made a video out of it); this time, however, all the characters were correctly classified!
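For completeness, these are the kinds of Gst-nvinfer config properties that control how the secondary GIE preprocesses the cropped plate before inference, and where I suspect the two scenarios might differ. The values below are placeholders for illustration, not my working config:

[property]
# Operate on objects from the upstream detector, not on full frames.
process-mode=2
# Scale the plate crop to the network input without distorting it.
maintain-aspect-ratio=1
# Skip crops too small for the OCR network to resolve characters.
input-object-min-width=20
input-object-min-height=10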
My question, precisely, is: why does this happen, and how should I solve it so that my secondary network for character recognition works well when used in cascade?
Thanks