Realised that the post-processing step is extremely slow (up to a second for one frame). The cause of the slowdown is this loop:
# E.g. in YOLOv3-608, there are three output tensors, which we associate with their
# respective masks. Then we iterate through all output-mask pairs and generate candidates
# for bounding boxes, their corresponding category predictions and their confidences:
boxes, categories, confidences = list(), list(), list()
for output, mask in zip(outputs_reshaped, self.masks):
box, category, confidence = self._process_feats(output, mask)
box, category, confidence = self._filter_boxes(box, category, confidence)
boxes.append(box)
categories.append(category)
confidences.append(confidence)
which can be found in
yolov3_onnx/data_processing.py
Any suggestions on how to improve the post-processing speed?
It uses np.vectorize, which is essentially a Python for loop. I think the sigmoid function can be replaced with np.exp() for speed, or you could use the Keras/PyTorch sigmoid instead.
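A minimal sketch of what that replacement looks like, assuming the sample applies a scalar sigmoid elementwise via np.vectorize (the array shape below is just an illustrative YOLOv3-like output, not taken from the repo):

```python
import numpy as np

def sigmoid(x):
    # scalar sigmoid, as typically wrapped with np.vectorize
    return 1.0 / (1.0 + np.exp(-x))

# slow: np.vectorize calls the Python function once per element
sigmoid_slow = np.vectorize(sigmoid)

# fast: the same expression applied to the whole array at once,
# letting np.exp run as a single vectorized C loop
def sigmoid_fast(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.random.randn(255, 19, 19).astype(np.float32)
assert np.allclose(sigmoid_slow(x), sigmoid_fast(x), atol=1e-6)
```

The two produce the same values; only the per-element Python call overhead goes away.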
I am experiencing the same problem. I tried using np.exp() and it seems to have made it slower. I also tried to do import keras.backend as K and swap np.vectorize for K.sigmoid, but that ran into some errors. Was this what you were referring to when recommending those improvements?
Thanks @hengchenkim! I did the same thing after fixing the issues I had using the Keras.backend.sigmoid function. There was an improvement, but like you, I still expect it to be much faster, especially given how TensorRT is advertised.
Anybody else have any suggestions to improve speed even further?
I got a 10x speedup by using np.exp(), and a bit more with scipy.special.expit. Still not as fast as other YOLOv3 implementations, as @hengchenkim said. To improve it further using Keras, I think you can modify this repo: https://github.com/qqwweee/keras-yolo3 by replacing yolo_body with the TensorRT model and implementing a post-processing function similar to yolo_eval in https://github.com/qqwweee/keras-yolo3/blob/master/yolo3/model.py
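For reference, this is roughly how I measured the speedup (a sketch, not the sample's actual benchmark; the array shape is an illustrative stand-in for the three YOLOv3 output tensors):

```python
import timeit
import numpy as np
from scipy.special import expit  # numerically stable vectorized sigmoid

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# one array standing in for the reshaped detector outputs
x = np.random.randn(3, 255, 19, 19).astype(np.float32)

vectorized = np.vectorize(sigmoid)          # per-element Python calls
t_vec   = timeit.timeit(lambda: vectorized(x), number=3)
t_exp   = timeit.timeit(lambda: sigmoid(x),    number=3)
t_expit = timeit.timeit(lambda: expit(x),      number=3)

print(f"np.vectorize: {t_vec:.3f}s  np.exp: {t_exp:.3f}s  expit: {t_expit:.3f}s")
```

On my setup the np.vectorize version is slower by well over an order of magnitude; expit and the direct np.exp expression are close, with expit slightly ahead for large arrays.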