Slow post-processing (up to a second per frame) for yolov3_onnx

Hi all,

I’m using the sample code for converting yolov3 for use in TensorRT. Sample code documentation can be found at https://docs.nvidia.com/deeplearning/sdk/tensorrt-sample-support-guide/index.html#yolov3_onnx

I realised that the post-processing step is extremely slow (up to a second per frame). The cause of the slowdown is this loop:

# E.g. in YOLOv3-608, there are three output tensors, which we associate with their
# respective masks. Then we iterate through all output-mask pairs and generate candidates
# for bounding boxes, their corresponding category predictions and their confidences:
boxes, categories, confidences = list(), list(), list()
for output, mask in zip(outputs_reshaped, self.masks):
    box, category, confidence = self._process_feats(output, mask)
    box, category, confidence = self._filter_boxes(box, category, confidence)
    boxes.append(box)
    categories.append(category)
    confidences.append(confidence)

which can be found in

yolov3_onnx/data_processing.py

Any suggestions on how to improve the post-processing speed?

Basically, this line in the function _process_feats() causes most of the slowdown:

box_class_probs = sigmoid_v(output_reshaped[..., 5:])

It uses np.vectorize, which is essentially a Python for loop. I think the sigmoid function can be reimplemented with np.exp() for speed, or you could use a Keras/PyTorch sigmoid instead.
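For example, a vectorized replacement might look like the sketch below (the array shape is a hypothetical one, roughly matching a reshaped YOLOv3 output head; the real shapes come from the sample's outputs_reshaped):

```python
import numpy as np

def sigmoid(x):
    # Vectorized sigmoid: operates on the whole array at once,
    # unlike np.vectorize, which calls a Python function per element.
    return 1.0 / (1.0 + np.exp(-x))

# Dummy data shaped like one reshaped YOLOv3 output (grid x grid x anchors x (5 + classes)):
output_reshaped = np.random.randn(19, 19, 3, 85).astype(np.float32)

# Replaces sigmoid_v(output_reshaped[..., 5:]) from the sample:
box_class_probs = sigmoid(output_reshaped[..., 5:])
```

Because the whole computation stays inside NumPy's compiled code, this avoids the per-element Python call overhead of np.vectorize.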

I am experiencing the same problem. I tried using np.exp() and it seems to have made it slower. I also tried importing keras.backend as K and changing np.vectorize(K.sigmoid), but that ran into some errors. Is this what you were referring to when recommending those improvements?

I sped up this part by using an alternative sigmoid function, but it's still not fast enough compared to the original YOLOv3.

Check out https://stackoverflow.com/questions/10732027/fast-sigmoid-algorithm/15703984#15703984 for some of the things you can try.
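One of the approximations from that thread is x / (1 + |x|), rescaled into the (0, 1) range. A sketch (how much the lower accuracy matters depends on your confidence threshold):

```python
import numpy as np

def fast_sigmoid(x):
    # Cheap sigmoid approximation: x / (1 + |x|) maps to (-1, 1),
    # so rescale to (0, 1). Avoids the exp() call entirely.
    return 0.5 * (x / (1.0 + np.abs(x))) + 0.5

x = np.linspace(-6.0, 6.0, 5)
probs = fast_sigmoid(x)
```

It is monotonic and keeps 0.5 at x = 0, so thresholding behaviour is similar to the true sigmoid, but the tail values differ, so you may need to retune your score threshold.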

Thanks @hengchenkim! I did the same thing after fixing the issues I had with the keras.backend.sigmoid function. There was an improvement, but like you, I still expect it to be much faster, especially given how TensorRT is advertised.

Anybody else have any suggestions to improve speed even further?

I got a 10x speedup by using np.exp(), and a bit more with scipy.special.expit. Still not as fast as other YOLOv3 implementations, as @hengchenkim said. To improve it further with Keras, I think you can modify this repo: https://github.com/qqwweee/keras-yolo3, replacing yolo_body with the TensorRT model and implementing a post-processing function similar to yolo_eval in https://github.com/qqwweee/keras-yolo3/blob/master/yolo3/model.py
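For reference, scipy.special.expit is a drop-in vectorized logistic sigmoid implemented in compiled code (a sketch; the array shape is a hypothetical reshaped YOLOv3 head, not taken from the sample):

```python
import numpy as np
from scipy.special import expit

# Dummy data shaped like one reshaped YOLOv3 output head:
output_reshaped = np.random.randn(19, 19, 3, 85).astype(np.float32)

# expit computes 1 / (1 + exp(-x)) elementwise, entirely in C:
box_class_probs = expit(output_reshaped[..., 5:])
```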