Recap on the TensorFlow object detection API on TX2
Hi users,
I just wanted to summarize developers' experience and share some tips about the TensorFlow object detection API on the TX2.
At the moment I am only talking about what is actually doable and what is not, with a focus on inference rather than training.
Based on browsing the forum, my own experience, and other resources, this is what I have understood.
Let's consider only the available pretrained frozen graphs.

The TF object detection API can be used for inference with the ssd_mobilenet_v1 network architecture at approximately 5-8 fps. The Faster R-CNN ResNet pretrained models seem to cause OOM errors (in my experience, all of them). Was anybody able to run one of the Faster R-CNN ResNet models? If so, could you share some tips?
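For reference, this is roughly the kind of script I use to time the pretrained frozen graph. It is only a minimal sketch: the model path, input size, and the allow_growth memory option are my own placeholder choices, while the tensor names are the ones the object detection API exports by default.

import numpy as np
import tensorflow as tf

# Load the frozen graph exported by the object detection API
graph_def = tf.GraphDef()
with tf.gfile.GFile("frozen_inference_graph.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

graph = tf.Graph()
with graph.as_default():
    tf.import_graph_def(graph_def, name="")

# Keep TensorFlow from grabbing all of the TX2's memory up front;
# the 8 GB on the board is shared between the CPU and the GPU.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

with tf.Session(graph=graph, config=config) as sess:
    image = np.zeros((1, 300, 300, 3), dtype=np.uint8)  # placeholder frame
    boxes, scores, classes, num = sess.run(
        ["detection_boxes:0", "detection_scores:0",
         "detection_classes:0", "num_detections:0"],
        feed_dict={"image_tensor:0": image})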

Second question.
Does the conversion to TensorRT also have an impact on memory usage?
What I mean is the following: even if I am not able to run a specific network architecture with the TF object detection API on the TX2 due to OOM errors, I could potentially train the model on a different, more powerful machine, export the trained graph to UFF format (through the Python API), and then transfer it to the TX2, where it can be imported using the C++ API for inference (a sketch of the export step is below). Does that sound correct? Performance would certainly benefit from a TF-to-TensorRT conversion, but I am not sure about memory usage.
I am considering this as an option because I've noticed a FasterRcnnResnet50 network in jetson-inference's DetectNet.
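The export step I have in mind would be something like the following sketch. The frozen graph path and the output node name are placeholders, and I have not verified that every op in the detection graphs actually converts.

import uff

# Convert a TensorFlow frozen graph to UFF on the training machine;
# the resulting .uff file is what would be parsed on the TX2.
uff.from_tensorflow_frozen_model(
    "frozen_inference_graph.pb",
    output_nodes=["detection_boxes"],                 # placeholder output node
    output_filename="frozen_inference_graph.uff")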

Thanks for your contribution!

#1
Posted 01/12/2018 04:41 PM   
Answer Accepted by Forum Admin
Hi,

Thanks for sharing.

We are also evaluating the TensorFlow object detection API.
We appreciate you sharing your experience with us.

Although TensorFlow can run ssd_mobilenet_v1 in GPU mode correctly, we find that the GPU utilization is quite low.
Do you also see this issue?
Could you share the tegrastats output when you run inference with ssd_mobilenet_v1?
sudo ./tegrastats


For your second question:
1. The workflow is correct. Our only concern is that we do not yet support the plugin (custom layer) API for UFF users.
If there is an unsupported layer in your model, there is no WAR (workaround) to run that layer with TensorRT; a quick way to check which op types your graph contains is sketched after point 2.

2. TensorRT supports FP16 mode, which can cut memory usage in half and should be extremely helpful for your use case (see the sketch below).
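As a rough illustration of the FP16 point, the snippet below shows how FP16 mode can be requested when building an engine from a UFF file with the TensorRT Python API of later releases. The input/output names, shapes, and workspace size are placeholders, and on the TX2 itself you would typically do the equivalent through the C++ builder, as you planned.

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Parse the UFF file exported on the training machine
builder = trt.Builder(TRT_LOGGER)
network = builder.create_network()
parser = trt.UffParser()
parser.register_input("image_tensor", (3, 300, 300))   # placeholder input
parser.register_output("detection_boxes")              # placeholder output
parser.parse("frozen_inference_graph.uff", network)

builder.max_batch_size = 1
builder.max_workspace_size = 1 << 28    # placeholder workspace budget
builder.fp16_mode = True                # request FP16 weights/kernels

# Build and serialize the engine so it can be deserialized at runtime
engine = builder.build_cuda_engine(network)
with open("ssd_fp16.engine", "wb") as f:
    f.write(engine.serialize())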
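On the unsupported-layer point in 1., one simple way to see which op types a frozen graph contains, so they can be compared against the layers the UFF parser supports, is a sketch along these lines (TF 1.x assumed, graph path is a placeholder):

import tensorflow as tf

# List the unique op types in a frozen graph so they can be checked
# against TensorRT's supported-layer list before converting to UFF.
graph_def = tf.GraphDef()
with tf.gfile.GFile("frozen_inference_graph.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

for op_type in sorted({node.op for node in graph_def.node}):
    print(op_type)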

Thanks.

#2
Posted 01/15/2018 03:29 AM   