What optimizations do I need to use? And how do I translate the GoogleNet output into a bounding box? It is unclear to me from just the .prototxt and .caffemodel files how to do this. Could somebody share an example program?
I think you misunderstood me. In the link I posted I am referring to the following excerpt.
“In contrast to core image recognition, object detection provides bounding locations within the image in addition to the classification, making it useful for tracking and obstacle avoidance. The Multimedia API sample network is derived from GoogleNet with additional layers for extracting the bounding boxes. At 960×540 half-HD input resolution, the object detection network captures at higher resolution than the original GoogleNet, while retaining real-time performance on Jetson TX1 using TensorRT.”
I attended the Jetson dev meetup a few months back, and I saw this net being deployed in real time at 30 fps on video. I would like to test out this functionality for myself. Can you please guide me?
For a car detector, please follow this page to train a network for your own use case.
For better performance, the page you mentioned uses TensorRT and MMAPI.
Both can be installed with JetPack, and the included samples can guide you in using them.
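To give a concrete starting point for the bounding-box question: in the jetson-inference repo, the conversion from the network's coverage/bbox output blobs into box coordinates happens inside the detectNet class. The sketch below follows the older detectNet API from that repo (Create(), Detect(), and the loadImageRGBA / cudaAllocMapped helpers); the exact signatures vary between releases, so treat the names as assumptions and check detectNet.h in your checkout before copying it.

    #include <cstdio>
    #include "detectNet.h"
    #include "loadImage.h"
    #include "cudaMappedMemory.h"

    int main(int argc, char** argv)
    {
        // load a detection network; with no extra arguments this picks one of the
        // bundled models, or command-line flags can point it at your own
        // DIGITS-trained prototxt/caffemodel
        detectNet* net = detectNet::Create(argc, argv);
        if (!net)
            return 1;

        // load a test image into CUDA-mapped (zero-copy) memory
        float* imgCPU  = NULL;
        float* imgCUDA = NULL;
        int width = 0, height = 0;
        if (!loadImageRGBA("test.jpg", (float4**)&imgCPU, (float4**)&imgCUDA, &width, &height))
            return 1;

        // output buffers for the detected bounding boxes and coverage values
        const uint32_t maxBoxes = net->GetMaxBoundingBoxes();
        const uint32_t classes  = net->GetNumClasses();

        float* bbCPU = NULL;  float* bbCUDA = NULL;
        float* cfCPU = NULL;  float* cfCUDA = NULL;
        if (!cudaAllocMapped((void**)&bbCPU, (void**)&bbCUDA, maxBoxes * 4 * sizeof(float)) ||
            !cudaAllocMapped((void**)&cfCPU, (void**)&cfCUDA, maxBoxes * classes * sizeof(float)))
            return 1;

        // Detect() runs the network and turns the raw coverage/bbox blobs into a
        // flat array of box corners in image coordinates
        int numBoxes = maxBoxes;
        if (net->Detect(imgCUDA, width, height, bbCPU, &numBoxes, cfCPU))
        {
            for (int n = 0; n < numBoxes; n++)
            {
                const float* bb = bbCPU + n * 4;   // x1, y1, x2, y2
                printf("box %d  (%.0f, %.0f) -> (%.0f, %.0f)\n", n, bb[0], bb[1], bb[2], bb[3]);
            }
        }

        delete net;
        return 0;
    }

The detectnet-console and detectnet-camera samples in that repo do essentially this, plus drawing the overlay on screen.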
I believe the detectNet demos in the repo you linked use TensorRT and MMAPI and do not run anywhere close to 30 fps. How was it that the demo being shown off by the Jetson/NVIDIA team at the Jetson meetup I went to ran at 30 fps on an HD video stream?
DetectNet can reach about 11 fps and gives you an overview of how to work with DIGITS/TensorRT/MMAPI.
If you care more about performance, it's recommended to replace GoogleNet, which is embedded in DetectNet, with a more lightweight model.
We provide a detailed tutorial on how to train and deploy your own model quickly with our GPUs.
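If you do retrain DetectNet in DIGITS with a lighter feature extractor, the resulting snapshot can be loaded the same way. The sketch below assumes the older detectNet::Create() overload that takes prototxt/caffemodel paths (argument order and defaults differ between versions of the repo; the file names are placeholders) and simply times Detect() to see what frame rate the replacement model actually sustains.

    #include <chrono>
    #include <cstdio>
    #include "detectNet.h"
    #include "loadImage.h"
    #include "cudaMappedMemory.h"

    int main()
    {
        // placeholder paths for a DIGITS-trained DetectNet whose GoogleNet backbone
        // has been swapped for a lighter feature extractor
        detectNet* net = detectNet::Create("deploy.prototxt", "snapshot_iter_xxxx.caffemodel");
        if (!net)
            return 1;

        float* imgCPU = NULL;  float* imgCUDA = NULL;
        int width = 0, height = 0;
        if (!loadImageRGBA("test.jpg", (float4**)&imgCPU, (float4**)&imgCUDA, &width, &height))
            return 1;

        const uint32_t maxBoxes = net->GetMaxBoundingBoxes();
        float* bbCPU = NULL;  float* bbCUDA = NULL;
        if (!cudaAllocMapped((void**)&bbCPU, (void**)&bbCUDA, maxBoxes * 4 * sizeof(float)))
            return 1;

        // time a single Detect() call as a rough per-frame cost estimate
        int numBoxes = maxBoxes;
        const auto t0 = std::chrono::steady_clock::now();
        net->Detect(imgCUDA, width, height, bbCPU, &numBoxes);
        const auto t1 = std::chrono::steady_clock::now();

        const double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
        printf("Detect(): %.1f ms per frame (~%.1f fps), %d boxes\n", ms, 1000.0 / ms, numBoxes);

        delete net;
        return 0;
    }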
Would you be able to offer any hints as to how to optimize GoogleNet specifically?
11 fps on DetectNet is a significant dropoff in performance compared to 30 fps GoogleNet. One is barely real time and pretty much unusable in any application, while the other is extremely powerful. That is a pretty big discrepancy.
You can set --gie-proc-interval to 3, which forces the application to run prediction every 3 frames.
This will give you a 30 fps display rate with a 10 fps detection rate.
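If the sample you are running does not expose that flag, the same interval trick is easy to reproduce in your own capture loop. This is only a sketch of the idea, not the MMAPI sample's actual code: captureFrameRGBA and renderOverlay are hypothetical stand-ins for your camera capture and display code, and the detectNet calls follow the same assumed API as in the earlier sketch.

    #include <vector>
    #include "detectNet.h"

    // hypothetical helpers standing in for the MMAPI camera capture / render code
    bool captureFrameRGBA(float** imgCUDA, int* width, int* height);
    void renderOverlay(float* imgCUDA, int width, int height, const float* boxes, int numBoxes);

    void detectionLoop(detectNet* net)
    {
        const int procInterval = 3;                  // same idea as --gie-proc-interval 3
        const int maxBoxes     = net->GetMaxBoundingBoxes();

        std::vector<float> boxes(maxBoxes * 4);      // most recent detections (x1, y1, x2, y2)
        int numBoxes   = 0;
        int frameCount = 0;

        float* imgCUDA = NULL;
        int width = 0, height = 0;

        while (captureFrameRGBA(&imgCUDA, &width, &height))
        {
            // only run the (expensive) network every procInterval frames...
            if (frameCount % procInterval == 0)
            {
                numBoxes = maxBoxes;
                net->Detect(imgCUDA, width, height, boxes.data(), &numBoxes);
            }

            // ...but draw every frame, re-using the latest boxes, so the display
            // stays at 30 fps while detection itself runs at ~10 fps
            renderOverlay(imgCUDA, width, height, boxes.data(), numBoxes);
            frameCount++;
        }
    }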
Thanks, but that doesn't really answer my question. These examples run at a 10 fps detection rate. My question was how to achieve a 30 fps detection rate like the example I referred to from the dev meetup - or at least hints other than "use TensorRT and the Multimedia API" or "use a shallower network", as all the examples you've shown do exactly that but still don't run in real time.