I manually created a deep network with the TensorRT Python API and saved the engine to a *.engine file. I want to load this engine in C++, but I can't find the function needed to load a saved engine file. Is this possible?
Note that the engine should be created on the actual platform - Jetson TX1 - because TensorRT runs device-specific profiling during the optimization phase. Since the Python API isn't supported on Jetson at this time, it seems you are creating the optimized engine on a different platform (such as a PC with another GPU).
Assuming I have built a TensorRT engine from my frozen TensorFlow model, how can I load the engine and run inference in C++, like this Python example:
from tensorrt.lite import Engine
from tensorrt.infer import LogSeverity
import tensorrt

# Create a runtime engine from plan file using TensorRT Lite API
engine_single = Engine(PLAN="keras_vgg19_b1_FP32.engine",
                       postprocessors={"dense_2/Softmax": analyze})

images_trt, images_tf = load_and_preprocess_images()

results = []
for image in images_trt:
    result = engine_single.infer(image)  # Single function for inference
    results.append(result)
You shouldn’t be running an optimized TensorRT engine that was frozen from Python on another machine, because TensorRT performs device-specific profiling and optimizations when building the TensorRT engine. See my quote from above:
What should happen on the Jetson is loading of the UFF file, as in the samples; the UFF is converted from the frozen .pb on another machine with Python. The TensorRT engine is then still created properly for the Jetson TX1's GPU. You can copy the serialized TensorRT engine between different Jetson TX1 boards (if you are moving to Jetson TX2, you should re-create the engine because it's a different GPU).
Agreed. I have read that it is necessary to build on the TX1 if we want to perform inference on a TX1.
So, assuming I build the engine on the TX1, how can I run inference with C++ code, like the very simple Python example I posted?
I don’t code in C++, but I do code in python. The issue is that a user only uses C++ and I need to find an example to do inference in C++ like the very simple example of the python code above. I can’t seem to find an example and can’t read C++ well enough to read the docs… :/
If the networks you are using are for image recognition (Alexnet, Googlenet, Resnet, etc.), object detection (DetectNet), or segmentation (Segnet, FCN-Alexnet), then you may be able to use or adapt these higher-level samples for your purposes:
I have a very simple TensorFlow model with one input and one output: "prefix/inputs" and "prefix/yhat". I really just need to create a UFF file from the frozen graph to send to someone using a TX1/2. The intention is to have them build the TensorRT engine on the TX1/2 to run inference on video in real time. (The engine must be built on the hardware it will run inference on, but does building the UFF file require the same thing?)
I actually use Docker to train and freeze graphs. Is there a simple way to build a UFF model without grabbing all the TensorRT stuff? Or do you know of an image that has TensorRT built in, so that I can just attach the frozen graph and build a UFF file for the TX users?
Hi ljstrnadiii, Thank you very much for initiating this discussion.
I just started to use the Jetson TX2 device (JetPack3.2). I have created a UFF file from a pre-trained tensorflow model (GitHub - argman/EAST: A tensorflow implementation of EAST text detector) using Python on the host machine (Intel). This model is based on resnet v1 50 used to detect the text segments from the images. I would like to create an optimized inference engine with the available UFF file using TensorRT C++ API without using any wrapper as in the ChatBot example.
@dusty_nv I checked the samples, but they assume that you create the engine within the same script. What is the way to load a .engine file that I have on disk? nvinfer1::IRuntime::deserializeCudaEngine asks for the memory that holds the engine, and I am unsure of the correct way to read the .engine file into memory.
That codebase checks whether the cached engine already exists on disk and loads it if it does; otherwise it runs the TensorRT optimizations and then saves the engine for next time.
Could you check whether the serialized engine file is intact (not corrupted)?
It would also help if you could run the jetson_inference sample to check that everything is good in your environment.