Using TF-TRT to convert MobileNet / SSDLite model gives errors

Hi,

I am trying to convert a TensorFlow MobileNet graph (*.pb file) to TensorRT via the TF-TRT module. However, I get the following error:

`NotFoundError: No attr named 'shape' in NodeDef:`

I am running the following:
Jetson TX2
Jetpack 3.3
TensorRT 4.0.2.0-1+cuda9.0
tensorflow-gpu==1.9.0+nv18.8

Here is my code:

import tensorflow as tf
import tensorflow.contrib.tensorrt as trt
import numpy as np
import PIL
from timeit import default_timer as timer
from tqdm import tqdm

'''
This script performs inference on a pure TensorFlow (TF) model vs. a converted TensorRT model.
It assumes you have a frozen TF graph in the form of a *.pb file.
The specific model is MobileNetV2
(https://github.com/tensorflow/models/tree/master/research/slim/nets/mobilenet).
The frozen graph can be downloaded from: https://storage.googleapis.com/mobilenet_v2/checkpoints/mobilenet_v2_1.4_224.tgz

The image file 'panda.jpg' is downloaded from:
https://upload.wikimedia.org/wikipedia/commons/f/fe/Giant_Panda_in_Beijing_Zoo_1.JPG
'''

## File paths (needs to be set)
pb_path = "mobilenet_v2_1.0_224_frozen.pb"    # frozen TF graph
input_name = 'input'                                    # input layer of the graph
output_names = ['MobilenetV2/Predictions/Reshape_1']    # output layer of the graph. (Can be multiple)
img_path = 'data/panda.jpg'     # image to perform inference on (from https://upload.wikimedia.org/wikipedia/commons/f/fe/Giant_Panda_in_Beijing_Zoo_1.JPG)

## Load image
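# (resize to 224x224 and scale pixel values from [0, 255] to roughly [-1, 1], as MobileNet expects)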
img_raw = np.array(PIL.Image.open(img_path).resize((224, 224))).astype(np.float) / 128 - 1
img = img_raw.reshape(1, 224,224, 3)

## Load the frozen TF graph from the .pb file
with tf.gfile.GFile(pb_path, 'rb') as f:
    frozen_graph = tf.GraphDef()
    frozen_graph.ParseFromString(f.read())

##### Run w/ TensorRT
trt_graph = trt.create_inference_graph(
        input_graph_def=frozen_graph,
        outputs=output_names,
        max_batch_size=1,
        precision_mode='FP16',  # 'INT8'/'FP16'
)

tf_config = tf.ConfigProto()
tf_config.gpu_options.allow_growth = True
tf_sess = tf.Session(config=tf_config)
tf.import_graph_def(trt_graph, name='')
tf_input = tf_sess.graph.get_tensor_by_name(input_name + ':0')
tf_output = tf_sess.graph.get_tensor_by_name(output_names[0] + ':0')

output = tf_sess.run(tf_output, feed_dict={tf_input: img + np.random.random(img.shape)/10})

And here is the full error trace:

2018-11-09 16:43:48.943803: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:864] ARM64 does not support NUMA - returning NUMA node zero
2018-11-09 16:43:48.943953: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 0
---------------------------------------------------------------------------
NotFoundError                             Traceback (most recent call last)
/home/nvidia/Projects/SANATA/trt_test.py in <module>()
     71             outputs=output_names,
     72             max_batch_size=1,
---> 73             precision_mode='FP16',  # 'INT8'/'FP16'
     74     )
     75

/home/nvidia/Projects/SANATA/.venv/lib/python3.5/site-packages/tensorflow/contrib/tensorrt/python/trt_convert.py in create_inference_graph(input_graph_def, outputs, max_batch_size, max_workspace_size_bytes, precision_mode, minimum_segment_size)
    113     # pylint: disable=protected-access
    114     raise _impl._make_specific_exception(None, None, ";".join(msg[1:]),
--> 115                                          int(msg[0]))
    116     # pylint: enable=protected-access
    117   output_graph_def = graph_pb2.GraphDef()

NotFoundError: No attr named 'shape' in NodeDef:
         [[Node: input = Placeholder[dtype=DT_FLOAT]()]] for 'input' (op: 'Placeholder') with input shapes:

Note that I get essentially the same error when using the SSDLite graph from http://download.tensorflow.org/models/object_detection/ssdlite_mobilenet_v2_coco_2018_05_09.tar.gz.
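From the message, it looks like the `input` Placeholder in the frozen graph carries no `shape` attribute. I am guessing that explicitly setting a shape on that node before conversion might work around it, but I have not verified this; a minimal sketch (the 1x224x224x3 shape is just MobileNet's input size):

# Hypothetical workaround (untested): give the input Placeholder an
# explicit shape attr so the converter can read it.
for node in frozen_graph.node:
    if node.op == 'Placeholder' and node.name == 'input':
        node.attr['shape'].shape.CopyFrom(
            tf.TensorShape([1, 224, 224, 3]).as_proto())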

How can I fix this?

Thanks,
Roman

Hi

You can check our tutorial for converting ssd_mobilenet_v1_coco to a TF-TRT model:
https://github.com/NVIDIA-AI-IOT/tf_trt_models#object-detection

The workflow should look like this:
Build a TensorRT / Jetson compatible graph:

from tf_trt_models.detection import build_detection_graph

frozen_graph, input_names, output_names = build_detection_graph(
    config=config_path,
    checkpoint=checkpoint_path
)

Optimize with TensorRT:

import tensorflow.contrib.tensorrt as trt

trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=output_names,
    max_batch_size=1,
    max_workspace_size_bytes=1 << 25,
    precision_mode='FP16',
    minimum_segment_size=50
)
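
After optimization you can import and run the graph in a regular TensorFlow session, roughly like this (a minimal sketch only; input_names/output_names come from build_detection_graph above, and the 300x300 uint8 dummy input is an assumption based on typical SSD models):

import numpy as np
import tensorflow as tf

tf_config = tf.ConfigProto()
tf_config.gpu_options.allow_growth = True
tf_sess = tf.Session(config=tf_config)
tf.import_graph_def(trt_graph, name='')

# Look up the input and output tensors by name
tf_input = tf_sess.graph.get_tensor_by_name(input_names[0] + ':0')
tf_outputs = [tf_sess.graph.get_tensor_by_name(name + ':0')
              for name in output_names]

# Dummy image just to exercise the graph (adjust size/dtype to your model)
dummy_image = np.zeros((1, 300, 300, 3), dtype=np.uint8)
results = tf_sess.run(tf_outputs, feed_dict={tf_input: dummy_image})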

Thanks.

Thank you @aastall for the reference. That was exactly what I was looking for.

For the record, I compared inference speed between the pure TensorFlow and TF-TRT graphs for the MobileNetV1 and MobileNetV2 networks. I used a 640x480 image for both tests and ran

sudo ~/jetson_clocks.sh

prior to the tests.

Code for the MobileNetV1 benchmark is here, and for the MobileNetV2 benchmark is here.
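
Roughly speaking, the measurement is a warm-up run followed by timing repeated sess.run calls; a simplified sketch (iteration count arbitrary, reusing the tf_sess/tf_input/tf_output/img names from my first snippet):

from timeit import default_timer as timer

n_runs = 50  # arbitrary, for illustration
tf_sess.run(tf_output, feed_dict={tf_input: img})  # warm-up

start = timer()
for _ in range(n_runs):
    tf_sess.run(tf_output, feed_dict={tf_input: img})
elapsed_ms = (timer() - start) / n_runs * 1000
print('Average inference time: %.1f ms' % elapsed_ms)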

The results were:

MobileNetV1:
Pure TensorFlow: 119ms
TF-TRT: 89ms
(~33% speedup)

MobileNetV2:
Pure TensorFlow: 203ms
TF-TRT: 203ms
(0% speedup)

I am not sure why MobileNetV2 is so inefficient here and sees no speedup from TF-TRT. Any help would be welcome.

For the record, this Caffe implementation of MobileNetV2 took 86ms on the Jetson using NVCaffe (not TRT optimized).