Extremely long time to load TRT-optimized frozen TF graphs

Hello,

Does anyone experience extremely long load times for TensorFlow frozen graphs optimized with TensorRT? Non-optimized ones load quickly, but loading the optimized ones takes over 10 minutes with the very same code:

trt_graph_def = tf.GraphDef()
with tf.gfile.GFile(pb_path, 'rb') as pf:
    trt_graph_def.ParseFromString(pf.read())

I'm on a Drive PX 2 device, with TensorFlow 1.12.0, CUDA 9.2 and TensorRT 4.1.1.
I suspect protobuf, so here's its configuration:

$ dpkg -l | grep protobuf
ii libmirprotobuf3:arm64 0.26.3+16.04.20170605-0ubuntu1.1 arm64 Display server for Ubuntu - RPC definitions
ii libprotobuf-dev:arm64 2.6.1-1.3 arm64 protocol buffers C++ library (development files)
ii libprotobuf-lite9v5:arm64 2.6.1-1.3 arm64 protocol buffers C++ library (lite version)
ii libprotobuf9v5:arm64 2.6.1-1.3 arm64 protocol buffers C++ library
ii protobuf-compiler 2.6.1-1.3 arm64 compiler for protocol buffer definition files

$ pip3 freeze | grep protobuf
protobuf==3.6.1

Here’s the way I convert non-optimized models to TRT ones:

def get_frozen_graph(graph_file):
  """Read Frozen Graph file from disk."""
  with tf.gfile.FastGFile(graph_file, "rb") as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
  return graph_def
  
print("Load frozen graph from disk")

frozen_graph = get_frozen_graph(DATA_DIR + MODEL + '.pb')

print("Optimize the model with TensorRT")

trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=output_names,
    max_batch_size=1,
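    # 1 << 26 bytes = 64 MiB of scratch workspace for building TensorRT engines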
    max_workspace_size_bytes=1 << 26,
    precision_mode='FP16',
    minimum_segment_size=2
)

print("Write optimized model to the file")
with open(DATA_DIR + MODEL + '_fp16_trt.pb', 'wb') as f:
    f.write(trt_graph.SerializeToString())

What's actually going wrong? Do you have any hints on how to fix it? This makes debugging and running the code extremely annoying…

Hello,

This is not expected. How big are the original .pb files and the converted files? What is the pure TF frozen graph load time compared to the 10 minutes?

Thank you for looking into this!
Both files are pretty similar in size: about 67 MB for ssd_mobilenet_v2_coco from the model zoo. Exact sizes in bytes:

  • 69688296 - original model
  • 69219036 - optimized model

Pure TF model loads in 12 seconds.

I'm trying to build libprotobuf and python-protobuf from sources; let's see how that goes. Unfortunately, I'm facing some build issues with the latter, while libprotobuf went smoothly.

Hello,

To help us debug, can you share a small repro containing the original model, the optimized model, and the full conversion and loading source that demonstrates the performance you are seeing?

Sure thing, I will provide you with all needed pieces soon. Thanks for your support!

Ok, so here it is:

  • Fetch and extract the model from the model zoo:

    mkdir trt_test
    cd trt_test
    wget http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v2_coco_2018_03_29.tar.gz
    tar xzf ssd_mobilenet_v2_coco_2018_03_29.tar.gz --strip-components=1 -C ./ ssd_mobilenet_v2_coco_2018_03_29/frozen_inference_graph.pb
    mv frozen_inference_graph.pb ssd_mobilenet_v2_coco.pb

  • Run the build.py script to convert the model from the zoo into a TRT-optimized one:

    python3 build.py

    Script content:

    # build.py
    # The script to build TRT-optimized graph from a given non-optimized one
    
    import os
    import tensorflow.contrib.tensorrt as trt
    import tensorflow as tf
    
    DATA_DIR = './'
    MODEL = 'ssd_mobilenet_v2_coco'
    TRT_SUFFIX = '_fp16_trt'
    
    BOXES_NAME='detection_boxes'
    CLASSES_NAME='detection_classes'
    SCORES_NAME='detection_scores'
    NUM_DETECTIONS_NAME='num_detections'
    output_names = [BOXES_NAME, CLASSES_NAME, SCORES_NAME, NUM_DETECTIONS_NAME]
    
    print("------------- Load frozen graph from disk -------------")
    with tf.gfile.GFile(DATA_DIR + MODEL + '.pb', "rb") as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
    
    print("------------- Optimize the model with TensorRT -------------")
    trt_graph = trt.create_inference_graph(
        input_graph_def=graph_def,
        outputs=output_names,
        max_batch_size=1,
        max_workspace_size_bytes=1 << 26,
        precision_mode='FP16',
        minimum_segment_size=2
    )
    
    print("------------- Write optimized model to the file -------------")
    with open(DATA_DIR + MODEL + TRT_SUFFIX + '.pb', 'wb') as f:
        f.write(trt_graph.SerializeToString())
    
    print("------------- DONE! -------------")
    
  • Run the load.py script to measure the time to load both models:

    python load.py

    Script content:

    # load.py
    # The script to measure model load time
    
    import time
    import tensorflow as tf
    
    DATA_DIR = './'
    MODEL = 'ssd_mobilenet_v2_coco'
    TRT_SUFFIX = '_fp16_trt'
    
    def load_pb(pb_path):
        """Load the TF graph from the pre-build pb file."""
        print('------------- Load the TF graph from the pre-build pb file: {} -------------'.format(pb_path))
        start_time = time.time()
        graph_def = tf.GraphDef()
        with tf.gfile.GFile(pb_path, 'rb') as pf:
            graph_def.ParseFromString(pf.read())
        
        stop_time = time.time()
        print('------------- Load time: {0:.2f} sec'.format(stop_time - start_time))
        return graph_def
    
    _ = load_pb(DATA_DIR + MODEL + '.pb')
    _ = load_pb(DATA_DIR + MODEL + TRT_SUFFIX + '.pb')
    

    And here’s the output from both scripts on my side. I was lucky this time and the TRT-optimized model loaded in just 7 minutes :)

    # FROM load.py
    ------------- Load the TF graph from the pre-build pb file: ./ssd_mobilenet_v2_coco.pb -------------
    ------------- Load time: 8.19 sec
    ------------- Load the TF graph from the pre-build pb file: ./ssd_mobilenet_v2_coco_fp16_trt.pb -------------
    ------------- Load time: 421.29 sec
    
    # FROM build.py
    ------------- Load frozen graph from disk -------------
    ------------- Optimize the model with TensorRT -------------
    2019-02-04 09:26:40.726557: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:931] ARM64 does not support NUMA - returning NUMA node zero
    2019-02-04 09:26:40.794589: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:931] ARM64 does not support NUMA - returning NUMA node zero
    2019-02-04 09:26:40.794786: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 1
    2019-02-04 09:26:40.795102: I tensorflow/core/grappler/clusters/single_machine.cc:359] Starting new session
    2019-02-04 09:26:40.814134: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
    name: DRIVE PX 2 AutoChauffeur major: 6 minor: 1 memoryClockRate(GHz): 1.29
    pciBusID: 0000:04:00.0
    totalMemory: 3.75GiB freeMemory: 3.68GiB
    2019-02-04 09:26:40.814277: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 1 with properties:
    name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.275
    pciBusID: 0000:00:00.0
    totalMemory: 6.24GiB freeMemory: 2.20GiB
    2019-02-04 09:26:40.814428: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1496] Ignoring visible gpu device (device: 1, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2) with Cuda multiprocessor count: 2. The minimum required count is 8. You can adjust this requirement with the env var TF_MIN_GPU_MULTIPROCESSOR_COUNT.
    2019-02-04 09:26:40.814472: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
    2019-02-04 09:26:42.014100: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
    2019-02-04 09:26:42.014248: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 1
    2019-02-04 09:26:42.014275: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N N
    2019-02-04 09:26:42.014293: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 1:   N N
    2019-02-04 09:26:42.014483: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3382 MB memory) -> physical GPU (device: 0, name: DRIVE PX 2 AutoChauffeur, pci bus id: 0000:04:00.0, compute capability: 6.1)
    2019-02-04 09:26:57.204819: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:853] MULTIPLE tensorrt candidate conversion: 2
    2019-02-04 09:26:57.223562: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2957] Segment @scope '', converted to graph
    2019-02-04 09:26:57.223811: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can't find a device placement for the op!
    2019-02-04 09:26:57.251539: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2957] Segment @scope '', converted to graph
    2019-02-04 09:26:57.251758: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can't find a device placement for the op!
    2019-02-04 09:26:57.253704: W tensorflow/contrib/tensorrt/log/trt_logger.cc:34] DefaultLogger Half2 support requested on hardware without native FP16 support, performance will be negatively affected.
    2019-02-04 09:26:59.666888: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine my_trt_op_0 creation for segment 0, composed of 2 nodes succeeded.
    2019-02-04 09:26:59.667564: W tensorflow/contrib/tensorrt/log/trt_logger.cc:34] DefaultLogger Half2 support requested on hardware without native FP16 support, performance will be negatively affected.
    2019-02-04 09:26:59.713881: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine my_trt_op_1 creation for segment 1, composed of 3 nodes succeeded.
    2019-02-04 09:27:04.478200: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
    2019-02-04 09:27:04.480121: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
    2019-02-04 09:27:04.485573: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
    2019-02-04 09:27:04.486869: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
    2019-02-04 09:27:04.488511: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:501] Optimization results for grappler item: tf_graph
    2019-02-04 09:27:04.488607: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   constant folding: Graph size after: 6035 nodes (-1940), 10082 edges (-2174), time = 2862.97095ms.
    2019-02-04 09:27:04.488643: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   layout: Graph size after: 6226 nodes (191), 10284 edges (202), time = 756.717ms.
    2019-02-04 09:27:04.488695: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   TensorRTOptimizer: Graph size after: 6223 nodes (-3), 10281 edges (-3), time = 5475.65576ms.
    2019-02-04 09:27:04.488731: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   constant folding: Graph size after: 6053 nodes (-170), 10111 edges (-170), time = 1619.82605ms.
    2019-02-04 09:27:04.488765: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   TensorRTOptimizer: Graph size after: 6053 nodes (0), 10111 edges (0), time = 2764.40894ms.
    2019-02-04 09:27:04.488797: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:501] Optimization results for grappler item: my_trt_op_0_native_segment
    2019-02-04 09:27:04.488830: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   constant folding: Graph size after: 8 nodes (0), 7 edges (0), time = 2.048ms.
    2019-02-04 09:27:04.488862: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   layout: Graph size after: 8 nodes (0), 7 edges (0), time = 0.914ms.
    2019-02-04 09:27:04.488905: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   TensorRTOptimizer: Graph size after: 8 nodes (0), 7 edges (0), time = 0.26ms.
    2019-02-04 09:27:04.488968: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   constant folding: Graph size after: 8 nodes (0), 7 edges (0), time = 1.603ms.
    2019-02-04 09:27:04.489003: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   TensorRTOptimizer: Graph size after: 8 nodes (0), 7 edges (0), time = 0.269ms.
    2019-02-04 09:27:04.489036: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:501] Optimization results for grappler item: my_trt_op_1_native_segment
    2019-02-04 09:27:04.489069: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   constant folding: Graph size after: 9 nodes (0), 8 edges (0), time = 2.194ms.
    2019-02-04 09:27:04.489101: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   layout: Graph size after: 9 nodes (0), 8 edges (0), time = 0.962ms.
    2019-02-04 09:27:04.489135: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   TensorRTOptimizer: Graph size after: 9 nodes (0), 8 edges (0), time = 0.244ms.
    2019-02-04 09:27:04.489167: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   constant folding: Graph size after: 9 nodes (0), 8 edges (0), time = 1.006ms.
    2019-02-04 09:27:04.489200: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   TensorRTOptimizer: Graph size after: 9 nodes (0), 8 edges (0), time = 0.227ms.
    ------------- Write optimized model to the file -------------
    ------------- DONE! -------------
    

    Hello,

    I’m not seeing the performance issue described above:

    root@67ad5eeeaa9d:/home/scratch.zhenyi_sw/repro2490943/trt_test# python load.py
    ------------- Load the TF graph from the pre-build pb file: ./ssd_mobilenet_v2_coco.pb -------------
    ------------- Load time: 0.17 sec
    ------------- Load the TF graph from the pre-build pb file: ./ssd_mobilenet_v2_coco_fp16_trt.pb -------------
    ------------- Load time: 0.26 sec
    
    root@67ad5eeeaa9d:/home/scratch.zhenyi_sw/repro2490943/trt_test# python build.py
    ------------- Load frozen graph from disk -------------
    ------------- Optimize the model with TensorRT -------------
    2019-02-04 17:34:11.776013: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 8
    2019-02-04 17:34:11.776257: I tensorflow/core/grappler/clusters/single_machine.cc:359] Starting new session
    2019-02-04 17:34:11.793709: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
    name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
    pciBusID: 0000:06:00.0
    totalMemory: 31.72GiB freeMemory: 31.31GiB
    2019-02-04 17:34:11.794382: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 1 with properties:
    name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
    pciBusID: 0000:07:00.0
    totalMemory: 31.72GiB freeMemory: 31.31GiB
    2019-02-04 17:34:11.795045: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 2 with properties:
    name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
    pciBusID: 0000:0a:00.0
    totalMemory: 31.72GiB freeMemory: 31.31GiB
    2019-02-04 17:34:11.795682: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 3 with properties:
    name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
    pciBusID: 0000:0b:00.0
    totalMemory: 31.72GiB freeMemory: 31.31GiB
    2019-02-04 17:34:11.796312: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 4 with properties:
    name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
    pciBusID: 0000:85:00.0
    totalMemory: 31.72GiB freeMemory: 31.31GiB
    2019-02-04 17:34:11.796945: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 5 with properties:
    name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
    pciBusID: 0000:86:00.0
    totalMemory: 31.72GiB freeMemory: 31.31GiB
    2019-02-04 17:34:11.797599: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 6 with properties:
    name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
    pciBusID: 0000:89:00.0
    totalMemory: 31.72GiB freeMemory: 31.31GiB
    2019-02-04 17:34:11.798238: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 7 with properties:
    name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
    pciBusID: 0000:8a:00.0
    totalMemory: 31.72GiB freeMemory: 31.31GiB
    2019-02-04 17:34:11.798534: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0, 1, 2, 3, 4, 5, 6, 7
    2019-02-04 17:34:15.868392: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
    2019-02-04 17:34:15.868455: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 1 2 3 4 5 6 7
    2019-02-04 17:34:15.868466: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N Y Y Y Y N N N
    2019-02-04 17:34:15.868473: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 1:   Y N Y Y N Y N N
    2019-02-04 17:34:15.868484: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 2:   Y Y N Y N N Y N
    2019-02-04 17:34:15.868521: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 3:   Y Y Y N N N N Y
    2019-02-04 17:34:15.868529: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 4:   Y N N N N Y Y Y
    2019-02-04 17:34:15.868535: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 5:   N Y N N Y N Y Y
    2019-02-04 17:34:15.868558: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 6:   N N Y N Y Y N Y
    2019-02-04 17:34:15.868565: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 7:   N N N Y Y Y Y N
    2019-02-04 17:34:15.871958: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 30342 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:06:00.0, compute capability: 7.0)
    2019-02-04 17:34:15.872761: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 30342 MB memory) -> physical GPU (device: 1, name: Tesla V100-SXM2-32GB, pci bus id: 0000:07:00.0, compute capability: 7.0)
    2019-02-04 17:34:15.873333: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 30342 MB memory) -> physical GPU (device: 2, name: Tesla V100-SXM2-32GB, pci bus id: 0000:0a:00.0, compute capability: 7.0)
    2019-02-04 17:34:15.873934: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 30342 MB memory) -> physical GPU (device: 3, name: Tesla V100-SXM2-32GB, pci bus id: 0000:0b:00.0, compute capability: 7.0)
    2019-02-04 17:34:15.874547: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:4 with 30342 MB memory) -> physical GPU (device: 4, name: Tesla V100-SXM2-32GB, pci bus id: 0000:85:00.0, compute capability: 7.0)
    2019-02-04 17:34:15.875042: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:5 with 30342 MB memory) -> physical GPU (device: 5, name: Tesla V100-SXM2-32GB, pci bus id: 0000:86:00.0, compute capability: 7.0)
    2019-02-04 17:34:15.875513: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:6 with 30342 MB memory) -> physical GPU (device: 6, name: Tesla V100-SXM2-32GB, pci bus id: 0000:89:00.0, compute capability: 7.0)
    2019-02-04 17:34:15.875966: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:7 with 30342 MB memory) -> physical GPU (device: 7, name: Tesla V100-SXM2-32GB, pci bus id: 0000:8a:00.0, compute capability: 7.0)
    2019-02-04 17:34:17.998323: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:868] MULTIPLE tensorrt candidate conversion: 2
    2019-02-04 17:34:18.001120: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3058] Segment @scope '', converted to graph
    2019-02-04 17:34:18.001139: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:433] Can't find a device placement for the op!
    2019-02-04 17:34:18.007434: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3058] Segment @scope '', converted to graph
    2019-02-04 17:34:18.007455: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:433] Can't find a device placement for the op!
    2019-02-04 17:34:20.488707: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:967] Engine my_trt_op_0 creation for segment 0, composed of 2 nodes succeeded.
    2019-02-04 17:34:20.548541: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:967] Engine my_trt_op_1 creation for segment 1, composed of 3 nodes succeeded.
    2019-02-04 17:34:21.991633: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
    2019-02-04 17:34:21.992225: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
    2019-02-04 17:34:22.000877: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
    2019-02-04 17:34:22.001457: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
    2019-02-04 17:34:22.006006: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:501] Optimization results for grappler item: tf_graph
    2019-02-04 17:34:22.006029: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   constant folding: Graph size after: 6035 nodes (-1940), 10082 edges (-2174), time = 832.385ms.
    2019-02-04 17:34:22.006037: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   layout: Graph size after: 6226 nodes (191), 10284 edges (202), time = 233.816ms.
    2019-02-04 17:34:22.006044: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   TensorRTOptimizer: Graph size after: 6223 nodes (-3), 10281 edges (-3), time = 3354.4ms.
    2019-02-04 17:34:22.006050: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   constant folding: Graph size after: 6053 nodes (-170), 10111 edges (-170), time = 562.532ms.
    2019-02-04 17:34:22.006058: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   TensorRTOptimizer: Graph size after: 6053 nodes (0), 10111 edges (0), time = 774.5ms.
    2019-02-04 17:34:22.006092: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:501] Optimization results for grappler item: my_trt_op_0_native_segment
    2019-02-04 17:34:22.006099: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   constant folding: Graph size after: 8 nodes (0), 7 edges (0), time = 3.896ms.
    2019-02-04 17:34:22.006120: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   layout: Graph size after: 8 nodes (0), 7 edges (0), time = 0.283ms.
    2019-02-04 17:34:22.006127: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   TensorRTOptimizer: Graph size after: 8 nodes (0), 7 edges (0), time = 0.094ms.
    2019-02-04 17:34:22.006134: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   constant folding: Graph size after: 8 nodes (0), 7 edges (0), time = 0.461ms.
    2019-02-04 17:34:22.006141: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   TensorRTOptimizer: Graph size after: 8 nodes (0), 7 edges (0), time = 0.07ms.
    2019-02-04 17:34:22.006164: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:501] Optimization results for grappler item: my_trt_op_1_native_segment
    2019-02-04 17:34:22.006171: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   constant folding: Graph size after: 9 nodes (0), 8 edges (0), time = 3.521ms.
    2019-02-04 17:34:22.006178: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   layout: Graph size after: 9 nodes (0), 8 edges (0), time = 0.284ms.
    2019-02-04 17:34:22.006185: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   TensorRTOptimizer: Graph size after: 9 nodes (0), 8 edges (0), time = 0.077ms.
    2019-02-04 17:34:22.006199: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   constant folding: Graph size after: 9 nodes (0), 8 edges (0), time = 0.495ms.
    2019-02-04 17:34:22.006207: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   TensorRTOptimizer: Graph size after: 9 nodes (0), 8 edges (0), time = 0.07ms.
    ------------- Write optimized model to the file -------------
    ------------- DONE! -------------
    

    Well, apart from the fact that you are not on a Drive PX 2 (which is probably irrelevant to this problem): which TensorFlow version do you have? Which versions of protobuf and libprotobuf? Do you have python-protobuf with the cpp implementation enabled? I suspect protobuf is the cause of the problem.

    Hello,

    I’m using TensorFlow 1.12.0.

    root@0760a6daacdc:/home/scratch.zhenyi_sw/repro2490943# pip show protobuf
    Name: protobuf
    Version: 3.6.1
    Summary: Protocol Buffers
    Home-page: https://developers.google.com/protocol-buffers/
    Author: None
    Author-email: None
    License: 3-Clause BSD License
    Location: /usr/local/lib/python3.5/dist-packages
    Requires: six, setuptools
    Required-by: tensorflow-gpu, tensorboard
    

    I was right, it's a protobuf issue. As far as I understand the problem, the DPX2 ships protobuf 2.6.1 with PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION set to python. I compiled and installed libprotobuf and python-protobuf 3.6.1 from sources with --cpp_implementation, and now the models load in a fraction of a second (both the original and the TensorRT-optimized ones).
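
    By the way, a quick way to check which backend your Python protobuf is using (note that api_implementation is a protobuf-internal module, so treat this as a debugging aid rather than a stable API):

    # Prints 'python' for the slow pure-Python parser, 'cpp' for the C++-backed one.
    # ParseFromString on a ~67 MB GraphDef is dramatically slower with 'python'.
    from google.protobuf.internal import api_implementation
    print(api_implementation.Type())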

    However, this breaks my OpenCV installation (and I have an app which uses both TensorFlow and OpenCV). Apparently OpenCV's GTK support is linked against libmirclient, which is built with protobuf 2.6.1. When I tried using the OpenCV build made before switching protobuf, it simply segfaulted after 'import cv2'. So I tried rebuilding OpenCV with only protobuf 3.6.1 available; that failed due to missing dependencies for libmirclient. When I put back libprotobuf-lite.so.9.0.1 (so the one from v2.6.1), OpenCV builds fine but fails at runtime with:

    >>> import cv2
    [libprotobuf FATAL google/protobuf/stubs/common.cc:79] This program was compiled against version 2.6.1 of the Protocol Buffer runtime library, which is not compatible with the installed version (3.6.1).  Contact the program author for an update.  If you compiled the program yourself, make sure that your headers are from the same version of Protocol Buffers as your link-time library.  (Version verification failed in "/build/mir-k3D1Zt/mir-0.26.3+16.04.20170605/obj-aarch64-linux-gnu/src/protobuf/mir_protobuf.pb.cc".)
    terminate called after throwing an instance of 'google::protobuf::FatalException'
      what():  This program was compiled against version 2.6.1 of the Protocol Buffer runtime library, which is not compatible with the installed version (3.6.1).  Contact the program author for an update.  If you compiled the program yourself, make sure that your headers are from the same version of Protocol Buffers as your link-time library.  (Version verification failed in "/build/mir-k3D1Zt/mir-0.26.3+16.04.20170605/obj-aarch64-linux-gnu/src/protobuf/mir_protobuf.pb.cc".)
    Aborted (core dumped)
    

    So now I'm hit by the same problem as described at
    https://stackoverflow.com/questions/43236034/opencv-3-2-includes-libmir-and-protobuf-2-6-which-is-conflicting-with-protobuf

    I’ll check if turning off GTK support in OpenCV and switching to Qt instead solves this problem for me…

    I guess there’s no simple way to upgrade Mir (libmirclient) on DPX2 to a version built with protobuf 3.6.1?

    Will triage and keep you updated.

    OK, I think I got it sorted out. I left protobuf 2.6.1 almost untouched: I just installed 3.6.1 next to it and set the symlinks so that 3.6.1 is the default. I rebuilt OpenCV with the following options:

    -D WITH_PROTOBUF=OFF \
    -D BUILD_PROTOBUF=OFF \
    -D PROTOBUF_UPDATE_FILES=OFF \
    

    and everything seems fine. After:

    export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp
    

    both models load in a fraction of a second.

    Side question: is the way I build TensorRT models described above in build.py (so with the TF-TRT API) the correct one, or should I rather go through UFF? So far I see no improvement in inference time compared to the original models; I have tried

    • ssd_inception_v2_coco
    • ssd_mobilenet_v2_coco
    • mask_rcnn_inception_v2_coco
    • faster_rcnn_resnet50_coco

    from https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md.
    Looking at https://github.com/NVIDIA-AI-IOT/tf_trt_models#models-1 I should see ~2x faster inference with ssd_inception_v2_coco at least, but I see no improvement whatsoever.
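
    One data point: the build log above shows only two tiny engines were created (2 and 3 nodes each), so almost the whole model still runs in plain TF, which alone would explain the lack of speedup. How much was actually converted can be counted directly from the GraphDef:

    # Count TRTEngineOp nodes in a converted graph; if only a couple of tiny
    # segments were converted, most of the model still runs in plain TF.
    import tensorflow as tf

    graph_def = tf.GraphDef()
    with tf.gfile.GFile('./ssd_mobilenet_v2_coco_fp16_trt.pb', 'rb') as f:
        graph_def.ParseFromString(f.read())

    trt_ops = [n.name for n in graph_def.node if n.op == 'TRTEngineOp']
    print('{} TRTEngineOp node(s): {}'.format(len(trt_ops), trt_ops))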

    For reference, since others have already asked about the steps I took to update protobuf, here's the full description:

    # Check current version
    $ protoc --version
    libprotoc 2.6.1
     
    # Create a backup of the current config, just in case
    mkdir protobuf
    cd protobuf/
    mkdir backup_originals
    mkdir backup_originals/protoc
    cp /usr/bin/protoc backup_originals/protoc/
    tar cvzf backup_originals/libprotobuf.tgz /usr/lib/aarch64-linux-gnu/libprotobuf*
    # Original include files located at: /usr/include/google/protobuf/
    # I did not back them up
     
    # Original configuration of the libraries
    $ ls -l /usr/lib/aarch64-linux-gnu/libprotobuf*
    -rw-r--r-- 1 root root 2464506 Oct 24  2015 /usr/lib/aarch64-linux-gnu/libprotobuf.a
    -rw-r--r-- 1 root root  430372 Oct 24  2015 /usr/lib/aarch64-linux-gnu/libprotobuf-lite.a
    lrwxrwxrwx 1 root root      25 Oct 24  2015 /usr/lib/aarch64-linux-gnu/libprotobuf-lite.so -> libprotobuf-lite.so.9.0.1
    lrwxrwxrwx 1 root root      25 Oct 24  2015 /usr/lib/aarch64-linux-gnu/libprotobuf-lite.so.9 -> libprotobuf-lite.so.9.0.1
    -rw-r--r-- 1 root root  199096 Oct 24  2015 /usr/lib/aarch64-linux-gnu/libprotobuf-lite.so.9.0.1
    lrwxrwxrwx 1 root root      20 Oct 24  2015 /usr/lib/aarch64-linux-gnu/libprotobuf.so -> libprotobuf.so.9.0.1
    lrwxrwxrwx 1 root root      20 Oct 24  2015 /usr/lib/aarch64-linux-gnu/libprotobuf.so.9 -> libprotobuf.so.9.0.1
    -rw-r--r-- 1 root root 1153872 Oct 24  2015 /usr/lib/aarch64-linux-gnu/libprotobuf.so.9.0.1
     
    # Fetch and unpack the sources of version 3.6.1
    wget https://github.com/protocolbuffers/protobuf/releases/download/v3.6.1/protobuf-python-3.6.1.zip
    wget https://github.com/protocolbuffers/protobuf/releases/download/v3.6.1/protoc-3.6.1-linux-aarch_64.zip
    unzip protoc-3.6.1-linux-aarch_64.zip -d protoc-3.6.1
    unzip protobuf-python-3.6.1.zip
     
    # Update the protoc
    sudo cp protoc-3.6.1/bin/protoc /usr/bin/protoc
     
    $ protoc --version
    libprotoc 3.6.1
     
    # BUILD AND INSTALL THE LIBRARIES
    export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp
    cd protobuf-3.6.1/
    ./autogen.sh
    ./configure
    make
    make check
    sudo make install
     
    # Remove unnecessary links to the old version
    sudo rm /usr/lib/aarch64-linux-gnu/libprotobuf.a
    sudo rm /usr/lib/aarch64-linux-gnu/libprotobuf-lite.a
    sudo rm /usr/lib/aarch64-linux-gnu/libprotobuf-lite.so
    sudo rm /usr/lib/aarch64-linux-gnu/libprotobuf.so
     
    # Move old version of the libraries to the same folder where the new ones have been installed, for clarity
    sudo cp -d /usr/lib/aarch64-linux-gnu/libproto* /usr/local/lib/
    sudo rm /usr/lib/aarch64-linux-gnu/libproto*
    
    sudo ldconfig # Refresh shared library cache   
    
    # Check the updated version
    $ protoc --version
    libprotoc 3.6.1
    
    # Final configuration of the libraries after the update
    $ ls -l /usr/local/lib/libproto*
    -rw-r--r-- 1 root root 77064022 Feb  9 11:07 /usr/local/lib/libprotobuf.a
    -rwxr-xr-x 1 root root      978 Feb  9 11:07 /usr/local/lib/libprotobuf.la
    -rw-r--r-- 1 root root  9396522 Feb  9 11:07 /usr/local/lib/libprotobuf-lite.a
    -rwxr-xr-x 1 root root     1013 Feb  9 11:07 /usr/local/lib/libprotobuf-lite.la
    lrwxrwxrwx 1 root root       26 Feb  9 11:07 /usr/local/lib/libprotobuf-lite.so -> libprotobuf-lite.so.17.0.0
    lrwxrwxrwx 1 root root       26 Feb  9 11:07 /usr/local/lib/libprotobuf-lite.so.17 -> libprotobuf-lite.so.17.0.0
    -rwxr-xr-x 1 root root  3722376 Feb  9 11:07 /usr/local/lib/libprotobuf-lite.so.17.0.0
    lrwxrwxrwx 1 root root       25 Feb  9 11:19 /usr/local/lib/libprotobuf-lite.so.9 -> libprotobuf-lite.so.9.0.1
    -rw-r--r-- 1 root root   199096 Feb  9 11:19 /usr/local/lib/libprotobuf-lite.so.9.0.1
    lrwxrwxrwx 1 root root       21 Feb  9 11:07 /usr/local/lib/libprotobuf.so -> libprotobuf.so.17.0.0
    lrwxrwxrwx 1 root root       21 Feb  9 11:07 /usr/local/lib/libprotobuf.so.17 -> libprotobuf.so.17.0.0
    -rwxr-xr-x 1 root root 30029352 Feb  9 11:07 /usr/local/lib/libprotobuf.so.17.0.0
    lrwxrwxrwx 1 root root       20 Feb  9 11:19 /usr/local/lib/libprotobuf.so.9 -> libprotobuf.so.9.0.1
    -rw-r--r-- 1 root root  1153872 Feb  9 11:19 /usr/local/lib/libprotobuf.so.9.0.1
    -rw-r--r-- 1 root root 99883696 Feb  9 11:07 /usr/local/lib/libprotoc.a
    -rwxr-xr-x 1 root root      994 Feb  9 11:07 /usr/local/lib/libprotoc.la
    lrwxrwxrwx 1 root root       19 Feb  9 11:07 /usr/local/lib/libprotoc.so -> libprotoc.so.17.0.0
    lrwxrwxrwx 1 root root       19 Feb  9 11:07 /usr/local/lib/libprotoc.so.17 -> libprotoc.so.17.0.0
    -rwxr-xr-x 1 root root 32645760 Feb  9 11:07 /usr/local/lib/libprotoc.so.17.0.0
    lrwxrwxrwx 1 root root       18 Feb  9 11:19 /usr/local/lib/libprotoc.so.9 -> libprotoc.so.9.0.1
    -rw-r--r-- 1 root root   991440 Feb  9 11:19 /usr/local/lib/libprotoc.so.9.0.1
     
    # Reboot, just in case :)
    sudo reboot
     
    # BUILD AND INSTALL THE PYTHON-PROTOBUF MODULE
    cd protobuf-3.6.1/python/
    export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp
    
    # Fix setup.py to force compilation with the C++11 standard
    vim setup.py
     
    $ diff setup.py setup.py~
    205,208c205,208
    <     #if v:
    <     #  extra_compile_args.append('-std=c++11')
    <     #elif os.getenv('KOKORO_BUILD_NUMBER') or os.getenv('KOKORO_BUILD_ID'):
    <     extra_compile_args.append('-std=c++11')
    ---
    >     if v:
    >       extra_compile_args.append('-std=c++11')
    >     elif os.getenv('KOKORO_BUILD_NUMBER') or os.getenv('KOKORO_BUILD_ID'):
    >       extra_compile_args.append('-std=c++11')
     
    # Build, test and install
    python3 setup.py build --cpp_implementation
    python3 setup.py test --cpp_implementation
    sudo python3 setup.py install --cpp_implementation
     
    # Make the cpp backend the default when a user logs in
    sudo sh -c "echo 'export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp' >> /etc/profile.d/protobuf.sh"
    
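
    As a final sanity check, both the module version and the active backend can be verified from Python:

    # Expect '3.6.1' and 'cpp' here; seeing 'python' would explain
    # minutes-long ParseFromString calls on large GraphDefs.
    import google.protobuf
    from google.protobuf.internal import api_implementation
    print(google.protobuf.__version__)
    print(api_implementation.Type())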

    I found that this update tends to break pip, so I simply reinstalled it with:

    wget http://se.archive.ubuntu.com/ubuntu/pool/universe/p/python-pip/python3-pip_9.0.1-2_all.deb
    wget http://se.archive.ubuntu.com/ubuntu/pool/universe/p/python-pip/python-pip-whl_9.0.1-2_all.deb
    sudo dpkg -i *.deb
    

    I'm facing this issue while building OpenCV with the options:

    -D WITH_PROTOBUF=OFF \
    -D BUILD_PROTOBUF=OFF \
    -D PROTOBUF_UPDATE_FILES=OFF

    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h: In function ‘PyObject* pyopencv_cv_dnn_dnn_Net_deleteLayer(PyObject*, PyObject*, PyObject*)’:
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:16420:5: error: ‘LayerId’ was not declared in this scope
    LayerId layer;
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:16424:34: error: ‘layer’ was not declared in this scope
    pyopencv_to(pyobj_layer, layer, ArgInfo("layer", 0)) )
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h: In function ‘PyObject* pyopencv_cv_dnn_dnn_Net_getFLOPS(PyObject*, PyObject*, PyObject*)’:
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:16603:5: error: ‘vector_MatShape’ was not declared in this scope
    vector_MatShape netInputShapes;
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:16608:43: error: ‘netInputShapes’ was not declared in this scope
    pyopencv_to(pyobj_netInputShapes, netInputShapes, ArgInfo("netInputShapes", 0)) )
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:16634:5: error: ‘vector_MatShape’ was not declared in this scope
    vector_MatShape netInputShapes;
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:16639:43: error: ‘netInputShapes’ was not declared in this scope
    pyopencv_to(pyobj_netInputShapes, netInputShapes, ArgInfo("netInputShapes", 0)) )
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h: In function ‘PyObject* pyopencv_cv_dnn_dnn_Net_getLayer(PyObject*, PyObject*, PyObject*)’:
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:16675:5: error: ‘LayerId’ was not declared in this scope
    LayerId layerId;
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:16680:36: error: ‘layerId’ was not declared in this scope
    pyopencv_to(pyobj_layerId, layerId, ArgInfo("layerId", 0)) )
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h: In function ‘PyObject* pyopencv_cv_dnn_dnn_Net_getLayersShapes(PyObject*, PyObject*, PyObject*)’:
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:16788:5: error: ‘vector_MatShape’ was not declared in this scope
    vector_MatShape netInputShapes;
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:16790:5: error: ‘vector_vector_MatShape’ was not declared in this scope
    vector_vector_MatShape inLayersShapes;
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:16791:28: error: expected ‘;’ before ‘outLayersShapes’
    vector_vector_MatShape outLayersShapes;
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:16795:43: error: ‘netInputShapes’ was not declared in this scope
    pyopencv_to(pyobj_netInputShapes, netInputShapes, ArgInfo("netInputShapes", 0)) )
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:16797:69: error: ‘inLayersShapes’ was not declared in this scope
    ERRWRAP2(self->getLayersShapes(netInputShapes, layersIds, inLayersShapes, outLayersShapes));
    ^
    /home/nvidia/opencv/modules/python/src2/cv2.cpp:87:5: note: in definition of macro ‘ERRWRAP2’
    expr;
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:16797:85: error: ‘outLayersShapes’ was not declared in this scope
    ERRWRAP2(self->getLayersShapes(netInputShapes, layersIds, inLayersShapes, outLayersShapes));
    ^
    /home/nvidia/opencv/modules/python/src2/cv2.cpp:87:5: note: in definition of macro ‘ERRWRAP2’
    expr;
    ^
    In file included from /home/nvidia/opencv/modules/python/src2/cv2.cpp:1681:0:
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:16798:79: error: ‘inLayersShapes’ was not declared in this scope
    return Py_BuildValue("(NNN)", pyopencv_from(layersIds), pyopencv_from(inLayersShapes), pyopencv_from(outLayersShapes));
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:16798:110: error: ‘outLayersShapes’ was not declared in this scope
    return Py_BuildValue("(NNN)", pyopencv_from(layersIds), pyopencv_from(inLayersShapes), pyopencv_from(outLayersShapes));
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:16807:5: error: ‘vector_vector_MatShape’ was not declared in this scope
    vector_vector_MatShape inLayersShapes;
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:16808:28: error: expected ‘;’ before ‘outLayersShapes’
    vector_vector_MatShape outLayersShapes;
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:16814:68: error: ‘inLayersShapes’ was not declared in this scope
    ERRWRAP2(self->getLayersShapes(netInputShape, layersIds, inLayersShapes, outLayersShapes));
    ^
    /home/nvidia/opencv/modules/python/src2/cv2.cpp:87:5: note: in definition of macro ‘ERRWRAP2’
    expr;
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:16814:84: error: ‘outLayersShapes’ was not declared in this scope
    ERRWRAP2(self->getLayersShapes(netInputShape, layersIds, inLayersShapes, outLayersShapes));
    ^
    /home/nvidia/opencv/modules/python/src2/cv2.cpp:87:5: note: in definition of macro ‘ERRWRAP2’
    expr;
    ^
    In file included from /home/nvidia/opencv/modules/python/src2/cv2.cpp:1681:0:
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:16815:79: error: ‘inLayersShapes’ was not declared in this scope
    return Py_BuildValue("(NNN)", pyopencv_from(layersIds), pyopencv_from(inLayersShapes), pyopencv_from(outLayersShapes));
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:16815:110: error: ‘outLayersShapes’ was not declared in this scope
    return Py_BuildValue("(NNN)", pyopencv_from(layersIds), pyopencv_from(inLayersShapes), pyopencv_from(outLayersShapes));
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h: In function ‘PyObject* pyopencv_cv_dnn_dnn_Net_getMemoryConsumption(PyObject*, PyObject*, PyObject*)’:
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:16850:5: error: ‘vector_MatShape’ was not declared in this scope
    vector_MatShape netInputShapes;
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:16856:43: error: ‘netInputShapes’ was not declared in this scope
    pyopencv_to(pyobj_netInputShapes, netInputShapes, ArgInfo("netInputShapes", 0)) )
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h: In function ‘PyObject* pyopencv_cv_dnn_dnn_Net_getParam(PyObject*, PyObject*, PyObject*)’:
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:16893:5: error: ‘LayerId’ was not declared in this scope
    LayerId layer;
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:16899:34: error: ‘layer’ was not declared in this scope
    pyopencv_to(pyobj_layer, layer, ArgInfo("layer", 0)) )
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h: In function ‘PyObject* pyopencv_cv_dnn_dnn_Net_setParam(PyObject*, PyObject*, PyObject*)’:
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:17051:5: error: ‘LayerId’ was not declared in this scope
    LayerId layer;
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:17058:34: error: ‘layer’ was not declared in this scope
    pyopencv_to(pyobj_layer, layer, ArgInfo("layer", 0)) &&
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:17069:5: error: ‘LayerId’ was not declared in this scope
    LayerId layer;
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:17076:34: error: ‘layer’ was not declared in this scope
    pyopencv_to(pyobj_layer, layer, ArgInfo("layer", 0)) &&
    ^
    modules/python2/CMakeFiles/opencv_python2.dir/build.make:62: recipe for target 'modules/python2/CMakeFiles/opencv_python2.dir/__/src2/cv2.cpp.o' failed
    make[2]: *** [modules/python2/CMakeFiles/opencv_python2.dir/__/src2/cv2.cpp.o] Error 1
    CMakeFiles/Makefile2:11608: recipe for target 'modules/python2/CMakeFiles/opencv_python2.dir/all' failed
    make[1]: *** [modules/python2/CMakeFiles/opencv_python2.dir/all] Error 2
    Makefile:160: recipe for target 'all' failed
    make: *** [all] Error 2
    Make did not successfully build
    Please fix issues and retry build

    I am also seeing long load times.
    I am running JetPack 4.2 on both a TX2 and an Xavier; the host is an x86 with a 2070 GPU.
    On the x86 I took an inception_v3 graph and fine-tuned it with the flowers photos.
    The resulting frozen graph is then copied to the TX2 and Xavier.
    I learned that a TensorRT graph built on one does not work on the other.

    Building or reloading an existing TRT graph on the Xavier takes about 2 minutes, on the TX2 about 20.
    Each was built using the sdkmanger.
    Each has a 16GB swap space.
    For the TX2 it is on a 128GB sd card.
    For the Xavier it is on a 1000GB ssd memory stick.

    Is OpenCV really the solution?
    Is it installed when the sdkmanager builds the TX2 and Xavier?
    What is the proper way to install it if it is not?

    After my previous post I decided there were too many variables.
    Here is a sample anyone can reproduce.

    Get this package:
    https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/tensorrt

    I modified the script for debug purposes as follows:

    diff tftrt_sample.py tftrt_sample.py.org 
    92d91
    <   print( datetime.datetime.now(), " getResnet50" )
    111d109
    <   print( datetime.datetime.now(), " getFP32" )
    121d118
    <   print( datetime.datetime.now(), " getFP16" )
    146c143
    <   print(datetime.datetime.now(), "Starting execution")
    ---
    >   tf.logging.info("Starting execution")
    172c169
    <     print(datetime.datetime.now(), " Starting Warmup cycle")
    ---
    >     tf.logging.info("Starting Warmup cycle")
    203c200
    <     print(datetime.datetime.now(), "Warmup done. Starting real timing")
    ---
    >     tf.logging.info("Warmup done. Starting real timing")
    267,268c264
    <   print(datetime.datetime.now(), " Starting")
    < 
    ---
    >   print("Starting at",datetime.datetime.now())
    

    I also removed the --INT8 option from run_all.sh

    After $ ./run_all.sh > TX2.log

    The log contains:

    Namespace(FP16=True, FP32=True, INT8=False, batch_size=4, dump_diff=False, native=True, num_loops=10, topN=5, update_graphdef=False, with_timeline=False, workspace_size=2048)
    2019-04-13 23:26:59.405493  Starting
    2019-04-13 23:27:06.691047  getResnet50
    2019-04-13 23:27:08.879296 Starting execution
    2019-04-13 23:27:12.120849  Starting Warmup cycle
    2019-04-13 23:27:38.190371 Warmup done. Starting real timing
    iter  0   0.1170225191116333
    iter  1   0.11706938743591308
    iter  2   0.11715104579925537
    iter  3   0.11713536262512207
    iter  4   0.11703117370605469
    iter  5   0.11687781810760497
    iter  6   0.11692732810974121
    iter  7   0.11688094139099121
    iter  8   0.11711055755615235
    iter  9   0.11685168743133545
    Comparison= True
    images/s : 34.2 +/- 0.0, s/batch: 0.11701 +/- 0.00011
    RES, Native, 4, 34.19, 0.03, 0.11701, 0.00011
    2019-04-13 23:28:39.120928  getFP32
    2019-04-13 23:28:39.122388  getResnet50
    2019-04-14 00:05:18.587516 Starting execution
    2019-04-14 00:39:11.500612  Starting Warmup cycle
    2019-04-14 00:39:55.384542 Warmup done. Starting real timing
    iter  0   0.06356308937072754
    iter  1   0.06371050834655761
    iter  2   0.06345504283905029
    iter  3   0.06329115867614746
    iter  4   0.06343845844268799
    iter  5   0.06320501804351807
    iter  6   0.06346035480499268
    iter  7   0.0631892728805542
    iter  8   0.06757570266723632
    iter  9   0.06330945014953614
    Comparison= True
    images/s : 62.7 +/- 1.2, s/batch: 0.06382 +/- 0.00126
    RES, TRT-FP32, 4, 62.68, 1.18, 0.06382, 0.00126
    2019-04-14 00:41:19.378257  getFP16
    2019-04-14 00:41:19.380426  getResnet50
    2019-04-14 00:59:41.581313 Starting execution
    2019-04-14 01:32:10.168278  Starting Warmup cycle
    2019-04-14 01:32:42.214924 Warmup done. Starting real timing
    iter  0   0.03612914085388184
    iter  1   0.03567664623260498
    iter  2   0.03541929721832275
    iter  3   0.03596384525299072
    iter  4   0.03592778205871582
    iter  5   0.035593876838684084
    iter  6   0.0354670524597168
    iter  7   0.03562225341796875
    iter  8   0.03560783863067627
    iter  9   0.035287847518920896
    Comparison= True
    images/s : 112.1 +/- 0.8, s/batch: 0.03567 +/- 0.00025
    RES, TRT-FP16, 4, 112.14, 0.78, 0.03567, 0.00025
    Done timing 2019-04-14 01:33:36.285660
    native ['bow tie, bow-tie, bowtie', 'cornet, horn, trumpet, trump', 'military uniform', 'sweatshirt', 'bulletproof vest']
    FP32 ['bow tie, bow-tie, bowtie', 'cornet, horn, trumpet, trump', 'military uniform', 'sweatshirt', 'bulletproof vest']
    FP16 ['bow tie, bow-tie, bowtie', 'cornet, horn, trumpet, trump', 'military uniform', 'sweatshirt', 'bulletproof vest']
    

    This snippet is the issue: why sooooo looooong?

    2019-04-13 23:28:39.120928  getFP32
    2019-04-13 23:28:39.122388  getResnet50
    2019-04-14 00:05:18.587516 Starting execution
    2019-04-14 00:39:11.500612  Starting Warmup cycle
    2019-04-14 00:39:55.384542 Warmup done. Starting real timing
    

    On Xavier the same script runs much quicker.

    RES, TRT-FP32, 4, 160.46, 0.48, 0.02493, 0.00008
    2019-04-13 23:18:35.111292  getFP16
    2019-04-13 23:18:35.111525  getResnet50
    2019-04-13 23:21:33.759369 Starting execution
    2019-04-13 23:22:10.313379  Starting Warmup cycle
    2019-04-13 23:22:11.314511 Warmup done. Starting real timing
    

    Anyone have any thoughts?

    It's most likely a protobuf issue; please read carefully what I wrote in https://devtalk.nvidia.com/default/topic/1046492/tensorrt/extremely-long-time-to-load-trt-optimized-frozen-tf-graphs/post/5313240/#5313240

    The OpenCV issue was never a cause of the long loading problem, just a side effect of updating protobuf.

    So, I’d suggest you start with:

    export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp
    
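
    If exporting the variable in the shell is inconvenient, the same switch can be made from Python, as long as it happens before google.protobuf (and therefore tensorflow) is imported for the first time:

    # Must run before anything imports google.protobuf, otherwise it is ignored.
    import os
    os.environ['PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION'] = 'cpp'

    import tensorflow as tf  # protobuf now picks the C++ backend, if it is installed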

    And if that doesn’t help - update protobuf with the steps I described at https://devtalk.nvidia.com/default/topic/1046492/tensorrt/extremely-long-time-to-load-trt-optimized-frozen-tf-graphs/post/5315675/#5315675

    Either I botched the TX2 build or NVIDIA builds the TX2 with a different set of software from the Xavier.
    On the Xavier I get:
    $ protoc --version
    libprotoc 3.0.0

    On the TX2 I get:
    $ protoc --version
    -bash: protoc: command not found

    Both systems were built with 4.2 using the sdkmanager

    Thanks for your reply.
    I now have something to try.

    I have the same issue with JetPack 4.2: protoc is not found, and loading the TF-TRT graph also takes a long time (~5 minutes) vs. the original graph (~20 s) on both an Xavier and a TX2i. Updating protobuf via sudo pip3 install protobuf did not fix this.

    Just an update: I tried the steps so kindly provided by dariusz.filipski, and my issue was resolved, both on a TX2i and on an Xavier.

    This is confusing, because I used JetPack as the installer and the NVIDIA-provided installers for TensorFlow on those platforms. Why is NVIDIA using a setup that results in a suboptimal protobuf version being installed? In my case, the graph loading time went down from ~5 min to ~10 s.