Extremely long time to load TRT-optimized frozen TF graphs

Hello,

Does anyone experience extremely long load times for TensorFlow frozen graphs optimized with TensorRT? Non-optimized ones load quickly, but loading the optimized ones takes over 10 minutes with the very same code:

trt_graph_def = tf.GraphDef()
with tf.gfile.GFile(pb_path, 'rb') as pf:
    trt_graph_def.ParseFromString(pf.read())

I'm on a Drive PX 2 device, with TensorFlow 1.12.0, CUDA 9.2 and TensorRT 4.1.1.
I suspect protobuf, so here's its configuration:

$ dpkg -l | grep protobuf
ii libmirprotobuf3:arm64 0.26.3+16.04.20170605-0ubuntu1.1 arm64 Display server for Ubuntu - RPC definitions
ii libprotobuf-dev:arm64 2.6.1-1.3 arm64 protocol buffers C++ library (development files)
ii libprotobuf-lite9v5:arm64 2.6.1-1.3 arm64 protocol buffers C++ library (lite version)
ii libprotobuf9v5:arm64 2.6.1-1.3 arm64 protocol buffers C++ library
ii protobuf-compiler 2.6.1-1.3 arm64 compiler for protocol buffer definition files

$ pip3 freeze | grep protobuf
protobuf==3.6.1

Here’s the way I convert non-optimized models to TRT ones:

def get_frozen_graph(graph_file):
  """Read Frozen Graph file from disk."""
  with tf.gfile.FastGFile(graph_file, "rb") as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
  return graph_def
  
print("Load frozen graph from disk")

frozen_graph = get_frozen_graph(DATA_DIR + MODEL + '.pb')

print("Optimize the model with TensorRT")

trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=output_names,
    max_batch_size=1,
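    # 1 << 26 bytes = 64 MiB of scratch workspace for building TensorRT engines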
    max_workspace_size_bytes=1 << 26,
    precision_mode='FP16',
    minimum_segment_size=2
)

print("Write optimized model to the file")
with open(DATA_DIR + MODEL + '_fp16_trt.pb', 'wb') as f:
    f.write(trt_graph.SerializeToString())

What's actually going wrong? Do you have any hints on how to fix it? This makes debugging and running the code extremely annoying…

Hello,

This is not expected. How big are the original .pb files and the converted files? What is the pure TF frozen graph load time compared to the 10 minutes?

Thank you for looking into this!
Both files are pretty similar in size: about 67 MB for ssd_mobilenet_v2_coco from the model zoo. Exact sizes in bytes:

  • 69688296 - original model
  • 69219036 - optimized model

Pure TF model loads in 12 seconds.

I'm trying to build libprotobuf and python-protobuf from sources; let's see how that goes. Unfortunately, I'm facing some build issues with the latter, while libprotobuf went smoothly.

Hello,

To help us debug, can you share a small repro containing the original model, the optimized model, and the full conversion and loading source that demonstrates the performance you are seeing?

Sure thing, I will provide you with all needed pieces soon. Thanks for your support!

Ok, so here it is:

  • Fetch and extract the model from the model zoo:

    mkdir trt_test
    cd trt_test
    wget http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v2_coco_2018_03_29.tar.gz
    tar xzf ssd_mobilenet_v2_coco_2018_03_29.tar.gz --strip-components=1 -C ./ ssd_mobilenet_v2_coco_2018_03_29/frozen_inference_graph.pb
    mv frozen_inference_graph.pb ssd_mobilenet_v2_coco.pb

  • Run the build.py script to convert the model from the zoo into a TRT-optimized one:

    python3 build.py

    Script content:

    # build.py
    # The script to build TRT-optimized graph from a given non-optimized one
    
    import os
    import tensorflow.contrib.tensorrt as trt
    import tensorflow as tf
    
    DATA_DIR = './'
    MODEL = 'ssd_mobilenet_v2_coco'
    TRT_SUFFIX = '_fp16_trt'
    
    BOXES_NAME='detection_boxes'
    CLASSES_NAME='detection_classes'
    SCORES_NAME='detection_scores'
    NUM_DETECTIONS_NAME='num_detections'
    output_names = [BOXES_NAME, CLASSES_NAME, SCORES_NAME, NUM_DETECTIONS_NAME]
    
    print("------------- Load frozen graph from disk -------------")
    with tf.gfile.GFile(DATA_DIR + MODEL + '.pb', "rb") as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
    
    print("------------- Optimize the model with TensorRT -------------")
    trt_graph = trt.create_inference_graph(
        input_graph_def=graph_def,
        outputs=output_names,
        max_batch_size=1,
        max_workspace_size_bytes=1 << 26,
        precision_mode='FP16',
        minimum_segment_size=2
    )
    
    print("------------- Write optimized model to the file -------------")
    with open(DATA_DIR + MODEL + TRT_SUFFIX + '.pb', 'wb') as f:
        f.write(trt_graph.SerializeToString())
    
    print("------------- DONE! -------------")
    
  • Run the load.py script to measure the time to load both models:

    python load.py

    Script content:

    # load.py
    # The script to measure model load time
    
    import time
    import tensorflow as tf
    
    DATA_DIR = './'
    MODEL = 'ssd_mobilenet_v2_coco'
    TRT_SUFFIX = '_fp16_trt'
    
    def load_pb(pb_path):
        """Load the TF graph from the pre-build pb file."""
        print('------------- Load the TF graph from the pre-build pb file: {} -------------'.format(pb_path))
        start_time = time.time()
        graph_def = tf.GraphDef()
        with tf.gfile.GFile(pb_path, 'rb') as pf:
            graph_def.ParseFromString(pf.read())
        
        stop_time = time.time()
        print('------------- Load time: {0:.2f} sec'.format(stop_time - start_time))
        return graph_def
    
    _ = load_pb(DATA_DIR + MODEL + '.pb')
    _ = load_pb(DATA_DIR + MODEL + TRT_SUFFIX + '.pb')
    

    And here’s the output from both scripts on my side. I was lucky this time and the TRT-optimized model loaded in just 7 minutes :)

    # FROM load.py
    ------------- Load the TF graph from the pre-build pb file: ./ssd_mobilenet_v2_coco.pb -------------
    ------------- Load time: 8.19 sec
    ------------- Load the TF graph from the pre-build pb file: ./ssd_mobilenet_v2_coco_fp16_trt.pb -------------
    ------------- Load time: 421.29 sec
    
    # FROM build.py
    ------------- Load frozen graph from disk -------------
    ------------- Optimize the model with TensorRT -------------
    2019-02-04 09:26:40.726557: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:931] ARM64 does not support NUMA - returning NUMA node zero
    2019-02-04 09:26:40.794589: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:931] ARM64 does not support NUMA - returning NUMA node zero
    2019-02-04 09:26:40.794786: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 1
    2019-02-04 09:26:40.795102: I tensorflow/core/grappler/clusters/single_machine.cc:359] Starting new session
    2019-02-04 09:26:40.814134: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
    name: DRIVE PX 2 AutoChauffeur major: 6 minor: 1 memoryClockRate(GHz): 1.29
    pciBusID: 0000:04:00.0
    totalMemory: 3.75GiB freeMemory: 3.68GiB
    2019-02-04 09:26:40.814277: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 1 with properties:
    name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.275
    pciBusID: 0000:00:00.0
    totalMemory: 6.24GiB freeMemory: 2.20GiB
    2019-02-04 09:26:40.814428: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1496] Ignoring visible gpu device (device: 1, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2) with Cuda multiprocessor count: 2. The minimum required count is 8. You can adjust this requirement with the env var TF_MIN_GPU_MULTIPROCESSOR_COUNT.
    2019-02-04 09:26:40.814472: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
    2019-02-04 09:26:42.014100: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
    2019-02-04 09:26:42.014248: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 1
    2019-02-04 09:26:42.014275: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N N
    2019-02-04 09:26:42.014293: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 1:   N N
    2019-02-04 09:26:42.014483: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3382 MB memory) -> physical GPU (device: 0, name: DRIVE PX 2 AutoChauffeur, pci bus id: 0000:04:00.0, compute capability: 6.1)
    2019-02-04 09:26:57.204819: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:853] MULTIPLE tensorrt candidate conversion: 2
    2019-02-04 09:26:57.223562: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2957] Segment @scope '', converted to graph
    2019-02-04 09:26:57.223811: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can't find a device placement for the op!
    2019-02-04 09:26:57.251539: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2957] Segment @scope '', converted to graph
    2019-02-04 09:26:57.251758: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can't find a device placement for the op!
    2019-02-04 09:26:57.253704: W tensorflow/contrib/tensorrt/log/trt_logger.cc:34] DefaultLogger Half2 support requested on hardware without native FP16 support, performance will be negatively affected.
    2019-02-04 09:26:59.666888: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine my_trt_op_0 creation for segment 0, composed of 2 nodes succeeded.
    2019-02-04 09:26:59.667564: W tensorflow/contrib/tensorrt/log/trt_logger.cc:34] DefaultLogger Half2 support requested on hardware without native FP16 support, performance will be negatively affected.
    2019-02-04 09:26:59.713881: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine my_trt_op_1 creation for segment 1, composed of 3 nodes succeeded.
    2019-02-04 09:27:04.478200: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
    2019-02-04 09:27:04.480121: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
    2019-02-04 09:27:04.485573: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
    2019-02-04 09:27:04.486869: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
    2019-02-04 09:27:04.488511: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:501] Optimization results for grappler item: tf_graph
    2019-02-04 09:27:04.488607: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   constant folding: Graph size after: 6035 nodes (-1940), 10082 edges (-2174), time = 2862.97095ms.
    2019-02-04 09:27:04.488643: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   layout: Graph size after: 6226 nodes (191), 10284 edges (202), time = 756.717ms.
    2019-02-04 09:27:04.488695: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   TensorRTOptimizer: Graph size after: 6223 nodes (-3), 10281 edges (-3), time = 5475.65576ms.
    2019-02-04 09:27:04.488731: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   constant folding: Graph size after: 6053 nodes (-170), 10111 edges (-170), time = 1619.82605ms.
    2019-02-04 09:27:04.488765: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   TensorRTOptimizer: Graph size after: 6053 nodes (0), 10111 edges (0), time = 2764.40894ms.
    2019-02-04 09:27:04.488797: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:501] Optimization results for grappler item: my_trt_op_0_native_segment
    2019-02-04 09:27:04.488830: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   constant folding: Graph size after: 8 nodes (0), 7 edges (0), time = 2.048ms.
    2019-02-04 09:27:04.488862: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   layout: Graph size after: 8 nodes (0), 7 edges (0), time = 0.914ms.
    2019-02-04 09:27:04.488905: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   TensorRTOptimizer: Graph size after: 8 nodes (0), 7 edges (0), time = 0.26ms.
    2019-02-04 09:27:04.488968: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   constant folding: Graph size after: 8 nodes (0), 7 edges (0), time = 1.603ms.
    2019-02-04 09:27:04.489003: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   TensorRTOptimizer: Graph size after: 8 nodes (0), 7 edges (0), time = 0.269ms.
    2019-02-04 09:27:04.489036: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:501] Optimization results for grappler item: my_trt_op_1_native_segment
    2019-02-04 09:27:04.489069: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   constant folding: Graph size after: 9 nodes (0), 8 edges (0), time = 2.194ms.
    2019-02-04 09:27:04.489101: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   layout: Graph size after: 9 nodes (0), 8 edges (0), time = 0.962ms.
    2019-02-04 09:27:04.489135: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   TensorRTOptimizer: Graph size after: 9 nodes (0), 8 edges (0), time = 0.244ms.
    2019-02-04 09:27:04.489167: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   constant folding: Graph size after: 9 nodes (0), 8 edges (0), time = 1.006ms.
    2019-02-04 09:27:04.489200: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   TensorRTOptimizer: Graph size after: 9 nodes (0), 8 edges (0), time = 0.227ms.
    ------------- Write optimized model to the file -------------
    ------------- DONE! -------------
    

    Hello,

    I’m not seeing the performance issue described above:

    root@67ad5eeeaa9d:/home/scratch.zhenyi_sw/repro2490943/trt_test# python load.py
    ------------- Load the TF graph from the pre-build pb file: ./ssd_mobilenet_v2_coco.pb -------------
    ------------- Load time: 0.17 sec
    ------------- Load the TF graph from the pre-build pb file: ./ssd_mobilenet_v2_coco_fp16_trt.pb -------------
    ------------- Load time: 0.26 sec
    
    root@67ad5eeeaa9d:/home/scratch.zhenyi_sw/repro2490943/trt_test# python build.py
    ------------- Load frozen graph from disk -------------
    ------------- Optimize the model with TensorRT -------------
    2019-02-04 17:34:11.776013: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 8
    2019-02-04 17:34:11.776257: I tensorflow/core/grappler/clusters/single_machine.cc:359] Starting new session
    2019-02-04 17:34:11.793709: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
    name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
    pciBusID: 0000:06:00.0
    totalMemory: 31.72GiB freeMemory: 31.31GiB
    2019-02-04 17:34:11.794382: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 1 with properties:
    name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
    pciBusID: 0000:07:00.0
    totalMemory: 31.72GiB freeMemory: 31.31GiB
    2019-02-04 17:34:11.795045: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 2 with properties:
    name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
    pciBusID: 0000:0a:00.0
    totalMemory: 31.72GiB freeMemory: 31.31GiB
    2019-02-04 17:34:11.795682: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 3 with properties:
    name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
    pciBusID: 0000:0b:00.0
    totalMemory: 31.72GiB freeMemory: 31.31GiB
    2019-02-04 17:34:11.796312: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 4 with properties:
    name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
    pciBusID: 0000:85:00.0
    totalMemory: 31.72GiB freeMemory: 31.31GiB
    2019-02-04 17:34:11.796945: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 5 with properties:
    name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
    pciBusID: 0000:86:00.0
    totalMemory: 31.72GiB freeMemory: 31.31GiB
    2019-02-04 17:34:11.797599: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 6 with properties:
    name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
    pciBusID: 0000:89:00.0
    totalMemory: 31.72GiB freeMemory: 31.31GiB
    2019-02-04 17:34:11.798238: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 7 with properties:
    name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
    pciBusID: 0000:8a:00.0
    totalMemory: 31.72GiB freeMemory: 31.31GiB
    2019-02-04 17:34:11.798534: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0, 1, 2, 3, 4, 5, 6, 7
    2019-02-04 17:34:15.868392: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
    2019-02-04 17:34:15.868455: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 1 2 3 4 5 6 7
    2019-02-04 17:34:15.868466: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N Y Y Y Y N N N
    2019-02-04 17:34:15.868473: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 1:   Y N Y Y N Y N N
    2019-02-04 17:34:15.868484: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 2:   Y Y N Y N N Y N
    2019-02-04 17:34:15.868521: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 3:   Y Y Y N N N N Y
    2019-02-04 17:34:15.868529: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 4:   Y N N N N Y Y Y
    2019-02-04 17:34:15.868535: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 5:   N Y N N Y N Y Y
    2019-02-04 17:34:15.868558: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 6:   N N Y N Y Y N Y
    2019-02-04 17:34:15.868565: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 7:   N N N Y Y Y Y N
    2019-02-04 17:34:15.871958: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 30342 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:06:00.0, compute capability: 7.0)
    2019-02-04 17:34:15.872761: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 30342 MB memory) -> physical GPU (device: 1, name: Tesla V100-SXM2-32GB, pci bus id: 0000:07:00.0, compute capability: 7.0)
    2019-02-04 17:34:15.873333: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 30342 MB memory) -> physical GPU (device: 2, name: Tesla V100-SXM2-32GB, pci bus id: 0000:0a:00.0, compute capability: 7.0)
    2019-02-04 17:34:15.873934: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 30342 MB memory) -> physical GPU (device: 3, name: Tesla V100-SXM2-32GB, pci bus id: 0000:0b:00.0, compute capability: 7.0)
    2019-02-04 17:34:15.874547: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:4 with 30342 MB memory) -> physical GPU (device: 4, name: Tesla V100-SXM2-32GB, pci bus id: 0000:85:00.0, compute capability: 7.0)
    2019-02-04 17:34:15.875042: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:5 with 30342 MB memory) -> physical GPU (device: 5, name: Tesla V100-SXM2-32GB, pci bus id: 0000:86:00.0, compute capability: 7.0)
    2019-02-04 17:34:15.875513: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:6 with 30342 MB memory) -> physical GPU (device: 6, name: Tesla V100-SXM2-32GB, pci bus id: 0000:89:00.0, compute capability: 7.0)
    2019-02-04 17:34:15.875966: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:7 with 30342 MB memory) -> physical GPU (device: 7, name: Tesla V100-SXM2-32GB, pci bus id: 0000:8a:00.0, compute capability: 7.0)
    2019-02-04 17:34:17.998323: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:868] MULTIPLE tensorrt candidate conversion: 2
    2019-02-04 17:34:18.001120: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3058] Segment @scope '', converted to graph
    2019-02-04 17:34:18.001139: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:433] Can't find a device placement for the op!
    2019-02-04 17:34:18.007434: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3058] Segment @scope '', converted to graph
    2019-02-04 17:34:18.007455: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:433] Can't find a device placement for the op!
    2019-02-04 17:34:20.488707: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:967] Engine my_trt_op_0 creation for segment 0, composed of 2 nodes succeeded.
    2019-02-04 17:34:20.548541: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:967] Engine my_trt_op_1 creation for segment 1, composed of 3 nodes succeeded.
    2019-02-04 17:34:21.991633: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
    2019-02-04 17:34:21.992225: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
    2019-02-04 17:34:22.000877: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
    2019-02-04 17:34:22.001457: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
    2019-02-04 17:34:22.006006: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:501] Optimization results for grappler item: tf_graph
    2019-02-04 17:34:22.006029: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   constant folding: Graph size after: 6035 nodes (-1940), 10082 edges (-2174), time = 832.385ms.
    2019-02-04 17:34:22.006037: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   layout: Graph size after: 6226 nodes (191), 10284 edges (202), time = 233.816ms.
    2019-02-04 17:34:22.006044: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   TensorRTOptimizer: Graph size after: 6223 nodes (-3), 10281 edges (-3), time = 3354.4ms.
    2019-02-04 17:34:22.006050: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   constant folding: Graph size after: 6053 nodes (-170), 10111 edges (-170), time = 562.532ms.
    2019-02-04 17:34:22.006058: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   TensorRTOptimizer: Graph size after: 6053 nodes (0), 10111 edges (0), time = 774.5ms.
    2019-02-04 17:34:22.006092: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:501] Optimization results for grappler item: my_trt_op_0_native_segment
    2019-02-04 17:34:22.006099: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   constant folding: Graph size after: 8 nodes (0), 7 edges (0), time = 3.896ms.
    2019-02-04 17:34:22.006120: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   layout: Graph size after: 8 nodes (0), 7 edges (0), time = 0.283ms.
    2019-02-04 17:34:22.006127: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   TensorRTOptimizer: Graph size after: 8 nodes (0), 7 edges (0), time = 0.094ms.
    2019-02-04 17:34:22.006134: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   constant folding: Graph size after: 8 nodes (0), 7 edges (0), time = 0.461ms.
    2019-02-04 17:34:22.006141: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   TensorRTOptimizer: Graph size after: 8 nodes (0), 7 edges (0), time = 0.07ms.
    2019-02-04 17:34:22.006164: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:501] Optimization results for grappler item: my_trt_op_1_native_segment
    2019-02-04 17:34:22.006171: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   constant folding: Graph size after: 9 nodes (0), 8 edges (0), time = 3.521ms.
    2019-02-04 17:34:22.006178: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   layout: Graph size after: 9 nodes (0), 8 edges (0), time = 0.284ms.
    2019-02-04 17:34:22.006185: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   TensorRTOptimizer: Graph size after: 9 nodes (0), 8 edges (0), time = 0.077ms.
    2019-02-04 17:34:22.006199: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   constant folding: Graph size after: 9 nodes (0), 8 edges (0), time = 0.495ms.
    2019-02-04 17:34:22.006207: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   TensorRTOptimizer: Graph size after: 9 nodes (0), 8 edges (0), time = 0.07ms.
    ------------- Write optimized model to the file -------------
    ------------- DONE! -------------
    

    Well, apart from the fact that you are not on a Drive PX 2 (which is probably irrelevant to this problem): which TensorFlow version do you have? Which versions of protobuf and libprotobuf? Do you have python-protobuf with the cpp implementation enabled? I suspect protobuf is the cause of the problem.

    Hello,

    I’m using TensorFlow 1.12.0.

    root@0760a6daacdc:/home/scratch.zhenyi_sw/repro2490943# pip show protobuf
    Name: protobuf
    Version: 3.6.1
    Summary: Protocol Buffers
    Home-page: https://developers.google.com/protocol-buffers/
    Author: None
    Author-email: None
    License: 3-Clause BSD License
    Location: /usr/local/lib/python3.5/dist-packages
    Requires: six, setuptools
    Required-by: tensorflow-gpu, tensorboard
    

    I was right, it's a protobuf issue. As far as I understand the problem, the DPX2 ships protobuf 2.6.1 with PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION set to python. I compiled and installed libprotobuf and python-protobuf 3.6.1 from sources with --cpp_implementation, and now the models load in a fraction of a second (both the original and the TensorRT-optimized ones).
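
    By the way, a quick way to check which backend your Python protobuf is using (note that api_implementation is a protobuf-internal module, so treat this as a debugging aid rather than a stable API):

    # Prints 'python' for the slow pure-Python parser, 'cpp' for the C++-backed one.
    # ParseFromString on a ~67 MB GraphDef is dramatically slower with 'python'.
    from google.protobuf.internal import api_implementation
    print(api_implementation.Type())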

    However, this breaks my OpenCV installation (and I have an app which uses both TensorFlow and OpenCV). Apparently OpenCV's GTK support is linked against libmirclient, which is built with protobuf 2.6.1. When I tried using the OpenCV build made before switching protobuf, it simply segfaulted after 'import cv2'. So I tried rebuilding OpenCV with only protobuf 3.6.1 available; that failed due to missing dependencies for libmirclient. When I put back libprotobuf-lite.so.9.0.1 (so the one from v2.6.1), OpenCV builds fine but fails at runtime with:

    >>> import cv2
    [libprotobuf FATAL google/protobuf/stubs/common.cc:79] This program was compiled against version 2.6.1 of the Protocol Buffer runtime library, which is not compatible with the installed version (3.6.1).  Contact the program author for an update.  If you compiled the program yourself, make sure that your headers are from the same version of Protocol Buffers as your link-time library.  (Version verification failed in "/build/mir-k3D1Zt/mir-0.26.3+16.04.20170605/obj-aarch64-linux-gnu/src/protobuf/mir_protobuf.pb.cc".)
    terminate called after throwing an instance of 'google::protobuf::FatalException'
      what():  This program was compiled against version 2.6.1 of the Protocol Buffer runtime library, which is not compatible with the installed version (3.6.1).  Contact the program author for an update.  If you compiled the program yourself, make sure that your headers are from the same version of Protocol Buffers as your link-time library.  (Version verification failed in "/build/mir-k3D1Zt/mir-0.26.3+16.04.20170605/obj-aarch64-linux-gnu/src/protobuf/mir_protobuf.pb.cc".)
    Aborted (core dumped)
    

    So now I'm hit by the same problem as described at
    https://stackoverflow.com/questions/43236034/opencv-3-2-includes-libmir-and-protobuf-2-6-which-is-conflicting-with-protobuf

    I’ll check if turning off GTK support in OpenCV and switching to Qt instead solves this problem for me…

    I guess there’s no simple way to upgrade Mir (libmirclient) on DPX2 to a version built with protobuf 3.6.1?

    Will triage and keep you updated.

    OK, I think I got it sorted out. I left protobuf 2.6.1 almost untouched: I just installed 3.6.1 next to it and set the symlinks so that 3.6.1 is the default. I rebuilt OpenCV with the following options:

    -D WITH_PROTOBUF=OFF \
    -D BUILD_PROTOBUF=OFF \
    -D PROTOBUF_UPDATE_FILES=OFF \
    

    and everything seems fine. After:

    export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp
    

    both models load in a fraction of a second.

    Side question: is the way I build TensorRT models described above in build.py (so with the TF-TRT API) the correct one, or should I rather go through UFF? So far I see no improvement in inference time compared to the original models; I have tried

    • ssd_inception_v2_coco
    • ssd_mobilenet_v2_coco
    • mask_rcnn_inception_v2_coco
    • faster_rcnn_resnet50_coco

    from https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md.
    Looking at https://github.com/NVIDIA-AI-IOT/tf_trt_models#models-1 I should see ~2x faster inference with ssd_inception_v2_coco at least, but I see no improvement whatsoever.
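
    One data point: the build log above shows only two tiny engines were created (2 and 3 nodes each), so almost the whole model still runs in plain TF, which alone would explain the lack of speedup. How much was actually converted can be counted directly from the GraphDef:

    # Count TRTEngineOp nodes in a converted graph; if only a couple of tiny
    # segments were converted, most of the model still runs in plain TF.
    import tensorflow as tf

    graph_def = tf.GraphDef()
    with tf.gfile.GFile('./ssd_mobilenet_v2_coco_fp16_trt.pb', 'rb') as f:
        graph_def.ParseFromString(f.read())

    trt_ops = [n.name for n in graph_def.node if n.op == 'TRTEngineOp']
    print('{} TRTEngineOp node(s): {}'.format(len(trt_ops), trt_ops))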

    For reference, since others have already asked about the steps I took to update protobuf, here's the full description:

    # Check current version
    $ protoc --version
    libprotoc 2.6.1
     
    # Create a backup of the current config, just in case
    mkdir protobuf
    cd protobuf/
    mkdir backup_originals
    mkdir backup_originals/protoc
    cp /usr/bin/protoc backup_originals/protoc/
    tar cvzf backup_originals/libprotobuf.tgz /usr/lib/aarch64-linux-gnu/libprotobuf*
    # Original include files located at: /usr/include/google/protobuf/
    # I did not back them up
     
    # Original configuration of the libraries
    $ ls -l /usr/lib/aarch64-linux-gnu/libprotobuf*
    -rw-r--r-- 1 root root 2464506 Oct 24  2015 /usr/lib/aarch64-linux-gnu/libprotobuf.a
    -rw-r--r-- 1 root root  430372 Oct 24  2015 /usr/lib/aarch64-linux-gnu/libprotobuf-lite.a
    lrwxrwxrwx 1 root root      25 Oct 24  2015 /usr/lib/aarch64-linux-gnu/libprotobuf-lite.so -> libprotobuf-lite.so.9.0.1
    lrwxrwxrwx 1 root root      25 Oct 24  2015 /usr/lib/aarch64-linux-gnu/libprotobuf-lite.so.9 -> libprotobuf-lite.so.9.0.1
    -rw-r--r-- 1 root root  199096 Oct 24  2015 /usr/lib/aarch64-linux-gnu/libprotobuf-lite.so.9.0.1
    lrwxrwxrwx 1 root root      20 Oct 24  2015 /usr/lib/aarch64-linux-gnu/libprotobuf.so -> libprotobuf.so.9.0.1
    lrwxrwxrwx 1 root root      20 Oct 24  2015 /usr/lib/aarch64-linux-gnu/libprotobuf.so.9 -> libprotobuf.so.9.0.1
    -rw-r--r-- 1 root root 1153872 Oct 24  2015 /usr/lib/aarch64-linux-gnu/libprotobuf.so.9.0.1
     
    # Fetch and unpack the sources of version 3.6.1
    wget https://github.com/protocolbuffers/protobuf/releases/download/v3.6.1/protobuf-python-3.6.1.zip
    wget https://github.com/protocolbuffers/protobuf/releases/download/v3.6.1/protoc-3.6.1-linux-aarch_64.zip
    unzip protoc-3.6.1-linux-aarch_64.zip -d protoc-3.6.1
    unzip protobuf-python-3.6.1.zip
     
    # Update the protoc
    sudo cp protoc-3.6.1/bin/protoc /usr/bin/protoc
     
    $ protoc --version
    libprotoc 3.6.1
     
    # BUILD AND INSTALL THE LIBRARIES
    export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp
    cd protobuf-3.6.1/
    ./autogen.sh
    ./configure
    make
    make check
    sudo make install
     
    # Remove unnecessary links to the old version
    sudo rm /usr/lib/aarch64-linux-gnu/libprotobuf.a
    sudo rm /usr/lib/aarch64-linux-gnu/libprotobuf-lite.a
    sudo rm /usr/lib/aarch64-linux-gnu/libprotobuf-lite.so
    sudo rm /usr/lib/aarch64-linux-gnu/libprotobuf.so
     
    # Move old version of the libraries to the same folder where the new ones have been installed, for clarity
    sudo cp -d /usr/lib/aarch64-linux-gnu/libproto* /usr/local/lib/
    sudo rm /usr/lib/aarch64-linux-gnu/libproto*
    
    sudo ldconfig # Refresh shared library cache   
    
    # Check the updated version
    $ protoc --version
    libprotoc 3.6.1
    
    # Final configuration of the libraries after the update
    $ ls -l /usr/local/lib/libproto*
    -rw-r--r-- 1 root root 77064022 Feb  9 11:07 /usr/local/lib/libprotobuf.a
    -rwxr-xr-x 1 root root      978 Feb  9 11:07 /usr/local/lib/libprotobuf.la
    -rw-r--r-- 1 root root  9396522 Feb  9 11:07 /usr/local/lib/libprotobuf-lite.a
    -rwxr-xr-x 1 root root     1013 Feb  9 11:07 /usr/local/lib/libprotobuf-lite.la
    lrwxrwxrwx 1 root root       26 Feb  9 11:07 /usr/local/lib/libprotobuf-lite.so -> libprotobuf-lite.so.17.0.0
    lrwxrwxrwx 1 root root       26 Feb  9 11:07 /usr/local/lib/libprotobuf-lite.so.17 -> libprotobuf-lite.so.17.0.0
    -rwxr-xr-x 1 root root  3722376 Feb  9 11:07 /usr/local/lib/libprotobuf-lite.so.17.0.0
    lrwxrwxrwx 1 root root       25 Feb  9 11:19 /usr/local/lib/libprotobuf-lite.so.9 -> libprotobuf-lite.so.9.0.1
    -rw-r--r-- 1 root root   199096 Feb  9 11:19 /usr/local/lib/libprotobuf-lite.so.9.0.1
    lrwxrwxrwx 1 root root       21 Feb  9 11:07 /usr/local/lib/libprotobuf.so -> libprotobuf.so.17.0.0
    lrwxrwxrwx 1 root root       21 Feb  9 11:07 /usr/local/lib/libprotobuf.so.17 -> libprotobuf.so.17.0.0
    -rwxr-xr-x 1 root root 30029352 Feb  9 11:07 /usr/local/lib/libprotobuf.so.17.0.0
    lrwxrwxrwx 1 root root       20 Feb  9 11:19 /usr/local/lib/libprotobuf.so.9 -> libprotobuf.so.9.0.1
    -rw-r--r-- 1 root root  1153872 Feb  9 11:19 /usr/local/lib/libprotobuf.so.9.0.1
    -rw-r--r-- 1 root root 99883696 Feb  9 11:07 /usr/local/lib/libprotoc.a
    -rwxr-xr-x 1 root root      994 Feb  9 11:07 /usr/local/lib/libprotoc.la
    lrwxrwxrwx 1 root root       19 Feb  9 11:07 /usr/local/lib/libprotoc.so -> libprotoc.so.17.0.0
    lrwxrwxrwx 1 root root       19 Feb  9 11:07 /usr/local/lib/libprotoc.so.17 -> libprotoc.so.17.0.0
    -rwxr-xr-x 1 root root 32645760 Feb  9 11:07 /usr/local/lib/libprotoc.so.17.0.0
    lrwxrwxrwx 1 root root       18 Feb  9 11:19 /usr/local/lib/libprotoc.so.9 -> libprotoc.so.9.0.1
    -rw-r--r-- 1 root root   991440 Feb  9 11:19 /usr/local/lib/libprotoc.so.9.0.1
     
    # Reboot, just in case :)
    sudo reboot
     
    # BUILD AND INSTALL THE PYTHON-PROTOBUF MODULE
    cd protobuf-3.6.1/python/
    export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp
    
    # Fix setup.py to force compilation with the C++11 standard
    vim setup.py
     
    $ diff setup.py setup.py~
    205,208c205,208
    <     #if v:
    <     #  extra_compile_args.append('-std=c++11')
    <     #elif os.getenv('KOKORO_BUILD_NUMBER') or os.getenv('KOKORO_BUILD_ID'):
    <     extra_compile_args.append('-std=c++11')
    ---
    >     if v:
    >       extra_compile_args.append('-std=c++11')
    >     elif os.getenv('KOKORO_BUILD_NUMBER') or os.getenv('KOKORO_BUILD_ID'):
    >       extra_compile_args.append('-std=c++11')
     
    # Build, test and install
    python3 setup.py build --cpp_implementation
    python3 setup.py test --cpp_implementation
    sudo python3 setup.py install --cpp_implementation
     
    # Make the cpp backend the default when a user logs in
    sudo sh -c "echo 'export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp' >> /etc/profile.d/protobuf.sh"
    
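
    As a final sanity check, both the module version and the active backend can be verified from Python:

    # Expect '3.6.1' and 'cpp' here; seeing 'python' would explain
    # minutes-long ParseFromString calls on large GraphDefs.
    import google.protobuf
    from google.protobuf.internal import api_implementation
    print(google.protobuf.__version__)
    print(api_implementation.Type())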

    I found that this update tends to break pip, so I simply reinstalled it with:

    wget http://se.archive.ubuntu.com/ubuntu/pool/universe/p/python-pip/python3-pip_9.0.1-2_all.deb
    wget http://se.archive.ubuntu.com/ubuntu/pool/universe/p/python-pip/python-pip-whl_9.0.1-2_all.deb
    sudo dpkg -i *.deb
    

    I'm facing this issue while building OpenCV with the options:

    -D WITH_PROTOBUF=OFF \
    -D BUILD_PROTOBUF=OFF \
    -D PROTOBUF_UPDATE_FILES=OFF

    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h: In function ‘PyObject* pyopencv_cv_dnn_dnn_Net_deleteLayer(PyObject*, PyObject*, PyObject*)’:
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:16420:5: error: ‘LayerId’ was not declared in this scope
    LayerId layer;
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:16424:34: error: ‘layer’ was not declared in this scope
    pyopencv_to(pyobj_layer, layer, ArgInfo("layer", 0)) )
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h: In function ‘PyObject* pyopencv_cv_dnn_dnn_Net_getFLOPS(PyObject*, PyObject*, PyObject*)’:
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:16603:5: error: ‘vector_MatShape’ was not declared in this scope
    vector_MatShape netInputShapes;
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:16608:43: error: ‘netInputShapes’ was not declared in this scope
    pyopencv_to(pyobj_netInputShapes, netInputShapes, ArgInfo("netInputShapes", 0)) )
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:16634:5: error: ‘vector_MatShape’ was not declared in this scope
    vector_MatShape netInputShapes;
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:16639:43: error: ‘netInputShapes’ was not declared in this scope
    pyopencv_to(pyobj_netInputShapes, netInputShapes, ArgInfo("netInputShapes", 0)) )
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h: In function ‘PyObject* pyopencv_cv_dnn_dnn_Net_getLayer(PyObject*, PyObject*, PyObject*)’:
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:16675:5: error: ‘LayerId’ was not declared in this scope
    LayerId layerId;
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:16680:36: error: ‘layerId’ was not declared in this scope
    pyopencv_to(pyobj_layerId, layerId, ArgInfo("layerId", 0)) )
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h: In function ‘PyObject* pyopencv_cv_dnn_dnn_Net_getLayersShapes(PyObject*, PyObject*, PyObject*)’:
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:16788:5: error: ‘vector_MatShape’ was not declared in this scope
    vector_MatShape netInputShapes;
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:16790:5: error: ‘vector_vector_MatShape’ was not declared in this scope
    vector_vector_MatShape inLayersShapes;
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:16791:28: error: expected ‘;’ before ‘outLayersShapes’
    vector_vector_MatShape outLayersShapes;
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:16795:43: error: ‘netInputShapes’ was not declared in this scope
    pyopencv_to(pyobj_netInputShapes, netInputShapes, ArgInfo("netInputShapes", 0)) )
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:16797:69: error: ‘inLayersShapes’ was not declared in this scope
    ERRWRAP2(self->getLayersShapes(netInputShapes, layersIds, inLayersShapes, outLayersShapes));
    ^
    /home/nvidia/opencv/modules/python/src2/cv2.cpp:87:5: note: in definition of macro ‘ERRWRAP2’
    expr;
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:16797:85: error: ‘outLayersShapes’ was not declared in this scope
    ERRWRAP2(self->getLayersShapes(netInputShapes, layersIds, inLayersShapes, outLayersShapes));
    ^
    /home/nvidia/opencv/modules/python/src2/cv2.cpp:87:5: note: in definition of macro ‘ERRWRAP2’
    expr;
    ^
    In file included from /home/nvidia/opencv/modules/python/src2/cv2.cpp:1681:0:
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:16798:79: error: ‘inLayersShapes’ was not declared in this scope
    return Py_BuildValue("(NNN)", pyopencv_from(layersIds), pyopencv_from(inLayersShapes), pyopencv_from(outLayersShapes));
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:16798:110: error: ‘outLayersShapes’ was not declared in this scope
    return Py_BuildValue("(NNN)", pyopencv_from(layersIds), pyopencv_from(inLayersShapes), pyopencv_from(outLayersShapes));
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:16807:5: error: ‘vector_vector_MatShape’ was not declared in this scope
    vector_vector_MatShape inLayersShapes;
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:16808:28: error: expected ‘;’ before ‘outLayersShapes’
    vector_vector_MatShape outLayersShapes;
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:16814:68: error: ‘inLayersShapes’ was not declared in this scope
    ERRWRAP2(self->getLayersShapes(netInputShape, layersIds, inLayersShapes, outLayersShapes));
    ^
    /home/nvidia/opencv/modules/python/src2/cv2.cpp:87:5: note: in definition of macro ‘ERRWRAP2’
    expr;
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:16814:84: error: ‘outLayersShapes’ was not declared in this scope
    ERRWRAP2(self->getLayersShapes(netInputShape, layersIds, inLayersShapes, outLayersShapes));
    ^
    /home/nvidia/opencv/modules/python/src2/cv2.cpp:87:5: note: in definition of macro ‘ERRWRAP2’
    expr;
    ^
    In file included from /home/nvidia/opencv/modules/python/src2/cv2.cpp:1681:0:
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:16815:79: error: ‘inLayersShapes’ was not declared in this scope
    return Py_BuildValue("(NNN)", pyopencv_from(layersIds), pyopencv_from(inLayersShapes), pyopencv_from(outLayersShapes));
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:16815:110: error: ‘outLayersShapes’ was not declared in this scope
    return Py_BuildValue("(NNN)", pyopencv_from(layersIds), pyopencv_from(inLayersShapes), pyopencv_from(outLayersShapes));
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h: In function ‘PyObject* pyopencv_cv_dnn_dnn_Net_getMemoryConsumption(PyObject*, PyObject*, PyObject*)’:
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:16850:5: error: ‘vector_MatShape’ was not declared in this scope
    vector_MatShape netInputShapes;
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:16856:43: error: ‘netInputShapes’ was not declared in this scope
    pyopencv_to(pyobj_netInputShapes, netInputShapes, ArgInfo("netInputShapes", 0)) )
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h: In function ‘PyObject* pyopencv_cv_dnn_dnn_Net_getParam(PyObject*, PyObject*, PyObject*)’:
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:16893:5: error: ‘LayerId’ was not declared in this scope
    LayerId layer;
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:16899:34: error: ‘layer’ was not declared in this scope
    pyopencv_to(pyobj_layer, layer, ArgInfo("layer", 0)) )
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h: In function ‘PyObject* pyopencv_cv_dnn_dnn_Net_setParam(PyObject*, PyObject*, PyObject*)’:
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:17051:5: error: ‘LayerId’ was not declared in this scope
    LayerId layer;
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:17058:34: error: ‘layer’ was not declared in this scope
    pyopencv_to(pyobj_layer, layer, ArgInfo("layer", 0)) &&
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:17069:5: error: ‘LayerId’ was not declared in this scope
    LayerId layer;
    ^
    /home/nvidia/opencv/build/modules/python_bindings_generator/pyopencv_generated_types.h:17076:34: error: ‘layer’ was not declared in this scope
    pyopencv_to(pyobj_layer, layer, ArgInfo("layer", 0)) &&
    ^
    modules/python2/CMakeFiles/opencv_python2.dir/build.make:62: recipe for target 'modules/python2/CMakeFiles/opencv_python2.dir/__/src2/cv2.cpp.o' failed
    make[2]: *** [modules/python2/CMakeFiles/opencv_python2.dir/__/src2/cv2.cpp.o] Error 1
    CMakeFiles/Makefile2:11608: recipe for target 'modules/python2/CMakeFiles/opencv_python2.dir/all' failed
    make[1]: *** [modules/python2/CMakeFiles/opencv_python2.dir/all] Error 2
    Makefile:160: recipe for target 'all' failed
    make: *** [all] Error 2
    Make did not successfully build
    Please fix issues and retry build

    I am also seeing long load times.
    I am running JetPack 4.2 on both a TX2 and an Xavier; the host is an x86 with a 2070 GPU.
    On the x86 I took an inception_v3 graph and fine-tuned it with the flowers photos.
    The resulting frozen graph is then copied to the TX2 and Xavier.
    I learned that a TensorRT graph built on one does not work on the other.

    Building or reloading an existing TRT graph on the Xavier takes about 2 minutes, on the TX2 about 20.
    Each was built using the sdkmanger.
    Each has a 16GB swap space.
    For the TX2 it is on a 128GB sd card.
    For the Xavier it is on a 1000GB ssd memory stick.

    Is OpenCV really the solution?
    Is it installed when the sdkmanager builds the TX2 and Xavier?
    What is the proper way to install it if it is not?

    After my previous post I decided there were too many variables.
    Here is a sample anyone can reproduce.

    Get this package:
    https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/tensorrt

    I modified the script for debug purposes as follows:

    diff tftrt_sample.py tftrt_sample.py.org 
    92d91
    <   print( datetime.datetime.now(), " getResnet50" )
    111d109
    <   print( datetime.datetime.now(), " getFP32" )
    121d118
    <   print( datetime.datetime.now(), " getFP16" )
    146c143
    <   print(datetime.datetime.now(), "Starting execution")
    ---
    >   tf.logging.info("Starting execution")
    172c169
    <     print(datetime.datetime.now(), " Starting Warmup cycle")
    ---
    >     tf.logging.info("Starting Warmup cycle")
    203c200
    <     print(datetime.datetime.now(), "Warmup done. Starting real timing")
    ---
    >     tf.logging.info("Warmup done. Starting real timing")
    267,268c264
    <   print(datetime.datetime.now(), " Starting")
    < 
    ---
    >   print("Starting at",datetime.datetime.now())
    

    I also removed the --INT8 option from run_all.sh

    After $ ./run_all.sh > TX2.log

    The log contains:

    Namespace(FP16=True, FP32=True, INT8=False, batch_size=4, dump_diff=False, native=True, num_loops=10, topN=5, update_graphdef=False, with_timeline=False, workspace_size=2048)
    2019-04-13 23:26:59.405493  Starting
    2019-04-13 23:27:06.691047  getResnet50
    2019-04-13 23:27:08.879296 Starting execution
    2019-04-13 23:27:12.120849  Starting Warmup cycle
    2019-04-13 23:27:38.190371 Warmup done. Starting real timing
    iter  0   0.1170225191116333
    iter  1   0.11706938743591308
    iter  2   0.11715104579925537
    iter  3   0.11713536262512207
    iter  4   0.11703117370605469
    iter  5   0.11687781810760497
    iter  6   0.11692732810974121
    iter  7   0.11688094139099121
    iter  8   0.11711055755615235
    iter  9   0.11685168743133545
    Comparison= True
    images/s : 34.2 +/- 0.0, s/batch: 0.11701 +/- 0.00011
    RES, Native, 4, 34.19, 0.03, 0.11701, 0.00011
    2019-04-13 23:28:39.120928  getFP32
    2019-04-13 23:28:39.122388  getResnet50
    2019-04-14 00:05:18.587516 Starting execution
    2019-04-14 00:39:11.500612  Starting Warmup cycle
    2019-04-14 00:39:55.384542 Warmup done. Starting real timing
    iter  0   0.06356308937072754
    iter  1   0.06371050834655761
    iter  2   0.06345504283905029
    iter  3   0.06329115867614746
    iter  4   0.06343845844268799
    iter  5   0.06320501804351807
    iter  6   0.06346035480499268
    iter  7   0.0631892728805542
    iter  8   0.06757570266723632
    iter  9   0.06330945014953614
    Comparison= True
    images/s : 62.7 +/- 1.2, s/batch: 0.06382 +/- 0.00126
    RES, TRT-FP32, 4, 62.68, 1.18, 0.06382, 0.00126
    2019-04-14 00:41:19.378257  getFP16
    2019-04-14 00:41:19.380426  getResnet50
    2019-04-14 00:59:41.581313 Starting execution
    2019-04-14 01:32:10.168278  Starting Warmup cycle
    2019-04-14 01:32:42.214924 Warmup done. Starting real timing
    iter  0   0.03612914085388184
    iter  1   0.03567664623260498
    iter  2   0.03541929721832275
    iter  3   0.03596384525299072
    iter  4   0.03592778205871582
    iter  5   0.035593876838684084
    iter  6   0.0354670524597168
    iter  7   0.03562225341796875
    iter  8   0.03560783863067627
    iter  9   0.035287847518920896
    Comparison= True
    images/s : 112.1 +/- 0.8, s/batch: 0.03567 +/- 0.00025
    RES, TRT-FP16, 4, 112.14, 0.78, 0.03567, 0.00025
    Done timing 2019-04-14 01:33:36.285660
    native ['bow tie, bow-tie, bowtie', 'cornet, horn, trumpet, trump', 'military uniform', 'sweatshirt', 'bulletproof vest']
    FP32 ['bow tie, bow-tie, bowtie', 'cornet, horn, trumpet, trump', 'military uniform', 'sweatshirt', 'bulletproof vest']
    FP16 ['bow tie, bow-tie, bowtie', 'cornet, horn, trumpet, trump', 'military uniform', 'sweatshirt', 'bulletproof vest']
    

    This snippet is the issue: why sooooo looooong?

    2019-04-13 23:28:39.120928  getFP32
    2019-04-13 23:28:39.122388  getResnet50
    2019-04-14 00:05:18.587516 Starting execution
    2019-04-14 00:39:11.500612  Starting Warmup cycle
    2019-04-14 00:39:55.384542 Warmup done. Starting real timing
    

    On Xavier the same script runs much quicker.

    RES, TRT-FP32, 4, 160.46, 0.48, 0.02493, 0.00008
    2019-04-13 23:18:35.111292  getFP16
    2019-04-13 23:18:35.111525  getResnet50
    2019-04-13 23:21:33.759369 Starting execution
    2019-04-13 23:22:10.313379  Starting Warmup cycle
    2019-04-13 23:22:11.314511 Warmup done. Starting real timing
    

    Anyone have any thoughts?

    It's most likely a protobuf issue; please read carefully what I wrote in https://devtalk.nvidia.com/default/topic/1046492/tensorrt/extremely-long-time-to-load-trt-optimized-frozen-tf-graphs/post/5313240/#5313240

    The OpenCV issue was never a cause of the long loading problem, just a side effect of updating protobuf.

    So, I’d suggest you start with:

    export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp
    
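
    If exporting the variable in the shell is inconvenient, the same switch can be made from Python, as long as it happens before google.protobuf (and therefore tensorflow) is imported for the first time:

    # Must run before anything imports google.protobuf, otherwise it is ignored.
    import os
    os.environ['PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION'] = 'cpp'

    import tensorflow as tf  # protobuf now picks the C++ backend, if it is installed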

    And if that doesn’t help - update protobuf with the steps I described at https://devtalk.nvidia.com/default/topic/1046492/tensorrt/extremely-long-time-to-load-trt-optimized-frozen-tf-graphs/post/5315675/#5315675

    Either I botched the TX2 build or NVIDIA builds the TX2 with a different set of software from the Xavier.
    On the Xavier I get:
    $ protoc --version
    libprotoc 3.0.0

    On the TX2 I get:
    $ protoc --version
    -bash: protoc: command not found

    Both systems were built with 4.2 using the sdkmanager

    Thanks for your reply.
    I now have something to try.

    I have the same issue with JetPack 4.2: protoc is not found, and loading the TF-TRT graph also takes a long time (~5 minutes) vs. the original graph (~20 s) on both an Xavier and a TX2i. Updating protobuf via sudo pip3 install protobuf did not fix this.

    Just an update: I tried the steps so kindly provided by dariusz.filipski, and my issue was resolved, both on a TX2i and on an Xavier.

    This is confusing, because I used JetPack as the installer and the NVIDIA-provided installers for TensorFlow on those platforms. Why is NVIDIA using a setup that results in a suboptimal protobuf version being installed? In my case, the graph loading time went down from ~5 min to ~10 s.