TensorFlow TensorRT (TF-TRT) API conversion failed

My code:
#!/usr/bin/python
# -*- coding: UTF-8 -*-

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import argparse
import imghdr
import json
import os
import sys
import time

import numpy as np
import tensorflow as tf
from tensorflow.contrib.saved_model.python.saved_model import reader
import tensorflow.contrib.tensorrt as trt


def get_frozen_graph(graph_file):
    """Read a frozen graph file from disk."""
    with tf.gfile.FastGFile(graph_file, "rb") as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
    return graph_def


def write_graph_to_file(graph_name, graph_def, output_dir):
    """Write a frozen graph file to disk."""
    output_path = os.path.join(output_dir, graph_name)
    with tf.gfile.GFile(output_path, "wb") as f:
        f.write(graph_def.SerializeToString())


def get_trt_graph(graph_name, graph_def, precision_mode, output_dir,
                  output_node, batch_size=1, workspace_size=10<<30):
    trt_graph = trt.create_inference_graph(
        graph_def, [output_node], max_batch_size=batch_size,
        max_workspace_size_bytes=workspace_size,
        precision_mode=precision_mode)
    write_graph_to_file(graph_name, trt_graph, output_dir)


if __name__ == "__main__":
    frozen_graph_def = get_frozen_graph(
        '/home/he/下载/deeplabv3_mnv2_cityscapes_train/frozen_inference_graph.pb')
    get_trt_graph('deeplabv3', frozen_graph_def, "FP32", './',
                  'SemanticPredictions')

I get an error like this:
2018-04-10 18:48:18.252942: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Parameter check failed at: Network.cpp::addInput::287, condition: dims.d[i] > 0
2018-04-10 18:48:18.252957: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:412] subgraph conversion error for subgraph_index:12 due to: "Invalid argument: Failed to create Input layer" SKIPPING...( 19 nodes)
2018-04-10 18:48:18.254062: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Parameter check failed at: Network.cpp::addInput::287, condition: dims.d[i] > 0
2018-04-10 18:48:18.254075: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:412] subgraph conversion error for subgraph_index:13 due to: "Invalid argument: Failed to create Input layer" SKIPPING...( 18 nodes)
2018-04-10 18:48:18.255165: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Parameter check failed at: Network.cpp::addInput::287, condition: dims.d[i] > 0
2018-04-10 18:48:18.255177: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:412] subgraph conversion error for subgraph_index:14 due to: "Invalid argument: Failed to create Input layer" SKIPPING...( 5 nodes)
2018-04-10 18:48:18.257232: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2624] Max batch size= 1 max workspace size= 540395264
2018-04-10 18:48:18.257246: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2630] starting build engine
*** Error in `python': munmap_chunk(): invalid pointer: 0x00007ffe8f46d550 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7fd5270067e5]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x1a8)[0x7fd527013698]
/opt/deep_learn/tensorflow_object/vir/lib/python3.5/site-packages/tensorflow/python/…/libtensorflow_framework.so(_ZNSt10_HashtableISsSsSaISsENSt8__detail9_IdentityESt8equal_toISsESt4hashISsENS1_18_Mod_range_hashingENS1_20_Default_ranged_hashENS1_20_Prime_rehash_policyENS1_17_Hashtable_traitsILb1ELb1ELb1EEEE21_M_insert_unique_nodeEmmPNS1_10_Hash_nodeISsLb1EEE+0xfc)[0x7fd4ea6dfa0c]
/usr/local/cuda-9.0/lib64/libnvinfer.so.4(_ZNSt10_HashtableISsSsSaISsENSt8__detail9_IdentityESt8equal_toISsESt4hashISsENS1_18_Mod_range_hashingENS1_20_Default_ranged_hashENS1_20_Prime_rehash_policyENS1_17_Hashtable_traitsILb1ELb1ELb1EEEE9_M_insertIRKSsNS1_10_AllocNodeISaINS1_10_Hash_nodeISsLb1EEEEEEEESt4pairINS1_14_Node_iteratorISsLb1ELb1EEEbEOT_RKT0_St17integral_constantIbLb1EE+0x96)[0x7fd4a4b2ea26]
/usr/local/cuda-9.0/lib64/libnvinfer.so.4(_ZNK8nvinfer17Network8validateERKNS_5cudnn15HardwareContextEbbi+0x1a6)[0x7fd4a4b1cb36]
/usr/local/cuda-9.0/lib64/libnvinfer.so.4(_ZN8nvinfer17builder11buildEngineERNS_21CudaEngineBuildConfigERKNS_5cudnn15HardwareContextERKNS_7NetworkE+0x46)[0x7fd4a4b09156]
/usr/local/cuda-9.0/lib64/libnvinfer.so.4(_ZN8nvinfer17Builder15buildCudaEngineERNS_18INetworkDefinitionE+0x11)[0x7fd4a4af3e81]
/opt/deep_learn/tensorflow_object/vir/lib/python3.5/site-packages/tensorflow/contrib/tensorrt/_wrap_conversion.so(_ZN10tensorflow8tensorrt7convert32ConvertSubGraphToTensorRTNodeDefERNS1_14SubGraphParamsE+0x2020)[0x7fd4a43a9d90]
/opt/deep_learn/tensorflow_object/vir/lib/python3.5/site-packages/tensorflow/contrib/tensorrt/_wrap_conversion.so(_ZN10tensorflow8tensorrt7convert25ConvertGraphDefToTensorRTERKNS_8GraphDefERKSt6vectorISsSaISsEEmmPS2_ii+0x200b)[0x7fd4a438988b]
/opt/deep_learn/tensorflow_object/vir/lib/python3.5/site-packages/tensorflow/contrib/tensorrt/_wrap_conversion.so(+0x4de8f)[0x7fd4a4380e8f]
/opt/deep_learn/tensorflow_object/vir/lib/python3.5/site-packages/tensorflow/contrib/tensorrt/_wrap_conversion.so(+0x4e51a)[0x7fd4a438151a]
python(PyCFunction_Call+0x4f)[0x4e9b7f]
python(PyEval_EvalFrameEx+0x614)[0x5372f4]
python[0x540199]
python(PyEval_EvalFrameEx+0x50b2)[0x53bd92]
python[0x540199]
python(PyEval_EvalFrameEx+0x50b2)[0x53bd92]
python[0x540199]
python(PyEval_EvalCode+0x1f)[0x540e4f]
python[0x60c272]
python(PyRun_FileExFlags+0x9a)[0x60e71a]
python(PyRun_SimpleFileExFlags+0x1bc)[0x60ef0c]
python(Py_Main+0x456)[0x63fb26]
python(main+0xe1)[0x4cfeb1]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7fd526faf830]
python(_start+0x29)[0x5d6049]
======= Memory map: ========
00400000-007a9000 r-xp 00000000 08:02 17601641 /opt/deep_learn/tensorflow_object/vir/bin/python3
009a9000-009ab000 r–p 003a9000 08:02 17601641 /opt/deep_learn/tensorflow_object/vir/bin/python3
009ab000-00a42000 rw-p 003ab000 08:02 17601641 /opt/deep_learn/tensorflow_object/vir/bin/python3
00a42000-00a73000 rw-p 00000000 00:00 0
01f1f000-0ab2b000 rw-p 00000000 00:00 0 [heap]
10000000-10001000 rw-s 00000000 00:06 513 /dev/nvidia0
10001000-10002000 rw-s 00000000 00:06 513 /dev/nvidia0
10002000-10003000 rw-s 00000000 00:06 513 /dev/nvidia0
10003000-10004000 rw-s 00000000 00:06 513 /dev/nvidia0
10004000-10005000 rw-s 00000000 00:06 513 /dev/nvidia0

I have the same problem: subgraph conversion error for subgraph_index:9 due to: "Invalid argument: Failed to create Input layer" SKIPPING...( 18 nodes).
Did you solve it?

I have the same problem, trying to convert an object detection model from the TensorFlow API (SSD with MobileNet). It outputs an error for every subgraph_index from 0 to 151.

Has anyone solved this?

Same problem here. Has anyone solved this? Thanks.

We created a new "Deep Learning Training and Inference" section in Devtalk to improve the experience for deep learning, accelerated computing, and HPC users:
https://devtalk.nvidia.com/default/board/301/deep-learning-training-and-inference-/

We are moving active deep learning threads to the new section.

URLs for topics will not change with the re-categorization, so your bookmarks and links will continue to work as before.

-Siddharth

Please file a bug here: https://developer.nvidia.com/nvidia-developer-program
Please include the steps used to reproduce the problem along with the output of infer_device.

It turns out that TensorRT in TensorFlow 1.7 only supports optimizing graphs whose inputs have a constant spatial resolution, which is not the case for my model.

Can you confirm whether there is any plan to support optimizing graphs with variable input sizes?

Thanks.

Regards
Yong

Hi @xszxaa,

Are you using pip tensorflow-gpu with TensorRT libraries for 16.04? If you are using pre-built TensorFlow packages with pip install tensorflow-gpu, you have to use TensorRT 3.0.4 for 14.04 due to the distribution requirements of TensorFlow. We are trying to address this in the upcoming TensorRT release.

@yong.wang, as of now, we need the shapes of the TensorRT engine inputs to be known at compile time. The batch size also has to be given, but it is more flexible: TensorRT will optimize for the given batch size, so smaller batch sizes will work, though probably not as fast as they could be. Larger batch sizes will not work.

We are working on removing this constraint, at the expense of slower startup times, for the case where you use the same dimensions in every execution of Session.run().
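The `dims.d[i] > 0` check that fails in the log above is exactly this constraint: every non-batch input dimension must be a known positive integer at conversion time. Here is a minimal pure-Python sketch of that validation, just to illustrate which shapes pass; the function name `has_static_input_shape` is my own, not a TensorRT or TF-TRT API:

```python
def has_static_input_shape(shape):
    """Return True if every non-batch dimension is a known positive
    integer, mirroring TensorRT's Network::addInput check dims.d[i] > 0.
    `shape` is a list like a TensorShape, with None (or -1) marking an
    unknown dimension."""
    # Skip the batch dimension (index 0); TensorRT handles it separately
    # via max_batch_size.
    return all(d is not None and d > 0 for d in shape[1:])

# A DeepLab-style image input with a fixed 513x513 resolution converts fine:
print(has_static_input_shape([1, 513, 513, 3]))    # True

# An input with unknown spatial dims (common for detection/segmentation
# graphs) triggers "Failed to create Input layer":
print(has_static_input_shape([1, None, None, 3]))  # False
```

So one workaround, until variable input sizes are supported, is to re-export the frozen graph with the placeholder fixed to a concrete resolution before running the conversion.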

@leiverandres04p and @fengwuxuan, the errors you see are also related to shape inference. In addition, SSD contains many ops that are not supported by TensorRT, interleaved with supported ops, so during conversion you end up with many tiny engines. Unfortunately, some of these ops rely on dynamic shapes, which prevents us from generating the engines. These issues will be addressed by upcoming TensorRT and TF-TRT releases.
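To see why interleaved unsupported ops produce many tiny engines, here is a toy model of the segmentation step (not the real TF-TRT algorithm): supported ops are grouped into maximal runs, and runs below a threshold are skipped rather than turned into engines. `create_inference_graph` exposes a similar knob as its `minimum_segment_size` argument; the helper name `trt_segments` and the op list below are illustrative only.

```python
from itertools import groupby

def trt_segments(ops, supported, minimum_segment_size=3):
    """Group a linear op sequence into runs of TensorRT-supported ops and
    keep only runs of at least `minimum_segment_size` nodes. A toy model
    of TF-TRT graph segmentation, not the real implementation."""
    segments = []
    for is_supported, run in groupby(ops, key=lambda op: op in supported):
        run = list(run)
        if is_supported and len(run) >= minimum_segment_size:
            segments.append(run)
    return segments

# Supported conv/activation runs interleaved with unsupported ops (as in
# SSD post-processing) yield many small candidate segments; raising
# minimum_segment_size drops the tiny ones instead of building engines.
ops = ["Conv2D", "BiasAdd", "Relu", "NonMaxSuppressionV2",
       "Conv2D", "Relu", "TopKV2", "Conv2D", "BiasAdd", "Relu", "Relu6"]
supported = {"Conv2D", "BiasAdd", "Relu", "Relu6"}
print(trt_segments(ops, supported, minimum_segment_size=3))
# [['Conv2D', 'BiasAdd', 'Relu'], ['Conv2D', 'BiasAdd', 'Relu', 'Relu6']]
```

The two-op run between the unsupported nodes is dropped, which is the behavior you want when conversion would otherwise emit dozens of tiny engines.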

Cheers,
Sami