Not able to deploy .etlt file in deepstream-test1 app

I have configured my config_file.txt according to the specifications given in the Integrating TAO Models into DeepStream — TAO Toolkit 3.22.05 documentation.
I am trying to deploy a DetectNet_v2 model with a ResNet-18 backbone that I trained using the Transfer Learning Toolkit (TLT). This is the error log:

Now playing: sample_720p.h264

Using winsys: x11 
Opening in BLOCKING MODE 
Creating LL OSD context new
0:00:02.949047261 25442   0x558b9f1f50 INFO                 nvinfer gstnvinfer.cpp:519:gst_nvinfer_logger:<primary-nvinference-engine> NvDsInferContext[UID 1]:initialize(): Trying to create engine from model files
0:00:03.321378562 25442   0x558b9f1f50 ERROR                nvinfer gstnvinfer.cpp:511:gst_nvinfer_logger:<primary-nvinference-engine> NvDsInferContext[UID 1]:log(): UffParser: Could not read buffer.
NvDsInferCudaEngineGetFromTltModel: Failed to parse UFF model
0:00:03.338591708 25442   0x558b9f1f50 ERROR                nvinfer gstnvinfer.cpp:511:gst_nvinfer_logger:<primary-nvinference-engine> NvDsInferContext[UID 1]:generateTRTModel(): Failed to create network using custom network creation function
0:00:03.338655511 25442   0x558b9f1f50 ERROR                nvinfer gstnvinfer.cpp:511:gst_nvinfer_logger:<primary-nvinference-engine> NvDsInferContext[UID 1]:initialize(): Failed to create engine from model files
0:00:03.338714887 25442   0x558b9f1f50 WARN                 nvinfer gstnvinfer.cpp:692:gst_nvinfer_start:<primary-nvinference-engine> error: Failed to create NvDsInferContext instance
0:00:03.338744627 25442   0x558b9f1f50 WARN                 nvinfer gstnvinfer.cpp:692:gst_nvinfer_start:<primary-nvinference-engine> error: Config file path: dstest1_pgie_config.txt, NvDsInfer Error: NVDSINFER_CUSTOM_LIB_FAILED
Running...
ERROR from element primary-nvinference-engine: Failed to create NvDsInferContext instance
Error details: /dvs/git/dirty/git-master_linux/deepstream/sdk/src/gst-plugins/gst-nvinfer/gstnvinfer.cpp(692): gst_nvinfer_start (): /GstPipeline:dstest1-pipeline/GstNvInfer:primary-nvinference-engine:
Config file path: dstest1_pgie_config.txt, NvDsInfer Error: NVDSINFER_CUSTOM_LIB_FAILED

I am running this on the Jetson Nano platform.

Hi,
Could you please paste your dstest1_pgie_config.txt?

################################################################################
# Copyright (c) 2018-2019, NVIDIA CORPORATION. All rights reserved.
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.
################################################################################

# Following properties are mandatory when engine files are not specified:
#   int8-calib-file(Only in INT8)
#   Caffemodel mandatory properties: model-file, proto-file, output-blob-names
#   UFF: uff-file, input-dims, uff-input-blob-name, output-blob-names
#   ONNX: onnx-file
#
# Mandatory properties for detectors:
#   num-detected-classes
#
# Optional properties for detectors:
#   enable-dbscan(Default=false), interval(Primary mode only, Default=0)
#   custom-lib-path,
#   parse-bbox-func-name
#
# Mandatory properties for classifiers:
#   classifier-threshold, is-classifier
#
# Optional properties for classifiers:
#   classifier-async-mode(Secondary mode only, Default=false)
#
# Optional properties in secondary mode:
#   operate-on-gie-id(Default=0), operate-on-class-ids(Defaults to all classes),
#   input-object-min-width, input-object-min-height, input-object-max-width,
#   input-object-max-height
#
# Following properties are always recommended:
#   batch-size(Default=1)
#
# Other optional properties:
#   net-scale-factor(Default=1), network-mode(Default=0 i.e FP32),
#   model-color-format(Default=0 i.e. RGB) model-engine-file, labelfile-path,
#   mean-file, gie-unique-id(Default=0), offsets, gie-mode (Default=1 i.e. primary),
#   custom-lib-path, network-mode(Default=0 i.e FP32)
#
# The values in the config file are overridden by values set through GObject
# properties.

[property]
gpu-id=0
# preprocessing parameters.
net-scale-factor=0.0039215697906911373
model-color-format=0

# model paths.
int8-calib-file=/home/god/deepstream_sdk_v4.0.1_jetson/sources/apps/sample_apps/deepstream-test1/calibration.bin
labelfile-path=/home/god/deepstream_sdk_v4.0.1_jetson/sources/apps/sample_apps/deepstream-test1/labels.txt
tlt-encoded-model=/home/god/deepstream_sdk_v4.0.1_jetson/sources/apps/sample_apps/deepstream-test1/resnet18_detector.etlt
tlt-model-key=YmI2a3U0cGk5aGdzmXNkem5taWY3Yzd1OGg6MDM2NjMyYzktYWI4OS00OTQ1LWE4NmYtM2Y5YTA5ZTQ4NDVi
input-dims=3;400;600;0 # where c = number of channels, h = height of the model input, w = width of model input, 0: implies CHW format.
uff-input-blob-name=input_1
batch-size=1
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=0
num-detected-classes=1
interval=0
gie-unique-id=1
is-classifier=0
output-blob-names=output_cov/Sigmoid;output_bbox/BiasAdd
#enable_dbscan=0

[class-attrs-all]
threshold=0.2
group-threshold=1
## Set eps=0.7 and minBoxes for enable-dbscan=1
eps=0.2
#minBoxes=3
roi-top-offset=0
roi-bottom-offset=0
detected-min-w=0
detected-min-h=0
detected-max-w=0
detected-max-h=0

This was copy-pasted directly from the Integrating TAO Models into DeepStream — TAO Toolkit 3.22.05 documentation.

There might be something wrong in line 69 or 70 of your config (the tlt-encoded-model and tlt-model-key lines). Could you double-check the following:

  1. the etlt model is available at that path
  2. the key is correct and is the exact one that was used when generating the etlt model
  3. there is no additional space character at the end of line 70. Otherwise, if your key is “1234”, it will be read as the wrong key “1234 ”.
  4. try to generate the TRT engine directly with tlt-converter, then configure it in the config file. Comment out lines 69 and 70 and add a new line, for example:
model-engine-file=./experiment_dir_final/detectnet_int8.engine
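
For reference, a minimal sketch of that tlt-converter step, run on the Jetson Nano itself. The dims, paths, and precision here are placeholders taken from the config above; -d must match the --input_dims you passed to tlt-export:

# Run on the target device (Jetson Nano). $KEY must be the exact
# encryption key used during tlt-train/tlt-export.
tlt-converter -k $KEY \
              -d 3,400,600 \
              -o output_cov/Sigmoid,output_bbox/BiasAdd \
              -t fp32 \
              -m 1 \
              -e ./experiment_dir_final/resnet18_detector.engine \
              resnet18_detector.etlt

If this fails with the same UffParser error, the problem is in the etlt file or the key; if it succeeds, pointing model-engine-file at the generated engine lets nvinfer skip model parsing entirely.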

I did double-check the model and key. The key cannot be wrong, as I still have the TLT session up and running. And I also checked for additional spaces.
The problem with TRT is that the converter also did not work on the Jetson Nano - it fails with the same error as in https://devtalk.nvidia.com/default/topic/1065680/transfer-learning-toolkit/tlt-converter-uff-parser-error/#reply.
Could it be a problem with exporting? Can I export to an engine file directly without using tlt-converter? What are the formats in which the model can be exported (tlt-export)?

You can generate the TRT engine directly with tlt-converter and then configure it in the config file to narrow down the issue, as I mentioned in point (4) of my previous comment.
The deploy process is described clearly in https://devtalk.nvidia.com/default/topic/1065558/transfer-learning-toolkit/trt-engine-deployment/.

Hi sivaishere96,
If possible, you can send an email to me and attach the etlt model link and your key.
I would like to test it on my side.

Hi Morgan. I have mailed you the files. Kindly do the needful.

Hi sivaishere,
With your etlt model and key, I can reproduce your issue.
Can you double-check or retry the generation of the etlt model? Would you please paste the full log of “tlt-export” too?

I will try running the training again and generate a new model. Meanwhile, this is the output log for the tlt-export command on the model I sent you.

!tlt-export $USER_EXPERIMENT_DIR/experiment_dir_unpruned/weights/resnet18_detector.tlt  \
            -o $USER_EXPERIMENT_DIR/experiment_dir_final/resnet18_detector.etlt \
            --outputs output_cov/Sigmoid,output_bbox/BiasAdd \
            --enc_key $KEY \
            --input_dims 3,400,480 \
            --export_module detectnet_v2
Using TensorFlow backend.
2019-10-31 05:44:14,748 [INFO] iva.common.magnet_export: Loading model from /workspace/tlt-experiments/experiment_dir_unpruned/weights/resnet18_detector.tlt
2019-10-31 05:44:14.748825: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-31 05:44:14.871237: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-10-31 05:44:14.871971: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x7ddff10 executing computations on platform CUDA. Devices:
2019-10-31 05:44:14.871992: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): GeForce RTX 2080 Ti, Compute Capability 7.5
2019-10-31 05:44:14.897255: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3600185000 Hz
2019-10-31 05:44:14.898246: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x7e49e30 executing computations on platform Host. Devices:
2019-10-31 05:44:14.898303: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-10-31 05:44:14.898801: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:10:00.0
totalMemory: 10.73GiB freeMemory: 10.12GiB
2019-10-31 05:44:14.898829: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-10-31 05:44:20.200888: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 05:44:20.200948: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-10-31 05:44:20.200961: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-10-31 05:44:20.201405: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9797 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:10:00.0, compute capability: 7.5)
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
2019-10-31 05:44:20,643 [WARNING] tensorflow: From /usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
2019-10-31 05:44:28.738623: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-10-31 05:44:28.738691: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 05:44:28.738704: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-10-31 05:44:28.738714: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-10-31 05:44:28.738983: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9797 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:10:00.0, compute capability: 7.5)
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/tools/freeze_graph.py:249: __init__ (from tensorflow.python.platform.gfile) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.gfile.GFile.
2019-10-31 05:44:30,068 [WARNING] tensorflow: From /usr/local/lib/python2.7/dist-packages/tensorflow/python/tools/freeze_graph.py:249: __init__ (from tensorflow.python.platform.gfile) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.gfile.GFile.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/tools/freeze_graph.py:127: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
2019-10-31 05:44:30,887 [WARNING] tensorflow: From /usr/local/lib/python2.7/dist-packages/tensorflow/python/tools/freeze_graph.py:127: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
2019-10-31 05:44:31.167039: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-10-31 05:44:31.167107: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 05:44:31.167117: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-10-31 05:44:31.167125: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-10-31 05:44:31.167379: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9797 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:10:00.0, compute capability: 7.5)
INFO:tensorflow:Restoring parameters from /tmp/tmpLlGFax.ckpt
2019-10-31 05:44:31,310 [INFO] tensorflow: Restoring parameters from /tmp/tmpLlGFax.ckpt
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/tools/freeze_graph.py:232: convert_variables_to_constants (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.graph_util.convert_variables_to_constants
2019-10-31 05:44:31,567 [WARNING] tensorflow: From /usr/local/lib/python2.7/dist-packages/tensorflow/python/tools/freeze_graph.py:232: convert_variables_to_constants (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.graph_util.convert_variables_to_constants
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/graph_util_impl.py:245: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.graph_util.extract_sub_graph
2019-10-31 05:44:31,567 [WARNING] tensorflow: From /usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/graph_util_impl.py:245: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.graph_util.extract_sub_graph
INFO:tensorflow:Froze 130 variables.
2019-10-31 05:44:31,688 [INFO] tensorflow: Froze 130 variables.
INFO:tensorflow:Converted 130 variables to const ops.
2019-10-31 05:44:31,732 [INFO] tensorflow: Converted 130 variables to const ops.
WARNING: The version of TensorFlow installed on this system is not guaranteed to work with UFF.
2019-10-31 05:44:32,848 [INFO] iva.common.magnet_export: Converted model was saved into /workspace/tlt-experiments/experiment_dir_final/resnet18_detector.etlt
2019-10-31 05:44:32,848 [INFO] iva.common.magnet_export: Input node: input_1
2019-10-31 05:44:32,848 [INFO] iva.common.magnet_export: Output node(s): ['output_cov/Sigmoid', 'output_bbox/BiasAdd']

Hi Morgan, the problem was with the first cell of the TLT training .ipynb file. The KEY environment variable was left at its default placeholder value of $KEY, and I didn't realise it because I didn't encounter any errors while training and exporting. Going back and generating the model again as you suggested, I realised that we have to set the KEY environment variable every time we run the Docker container. Now it works properly. Thank you for your time and efforts. A small suggestion would be to make TLT throw an error when $KEY is not set.
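
For anyone hitting the same issue, a sketch of what that first notebook cell should contain, following the convention of the TLT example notebooks (the key value and experiment path are placeholders; use the exact key your model was trained with):

# Re-run this cell every time the TLT docker container is (re)started,
# replacing the placeholder with your real key. If it is left at the
# default, the model is exported with the wrong key and DeepStream's
# UffParser later fails with "Could not read buffer".
%env KEY=<your_ngc_api_key>
%env USER_EXPERIMENT_DIR=/workspace/tlt-experiments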

Hi sivaishere96,
Glad to know you solved the problem! Thanks very much for using TLT. I will sync with the internal team about your suggestion.