tlt-infer with SSD fails with IOError: Unable to open file (File signature not found)

I am trying to run inference using an SSD model that I trained on another machine. I copied the training spec file and the model weights onto the new machine, but when I run tlt-infer I get an error about a file that cannot be opened (see below). The error message is cryptic enough that I have no clue what the problem is or how to solve it. Can anyone here suggest what the issue is and/or a workaround?

# tlt-infer ssd  -i ../ssd_20191125/inference_test/test_images/weapons -o inference_test/results/weapons -e specs/ssd_resnet10_train.txt -m output/weights/ssd_resnet10_epoch_400.tlt -k ${NGC_API_KEY}
Using TensorFlow backend.
2019-12-05 17:35:30,481 [INFO] iva.ssd.scripts.inference: Loading experiment spec at specs/ssd_resnet10_train.txt.
2019-12-05 17:35:30,482 [INFO] /usr/local/lib/python2.7/dist-packages/iva/ssd/utils/spec_loader.pyc: Merging specification from specs/ssd_resnet10_train.txt
Traceback (most recent call last):
  File "/usr/local/bin/tlt-infer", line 10, in <module>
    sys.exit(main())
  File "./common/magnet_infer.py", line 32, in main
  File "./ssd/scripts/inference.py", line 173, in main
  File "./ssd/scripts/inference.py", line 85, in inference
  File "./ssd/utils/model_io.py", line 58, in load_model
  File "./ssd/utils/model_io.py", line 43, in load_model
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/saving.py", line 417, in load_model
    f = h5dict(filepath, 'r')
  File "/usr/local/lib/python2.7/dist-packages/keras/utils/io_utils.py", line 186, in __init__
    self.data = h5py.File(path, mode=mode)
  File "/usr/local/lib/python2.7/dist-packages/h5py/_hl/files.py", line 272, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
  File "/usr/local/lib/python2.7/dist-packages/h5py/_hl/files.py", line 92, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/tmp/pip-4rPeHA-build/h5py/_objects.c:2684)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/tmp/pip-4rPeHA-build/h5py/_objects.c:2642)
  File "h5py/h5f.pyx", line 76, in h5py.h5f.open (/tmp/pip-4rPeHA-build/h5py/h5f.c:1930)
IOError: Unable to open file (File signature not found)

The issue turned out to be that the model was trained using a different NGC API key, so the key I passed with -k could not decrypt the .tlt file.
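For anyone puzzled by why a wrong key produces this particular error: TLT decrypts the .tlt file with the supplied key and then hands the result to Keras, which expects an HDF5 file. Every valid HDF5 file begins with a fixed 8-byte signature, and when decryption with the wrong key produces garbage, h5py fails with "File signature not found". A minimal sketch of that check (illustrative only, not part of TLT; the helper name is my own):

```python
# The 8-byte signature every HDF5 file starts with (per the HDF5 format spec).
HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"

def looks_like_hdf5(path):
    """Return True if the file begins with the HDF5 signature.

    A .tlt model decrypted with the wrong NGC API key yields bytes that
    do not start with this signature, which is why h5py raises
    "Unable to open file (File signature not found)".
    """
    with open(path, "rb") as f:
        return f.read(8) == HDF5_MAGIC
```

So the error is really "decryption produced something that is not HDF5", which is why using the same NGC API key that was used for training fixes it.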

How can I submit issues like this to NVIDIA so that this sort of thing is addressed in future releases? The usability of TLT could use some real improvement, and this cryptic error reporting is a perfect example. In a case like this it seems elementary to write a log message telling the user that the file failed to open because of an incompatible NGC API key; how hard is that? This is why I dislike working with closed-source software. Sorry to gripe…

Hi monocongo,
Sorry for the inconvenience. I am working on an FAQ to cover these kinds of common issues.
In the meantime, I suggest searching the TLT forum for the error text.
Just click the “search” button at the top right, type “File signature not found”, and restrict the search to the TLT forum. Two topics are listed, which should give you some tips.

https://devtalk.nvidia.com/default/topic/1063996/transfer-learning-toolkit/tlt-prune-error-ioerror-invalid-decryption-unable-to-open-file-file-signature-not-found-/post/5390072/#5390072

https://devtalk.nvidia.com/default/topic/1064436/transfer-learning-toolkit/ioerror-invalid-decryption-unable-to-open-file-file-signature-not-found-tlt-prune-command/post/5394181/#5394181