Volta AMI with DIGITS 6.0 Container - Can't Import Custom Models

Hello All,

First, a disclaimer: this is technically a Jetson question (and I will post it as an issue on jetson-inference as well), but I'm asking it here because the problem arises specifically from our use of the AWS Volta AMI + DIGITS 18.01 container as our training platform.

After downloading a model snapshot from a successfully trained model, I'm trying to follow the instructions in jetson-inference (https://github.com/dusty-nv/jetson-inference) to deploy this DetectNet model onto the Jetson TX2. The TX2 was upgraded straight out of the box to the JetPack 3.2 developer preview, which includes TensorRT 3.0.0-RC2, per the instructions here (http://docs.nvidia.com/jetpack-l4t/index.html). When running the detectnet-console sample on a randomly selected image from the original dataset, I get many errors at the "building CUDA engine" stage that look like:

[GIE] inception_3a/3x3: kernel weights has count 1 but 110592 was expected
[GIE] inception_3a/5x5_reduce: kernel weights has count 1 but 3072 was expected

etc …
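
As a sanity check, I parsed the snapshot directly with protobuf to see where the weights actually live. Here's a rough sketch of what I ran (the snapshot filename is hypothetical; it assumes the pycaffe protobuf bindings are importable):

```python
# Parse the .caffemodel directly and report, per layer, how many values
# sit in the legacy repeated-float 'data' field and which fields are
# actually populated. NVCaffe 0.15-style snapshots keep weights in
# 'data'; an empty count on a layer that clearly has parameters would
# line up with the "kernel weights has count 1" errors above.
from caffe.proto import caffe_pb2

net = caffe_pb2.NetParameter()
with open('snapshot_iter_33300.caffemodel', 'rb') as f:  # hypothetical name
    net.ParseFromString(f.read())

for layer in net.layer:
    if not layer.blobs:
        continue
    blob = layer.blobs[0]
    populated = [field.name for field, _ in blob.ListFields()]
    print(layer.name, layer.type,
          'data count:', len(blob.data),
          'populated fields:', populated)
```

If the weights show up in some field other than data, that would point at the NVCaffe serialization change described below rather than a broken TensorRT install.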

Based on several of the issues already filed about this on the jetson-inference repo (#84: https://github.com/dusty-nv/jetson-inference/issues/84, #99: https://github.com/dusty-nv/jetson-inference/issues/99, #123: https://github.com/dusty-nv/jetson-inference/issues/123), this typically happens when importing a model trained with NVCaffe 0.16+ (0.16.4 in my case) rather than 0.15.
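
(For reference, this is how I confirmed the NVCaffe version inside the DIGITS container; caffe.__version__ may not exist in every build, in which case caffe --version from the shell should work too.)

```python
# Inside the DIGITS 18.01 container: confirm which NVCaffe is on the
# Python path (this reports 0.16.4 in our container).
import caffe
print(caffe.__version__)
```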

One thing I don't understand about that: I have TensorRT 3-RC2 installed, which according to the release notes (http://developer2.download.nvidia.com/compute/machine-learning/tensorrt/secure/3.0/ga/TensorRT-Release-Notes-3.0.2.pdf) should support NVCaffe 0.16 model parsing as of TensorRT 3-RC1. Is there a workaround for this incompatibility right now, or could I somehow have the wrong version of TensorRT installed? Obviously there's the option of standing up a machine with DIGITS 5 and NVCaffe 0.15 to retrain our model, but that completely defeats the purpose of using the Volta AMI as a scalable workflow.
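
In case it helps: one workaround I've been considering is re-serializing the snapshot so the weights land back in the legacy data field. This is purely an untested sketch; raw_data and raw_data_type are my reading of NVCaffe 0.16's caffe.proto and may not be the actual field names, and it assumes FP32 training (the DIGITS default):

```python
# Sketch: rewrite an NVCaffe 0.16 snapshot into the legacy blob encoding
# that older parsers understand. Assumes caffe_pb2 was generated from
# NVCaffe 0.16's caffe.proto, which (as far as I can tell) stores weights
# in a bytes 'raw_data' field on BlobProto; filenames are hypothetical.
import numpy as np
from caffe.proto import caffe_pb2

net = caffe_pb2.NetParameter()
with open('snapshot_iter_33300.caffemodel', 'rb') as f:
    net.ParseFromString(f.read())

for layer in net.layer:
    for blob in layer.blobs:
        if blob.raw_data:
            # Reinterpret the raw bytes as FP32 and move them into the
            # legacy repeated-float 'data' field. If raw_data_type marks
            # the blob as FP16, np.float16 would be needed here instead.
            values = np.frombuffer(blob.raw_data, dtype=np.float32)
            del blob.data[:]
            blob.data.extend(values.tolist())
            blob.ClearField('raw_data')

with open('snapshot_iter_33300_legacy.caffemodel', 'wb') as f:
    f.write(net.SerializeToString())
```

Even if something like that works, I'd still like to understand why the TensorRT 3 parser doesn't handle the 0.16 format directly, given what the release notes say.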

This has me stuck dead in the water. Any recommendations or workarounds would be very much appreciated!

Thanks

- R