DeepStream: How to build a working model

Hi,
I am hoping someone can give me some advice.

I have installed both DIGITS and DeepStream on my Linux system and have
managed to run all the DeepStream samples without any problems.

I have also created my own .h264 input file and that too works.
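
(In case it helps anyone, one way to produce a raw .h264 elementary stream from an MP4 that already contains H.264 video is something like this; the file names are just placeholders.)

# Extract the H.264 elementary stream (Annex-B) from an MP4 container
ffmpeg -i input.mp4 -c:v copy -bsf:v h264_mp4toannexb -an -f h264 output.h264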

I then tried to build my own GoogleNet model using DIGITS. The model
appears to work inside DIGITS, but when I try to use it with DeepStream
nothing happens. I get a few lines of debug information stating it is using
FP32, then it stops: no errors, no stack dump, nothing.

With the supplied model it outputs debug information about using YUV420;
with my model it does not get that far.

I have noticed that the first few lines of the supplied model's deploy.prototxt
file and mine are different. Mine starts with an input_shape declaration,
whereas the supplied model starts with an Input layer declaration.

As I am very new to this, can someone tell me whether DIGITS is appropriate
for building models to be used in DeepStream? If so, a link to an FAQ would
help, or some suggestions as to what parameters I should use in DIGITS when
creating the model. (I am using 224x224 JPEGs to build the model.)

Thanks for any help

  • Charles

FYI, my deploy file starts with this:

input: "data"
input_shape {
  dim: 1
  dim: 3
  dim: 224
  dim: 224
}
layer {
  name: "conv1/7x7_s2"
  type: "Convolution"
  bottom: "data"
  top: "conv1/7x7_s2"
  param {
    lr_mult: 1.0
    decay_mult: 1.0
  }
  param {
    lr_mult: 2.0
    decay_mult: 0.0
  }
  convolution_param {
    num_output: 64
    pad: 3
    kernel_size: 7
    stride: 2
    weight_filler {
      type: "xavier"
      std: 0.10000000149
    }
    bias_filler {
      type: "constant"
      value: 0.20000000298
    }
  }
}

The sample deploy file starts with this:

name: "GoogleNet"
layer {
  name: "data"
  type: "Input"
  top: "data"
  input_param { shape: { dim: 1 dim: 3 dim: 224 dim: 224 } }
}
layer {
  name: "conv1/7x7_s2"
  type: "Convolution"
  bottom: "data"
  top: "conv1/7x7_s2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 64
    pad: 3
    kernel_size: 7
    stride: 2
    weight_filler {
      type: "xavier"
      std: 0.1
    }
    bias_filler {
      type: "constant"
      value: 0.2
    }
  }
}

Well, I changed the last layer in the deploy file from "softmax" to the same name as in the original model, "prob".

Before:

layer {
  name: "softmax"
  type: "Softmax"
  bottom: "loss3/classifier"
  top: "softmax"
}

After:

layer {
  name: "prob"
  type: "Softmax"
  bottom: "loss3/classifier"
  top: "prob"
}
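
(In case anyone else needs it, a quick way to list the layer and blob names near the end of a deploy file is something like this; the file name is a placeholder.)

# Show the last few name:/top: entries, which include the output blob name
grep -n -E 'name:|top:' deploy.prototxt | tail -n 6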

It gets a bit further, but now crashes with the following error:

*** Error in `…/bin/sample_classification': corrupted size vs. prev_size: 0x00007f20580058a0 ***

Since my first post I have watched three hours of NVIDIA tutorials and searched the forum; interesting, but it has not helped.

  • Charles

Hi,

We guess that you are running a classification model with our NVDECINFER sample.
Before running the example, please make sure you have modified the sample to fit your custom model:

// Add inference task: the layer names must match the blobs in your deploy.prototxt
std::string inputLayerName = "data";                  // input blob
std::vector<std::string> outputLayerNames(1, "prob"); // output blob(s)
...

You can find more information in our DeepStream documentation:
4.2.1.1 Adding a module into the analysis pipeline

Thanks.

Hi,

Thanks for the reply.

I tried that but it is still crashing.

The problem may be due to a wrong version of libnvinfer.so, though.
libdeepstream requires libnvinfer.so.3 and my system is using libnvinfer.so.4;
however, the example in the SDK appears to work with the wrong library version.
I will try to sort that out later.
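
(Something like the following should show which libnvinfer the sample actually resolves at runtime; the sample path is a placeholder.)

# Which libnvinfer does the DeepStream sample load?
ldd /path/to/deepstream/bin/sample_classification | grep nvinfer
# Which libnvinfer versions are installed?
ls -l /usr/lib/x86_64-linux-gnu/libnvinfer.so*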

My DIGITS model does work with the Python example in the DIGITS GitHub repository;
it also works with the C++ classification example in the Caffe GitHub repository.
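
(For reference, the Caffe C++ classification example is invoked roughly like this; all file names below are placeholders for the ones DIGITS exports.)

# Run Caffe's cpp_classification example against the DIGITS-trained model
./build/examples/cpp_classification/classification.bin \
    deploy.prototxt \
    snapshot_iter_1000.caffemodel \
    mean.binaryproto \
    labels.txt \
    test_image.jpg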

  • Charles

Hi,

We usually see this error in EGL-related use cases.
Could you help us check whether this error comes from TensorRT first?

cp -r /usr/src/tensorrt/ .
cd tensorrt/samples/
make
cd ../bin/
./giexec --deploy=/path/to/prototxt --output=/name/of/output
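
The --output value should be the top blob name of your network's last layer; with your modified deploy file that should be prob. For example (the path is a placeholder):

./giexec --deploy=/path/to/deploy.prototxt --output=prob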

Thanks.

I did what you requested, but there was no source directory, so I just called the binary with:
./giexec --deploy=…/…/model/deploy.prototxt --output=softmax

I wasn’t sure about the --output parameter.

I got the following result.

deploy: …/…/model/deploy.prototxt
output: softmax
Input "data": 3x224x224
Output "softmax": 2x1x1
./giexec: symbol lookup error: /usr/lib/x86_64-linux-gnu/libnvinfer.so.4: undefined symbol: cudnnSetConvolutionGroupCount
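
(From what I can find, cudnnSetConvolutionGroupCount was introduced in cuDNN 7, so it looks like libnvinfer.so.4 is resolving an older libcudnn at runtime. Something like this should confirm which cuDNN gets loaded:)

# Which libcudnn does libnvinfer.so.4 resolve at runtime?
ldd /usr/lib/x86_64-linux-gnu/libnvinfer.so.4 | grep cudnn
# Which cuDNN versions are installed?
ls -l /usr/lib/x86_64-linux-gnu/libcudnn.so*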

I downloaded the .tar file and extracted the giexec example.

When I compile it I have to set CUDA to 8.0; it then compiles,
but I get the same error as before when I run it.

If I try to compile it with cuda-9.0 I get two linking errors:
/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/libnvinfer.so: undefined reference to `cudnnSetConvolutionGroupCount'
/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/libnvinfer.so: undefined reference to `cudnnGetConvolutionForwardAlgorithmMaxCount'

I just tried to compile it with cuda-9.1 and it compiles, with a couple of warnings saying there is a conflict between using cuBLAS 8.0 and cuBLAS 9.1.

When I run it I get a core dump.

My computer recently updated itself to cuda-9.1, so I suspect I will have to wait until
everything else is updated to work with cuda-9.1 rather than trying to set up all the correct library versions myself.
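
(One option I am considering, assuming the Debian/Ubuntu packages, is to hold CUDA at its current version so apt stops auto-upgrading it:)

# Prevent apt from auto-upgrading the CUDA metapackages
sudo apt-mark hold cuda cuda-drivers
# Undo later with: sudo apt-mark unhold cuda cuda-drivers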

  • Charles

Hi,

TensorRT supports CUDA 8.0 and CUDA 9.0, but not CUDA 9.1 yet.

Please make sure you have downloaded the corresponding package;
the download links differ by CUDA version.

If you are using the Debian installer, it can upgrade TensorRT directly.
If you are using the tarball package, please check this comment on how to set up CUDA/cuDNN/TensorRT.
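
For the tarball route, the usual steps are to extract the archive and point LD_LIBRARY_PATH at its lib directory. The file name below is only an example; use the one matching your CUDA/cuDNN versions:

# Example tarball setup; the archive name is illustrative only
tar xzvf TensorRT-3.0.4.Ubuntu-16.04.3.x86_64.cuda-9.0.cudnn7.0.tar.gz
export LD_LIBRARY_PATH=$PWD/TensorRT-3.0.4/lib:$LD_LIBRARY_PATH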

Thanks, and please let us know your results.