TensorRT YOLO inference error

Hi,

I am currently trying to move some code from the Caffe framework to TensorRT (GIE) on a Jetson TX1.

I installed “JetPack 2.3.1 - L4T R24.2.1 released for Jetson TX1” and everything seems to be OK for TensorRT.

The code from https://github.com/dusty-nv/jetson-inference, which uses TensorRT, works on the board.

The GIE sample “sampleMNISTGIE” from the TensorRT package “nv-gie-repo-ubuntu1404-6-rc-cuda8.0_1.0.3-1_amd64.deb” also works.

I am now trying to make a neural network work with TensorRT: YOLO.
The prototxt of the network is the following one: https://github.com/xingwangsfu/caffe-yolo/blob/master/prototxt/yolo_small_deploy.prototxt

I think that everything in this network is compatible with TensorRT.

I also have the associated caffemodel file to perform detection on images, and everything works together with the Caffe framework.

With TensorRT I don’t get any errors, but the output of the neural network is wrong:
running the very same image through Caffe and through TensorRT gives two completely different outputs.

Here is the code I use:

[code]

IBuilder* builder = createInferBuilder(gLogger);
const char* prototxt="yolo_small_deploy.prototxt";
const char* caffemodel="yolo_small.caffemodel";

// parse the caffe model to populate the network, then set the outputs and create an engine
INetworkDefinition* network = builder->createNetwork();
ICaffeParser *parser = createCaffeParser();
const IBlobNameToTensor *blobNameToTensor =
        parser->parse(prototxt,    // caffe deploy file
                      caffemodel,  // caffe model file
                      *network,    // network definition that the parser populates
                      DataType::kFLOAT);

assert(blobNameToTensor != nullptr);
// the caffe file has no notion of outputs
// so we need to manually say which tensors the engine should generate
network->markOutput(*blobNameToTensor->find(OUTPUT_BLOB_NAME));
// Build the engine
builder->setMaxBatchSize(1);
builder->setMaxWorkspaceSize(16 << 20);  // 16 MB workspace

// Eliminate the side-effect from the delay of GPU frequency boost
builder->setMinFindIterations(3);
builder->setAverageFindIterations(2);

//build
ICudaEngine *engine = builder->buildCudaEngine(*network);
assert(engine != nullptr);

// the network, parser, and builder are no longer needed once the engine is built
network->destroy();
parser->destroy();
builder->destroy();

IExecutionContext *context = engine->createExecutionContext();

// run inference
float prob[BATCH_SIZE * OUTPUT_SIZE];

// input and output buffer pointers that we pass to the engine - the engine requires exactly IEngine::getNbBindings(),
// of these, but in this case we know that there is exactly one input and one output.
assert(engine->getNbBindings() == 2);
void* buffers[2];

// In order to bind the buffers, we need to know the names of the input and output tensors.
// note that indices are guaranteed to be less than IEngine::getNbBindings()
int inputIndex = engine->getBindingIndex(INPUT_BLOB_NAME);
int outputIndex = engine->getBindingIndex(OUTPUT_BLOB_NAME);

// create GPU buffers and a stream
CHECK(cudaMalloc(&buffers[inputIndex], BATCH_SIZE *3* INPUT_H * INPUT_W * sizeof(float)));
CHECK(cudaMalloc(&buffers[outputIndex], BATCH_SIZE * OUTPUT_SIZE * sizeof(float)));
cudaStream_t stream;
CHECK(cudaStreamCreate(&stream));
// DMA the input to the GPU,  execute the batch asynchronously, and DMA it back:
CHECK(cudaMemcpyAsync(buffers[inputIndex], mInputCPU[0], BATCH_SIZE *3* INPUT_H * INPUT_W * sizeof(float), cudaMemcpyHostToDevice, stream));
context->enqueue(BATCH_SIZE, buffers, stream, nullptr);
CHECK(cudaMemcpyAsync(prob, buffers[outputIndex], BATCH_SIZE * OUTPUT_SIZE*sizeof(float), cudaMemcpyDeviceToHost, stream));
cudaStreamSynchronize(stream);

// release the stream and the buffers
cudaStreamDestroy(stream);
CHECK(cudaFree(buffers[inputIndex]));
CHECK(cudaFree(buffers[outputIndex]));

// destroy the engine
context->destroy();
engine->destroy();

[/code]
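(Note: the CHECK macro above comes from the TensorRT sample code; in case it is missing, a minimal reconstruction might look like the sketch below — this is my own approximation, not the exact sample macro.)

[code]
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime_api.h>

// Sketch of the samples' CHECK macro: abort with a readable
// message if a CUDA runtime call fails.
#define CHECK(status)                                                \
    do {                                                             \
        cudaError_t err = (status);                                  \
        if (err != cudaSuccess) {                                    \
            std::fprintf(stderr, "CUDA error %d: %s\n", (int)err,    \
                         cudaGetErrorString(err));                   \
            std::abort();                                            \
        }                                                            \
    } while (0)
[/code]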

The outputs of the neural network with TensorRT are similar for different images; here are the results for a cat and for a matrix of zeros:

https://github.com/TLESORT/YOLO-TensorRT-GIE-/blob/master/Images/cat_detection.jpg
https://github.com/TLESORT/YOLO-TensorRT-GIE-/blob/master/Images/zeros_detection.jpg

The detection obtained when the neural network is run with Caffe:

https://github.com/TLESORT/YOLO-TensorRT-GIE-/blob/master/Images/true_detection.jpg

The complete code is on: https://github.com/TLESORT/YOLO-TensorRT-GIE-

Do you have any idea of what could possibly go wrong?
It seems like there is an error in the conversion of the caffemodel which makes the results wrong.
Thank you for your help :)

Hi,

Thanks for your question. We are investigating this issue now and will update you later.

Hi,

Thanks for the question and sorry for our late reply.
We have confirmed that this difference is caused by the leaky ReLU layer, which TensorRT does not support.

Here is a workaround (WAR) that uses standard ReLU + Scale + Eltwise layers to approximate leaky ReLU.
The results work after tuning the leaky parameter to 0.08.

Could you give it a try?
Please remember to change the threshold back to 0.2.
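For clarity, the Power (scale) and Eltwise layers below compute ReLU(x) + 0.08·x: for positive x this gives 1.08·x and for negative x it gives 0.08·x, i.e. a "fix-ReLU" with negative slope 0.08 and positive slope 1.08 that approximates the original leaky ReLU. A minimal standalone sketch of the equivalence (assuming YOLO's usual 0.1 negative slope for the reference leaky ReLU):

[code]
#include <algorithm>
#include <cstdio>

// Leaky ReLU as used in the original YOLO model (negative slope 0.1).
float leakyRelu(float x) { return x > 0.f ? x : 0.1f * x; }

// The workaround: ReLU(x) + 0.08 * x, built only from layers TensorRT supports.
// For x > 0 this yields 1.08 * x; for x <= 0 it yields 0.08 * x.
float reluScaleEltwise(float x) { return std::max(0.f, x) + 0.08f * x; }

int main() {
    for (float x : {-2.f, -0.5f, 0.5f, 2.f})
        std::printf("x = %5.2f  leaky = %6.3f  workaround = %6.3f\n",
                    x, leakyRelu(x), reluScaleEltwise(x));
    return 0;
}
[/code]

The full modified prototxt: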

name: "YOLONet"
input: "data"
input_shape {
  dim: 1
  dim: 3
  dim: 448
  dim: 448
}

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 64
    kernel_size: 7
    pad: 3
    stride: 2
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "relu1"	
}
layer {
  name: "scale1"
  type: "Power"
  bottom: "conv1"
  top: "scale1"
  power_param {
    scale: 0.08
  }
}

layer {
  name: "eltwise1"
  type: "Eltwise"
  bottom: "relu1"
  bottom: "scale1"
  top: "layer1"
  eltwise_param {
    operation: SUM
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "layer1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}

layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  convolution_param {
    num_output: 192
    kernel_size: 3
    pad: 1
    stride: 1
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "relu2"	
}
layer {
  name: "scale2"
  type: "Power"
  bottom: "conv2"
  top: "scale2"
  power_param {
    scale: 0.08
  }
}

layer {
  name: "eltwise2"
  type: "Eltwise"
  bottom: "relu2"
  bottom: "scale2"
  top: "layer2"
  eltwise_param {
    operation: SUM
  }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "layer2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}

layer {
  name: "conv3"
  type: "Convolution"
  bottom: "pool2"
  top: "conv3"
  convolution_param {
    num_output: 128
    kernel_size: 1
    pad: 0
    stride: 1
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "relu3"	
}
layer {
  name: "scale3"
  type: "Power"
  bottom: "conv3"
  top: "scale3"
  power_param {
    scale: 0.08
  }
}

layer {
  name: "eltwise3"
  type: "Eltwise"
  bottom: "relu3"
  bottom: "scale3"
  top: "layer3"
  eltwise_param {
    operation: SUM
  }
}


layer {
  name: "conv4"
  type: "Convolution"
  bottom: "layer3"
  top: "conv4"
  convolution_param {
    num_output: 256
    kernel_size: 3
    pad: 1
    stride: 1
  }
}
layer {
  name: "relu4"
  type: "ReLU"
  bottom: "conv4"
  top: "relu4"	
}
layer {
  name: "scale4"
  type: "Power"
  bottom: "conv4"
  top: "scale4"
  power_param {
    scale: 0.08
  }
}

layer {
  name: "eltwise4"
  type: "Eltwise"
  bottom: "relu4"
  bottom: "scale4"
  top: "layer4"
  eltwise_param {
    operation: SUM
  }
}

layer {
  name: "conv5"
  type: "Convolution"
  bottom: "layer4"
  top: "conv5"
  convolution_param {
    num_output: 256
    kernel_size: 1
    pad: 0
    stride: 1
  }
}
layer {
  name: "relu5"
  type: "ReLU"
  bottom: "conv5"
  top: "relu5"	
}
layer {
  name: "scale5"
  type: "Power"
  bottom: "conv5"
  top: "scale5"
  power_param {
    scale: 0.08
  }
}

layer {
  name: "eltwise5"
  type: "Eltwise"
  bottom: "relu5"
  bottom: "scale5"
  top: "layer5"
  eltwise_param {
    operation: SUM
  }
}

layer {
  name: "conv6"
  type: "Convolution"
  bottom: "layer5"
  top: "conv6"
  convolution_param {
    num_output: 512
    kernel_size: 3
    pad: 1
    stride: 1
  }
}
layer {
  name: "relu6"
  type: "ReLU"
  bottom: "conv6"
  top: "relu6"	
}
layer {
  name: "scale6"
  type: "Power"
  bottom: "conv6"
  top: "scale6"
  power_param {
    scale: 0.08
  }
}

layer {
  name: "eltwise6"
  type: "Eltwise"
  bottom: "relu6"
  bottom: "scale6"
  top: "layer6"
  eltwise_param {
    operation: SUM
  }
}
layer {
  name: "pool6"
  type: "Pooling"
  bottom: "layer6"
  top: "pool6"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}

layer {
  name: "conv7"
  type: "Convolution"
  bottom: "pool6"
  top: "conv7"
  convolution_param {
    num_output: 256
    kernel_size: 1
    pad: 0
    stride: 1
  }
}
layer {
  name: "relu7"
  type: "ReLU"
  bottom: "conv7"
  top: "relu7"	
}
layer {
  name: "scale7"
  type: "Power"
  bottom: "conv7"
  top: "scale7"
  power_param {
    scale: 0.08
  }
}

layer {
  name: "eltwise7"
  type: "Eltwise"
  bottom: "relu7"
  bottom: "scale7"
  top: "layer7"
  eltwise_param {
    operation: SUM
  }
}

layer {
  name: "conv8"
  type: "Convolution"
  bottom: "layer7"
  top: "conv8"
  convolution_param {
    num_output: 512
    kernel_size: 3
    pad: 1
    stride: 1
  }
}
layer {
  name: "relu8"
  type: "ReLU"
  bottom: "conv8"
  top: "relu8"	
}
layer {
  name: "scale8"
  type: "Power"
  bottom: "conv8"
  top: "scale8"
  power_param {
    scale: 0.08
  }
}

layer {
  name: "eltwise8"
  type: "Eltwise"
  bottom: "relu8"
  bottom: "scale8"
  top: "layer8"
  eltwise_param {
    operation: SUM
  }
}

layer {
  name: "conv9"
  type: "Convolution"
  bottom: "layer8"
  top: "conv9"
  convolution_param {
    num_output: 256
    kernel_size: 1
    pad: 0
    stride: 1
  }
}
layer {
  name: "relu9"
  type: "ReLU"
  bottom: "conv9"
  top: "relu9"	
}
layer {
  name: "scale9"
  type: "Power"
  bottom: "conv9"
  top: "scale9"
  power_param {
    scale: 0.08
  }
}

layer {
  name: "eltwise9"
  type: "Eltwise"
  bottom: "relu9"
  bottom: "scale9"
  top: "layer9"
  eltwise_param {
    operation: SUM
  }
}

layer {
  name: "conv10"
  type: "Convolution"
  bottom: "layer9"
  top: "conv10"
  convolution_param {
    num_output: 512
    kernel_size: 3
    pad: 1
    stride: 1
  }
}
layer {
  name: "relu10"
  type: "ReLU"
  bottom: "conv10"
  top: "relu10"	
}
layer {
  name: "scale10"
  type: "Power"
  bottom: "conv10"
  top: "scale10"
  power_param {
    scale: 0.08
  }
}

layer {
  name: "eltwise10"
  type: "Eltwise"
  bottom: "relu10"
  bottom: "scale10"
  top: "layer10"
  eltwise_param {
    operation: SUM
  }
}

layer {
  name: "conv11"
  type: "Convolution"
  bottom: "layer10"
  top: "conv11"
  convolution_param {
    num_output: 256
    kernel_size: 1
    pad: 0
    stride: 1
  }
}
layer {
  name: "relu11"
  type: "ReLU"
  bottom: "conv11"
  top: "relu11"	
}
layer {
  name: "scale11"
  type: "Power"
  bottom: "conv11"
  top: "scale11"
  power_param {
    scale: 0.08
  }
}

layer {
  name: "eltwise11"
  type: "Eltwise"
  bottom: "relu11"
  bottom: "scale11"
  top: "layer11"
  eltwise_param {
    operation: SUM
  }
}
layer {
  name: "conv12"
  type: "Convolution"
  bottom: "layer11"
  top: "conv12"
  convolution_param {
    num_output: 512
    kernel_size: 3
    pad: 1
    stride: 1
  }
}
layer {
  name: "relu12"
  type: "ReLU"
  bottom: "conv12"
  top: "relu12"	
}
layer {
  name: "scale12"
  type: "Power"
  bottom: "conv12"
  top: "scale12"
  power_param {
    scale: 0.08
  }
}

layer {
  name: "eltwise12"
  type: "Eltwise"
  bottom: "relu12"
  bottom: "scale12"
  top: "layer12"
  eltwise_param {
    operation: SUM
  }
}
layer {
  name: "conv13"
  type: "Convolution"
  bottom: "layer12"
  top: "conv13"
  convolution_param {
    num_output: 256
    kernel_size: 1
    pad: 0
    stride: 1
  }
}
layer {
  name: "relu13"
  type: "ReLU"
  bottom: "conv13"
  top: "relu13"	
}
layer {
  name: "scale13"
  type: "Power"
  bottom: "conv13"
  top: "scale13"
  power_param {
    scale: 0.08
  }
}

layer {
  name: "eltwise13"
  type: "Eltwise"
  bottom: "relu13"
  bottom: "scale13"
  top: "layer13"
  eltwise_param {
    operation: SUM
  }
}
layer {
  name: "conv14"
  type: "Convolution"
  bottom: "layer13"
  top: "conv14"
  convolution_param {
    num_output: 512
    kernel_size: 3
    pad: 1
    stride: 1
  }
}
layer {
  name: "relu14"
  type: "ReLU"
  bottom: "conv14"
  top: "relu14"	
}
layer {
  name: "scale14"
  type: "Power"
  bottom: "conv14"
  top: "scale14"
  power_param {
    scale: 0.08
  }
}

layer {
  name: "eltwise14"
  type: "Eltwise"
  bottom: "relu14"
  bottom: "scale14"
  top: "layer14"
  eltwise_param {
    operation: SUM
  }
}

layer {
  name: "conv15"
  type: "Convolution"
  bottom: "layer14"
  top: "conv15"
  convolution_param {
    num_output: 512
    kernel_size: 1
    pad: 0
    stride: 1
  }
}
layer {
  name: "relu15"
  type: "ReLU"
  bottom: "conv15"
  top: "relu15"	
}
layer {
  name: "scale15"
  type: "Power"
  bottom: "conv15"
  top: "scale15"
  power_param {
    scale: 0.08
  }
}

layer {
  name: "eltwise15"
  type: "Eltwise"
  bottom: "relu15"
  bottom: "scale15"
  top: "layer15"
  eltwise_param {
    operation: SUM
  }
}


layer {
  name: "conv16"
  type: "Convolution"
  bottom: "layer15"
  top: "conv16"
  convolution_param {
    num_output: 1024
    kernel_size: 3
    pad: 1
    stride: 1
  }
}
layer {
  name: "relu16"
  type: "ReLU"
  bottom: "conv16"
  top: "relu16"	
}
layer {
  name: "scale16"
  type: "Power"
  bottom: "conv16"
  top: "scale16"
  power_param {
    scale: 0.08
  }
}

layer {
  name: "eltwise16"
  type: "Eltwise"
  bottom: "relu16"
  bottom: "scale16"
  top: "layer16"
  eltwise_param {
    operation: SUM
  }
}
layer {
  name: "pool16"
  type: "Pooling"
  bottom: "layer16"
  top: "pool16"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}


layer {
  name: "conv17"
  type: "Convolution"
  bottom: "pool16"
  top: "conv17"
  convolution_param {
    num_output: 512
    kernel_size: 1
    pad: 0
    stride: 1
  }
}
layer {
  name: "relu17"
  type: "ReLU"
  bottom: "conv17"
  top: "relu17"	
}
layer {
  name: "scale17"
  type: "Power"
  bottom: "conv17"
  top: "scale17"
  power_param {
    scale: 0.08
  }
}

layer {
  name: "eltwise17"
  type: "Eltwise"
  bottom: "relu17"
  bottom: "scale17"
  top: "layer17"
  eltwise_param {
    operation: SUM
  }
}


layer {
  name: "conv18"
  type: "Convolution"
  bottom: "layer17"
  top: "conv18"
  convolution_param {
    num_output: 1024
    kernel_size: 3
    pad: 1
    stride: 1
  }
}
layer {
  name: "relu18"
  type: "ReLU"
  bottom: "conv18"
  top: "relu18"	
}
layer {
  name: "scale18"
  type: "Power"
  bottom: "conv18"
  top: "scale18"
  power_param {
    scale: 0.08
  }
}

layer {
  name: "eltwise18"
  type: "Eltwise"
  bottom: "relu18"
  bottom: "scale18"
  top: "layer18"
  eltwise_param {
    operation: SUM
  }
}


layer {
  name: "conv19"
  type: "Convolution"
  bottom: "layer18"
  top: "conv19"
  convolution_param {
    num_output: 512
    kernel_size: 1
    pad: 0
    stride: 1
  }
}
layer {
  name: "relu19"
  type: "ReLU"
  bottom: "conv19"
  top: "relu19"	
}
layer {
  name: "scale19"
  type: "Power"
  bottom: "conv19"
  top: "scale19"
  power_param {
    scale: 0.08
  }
}

layer {
  name: "eltwise19"
  type: "Eltwise"
  bottom: "relu19"
  bottom: "scale19"
  top: "layer19"
  eltwise_param {
    operation: SUM
  }
}



layer {
  name: "conv20"
  type: "Convolution"
  bottom: "layer19"
  top: "conv20"
  convolution_param {
    num_output: 1024
    kernel_size: 3
    pad: 1
    stride: 1
  }
}
layer {
  name: "relu20"
  type: "ReLU"
  bottom: "conv20"
  top: "relu20"	
}
layer {
  name: "scale20"
  type: "Power"
  bottom: "conv20"
  top: "scale20"
  power_param {
    scale: 0.08
  }
}

layer {
  name: "eltwise20"
  type: "Eltwise"
  bottom: "relu20"
  bottom: "scale20"
  top: "layer20"
  eltwise_param {
    operation: SUM
  }
}


layer {
  name: "conv21"
  type: "Convolution"
  bottom: "layer20"
  top: "conv21"
  convolution_param {
    num_output: 1024
    kernel_size: 3
    pad: 1
    stride: 1
  }
}
layer {
  name: "relu21"
  type: "ReLU"
  bottom: "conv21"
  top: "relu21"	
}
layer {
  name: "scale21"
  type: "Power"
  bottom: "conv21"
  top: "scale21"
  power_param {
    scale: 0.08
  }
}

layer {
  name: "eltwise21"
  type: "Eltwise"
  bottom: "relu21"
  bottom: "scale21"
  top: "layer21"
  eltwise_param {
    operation: SUM
  }
}


layer {
  name: "conv22"
  type: "Convolution"
  bottom: "layer21"
  top: "conv22"
  convolution_param {
    num_output: 1024
    kernel_size: 3
    pad: 1
    stride: 2
  }
}
layer {
  name: "relu22"
  type: "ReLU"
  bottom: "conv22"
  top: "relu22"	
}
layer {
  name: "scale22"
  type: "Power"
  bottom: "conv22"
  top: "scale22"
  power_param {
    scale: 0.08
  }
}

layer {
  name: "eltwise22"
  type: "Eltwise"
  bottom: "relu22"
  bottom: "scale22"
  top: "layer22"
  eltwise_param {
    operation: SUM
  }
}


layer {
  name: "conv23"
  type: "Convolution"
  bottom: "layer22"
  top: "conv23"
  convolution_param {
    num_output: 1024
    kernel_size: 3
    pad: 1
    stride: 1
  }
}
layer {
  name: "relu23"
  type: "ReLU"
  bottom: "conv23"
  top: "relu23"	
}
layer {
  name: "scale23"
  type: "Power"
  bottom: "conv23"
  top: "scale23"
  power_param {
    scale: 0.08
  }
}

layer {
  name: "eltwise23"
  type: "Eltwise"
  bottom: "relu23"
  bottom: "scale23"
  top: "layer23"
  eltwise_param {
    operation: SUM
  }
}

layer {
  name: "conv24"
  type: "Convolution"
  bottom: "layer23"
  top: "conv24"
  convolution_param {
    num_output: 1024
    kernel_size: 3
    pad: 1
    stride: 1
  }
}
layer {
  name: "relu24"
  type: "ReLU"
  bottom: "conv24"
  top: "relu24"	
}
layer {
  name: "scale24"
  type: "Power"
  bottom: "conv24"
  top: "scale24"
  power_param {
    scale: 0.08
  }
}

layer {
  name: "eltwise24"
  type: "Eltwise"
  bottom: "relu24"
  bottom: "scale24"
  top: "layer24"
  eltwise_param {
    operation: SUM
  }
}




layer {
  name: "fc25"
  type: "InnerProduct"
  bottom: "layer24"
  top: "fc25"
  inner_product_param {
    num_output: 512
  }
}
layer {
  name: "relu25"
  type: "ReLU"
  bottom: "fc25"
  top: "relu25"	
}
layer {
  name: "scale25"
  type: "Power"
  bottom: "fc25"
  top: "scale25"
  power_param {
    scale: 0.08
  }
}

layer {
  name: "eltwise25"
  type: "Eltwise"
  bottom: "relu25"
  bottom: "scale25"
  top: "layer25"
  eltwise_param {
    operation: SUM
  }
}


layer {
  name: "fc26"
  type: "InnerProduct"
  bottom: "layer25"
  top: "fc26"
  inner_product_param {
    num_output: 4096
  }
}
layer {
  name: "relu26"
  type: "ReLU"
  bottom: "fc26"
  top: "relu26"	
}
layer {
  name: "scale26"
  type: "Power"
  bottom: "fc26"
  top: "scale26"
  power_param {
    scale: 0.08
  }
}

layer {
  name: "eltwise26"
  type: "Eltwise"
  bottom: "relu26"
  bottom: "scale26"
  top: "layer26"
  eltwise_param {
    operation: SUM
  }
}


layer {
  name: "fc27"
  type: "InnerProduct"
  bottom: "layer26"
  top: "result"
  inner_product_param {
    num_output: 1470
  }
}
[/code]

Hi AastaLLL,

Thank you for your answer!
I experimented with the solution you proposed, and the results with the 32-bit version of TensorRT are close to the results with Caffe (for example https://github.com/TLESORT/YOLO-TensorRT-GIE-/blob/master/Images/cat_detection_modified_32bits.jpg).
However, the solution with the 16-bit version gives wrong results:
Cat : https://github.com/TLESORT/YOLO-TensorRT-GIE-/blob/master/Images/cat_detection_modified_16bits.jpg
Matrix of zeros : https://github.com/TLESORT/YOLO-TensorRT-GIE-/blob/master/Images/zeros_detection_modified_16bits.jpg

Thank you very much for your help. Do you think there is anything I can do to make the 16-bit version work?

The code I used to change the mode from 32-bit to 16-bit is:

[code]
// NB: builder->platformHasFastFp16() returns true
INetworkDefinition* network = builder->createNetwork();
ICaffeParser *parser = createCaffeParser();
const IBlobNameToTensor *blobNameToTensor =
        parser->parse(prototxt,    // caffe deploy file
                      caffemodel,  // caffe model file
                      *network,    // network definition that the parser will populate
                      nvinfer1::DataType::kHALF);
// [...]
builder->setHalf2Mode(true);
[/code]
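For completeness, a sketch of the same calls gated on fp16 support (same TensorRT 1.0-era API as above, just made conditional; this is how I would fall back to fp32 on platforms without fast fp16):

[code]
// Request fp16 weights and Half2 kernels only when the platform
// reports fast fp16 support (the TX1 does); otherwise stay in fp32.
nvinfer1::DataType modelDataType = builder->platformHasFastFp16()
                                       ? nvinfer1::DataType::kHALF
                                       : nvinfer1::DataType::kFLOAT;
const IBlobNameToTensor* blobNameToTensor =
        parser->parse(prototxt, caffemodel, *network, modelDataType);
// [...]
builder->setHalf2Mode(modelDataType == nvinfer1::DataType::kHALF);
[/code]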

Hi,

Thanks for your feedback.
We are investigating the fp16 issue and will update you later.

Hi,

Thank you for your patience.
We found that YOLO is quite sensitive to the network's precision and we need to debug it more.

Sorry for keeping you waiting, but we still need some time to figure out the root cause.
Thanks.

Hi, I found there is a bug in your code; you should change the WrapInputLayer2Bgr to

[code]
void WrapInputLayer2Bgr(std::vector<std::vector<cv::Mat>>& input_channels, float* buffer) {
    float* input_data = buffer;

    for (int n = 0; n < BATCH_SIZE; ++n) {
        input_channels.push_back(std::vector<cv::Mat>());
        for (int i = 0; i < 3; ++i) {
            // map cv::Mat channel i onto plane (2 - i) of the buffer,
            // i.e. reverse the channel order in memory
            if (i == 0)
                input_data = buffer + 2 * INPUT_H * INPUT_W;
            else if (i == 1)
                input_data = buffer + INPUT_H * INPUT_W;
            else
                input_data = buffer;
            cv::Mat channel(INPUT_H, INPUT_W, CV_32FC1, input_data);
            input_channels[n].push_back(channel);
            //input_data += INPUT_H * INPUT_W;  // old per-channel advance, no longer needed
        }
    }
}
[/code]
and change these lines in the function ‘Preprocess’ from

[code]
if (_rescaleTo01)
    sample_float = sample_float / 255.f;
[/code]

to

[code]
if (_rescaleTo01)
    sample_float = sample_float / 127.5 - 1;
[/code]

Then the result will be close to darknet/YOLO.
If you want the result to exactly match darknet/YOLO, you can add an activation function implementing the fix-ReLU used in the TensorRT workaround (negative slope: 0.08, positive slope: 1.08) and convert the weights file to a caffemodel.
But I don't know why the range of the input image should change from [0.0, 1.0] to [-1.0, 1.0]; could anyone show me the reason?

BTW: I think YOLO is not sensitive to the network's precision. If you use the caffemodel produced by those steps, you will find that the fp16-mode and fp32-mode results match almost exactly. I have made a "pseudo-fp16 darknet" to generate an fp16 weights file, and I found its results hard to tell apart from the fp32 weights file.

Sorry for the mistake in the last post; it should be: "you should change the function WrapInputLayer to WrapInputLayer2Bgr".

Is there any update on this topic?
Where can I find the user guide and examples for TensorRT 2.1?

Hi,

TensorRT 2.1 is available with JetPack 3.1.

This sample is based on TensorRT 1.0. There are some API changes, but not many.

Hi,

Is there any update on the fp16 issue?

Thanks.

Hi,

We found YOLO is quite sensitive to precision, and fp16 mode slightly lowers the output precision.
If you want fp16 acceleration, it is recommended to train the YOLO model directly in fp16 mode.

Thanks.

Hi @TLESORT,
I was wondering what the FPS results are when using TensorRT FP32 and FP16 on YOLOv2?

I have the same question! Has anyone worked on it yet?

Hi, bhargavK

Could you share more information about your question with us?

YOLO can run correctly with TensorRT in float mode.
Have you tested it?

Thanks.

Hi AastaLLL,

Thanks for your reply. I haven’t tested it yet; I was more curious about its performance.

I will test it soon and report the FPS if I get the time; otherwise, I will keep using DetectNet for now.

Hi all,
I am trying to run Tiny YOLOv2 with TensorRT optimization. I am giving the input image in BGR format with values in the range [0, 1]. I have approximated leaky ReLU with the ReLU + Scale + Eltwise operation. I take the output at the second-to-last layer, which is a convolution layer whose output is a 12x12x125 tensor, and I have implemented the last detection layer separately in Python.
Everything works fine using Caffe, but TensorRT is not giving the correct output, or maybe I am interpreting the output wrongly.

Since I take the output at the second-to-last layer, TensorRT gives me the output as an NCHW linearized array of size 1x125x12x12 = 18000.

I take this output from TensorRT, reshape it to 125x12x12, and send it to my Python detection layer. I am not getting correct results, but with the Caffe implementation I am.
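For reference, this is the indexing I assume for the NCHW linearized buffer (a minimal sketch; the at() helper is only for illustration, with C, H, W being the 125x12x12 dimensions from above):

[code]
#include <vector>

// Dimensions from above: C = 125 channels, H = W = 12.
constexpr int C = 125, H = 12, W = 12;

// In an NCHW linearized buffer, element (n, c, h, w) lives at
// index ((n * C + c) * H + h) * W + w. Illustration-only helper:
inline float at(const std::vector<float>& buf, int n, int c, int h, int w) {
    return buf[((n * C + c) * H + h) * W + w];
}
[/code]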

Please tell me what I am doing wrong: am I giving the input in the wrong format, or reading the output in the wrong format?

Thanks in advance…

I have been getting the same error, with boxes everywhere, even with the suggested Scale and Eltwise layers. I would also like to ask whether the scale of 0.08 means anything in particular; should it be changed to adapt to a different model?

I am using a TensorFlow model with Tiny YOLOv2. The implementation is from the repo basic-yolo-keras.

Hi,

The suggestion was posted one year ago and is not up to date.
Could you create a new topic and explain your issue in detail?

Thanks.
