TensorRT - Deconvolution layer slow inference

Hi everyone!
My team is working with the DRIVE PX platform, and we are currently training a semantic segmentation model.
Recently we encountered a problem while optimizing a Caffe model with TensorRT 3.0.2.
In general the inference performance of our network is decent; one layer, however, runs extremely slowly.

Here is the output from the network profiler:

[GIE]  layer shift - 0.543744 ms
[GIE]  layer conv1 + relu1 - 0.949248 ms
[GIE]  layer pool1 - 0.075776 ms
[GIE]  layer norm1 - 0.041984 ms
[GIE]  layer conv2 + relu2 - 0.771072 ms
[GIE]  layer pool2 - 0.052224 ms
[GIE]  layer norm2 - 0.083968 ms
[GIE]  layer conv3 + relu3 - 0.348160 ms
[GIE]  layer conv4 + relu4 - 0.286720 ms
[GIE]  layer conv5 + relu5 - 0.212992 ms
[GIE]  layer pool5 - 0.020480 ms
[GIE]  layer fc6 + relu6 - 4.219904 ms
[GIE]  layer fc7 + relu7 - 1.878016 ms
[GIE]  layer score_fr - 0.335872 ms
[GIE]  layer upscore - 6728.174805 ms
[GIE]  layer network time - 6737.995117 ms
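
For context, these per-layer timings come from TensorRT's profiling hook: a class implementing nvinfer1::IProfiler is attached to the execution context, and reportLayerTime() is called once per layer. A minimal sketch of how such a table is produced (the SimpleProfiler name is ours; engine and buffer setup omitted):

#include <NvInfer.h>
#include <cstdio>

// TensorRT calls reportLayerTime() once per layer after each execute()
// when a profiler is attached to the context; note that attaching a
// profiler makes execution synchronous.
class SimpleProfiler : public nvinfer1::IProfiler
{
public:
    void reportLayerTime(const char* layerName, float ms) override
    {
        printf("[GIE]  layer %s - %f ms\n", layerName, ms);
    }
};

// Usage sketch:
//   SimpleProfiler profiler;
//   context->setProfiler(&profiler);
//   context->execute(batchSize, buffers);  // timings printed per layer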

The definition of the upscore layer:

layer {
  name: "upscore"
  type: "Deconvolution"
  bottom: "score_fr"
  top: "upscore"
  param {
    lr_mult: 0
  }
  convolution_param {
    num_output: 21
    group: 21
    bias_term: false
    kernel_size: 63
    stride: 32
    weight_filler { type: "bilinear" }
  }
}
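
Note that with group: 21 equal to num_output and lr_mult: 0, nothing in this layer is learned; it is a fixed, channel-wise bilinear upsampling by a factor of 32. For illustration, here is a sketch of the kernel that Caffe's "bilinear" filler produces (a reconstruction of the filler formula, not code from our network):

#include <cmath>
#include <vector>

// Caffe's "bilinear" weight filler for a k x k kernel (here k = 63).
// Every one of the 21 groups gets an identical copy of this kernel, so
// the deconvolution reduces to per-channel bilinear interpolation.
std::vector<float> bilinearKernel(int k)
{
    std::vector<float> w(k * k);
    float f = std::ceil(k / 2.0f);                                  // f = 32
    float c = (2.0f * f - 1.0f - std::fmod(f, 2.0f)) / (2.0f * f);  // kernel center
    for (int y = 0; y < k; ++y)
        for (int x = 0; x < k; ++x)
            w[y * k + x] = (1.0f - std::fabs(x / f - c))
                         * (1.0f - std::fabs(y / f - c));
    return w;
}

Since the weights are constant, one possible interim workaround would be to drop this layer from the TensorRT engine and do the 32x bilinear upscaling of score_fr outside of it.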

The discrepancy in inference times between layers is very confusing to us, especially since the upscore layer involves less computation than most of the others: with group: 21 each output channel reads from a single input channel, so the layer performs only about kernel_size^2 multiply-accumulates per input pixel per channel, far fewer than fc6 or fc7. Could this be a bug inside TensorRT?
We would really appreciate any suggestion as to the possible cause, because this issue is basically blocking our development process.

Dear peczek,
Can you please file a bug with the steps to reproduce it on our end?
Please log in to https://developer.nvidia.com/drive with your credentials, then go to MyAccount -> MyBugs -> Submit a new bug. Please share the bug ID here so we can follow up.

Dear peczek,
Thank you for reporting it. We will update you once it is fixed.

This also happens on TensorRT 4.1, tested on a 1080 Ti with the jetson-inference code and an FCN-AlexNet network. The layer is the following:

layer {
  name: "upscore"
  type: "Deconvolution"
  bottom: "score_fr"
  top: "upscore"
  param {
    lr_mult: 0.0
  }
  convolution_param {
    num_output: 21
    bias_term: false
    kernel_size: 63
    group: 21
    stride: 32
    weight_filler {
      type: "bilinear"
    }
  }
}

With the following times:

[TRT]  layer shift - 0.919552 ms
[TRT]  layer conv1 + relu1 - 1.481728 ms
[TRT]  layer pool1 - 0.116832 ms
[TRT]  layer norm1 - 0.056224 ms
[TRT]  layer conv2 + relu2 - 1.145856 ms
[TRT]  layer pool2 - 0.087520 ms
[TRT]  layer norm2 - 0.097824 ms
[TRT]  layer conv3 + relu3 - 0.546816 ms
[TRT]  layer conv4 + relu4 - 0.425984 ms
[TRT]  layer conv5 + relu5 - 0.280576 ms
[TRT]  layer pool5 - 0.027648 ms
[TRT]  layer fc6 + relu6 - 8.728576 ms
[TRT]  layer fc7 + relu7 - 3.900512 ms
[TRT]  layer score_fr - 0.097184 ms
[TRT]  layer upscore - 3986.665527 ms
[TRT]  layer network time - 4004.578369 ms
[TRT]  segNet::Overlay -- s_w 1343  s_h 767  s_c 21  s_x 1.049219  s_y 1.065278
[TRT]  segNet::Overlay -- ignoring class 'void' id=0

I’ll test on the Jetson TX2 over the weekend. In the case of TensorRT, where should we submit a bug?

Note that the input image was 1280x720.
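
As a sanity check on these numbers: Caffe computes the deconvolution output size as stride * (in - 1) + kernel_size, so the s_w 1343 / s_h 767 in the overlay log is consistent with score_fr being a 41x23 map (a back-calculation from the log, just for illustration):

#include <cstdio>

// Caffe deconvolution output size with no padding: stride*(in - 1) + kernel
int deconvOut(int in, int stride, int kernel)
{
    return stride * (in - 1) + kernel;
}

int main()
{
    // A 1280x720 input downsampled ~32x by FCN-AlexNet leaves a 41x23 score_fr
    printf("%d x %d\n", deconvOut(41, 32, 63), deconvOut(23, 32, 63));  // 1343 x 767
    return 0;
}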

Dear bpinaya,
Yes, it is reproducible on 4.1 too. To submit a bug, log in to https://developer.nvidia.com/drive with your credentials, then go to MyAccount -> MyBugs -> Submit a new bug. Please share the bug ID here so we can follow up.

Dear SivaRamaKrishna,
I’ve submitted the bug (https://developer.nvidia.com/nvidia_bug/2337503). Sorry about the description; there seem to be no formatting options, but it’s mainly what I’ve mentioned here.

Note that I also tested on the Jetson TX2 with JetPack 3.2.1, and even with FP16 enabled I got about 3000 ms for the deconv layer.
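
For reference, "FP16 enabled" here means the jetson-inference build path, roughly the following (a sketch assuming the TensorRT 3/4-era builder API):

// Enable FP16 kernels at engine build time if the platform supports them.
if (builder->platformHasFastFp16())
{
    builder->setHalf2Mode(true);  // TensorRT 3.x style, as used by jetson-inference
    // On TensorRT 4 the equivalent is builder->setFp16Mode(true).
}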

This morning I ran my own code instead of jetson-inference, and the issue is indeed still there. Any updates are appreciated!

Hi there SivaRamaKrishna,
any updates on this one? I have a Jetson Xavier and will probably test it out on TensorRT 5, but I wanted to read the release notes first and can’t seem to find them anywhere.

Dear bpinaya,
TRT 5.0 does not have a fix for this; we have scoped it for the next release.