Hi everyone!
My team is working with the DRIVE PX platform. We are currently training a semantic segmentation model.
Recently we ran into a problem while optimizing a Caffe model with TensorRT 3.0.2.
Overall, the inference performance of our network is decent; however, one layer runs extremely slowly.
Here is the output from the network profiler:
[GIE] layer shift - 0.543744 ms
[GIE] layer conv1 + relu1 - 0.949248 ms
[GIE] layer pool1 - 0.075776 ms
[GIE] layer norm1 - 0.041984 ms
[GIE] layer conv2 + relu2 - 0.771072 ms
[GIE] layer pool2 - 0.052224 ms
[GIE] layer norm2 - 0.083968 ms
[GIE] layer conv3 + relu3 - 0.348160 ms
[GIE] layer conv4 + relu4 - 0.286720 ms
[GIE] layer conv5 + relu5 - 0.212992 ms
[GIE] layer pool5 - 0.020480 ms
[GIE] layer fc6 + relu6 - 4.219904 ms
[GIE] layer fc7 + relu7 - 1.878016 ms
[GIE] layer score_fr - 0.335872 ms
[GIE] layer upscore - 6728.174805 ms
[GIE] layer network time - 6737.995117 ms
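To put the numbers above in proportion, here is a minimal Python sketch that parses the profiler lines and computes each layer's share of the total. The helper names (`parse_profile`, `layer_shares`) are made up for illustration and are not part of TensorRT; the timings are copied verbatim from the output above. By this calculation, upscore accounts for roughly 99.85% of the network time, while all other layers together take under 10 ms.

```python
import re

# Profiler output copied verbatim from the post above.
PROFILE = """\
[GIE] layer shift - 0.543744 ms
[GIE] layer conv1 + relu1 - 0.949248 ms
[GIE] layer pool1 - 0.075776 ms
[GIE] layer norm1 - 0.041984 ms
[GIE] layer conv2 + relu2 - 0.771072 ms
[GIE] layer pool2 - 0.052224 ms
[GIE] layer norm2 - 0.083968 ms
[GIE] layer conv3 + relu3 - 0.348160 ms
[GIE] layer conv4 + relu4 - 0.286720 ms
[GIE] layer conv5 + relu5 - 0.212992 ms
[GIE] layer pool5 - 0.020480 ms
[GIE] layer fc6 + relu6 - 4.219904 ms
[GIE] layer fc7 + relu7 - 1.878016 ms
[GIE] layer score_fr - 0.335872 ms
[GIE] layer upscore - 6728.174805 ms
[GIE] layer network time - 6737.995117 ms
"""

# Matches "[GIE] layer <name> - <time> ms"; the name may itself contain " + ".
LINE_RE = re.compile(r"^\[GIE\] layer (?P<name>.+) - (?P<ms>[0-9.]+) ms$")


def parse_profile(text):
    """Return {layer name: time in ms} for every line that matches the format."""
    timings = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            timings[match.group("name")] = float(match.group("ms"))
    return timings


def layer_shares(timings, total_key="network time"):
    """Return each layer's fraction of the reported total network time."""
    total = timings[total_key]
    return {name: ms / total for name, ms in timings.items() if name != total_key}


timings = parse_profile(PROFILE)
shares = layer_shares(timings)

# Print the three most expensive layers with their share of the total.
for name in sorted(shares, key=shares.get, reverse=True)[:3]:
    print(f"{name}: {timings[name]:.3f} ms ({shares[name]:.2%})")
```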
The discrepancy in inference time between layers is very confusing to us. What is more, the last layer involves less computation than the others. Could this be a bug in TensorRT?
We would really appreciate any suggestions as to what could be causing this, because the issue is effectively blocking our development.
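The claim that the final layer involves less computation than the others can be sanity-checked by counting multiply-accumulate operations (MACs). The sketch below is only illustrative: the actual layer shapes are not given in this thread, so the channel counts, kernel sizes, and spatial sizes used here are hypothetical placeholders loosely modeled on an FCN-style network, and the function names are made up for this example.

```python
def conv_macs(c_in, c_out, kernel, h_out, w_out, groups=1):
    """MACs for a convolution: every output element sums over
    (c_in / groups) * kernel * kernel products."""
    return c_out * h_out * w_out * (c_in // groups) * kernel * kernel


def deconv_macs(c_in, c_out, kernel, h_in, w_in, groups=1):
    """MACs for a deconvolution (transposed convolution): every input element
    is multiplied into a kernel * kernel patch for (c_out / groups) channels."""
    return c_in * h_in * w_in * (c_out // groups) * kernel * kernel


# HYPOTHETICAL shapes, not taken from the poster's model:
# an fc6-like layer as a 6x6 convolution, 256 -> 4096 channels, 10x10 output.
fc6_like = conv_macs(c_in=256, c_out=4096, kernel=6, h_out=10, w_out=10)

# An upscore-like layer as a 63x63 deconvolution, 21 -> 21 channels, 10x10 input.
upscore_like = deconv_macs(c_in=21, c_out=21, kernel=63, h_in=10, w_in=10)

print(f"fc6-like MACs:     {fc6_like:,}")
print(f"upscore-like MACs: {upscore_like:,}")
print(f"ratio fc6/upscore: {fc6_like / upscore_like:.1f}")
```

Under these assumed shapes the deconvolution needs far fewer MACs than the fc6-like layer, which is consistent with the expectation that upscore should not dominate the runtime. Plugging in the real dimensions from the model's prototxt would give the actual comparison.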
Dear peczek,
Could you please file a bug that includes the steps needed to reproduce the issue on our end?
Please log in to https://developer.nvidia.com/drive with your credentials, then go to MyAccount->MyBugs->Submit a new bug to file the bug. Please share the bug ID here so that we can follow up.
Dear bpinaya,
Yes, it is reproducible on 4.1 too. To submit a bug, you need to log in to https://developer.nvidia.com/drive with your credentials, then go to MyAccount->MyBugs->Submit a new bug to file the bug. Please share the bug ID here so that we can follow up.
Dear SivaRamaKrishna,
I’ve submitted the bug (https://developer.nvidia.com/nvidia_bug/2337503). Sorry about the description; there seem to be no formatting options, but it is mainly what I’ve mentioned here.
Note that I also tested on the Jetson TX2 with JetPack 3.2.1, and even with FP16 enabled I got about 3000 ms for the deconv layer.
This morning I wrote my own code instead of using jetson-inference, and the issue still shows up there. Any updates are appreciated!
Hi there SivaRamaKrishna,
Any updates on this one? I have a Jetson Xavier and will probably test this on TensorRT 5, but I wanted to read the release notes first and can’t seem to find them anywhere.