Depthwise convolution is very slow using TensorRT 3.0
I converted a TensorFlow MobileNet model to UFF and profiled it on a TX2 using TensorRT 3.0.
The per-layer times are:


(Unnamed Layer* 0) 0.197ms
MobileNet/conv_1/BiasAdd + MobileNet/conv_1/batch_norm/Relu 0.427ms
MobileNet/conv_ds_2/depthwise_conv/BiasAdd + MobileNet/conv_ds_2/dw_batch_norm/Relu 8.436ms
MobileNet/conv_ds_2/pointwise_conv/BiasAdd + MobileNet/conv_ds_2/pw_batch_norm/Relu 0.731ms
(Unnamed Layer* 13) 0.850ms
MobileNet/conv_ds_3/depthwise_conv/BiasAdd + MobileNet/conv_ds_3/dw_batch_norm/Relu 4.350ms
MobileNet/conv_ds_3/pointwise_conv/BiasAdd + MobileNet/conv_ds_3/pw_batch_norm/Relu 0.502ms
MobileNet/conv_ds_4/depthwise_conv/BiasAdd + MobileNet/conv_ds_4/dw_batch_norm/Relu 8.709ms
MobileNet/conv_ds_4/pointwise_conv/BiasAdd + MobileNet/conv_ds_4/pw_batch_norm/Relu 0.814ms
(Unnamed Layer* 30) 0.430ms
MobileNet/conv_ds_5/depthwise_conv/BiasAdd + MobileNet/conv_ds_5/dw_batch_norm/Relu 2.753ms
MobileNet/conv_ds_5/pointwise_conv/BiasAdd + MobileNet/conv_ds_5/pw_batch_norm/Relu 0.417ms
MobileNet/conv_ds_6/depthwise_conv/BiasAdd + MobileNet/conv_ds_6/dw_batch_norm/Relu 5.513ms
MobileNet/conv_ds_6/pointwise_conv/BiasAdd + MobileNet/conv_ds_6/pw_batch_norm/Relu 0.751ms
(Unnamed Layer* 47) 0.226ms
MobileNet/conv_ds_7/depthwise_conv/BiasAdd + MobileNet/conv_ds_7/dw_batch_norm/Relu 2.938ms
MobileNet/conv_ds_7/pointwise_conv/BiasAdd + MobileNet/conv_ds_7/pw_batch_norm/Relu 0.433ms
MobileNet/conv_ds_8/depthwise_conv/BiasAdd + MobileNet/conv_ds_8/dw_batch_norm/Relu 5.846ms
MobileNet/conv_ds_8/pointwise_conv/BiasAdd + MobileNet/conv_ds_8/pw_batch_norm/Relu 0.796ms
MobileNet/conv_ds_9/depthwise_conv/BiasAdd + MobileNet/conv_ds_9/dw_batch_norm/Relu 4.293ms
MobileNet/conv_ds_9/pointwise_conv/BiasAdd + MobileNet/conv_ds_9/pw_batch_norm/Relu 0.786ms
MobileNet/conv_ds_10/depthwise_conv/BiasAdd + MobileNet/conv_ds_10/dw_batch_norm/Relu 4.885ms
MobileNet/conv_ds_10/pointwise_conv/BiasAdd + MobileNet/conv_ds_10/pw_batch_norm/Relu 0.787ms
MobileNet/conv_ds_11/depthwise_conv/BiasAdd + MobileNet/conv_ds_11/dw_batch_norm/Relu 5.855ms
MobileNet/conv_ds_11/pointwise_conv/BiasAdd + MobileNet/conv_ds_11/pw_batch_norm/Relu 0.748ms
MobileNet/conv_ds_12/depthwise_conv/BiasAdd + MobileNet/conv_ds_12/dw_batch_norm/Relu 4.874ms
MobileNet/conv_ds_12/pointwise_conv/BiasAdd + MobileNet/conv_ds_12/pw_batch_norm/Relu 0.791ms
(Unnamed Layer* 96) 0.118ms
MobileNet/conv_ds_13/depthwise_conv/BiasAdd + MobileNet/conv_ds_13/dw_batch_norm/Relu 5.715ms
MobileNet/conv_ds_13/pointwise_conv/BiasAdd + MobileNet/conv_ds_13/pw_batch_norm/Relu 0.502ms
MobileNet/conv_ds_14/depthwise_conv/BiasAdd 10.942ms
Time over all layers: 85.414ms

Why does the depthwise conv cost so much time?
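For context, a back-of-envelope MAC count makes the gap concrete. This is a sketch that assumes the standard MobileNet v1 shapes for a 224x224 input (the post does not state the resolution), under which the tensor entering conv_ds_2 is 112x112x32:

[code]
# Rough MAC counts for the conv_ds_2 block, assuming standard
# MobileNet v1 shapes at 224x224 input (an assumption; the
# resolution is not stated in the post).
H, W, C_in, C_out, K = 112, 112, 32, 64, 3

dw_macs = H * W * C_in * K * K    # depthwise 3x3: one filter per channel
pw_macs = H * W * C_in * C_out    # pointwise 1x1: full channel mixing

print(dw_macs / 1e6)  # ~3.6M MACs
print(pw_macs / 1e6)  # ~25.7M MACs
[/code]

Under these assumptions the depthwise layer does roughly 7x less arithmetic than the pointwise layer, yet the profile above shows it taking ~11x longer (8.436ms vs 0.731ms).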

#1
Posted 11/01/2017 03:43 AM   
Hi,

May I know which data format do you use? NCHW or NHWC?

Thanks.

#2
Posted 11/01/2017 07:36 AM   
AastaLLL said:
Hi,

May I know which data format do you use? NCHW or NHWC?

Thanks.

Hi AastaLLL,

The TensorFlow depthwise conv API only supports NHWC, so I use the NHWC data format.
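For reference, this is the kind of layer in question; a minimal TF 1.x sketch with hypothetical shapes:

[code]
import tensorflow as tf

# Minimal sketch (TF 1.x API; shapes are hypothetical): a 3x3 depthwise
# conv in the default NHWC layout, which is what the UFF converter sees.
x = tf.placeholder(tf.float32, [1, 112, 112, 32])  # NHWC input
dw = tf.get_variable('dw_filter', [3, 3, 32, 1])   # [H, W, C, channel multiplier]
y = tf.nn.depthwise_conv2d(x, dw,
                           strides=[1, 1, 1, 1],
                           padding='SAME')          # NHWC unless data_format is set
[/code]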

Thanks

#3
Posted 11/01/2017 08:16 AM   
Hi,

Currently, separable convolution is implemented as a grouped convolution with groups = C followed by a 1x1 convolution, and this path is not efficient enough.
We're looking at the possibility of optimizing general grouped convolutions, but we can't provide any firm commitments or estimates at this time.
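In TensorFlow terms, the decomposition looks like this (a sketch with hypothetical shapes, mirroring the description above rather than TensorRT's internal code):

[code]
import tensorflow as tf

# Sketch of the decomposition described above (TF 1.x API; shapes are
# hypothetical): a depthwise 3x3 conv (a grouped conv with groups == C)
# followed by a pointwise 1x1 conv. tf.nn.separable_conv2d fuses the
# same two steps into one call.
x  = tf.placeholder(tf.float32, [1, 112, 112, 32])  # NHWC input
dw = tf.get_variable('dw', [3, 3, 32, 1])           # one 3x3 filter per channel
pw = tf.get_variable('pw', [1, 1, 32, 64])          # 1x1 cross-channel mixing

step1 = tf.nn.depthwise_conv2d(x, dw, [1, 1, 1, 1], 'SAME')  # groups = C part
step2 = tf.nn.conv2d(step1, pw, [1, 1, 1, 1], 'SAME')        # conv1x1 part

fused = tf.nn.separable_conv2d(x, dw, pw, [1, 1, 1, 1], 'SAME')
# step2 and fused compute the same result
[/code]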

Thanks and sorry for the inconvenience.

#4
Posted 11/02/2017 06:53 AM   