Depthwise convolution is very slow using TensorRT 3.0
I converted a TensorFlow MobileNet model to UFF and profiled it on a TX2 using TensorRT 3.0.
The per-layer times are:


(Unnamed Layer* 0) 0.197ms
MobileNet/conv_1/BiasAdd + MobileNet/conv_1/batch_norm/Relu 0.427ms
MobileNet/conv_ds_2/depthwise_conv/BiasAdd + MobileNet/conv_ds_2/dw_batch_norm/Relu 8.436ms
MobileNet/conv_ds_2/pointwise_conv/BiasAdd + MobileNet/conv_ds_2/pw_batch_norm/Relu 0.731ms
(Unnamed Layer* 13) 0.850ms
MobileNet/conv_ds_3/depthwise_conv/BiasAdd + MobileNet/conv_ds_3/dw_batch_norm/Relu 4.350ms
MobileNet/conv_ds_3/pointwise_conv/BiasAdd + MobileNet/conv_ds_3/pw_batch_norm/Relu 0.502ms
MobileNet/conv_ds_4/depthwise_conv/BiasAdd + MobileNet/conv_ds_4/dw_batch_norm/Relu 8.709ms
MobileNet/conv_ds_4/pointwise_conv/BiasAdd + MobileNet/conv_ds_4/pw_batch_norm/Relu 0.814ms
(Unnamed Layer* 30) 0.430ms
MobileNet/conv_ds_5/depthwise_conv/BiasAdd + MobileNet/conv_ds_5/dw_batch_norm/Relu 2.753ms
MobileNet/conv_ds_5/pointwise_conv/BiasAdd + MobileNet/conv_ds_5/pw_batch_norm/Relu 0.417ms
MobileNet/conv_ds_6/depthwise_conv/BiasAdd + MobileNet/conv_ds_6/dw_batch_norm/Relu 5.513ms
MobileNet/conv_ds_6/pointwise_conv/BiasAdd + MobileNet/conv_ds_6/pw_batch_norm/Relu 0.751ms
(Unnamed Layer* 47) 0.226ms
MobileNet/conv_ds_7/depthwise_conv/BiasAdd + MobileNet/conv_ds_7/dw_batch_norm/Relu 2.938ms
MobileNet/conv_ds_7/pointwise_conv/BiasAdd + MobileNet/conv_ds_7/pw_batch_norm/Relu 0.433ms
MobileNet/conv_ds_8/depthwise_conv/BiasAdd + MobileNet/conv_ds_8/dw_batch_norm/Relu 5.846ms
MobileNet/conv_ds_8/pointwise_conv/BiasAdd + MobileNet/conv_ds_8/pw_batch_norm/Relu 0.796ms
MobileNet/conv_ds_9/depthwise_conv/BiasAdd + MobileNet/conv_ds_9/dw_batch_norm/Relu 4.293ms
MobileNet/conv_ds_9/pointwise_conv/BiasAdd + MobileNet/conv_ds_9/pw_batch_norm/Relu 0.786ms
MobileNet/conv_ds_10/depthwise_conv/BiasAdd + MobileNet/conv_ds_10/dw_batch_norm/Relu 4.885ms
MobileNet/conv_ds_10/pointwise_conv/BiasAdd + MobileNet/conv_ds_10/pw_batch_norm/Relu 0.787ms
MobileNet/conv_ds_11/depthwise_conv/BiasAdd + MobileNet/conv_ds_11/dw_batch_norm/Relu 5.855ms
MobileNet/conv_ds_11/pointwise_conv/BiasAdd + MobileNet/conv_ds_11/pw_batch_norm/Relu 0.748ms
MobileNet/conv_ds_12/depthwise_conv/BiasAdd + MobileNet/conv_ds_12/dw_batch_norm/Relu 4.874ms
MobileNet/conv_ds_12/pointwise_conv/BiasAdd + MobileNet/conv_ds_12/pw_batch_norm/Relu 0.791ms
(Unnamed Layer* 96) 0.118ms
MobileNet/conv_ds_13/depthwise_conv/BiasAdd + MobileNet/conv_ds_13/dw_batch_norm/Relu 5.715ms
MobileNet/conv_ds_13/pointwise_conv/BiasAdd + MobileNet/conv_ds_13/pw_batch_norm/Relu 0.502ms
MobileNet/conv_ds_14/depthwise_conv/BiasAdd 10.942ms
Time over all layers: 85.414
Why does the depthwise conv take so much time?
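
To put a number on this, the listing above can be grouped by layer type. The following is a minimal sketch in plain Python, not part of TensorRT or its profiler: the helper names `classify` and `summarize` are made up for illustration, and the embedded `PROFILE` string holds only a few lines copied from the listing. Pasting the full listing in its place gives roughly 75.1 ms for the depthwise layers out of the 85.414 ms total, i.e. about 88% of the time.

```python
import re
from collections import defaultdict

# A few lines copied from the profile above (assumption: one layer per line,
# with the time in milliseconds at the end). Paste the full listing here to
# reproduce the totals for the whole network.
PROFILE = """\
(Unnamed Layer* 0) 0.197ms
MobileNet/conv_1/BiasAdd + MobileNet/conv_1/batch_norm/Relu 0.427ms
MobileNet/conv_ds_2/depthwise_conv/BiasAdd + MobileNet/conv_ds_2/dw_batch_norm/Relu 8.436ms
MobileNet/conv_ds_2/pointwise_conv/BiasAdd + MobileNet/conv_ds_2/pw_batch_norm/Relu 0.731ms
MobileNet/conv_ds_14/depthwise_conv/BiasAdd 10.942ms
"""

# Layer name, whitespace, then a decimal number followed by "ms".
LINE_RE = re.compile(r"^(?P<name>.+?)\s+(?P<ms>\d+(?:\.\d+)?)ms$")


def classify(name):
    """Bucket a layer by the substrings used in this model's layer names."""
    if "depthwise_conv" in name:
        return "depthwise"
    if "pointwise_conv" in name:
        return "pointwise"
    return "other"


def summarize(profile_text):
    """Return total milliseconds per layer type; non-matching lines are skipped."""
    totals = defaultdict(float)
    for line in profile_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        totals[classify(match.group("name"))] += float(match.group("ms"))
    return dict(totals)


if __name__ == "__main__":
    totals = summarize(PROFILE)
    grand_total = sum(totals.values())
    for kind, ms in sorted(totals.items()):
        print(f"{kind:10s} {ms:8.3f} ms  ({100 * ms / grand_total:5.1f}%)")
```

The summary line "Time over all layers" does not end in "ms", so the regular expression skips it and it is not counted twice.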

#1
Posted 11/01/2017 03:43 AM   
Hi,

May I know which data format you use? NCHW or NHWC?

Thanks.

#2
Posted 11/01/2017 07:36 AM   
AastaLLL said: Hi,

May I know which data format you use? NCHW or NHWC?

Thanks.

Hi AastaLLL,

The TensorFlow depthwise conv API only supports NHWC, so I use the NHWC data format.

Thanks

#3
Posted 11/01/2017 08:16 AM   
Hi,

Currently, separable convolution is implemented as a grouped convolution with groups=C (the depthwise step) followed by a conv1x1 (the pointwise step), and it's not efficient enough.
We're looking into optimizing general grouped convolution, but we can't provide any firm commitments or estimates at this time.

Thanks and sorry for the inconvenience.
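
For readers unfamiliar with the "groups=C + conv1x1" formulation, here is a minimal pure-Python sketch of it. This is an illustration only, not TensorRT's or cuDNN's actual kernel: the function names `conv2d_grouped`, `separable_conv` and `macs` are hypothetical, the convolution is stride 1 with no padding, and the 512-channel, 14x14 layer size in the demo is an assumed example rather than a value taken from this thread.

```python
def conv2d_grouped(x, w, groups):
    """Stride-1, no-padding grouped convolution (cross-correlation).

    x: input as nested lists, shape [C_in][H][W]
    w: weights as nested lists, shape [C_out][C_in // groups][K][K]
    """
    c_in, height, width = len(x), len(x[0]), len(x[0][0])
    c_out, k = len(w), len(w[0][0])
    in_per_group = c_in // groups
    out_per_group = c_out // groups
    out_h, out_w = height - k + 1, width - k + 1
    y = [[[0.0] * out_w for _ in range(out_h)] for _ in range(c_out)]
    for oc in range(c_out):
        group = oc // out_per_group
        for ic_local in range(in_per_group):
            ic = group * in_per_group + ic_local  # input channel seen by this group
            for i in range(out_h):
                for j in range(out_w):
                    acc = 0.0
                    for di in range(k):
                        for dj in range(k):
                            acc += x[ic][i + di][j + dj] * w[oc][ic_local][di][dj]
                    y[oc][i][j] += acc
    return y


def separable_conv(x, depthwise_w, pointwise_w):
    """Depthwise step (groups = number of channels), then a 1x1 pointwise step."""
    depthwise_out = conv2d_grouped(x, depthwise_w, groups=len(x))
    return conv2d_grouped(depthwise_out, pointwise_w, groups=1)


def macs(c_in, c_out, k, out_h, out_w, groups):
    """Multiply-accumulate count of a grouped convolution."""
    return out_h * out_w * c_out * (c_in // groups) * k * k


if __name__ == "__main__":
    # Assumed layer size for illustration: 512 channels, 14x14 output.
    dw = macs(512, 512, 3, 14, 14, groups=512)
    pw = macs(512, 512, 1, 14, 14, groups=1)
    print("depthwise 3x3 MACs:", dw)
    print("pointwise 1x1 MACs:", pw)
    print("pointwise / depthwise:", pw / dw)
```

With groups=C every output channel reads exactly one input channel, so the depthwise step does far less arithmetic than the pointwise step. That it still dominates the profile above is consistent with the grouped-convolution path being the inefficient part rather than the amount of work.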

#4
Posted 11/02/2017 06:53 AM   
Hi,

@373197201, can you please specify which implementation of "tensorflow mobilenet" you were using?

#5
Posted 12/05/2017 09:04 AM   
Hi, any chance of seeing depthwise convolution better optimized in cuDNN?
We have implemented our own kernels in CUDA, but would like more optimized convolution algorithms such as Winograd.
Regards.

#6
Posted 12/16/2017 08:00 PM   
Hi,

We're looking at the possibility, but we can't provide any firm commitments or estimates at this time.
Thanks.

#7
Posted 12/19/2017 06:11 AM   
AastaLLL said: Hi,

We're looking at the possibility, but we can't provide any firm commitments or estimates at this time.
Thanks.


Hi AastaLLL:

Has the issue been solved?

Thanks
Bryan

#8
Posted 03/29/2018 10:26 AM   
Hello,

I'm trying to convert the default frozen TF MobileNet model to UFF format with the uff.from_tensorflow_frozen_model method.

I get the following error:

AttributeError: 'RepeatedCompositeFieldContainer' object has no attribute 'unknown_rank'

Did you (@373197201) face any such issue?


Any help will be appreciated. Thanks!!

#9
Posted 03/30/2018 08:37 AM   
Hi, 373197201

The improvement is in our plans, but we cannot disclose a concrete schedule.
Please watch our announcements for the latest updates.

Thanks.

#10
Posted 04/02/2018 07:15 AM   
Hi, gautam.patel

This issue comes from a pure TensorFlow use case.
We recommend sharing it with the TensorFlow developers for more information.

Thanks.

#11
Posted 04/02/2018 07:18 AM   