Unsolvable, see the posted link below for why: CUDNN_TENSOR_NCHW vs. CUDNN_TENSOR_NHWC

Hi, apparently NCHW is the preferred layout for data buffers in cuDNN. However, the framework I am using (which includes CPU-optimized routines) stores all its data buffers in NHWC order. I do not want to lose the CPU-optimized code paths for scenarios where my users have no suitable GPU available.
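
For concreteness, the two layouts differ only in which dimension varies fastest in memory. A minimal sketch of the element offsets in plain C (with the usual N = batch, C = channels, H = height, W = width):

```c
#include <stddef.h>

/* Offset of element (n, c, h, w) in a fully packed NCHW buffer:
   the channel dimension varies slower than the spatial ones. */
static size_t offset_nchw(size_t C, size_t H, size_t W,
                          size_t n, size_t c, size_t h, size_t w)
{
    return ((n * C + c) * H + h) * W + w;
}

/* Offset of the same element in a fully packed NHWC buffer:
   the channel dimension is innermost (varies fastest). */
static size_t offset_nhwc(size_t C, size_t H, size_t W,
                          size_t n, size_t c, size_t h, size_t w)
{
    return ((n * H + h) * W + w) * C + c;
}
```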

  1. Are there any significant performance penalties for NHWC that would make it worthwhile to convert to NCHW?

  2. Are there performance differences between using 4d and Nd tensor descriptors? (See the sketch after these questions for the two setter styles I mean.)

  3. Same question for 2d and Nd convolution descriptors?

Has somebody benchmarked this already?
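
For reference, below is a minimal sketch of the two tensor-descriptor styles in the v4-era C API (error checking omitted; the convolution descriptors have an analogous 2d/Nd split). The 4d setter takes a format enum, while the Nd setter takes explicit strides instead.

```c
#include <cudnn.h>

/* Sketch: the same 4-D activation tensor described two ways. */
void describe_both_ways(int n, int c, int h, int w)
{
    cudnnTensorDescriptor_t desc4d, descNd;
    cudnnCreateTensorDescriptor(&desc4d);
    cudnnCreateTensorDescriptor(&descNd);

    /* 4d setter: the layout is chosen via the format enum. */
    cudnnSetTensor4dDescriptor(desc4d, CUDNN_TENSOR_NHWC,
                               CUDNN_DATA_FLOAT, n, c, h, w);

    /* Nd setter: no format enum; the layout is implied by the
       strides. These strides describe a packed NHWC buffer. */
    int dims[4]    = { n, c, h, w };
    int strides[4] = { h * w * c,  /* N stride */
                       1,          /* C stride: channels innermost */
                       w * c,      /* H stride */
                       c };        /* W stride */
    cudnnSetTensorNdDescriptor(descNd, CUDNN_DATA_FLOAT, 4, dims, strides);

    cudnnDestroyTensorDescriptor(desc4d);
    cudnnDestroyTensorDescriptor(descNd);
}
```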

Please consider this question closed. As a matter of fact, I had to realize that NHWC support in cuDNN is incomplete. Surprising and very disappointing, I must say, since support for this was more or less announced for v2 back in 2014: https://devtalk.nvidia.com/default/topic/783344/?comment=4664719

"The upcoming cuDNN v4 will support NHWC format for backprop. Stay tuned."

Trying to make this work with v4 now.

NHWC is supported when the tensor is HWC-packed, which is fine for my use case.

But how can I specify the N-stride?

cudnnSetTensor4dDescriptor allows specifying the format (e.g. NHWC), but it doesn't let you specify the strides.

As far as I can tell, none of the other variants (cudnnSetTensor4dDescriptorEx, cudnnSetTensorNdDescriptor) allow you to set the format.
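
For the record, the obvious candidate is the explicit-stride setter, on the assumption that cuDNN infers the layout from the strides when no format enum is given; whether the NHWC code paths actually accept a descriptor built this way is exactly what is unclear to me. A sketch:

```c
#include <cudnn.h>

/* Sketch: an HWC-packed NHWC tensor with a custom batch stride,
   described via the explicit-stride setter (no format enum). */
void set_nhwc_with_batch_stride(cudnnTensorDescriptor_t desc,
                                int n, int c, int h, int w,
                                int nStride /* e.g. > h*w*c for padded batches */)
{
    cudnnSetTensor4dDescriptorEx(desc, CUDNN_DATA_FLOAT,
                                 n, c, h, w,
                                 nStride,  /* custom N stride */
                                 1,        /* C stride: channels innermost */
                                 w * c,    /* H stride */
                                 c);       /* W stride */
}
```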