cuDNN6.0: NCHW vs. NHWC

Dear All,

I’ve tested the cudnnConvolutionForward() routines on NHWC-formatted 4-D data with a 3x3 kernel. The initial results do not show the inference speedup I expected. I currently suspect the NHWC format is the speed-limiting factor, and I’m wondering whether there would be any gains from changing the data layout to NCHW. I have heard that cuDNN prefers NCHW. With that in mind, I have a few questions below.

Questions

    1. Are there any significant performance penalties for NHWC that would make it worthwhile to convert to NCHW?
    2. Has anyone benchmarked this already?
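
To make the two layouts concrete, here is a minimal host-side sketch of an NHWC to NCHW conversion. The helper name `nhwc_to_nchw` is an assumption for illustration and is not part of cuDNN; it only shows how the same element is indexed in each layout. In cuDNN itself the layout is selected with `CUDNN_TENSOR_NCHW` or `CUDNN_TENSOR_NHWC` when calling `cudnnSetTensor4dDescriptor()`, and a conversion on the GPU would more likely go through `cudnnTransformTensor()` than through a CPU loop like this one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a cuDNN function): copies a dense NHWC tensor
// into a new dense NCHW tensor on the host.
std::vector<float> nhwc_to_nchw(const std::vector<float>& src,
                                std::size_t n, std::size_t c,
                                std::size_t h, std::size_t w) {
    // The input must hold exactly n*c*h*w elements.
    assert(src.size() == n * c * h * w);
    std::vector<float> dst(src.size());
    for (std::size_t in = 0; in < n; ++in)
        for (std::size_t ih = 0; ih < h; ++ih)
            for (std::size_t iw = 0; iw < w; ++iw)
                for (std::size_t ic = 0; ic < c; ++ic) {
                    // NHWC: channel is the fastest-varying index.
                    std::size_t s = ((in * h + ih) * w + iw) * c + ic;
                    // NCHW: width is the fastest-varying index.
                    std::size_t d = ((in * c + ic) * h + ih) * w + iw;
                    dst[d] = src[s];
                }
    return dst;
}
```

For example, a 1x1x3 image with 2 channels stored as NHWC `{0, 10, 1, 11, 2, 12}` comes back as NCHW `{0, 1, 2, 10, 11, 12}`, with each channel plane contiguous. The cost of this extra pass over the data would have to be weighed against any convolution speedup.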

Thanks in advance,
Hak