Bad OpenCL Performance - Profiling?

Hi everyone,

I’m working on some academic OpenCL code that I’m trying to tune for Kepler-based GPUs. I have noticed that some of our optimizations unexpectedly seem to degrade performance on Nvidia hardware. I know that I can profile CUDA code (we have a lot of highly optimized CUDA code), but I cannot seem to profile my OpenCL kernels with Nsight, nvvp, or nvprof.

Is it possible at all to profile OpenCL kernels on Nvidia hardware?

Best Regards,
David Pfander