Profiling and performance counters

What APIs or tools do you use to analyze performance on Nvidia’s OpenCL implementation? On AMD there are APIs such as GPUPerfAPI for reading hardware performance counters. Is anything similar available for Nvidia hardware?
Has anyone used the Visual Profiler with OpenCL?