How to generate performance metrics and kernel execution time in nvprof?

I am using this command to generate the metrics for 1 kernel.

nvprof --metrics all --log-file log.txt --csv --profile-api-trace none myapp.exe

I get about 120 lines of output for the performance counters. Here is one of them.

"GeForce GTX TITAN X (0)","vectorAdd(float const *, float const *, float*, int)",1,"sm_efficiency","Multiprocessor Activity",65.837030%,65.837030%,65.837030%

I can’t figure out how to also get the kernel duration in this one call to nvprof.

Can anyone help me?

–Bob

Hi, bz

I suggest using nvvp (the Visual Profiler) to get what you need.
It is a GUI tool based on nvprof.
For details, refer to Profiler :: CUDA Toolkit Documentation.

Hi, bz

Sorry that the previous answer did not help. Here is what I found after checking with the developers:

In the next CUDA Toolkit release, we are planning an nvprof enhancement to support combined metric and trace output. Note, however, that kernel execution is serialized during metric collection, so the kernel execution times reported in that mode will not be accurate.
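
Until combined output is available, one possible workaround is to profile twice and merge the two logs afterwards: once with the metrics command from the question, and once with a trace-only run such as "nvprof --print-gpu-trace --csv --log-file trace.txt myapp.exe". Because the second run collects no metrics, its kernel durations are not affected by the serialization mentioned above. The Python sketch below shows one way the merge might look. It is an illustration, not an official tool. It assumes the metrics log has the eight columns seen in the sample line from the question, that the trace log has a header row containing "Duration" and "Name" followed by a units row, and that some nvprof versions append a correlation ID such as " [110]" to the kernel name. The file name trace.txt and all function names are made up for this example, and the column layout may differ between toolkit versions.

```python
import csv
import io
import re


def parse_metrics(text):
    """Return {(kernel, metric_name): avg_value_string} from a metrics CSV log."""
    metrics = {}
    for row in csv.reader(io.StringIO(text)):
        # Skip "==PID==" banner lines (wrong field count) and the header row.
        # Assumed columns: Device, Kernel, Invocations, Metric Name,
        # Metric Description, Min, Max, Avg.
        if len(row) != 8 or row[0] == "Device":
            continue
        _device, kernel, _invocations, name, _desc, _min, _max, avg = row
        metrics[(kernel, name)] = avg
    return metrics


def parse_gpu_trace(text):
    """Return {kernel_name: [durations]} from a GPU-trace CSV log."""
    durations = {}
    header = None
    for row in csv.reader(io.StringIO(text)):
        if header is None:
            # Assumption: the header row names its columns "Duration" and "Name".
            if "Duration" in row and "Name" in row:
                header = row
            continue
        if len(row) != len(header):
            continue
        record = dict(zip(header, row))
        try:
            duration = float(record["Duration"])
        except ValueError:
            continue  # non-numeric Duration cell, e.g. the units row
        # Assumption: drop a trailing correlation ID like " [110]" if present.
        name = re.sub(r"\s*\[\d+\]$", "", record["Name"])
        durations.setdefault(name, []).append(duration)
    return durations


def combine(metrics, durations):
    """Group metrics per kernel and attach the durations from the trace run."""
    combined = {}
    for (kernel, metric), value in metrics.items():
        entry = combined.setdefault(
            kernel, {"durations": durations.get(kernel, []), "metrics": {}}
        )
        entry["metrics"][metric] = value
    return combined
```

To use it, read both log files into strings and call combine(parse_metrics(metrics_text), parse_gpu_trace(trace_text)). Durations are kept in whatever unit nvprof printed in the units row, so check that row before comparing numbers across runs.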