Nsight vs. CUPTI

I tested the ‘callback_metric’ example from the CUPTI samples directory (i.e. NVIDIA GPU Computing Toolkit\CUDA\v5.5\extras\CUPTI\sample). The metric I requested was ‘achieved_occupancy’, and the result was larger than 1, which is clearly wrong because occupancy is a ratio that cannot exceed 1.

My objective is to export the profiling results from Nsight, but the .csv file doesn’t contain all of the detailed numbers. So I plan to use CUPTI and write the metrics I am interested in to files from within the program. I think CUPTI is the right API for this. Does anyone know why the sample code gives wrong results?

rla,

CUPTI is the correct interface to collect these metrics. Can you provide additional information on your target GPU and driver version so I can forward the question to the CUPTI development team?
The metric achieved_occupancy is calculated as

SUM_SM(active_warps) / SUM_SM(active_cycles) / MAX_WARPS_PER_SM

on compute capability 2.0 and above devices. If this is the only metric specified then these counters can be collected from all SMs in a single pass. If you specify multiple metrics or additional counters then it is possible that the number will be out of range.

On some architectures you can get an out of range value if the duration of the kernel is very short (< 10 µs).

NOTE: Nsight Visual Studio Edition does not use the CUPTI SDK; however, Visual Profiler and Nsight Eclipse Edition are based upon the CUPTI SDK. If you see additional differences please let us know. We aim to have consistent values across the tools.

Hi Greg,
I emailed the CUPTI team and got their reply. I believe the problem was caused by my old driver version. After updating the driver and upgrading to CUDA 5.5, I get correct results. See the outputs below (collected on a GTX TITAN).

However, I still cannot collect the metric value of ‘branch_efficiency’ on this card; the error is ‘CUPTI_ERROR_INVALID_METRIC_NAME’. I then tried the sample ‘cupti_query’, and its output shows:
Invalid/incomplete option 0
Invalid/incomplete option branch_efficiency
…other events’ values…

Do you know what might cause this?


Outputs for achieved_occupancy:

  1. Linux-x86_64 Ubuntu 12.04, CUDA 5.0 with driver version 319.17
    Usage: ./callback_metric [device_num] [metric_name]
    CUDA Device Number: 0
    CUDA Device Name: GeForce GTX TITAN
    ./callback_metric, 0, achieved_occupancy
    Launching kernel: blocks 196, thread/block 256
    Duration = 7008ns
    Pass 0
    Launching kernel: blocks 196, thread/block 256
    active_cycles = 39270 (2768, 2788, 2878, 2732, 2798, 2749, 2818, 2832, 2772, 2738, 2907, 2805, 2813, 2872)
    active_cycles (normalized) (39270 * 14) / 14 = 39270
    active_warps = 4067052 (286976, 289612, 299336, 280952, 290316, 281500, 290404, 293680, 287256, 282108, 303780, 291472, 290212, 299448)
    active_warps (normalized) (4067052 * 14) / 14 = 4067052
    Metric achieved_occupancy = 1.618225

  2. Linux-x86_64 Ubuntu 12.04, CUDA 5.5 with driver version 319.37
    Usage: ./callback_metric [device_num] [metric_name]
    CUDA Device Number: 0
    CUDA Device Name: GeForce GTX TITAN
    Launching kernel: blocks 196, thread/block 256
    Duration = 6656ns
    Pass 0
    Launching kernel: blocks 196, thread/block 256
    active_warps = 1856780 (131720, 132944, 132570, 134670, 135798, 130490, 132282, 132996, 131560, 132608, 133284, 131406, 131770, 132682)
    active_warps (normalized) (1856780 * 14) / 14 = 1856780
    active_cycles = 35949 (2568, 2566, 2560, 2594, 2627, 2523, 2573, 2582, 2556, 2559, 2568, 2545, 2546, 2582)
    active_cycles (normalized) (35949 * 14) / 14 = 35949
    Metric achieved_occupancy = 0.807037
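
As a sanity check, both reported values can be reproduced by hand from the formula given earlier in the thread, SUM_SM(active_warps) / SUM_SM(active_cycles) / MAX_WARPS_PER_SM. The sketch below is only an illustration in Python, not part of the CUPTI sample. It assumes MAX_WARPS_PER_SM = 64, which is the limit for compute capability 3.5 devices such as the GTX TITAN. The function name is made up for this example.

```python
# Assumption: GTX TITAN is compute capability 3.5, which allows
# at most 64 resident warps per SM.
MAX_WARPS_PER_SM = 64


def achieved_occupancy(active_warps, active_cycles,
                       max_warps_per_sm=MAX_WARPS_PER_SM):
    """Apply SUM_SM(active_warps) / SUM_SM(active_cycles) / MAX_WARPS_PER_SM.

    Both counter arguments are totals already summed over all SMs.
    """
    return active_warps / active_cycles / max_warps_per_sm


# Counter totals copied from the two runs above.
old_run = achieved_occupancy(active_warps=4067052, active_cycles=39270)
new_run = achieved_occupancy(active_warps=1856780, active_cycles=35949)

print(f"CUDA 5.0 / driver 319.17: {old_run:.6f}")
print(f"CUDA 5.5 / driver 319.37: {new_run:.6f}")

# A valid occupancy lies in [0, 1]; flag anything outside that range.
for label, value in (("old", old_run), ("new", new_run)):
    status = "ok" if 0.0 <= value <= 1.0 else "out of range"
    print(f"{label}: {status}")
```

With these inputs the metric itself is computed consistently in both runs. The out-of-range result in the first run follows from the active_warps total, which is roughly twice as large there for the same launch configuration.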