I ran the profiler against my application and it came back with the following metrics:
Metric Name    Metric Description                   Value
cf_issued      Issued Control-Flow Instructions     8,388,480
cf_executed    Executed Control-Flow Instructions   8,388,480
inst_control   Control-Flow Instructions            134,215,680
inst_control is exactly 16x both cf_issued and cf_executed, and exactly 2x the number of threads I had running (67,107,840). There is obviously some relationship between these numbers, but the docs don’t really explain it (or at least I haven’t found the explanation yet).
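The two ratios can be double-checked with a few lines of Python. The thread count (67,107,840) is the one given in edit 2. The helper `exact_ratio` is just an illustrative name for this sketch, not part of any profiler API.

```python
# Metric values as reported by the profiler.
cf_issued = 8_388_480
cf_executed = 8_388_480
inst_control = 134_215_680

# Total thread count stated in the post (the launch configuration itself isn't shown).
threads = 67_107_840


def exact_ratio(numerator: int, denominator: int) -> int:
    """Return numerator // denominator, raising if it is not a whole number."""
    quotient, remainder = divmod(numerator, denominator)
    if remainder != 0:
        raise ValueError(f"{numerator} is not an exact multiple of {denominator}")
    return quotient


ratio_to_cf = exact_ratio(inst_control, cf_issued)
ratio_to_threads = exact_ratio(inst_control, threads)

print(f"cf_issued == cf_executed : {cf_issued == cf_executed}")
print(f"inst_control / cf_issued : {ratio_to_cf}")
print(f"inst_control / threads   : {ratio_to_threads}")
```

Both divisions come out exact (16 and 2), so these are whole-number multiples rather than approximate ratios.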
Can anyone tell me what, specifically, these metrics are counting and how they arrive at the values?
edit: To be clear, I know they are counting control-flow instructions; what I’m asking is how they arrive at the specific values displayed.
edit 2: I think it’s something like this: the GPU issued 4 control-flow instructions per warp, and 67,107,840 threads / 32 threads per warp = 2,097,120 warps, so 4 × 2,097,120 gives the 8,388,480 number. But then why would inst_control (I’m assuming it’s the total) be only 2x the number of threads? If every thread executed all 4 of those instructions, wouldn’t it be 4x?
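A small Python sketch of the per-warp arithmetic above. It assumes a warp size of 32 and that every warp is full (block size a multiple of 32); with partial warps the real warp count would be higher than threads / 32. Treating cf_issued as a per-warp count and inst_control as a per-thread count is an assumption taken from the hypothesis in edit 2, not something confirmed by the docs. Under that reading, the last value is the average number of threads per warp-level control-flow instruction.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs

threads = 67_107_840        # total threads, from the post
cf_issued = 8_388_480       # treated here as a per-warp count (assumption)
inst_control = 134_215_680  # treated here as a per-thread count (assumption)

# Assumes every warp is full, so warps = threads / 32 exactly.
warps, leftover = divmod(threads, WARP_SIZE)
if leftover:
    raise ValueError("thread count is not a whole number of warps")

cf_per_warp, rem = divmod(cf_issued, warps)
if rem:
    raise ValueError("cf_issued is not a whole number per warp")

# If all 32 lanes of every warp executed each of those instructions,
# a per-thread total would be cf_per_warp * threads (the "4x" expectation).
per_thread_if_all_lanes = cf_per_warp * threads

# Hypothetical reading: average active threads per warp-level cf instruction.
avg_lanes = inst_control / cf_issued

print(f"warps                  = {warps:,}")
print(f"cf per warp            = {cf_per_warp}")
print(f"4x threads             = {per_thread_if_all_lanes:,}")
print(f"observed / 4x threads  = {inst_control / per_thread_if_all_lanes}")
print(f"avg lanes per cf instr = {avg_lanes}")
```

The observed inst_control is exactly half the "4x" expectation, which is the same factor as 16 versus a full 32-thread warp.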