Profiler: Unable to collect metric and event values

I am experiencing some issues with the Visual Profiler (CUDA 4.1).

I am trying to profile a fairly complex CFD code with nvvp (essentially a much more complicated version of the particles example).
nvvp plots the timeline, but as soon as I try to add a counter I get the “Metric/Event Collection Failed” error.

I also ran the command-line profiler and got this result: as soon as I activate a counter, the code becomes slower, so the timelines cannot coincide, but the kernel order is exactly the same.

How can I fix that?

I am also having this issue. My code isn’t so much complex as it performs a large number of operations: the kernel takes about 50 seconds to execute. I can get the timeline to show in NVVP, but once I choose the next test/analysis to run, a problem occurs. Normally the program completes, but when I run this second test it keeps profiling for about 10 minutes and then abruptly ends with this error. Based on my console output, I know it gets as far as launching the kernel in question.

Might anyone have an idea what’s going on here?

You may be overloading the profiler. Try to reduce the scope of profiling; the profiler documentation gives suggestions for doing so.
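One way to narrow the scope, assuming the application can be modified, is the CUDA profiler API: `cudaProfilerStart()` and `cudaProfilerStop()` (declared in `cuda_profiler_api.h`) restrict collection to a region of the run, for example one or two time steps instead of the whole simulation. The sketch below is hypothetical: `stepKernel`, `lastProfiledStep`, the array size and the step numbers are placeholders for the real time loop. For these calls to narrow anything, the profiler must be configured not to start profiling at application launch; where that option lives depends on the profiler version.

```cuda
// Sketch: limit profiling to a short window of time steps using the
// CUDA profiler API. stepKernel, the sizes and the step numbers are
// hypothetical placeholders for the real CFD time loop.
#include <cstdio>
#include <cuda_runtime.h>
#include <cuda_profiler_api.h>

__global__ void stepKernel(float *u, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) u[i] = 0.5f * u[i] + 1.0f;  // stand-in for one solver step
}

// Index of the last time step inside the profiled window (hypothetical helper).
int lastProfiledStep(int firstStep, int numSteps)
{
    return firstStep + numSteps - 1;
}

int main()
{
    const int n = 1 << 20;
    const int totalSteps = 100;
    const int firstProfiled = 10;  // start collecting at step 10
    const int numProfiled = 2;     // ...and stop after step 11

    float *d_u = NULL;
    cudaMalloc((void **)&d_u, n * sizeof(float));
    cudaMemset(d_u, 0, n * sizeof(float));

    for (int step = 0; step < totalSteps; ++step) {
        if (step == firstProfiled)
            cudaProfilerStart();  // begin the profiled region
        stepKernel<<<(n + 255) / 256, 256>>>(d_u, n);
        if (step == lastProfiledStep(firstProfiled, numProfiled)) {
            cudaDeviceSynchronize();  // let the profiled launches finish
            cudaProfilerStop();       // end the profiled region
        }
    }

    cudaDeviceSynchronize();
    cudaFree(d_u);
    printf("ran %d steps, profiled %d\n", totalSteps, numProfiled);
    return 0;
}
```

With a window like this, counters are collected for two kernel launches rather than a hundred, which also reduces the amount of repeated work the profiler has to do.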

Is there any way to reduce the scope of the Guided Analysis options in the bottom-left corner of the profiler? The options in the first part of the guided analysis run fine. The problems start when I try to run the second part, specifically the option to choose a particular kernel to analyze.

I tried looking in the documentation and did find the pertinent information, but it didn’t mention anything in connection with the Guided Analysis.

Metric/event collection can fail if you overrun a counter.

A typical way to avoid this is to reduce the scope of your program. If your kernel takes 50 seconds to execute, see if you can get the kernel execution down to a second or so. This helps not only with this particular issue but also with profiling responsiveness in general, since many profiling operations require multiple runs of your kernel.
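To check how far a kernel is from the one-second range, it can be timed with CUDA events. The sketch below times a single launch of a hypothetical `busyKernel` with `cudaEventRecord` and `cudaEventElapsedTime`, then estimates what fraction of the work to keep. The helper `workFraction` is also hypothetical and assumes run time scales roughly linearly with the amount of work, which is only a rough guide. For the 50-second kernel mentioned above, a target of about 1 s corresponds to keeping roughly 1/50 (0.02) of the work.

```cuda
// Sketch: time one kernel launch with CUDA events and estimate how much the
// problem would need to shrink to reach a target duration (~1 s).
// busyKernel and its sizes are hypothetical; the estimate assumes run time
// scales roughly linearly with the amount of work.
#include <cstdio>
#include <cuda_runtime.h>

// Fraction of the current work to keep so the kernel takes about targetMs.
// Returns 1.0 if the kernel is already at or below the target.
float workFraction(float measuredMs, float targetMs)
{
    if (measuredMs <= targetMs) return 1.0f;
    return targetMs / measuredMs;
}

__global__ void busyKernel(float *data, int n, int iters)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float v = data[i];
    for (int k = 0; k < iters; ++k)
        v = v * 0.999f + 0.001f;  // stand-in for real work
    data[i] = v;
}

int main()
{
    const int n = 1 << 20;
    float *d_data = NULL;
    cudaMalloc((void **)&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    busyKernel<<<(n + 255) / 256, 256>>>(d_data, n, 1000);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);  // wait until the kernel and stop event complete

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);  // elapsed time in milliseconds
    printf("kernel time: %.1f ms, keep about %.3f of the work for a ~1 s kernel\n",
           ms, workFraction(ms, 1000.0f));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return 0;
}
```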

In the context of a CFD code, if you are running a simulation over a large discretized space, see if you can reduce the space, and with it the number of blocks launched and therefore the kernel duration.
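If the launch configuration is computed from the domain size, shrinking the domain automatically shrinks the number of blocks. This is a sketch under assumptions: `updateCells`, `blocksAlong`, `blockCount`, the 2D layout, the 16×16 block shape and the domain sizes are all hypothetical, and passing any command-line argument switches to the small profiling domain.

```cuda
// Sketch: size the launch from the (possibly reduced) domain, so shrinking
// the domain directly shrinks the number of blocks launched.
// updateCells, the domain sizes and the 16x16 block shape are hypothetical.
#include <cstdio>
#include <cuda_runtime.h>

// Number of blocks of size b needed to cover n cells (rounded up).
int blocksAlong(int n, int b) { return (n + b - 1) / b; }

// Total blocks launched for an nx x ny domain with bx x by thread blocks.
int blockCount(int nx, int ny, int bx, int by)
{
    return blocksAlong(nx, bx) * blocksAlong(ny, by);
}

__global__ void updateCells(float *field, int nx, int ny)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < nx && y < ny)
        field[y * nx + x] += 1.0f;  // stand-in for the real CFD update
}

int main(int argc, char **argv)
{
    // Any command-line argument selects the small domain used for profiling.
    bool profiling = (argc > 1);
    int nx = profiling ? 512 : 4096;
    int ny = profiling ? 512 : 4096;
    const int bx = 16, by = 16;

    size_t bytes = (size_t)nx * ny * sizeof(float);
    float *d_field = NULL;
    cudaMalloc((void **)&d_field, bytes);
    cudaMemset(d_field, 0, bytes);

    dim3 block(bx, by);
    dim3 grid(blocksAlong(nx, bx), blocksAlong(ny, by));
    updateCells<<<grid, block>>>(d_field, nx, ny);
    cudaDeviceSynchronize();

    printf("%d x %d domain -> %d blocks\n", nx, ny, blockCount(nx, ny, bx, by));
    cudaFree(d_field);
    return 0;
}
```

With these numbers, the full 4096 × 4096 domain launches 256 × 256 = 65,536 blocks, while the 512 × 512 profiling domain launches 32 × 32 = 1,024 blocks, 64 times fewer; for a kernel whose cost is roughly proportional to the cell count, the run is correspondingly shorter.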

I’m not sure I’ll be able to give good general advice without considering a specific code or behavior.