Frame rate inconsistent with frame time in Scrubber timeline

Dear all,

I am writing an OpenGL 4.4/GLSL library for deep learning prediction with convolutional neural network models.

In order to profile the GPU load of my compute shaders (computing feature maps, pooling operations, etc.), I wrote a simple Windows application with an OpenGL window, a glClearColor() call, and a CNN prediction using compute shaders at each redraw. Nsight 5.5 crashes when I start “Performance analysis”, but that is not my issue here.

The frame rate reported by Nsight is about 22.5 FPS (43 ms/frame). But when I do “Pause and capture Frame”, the total scale of the Scrubber timeline is only 3 ms (whether I choose the GPU or the CPU Duration Scale).
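For reference, a minimal sketch of the arithmetic behind these two numbers. The function names `frame_time_ms` and `covered_fraction` are hypothetical helpers introduced only for illustration; they are not part of Nsight or OpenGL.

```cpp
#include <cassert>

// Convert a frame rate in frames per second to a per-frame time in milliseconds.
double frame_time_ms(double fps) {
    assert(fps > 0.0);  // a zero or negative rate has no meaningful frame time
    return 1000.0 / fps;
}

// Fraction of the whole frame that a timeline of `timeline_ms` accounts for.
double covered_fraction(double timeline_ms, double fps) {
    assert(timeline_ms >= 0.0);
    return timeline_ms / frame_time_ms(fps);
}
```

With the values above, `frame_time_ms(22.5)` is roughly 44 ms, and `covered_fraction(3.0, 22.5)` is under 0.07, so a 3 ms timeline would account for less than a tenth of the frame.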

I think a lot of CPU time is omitted (for example, the time spent waiting in glMapBuffer() after a memory barrier). Note that I used glDispatchComputeIndirect(), so the time reported for this call is not the time the GPU needs to process and finish the computation. This is why I used memory barriers before reading the resulting textures.

Does anyone have the same problem? Why does the Scrubber timeline not show the actual elapsed time?

Thanks, best,

Benjamin

My config: VS2015, Nsight 5.5, GeForce GTX 1060, driver 388.71 (OpenGL 4.6.0)

Hi benj.aubert,

Could you please post some screenshots of this issue? Thanks.

Hi,

Here is the screenshot for the frame rate: https://imgur.com/a/w3YcF
Here is the screenshot for the Scrubber timeline (total scale is only 1-2 ms): https://imgur.com/a/SkQW2

Benjamin

Hi,

Thanks for the reply.
Could you please check the CPU/GPU times in “Events View”?

Hi,

Thanks, here is the screenshot for the Events and API stats windows: https://imgur.com/a/B0fKj

The GPU/CPU work between the start of capture (wglMakeCurrent) and the end of capture (SwapBuffers) is 35-40 ms (benchmarked using std::chrono::steady_clock).
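A minimal sketch of this kind of std::chrono::steady_clock measurement follows. The helper name `measure_ms` is hypothetical, and the callable passed to it stands in for the per-frame work; the actual benchmarking code is not shown in this thread.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Run `work` once and return the elapsed wall-clock time in milliseconds.
template <typename F>
double measure_ms(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(work)();
    const auto stop = std::chrono::steady_clock::now();
    assert(stop >= start);  // steady_clock is monotonic, so time never goes backwards
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

A call such as `measure_ms([&] { render_one_frame(); })` (where `render_one_frame` is a placeholder for the application's own frame function) returns the CPU-side wall time of that region. For the result to include GPU work, the timed region has to end with something that waits for the GPU, such as glFinish() or a blocking readback; otherwise only the time to submit the commands is measured.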

Benjamin

Hi,

Thanks for the feedback.

From your screenshots of Events and Scrubber, the CPU time range in the Scrubber is 1~2 ms, which looks correct, since the CPU times in the Events view sum up to 1~2 ms.

However, the frame rate corresponds to ~40 ms per frame, while the total GPU time in the Events view is only ~20 ms. This seems odd; we will investigate this case further.

Hi,

OK, thanks.

Some points that may help you investigate:

  • My example uses glDispatchComputeIndirect() with compute shaders, so the duration of the OpenGL API call is not the actual GPU processing time.

  • I used a texture memory barrier, so when I access the results (reading the texture memory back from GPU to CPU), glMapBuffer() in fact blocks until the GPU processing is finished. This time seems to be incorrect (largely underestimated) in the events list.
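The two points above can be illustrated with a toy timeline model. This is only a sketch under stated assumptions: `FrameModel`, `submit` and `blocking_readback` are hypothetical names, the model contains no OpenGL calls, and all costs are made-up numbers rather than measurements from this application.

```cpp
#include <algorithm>
#include <cassert>

// Toy model of one frame: the CPU submits work, the GPU executes it later.
struct FrameModel {
    double cpu_now_ms = 0.0;   // CPU time elapsed since the start of the frame
    double gpu_done_ms = 0.0;  // time at which the GPU finishes all queued work

    // Submit asynchronous GPU work (stands in for a compute dispatch).
    // Returns the CPU duration of the call itself.
    double submit(double cpu_call_ms, double gpu_work_ms) {
        assert(cpu_call_ms >= 0.0 && gpu_work_ms >= 0.0);
        cpu_now_ms += cpu_call_ms;
        // The GPU starts this work once it is idle and the command was submitted.
        gpu_done_ms = std::max(gpu_done_ms, cpu_now_ms) + gpu_work_ms;
        return cpu_call_ms;
    }

    // Blocking readback (stands in for mapping a buffer the GPU still writes to).
    // Returns the time the CPU spends waiting for the GPU.
    double blocking_readback() {
        const double wait_ms = std::max(0.0, gpu_done_ms - cpu_now_ms);
        cpu_now_ms += wait_ms;
        return wait_ms;
    }
};
```

With ten dispatches that each cost 0.1 ms of CPU time and 4 ms of GPU time, the per-call CPU durations sum to about 1 ms, while the frame ends at about 40 ms because the readback waits roughly 39 ms for the GPU. A tool that reports only per-call CPU durations would show the first number, not the second.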

Thanks,

Best,

Benjamin

Hi benj.aubert,

It looks like there is a misunderstanding about how the frame time (~40 ms) relates to the other timings reported by Nsight.

Nsight’s timings focus specifically on graphics/compute API calls, on both the CPU and the GPU side. For example, the API Statistics view shows only the sum over the API calls, not the duration of the frame. In addition, Nsight traces only API calls such as OpenGL draw and dispatch calls. It does not trace other function calls (AI or physics simulation, for example), which might cost even more time, nor waiting time that contributes to the frame time. You will not see this time in Nsight because it is not tracked.

Thanks
An