Kernel call overhead: Is this overhead, or am I blocking with the CPU?
Hi!

I'm running an algorithm consisting of several kernels that are called in a loop. Combining them into one big kernel might be possible, but ugly, so before I invest any time there I'd like to check ;)

The Visual Profiler gives me the following numbers:
Kernel1 - 93 calls - 5680 usec (63%)
Kernel2 - 93 calls - 1051 usec (11%)
Kernel3 - 47 calls - 885 usec (10%)
plus 4 others accounting for the remaining few percent.
In total that comes to around 10 ms for all of them, which should yield 100 executions per second. In practice I only achieve 10 executions per second. The problem is that there is a delay between the kernel calls: the time width plot shows huge white gaps between the individual kernels. Nothing is happening on the CPU between kernel calls. Is there a way to reduce this idle time/overhead?
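A quick check of the figures above, taking the stated ~10 ms as the summed kernel time per execution (Kernel1's 5680 usec at 63% puts the total near 9 ms, so 10 ms is a rounded upper figure):

```latex
\begin{aligned}
t_{\text{kernels}} &\approx \frac{5680\ \mu\text{s}}{0.63} \approx 9\ \text{ms} \lesssim 10\ \text{ms}\\
r_{\text{expected}} &= \frac{1}{t_{\text{kernels}}} \approx \frac{1}{10\ \text{ms}} = 100\ \text{s}^{-1}\\
t_{\text{iteration}} &= \frac{1}{r_{\text{observed}}} = \frac{1}{10\ \text{s}^{-1}} = 100\ \text{ms}\\
t_{\text{idle}} &\approx t_{\text{iteration}} - t_{\text{kernels}} \approx 100\ \text{ms} - 10\ \text{ms} = 90\ \text{ms}
\end{aligned}
```

So roughly nine tenths of every iteration is spent outside the kernels, which is consistent with the large white gaps in the time width plot.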

Kind regards

Edit: I'm attaching a screenshot of the time width plot to illustrate this. I'm referring to the huge white blocks between kernel calls, during which not even the CPU is doing anything (CPU activity is shown as the gray blocks at the bottom).
Attachment: time_width_plot.png
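One way to narrow down the question in the title is to time the launch loop from the host and compare that with the GPU timeline. This is a minimal sketch that does not come from the thread: it assumes a CUDA toolkit and a CUDA-capable GPU, and `kernelA`, `kernelB`, `Timing` and `time_loop` are hypothetical names standing in for the real kernels. Kernel launches return to the host before the kernel finishes, so `issue_ms` measures only how long the host needs to queue the work, while `gpu_ms` (from CUDA events) spans the device timeline including any idle gaps.

```cuda
// Sketch only: assumes a CUDA toolkit and a CUDA-capable GPU.
// kernelA and kernelB are hypothetical stand-ins for the thread's Kernel1/Kernel2.
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

__global__ void kernelA(float* d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] = d[i] * 0.5f + 1.0f;
}

__global__ void kernelB(float* d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] += 2.0f;
}

struct Timing {
    double issue_ms;  // host time spent in the launch loop (launches are asynchronous)
    double wall_ms;   // host time until all queued work has finished
    float gpu_ms;     // device time between the two events, idle gaps included
};

Timing time_loop(float* d, int n, int iters) {
    dim3 block(256);
    dim3 grid((n + 255) / 256);
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    auto t0 = std::chrono::steady_clock::now();
    cudaEventRecord(start);
    for (int it = 0; it < iters; ++it) {
        kernelA<<<grid, block>>>(d, n);
        kernelB<<<grid, block>>>(d, n);
        // A blocking call placed here (e.g. a synchronous cudaMemcpy)
        // would make the host wait for the GPU on every iteration.
    }
    cudaEventRecord(stop);
    auto t1 = std::chrono::steady_clock::now();  // launch loop has returned
    cudaEventSynchronize(stop);                   // wait until the GPU has finished the queued work
    auto t2 = std::chrono::steady_clock::now();

    Timing t{};
    t.issue_ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
    t.wall_ms = std::chrono::duration<double, std::milli>(t2 - t0).count();
    cudaEventElapsedTime(&t.gpu_ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return t;
}

int main() {
    const int n = 1 << 20;
    float* d = nullptr;
    if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(d, 0, n * sizeof(float));

    time_loop(d, n, 1);  // warm-up: keeps one-time setup cost out of the measurement
    Timing t = time_loop(d, n, 100);

    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        cudaFree(d);
        return 1;
    }
    std::printf("issue %.3f ms, wall %.3f ms, gpu %.3f ms\n", t.issue_ms, t.wall_ms, t.gpu_ms);
    cudaFree(d);
    return 0;
}
```

A rough reading, under these assumptions: if `issue_ms` is close to `wall_ms`, the host loop itself is waiting on the GPU (for example because of a synchronous copy between launches). If `issue_ms` is small and `gpu_ms` is close to the profiler's summed kernel time, the kernels run back to back outside the profiler, and gaps that appear only in the profiler's plot may point to measurement overhead rather than to the application.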

#1
Posted 07/24/2009 11:03 AM   
[quote name='luze' date='24 July 2009 - 07:03 AM' timestamp='1248433389' post='569398']
... Is there a way to reduce this idle time/overhead?
[/quote]

Hi,

Did you by chance find a solution to your problem? I have the same situation now.

Thanks.

#2
Posted 12/07/2011 06:48 PM   