Execute kernels without 100% CPU busy-wait?
Is it possible to execute kernels through OpenCL without incurring a busy wait on one CPU-core? If it is not possible through OpenCL, is it possible through CUDA directly?

I would like to keep feeding the GPU more work as the previous work is done, but preferably without running the CPU at 100%.

#1
Posted 05/23/2011 06:04 AM   
Of course that's possible. In fact, that's the default case, and you would have to do some work to make a (GPU) kernel execution block on the (CPU) host. If you look at the documentation of e.g. clEnqueueNDRangeKernel(), you'll see that it immediately returns after "the kernel execution was [successfully] queued" (not "executed"). So executing a kernel is a non-blocking operation by default, as long as you don't wait for the event associated with the kernel execution instance.

#2
Posted 05/23/2011 06:15 AM   
Sure, clEnqueueNDRangeKernel() returns. But there is a background thread in nvcuda.dll eating 100% CPU time.

#3
Posted 05/23/2011 09:27 AM   
Well, I don't see this behavior on Vista x64, driver 270.61, GTX 285.

#4
Posted 05/23/2011 09:50 AM   
I get this behavior with GeForce GTX 580 with 270.61 drivers, on 64-bit Windows 7. Not only with my own program, but OpenCL programs written by others as well.

It's very interesting to hear that it is not the same for everyone. There must be some kind of software or hardware issue causing it.

#5
Posted 05/23/2011 10:34 AM   
[quote name='Dr.Haribo' date='23 May 2011 - 01:34 PM' timestamp='1306146851' post='1241243']
I get this behavior with GeForce GTX 580 with 270.61 drivers, on 64-bit Windows 7. Not only with my own program, but OpenCL programs written by others as well.

It's very interesting to hear that it is not the same for everyone. There must be some kind of software or hardware issue causing it.
[/quote]

A busy-wait loop is actually the default behavior under NVIDIA. Under CUDA you have an option to change the behavior to blocking synchronization or to wait on an interrupt. The purpose of busy waiting is to get minimal latency in the response. I don't think that you can change the behavior with OpenCL, though.

#6
Posted 05/24/2011 01:58 PM   
It seems to be possible in CUDA but not in OpenCL to avoid the "busy wait":

http://www.khronos.org/message_boards/viewtopic.php?f=28&t=2798

Still I'm wondering why I don't see this, or at least not to that extent. None of my CPU cores reaches 100%, at most 60%, but I'm also constantly doing CPU work, so that's no surprise.

#7
Posted 05/24/2011 02:42 PM   
Thanks for the info, guys.

[quote name='eyebex' date='24 May 2011 - 04:42 PM' timestamp='1306248148' post='1241728']
It seems to be possible in CUDA but not in OpenCL to avoid the "busy wait":

http://www.khronos.org/message_boards/viewtopic.php?f=28&t=2798

[/quote]

Did you paste the wrong link? It's about sharing data between OpenCL and OpenGL.

If NVIDIA is listening, this would be high on my wish list: Having a way to choose between busy-wait and hardware interrupt. Perhaps an OpenCL extension could be used to expose this feature from CUDA into OpenCL?

#8
Posted 05/27/2011 02:56 PM   
[quote name='Dr.Haribo' date='27 May 2011 - 04:56 PM' timestamp='1306508206' post='1243102']
Did you paste the wrong link? It's about sharing data between OpenCL and OpenGL.
[/quote]
Indeed, sorry. Here's the correct one: http://www.khronos.org/message_boards/viewtopic.php?f=28&t=2794

#9
Posted 05/27/2011 03:04 PM   
any news on this one?

is nvidia going to implement a non-busy wait?

--
Srdja

#10
Posted 06/16/2011 03:54 PM   
any news on this one?

is nvidia going to implement a non-busy wait?

--
Srdja

#11
Posted 06/16/2011 03:54 PM   
[quote name='Dr.Haribo' date='27 May 2011 - 06:56 PM' timestamp='1306508206' post='1243102']

If NVIDIA is listening, this would be high on my wish list: Having a way to choose between busy-wait and hardware interrupt. Perhaps an OpenCL extension could be used to expose this feature from CUDA into OpenCL?
[/quote]
+1
Looks like this behavior is driver-version dependent.
With 267.24 we see very low CPU consumption in our OpenCL app, but with 275.33 the CPU time is almost the same as the elapsed time, so the CPU is in use constantly...
That's very undesirable because we use the CPU for computations too.

#12
Posted 06/26/2011 07:50 PM   
What kind of tool do you use to view the CPU thread's information? I tried gdb, but it won't let me interrupt while the kernel is running.

#13
Posted 08/19/2011 03:10 PM   