clEnqueueWriteBuffer under the hood
I was wondering if anyone could provide more information, beyond what is in the programming guides etc., on how clEnqueueWriteBuffer works under the hood.

Specifically, if I queue up, say, 100 transfers of small size (100KB), do they all get grouped into a single transfer to the device? I am seeing a lower bound of 2ms on the total transfer time for 100 enqueues of very small buffers. I have read up on the PCIe interface extensively to see if the issue is related to payload size (including the 8b/10b encoding), but there seems to be no relation.

I have transferred the same amount of data in a single clEnqueueWriteBuffer call at much higher speed (i.e. lower latency). I am wondering, therefore, what happens in OpenCL when you queue up so many transfers?

I am of course using a clFinish() before stopping my clock...
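
To put a number on the question, here is a rough back-of-envelope sketch in C. It assumes a very simple cost model that is not taken from any OpenCL documentation: every enqueue pays a fixed overhead, and the payload then moves at a constant bandwidth. The function names and the model itself are hypothetical, purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cost model (an assumption, not documented OpenCL behaviour):
 * total time = n_calls * fixed per-call overhead + total bytes / bandwidth. */
double model_transfer_seconds(size_t n_calls, size_t bytes_per_call,
                              double overhead_s, double bandwidth_bytes_per_s)
{
    assert(bandwidth_bytes_per_s > 0.0);
    double wire = (double)n_calls * (double)bytes_per_call / bandwidth_bytes_per_s;
    return (double)n_calls * overhead_s + wire;
}

/* Per-call overhead implied by a measured total time, under the same model. */
double implied_overhead_seconds(double measured_s, size_t n_calls,
                                size_t bytes_per_call, double bandwidth_bytes_per_s)
{
    assert(n_calls > 0 && bandwidth_bytes_per_s > 0.0);
    double wire = (double)n_calls * (double)bytes_per_call / bandwidth_bytes_per_s;
    return (measured_s - wire) / (double)n_calls;
}
```

Under this model, if the buffers are small enough that the wire time is negligible, a 2ms floor across 100 enqueues works out to roughly 20 microseconds of fixed cost per call, which would be paid only once if the same data went over in a single call.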

Cheers,

Jam

#1
Posted 04/01/2012 05:23 PM   
If you are able to use OpenCL 1.1, and depending on why you want to do small transfers, clEnqueueWriteBufferRect might be a solution. I haven't tried it, but I would hope that it aggregates the pieces into one speedy transfer.
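
If the Rect call does not fit the data layout, another way to get one big transfer is to pack the small pieces into a single host staging buffer and write that once. The sketch below is untested against a real device and only shows the host-side packing; pack_chunks is a hypothetical helper, and queue and buf in the comment are assumed to be an existing command queue and device buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy n_chunks host chunks back to back into staging.
 * Returns the total number of bytes packed, or 0 if staging is too small. */
size_t pack_chunks(unsigned char *staging, size_t staging_size,
                   const void *const *chunks, const size_t *chunk_sizes,
                   size_t n_chunks)
{
    assert(staging != NULL && chunks != NULL && chunk_sizes != NULL);
    size_t offset = 0;
    for (size_t i = 0; i < n_chunks; ++i) {
        /* offset never exceeds staging_size, so this cannot underflow. */
        if (chunk_sizes[i] > staging_size - offset)
            return 0;
        memcpy(staging + offset, chunks[i], chunk_sizes[i]);
        offset += chunk_sizes[i];
    }
    return offset;
}

/* After packing, a single write replaces the many small ones, e.g.:
 *   clEnqueueWriteBuffer(queue, buf, CL_TRUE, 0, total, staging, 0, NULL, NULL);
 * where total is the value returned by pack_chunks. */
```

The extra host-side memcpy is not free, so whether this wins depends on how large the per-enqueue overhead really is on your setup.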

#2
Posted 04/03/2012 01:37 PM   