Image2d vector length: how to read more than 4 pixels?
Hello,

I am using OpenCL to run basic picture-analysis functions on the GPU.

Since I am working with HD pictures (1920*1080) I have a lot of pixels to deal with, and I need wide vectorization. I read that the image2d object provides good performance when working with 2D images, which is my case, so I decided to use it instead of buffers. The problem is that the read_image functions inside a kernel never return a vector larger than 4 components, whereas the hardware should, in my opinion, be able to handle much more.

So, am I missing something and there is a way to read more than 4 pixels at once, or should I use buffers and vloadn to read my pixels?

With read_imagei:
[code]localmem[l_j*l_size_i+l_i]=convert_uint4(abs( read_imagei(pix, samplerA, (int2)(g_i, 2*g_j)) - read_imagei(pix, samplerA, (int2)(g_i, 2*g_j+1)) ));[/code]

With vload16:
[code]localmem[l_j*l_size_i+l_i]=convert_uint16(abs( vload16(2*g_j*stride + g_i, pix) - vload16((2*g_j+1)*stride + g_i, pix)));[/code]


I am going to compare the performance of the two solutions, but I find this situation a bit weird ^^.

EDIT: After running some basic execution-time measurements, it appears that using image2d is faster on the GPU, but the opposite is true on the CPU.

Thank you.

#1
Posted 04/18/2012 10:57 AM   
I think you're confusing (pixel) components with pixels themselves. A kernel's read_image() function always returns the value of a single pixel, where a single pixel may consist of up to four components (red, green, blue, alpha). If you're dealing with gray scale images that only have a single component per pixel, the returned vector will contain that gray value multiple times (the exact layout depends on whether you've created your images as CL_INTENSITY or CL_LUMINANCE, see e.g. [url="http://www.khronos.org/registry/cl/sdk/1.0/docs/man/xhtml/read_imagef3d.html"]the documentation for read_imagef[/url] for details).

#2
Posted 04/18/2012 01:23 PM   
I am dealing with grayscale images indeed. I use the CL_RGBA channel order with the CL_UNSIGNED_INT8 data type to store my data. This lets me read 4 grayscale pixels per call.

I pay attention to the difference between the real number of pixels and the number of RGBA texels when I create and write the data, but in the end it is as if I were working with vectors of 4 grayscale pixels.

My device supports CL_INTENSITY and CL_LUMINANCE, but only with FLOAT data types...

Am I totally wrong? x)

#3
Posted 04/18/2012 02:23 PM   