Hello,
I am using OpenCL to run a basic picture-analysis function on the GPU.
Since I am working with HD pictures (1920×1080) I have a lot of pixels to process, and I need wide vectorization. I read that image2d objects give good performance when working with 2D images, which is my case, so I decided to use them instead of buffers. The problem is that the read_image functions inside a kernel never return a vector wider than 4 components, whereas the hardware should, in my opinion, be able to handle much more.
So, am I missing something and there is a way to read more than 4 pixels at once, or should I use buffers and vloadn to read my pixels?
With read_imagei:
localmem[l_j * l_size_i + l_i] = convert_uint4(abs( read_imagei(pix, samplerA, (int2)(g_i, 2 * g_j)) - read_imagei(pix, samplerA, (int2)(g_i, 2 * g_j + 1)) ));
With vload16 (note that the offset argument of vloadn is counted in units of n elements, so vload16 starts reading at pix + 16 * offset):
localmem[l_j * l_size_i + l_i] = convert_uint16(abs( vload16(2 * g_j * stride + g_i, pix) - vload16((2 * g_j + 1) * stride + g_i, pix) ));
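For what it's worth, read_imagei is defined to return a single texel, and OpenCL image formats have at most four channels (e.g. CL_RGBA), so int4 is the widest result it can ever produce; it is a per-pixel gather, not a vector load. If the data is single-channel, one common workaround is to pack four horizontally adjacent pixels into one CL_RGBA / CL_UNSIGNED_INT8 texel, so that each read fetches four pixels at once (the image width in texels is then a quarter of the pixel width). A minimal sketch under that packing assumption, with hypothetical kernel and argument names:

```
// Assumes a CL_RGBA / CL_UNSIGNED_INT8 image whose width is pixel_width / 4:
// each texel holds 4 packed grayscale pixels.
__constant sampler_t smp = CLK_NORMALIZED_COORDS_FALSE |
                           CLK_ADDRESS_CLAMP_TO_EDGE |
                           CLK_FILTER_NEAREST;

__kernel void row_diff(__read_only image2d_t pix,  // width = pixel_width / 4
                       __global uint4 *out,
                       int out_stride)             // row stride, in uint4 elements
{
    int g_i = get_global_id(0);                    // texel (4-pixel) column
    int g_j = get_global_id(1);

    // One read yields 4 packed pixels; two reads give the two rows to compare.
    int4 a = read_imagei(pix, smp, (int2)(g_i, 2 * g_j));
    int4 b = read_imagei(pix, smp, (int2)(g_i, 2 * g_j + 1));

    out[g_j * out_stride + g_i] = convert_uint4(abs(a - b));
}
```

Whether this beats vload16 on a buffer still depends on the device's texture cache, which is consistent with the GPU/CPU difference measured below.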
I am going to measure performance with the two solutions, but I find this situation a bit weird ^^.
EDIT: After some basic execution-time measurements, it appears that image2d is faster on the GPU, but the opposite holds on the CPU.
Thank you.