Simple OpenCL program compiles and runs, gives incorrect output
I wrote a simple OpenCL program based off the SDK and it compiles and runs, however the output is wrong. Is there something I'm doing wrong?

Any suggestions for learning to debug C and OpenCL is much appreciated. I'm quite new to the platform.

Code is attached.

Thanks.
I wrote a simple OpenCL program based off the SDK and it compiles and runs, however the output is wrong. Is there something I'm doing wrong?



Any suggestions for learning to debug C and OpenCL is much appreciated. I'm quite new to the platform.



Code is attached.



Thanks.
Attachments

TEST_OPENCL.zip

#1
Posted 11/18/2009 01:41 AM   
Hi,
the best things to do it's to look all GPU memory with ReadBuffer to check the data
If you have very complicated kernel you can make on CPU to see if there is some problem
[code]// create buffers on device
cl_mem vol_a = clCreateBuffer(gpu_context, CL_MEM_WRITE_ONLY, mem_size, &a, &err);
shrCheckError(err, CL_SUCCESS);

cl_mem vol_b = clCreateBuffer(gpu_context, CL_MEM_WRITE_ONLY, mem_size, &b, &err);
shrCheckError(err, CL_SUCCESS);

cl_mem vol_c = clCreateBuffer(gpu_context, CL_MEM_READ_ONLY, mem_size, &c, &err);
shrCheckError(err, CL_SUCCESS);[/code]
problem with the declaration
You make c = a + b
So c = CL_MEM_WRITE_ONLY and a et b are CL_MEM_READ_ONLY and not the inverse ;)

Moreover, after you make a WriteBuffer it's more simple to make :
[code] cl_mem vol_a = clCreateBuffer(gpu_context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, mem_size, &a, &err);
shrCheckError(err, CL_SUCCESS);

cl_mem vol_b = clCreateBuffer(gpu_context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, mem_size, &b, &err);
shrCheckError(err, CL_SUCCESS);

cl_mem vol_c = clCreateBuffer(gpu_context, CL_MEM_WRITE_ONLY | CL_MEM_COPY_HOST_PTR, mem_size, &c, &err);
shrCheckError(err, CL_SUCCESS);[/code]
Last thing, why you put the adress of a, b and c (&a, &b and &c) instead of a, b and c ?
this code runs:
[code]cl_mem vol_a = clCreateBuffer(gpu_context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, mem_size, a, &err);
shrCheckError(err, CL_SUCCESS);

cl_mem vol_b = clCreateBuffer(gpu_context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, mem_size, b, &err);
shrCheckError(err, CL_SUCCESS);

cl_mem vol_c = clCreateBuffer(gpu_context, CL_MEM_WRITE_ONLY | CL_MEM_COPY_HOST_PTR, mem_size, c, &err);
shrCheckError(err, CL_SUCCESS);[/code]
same thing for the readbuffer

Thanks
J
Hi,

the best things to do it's to look all GPU memory with ReadBuffer to check the data

If you have very complicated kernel you can make on CPU to see if there is some problem

// create buffers on device

cl_mem vol_a = clCreateBuffer(gpu_context, CL_MEM_WRITE_ONLY, mem_size, &a, &err);

shrCheckError(err, CL_SUCCESS);



cl_mem vol_b = clCreateBuffer(gpu_context, CL_MEM_WRITE_ONLY, mem_size, &b, &err);

shrCheckError(err, CL_SUCCESS);



cl_mem vol_c = clCreateBuffer(gpu_context, CL_MEM_READ_ONLY, mem_size, &c, &err);

shrCheckError(err, CL_SUCCESS);


problem with the declaration

You make c = a + b

So c = CL_MEM_WRITE_ONLY and a et b are CL_MEM_READ_ONLY and not the inverse ;)



Moreover, after you make a WriteBuffer it's more simple to make :

cl_mem vol_a = clCreateBuffer(gpu_context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, mem_size, &a, &err);

shrCheckError(err, CL_SUCCESS);



cl_mem vol_b = clCreateBuffer(gpu_context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, mem_size, &b, &err);

shrCheckError(err, CL_SUCCESS);



cl_mem vol_c = clCreateBuffer(gpu_context, CL_MEM_WRITE_ONLY | CL_MEM_COPY_HOST_PTR, mem_size, &c, &err);

shrCheckError(err, CL_SUCCESS);


Last thing, why you put the adress of a, b and c (&a, &b and &c) instead of a, b and c ?

this code runs:

cl_mem vol_a = clCreateBuffer(gpu_context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, mem_size, a, &err);

shrCheckError(err, CL_SUCCESS);



cl_mem vol_b = clCreateBuffer(gpu_context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, mem_size, b, &err);

shrCheckError(err, CL_SUCCESS);



cl_mem vol_c = clCreateBuffer(gpu_context, CL_MEM_WRITE_ONLY | CL_MEM_COPY_HOST_PTR, mem_size, c, &err);

shrCheckError(err, CL_SUCCESS);


same thing for the readbuffer



Thanks

J

#2
Posted 11/18/2009 10:16 AM   
I made the suggested changes and after copying array a and b to the device, running the kernel and copying back to new arrays d and e, I was able to establish the memory was correctly copied. However, array c is still 0. I'm thinking that the kernel did not run for some reason. Or if it did, not correctly. Any suggestions?

Code is attached.

And thanks for the corrections and pointers.
I made the suggested changes and after copying array a and b to the device, running the kernel and copying back to new arrays d and e, I was able to establish the memory was correctly copied. However, array c is still 0. I'm thinking that the kernel did not run for some reason. Or if it did, not correctly. Any suggestions?



Code is attached.



And thanks for the corrections and pointers.
Attachments

TEST_OPENCL.zip

#3
Posted 11/18/2009 03:05 PM   
change
[code]// create buffers on device
cl_mem vol_a = clCreateBuffer(gpu_context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, mem_size, a, &err);
shrCheckError(err, CL_SUCCESS);

cl_mem vol_b = clCreateBuffer(gpu_context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, mem_size, b, &err);
shrCheckError(err, CL_SUCCESS);

cl_mem vol_c = clCreateBuffer(gpu_context, CL_MEM_WRITE_ONLY, mem_size, c, &err);
shrCheckError(err, CL_SUCCESS);

// copy data from host to device
err = clEnqueueWriteBuffer(cmd_queue, vol_a, CL_TRUE, 0, mem_size, a, 0, NULL, NULL);
err |= clEnqueueWriteBuffer(cmd_queue, vol_b, CL_TRUE, 0, mem_size, b, 0, NULL, NULL);
shrCheckError(err, CL_SUCCESS);[/code]

to
[code]// create buffers on device
cl_mem vol_a = clCreateBuffer(gpu_context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, mem_size, a, &err);
shrCheckError(err, CL_SUCCESS);

cl_mem vol_b = clCreateBuffer(gpu_context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, mem_size, b, &err);
shrCheckError(err, CL_SUCCESS);

cl_mem vol_c = clCreateBuffer(gpu_context, CL_MEM_WRITE_ONLY | CL_MEM_COPY_HOST_PTR, mem_size, NULL, &err);
shrCheckError(err, CL_SUCCESS);[/code]
Change your kernel
to
[code]__kernel void add_array(__global int *a, __global int *b, __global int *c)
{
int xid = get_global_id(0);
c[xid] = a[xid] + b[xid];
}[/code]
because you work with int and not with float
change

// create buffers on device

cl_mem vol_a = clCreateBuffer(gpu_context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, mem_size, a, &err);

shrCheckError(err, CL_SUCCESS);



cl_mem vol_b = clCreateBuffer(gpu_context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, mem_size, b, &err);

shrCheckError(err, CL_SUCCESS);



cl_mem vol_c = clCreateBuffer(gpu_context, CL_MEM_WRITE_ONLY, mem_size, c, &err);

shrCheckError(err, CL_SUCCESS);



// copy data from host to device

err = clEnqueueWriteBuffer(cmd_queue, vol_a, CL_TRUE, 0, mem_size, a, 0, NULL, NULL);

err |= clEnqueueWriteBuffer(cmd_queue, vol_b, CL_TRUE, 0, mem_size, b, 0, NULL, NULL);

shrCheckError(err, CL_SUCCESS);




to

// create buffers on device

cl_mem vol_a = clCreateBuffer(gpu_context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, mem_size, a, &err);

shrCheckError(err, CL_SUCCESS);



cl_mem vol_b = clCreateBuffer(gpu_context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, mem_size, b, &err);

shrCheckError(err, CL_SUCCESS);



cl_mem vol_c = clCreateBuffer(gpu_context, CL_MEM_WRITE_ONLY | CL_MEM_COPY_HOST_PTR, mem_size, NULL, &err);

shrCheckError(err, CL_SUCCESS);


Change your kernel

to

__kernel void add_array(__global int *a, __global int  *b, __global int *c)

{

int xid = get_global_id(0);

c[xid] = a[xid] + b[xid];

}


because you work with int and not with float

#4
Posted 11/18/2009 05:30 PM   
I'm an idiot. I had been staring at that code for so long that I didn't realize the float/int issue.

Although it compiled with "NULL" instead of "c", the program errored out. Changing it to "c" made it work.
[codebox] cl_mem vol_c = clCreateBuffer(gpu_context, CL_MEM_WRITE_ONLY | CL_MEM_COPY_HOST_PTR, mem_size, c, &err);
shrCheckError(err, CL_SUCCESS);[/codebox]

Thanks for all the help!
I'm an idiot. I had been staring at that code for so long that I didn't realize the float/int issue.



Although it compiled with "NULL" instead of "c", the program errored out. Changing it to "c" made it work.

[codebox] cl_mem vol_c = clCreateBuffer(gpu_context, CL_MEM_WRITE_ONLY | CL_MEM_COPY_HOST_PTR, mem_size, c, &err);

shrCheckError(err, CL_SUCCESS);[/codebox]



Thanks for all the help!

#5
Posted 11/18/2009 06:02 PM   
it's
[code]cl_mem vol_c = clCreateBuffer(gpu_context, CL_MEM_WRITE_ONLY, mem_size, NULL, &err)[/code]
Don't forget to erase CL_MEM_COPY_HOST_PTR if you make a null pointer for the CPU
I think you should read the openCL guide because you don't understand what you make
it's

cl_mem vol_c = clCreateBuffer(gpu_context, CL_MEM_WRITE_ONLY, mem_size, NULL, &err)


Don't forget to erase CL_MEM_COPY_HOST_PTR if you make a null pointer for the CPU

I think you should read the openCL guide because you don't understand what you make

#6
Posted 11/19/2009 09:16 AM   
Scroll To Top