OpenCL bug: __constant program scope variable and __constant kernel argument?
Hi all! I think I discovered a bug in Nvidia's OpenCL implementation (CUDA 5.0, driver 306.94). If a kernel gets a pointer to the __constant memory space as an argument, it can't access an array declared at program scope (which also must reside in __constant memory space according to the OpenCL 1.1 specification). The following OpenCL code demonstrates this: [code]__constant float2 data[] = { (float2)(1.0f, 2.0f), (float2)(3.0f, 4.0f), (float2)(5.0f, 6.0f), (float2)(7.0f, 8.0f), }; __kernel void test1(__global float2 *out) { int x = get_local_id(0); out[x] = data[x]; /* ok */ } __kernel void test2(__global int *in, __global float2 *out) { int x = get_local_id(0); out[x] = data[x]; /* ok */ } __kernel void test3(__constant int *in, __global float2 *out) { int x = get_local_id(0); out[x] = data[x]; /* fails! */ }[/code] The kernel "test1" copies the __constant "data" array to the __global "out" pointer, and "test2" does the same. However, kernel "test3" copies zeroes to "out", though the only difference to "test2" is the address space qualifier of the (ignored) kernel argument "in". The same code works fine with AMD's OpenCL implementation (tested on an Intel Core i7 CPU and a Radeon HD7950). Does anybody know how this can be worked around or when it will be fixed? Kind regards, Markus
Hi all!

I think I discovered a bug in Nvidia's OpenCL implementation (CUDA 5.0, driver 306.94). If a kernel gets a pointer to the __constant memory space as an argument, it can't access an array declared at program scope (which also must reside in __constant memory space according to the OpenCL 1.1 specification). The following OpenCL code demonstrates this:

__constant float2 data[] = {
(float2)(1.0f, 2.0f),
(float2)(3.0f, 4.0f),
(float2)(5.0f, 6.0f),
(float2)(7.0f, 8.0f),
};

__kernel void test1(__global float2 *out)
{
int x = get_local_id(0);
out[x] = data[x]; /* ok */
}

__kernel void test2(__global int *in, __global float2 *out)
{
int x = get_local_id(0);
out[x] = data[x]; /* ok */
}

__kernel void test3(__constant int *in, __global float2 *out)
{
int x = get_local_id(0);
out[x] = data[x]; /* fails! */
}


The kernel "test1" copies the __constant "data" array to the __global "out" pointer, and "test2" does the same. However, kernel "test3" copies zeroes to "out", though the only difference to "test2" is the address space qualifier of the (ignored) kernel argument "in". The same code works fine with AMD's OpenCL implementation (tested on an Intel Core i7 CPU and a Radeon HD7950).

Does anybody know how this can be worked around or when it will be fixed?

Kind regards,
Markus

#1
Posted 11/20/2012 06:33 PM   
Scroll To Top