Dear NVidia Developers,
I am running to strange limitations on the number of arguments that are passed to kernel.
I’ve designed the simple 3d/2d test to demonstrate the problem (see attached oclSimple3d.zip).
My system is Windows 7 64, NVidia GeForce GTX 260 (driver 285.62) and CUDA 4.0.
According to the “NVIDIA OpenCL BestPracticesGuide†“Shared memory holds the parameters
or arguments that are passed to kernels at launch". The shared memory limit is about 16KB per thread.
Also the limit on the number of arguments is pretty large. See CL_DEVICE_MAX_PARAMETER_SIZE.
The minimum supported value must be at least 256 bytes. On my testing system that is CL_DEVICE_MAX_PARAMETER_SIZE=4352
Please take in mind that my testing 2d/3d kernels are very simple (see “simple3d_test1†kernel).
This “simple3d_test1†kernel executes with no problem on 3D image size 256x256x256.
Although adding more parameters to the kernel see “simple3d_test2â€, completely break the execution.
The GPU result DOESN’T MATCH CPU result within allowable tolerance for “simple3d_test2â€.
The resulting 3D array has 98% pixels being zeros. I’ve tried many variations of parameters for this kernel
(see kernels simple3d_test2…simple3d_test5). The result is always the same 98% pixels being zeros.
It seems that driver 285.62 does not execute it correctly which make is impossible to use with 3D arrays.
I’ve also tested it for simple 2D array where similar
kernels works fine see “simple2d_test2†for example (also set test3D = false in the oclSimple3D.cpp).
Building it for x86 or x64, does not effect the test results.
Could you please explain or confirm it?
Thanks,
-Denis
// working kernel in 3D
// test1
__kernel void simple3d_test1(const __global float* in,
uint4 in_size,
__global float* out,
uint4 out_size)
{
uint3 index;
index.x = get_global_id(0);
index.y = get_global_id(1);
index.z = get_global_id(2);
bool isValid = true;
if(index.x >= out_size.x) isValid = false;
if(index.y >= out_size.y) isValid = false;
if(index.z >= out_size.z) isValid = false;
if(isValid)
{
uint gidx = out_size.x *(index.z * out_size.y + index.y) + index.x;
out[gidx] = 1;
}
}
// NOT working kernel in 3D, what is wrong?
// test2
__kernel void simple3d_test2(const __global float* in,
uint4 in_size,
float4 f11,
float4 f12,
float16 a1,
float16 b1,
float16 c1,
__global float* out,
uint4 out_size,
float4 f21,
float4 f22,
float16 a2,
float16 b2,
float16 c2)
{
uint3 index;
index.x = get_global_id(0);
index.y = get_global_id(1);
index.z = get_global_id(2);
bool isValid = true;
if(index.x >= out_size.x) isValid = false;
if(index.y >= out_size.y) isValid = false;
if(index.z >= out_size.z) isValid = false;
if(isValid)
{
uint gidx = out_size.x *(index.z * out_size.y + index.y) + index.x;
out[gidx] = 1;
}
}
// working kernel in 2D
// test2 2D
__kernel void simple2d_test2(const __global float* in,
uint2 in_size,
float2 f11,
float2 f12,
float16 a1,
float16 b1,
float16 c1,
float16 d1,
__global float* out,
uint2 out_size,
float2 f21,
float2 f22,
float16 a2,
float16 b2,
float16 c2,
float16 d2)
{
uint2 index;
index.x = get_global_id(0);
index.y = get_global_id(1);
bool isValid = true;
if(index.x >= out_size.x) isValid = false;
if(index.y >= out_size.y) isValid = false;
if(isValid)
{
unsigned int gidx = out_size.x * index.y + index.x;
out[gidx] = 1;
}
}
oclSimple3d.zip (9.91 KB)