Bug in float3* addressing
Hello,

I have this simple test kernel:

[code]
__kernel void vectest(__global float3* data, __global float* ret) {
    float3 v = data[1];
    ret[0] = v.x; ret[1] = v.y; ret[2] = v.z;
}
[/code]

data = [0, 1, 2, 3, 4, 5, 6]

ret = [4, 5, 6]

According to the data I receive in the ret array, the float3 variable v consists of the floats at index positions 4-6 instead of 3-5. It seems that when the driver converts the float3 array access to a byte offset, it incorrectly assumes that the array element size is 4*4 bytes instead of 3*4 bytes.

Using the 64-bit Linux driver version 295.49, running on a GeForce GTX 285.


Csaba

#1
Posted 05/04/2012 03:24 PM   
Hi Csaba!

Indeed, that is very frustrating. However, I would discourage the use of any type laid out as 3*sizeof(component), because GPUs don't handle it well. If you ask for the size of these types on the host side, you will get sizeof(cl_float3) == sizeof(cl_float4). Since even the host side does not pack these types into 12 bytes, things would get very messy even if the device did.

GPUs are highly optimized for types aligned to either 4 bytes or 16 bytes. Even if you need only 3 components in a vector, use float4 and just disregard the last element. If you don't want to waste the memory, use three independent arrays of floats.
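A minimal sketch of both layouts in plain host-side C, assuming the source data is a tightly packed array of xyz triples. The helper names pad_xyz_to_xyzw and split_xyz are hypothetical and not part of any OpenCL API; they only prepare buffers that a kernel could then read as float4 or as three separate float arrays.

```c
#include <assert.h>
#include <stddef.h>

/* Option 1 (hypothetical helper): expand n packed xyz triples into
   four floats per element. The fourth float is padding, set to 0. */
static void pad_xyz_to_xyzw(const float *xyz, size_t n, float *xyzw) {
    assert(xyz != NULL && xyzw != NULL);
    for (size_t i = 0; i < n; ++i) {
        xyzw[4 * i + 0] = xyz[3 * i + 0];
        xyzw[4 * i + 1] = xyz[3 * i + 1];
        xyzw[4 * i + 2] = xyz[3 * i + 2];
        xyzw[4 * i + 3] = 0.0f; /* unused by the kernel */
    }
}

/* Option 2 (hypothetical helper): split n packed xyz triples into
   three independent arrays, so no memory is spent on padding. */
static void split_xyz(const float *xyz, size_t n,
                      float *x, float *y, float *z) {
    assert(xyz != NULL && x != NULL && y != NULL && z != NULL);
    for (size_t i = 0; i < n; ++i) {
        x[i] = xyz[3 * i + 0];
        y[i] = xyz[3 * i + 1];
        z[i] = xyz[3 * i + 2];
    }
}
```

With option 1 the output buffer needs room for 4*n floats; with option 2 each of the three arrays needs room for n floats.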

Regards,
Máté

#2
Posted 05/09/2012 11:25 AM   
Wow, you're right. In fact, I was very wrong. I blindly assumed that the size of float3 was 12 bytes, but it is in fact 16 bytes, as clearly stated in the specification, section 6.1.5.
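To make the arithmetic concrete, here is a minimal sketch in plain C (not OpenCL code) of where element 1 starts for a given element size. The helper name load_vec3 is hypothetical. A stride of 4 floats corresponds to the 16-byte float3 size, and a stride of 3 floats to my mistaken 12-byte assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy the three components of element i out of a
   flat float buffer of length len, where consecutive elements start
   stride floats apart. */
static void load_vec3(const float *buf, size_t len, size_t i,
                      size_t stride, float out[3]) {
    size_t first = i * stride;   /* index of the element's x component */
    assert(first + 3 <= len);    /* the three components must be in range */
    out[0] = buf[first];
    out[1] = buf[first + 1];
    out[2] = buf[first + 2];
}
```

For the seven-float buffer from the first post, a stride of 4 puts element 1 at floats 4-6, which is what the kernel returned, while a stride of 3 would put it at floats 3-5.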

I apologize for falsely accusing NVIDIA out of sheer ignorance.

Csaba

#3
Posted 05/17/2012 08:49 AM   