wumpus
1
If you try to use the following macro in OpenCL, the compiler borks out with an error message:
#define swap(x) ( ((x) << 24) | (((x) << 8) & 0x00ff0000) | (((x) >> 8) & 0x0000ff00) | ((x) >> 24) )
Error:
Code selection failed to select: i32 = bswap 0xXXXXXXXX
I tried to deceive it by doing:
/* Alternative swap */
uint a = (x << 16) | (x >> 16);
uint b = ((a & 0x00FF00FF) << 8) | ((a & 0xFF00FF00) >> 8);
But this gives the same problem; the optimizer apparently still recognizes it as a byte swap. It’s just too smart :) This can be reproduced with a basic kernel:
__kernel void buggy(const uint input, __global uint * output)
{
    output[0] = swap(input);
}
Version/card info (this should be the latest driver available from the site):
OpenGL vendor string: NVIDIA Corporation
OpenGL renderer string: GeForce 8600 GT/PCI/SSE2/3DNOW!
OpenGL version string: 3.3.0 NVIDIA 260.19.21
My card supports sm_11…
+1 on this, I have had the same problem ever since I started working with OpenCL! But I had some more luck; this version is not detected by the optimizer:
uint swap_workaround (uint v)
{
    return (v & 0x0000000F) << 24 |
           (v & 0x000000F0) << 24 |
           (v & 0x00000F00) << 8 |
           (v & 0x0000F000) << 8 |
           (v & 0x000F0000) >> 8 |
           (v & 0x00F00000) >> 8 |
           (v & 0x0F000000) >> 24 |
           (v & 0xF0000000) >> 24;
}
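For anyone who wants to convince themselves that the nibble-masked version really computes the same thing as the original macro, here is a rough host-side check in plain C. It is only a sketch: `uint` is spelled `uint32_t`, the macro is rewritten as a function, and `swap_reference` and `count_mismatches` are made-up names for illustration. It says nothing about what the NVIDIA compiler does with either form.

```c
#include <stddef.h>
#include <stdint.h>

/* The swap macro from the first post, written as a function (hypothetical name). */
uint32_t swap_reference(uint32_t x)
{
    return (x << 24) | ((x << 8) & 0x00ff0000u) |
           ((x >> 8) & 0x0000ff00u) | (x >> 24);
}

/* The workaround from the reply: each byte is moved as two separate nibbles. */
uint32_t swap_workaround(uint32_t v)
{
    return (v & 0x0000000Fu) << 24 |
           (v & 0x000000F0u) << 24 |
           (v & 0x00000F00u) << 8 |
           (v & 0x0000F000u) << 8 |
           (v & 0x000F0000u) >> 8 |
           (v & 0x00F00000u) >> 8 |
           (v & 0x0F000000u) >> 24 |
           (v & 0xF0000000u) >> 24;
}

/* Hypothetical helper: counts the sample inputs on which the two versions disagree. */
int count_mismatches(void)
{
    const uint32_t samples[] = {
        0x00000000u, 0x00000001u, 0x11223344u,
        0x80000000u, 0xdeadbeefu, 0xffffffffu
    };
    int mismatches = 0;
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; ++i) {
        if (swap_reference(samples[i]) != swap_workaround(samples[i]))
            ++mismatches;
    }
    return mismatches;
}
```

Calling `count_mismatches()` from a small `main` should return 0 if the two functions agree on every sample.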
wumpus
5
Thanks, that indeed seems to work :)
Then again, I still hope someone from NVIDIA will read this and fix the bug.
Here’s a nice way to do this:
as_int(as_char4(data[tid]).wzyx)
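To spell out what that swizzle does: the 32-bit value is reinterpreted as four bytes, and `.wzyx` reads them back in reverse order. Below is a plain C emulation of that idea, just to illustrate it off-device. `swap_via_bytes` is a hypothetical name, and it works on unsigned values (in OpenCL C that would correspond to `as_uint`/`as_uchar4` rather than `as_int`/`as_char4`, which is my assumption, not something from the post above).

```c
#include <stdint.h>
#include <string.h>

/* Emulates the reinterpret-and-reverse trick: view x as 4 bytes, reverse them. */
uint32_t swap_via_bytes(uint32_t x)
{
    unsigned char b[4];
    unsigned char r[4];
    uint32_t out;

    memcpy(b, &x, sizeof b);   /* reinterpret the value as raw bytes (like as_uchar4) */
    r[0] = b[3];               /* .w */
    r[1] = b[2];               /* .z */
    r[2] = b[1];               /* .y */
    r[3] = b[0];               /* .x */
    memcpy(&out, r, sizeof out); /* reinterpret the bytes as a value again (like as_uint) */
    return out;
}
```

Reversing the bytes in memory gives the same result on little-endian and big-endian hosts, so `swap_via_bytes(0x11223344)` yields `0x44332211` either way.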