Code selection failed to select: i32 = bswap 0xXXXXXXXX OpenCL bug

If you try to use the following macro in OpenCL, the compiler borks out with an error message:

#define swap(x) ( ((x) << 24) | (((x) << 8) & 0x00ff0000) | (((x) >> 8) & 0x0000ff00) | ((x) >> 24) )

Error:

Code selection failed to select: i32 = bswap 0xXXXXXXXX

I tried to deceive it by doing:

/* Alternative swap */
        uint a = (x << 16) | (x >> 16);
        uint b = ((a & 0x00FF00FF) << 8) | ((a & 0xFF00FF00) >> 8);

But this gives the same problem. It’s just too smart :) This can be reproduced with a basic kernel:

__kernel void buggy(const uint input, __global uint * output)
{
    output[0] = swap(input);
}

Version/card info (this should be the latest driver available from the site):

OpenGL vendor string: NVIDIA Corporation

OpenGL renderer string: GeForce 8600 GT/PCI/SSE2/3DNOW!

OpenGL version string: 3.3.0 NVIDIA 260.19.21

My card supports sm_11…

+1 on here, I have had the same problem since I started working with OpenCL! But I had some more luck, this is not detected by the optimizer:

uint swap_workaround (uint v)
{
    return (v & 0x0000000F) << 24 |
           (v & 0x000000F0) << 24 |
           (v & 0x00000F00) << 8 |
           (v & 0x0000F000) << 8 |
           (v & 0x000F0000) >> 8 |
           (v & 0x00F00000) >> 8 |
           (v & 0x0F000000) >> 24 |
           (v & 0xF0000000) >> 24;
}
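
For anyone who wants to convince themselves that the nibble-mask version really computes the same byte swap as the macro and the rotate-based alternative from the first post, here is a rough host-side sketch in plain C. It is not OpenCL code and says nothing about what the NVIDIA compiler does with it; the names SWAP_MACRO, swap_rotate, swap_nibble_masks and check_swap_variants are made up for this example, and uint32_t stands in for OpenCL's uint.

```c
#include <assert.h>
#include <stdint.h>

/* The macro from the first post (renamed, hypothetical name). */
#define SWAP_MACRO(x) ( ((x) << 24) | (((x) << 8) & 0x00ff0000u) | \
                        (((x) >> 8) & 0x0000ff00u) | ((x) >> 24) )

/* The "alternative swap": rotate by 16 bits, then swap bytes within each half. */
uint32_t swap_rotate(uint32_t x)
{
    uint32_t a = (x << 16) | (x >> 16);
    return ((a & 0x00FF00FFu) << 8) | ((a & 0xFF00FF00u) >> 8);
}

/* The workaround from the reply: same shifts, but masked one nibble at a time. */
uint32_t swap_nibble_masks(uint32_t v)
{
    return (v & 0x0000000Fu) << 24 |
           (v & 0x000000F0u) << 24 |
           (v & 0x00000F00u) << 8  |
           (v & 0x0000F000u) << 8  |
           (v & 0x000F0000u) >> 8  |
           (v & 0x00F00000u) >> 8  |
           (v & 0x0F000000u) >> 24 |
           (v & 0xF0000000u) >> 24;
}

/* Asserts that all three formulations agree for one input value. */
void check_swap_variants(uint32_t v)
{
    uint32_t expected = SWAP_MACRO(v);
    assert(swap_rotate(v) == expected);
    assert(swap_nibble_masks(v) == expected);
}
```

Calling check_swap_variants on a handful of values (for example 0x11223344, which all three turn into 0x44332211) from a small main is enough to see that the workaround only changes how the expression is written, not what it computes.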

Thanks, that indeed seems to work :)

Then again, I still hope someone from NVIDIA will read this and fix the bug.

Here’s a nice way to do this:

as_int(as_char4(data[tid]).wzyx)
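
The swizzle above reinterprets the 32-bit value as four byte lanes and reads them back in reverse order (.wzyx). For an unsigned value the same idea would presumably be written with as_uint and as_uchar4. As a rough illustration of what that does, here is a plain C sketch of the same lane reversal on the host; swap_via_bytes is a hypothetical name and this is only an emulation of the idea, not OpenCL code.

```c
#include <stdint.h>
#include <string.h>

/* Reverse the four bytes of v in memory, mirroring the .wzyx swizzle. */
uint32_t swap_via_bytes(uint32_t v)
{
    unsigned char in[4];
    unsigned char out[4];

    memcpy(in, &v, sizeof in);   /* view the value as four byte lanes */
    out[0] = in[3];              /* x <- w */
    out[1] = in[2];              /* y <- z */
    out[2] = in[1];              /* z <- y */
    out[3] = in[0];              /* w <- x */
    memcpy(&v, out, sizeof out); /* reinterpret the lanes as an integer again */
    return v;
}
```

Because the bytes are reversed in memory rather than shifted, the result is the byte-swapped value on both little-endian and big-endian hosts.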
