Is it possible to optimize this?
// Emulates CUDA's __byte_perm: each selector nibble picks one byte from a (0-3) or b (4-7).
uint __byte_perm(in uint a, in uint b, in uint slct)
{
    // Only the low 3 bits of each selector nibble are used.
    uint i0 = (slct >>  0) & 0x7u;
    uint i1 = (slct >>  4) & 0x7u;
    uint i2 = (slct >>  8) & 0x7u;
    uint i3 = (slct >> 12) & 0x7u;
    return (((((i0 < 4u) ? (a >> (i0*8u)) : (b >> ((i0-4u)*8u))) & 0xffu) <<  0) +
            ((((i1 < 4u) ? (a >> (i1*8u)) : (b >> ((i1-4u)*8u))) & 0xffu) <<  8) +
            ((((i2 < 4u) ? (a >> (i2*8u)) : (b >> ((i2-4u)*8u))) & 0xffu) << 16) +
            ((((i3 < 4u) ? (a >> (i3*8u)) : (b >> ((i3-4u)*8u))) & 0xffu) << 24));
}
CUDA has a __byte_perm function. Are there any NVIDIA-specific OpenGL extensions that offer this functionality? My code is many times slower, and the GLSL compiler doesn't optimize things like this.
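For comparison, here is a rough sketch of a variant of the function above that I could try. It assumes GLSL 4.00 or later (or ARB_gpu_shader5) so that the built-ins bitfieldExtract and bitfieldInsert are available. The names pickByte and bytePermSketch are made up for illustration. I have not measured whether it is actually faster, so treat it as something to benchmark rather than a known fix.

```glsl
// Assumes "#version 400" or later (or ARB_gpu_shader5) for the bitfield built-ins.

// Hypothetical helper: returns the byte of (a, b) selected by the low 3 bits of sel.
uint pickByte(uint a, uint b, uint sel)
{
    // Bit 2 of the selector chooses the source word: 0 -> a, 1 -> b.
    uint src = ((sel & 4u) != 0u) ? b : a;
    // Bits 0-1 choose the byte; multiply by 8 to get the bit offset.
    return bitfieldExtract(src, int((sel & 3u) << 3u), 8);
}

// Hypothetical replacement for the function above; same argument order.
uint bytePermSketch(uint a, uint b, uint slct)
{
    uint r = pickByte(a, b, slct);                               // result byte 0
    r = bitfieldInsert(r, pickByte(a, b, slct >>  4u),  8, 8);   // result byte 1
    r = bitfieldInsert(r, pickByte(a, b, slct >>  8u), 16, 8);   // result byte 2
    r = bitfieldInsert(r, pickByte(a, b, slct >> 12u), 24, 8);   // result byte 3
    return r;
}
```

The idea is to replace the per-byte shift, mask, and subtract with one select and one bitfield extract. Whether the driver maps that to fewer instructions is an assumption on my part.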