Emulating CUDA's __byte_perm in GLSL

Is it possible to optimize this GLSL emulation of CUDA's __byte_perm?

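uint __byte_perm(in uint a, in uint b, in uint slct)
{
    // Each selector nibble picks one source byte; only its low 3 bits are used.
    // Indices 0-3 address the bytes of a, indices 4-7 the bytes of b.
    // The u suffixes keep every operand uint instead of relying on
    // implicit int-to-uint conversion.
    uint i0 = (slct >>  0) & 0x7u;
    uint i1 = (slct >>  4) & 0x7u;
    uint i2 = (slct >>  8) & 0x7u;
    uint i3 = (slct >> 12) & 0x7u;

    // Extract the selected byte for each output position and pack the result.
    return (((((i0 < 4u) ? (a >> (i0*8u)) : (b >> ((i0-4u)*8u))) & 0xffu) <<  0) +
            ((((i1 < 4u) ? (a >> (i1*8u)) : (b >> ((i1-4u)*8u))) & 0xffu) <<  8) +
            ((((i2 < 4u) ? (a >> (i2*8u)) : (b >> ((i2-4u)*8u))) & 0xffu) << 16) +
            ((((i3 < 4u) ? (a >> (i3*8u)) : (b >> ((i3-4u)*8u))) & 0xffu) << 24));
}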

CUDA has a built-in __byte_perm function. Are there any NVIDIA-specific OpenGL extensions that offer this functionality? My GLSL version is many times slower by comparison, and the GLSL compiler doesn't seem to optimize code like this.
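
To make the expected behavior concrete, and to have something to check any faster variant against, here is a minimal host-side sketch in C. It assumes the same semantics as the GLSL function above: only the low 3 bits of each selector nibble are used, indices 0-3 select bytes of a, and indices 4-7 select bytes of b. The names byte_perm_ref and byte_perm_ref_selfcheck are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference, not a CUDA or OpenGL API.
 * Treats a and b as one 8-byte pool (a = bytes 0..3, b = bytes 4..7)
 * and picks one byte per selector nibble. */
uint32_t byte_perm_ref(uint32_t a, uint32_t b, uint32_t slct)
{
    uint64_t pool = ((uint64_t)b << 32) | a;
    uint32_t result = 0;

    for (int n = 0; n < 4; ++n) {
        /* Low 3 bits of nibble n give the source byte index (0..7). */
        uint32_t idx  = (slct >> (4 * n)) & 0x7u;
        uint32_t byte = (uint32_t)(pool >> (8 * idx)) & 0xffu;
        result |= byte << (8 * n);
    }
    return result;
}

/* A few selectors whose results are easy to verify by hand. */
void byte_perm_ref_selfcheck(void)
{
    const uint32_t a = 0x33221100u;  /* bytes 0..3 */
    const uint32_t b = 0x77665544u;  /* bytes 4..7 */

    assert(byte_perm_ref(a, b, 0x3210u) == 0x33221100u); /* returns a */
    assert(byte_perm_ref(a, b, 0x7654u) == 0x77665544u); /* returns b */
    assert(byte_perm_ref(a, b, 0x0123u) == 0x00112233u); /* byte-reverses a */
    assert(byte_perm_ref(a, b, 0x5140u) == 0x55114400u); /* interleaves a and b */
}
```

Feeding the same inputs to the shader and comparing against these values is one way to confirm that an optimized version still matches.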