I haven’t used this code yet but thought it was kind of an interesting hack… I’ll leave it here.
I’m guessing others have come up with similar hacks but I was too lazy to look. :)
Problem:
I’d like to somehow pack two 32-bit floats into a 32-bit word.
I know the floats are saturated to the range [0.0,1.0].
Losing lots of precision over time is entirely OK.
Solution:
The trick is to recognize that the middle 16 bits of an IEEE-754 32-bit float are where the hack should focus.
Notice that the bit patterns between 1.0f and 1.9999999f vary only in the mantissa:
1.0000000 = 0x3f800000
1.9999999 = 0x3fffffff
2.0000000 = 0x40000000
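If you want to verify those bit patterns on the host, here's a small C sketch (the float_bits helper is my own stand-in for CUDA's __float_as_uint):

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float's bits as a uint32_t (host-side stand-in
 * for CUDA's __float_as_uint). memcpy avoids aliasing issues. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;  /* float_bits(1.0f) == 0x3f800000, float_bits(2.0f) == 0x40000000 */
}
```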
Then notice that:
2.0 - 1.0 = 1.0, and 1.0f's bit pattern is exactly 0x3f800000
A hack presents itself…
If you shift [0.0,1.0] to [1.0,2.0], treat the result as a 32-bit unsigned int, subtract 0x3f800000, and select the middle 16 bits of the word (bits 8–23), then you have 15 bits of mantissa plus a leading bit that is 1 only if the original value was 1.0.
A byte permute can then move these middle 16 bits into their destination:
void pxl_sat16v2_pack_lo(uint32_t* const sat16v2, const float v)
{
  // PXL_SAT16_PACK_MANTISSA_BIAS shifts v from [0.0,1.0] into [1.0,2.0],
  // i.e. it includes the 1.0f offset described above.
  const uint32_t t = __float_as_uint(v + PXL_SAT16_PACK_MANTISSA_BIAS) - 0x3f800000;

  // prmt selector 0x3265: result bytes are { t.b1, t.b2, old.b2, old.b3 },
  // i.e. t's middle 16 bits land in the low half and the high half is preserved.
  asm("prmt.b32 %0, %1, %2, 0x3265;" : "=r"(*sat16v2) : "r"(*sat16v2), "r"(t));
}
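For reference, the same pack step can be written in portable C without the prmt byte permute. This is a sketch, not the original code: the function name is mine, and I'm assuming PXL_SAT16_PACK_MANTISSA_BIAS is a plain 1.0f with no extra rounding term:

```c
#include <stdint.h>
#include <string.h>

/* Portable sketch of the pack step: shift v from [0,1] to [1,2],
 * subtract 1.0f's bit pattern, and keep the middle 16 bits (bits 8..23). */
static void sat16v2_pack_lo_portable(uint32_t* sat16v2, float v)
{
    uint32_t t;
    float shifted = v + 1.0f;        /* assumed bias: plain 1.0f, no rounding term */
    memcpy(&t, &shifted, sizeof t);  /* reinterpret bits, like __float_as_uint */
    t = (t - 0x3f800000u) >> 8;      /* middle 16 bits -> low 16 bits */
    *sat16v2 = (*sat16v2 & 0xffff0000u) | (t & 0xffffu);
}
```

For example, packing 0.5f stores 0x4000 in the low half (1.5f is 0x3fc00000, minus 0x3f800000 is 0x00400000, shifted down 8), and packing 1.0f stores 0x8000 — the leading bit mentioned above.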
Unpacking is similar… move the stored 16 bits back up to the middle of the word, add 0x3f800000, treat it as a float, subtract 1.0 and return the result.
float pxl_sat16v2_unpack_lo(const uint32_t sat16v2)
{
  uint32_t d;

  // prmt selector 0x7104: result bytes are { 0, lo.b0, lo.b1, 0 },
  // i.e. the stored 16 bits move back up to bits 8..23 with zero bytes around them.
  asm("prmt.b32 %0, %1, 0x0, 0x7104;" : "=r"(d) : "r"(sat16v2));

  return __uint_as_float(d + 0x3f800000) - 1.0f;
}
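And the matching portable C sketch of the unpack step (again, the name is mine, not from the original source):

```c
#include <stdint.h>
#include <string.h>

/* Portable sketch of the unpack step: move the stored 16 bits back to
 * bits 8..23, add 1.0f's bit pattern, reinterpret as float, subtract 1.0. */
static float sat16v2_unpack_lo_portable(uint32_t sat16v2)
{
    uint32_t d = (sat16v2 & 0xffffu) << 8;  /* low 16 bits -> bits 8..23 */
    float f;
    d += 0x3f800000u;
    memcpy(&f, &d, sizeof f);               /* reinterpret bits, like __uint_as_float */
    return f - 1.0f;
}
```

Values whose top 15 mantissa bits survive the pack round-trip exactly — 0.0, 0.5, 1.0, and so on — come back bit-exact; everything else loses the low 8 mantissa bits.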
Is this a good hack? Is it faster than a fp32<>fp16 conversion?
I have no idea… I never used it and haven’t dwelled on its limitations, but it might be useful to someone. :)
I dumped some source code here.