While trying to find a compact idiom for floating-point infinity for use in both host and device code, I observed a curious transformation applied by the front half of the CUDA compiler. I am using the CUDA 6.5 tool chain and would be interested to know whether this still reproduces with CUDA 7.5 RC, if anybody here is inclined to give it a try. I started out with code handling 'float' data:
__global__ void kernel (float *out, const float *in)
{
*out = 1e38f * 1e38f;
}
The resulting PTX code is as I expected, storing the single-precision infinity encoding 0x7F800000 into the output location:
.visible .entry _Z6kernelPfPKf(
.param .u64 _Z6kernelPfPKf_param_0,
.param .u64 _Z6kernelPfPKf_param_1
)
{
.reg .s32 %r<2>;
.reg .s64 %rd<3>;
ld.param.u64 %rd1, [_Z6kernelPfPKf_param_0];
cvta.to.global.u64 %rd2, %rd1;
mov.u32 %r1, 2139095040;
st.global.u32 [%rd2], %r1;
ret;
}
So I tried the analogous double-precision case,
__global__ void kernel (double *out, const double *in)
{
*out = 1e308 * 1e308;
}
and was blown away to find that the compiler emits code that computes infinity at runtime as 1.0/0.0:
.visible .entry _Z6kernelPdPKd(
.param .u64 _Z6kernelPdPKd_param_0,
.param .u64 _Z6kernelPdPKd_param_1
)
{
.reg .s64 %rd<3>;
.reg .f64 %fd<3>;
ld.param.u64 %rd1, [_Z6kernelPdPKd_param_0];
cvta.to.global.u64 %rd2, %rd1;
mov.f64 %fd1, 0d0000000000000000;
rcp.rn.f64 %fd2, %fd1;
st.global.f64 [%rd2], %fd2;
ret;
}
The resulting SASS also contains this computation.
My current best candidate for a compact, portable idiom for infinity is ((float)1e308*(float)1e308), but I have not fully tested it across all platforms yet.