kernel index bug?

For a kernel (`__global__` function) like this:

__global__ void ComputeOutput(float * const C, int const num_in)
{
    // Grid-stride loop
    // learned from https://devblogs.nvidia.com/parallelforall/cuda-pro-tip-write-flexible-kernels-grid-stride-loops/
    for (int j_ = blockIdx.x * blockDim.x + threadIdx.x;
         j_ < num_in;
         j_ += blockDim.x * gridDim.x) {
        C[j_] = float(j_);
    }
}

Is it possible that the output C[j] != j?

I encountered this issue: most C[j] equal j, but a few of them do not.

The bug is present even if I launch the kernel with a single thread:

ComputeOutput<<<1,1>>>(  d_C, num_in);

You can reproduce the error using my code:

My environment is Matlab 2017a, Ubuntu 16.04 64-bit, CUDA-8.0, Tesla K80.

Update: I find the error only occurs when j is relatively large (on the order of 16 million). It's common for me to work with values that large.

At around 16 million (2^24 = 16,777,216) you reach the limit of what can be reliably stored in a float quantity, if you want to test for exact equality with an equivalent integer. This is not unique to CUDA. A float has a 24-bit effective significand (23 stored bits plus an implicit leading bit), so every integer up to 2^24 is exactly representable, but beyond that some integers must be rounded. As a simple test, try it with C declared as a double array. A double has a 53-bit significand, so it represents every integer exactly up to 2^53 (about 9×10^15), and you can test for equality well beyond 4 billion.

You may want to learn more about the use of floating point arithmetic in computers.

Floating Point and IEEE 754 :: CUDA Toolkit Documentation

Beyond the above comments, testing for exact equality of floating point values has a variety of challenges.

Thanks! I forgot about floating point precision limit…