thrust::minmax_element on GPU produces different results than on CPU

The GPU is a 1050 Ti with compute capability 6.1, running CUDA 8.0 on 64-bit Windows 10.
The error is on the "min" side; I had a similar problem with thrust::reduce().
This happens with a certain data set of floats, coming from image luma values (HDR app).

Just to verify, I have written a very simple single-threaded kernel to perform the min-reduction; its result agrees with the CPU.

The floats are neither huge nor very small: the two min values are about -6.61f vs. -6.049f, and the max is about 2.5f.

Has anyone else had a similar experience with Thrust?
Thanks in advance

Can you provide self-contained reproducer code?

Hi Bob,

Thanks for the response.
Reproducer code may not be practical, because I only come across this behavior with certain data sets (not all) coming from images.
But here is a code snippet:

#include <thrust/device_ptr.h>
#include <thrust/execution_policy.h>
#include <thrust/extrema.h>
#include <thrust/pair.h>

float minVal;
float maxVal;
// luminance is a float device pointer allocated with cudaMalloc() and already processed by a custom kernel
// length is the size of the device memory pointed to by luminance

thrust::pair<thrust::device_ptr<float>, thrust::device_ptr<float> > tuple;
tuple = thrust::minmax_element(thrust::device,
    thrust::device_pointer_cast((float*)luminance),
    thrust::device_pointer_cast((float*)luminance) + length);
CHECK_CUDA_ERROR(errCode, functionName, "thrust::minmax_element failed");
minVal = *(tuple.first);   // dereferencing a device_ptr copies the value back to the host
maxVal = *(tuple.second);
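
To turn one of the problem data sets into something self-contained, the buffer could be copied back to the host and checked against a CPU reference. The following is only a minimal sketch: hostMinMax and HostMinMax are hypothetical names (not part of Thrust), and it assumes the luminance values have already been copied into a std::vector<float>, for example with cudaMemcpy (not shown). It also counts NaN/Inf entries, since comparisons involving NaN are unordered and can make a min/max result depend on the order in which elements are visited; whether that applies to this data is an open question, not a diagnosis.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper type; not part of Thrust or the original snippet.
struct HostMinMax {
    float minVal;           // smallest finite value seen
    float maxVal;           // largest finite value seen
    std::size_t nonFinite;  // number of NaN or +/-Inf entries that were skipped
};

// CPU reference: min and max over the finite entries of `data`.
// Assumption: `data` is a host copy of the device buffer.
inline HostMinMax hostMinMax(const std::vector<float>& data) {
    HostMinMax r = {0.0f, 0.0f, 0};
    bool seen = false;  // becomes true once the first finite value is found
    for (std::size_t i = 0; i < data.size(); ++i) {
        const float v = data[i];
        if (!std::isfinite(v)) {
            ++r.nonFinite;  // skip NaN/Inf but remember that they exist
            continue;
        }
        if (!seen) {
            r.minVal = r.maxVal = v;
            seen = true;
        } else {
            r.minVal = std::min(r.minVal, v);
            r.maxVal = std::max(r.maxVal, v);
        }
    }
    return r;
}
```

If nonFinite comes back non-zero, or if minVal/maxVal here differ from the Thrust result on the same copied data, that same host vector written to a file would make a compact reproducer.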