My program requires an atomic add operation to accumulate absorbed “energy” by each primitive. I have been running into precision issues using floats, so I have been trying to implement a double precision version. Since I have compute capability <6.0, it is recommended to implement atomicAdd for doubles using atomicCAS as suggested below
#if __CUDA_ARCH__ < 600
__device__ double atomicAdd(double* address, double val) {
unsigned long long int* address_as_ull = (unsigned long long int*)address;
unsigned long long int old = *address_as_ull, assumed;
do {
assumed = old;
old = atomicCAS(address_as_ull, assumed, __double_as_longlong(val + __longlong_as_double(assumed)));
// Note: uses integer comparison to avoid hang in case of NaN (since NaN != NaN) }
while (assumed != old);
return __longlong_as_double(old);
}
#endif
Using this code will successfully compile and generate the ptx, however it fails the OptiX compilation during runtime with the following error:
OptiX Error: Unknown error (Details: Function “RTresult _rtProgramCreateFromPTXFile(RTcontext, const char*, const char*, RTprogram_api**)” caught exception: Compile Error: Cannot perform indirect call to functions which are not callable programs. at: [ Instruction: %352 = call i64 bitcast (float (%“struct.cort::CanonicalState”, %“struct.cort::UberPointer”, i32, i1, float, float) @_UberPointerMemAtomicOp to i64 (%“struct.cort::CanonicalState”, %“struct.cort::UberPointer”, i32, i1, i64, i64))(%“struct.cort::CanonicalState”* %0, %“struct.cort::UberPointer” %339, i32 7, i1 false, i64 %“%rd196.0”, i64 %351), contained in basic block: BB1_11, in function: _Z19closest_hit_diffusev_cp7, in module: Canonical__Z19closest_hit_diffusev
It seems to be failing because of the call to atomicCAS, because if I comment out only that line it will run without error.
Please let me know if you have any ideas on how to get around this. I am using OpitX version 4.0.1 with CUDA 8.0 and compute capability 3.5 GPU.