cublasDasum with device result

Hello,

I’m trying to use cublasDasum to sum a vector. It works properly if it returns the result to a host variable. I then tried allocating a double on the device and returning the result there, but that results in a segmentation fault. The documentation says this function supports returning the result to either the host or the device. The function declaration looks like:

cublasStatus_t cublasDasum(cublasHandle_t handle, int n, const double *x, int incx, double *result);

And I allocated the result using:

double *result;
cudaMalloc((void **)&result, sizeof(double));

According to the documentation, result can be on either the host or the device. Does anyone have any idea what I might be doing wrong? Thanks.

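Ok, so I decided to do the most pragmatic thing and actually read the documentation. The default pointer mode is CUBLAS_POINTER_MODE_HOST, so cuBLAS was treating my device pointer as a host address. The above code works after first calling: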

cublasSetPointerMode(handle, CUBLAS_POINTER_MODE_DEVICE);

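Also, just FYI: cuBLAS functions are asynchronous with respect to the host, so with a device result the call can return before the value has actually been written. This was giving me other problems.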
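
For anyone landing here later, this is a minimal sketch of the whole sequence as I understand it, not a definitive implementation. The helper names (check_cuda, check_cublas, asum_device_result) and the test vector are made up for illustration; the cuBLAS and CUDA runtime calls are the standard ones from cublas_v2.h and cuda_runtime.h. It assumes the handle uses the default stream.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
static void check_cuda(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Hypothetical helper: abort with a message if a cuBLAS call fails.
static void check_cublas(cublasStatus_t st, const char *what) {
    if (st != CUBLAS_STATUS_SUCCESS) {
        std::fprintf(stderr, "%s failed with cuBLAS status %d\n", what, (int)st);
        std::exit(EXIT_FAILURE);
    }
}

// Hypothetical helper: runs cublasDasum with the result written to device
// memory, then copies that single double back to the host and returns it.
double asum_device_result(const std::vector<double> &h_x) {
    const int n = static_cast<int>(h_x.size());
    double *d_x = nullptr;
    double *d_result = nullptr;  // note: a pointer, allocated on the device

    check_cuda(cudaMalloc((void **)&d_x, n * sizeof(double)), "cudaMalloc(x)");
    check_cuda(cudaMalloc((void **)&d_result, sizeof(double)), "cudaMalloc(result)");
    check_cuda(cudaMemcpy(d_x, h_x.data(), n * sizeof(double),
                          cudaMemcpyHostToDevice), "cudaMemcpy(x)");

    cublasHandle_t handle;
    check_cublas(cublasCreate(&handle), "cublasCreate");

    // Tell cuBLAS that scalar results are device pointers. Without this,
    // the default host pointer mode makes cuBLAS write through d_result
    // as if it were a host address.
    check_cublas(cublasSetPointerMode(handle, CUBLAS_POINTER_MODE_DEVICE),
                 "cublasSetPointerMode");

    check_cublas(cublasDasum(handle, n, d_x, 1, d_result), "cublasDasum");

    // The call above may return before the result is written. The blocking
    // cudaMemcpy below would normally wait as well; the explicit
    // synchronization just makes the dependency visible.
    check_cuda(cudaDeviceSynchronize(), "cudaDeviceSynchronize");

    double h_result = 0.0;
    check_cuda(cudaMemcpy(&h_result, d_result, sizeof(double),
                          cudaMemcpyDeviceToHost), "cudaMemcpy(result)");

    cublasDestroy(handle);
    cudaFree(d_x);
    cudaFree(d_result);
    return h_result;
}

int main() {
    std::vector<double> x = {1.0, -2.0, 3.0, -4.0};
    std::printf("asum = %f\n", asum_device_result(x));
    return 0;
}
```

Build with something like nvcc example.cu -lcublas. One thing to keep in mind: cublasDasum returns the sum of absolute values, so it only matches a plain sum when every element is non-negative. If the result is only needed by a later kernel on the same stream, the synchronize and copy-back can be dropped, which is the main reason to keep the result on the device in the first place.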