Call cublas API from kernel

Hi, I want to execute the cuBLAS API from a kernel. My launch configuration uses 2 blocks (it's only an example) and executes a kernel like this (it computes a matrix-vector multiplication, and each thread computes a dot product):

__global__ void product(double *dev_a0, double *dev_a1, double *dev_A0, double *dev_A1, double *result, int max, int n){
    int i;
    double prod = 0.0;

    for (i = 0; i < max; i++) {
        if (blockIdx.x == 0) {
            // i want to call the cublas dot product here!!!
            prod = prod + dev_a0[i] * dev_A0[i + n * threadIdx.x];
        }
        else if (blockIdx.x == 1) {
            // i want to call the cublas dot product here!!!
            prod = prod + dev_a1[i] * dev_A1[i + n * threadIdx.x];
        }
    }
    __syncthreads();
    // each block writes its result into a column
    result[threadIdx.x + n * blockIdx.x] = prod;
}

Is it possible to call the cuBLAS API for the dot product?
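(For reference, since each thread's loop is just a dot product of dev_a0 with one column of dev_A0, the work of a whole block is equivalent to a single host-side matrix-vector call. A minimal sketch using cublasDgemv, assuming the matrix has nThreads columns, max rows, leading dimension n with max <= n, and a hypothetical helper name product_host:)

```cuda
// Host-side sketch: result column for block 0 as y = A0^T * a0 via cublasDgemv.
// Error checking is omitted for brevity.
#include <cublas_v2.h>

void product_host(cublasHandle_t handle,
                  const double *dev_a0, const double *dev_A0,
                  double *dev_result, int max, int n, int nThreads)
{
    const double alpha = 1.0, beta = 0.0;
    // y = alpha * op(A) * x + beta * y, with op(A) = A^T.
    cublasDgemv(handle, CUBLAS_OP_T,
                max, nThreads,     // A is max x nThreads, column-major
                &alpha,
                dev_A0, n,         // leading dimension n matches the i + n*t indexing
                dev_a0, 1,
                &beta,
                dev_result, 1);    // one dot product per column, as in the kernel
}
```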

Yes, you can use the cuBLAS API from kernel code if you are running on a compute capability 3.5 or higher device, as mentioned in the documentation:

[url]http://docs.nvidia.com/cuda/cublas/index.html#device-api[/url]

The simpleDevLibCublas cuda sample code/project should be instructive:

[url]CUDA Samples :: CUDA Toolkit Documentation[/url]
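(As a rough illustration of what the sample demonstrates, a device-side cuBLAS call might look like the sketch below. It assumes the regular cuBLAS API names carry over to the device library, and requires separate compilation with dynamic parallelism, e.g. nvcc -arch=sm_35 -rdc=true ... -lcublas_device -lcudadevrt. Note that the device-side cuBLAS library was removed in CUDA 10.0, so this only applies to older toolkits.)

```cuda
// Sketch: calling cublasDdot from inside a kernel (device API).
// Error checking is omitted for brevity.
#include <cublas_v2.h>

__global__ void product_dev(const double *dev_a0, const double *dev_A0,
                            double *result, int max, int n)
{
    // Let a single thread issue the call: each device-side cuBLAS call
    // launches a child grid, so one call per thread would be very costly.
    if (threadIdx.x == 0 && blockIdx.x == 0) {
        cublasHandle_t handle;
        cublasCreate(&handle);
        // In device code every pointer is a device pointer, so the scalar
        // result must also be written to device memory.
        cublasSetPointerMode(handle, CUBLAS_POINTER_MODE_DEVICE);
        // Dot product of dev_a0 with column 0 of dev_A0 (leading dimension n):
        cublasDdot(handle, max, dev_a0, 1, dev_A0, 1, &result[0]);
        cublasDestroy(handle);
    }
}
```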

OK txbob, thanks for the reply. I am concerned that placing the cuBLAS API call inside the if could create divergence in the kernel execution. My question then is: is it correct to steer the execution across the multiprocessors this way (i.e., by checking the block IDs)?

Any if statement could cause divergence. That is true with or without CUBLAS, with or without dynamic parallelism.

I don’t understand your question: