Point-to-Point Vector Multiply

Hello all,

I am looking for an API call that does point-to-point (element-wise) multiplication of two vectors. I am going to use it to implement a convolution: two forward FFTs, an element-wise multiplication of the results, then a single inverse FFT.
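
To make the pipeline concrete, here is a minimal host-side sketch of the same idea in plain C++, with a naive O(n^2) DFT standing in for the FFT library. The names dft and circularConvolve are hypothetical, and this is only a reference for small inputs, not the GPU code being timed. Note that multiplying the two spectra gives a circular convolution; a linear convolution would need the inputs zero-padded first.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Naive O(n^2) DFT, unnormalized.
// sign = -1 gives the forward transform, sign = +1 the inverse.
std::vector<cd> dft(const std::vector<cd>& in, int sign)
{
    const std::size_t n = in.size();
    const double pi = std::acos(-1.0);
    std::vector<cd> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        cd sum(0.0, 0.0);
        for (std::size_t j = 0; j < n; ++j) {
            const double angle = sign * 2.0 * pi *
                static_cast<double>(k * j) / static_cast<double>(n);
            sum += in[j] * std::polar(1.0, angle);
        }
        out[k] = sum;
    }
    return out;
}

// Circular convolution: two forward transforms, an element-wise
// product, one inverse transform, then division by n to normalize.
std::vector<cd> circularConvolve(const std::vector<cd>& a,
                                 const std::vector<cd>& b)
{
    assert(a.size() == b.size());
    std::vector<cd> fa = dft(a, -1);
    const std::vector<cd> fb = dft(b, -1);
    for (std::size_t i = 0; i < fa.size(); ++i)
        fa[i] *= fb[i];                      // the point-to-point multiply
    std::vector<cd> c = dft(fa, +1);
    for (cd& v : c)
        v /= static_cast<double>(c.size());  // undo the missing 1/n scaling
    return c;
}
```

With a = {1 2 3 4} and b = {4 5 6 7} this gives the circular convolution {56 58 56 50}, up to rounding error.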

I have copied the kernel I am currently using below. As a simple example, if a = {1 2 3 4} and b = {4 5 6 7}, the output of my kernel would be output = {1*4 2*5 3*6 4*7} = {4 10 18 28}. My total convolution takes 63 ms, but pointMultiply alone is a whopping 30 ms of that time!

Is there a cuBLAS API that does this?

#include <cuComplex.h>

// Complex multiplication: (a.x + i*a.y) * (b.x + i*b.y)
static __device__ __host__ inline cuComplex ComplexMul(cuComplex a, cuComplex b)
{
	cuComplex c;
	c.x = a.x * b.x - a.y * b.y;
	c.y = a.x * b.y + a.y * b.x;
	return c;
}

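// In-place element-wise product: a[i] = a[i] * b[i] for i in [0, size)
__global__ void pointMultiply(cuComplex *a, const cuComplex *b, int size)
{
	int i = blockIdx.x * blockDim.x + threadIdx.x;
	// Threads past the end of the arrays do nothing
	if(i >= size)
		return;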

	a[i] = ComplexMul(a[i], b[i]);
}
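
For checking the kernel's output, a host-side reference along the following lines might be useful. This is only a sketch: HostComplex is a hypothetical stand-in for cuComplex (same x = real, y = imaginary fields) so that it builds without the CUDA headers, and hostComplexMul and pointMultiplyHost are made-up names that mirror ComplexMul and pointMultiply above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for cuComplex: x is the real part, y the imaginary part.
struct HostComplex {
    float x;
    float y;
};

// Same arithmetic as ComplexMul in the kernel.
inline HostComplex hostComplexMul(HostComplex a, HostComplex b)
{
    HostComplex c;
    c.x = a.x * b.x - a.y * b.y;
    c.y = a.x * b.y + a.y * b.x;
    return c;
}

// In-place element-wise product a[i] = a[i] * b[i], one loop
// iteration per GPU thread in the kernel version.
inline void pointMultiplyHost(std::vector<HostComplex>& a,
                              const std::vector<HostComplex>& b)
{
    assert(a.size() == b.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        a[i] = hostComplexMul(a[i], b[i]);
}
```

Running pointMultiplyHost on the real-valued example from above, a = {1 2 3 4} and b = {4 5 6 7}, leaves a = {4 10 18 28} with all imaginary parts zero, which is what the kernel should produce as well.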