Getting wrong results when calling cuBLAS through C++/CLI and C#

I have written a wrapper in C++11/CLI with Visual Studio to use CUDA’s cuBLAS library. I am using CUDA Toolkit 7.0.

Here is the source code of my wrapper:

#pragma once

#include "stdafx.h"
#include "BLAS.h"
#include "cuBLAS.h"

namespace lab
{
    namespace Mathematics
    {
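        namespace CUDA
        {
            void BLAS::DAXPY(int n, double alpha, const array<double> ^x, int incx, array<double> ^y, int incy)
            {
                // Pin the managed arrays so the GC cannot move them while native code uses the pointers
                pin_ptr<double> xPtr = &(x[0]);
                pin_ptr<double> yPtr = &(y[0]);
                // The native cuBLAS::DAXPY takes alpha by pointer
                pin_ptr<double> alphaPtr = &alpha;

                cuBLAS::DAXPY(n, alphaPtr, xPtr, incx, yPtr, incy);
            }
        }
    }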
}

To test this code, I wrote the following test in C#:

using System;
using Microsoft.VisualStudio.TestTools.UnitTesting;
using System.Linq;
using lab.Mathematics.CUDA;

namespace lab.Mathematics.CUDA.Test
{
  [TestClass]
  public class TestBLAS
  {
    [TestMethod]
    public void TestDAXPY()
    {
        var count = 10;
        var alpha = 1.0;
        var a = Enumerable.Range(0, count).Select(x => Convert.ToDouble(x)).ToArray();
        var b = Enumerable.Range(0, count).Select(x => Convert.ToDouble(x)).ToArray();

        // Call CUDA
        BLAS.DAXPY(count, alpha, a, 1, b, 1);

        // Validate results
        for (int i = 0; i < count; i++)
        {
            Assert.AreEqual(i + i, b[i]);
        }
    }
  }
}

The program compiles for x64 without errors, but the results differ every time I run the test. More precisely, the output array b contains different values on each run, and I don’t know why.

I am also adding my CUDA code in case someone can spot a problem there. Note that I get no errors or warnings when compiling. I also wonder whether I need to change some compilation settings; so far I have changed nothing and used the default options.

void cuBLAS::DAXPY(int n, const double *alpha, const double *x, int incx, double *y, int incy)
{
    // Allocate GPU memory
    double *devX, *devY;
    cudaMalloc((void **)&devX, (size_t)n*sizeof(*devX));
    cudaMalloc((void **)&devY, (size_t)n*sizeof(*devY));

    // Create cuBLAS handle
    cublasHandle_t handle;
    cublasCreate(&handle);

    // Initialize the input matrix and vector
    cublasSetVector(n, sizeof(*devX), x, incx, devX, incx);

    // Call cuBLAS function
    cublasDaxpy(handle, n, alpha, devX, incx, devY, incy);

    // Retrieve resulting vector
    cublasGetVector(n, sizeof(*devY), devY, incy, y, incy);

    // Free GPU resources
    cudaFree(devX);
    cudaFree(devY);
    cublasDestroy(handle);
}

Hi afshiinzkh,

This is the Nsight Visual Studio forum. For CUDA programming questions, please ask in the CUDA Programming and Performance forum; for cuBLAS questions, please ask in the GPU-Accelerated Libraries forum.

Best Regards
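
One thing stands out in the posted cuBLAS::DAXPY, judging only from the code shown: cublasSetVector is called for x but never for y. When cublasDaxpy computes y = alpha*x + y on the device, devY therefore still holds whatever cudaMalloc returned, which would be consistent with b changing from run to run. Uploading y to devY with a second cublasSetVector call before cublasDaxpy is probably the first thing to try. The CUDA and cuBLAS return codes are not checked either, so a failing call would go unnoticed.

To check the GPU result independently of the wrapper, a plain CPU reference can help. The sketch below is host-only C++; daxpyReference and makeRange are hypothetical helper names, not part of cuBLAS or the posted wrapper, and only positive increments are handled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for DAXPY: y[i*incy] = alpha * x[i*incx] + y[i*incy] for i in [0, n).
// Hypothetical helper for validation only; not part of cuBLAS or the posted wrapper.
// Simplification (assumption): only positive increments are supported.
void daxpyReference(int n, double alpha,
                    const std::vector<double>& x, int incx,
                    std::vector<double>& y, int incy)
{
    assert(incx > 0 && incy > 0);
    if (n <= 0) {
        return; // nothing to do, matching the BLAS convention for n <= 0
    }

    const std::size_t count = static_cast<std::size_t>(n);
    const std::size_t strideX = static_cast<std::size_t>(incx);
    const std::size_t strideY = static_cast<std::size_t>(incy);

    // Both vectors must be long enough for the last strided element.
    assert(x.size() >= 1 + (count - 1) * strideX);
    assert(y.size() >= 1 + (count - 1) * strideY);

    for (std::size_t i = 0; i < count; ++i) {
        // y is read as well as written, so its input values matter.
        y[i * strideY] += alpha * x[i * strideX];
    }
}

// Builds the same input as the C# test: 0.0, 1.0, ..., count - 1.
std::vector<double> makeRange(int count)
{
    std::vector<double> values;
    values.reserve(count > 0 ? static_cast<std::size_t>(count) : 0);
    for (int i = 0; i < count; ++i) {
        values.push_back(static_cast<double>(i));
    }
    return values;
}
```

With the inputs from the C# test (x and y both 0..9, alpha = 1.0), the reference leaves y[i] equal to 2*i, which is what Assert.AreEqual(i + i, b[i]) expects from the GPU path once y has been copied to the device.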