The kernel always returns values equal to zero

barbaro2014 · November 13, 2014, 12:58pm

I am a beginner in CUDA. I am trying to add two 10-element vectors, but the result is always zero, and I do not understand why. Here is my complete code; it is very simple.

#include "stdafx.h"
#include <stdio.h>
#include <time.h>
#include <conio.h>
#include <iostream>
#include <cuda.h>
#include <cuda_runtime.h>

using namespace std;

#define N 10

__global__ void Suma_vec( int *a, int *b, int *c, int n )
{
int tid = threadIdx.x; // Thread identifier
c[tid] = a[tid] + b[tid];
}

int main(void)
{
int A[N], B[N], C[N];
int *dA, *dB, *dC;
srand (time(NULL));

//Create vector A
for(int i=0; i<N; i++)
   A[i] = rand() % 101; 

//Create vector B
for(int i=0; i<N; i++)
   B[i] = rand() % 101; 

//Allocate memory on the GPU
cudaMalloc( (void**)&dA, N * sizeof(int)); 
cudaMalloc( (void**)&dB, N * sizeof(int)); 
cudaMalloc( (void**)&dC, N * sizeof(int)); 

//Copy vectors A and B to the GPU
cudaMemcpy( dA, A, N * sizeof(int), cudaMemcpyHostToDevice);
cudaMemcpy( dB, B, N * sizeof(int), cudaMemcpyHostToDevice);

Suma_vec<<<1,N>>>(dA, dB, dC, N);

//Copy the result from the GPU into vector C on the CPU
cudaMemcpy( C, dC, N * sizeof(int), cudaMemcpyDeviceToHost);

for (int j=0; j<N; j++)
	{
	cout<<A[j]<<"\t"<<B[j]<<"\t"<<C[j]<<endl; 
	}

cudaFree( dA);
cudaFree( dB);
cudaFree( dC);

getch();
return 0;

}

hadschi118 · November 13, 2014, 1:04pm

Your code works on my machine (GTX580, CUDA6.5). I only removed headers that I do not have (stdafx.h, conio.h) and the call to getch().

barbaro2014 · November 13, 2014, 1:14pm

I use a GeForce 8400 GS, CUDA 6.5 and Visual Studio 10. The results are always zero. The data is copied to GPU memory correctly: if instead of

cudaMemcpy (C, dC, N * sizeof (int), cudaMemcpyDeviceToHost);

I use

cudaMemcpy (C, dA, N * sizeof (int), cudaMemcpyDeviceToHost);

then the first and third columns are equal. The problem is that the sum is never computed. Can you help me?

hadschi118 · November 13, 2014, 1:58pm

What you should do in any case is to check for errors after the kernel call with something like

cudaDeviceSynchronize();
cudaError_t error = cudaGetLastError();
if(error!=cudaSuccess)
{
   fprintf(stderr,"ERROR: %s\n", cudaGetErrorString(error) );
   exit(-1);
}
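The same check can be wrapped in a small macro so that every runtime call and the kernel launch are tested the same way. The following is a minimal sketch, not a definitive implementation: CUDA_CHECK is a hypothetical helper name (it is not part of the CUDA toolkit), and fixed inputs replace the random ones from the original program.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define N 10

// CUDA_CHECK is a hypothetical helper name, not something the toolkit provides.
// It prints the file, line, and error string, then exits on any failure.
#define CUDA_CHECK(call)                                                 \
    do {                                                                 \
        cudaError_t err_ = (call);                                       \
        if (err_ != cudaSuccess) {                                       \
            fprintf(stderr, "ERROR at %s:%d: %s\n", __FILE__, __LINE__,  \
                    cudaGetErrorString(err_));                           \
            exit(EXIT_FAILURE);                                          \
        }                                                                \
    } while (0)

__global__ void Suma_vec(const int *a, const int *b, int *c, int n)
{
    int tid = threadIdx.x;
    if (tid < n)  // ignore any thread beyond the end of the vectors
        c[tid] = a[tid] + b[tid];
}

int main(void)
{
    int A[N], B[N], C[N];
    int *dA, *dB, *dC;

    // Fixed inputs so the expected sums are easy to verify by eye
    for (int i = 0; i < N; i++) { A[i] = i; B[i] = 10 * i; }

    CUDA_CHECK(cudaMalloc((void**)&dA, N * sizeof(int)));
    CUDA_CHECK(cudaMalloc((void**)&dB, N * sizeof(int)));
    CUDA_CHECK(cudaMalloc((void**)&dC, N * sizeof(int)));

    CUDA_CHECK(cudaMemcpy(dA, A, N * sizeof(int), cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(dB, B, N * sizeof(int), cudaMemcpyHostToDevice));

    Suma_vec<<<1, N>>>(dA, dB, dC, N);
    CUDA_CHECK(cudaGetLastError());       // launch failures, e.g. "invalid device function"
    CUDA_CHECK(cudaDeviceSynchronize());  // failures while the kernel is running

    CUDA_CHECK(cudaMemcpy(C, dC, N * sizeof(int), cudaMemcpyDeviceToHost));

    for (int j = 0; j < N; j++)
        printf("%d\t%d\t%d\n", A[j], B[j], C[j]);

    CUDA_CHECK(cudaFree(dA));
    CUDA_CHECK(cudaFree(dB));
    CUDA_CHECK(cudaFree(dC));
    return 0;
}
```

Checking cudaGetLastError() right after the launch separates launch problems from errors that only appear once the kernel runs, which cudaDeviceSynchronize() reports.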

barbaro2014 · November 13, 2014, 2:13pm

ERROR: invalid device function

What does this mean?

Robert_Crovella · November 13, 2014, 3:14pm

Your GeForce 8400 GS is a compute capability 1.1 GPU:

https://developer.nvidia.com/cuda-gpus

If you are using CUDA 6.5, and provide no arch switches, the default compilation target is cc2.0, which won’t run on your GPU (invalid device function).
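The compute capability can also be read from the device at run time instead of looked up in the table. A minimal sketch using the runtime API calls cudaGetDeviceCount and cudaGetDeviceProperties; the output format is only an illustration:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main(void)
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "ERROR: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Print the name and compute capability (major.minor) of every device
    for (int dev = 0; dev < count; dev++) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            fprintf(stderr, "ERROR: %s\n", cudaGetErrorString(err));
            return 1;
        }
        printf("Device %d: %s, compute capability %d.%d\n",
               dev, prop.name, prop.major, prop.minor);
    }
    return 0;
}
```

This program launches no kernels, so it should run regardless of which architecture the build targets.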

You will need to specifically target a cc1.1 GPU when you compile. When you do so, CUDA will provide some warning messages that cc1.1 is deprecated, but the compiler will still work.

There are many resources on the web which explain how to target a different compute capability in Visual Studio.

In a nutshell, you should be able to go into your project properties > CUDA C/C++ properties > Device, and change the target to compute_11,sm_11.
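Outside Visual Studio, nvcc accepts the same target on the command line through its -arch or -gencode options, and it invokes the host compiler (cl.exe on Windows) itself. The sketch below rests on assumptions: the file name report_arch.cu and the kernel report_arch are hypothetical, and the build lines in the comment assume a toolkit that still accepts cc1.1 targets, such as the CUDA 6.5 used in this thread. The kernel stores the __CUDA_ARCH__ value it was compiled for so the host can print it.

```cuda
// Hypothetical file name: report_arch.cu
// Possible build lines (assuming CUDA 6.5, which still accepts cc1.1 targets):
//   nvcc -arch=sm_11 report_arch.cu -o report_arch
//   nvcc -gencode arch=compute_11,code=sm_11 report_arch.cu -o report_arch
#include <cstdio>
#include <cuda_runtime.h>

__global__ void report_arch(int *out)
{
#ifdef __CUDA_ARCH__
    *out = __CUDA_ARCH__;  // e.g. 110 when compiled for compute_11
#endif
}

int main(void)
{
    int *d_arch = NULL;
    int h_arch = 0;

    cudaError_t err = cudaMalloc((void**)&d_arch, sizeof(int));
    if (err == cudaSuccess) {
        report_arch<<<1, 1>>>(d_arch);
        err = cudaGetLastError();  // reports "invalid device function" on a target mismatch
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(&h_arch, d_arch, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_arch);  // freeing a NULL pointer is a no-op

    if (err != cudaSuccess) {
        fprintf(stderr, "ERROR: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("Device code was compiled for __CUDA_ARCH__ = %d\n", h_arch);
    return 0;
}
```

If the binary was built without a matching target, the launch check is where the "invalid device function" message would be expected to appear.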

barbaro2014 · November 13, 2014, 3:28pm

Now it works. Thank you very much, and congratulations on your excellent forum.

SnehaShankar · January 31, 2015, 12:53pm

Hi, we tried the above steps and it works fine when compiled from Visual Studio. But when we compile the same code from the command window, the result is back to 0. Can you please suggest what should be done to get the right answer? Which compiler should be used? We are currently using the cl.exe application found in C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin
Awaiting your response. Thanks

mahtab_fooladi · January 25, 2018, 10:42am

Hi, I have a similar problem.
My machine has a GeForce 610M and CUDA 9.
I don't know which sm and compute values are best.
Please help me.

mahtab_fooladi · January 25, 2018, 2:12pm

The cc is 2.1 for my GPU.

Robert_Crovella · February 2, 2018, 6:26am

CUDA 9 won’t work with that GPU: it dropped support for cc 2.x devices, and CUDA 8 is the last version that supports them.