Problems with the summation of arrays: there are no values in the array

Hi all!

I am new to CUDA programming.

I wrote a program that sums two arrays into a third array.

For some reason, the target array C is always zero, even after the addition. Can you tell me what I’m doing wrong?

The source code of Cuda.cu:

#include <iostream>
#include <cuda_runtime.h>

__global__ void sum(float *A, float *B, float *C)
{
    int n = blockDim.x * blockIdx.x + threadIdx.x;
    C[n] = A[n] + B[n];
}

void StartSum(float *A, float *B, float *C, int N)
{
    sum<<< N/64, 64 >>>(A, B, C) ;
}

The source code that initializes the arrays and calls the summation:

#include <windows.h>
#include <cuda.h>
#include <cuda_runtime.h>
#include <cuda_runtime_api.h>
#include <iostream>

#define N 5

void StartSum(float *A, float *B, float *C, int n);

int main()
{
    float a[N] = {1,2,3,4,5}, b[N]={-2,-4,5,7,1}, c[N] = {0,0,0,0,0};
    cudaError_t err;
    float *dev_a , *dev_b , *dev_c ;

    cudaSetDevice(0);

    cudaMalloc((void**)&dev_a , sizeof (float)*N);
    cudaMalloc((void**)&dev_b , sizeof (float)*N);
    cudaMalloc((void**)&dev_c , sizeof (float)*N);

    err = cudaMemcpy(dev_a, a, sizeof(float)*N, cudaMemcpyHostToDevice);
    err = cudaMemcpy(dev_b, b, sizeof(float)*N, cudaMemcpyHostToDevice);
    err = cudaMemcpy(dev_c, c, sizeof(float)*N, cudaMemcpyHostToDevice);

    StartSum(dev_a, dev_b, dev_c, N);

    err = cudaMemcpy(c, dev_c, sizeof(float), cudaMemcpyDeviceToHost);

    for (int i = 0; i<N; i++)
        std::cout<<c[i]<<" ";
    std::cout<<std::endl;

    system("PAUSE");
}

When printing the results I always get zeros from std::cout<<c[i]<<" ";

Thank you in advance for your help.

Hi,

Your problem (at least your first problem) is that your kernel is never launched: 5/64 is 0 in integer arithmetic, so you are asking for a grid of zero blocks.

If you checked the pre-launch error status with cudaGetLastError(), you would get an “invalid configuration argument” error because of this. To convince yourself, just try this:

$ cat gridDim.cu
#include <stdio.h>
#include <cuda.h>

__global__ void foo() {
	if (threadIdx.x==0)
		printf("in kernel, gridDim and blockDim are %d %d\n", gridDim.x, blockDim.x);
}

int main() {
	foo<<<0,1>>>();
	printf("%s\n", cudaGetErrorString(cudaGetLastError()));
	foo<<<1,1>>>();
	printf("%s\n", cudaGetErrorString(cudaGetLastError()));
	cudaDeviceSynchronize();
	foo<<<5/64,64>>>();
	printf("%s\n", cudaGetErrorString(cudaGetLastError()));
	return 0;
}
$ nvcc -arch=sm_21 -o gridDim gridDim.cu
$ ./gridDim
invalid configuration argument
no error
in kernel, gridDim and blockDim are 1 1
invalid configuration argument

Then, I guess you’ll also have to add a test somewhere in your kernel to avoid accessing out-of-bounds data (such as an “if(n<N)” test), because once the grid size is rounded up there will be more threads than array elements.
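For reference, here is a minimal sketch of how the two fixes mentioned above could fit together. It is not the original poster's final code: the extra kernel parameter count and the variables threads and blocks are names introduced here for illustration. The grid size is rounded up so that at least one block is launched, and the kernel skips threads that fall past the end of the arrays. Note also that the copy back in the original post transfers only sizeof(float) bytes, so this sketch copies all N elements.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Element-wise sum C = A + B. "count" (an added parameter, not in the
// original post) is the number of valid elements in each array.
__global__ void sum(const float *A, const float *B, float *C, int count)
{
    int n = blockDim.x * blockIdx.x + threadIdx.x;
    if (n < count)              // threads past the end of the arrays do nothing
        C[n] = A[n] + B[n];
}

int main()
{
    const int N = 5;
    const int threads = 64;                          // block size used in the thread
    const int blocks = (N + threads - 1) / threads;  // rounds up: 1 block for N = 5

    float a[N] = {1, 2, 3, 4, 5}, b[N] = {-2, -4, 5, 7, 1}, c[N] = {0};
    float *dev_a, *dev_b, *dev_c;

    cudaMalloc((void**)&dev_a, sizeof(float) * N);
    cudaMalloc((void**)&dev_b, sizeof(float) * N);
    cudaMalloc((void**)&dev_c, sizeof(float) * N);
    cudaMemcpy(dev_a, a, sizeof(float) * N, cudaMemcpyHostToDevice);
    cudaMemcpy(dev_b, b, sizeof(float) * N, cudaMemcpyHostToDevice);

    sum<<<blocks, threads>>>(dev_a, dev_b, dev_c, N);

    // A bad launch configuration is reported here, right after the launch.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        printf("launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Copy all N results back, not just the first float.
    err = cudaMemcpy(c, dev_c, sizeof(float) * N, cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        printf("copy failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int i = 0; i < N; i++)
        printf("%g ", c[i]);
    printf("\n");

    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_c);
    return 0;
}
```

With the input arrays from the question, the element-wise sums are -1, -2, 8, 11 and 6. Rounding the grid size up launches 64 threads for 5 elements, which is why the bounds test inside the kernel is needed.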

Thank you very much, dear Gilles_C!
It worked!
I will continue to study this interesting subject.

Sorry for the stupid question … summing two arrays works now.

How do I find the sum of the elements of a single array?

I tried to do this:

__global__ void Summation(float *A, float *C)
{
	int n = blockDim.x * blockIdx.x + threadIdx.x;
	*C += A[n];
}

but this version does not work …


I solved the problem this way:

for (int i=0; i<count; i++)
    *C += A[i];

Do you think this approach is correct from a CUDA point of view?

It does the job: the array gets summed. But I doubt that I approached the problem the right way.

Thank you.

The last approach won’t help you much: the sum of the array is computed serially, not in parallel, so you get no benefit from the GPU’s cores. Take a look at parallel reduction.
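To make the suggestion concrete, here is a rough sketch of a block-level parallel reduction. It is only one of several ways to do it, not a tuned implementation. The first attempt with *C += A[n] fails because many threads read and write *C at the same time without any synchronization, so updates get lost. In the sketch below, each block first adds up its own elements in shared memory, and only one thread per block then adds the block's partial sum to *C with atomicAdd (for float this needs compute capability 2.0 or later). The THREADS constant, the partial array and the count parameter are names assumed here for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define THREADS 64  // block size; must be a power of two for the halving loop

// Sums the first "count" elements of A into *C. *C must be zero before launch.
__global__ void Summation(const float *A, float *C, int count)
{
    __shared__ float partial[THREADS];

    int n = blockDim.x * blockIdx.x + threadIdx.x;
    partial[threadIdx.x] = (n < count) ? A[n] : 0.0f;  // pad with zeros
    __syncthreads();

    // Tree reduction: halve the number of active threads at each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            partial[threadIdx.x] += partial[threadIdx.x + stride];
        __syncthreads();
    }

    // One thread per block adds the block's partial sum to the result.
    if (threadIdx.x == 0)
        atomicAdd(C, partial[0]);
}

int main()
{
    const int count = 5;
    float a[count] = {1, 2, 3, 4, 5}, total = 0.0f;
    float *dev_a, *dev_c;

    cudaMalloc((void**)&dev_a, sizeof(float) * count);
    cudaMalloc((void**)&dev_c, sizeof(float));
    cudaMemcpy(dev_a, a, sizeof(float) * count, cudaMemcpyHostToDevice);
    cudaMemset(dev_c, 0, sizeof(float));     // the accumulator starts at zero

    int blocks = (count + THREADS - 1) / THREADS;   // round up
    Summation<<<blocks, THREADS>>>(dev_a, dev_c, count);

    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        printf("launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaMemcpy(&total, dev_c, sizeof(float), cudaMemcpyDeviceToHost);
    printf("sum = %g\n", total);             // 1 + 2 + 3 + 4 + 5 = 15

    cudaFree(dev_a);
    cudaFree(dev_c);
    return 0;
}
```

The kernel must be launched with exactly THREADS threads per block, since the shared array is sized for that. For large arrays the order of the floating-point additions differs from a serial loop, so the result can differ slightly in the last digits.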