Problems with the summation of arrays: there are no values in the array
Hi all!
I am new to CUDA programming.
I wrote a program that sums two arrays into a third array.
For some reason, the target array C is always zero, even after the addition. Can you tell me what I'm doing wrong?
The source code of Cuda.cu:
[code]
#include <iostream>
#include <cuda_runtime.h>
__global__ void sum(float *A, float *B, float *C)
{
int n = blockDim.x * blockIdx.x + threadIdx.x;
C[n] = A[n] + B[n];
}
void StartSum(float *A, float *B, float *C, int N)
{
sum<<< N/64, 64 >>>(A, B, C) ;
}
[/code]
The source code that initializes the arrays and calls the summation:
[code]
#include <windows.h>
#include <cuda.h>
#include <cuda_runtime.h>
#include <cuda_runtime_api.h>
#include <iostream>
#define N 5
void StartSum(float *A, float *B, float *C, int n);

int main()
{
float a[N] = {1,2,3,4,5}, b[N]={-2,-4,5,7,1}, c[N] = {0,0,0,0,0};
cudaError_t err;
float *dev_a , *dev_b , *dev_c ;
cudaSetDevice(0);
cudaMalloc((void**)&dev_a , sizeof (float)*N);
cudaMalloc((void**)&dev_b , sizeof (float)*N);
cudaMalloc((void**)&dev_c , sizeof (float)*N);
err = cudaMemcpy(dev_a, a, sizeof(float)*N, cudaMemcpyHostToDevice);
err = cudaMemcpy(dev_b, b, sizeof(float)*N, cudaMemcpyHostToDevice);
err = cudaMemcpy(dev_c, c, sizeof(float)*N, cudaMemcpyHostToDevice);
StartSum(dev_a, dev_b, dev_c, N);
err = cudaMemcpy(c, dev_c, sizeof(float), cudaMemcpyDeviceToHost);
for (int i = 0; i<N; i++)
std::cout<<c[i]<<" ";
std::cout<<std::endl;
system("PAUSE");
}
[/code]
When I print the results with std::cout<<c[i]<<" "; I always get zeros.
Thank you in advance for your help.
#1
Posted 04/20/2012 04:37 AM   
Hi,
your problem (at least your first problem) is that your kernel is never launched, since 5/64 = 0 in integer division, so the grid has zero blocks.
Had you checked the launch error status, you would have gotten an "invalid configuration argument" error because of this. To convince yourself, just try this:
[code]
$ cat gridDim.cu
#include <stdio.h>
#include <cuda.h>

__global__ void foo() {
if (threadIdx.x==0)
printf("in kernel, gridDim and blockDim are %d %d\n", gridDim.x, blockDim.x);
}

int main() {
foo<<<0,1>>>();
printf("%s\n", cudaGetErrorString(cudaGetLastError()));
foo<<<1,1>>>();
printf("%s\n", cudaGetErrorString(cudaGetLastError()));
cudaDeviceSynchronize();
foo<<<5/64,64>>>();
printf("%s\n", cudaGetErrorString(cudaGetLastError()));
return 0;
}
$ nvcc -arch=sm_21 -o gridDim gridDim.cu
$ ./gridDim
invalid configuration argument
no error
in kernel, gridDim and blockDim are 1 1
invalid configuration argument
[/code]
Then, I guess you'll also have to add a test in your kernel to avoid accessing out-of-bounds data (like an "if(n<N)" test).
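For reference, here is a minimal sketch of how both fixes could look together. It rounds the block count up so that 5 elements still get one block, guards the kernel with a bounds check, and copies back all the elements (the copy back to the host in the original main() uses sizeof(float), which is a single element). The helper name addArrays, the extra count parameter and the use of std::vector are my own assumptions for illustration, not part of the original program.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each thread adds one pair of elements; the guard skips threads past the end.
__global__ void sum(const float *A, const float *B, float *C, int count)
{
    int n = blockDim.x * blockIdx.x + threadIdx.x;
    if (n < count)
        C[n] = A[n] + B[n];
}

// Hypothetical host helper (not from the original post): copies the inputs to
// the device, launches the kernel and returns the result.
// Returns an empty vector on any CUDA error or on mismatched input sizes.
std::vector<float> addArrays(const std::vector<float> &a, const std::vector<float> &b)
{
    const int count = static_cast<int>(a.size());
    if (count == 0 || b.size() != a.size())
        return {};

    const size_t bytes = count * sizeof(float);
    float *dev_a = nullptr, *dev_b = nullptr, *dev_c = nullptr;
    std::vector<float> c(count, 0.0f);

    bool ok = cudaMalloc(&dev_a, bytes) == cudaSuccess
           && cudaMalloc(&dev_b, bytes) == cudaSuccess
           && cudaMalloc(&dev_c, bytes) == cudaSuccess
           && cudaMemcpy(dev_a, a.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(dev_b, b.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        const int threads = 64;
        const int blocks = (count + threads - 1) / threads; // round up: 5 elements -> 1 block
        sum<<<blocks, threads>>>(dev_a, dev_b, dev_c, count);
        // Check the launch status, then copy back all count elements.
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(c.data(), dev_c, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_c);
    return ok ? c : std::vector<float>{};
}
```

Calling addArrays with the two arrays from the question should give one result per element, provided a CUDA device is available.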
#2
Posted 04/20/2012 05:06 AM   
Thank you very much, dear Gilles_C!
It worked!
I will continue to study this interesting subject.
#3
Posted 04/20/2012 01:12 PM   
Sorry for the stupid question ... summing two arrays now works.
How do I find the sum of the elements of a single array?
I tried to do this:
[code]__global__ void Summation(float *A, float *C)
{
int n = blockDim.x * blockIdx.x + threadIdx.x;

*C += A[n];
}[/code]
but this version does not work.
---
I solved the problem this way:
[code]
for (int i=0; i<count; i++)
*C += A[i];
[/code]
---
Do you think this approach is correct in terms of CUDA?
It does the job: the array gets summed. But I doubt that I approached the problem the right way.
Thank you.
#4
Posted 04/20/2012 05:32 PM   
The last approach won't help you much: the sum of the single array is computed serially by every core instead of in parallel. Take a look at [url="http://www.csce.uark.edu/~mqhuang/courses/5013/f2011/lab/Lab-5-scan.pdf"]Parallel Reduction[/url].
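To give a rough idea of what a parallel reduction looks like, here is a minimal sketch; it is one possible variant, not necessarily the one described in the linked material. The first attempt (*C += A[n] in every thread) fails because many threads read and write *C at the same time without synchronization. In the sketch, each block first sums its elements in shared memory by repeatedly halving the number of active threads, and then one thread per block adds the block's partial sum to *C with atomicAdd (float atomicAdd needs compute capability 2.0 or higher). The THREADS constant, the count parameter and the sumArray helper are my own assumptions for illustration.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define THREADS 64  // block size; must be a power of two for the halving loop

// Launch with exactly THREADS threads per block.
__global__ void Summation(const float *A, float *C, int count)
{
    __shared__ float partial[THREADS];

    int n = blockDim.x * blockIdx.x + threadIdx.x;
    // Threads past the end of the array contribute zero.
    partial[threadIdx.x] = (n < count) ? A[n] : 0.0f;
    __syncthreads();

    // Tree reduction: each step adds the upper half onto the lower half.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            partial[threadIdx.x] += partial[threadIdx.x + stride];
        __syncthreads();
    }

    // One thread per block adds the block's partial sum to the total.
    if (threadIdx.x == 0)
        atomicAdd(C, partial[0]);
}

// Hypothetical host helper: returns false on any CUDA error or bad argument.
bool sumArray(const std::vector<float> &a, float *result)
{
    const int count = static_cast<int>(a.size());
    if (count == 0 || result == nullptr)
        return false;

    float *dev_a = nullptr, *dev_c = nullptr;
    bool ok = cudaMalloc(&dev_a, count * sizeof(float)) == cudaSuccess
           && cudaMalloc(&dev_c, sizeof(float)) == cudaSuccess
           && cudaMemcpy(dev_a, a.data(), count * sizeof(float), cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemset(dev_c, 0, sizeof(float)) == cudaSuccess; // total starts at 0.0f

    if (ok) {
        const int blocks = (count + THREADS - 1) / THREADS; // round up
        Summation<<<blocks, THREADS>>>(dev_a, dev_c, count);
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(result, dev_c, sizeof(float), cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaFree(dev_a);
    cudaFree(dev_c);
    return ok;
}
```

Note that the order in which the blocks' partial sums are added is not fixed, so for general floating-point data the last digits of the result may vary slightly between runs.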
#5
Posted 04/27/2012 07:34 PM   