So, I have the following code:
int main(void){
    float host_sums[numberofBlocks];
    float *dev_sums;
    float totalSum = 0.0f;

    cudaMalloc((void**)&dev_sums, sizeof(float) * totalofThreads);
    __solve_trap<<<numberofBlocks, numberofThreads>>>(A,B,dev_sums);
    cudaMemcpy(host_sums, dev_sums, sizeof(float) * totalofThreads, cudaMemcpyDeviceToHost);

    for(int i = 0; i < numberofBlocks; i++) totalSum += host_sums[i]; //Error->*
    totalSum -= trap_error;
    printf("%f",totalSum);

    cudaFree(dev_sums);
    return 0;
}
When I run the program, the result is wrong: at the line marked ‘*’, host_sums[i] is being added to a totalSum that is already nonzero. I debugged the program: immediately before the call to cudaMemcpy I have totalSum == 0, but immediately after it I have totalSum != 0. If I put float totalSum = 0 after the memcpy instead, I don't get the error.
So, what is the actual problem with initializing variables before a cudaMemcpy rather than after it? What causes this error?
Edit: I had an error. The correct line is:
cudaMemcpy(host_sums, dev_sums, sizeof(float) * numberofBlocks, cudaMemcpyDeviceToHost);
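The corrected size matters because host_sums only has room for numberofBlocks floats, while the original call copied totalofThreads floats into it. Writing past the end of a stack array is undefined behavior in C, and in practice it tends to overwrite neighbouring stack variables, which would explain totalSum changing across the cudaMemcpy. Below is a minimal plain C sketch of the size arithmetic only (no CUDA calls); the helper names bytes_needed, copy_fits and report_copy are hypothetical and not part of the CUDA API.

```c
#include <stddef.h>
#include <stdio.h>

/* Bytes occupied by `count` elements of `elem_size` bytes each. */
size_t bytes_needed(size_t count, size_t elem_size) {
    return count * elem_size;
}

/* Returns 1 if copying `count` elements fits in a destination that holds
   `capacity` elements of the same type, 0 if it would run past the end. */
int copy_fits(size_t capacity, size_t count) {
    return count <= capacity;
}

/* Prints a short report for one copy of floats and returns the number of
   bytes that would be written past the end of the destination buffer. */
size_t report_copy(const char *label, size_t capacity, size_t count) {
    size_t have = bytes_needed(capacity, sizeof(float));
    size_t want = bytes_needed(count, sizeof(float));
    size_t overrun = copy_fits(capacity, count) ? 0 : want - have;
    printf("%s: buffer %zu bytes, copy %zu bytes, overrun %zu bytes\n",
           label, have, want, overrun);
    return overrun;
}
```

As an illustration with made-up sizes of 4 blocks of 256 threads each, report_copy("original", 4, 4 * 256) reports a copy far larger than the buffer, while report_copy("fixed", 4, 4) reports no overrun. Which variables the excess bytes land on depends on how the compiler lays out the stack, so moving the initialization of totalSum after the copy only hides the symptom; the copy size has to match the host buffer.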