Greetings,
I’m having trouble working out whether I got something wrong in my code or whether there is some issue (unclear to me) with copying 2D data between host and device. I’m using cudaMallocPitch() to allocate memory on the device side, and I want to check that the data copied with cudaMemcpy2D() is actually there.
Here is the example code (running on my machine):
#include <iostream>
#include <cuda_runtime.h>
using namespace std;

//-- evaluate the CUDA call once, report any error and leave main()
#define CUDA_SAFE(x) do { cudaError_t err_ = (x); \
    if (err_ != cudaSuccess) { \
        cout << "Error at " << __FILE__ << " : " \
             << __LINE__ << endl \
             << cudaGetErrorString(err_) << endl; \
        return -1; } } while (0)

int main(int argc, char** argv)
{
    size_t i, j, Row, Col, pitch;
    float **devP, **hostP;
    Row = 8;
    Col = 16;
    hostP = new float*[Row];
    for (i = 0; i < Row; i++) hostP[i] = new float[Col];

    //-- allocate device memory
    CUDA_SAFE(
        cudaMallocPitch(&devP, &pitch, Col*sizeof(float), Row)
    );

    //-- initialize host matrix (Row X Col)
    for (i = 0; i < Row; i++)
        for (j = 0; j < Col; j++)
            hostP[i][j] = (float)i+j;

    //-- print host information
    cout << " Before ========= " << endl;
    for (i = 0; i < Row; i++)
    {
        cout << "[" << i << "] ";
        for (j = 0; j < Col; j++)
            cout << hostP[i][j] << " ";
        cout << endl;
    }

    //-- copy host matrix to device
    CUDA_SAFE(
        cudaMemcpy2D(devP, pitch, hostP, Col*sizeof(float),
                     Col*sizeof(float), Row, cudaMemcpyHostToDevice)
    );

    //-- destroy host information
    for (i = 0; i < Row; i++)
        for (j = 0; j < Col; j++)
            hostP[i][j] = 99.0f;

    //-- copy back device to host matrix
    CUDA_SAFE(
        cudaMemcpy2D(hostP, Col*sizeof(float), devP, pitch,
                     Col*sizeof(float), Row, cudaMemcpyDeviceToHost)
    );

    //-- print updated host information
    cout << " After ========= " << endl;
    for (i = 0; i < Row; i++)
    {
        cout << "[" << i << "] ";
        for (j = 0; j < Col; j++)
            cout << hostP[i][j] << " ";
        cout << endl;
    }

    //-- free memory (device and host)
    cudaFree(devP);
    for (i = 0; i < Row; i++) delete [] hostP[i];
    delete [] hostP;
    return 0;
}
This is what I get:
Before sending to device
Before =========
[0] 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
[2] 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
[3] 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
[4] 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
[5] 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
[6] 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
[7] 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
After destroying the host-side data and copying the device’s copy back
After =========
[0] 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
[2] 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
[3] 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
[4] 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
[5] 5 6 7 8 9 10 11 12 99 99 99 99 99 99 99 99
[6] 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99
[7] 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99 99
If you change the Row and/or Col values, you’ll see that there is no visible pattern to this behavior. At least, no pattern I could find… :-(
I would appreciate any help; I think I’m blind to the reason my code behaves this way…
At the very least, could somebody run this code on their machine and see whether it shows the same behavior?
I’m using:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2012 NVIDIA Corporation
Built on Thu_Apr__5_00:24:31_PDT_2012
Cuda compilation tools, release 4.2, V0.2.1221
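One thing that may be worth checking, offered as a guess rather than a confirmed diagnosis: cudaMemcpy2D() takes plain pointers to linear memory and expects the source rows to sit exactly spitch bytes apart, starting at the address it is given. In the code above, hostP is an array of row pointers with each row allocated separately, so the bytes starting at hostP are pointer values followed by whatever the heap placed after them, not one Row x Col block of floats. The host-only sketch below illustrates the layout the copy expects. Here copy2d() is a hypothetical stand-in that only mimics the row-by-row, pitched semantics with memcpy, the 64-byte padding is an arbitrary assumption (the real pitch comes from cudaMallocPitch()), and roundTrip() is an illustrative helper, not part of any CUDA API.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical host-only stand-in for the pitched copy semantics:
// copies `height` rows of `width` bytes, where consecutive rows are
// `spitch` bytes apart in src and `dpitch` bytes apart in dst.
void copy2d(void* dst, std::size_t dpitch,
            const void* src, std::size_t spitch,
            std::size_t width, std::size_t height)
{
    for (std::size_t r = 0; r < height; ++r)
        std::memcpy(static_cast<char*>(dst) + r * dpitch,
                    static_cast<const char*>(src) + r * spitch,
                    width);
}

// Fills a contiguous rows x cols matrix, copies it into a padded buffer,
// overwrites the matrix with 99, copies it back, and reports whether
// every element was restored.
bool roundTrip(std::size_t rows, std::size_t cols)
{
    const std::size_t width = cols * sizeof(float);
    const std::size_t pitch = width + 64;      // assumed padding, for illustration only

    std::vector<float> host(rows * cols);      // ONE linear block, rows `width` bytes apart
    std::vector<char>  padded(rows * pitch);   // plays the role of the pitched allocation

    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < cols; ++j)
            host[i * cols + j] = static_cast<float>(i + j);

    copy2d(padded.data(), pitch, host.data(), width, width, rows);  // "host to device"
    std::fill(host.begin(), host.end(), 99.0f);                     // destroy host data
    copy2d(host.data(), width, padded.data(), pitch, width, rows);  // "device to host"

    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < cols; ++j)
            if (host[i * cols + j] != static_cast<float>(i + j))
                return false;
    return true;
}
```

If this guess is right, the equivalent change in the CUDA version would presumably be to declare hostP and devP as float*, allocate the host matrix as a single new float[Row*Col], pass that pointer to cudaMemcpy2D(), and index elements as hostP[i*Col + j].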