cudaMemcpy not working

I have problems using cudaMemcpy to copy my data from host to GPU/device memory. I have also created an SO question, but have not got a satisfying answer.

However, here is the code again:

CuArray.h:

#ifndef CUARRAY_H_
#define CUARRAY_H_

#include <cuda_runtime.h> // runtime API, including the typed cudaMalloc(T**, size_t) overload
template<class TType>
class CuArray
{
public:

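	int Rows;
	int Columns;
	int Elements;
	TType *ArrayPointer; // device address of the data, allocated in the constructor

	CuArray(int rows, int columns = 1)
	{
		this->Rows = rows;
		this->Columns = columns;
		Elements = this->Rows * this->Columns;

		// Only the address of the device buffer is stored in the descriptor.
		cudaMalloc(&this->ArrayPointer, sizeof(TType)*this->Elements);
	}

	// Non-virtual, so the descriptor has no vtable pointer and is a plain struct
	// that can be copied byte-for-byte with cudaMemcpy. The device buffer is not
	// freed here: GpuCreate deletes the host copy while the device copy still uses it.
	~CuArray()
	{
	}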

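	// Builds the descriptor on the host (the constructor allocates the data on the
	// device), then copies the descriptor itself into device memory.
	static CuArray<TType>* GpuCreate(int rows, int columns = 1)
	{
		CuArray<TType>* cuArray = new CuArray<TType>(rows, columns);
		CuArray<TType>* gpuCuArray;
		size_t size = sizeof(CuArray<TType>);
		cudaMalloc(&gpuCuArray, size);
		// Shallow copy: Rows, Columns, Elements and the value of ArrayPointer.
		cudaMemcpy(gpuCuArray, cuArray, size, cudaMemcpyHostToDevice);
		delete(cuArray);
		return gpuCuArray; // device address: only dereference it in device code
	}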
};

#endif

main.cu

#include "CuArray.h"
#include <stdio.h>

int main()
{
	CuArray<int*>* array = CuArray<int*>::GpuCreate(10);
	printf("pointer location: %p\n", (void*)&array); // host address of the variable
	printf("pointer address: %p\n", (void*)array);   // device address it holds
	getchar();
}

The output of the prints is the following:

pointer location: 0x7fffffffd718
pointer address: 0x7053e0200

The problem is in the static GpuCreate method: everything works except the call to cudaMemcpy. I am watching the contents of the gpuCuArray device pointer with NVIDIA's Nsight Eclipse Edition debugger, and every property is 0 except the memory location of the pointer. The location of the pointer is 0x7053e0200, which looks fine to me.

The values of cuArray are non-zero and as expected. So I think cudaMemcpy is not working (or rather, I am using it wrong somehow).

What am I doing wrong?

If you need any further information, please tell me and I will try to provide whatever you think is useful.

There are at least 2 problems with your SO posting.

  1. it is unclear if you are concerned about an actual functional issue with your code, or just concerned about an observation you are making in the debugger.

  2. You have not provided an MCVE (it’s defined on SO, look it up). An MCVE should be complete code that someone else could compile and run, without having to add anything, and see the issue. Most MCVEs written in C/C++ should include a main routine, for example. Furthermore, the expected behavior and the actual behavior must be defined. If you are concerned about the functional (input/output) behavior of your code, as opposed to just asking about a debugger observation, this should not depend on use of a debugger.

If your code has a functional issue, you should provide an MCVE on SO (or here) that defines the code, the actual behavior of your code, and the expected behavior. This last part should not depend on the debugger (unless you are merely asking about a debugger observation).

Since you’ve not clarified these things (either here or on SO), I’m personally not surprised that you have gotten an unsatisfying answer.

Okay. I edited this post with a complete reproducible example.

I am fairly sure that the observation is correct, because I run into a segmentation fault later in my original code, caused by ArrayPointer being 0 (nullptr).

But I can check that once I find out how to read out the values “manually”.

Nsight Eclipse Edition 7.5
nvcc Version build 8/11/2015
CUDA Card: GTX 960 (Compute Capability 5.2)
Display Card: GT 610
I am building PTX and GPU code for version 5.2

It seems to me that CuArray is a descriptor, not the actual matrix (which is pointed to by the ArrayPointer component of the descriptor). Your code copies the descriptor with cudaMemcpy(), not the actual matrix. In other words, this is a C/C++ level problem, nothing specific to CUDA.
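To make the distinction concrete, here is a minimal sketch of moving the matrix data itself, as opposed to the descriptor. It assumes a plain descriptor type mirroring the fields of CuArray<int>; IntArrayDesc, uploadMatrix and downloadMatrix are hypothetical names used only for illustration. Because ArrayPointer already holds a device address, it is used directly as the cudaMemcpy destination or source.

```cuda
#include <cuda_runtime.h>

// Hypothetical descriptor mirroring CuArray<int>: shape plus device pointer.
struct IntArrayDesc {
    int Rows;
    int Columns;
    int Elements;
    int *ArrayPointer;  // device address of Elements ints
};

// Hypothetical helper: copies Elements ints from host memory into the device
// buffer the descriptor points to (the matrix, not the descriptor).
cudaError_t uploadMatrix(const IntArrayDesc &desc, const int *hostData)
{
    return cudaMemcpy(desc.ArrayPointer, hostData,
                      sizeof(int) * desc.Elements, cudaMemcpyHostToDevice);
}

// Hypothetical helper: copies the matrix data back from the device buffer.
cudaError_t downloadMatrix(const IntArrayDesc &desc, int *hostData)
{
    return cudaMemcpy(hostData, desc.ArrayPointer,
                      sizeof(int) * desc.Elements, cudaMemcpyDeviceToHost);
}
```

Copying sizeof(IntArrayDesc) bytes moves only Rows, Columns, Elements and the pointer value; the ints behind ArrayPointer need their own cudaMemcpy of sizeof(int) * Elements bytes.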

I still can’t figure out what you are asking. It would help if you wrote a program that printed out values, and explain what you expected to see there as well as what you actually see.

If you think the cudaMemcpy is not working, why not check its return value? If it returns with no error, it is probably working.
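A common way to follow that advice is to wrap every runtime call in a checking macro. The sketch below is one such helper; CUDA_CHECK and roundTripChecked are hypothetical names, not part of the CUDA API. A kernel launch returns nothing itself, so after a launch you would check cudaGetLastError() for launch errors and cudaDeviceSynchronize() for errors during execution.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical macro: evaluates a CUDA runtime call and, if it did not return
// cudaSuccess, prints the call, file, line and error string, then exits.
#define CUDA_CHECK(call)                                                   \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            std::fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__,       \
                         __LINE__, #call, cudaGetErrorString(err_));       \
            std::exit(EXIT_FAILURE);                                       \
        }                                                                  \
    } while (0)

// Hypothetical example: allocate, copy there and back, and free one int,
// with every runtime call checked. Returns true if the value survived.
bool roundTripChecked(int value)
{
    int *dev = nullptr;
    int back = 0;
    CUDA_CHECK(cudaMalloc(&dev, sizeof(int)));
    CUDA_CHECK(cudaMemcpy(dev, &value, sizeof(int), cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(&back, dev, sizeof(int), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(dev));
    return back == value;
}
```

In GpuCreate this would read CUDA_CHECK(cudaMalloc(&gpuCuArray, size)); and CUDA_CHECK(cudaMemcpy(gpuCuArray, cuArray, size, cudaMemcpyHostToDevice));.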

If you think you are using it wrong, this is presumably by inspection of the data. Write a program that prints out the data, and indicate what you expect to see.
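For the descriptor in question, one way to do this without the debugger is to copy it back from the device and print its fields on the host. This is a sketch assuming a plain struct with the same fields as CuArray<int*>; Desc, readBack and printDesc are hypothetical names.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical plain struct with the same fields as CuArray<int*>.
struct Desc {
    int Rows;
    int Columns;
    int Elements;
    int **ArrayPointer;
};

// Copies a descriptor that lives in device memory into a host object,
// so its fields can be inspected without the debugger.
cudaError_t readBack(const Desc *devDesc, Desc *hostDesc)
{
    return cudaMemcpy(hostDesc, devDesc, sizeof(Desc), cudaMemcpyDeviceToHost);
}

// Prints the fields of the host copy.
void printDesc(const Desc &d)
{
    std::printf("Rows=%d Columns=%d Elements=%d ArrayPointer=%p\n",
                d.Rows, d.Columns, d.Elements, (void *)d.ArrayPointer);
}
```

If the copy to the device worked, the printed Rows, Columns and Elements match the values used to build the descriptor (10, 1 and 10 for GpuCreate(10)), and ArrayPointer is the device address that cudaMalloc returned in the constructor.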

That's 100% right. I want to copy that descriptor. The array itself is allocated in the constructor of this descriptor. The problem is that, according to the debugger, none of this descriptor is copied. At the moment I am struggling with reading the “real values”, assuming that the debugger shows me the wrong ones.

CuArray just wraps the array behind ArrayPointer together with its Rows and Columns. It is like an array in Java or C#, where you can get the length/size of the array. To make access easier on the GPU, I want to transfer this descriptor to the GPU so I can read Rows and Columns there and don’t have to pass them as separate kernel parameters. I just have to pass CuArray as a parameter.
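As an illustration of that design, a kernel can take just the descriptor pointer and read the shape from it. A minimal sketch, assuming an int matrix stored row-major; IntMatrix, fillIndices and launchFill are hypothetical names. Each thread writes row * 100 + col, an arbitrary encoding chosen only so the result is easy to check (it is unambiguous only while Columns < 100).

```cuda
#include <cuda_runtime.h>

// Hypothetical descriptor mirroring CuArray<int>: shape plus device pointer.
struct IntMatrix {
    int Rows;
    int Columns;
    int Elements;
    int *ArrayPointer;  // device address, row-major Rows x Columns ints
};

// Hypothetical kernel: receives only the descriptor pointer and reads
// Columns and Elements from it; each thread fills one element.
__global__ void fillIndices(const IntMatrix *m)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < m->Elements) {
        int row = i / m->Columns;
        int col = i % m->Columns;
        m->ArrayPointer[i] = row * 100 + col;
    }
}

// Hypothetical launcher: one thread per element (elements must be > 0).
// devDesc must be a device copy of the descriptor whose ArrayPointer is
// also a device address.
cudaError_t launchFill(const IntMatrix *devDesc, int elements)
{
    const int threads = 128;
    const int blocks = (elements + threads - 1) / threads;
    fillIndices<<<blocks, threads>>>(devDesc);
    cudaError_t err = cudaGetLastError();  // launch errors
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();        // errors during execution
}
```

Because such a descriptor is a plain struct, it could also be passed to the kernel by value (for example __global__ void fillIndices(IntMatrix m)); kernel arguments are copied to the device at launch, which avoids the separate cudaMalloc/cudaMemcpy for the descriptor.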

So I am creating the CuArray with CuArray<TType>* cuArray = new CuArray<TType>(rows, columns); but this CuArray instance is in host memory. CuArray contains a pointer to an array which is in GPU memory (because I called cudaMalloc in the constructor of CuArray). Now I want to transfer the CuArray instance cuArray into GPU memory. So I allocate sizeof(CuArray<TType>) bytes with cudaMalloc and copy the data to GPU memory with cudaMemcpy, expecting that all properties (Rows, Columns, Elements and ArrayPointer) are the same in cuArray and gpuCuArray. But this is not the case, even though cudaMalloc and cudaMemcpy both return cudaSuccess.
So the content of gpuCuArray seems to be 0. And my question is: what am I doing wrong? The malloc succeeds and the copy succeeds, but my data is not copied.

I have not yet tried to read the data manually without the debugger. I will post or edit when I have concrete values.

The nsight eclipse edition debugger is built on top of cuda-gdb.

cuda-gdb has a limitation that it will not correctly show device data until you are stopped at a breakpoint in device code (which means you have to launch a kernel - which your posted code is not doing):

http://stackoverflow.com/questions/6683721/check-global-device-memory-using-cuda-gdb

So I wouldn’t expect that inspection of device data via the debugger would be useful here.
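An alternative that avoids the debugger entirely is to print the fields from device code. The sketch below assumes the same fields as CuArray<int*>; Desc, dumpDesc and inspectOnDevice are hypothetical names. Device-side printf requires compute capability 2.0 or later (the GTX 960 is 5.2), and its output is flushed at synchronization points such as cudaDeviceSynchronize().

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical plain struct with the same fields as CuArray<int*>.
struct Desc {
    int Rows;
    int Columns;
    int Elements;
    int **ArrayPointer;
};

// Hypothetical kernel: prints the fields as the GPU sees them and also
// stores the three ints in fieldsOut so the host can check them.
__global__ void dumpDesc(const Desc *d, int *fieldsOut)
{
    printf("device: Rows=%d Columns=%d Elements=%d ArrayPointer=%p\n",
           d->Rows, d->Columns, d->Elements, (void *)d->ArrayPointer);
    fieldsOut[0] = d->Rows;
    fieldsOut[1] = d->Columns;
    fieldsOut[2] = d->Elements;
}

// Hypothetical host wrapper: runs dumpDesc<<<1, 1>>> on a device copy of
// the descriptor and copies the three fields back into hostFields.
cudaError_t inspectOnDevice(const Desc *devDesc, int hostFields[3])
{
    int *devFields = nullptr;
    cudaError_t err = cudaMalloc(&devFields, 3 * sizeof(int));
    if (err != cudaSuccess) return err;
    dumpDesc<<<1, 1>>>(devDesc, devFields);
    err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();  // waits and flushes device printf output
    if (err == cudaSuccess)
        err = cudaMemcpy(hostFields, devFields, 3 * sizeof(int),
                         cudaMemcpyDeviceToHost);
    cudaFree(devFields);
    return err;
}
```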

Thank you. That is exactly the problem! In host code, the debugger shows wrong or non-updated values.

I updated the code as shown here:

main.cu

#include "CuArray.h"
#include <stdio.h>

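// Debug helper: set a breakpoint in this kernel; cuda-gdb shows device data
// correctly only while stopped in device code.
__global__ void CheckData(CuArray<int*>* array)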
{
	int row = array->Rows;
	int column = array->Columns;
	int element = array->Elements;
	int** arrayPtr = array->ArrayPointer;
}


int main()
{
	CuArray<int*>* array = CuArray<int*>::GpuCreate(10);
	printf("pointer location: %p\n", (void*)&array);
	printf("pointer address: %p\n", (void*)array);
	CheckData<<<1,1>>>(array);
	cudaDeviceSynchronize(); // wait for the kernel to finish before continuing
	getchar();
}

txbob and nnjuffa, thank you very much for your patience and time! Now I can debug correctly with this debug helper.