I am using CUDA 10 with Visual Studio 2017 (15.8.9, latest available now). The GPU is a GTX 1050 Ti Max-Q in a laptop. When I create a new CUDA project I get the simple “addWithCuda” example that adds two vectors together. The example code uses cudaMalloc to allocate memory and then it copies the two vectors from host memory to device memory. If I compile and run that, it executes fine.
However, if I change the calls to cudaMalloc to calls to cudaMallocManaged and replace cudaMemcpy with plain memcpy, then when I run the program my whole system freezes and I have to hard-reboot it. This doesn’t always happen; for example, I may be able to run the program twice and then, on the third run, the freeze occurs (the exact same executable, with no recompilation in between).
Am I doing something wrong here? And even if I am, is it normal that this freezes the whole system to the point that I have to reboot it?
Remove the memcpy calls from the code.
Memory allocated with cudaMallocManaged is automatically migrated between the GPU and the host depending on where you access it from.
You don’t need memcpy in this particular case: you can remove testVector from the code entirely and just use newMem instead. Since it is allocated with cudaMallocManaged, it can be accessed directly from the host or from the GPU without any further copying. After allocating the memory, just assign whatever values you want to the array.
Also remove the void** cast from the cudaMallocManaged call.
Don’t use memcpy on the host to copy data between host and device; host-side memcpy knows nothing about memory addressing on the device side. For that you would normally use cudaMemcpy, but because you are using cudaMallocManaged, the copying between host and device is already taken care of. You can, however, use memcpy from within a kernel function, because then it is the device calling its own memcpy, which is aware of device memory addressing.
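To illustrate the device-side case, here is a minimal sketch (the kernel name copyKernel and its parameters are just example names, not from your code):

```cpp
// Device-side memcpy: valid because the call executes on the GPU,
// which understands its own memory addressing.
__global__ void copyKernel(int* dst, const int* src, size_t count)
{
    memcpy(dst, src, count * sizeof(int));
}
```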
Try this instead:
int length = 5, *managed_array;
cudaMallocManaged(&managed_array, length * sizeof(int)); // Allocates managed memory for 5 int elements
cudaMemset(managed_array, 0, length * sizeof(int)); // Initializes the memory with 0
for (int i = 0; i < length; i++) // Assigns some values to the array on the host side
    managed_array[i] = i;
// Call your kernel function to do something
// cudaDeviceSynchronize();
// Do something else with managed_array in host side if you want
cudaFree(managed_array); // We are done, deallocate the memory
Add your own error checking to the code; I omitted it for simplicity.
Thanks for the input. I have reduced my code to a minimal self-contained example that still exhibits the freezing behavior:
#include <cstdio>
#include <cuda_runtime.h>

__global__ void testKernel(int* arr)
{
int i = threadIdx.x;
arr[i] = arr[i] * 2;
}
int main()
{
int length = 5, *managed_array;
cudaCheckError(cudaMallocManaged(&managed_array, length * sizeof(int))); // Allocates managed memory for 5 int elements
cudaCheckError(cudaMemset(managed_array, 0, length * sizeof(int))); // Initializes the memory with 0
for (int i = 0; i < length; i++) // Assigns some values to the array on the host side
managed_array[i] = i;
testKernel<<<1, length>>> (managed_array);
cudaCheckError(cudaDeviceSynchronize());
for (int i = 0; i < length; ++i)
{
printf("%d ", managed_array[i]);
}
cudaCheckError(cudaFree(managed_array)); // We are done, deallocate the memory
return 0;
}
The macro cudaCheckError simply exits if the return code is not success.
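A minimal sketch of such a macro, in case it matters (the details of mine may differ slightly):

```cpp
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Evaluates a CUDA runtime call and exits with a message if it failed.
#define cudaCheckError(call)                                          \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error at %s:%d: %s\n",              \
                    __FILE__, __LINE__, cudaGetErrorString(err_));    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)
```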
I’ve been seeing other posts on the forum lately with seemingly the same problem; if the code above should work fine, maybe I should report this as a bug.
Hi!
I had the same issue recently, and I think I know what causes your problem. Check whether your device supports concurrentManagedAccess (I guess it doesn’t, because you wrote that you are using Windows, and concurrent managed access is only supported on Linux); if it doesn’t, the freeze is caused by the cudaMallocManaged() allocation. If you do the memory allocation with cudaMalloc() and copy the data with cudaMemcpy() instead, the above program works fine.
I’m a beginner with CUDA and I’m not sure why this is a problem, because according to the docs ([url]https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#um-gpu-exclusive[/url]), you are using Unified Memory correctly. So I would also appreciate it if someone could explain this behavior.
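You can check the attribute yourself with cudaDeviceGetAttribute; a minimal sketch (assumes device 0 is the GPU in question):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int concurrent = 0;
    // Query whether device 0 supports concurrent managed access,
    // i.e. host and device touching managed memory at the same time.
    cudaDeviceGetAttribute(&concurrent,
                           cudaDevAttrConcurrentManagedAccess, 0);
    printf("concurrentManagedAccess = %d\n", concurrent);
    return 0;
}
```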
I have a GeForce 1050 Ti like yours and was having the same problems with Unified Memory. The solution is to update to the very latest driver (418.81), which was released a week ago. Check out my post at https://cudaeducation.com/cudaunifiedmemorycrash/ to learn more about the issue, with links etc.