cudaMemGetInfo() how does it work?!?

Hello,

I am currently programming an acceleration structure on the GPU for ray tracing.
There is a bug in my code, but I can't figure out where. The program runs, and the bug appears at a non-deterministic iteration.
So sometimes my structure is built 500 times without an unspecified launch failure, and sometimes just twice with the same options set.
I already know that something weird is written into my structure's memory; for instance, the splitting dimension of a node in the structure is 32012, even though it should be between 0 and 3 (leaf = 3).
The next thing I found is that in the first iteration the used memory of the graphics card is lower than the used memory in the second iteration. After the second iteration it stays constant.
So now my question: how does cudaMemGetInfo() determine how much memory is used?
Would it recognize if I wrote past the bounds of my allocated memory, or isn't that possible?
My guess is that whenever some function allocates memory, a global counter is incremented, and when cudaMemGetInfo() is called, this counter is used to determine the memory usage?
The thing is that I don't allocate new memory during an iteration, so the memory usage shouldn't increase, but it does…
I would be glad for any hint.

Regards,
Peter

I just added this code to my project:

cudaMem.cu

And call checkGpuMem(); when needed

#include <stdio.h>
#include "cuda.h"

extern "C"
void checkGpuMem()
{
    float free_m, total_m, used_m;
    size_t free_t, total_t;

    cudaMemGetInfo(&free_t, &total_t);

    free_m  = (uint)free_t / 1048576.0;
    total_m = (uint)total_t / 1048576.0;
    used_m  = total_m - free_m;

    printf("  mem free %d .... %f MB mem total %d....%f MB mem used %f MB\n",
           free_t, free_m, total_t, total_m, used_m);
}

Contrary to popular belief, cuMemGetInfo() does not actually rely on magic. We ask the kernel mode driver how much memory has been allocated on the card. However, this will not detect out-of-bounds accesses or anything like that; what you want is cuda-memcheck or cuda-gdb (both are rightly considered miracles).

The code posted by jam11 is defective on GPUs with greater than 4GB of memory and should not be used as-is in any CUDA code.

So what is the fix?

The code has various issues. It does not cast correctly, and it uses incorrect printf format specifiers (a modern compiler will warn you about the latter).

If it were me, I would simply use cudaMemGetInfo directly on size_t quantities, and print out those size_t quantities using the correct format specifier %zu, or even better just use std::cout.

#include <iostream>
...
size_t free_t, total_t;
cudaMemGetInfo(&free_t, &total_t);
std::cout << "Free mem: " << free_t << " Total mem: " << total_t << std::endl;

In the original code, the casting is not done correctly. C++ operator precedence dictates that C-style casts bind tighter than arithmetic such as division, so the 64-bit value is truncated to 32 bits before the division takes place. That could not possibly be correct if the 64-bit value is larger than about 4 billion (4 GB).