CL_DEVICE_MAX_MEM_ALLOC_SIZE Incorrect?

My GTX 295 reports about 220 MB for CL_DEVICE_MAX_MEM_ALLOC_SIZE, which is in line with the OpenCL spec’s minimum value of 1/4th CL_DEVICE_GLOBAL_MEM_SIZE. However, in practice I’ve been able to allocate and use buffers as large as 512MB, more than twice the stated maximum, with correct results.
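For reference, the values can be queried along these lines (a minimal sketch that just grabs the first GPU of the first platform; error checking omitted):

```c
#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id platform;
    cl_device_id device;
    cl_ulong max_alloc, global_mem;

    /* First platform, first GPU device -- just for illustration. */
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

    /* The two properties discussed here. */
    clGetDeviceInfo(device, CL_DEVICE_MAX_MEM_ALLOC_SIZE,
                    sizeof(max_alloc), &max_alloc, NULL);
    clGetDeviceInfo(device, CL_DEVICE_GLOBAL_MEM_SIZE,
                    sizeof(global_mem), &global_mem, NULL);

    printf("CL_DEVICE_MAX_MEM_ALLOC_SIZE: %llu MB\n",
           (unsigned long long)(max_alloc >> 20));
    printf("CL_DEVICE_GLOBAL_MEM_SIZE:    %llu MB\n",
           (unsigned long long)(global_mem >> 20));
    return 0;
}
```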

What’s up with this?

Also, is there any way to determine how much memory can be allocated and be resident on the device all at once? Since my algorithm is a global scatter, the only way to break it into segments is to rerun the entire program for each segment, discarding/clipping all points that do not fall into the segment currently in memory. Thus, splitting the problem into the fewest segments such that a given segment fits into memory is critical to my program’s performance for large problem sizes.
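For what it's worth, the way I'd sanity-check that an oversized allocation is actually usable looks roughly like this (a sketch only; context/queue creation is omitted and probe_buffer is just an illustrative helper):

```c
#include <stdlib.h>
#include <string.h>
#include <CL/cl.h>

/* Try to create a buffer of the given size and round-trip a pattern
 * through it. Returns 1 if both the allocation and the copy succeed. */
static int probe_buffer(cl_context ctx, cl_command_queue queue, size_t bytes)
{
    cl_int err;
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE, bytes, NULL, &err);
    if (err != CL_SUCCESS)
        return 0;                               /* allocation itself failed */

    unsigned char *src = malloc(bytes);
    unsigned char *dst = malloc(bytes);
    memset(src, 0xA5, bytes);
    memset(dst, 0x00, bytes);

    /* Blocking write and read; if the buffer is not really backed by
     * device memory, one of these calls or the comparison should fail. */
    err  = clEnqueueWriteBuffer(queue, buf, CL_TRUE, 0, bytes, src, 0, NULL, NULL);
    err |= clEnqueueReadBuffer(queue, buf, CL_TRUE, 0, bytes, dst, 0, NULL, NULL);

    int ok = (err == CL_SUCCESS) && (memcmp(src, dst, bytes) == 0);

    free(src);
    free(dst);
    clReleaseMemObject(buf);
    return ok;
}
```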

I’ve seen the overly pessimistic reporting of MAX_MEM_ALLOC_SIZE as well and wondered whether this is a bug or whether we’ve just been lucky so far and such large allocations may fail at any time in the future.

I seem to remember reading somewhere that the G200 series of GPUs implements caching similar to that of a CPU. It could be that most of your 512 MB buffer is actually stored in system RAM, but only 220 MB at most is being cached in the VRAM at any given time. If this is the case, it probably won’t hurt anything but could lead to performance issues for random access (sequential access is probably ok).

I don’t think that is the case. My card has a GiB of VRAM and I have not noticed any sudden reductions in performance when allocating more than the stated maximum, even with somewhat random accesses.

I would call it an NVIDIA problem, but I don’t have enough data. I wonder what ATI cards report.

Anyway, I had a similar thread a while back, where I was complaining that CUDA allocations would fail above a certain threshold, which at the time appeared to be CL_DEVICE_MAX_MEM_ALLOC_SIZE. tmurray classified it as a WDDM limitation, and we all agreed, case closed. Later, I started having the same memory allocation problem on Linux.

I have an algorithm that needs two big chunks of data: one of size 2n and one of size n. If the maximum safe allocation is 1/4 of the total memory, then the 2n chunk limits n to 1/8 of the memory, so I can only use at most 3n = 3/8 of the total memory. I think it’s just a mistake on NVIDIA’s side, where they took the OpenCL spec too literally and just return 1/4 of the total device memory (just a guess).

According to the OpenCL specification, the minimum value for CL_DEVICE_MAX_MEM_ALLOC_SIZE is max(1/4 * CL_DEVICE_GLOBAL_MEM_SIZE, 128*1024*1024). So it can be more than 1/4 of the total memory size, but it cannot be less. Perhaps NVIDIA just misread the specification…
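Expressed in code, the lower bound the spec requires works out to something like this (just an illustrative helper, not anything from the driver):

```c
#include <CL/cl.h>

/* Spec-required lower bound for CL_DEVICE_MAX_MEM_ALLOC_SIZE,
 * given CL_DEVICE_GLOBAL_MEM_SIZE (both in bytes):
 * max(global_mem / 4, 128 MB). */
static cl_ulong spec_min_alloc(cl_ulong global_mem_size)
{
    cl_ulong quarter   = global_mem_size / 4;
    cl_ulong min_floor = 128ULL * 1024 * 1024;   /* 128 MB */
    return quarter > min_floor ? quarter : min_floor;
}
```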