cudaDeviceSetLimit call increases GPU memory usage

After this line of code:

err = cudaDeviceSetLimit(cudaLimitStackSize, 65536);

the GPU "used memory" increased from 400 MB to 2500 MB.

What is the problem here? How can it be explained?

Thank you

Stack is a per-thread resource that is reserved ahead of a kernel launch. Since GPU kernels typically run thousands (if not tens of thousands) of threads, even a modest increase in the per-thread stack size (I think 4 KB is the default) can cause a large amount of additional memory to be reserved for stack usage.

A need to increase the stack size is often a red flag, as the vast majority of common GPU programming patterns can operate within the default allocation. You would want to examine your approach carefully to see whether a large stack is really necessary and whether this is the best use of the GPU's memory.

Thank you!