Does cudaMemcpyAsync require pinned memory?

Hello! I have some questions about pinned memory.

1. Is pinned memory necessary when we want to perform an asynchronous memory copy?

2. If the answer is yes, is there any size limit on pinned memory? For example, on a machine with 64GB of host memory, will pinning 4GB significantly affect CPU performance?

Thanks!

Yes, I believe so, according to this page:

http://devblogs.nvidia.com/parallelforall/how-overlap-data-transfers-cuda-cc/

“The host memory involved in the data transfer must be pinned memory.”

I have always used pinned memory with cudaMemcpyAsync and do see overlapping behavior.

Using 4GB out of 64GB of host memory will not degrade CPU performance. There is some additional overhead related to the initial pinned memory allocation (more than a regular host malloc).

Yes (and no). If you want truly asynchronous behavior (e.g. overlap of copy and compute) then the memory must be pinned. If it is not pinned, there won’t be any runtime errors, but the copy will not be asynchronous - it will be performed like an ordinary cudaMemcpy.
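To make the pinned case concrete, here is a minimal sketch of a round trip through a stream using a pinned host buffer. The kernel `scale` and the helper `pinned_async_roundtrip` are names made up for this illustration, and the error handling is kept deliberately short.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical kernel, only here so the stream has some work between the copies.
__global__ void scale(float *d, size_t n, float f) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) d[i] *= f;
}

// Hypothetical helper: copies n floats to the device, doubles them, copies them back.
// Returns true if every call succeeded and the host buffer holds the expected values.
bool pinned_async_roundtrip(size_t n) {
    const size_t bytes = n * sizeof(float);
    float *h = nullptr, *d = nullptr;
    cudaStream_t s;

    // cudaMallocHost returns page-locked (pinned) host memory.
    if (cudaMallocHost(&h, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d, bytes) != cudaSuccess) { cudaFreeHost(h); return false; }
    if (cudaStreamCreate(&s) != cudaSuccess) { cudaFree(d); cudaFreeHost(h); return false; }

    for (size_t i = 0; i < n; ++i) h[i] = 1.0f;

    // The copies and the kernel are queued in stream s and run in that order.
    // Because h is pinned, the copy calls can return before the transfer finishes.
    cudaMemcpyAsync(d, h, bytes, cudaMemcpyHostToDevice, s);
    scale<<<(unsigned int)((n + 255) / 256), 256, 0, s>>>(d, n, 2.0f);
    cudaMemcpyAsync(h, d, bytes, cudaMemcpyDeviceToHost, s);

    // Independent host work could go here; h must not be touched until the sync.
    bool ok = (cudaStreamSynchronize(s) == cudaSuccess);
    for (size_t i = 0; ok && i < n; ++i) ok = (h[i] == 2.0f);

    cudaStreamDestroy(s);
    cudaFree(d);
    cudaFreeHost(h);
    return ok;
}
```

Calling `pinned_async_roundtrip(1 << 20)` from `main` should be enough to try it. Swapping `cudaMallocHost`/`cudaFreeHost` for `malloc`/`free` gives the pageable case described above: the program still produces the same result, but the copies lose their asynchronous behavior.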

The usable size may vary by system and OS. Pinning 4GB of memory on a 64GB system on Linux should not have a significant effect on CPU performance once the pinning operation is complete. Attempting to pin 60GB, on the other hand, might cause significant system responsiveness issues. YMMV.
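Since the usable pinned size depends on the system, one option is to check the result of the pinned allocation and fall back to pageable memory if it fails. The sketch below assumes that a fallback is acceptable for your application; `HostBuffer`, `alloc_host`, and `free_host` are hypothetical names, not part of the CUDA API.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <cstdlib>

// Hypothetical wrapper that remembers how the buffer was allocated.
struct HostBuffer {
    void *ptr;
    bool pinned;
};

// Try to get pinned memory first; fall back to ordinary malloc if pinning fails.
HostBuffer alloc_host(size_t bytes) {
    HostBuffer b{nullptr, false};
    if (cudaMallocHost(&b.ptr, bytes) == cudaSuccess) {
        b.pinned = true;
        return b;
    }
    cudaGetLastError();      // reset the error state left by the failed call
    b.ptr = malloc(bytes);   // pageable fallback; may still be nullptr
    return b;
}

// Release the buffer with the routine that matches how it was allocated.
void free_host(HostBuffer &b) {
    if (b.ptr == nullptr) return;
    if (b.pinned) cudaFreeHost(b.ptr);
    else free(b.ptr);
    b.ptr = nullptr;
    b.pinned = false;
}
```

With the fallback buffer, `cudaMemcpyAsync` still works, just without the overlap, so the rest of the code does not need to change. Note that a successful allocation only tells you the pin went through; it says nothing about how the rest of the system will behave with that much memory locked.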