Bandwidth Fluctuations on V100 and RTX 2080

I have a simple CUDA kernel (adding two vectors of size N), pretty similar to the one in this CUDA blog post. I only changed a few things, e.g. running the measurement over several samples. So I let this run, let's say, 1000 times and write the measurements to a txt file afterwards. If I then plot the measurements for transferring a vector to the device, I get the following:
https://i.stack.imgur.com/adCSI.png
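
For context, the timing approach follows the blog post: cudaEvent timers around the cudaMemcpy, wrapped in a sample loop. Here is a minimal sketch of that kind of loop (the vector size, sample count, and output file name are illustrative, not my exact values):

```cpp
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

int main() {
    const int N = 1 << 20;         // illustrative vector size (4 MiB of floats)
    const int NUM_SAMPLES = 1000;  // number of repeated measurements

    float *h_a = (float*)malloc(N * sizeof(float));  // pageable host memory
    float *d_a;
    cudaMalloc(&d_a, N * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm-up copy, excluded from the timed samples (absorbs one-time
    // context/driver overhead).
    cudaMemcpy(d_a, h_a, N * sizeof(float), cudaMemcpyHostToDevice);

    FILE *out = fopen("h2d_bandwidth.txt", "w");  // illustrative file name
    for (int i = 0; i < NUM_SAMPLES; ++i) {
        cudaEventRecord(start);
        cudaMemcpy(d_a, h_a, N * sizeof(float), cudaMemcpyHostToDevice);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        // Bandwidth in GB/s: bytes / (milliseconds * 1e6)
        fprintf(out, "%f\n", N * sizeof(float) / (ms * 1e6));
    }
    fclose(out);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_a);
    free(h_a);
    return 0;
}
```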

Now, if we look at the standard deviation drawn as vertical error bars, it becomes clear that for some reason the fluctuations of the data movements scale with the transfer size: the error bars stay roughly constant in a log-log plot, which means the relative error is constant, i.e. the absolute standard deviation grows proportionally with the mean transfer time. This can be validated by plotting only the standard deviation:

https://i.stack.imgur.com/Rr7CN.png

If I take the very same program from the CUDA blog post, I also get bandwidth fluctuations on every 10th run or so. Where does this come from? I observed the same behaviour on two different GPUs, a V100 and an RTX 2080.

If you’re using cudaMemcpy with pageable host memory, the performance can be affected by paging, since the driver stages such transfers through an internal pinned buffer. Try pinning the memory and rerunning your tests.
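
A minimal sketch of what that change looks like, assuming the buffer was previously allocated with plain malloc (N is illustrative):

```cpp
#include <cuda_runtime.h>

int main() {
    const int N = 1 << 20;  // illustrative vector size

    // Pinned (page-locked) host memory instead of malloc. Transfers from
    // pageable memory go through an internal staging buffer, so their
    // timing can depend on the OS paging state; pinned memory avoids that.
    float *h_a;
    cudaMallocHost(&h_a, N * sizeof(float));

    float *d_a;
    cudaMalloc(&d_a, N * sizeof(float));

    // The cudaMemcpy call itself stays exactly the same.
    cudaMemcpy(d_a, h_a, N * sizeof(float), cudaMemcpyHostToDevice);

    cudaFree(d_a);
    cudaFreeHost(h_a);  // pinned memory must be freed with cudaFreeHost
    return 0;
}
```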

Please read https://devblogs.nvidia.com/how-optimize-data-transfers-cuda-cc/ for more details.

Indeed, I’m using cudaMemcpy. I’ll try the pinning approach and come back to this post.

Thank you very much for your time and suggestions

Kind regards
Max