Fluctuating PCI Express transfer rates

I’m experiencing a puzzling problem where transfers to/from the GPU occur at wildly different speeds.

I’ve developed an image processing program that uploads an image to the card, performs a number of operations on it, and downloads the result. The images are about 1.25 MB each, and the frame rate is high (500+ Hz). The per-frame processing time varied quite a bit, so I ran the program under the profiler and found that the transfer times to and from the GPU varied widely: the PCIe transfers run at anywhere from ~600 MB/s to ~9 GB/s. The kernel execution times do not fluctuate significantly; only the memory transfers do. There doesn’t seem to be any pattern to which transfers are fast and which are slow.
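To put that range against the frame budget: at 500 Hz each frame has 2 ms. A minimal C++ sketch of the arithmetic, assuming 1 MB = 10^6 bytes and one upload plus one download of the same 1.25 MB image per frame (the function names are illustrative, not from any library):

```cpp
#include <cassert>

// Milliseconds needed to move `bytes` at a sustained rate of `bytes_per_s`.
double transfer_ms(double bytes, double bytes_per_s) {
    assert(bytes_per_s > 0.0);
    return bytes / bytes_per_s * 1e3;
}

// Frame period in milliseconds for a frame rate in Hz.
double frame_period_ms(double hz) {
    assert(hz > 0.0);
    return 1e3 / hz;
}

// True if one upload plus one download of `image_bytes` fits within one frame
// period. Assumes the two copies run back to back and do not overlap the kernels.
bool round_trip_fits(double image_bytes, double hz, double bytes_per_s) {
    double round_trip_ms = 2.0 * transfer_ms(image_bytes, bytes_per_s);
    return round_trip_ms <= frame_period_ms(hz);
}
```

At ~9 GB/s the round trip is about 0.28 ms; at ~600 MB/s it is about 4.2 ms, which is more than two whole frame periods, so even occasional slow transfers would show up as frame-time jitter.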

My setup: http://www.connecttech.com/sub/Products/VXG001-COM-Express-GPU-Embedded-System.asp. It’s an embedded system with an i7-4700EQ and a GTX 970M. I’ve seen the problem under both CentOS 7.0 and Ubuntu 14.04. In all cases, X was running on the integrated Intel graphics; the GPU is used only for computation. I’ve tried my code on other machines, and transfer speeds are consistent on all of them. The variation seems specific to this hardware, but I don’t know why.

Has anyone else seen this sort of thing? Any ideas what may be causing this inconsistency?

Do you see this fluctuation only on the first run, or all the time?

I am seeing almost exactly the same thing. I am copying a relatively small image (625K) back from the GPU. Most of the time the copy takes about 0.35 ms, roughly 1.8 GB/s. Occasionally, however, it takes 78 ms, or about 8 MB/s.
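The size of that slowdown follows directly from the quoted figures. A minimal C++ sketch, assuming "625K" means 625 × 10^3 bytes (the function name is illustrative):

```cpp
#include <cassert>

// Effective bandwidth in bytes per second for `bytes` moved in `elapsed_ms`.
double bandwidth_bytes_per_s(double bytes, double elapsed_ms) {
    assert(elapsed_ms > 0.0);
    return bytes / (elapsed_ms * 1e-3);
}
```

With those inputs, 0.35 ms gives about 1.79 × 10^9 B/s and 78 ms gives about 8.0 × 10^6 B/s, so the slow copies are more than 200 times slower than the typical ones.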

I’d love to hear any ideas as to why this is happening.

Is it possible for you to transfer more than one image at a time to the GPU before processing?

Not in my case. This is a rendering server, so we render each image and send it back. The kernel times are very consistent, and what variation there is can be explained by scene complexity. I’m hoping for hints along the lines of “it might be caching, and you could check this” or “it might be locking, and you could check this”…
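One cheap check, whatever the cause turns out to be, is whether the slow transfers are truly irregular or recur at a fixed interval, which would suggest something periodic on the system. A minimal C++ sketch, assuming you have exported per-transfer durations in milliseconds, in issue order, from the profiler or your own timing; the function names and the 10× threshold are arbitrary illustrative choices:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Median of the samples (upper median for even counts; enough for a threshold).
// Takes a copy because nth_element reorders its input.
double median_ms(std::vector<double> samples) {
    assert(!samples.empty());
    auto mid = samples.begin() + samples.size() / 2;
    std::nth_element(samples.begin(), mid, samples.end());
    return *mid;
}

// Indices of transfers that took more than `factor` times the median duration.
std::vector<std::size_t> slow_transfers(const std::vector<double>& durations_ms,
                                        double factor) {
    const double threshold = factor * median_ms(durations_ms);
    std::vector<std::size_t> slow;
    for (std::size_t i = 0; i < durations_ms.size(); ++i) {
        if (durations_ms[i] > threshold) slow.push_back(i);
    }
    return slow;
}

// Spacing, in number of transfers, between consecutive slow transfers.
std::vector<std::size_t> gaps(const std::vector<std::size_t>& slow) {
    std::vector<std::size_t> out;
    for (std::size_t i = 1; i < slow.size(); ++i) out.push_back(slow[i] - slow[i - 1]);
    return out;
}
```

If the values returned by `gaps` are nearly constant, the slow copies are recurring on a schedule; widely scattered values mean they are irregular, and the next step would be to correlate them with other activity on the machine.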