How to benchmark a multi-GPU system and get overall SP/DP GFlops?

What is the best way to get the overall performance in GFlops (single/double precision) of several GPUs connected to a single CPU?

And what is the easiest way?

OS: Linux or Windows

Performance, or utilization?

Theoretically, it should be the value of a single device multiplied by the device count (assuming the devices are identical).
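To make the theoretical figure concrete, here is a minimal sketch of the arithmetic. It assumes peak SP throughput is 2 FLOPs (one fused multiply-add) per core per clock cycle, and that DP throughput is a device-specific fraction of SP. The `Gpu` class, the device names, and all the numbers are made up for illustration; substitute the values from your own cards' spec sheets.

```python
from dataclasses import dataclass


@dataclass
class Gpu:
    name: str         # hypothetical label, for illustration only
    cores: int        # number of FP32 lanes (CUDA cores)
    clock_ghz: float  # core clock in GHz
    dp_ratio: float   # DP throughput as a fraction of SP (device-specific)


def peak_sp_gflops(gpu: Gpu) -> float:
    # Assumption: each core retires one fused multiply-add (2 FLOPs) per cycle.
    return 2.0 * gpu.cores * gpu.clock_ghz


def peak_dp_gflops(gpu: Gpu) -> float:
    return peak_sp_gflops(gpu) * gpu.dp_ratio


def system_peak(gpus):
    # Summing per device also covers mixed systems; for identical cards
    # this reduces to "single device value times device count".
    sp = sum(peak_sp_gflops(g) for g in gpus)
    dp = sum(peak_dp_gflops(g) for g in gpus)
    return sp, dp


# Two made-up identical devices: 2048 cores at 1.0 GHz, DP at half the SP rate.
gpus = [Gpu("gpu0", 2048, 1.0, 0.5), Gpu("gpu1", 2048, 1.0, 0.5)]
sp, dp = system_peak(gpus)
print(f"SP: {sp:.1f} GFlops, DP: {dp:.1f} GFlops")
```

This is only an upper bound; a measured figure will come in lower.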

Practically, I would think that it depends on the first bottleneck introduced by the fact that there are now multiple devices: additional strain on the PCIe bus, for instance.

Hence, you might measure CPU, memory and device utilization, and focus on which one maxes out first.
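As a rough sketch of the "which one maxes out first" idea: suppose you have already collected utilization samples over time as percentages (for example from `nvidia-smi --query-gpu=utilization.gpu --format=csv` for the GPUs and your OS tools for the CPU). The helper below just scans them for the first resource to cross a threshold. The function name, the resource keys, the 95% threshold and the sample values are all assumptions for illustration, not output from a real run.

```python
def first_saturated(samples, threshold=95.0):
    """Return (sample_index, resource_name) for the first sample in which
    any resource reaches the threshold, or None if none ever does.

    samples: list of dicts mapping a resource name to utilization in percent.
    If several resources cross the threshold in the same sample, the one
    with the highest utilization is reported.
    """
    for index, sample in enumerate(samples):
        over = {name: pct for name, pct in sample.items() if pct >= threshold}
        if over:
            return index, max(over, key=over.get)
    return None


# Made-up samples taken while the load is ramped up across the GPUs.
samples = [
    {"cpu": 40.0, "gpu0": 60.0, "gpu1": 58.0, "pcie": 70.0},
    {"cpu": 55.0, "gpu0": 80.0, "gpu1": 79.0, "pcie": 96.0},
    {"cpu": 97.0, "gpu0": 99.0, "gpu1": 98.0, "pcie": 99.0},
]
print(first_saturated(samples))
```

With these made-up numbers the bus saturates before the GPUs do, which is the kind of result that would explain a measured total below the theoretical one.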

The easiest I can think of is probably something like CUDA-Z (cross-platform): http://cuda-z.sourceforge.net/ But this will only get you the performance of one GPU at a time, and possibly only the theoretical figure (I haven't looked into how they bench their SP/DP stats, but it's open source). Maybe that is what you want.