Bypassing cache while running a benchmark

Hi,

I want to run a benchmark on the hardware, bypassing all of the caches available on it, so that the benchmark uses only global memory.

Is there any command-line option to achieve that? Or any leads in that direction?

Thanks for the help,
Sai

The main caches are the L2 cache and, depending on which GPU you are running on, possibly the L1 cache.

The L2 cache cannot be disabled in any way.
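Since L2 cannot be turned off, one common benchmarking workaround (offered here as an assumption, not something stated in this thread) is to make the working set much larger than L2, so that repeated passes over the buffer keep most loads from hitting in L2. The sketch below reads the L2 size through the CUDA runtime's `cudaGetDeviceProperties()` (`cudaDeviceProp::l2CacheSize`); the helper names, the multiplier and the 64 MiB floor are hypothetical, illustrative choices.

```cuda
// l2_sizing.cu: sketch of sizing a benchmark buffer relative to the L2 cache.
// Assumption: repeated passes over a buffer several times larger than L2
// keep most loads from hitting in L2. The factor and the 64 MiB floor are
// arbitrary illustrative values, not NVIDIA recommendations.
#include <cstddef>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: l2_bytes * factor, but never below a 64 MiB floor.
size_t pick_buffer_bytes(size_t l2_bytes, size_t factor)
{
    const size_t min_bytes = static_cast<size_t>(64) << 20;  // 64 MiB
    const size_t bytes = l2_bytes * factor;
    return bytes < min_bytes ? min_bytes : bytes;
}

// Hypothetical helper: reads the L2 size reported for `device` and sizes the
// buffer from it. Returns 0 if the device properties cannot be queried.
size_t buffer_bytes_for_device(int device, size_t factor)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) {
        return 0;
    }
    std::printf("%s: l2CacheSize = %d bytes\n", prop.name, prop.l2CacheSize);
    return pick_buffer_bytes(static_cast<size_t>(prop.l2CacheSize), factor);
}
```

This only reduces L2 hits; it does not remove the L2 cache from the memory path.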

If the L1 cache would normally be used for global memory loads on your GPU, that caching can be disabled at compile time by passing a particular switch to PTXAS:

-Xptxas -dlcm=cg

added to the compile command line.

http://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#ptxas-options
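As a minimal sketch of the kind of code the switch affects, here is a simple streaming-read benchmark. Only the `-Xptxas -dlcm=cg` switch comes from the answer above; the file name `read_bench.cu`, the kernel `read_kernel` and the host helper `run_read_benchmark` are hypothetical. The switch changes the default cache operator on global loads so they are cached in L2 but not in L1; the kernel source itself stays unchanged.

```cuda
// read_bench.cu: a minimal global-memory read benchmark (illustrative sketch).
// Intended build line, with L1 caching of global loads turned off:
//   nvcc -Xptxas -dlcm=cg read_bench.cu -o read_bench
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Grid-stride loop: each thread sums a strided subset of the input.
// Built with -dlcm=cg, these global loads are cached in L2 only, not in L1.
__global__ void read_kernel(const float *in, size_t n, float *partial)
{
    size_t tid = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    size_t stride = (size_t)gridDim.x * blockDim.x;
    float acc = 0.0f;
    for (size_t i = tid; i < n; i += stride) {
        acc += in[i];
    }
    // Store one partial sum per thread so the loads are not optimized away.
    partial[tid] = acc;
}

// Hypothetical host driver: fills n floats with 1.0f, times one kernel launch
// with CUDA events, and returns the total sum (or -1.0 on a CUDA error).
double run_read_benchmark(size_t n, float *elapsed_ms)
{
    const int threads = 256, blocks = 256;
    const size_t total_threads = (size_t)threads * blocks;
    std::vector<float> host(n, 1.0f);
    std::vector<float> partial(total_threads, 0.0f);
    float *d_in = nullptr, *d_partial = nullptr;
    if (cudaMalloc(&d_in, n * sizeof(float)) != cudaSuccess) return -1.0;
    if (cudaMalloc(&d_partial, total_threads * sizeof(float)) != cudaSuccess) {
        cudaFree(d_in);
        return -1.0;
    }
    cudaMemcpy(d_in, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    read_kernel<<<blocks, threads>>>(d_in, n, d_partial);
    cudaError_t launch_err = cudaGetLastError();  // catch launch failures
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(elapsed_ms, start, stop);

    cudaError_t copy_err = cudaMemcpy(partial.data(), d_partial,
                                      total_threads * sizeof(float),
                                      cudaMemcpyDeviceToHost);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_in);
    cudaFree(d_partial);
    if (launch_err != cudaSuccess || copy_err != cudaSuccess) return -1.0;

    double sum = 0.0;
    for (float p : partial) sum += p;  // each thread summed only 1.0f values
    return sum;
}
```

Called from a `main()` as `run_read_benchmark(1 << 20, &ms)`, it should return 1048576 (the sum of 2^20 ones) and leave the kernel time in `ms`. Building once with and once without the switch gives a rough comparison, keeping in mind that the L2 cache still serves loads in both builds.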

There are other caches on the device that also cannot be globally disabled in any way, such as the constant cache, the read-only cache, etc. Code that is written to use these features will use them, and the only way to avoid their use is to rewrite the code.
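To illustrate what such a rewrite can look like, here is a hedged sketch with two versions of a hypothetical scaling kernel. The first uses the constant cache (through a `__constant__` variable) and the read-only data cache (through the `__ldg()` intrinsic). The second reads both values from ordinary global memory with `__ldcg()`, the CUDA load intrinsic that requests caching at the global (L2) level rather than in L1. Kernel and helper names are made up; dropping `const __restrict__` in the rewrite is an assumption aimed at making it less likely that the compiler picks read-only loads on its own.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Original style: the scale factor lives in __constant__ memory (constant
// cache), and __ldg() routes the input load through the read-only cache.
__constant__ float c_scale[1];

__global__ void scale_cached(const float *__restrict__ in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = __ldg(&in[i]) * c_scale[0];
    }
}

// Rewritten style: the scale factor comes from an ordinary global pointer,
// and __ldcg() requests loads cached at the L2 level, not in L1.
__global__ void scale_uncached(const float *in, const float *scale,
                               float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = __ldcg(&in[i]) * __ldcg(scale);
    }
}

// Hypothetical check: runs both kernels on the same input and returns true
// if the results match (they should; only the caching behavior differs).
bool kernels_agree(int n)
{
    std::vector<float> h_in(n), h_a(n, 0.0f), h_b(n, 0.0f);
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);
    const float scale = 0.5f;

    float *d_in = nullptr, *d_scale = nullptr, *d_a = nullptr, *d_b = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_scale, sizeof(float));
    cudaMalloc(&d_a, n * sizeof(float));
    cudaMalloc(&d_b, n * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_scale, &scale, sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(c_scale, &scale, sizeof(float));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    scale_cached<<<blocks, threads>>>(d_in, d_a, n);
    scale_uncached<<<blocks, threads>>>(d_in, d_scale, d_b, n);
    cudaError_t launch_err = cudaGetLastError();
    cudaError_t err_a = cudaMemcpy(h_a.data(), d_a, n * sizeof(float),
                                   cudaMemcpyDeviceToHost);
    cudaError_t err_b = cudaMemcpy(h_b.data(), d_b, n * sizeof(float),
                                   cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_scale);
    cudaFree(d_a);
    cudaFree(d_b);
    return launch_err == cudaSuccess && err_a == cudaSuccess &&
           err_b == cudaSuccess && h_a == h_b;
}
```

Both kernels compute the same values; the point of the rewrite is only which cache paths the loads are eligible to use.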