Hello,
I am running the LIB benchmark on a GTX480 in two scenarios: one with the L1 cache enabled and the other bypassing the L1 cache.
I found the following results:
Execution time (ms)

Benchmark  GridSize  BlockSize  With L1 cache  Without L1 cache (bypassing L1)
LIB        64        16         14.179         14.223
           64        32         7.315          7.34
           64        64         4.47           4.464
           64        128        4.491          4.506
           64        256        4.593          4.6
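
To put the gap in perspective, here is a small Python sketch that computes the relative difference for each row of the table. The names `results` and `relative_diff_percent` are only illustrative; the numbers are copied from my measurements above.

```python
# Measured execution times in ms: (block size, with L1, bypassing L1).
# Grid size is 64 in every run.
results = [
    (16, 14.179, 14.223),
    (32, 7.315, 7.34),
    (64, 4.47, 4.464),
    (128, 4.491, 4.506),
    (256, 4.593, 4.6),
]


def relative_diff_percent(with_l1, without_l1):
    """Return how much slower (positive) or faster (negative) the
    bypass run is, as a percentage of the with-L1 time."""
    return (without_l1 - with_l1) / with_l1 * 100.0


for block_size, with_l1, without_l1 in results:
    diff = relative_diff_percent(with_l1, without_l1)
    print(f"BlockSize {block_size:>3}: {diff:+.3f}%")
```

In every row the difference is well under 1%, and for block size 64 the run that bypasses L1 is slightly faster.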
I am not able to understand why the execution time is higher, in most configurations, when the benchmark runs with the L1 cache bypassed.
Any insights on this?
Thanks
Sai