I used the Visual Profiler on a Mac to profile a sparse matrix-vector multiplication kernel. I found that the number of L1 misses * 15 (the number of SMs) is not equal to the number of L2 requests. Even (number of L1 misses + number of L1 hits) * 15 < L2 requests. Can someone explain this?

L1 hits: 677342

L1 misses: 2.07111e+06

L2 requests: 1.23936e+08

Hi,

I used the Visual Profiler on a Mac to profile a sparse matrix-vector multiplication kernel. I found that the number of L1 misses * 15 (the number of SMs) is not equal to the number of L2 requests. Even (number of L1 misses + number of L1 hits) * 15 < L2 requests. Can someone explain this?

L1 hits: 677342

L1 misses: 2.07111e+06

L2 requests: 1.23936e+08

[/quote]

You have about 60x more L2 requests than L1 misses (1.23936e+08 / 2.07111e+06 is roughly 59.8). 60 is 15 times 4. 15 is the number of SMs, as you noted, and I'd speculate that the 4 comes from the 4x difference between the L1 and L2 cache line sizes.
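To make the arithmetic concrete, here is a minimal sketch that checks the observed ratio against the speculated factor. The counter values are the ones posted above. The 128-byte L1 line and 32-byte L2 transaction sizes are my assumptions (they are what would give the 4x), and treating the L1 counters as coming from a single SM is also an assumption, so take this as a sanity check rather than a confirmed explanation.

```python
# Counter values as reported in the question.
l1_hits = 677342
l1_misses = 2.07111e6
l2_requests = 1.23936e8

NUM_SMS = 15            # number of SMs, as stated in the question
L1_LINE_BYTES = 128     # assumption: L1 cache line size
L2_REQUEST_BYTES = 32   # assumption: size of one L2 request


def observed_ratio(l2_requests, l1_misses):
    """L2 requests per reported L1 miss."""
    return l2_requests / l1_misses


def speculated_ratio(num_sms, l1_line_bytes, l2_request_bytes):
    """Factor expected if the L1 counters cover one SM and each
    L1 miss is served by several smaller L2 requests."""
    return num_sms * (l1_line_bytes // l2_request_bytes)


observed = observed_ratio(l2_requests, l1_misses)
expected = speculated_ratio(NUM_SMS, L1_LINE_BYTES, L2_REQUEST_BYTES)
relative_gap = abs(observed - expected) / expected

print(f"observed ratio:   {observed:.2f}")
print(f"speculated ratio: {expected}")
print(f"relative gap:     {relative_gap:.2%}")
```

The observed ratio lands within about half a percent of 15 * 4, which is consistent with the speculation but does not prove it.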

Hi,

L1 hits: 677342

L1 misses: 2.07111e+06

L2 requests: 1.23936e+08

You have about 60x more L2 requests than L1 misses (1.23936e+08 / 2.07111e+06 is roughly 59.8). 60 is 15 times 4. 15 is the number of SMs, as you noted, and I'd speculate that the 4 comes from the 4x difference between the L1 and L2 cache line sizes.

[/quote]

Your speculation is quite reasonable. I think that is the reason. Thanks!
