Reading rows with power-law distributed access frequency: any cache optimization we can do?

Hi,

I need to read an m-by-f matrix by rows. The reads follow a power-law distribution, i.e., a small number of rows are read very frequently while most rows are rarely read. The matrix is read-only.

Is there any cache optimization we can do with CUDA? Currently I am reading through the texture cache. I wonder if we can explicitly control the cache so that the more frequently read rows stay cached (I know the read frequency of each row in advance).
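
For reference, my read path looks roughly like the simplified sketch below (the kernel name, the `row_ids` array, and the one-block-per-requested-row mapping are just illustrative assumptions, not my exact code; the point is that rows are gathered through the read-only/texture cache path):

```cuda
// Illustrative sketch only: gather requested rows of a read-only,
// row-major m x f matrix through the read-only data (texture) cache.
// `row_ids` holds the requested row indices; their distribution is the
// power law described above (a few hot rows, many cold ones).
__global__ void gather_rows(const float* __restrict__ matrix,
                            const int*   __restrict__ row_ids,
                            float*       out,
                            int n_requests, int f)
{
    int req = blockIdx.x;                  // one block per requested row
    if (req >= n_requests) return;

    int row = __ldg(&row_ids[req]);        // index of the row to fetch
    const float* src = matrix + (size_t)row * f;

    // Each thread strides over the row; __ldg() routes the loads
    // through the read-only/texture cache.
    for (int j = threadIdx.x; j < f; j += blockDim.x) {
        out[(size_t)req * f + j] = __ldg(&src[j]);
    }
}
```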
Thanks!