64-bit mode for L1 cache? How fast is it?
F.5.3 in the Programming Guide says that shared memory can provide 32*8 = 256 bytes per cycle per SM if using 64-bit mode. Is it possible to use this 64-bit mode for L1 cache?

#1
Posted 04/30/2012 11:42 PM   
You can change the shared memory bank size with cudaDeviceSetSharedMemConfig(cudaSharedMemBankSizeEightByte), but I don't think there is an equivalent for L1.

Let me ask around.
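
For reference, here is a minimal sketch of how the shared memory call above might be used. The CHECK macro, the reverseDoubles kernel, and the sizes are hypothetical and only there for illustration. On devices where the bank size is not configurable the request may have no effect, so the sketch reads the setting back instead of assuming it took hold. It says nothing about L1.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            fprintf(stderr, "%s failed: %s\n", #call,                   \
                    cudaGetErrorString(err_));                          \
            exit(EXIT_FAILURE);                                         \
        }                                                               \
    } while (0)

// Hypothetical kernel: stages one double per thread in shared memory and
// writes the block back in reverse order, so every access is 8 bytes wide.
__global__ void reverseDoubles(const double* in, double* out)
{
    __shared__ double tile[256];
    int t = threadIdx.x;
    int base = blockIdx.x * blockDim.x;
    tile[t] = in[base + t];
    __syncthreads();  // reached by all threads of the block
    out[base + t] = tile[blockDim.x - 1 - t];
}

int main()
{
    // Request 8-byte shared memory banks for subsequent kernel launches.
    CHECK(cudaDeviceSetSharedMemConfig(cudaSharedMemBankSizeEightByte));

    // Read the setting back to see what the device actually uses.
    cudaSharedMemConfig cfg;
    CHECK(cudaDeviceGetSharedMemConfig(&cfg));
    printf("8-byte banks active: %s\n",
           cfg == cudaSharedMemBankSizeEightByte ? "yes" : "no");

    const int n = 256;  // one block of 256 threads (assumed size)
    double *in, *out;
    CHECK(cudaMalloc(&in, n * sizeof(double)));
    CHECK(cudaMalloc(&out, n * sizeof(double)));
    CHECK(cudaMemset(in, 0, n * sizeof(double)));

    reverseDoubles<<<1, n>>>(in, out);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    CHECK(cudaFree(in));
    CHECK(cudaFree(out));
    return 0;
}
```

The 256 bytes per cycle figure from the question is just 32 banks times 8 bytes per bank, so it would only be reachable when each thread in a warp reads a distinct 8-byte word such as a double.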

#2
Posted 05/01/2012 02:15 AM   
Any results from asking?

Always check return codes of CUDA calls for errors. Do not use __syncthreads() in conditional code unless the condition is guaranteed to evaluate identically for all threads of each block. Run your program under cuda-memcheck to detect stray memory accesses. If your kernel dies for larger problem sizes, it might exceed the runtime limit and trigger the watchdog timer.

#3
Posted 03/30/2013 11:08 AM   