Barra, a CUDA-capable GPU simulator
If that is true, then according to my data, does it mean that the latency of getting an operand from the constant cache to the ALU is less than 4 cycles, and that getting one from shared memory takes about 8 cycles?
Copyright © 2014 NVIDIA Corporation