Barra, a CUDA-capable GPU simulator
If that is true, then according to my data, does it mean that the latency of getting an operand from the constant cache to the ALU is less than 4 cycles, and that getting one from shared memory takes about 8 cycles?