Using dual-issue in Fermi
Hello, I'm planning to benchmark a kernel. How do I dual-issue a MAD and a MUL? I tried issuing a MUL right after a MAD, but the GFLOP/s seems to decrease.
Is there a way I could somehow direct the MUL to the SFUs?
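
For reference, here's a minimal sketch of the kind of benchmark loop I mean (the kernel name, launch parameters, and constants are just illustrative; __fmul_rn() is there to keep the MUL from being contracted into another MAD):

[code]
// Illustrative microbenchmark: a dependent MAD chain interleaved with an
// independent MUL chain, in the hope that the hardware dual-issues them.
__global__ void madmul_bench(float *out, float a, float b, int iters)
{
    float x = a;  // accumulator for the MAD chain
    float y = b;  // accumulator for the MUL chain
    for (int i = 0; i < iters; ++i) {
        x = x * a + b;        // compiles to a MAD/FMA
        y = __fmul_rn(y, a);  // __fmul_rn keeps this a separate MUL
    }
    // Write the results out so the compiler cannot eliminate the loop.
    out[blockIdx.x * blockDim.x + threadIdx.x] = x + y;
}
[/code]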

Thanks in advance

#1
Posted 03/09/2012 10:18 AM   
I don't think compute capability 2.0 devices are capable of issuing a MUL to the special function units in parallel with a MAD on the FPUs/cores; that was a property of 1.x devices. Compute capability 2.1 devices, of course, can issue a MUL to their extra set of cores in parallel with a MAD.

You might have to play a bit with context, alignment, and operands because of the limited register file bandwidth. I have to admit I never tried in earnest myself, as my algorithms usually have a MUL/ADD ratio of 1, not 2.
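
In case it's useful, here is a quick sketch of how to tell which case you are in at runtime with the standard device query (device 0 and the per-SM core counts in the comments are my assumptions about the parts discussed here):

[code]
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties: %s\n", cudaGetErrorString(err));
        return 1;
    }
    // 2.0 (GF100/GF110): 32 cores per SM, no extra issue port for a MUL.
    // 2.1 (GF104 etc.):  48 cores per SM, can take a MUL alongside a MAD.
    printf("Compute capability %d.%d (%s)\n", prop.major, prop.minor, prop.name);
    return 0;
}
[/code]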

#2
Posted 03/09/2012 05:05 PM   
[quote name='tera' date='09 March 2012 - 10:35 PM' timestamp='1331312744' post='1380614']
I don't think compute capability 2.0 devices are capable of issuing a MUL to the special function units in parallel with a MAD on the FPUs/cores; that was a property of 1.x devices. Compute capability 2.1 devices, of course, can issue a MUL to their extra set of cores in parallel with a MAD.

You might have to play a bit with context, alignment, and operands because of the limited register file bandwidth. I have to admit I never tried in earnest myself, as my algorithms usually have a MUL/ADD ratio of 1, not 2.
[/quote]

Yeah. According to [url="http://www.eecg.toronto.edu/~moshovos/CUDA08/arx/microbenchmark_report.pdf"]this[/url] paper it's possible on GT200 using math intrinsics. I was wondering if there is something like that for Fermi?
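
For context, this is roughly how I get the GFLOP/s figure, using event-based timing. A minimal sketch; the kernel body matches the one in my first post, and the launch configuration and iteration count are placeholders:

[code]
#include <cstdio>
#include <cuda_runtime.h>

// Same shape as the kernel sketched in the first post.
__global__ void madmul_bench(float *out, float a, float b, int iters)
{
    float x = a, y = b;
    for (int i = 0; i < iters; ++i) {
        x = x * a + b;        // MAD: 2 flops
        y = __fmul_rn(y, a);  // MUL: 1 flop
    }
    out[blockIdx.x * blockDim.x + threadIdx.x] = x + y;
}

int main()
{
    const int blocks = 120, threads = 256, iters = 100000;
    float *d_out;
    cudaMalloc(&d_out, blocks * threads * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    madmul_bench<<<blocks, threads>>>(d_out, 1.0001f, 0.9999f, iters);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    // 3 flops (MAD + MUL) per thread per iteration.
    double flops = 3.0 * (double)iters * blocks * threads;
    printf("%.1f GFLOP/s over %.2f ms\n", flops / (ms * 1e-3) / 1e9, ms);

    cudaFree(d_out);
    return 0;
}
[/code]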

#3
Posted 03/09/2012 05:17 PM   
Isn't the dual issue to two different warps?
You seem to be thinking of dual-issuing like a superscalar CPU...

#4
Posted 03/10/2012 09:53 PM   
Yes, that is indeed what we are thinking. The two schedulers issuing instructions from two independent warps come on top of that, for (theoretically) up to four instructions issued in parallel per SM.

I'm not aware, though, that this has actually been demonstrated on GF100; both GT200 and GF104 are capable of this kind of dual-issue.


#5
Posted 03/10/2012 10:42 PM   