Using dual-issue in Fermi

Hello, I’m planning to benchmark a kernel. How do I dual-issue a MAD and a MUL? I tried placing a MUL operation after a MAD, but the GFLOP/s figure seems to decrease.
Is there a way I could direct an instruction to the SFU?

Thanks in advance

I don’t think compute capability 2.0 devices are capable of issuing a MUL to the special function units in parallel to a MAD on the FPUs/cores. This was a property of 1.x devices. Compute capability 2.1 devices of course can issue a MUL to their extra set of cores in parallel to a MAD.
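
Since the answer depends on the compute capability, it is worth checking what the card reports before benchmarking. A minimal sketch: `may_pair_mad_mul` is a hypothetical helper that only encodes the claims made in this thread (1.x and 2.1 yes, 2.0 no), not anything from NVIDIA's documentation.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: encodes only what is claimed in this thread.
// 1.x: MUL can go to the SFUs next to a MAD; 2.1: extra set of cores;
// 2.0: no such pairing expected. Anything else is not discussed here.
static bool may_pair_mad_mul(int major, int minor) {
    if (major == 1) return true;
    if (major == 2 && minor == 1) return true;
    return false;
}

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        printf("no CUDA device found\n");
        return 0;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("device %d: %s, compute capability %d.%d, %d SMs, MAD+MUL pairing %s\n",
               dev, prop.name, prop.major, prop.minor, prop.multiProcessorCount,
               may_pair_mad_mul(prop.major, prop.minor) ? "plausible" : "not expected");
    }
    return 0;
}
```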

You might have to play a bit with context, alignment, and operands due to limited register file bandwidth. I’ve got to admit I never tried this myself in earnest, as my algorithms usually have a MUL/ADD ratio of 1, not 2.
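
For experimenting, something along these lines could serve as a starting point. It is only a sketch, not a verified way to get dual-issue: the kernel name `mad_mul`, the helper `total_flops`, the launch configuration, and the constants are all made up for illustration. Each iteration has one multiply-add (counted as 2 flops) and one MUL (1 flop) on two independent chains, so the hardware is at least free to pair them. `__fmul_rn` is used for the MUL because the compiler never merges it into a multiply-add.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Two independent dependency chains per thread: a (multiply-add) and b (MUL).
// k and c are kernel arguments so the compiler cannot fold them away.
__global__ void mad_mul(float* out, float seed, float k, float c, int iters) {
    float a = seed + threadIdx.x;  // multiply-add chain
    float b = seed * 0.5f;         // MUL chain
    #pragma unroll 16
    for (int i = 0; i < iters; ++i) {
        a = a * k + c;             // contracted to MAD (1.x) or FMA (2.x)
        b = __fmul_rn(b, k);       // plain MUL, never merged with an add
    }
    // Store the result so the loop is not optimized out.
    out[blockIdx.x * blockDim.x + threadIdx.x] = a + b;
}

// Flop count under the convention multiply-add = 2 flops, MUL = 1 flop.
static double total_flops(int blocks, int threads, int iters) {
    return 3.0 * blocks * threads * iters;
}

int main() {
    const int blocks = 120, threads = 256, iters = 1 << 14;  // arbitrary sizes
    float* d_out = NULL;
    if (cudaMalloc((void**)&d_out, blocks * threads * sizeof(float)) != cudaSuccess) {
        printf("no usable CUDA device\n");
        return 0;
    }
    cudaEvent_t t0, t1;
    cudaEventCreate(&t0);
    cudaEventCreate(&t1);

    mad_mul<<<blocks, threads>>>(d_out, 1.0f, 0.999f, 0.001f, iters);  // warm-up
    cudaEventRecord(t0);
    mad_mul<<<blocks, threads>>>(d_out, 1.0f, 0.999f, 0.001f, iters);
    cudaEventRecord(t1);
    cudaEventSynchronize(t1);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, t0, t1);
    // flops / (ms * 1e-3 s) / 1e9 = flops / (ms * 1e6)
    printf("%.1f GFLOP/s\n", total_flops(blocks, threads, iters) / (ms * 1e6));

    cudaEventDestroy(t0);
    cudaEventDestroy(t1);
    cudaFree(d_out);
    return 0;
}
```

Comparing this against a variant with the `b` line removed (and the flop count changed to 2 per iteration) should show whether the MUL comes for free or costs an issue slot. It is also worth dumping the machine code with `cuobjdump -sass` to confirm that the loop really contains the instruction mix you intended.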

Yeah. According to this paper it’s possible on GT200 using mathematical intrinsics. I was wondering if there is something like that for Fermi?

Isn’t the dual issue for two different warps?
You seem to be thinking of dual-issuing as in a superscalar CPU…

Yes, that is indeed what we are thinking. The two schedulers issuing instructions from two independent warps come on top of that, for (theoretically) up to 4 instructions issued in parallel per SM.

I’m not aware that this has actually been demonstrated on GF100, though; both GT200 and GF104 are capable of this kind of dual-issue.