Would people from nVidia shed some light on 16-bit and 32-bit integer arithmetic performance? There is a lot of talk about floating-point performance in the documentation, but only a sentence or two on 16-bit and 32-bit integer arithmetic. I would like to know more about integer arithmetic latencies, since most of my code involves 16-bit and 32-bit operations, both signed and unsigned.
Thanks.
I am writing a number-crunching application using integers.
As far as I know, only Section 6.1.1.1 of the Programming Guide touches on this, saying that integer operations and 24-bit multiplication are about as fast as most floating-point operations.
I, too, would like to know how integer operations perform.
Wai, have you checked the *.ptx intermediate file? I think it would be interesting.
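For example (a minimal sketch; the file name test_mul.cu and the kernel are made up), compiling with "nvcc -ptx test_mul.cu" lets you compare the multiply instructions the compiler emits:

__global__ void mul_test(int *out, int a, int b)
{
    out[0] = a * b;           // appears in the PTX as mul.lo.s32, which the
                              // hardware reportedly expands into several instructions
    out[1] = __mul24(a, b);   // appears as a single mul24.lo.s32
}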
My application uses lots of lookup tables, and they might be its bottleneck.
I will post questions if I give up on tuning.
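In case it is useful, here is roughly the kind of thing I am experimenting with (a sketch only; the names c_lut and lookup_kernel are made up): staging a 256-entry table in shared memory so each lookup hits on-chip memory instead of uncached global memory.

__constant__ unsigned char c_lut[256];   // table uploaded from the host

__global__ void lookup_kernel(const unsigned char *in, unsigned char *out, int n)
{
    __shared__ unsigned char s_lut[256];

    // Each block cooperatively copies the table into shared memory once.
    for (int i = threadIdx.x; i < 256; i += blockDim.x)
        s_lut[i] = c_lut[i];
    __syncthreads();

    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
        out[idx] = s_lut[in[idx]];
}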
I would also be interested to hear whether anyone has noticed a performance improvement when compiling the .cubin with -fastimul (24-bit integer multiplies).
Peter
A:
int a = foo();
int b = bar();
int ab = a * b;
B:
int a = foo();
int b = bar();
int ab = __mul24(a,b);
Code B (or code A compiled with --fastimul) should definitely compile to fewer instructions than code A (with default compiler options).
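One caveat worth adding here (my note, not stated elsewhere in the thread; see the Programming Guide for the exact semantics): __mul24 multiplies only the least-significant 24 bits of each operand, so it only matches a full 32-bit multiply when both values fit in 24 bits. For example:

__global__ void mul24_caveat(int *out)
{
    int a = (1 << 24) + 1;     // 16777217: does not fit in 24 bits
    int b = 3;
    out[0] = a * b;            // 50331651
    out[1] = __mul24(a, b);    // 3: only the low 24 bits of 'a' take part
}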
On G80:
24-bit integer multiplies are full speed. 32-bit integer multiplies require a multi-instruction sequence.
32-bit float mul, add, and mad, and 32-bit integer add, shifts, and logic operations are full speed.
Full speed = 2 cycles per 32-thread warp.
Mark
Can you tell us what the penalty is in machine cycles?
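In the meantime, here is a rough way to estimate it (a sketch, untested; loop overhead is included, so treat the numbers as relative only, and launch with a single warp): time a dependent chain of 32-bit multiplies against a chain of __mul24 using the on-chip clock() counter.

__global__ void time_muls(int *data, unsigned int *cycles)
{
    int x = data[threadIdx.x];

    unsigned int start = clock();
    for (int i = 0; i < 64; ++i)
        x = x * x + i;            // dependent 32-bit multiply chain
    cycles[0] = clock() - start;

    start = clock();
    for (int i = 0; i < 64; ++i)
        x = __mul24(x, x) + i;    // dependent 24-bit multiply chain
    cycles[1] = clock() - start;

    data[threadIdx.x] = x;        // keep the compiler from removing the loops
}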