How fast is 64-bit integer arithmetic on the latest GPUs?

Hi, all!

I am trying to estimate the performance of a large computation that involves a lot of 64-bit integer arithmetic.
I could find peak double-precision performance figures for the latest GPUs on NVIDIA's site, but nothing about 64-bit integer arithmetic.

Could somebody please point me to performance figures for 64-bit integer operations?

64-bit integer operations are emulated on all NVIDIA GPUs. Their exact performance differs by GPU architecture and could even differ based on code context. It would therefore be best to measure the performance in actual context, rather than relying on estimates.
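
If it helps as a starting point, below is a minimal timing sketch, not a rigorous benchmark: it times a kernel that runs a long dependent chain of 64-bit add / xor / shift operations against the same chain done in 32 bits. The kernel names, grid size, and iteration count are arbitrary choices for illustration, and the measured ratio will only approximate the ratio for your real code.

```
#include <cstdint>
#include <cstdio>

#define ITER (1 << 18)   // dependent operations per thread (arbitrary)

__global__ void chain64(uint64_t *out, uint64_t seed)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    uint64_t x = seed + tid;
    for (int i = 0; i < ITER; ++i)
        x = (x + 0x9e3779b97f4a7c15ull) ^ (x >> 7);   // dependent add / xor / shift
    out[tid] = x;                                     // keep the result live
}

__global__ void chain32(uint32_t *out, uint32_t seed)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    uint32_t x = seed + tid;
    for (int i = 0; i < ITER; ++i)
        x = (x + 0x9e3779b9u) ^ (x >> 7);
    out[tid] = x;
}

int main(void)
{
    const int blocks = 256, threads = 256, n = blocks * threads;
    uint64_t *d64;  cudaMalloc(&d64, n * sizeof(uint64_t));
    uint32_t *d32;  cudaMalloc(&d32, n * sizeof(uint32_t));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);  cudaEventCreate(&stop);
    float ms64 = 0.0f, ms32 = 0.0f;

    chain64<<<blocks, threads>>>(d64, 1);   // warm-up
    cudaEventRecord(start);
    chain64<<<blocks, threads>>>(d64, 1);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms64, start, stop);

    chain32<<<blocks, threads>>>(d32, 1);   // warm-up
    cudaEventRecord(start);
    chain32<<<blocks, threads>>>(d32, 1);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms32, start, stop);

    printf("64-bit chain: %.3f ms   32-bit chain: %.3f ms   ratio: %.2f\n",
           ms64, ms32, ms64 / ms32);

    cudaFree(d64);  cudaFree(d32);
    cudaEventDestroy(start);  cudaEventDestroy(stop);
    return 0;
}
```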

To estimate the performance, consider that 64-bit integer addition, subtraction, negation, and logical operations are each emulated with two 32-bit arithmetic or logic instructions. A 64-bit multiply is an inlined instruction sequence roughly equivalent to four 32-bit integer multiplies for the low-order 64 bits of the result, and about twice that for the high-order 64 bits, i.e. __umul64hi(). 64-bit integer division and modulo are called subroutines roughly equivalent to forty 32-bit multiplies plus forty 32-bit arithmetic / logic instructions.
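
As a rough illustration (this is a hand-written sketch, not the compiler's actual emulation sequences, and the helper names emul_add64 and emul_mul64_lo are made up), here is how a 64-bit add and the low 64 bits of a 64-bit multiply decompose into 32-bit operations:

```
#include <cstdint>

__device__ uint64_t emul_add64(uint64_t a, uint64_t b)
{
    uint32_t alo = (uint32_t)a, ahi = (uint32_t)(a >> 32);
    uint32_t blo = (uint32_t)b, bhi = (uint32_t)(b >> 32);
    uint32_t rlo, rhi;
    // two 32-bit instructions: add with carry-out, then add with carry-in
    asm("add.cc.u32 %0, %2, %4;\n\t"
        "addc.u32   %1, %3, %5;"
        : "=r"(rlo), "=r"(rhi)
        : "r"(alo), "r"(ahi), "r"(blo), "r"(bhi));
    return ((uint64_t)rhi << 32) | rlo;
}

__device__ uint64_t emul_mul64_lo(uint64_t a, uint64_t b)
{
    uint32_t alo = (uint32_t)a, ahi = (uint32_t)(a >> 32);
    uint32_t blo = (uint32_t)b, bhi = (uint32_t)(b >> 32);
    // roughly four 32-bit multiplies for the low-order 64 bits of the product
    uint32_t lo = alo * blo;               // mul.lo
    uint32_t hi = __umulhi(alo, blo);      // mul.hi
    hi += alo * bhi;                       // mul.lo + add
    hi += ahi * blo;                       // mul.lo + add
    return ((uint64_t)hi << 32) | lo;
}
```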

[Later:]

I noticed I left out shift operations. On recent GPUs these are equivalent to about four 32-bit integer adds for the general case, due to the presence of a funnel shifter. On older GPUs they were roughly equivalent to ten 32-bit integer arithmetic / logic instructions for the general case.
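
For reference, a 64-bit shift with a variable count can be composed from the 32-bit funnel-shift intrinsic along these lines; this is only a sketch of the idea (the name shl64 is made up, and the compiler's own emulation uses a short branchless sequence rather than the if/else shown here):

```
#include <cstdint>

// 64-bit left shift built from the 32-bit funnel shifter
// (__funnelshift_l requires sm_32 or newer)
__device__ uint64_t shl64(uint64_t a, unsigned int s)   // assumes 0 <= s <= 63
{
    uint32_t lo = (uint32_t)a, hi = (uint32_t)(a >> 32);
    uint32_t nlo, nhi;
    if (s < 32) {
        nhi = __funnelshift_l(lo, hi, s);   // upper 32 bits of (hi:lo) << s
        nlo = lo << s;
    } else {
        nhi = lo << (s - 32);
        nlo = 0;
    }
    return ((uint64_t)nhi << 32) | nlo;
}
```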

Hi, njuffa,

Thank you for the clear explanation.
With this information I should be able to make a rough estimate of 64-bit integer performance.
That is helpful enough for my current purpose.
Cheers.