With that proviso, I don't think your question makes much sense. How do you convert two 32-bit floating point numbers into a single 64-bit floating point number to do an add?

Are you talking about integer arithmetic?

Instructions that natively operate on 64-bit data (double-precision add, multiply, and FMA as mentioned by txbob; conversions to double precision; conversions from floating-point to 64-bit integer; load/store instructions using 64-bit addresses) use aligned register pairs consisting of consecutive even/odd register numbers, such as R2:R3. 64-bit integer operations are emulated via 32-bit integer operations, using two 32-bit registers for each of the operands.
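To make the emulation concrete, here is a minimal host-side C++ sketch of a 64-bit add built from 32-bit operations. The struct `U64Pair` and the function `add64_emulated` are hypothetical names used only for illustration; the actual GPU code uses an add followed by an add-with-carry on two registers, which this merely mimics.

```cpp
#include <cstdint>

// A 64-bit value held as two 32-bit halves, like a register pair.
struct U64Pair {
    std::uint32_t lo;  // bits 0..31
    std::uint32_t hi;  // bits 32..63
};

// 64-bit add emulated with 32-bit adds plus an explicit carry.
U64Pair add64_emulated(U64Pair a, U64Pair b) {
    U64Pair r;
    r.lo = a.lo + b.lo;                         // wraps modulo 2^32
    std::uint32_t carry = (r.lo < a.lo) ? 1u : 0u;  // wrapped => carry out
    r.hi = a.hi + b.hi + carry;                 // carry into the high half
    return r;
}
```

For example, adding {lo = 0xFFFFFFFF, hi = 1} and {lo = 1, hi = 0} produces a carry out of the low half and gives {lo = 0, hi = 2}.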

Newbie here...

Generally speaking, a floating-point multiply returns the top-most bits of the full product, while an integer multiply returns the bottom-most bits of the full product. You may be able to produce the effect of an integer multiply by using denormals to represent integers; I haven't thought it through. A double-precision floating-point number can represent integers accurately only up to 2**53, so you can't even add full 64-bit integers.
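A small host-side C++ sketch of both points, in case it helps to see them in action. The function names `exactly_representable`, `mul_low`, and `mul_double` are made up for this illustration, and the code assumes ordinary IEEE-754 double precision with round-to-nearest-even.

```cpp
#include <cstdint>

// True if n survives a round trip through double unchanged.
// Assumes n < 2^63 so the conversion back to an integer is well defined.
bool exactly_representable(std::uint64_t n) {
    return static_cast<std::uint64_t>(static_cast<double>(n)) == n;
}

// Integer multiply: keeps the bottom 64 bits of the full product.
std::uint64_t mul_low(std::uint64_t a, std::uint64_t b) {
    return a * b;  // unsigned arithmetic wraps modulo 2^64
}

// Floating-point multiply: keeps roughly the top 53 bits of the product.
double mul_double(std::uint64_t a, std::uint64_t b) {
    return static_cast<double>(a) * static_cast<double>(b);
}
```

With a = b = 2**32 + 1 the full product is 2**64 + 2**33 + 1. `mul_low` returns 2**33 + 1 (the leading 2**64 is lost), while `mul_double` returns 2**64 + 2**33 (the trailing 1 is lost). Likewise, 2**53 round-trips through a double exactly but 2**53 + 1 does not.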

In any event, given that you have a Titan V in hand, why not simply prototype whatever it is you have in mind? Below is a small program that shows how we can do integer addition with double-precision add, without conversion overhead. Note that in terms of performance, this relies on GPUs handling denormals at full speed, which I am pretty sure is the case for all of NVIDIA's GPUs (but by all means, test that assumption).

Maybe what you have in mind is more like the following, where four integers are added pairwise using a single double-precision addition. Note that this requires avoiding overflow conditions (otherwise the lower portion of the mantissa addition will "bleed" into the higher portion, or the higher mantissa portion will "bleed" too much into the exponent field, causing incorrect results).
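A minimal host-side C++ sketch of both ideas follows. It is not the program originally posted, just an illustration under stated assumptions. The names `add_via_denormals`, `pack2`, and `add_pairs` and the 26-bit field layout are hypothetical choices. In a CUDA kernel the two `memcpy`-based helpers would presumably be replaced by the `__longlong_as_double` and `__double_as_longlong` intrinsics. The code assumes denormals are not flushed to zero (so no fast-math style compiler flags).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret bits without conversion; memcpy avoids aliasing problems.
static double bits_to_double(std::uint64_t u) {
    double d;
    std::memcpy(&d, &u, sizeof d);
    return d;
}

static std::uint64_t double_to_bits(double d) {
    std::uint64_t u;
    std::memcpy(&u, &d, sizeof u);
    return u;
}

// Integer add performed by a double-precision add on denormal bit patterns.
// Assumption: a, b < 2^51, so the sum stays below 2^52 (still a denormal).
std::uint64_t add_via_denormals(std::uint64_t a, std::uint64_t b) {
    assert(a < (1ULL << 51) && b < (1ULL << 51));
    return double_to_bits(bits_to_double(a) + bits_to_double(b));
}

// Hypothetical layout: two 26-bit fields inside the 52-bit mantissa.
constexpr int kFieldBits = 26;
constexpr std::uint64_t kFieldMask = (1ULL << kFieldBits) - 1;

// Pack two integers into one mantissa. Each must be below 2^25 so that
// a pairwise sum still fits in its 26-bit field (no "bleeding").
std::uint64_t pack2(std::uint32_t hi, std::uint32_t lo) {
    assert(hi < (1u << 25) && lo < (1u << 25));
    return (static_cast<std::uint64_t>(hi) << kFieldBits) | lo;
}

// One double-precision add yields both pairwise sums at once.
void add_pairs(std::uint64_t a, std::uint64_t b,
               std::uint32_t& hi_sum, std::uint32_t& lo_sum) {
    std::uint64_t s = double_to_bits(bits_to_double(a) + bits_to_double(b));
    lo_sum = static_cast<std::uint32_t>(s & kFieldMask);
    hi_sum = static_cast<std::uint32_t>(s >> kFieldBits);
}
```

For example, `add_via_denormals(123456789, 987654321)` gives 1111111110, and `add_pairs(pack2(1000, 2000), pack2(30, 40), hi, lo)` leaves `hi == 1030` and `lo == 2040`. The range checks are the price of the trick: once a field overflows, the carry lands in the neighboring field or the exponent and the unpacked results are wrong.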

As you said, perhaps it's best just to try...

My recommendation would be to write code in a clear and straightforward manner, and let the profiler guide any "normal" optimizations, then resort to "ninja" optimizations only if you absolutely have to. Even if your envisioned scheme proves to be beneficial on Volta, it could easily be detrimental on other architectures ("brittle" performance).