Titan V FP16 Performance

Can someone from NVIDIA provide a solid spec for the Titan V’s FP16 performance?

I’ve seen 15TFLOPS FP32 and 110TFLOPS using the Tensor Cores, but no spec in the marketing materials for FP16.

FP16 (not using Tensor Cores) should run at double the FP32 rate on this V100-based product. This is a characteristic of the V100 device, and it is shared by all other GPUs with full-rate FP16 throughput (i.e. sm_53, sm_60, sm_62, sm_70). This general principle is documented here:

[url]Programming Guide :: CUDA Toolkit Documentation[/url]
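A minimal sketch of what "double the FP32 rate" means in practice: the 2x FP16 throughput on full-rate devices comes from packed `half2` operations, where one instruction processes two FP16 values. The kernel and names below are illustrative, not from the thread.

```cuda
#include <cuda_fp16.h>

// Illustrative kernel: elementwise add on packed FP16 pairs.
// __hadd2 performs two FP16 additions in a single instruction,
// which is how the 2x-over-FP32 throughput figure is reached.
__global__ void add_fp16(const __half2* a, const __half2* b,
                         __half2* c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        c[i] = __hadd2(a[i], b[i]);
    }
}
```

Scalar `__half` arithmetic is also supported on sm_70, but the doubled rate assumes the vectorized `half2` path.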

Thanks for the response.

I suspected as much, but wanted confirmation for the Titan V, since previous GTX incarnations did not have full-rate FP16 support like their datacenter counterparts (e.g. Titan Xp vs. P100).

Can you confirm that the Titan V has native FP16?

It’s confirmed here: there’s a DP4A hardware instruction on the V100 chip.

dp4a has nothing to do with FP16 computation. You are thinking of INT8.
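To make the distinction concrete: `__dp4a` is an INT8 dot-product intrinsic (available on sm_61 and later), not an FP16 operation. A hedged sketch, with illustrative names:

```cuda
// Each int argument packs four signed 8-bit values. __dp4a computes
// a0*b0 + a1*b1 + a2*b2 + a3*b3 + acc in one instruction, accumulating
// into a 32-bit integer. No FP16 is involved anywhere.
__global__ void dot_int8(const int* a, const int* b, int* out)
{
    out[0] = __dp4a(a[0], b[0], 0);
}
```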

The Titan V is a compute capability 7.0 device.

[url]https://www.reddit.com/user/hellotanjent[/url]
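One way to confirm the compute capability yourself is a runtime device query; on a Titan V this should report 7.0. A minimal sketch:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;
    // Query device 0; adjust the index for multi-GPU systems.
    cudaGetDeviceProperties(&prop, 0);
    printf("%s: compute capability %d.%d\n",
           prop.name, prop.major, prop.minor);
    return 0;
}
```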

The FP16 throughput (not using TensorCore) for compute capability 7.0 is given in the table I already linked:

[url]Programming Guide :: CUDA Toolkit Documentation[/url]

Oh, you’re right. Sorry about the misinformation. I’ve been focusing too much on integer stuff recently.