Best Maxwell compute density and $/flop

I am trying to do some initial speccing of a compute box for a project where I want to cram as much Maxwell compute capability into a box as possible, as cheaply as possible. Normally we use a single Titan X for this, but for the next upgrade we need more power!

My gut says that for price/performance the 980 (non-Ti) may be the best bet, but I am slightly nervous about the memory bus width, as a lot of what we do is memory bound. Is there a way to tweak the bus width on a Titan X to get a comparable performance measurement? (Unlikely, but worth a punt.)
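For reference, the kind of measurement I mean is effective device memory bandwidth. A minimal sketch of the micro-benchmark we would run on each candidate card (assuming the CUDA toolkit; the buffer size and repetition count are arbitrary):

[code]
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 256ull << 20;  // 256 MiB per buffer (arbitrary size)
    float *src = NULL, *dst = NULL;
    cudaMalloc((void**)&src, bytes);
    cudaMalloc((void**)&dst, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    const int reps = 20;
    cudaEventRecord(start);
    for (int i = 0; i < reps; ++i)
        cudaMemcpy(dst, src, bytes, cudaMemcpyDeviceToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    // Each device-to-device copy reads and writes 'bytes', hence the factor of 2.
    double gbps = 2.0 * bytes * reps / (ms * 1e-3) / 1e9;
    printf("Effective device-to-device bandwidth: %.1f GB/s\n", gbps);

    cudaFree(src);
    cudaFree(dst);
    return 0;
}
[/code]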

Anyone have any recommendations for parts for a compute box, or someone (preferably UK/Europe based) who can make good custom compute boxes?

Cheers,
Tiomat

980 Ti has better GFLOPS/$ than 980.
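As a rough sanity check, using launch MSRPs and reference clocks (approximate figures; street prices will shift the ratios): the GTX 980 is about 4.6 TFLOPS single precision for $549, roughly 8.4 GFLOPS/$; the 980 Ti is about 5.6 TFLOPS for $649, roughly 8.7 GFLOPS/$; the Titan X is about 6.1 TFLOPS for $999, roughly 6.1 GFLOPS/$. For memory-bound work the gap is bigger still: the 980 Ti has the same 384-bit bus and ~336 GB/s of bandwidth as the Titan X, versus 224 GB/s on the 980.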

If this is for a particular publicly available app, it might be a good idea to check application-specific websites, mailing lists, or forums for recommendations. For example, I am aware that the AMBER developers provide specific hardware recommendations [url]http://ambermd.org/gpus/recommended_hardware.htm[/url] as well as benchmark results for various GPUs.

Unfortunately it is for a custom set of kernels (ptychography) that are very FFT intensive, which makes them heavily memory-bandwidth limited. I had not realised the size of the performance jump from the 980 to the 980 Ti, so I had naively assumed that the extra £100 wouldn’t be justifiable. It looks like 980 Tis may be the best price/performance point, plus that ‘solves’ the memory bandwidth problem. Cheers.
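To give a flavour of the workload, a minimal sketch of the batched 2D FFTs that dominate our runtime (illustrative only: the 256×256 tile size and batch count are made up, and our real code wraps cuFFT with error checking):

[code]
#include <cstdio>
#include <cuda_runtime.h>
#include <cufft.h>

int main() {
    // Illustrative sizes: many small 2D transforms, as in ptychography.
    const int nx = 256, ny = 256, batch = 1024;
    int n[2] = { nx, ny };

    cufftComplex* data = NULL;
    cudaMalloc((void**)&data, sizeof(cufftComplex) * nx * ny * batch);

    // Batched 2D complex-to-complex plan over contiguous, packed tiles.
    cufftHandle plan;
    cufftPlanMany(&plan, 2, n,
                  NULL, 1, nx * ny,   // input layout (packed)
                  NULL, 1, nx * ny,   // output layout (packed)
                  CUFFT_C2C, batch);

    cufftExecC2C(plan, data, data, CUFFT_FORWARD);  // in-place forward
    cufftExecC2C(plan, data, data, CUFFT_INVERSE);  // and back
    cudaDeviceSynchronize();

    cufftDestroy(plan);
    cudaFree(data);
    return 0;
}
[/code]

At these sizes throughput tracks memory bandwidth far more closely than peak FLOPS, which is why the 384-bit bus matters to us.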

As with all “certified” solutions, you’re paying the premium for the support, not the hardware. Another notorious example is the NVIDIA DIGITS DevBox: it’s priced at $15,000, but if you purchase the individual parts and assemble it yourself, it only costs ~$7,000.

In all fairness, the AMBER web page I pointed to also has recommendations for do-it-yourselfers at the bottom, in addition to the pointers to certified solutions.

As to the rationale behind certified solutions: In various contexts, the ability to have “someone to yell at when things do not work” is more important than the absolutely lowest hardware cost.

How many cards do you plan to place in a box?

Currently I have more of a performance target to hit (a certain amount of work per unit time) than a fixed card count, but it will likely be >4, so being able to fit 8 in a single box would be great. My current thinking is something along the lines of a 3U rack server with space for 8 dual-slot GPUs, but finding someone who can build one with GeForce cards in it is quite hard. My fallback is sourcing the components and doing it myself, but then I’d be shifting the development risk onto myself.
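For what it’s worth, whichever route we take, the first sanity check on a new multi-GPU box would just be enumerating the devices and their memory configuration. A minimal sketch using the CUDA runtime (the derived peak-bandwidth figure assumes a DDR-type memory clock, hence the factor of 2):

[code]
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    printf("Found %d CUDA device(s)\n", count);

    for (int i = 0; i < count; ++i) {
        cudaDeviceProp p;
        cudaGetDeviceProperties(&p, i);
        // memoryClockRate is in kHz, memoryBusWidth in bits; x2 for DDR.
        double peakGBs = 2.0 * (p.memoryClockRate * 1e3) * (p.memoryBusWidth / 8.0) / 1e9;
        printf("GPU %d: %s, %d-bit bus, ~%.0f GB/s peak, %.1f GiB\n",
               i, p.name, p.memoryBusWidth, peakGBs,
               p.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
    }
    return 0;
}
[/code]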

I suppose the next question then is: what total number of devices, more or less?

Four cards per motherboard seems to be a pivot point; beyond that you generally move to rack servers.
You then gain density, at the cost of price and likely a more sophisticated cooling system/design.

And the cost/density distribution seems skewed.