The source code for the CUDA accelerated Linpack is now available to all registered developers.
The code has been released under a BSD license.
A few remarks:
1) There is NO support for the code (the CUDA_LINPACK_README.txt has detailed instructions).
2) The code requires a Fermi card (it uses a fast DGEMM implementation written in Fermi assembler) with more than 2GB of memory (all the Tesla 20x0 will qualify).
3) The library that intercepts the DGEMM and DTRSM calls could easily be used in other codes that are DGEMM-intensive.
(1) Log into NVIDIA’s Registered Developer website at https://nvdeveloper.nvidia.com
(2) On the right-hand side of the starting page, you will see a column titled “Newest Documents And Downloads”
(3) The second item from the top is a link for “CUDA accelerated Linpack”
“2) The code requires a Fermi card (it uses a fast DGEMM implementation written in Fermi assembler) with more than 2GB of memory (all the Tesla 20x0 will qualify)”
I have tried this code on a small cluster with multiple GTX 580s/590s per node, 1.5GB each. It appears to work.
What problems would you expect with the smaller memory? Would performance be expected to improve with a larger GPU memory?
I also note that I am using smaller node memories. My impression is that adding more memory, either by increasing the number of nodes or by increasing the amount of memory on all nodes, will allow me to solve larger arrays, with a resulting higher FMAX. Can you use different memory sizes on individual nodes, or will it be limited to the size of the smallest node?
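To make the sizing arithmetic behind this question concrete, here is a rough sketch. It is not taken from the CUDA Linpack code. It assumes the usual HPL behaviour of spreading the N x N matrix of doubles evenly over all processes, so the node with the least memory sets the budget for every node, and it uses the common rule of thumb of filling about 80% of memory. The function name, the 80% fraction, and the block size of 128 are illustrative assumptions only.

```python
import math

BYTES_PER_DOUBLE = 8
GIB = 2 ** 30


def max_problem_size(node_mem_gib, fraction=0.8, block_size=128):
    """Estimate the largest HPL problem size N for a set of nodes.

    node_mem_gib: memory of each node in GiB (one entry per node).
    fraction:     share of memory given to the matrix (assumed rule of thumb).
    block_size:   N is rounded down to a multiple of this (assumed value).
    """
    # With an even distribution, every node holds the same share of the
    # matrix, so the smallest node limits what all nodes can hold.
    usable_bytes = fraction * min(node_mem_gib) * len(node_mem_gib) * GIB
    # The matrix has N * N doubles, so N is about sqrt(usable_bytes / 8).
    n = math.isqrt(int(usable_bytes // BYTES_PER_DOUBLE))
    return (n // block_size) * block_size


if __name__ == "__main__":
    print("4 nodes x 8 GiB:       N =", max_problem_size([8, 8, 8, 8]))
    print("3 x 8 GiB + 1 x 4 GiB: N =", max_problem_size([8, 8, 8, 4]))
    print("8 nodes x 8 GiB:       N =", max_problem_size([8] * 8))
```

Under these assumptions a mixed cluster behaves as if every node had the smallest memory size, while adding nodes or adding memory to all nodes raises N.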
I have registered on the nvdeveloper zone, yet it does not give me access. I am looking to download and run the Linpack code that was shown in the presentation by E. Phillips and M. Fatica. Can someone please help me here? Thanks!
I have the same problem. I have created the account, but the link provided here won’t recognise my login and password. I think it points to an old developer’s site, hence this issue.
Could you please provide the instructions on how to obtain the GPU LINPACK code on the new (current) developers site?
This paper describes the use of CUDA to accelerate the Linpack benchmark on heterogeneous clusters, where both CPUs and GPUs are used in synergy with minor or no modifications to the original source code. A host library intercepts the calls to DGEMM and DTRSM and executes them simultaneously on both GPUs and CPU cores. An 8U cluster is able to sustain more than a Teraflop using a CUDA accelerated version of HPL.
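One way such a host-side split can work is sketched below in plain Python. This is an illustration of the idea only, not the released library: it divides the columns of B and C into a "GPU" share and a "CPU" share, computes C = alpha*A*B + beta*C on each share, and stitches the results back together. Both shares run one after the other here, whereas a real interception library would run them concurrently (for example a device BLAS call for one share and a host BLAS call for the other). The names `dgemm`, `split_dgemm`, and `gpu_fraction` are hypothetical.

```python
def dgemm(alpha, a, b, beta, c):
    """Reference C = alpha*A*B + beta*C on matrices stored as lists of rows."""
    inner = len(b)
    n_cols = len(b[0]) if b else 0
    return [
        [alpha * sum(a[i][k] * b[k][j] for k in range(inner)) + beta * c[i][j]
         for j in range(n_cols)]
        for i in range(len(a))
    ]


def split_dgemm(alpha, a, b, beta, c, gpu_fraction=0.75):
    """Compute the same result as dgemm, split column-wise into two shares.

    gpu_fraction is the assumed share of columns handed to the faster
    device; tuning it balances the two sides.
    """
    n_cols = len(b[0])
    n_gpu = int(n_cols * gpu_fraction)

    # Column blocks of B and C for each side; A is needed in full by both.
    b_gpu = [row[:n_gpu] for row in b]
    b_cpu = [row[n_gpu:] for row in b]
    c_gpu = [row[:n_gpu] for row in c]
    c_cpu = [row[n_gpu:] for row in c]

    gpu_part = dgemm(alpha, a, b_gpu, beta, c_gpu)  # stand-in for the GPU share
    cpu_part = dgemm(alpha, a, b_cpu, beta, c_cpu)  # stand-in for the CPU share

    # Concatenate the two column blocks row by row.
    return [g + h for g, h in zip(gpu_part, cpu_part)]


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0, 7.0], [8.0, 9.0, 10.0]]
    c = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
    print(split_dgemm(2.0, a, b, 0.5, c, gpu_fraction=0.67))
```

Because each column of the result depends only on A and the matching column of B and C, the two shares are independent, which is what makes running them at the same time possible.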
(1) Log in at User account | NVIDIA Developer
(2) Click on the green link “CUDA/GPU Computing Registered Developer Program”
(3) Look for “CUDA Accelerated Linpack”, and click the link there
(4) Click the green “I accept” string at the bottom of the license agreement
At this point a download window should pop up. I just exercised the entire process using my own registered developer account, so it should work.
I just exercised it with my own developer account and it worked perfectly. I would really like to have a look at this superb piece of code and its performance level!