Running HPC Benchmarks on a Jetson TK1 cluster

I would like to run HPL 2.1, MILC7, and HPCG 3.0 on a small cluster of NVIDIA Jetson TK1s. NVIDIA provides a binary of HPCG 3.0, but it appears to be built for Fermi-class GPUs on 64-bit Linux. NVIDIA also provides a complete installation package for HPL 2.0, including makefiles, but it is likewise described as targeting 64-bit Linux and Fermi GPUs. So I am not sure how to get CUDA support for these two benchmarks on the embedded boards. Does anyone know whether NVIDIA can provide binaries for our little Jetsons? Otherwise, does anyone have an idea how to build these benchmarks with CUDA enabled?

MILC7 uses the QUDA library for GPU acceleration; I am building it now.

In my attempts to run HPL on a cluster of Jetson TK1s with Open MPI 1.10.2 built with CUDA support, I have discovered the following:

You cannot simply replace your standard BLAS libraries with cuBLAS.

The HPL benchmark code provided by NVIDIA [url]https://developer.nvidia.com/accelerated-computing-developer-program-home[/url] is for 64-bit x86 and Fermi cards; I do not think it will work on 32-bit ARMv7. I was not able to build it. Any suggestions would be appreciated.
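
A quick way to confirm this kind of architecture mismatch is to read the ELF header of a downloaded binary. The following is only a minimal Python sketch: the function names `describe_elf_header` and `describe_elf_file` are hypothetical helpers made up for illustration, not part of any NVIDIA or HPL tooling, and the machine table covers just the few architectures relevant here.

```python
import struct

# e_machine codes from the ELF specification (only a few common ones listed)
ELF_MACHINES = {0x03: "x86", 0x28: "ARM (32-bit)", 0x3E: "x86-64", 0xB7: "AArch64"}
# EI_CLASS codes: 1 = 32-bit objects, 2 = 64-bit objects
ELF_CLASSES = {1: "32-bit", 2: "64-bit"}


def describe_elf_header(header: bytes) -> tuple:
    """Return (word size, machine) from the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    word_size = ELF_CLASSES.get(header[4], "unknown")
    # EI_DATA (byte 5): 1 = little-endian, 2 = big-endian
    byte_order = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at byte offset 18
    (machine_id,) = struct.unpack_from(byte_order + "H", header, 18)
    return word_size, ELF_MACHINES.get(machine_id, hex(machine_id))


def describe_elf_file(path: str) -> tuple:
    """Read the header of the file at `path` and describe it."""
    with open(path, "rb") as f:
        return describe_elf_header(f.read(20))


if __name__ == "__main__":
    import sys

    # Usage: python elfcheck.py <binary> [<binary> ...]
    for path in sys.argv[1:]:
        print(path, describe_elf_file(path))
```

Run on the Jetson against a downloaded benchmark binary, a 64-bit / x86-64 result would mean the binary cannot execute on the TK1's 32-bit ARM CPU regardless of which GPU it targets; a natively built binary should report 32-bit / ARM. The standard `file` utility reports the same information.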

Here is another thread where someone is attempting the same thing:
[url]https://devtalk.nvidia.com/default/topic/908451/?offset=5#4802042[/url]