hpl-2.0_cuda tuning

Hello everyone.

I want to use hpl-2.0_cuda to evaluate the performance of my cluster. The cluster consists of 8 nodes, each with one Intel Xeon Nehalem W3520 CPU (2.66 GHz, 8 MB cache, 4.8 GT/s) and one Tesla C1060. As far as I know, a block size of NB=960 is the best. Now I want to choose a problem size N.

If I set N to values such as 10000, 20000, or 30000 (NB=960, P x Q = 2 x 4, 8 processes in total), cudasetmatrix() fails to execute. Is there any rule for tuning the problem size N?

Thank you!
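
For the question about choosing N, a common HPL rule of thumb is to size the N x N double-precision matrix to fill roughly 80% of the total host memory, and then round N down to a multiple of NB. The sketch below only illustrates that arithmetic. The function name `suggest_problem_size`, the 80% fraction, and the 4 GiB per node used in the example call are assumptions, since the thread does not state how much memory each node has. Note also that 10000, 20000, and 30000 are not multiples of 960; whether that explains the cudasetmatrix() failure is not established here.

```python
import math

def suggest_problem_size(mem_bytes_per_node, num_nodes, nb, fraction=0.8):
    """Return the largest multiple of nb such that an N x N matrix of
    8-byte doubles fits in `fraction` of the total host memory.

    All parameters are illustrative; `fraction=0.8` is a rule of thumb,
    not a value taken from the hpl-2.0_cuda documentation.
    """
    total_bytes = mem_bytes_per_node * num_nodes
    max_elements = int(fraction * total_bytes) // 8   # 8 bytes per double
    n = math.isqrt(max_elements)                      # side of the square matrix
    return (n // nb) * nb                             # round down to a multiple of NB

if __name__ == "__main__":
    # Assumed: 8 nodes with 4 GiB of host memory each (not stated in the thread).
    n = suggest_problem_size(4 * 1024**3, num_nodes=8, nb=960)
    print("Suggested N:", n)
```

The result is only a starting point: GPU memory limits and the memory used by the operating system may require a smaller N, so it is worth trying a few multiples of NB below the suggested value.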

Hello,
I am a student at Peking University, China. My name is Acejim.
Would you please be kind enough to send me the GPU LINPACK code (“CUDA Accelerated Linpack”)?
I tried my best to download it from https://nvdeveloper.nvidia.com, but failed.

(1) Go to http://developer.nvidia.com/
(2) Click on green link “Registered Developer Website” in upper right corner
(3) Log in (or create a new account, then log in)
(4) Click on green link “CUDA/GPU Computing Registered Developer Program”
(5) Locate the section “CUDA Accelerated Linpack”
(6) Click on green link “follow this link”
(7) Click “I Accept” to accept usage conditions

Your download should start at this point.