I want to use hpl2.0_cuda to evaluate the performance of our cluster. It consists of 8 nodes, each with one Intel Xeon Nehalem W3520 CPU (2.66 GHz, 8 MB cache, 4.8 GT/s) and one Tesla C1060. As far as I know, a block size of NB=960 works best. Now I want to choose a problem size N.
If I set N to values such as 10000, 20000, or 30000 (NB=960, P=2, Q=4, 8 processes in total), cudasetmatrix() does not execute successfully. Is there a rule for tuning the problem size N?
Hello,
I am a student at Peking University, China. My name is Acejim.
Would you be kind enough to send me the GPU LINPACK code (“CUDA accelerated Linpack”)?
I’ve tried my best to download it from https://nvdeveloper.nvidia.com, but failed.
(1) Go to http://developer.nvidia.com/
(2) Click on green link “Registered Developer Website” in upper right corner
(3) log in (or create a new account, then log in)
(4) click on green link “CUDA/GPU Computing Registered Developer Program”
(5) locate the section “CUDA Accelerated Linpack”
(6) click on green link “follow this link”
(7) click “I Accept” to accept usage conditions