With routines such as these, we are ever so close to having a functional “sgetrs”, which calls the existing “strsm” and the simple, but not yet written, “slaswp”. Together, sgetrf and sgetrs solve the equation Ax=b for x, i.e., x=A\b, which is something of a holy grail at the moment.
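To make the sgetrf/sgetrs split concrete, here is a minimal CPU-only sketch in C of what the pair does for a single right-hand side. It is not glapack's or LAPACK's code. The names `lu_factor`, `apply_pivots`, and `lu_solve` are hypothetical, the factorization is unblocked, and the pivot indices are 0-based (LAPACK's `ipiv` is 1-based). `apply_pivots` plays the role of slaswp, and the two substitution loops do the work that the strsm calls inside sgetrs would do with L and U.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Column-major indexing, as in LAPACK: element (i,j) with leading dimension lda. */
#define IDX(i, j, lda) ((size_t)(i) + (size_t)(j) * (size_t)(lda))

/* Unblocked LU factorization with partial pivoting (sgetrf-like sketch).
   On return, the upper triangle of a holds U and the strict lower triangle
   holds the multipliers of the unit lower-triangular L, so that P*A = L*U.
   ipiv[k] is the 0-based row swapped with row k at step k.
   Returns 0 on success, or k+1 if the pivot at step k is exactly zero. */
int lu_factor(int n, float *a, int lda, int *ipiv)
{
    assert(n >= 0 && lda >= n);
    for (int k = 0; k < n; ++k) {
        /* Choose the pivot: the largest |a(i,k)| for i >= k. */
        int p = k;
        for (int i = k + 1; i < n; ++i)
            if (fabsf(a[IDX(i, k, lda)]) > fabsf(a[IDX(p, k, lda)]))
                p = i;
        ipiv[k] = p;
        if (a[IDX(p, k, lda)] == 0.0f)
            return k + 1; /* singular: stop early (LAPACK would continue) */
        /* Swap entire rows k and p, including multipliers already stored. */
        if (p != k)
            for (int j = 0; j < n; ++j) {
                float t = a[IDX(k, j, lda)];
                a[IDX(k, j, lda)] = a[IDX(p, j, lda)];
                a[IDX(p, j, lda)] = t;
            }
        /* Store the multipliers and update the trailing submatrix. */
        for (int i = k + 1; i < n; ++i) {
            a[IDX(i, k, lda)] /= a[IDX(k, k, lda)];
            for (int j = k + 1; j < n; ++j)
                a[IDX(i, j, lda)] -= a[IDX(i, k, lda)] * a[IDX(k, j, lda)];
        }
    }
    return 0;
}

/* slaswp-like step: apply the recorded row interchanges to b, in order. */
void apply_pivots(int n, float *b, const int *ipiv)
{
    for (int k = 0; k < n; ++k) {
        int p = ipiv[k];
        if (p != k) {
            float t = b[k];
            b[k] = b[p];
            b[p] = t;
        }
    }
}

/* sgetrs-like solve for one right-hand side: b := P*b, then solve
   L*y = b and U*x = y (the two triangular solves). b is overwritten with x. */
void lu_solve(int n, const float *a, int lda, const int *ipiv, float *b)
{
    apply_pivots(n, b, ipiv);
    /* Forward substitution with unit lower-triangular L. */
    for (int i = 1; i < n; ++i)
        for (int j = 0; j < i; ++j)
            b[i] -= a[IDX(i, j, lda)] * b[j];
    /* Back substitution with upper-triangular U. */
    for (int i = n - 1; i >= 0; --i) {
        for (int j = i + 1; j < n; ++j)
            b[i] -= a[IDX(i, j, lda)] * b[j];
        b[i] /= a[IDX(i, i, lda)];
    }
}
```

For A = [[2, 1, 1], [4, -6, 0], [-2, 7, 2]] stored column-major and b = (7, -8, 18), calling `lu_factor` and then `lu_solve` should overwrite b with x ≈ (1, 2, 3). A GPU version would keep the same structure and swap the loops for blocked, device-side kernels.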
My hardware is one step below the Q9550/GTX 280: a Q6600 quad-core CPU and a GTX 260. I get the following:
[codebox]
…glapack> ./benchmark
Device: GeForce GTX 260, 1296 MHz clock, 895 MB memory.
Errors reported are 1-norms of the residual such as ||A-QR||_1.
            Cholesky             LU                QR
        ----------------  ----------------  ----------------
     N   Gflop/s   error   Gflop/s   error   Gflop/s   error
  1000     14.83    0.80     42.96   34.48     54.31    8.78
  2000    101.17    1.07     97.62   60.93    123.00   12.67
  3000    140.38    1.21    130.77   80.04    150.68   13.79
  4000    111.16    0.94    101.29  106.74    168.95   16.81
  5000    174.11    1.53    154.04  124.38    188.27   17.73
  6000    172.13    1.43    173.10  146.37    196.90   20.60
  7000    180.64    1.68    173.76  159.69    202.71   21.18
  8000    190.27    1.61    180.69  193.38    207.50   22.29
  9000    194.35    1.50    187.24  206.19    212.15   25.96
 10000    198.41    1.67    192.23  225.67    215.90   27.75
 11000    199.69    1.78    194.05  238.32    220.92   26.88
[/codebox]
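The error column is described as a 1-norm of a residual such as ||A-QR||_1. The matrix 1-norm is the largest absolute column sum, so an unscaled residual can be sketched in C as follows. The helper name `norm1_diff` is hypothetical and assumes column-major storage, with B holding a reconstructed product such as Q*R or L*U. The post does not say whether glapack scales the reported errors (for example by ||A||_1, N, or machine epsilon), so raw values from this sketch need not match the table.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Matrix 1-norm of A - B for m-by-n column-major matrices with leading
   dimension ld: max over columns j of sum over rows i of |a(i,j) - b(i,j)|.
   Column sums are accumulated in double to limit rounding error. */
double norm1_diff(int m, int n, const float *a, const float *b, int ld)
{
    assert(m >= 0 && n >= 0 && ld >= m);
    double best = 0.0;
    for (int j = 0; j < n; ++j) {
        double colsum = 0.0;
        for (int i = 0; i < m; ++i) {
            size_t k = (size_t)i + (size_t)j * (size_t)ld;
            colsum += fabs((double)a[k] - (double)b[k]);
        }
        if (colsum > best)
            best = colsum;
    }
    return best;
}
```

For example, with A = [[1, 3], [2, 4]] and B = 0, the column sums are 3 and 7, so the result is 7.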
I am somewhat stunned that the 260 is only about 2/3 as fast as the 280 on this benchmark. Perhaps my particular CPU/GPU combination is what slows it down? I have 8 GB of slowish RAM in my system, since I preferred lots of RAM over fast RAM. Or perhaps the code has some tuning specific to the 280?
[codebox]
… glapack> ./benchmark -cpu
Device: GeForce GTX 260, 1296 MHz clock, 895 MB memory.
Errors reported are 1-norms of the residual such as ||A-QR||_1.
            Cholesky             LU                QR
        ----------------  ----------------  ----------------
     N   Gflop/s   error   Gflop/s   error   Gflop/s   error
  1000     12.95    0.87     32.01   24.60     39.90    6.47
  2000     32.06    0.97     36.39   53.54     51.76    6.71
  3000     38.37    0.90     44.59   81.45     47.21    9.00
  4000     48.96    0.85     45.72   98.10     49.07    7.62
  5000     47.45    1.11     42.56  125.48     50.32   11.28
  6000     46.80    1.21     42.53  155.80     51.31   10.47
  7000     46.76    1.17     51.04  166.25     51.59   13.42
  8000     40.01    1.19     52.32  197.28     52.47   14.37
  9000     48.41    1.18     43.29  223.64     52.66   13.83
 10000     48.89    1.21     53.09  244.25     42.80   16.26
 11000     51.22    1.18     43.80  265.48     52.91   16.33
 12000     50.13    1.23     43.68  300.44     43.16   17.32
 13000     40.73    1.20     43.53  300.52     43.38   19.32
 14000     40.94    1.22     44.06  335.17     43.21   19.20
 15000     41.40    1.32     43.36  346.97     42.79   18.02
[/codebox]
I’ve toyed with upgrading to a Q9550, but I am not sure it is worth the $300 it would cost. I paid $400 for my GTX 260 last June, which brings tears to my eyes now…