Is there any source code available for the benchmark? Otherwise trying to perform a similar test under CUDA might be really difficult...

Also, is there any price for a CUDA implementation that beats the CPU?

Always check return codes of CUDA calls for errors. Do not use __syncthreads() in conditional code unless the condition is guaranteed to evaluate identically for all threads of each block. Run your program under cuda-memcheck to detect stray memory accesses. If your kernel dies for larger problem sizes, it might exceed the runtime limit and trigger the watchdog timer.
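That advice in code form: below is a minimal sketch (not tied to any program in this thread) of an error-checking wrapper macro and of a `__syncthreads()` placed under a condition that every thread of a block evaluates identically. The kernel name `scale` and the block size of 256 are just illustrative choices.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Wrap every CUDA runtime call so a failing return code is reported at once.
#define CUDA_CHECK(call)                                            \
    do {                                                            \
        cudaError_t err = (call);                                   \
        if (err != cudaSuccess) {                                   \
            fprintf(stderr, "CUDA error %s at %s:%d\n",             \
                    cudaGetErrorString(err), __FILE__, __LINE__);   \
            exit(1);                                                \
        }                                                           \
    } while (0)

__global__ void scale(float *data, int n, float s)
{
    __shared__ float tile[256];
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // OK: this condition depends only on blockIdx/blockDim/n, so all
    // threads of a block take the same branch and all reach the barrier.
    if (blockIdx.x * blockDim.x < n) {
        tile[threadIdx.x] = (i < n) ? data[i] : 0.0f;
        __syncthreads();
        if (i < n) data[i] = tile[threadIdx.x] * s;
    }
    // WRONG would be: if (i < n) { ... __syncthreads(); ... }
    // threads with i >= n would skip the barrier and the block can hang.
}
```

Host-side usage follows the same pattern: `CUDA_CHECK(cudaMemcpy(...))`, then after a kernel launch `CUDA_CHECK(cudaGetLastError())` and `CUDA_CHECK(cudaDeviceSynchronize())`.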

Prize of course... where has the edit function gone?

Can you beat the simplicity of use of its parallel library?

Take any compiled test you want

http://www.equation.com/servlet/equation.cmd?fa=laipebenchmark

and compare your CUDA speed with Intel/AMD multi-core CPUs.

Test: the solution of a sparse banded system of equations.

1 cpu 2.46s

2 cpu 1.22s

3 cpu 0.83s

4 cpu 0.67s

5 cpu 0.58s

6 cpu 0.50s

constant * variable + constant * variable + constant * variable <= 1000;

If so, can you give an example of what the input would look like?

x = 2y

y = x + 4

Or even

x/y = 2

y - x = 4

Of course, there can be more variables...

However, because they are linear, you will never see

x = y^2

Ultimate Gaming Rig:

Dell Latitude XT2

Windows 7 64bit

Intel Core 2 Duo U9600 1.6 GHz

3GB DDR3 1200MHz underclocked to 800 MHz (YAY DELL!)

Intel GMA4500MHD

156GB SATAII 5400RPM HDD

Cold Boot Time: 12 Seconds to desktop (Take that Lenovo with i5 + SSD & 40 second boot)