Optimal CUDA performance out of a gaming GPU

Hello!

I want to set up a card here for optimal CUDA performance under Debian (Linux). What is the best hardware/software setup?

The idea is a text-only install of Debian. I would then access the machine over SSH, since it sits in a different office room anyway.

The machine has a built-in XGI Z9s video chip. It is a Xeon box with 8 cores (L5420).

What is the optimal setup to get round-the-clock (24/24) crunching time on the gaming card?
I do not need a second NVIDIA GPU for that, do I?

Debian is the main Linux distribution I use here (also for firewalls with stripped-down Linux kernels and only one active service), so I know how to set that one up.

Yet I wonder how to get optimal efficiency out of the hardware. What would be really cool is if I could run a kernel on it for longer than 2 seconds. The gaming cards I used previously, admittedly in a graphical environment, hung whenever a CUDA kernel ran for more than 1.5 seconds.

For sieving and testing prime numbers, it would be great if I could run kernels that take longer than 1 second. Is that somehow possible to achieve?
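
To make the workload concrete, here is a minimal CPU-only sketch in C++ of a segmented sieve. It is not my actual GPU code. The function names (small_primes, sieve_segment, primes_below) and the segment_size parameter are made up for illustration. The assumption is that each call to sieve_segment stands in for one kernel launch, so segment_size is the knob that decides how long a single launch would run.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Plain sieve of Eratosthenes for the base primes up to `limit` (inclusive).
std::vector<std::uint64_t> small_primes(std::uint64_t limit) {
    std::vector<bool> composite(limit + 1, false);
    std::vector<std::uint64_t> primes;
    for (std::uint64_t i = 2; i <= limit; ++i) {
        if (composite[i]) continue;
        primes.push_back(i);
        for (std::uint64_t j = i * i; j <= limit; j += i) composite[j] = true;
    }
    return primes;
}

// Sieve one segment [lo, hi) using the base primes.
// Assumption: on the GPU, this would be the work of one kernel launch.
std::vector<std::uint64_t> sieve_segment(std::uint64_t lo, std::uint64_t hi,
                                         const std::vector<std::uint64_t>& base) {
    std::vector<bool> composite(hi - lo, false);
    for (std::uint64_t p : base) {
        if (p * p >= hi) break;  // larger primes cannot mark anything below hi
        // First multiple of p inside the segment, but never below p*p.
        std::uint64_t start = std::max(p * p, (lo + p - 1) / p * p);
        for (std::uint64_t j = start; j < hi; j += p) composite[j - lo] = true;
    }
    std::vector<std::uint64_t> out;
    for (std::uint64_t n = std::max<std::uint64_t>(lo, 2); n < hi; ++n)
        if (!composite[n - lo]) out.push_back(n);
    return out;
}

// Collect all primes below n by walking [0, n) in segments of segment_size.
std::vector<std::uint64_t> primes_below(std::uint64_t n, std::uint64_t segment_size) {
    std::vector<std::uint64_t> result;
    if (n < 3) return result;                 // no primes below 2
    if (segment_size == 0) segment_size = 1;  // guard against an endless loop
    std::uint64_t root = 0;                   // floor(sqrt(n)), integer only
    while ((root + 1) * (root + 1) <= n) ++root;
    const std::vector<std::uint64_t> base = small_primes(root);
    for (std::uint64_t lo = 0; lo < n; lo += segment_size) {
        std::uint64_t hi = std::min(lo + segment_size, n);
        std::vector<std::uint64_t> part = sieve_segment(lo, hi, base);
        result.insert(result.end(), part.begin(), part.end());
    }
    return result;
}
```

Calling primes_below(100, 16) walks the range in seven segments. With a small segment_size each "launch" stays short, which is what I have been forced to do so far. What I am asking is whether I can make the segments big enough that a single launch takes several seconds.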