CUDA vs OpenGL + Cg wrt performance

Does anyone have experience rewriting OpenGL + Cg code that runs on G7X cards in CUDA for the G80? How much of a performance gain should I expect?

To be a bit more precise: my application is a memory-bound sparse matrix multiplication that runs at 10 Gflops on G7X using OpenGL + Cg, and at around 30 Gflops on the G80 with the same code (that is, without using CUDA). This 3x speedup is already pretty impressive, and I was wondering whether it is worthwhile to rewrite the whole thing in CUDA. Would that bring further speed improvements?

I know the question is rather vague (the answer, I guess, depends on the details of the code), but I would be very happy to hear about anyone’s experience comparing OpenGL + Cg code with CUDA code.

(Hey, this is the first Linux post, or what? :))

The performance gain you’ll get by using CUDA over the graphics API largely depends on how much your application can take advantage of the shared memory.

Access to shared memory is very fast (basically the same speed as registers), whereas a shader running through the graphics API has to read its data through textures.

For image convolution, CUDA is about 2x faster than an equivalent shader-based solution.
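To give a rough idea of what "taking advantage of shared memory" looks like, here is a minimal sketch of a 1D convolution, not the code behind the 2x figure above. Each block first copies its slice of the input, plus a small halo, into shared memory, so every input element is fetched from global memory about once per block instead of once per filter tap. The names conv1d_shared and conv1d, and the RADIUS and BLOCK values, are made up for illustration; error checking is left out to keep it short.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

#define RADIUS 3     // filter half-width (assumed value, for illustration only)
#define BLOCK  256   // threads per block; the kernel assumes blockDim.x == BLOCK

// Hypothetical kernel: out[i] = sum over k of w[k + RADIUS] * in[i + k],
// with inputs outside [0, n) treated as zero.
__global__ void conv1d_shared(const float* in, const float* w, float* out, int n)
{
    // tile[j] holds in[blockStart + j - RADIUS]
    __shared__ float tile[BLOCK + 2 * RADIUS];

    int gid = blockIdx.x * blockDim.x + threadIdx.x;  // global element index
    int lid = threadIdx.x + RADIUS;                   // index into the tile

    // Every thread loads its own element.
    tile[lid] = (gid < n) ? in[gid] : 0.0f;

    // The first RADIUS threads also load the left and right halos.
    if (threadIdx.x < RADIUS) {
        int left  = gid - RADIUS;
        int right = gid + BLOCK;
        tile[lid - RADIUS] = (left >= 0) ? in[left]  : 0.0f;
        tile[lid + BLOCK]  = (right < n) ? in[right] : 0.0f;
    }

    // Wait until the whole tile is in shared memory before anyone reads it.
    __syncthreads();

    if (gid < n) {
        float sum = 0.0f;
        for (int k = -RADIUS; k <= RADIUS; ++k)
            sum += w[k + RADIUS] * tile[lid + k];     // all taps come from shared memory
        out[gid] = sum;
    }
}

// Hypothetical host wrapper: copies data to the GPU, runs the kernel, copies the result back.
std::vector<float> conv1d(const std::vector<float>& in, const std::vector<float>& w)
{
    assert(w.size() == (size_t)(2 * RADIUS + 1));
    int n = (int)in.size();
    std::vector<float> out(n, 0.0f);
    if (n == 0) return out;

    float *d_in = 0, *d_w = 0, *d_out = 0;
    cudaMalloc((void**)&d_in,  n * sizeof(float));
    cudaMalloc((void**)&d_w,   w.size() * sizeof(float));
    cudaMalloc((void**)&d_out, n * sizeof(float));
    cudaMemcpy(d_in, in.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_w,  w.data(),  w.size() * sizeof(float), cudaMemcpyHostToDevice);

    int blocks = (n + BLOCK - 1) / BLOCK;
    conv1d_shared<<<blocks, BLOCK>>>(d_in, d_w, d_out, n);

    // This copy also waits for the kernel to finish.
    cudaMemcpy(out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_w);
    cudaFree(d_out);
    return out;
}
```

Calling conv1d(in, w) with a filter of 2 * RADIUS + 1 weights returns the filtered signal. A shader version would instead do one texture fetch per tap per output pixel, which is where the difference comes from; how much it matters depends on how much data reuse your algorithm has.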

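The ability to perform scattered writes (each thread can write to an arbitrary memory address, whereas a fragment shader can only write to its own output pixel) also enables a lot of new algorithms, which can bring even larger speedups.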
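As a small illustration of a scattered write, here is a sketch in which each thread writes its value to a position it looks up at run time. The names scatter_kernel and scatter are hypothetical, and the sketch assumes that all destination indices are distinct and within range; if two threads target the same address, the result is a race. Error checking is omitted.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: thread i writes in[i] to out[dst[i]].
// Assumes every dst[i] is unique and lies inside the output array.
__global__ void scatter_kernel(const float* in, const int* dst, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[dst[i]] = in[i];   // the write address is computed by the thread itself
}

// Hypothetical host wrapper: out starts as zeros, then receives the scattered values.
std::vector<float> scatter(const std::vector<float>& in,
                           const std::vector<int>& dst,
                           int out_size)
{
    int n = (int)in.size();
    std::vector<float> out(out_size, 0.0f);
    if (n == 0 || out_size == 0) return out;

    float *d_in = 0, *d_out = 0;
    int *d_dst = 0;
    cudaMalloc((void**)&d_in,  n * sizeof(float));
    cudaMalloc((void**)&d_dst, n * sizeof(int));
    cudaMalloc((void**)&d_out, out_size * sizeof(float));
    cudaMemcpy(d_in,  in.data(),  n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_dst, dst.data(), n * sizeof(int),   cudaMemcpyHostToDevice);
    cudaMemset(d_out, 0, out_size * sizeof(float));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scatter_kernel<<<blocks, threads>>>(d_in, d_dst, d_out, n);

    // This copy also waits for the kernel to finish.
    cudaMemcpy(out.data(), d_out, out_size * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_dst);
    cudaFree(d_out);
    return out;
}
```

In a fragment shader the output location is fixed before the shader runs, so the same operation has to be turned into a gather or done through some other workaround.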

Thanks for the reply, I guess I’ll just have to do it and see what I get :)