CUDA vs. OpenGL for simple imaging shaders
Let's say I want to write a simple image-filtering algorithm; for the sake of argument, say a Gaussian blur.

I can do this very easily and painlessly by writing an OpenGL shader. I know that OpenGL will
load my GPU very efficiently, coalescing memory accesses, etc., without my having to worry about it.

I can do the same thing in CUDA, but it is much harder, and I have to be very careful to do everything just right.

My question: if the algorithm you want can be written as a shader (like the blur), is it worth writing the
CUDA version? Is it likely to be faster?

I understand that CUDA is much more flexible, but that is not the question; at some point you have to go to CUDA, I understand.

But I was just wondering about these tradeoffs: are simple image filters just as good in OpenGL, or will the CUDA
implementation be much faster?

Thanks!

(and yes, I understand that I can just try it... but maybe someone already has??)

#1
Posted 06/24/2010 01:55 PM   
For simple per-pixel operations (e.g. color conversion), there is not much difference.

For filters like Gaussian blur that can take advantage of shared memory in CUDA, we have measured up to a 2x performance improvement over OpenGL.
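For concreteness, here is a minimal, untested sketch of the kind of shared-memory tiling being described: the horizontal pass of a separable Gaussian blur on a single-channel float image. All names (`blurRow`, `d_weights`, `TILE_W`, `RADIUS`) and sizes are illustrative assumptions, not taken from any NVIDIA sample; the kernel assumes `blockDim.x == TILE_W` and a grid of `(ceil(width / TILE_W), height)` blocks.

```cuda
// Sketch: horizontal pass of a separable Gaussian blur. Each block
// stages a TILE_W-wide strip of one row, plus a RADIUS-wide apron on
// each side, into shared memory, so every pixel is read from global
// memory once but reused (2*RADIUS + 1) times by neighboring threads.
#define RADIUS 4
#define TILE_W 128

__constant__ float d_weights[2 * RADIUS + 1];   // normalized Gaussian taps

__global__ void blurRow(const float *in, float *out, int width, int height)
{
    __shared__ float tile[TILE_W + 2 * RADIUS];

    int y = blockIdx.y;
    int x = blockIdx.x * TILE_W + threadIdx.x;

    // Load this thread's center element, clamping reads to the image edge.
    int clampedX = min(max(x, 0), width - 1);
    tile[threadIdx.x + RADIUS] = in[y * width + clampedX];

    // The first RADIUS threads also load the left and right aprons.
    if (threadIdx.x < RADIUS) {
        int left  = min(max(x - RADIUS, 0), width - 1);
        int right = min(x + TILE_W, width - 1);
        tile[threadIdx.x]                   = in[y * width + left];
        tile[threadIdx.x + RADIUS + TILE_W] = in[y * width + right];
    }
    __syncthreads();

    if (x < width) {
        float sum = 0.0f;
        for (int k = -RADIUS; k <= RADIUS; ++k)
            sum += d_weights[k + RADIUS] * tile[threadIdx.x + RADIUS + k];
        out[y * width + x] = sum;
    }
}
```

A GLSL fragment shader has no equivalent of `tile[]`: each fragment would fetch all of its taps through the texture cache instead, which is what the shared-memory version avoids.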

#2
Posted 06/24/2010 02:52 PM   
[quote name='Simon Green' post='1077424' date='Jun 24 2010, 08:52 AM']For simple per-pixel operations (e.g. color conversion), there is not much difference.

For filters like Gaussian blur that can take advantage of shared memory in CUDA, we have measured up to a 2x performance improvement over OpenGL.[/quote]

Thanks for that experience; it was just what I was wondering. I'll have to think about why OpenGL can't use shared memory as efficiently. Probably it relies on general texture caching? It must be the generality of per-pixel operations somehow...

But thanks! Does the difference go away as the amount of computation per pixel access goes up?

#3
Posted 06/24/2010 03:59 PM   
OpenGL doesn't have shared memory (no concept of a thread block).

Yes, the performance benefit of shared memory depends on how memory-bandwidth-limited the kernel is (imaging kernels often are), and goes up the more times the data in shared memory is reused.

#4
Posted 06/24/2010 04:03 PM   