Let's say I want to write a simple image filtering algorithm; for the sake of argument, say a Gaussian blur.
I can do this very easily and painlessly by writing an OpenGL shader, and I know that OpenGL will be
loading my GPU very efficiently, coalescing memory accesses, etc… without my having to worry about it.
I can do the same thing in CUDA, but it is much harder and I have to be very careful to get everything just right.
My question: if the algorithm you want can be written as a shader (like the blur), is it worth writing the
CUDA version? Will it likely be faster?
I understand that CUDA is much more flexible; that is not the question. At some point you have to go to CUDA, I understand.
I was just wondering about these tradeoffs: are simple imaging operations just as good in OpenGL, or will the CUDA
implementation be much faster?
thanks!
(and yes, I understand that I can just try it… but maybe someone already has??)
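For concreteness, here is roughly what the straightforward CUDA version of the blur looks like: one thread per output pixel, every filter tap read directly from global memory. All names and the layout (single-channel float image, row-major, precomputed weight table) are illustrative assumptions, not code from anyone's actual implementation.

```cuda
// Naive CUDA Gaussian blur sketch: no shared memory, every tap is a
// global-memory read, so neighboring threads re-fetch the same pixels.
__global__ void blurNaive(const float* in, float* out,
                          int width, int height,
                          const float* weights, int radius)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    float sum = 0.0f;
    for (int dy = -radius; dy <= radius; ++dy)
        for (int dx = -radius; dx <= radius; ++dx) {
            int sx = min(max(x + dx, 0), width - 1);   // clamp at borders
            int sy = min(max(y + dy, 0), height - 1);
            sum += weights[(dy + radius) * (2 * radius + 1) + (dx + radius)]
                 * in[sy * width + sx];
        }
    out[y * width + x] = sum;
}
```

This is essentially what the fragment shader does too (with the texture cache absorbing the redundant reads), which is why the interesting question is what CUDA-specific features like shared memory buy you on top of it.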
Thanks for sharing that experience; it was just what I was wondering about. I'll have to think about why OpenGL doesn't use shared memory as efficiently. Presumably it's relying on general texture caching? It must be the generality of per-pixel operations somehow…
But thanks! Does the difference go away as the amount of computation per pixel access goes up?
OpenGL fragment shaders don't have shared memory (there is no concept of a thread block).
Yes, the performance benefit of shared memory depends on how memory-bandwidth-limited the kernel is (imaging kernels often are), and it grows with the number of times data in shared memory is re-used.
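To illustrate the re-use point, here is a hedged sketch of a shared-memory tiled blur. The tile size, the fixed radius, and all names are assumptions for illustration, not a tuned implementation: each block stages its tile plus a halo into shared memory once, after which each staged pixel is read by up to (2*RADIUS+1)^2 threads instead of being re-fetched from DRAM. That reuse is exactly where the CUDA advantage over a texture-cache-only shader typically comes from.

```cuda
#define TILE 16
#define RADIUS 4   // assumed fixed blur radius so smem size is compile-time

// Shared-memory tiled Gaussian blur sketch (illustrative, not tuned).
__global__ void blurShared(const float* in, float* out,
                           int width, int height, const float* weights)
{
    __shared__ float tile[TILE + 2 * RADIUS][TILE + 2 * RADIUS];

    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;

    // Cooperatively load the tile plus its halo, clamping at image borders.
    for (int ty = threadIdx.y; ty < TILE + 2 * RADIUS; ty += TILE)
        for (int tx = threadIdx.x; tx < TILE + 2 * RADIUS; tx += TILE) {
            int sx = min(max((int)(blockIdx.x * TILE) + tx - RADIUS, 0), width - 1);
            int sy = min(max((int)(blockIdx.y * TILE) + ty - RADIUS, 0), height - 1);
            tile[ty][tx] = in[sy * width + sx];
        }
    __syncthreads();   // everyone waits until the tile is fully staged

    if (x >= width || y >= height) return;

    // All taps now hit shared memory; each global pixel was loaded once.
    float sum = 0.0f;
    for (int dy = 0; dy <= 2 * RADIUS; ++dy)
        for (int dx = 0; dx <= 2 * RADIUS; ++dx)
            sum += weights[dy * (2 * RADIUS + 1) + dx]
                 * tile[threadIdx.y + dy][threadIdx.x + dx];
    out[y * width + x] = sum;
}
```

And this also answers the arithmetic-intensity question: the more math you do per byte fetched, the less bandwidth-bound the kernel is, and the smaller the gap between the shared-memory version and a straight shader becomes.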