I have an algorithm with a few parallel paths, so I think I can implement it in CUDA. I want to do image processing: I have 10,000 small images and want to process each image in one CUDA block. Is this a good technique? Please help; I am learning CUDA programming. I made the images small, 20×20 pixels, but I am getting inconsistent results.
You may want to take a look at the NPP library that ships with CUDA. It may already offer ready-made functions for the functionality you need. Alternatively, you may want to look into a library with CUDA-accelerated modules, such as OpenCV.
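If you do end up writing your own kernel, here is a minimal sketch of the one-image-per-block mapping from the question. It is only an illustration under assumptions: the images are assumed to be stored back to back as floats, and the operation (subtracting each image's mean) and the names `subtractMeanPerImage` and `subtractMeanOnGpu` are made up for the example, not taken from your code. Inconsistent results in this kind of kernel often come from reading shared memory before every thread has written it, so the sketch places `__syncthreads()` after each shared-memory step and pads the reduction buffer with zeros.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Assumed layout: images stored back to back, IMG_PIXELS floats each.
constexpr int IMG_W = 20;
constexpr int IMG_H = 20;
constexpr int IMG_PIXELS = IMG_W * IMG_H;  // 400 threads per block, one per pixel
constexpr int REDUCE_SIZE = 512;           // next power of two >= IMG_PIXELS

// Hypothetical operation: subtract each image's mean from its own pixels.
__global__ void subtractMeanPerImage(const float* in, float* out, int numImages)
{
    __shared__ float partial[REDUCE_SIZE];

    int img = blockIdx.x;   // one block per image
    int tid = threadIdx.x;  // one thread per pixel
    if (img >= numImages) return;  // same decision for every thread in the block

    size_t base = static_cast<size_t>(img) * IMG_PIXELS;

    // Load this image into shared memory and zero the padding tail.
    partial[tid] = in[base + tid];
    if (tid + IMG_PIXELS < REDUCE_SIZE) partial[tid + IMG_PIXELS] = 0.0f;
    __syncthreads();  // all loads must finish before any thread starts summing

    // Tree reduction; the barrier sits outside the if so every thread reaches it.
    for (int stride = REDUCE_SIZE / 2; stride > 0; stride >>= 1) {
        if (tid < stride) partial[tid] += partial[tid + stride];
        __syncthreads();
    }

    float mean = partial[0] / IMG_PIXELS;
    out[base + tid] = in[base + tid] - mean;
}

// Host helper: processes numImages images in place; returns true on success.
bool subtractMeanOnGpu(std::vector<float>& host, int numImages)
{
    size_t bytes = host.size() * sizeof(float);
    float* dIn = nullptr;
    float* dOut = nullptr;
    if (cudaMalloc(&dIn, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&dOut, bytes) != cudaSuccess) { cudaFree(dIn); return false; }

    cudaError_t err = cudaMemcpy(dIn, host.data(), bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        // Grid size = number of images, block size = pixels per image.
        subtractMeanPerImage<<<numImages, IMG_PIXELS>>>(dIn, dOut, numImages);
        err = cudaGetLastError();                    // catches launch errors
    }
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // catches runtime errors
    if (err == cudaSuccess)
        err = cudaMemcpy(host.data(), dOut, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dIn);
    cudaFree(dOut);
    return err == cudaSuccess;
}
```

With 10,000 images you would fill a vector of 10,000 × 400 floats and call `subtractMeanOnGpu(images, 10000)`. One block per 20×20 image is a reasonable starting point: 400 threads fits comfortably in a block and 10,000 blocks gives the GPU plenty of work, although 400 is not a multiple of the 32-thread warp size, so the last warp in each block is partly idle. Checking the return codes as above is also worth doing while debugging, since a failed launch can otherwise look like random output.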
Nimish, if you need help writing custom CUDA kernels that operate on images read in with OpenCV, I can assist. Let me know here or via email (joshua.holloway@okstate.edu).
If anyone else needs help with this topic, let me know.