This is the same question I asked on StackOverflow: http://stackoverflow.com/questions/35191932/parallelizing-slic-superpixels-algorithm-on-opencv-2-4
background
OpenCV 3.0 has a contrib module that introduces SLIC superpixels. In the previous version, only the SEEDS superpixel implementation was available.
I cannot move to OpenCV 3.0 because I am working with an Nvidia Jetson TX1, which comes with a closed-source, CPU-optimized build of OpenCV 2.4.10.
I want to take advantage of the highly parallel architecture of the Jetson. I have a simple SLIC superpixel implementation working with OpenCV 2.4.
problem
My question is: how can I use the OpenCV GPU module (cv::gpu) to parallelize a set of nested for loops, which is what I have in my case?
Basically, I have something like this:
for(;;) // reading video frame by frame
{
    Mat im;
    cap >> im;
    ...
    ...
    for(int i=0; i<5; i++) // Localized K-means. 5 iterations is enough for my purpose.
    {
        for(cluster_no=0; cluster_no<total_clusters; cluster_no++)
        {
            ...
            // Determine length and width of the current cluster.
            ...
            for(each pixel in the current cluster span)
            {
                // Compute distance between current cluster center and current pixel
                ...
            }
        }
    }
    // Compute new cluster centres based on the pixel distribution.
    // Update the clusters.
}
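To make the inner two loops concrete, here is a minimal, self-contained sketch of one assignment pass, written without OpenCV so it can be compiled on its own. It is not my actual code: the `Center` struct, the `assignPixels` name, the single-channel `std::vector<float>` image and the parameters `S` (grid step) and `m` (compactness) are all assumptions made for illustration. The distance is the usual SLIC form, intensity difference squared plus `(m/S)^2` times the squared spatial distance.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical cluster centre for a single-channel image: position and intensity.
struct Center { float x, y, v; };

// One assignment pass, cluster by cluster (same loop order as the pseudocode).
// Each centre scans a window of about 2S x 2S pixels around itself and claims
// every pixel for which it is the closest centre seen so far.
void assignPixels(const std::vector<float>& img, int w, int h,
                  const std::vector<Center>& centers, int S, float m,
                  std::vector<int>& labels, std::vector<float>& dist)
{
    assert(S > 0 && img.size() == static_cast<std::size_t>(w) * h);
    labels.assign(img.size(), -1);
    dist.assign(img.size(), std::numeric_limits<float>::max());
    const float spatialWeight = (m / S) * (m / S);

    for (int k = 0; k < static_cast<int>(centers.size()); ++k) {
        const Center& c = centers[k];
        // Window of the current cluster, clamped to the image borders.
        const int x0 = std::max(0, static_cast<int>(c.x) - S);
        const int x1 = std::min(w, static_cast<int>(c.x) + S + 1);
        const int y0 = std::max(0, static_cast<int>(c.y) - S);
        const int y1 = std::min(h, static_cast<int>(c.y) + S + 1);

        for (int y = y0; y < y1; ++y) {
            for (int x = x0; x < x1; ++x) {
                const std::size_t idx = static_cast<std::size_t>(y) * w + x;
                const float dc = img[idx] - c.v;          // intensity difference
                const float dx = x - c.x, dy = y - c.y;   // spatial offset
                const float d = dc * dc + spatialWeight * (dx * dx + dy * dy);
                if (d < dist[idx]) {   // neighbouring windows overlap, so keep the minimum
                    dist[idx] = d;
                    labels[idx] = k;
                }
            }
        }
    }
}
```

Note that the windows of neighbouring clusters overlap, so two clusters can update the same `dist[idx]` entry. That is harmless in this serial version, but it is exactly what gets in the way when the cluster loop is run in parallel.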
Looking at the OpenCV 3.0 implementation https://github.com/Itseez/opencv_contrib/blob/master/modules/ximgproc/src/slic.cpp#L1217 (line 1217), they too have optimised this, using OpenCV's parallel_for_ function (which can run on top of TBB). Is there a way to do this using the OpenCV 2.4 GPU module? Or, if there is not, does that mean I have to use CUDA explicitly?
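To illustrate the kind of restructuring I have in mind, here is a rough sketch in which the loops are inverted: every pixel looks for its nearest centre, so each pixel writes only its own label and rows can be processed independently. This is an assumption on my part about how the work could be split, not what the contrib module does. `std::thread` is used purely as a stand-in for `parallel_for_` or a CUDA kernel so that the example compiles without OpenCV; `Center`, `labelRows` and `labelParallel` are hypothetical names. For brevity each pixel scans all centres and skips the distant ones, whereas a real version could look only at the centres of the neighbouring grid cells.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <thread>
#include <vector>

// Hypothetical cluster centre for a single-channel image: position and intensity.
struct Center { float x, y, v; };

// Label the rows [rowBegin, rowEnd). Every pixel picks its nearest centre among
// those within S pixels in x and y, and writes only its own entry in `labels`.
void labelRows(const std::vector<float>& img, int w, int rowBegin, int rowEnd,
               const std::vector<Center>& centers, int S, float m,
               std::vector<int>& labels)
{
    const float spatialWeight = (m / S) * (m / S);
    for (int y = rowBegin; y < rowEnd; ++y) {
        for (int x = 0; x < w; ++x) {
            const std::size_t idx = static_cast<std::size_t>(y) * w + x;
            float best = std::numeric_limits<float>::max();
            int bestK = -1;
            for (int k = 0; k < static_cast<int>(centers.size()); ++k) {
                const float dx = x - centers[k].x, dy = y - centers[k].y;
                if (std::fabs(dx) > S || std::fabs(dy) > S) continue;  // too far away
                const float dc = img[idx] - centers[k].v;
                const float d = dc * dc + spatialWeight * (dx * dx + dy * dy);
                if (d < best) { best = d; bestK = k; }
            }
            labels[idx] = bestK;
        }
    }
}

// Split the image into horizontal bands and label each band on its own thread.
std::vector<int> labelParallel(const std::vector<float>& img, int w, int h,
                               const std::vector<Center>& centers, int S, float m,
                               int numThreads)
{
    assert(S > 0 && img.size() == static_cast<std::size_t>(w) * h);
    std::vector<int> labels(img.size(), -1);
    numThreads = std::max(1, numThreads);
    const int band = (h + numThreads - 1) / numThreads;  // rows per thread, rounded up
    std::vector<std::thread> workers;
    for (int y0 = 0; y0 < h; y0 += band) {
        const int y1 = std::min(h, y0 + band);
        workers.emplace_back([&, y0, y1] {
            labelRows(img, w, y0, y1, centers, S, m, labels);
        });
    }
    for (std::thread& t : workers) t.join();  // all bands done before returning
    return labels;
}
```

The body of the two outer loops in `labelRows` is roughly what I would expect a hand-written CUDA kernel to contain, with one GPU thread per pixel, which is why I am asking whether the 2.4 GPU module offers a way to express this or whether I have to write that kernel myself.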