Real-Time Image Processing with CUDA
Hi all,

I really need some help and advice, as I'm new to CUDA coding and image processing.

I am trying to implement an algorithm for a system in which the camera captures at 1000 fps. I need to read the value of each pixel in every image and run various calculations on the evolution of pixel[i][j] across N images, for all the pixels in the images. I have the frames as (unsigned char *ptr), and I want to transfer them to the GPU and start implementing the algorithm, but I am not sure what the best option for real-time processing would be.

My system:
CPU: Intel Xeon X5660 2.8 GHz (2 processors)
GPU: NVIDIA Quadro 5000

Can you please give me some ideas about the following questions:

1. Do I need to add an image processing library to CUDA? If yes, what do you suggest?

2. As I am new to CUDA programming: can I create a matrix for pixel[i,j] containing the values from images [1:n], for each pixel in the image? For example, for 1000 images of size 200x200 I would end up with 40000 matrices, each containing 1000 values for one pixel. Does CUDA give me options like OpenCV's matrices or vectors?

If you have any ideas or recommendations, please let me know.
I really need some expert advice.
Thank you

#1
Posted 04/25/2012 11:22 AM   
ArrayFire is a CUDA library that offers both image processing functions and easy matrix manipulation and subscripting, so it sounds like a good fit. Links for your questions are below:

[quote name='samaneh' date='25 April 2012 - 06:22 AM' timestamp='1335352935' post='1400773']
1. Do I need to add an image processing library to CUDA? If yes, what do you suggest?
[/quote]

Image Processing: [url="http://www.accelereyes.com/arrayfire/c/group__image__mat.htm"]http://www.accelereyes.com/arrayfire/c/group__image__mat.htm[/url]

[quote name='samaneh' date='25 April 2012 - 06:22 AM' timestamp='1335352935' post='1400773']
2. As I am new to CUDA programming: can I create a matrix for pixel[i,j] containing the values from images [1:n], for each pixel in the image? For example, for 1000 images of size 200x200 I would end up with 40000 matrices, each containing 1000 values for one pixel. Does CUDA give me options like OpenCV's matrices or vectors?
[/quote]

Manipulating matrices and subscripting: [url="http://www.accelereyes.com/arrayfire/c/page_quickref.htm"]http://www.accelereyes.com/arrayfire/c/page_quickref.htm[/url]
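
Independently of any library, the layout asked about in question 2 can be sketched in plain C++ as one flat buffer with an index function, rather than 40000 separate matrices. This is only a sketch under assumptions: `FrameStack`, `index`, `at` and `series` are hypothetical names, and the frames are assumed to be stored one after another in arrival order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container: `frames` images of width x height 8-bit pixels,
// stored frame after frame in a single flat buffer.
struct FrameStack {
    std::size_t width, height, frames;
    std::vector<unsigned char> data;  // width * height * frames bytes

    FrameStack(std::size_t w, std::size_t h, std::size_t n)
        : width(w), height(h), frames(n), data(w * h * n, 0) {}

    // Flat offset of pixel (row i, column j) in frame t.
    std::size_t index(std::size_t i, std::size_t j, std::size_t t) const {
        assert(i < height && j < width && t < frames);
        return (t * height + i) * width + j;
    }

    // Writable access to one pixel of one frame.
    unsigned char& at(std::size_t i, std::size_t j, std::size_t t) {
        return data[index(i, j, t)];
    }

    // Time series of one pixel: one value per frame.
    std::vector<unsigned char> series(std::size_t i, std::size_t j) const {
        std::vector<unsigned char> out(frames);
        for (std::size_t t = 0; t < frames; ++t) {
            out[t] = data[index(i, j, t)];
        }
        return out;
    }
};
```

For 1000 frames of 200x200 this is a single 40,000,000-byte buffer. The same offset formula should carry over to a device buffer: with one thread per pixel looping over t, neighbouring threads read neighbouring bytes of the same frame, which is the access pattern the GPU coalesces well.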

John Melonakos ([email="john.melonakos@accelereyes.com"]john.melonakos@accelereyes.com[/email])

#2
Posted 04/25/2012 02:44 PM   
Although your system is among the best, for a real-time implementation of your algorithm you have to consider the limited memory transfer bandwidth from host to device and back.

To maximize transfer throughput, you'd better use textures and buffer objects in OpenGL to "copy the next frames", "process the previous frames" and "display the results" simultaneously (of course, it's not the only way). Afterwards, depending on your algorithm (do not forget to design a parallel version of your algorithm in advance!), you can use faster memory such as shared memory as a scratchpad for your computations.
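
The overlap of the three stages can be pictured with a small host-side schedule. This is only an illustration of the ordering, not of the OpenGL or CUDA calls themselves: `Step`, `pipelineSchedule` and `bufferFor` are hypothetical names, and the use of three rotating buffers is an assumption.

```cpp
#include <cassert>
#include <vector>

// What each pipeline stage works on during one time step (-1 means idle).
struct Step {
    int copyFrame;     // frame being copied to the device
    int processFrame;  // frame being processed
    int displayFrame;  // frame whose result is being displayed
};

// Three frames are in flight at once, so three buffers rotate (assumption).
int bufferFor(int frame) {
    assert(frame >= 0);
    return frame % 3;
}

// Build the software-pipelined schedule for `frames` input frames.
std::vector<Step> pipelineSchedule(int frames) {
    assert(frames >= 0);
    std::vector<Step> steps;
    if (frames == 0) {
        return steps;
    }
    // Two extra steps drain the pipeline after the last copy.
    for (int k = 0; k < frames + 2; ++k) {
        Step s;
        s.copyFrame    = (k < frames) ? k : -1;
        s.processFrame = (k >= 1 && k - 1 < frames) ? k - 1 : -1;
        s.displayFrame = (k >= 2 && k - 2 < frames) ? k - 2 : -1;
        steps.push_back(s);
    }
    return steps;
}
```

In steady state, step k copies frame k while frame k-1 is processed and frame k-2 is displayed, so the transfer time is hidden behind computation as long as each stage fits into one frame period.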

If you need more details let me know (sa d~o~t dehghani a~t gmail d~o~t com).

#3
Posted 04/26/2012 12:37 AM   
[quote name='samaneh' date='25 April 2012 - 01:22 PM' timestamp='1335352935' post='1400773']
Hi all

I am trying to implement an algorithm for a system in which the camera captures at 1000 fps. I need to read the value of each pixel in every image and run various calculations on the evolution of pixel[i][j] across N images, for all the pixels in the images. I have the frames as (unsigned char *ptr), and I want to transfer them to the GPU and start implementing the algorithm, but I am not sure what the best option for real-time processing would be.
my system :
[/quote]

You can try putting the image into a 2D texture (to be more exact, a cudaArray that is bound to a texture). That gives you cached read access and, when required, 2D bilinear interpolation, which allows for supersampling calculations.

I seriously doubt that the data rate resulting from a 1000 FPS image capture can be transferred to the GPU in real time, because of PCI-Express bandwidth limitations. Can you give us the expected data rate that you get from the camera?
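
A rough way to estimate that data rate, using the only numbers given in the question (200x200 frames, one unsigned char per pixel, 1000 fps). The helper names `dataRateBytesPerSec` and `linkUtilisation` are hypothetical, and the 8 GB/s link figure used in the note below is an assumed nominal PCI-Express 2.0 x16 rate; achievable throughput is lower in practice.

```cpp
#include <cassert>
#include <cstddef>

// Raw data rate of an uncompressed stream, in bytes per second.
double dataRateBytesPerSec(std::size_t width, std::size_t height,
                           std::size_t bytesPerPixel, double fps) {
    assert(fps > 0.0);
    return static_cast<double>(width * height * bytesPerPixel) * fps;
}

// Fraction of an assumed link bandwidth that the stream would occupy.
double linkUtilisation(double bytesPerSec, double linkBytesPerSec) {
    assert(linkBytesPerSec > 0.0);
    return bytesPerSec / linkBytesPerSec;
}
```

With those inputs, `dataRateBytesPerSec(200, 200, 1, 1000.0)` is 40,000,000 bytes per second (40 MB/s), about 0.5% of the assumed 8 GB/s. If the real frames are much larger than the 200x200 example, the picture changes: a hypothetical 1024x1024 sensor at the same frame rate would already produce roughly 1.05 GB/s.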

There are options for DMA to the card, which might be supported by a Quadro, such as direct transfer from a video capture card to the GPU. But that is one of the enterprise solutions that I do not have much information about.

Christian

#4
Posted 04/26/2012 08:43 AM   
[quote name='cbuchner1' date='26 April 2012 - 12:13 PM' timestamp='1335429812' post='1401154']
There are options for DMA to the card, which might be supported by a Quadro, such as direct transfer from a video capture card to the GPU. But that is one of the enterprise solutions that I do not have much information about.
[/quote]

Quadro cards with the "SDI I/O" option support the SD/HD broadcast standards (25, 30, 50 and 60 fps), not something like 1000 fps.

The best way to use DMA is through "buffer objects".

Best,
Saeed

#5
Posted 04/26/2012 11:48 AM   
Hi all,

The link to the image processing library was quite helpful and informative, thanks for that.
By the way, how do you approach a video file, say AVI? I had a look through the forum:
some suggest using Microsoft's Video for Windows library, some suggest using OpenCV. I feel comfortable
leaving all the dirty work to OpenCV, but are there any throughput trade-offs?

Regards,

rooz

#6
Posted 04/26/2012 08:43 PM   
[quote name='palang' date='26 April 2012 - 12:43 PM' timestamp='1335472991' post='1401326']
Hi all,

The link to the image processing library was quite helpful and informative, thanks for that.
By the way, how do you approach a video file, say AVI? I had a look through the forum:
some suggest using Microsoft's Video for Windows library, some suggest using OpenCV. I feel comfortable
leaving all the dirty work to OpenCV, but are there any throughput trade-offs?

Regards,

rooz
[/quote]


For decoding video, you can use the decoder API, which is part of CUDA, so you do not need any external libraries. You also get the added advantage that the decoded frames are already in GPU memory, whereas with a CPU-based decoder you have to transfer the frames to the GPU after decoding before you can run GPU-based image processing on them.

#7
Posted 05/08/2012 08:46 PM   