How to avoid tearing artifacts? Tearing artifacts appear in host application
Hi,
I'm coding a plugin (FreeFrame) for a host application (VVVV -> http://vvvv.org). I want to use CUDA in my plugin source code, e.g. to perform a simple background subtraction (subtract a reference image from the current camera frame). Using the simple code below, I can see in the host application's render window that the reference background image is subtracted from the original camera image. But there are heavy tearing artifacts I cannot explain. Any suggestions how to fix this? (Switching VSync on in the NVIDIA control settings didn't fix the problem - category "global 3D settings".)

My system:
Windows XP Home SP3
Intel Core 2 Duo (2.00 GHz)
2GB RAM
Microsoft Visual Studio 2008 Standard Edition
GeForce 8600GT, 512MB
graphics driver: CUDA 180.60 beta
SDK: NVIDIA_SDK10_CUDA_2.10.1126.1520.exe
monitor resolution: 1280x1024, 32 bit @ 85 Hz


Here is a code snippet:


[codebox]////// File main.cpp ////////

...

// pFrame contains the current camera frame. The VVVV host application
// indirectly calls this method (which is part of the plugin DLL). The
// modified data in pFrame is then rendered by the host application
// (using DirectX).
DWORD plugClass::processFrame24Bit(LPVOID pFrame)
{
    ...
    EnterCriticalSection(&CriticalSection);
    ...
    // pSrc, pDest, m_pBackgroundReference are OpenCV images of type IplImage*
    processBackroundSubtraction((unsigned char*)pSrc->imageData,
                                (unsigned char*)pDest->imageData,
                                (unsigned char*)m_pBackgroundReference->imageData,
                                numImageDataElements, buffer);
    ...
    LeaveCriticalSection(&CriticalSection);
    ...
}
//////////////////////////////


/////// File BackgroundSubtraction.cu ////////////

#include <cuda.h>
#include <cuda_runtime.h>
#include <stdio.h>
#include <string.h>   // strcpy, strcat

// Saturating per-byte subtraction: current = max(current - reference, 0)
__global__ void backgroundSubtractionOnDevice(unsigned char* current, const unsigned char* reference, int count)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < count)
    {
        int value = current[idx] - reference[idx];
        current[idx] = (value < 0) ? 0 : (unsigned char)value;
    }
}

// TODO:
// resolve tearing artifacts (sync issue?)
// optimise:
// 1) upload reference image only if necessary
// 2) allocate/deallocate GPU memory only if the input image format changes
//    (image width, height, or depth) -- see the sketch after this code
// 3) what are the best values for blockSize and numBlocks?
// 4) use shared memory for faster data access
extern "C"
void processBackroundSubtraction(const unsigned char* pSrc, unsigned char* pDest,
                                 const unsigned char* pBackgroundReference, int count, char* error)
{
    unsigned char* current_dev;
    unsigned char* reference_dev;
    size_t size = count * sizeof(unsigned char);

    // upload to GPU
    cudaMalloc((void**)&current_dev, size);
    cudaMalloc((void**)&reference_dev, size);
    cudaMemcpy(current_dev, pSrc, size, cudaMemcpyHostToDevice);
    cudaMemcpy(reference_dev, pBackgroundReference, size, cudaMemcpyHostToDevice);

    // background subtraction on GPU
    int blockSize = 128;
    int numBlocks = (count + blockSize - 1) / blockSize;   // integer ceil(count / blockSize)
    backgroundSubtractionOnDevice<<<numBlocks, blockSize>>>(current_dev, reference_dev, count);

    // download from GPU
    cudaMemcpy(pDest, current_dev, size, cudaMemcpyDeviceToHost);

    cudaFree(current_dev);
    cudaFree(reference_dev);

    // get error message if GPU background subtraction failed
    cudaError_t cudaErrorCode = cudaGetLastError();
    if (cudaErrorCode != cudaSuccess)
    {
        strcpy(error, "CUDA processBackroundSubtraction() failed, error = ");
        char cNumber[128];
        sprintf(cNumber, "%d", (int)cudaErrorCode);   // sprintf instead of the non-standard itoa
        strcat(error, cNumber);
    }
}

///////////////////////////////////////////////////////[/codebox]
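For TODO 2) I am thinking of something like the following - a minimal sketch, assuming the device buffers only need to change when the image format changes (the helper names ensureDeviceBuffers/releaseDeviceBuffers are made up for illustration, they're not part of FreeFrame or my plugin yet):

[codebox]#include <cuda_runtime.h>

// Hypothetical helpers: allocate the device buffers once and reuse them
// until the image size changes, instead of cudaMalloc/cudaFree every frame.
static unsigned char* s_current_dev = 0;
static unsigned char* s_reference_dev = 0;
static size_t s_size = 0;

static void ensureDeviceBuffers(size_t size)
{
    if (size == s_size) return;      // format unchanged: reuse existing buffers
    cudaFree(s_current_dev);         // cudaFree(0) is a safe no-op
    cudaFree(s_reference_dev);
    cudaMalloc((void**)&s_current_dev, size);
    cudaMalloc((void**)&s_reference_dev, size);
    s_size = size;
}

// call once when the plugin is destroyed
static void releaseDeviceBuffers()
{
    cudaFree(s_current_dev);
    cudaFree(s_reference_dev);
    s_current_dev = s_reference_dev = 0;
    s_size = 0;
}[/codebox]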

#1
Posted 01/08/2009 03:23 PM   
I doubt that the tearing has anything to do with CUDA. What graphics API are you using to display the image? Is v-sync enabled in the control panel?

#2
Posted 01/08/2009 05:25 PM   
I don't have access to the host application's source code, which is responsible for rendering the image. But on the developer's website (vvvv.org) they say they're using DirectX (9).

The problem remains the same when VSync is enabled in the NVIDIA control settings.

Is the problem related to my CUDA code (note: it's my first CUDA code)? Does cudaMemcpy start to copy the data even though backgroundSubtractionOnDevice<<<numBlocks, blockSize>>>(...) may not have finished yet?

#3
Posted 01/09/2009 08:34 AM   
[quote name='drjones' post='488473' date='Jan 9 2009, 02:34 AM']Is the problem related to my CUDA code (note: it's my first CUDA code)? Does cudaMemcpy start to copy the data even though backgroundSubtractionOnDevice<<<numBlocks, blockSize>>>(...) may not have finished yet?[/quote]
No. The memcpy call will wait for the kernel to complete.
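If you want to convince yourself, you can make the wait explicit and check for launch errors - a minimal sketch against your snippet (cudaThreadSynchronize blocks the CPU until all preceding GPU work has finished):

[codebox]backgroundSubtractionOnDevice<<<numBlocks, blockSize>>>(current_dev, reference_dev, count);
cudaError_t err = cudaGetLastError();   // catches launch-configuration failures
if (err == cudaSuccess)
    err = cudaThreadSynchronize();      // explicit wait for kernel completion
if (err != cudaSuccess)
    printf("kernel failed: %s\n", cudaGetErrorString(err));
cudaMemcpy(pDest, current_dev, size, cudaMemcpyDeviceToHost);   // same result either way[/codebox]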

#4
Posted 01/09/2009 12:58 PM   
[quote name='MisterAnderson42' post='488540' date='Jan 9 2009, 01:58 PM']No. The memcpy call will wait for the kernel to complete.[/quote]

So my code looks alright?
Hm... what else might be the problem? Yesterday I thought that it might be due to VVVV rendering while I manipulate image data in my plugin code using CUDA. But I did image processing (using OpenCV) before I added the CUDA code, too, and there was no tearing at all. So I guess that the host application doesn't render the image as long as my plugin hasn't returned. But this assumption implies that my CUDA code is wrong... *confused*
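One idea to rule that out: download the CUDA result into a private scratch buffer first, so the host-visible image is only touched by one short memcpy. A minimal sketch (m_pScratch is a hypothetical plugin member, allocated to the same size as the image data):

[codebox]// download from GPU into a private buffer the host never sees
cudaMemcpy(m_pScratch, current_dev, size, cudaMemcpyDeviceToHost);

// then update the host-visible image in one short, contiguous copy
EnterCriticalSection(&CriticalSection);
memcpy(pDest->imageData, m_pScratch, size);
LeaveCriticalSection(&CriticalSection);[/codebox]

If the tearing disappears with this, the host was reading the image while CUDA was still writing it.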

#5
Posted 01/09/2009 01:36 PM   