Hi Marstiger,
I'm not a CUDA guru, but I had the same problem. You have probably solved your issue already, but it may be interesting for other people to read how I solved it.
This is what I had at first:
A WaitTrackAndStopWithMovieExport function, which ONLY did some workflow activities.
A FollowObjectComplete function, which did the tracking after an object was investigated.
Typical code in the FollowObjectComplete function:
WriteLog("FollowObjectComplete: CUDA GoodFeaturesToTrackDetector_GPU.\r\n", tntContext);
cornerDetector = gpu::GoodFeaturesToTrackDetector_GPU(MAX_CORNERS,0.01, 5.0,3,0,0.04);
GpuMat gpumatImgA = GpuMat(imgA);
GpuMat gpumatImgB = GpuMat(imgB);
GpuMat gpumatforegroundMasked = GpuMat(foreGroundMasked);
GpuMat gpumatCornersA;
cornerDetector(gpumatImgA,gpumatCornersA,gpumatforegroundMasked);
GpuMat gpumatNextPts;
GpuMat gpumatStatus;
GpuMat gpumatError;
gpu::PyrLKOpticalFlow lkTracker;
lkTracker.sparse(gpumatImgA, gpumatImgB, gpumatCornersA, gpumatNextPts, gpumatStatus, &gpumatError);
WriteLog("FollowObjectComplete: CUDA PyrLKOpticalFlow.\r\n", tntContext);
Okay... with the code above in a while loop, the first iteration has to allocate memory on the device. The CUDA runtime may already have been awake for some time, but memory allocation is an important cost too.
Now, what I've done:
Make these heavy objects members of your class (like VideoToolbox), so they live as long as the class instance does:
class VideoToolbox
{
public:
VideoToolbox();
int MAX_RINGBUFFER_SIZE; //number of objects in buffer
gpu::GoodFeaturesToTrackDetector_GPU cornerDetector;
…
…
};
Make a dummy call, but WITH real data, on this cornerDetector object:
In WaitTrackAndStopWithMovieExport(…) I placed this code:
//Wake up the GPU for corner detection, because allocating memory takes a lot of time.
cornerDetector = gpu::GoodFeaturesToTrackDetector_GPU(MAX_CORNERS,0.01, 5.0,3,0,0.04);
IplImage *imgA = GetFrameFromSharedMemoryBuffer(tntContext->RingBufferName, 0);
IplImage *imgB = GetFrameFromSharedMemoryBuffer(tntContext->RingBufferName, 0);
IplImage *foreGroundMasked = GetForeGroundMasked(tntContext, 0, tntContext->fgMask, true, imgB);
GpuMat gpumatImgA = GpuMat(imgA);
GpuMat gpumatImgB = GpuMat(imgB);
GpuMat gpumatforegroundMasked = GpuMat(foreGroundMasked);
GpuMat gpumatCornersA;
cornerDetector(gpumatImgA,gpumatCornersA,gpumatforegroundMasked);
cvReleaseImage(&imgA);
cvReleaseImage(&imgB);
cvReleaseImage(&foreGroundMasked);
//end wakeup cuda or device memory allocation
I didn't change anything in my FollowObjectComplete function, besides the fact that the cornerDetector object is no longer declared there.
The effect: whenever there is some time to spare, make this call, so that not only is the CUDA runtime alive, but the memory for this object is allocated too.
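For readers who want to see the pattern in isolation, here is a minimal sketch that does not use OpenCV or CUDA at all. LazyDetector is a hypothetical stand-in for an object like cornerDetector, and it assumes (as described above) that the expensive allocation happens on the first call that receives real data. Toolbox, warmUp and track are illustrative names too, not part of any library.

```cpp
#include <vector>

// Hypothetical stand-in for a GPU object such as the corner detector.
// Assumption: it allocates its working buffer lazily, on the first call
// that actually receives data of a given size.
class LazyDetector {
public:
    // Returns true if this call had to allocate (the slow path).
    bool detect(const std::vector<unsigned char>& frame) {
        bool allocated = false;
        if (buffer_.size() < frame.size()) {
            buffer_.resize(frame.size());  // the expensive part on a real device
            ++allocations_;
            allocated = true;
        }
        // ... the real detection work would go here ...
        return allocated;
    }
    int allocations() const { return allocations_; }

private:
    std::vector<unsigned char> buffer_;
    int allocations_ = 0;
};

// The detector is a member, so it (and its buffer) outlives each call.
class Toolbox {
public:
    // Call this while the application has time to spare.
    void warmUp(const std::vector<unsigned char>& sampleFrame) {
        detector_.detect(sampleFrame);
    }
    // Time-critical path; returns true if it still had to allocate.
    bool track(const std::vector<unsigned char>& frame) {
        return detector_.detect(frame);
    }
    int allocations() const { return detector_.allocations(); }

private:
    LazyDetector detector_;
};
```

In this sketch, calling warmUp with a frame of the same size as the frames used later means that track never takes the slow path. If the object were a local variable inside track, every call would start with an empty buffer, which corresponds to declaring the detector inside the tracking function.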
Hope I helped some people.
Rudy