KLT NvMOT usage

Hi,

We would like to try the reference KLT tracker with our non-GStreamer pipeline.
To do this, we call the query, init, and process functions with hand-crafted arguments, and we link our application against nvds_mot_klt.
NvMOT_Query tells us that the KLT tracker needs a GRAY8 color format image with the unified memory type.
So we call the init function with the following arguments:

/* INIT */

	char path[] = "/opt/nvidia/deepstream/deepstream-4.0/samples/configs/deepstream-app/tracker_config.yml";
	void* yDevMem;
	const uint32_t width = static_cast< uint32_t >( 2560 );
	const uint32_t height = static_cast< uint32_t >( 720 );
	const uint32_t pitch = static_cast< uint32_t >( width );
	cudaMallocManaged( &yDevMem, width * height, cudaMemAttachHost );
	const uint32_t fullSize = pitch * height;

	// IN params
	NvMOTPerTransformBatchConfig batchConfig[ 1 ];
	batchConfig->bufferType = NVBUF_MEM_CUDA_UNIFIED;
	batchConfig->colorFormat = NVBUF_COLOR_FORMAT_GRAY8;
	batchConfig->maxHeight = height;
	batchConfig->maxPitch = pitch;
	batchConfig->maxSize = fullSize;
	batchConfig->maxWidth = width;

	NvMOTConfig pConfigIn;
	pConfigIn.computeConfig = NVMOTCOMP_CPU;					/**< Compute target. see NvMOTCompute */
	pConfigIn.maxStreams = 1;									/**< Maximum number of streams in a batch. */
	pConfigIn.numTransforms = 1;								/**< Number of NvMOTPerTransformBatchConfig entries in perTransformBatchConfig */
	pConfigIn.perTransformBatchConfig = batchConfig;			/**< List of numTransform batch configs including type and resolution, one for each transform*/
	pConfigIn.miscConfig.gpuId = 0;								/**< GPU to be used. */
	pConfigIn.miscConfig.maxObjPerBatch = 0;					/**< Max number of objects to track per batch. 0 means no limit. */
	pConfigIn.miscConfig.maxObjPerStream = 0;					/**< Max number of objects to track per stream. 0 means no limit. */
	pConfigIn.customConfigFilePathSize = sizeof( path );		/**< The char length in customConfigFilePath; sizeof() of the array already counts the null terminator. */
	pConfigIn.customConfigFilePath = path;						/**< Path to the tracker's custom config file. Null terminated */

	// OUT Params
	NvMOTContextHandle pContextHandle;
	NvMOTConfigResponse pConfigResponse;

	{
		const auto status = NvMOT_Init( &pConfigIn, &pContextHandle, &pConfigResponse );
		if( status != NvMOTStatus_OK ) {
			std::cout << "Error";
		}
	}

This call returns NvMOTStatus_OK.

Then we call NvMOT_Process with the following setup:

/* PROCESS */

	// IN Params
	NvBufSurfaceParams bufferParam[ 1 ];
	bufferParam->width = width;								/** width of buffer */
	bufferParam->height = height;							/** height of buffer */
	bufferParam->pitch = pitch;								/** pitch of buffer */
	bufferParam->colorFormat = NVBUF_COLOR_FORMAT_GRAY8;		/** color format */
	bufferParam->layout = NVBUF_LAYOUT_PITCH;				/** BL or PL for Jetson, ONLY PL in case of dGPU */
	bufferParam->dataSize = fullSize;							/** size of allocated memory */
	bufferParam->dataPtr = yDevMem;						/** pointer to allocated memory, Not valid for NVBUF_MEM_SURFACE_ARRAY and NVBUF_MEM_HANDLE */
		bufferParam->planeParams.num_planes = 1;						/** Number of planes */
		bufferParam->planeParams.width[ 0 ] = width;				/** width of planes */
		bufferParam->planeParams.height[ 0 ] = height;				/** height of planes */
		bufferParam->planeParams.pitch[ 0 ] = pitch;				/** pitch of planes in bytes */
		bufferParam->planeParams.offset[ 0 ] = 0;						/** offsets of planes in bytes */
		bufferParam->planeParams.psize[ 0 ] = pitch * height;	/** size of planes in bytes */
		bufferParam->planeParams.bytesPerPix[ 0 ] = 1;					/** bytes taken for each pixel */

	bufferParam->mappedAddr.addr[ 0 ] = nullptr;			/** pointers of mapped buffers. Null Initialized values.*/
	bufferParam->mappedAddr.eglImage = nullptr;
	//bufferParam->bufferDesc;	/** dmabuf fd in case of NVBUF_MEM_SURFACE_ARRAY and NVBUF_MEM_HANDLE type memory. Invalid for other types. */

	NvBufSurfaceParams* bufferParamPtr{ bufferParam };

	NvMOTObjToTrack objects[ 1 ];
	objects->classId = 0;				/**< Class of the object to be tracked. */
	objects->bbox.x = 40;				/**< Bounding box. */
	objects->bbox.y = 40;
	objects->bbox.width = 40;
	objects->bbox.height = 40;
	objects->confidence = 1.f;			/**< Detection confidence of the object. */
	objects->doTracking = true;			/**< True: track this object.  False: do not initiate  tracking on this object. */
	objects->pPreservedData = nullptr;	/**< Used for the client to keep track of any data associated with the object. */

	NvMOTFrame frame;
	frame.streamID = 0;								/**< The stream source for this frame. */
	frame.frameNum = 0;								/**< Frame number sequentially identifying the frame within a stream. */
	frame.timeStamp = 0;							/**< Timestamp of the frame at the time of capture. */
	frame.timeStampValid = true;					/**< The timestamp value is properly populated. */
	frame.doTracking = true;						/**< True: track objects in this frame; False: do not track this frame. */
	frame.reset = false;							/**< True: reset tracking for the stream. */
	frame.numBuffers = 1;							/**< Number of entries in bufferList. */
	frame.bufferList = &bufferParamPtr;				/**< Array of pointers to buffer params. */
		frame.objectsIn.detectionDone = true;
		frame.objectsIn.numAllocated = 1;
		frame.objectsIn.numFilled = 1;
		frame.objectsIn.list = objects;

	NvMOTProcessParams processParams;
	processParams.numFrames = 1;
	processParams.frameList = &frame;

	// OUT Params
	NvMOTTrackedObjBatch outputTrackedBatch;

	{
		const auto status = NvMOT_Process( pContextHandle, &processParams, &outputTrackedBatch );
		if( status != NvMOTStatus_OK ) {
			std::cout << "Error";
		}
	}

But it fails: the status is NvMOTStatus_Error :( There is no log and no error detail, so we are stuck. Could you please verify our setup or give us some guidance on what we are doing wrong? We are not sure about the proper allocation of the image data, for example. The KLT tracker needs unified memory, so we allocate it with cudaMallocManaged and cudaMemAttachHost, because the compute target is NVMOTCOMP_CPU. Is that the right thing to do? Also, the interface requires a pitch for each plane, but cudaMallocManaged gives a non-pitched allocation.
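For reference, here is the pitch question made concrete: a minimal, self-contained sketch (plain C++ with std::vector instead of cudaMallocManaged, so it compiles anywhere; the 256-byte alignment is only an illustrative choice, not a documented tracker requirement) that repacks a tightly packed GRAY8 image into a pitched buffer.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Round a row width up to a pitch boundary. The alignment value is an
// assumption for illustration; a real pitched allocation would get its
// pitch from the allocator (e.g. cudaMallocPitch).
constexpr uint32_t alignUp( uint32_t value, uint32_t alignment )
{
	return ( ( value + alignment - 1 ) / alignment ) * alignment;
}

// Copy a tightly packed GRAY8 image (pitch == width, 1 byte per pixel)
// into a pitched buffer, row by row, leaving the per-row padding zeroed.
std::vector< uint8_t > packToPitched( const uint8_t* packed,
                                      uint32_t width, uint32_t height, uint32_t pitch )
{
	std::vector< uint8_t > pitched( static_cast< size_t >( pitch ) * height, 0 );
	for( uint32_t row = 0; row < height; ++row ) {
		std::memcpy( pitched.data() + static_cast< size_t >( row ) * pitch,
		             packed + static_cast< size_t >( row ) * width,
		             width );	// GRAY8: width in pixels == row size in bytes
	}
	return pitched;
}
```

With a tight allocation like the one above, setting pitch == width is internally consistent; the sketch only matters if the tracker were to require a larger, aligned pitch.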

Thank you in advance:
Adam


Hi, Adam

Could you extract a simple source file that we can compile and test in our environment directly?
We would like to share it with our internal team and get a suggestion.

Thanks.

Hi AastaLLL,

Thank you for your message, please find the sample code below.

#include <iostream>
#include <nvdstracker.h>
#include <cuda_runtime_api.h>


int main()
{
	/* INIT */
	char path[] = "/opt/nvidia/deepstream/deepstream-4.0/samples/configs/deepstream-app/tracker_config.yml";
	void* yDevMem;
	const uint32_t width = static_cast< uint32_t >( 2560 );
	const uint32_t height = static_cast< uint32_t >( 720 );
	const uint32_t pitch = static_cast< uint32_t >( width );
	cudaMallocManaged( &yDevMem, width * height, cudaMemAttachHost );
	const uint32_t fullSize = pitch * height;

	// IN params
	NvMOTPerTransformBatchConfig batchConfig[ 1 ]{};
	batchConfig->bufferType = NVBUF_MEM_CUDA_UNIFIED;
	batchConfig->colorFormat = NVBUF_COLOR_FORMAT_GRAY8;
	batchConfig->maxHeight = height;
	batchConfig->maxPitch = pitch;
	batchConfig->maxSize = fullSize;
	batchConfig->maxWidth = width;

	NvMOTConfig pConfigIn{};
	pConfigIn.computeConfig = NVMOTCOMP_CPU;					/**< Compute target. see NvMOTCompute */
	pConfigIn.maxStreams = 1;									/**< Maximum number of streams in a batch. */
	pConfigIn.numTransforms = 1;								/**< Number of NvMOTPerTransformBatchConfig entries in perTransformBatchConfig */
	pConfigIn.perTransformBatchConfig = batchConfig;			/**< List of numTransform batch configs including type and resolution, one for each transform*/
	pConfigIn.miscConfig.gpuId = 0;								/**< GPU to be used. */
	pConfigIn.miscConfig.maxObjPerBatch = 0;					/**< Max number of objects to track per batch. 0 means no limit. */
	pConfigIn.miscConfig.maxObjPerStream = 0;					/**< Max number of objects to track per stream. 0 means no limit. */
	pConfigIn.customConfigFilePathSize = sizeof( path );		/**< The char length in customConfigFilePath */
	pConfigIn.customConfigFilePath = path;						/**< Path to the tracker's custom config file. Null terminated */

	// OUT Params
	NvMOTContextHandle pContextHandle{};
	NvMOTConfigResponse pConfigResponse{};

	{
		const auto status = NvMOT_Init( &pConfigIn, &pContextHandle, &pConfigResponse );
		if( status != NvMOTStatus_OK ) {
			std::cout << "Error";
		}
	}

	/* PROCESS */

	// IN Params
	NvBufSurfaceParams bufferParam[ 1 ]{};
	bufferParam->width = width;								/** width of buffer */
	bufferParam->height = height;							/** height of buffer */
	bufferParam->pitch = pitch;								/** pitch of buffer */
	bufferParam->colorFormat = NVBUF_COLOR_FORMAT_GRAY8;		/** color format */
	bufferParam->layout = NVBUF_LAYOUT_PITCH;				/** BL or PL for Jetson, ONLY PL in case of dGPU */
	bufferParam->dataSize = fullSize;							/** size of allocated memory */
	bufferParam->dataPtr = yDevMem;						/** pointer to allocated memory, Not valid for NVBUF_MEM_SURFACE_ARRAY and NVBUF_MEM_HANDLE */
		bufferParam->planeParams.num_planes = 1;						/** Number of planes */
		bufferParam->planeParams.width[ 0 ] = width;				/** width of planes */
		bufferParam->planeParams.height[ 0 ] = height;				/** height of planes */
		bufferParam->planeParams.pitch[ 0 ] = pitch;				/** pitch of planes in bytes */
		bufferParam->planeParams.offset[ 0 ] = 0;						/** offsets of planes in bytes */
		bufferParam->planeParams.psize[ 0 ] = pitch * height;	/** size of planes in bytes */
		bufferParam->planeParams.bytesPerPix[ 0 ] = 1;					/** bytes taken for each pixel */

	bufferParam->mappedAddr.addr[ 0 ] = nullptr;			/** pointers of mapped buffers. Null Initialized values.*/
	bufferParam->mappedAddr.eglImage = nullptr;
	//bufferParam->bufferDesc;	/** dmabuf fd in case of NVBUF_MEM_SURFACE_ARRAY and NVBUF_MEM_HANDLE type memory. Invalid for other types. */

	NvBufSurfaceParams* bufferParamPtr{ bufferParam };

	NvMOTObjToTrack objects[ 1 ]{};
	objects->classId = 0;				/**< Class of the object to be tracked. */
	objects->bbox.x = 40;				/**< Bounding box. */
	objects->bbox.y = 40;
	objects->bbox.width = 40;
	objects->bbox.height = 40;
	objects->confidence = 1.f;			/**< Detection confidence of the object. */
	objects->doTracking = true;			/**< True: track this object.  False: do not initiate  tracking on this object. */
	objects->pPreservedData = nullptr;	/**< Used for the client to keep track of any data associated with the object. */

	NvMOTFrame frame{};
	frame.streamID = 0;								/**< The stream source for this frame. */
	frame.frameNum = 0;								/**< Frame number sequentially identifying the frame within a stream. */
	frame.timeStamp = 0;							/**< Timestamp of the frame at the time of capture. */
	frame.timeStampValid = true;					/**< The timestamp value is properly populated. */
	frame.doTracking = true;						/**< True: track objects in this frame; False: do not track this frame. */
	frame.reset = false;							/**< True: reset tracking for the stream. */
	frame.numBuffers = 1;							/**< Number of entries in bufferList. */
	frame.bufferList = &bufferParamPtr;				/**< Array of pointers to buffer params. */
		frame.objectsIn.detectionDone = true;
		frame.objectsIn.numAllocated = 1;
		frame.objectsIn.numFilled = 1;
		frame.objectsIn.list = objects;

	NvMOTProcessParams processParams{};
	processParams.numFrames = 1;
	processParams.frameList = &frame;

	// OUT Params
	NvMOTTrackedObjBatch outputTrackedBatch{};

	{
		const auto status = NvMOT_Process( pContextHandle, &processParams, &outputTrackedBatch );
		if( status != NvMOTStatus_OK ) {
			std::cout << "Error";
		}
	}
}

The CMake file:

cmake_minimum_required( VERSION 2.8 )

project( DSTest LANGUAGES CXX )

add_executable( ${PROJECT_NAME} "main.cpp" )

find_package( CUDA REQUIRED )
target_include_directories( DSTest SYSTEM PUBLIC /opt/nvidia/deepstream/deepstream-4.0/sources/includes ${CUDA_INCLUDE_DIRS} )
find_library( KLT_MOT NAMES nvds_mot_klt HINTS /opt/nvidia/deepstream/deepstream-4.0/lib )

target_link_libraries( DSTest ${KLT_MOT} ${CUDA_LIBRARIES} )

Thanks, Adam

Looking at the code, two things caught my eye:

One obvious error is that pitch values should be in bytes, whereas width values are in pixels. So we need to modify the following:

const uint32_t pitch = static_cast< uint32_t >( width );


<< const uint32_t pitch = static_cast< uint32_t >( width ) * sizeof( uint32_t );

I don’t know if this will make any difference, but I usually set the EGL mapped address to the same CUDA memory pointer in the dGPU case. For Jetson, a proper EGL mapped pointer should be provided. So if you are running the app on a dGPU (i.e., x86), try the following; if running on Jetson, provide a proper EGL mapped pointer instead.

bufferParam->mappedAddr.addr[ 0 ] = nullptr;


<< bufferParam->mappedAddr.addr[ 0 ] = yDevMem;

Hi pshin,

I thought KLT MOT runs on gray images (NvMOT_Query returns NVBUF_COLOR_FORMAT_GRAY8 as the required color format, NVMOTCOMP_CPU as the compute target, and NVBUF_MEM_CUDA_UNIFIED as the memory type). Since it uses gray images, 1 pixel is 1 byte, which is why I used

const uint32_t pitch = static_cast< uint32_t >( width );

I’ve tried your suggestion ( const uint32_t pitch = static_cast< uint32_t >( width ) * sizeof( uint32_t ); ) but NvMOT_Process still returns NvMOTStatus_Error.

I’ve also tried setting the mapped address, but nothing changed. Is it required to set those values in the case of KLT MOT, since it does the computation on the CPU side?

Hello Adam,

Yes, that’s a good point. In GRAY8, the number of pixels equals the number of bytes.
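To make that concrete, a tiny self-contained sketch of the pitch arithmetic (plain C++; the bytes-per-pixel values are the usual ones for these formats, not taken from the NvBufSurface headers):

```cpp
#include <cstdint>

// Pitch of a tightly packed row, in bytes:
// width (in pixels) times bytes per pixel.
constexpr uint32_t packedPitch( uint32_t widthPixels, uint32_t bytesPerPixel )
{
	return widthPixels * bytesPerPixel;
}
```

For GRAY8 (1 byte per pixel) this gives pitch == width; multiplying by sizeof( uint32_t ) would only be right for a 4-byte-per-pixel format such as RGBA.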

Is there a specific reason you set cudaMemAttachHost when allocating memory, as below?

cudaMallocManaged( &yDevMem, width * height, cudaMemAttachHost );

CUDA documentation says:

If cudaMemAttachHost is specified, then the allocation should not be accessed from devices that have a zero value for the device attribute cudaDevAttrConcurrentManagedAccess; an explicit call to cudaStreamAttachMemAsync will be required to enable access on such devices.

So, please try using the default option, which is cudaMemAttachGlobal. I am not sure whether that will make any difference, as I have not reproduced your error yet.

AastaLLL,

Could you please reproduce the user’s case and check for any issues?

Hi,

We can reproduce this issue internally and will share more information with you later.

Thanks.

Hi,

@pshin, I changed cudaMemAttachHost to cudaMemAttachGlobal but the error remains.

@AastaLLL thank you!

Hi,

Thanks for your patience.

Please see the following change for the fix:

diff --git a/main.cpp b/main.cpp
index 16d8027..749b436 100644
--- a/main.cpp
+++ b/main.cpp
@@ -64,7 +64,7 @@ int main()
                bufferParam->planeParams.psize[ 0 ] = pitch * height;   /** size of planes in bytes */
                bufferParam->planeParams.bytesPerPix[ 0 ] = 1;                                  /** bytes taken for each pixel */
 
-       bufferParam->mappedAddr.addr[ 0 ] = nullptr;                    /** pointers of mapped buffers. Null Initialized values.*/
+       bufferParam->mappedAddr.addr[ 0 ] = yDevMem;                    /** pointers of mapped buffers. Null Initialized values.*/
        bufferParam->mappedAddr.eglImage = nullptr;
        //bufferParam->bufferDesc;      /** dmabuf fd in case of NVBUF_MEM_SURFACE_ARRAY and NVBUF_MEM_HANDLE type memory. Invalid for other types. */
 
@@ -100,6 +100,9 @@ int main()
 
        // OUT Params
        NvMOTTrackedObjBatch outputTrackedBatch{};
+        NvMOTTrackedObjList outObj[ 1 ]{};
+        outputTrackedBatch.numAllocated = 1;
+        outputTrackedBatch.list = outObj;
 
        {
                const auto status = NvMOT_Process( pContextHandle, &processParams, &outputTrackedBatch );

There are two issues in your source:

  1. The input memory address is not set properly
  2. The memory allocation for output was missing

After fixing the problems above, NvMOT_Process runs successfully without error.
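The second point is worth spelling out: pTrackedObjBatch is an output parameter, but the caller still owns and allocates the storage behind it. A self-contained sketch of that pattern, using simplified stand-in structs rather than the real nvdstracker.h definitions (the actual types have more fields):

```cpp
#include <cstdint>

// Simplified stand-ins for the NvMOT output types (illustration only).
struct FakeTrackedObj      { uint32_t trackingId; };
struct FakeTrackedObjList  { FakeTrackedObj* list; uint32_t numAllocated; uint32_t numFilled; };
struct FakeTrackedObjBatch { FakeTrackedObjList* list; uint32_t numAllocated; uint32_t numFilled; };

// Stand-in for a library call that only *fills* caller-provided storage:
// with no allocated per-stream lists, it can do nothing but fail.
bool fakeProcess( FakeTrackedObjBatch* out )
{
	if( out == nullptr || out->list == nullptr || out->numAllocated < 1 )
		return false;	// analogous to an error status from the real call
	out->numFilled = 1;
	out->list[ 0 ].numFilled = 0;	// no confirmed tracks yet on a first frame
	return true;
}
```

This mirrors the diff above: the caller creates the NvMOTTrackedObjList array and sets numAllocated and list on the batch before calling NvMOT_Process.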

Thanks.

Oh, thank you, it works :) The fact that pTrackedObjBatch is marked as an output parameter misled us; we did not realize the caller has to allocate the storage behind it.

Hello, can you explain the outputTrackedBatch to me? We get the output as tracked coordinates, but I am not able to match it up with the next frame when I do cv::imshow.

Hi gandhisaloni77,

Please open a new topic for your issue. Thanks