How to perform fish-eye lens distortion correction in a GStreamer pipeline? (HFOV ~150°)

Hello,
I want to include a lens distortion correction element in a GStreamer pipeline.

e.g.

gst-launch-1.0 nvarguscamerasrc maxperf=1 ! 'video/x-raw(memory:NVMM), width=(int)1920, height=(int)1080, format=(string)NV12, framerate=(fraction)30/1' ! distortioncorrection ! nvv4l2h265enc control-rate=1 bitrate=8000000 ! 'video/x-h265, stream-format=(string)byte-stream' ! h265parse ! qtmux ! filesink location=test.mp4 -e

I know the camera matrix and distortion coefficients. Is there any distortion correction element provided by NVIDIA (if possible, hardware accelerated)?

By camera matrix, I mean Fx, Fy, Cx, Cy. The distortion coefficients are K1, K2, P1, P2, K3.
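
For reference, in OpenCV's convention these form the 3x3 camera matrix and the 5-element distortion vector:

    [ Fx   0  Cx ]
K = [  0  Fy  Cy ],   distCoeffs = (K1, K2, P1, P2, K3)
    [  0   0   1 ]
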
Thanks

Hi,
You may install DeepStream SDK 4.0.1 and check the sample

deepstream_sdk_v4.0.1_jetson/sources/apps/sample_apps/deepstream-dewarper-test

I think this dewarper is for the donut-shaped image of a fisheye camera.

But I was intending to correct the distortion of a wide-angle lens camera with ~150° HFOV.

I am attaching an example in this comment. Do you think the dewarper function can perform this distortion correction, given the coefficients Fx, Fy, Cx, Cy, K1, K2, P1, P2, and K3?

Thanks.

Hi,
On DS 4.0.1, it only supports 360-degree input. We will evaluate supporting more cases.

I need the same… Therefore I would like to wrap the undistort function of OpenCV in a GStreamer plugin. But I wonder if it is possible to modify the stream itself; in the example apps, there were only modifications to metadata.
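
For plain system-memory video/x-raw it does seem possible to modify pixel data, e.g. by mapping the buffer writable from a pad probe. A minimal sketch (callback name is hypothetical; this would not work directly on NVMM buffers):

#include <gst/gst.h>

/* Sketch: modify raw pixel data in place from a buffer pad probe.
 * Attach with:
 *   gst_pad_add_probe (pad, GST_PAD_PROBE_TYPE_BUFFER,
 *                      undistort_probe, NULL, NULL);
 */
static GstPadProbeReturn
undistort_probe (GstPad *pad, GstPadProbeInfo *info, gpointer user_data)
{
  GstBuffer *buf = gst_buffer_make_writable (GST_PAD_PROBE_INFO_BUFFER (info));
  GstMapInfo map;

  if (gst_buffer_map (buf, &map, GST_MAP_WRITE)) {
    /* map.data / map.size expose the raw frame; an OpenCV
       undistort/remap could be run on it here */
    gst_buffer_unmap (buf, &map);
  }
  GST_PAD_PROBE_INFO_DATA (info) = buf;  /* hand the (possibly new) buffer back */
  return GST_PAD_PROBE_OK;
}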

Thanks

+1

+1

+1

+1

I would suggest using nvivafilter to do this.
Let’s say you’ve done calibration with OpenCV and the calibration parameters are available.
You could load them in nvivafilter’s init() and compute the mapping with initUndistortRectifyMap or similar.
Then write your CUDA code to remap in gpu_process(); a kernel sketch is below.
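
For illustration, a nearest-neighbour remap kernel over RGBA pitch-linear data might look roughly like this (names such as mapx/mapy are placeholders for the maps computed in init()):

// Each thread writes one destination pixel, sampling the source at the
// coordinate given by the undistortion maps (nearest neighbour for brevity).
__global__ void remap_kernel(const uchar4 *src, uchar4 *dst,
                             int width, int height, size_t pitch,
                             const float *mapx, const float *mapy)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height)
        return;

    int sx = (int)(mapx[y * width + x] + 0.5f);
    int sy = (int)(mapy[y * width + x] + 0.5f);

    uchar4 *dst_row = (uchar4 *)((char *)dst + y * pitch);
    if (sx >= 0 && sx < width && sy >= 0 && sy < height) {
        const uchar4 *src_row = (const uchar4 *)((const char *)src + sy * pitch);
        dst_row[x] = src_row[sx];
    } else {
        dst_row[x] = make_uchar4(0, 0, 0, 255);  // outside the source: black
    }
}

// Typical launch for a width x height frame:
//   dim3 block(16, 16);
//   dim3 grid((width + 15) / 16, (height + 15) / 16);
//   remap_kernel<<<grid, block>>>(src, dst, width, height, pitch, mapx, mapy);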

I already tried this, and it worked.

So would you be willing to share some code?

I did the chessboard calibration and already have the lens parameters.

I verified them using Python; they effectively correct the distortion and make the image rectilinear.

Now I’m a little lost about how to bring all this to the C/CUDA world.

Thanks in advance

+1

Rary, would you be able to share how you went about doing this? Thank you.

Hi all,

We’ve recently implemented such a kernel for nvivafilter. Unfortunately, I can’t provide any code without consulting my company, but I’ll try to outline the method; I hope that helps.

First, you must define two device-accessible memory arrays to hold mapx and mapy (the output of initUndistortRectifyMap):

__device__ float device_mapx[HEIGHT][WIDTH];
__device__ float device_mapy[HEIGHT][WIDTH];

In the init function (already declared in the sample code provided), read the camera matrix and distortion coefficients from XML files (if you’ve extracted the parameters in Python, there are several easy ways to convert NumPy arrays to XML files). Once you have the matrices, call initUndistortRectifyMap, store mapx and mapy in regular host-memory matrices (not the device matrices declared at the top of the code), then copy the mapx/mapy matrices from host memory to device memory using cudaMemcpyToSymbol.

Now you can correct the input image in the kernel using mapx/mapy. I also recommend copying the input image before undistorting (do not undistort in place).
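
In outline, using only the public OpenCV/CUDA calls involved (WIDTH, HEIGHT, the file name and the XML keys are placeholders, not our actual code):

#include <cuda_runtime.h>
#include "opencv2/core.hpp"
#include "opencv2/calib3d.hpp"

#define WIDTH  1920
#define HEIGHT 1080

// Device-resident correction maps, filled once at startup
__device__ float device_mapx[HEIGHT][WIDTH];
__device__ float device_mapy[HEIGHT][WIDTH];

void load_maps_to_device()
{
    // Read the calibration exported from Python (keys are hypothetical)
    cv::FileStorage fs("calibration.xml", cv::FileStorage::READ);
    cv::Mat cam, dist;
    fs["camera_matrix"] >> cam;
    fs["distortion_coefficients"] >> dist;

    // Compute the x/y correction maps in host memory
    cv::Mat mapx, mapy;
    cv::initUndistortRectifyMap(cam, dist, cv::Mat(), cam,
                                cv::Size(WIDTH, HEIGHT), CV_32FC1, mapx, mapy);

    // Copy the host maps into the __device__ symbols used by the kernel
    cudaMemcpyToSymbol(device_mapx, mapx.ptr<float>(), sizeof(device_mapx));
    cudaMemcpyToSymbol(device_mapy, mapy.ptr<float>(), sizeof(device_mapy));
}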

I hope this helps, and I hope I can share some more code soon.

+1

Hi,
Is there any news about this evaluation?
Thanks!

I cannot comment on NVIDIA’s evaluation, but it should not be that difficult to apply the correction on the GPU once you have the OpenCV correction maps for x and y in float format.

You would first need an OpenCV version built with CUDA support. Here I’ve been using a 4.2.0 dev version.
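
You can quickly verify that such a build actually sees the GPU with a snippet like this (illustrative only):

#include "opencv2/core/cuda.hpp"
#include <iostream>

// Prints the number of CUDA devices visible to this OpenCV build;
// 0 means no CUDA support or no usable GPU.
int main()
{
    std::cout << "CUDA devices: "
              << cv::cuda::getCudaEnabledDeviceCount() << std::endl;
    return 0;
}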

This example is a simplified version of the nvivafilter sample plugin; its sources are available in public_sources.tbz2.

Basically, this example uses a constant 640x480 resolution, so you would declare these constants and variables:

#include "opencv2/core.hpp"
#include "opencv2/calib3d.hpp"
#include "opencv2/cudawarping.hpp" 

const int max_width = 640;
const int max_height = 480;
static cv::cuda::GpuMat gpu_xmap, gpu_ymap;

In the init() function you would set your xmap and ymap (load your own the way you want):

extern "C" void
init (CustomerFunction * pFuncs)
{
  pFuncs->fPreProcess = pre_process;
  pFuncs->fGPUProcess = gpu_process;
  pFuncs->fPostProcess = post_process;

  /* Initialize maps from CPU. */
  cv::Mat xmap(max_height, max_width, CV_32FC1);
  cv::Mat ymap(max_height, max_width, CV_32FC1);

  //fill matrices with your values
  cv::Mat cam(3, 3, cv::DataType<float>::type);
  cam.at<float>(0, 0) = 528.53618582196384f;
  cam.at<float>(0, 1) = 0.0f;
  cam.at<float>(0, 2) = 314.01736116032430f;

  cam.at<float>(1, 0) = 0.0f;
  cam.at<float>(1, 1) = 532.01912214324500f;
  cam.at<float>(1, 2) = 231.43930864205211f;

  cam.at<float>(2, 0) = 0.0f;
  cam.at<float>(2, 1) = 0.0f;
  cam.at<float>(2, 2) = 1.0f;

  cv::Mat dist(4, 1, cv::DataType<float>::type);  
  dist.at<float>(0, 0) = -0.11839989180635836f;
  dist.at<float>(1, 0) = 0.25425420873955445f;
  dist.at<float>(2, 0) = 0.0013269901775205413f;
  dist.at<float>(3, 0) = 0.0015787467748277866f;

  cv::fisheye::initUndistortRectifyMap(cam, dist, cv::Mat(), cam, cv::Size(max_width, max_height), CV_32FC1, xmap, ymap);

  /* upload to GpuMats */
  gpu_xmap.upload(xmap);
  gpu_ymap.upload(ymap);
}

Once this is done, it’s ready for remapping frames. You would process each frame this way:

static void cv_process_RGBA(void *pdata, int32_t width, int32_t height)
{
    cv::cuda::GpuMat d_Mat_RGBA(height, width, CV_8UC4, pdata);
    cv::cuda::GpuMat d_Mat_RGBA_Src;
    d_Mat_RGBA.copyTo(d_Mat_RGBA_Src); // cannot avoid one copy: remap cannot work in place
    cv::cuda::remap(d_Mat_RGBA_Src, d_Mat_RGBA, gpu_xmap, gpu_ymap, cv::INTER_CUBIC, cv::BORDER_CONSTANT, cv::Scalar(0.f, 0.f, 0.f, 0.f));

    // Check
    if(d_Mat_RGBA.data != pdata)
	std::cerr << "Error reallocated buffer for d_Mat_RGBA" << std::endl;
}

The last thing is to call this processing when an RGBA (or ABGR) frame is received. In the gpu_process() function, you would change the relevant section to:

  if (eglFrame.frameType == CU_EGL_FRAME_TYPE_PITCH) {
    if (eglFrame.eglColorFormat == CU_EGL_COLOR_FORMAT_ABGR) {
 	cv_process_RGBA(eglFrame.frame.pPitch[0], eglFrame.width, eglFrame.height);
    } else if (eglFrame.eglColorFormat == CU_EGL_COLOR_FORMAT_YUV420_SEMIPLANAR) {
      printf ("Invalid eglcolorformat NV12\n");
    } else
      printf ("Invalid eglcolorformat %d\n", eglFrame.eglColorFormat);
  }

Note that in older L4T releases it was instead CU_EGL_COLOR_FORMAT_BGRA, and the enum values changed, so the code is not binary compatible between versions.

Adapt the Makefile to your OpenCV install directory:

CVCCFLAGS:=-I$(OPENCV_DIR)/include/opencv4
CVLDFLAGS:=-L$(OPENCV_DIR)/lib -lopencv_core -lopencv_calib3d  -lopencv_cudawarping

Build with make and test with:

export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$OPENCV_DIR/lib
gst-launch-1.0 videotestsrc ! video/x-raw, width=640, height=480, framerate=30/1 ! nvvidconv ! 'video/x-raw(memory:NVMM), format=NV12, width=640, height=480' ! nvivafilter customer-lib-name=./lib-gst-custom-opencv_cudaprocess.so cuda-process=true ! 'video/x-raw(memory:NVMM), format=RGBA, width=640, height=480' ! nvoverlaysink
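
To match the original question (encoding to a file instead of displaying), the filter should combine with an encoder chain such as the one from the first post. An untested sketch (the maps in the example code are hardcoded to 640x480, so they would have to be regenerated for 1920x1080):

gst-launch-1.0 -e nvarguscamerasrc ! 'video/x-raw(memory:NVMM), width=1920, height=1080, format=NV12, framerate=30/1' ! nvivafilter customer-lib-name=./lib-gst-custom-opencv_cudaprocess.so cuda-process=true ! 'video/x-raw(memory:NVMM), format=RGBA' ! nvvidconv ! 'video/x-raw(memory:NVMM), format=NV12' ! nvv4l2h265enc control-rate=1 bitrate=8000000 ! h265parse ! qtmux ! filesink location=test.mp4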

Attachments:

Main source to be saved as gst-custom-opencv_cudaprocess.cu:

/*
 * Copyright (c) 2016, NVIDIA CORPORATION. All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 *  * Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 *  * Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *  * Neither the name of NVIDIA CORPORATION nor the names of its
 *    contributors may be used to endorse or promote products derived
 *    from this software without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
 * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
 * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE COPYRIGHT OWNER OR
 * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
 * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
 * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
 * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
 * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 */

#include <stdio.h>
#include <stdlib.h>
#include <iostream>

#include <cuda.h>

#include "opencv2/core.hpp"
#include "opencv2/calib3d.hpp"
#include "opencv2/cudawarping.hpp" 

#include "cudaEGL.h"

#if defined(__cplusplus)
extern "C" void Handle_EGLImage (EGLImageKHR image);
extern "C" {
#endif

typedef enum {
  COLOR_FORMAT_Y8 = 0,
  COLOR_FORMAT_U8_V8,
  COLOR_FORMAT_RGBA,
  COLOR_FORMAT_NONE
} ColorFormat;

typedef struct {
  /**
  * cuda-process API
  *
  * @param image   : EGL Image to process
  * @param userPtr : point to user alloc data, should be free by user
  */
  void (*fGPUProcess) (EGLImageKHR image, void ** userPtr);

  /**
  * pre-process API
  *
  * @param sBaseAddr  : Mapped Surfaces(YUV) pointers
  * @param smemsize   : surfaces size array
  * @param swidth     : surfaces width array
  * @param sheight    : surfaces height array
  * @param spitch     : surfaces pitch array
  * @param sformat    : surfaces format array
  * @param nsurfcount : surfaces count
  * @param userPtr    : point to user alloc data, should be free by user
  */
  void (*fPreProcess)(void **sBaseAddr,
                      unsigned int *smemsize,
                      unsigned int *swidth,
                      unsigned int *sheight,
                      unsigned int *spitch,
                      ColorFormat *sformat,
                      unsigned int nsurfcount,
                      void ** userPtr);

  /**
  * post-process API
  *
  * @param sBaseAddr  : Mapped Surfaces(YUV) pointers
  * @param smemsize   : surfaces size array
  * @param swidth     : surfaces width array
  * @param sheight    : surfaces height array
  * @param spitch     : surfaces pitch array
  * @param sformat    : surfaces format array
  * @param nsurfcount : surfaces count
  * @param userPtr    : point to user alloc data, should be free by user
  */
  void (*fPostProcess)(void **sBaseAddr,
                      unsigned int *smemsize,
                      unsigned int *swidth,
                      unsigned int *sheight,
                      unsigned int *spitch,
                      ColorFormat *sformat,
                      unsigned int nsurfcount,
                      void ** userPtr);
} CustomerFunction;

void init (CustomerFunction * pFuncs);

#if defined(__cplusplus)
}
#endif


/**
  * Dummy custom pre-process API implementation.
  * It just accesses the mapped surface userspace pointers
  * and prints the surface info.
  *
  * @param sBaseAddr  : Mapped Surfaces pointers
  * @param smemsize   : surfaces size array
  * @param swidth     : surfaces width array
  * @param sheight    : surfaces height array
  * @param spitch     : surfaces pitch array
  * @param nsurfcount : surfaces count
  */
static void
pre_process (void **sBaseAddr,
                unsigned int *smemsize,
                unsigned int *swidth,
                unsigned int *sheight,
                unsigned int *spitch,
                ColorFormat  *sformat,
                unsigned int nsurfcount,
                void ** usrptr)
{
  /* add your custom pre-process here;
     this demo just prints the surface info */
   printf ("pre-process %dx%d size %d\n", *swidth, *sheight, *smemsize); 
}

/**
  * Dummy custom post-process API implementation.
  * It just accesses the mapped surface userspace pointers
  * and prints the surface info.
  *
  * @param sBaseAddr  : Mapped Surfaces pointers
  * @param smemsize   : surfaces size array
  * @param swidth     : surfaces width array
  * @param sheight    : surfaces height array
  * @param spitch     : surfaces pitch array
  * @param nsurfcount : surfaces count
  */
static void
post_process (void **sBaseAddr,
                unsigned int *smemsize,
                unsigned int *swidth,
                unsigned int *sheight,
                unsigned int *spitch,
                ColorFormat  *sformat,
                unsigned int nsurfcount,
                void ** usrptr)
{
  /* add your custom post-process here;
     this demo just prints the surface info */
   printf ("post-process %dx%d size %d\n", *swidth, *sheight, *smemsize); 
}



static cv::cuda::GpuMat gpu_xmap, gpu_ymap;

static void cv_process_RGBA(void *pdata, int32_t width, int32_t height)
{
    cv::cuda::GpuMat d_Mat_RGBA(height, width, CV_8UC4, pdata);
    cv::cuda::GpuMat d_Mat_RGBA_Src;
    d_Mat_RGBA.copyTo(d_Mat_RGBA_Src); // cannot avoid one copy: remap cannot work in place
    cv::cuda::remap(d_Mat_RGBA_Src, d_Mat_RGBA, gpu_xmap, gpu_ymap, cv::INTER_CUBIC, cv::BORDER_CONSTANT, cv::Scalar(0.f, 0.f, 0.f, 0.f));

    // Check
    if(d_Mat_RGBA.data != pdata)
	std::cerr << "Error reallocated buffer for d_Mat_RGBA" << std::endl;
}



/**
  * Performs CUDA Operations on egl image.
  *
  * @param image : EGL image
  */
static void
gpu_process (EGLImageKHR image, void ** usrptr)
{
  CUresult status;
  CUeglFrame eglFrame;
  CUgraphicsResource pResource = NULL;

  cudaFree(0);
  status = cuGraphicsEGLRegisterImage(&pResource, image, CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE);

  if (status != CUDA_SUCCESS) {
    printf("cuGraphicsEGLRegisterImage failed : %d \n", status);
    return;
  }

  status = cuGraphicsResourceGetMappedEglFrame( &eglFrame, pResource, 0, 0);
  if (status != CUDA_SUCCESS) {
    printf ("cuGraphicsSubResourceGetMappedArray failed\n");
  }

  status = cuCtxSynchronize();
  if (status != CUDA_SUCCESS) {
    printf ("cuCtxSynchronize failed \n");
  }

  if (eglFrame.frameType == CU_EGL_FRAME_TYPE_PITCH) {
    if (eglFrame.eglColorFormat == CU_EGL_COLOR_FORMAT_ABGR) {
 	cv_process_RGBA(eglFrame.frame.pPitch[0], eglFrame.width, eglFrame.height);
    } else if (eglFrame.eglColorFormat == CU_EGL_COLOR_FORMAT_YUV420_SEMIPLANAR) {
      printf ("Invalid eglcolorformat NV12\n");
    } else
      printf ("Invalid eglcolorformat %d\n", eglFrame.eglColorFormat);
  }

  status = cuCtxSynchronize();
  if (status != CUDA_SUCCESS) {
    printf ("cuCtxSynchronize failed after memcpy \n");
  }

  status = cuGraphicsUnregisterResource(pResource);
  if (status != CUDA_SUCCESS) {
    printf("cuGraphicsEGLUnRegisterResource failed: %d \n", status);
  }
}

const int max_width = 640;
const int max_height = 480;

extern "C" void
init (CustomerFunction * pFuncs)
{
  pFuncs->fPreProcess = pre_process;
  pFuncs->fGPUProcess = gpu_process;
  pFuncs->fPostProcess = post_process;

  /* Initialize maps from CPU */
  cv::Mat xmap(max_height, max_width, CV_32FC1);
  cv::Mat ymap(max_height, max_width, CV_32FC1);

   //fill matrices
  cv::Mat cam(3, 3, cv::DataType<float>::type);
  cam.at<float>(0, 0) = 528.53618582196384f;
  cam.at<float>(0, 1) = 0.0f;
  cam.at<float>(0, 2) = 314.01736116032430f;

  cam.at<float>(1, 0) = 0.0f;
  cam.at<float>(1, 1) = 532.01912214324500f;
  cam.at<float>(1, 2) = 231.43930864205211f;

  cam.at<float>(2, 0) = 0.0f;
  cam.at<float>(2, 1) = 0.0f;
  cam.at<float>(2, 2) = 1.0f;

  cv::Mat dist(4, 1, cv::DataType<float>::type);  
  dist.at<float>(0, 0) = -0.11839989180635836f;
  dist.at<float>(1, 0) = 0.25425420873955445f;
  dist.at<float>(2, 0) = 0.0013269901775205413f;
  dist.at<float>(3, 0) = 0.0015787467748277866f;

  cv::fisheye::initUndistortRectifyMap(cam, dist, cv::Mat(), cam, cv::Size(max_width, max_height), CV_32FC1, xmap, ymap);

  /* upload to GpuMats */
  gpu_xmap.upload(xmap);
  gpu_ymap.upload(ymap);
}

extern "C" void
deinit (void)
{

}

Makefile:

###############################################################################
#
# Copyright (c) 2016, NVIDIA CORPORATION. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#  * Redistributions of source code must retain the above copyright
#    notice, this list of conditions and the following disclaimer.
#  * Redistributions in binary form must reproduce the above copyright
#    notice, this list of conditions and the following disclaimer in the
#    documentation and/or other materials provided with the distribution.
#  * Neither the name of NVIDIA CORPORATION nor the names of its
#    contributors may be used to endorse or promote products derived
#    from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
###############################################################################

# Location of the CUDA Toolkit
CUDA_PATH ?= /usr/local/cuda
INCLUDE_DIR = /usr/include
LIB_DIR = /usr/lib/aarch64-linux-gnu
TEGRA_LIB_DIR = /usr/lib/aarch64-linux-gnu/tegra
OPENCV_DIR = /usr/local/opencv-github-4.2.0-dev

# For hardfp
#LIB_DIR = /usr/lib/arm-linux-gnueabihf
#TEGRA_LIB_DIR = /usr/lib/arm-linux-gnueabihf/tegra

OSUPPER = $(shell uname -s 2>/dev/null | tr "[:lower:]" "[:upper:]")
OSLOWER = $(shell uname -s 2>/dev/null | tr "[:upper:]" "[:lower:]")

OS_SIZE = $(shell uname -m | sed -e "s/i.86/32/" -e "s/x86_64/64/" -e "s/armv7l/32/")
OS_ARCH = $(shell uname -m | sed -e "s/i386/i686/")

GCC ?= g++
NVCC := $(CUDA_PATH)/bin/nvcc -ccbin $(GCC)

# internal flags
NVCCFLAGS   := --shared -std=c++11
CCFLAGS     := -fPIC  -std=c++11
CVCCFLAGS:=-I$(OPENCV_DIR)/include/opencv4
CVLDFLAGS:=-L$(OPENCV_DIR)/lib -lopencv_core -lopencv_calib3d -lopencv_cudawarping

LDFLAGS     :=

# Extra user flags
EXTRA_NVCCFLAGS   ?=
EXTRA_LDFLAGS     ?=
EXTRA_CCFLAGS     ?=

override abi := aarch64
LDFLAGS += --dynamic-linker=/lib/ld-linux-aarch64.so.1

# For hardfp
#override abi := gnueabihf
#LDFLAGS += --dynamic-linker=/lib/ld-linux-armhf.so.3
#CCFLAGS += -mfloat-abi=hard

ifeq ($(ARMv7),1)
NVCCFLAGS += -target-cpu-arch ARM
ifneq ($(TARGET_FS),)
CCFLAGS += --sysroot=$(TARGET_FS)
LDFLAGS += --sysroot=$(TARGET_FS)
LDFLAGS += -rpath-link=$(TARGET_FS)/lib
LDFLAGS += -rpath-link=$(TARGET_FS)/usr/lib
LDFLAGS += -rpath-link=$(TARGET_FS)/usr/lib/$(abi)-linux-gnu

# For hardfp
#LDFLAGS += -rpath-link=$(TARGET_FS)/usr/lib/arm-linux-$(abi)

endif
endif

# Debug build flags
dbg = 0
ifeq ($(dbg),1)
      NVCCFLAGS += -g -G
      TARGET := debug
else
      TARGET := release
endif

ALL_CCFLAGS :=
ALL_CCFLAGS += $(NVCCFLAGS)
ALL_CCFLAGS += $(EXTRA_NVCCFLAGS)
ALL_CCFLAGS += $(addprefix -Xcompiler ,$(CCFLAGS))
ALL_CCFLAGS += $(addprefix -Xcompiler ,$(EXTRA_CCFLAGS))

ALL_LDFLAGS :=
ALL_LDFLAGS += $(ALL_CCFLAGS)
ALL_LDFLAGS += $(addprefix -Xlinker ,$(LDFLAGS))
ALL_LDFLAGS += $(addprefix -Xlinker ,$(EXTRA_LDFLAGS))

# Common includes and paths for CUDA
INCLUDES  := -I./
LIBRARIES := -L$(LIB_DIR) -lEGL -lGLESv2
LIBRARIES += -L$(TEGRA_LIB_DIR) -lcuda -lrt

################################################################################

# CUDA code generation flags
ifneq ($(OS_ARCH),armv7l)
GENCODE_SM10    := -gencode arch=compute_10,code=sm_10
endif
GENCODE_SM20    := -gencode arch=compute_20,code=sm_20
GENCODE_SM30    := -gencode arch=compute_30,code=sm_30
GENCODE_SM32    := -gencode arch=compute_32,code=sm_32
GENCODE_SM35    := -gencode arch=compute_35,code=sm_35
GENCODE_SM50    := -gencode arch=compute_50,code=sm_50
GENCODE_SMXX    := -gencode arch=compute_50,code=compute_50
GENCODE_SM53    := -gencode arch=compute_53,code=compute_53  # for TX1 / Nano
GENCODE_SM62    := -gencode arch=compute_62,code=compute_62  # for TX2
GENCODE_SM72    := -gencode arch=compute_72,code=compute_72  # for Xavier

ifeq ($(OS_ARCH),armv7l)
# This only supports TK1(3.2) -like architectures
GENCODE_FLAGS   ?= $(GENCODE_SM32)
else
# This only supports TX1/Nano(5.3), TX2(6.2) or Xavier(7.2) -like architectures
GENCODE_FLAGS   ?= $(GENCODE_SM53) $(GENCODE_SM62) $(GENCODE_SM72)
endif

# Target rules
all: build

build: lib-gst-custom-opencv_cudaprocess.so

gst-custom-opencv_cudaprocess.o : gst-custom-opencv_cudaprocess.cu
	$(NVCC) $(INCLUDES) $(ALL_CCFLAGS) $(CVCCFLAGS) $(GENCODE_FLAGS) -o $@ -c $<

lib-gst-custom-opencv_cudaprocess.so : gst-custom-opencv_cudaprocess.o
	$(NVCC) $(ALL_LDFLAGS) $(CVLDFLAGS) $(GENCODE_FLAGS) -o $@ $^ $(LIBRARIES)

clean:
	rm -f lib-gst-custom-opencv_cudaprocess.so gst-custom-opencv_cudaprocess.o

clobber: clean

Hi, it’s great. I was able to reproduce it and run your pipeline successfully. I can also pipe an H.264 video file through the following pipeline:

gst-launch-1.0 \
filesrc location= ~/data/ar.h264 !  h264parse ! nvv4l2decoder ! nvvidconv ! 'video/x-raw(memory:NVMM), format=NV12, width=1280, height=720' ! nvivafilter customer-lib-name=./lib-gst-custom-opencv_cudaprocess.so cuda-process=true ! 'video/x-raw(memory:NVMM), format=RGBA, width=1280, height=720' !  nvegltransform ! nveglglessink

However, when I add nvinfer into the pipeline, as below:

gst-launch-1.0 filesrc location= ~/data/ar.h264 ! h264parse ! nvv4l2decoder ! nvvidconv ! 'video/x-raw(memory:NVMM), format=NV12, width=1280, height=720' ! nvivafilter customer-lib-name=./lib-gst-custom-opencv_cudaprocess.so cuda-process=true ! 'video/x-raw(memory:NVMM), format=RGBA, width=1280, height=720' ! nvvideoconvert ! "video/x-raw(memory:NVMM), format=NV12" ! m.sink_0 nvstreammux name=m batch-size=1 width=1280 height=720 ! nvinfer config-file-path= /opt/nvidia/deepstream/deepstream-5.0/sources/apps/sample_apps/deepstream-test1/dstest1_pgie_config.txt ! nvvideoconvert ! nvdsosd ! nvegltransform ! nveglglessink

I get the following errors:

0:00:06.499927295 27125   0x558c06b190 INFO                 nvinfer gstnvinfer.cpp:602:gst_nvinfer_logger:<nvinfer0> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:1681> [UID = 1]: Use deserialized engine model: /opt/nvidia/deepstream/deepstream-5.0/samples/models/Primary_Detector/resnet10.caffemodel_b1_gpu0_int8.engine
0:00:06.508901994 27125   0x558c06b190 INFO                 nvinfer gstnvinfer_impl.cpp:311:notifyLoadModelStatus:<nvinfer0> [UID 1]: Load new model:/opt/nvidia/deepstream/deepstream-5.0/sources/apps/sample_apps/deepstream-test1/dstest1_pgie_config.txt sucessfully
Pipeline is PREROLLING ...
Got context from element 'eglglessink0': gst.egl.EGLDisplay=context, display=(GstEGLDisplay)NULL;
NvMMLiteOpen : Block : BlockType = 261 
NVMEDIA: Reading vendor.tegra.display-size : status: 6 
NvMMLiteBlockCreate : Block : BlockType = 261 
nvbuf_utils: nvbuffer Payload Type not supported
NvBufferGetParams failed for src_dmabuf_fd
nvbuffer_transform Failed
gst_nvvconv_transform: NvBufferTransform Failed 
ERROR: from element /GstPipeline:pipeline0/GstH264Parse:h264parse0: Internal data stream error.
Additional debug info:
gstbaseparse.c(3611): gst_base_parse_loop (): /GstPipeline:pipeline0/GstH264Parse:h264parse0:
streaming stopped, reason error (-5)
ERROR: pipeline doesn't want to preroll.
Setting pipeline to NULL ...

Any clue what I am missing? Please help. Thanks a lot.

Sorry, I can’t provide any solution… It seems there are different NVMM formats depending on whether you use nvvidconv and nvivafilter as opposed to DeepStream. Someone with better knowledge may comment further.
On the DS side, I just quickly tried the dewarper sample but did not get good results (very slow, failing to keep sync). I haven’t dug further, though; you may have a look at it and create a new topic if you face problems.

Thanks anyway, your post helped a lot to get me in the door of running OpenCV on the GPU ;) I will dig more into DS compatibility.

The only workaround I see would be to use v4l2loopback: a virtual node would be fed by a first GStreamer pipeline using nvivafilter and nvvidconv to create a preprocessed video stream, and this node would then be used as the source for a second DS pipeline.
I’m not sure whether you would have to use NV12 or BGRx as the virtual node format, but this might be a CPU-expensive solution anyway, so better to try the full DS way.

[EDIT: I quickly checked further. It may work as follows.
Install v4l2loopback and create a virtual node /dev/video1:

# Install v4l2loopback and utils
sudo apt-get update
sudo apt-get install v4l2loopback-dkms v4l2loopback-utils

# Create a virtual video node /dev/video1
sudo modprobe v4l2loopback exclusive_caps=1 video_nr=1

Then use a first pipeline to feed undistorted video in BGRx format into /dev/video1:

gst-launch-1.0 -ev nvarguscamerasrc ! nvivafilter customer-lib-name=./lib-gst-custom-opencv_cudaprocess.so cuda-process=true ! "video/x-raw(memory:NVMM), format=RGBA" ! nvvidconv ! video/x-raw, format=BGRx ! identity drop-allocation=1 ! v4l2sink device=/dev/video1

Then run the DS pipeline from the v4l2loopback node in another terminal:

gst-launch-1.0 -v v4l2src device=/dev/video1 ! nvvideoconvert ! 'video/x-raw(memory:NVMM), format=NV12' ! m.sink_0     nvstreammux name=m batch-size=1 width=1920 height=1080 ! queue ! nvinfer config-file-path=/opt/nvidia/deepstream/deepstream-5.0/sources/apps/sample_apps/deepstream-test1/dstest1_pgie_config.txt ! nvvideoconvert ! nveglglessink sync=false

]