I’m trying to access dwImageCUDA pixel data to build a 2D pixel matrix (similar to the cv::Mat format) so that I can send the data to v4l2loopback and stream it over the network.
First I tried to convert the dwImageCUDA image to cv::Mat, but all I get is a strange black-and-white image. Example code:
Then I tried to access the pixel data through the dwImageCUDA dptr and construct the matrix myself, but I couldn’t find a way to do that.
Am I doing something wrong? What is the correct way of doing this?
Hi anja,
Just for your information, CUDA can store image data in pitch-linear format (dptr) or in block-linear format (cudaArray).
Could you copy the data from the cudaArray, rather than from dptr, to host memory and check whether it actually contains data? Please refer to the appropriate cudaMemcpy function in the CUDA Runtime API section of the CUDA Toolkit Documentation.
In the question, img_cuda->dptr[0] points to the first image plane only. All image planes are required to construct the color image properly.
It is also possible that the pixel values are of a floating-point type (between 0 and 1), in which case converting them directly to CV_8U will not yield appropriate results.
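To illustrate the point about normalized values, here is a minimal host-side sketch of the scaling step. The names floatToU8 and toU8 are hypothetical helpers, not part of DriveWorks or OpenCV, and the sketch assumes that valid pixel values lie in [0, 1] and that truncation (rather than rounding) is acceptable.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Map a normalized float pixel value to an 8-bit value.
// Assumption: valid input lies in [0, 1]; out-of-range values are clamped
// and NaN is mapped to 0.
inline std::uint8_t floatToU8(float v) {
    // The comparison order makes NaN fall through to 0.
    float const clamped = (v > 0.0f) ? (v < 1.0f ? v : 1.0f) : 0.0f;
    // Scale to [0, 255] and truncate towards zero.
    return static_cast<std::uint8_t>(clamped * 255.0f);
}

// Convert a whole buffer of normalized floats to 8-bit values.
inline std::vector<std::uint8_t> toU8(std::vector<float> const& src) {
    std::vector<std::uint8_t> dst(src.size());
    std::transform(src.begin(), src.end(), dst.begin(),
                   [](float v) { return floatToU8(v); });
    return dst;
}
```

Casting a value such as 0.5 straight to an 8-bit integer gives 0, which is why an unscaled conversion tends to look almost black. If the data is already in a cv::Mat of type CV_32F, cv::Mat::convertTo with a scale factor of 255 performs a comparable scaling.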
Here is a simple example to copy data from dwImageCUDA into a cv::Mat or cv::cuda::GpuMat - pixel by pixel.
The format of the input dwImageCUDA must be known as the memory mapping depends on format.
The format for dwImageCUDA *cudaImage; can be found at cudaImage->prop.format.
The copy can be performed in the following way:
The format of dwImageCUDA is considered to be DW_IMAGE_FORMAT_RGB_FLOAT16_PLANAR, as an example. That is a 3 channel RGB image with pixel values in float16 type, in range 0 to 1.
__device__ unsigned char __uc(__half const x){
    // scale [0, 1] to [0, 255] and clamp before narrowing to 8 bits
    float const f = __half2float(x) * 255.0f;
    return (unsigned char) fminf(fmaxf(f, 0.0f), 255.0f);
}
__global__ void cudaRgb2MatBgr(
    void * const red,           // input: red plane
    void * const green,         // input: green plane
    void * const blue,          // input: blue plane
    size_t const pitch,         // input: row pitch of a plane (assumed to be in bytes)
    unsigned int const rows,    // input
    unsigned int const cols,    // input
    unsigned char * const data, // output
    size_t const step           // input: row step of the output in bytes
){
    // global thread ID: one thread per byte of the output Mat
    size_t const id = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    // indices for mat
    size_t const rm = id / step;       // row in Mat (same row in the cuda image)
    size_t const cm = id - rm * step;  // byte offset within the Mat row
    int const ch = cm % 3;             // color channel in Mat (3 channels for BGR image)
    size_t const cc = cm / 3;          // pixel column in Mat and in the cuda image
    // check if indices are within bounds (the tail of each row may be padding)
    if( (rm < rows) && (cm < (size_t)cols * 3) ){
        // pointers to color planes, in B, G, R order
        char * const bgr[3] = { (char*) blue, (char*) green, (char*) red };
        // offset the row in bytes first, then index the pixels as __half
        __half const * const row = (__half const *)(bgr[ch] + rm * pitch);
        data[rm * step + cm] = __uc(row[cc]); // copy data
    }
}
// -------------------------------------------------
int cudaImage2Mat(dwImageCUDA *cudaImage, cv::Mat &im){
    // dwImageCUDA * is input
    // cv::Mat is output
    auto const &imProp = cudaImage->prop; // width, height and format of the input
    // Create a zero-initialized cv::cuda::GpuMat with the dimensions of the input dwImageCUDA
    cv::cuda::GpuMat gim(imProp.height, imProp.width, CV_8UC3, cv::Scalar(0));
    // Number of threads in each thread block
    unsigned int const blockSize = 1024;
    // Number of thread blocks in grid: one thread per byte of the GpuMat, rounded up
    size_t const n = gim.rows * gim.step;
    unsigned int const gridSize = (unsigned int)((n + blockSize - 1) / blockSize);
    // call the kernel
    cudaRgb2MatBgr<<<gridSize, blockSize>>>(
        cudaImage->dptr[0],  // red plane
        cudaImage->dptr[1],  // green plane
        cudaImage->dptr[2],  // blue plane
        cudaImage->pitch[0], // pitch is usually same for all planes (but verify that)
        imProp.height,       // image rows
        imProp.width,        // image columns
        gim.data,            // receive pixel values here
        gim.step             // step is the pitch of a cv::cuda::GpuMat
    );
    // wait for the kernel and stop if the launch or execution failed
    if( cudaDeviceSynchronize() != cudaSuccess ) return -1;
    gim.download(im); // copy the data into cpu Mat (cv::Mat)
    return 0;
}
Note: float16 (half precision) support for CUDA is available through #include <cuda_fp16.h>
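To check the output of the kernel, it can help to compare it against a plain host-side reference of the same planar RGB to interleaved BGR mapping. The sketch below is only an illustration under stated assumptions: planarRgbToBgr8 is a hypothetical helper, the planes hold float instead of float16 so that it compiles without CUDA, and both pitchBytes and stepBytes are row strides in bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Host-side reference: planar RGB (float, 0..1) -> interleaved BGR (8-bit).
// Assumptions: planes hold float (not float16); pitchBytes and stepBytes
// are row strides in bytes and may include padding at the end of each row.
inline void planarRgbToBgr8(
    float const* red, float const* green, float const* blue,
    std::size_t pitchBytes, unsigned rows, unsigned cols,
    std::uint8_t* dst, std::size_t stepBytes)
{
    assert(pitchBytes % sizeof(float) == 0);
    assert(pitchBytes >= cols * sizeof(float));
    assert(stepBytes >= cols * 3u);
    float const* planes[3] = { blue, green, red }; // output order is B, G, R
    for (unsigned r = 0; r < rows; ++r) {
        std::uint8_t* out = dst + r * stepBytes;
        for (int ch = 0; ch < 3; ++ch) {
            // Rows start pitchBytes apart, so offset in bytes before indexing.
            auto const* rowBytes =
                reinterpret_cast<unsigned char const*>(planes[ch]) + r * pitchBytes;
            auto const* in = reinterpret_cast<float const*>(rowBytes);
            for (unsigned c = 0; c < cols; ++c) {
                float v = in[c];
                v = (v > 0.0f) ? (v < 1.0f ? v : 1.0f) : 0.0f; // clamp, NaN -> 0
                out[3 * c + ch] = static_cast<std::uint8_t>(v * 255.0f);
            }
        }
    }
}
```

The point of the sketch is the addressing: the row offset is applied in bytes, and only then is the row indexed by pixel column. Padding bytes at the end of each output row are left untouched.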
While I was using your script to convert to an OpenCV Mat, I encountered the following error when running the binary:
terminate called after throwing an instance of ‘cv::Exception’
what(): OpenCV(4.1.1) /home/nvidia/opencv-4.1.1/modules/core/src/cuda/gpu_mat.cu:249: error: (-217:Gpu API call) an illegal memory access was encountered in function ‘download’