MMAPI dqBuffer blocking problem

Hi:

I am using MMAPI for real-time video decoding and display. Referring to samples/00_video_decode, I moved the decode into a thread, but I found that when I want dec_capture_loop_fcn to quit, there is a chance it blocks in the call

dec->capture_plane.dqBuffer(v4l2_buf, &dec_buffer, NULL, 0);

By adding prints, I tracked the blocking down to the call

ret = v4l2_ioctl(fd, VIDIOC_DQBUF, &v4l2_buf);

I tried passing the non-blocking flag when creating the decoder:

ctx.dec = NvVideoDecoder::createVideoDecoder(decname, O_NONBLOCK);

From the prints, this->blocking is set to 0, but it does not seem to have any effect.

Please help, thanks!

Hi Li,
We have a known issue for the encoder:
https://devtalk.nvidia.com/default/topic/987024/jetson-tx1/question-about-v4l2-api-for-encode-of-tx1/post/5132125/#5132125

For the decoder it should be fine. Please share how to reproduce the issue you observed.

Hi DaneLLL,

Thanks for your reply.

To reproduce the issue, I again modified samples/00_video_decode.

These are the changes I made:

1. Added the O_NONBLOCK flag: NvVideoDecoder::createVideoDecoder("dec0", O_NONBLOCK);
2. Paused the main process after dec->output_plane.qBuffer has been called once.
3. Added prints around dec->capture_plane.dqBuffer.

In non-blocking mode, dec->capture_plane.dqBuffer still does not return once no more NALUs are fed.

Here’s my code:

#include "NvApplicationProfiler.h"
#include "NvUtils.h"
#include <errno.h>
#include <fstream>
#include <iostream>
#include <linux/videodev2.h>
#include <malloc.h>
#include <pthread.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>

#include "video_decode.h"

#define TEST_ERROR(cond, str, label) if(cond) { \
	cerr << str << endl; \
	error = 1; \
	goto label; }

#define CHUNK_SIZE 4000000
#define MIN(a,b) (((a) < (b)) ? (a) : (b))

#define IS_NAL_UNIT_START(buffer_ptr) (!buffer_ptr[0] && !buffer_ptr[1] && \
		!buffer_ptr[2] && (buffer_ptr[3] == 1))

#define IS_NAL_UNIT_START1(buffer_ptr) (!buffer_ptr[0] && !buffer_ptr[1] && \
		(buffer_ptr[2] == 1))

using namespace std;


	static int
read_decoder_input_chunk(ifstream * stream, NvBuffer * buffer)
{
	// Length is the size of the buffer in bytes
	streamsize bytes_to_read = MIN(CHUNK_SIZE, buffer->planes[0].length);

	stream->read((char *) buffer->planes[0].data, bytes_to_read);
	// It is necessary to set bytesused properly, so that decoder knows how
	// many bytes in the buffer are valid
	buffer->planes[0].bytesused = stream->gcount();
	return 0;
}

	static void
abort(context_t *ctx)
{
	ctx->got_error = true;
	ctx->dec->abort();
}


	static void
query_and_set_capture(context_t * ctx)
{
	NvVideoDecoder *dec = ctx->dec;
	struct v4l2_format format;
	struct v4l2_crop crop;
	int32_t min_dec_capture_buffers;
	int ret = 0;
	int error = 0;
	uint32_t window_width;
	uint32_t window_height;

	// Get capture plane format from the decoder. This may change after
	// a resolution change event
	ret = dec->capture_plane.getFormat(format);
	TEST_ERROR(ret < 0,
			"Error: Could not get format from decoder capture plane", error);

	// Get the display resolution from the decoder
	ret = dec->capture_plane.getCrop(crop);
	TEST_ERROR(ret < 0,
			"Error: Could not get crop from decoder capture plane", error);

	cout << "Video Resolution: " << crop.c.width << "x" << crop.c.height
		<< endl;


	if (!ctx->disable_rendering)
	{
		// Destroy the old instance of renderer as resolution might have changed
		delete ctx->renderer;

		if (ctx->fullscreen)
		{
			// Required for fullscreen
			window_width = window_height = 0;
		}
		else if (ctx->window_width && ctx->window_height)
		{
			// As specified by user on commandline
			window_width = ctx->window_width;
			window_height = ctx->window_height;
		}
		else
		{
			// Resolution got from the decoder
			window_width = crop.c.width;
			window_height = crop.c.height;
		}

		// If height or width are set to zero, EglRenderer creates a fullscreen
		// window
		ctx->renderer =
			NvEglRenderer::createEglRenderer("renderer0", window_width,
					window_height, ctx->window_x,
					ctx->window_y);
		TEST_ERROR(!ctx->renderer,
				"Error in setting up renderer. "
				"Check if X is running or run with --disable-rendering",
				error);
		if (!ctx->renderer)
		{
			pause();
		}
		if (ctx->stats)
		{
			ctx->renderer->enableProfiling();
		}

		ctx->renderer->setFPS(ctx->fps);
	}

	// deinitPlane unmaps the buffers and calls REQBUFS with count 0
	dec->capture_plane.deinitPlane();

	// Not necessary to call VIDIOC_S_FMT on decoder capture plane.
	// But decoder setCapturePlaneFormat function updates the class variables
	ret = dec->setCapturePlaneFormat(format.fmt.pix_mp.pixelformat,
			format.fmt.pix_mp.width,
			format.fmt.pix_mp.height);
	TEST_ERROR(ret < 0, "Error in setting decoder capture plane format", error);

	// Get the minimum buffers which have to be requested on the capture plane
	ret = dec->getMinimumCapturePlaneBuffers(min_dec_capture_buffers);
	TEST_ERROR(ret < 0,
			"Error while getting value of minimum capture plane buffers",
			error);

	// Request (min + 5) buffers, export and map buffers
	ret =
		dec->capture_plane.setupPlane(V4L2_MEMORY_MMAP,
				min_dec_capture_buffers + 5, false,
				false);
	TEST_ERROR(ret < 0, "Error in decoder capture plane setup", error);


	// Capture plane STREAMON
	ret = dec->capture_plane.setStreamStatus(true);
	TEST_ERROR(ret < 0, "Error in decoder capture plane streamon", error);

	// Enqueue all the empty capture plane buffers
	for (uint32_t i = 0; i < dec->capture_plane.getNumBuffers(); i++)
	{
		struct v4l2_buffer v4l2_buf;
		struct v4l2_plane planes[MAX_PLANES];

		memset(&v4l2_buf, 0, sizeof(v4l2_buf));
		memset(planes, 0, sizeof(planes));

		v4l2_buf.index = i;
		v4l2_buf.m.planes = planes;
		ret = dec->capture_plane.qBuffer(v4l2_buf, NULL);
		TEST_ERROR(ret < 0, "Error Qing buffer at output plane", error);
	}
	cout << "Query and set capture successful" << endl;
	return;

error:
	if (error)
	{
		abort(ctx);
		cerr << "Error in " << __func__ << endl;
	}
}

	static void *
dec_capture_loop_fcn(void *arg)
{
	context_t *ctx = (context_t *) arg;
	NvVideoDecoder *dec = ctx->dec;
	struct v4l2_event ev;
	int ret;

	cout << "Starting decoder capture loop thread" << endl;
	// Need to wait for the first Resolution change event, so that
	// the decoder knows the stream resolution and can allocate appropriate
	// buffers when we call REQBUFS
	do
	{
		ret = dec->dqEvent(ev, 50000);
		if (ret < 0)
		{
			if (errno == EAGAIN)
			{
				cerr <<
					"Timed out waiting for first V4L2_EVENT_RESOLUTION_CHANGE"
					<< endl;
			}
			else
			{
				cerr << "Error in dequeueing decoder event" << endl;
			}
			abort(ctx);
			break;
		}
	}
	while (ev.type != V4L2_EVENT_RESOLUTION_CHANGE);

	// query_and_set_capture acts on the resolution change event
	if (!ctx->got_error)
		query_and_set_capture(ctx);

	// Exit on error or EOS which is signalled in main()
	while (!(ctx->got_error || dec->isInError() || ctx->got_eos))
	{
		NvBuffer *dec_buffer;

		// Check for Resolution change again
		ret = dec->dqEvent(ev, false);
		if (ret == 0)
		{
			switch (ev.type)
			{
				case V4L2_EVENT_RESOLUTION_CHANGE:
					query_and_set_capture(ctx);
					continue;
			}
		}

		while (1)
		{
			struct v4l2_buffer v4l2_buf;
			struct v4l2_plane planes[MAX_PLANES];

			memset(&v4l2_buf, 0, sizeof(v4l2_buf));
			memset(planes, 0, sizeof(planes));
			v4l2_buf.m.planes = planes;

			// Dequeue a filled buffer
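			// NOTE: this is the call that never returns once no more NALUs are fed to
			// the output plane, even though the decoder was created with O_NONBLOCK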
			printf("before Dq\n");
			if (dec->capture_plane.dqBuffer(v4l2_buf, &dec_buffer, NULL, 0))
			{
				if (errno == EAGAIN)
				{
					usleep(1000);
				}
				else
				{
					abort(ctx);
					cerr << "Error while calling dequeue at capture plane" <<
						endl;
				}
				break;
			}
			printf("after Dq, TotalDequeued %d bufs\n", dec->capture_plane.getTotalDequeuedBuffers());


			// EglRenderer requires the fd of the 0th plane to render the buffer
			ctx->renderer->render(dec_buffer->planes[0].fd);


			// Not writing to file
			// Queue the buffer back once it has been used.
			if (dec->capture_plane.qBuffer(v4l2_buf, NULL) < 0)
			{
				abort(ctx);
				cerr <<
					"Error while queueing buffer at decoder capture plane"
					<< endl;
				break;
			}
		}
	}
	cout << "Exiting decoder capture loop thread" << endl;
	return NULL;
}

	static void
set_defaults(context_t * ctx)
{
	memset(ctx, 0, sizeof(context_t));
	ctx->fullscreen = false;
	ctx->window_height = 0;
	ctx->window_width = 0;
	ctx->window_x = 0;
	ctx->window_y = 0;
	ctx->out_pixfmt = 1;
	ctx->fps = 30;

	pthread_mutex_init(&ctx->queue_lock, NULL);
	pthread_cond_init(&ctx->queue_cond, NULL);
}

	int
main(int argc, char *argv[])
{
	context_t ctx;
	int ret = 0;
	int error = 0;
	uint32_t i;
	uint32_t frame_cnt;
	bool eos = false;
	char *nalu_parse_buffer = NULL;
	NvApplicationProfiler &profiler = NvApplicationProfiler::getProfilerInstance();

	int k = 0;

	while(1)
	{
		printf("!!!!!! loop_cnt = %d\n", k);
		k++;
		ret = 0;
		error = 0;
		eos = false;
		nalu_parse_buffer = NULL;

		set_defaults(&ctx);

		if (parse_csv_args(&ctx, argc, argv))
		{
			fprintf(stderr, "Error parsing commandline arguments\n");
			return -1;
		}

		ctx.dec = NvVideoDecoder::createVideoDecoder("dec0", O_NONBLOCK);
		TEST_ERROR(!ctx.dec, "Could not create decoder", cleanup);

		if (ctx.stats)
		{
			profiler.start(NvApplicationProfiler::DefaultSamplingInterval);
			ctx.dec->enableProfiling();
		}

		// Subscribe to Resolution change event
		ret = ctx.dec->subscribeEvent(V4L2_EVENT_RESOLUTION_CHANGE, 0, 0);
		TEST_ERROR(ret < 0, "Could not subscribe to V4L2_EVENT_RESOLUTION_CHANGE",
				cleanup);

		if (ctx.input_nalu)
		{
			nalu_parse_buffer = new char[CHUNK_SIZE];
		}
		else
		{
			// Set V4L2_CID_MPEG_VIDEO_DISABLE_COMPLETE_FRAME_INPUT control to false
			// so that application can send chunks of encoded data instead of forming
			// complete frames. This needs to be done before setting format on the
			// output plane.
			ret = ctx.dec->disableCompleteFrameInputBuffer();
			TEST_ERROR(ret < 0,
					"Error in decoder disableCompleteFrameInputBuffer", cleanup);
		}

		// Set format on the output plane
		ret = ctx.dec->setOutputPlaneFormat(ctx.decoder_pixfmt, CHUNK_SIZE);
		TEST_ERROR(ret < 0, "Could not set output plane format", cleanup);

		// V4L2_CID_MPEG_VIDEO_DISABLE_DPB should be set after output plane
		// set format
		if (ctx.disable_dpb)
		{
			ret = ctx.dec->disableDPB();
			TEST_ERROR(ret < 0, "Error in decoder disableDPB", cleanup);
		}

		if (ctx.enable_metadata)
		{
			ret = ctx.dec->enableMetadataReporting();
			TEST_ERROR(ret < 0, "Error while enabling metadata reporting", cleanup);
		}

		if (ctx.skip_frames)
		{
			ret = ctx.dec->setSkipFrames(ctx.skip_frames);
			TEST_ERROR(ret < 0, "Error while setting skip frames param", cleanup);
		}

		// Query, Export and Map the output plane buffers so that we can read
		// encoded data into the buffers
		ret = ctx.dec->output_plane.setupPlane(V4L2_MEMORY_MMAP, 10, true, false);
		TEST_ERROR(ret < 0, "Error while setting up output plane", cleanup);

		ctx.in_file = new ifstream(ctx.in_file_path);
		TEST_ERROR(!ctx.in_file->is_open(), "Error opening input file", cleanup);

		if (ctx.out_file_path)
		{
			ctx.out_file = new ofstream(ctx.out_file_path);
			TEST_ERROR(!ctx.out_file->is_open(), "Error opening output file",
					cleanup);
		}


		ret = ctx.dec->output_plane.setStreamStatus(true);
		TEST_ERROR(ret < 0, "Error in output plane stream on", cleanup);

		pthread_create(&ctx.dec_capture_loop, NULL, dec_capture_loop_fcn, &ctx);

		frame_cnt = 0;
		struct v4l2_buffer v4l2_buf;
		struct v4l2_plane planes[MAX_PLANES];
		NvBuffer *buffer;
		while (!eos && !ctx.got_error && !ctx.dec->isInError())
		{

			memset(&v4l2_buf, 0, sizeof(v4l2_buf));
			memset(planes, 0, sizeof(planes));

			v4l2_buf.m.planes = planes;

			if(frame_cnt < ctx.dec->output_plane.getNumBuffers())
			{
				buffer = ctx.dec->output_plane.getNthBuffer(frame_cnt);
				v4l2_buf.index = frame_cnt;
			}
			else
			{
				ret = ctx.dec->output_plane.dqBuffer(v4l2_buf, &buffer, NULL, -1);
				if (ret < 0)
				{
					cerr << "Error DQing buffer at output plane" << endl;
					abort(&ctx);
					break;
				}
			}

			read_decoder_input_chunk(ctx.in_file, buffer);

			v4l2_buf.m.planes[0].bytesused = buffer->planes[0].bytesused;

			ret = ctx.dec->output_plane.qBuffer(v4l2_buf, NULL);
			if (ret < 0)
			{
				cerr << "Error Qing buffer at output plane" << endl;
				abort(&ctx);
				break;
			}
			printf("!!!!!!!!! after output plane qBuf, i paused here\n");
			pause();

			if (v4l2_buf.m.planes[0].bytesused == 0)
			{
				eos = true;
				cout << "Input file read complete" << endl;
				break;
			}
			frame_cnt++;
		}



		// After sending EOS, all the buffers from output plane should be dequeued.
		// and after that capture plane loop should be signalled to stop.
		while (ctx.dec->output_plane.getNumQueuedBuffers() > 0 &&
				!ctx.got_error && !ctx.dec->isInError())
		{
			struct v4l2_buffer v4l2_buf;
			struct v4l2_plane planes[MAX_PLANES];

			memset(&v4l2_buf, 0, sizeof(v4l2_buf));
			memset(planes, 0, sizeof(planes));

			v4l2_buf.m.planes = planes;
			ret = ctx.dec->output_plane.dqBuffer(v4l2_buf, NULL, NULL, -1);
			if (ret < 0)
			{
				cerr << "Error DQing buffer at output plane" << endl;
				abort(&ctx);
				break;
			}
		}

		// Signal EOS to the decoder capture loop
		ctx.got_eos = true;


		if (ctx.stats)
		{
			profiler.stop();
			ctx.dec->printProfilingStats(cout);
			if (ctx.renderer)
			{
				ctx.renderer->printProfilingStats(cout);
			}
			profiler.printProfilerData(cout);
		}

cleanup:
		if (ctx.dec_capture_loop)
		{
			pthread_join(ctx.dec_capture_loop, NULL);
		}

		if (ctx.dec && ctx.dec->isInError())
		{
			cerr << "Decoder is in error" << endl;
			error = 1;
		}

		if (ctx.got_error)
		{
			error = 1;
		}

		// The decoder destructor does all the cleanup i.e set streamoff on output and capture planes,
		// unmap buffers, tell decoder to deallocate buffer (reqbufs ioctl with count = 0),
		// and finally call v4l2_close on the fd.
		delete ctx.dec;

		// Similarly, EglRenderer destructor does all the cleanup
		delete ctx.renderer;
		delete ctx.in_file;
		delete ctx.out_file;
		delete[] nalu_parse_buffer;


		sleep (4);
	}

	free(ctx.in_file_path);
	free(ctx.out_file_path);

	if (error)
	{
		cout << "App run failed" << endl;
	}
	else
	{
		cout << "App run was successful" << endl;
	}

	return -error;
}

I execute the following command

./video_decode ../../data/video/sample_outdoor_car_1080p_10fps.h264 H264

Hi Li,
Because H.264 decoding refers to previously decoded frames, there will be a delay of a few frames before decoded frames come out. The behavior looks normal.

Do you see a large frame delay in your case?

Hi DaneLLL,

A delay of several frames does not matter.
What I mean is that in non-blocking mode the dqBuffer function should return after the timeout, but in fact dqBuffer stays blocked unless a NALU is sent to the decoder.

The L4T Multimedia API Reference says this about dqBuffer's num_retries parameter:

"In case of non-blocking mode, this is equivalent to the number of milliseconds to try to dequeue a buffer."
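
To make the expectation concrete, here is a rough sketch (not my actual code) of the capture loop I expected to be able to write in non-blocking mode; ctx, dec and got_eos are the same as in the code above:

// Expectation sketch: in non-blocking mode dqBuffer should try for roughly
// num_retries milliseconds and then return -1 with errno == EAGAIN, so the
// loop can re-check got_eos and exit cleanly.
while (!ctx->got_eos && !ctx->got_error)
{
	struct v4l2_buffer v4l2_buf;
	struct v4l2_plane planes[MAX_PLANES];
	NvBuffer *dec_buffer = NULL;

	memset(&v4l2_buf, 0, sizeof(v4l2_buf));
	memset(planes, 0, sizeof(planes));
	v4l2_buf.m.planes = planes;

	if (dec->capture_plane.dqBuffer(v4l2_buf, &dec_buffer, NULL, 10) < 0)
	{
		if (errno == EAGAIN)
			continue;   // nothing decoded yet, loop and re-check got_eos
		break;          // real error
	}

	// ... render dec_buffer and queue it back with capture_plane.qBuffer() ...
}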

This problem seems to be resolved on my side. I call

ctx.dec->output_plane.setStreamStatus(false);

before setting ctx.got_eos = true.
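
For reference, a rough sketch of the shutdown order I use now (same variable names as in the code above):

// Assumed shutdown sequence in main(), once I decide to stop feeding NALUs
ctx.dec->output_plane.setStreamStatus(false); // STREAMOFF on the output plane; in my
                                              // tests the capture-plane dqBuffer then returns
ctx.got_eos = true;                           // let dec_capture_loop_fcn leave its loop
pthread_join(ctx.dec_capture_loop, NULL);     // wait for the capture thread to finish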

With this change, dqBuffer returns successfully.
Can you tell me why? Thanks.

Hi Li,
The capture plane buffers are owned by the decoder after they are queued.

For example, for an H.264 stream with one reference frame, if you send frame 1 and frame 2 to the output plane, only decoded frame 1 will be sent to the capture plane, because decoded frame 2 is kept as a reference for decoding frame 3, unless EOS is sent or decoding is stopped.
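
For reference, a minimal sketch of signalling EOS as done in 00_video_decode: queue an output-plane buffer with bytesused set to 0, and the decoder then flushes the frames it is holding to the capture plane. Here v4l2_buf and buffer refer to an output-plane buffer obtained as in your code (getNthBuffer or output_plane.dqBuffer):

// Queueing an output-plane buffer with zero bytes used signals EOS; the decoder
// then returns the decoded frames it was keeping as references on the capture plane
buffer->planes[0].bytesused = 0;
v4l2_buf.m.planes[0].bytesused = 0;
ret = ctx.dec->output_plane.qBuffer(v4l2_buf, NULL);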

Hi DaneLLL,

The question I posted in this link (https://devtalk.nvidia.com/default/topic/1035602/inconsistent-and-very-long-h-264-encoding-latency/#5261793) is very similar to this one. What I observed is that if I send frame 1 and frame 2 to the encoder's output plane, only encoded frame 1 is sent to the capture plane. I can only get encoded frame 2 after frame 3 is sent to the encoder's output plane.

So is there any trick to get encoded frame 2 before frame 3 is sent to the output plane? There is really no reason for the encoder to keep encoded frame 2 as a reference, because the encoded frame is just H.264 bytes. The encoder should only keep the original frame 2 and release the encoded frame 2 immediately after it is encoded.

Hi,
The request is an enhancement. We currently have no plan for this.

Hi all,

@DaneLLL:
This thread has drifted away from the core issue. According to the V4L2 spec, when the device fd is opened in non-blocking mode, every ioctl should return immediately, with or without real data, and set errno to EAGAIN when the call could not be completed right away. That includes calls like VIDIOC_DQBUF, VIDIOC_DQEVENT, etc.
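
To illustrate with plain V4L2 (the device node below is just a placeholder, and this is a sketch rather than a complete program):

/* With the fd opened O_NONBLOCK, VIDIOC_DQBUF must return -1 immediately and set
   errno to EAGAIN when no buffer is ready, instead of blocking the caller. */
int fd = open("/dev/video0", O_RDWR | O_NONBLOCK);   /* placeholder device node */

struct v4l2_buffer buf;
struct v4l2_plane planes[VIDEO_MAX_PLANES];
memset(&buf, 0, sizeof(buf));
memset(planes, 0, sizeof(planes));
buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
buf.memory = V4L2_MEMORY_MMAP;
buf.m.planes = planes;
buf.length = VIDEO_MAX_PLANES;

if (ioctl(fd, VIDIOC_DQBUF, &buf) == -1 && errno == EAGAIN)
{
	/* expected non-blocking behaviour: no buffer yet, but the call returned immediately */
}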

I've opened another bug report with the misleading details stripped out. It is also super easy to reproduce. Here is the link:
https://devtalk.nvidia.com/default/topic/1037743/jetson-tx2/vidioc_dqbuf-blocks/

Best regards,
Andrei

Hi Andrei, we suggest you use NvVideoEncoder.

Thank you DaneLLL.

I will post my updates there, not to clutter this thread.