NvJPEGEncoder::encodeFromXxx - How to calculate buffer size?

I had a mysterious crash in my software which I think I’ve finally tracked down to a sometimes too-small buffer being passed to NvJPEGEncoder::encodeFromBuffer or NvJPEGEncoder::encodeFromFd.

I was calculating the required upper-bound buffer size as:

unsigned long out_buf_size = (w * h * 3 / 2) * 2;

Where I got this calculation from, I have no memory, but it’s clearly wrong for certain images. Also, it results in a crash if it’s too small.

What’s the correct way to calculate the maximum-possible buffer size required by NvJPEGEncoder::encodeFromXxx?

Thanks!

Hi,
By default it is the size of YUV420 (width * height * 1.5):

unsigned long out_buf_size = ctx.in_width * ctx.in_height * 3 / 2;

It shall not exceed it after compression. Very unlikely to happen because you further double the buffer size. Is your input format YUV420?

Is your input format YUV420?

Yes.

It shall not exceed it after compression.

I’ll take your word for it, but I guess the real question is whether the output buffer needs to be larger than this during compression. (When the crash occurs.)

Very unlikely to happen because you further double the buffer size.

Your “very unlikely” phrasing makes me nervous. :) It is indeed rare, but as I say, for certain rare input images I get a crash. In fact, I was only able to avoid the crash (for these rare cases) by changing the fudge-multiplier to 20! Hence my question about the correct safe size.

For completeness, here is my code that takes an RGB OpenCV mat and encodes it:

cv::Mat rgb; // assume this already holds a valid RGB image
auto w = rgb.cols;
auto h = rgb.rows;

// Convert to YUV
cv::Mat yuv;
cv::cvtColor(rgb, yuv, cv::COLOR_RGB2YUV_I420);

// Convert to NvBuffer
NvBuffer nvbuf(V4L2_PIX_FMT_YUV420M, w, h, 0);
nvbuf.allocateMemory();
auto ret = read_video_frame((const char*)yuv.data, yuv.step[0]*yuv.rows, nvbuf);
if(ret < 0) throw runtime_error("read_video_frame error - " + to_string(ret));

// Allocate enough memory for resulting image
unsigned long out_buf_size = (w * h * 3 / 2) * 2; // ???????
vector<uchar> memjpg;
memjpg.resize(out_buf_size);
auto out_buf_ptr = memjpg.data();

// Encode JPEG
unique_ptr<NvJPEGEncoder> jpgEnc( NvJPEGEncoder::createJPEGEncoder("jpenenc") );
ret = jpgEnc->encodeFromBuffer(nvbuf, JCS_YCbCr, &out_buf_ptr, out_buf_size, quality);

I doubt the hardware encoder uses your buffer at all for “working storage” during compression.

If multiplying the buffer size by 20 changes the behavior of your program, I think you have a bug somewhere else (or the runtime library has a bug; that’s also possible).
Your code, as written, won’t work right, because you don’t allocate or fill in the matrices.

Try running it with valgrind, or ASAN, to figure out where the problem might be.

I’m very suspicious of the double indirection of &out_buf_ptr, though. This seems to indicate that the library will change the value of out_buf_ptr and out_buf_size.
The documentation says:

“The application may allocate the memory for storing the JPEG image. If the allocation is less than what is required, libjpeg allocates more memory. The out_buf pointer and out_buf_size are updated accordingly.”

This is a terrible API, because it doesn’t specify whether the application has used operator new, the new operator, malloc(), or some other allocator, and you don’t know how the encoder will “allocate more memory.”
Thus, the only safe use of this API is to pass in “nullptr” for out_buf_ptr, and “0” for out_buf_size, and let the encoder do the allocation.
It sounds to me like this is the real problem, and if you get rid of your vector and pass in nullptr, it won’t crash.
You’re then left with the question of how to deallocate the memory it allocated – free()? delete? operator delete()? Who knows!
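
To make that concrete, here’s a rough sketch of the only usage I’d trust (untested; the encodeFromBuffer signature is taken from your snippet, and I deliberately don’t free the buffer because the docs don’t say which deallocator matches):

// Let the encoder do the allocation: start with a null pointer and size 0.
unsigned char* out_buf = nullptr;
unsigned long out_buf_size = 0;
int ret = jpgEnc->encodeFromBuffer(nvbuf, JCS_YCbCr, &out_buf, out_buf_size, quality);
// On success, out_buf/out_buf_size should now describe the encoded JPEG.
// How to release out_buf (free()? delete[]? operator delete()?) is left unspecified.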

Holy moly, I’m not sure how I didn’t see that documentation before, but that’s insane to say the least. Thanks for pointing that out.

I’ll wait for someone from Nvidia to comment further and answer your questions, since those answers are required to use this API (and not in the docs as far as I can tell).

One response:

you don't allocate or fill in the matrices

You can assume I have a valid RGB mat at the start. :)

You know what happens when I assume? I make an ASS

out of U and ME.

Anyway, there is a reasonably well-supported way to use this API.

Keep a global value that you use as the pointer and size. Initialize to null and 0. Always pass those into this API. This means that the driver will re-allocate the buffer as appropriate/needed, using whatever mechanism it’s using internally.

If you want to do parallel encodes, you’d need one global per thread, and/or some way of locking access to the globals.

Also, you’d live with this buffer in your heap for the lifetime of your program, but at least you won’t leak MANY buffers, and it will go away when the process exits.
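
In code, the pattern would look something like this (just a sketch, assuming the driver really does re-allocate the caller’s buffer when it’s too small, as the docs suggest; thread_local gives you the one-global-per-thread variant, and jpgEnc, nvbuf and quality are from your earlier snippet):

// One long-lived buffer per thread; never freed by us, grown by the encoder as needed.
thread_local unsigned char* g_jpeg_buf = nullptr;
thread_local unsigned long g_jpeg_buf_size = 0;

// Every encode call passes the same pointer/size back in; the driver
// re-allocates and updates them if the current buffer is too small.
int ret = jpgEnc->encodeFromBuffer(nvbuf, JCS_YCbCr, &g_jpeg_buf, g_jpeg_buf_size, quality);
// Copy the JPEG bytes out of g_jpeg_buf before the next encode on this thread.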

Re assuming: I had a math teacher in high school who used to say that. She also used to throw chalk at anyone who didn’t seem to be paying attention. A wise woman, before I could appreciate it. Nevertheless, I assure you that my data isn’t the issue. The problem is reproducible, only happens for certain (likely pathological) images, and only occurs if the quality argument is set very high.

In any case, thank you for your suggestion, but where are you getting this information? From the jpeg_mem_dest() documentation, I guess? Are you sure the normal libjpeg behavior applies here? In particular, when you say:

the driver will re-allocate the buffer as appropriate/needed

Where are you getting this? I can’t find reference to that anywhere…

Thanks much.

Hi,
Please check the items below:
Is yuv.step[0]*yuv.rows equal to width x height x 1.5?

I am not sure how vector behaves here. Could you try the same allocation as in 05_jpeg_encode:

unsigned char *out_buf = new unsigned char[out_buf_size];

What is the quality value? I don’t see it being set in the code.

In the Jetson downloadable PDF documentation for the NVIDIA JPEG encoder library.

Turns out, it’s also available as a web page: L4T Multimedia API Reference: NvJPEGEncoder Class Reference

Sorry to keep pressing on this, but I can’t see how you drew your conclusions from the documentation. The docs just say:

“The application may allocate the memory for storing the JPEG image. If the allocation is less than what is required, libjpeg allocates more memory. The out_buf pointer and out_buf_size are updated accordingly.”

This is pretty vague, but even if you figured out by experimentation that passing a pointer set to NULL will cause the library to allocate, how did you know that passing that same pointer back in will cause the library to realloc (i.e., free the existing allocation if it needs to allocate more)? I’m not doubting that you are right; I just don’t see how you came to those conclusions…

Also want to make sure that this is truly behavior supported by Nvidia so that it doesn’t break after some library update…

NULL is a valid allocation result when trying to allocate 0 bytes. (At least if by “allocation” you follow malloc()/realloc() rules.)
Thus, given that this is a C/C++ level API, the only reasonable interpretation of the documentation is that if the size is too small, the library will somehow re-allocate the provided pointer, copy whatever data it already wrote, and keep going.

How can the library do this?

realloc(NULL, size) is a valid C/C++ strategy.
delete NULL and delete[] NULL are also valid C/C++ strategies.
memcpy(dst, NULL, 0) is a valid C/C++ operation.
Thus, passing in size 0 and pointer NULL will force the library to allocate a new buffer, and update the pointer/size.

Once you have a pointer/size allocated by the driver, you can hang on to it, and whatever the driver does (realloc(), or memcpy()/delete/new, or whatever) should keep being correct.
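
If it helps, here is the standard-library behavior that reasoning leans on (nothing NVIDIA-specific, just that realloc on a null pointer acts like malloc):

#include <cstdlib>

int main() {
    // realloc(NULL, n) is defined to behave like malloc(n), so a library that
    // always "re-allocates" the caller's pointer also works on the very first
    // call, when the caller passed in NULL and size 0.
    unsigned char* buf = nullptr;
    buf = static_cast<unsigned char*>(std::realloc(buf, 4096));
    std::free(buf); // valid only because we know this buffer came from malloc()/realloc()
    return 0;
}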

I have no idea what usage NVIDIA will ACTUALLY support. The interface is clearly designed and documented by people who aren’t particularly experienced or skilled in interface design and publication, or perhaps people who were given 45 minutes to finish the job from scratch, so it may very well be that there’s no safe way to use this API at all. But if we assume that their developers have a basic understanding of the C/C++ standard library, ideally from a Linux/glibc environment, then the above reasoning should hold.

But now who’s ASSuming?

Thanks snarky. Very much appreciate all of the responses.

Hi logidelic,
Please check the suggestion in #8. If the issue is still present, please share steps to reproduce it by running 05_jpeg_encode.

…deleted as I spoke too soon… will re-post this with more info shortly.

Hi Dane. I went a little crazy trying to repro this in 05_jpeg_encode and finally figured out where the actual issue was:

ONE of the problems (not the only one, but the one I’m addressing here) is that it is unsafe to use a single instance of NvJPEGEncoder multiple times. This is not a thread-safety issue. Check out the attached simple repro: a slightly modified version of 05_jpeg_encode. You run it with the following command line:

./jpeg_encode img_small.yuv 100 66 ./img_small_output.jpg -quality 1 --encode-buffer

Here’s what it does:

  • Instantiate NvJPEGEncoder
  • Use that instance to encode img_small.yuv. Fine.
  • Then use that same instance to encode a large image at quality level 100.

Result: A message of “Tegra Acceleration failed” and a program crash:

file - Begin
out_buf_size avail:9900
out_buf_size used:769
file - End
OTHER file - Begin
OTHER out_buf_size avail:5529600
Tegra Acceleration failed

You may suggest never using the same instance to encode more than one image. This is a fine suggestion, but I have experienced a (less easily reproducible) crash that way as well! I’ll try to get a good repro on the other soon, but any comment on the above would be appreciated.
05_jpeg_encode.tar.gz (2.79 MB)

Hi logidelic,
Dynamic resolution change is not supported. You need to reset cinfo once the resolution changes:

{
    // Tear down the compress object created for the previous resolution
    jpeg_destroy_compress(&cinfo);

    // Re-initialize the error handler and compress object from scratch
    memset(&cinfo, 0, sizeof(cinfo));
    memset(&jerr, 0, sizeof(jerr));
    cinfo.err = jpeg_std_error(&jerr);

    jpeg_create_compress(&cinfo);
    jpeg_suppress_tables(&cinfo, TRUE);
}

Hi Dane,

I appreciate the response and will try your suggestion.

However, saying that “dynamic resolution change is not supported” is not accurate. If it wasn’t supported, the API would throw an exception or return an error. If it wasn’t designed to be used that way, the NvJPEGEncoder constructor would accept a width and a height argument, rather than taking them as arguments to the encode call.

It works sometimes (most of the time), so most likely there are many, many programs out there that are crashing randomly without the programmer knowing that this is the cause.

This is a bad bug, not a lack of support for some “dynamic” functionality.

Also, if I follow your suggestion (or, equivalently, just don’t re-use NvJPEGEncoder instances; I’ve tried both), I eventually get a crash somewhere near here (I say somewhere near here because it differs a bit from try to try):

Thread 29 "prog" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f6a7fbe10 (LWP 19062)]
__GI___pthread_mutex_lock (mutex=0x95952645ed4e51b2) at pthread_mutex_lock.c:65
65      pthread_mutex_lock.c: No such file or directory.
(gdb) bt
#0  0x0000007fb10666f0 in __GI___pthread_mutex_lock (mutex=0x95952645ed4e51b2) at pthread_mutex_lock.c:65
#1  0x0000007f0f0aa760 in NvOsMutexLock () at /usr/lib/aarch64-linux-gnu/tegra/libnvos.so
#2  0x0000007f0a6b0660 in  () at /usr/lib/aarch64-linux-gnu/tegra/libnvtvmr.so
#3  0x0000007f0a6b114c in  () at /usr/lib/aarch64-linux-gnu/tegra/libnvtvmr.so
#4  0x0000007fa80819c0 in jpegTegraEncoderGetBits () at /usr/lib/aarch64-linux-gnu/tegra/libnvjpeg.so
#5  0x0000007fa804ab44 in jpeg_finish_compress () at /usr/lib/aarch64-linux-gnu/tegra/libnvjpeg.so
#6  0x0000007fa03a97d4 in NvJPEGEncoder::encodeFromBuffer(NvBuffer&, J_COLOR_SPACE, unsigned char**, unsigned long&, int) ()

Any ideas? I have reason to believe that it has something to do with what some 3rd party (non-tegra-related) library happens to be doing at that moment, but I don’t see why…

Stupid question: If some other library is using the regular libjpeg stuff, doesn’t it touch the tegra-specific libnvjpeg.so or are the two totally separate beasts?

Thanks again.

To be fair, NVIDIA has the option to say “calling the encode function with a resolution different from the one used last time results in undefined behavior.” If they’re really nice, they’ll even put that in the documentation. (It’s not in the current documentation.) That would make it “unsupported” with zero changes to the actual code.

That being said, this NVIDIA JPEG encoder library looks like a train wreck of bad API design. I’d just stay away. JPEG encoding is simple enough that I’m not sure what you get out of hardware acceleration, especially when it comes with problems as harsh as these.

I haven’t profiled things on the Xavier, but on the TX2 it made a huge difference. In my case I’m doing ridiculous amounts of JPG encoding (for many reasons), so I think the hardware acceleration is required in my particular use case.

FWIW, it works fine for me if I reuse the same instance (without doing the destroy suggested by Dane), so long as the buffer size I specify in the first (and subsequent) calls is large enough.