NvVideoEncoder repeated start-stop causes crash and memory leak

Hi,

Our product is based on Jetson TX2 running L4T 28.1. We mainly use the video converter and encoder to convert and encode 8 video channels received on 4 MIPI inputs through V4L2. The product works in a way that after boot the user will start-stop the encoding several times. During testing we found that after repeating start-stop multiple times our system crashes.
I created a slightly modified version of the Multimedia API sample project 01_video_encode, which can be found here:
http://home.mit.bme.hu/~szanto/tegra/01_video_encode_mod.zip
In a forever loop it repeatedly:

  • starts the encoding
  • encodes 250 frames
  • stops the encoding

There are two modes of operation (selected with the USE_STATIC_ENC define): static encoder allocation and dynamic allocation. With static allocation the encoder is created only once; with dynamic allocation the encoder is created and deleted in every cycle. I added the static mode because I found earlier that delete(encoder) sometimes hangs, so reusing a single instance seemed safer.
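To make the difference between the two modes concrete, here is a minimal sketch of the two lifecycles. It does not use the Multimedia API: MockEncoder is a hypothetical stand-in for NvVideoEncoder that only counts creations and destructions, and run_cycles and its parameters are illustrative names, not taken from the sample.

```cpp
#include <cassert>
#include <memory>

// MockEncoder is a hypothetical stand-in for NvVideoEncoder so that the sketch
// compiles anywhere; it only tracks how often it is created and destroyed.
struct MockEncoder {
    static int created;  // total number of instances ever constructed
    static int live;     // instances currently alive
    bool streaming = false;

    MockEncoder()  { ++created; ++live; }
    ~MockEncoder() { --live; }

    void start()            { assert(!streaming); streaming = true; }
    void encode(int frames) { assert(streaming && frames > 0); }
    void stop()             { streaming = false; }
};
int MockEncoder::created = 0;
int MockEncoder::live = 0;

// Runs `cycles` start/encode/stop iterations.
// static_alloc == true : one encoder is reused for every cycle (USE_STATIC_ENC).
// static_alloc == false: the encoder is created and deleted inside every cycle.
void run_cycles(bool static_alloc, int cycles, int frames_per_cycle = 250) {
    std::unique_ptr<MockEncoder> shared;
    if (static_alloc) shared.reset(new MockEncoder());

    for (int i = 0; i < cycles; ++i) {
        std::unique_ptr<MockEncoder> local;
        if (!static_alloc) local.reset(new MockEncoder());

        MockEncoder &enc = static_alloc ? *shared : *local;
        enc.start();
        enc.encode(frames_per_cycle);
        enc.stop();
    }  // `local`, if set, is destroyed at the end of each iteration
}
```

Calling run_cycles(true, 3) constructs one encoder in total, while run_cycles(false, 3) constructs and destroys three; in both cases no instance is left alive afterwards.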

Static allocation:

Non-static allocation:

Memory leakage is much more visible in our product:

After boot:
RAM 182/7340MB (lfb 1779x4MB) cpu [0%@499,off,off,0%@500,0%@500,0%@500] EMC 0%@665 APE 150 GR3D 0%@114

During the first 8-channel encoding:
RAM 2410/7340MB (lfb 1222x4MB) cpu [64%@1573,off,off,55%@1574,53%@1574,54%@1574] EMC 13%@1600 APE 150 MSENC 1113 GR3D 18%@114
Encoding stopped:
RAM 1589/7340MB (lfb 1329x4MB) cpu [2%@652,off,off,0%@652,7%@653,1%@652] EMC 2%@665 APE 150 GR3D 0%@114

8-channel encoding running:
RAM 2871/7340MB (lfb 1105x4MB) cpu [72%@1504,off,off,50%@1504,54%@1508,59%@1500] EMC 30%@1600 APE 150 MSENC 1113 GR3D 42%@114
Encoding stopped:
RAM 2016/7340MB (lfb 1188x4MB) cpu [1%@499,off,off,0%@499,19%@499,13%@498] EMC 4%@665 APE 150 GR3D 0%@114

8-channel encoding running:
RAM 3292/7340MB (lfb 1001x4MB) cpu [67%@1559,off,off,54%@1559,52%@1559,70%@1552] EMC 25%@1600 APE 150 MSENC 1113 GR3D 22%@114
Encoding stopped:
RAM 2444/7340MB (lfb 1060x4MB) cpu [3%@1113,off,off,9%@1113,19%@1113,16%@1113] EMC 5%@665 APE 150 GR3D 0%@114
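To put a number on the growth, the "Encoding stopped" readings can be compared from cycle to cycle. The following is a small sketch for doing that; parse_ram_used_mb and idle_growth_mb are hypothetical helper names, and the parsing assumes the line begins with the "RAM used/totalMB" field exactly as in the tegrastats output above.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Returns the used-RAM figure (in MB) from one tegrastats line such as
// "RAM 1589/7340MB (lfb ...", or -1 if the line does not start with that field.
int parse_ram_used_mb(const std::string &line) {
    int used = 0;
    int total = 0;
    if (std::sscanf(line.c_str(), "RAM %d/%dMB", &used, &total) == 2)
        return used;
    return -1;
}

// Given the used-RAM samples taken while idle (after each stop), returns the
// growth in MB between consecutive samples.
std::vector<int> idle_growth_mb(const std::vector<int> &idle_samples) {
    std::vector<int> growth;
    for (std::size_t i = 1; i < idle_samples.size(); ++i)
        growth.push_back(idle_samples[i] - idle_samples[i - 1]);
    return growth;
}
```

Fed with the three idle readings above (1589, 2016 and 2444 MB), this gives a growth of 427 MB and 428 MB, i.e. roughly 430 MB that is not released after each 8-channel start-stop cycle.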

Note:
The issue may or may not have something in common with https://devtalk.nvidia.com/default/topic/1035330/jetson-tx2/-mmapi-r28-2-r28-1-deinitplane-of-nvvideoencoder-memory-leak-/1, but I was asked to start a new thread.

Help is appreciated.

Hi,
We suggest you try r28.2.1.
If you are close to mass production and cannot upgrade, please contact your NVIDIA salesperson and let us know the status.

Hi Tessier,
Please check if the attachment in the link below helps your case.
https://devtalk.nvidia.com/default/topic/1045023/jetson-tx2/blocking-when-release-encoder-/post/5304198/#5304198

Hi DaneLLL,

I tried the slightly modified 01_video_encode sample (http://home.mit.bme.hu/~szanto/tegra/01_video_encode_mod.zip) on a Jetson using 28.2.1.
The only modification is that it stops and starts encoding in an endless loop.
Unfortunately it crashes with

NvRmChannelSubmit: NvError_IoctlFailed with error code 22
NvRmPrivFlush: NvRmChannelSubmit failed (err = 196623, SyncPointIdx = 456, SyncPointValue = 0)

Logs:

Thanks for checking.

Hi Tessier,
For r28.2.1, please use the prebuilt lib at
https://elinux.org/Jetson_TX2/28.2.1_patches
[MMAPI]Cannot run NvVideoDecoder in loop/Memory leak in NvVideoEncoder

Hi DaneLLL,

I tried the linked lib on the Jetson, but it does not solve the problem.

When I create encoder only once, I get the following error after 215 restarts:

NvRmChannelSubmit: NvError_IoctlFailed with error code 22
NvRmPrivFlush: NvRmChannelSubmit failed (err = 196623, SyncPointIdx = 456, SyncPointValue = 0)

Log: http://home.mit.bme.hu/~szanto/tegra/01_static_log_2.txt

If I create and delete the encoder in every iteration, after 340 restarts I get:

NvRmChannelSubmit: NvError_IoctlFailed with error code 22
NvRmPrivFlush: NvRmChannelSubmit failed (err = 196623, SyncPointIdx = 24, SyncPointValue = 0)

Log: http://home.mit.bme.hu/~szanto/tegra/01_dynamic_log_2.txt
In this case, according to tegrastats, memory usage does not seem to increase for the first ~90 restarts, but afterwards it starts to grow rapidly.

Could you please try the linked code to see if the issue is reproducible on your side?
./video_encode bunny_1280_UYVY_420.bin 1280 720 H265 encoded.h265

Thanks.

Hi Tessier,
We have verified it with the patch below applied:

diff --git a/multimedia_api/ll_samples/samples/00_video_decode/video_decode_main.cpp b/multimedia_api/ll_samples/samples/00_video_decode/video_decode_main.cpp
index 96fb05d..67b40a1 100644
--- a/multimedia_api/ll_samples/samples/00_video_decode/video_decode_main.cpp
+++ b/multimedia_api/ll_samples/samples/00_video_decode/video_decode_main.cpp
@@ -977,6 +977,8 @@ set_defaults(context_t * ctx)
 int
 main(int argc, char *argv[])
 {
+static int run_cnt = 0;
+nextRun:
     context_t ctx;
     int ret = 0;
     int error = 0;
@@ -985,7 +987,7 @@ main(int argc, char *argv[])
     bool eos = false;
     char *nalu_parse_buffer = NULL;
     NvApplicationProfiler &profiler = NvApplicationProfiler::getProfilerInstance();
-
+printf("run cnt %d \n", run_cnt++);
     set_defaults(&ctx);
 
     if (parse_csv_args(&ctx, argc, argv))
@@ -1396,6 +1398,7 @@ cleanup:
     else
     {
         cout << "App run was successful" << endl;
+goto nextRun;
     }
     return -error;
 }

Please give it a try.

Hi DaneLLL,

I use the encoder, not the decoder, so I do not see how it would help me to try the decoder sample.

In the meantime I modified the encoder sample to use V4L2_MEMORY_DMABUF on the encoder output plane and:

  • Allocate the buffers using NvBufferCreateEx
  • Map the allocated buffers to CPU using NvBufferMemMap
  • At the end of each run, NvBufferMemUnMap and NvBufferDestroy for each allocated buffer.

Though my input file read is somewhat broken, I see no memory usage increase when the encoder is allocated and destroyed in each run.
When I statically allocate the encoder, memory usage increases slowly.

Hi DaneLLL,

I just realized that the 01_video_encode sample was changed in version 28.2.1 (my test code is based on 28.1), e.g. it already supports V4L2_MEMORY_DMABUF on the output plane.
I will check if there is a difference using the new sample.

Hi DaneLLL,

Unsurprisingly the situation is the same with the 01_video_encode found in 28.2.1:

  • If using V4L2_MEMORY_MMAP, after ~90 restarts memory usage increases rapidly.
  • V4L2_MEMORY_DMABUF seems to be free of the issue, at least if the encoder is deleted in every cycle.

The main reason I use static encoder allocation is that earlier there was an issue with reallocation: as far as I remember, when another encoder instance was running, allocating and starting a new instance failed. I will check if this was fixed in 28.2.1 (or do you have any info?) - if yes, then using V4L2_MEMORY_DMABUF with allocating/deleting the encoder seems to be a good workaround for the issue.

Hi Tessier,
If using V4L2_MEMORY_DMABUF works for you, please use it.

We will check and fix the V4L2_MEMORY_MMAP case in a future release.

Hi DaneLLL,

Well, theoretically it should be ok, but…
In the real use-case there is a video converter before the encoder. Initially the capture plane of the VIC was V4L2_MEMORY_MMAP and the output plane of the encoder was V4L2_MEMORY_DMABUF, so as I understand buffers are allocated by the VIC capture plane.

If I use V4L2_MEMORY_DMABUF on both mentioned planes, buffers should be allocated with NvBufferCreate(). As I see it, if there is no processing between the VIC and the encoder (there will be, but ignore that for the moment), all I am supposed to do is:

  • Setup both planes similarly: setupPlane(V4L2_MEMORY_DMABUF, BUFFER_NUM, false, false)
  • Create buffers with NvBufferCreate(), which returns a single dmabuf_fd.
  • Set the fd of all planes of v4l2_buffer (v4l2_buf.m.planes[i].m.fd) to the dmabuf_fd.
  • Queue the v4l2_buffer to the VIC capture plane.
    The remaining part of the SW should be the same as in the case of V4L2_MEMORY_MMAP/V4L2_MEMORY_DMABUF.

It seems I am missing something, as I get
nvbuf_utils: dmabuf_fd -1739128832 mapped entry NOT found
error.
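The buffer setup described in the list above can be sketched with the plain V4L2 structures from linux/videodev2.h. This is a sketch under assumptions: fill_dmabuf_capture_buffer and its parameters are illustrative names, not part of the MMAPI, and dmabuf_fd is assumed to be the fd returned by NvBufferCreate(). The large negative fd in the error message looks like an uninitialized value, so zeroing the structures and setting the fd on every plane is one thing worth ruling out; that is a guess, not a confirmed diagnosis.

```cpp
#include <cstring>
#include <linux/videodev2.h>

// Prepares `buf` and its plane array for queuing one DMABUF-backed buffer on a
// multi-planar capture plane. Everything is zeroed first so that no field,
// including the per-plane fd, is left with a stale or random value.
// `dmabuf_fd` is assumed to come from NvBufferCreate(); the same fd is set on
// every plane, as described in the steps above.
void fill_dmabuf_capture_buffer(v4l2_buffer &buf, v4l2_plane *planes,
                                unsigned num_planes, unsigned index,
                                int dmabuf_fd) {
    std::memset(&buf, 0, sizeof(buf));
    std::memset(planes, 0, num_planes * sizeof(v4l2_plane));

    buf.index = index;
    buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
    buf.memory = V4L2_MEMORY_DMABUF;
    buf.m.planes = planes;     // multi-planar API: per-plane data lives here
    buf.length = num_planes;   // number of entries in the plane array

    for (unsigned i = 0; i < num_planes; ++i)
        planes[i].m.fd = dmabuf_fd;
}
```

The filled v4l2_buffer would then be queued on the VIC capture plane in the same place where the MMAP variant queues its buffer.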

I know that there is a sample with VIC → encoder, but that sample uses NvBufferTransform(). As I have to also fix the memory leak issue in L4T 28.1, I cannot use that; and this is why I use NvBufferCreate instead of NvBufferCreateEx.

Do you have any idea what I am missing?
Thanks.

Hi Tessier,
For r28.1, have you tried the link in comment #3?

Hi DaneLLL,

I tested the attached libtegrav4l2.so on our real system running r28.1 and it does not seem to solve the issue. Stopping and restarting VIC+encoder increases memory usage, and after some cycles I get an error similar to the one in comment #6.

Hi Tessier,
We have verified it with the patch to 01_video_encode of r28.1:
https://devtalk.nvidia.com/default/topic/1045023/jetson-tx2/blocking-when-release-encoder-/post/5302265/#5302265
Could you try this?

Besides, could you share more info about your project status? Is it close to production? Can you migrate to a new release?

Hi DaneLLL,

Hmmm, strange. I will reflash 28.1 on my Jetson and give looped 01_video_encode a try with the linked library later.

We are close to production, and we already sent out some test systems. Anyway, we decided to migrate to 28.2.1 because stability is critical for our customers and I am almost sure that we can fix the memory leak issue on 28.2.1.
Beyond that, we have one more major issue which could be on the Tegra side.