NVMAP_IOC_WRITE failed: Interrupted system call

kko-smol · October 11, 2018, 1:17pm

Hello!
We use a Jetson TX2 to capture and preprocess frames from 6 cameras and send them over the network.
Now I am trying to add JPEG or H264 encoding and have run into a problem:
when I run 6 cameras, I get messages in the console:
“NVMAP_IOC_WRITE failed: Interrupted system call”
Whenever this message appears, the image comes out broken (in JPEG, part of the data at the end is lost; in H264, the image has artifacts until the next I-frame).
When I run a single camera, everything is OK.
It looks like something goes wrong under high load.
I attached an htop screenshot taken while 6 application instances (one per camera) are running.
Each instance captures frames via V4L2, does CUDA processing, sends the image over the network, converts the colors to YUV420, encodes to H264 or JPEG, and sends the encoded image over the network.
I found the place in the kernel sources where the error occurs (rw_handle in nvmap_ioctl.c), but I don't know how to fix it.

DaneLLL · October 12, 2018, 1:59am

Hi kko-smol,
Do you use r28.2.1?
Do you run 6 Bayer sensors? Or YUV sensors? Or USB cameras?
Do you use MMAPIs or gstreamer pipelines?

kko-smol · October 12, 2018, 6:12am

r28.2 (as far as I can see, r28.2.1 and r28.2 have the same kernel).
YUV sensors connected via CSI.
I use your MMAPI wrapper classes from the tegra_multimedia_api samples.
Capture (USERPTR) into a buffer allocated with cudaMallocManaged, so that it is shared with CUDA.
JPEG encode: convert from userptr (the capture buffer) to an MMAP buffer, then JPEG encodeFromFd.
H264 encode: convert from userptr (the capture buffer) to userptr (a buffer from NvBufferCreateEx), then encode from that userptr (NvBufferCreateEx) buffer to an MMAP buffer.

DaneLLL · October 12, 2018, 7:16am

Hi kko-smol,
Please share how you connect the 6 cameras to CSI ports(A B C D E F port).
Are you able to run 6 cameras simultaneously via ‘v4l2-ctl’ commands?
Also, can you check whether you can run 6 cameras with the 12_camera_v4l2_cuda sample? For using the HW encoders, we suggest allocating NvBuffers.

kko-smol · October 12, 2018, 10:45am

Identical cameras are connected to every port, each using 2 CSI lanes. All ports are used.

Yes, we are able to run all 6 cameras simultaneously. When we capture images and send them over the network, everything works fine.
When I added CUDA processing, it worked too.

The 12_camera_v4l2_cuda sample cannot run: we have no display connected and work over ssh.

root@jetson:~/tegra_multimedia_api/samples/12_camera_v4l2_cuda# ./camera_v4l2_cuda -d /dev/video0 -s 1280x1080 -f UYVY -c -v
INFO: camera_initialize(): (line:249) Camera ouput format: (1280 x 1080)  stride: 2560, imagesize: 2764800
No protocol specified
[ERROR] (NvEglRenderer.cpp:97) <renderer0> Error in opening display
[ERROR] (NvEglRenderer.cpp:152) <renderer0> Got ERROR closing display
ERROR: display_initialize(): (line:261) Failed to create EGL renderer
ERROR: init_components(): (line:286) Failed to initialize display
ERROR: main(): (line:530) Failed to initialize v4l2 components
nvbuf_utils: dmabuf_fd 0 mapped entry NOT found
nvbuf_utils: Can not get HW buffer from FD... Exiting...
App run failed

kko-smol · October 15, 2018, 9:34pm

I did some research into the problem.
I found that frames are copied inside the driver when I use USERPTR, and sometimes copy_from_user cannot copy the data and returns a nonzero value (the number of bytes it could not copy).

I changed the memory type of the converter output_plane to MMAP and tried copying the frame in my application instead.
As before, I sometimes get an error when the CPU is heavily loaded. But now the application fails with SIGSEGV at “memcpy () at …/sysdeps/aarch64/memcpy.S:157” while copying a frame from the camera buffer to an MMAP-allocated NvBuffer from NvVideoConverter. Something strange happens with the buffer or the MMU subsystem: the previous 6 frames were copied successfully, but now a copy from the same source address to the same destination address fails.
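The thread does not establish why this memcpy() crashes, and because earlier frames copied successfully, a layout mismatch alone may not explain it. Still, one thing worth ruling out when copying a camera frame into a pitch-linear hardware buffer is that the destination pitch can differ from the camera's bytesperline (2560 bytes for 1280x1080 UYVY in the log above); a single memcpy() sized for one layout can then run past the end of a buffer that uses the other. A minimal sketch of a row-by-row copy that honours both pitches follows. The `plane_view` struct and `copy_plane` function are hypothetical; with the MMAPI the pitch values would presumably come from each buffer's plane format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical description of one image plane. */
struct plane_view {
    uint8_t *data;
    size_t   pitch;   /* bytes from the start of one row to the next */
    size_t   height;  /* number of rows */
};

/* Copy `row_bytes` of pixel data per row from src to dst, stepping through
 * each buffer with its own pitch so neither side is accessed past its rows. */
static void copy_plane(struct plane_view *dst, const struct plane_view *src,
                       size_t row_bytes)
{
    assert(row_bytes <= dst->pitch && row_bytes <= src->pitch);
    assert(dst->height >= src->height);
    for (size_t y = 0; y < src->height; y++)
        memcpy(dst->data + y * dst->pitch, src->data + y * src->pitch, row_bytes);
}
```

The padding bytes at the end of each destination row are left untouched.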

DaneLLL · October 16, 2018, 2:04am

Hi kko-smol,
Here is a sample demonstrating V4L2 camera → NvBuffer(fd) → VIC → NvVideoEncoder
tegra_multimedia_API:dq buffer from encoder output_plane can not completed - Jetson TX2 - NVIDIA Developer Forums

If you don’t need NvEglRenderer, please remove it from the sample.

kko-smol · October 17, 2018, 12:22pm

Yes, these samples work when I run 6 cameras. But this sample does not use CUDA processing and does not send data over the network, i.e. it does not put the full load on the system.

I have now rewritten my app based on this sample:
capture to dmabuf, then create an EGLImage and a CUeglFrame for each capture buffer. The CUDA result buffer is allocated with cudaMallocManaged.

The pipeline is now:

cam->dmabuf->cuda(cuEglFrame) -|->(cuda result) -> network
                               |->(source, captured dmabuf)->NvBufferTransform->NvVideoEncoder->Network
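In this pipeline the same captured dmabuf feeds two consumers: the CUDA path and the NvBufferTransform → NvVideoEncoder path. The post does not say when the buffer is handed back to the camera driver; if it can be re-queued while either path is still reading it, the camera may overwrite it mid-read. A minimal sketch of making that ordering explicit, assuming each consumer signals when it is done: a per-buffer atomic counter where only the last consumer to finish re-queues the buffer. `capture_slot`, `slot_acquire` and `slot_release` are hypothetical names, and the actual VIDIOC_QBUF call is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bookkeeping for one capture buffer shared by several consumers. */
struct capture_slot {
    int        index;  /* V4L2 buffer index */
    atomic_int users;  /* consumers still reading the buffer */
};

/* Call after dequeuing a frame: the buffer now has `n` readers. */
static void slot_acquire(struct capture_slot *s, int n)
{
    atomic_store(&s->users, n);
}

/* Call from each consumer when it is finished with the buffer. Returns true
 * for exactly one caller, the last one, which should then re-queue it. */
static bool slot_release(struct capture_slot *s)
{
    int prev = atomic_fetch_sub(&s->users, 1);
    assert(prev > 0);  /* more releases than acquires is a bug */
    return prev == 1;
}
```

With two consumers, the first `slot_release()` returns false and the second returns true, so exactly one thread re-queues the buffer regardless of which path finishes first.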

If I run one camera, it works fine (i.e. no errors during the test).
When I run 6 cameras, after some seconds (10-20) of normal operation I see errors in some instances, now from CUDA:

cuCtxSynchronize Error: driver shutting down

cuCtxSynchronize is called before and after the CUDA kernel launch.

If I run 4 cameras, they work for 5-10 minutes and then one of the instances gets the same error.

This may be important: we use a 10 Gbit PCI-E network card and send ~1 Gbit/s per camera, which causes significant load in kernel time. Are the Jetson drivers stable under these conditions?
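A rough check of these numbers, assuming the ~1 Gbit/s per camera is dominated by raw 1280x1080 UYVY frames (2 bytes per pixel, which matches the stride of 2560 and imagesize of 2764800 in the camera_v4l2_cuda log). The frame rate is not stated in the thread; the value computed here is derived, not measured.

```c
#include <assert.h>

/* Values taken from the thread: camera_v4l2_cuda log and the post above. */
#define FRAME_WIDTH      1280  /* pixels */
#define FRAME_HEIGHT     1080  /* rows */
#define BYTES_PER_PIXEL  2     /* UYVY packs 2 bytes per pixel */
#define NUM_CAMERAS      6
#define GBIT_PER_CAMERA  1.0   /* approximate, as stated in the post */

/* Size of one raw UYVY frame in bytes. */
static long frame_bytes(void)
{
    return (long)FRAME_WIDTH * FRAME_HEIGHT * BYTES_PER_PIXEL;
}

/* Raw frames per second that a given link rate (Gbit/s) corresponds to. */
static double implied_fps(double gbit_per_s)
{
    assert(gbit_per_s >= 0.0);
    return gbit_per_s * 1e9 / (frame_bytes() * 8.0);
}

/* Aggregate network rate for all cameras, in Gbit/s. */
static double total_gbit(void)
{
    return NUM_CAMERAS * GBIT_PER_CAMERA;
}
```

This gives 2,764,800 bytes per frame, so 1 Gbit/s corresponds to roughly 45 raw frames per second per camera, and the six cameras together push about 6 Gbit/s through the 10 Gbit card.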

DaneLLL · October 18, 2018, 2:30am

Hi kko,
We have CUDA post-processing in 12_camera_v4l2_cuda:

static bool
cuda_postprocess(context_t *ctx, int fd)
{
    if (ctx->enable_cuda)
    {
        // Create EGLImage from dmabuf fd
        ctx->egl_image = NvEGLImageFromFd(ctx->egl_display, fd);
        if (ctx->egl_image == NULL)
            ERROR_RETURN("Failed to map dmabuf fd (0x%X) to EGLImage",
                    fd);

        // Running algo process with EGLImage via GPU multi cores
        HandleEGLImage(&ctx->egl_image);

        // Destroy EGLImage
        NvDestroyEGLImage(ctx->egl_display, ctx->egl_image);
        ctx->egl_image = NULL;
    }

    return true;
}

So are you able to reproduce the issue if you run the six cameras in six 12_camera_v4l2_cuda processes? Launch one camera per process and run the 6 processes simultaneously.
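One way to run that test is a small launcher that starts one process per camera and waits for all of them. This is a sketch under assumptions: the binary name, the /dev/video0 to /dev/video5 numbering and the flags are copied from the camera_v4l2_cuda command line earlier in the thread (without -v), and NvEglRenderer is assumed to have been removed from the sample so it can run over ssh. `run_per_camera` is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start `n` instances of `prog`, instance i capturing from /dev/video<i>,
 * wait for all of them and return how many exited with status 0. */
static int run_per_camera(const char *prog, int n)
{
    assert(prog != NULL && n >= 0);
    int started = 0;

    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");
            break;  /* still reap the instances already started */
        }
        if (pid == 0) {  /* child: become one camera process */
            char dev[32];
            snprintf(dev, sizeof dev, "/dev/video%d", i);
            execlp(prog, prog, "-d", dev, "-s", "1280x1080", "-f", "UYVY", "-c",
                   (char *)NULL);
            perror("execlp");  /* only reached if exec failed */
            _exit(127);
        }
        started++;
    }

    int ok = 0;
    for (int k = 0; k < started; k++) {
        int status;
        if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    }
    return ok;
}
```

For example, `run_per_camera("./camera_v4l2_cuda", 6)` returns 6 only if every instance has exited with status 0; a smaller number means at least one instance failed.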