DeepStream crashes when RTSP fails

I am trying to run inference on RTSP streams and am using the DeepStream reference app for this purpose. I can successfully infer on 2 RTSP streams. The issue is that when the RTSP streams stop streaming for some time, the app crashes. How do I stop this from happening, i.e. even if the RTSP streams stop streaming for some time, the app keeps running and waits for them to start streaming again? Another issue is that if one of the RTSP streams stops streaming while the other keeps running, and after some time the first stream starts streaming again, the app still does not function correctly.

Hi,
There is a reference sample of dynamically adding/deleting sources.

Another user has also shared guidance:
https://devtalk.nvidia.com/default/topic/1064141/deepstream-sdk/adding-and-removing-streams-during-runtime/post/5400986/#5400986

Please take a look at the sample and check whether you can apply it to your use case.
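
In essence, the delete path in that sample detaches a failed source at runtime. A simplified sketch follows (not the verbatim sample code; nvstreammux request pads are named sink_%u, and remove_source here is an illustrative helper):

#include <gst/gst.h>

/* Sketch: stop a source sub-bin, release its nvstreammux request pad,
 * and drop it from the pipeline so the remaining sources keep running. */
static void
remove_source (GstElement * pipeline, GstElement * streammux,
    GstElement * source_bin, guint source_id)
{
  gchar pad_name[16];
  GstPad *sinkpad;

  gst_element_set_state (source_bin, GST_STATE_NULL);

  g_snprintf (pad_name, sizeof (pad_name), "sink_%u", source_id);
  sinkpad = gst_element_get_static_pad (streammux, pad_name);
  gst_pad_send_event (sinkpad, gst_event_new_flush_stop (FALSE));
  gst_element_release_request_pad (streammux, sinkpad);
  gst_object_unref (sinkpad);

  gst_bin_remove (GST_BIN (pipeline), source_bin);
}

Adding a source back is the reverse: create a new sub-bin, request a fresh sink_%u pad on the muxer, link, and sync the bin's state with the pipeline.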

Hi,
I was taking a look at this link:
https://devtalk.nvidia.com/default/topic/1065748/deepstream-sdk/automatically-restart-streams-in-deepstream-test5-app/?offset=3#5425165
This piece of code is already there in my deepstream-app.c file, but I am still facing a similar issue. Any leads?

Also, if one of the RTSP streams fails, I get this error log:

WARNING from src_elem1: Could not read from resource.
Debug info: gstrtspsrc.c(5293): gst_rtspsrc_loop_udp (): /GstPipeline:pipeline/GstBin:multi_src_bin/GstBin:src_sub_bin1/GstRTSPSrc:src_elem1:
Unhandled return value -7.
ERROR from src_elem1: Could not read from resource.
Debug info: gstrtspsrc.c(5361): gst_rtspsrc_loop_udp (): /GstPipeline:pipeline/GstBin:multi_src_bin/GstBin:src_sub_bin1/GstRTSPSrc:src_elem1:
Could not receive message. (System error)
ERROR from src_elem1: Internal data stream error.
Debug info: gstrtspsrc.c(5653): gst_rtspsrc_loop (): /GstPipeline:pipeline/GstBin:multi_src_bin/GstBin:src_sub_bin1/GstRTSPSrc:src_elem1:
streaming stopped, reason error (-5)
Reset source pipeline reset_source_pipeline 0x7f6a6341b0
ERROR from src_elem1: Could not write to resource.
Debug info: gstrtspsrc.c(5997): gst_rtspsrc_try_send (): /GstPipeline:pipeline/GstBin:multi_src_bin/GstBin:src_sub_bin1/GstRTSPSrc:src_elem1:
Could not send message. (System error)
ERROR from src_elem1: Could not write to resource.
Debug info: gstrtspsrc.c(8244): gst_rtspsrc_pause (): /GstPipeline:pipeline/GstBin:multi_src_bin/GstBin:src_sub_bin1/GstRTSPSrc:src_elem1:
Could not send message. (System error)
WARNING from src_elem1: Could not read from resource.
Debug info: gstrtspsrc.c(5280): gst_rtspsrc_loop_udp (): /GstPipeline:pipeline/GstBin:multi_src_bin/GstBin:src_sub_bin1/GstRTSPSrc:src_elem1:
The server closed the connection.
ERROR from src_elem1: Could not open resource for reading and writing.
Debug info: gstrtspsrc.c(5348): gst_rtspsrc_loop_udp (): /GstPipeline:pipeline/GstBin:multi_src_bin/GstBin:src_sub_bin1/GstRTSPSrc:src_elem1:
Could not connect to server. (Generic error)
ERROR from src_elem1: Internal data stream error.
Debug info: gstrtspsrc.c(5653): gst_rtspsrc_loop (): /GstPipeline:pipeline/GstBin:multi_src_bin/GstBin:src_sub_bin1/GstRTSPSrc:src_elem1:
streaming stopped, reason error (-5)
ERROR from src_elem1: Could not open resource for reading and writing.
Debug info: gstrtspsrc.c(7469): gst_rtspsrc_retrieve_sdp (): /GstPipeline:pipeline/GstBin:multi_src_bin/GstBin:src_sub_bin1/GstRTSPSrc:src_elem1:
Failed to connect. (Generic error)

And after some time, even when the RTSP source has reconnected, I get this:

NvMapMemCacheMaint:1075334668 failed [14]

Lastly, when all the RTSP streams fail for some time, the app exits.

How do I handle these cases?

Hi,
Please share information about your RTSP source (brand and model ID of the IP camera) for reference. It looks to be an issue where the source is not stable and stops streaming randomly. If we have the device, we can try to reproduce the issue.

Yes, I wish to handle these cases, i.e. when the source is not stable and stops streaming. What is the best way to do that via C++ code?

So, I wanted to simulate the case where the RTSP source is unable to stream for any reason: camera restarted, network failure, etc. For example, when I reboot any one of my RTSP cameras while the app is still running, I get the above-mentioned errors, but the main function does not return, i.e. the app keeps running. However, when I stop all of my cameras from streaming, the app returns after going into the error state. I wish to handle these failures gracefully. What would be the best way to deal with that?
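
In general terms, the behaviour I am after is something like the following (a minimal self-contained GStreamer sketch, not the reference app's code; the test URI, the fakesink stand-in, and the 5-second retry interval are all illustrative):

#include <gst/gst.h>

typedef struct {
  GstElement *pipeline;
  gboolean retry_pending;
} App;

static gboolean try_restart (gpointer data);

/* Bus watch: instead of quitting on ERROR/EOS from a live RTSP source,
 * drop the pipeline to NULL and schedule a restart attempt. */
static gboolean
bus_cb (GstBus * bus, GstMessage * msg, gpointer data)
{
  App *app = (App *) data;

  switch (GST_MESSAGE_TYPE (msg)) {
    case GST_MESSAGE_ERROR:
    case GST_MESSAGE_EOS:
      if (!app->retry_pending) {
        app->retry_pending = TRUE;
        gst_element_set_state (app->pipeline, GST_STATE_NULL);
        g_timeout_add_seconds (5, try_restart, app);
      }
      break;
    default:
      break;
  }
  return TRUE;
}

static gboolean
try_restart (gpointer data)
{
  App *app = (App *) data;

  app->retry_pending = FALSE;
  /* For rtspsrc this returns ASYNC; if the camera is still down, a new
   * ERROR arrives on the bus and schedules the next attempt. */
  gst_element_set_state (app->pipeline, GST_STATE_PLAYING);
  return FALSE;                 /* one-shot timeout */
}

int
main (int argc, char *argv[])
{
  App app = { NULL, FALSE };
  GstBus *bus;

  gst_init (&argc, &argv);
  /* uridecodebin handles rtsp:// URIs; fakesink stands in for the real
   * inference pipeline. */
  app.pipeline = gst_parse_launch (
      "uridecodebin uri=rtsp://127.0.0.1:8554/test ! fakesink", NULL);
  bus = gst_element_get_bus (app.pipeline);
  gst_bus_add_watch (bus, bus_cb, &app);
  gst_object_unref (bus);
  gst_element_set_state (app.pipeline, GST_STATE_PLAYING);
  g_main_loop_run (g_main_loop_new (NULL, FALSE));
  return 0;
}

The key point is that the bus watch swallows the fatal message instead of quitting the main loop.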

Hi,
To give a further suggestion, we will need to reproduce the issue first. We have test cases using RTSP sources, and these cases are verified in each DeepStream SDK release. We do not hit this error ourselves, so we are not able to give a proper suggestion as of now. It would be great if you could share information about your RTSP source (brand and model ID of the IP camera); we can check whether we have the device in our core teams. If the information is confidential to you, please share a way to reproduce the issue without the device. Once we can reproduce it, we can debug and discuss it with the teams.

Brand - HIKVISION
model - DS-2CD202WF-I

While debugging, I found that bus_callback receives EOS, while the connection-timeout handling code is written only for ERROR. How do I do the same for EOS?

Also, after it goes into the error state and tries to reconnect a few times, if unsuccessful, it reaches EOS. How do I handle the reconnection logic, or restart the whole pipeline successfully, once it reaches EOS? Here is the relevant part of my bus_callback:

static gboolean
bus_callback (GstBus * bus, GstMessage * message, gpointer data)
{
  AppCtx *appCtx = (AppCtx *) data;
  GST_CAT_DEBUG (NVDS_APP,
      "Received message on bus: source %s, msg_type %s",
      GST_MESSAGE_SRC_NAME (message), GST_MESSAGE_TYPE_NAME (message));
  switch (GST_MESSAGE_TYPE (message)) {
    case GST_MESSAGE_INFO:{
      GError *error = NULL;
      gchar *debuginfo = NULL;
      gst_message_parse_info (message, &error, &debuginfo);
      g_printerr ("INFO from %s: %s\n",
          GST_OBJECT_NAME (message->src), error->message);
      if (debuginfo) {
        g_printerr ("Debug info: %s\n", debuginfo);
      }
      g_error_free (error);
      g_free (debuginfo);
      break;
    }
    case GST_MESSAGE_WARNING:{
      GError *error = NULL;
      gchar *debuginfo = NULL;
      gst_message_parse_warning (message, &error, &debuginfo);
      g_printerr ("WARNING from %s: %s\n",
          GST_OBJECT_NAME (message->src), error->message);
      if (debuginfo) {
        g_printerr ("Debug info: %s\n", debuginfo);
      }
      g_error_free (error);
      g_free (debuginfo);
      break;
    }
    case GST_MESSAGE_ERROR:{
      GError *error = NULL;
      gchar *debuginfo = NULL;
      guint i = 0;
      gst_message_parse_error (message, &error, &debuginfo);
      g_printerr ("ERROR from %s: %s\n",
          GST_OBJECT_NAME (message->src), error->message);
      if (debuginfo) {
        g_printerr ("Debug info: %s\n", debuginfo);
      }

      NvDsSrcParentBin *bin = &appCtx->pipeline.multi_src_bin;
      for (i = 0; i < bin->num_bins; i++) {
        if (bin->sub_bins[i].src_elem == (GstElement *) GST_MESSAGE_SRC (message))
          break;
      }

      if ((i != bin->num_bins) &&
          (appCtx->config.multi_source_config[0].type == NV_DS_SOURCE_RTSP)) {
        // Error from one of RTSP source.
        NvDsSrcBin *subBin = &bin->sub_bins[i];

        if (!subBin->reconfiguring ||
            g_strrstr(debuginfo, "500 (Internal Server Error)")) {
          if (!subBin->reconfiguring) {
            // Check status of stream at regular interval.
            g_timeout_add (SOURCE_RESET_INTERVAL_IN_MS,
                           watch_source_status, subBin);
          }
          // Reconfigure the stream.
          subBin->reconfiguring = TRUE;
          g_timeout_add (20, reset_source_pipeline, subBin);
        }
        g_error_free (error);
        g_free (debuginfo);
        return TRUE;
      }

      g_error_free (error);
      g_free (debuginfo);
      appCtx->return_value = -1;
      appCtx->quit = TRUE;
      break;
    }
    case GST_MESSAGE_STATE_CHANGED:{
      GstState oldstate, newstate;
      gst_message_parse_state_changed (message, &oldstate, &newstate, NULL);
      if (GST_ELEMENT (GST_MESSAGE_SRC (message)) == appCtx->pipeline.pipeline) {
        switch (newstate) {
          case GST_STATE_PLAYING:
            NVGSTDS_INFO_MSG_V ("Pipeline running\n");
            GST_DEBUG_BIN_TO_DOT_FILE_WITH_TS (GST_BIN (appCtx->pipeline.pipeline),
                GST_DEBUG_GRAPH_SHOW_ALL, "ds-app-playing");
            break;
          case GST_STATE_PAUSED:
            if (oldstate == GST_STATE_PLAYING) {
              NVGSTDS_INFO_MSG_V ("Pipeline paused\n");
            }
            break;
          case GST_STATE_READY:
            GST_DEBUG_BIN_TO_DOT_FILE_WITH_TS (GST_BIN (appCtx->pipeline.pipeline),
                GST_DEBUG_GRAPH_SHOW_ALL, "ds-app-ready");
            if (oldstate == GST_STATE_NULL) {
              NVGSTDS_INFO_MSG_V ("Pipeline ready\n");
            } else {
              NVGSTDS_INFO_MSG_V ("Pipeline stopped\n");
            }
            break;
          case GST_STATE_NULL:
            g_mutex_lock (&appCtx->app_lock);
            g_cond_broadcast (&appCtx->app_cond);
            g_mutex_unlock (&appCtx->app_lock);
            break;
          default:
            break;
        }
      }
      break;
    }
    case GST_MESSAGE_EOS:{
      /*
       * In the normal scenario, this would use g_main_loop_quit() to exit
       * the loop and release the resources. Since this application might be
       * running multiple pipelines through configuration files, it should
       * wait till all pipelines are done.
       */
      appCtx->quit = TRUE;
      break;
    }
    default:
      break;
  }
  return TRUE;
}

How do I restart the whole pipeline and reconnect the streams after the app reaches the EOS state?
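
For reference, what I am experimenting with is extending the EOS case to reuse the same reset helpers the ERROR case already uses. A rough sketch (an assumption on my part that EOS is recoverable this way; since the aggregated EOS message does not say which source ended, this resets every RTSP sub-bin):

    case GST_MESSAGE_EOS:{
      NvDsSrcParentBin *bin = &appCtx->pipeline.multi_src_bin;
      guint i;

      if (appCtx->config.multi_source_config[0].type == NV_DS_SOURCE_RTSP) {
        /* Treat EOS from live RTSP inputs as recoverable: reset each
         * sub-bin with the same helpers the ERROR case uses. */
        for (i = 0; i < bin->num_bins; i++) {
          NvDsSrcBin *subBin = &bin->sub_bins[i];
          if (!subBin->reconfiguring) {
            subBin->reconfiguring = TRUE;
            g_timeout_add (SOURCE_RESET_INTERVAL_IN_MS,
                           watch_source_status, subBin);
            g_timeout_add (20, reset_source_pipeline, subBin);
          }
        }
        return TRUE;            /* swallow EOS; keep the app running */
      }
      appCtx->quit = TRUE;
      break;
    }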

Hi,
For reproducing it, we would need your config file for running deepstream-app. Please kindly share it for reference.

# Copyright (c) 2019 NVIDIA Corporation.  All rights reserved.
#
# NVIDIA Corporation and its licensors retain all intellectual property
# and proprietary rights in and to this software, related documentation
# and any modifications thereto.  Any use, reproduction, disclosure or
# distribution of this software and related documentation without an express
# license agreement from NVIDIA Corporation is strictly prohibited.

[application]
enable-perf-measurement=1
perf-measurement-interval-sec=5
#gie-kitti-output-dir=streamscl

[tiled-display]
enable=0
rows=4
columns=2
width=1280
height=720
gpu-id=0
#(0): nvbuf-mem-default - Default memory allocated, specific to particular platform
#(1): nvbuf-mem-cuda-pinned - Allocate Pinned/Host cuda memory, applicable for Tesla
#(2): nvbuf-mem-cuda-device - Allocate Device cuda memory, applicable for Tesla
#(3): nvbuf-mem-cuda-unified - Allocate Unified cuda memory, applicable for Tesla
#(4): nvbuf-mem-surface-array - Allocate Surface Array memory, applicable for Jetson
nvbuf-memory-type=0

[source0]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI 4=RTSP
type=4
uri= RTSP_URI
num-sources=1
#drop-frame-interval=2
gpu-id=0
# (0): memtype_device   - Memory type Device
# (1): memtype_pinned   - Memory type Host Pinned
# (2): memtype_unified  - Memory type Unified
cudadec-memtype=0
#source-id=0

[source1]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI 4=RTSP
type=4
uri=rtsp://admin:edge1234@192.168.0.201:554/Streaming/Channels/1
#uri=file://../../streams/sample_1080p_h264.mp4
num-sources=1
#drop-frame-interval=2
gpu-id=0
# (0): memtype_device   - Memory type Device
# (1): memtype_pinned   - Memory type Host Pinned
# (2): memtype_unified  - Memory type Unified
cudadec-memtype=0
#source-id=0


[sink0]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File
type=1
sync=0
source-id=0
gpu-id=0
qos=0
nvbuf-memory-type=0
overlay-id=1

[sink1]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File
type=1
sync=0
source-id=1
gpu-id=0
qos=0
nvbuf-memory-type=0
overlay-id=1


[osd]
enable=1
gpu-id=0
border-width=1
text-size=15
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Serif
show-clock=0
clock-x-offset=800
clock-y-offset=820
clock-text-size=12
clock-color=1;0;0;0
nvbuf-memory-type=0

[streammux]
gpu-id=0
##Boolean property to inform muxer that sources are live
live-source=1
batch-size=2
##time out in usec, to wait after the first buffer is available
##to push the batch even if the complete batch is not formed
batched-push-timeout=40000
## Set muxer output width and height
width=1280
height=720
##Enable to maintain aspect ratio wrt source, and allow black borders, works
##along with width, height properties
enable-padding=0
nvbuf-memory-type=0

# config-file property is mandatory for any gie section.
# Other properties are optional and if set will override the properties set in
# the infer config file.
[primary-gie]
enable=1
gpu-id=0
model-engine-file=../models/Primary_Detector_Nano/resnet10.caffemodel_b2_fp32.engine
batch-size=2
#Required by the app for OSD, not a plugin property
bbox-border-color0=1;0;0;1
bbox-border-color1=0;1;1;1
bbox-border-color2=0;0;1;1
bbox-border-color3=0;1;0;1
interval=4
gie-unique-id=1
nvbuf-memory-type=0
config-file=config_infer_primary_nano.txt

[tracker]
enable=1
tracker-width=480
tracker-height=270
#ll-lib-file=/opt/nvidia/deepstream/deepstream-4.0/lib/libnvds_mot_iou.so
ll-lib-file=/opt/nvidia/deepstream/deepstream-4.0/lib/libnvds_mot_klt.so
#ll-config-file required for IOU only
#ll-config-file=iou_config.txt
gpu-id=0


[ds-example]
enable=1
processing-width=1280
processing-height=720
full-frame=1
unique-id=15
gpu-id=0


[tests]
file-loop=0

I would still like to know (in general) how to restart the whole pipeline:

  1. when the bus gets an EOS message.
  2. when the pipeline exits and goes to the done: label present in the deepstream_app_main.c file that comes with the default DeepStream reference app source code.

I have been trying a couple of things but have been largely unsuccessful; a rough sketch of the direction I have been attempting is below.
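
Concretely, in deepstream_app_main.c it looks roughly like this (a sketch only: create_pipeline/destroy_pipeline are the reference app's own helpers, whose exact signatures differ between DeepStream versions; perf_cb is the app's existing perf callback; sigint_received is a hypothetical flag standing in for the app's Ctrl-C handling; the 5-second back-off is arbitrary):

  /* Sketch: instead of falling through to done: once, tear the pipeline
   * down and rebuild it whenever the app quits because of EOS/errors. */
  while (!sigint_received) {
    if (!create_pipeline (appCtx[0], NULL, NULL, perf_cb)) {
      g_usleep (5 * G_USEC_PER_SEC);            /* back off, then retry */
      continue;
    }
    gst_element_set_state (appCtx[0]->pipeline.pipeline, GST_STATE_PLAYING);

    while (!appCtx[0]->quit)                    /* set by bus_callback */
      g_usleep (100 * 1000);

    gst_element_set_state (appCtx[0]->pipeline.pipeline, GST_STATE_NULL);
    destroy_pipeline (appCtx[0]);
    appCtx[0]->quit = FALSE;                    /* reset for next round */
  }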

For reproducing the issue, I would suggest using the config file above with the DeepStream reference app, then disconnecting the network for some 30 seconds to 1 minute. At first the bus sends an ERROR message and the app tries to reconnect to the RTSP stream, but after a few retries the bus sends an EOS message and the app exits.

Any update?

Hi,
We are setting up the environment to reproduce the issue. Will update.

Sure

Hi,

We provide an open-source element called GstInterpipe that may help you with your current application:

https://developer.ridgerun.com/wiki/index.php?title=GstInterpipe

This element basically decouples two GStreamer pipelines so that if one of them stops/fails the other one continues. We have successfully used it in conjunction with GstD (another of our open-source projects) for DeepStream pipeline recovery with RTSP and WebRTC streams.
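
As a minimal illustration of the decoupling (assuming GstInterpipe is installed; the pipelines are simplified stand-ins for a real DeepStream graph):

#include <gst/gst.h>

int
main (int argc, char *argv[])
{
  GstElement *capture, *process;

  gst_init (&argc, &argv);

  /* Capture pipeline: owns the unstable RTSP source. If it errors out,
   * it can be stopped and rebuilt without touching the processing side. */
  capture = gst_parse_launch (
      "rtspsrc location=rtsp://127.0.0.1:8554/test "
      "! rtph264depay ! h264parse ! interpipesink name=cam1 sync=false",
      NULL);

  /* Processing pipeline: listens to the node named above and keeps
   * running even while the capture pipeline is down or restarting. */
  process = gst_parse_launch (
      "interpipesrc listen-to=cam1 is-live=true format=time "
      "! avdec_h264 ! fakesink",
      NULL);

  gst_element_set_state (process, GST_STATE_PLAYING);
  gst_element_set_state (capture, GST_STATE_PLAYING);

  g_main_loop_run (g_main_loop_new (NULL, FALSE));
  return 0;
}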

We also offer engineering support hours if you need help with your application.

The issue is that if a stream fails for some time and then comes back later, say after 2 minutes, the DeepStream app is unable to reconnect to the stream. Another issue is that when all the streams fail, for a couple of seconds we get ERROR messages on the bus and the DeepStream app tries to reconnect, but after a few seconds an EOS message is thrown on the bus and the DeepStream app quits by itself. I have tried to increase this waiting period, and have tried to restart the pipeline again a few seconds after the app goes to the 'done' state, but have not met with success.
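
One thing still on my list to try (an assumption on my part, not something I have verified to help here): extending rtspsrc's own timeouts so that brief outages take longer to escalate into a fatal error, e.g. on each source element in the sub-bins:

/* Sketch: loosen rtspsrc's internal timeouts on each RTSP source element
 * (src_elem in the reference app's sub-bins). Both properties are in
 * microseconds; 60 s is an illustrative value, not a recommendation. */
g_object_set (G_OBJECT (subBin->src_elem),
    "timeout",     G_GUINT64_CONSTANT (60000000),
    "tcp-timeout", G_GUINT64_CONSTANT (60000000),
    NULL);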