Ethernet Queue Building Up

greg2 · September 15, 2017, 1:48am

Hi, I’m working on a project on the TX2 which streams video over RTMP. On the TX2 is L4T R27.1.0. The stream is produced and sent with a bitrate of 35Mb/s. In case it matters, default TX2 power model 3 is in use.

The streaming is done with gstreamer’s RTMP sink plugin (which uses librtmp, as I understand it) and pushed over the network to an RTMP server on same LAN (no external network used).

Problem:
When ethernet is used from the dev kit, slow-downs are seen in sending the data. Specifically, in the gstreamer pipeline there is a queue directly before the rtmpsink which is filling up. The queue fills up faster than it’s emptied for an arbitrary amount of time, then suddenly empties or jumps down in size in a very short amount of time. This indicates that librtmp is not able to constantly push the data through the socket at 35Mb/s - something in the socket is getting held up.

First step of analysis would be to look at software generating the video, librtmp, etc. (non-system software). All seems to be fine and the main problem is that if I use the dev kit WIFI instead of ethernet - this problem completely goes away. No queue build-up, no delay, no problems. This appears to be an ethernet-only problem. Also switched routers, problem persists.

We also have a custom board which utilizes the TX2, and has an ethernet port on it. Results with this are exactly the same as the dev kit, WIFI is smooth, ethernet is backing up. Here is a graph of the data:

Red = custom board ethernet
Orange = dev kit ethernet
Blue = custom board WIFI
Green = dev kit WIFI
X-Axis = time (seconds)
Y-Axis = number of buffers in queue (30 FPS, 30 buffers=1 second latency)

I’d imagine this points to something with the ethernet driver, or some ethernet queue in the TCP stack that does not apply to WIFI, or something of that nature, though I am not sure what. If anybody has any advice I would greatly appreciate it.

Thanks,
Greg

linuxdev · September 15, 2017, 2:03am

Under “ifconfig” and “iwconfig” can you see an MTU? I’m wondering first if this differs. Second, I’m not sure how the processes are arranged, but perhaps the consuming process could have its priority increased (reniced to -1 or -2)…just to see if this would smooth out the consumer some before the queue gets as large.

WayneWWW · September 15, 2017, 2:15am

Hi Greg,

Thanks for the analysis. It does look like a problem. Could you help with the following?

Since the latest BSP is rel-28.1, could you move to it and try again?
Please first raise the performance to maximum to see if there is any improvement.

sudo nvpmodel -m 0
sudo ./jetson_clocks.sh

Could you share your sample for me to reproduce on devkit? Can this error be reproduced by using rtsp application?

greg2 · September 15, 2017, 9:55pm

The MTU is set to 1500 it seems for both eth0 and wlan0 in ifconfig.
I don’t think it’s possible that the consumer has the issue - the consumer is connected via ethernet to the router in either case, and nothing on that system changes at all. Only the Dev Kit is what is switched between WIFI and ethernet between the tests to see these different results.

Yeah, I can try installing R28.1 and see if that helps. Perhaps there’s a fix in there or something.
Just tried model 0, it is indeed a little better, but the problem still persists. Over the course of an hour there were ~5 spikes of large 15 second delays and the overall chart looks quite similar to the ethernet graphs I posted yesterday, with the queue building up, then dumping or lowering quickly, rinse and repeat.
The code currently contains a fair amount of IP, so I can’t share directly. I’ll see if I can fork it and dilute it to a minimalist version which shows the issue which I can share here.

WayneWWW · September 18, 2017, 3:19am

Hi greg2,

I would like to reproduce this on my devkit. Few questions here before you share the modified sample.

How do you measure the buffer number? Do you add something in gst-plugin?
What is the input source and pipeline?

Please describe more if possible.

greg2 · September 18, 2017, 8:09am

Hi Wayne,

In the gstreamer ‘queue’ element, linked here: Plugins
There are 3 properties focused on (which all agree with each other): current-level-buffers, current-level-bytes, current-level-time. The graphs show the “current-level-buffers” value at each second. Given it’s 30fps, a value of 120 buffers is a 4 second delay.
Let me verify disclosure, then I’ll get back to you on pipeline. Input source is app. Currently the issue is being replicated with a tester piece of code (which I’m currently trying to strip of IP to provide for you guys).
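
A rough sketch of that arithmetic, assuming a constant 30 fps with one frame per buffer as described above. The function name and the sample readings are made up for illustration and are not taken from the test app.

```python
def queue_latency_seconds(buffers, fps=30.0):
    """Approximate delay represented by `buffers` queued frames at `fps`."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return buffers / fps


# Hypothetical per-second readings of current-level-buffers.
samples = [0, 30, 60, 120, 2]
latencies = [queue_latency_seconds(n) for n in samples]
print(max(latencies))  # 4.0 (120 buffers at 30 fps)
```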

I found some interesting results over the weekend. I got a new Jetson TX2 and put the R28.1 BSP on it to try that out. Model 3 produced the same results as I was seeing before, but INTERESTINGLY model 0 seems to resolve the problem! This was very surprising to me because A. model 0 on R27.1 BSP did NOT resolve the issue (or even help it), and B. model 3 on R28.1 didn’t resolve it, so I feel like it couldn’t have just been a bug fix in R28.1 or something, or else I’d expect not to see the same behavior with model 3.

Does using the Denver cores change the way ethernet works, to the best of your knowledge? I’m wondering if maybe model 0 is just masking the problem in this case, as I feel it’s odd that WIFI doesn’t exhibit this problem on R27.1 with model 3 or 0.

Here’s some graphs regarding BSP R28.1:

With the one chart on the right you can see it never went above 2 frames buffered, and with the 2nd chart, where I switched from model 3 to model 0 half way through you can clearly see the point in time where I switched and the amount buffered just drops to near 0 in basically no time at all, exhibiting this huge advantage of 0 over 3 (which only seems to exist in the R28.1 BSP package).

I should be able to get you some sample code to try on your dev kit tomorrow, though since I’m streaming to an internal RTMP server on our network, obviously that I cannot provide. It’s a basic setup though, just a basic nginx server with the RTMP plugin built from source.

WayneWWW · September 18, 2017, 9:16am

Hi greg2,

Please also run tegrastats in different nvpmodel state and paste the results.

sudo ./tegrastats

Does wifi also work good when using both mode 0 and mode 3?

WayneWWW · September 19, 2017, 9:03am

Hi greg2,

Before you give us the test sample, I would like to know if we can have some quick findings.
For example, is the ethernet bitrate lower than the wifi bitrate, and could that be causing this issue?

greg2 · September 21, 2017, 7:10pm

For everything below I used the Jetson with the R28.1 BSP.

I ran tegrastats with ethernet for 3 minutes on both model 3 and model 0. Graph results are here, raw tegrastats logs in folder below. Again in the number of queued buffers comparison you can see clear difference.

Looks like the CPU is strained a fair bit more on model 3 than model 0. I’m not sure if that’s to be expected.

Raw Logs:

Files of interest:
921_ether_model3_stats.txt
921_ether_model0_stats.txt

I just tried WIFI on both model 3 and model 0 on R28.1 BSP for 7m30s and problem did not occur. I can let it run longer to just check it out, but as you saw with the other set of graphs from before, the problem typically happens very quickly. I haven’t seen this problem at all with WIFI. It’s been an ethernet only issue.

The bitrate is the same with this test app regardless of what interface or model is used, same binary. The bitrate in test app is 35Mb/s, which should easily be achievable on an internal Gb network like this with no traffic - even 2x that shouldn’t pose a problem.

We were hoping that if model 0 played nice all around on R28.1 then we could just use it. Unfortunately other aspects (different component of the program) suffer significantly on model 0 (we saw this same behavior with model 0 on R27.1). This is a bug in our program that we knew about, but can’t be fixed currently without significant re-work. This ethernet issue that shows up with RTMP streaming is completely separate from that, that’s one of the reasons I’m using this test app to reproduce this issue - since it completely separates this part of the application from any other potential issues/components it focuses purely on the streaming aspect. Because of the other issues though, we’d like to use model 3 if possible until we get that other part re-worked, though using model 3 became a problem when we saw RTMP having issues over ethernet interface.

Should have sample finished today.

greg2 · September 22, 2017, 1:37am

Here’s a sample to reproduce, existing binaries in the zip are for the TX2.
Hopefully you see the issue too. Obviously you’d have to change the RTMP path to a valid RTMP server. This was a local server in the office.

https://drive.google.com/file/d/0B6FhWjPAiQ_gS3JETU9nbkJScGc/view?usp=sharing

greg2 · September 23, 2017, 3:02am

I have some follow up info you guys may or may not be interested in:

I plugged in a USB to ethernet dongle and the problem completely goes away. Transferred at 35Mb/s, no buffering at all. Upped it to 46Mb/s, still works perfect. I tried plugging it both into our custom board as well as the dev kit - the problem instantly goes away using the USB dongle for ethernet on both of them.

As far as the network interfaces are concerned, the USB->ethernet dongle shows up as eth1 to Linux using generic drivers, no driver or any software was installed for it to support it, and this is using whatever is there in the R27.1 BSP using power model 3. 28.1 BSP (model 3) with our custom board also works great at 35Mb/s and 50Mb/s using the ethernet dongle.

This leads me to 2 thoughts:

1. The ethernet drivers/kernel buffering/TCP stack settings/cfg which I mentioned I was concerned about earlier likely isn’t the problem, as I’m assuming that most of the stuff (except for the lowest layers) at the software level is all used regardless of whether carrier board ethernet or USB dongle ethernet is in use. May still be an issue here, but I feel it likely isn’t - at least not at the upper levels of the stack.
2. Given that this is happening on both the nvidia carrier board as well as our custom board it may potentially be either A. A flaw in both board designs, or B. Some sort of issue/limitation of the Jetson’s connector or buffering regarding the Jetson’s connector.

WayneWWW · September 25, 2017, 4:39am

Hi greg2,

Could you also share the raw log of the throughput result with iperf3? It is RTMP, so I guess it should be a TCP connection, right?

WayneWWW · September 25, 2017, 8:48am

Because it seems an interface issue according to #11, I just tried to use RTSP with following pipeline to mimic your sample.

./test-launch "videotestsrc pattern=1 ! video/x-raw,width=4096,height=2160,format=I420 ! nvvidconv ! omxh264enc bitrate=35000000 iframeinterval=1 control-rate=2 ! rtph264pay name=pay0 pt=96"

However, we cannot have same cpu usage result as yours(2x cpus to 100%). May I ask if you could reproduce it on rtsp as well? Also, please provide the dmesg when issue occurs.
Are you using 1G link for data transfer?

We have more resources and experience in RTSP than RTMP. Thanks.

greg2 · September 26, 2017, 4:17am

I believe it should be TCP, yeah. Today I ran an isolated iperf3 test and it seemed to transmit 35Mb/s without issues. I’ll run it longer tomorrow, but it didn’t deviate from 35Mb/s throughout the time I did, which is surprising to say the least.

Alright, so I tried the RTSP test launch with that pipe, there’s a few things to note:

There’s no queue in that pipeline, so the issue that is being seen where the queue is filling up won’t happen
I added a queue element before rtph264pay and that is not being added to at all. I’m not sure if this type of issue can be seen with gstreamer’s RTSP because with other pipelines even a perfect stream usually buffers 1 buffer here and there and that queue with RTSP is perfectly 0 the entire time, so not sure if the rtph264pay element is going to block at all and start filling up that queue.
I shared a log below which is the ffplay (ffmpeg) log receiving the test-launch RTSP stream of the pipeline you provided. As you can see it’s dropping lots of packets and has lots of errors, I’m not sure if this is expected with RTSP or not. The output of ffmpeg looks OK, but there’s many errors. Perhaps in this case gst’s RTSP plugin’s answer to non-receiving is to drop it instead of re-transmit.
Yeah, the connection is gigabit all around. A gigabit router with a gigabit switch. Tried bypassing switch and 4 different routers.
dmesg has no logs about this, it seems
We added some adaptive bitrate code to determine what bitrate it was OK with sending and it seems to level out at about 23Mb/s or so. Below that it doesn’t buffer at all, above it does.
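
For illustration only, one way such an adaptive bitrate loop could look. This is not the code used in the application; it is a minimal sketch that backs off multiplicatively when the queue before the sink grows past a threshold and probes upward otherwise. The function name, thresholds, step sizes and the toy link model are all assumptions.

```python
def next_bitrate(current_bps, queue_buffers, *, min_bps=5_000_000,
                 max_bps=50_000_000, step_up_bps=1_000_000,
                 backoff=0.8, high_water=15):
    """Return the encoder bitrate to use for the next interval.

    Back off multiplicatively when the queue is backing up, otherwise
    probe upward additively. All thresholds are illustrative assumptions.
    """
    if queue_buffers > high_water:
        target = current_bps * backoff
    else:
        target = current_bps + step_up_bps
    return int(min(max_bps, max(min_bps, target)))


# Toy run: pretend the link only sustains about 23 Mb/s, so the queue
# grows whenever we send faster than that and drains otherwise.
bitrate, queue = 35_000_000, 0
for _ in range(60):
    queue = max(0, queue + (5 if bitrate > 23_000_000 else -5))
    bitrate = next_bitrate(bitrate, queue)
print(bitrate)
```

In a real pipeline the queue level would come from the queue’s current-level-buffers property and the result would be applied to the encoder’s bitrate setting.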

ffmpeg receive log:

DaneLLL · September 26, 2017, 6:07am

Hi greg2,
Do you know how to run RTMP in gstreamer command? Probably like
‘gst-launch-1.0 videotestsrc pattern=1 ! video/x-raw,width=4096,height=2160,format=I420 ! nvvidconv ! omxh264enc bitrate=35000000 iframeinterval=1 control-rate=2 ! rtmpsink’

And the command to receive the stream?

Not sure but it probably is an issue in different configuration of packet size, but for now we cannot reproduce it with RTSP. We don’t have experience in RTMP, so need your help to share your knowledge.

WayneWWW · September 27, 2017, 4:31am

Hi greg2,

Please also share iperf3 and ethtool result with following command.
35Mb/s looks a little bit small…

TCP_UL:
Host: iperf -s -i 1
DUT: iperf -c <server_ip_addr> -i 1 -t 20

TCP_DL:
Host:  iperf -c <server_ip_addr> -i 1 -t 20
DUT: iperf -s -i 1

ethtool eth0

Full dmesg is also helpful.

Thanks

greg2 · September 28, 2017, 12:16am

Hi Dane, I used this pipeline, and the problem definitely exists here as well:

gst-launch-1.0 -e videotestsrc pattern=1 ! video/x-raw,width=4096,height=2160,format=I420 ! nvvidconv ! omxh264enc bitrate=35000000 iframeinterval=1 control-rate=2 ! flvmux ! queue max-size-buffers=0 max-size-bytes=0 max-size-time=0 ! rtmpsink location="rtmp://192.168.86.44/rtmptest"

There is no receiving pipeline - it’s streaming to an Nginx RTMP server - it’s the same RTMP server which is working fine when the Jetson is using WIFI, so problem is on Jetson/client side of things.

The -e option in the pipeline sends “EOS” to pipeline when <ctrl+C> is pressed. If you remove the queue, as soon as you hit <ctrl+C> the pipeline receives EOS and gst-launch-1.0 finishes/closes. This is because frames that couldn’t be pushed to rtmpsink were simply dropped.

If you use the pipeline as I did above, however, with the queue before the rtmpsink, now when you hit <ctrl+C> it says:

^Chandling interrupt.
Interrupt: Stopping pipeline ...
EOS on shutdown enabled -- Forcing EOS on the pipeline
Waiting for EOS...
< WAIT FOR APPROX 45 SECONDS >
Got EOS from element "pipeline0".
EOS received - stopping pipeline...

Then finishes. This indicates that the queue is full (about 1000+ or 1200 buffers!) when I hit <ctrl+C>, and then it has to wait until queue empties before it receives EOS and finishes the stream. So the problem is indeed fully repeatable outside of the test app. The only annoying part about this method of reproduction is that you cannot view the queue level (as far as I know) while the stream is running, so you can’t see it and graph it and the like. You can only estimate what it was at the end based on how long it takes the queue to empty. I imagine that this test may yield different results each time it’s run as the ‘number of buffers’ graph I showed in previous posts has a “shark tooth” pattern, and the queue completely empties (or near empties) at some points. If you happen to hit <ctrl+C> while it’s at a minimum it will finish very quickly, and you may not know the problem exists. On this run, however, I happened to hit <ctrl+C> while it must have been 1000+ buffers full, so it took a very long time to finish emptying and pushing all data to server, illustrating the problem.
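
The manual ctrl+C timing can be automated. Below is a minimal sketch, assuming a Linux system and that the queue drains at roughly the frame rate (30 buffers per second); the real drain rate depends on how fast the sink can push data, so the buffer count is only an estimate. Both function names are hypothetical.

```python
import signal
import subprocess
import time


def measure_eos_drain(cmd, run_seconds):
    """Run `cmd`, send SIGINT after `run_seconds`, return seconds until exit."""
    proc = subprocess.Popen(cmd)
    try:
        time.sleep(run_seconds)
        proc.send_signal(signal.SIGINT)  # roughly what pressing ctrl+C does
        start = time.monotonic()
        proc.wait()
        return time.monotonic() - start
    finally:
        # Make sure the child does not outlive us if something went wrong.
        if proc.poll() is None:
            proc.kill()


def estimate_queued_buffers(drain_seconds, fps=30.0):
    """Rough estimate, assuming the queue drains at about `fps` buffers/s."""
    return drain_seconds * fps
```

Here `cmd` would be the gst-launch-1.0 -e command above split into an argument list. Under the 30 buffers per second assumption, a 45 second wait corresponds to estimate_queued_buffers(45), i.e. about 1350 buffers.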

The destination IP is LAN, though it has the same behavior regardless of where the stream is sent to.

I would agree that it’s probably an issue with packet size or something along those lines. Something specific to the interface, I would think, or this would happen with eth1 (the USB->ethernet dongle) and WIFI.

I’ll see if I can create a GST pipeline to receive the stream using an rtmpsrc, but I’m not sure how that works, if rtmpsrc can behave like a server and what-not. I’ll have to look into it.

Hi Wayne,

35Mb/s is a bit small for 4k, but it was a fairly low common denominator so I used it. It’s a number that should easily work on a LAN. As I mentioned in the previous post, iperf3 doesn’t show the issue - I attached a full log @ 35Mb/s below. I also ran iperf3 at 50Mb/s and it also didn’t show the issue.

Jetson (client, sender)

iperf3 -c 192.168.86.44 -p 8000 -i 1 -t 300 -b 35M

Ubuntu Desktop (server, receiver):

iperf3 -i 1 -p 8000 -s

iperf3 log: https://drive.google.com/file/d/0B6FhWjPAiQ_gX1RyUHZldnhBUVk/view?usp=sharing

ethtool log: https://drive.google.com/file/d/0B6FhWjPAiQ_gQy1iMUJ0UHVtUEk/view?usp=sharing

dmesg, as mentioned, doesn’t show any errors regarding this, I put the full log at the link below though if you want to check something out
dmesg log: https://drive.google.com/file/d/0B6FhWjPAiQ_gNFI4U0F1cFFLZmM/view?usp=sharing

I’m wondering what iperf3 is doing different to cause it to achieve the bandwidth.

DaneLLL · September 28, 2017, 1:27am

For double confirmation, so I only need one TX2(r28.1) running

gst-launch-1.0 -e videotestsrc pattern=1 ! video/x-raw,width=4096,height=2160,format=I420 ! nvvidconv ! omxh264enc bitrate=35000000 iframeinterval=1 control-rate=2 ! flvmux ! queue max-size-buffers=0 max-size-bytes=0 max-size-time=0 ! rtmpsink location="rtmp://<IP_of_the_TX2>/rtmptest"

And press <ctrl+C> to observe the time of stopping the pipeline.

Because RTSP needs to set up server and client, I guess it is same for RTMP. No?

DaneLLL · September 28, 2017, 3:10am

Hi greg2,
My understanding is that you have an Nginx RTMP server, but we cannot connect to your RTMP server. For reproducing the issue, we need to set up an Nginx RTMP server locally. Are you able to share guidance to set up an Nginx RTMP server? Do you set it up on TX2(r28.1)?

greg2 · September 29, 2017, 1:40am

Hi Dane,

Yes, you only need 1 TX2 running either r27.1 or r28.1 using nvpmodel 3 to see the issue.

Yes, pressing ctrl+C will send the EOS to stream. If there’s no data in queue then it will immediately say “Got EOS from element pipeline0” and terminate. If there’s 120 buffers in the queue, however (at 30 buffers per second), it’ll take 4 seconds to empty (120/30), receive EOS, and close - and if even more buffers built up in the queue then the longer it’ll take to receive EOS and finish. Alternatively, if the test app is run (instead of gst-launch) it will print out the current level of the queue once a second, so you don’t have to guesstimate at the end how many frames were in the queue based on how long it takes to receive EOS after you hit ctrl+C, the test app will simply tell you live at each second how many frames are in queue and you’ll be able to see it rise and fall.

Setting up the nginx RTMP server is actually very straight forward. We’re just using a normal 16.04 Ubuntu x86 system for local in-office testing. Instructions loosely based on what’s here, but using older nginx (1.7.5): Getting started with nginx rtmp · arut/nginx-rtmp-module Wiki · GitHub
0. Install some build requirements: sudo apt-get install build-essential libpcre3 libpcre3-dev libssl-dev
1. Download nginx 1.7.5 source tarball here: http://nginx.org/download/nginx-1.7.5.tar.gz
2. extract with: tar xzf nginx-1.7.5.tar.gz
3. clone RTMP module (should create ‘nginx-rtmp-module’ directory): git clone git://github.com/arut/nginx-rtmp-module.git
4. move into extracted nginx dir: cd nginx-1.7.5
5. configure env for build: ./configure --with-http_ssl_module --add-module=…/nginx-rtmp-module
6. build nginx + rtmp module using: make
7. install nginx using: sudo make install
8. I provided here a basic nginx config for the RTMP app which will save the stream in /home//tmp_rtmp of the server system when received, so create the /home//tmp_rtmp directory or change the path in the config. Add the system’s username to the path in the config so that the path is valid. You can put this config inside /usr/local/nginx/conf/nginx.conf
https://drive.google.com/file/d/0B6FhWjPAiQ_gY3RHdXhkQ2MzRW8/view?usp=sharing
9. You can start the server (it won’t start automatically at boot) with: sudo /usr/local/nginx/sbin/nginx

ALTERNATIVELY (can use both)

We set up a public facing server which you guys can test against. It won’t save the stream and you can’t access it, but you can stream to it for testing to see the problem. Obviously it would use the public internet, however, so it’s not as controlled of an environment as a local LAN system running nginx+RTMP server would be. If you’d like to try that, the IP is here: 35.197.72.182 . So you can access it via the test app or via gst-launch using the url: rtmp://35.197.72.182/rtmptest .

I hope this helps you guys a bit in reproducing the issue.