Sorry, this is really long :P
I’m just adding some observations…not necessarily anything in particular or in any order. Not all tests seem consistent. There is no real conclusion in this, but there are a lot of tests which may be surprising. It shows a few cases where the obvious possibilities turn out not to be the problem.
My TX1 is running a fully updated L4T R28.1.
I am running as root with “sudo -s”.
Some of my testing leads to questions I can’t answer. To start with the most basic test possible, I ran a flood ping (“sudo ping -f wherever”) for 30 seconds while jetson_clocks.sh was at full speed…first from host to Jetson, then from Jetson to host. Both resulted in no loss and right around 50000 packets. This does not involve TCP or UDP (it’s ICMP) and is much closer to a test of the physical layer (and ARP) working correctly (if this weren’t correct, TCP and UDP would both inherit a faulty environment). No error, drop, overrun, collision, etc., ever occurs from flood ping. This tends to place any issue in the higher level protocol stacks (hardware drivers work at lower levels on CPU0, software drivers implement stacks on any CPU core…stacks are limited by the throughput of data feeding them or being consumed).
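For reference, the pattern was something like this (the address is an example from my setup; “-w 30” is iputils ping’s deadline option, stopping the run after 30 seconds):
# ping -f -w 30 192.168.2.30
…run once from the host toward the Jetson, then again from the Jetson toward the host’s address.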
I see that the “-b 1G” argument to iperf3 is not actually listed as valid in the man page, but “iperf3 --help” does show it. I tried iperf3 with “-b 1X” just to see if it showed an error, and it does not (I consider it a bug that an invalid argument is not an error). This calls into question whether the 1G bitrate is really behaving as expected. I don’t have a network analyzer so I couldn’t say. Probably 1G is supported…but then again, perhaps it is supported only on arm64 or only on x86_64. I don’t know.
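To reproduce the non-error, something like this runs without any complaint about the bogus suffix (a server has to be listening somewhere…127.0.0.1 with a local “iperf3 -s” works):
# iperf3 -c 127.0.0.1 -b 1X -t 5
…what rate it actually applies in that case I can’t say.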
So I did something not yet done in order to isolate where the limitations come from: I ran iperf3 as both server and client on the TX1 (which also guarantees the “-b 1G” behavior is identical for client and server mode…no arm64/x86_64 difference is possible), using the localhost address 127.0.0.1. This avoids going through the Realtek driver and hardware, but still uses the protocol stacks (keep in mind ping doesn’t care much about protocol stacks, while UDP and TCP do). I had a throughput of approximately 999 Mbits/sec with no retries (this also suggests “-b 1G” works as expected). The loopback interface can bypass CPU0 since no hardware drivers are involved. I’d say the protocol stacks (iperf3 uses both UDP and TCP) and the purely software side are at full performance (at least when limited to 1G speed…cutting out hardware reaches the theoretical maximum). I tend to favor saying there is an issue with either the Realtek driver or with the time the Realtek driver has available to run (the implied limitation is the latency before a hardware IRQ begins service, or the time used during service of the IRQ). It’s hard to know without profiling, and I have no way to hardware profile.
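The loopback run was just the same pair of commands as shown further down, pointed at the TX1 itself, roughly:
# iperf3 -s -p 12345
# iperf3 -c 127.0.0.1 -p 12345 -t 60 -i 10 -b 1G
…with the server in one terminal (or backgrounded) and the client in another.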
I then ran a flood ping on the TX1 to 127.0.0.1 for 30 seconds (a ping going to itself without touching the NIC). Approximately 900000 packets were serviced without loss. The network software, when not going through the network hardware, is about 18 times faster.
Next I ran a 30 second flood ping to the address of the local NIC on the TX1 (both send and receive serviced by the same NIC and driver…for me this is 192.168.2.30). I actually got about 910000 packets…slightly better throughput…and this was without jetson_clocks.sh. With jetson_clocks.sh the throughput did not seem to change. Unless the network software is doing something smart and not actually routing through the NIC hardware (and as far as I know Linux is that smart…locally addressed traffic goes through the internal “local” routing table rather than out the wire), this also implies that the driver, when running, does what it should when not talking to the protocol stacks (I don’t consider the work of ICMP significant enough to compare to a TCP stack). Perhaps it is the throughput between the Realtek driver and the protocol stack which is bottlenecked.
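Those two local runs were just:
# ping -f -w 30 127.0.0.1
# ping -f -w 30 192.168.2.30
…the second address being whatever “ifconfig eth0” reports on your unit.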
Each of the following are between x86_64 host and TX1:
Here are some client/server side commands used (I reverse which side’s address is involved if I reverse roles):
sudo iperf3 -c 192.168.2.2 -p 12345 -t 60 -i 10 -b 1G -R
sudo iperf3 -s -p 12345
No jetson_clocks.sh, server on host:
I see no retries and roughly 492 Mbits/sec.
With jetson_clocks.sh, server on host:
I see no retries and roughly 492 Mbits/sec.
Implies: jetson_clocks.sh makes no difference on speed, and no retries needed either way.
No jetson_clocks.sh, server on TX1:
I see lots of retries, and roughly 639 Mbits/sec.
With jetson_clocks.sh, server on TX1:
I see lots of retries, and roughly 652 Mbits/sec.
Implies: Marginal throughput improvement. Retries did not significantly change.
ifconfig errors:
When testing is done I see no errors, drops, overruns, etc., on the TX1 side. I see a very large number of dropped RX packets on the host, but no outright errors.
Note that a dropped packet is correct behavior for UDP during congestion, or just from sending faster than the packets can be consumed (this isn’t a software error per se, but it is a weak link in the chain if something is bottlenecking). TCP can also have dropped packets, but it retries. That doesn’t mean nothing is wrong, but it does mean that, within its abilities, the network behaves as it should if the retries were a case of congestion. iperf3 is essentially trying to congest the network and measure that congestion.
I rebooted the TX1 and re-ran both client and server sides while monitoring ifconfig. No jetson_clocks.sh was used. I got no drops on the TX1.
I used jetson_clocks.sh on the TX1. I re-ran (no reboot) both client and server side again on the TX1. Still, the TX1 does not show any drops. Apparently it is only the host side which is seeing RX drops.
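For the monitoring itself nothing fancy was needed…compare the counters before and after each run, or watch them live:
# ifconfig eth0
# watch -n 1 ifconfig eth0
…the RX/TX lines show the errors/dropped/overruns counts directly.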
To bring things together with something from the real world I decided to copy data over the network via netcat. Netcat simply sends as fast as it can and receives as fast as it can. I didn’t want disk read or write speed to be the limit, so I’m using other sources and destinations (the receive side discards into “/dev/null”, and I verify the read side’s rate first).
If you run this command it will read the rootfs partition and redirect it to “/dev/null” and show a time measurement:
# time dd if=/dev/mmcblk0p1 bs=512 > /dev/null
29859840+0 records in
29859840+0 records out
15288238080 bytes (15 GB, 14 GiB) copied, 70.6154 s, 216 MB/s
real 1m10.619s
user 0m6.728s
sys 0m28.024s
…216 MB/s (1728 Mbit/s). This exceeds gigabit. The important thing to know is that this partition contains 15288238080 bytes.
To simplify this:
# time cat /dev/mmcblk0p1 > /dev/null
real 1m9.582s
user 0m0.024s
sys 0m8.576s
…cutting out dd shows 219715416 bytes/s, or about 220 MB/s (roughly 1758 Mbit/s…also exceeding gigabit). So we know that however we read the raw mmcblk0p1 we get enough throughput to exceed gigabit.
To use netcat to read from port 12345 I do this (it saves into “/dev/null”…in other words, it just discards the bytes):
nc -p 12345 -l > /dev/null
…restart this after each send completes.
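If restarting it by hand gets old, a loop works too (a sketch, using the same traditional netcat syntax):
# while true; do nc -p 12345 -l > /dev/null; done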
To send mmcblk0p1 over port 12345 without touching the Realtek NIC I use:
# time nc -q 0 127.0.0.1 12345 < /dev/mmcblk0p1
real 1m10.492s
user 0m3.196s
sys 0m38.904s
…this is only one second longer than without netcat. Everything associated with networking, when purely in software, is quite good.
Now let’s do this again using the NIC’s address (192.168.2.30 is the NIC for me):
# time nc -q 0 192.168.2.30 12345 < /dev/mmcblk0p1
real 1m10.976s
user 0m2.960s
sys 0m39.224s
…this appears to add almost no overhead when running through the NIC’s address. Once again though, I do not know for certain whether the kernel is optimizing when it knows the traffic is local (see the note above about the “local” routing table…the Realtek hardware itself was probably never exercised here).
So I’ll do this between host and TX1 where I send from TX1 to host (adjust addresses and where the listener runs as required):
# time nc -q 0 192.168.2.2 12345 < /dev/mmcblk0p1
real 3m16.854s
user 0m2.912s
sys 0m47.676s
…clearly, talking to the outside world has a dramatic penalty even when the two are directly connected on the same switch. The actual throughput here is approximately 77684136 bytes/s (around 78 MB/s, or 621 Mbit/s).
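For completeness, the listener on the host is the same idea, though the exact syntax depends on which netcat flavor is installed:
# nc -l -p 12345 > /dev/null
# nc -l 12345 > /dev/null
(traditional netcat wants “-p” together with “-l”; the BSD/nmap flavors take the bare port).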
It happens that there is another reason why I used mmcblk0p1: my host already has the same bytes in the file system.img.raw. So I can copy the same number of bytes back to the Jetson in the reverse direction. Keep in mind that the first time you read a file on a system with lots of RAM the file may be cached, and a second read would be faster. Regardless, the rate with or without cache will far exceed gigabit, so it should be a good repeatable test.
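If you want to rule the page cache out entirely between runs, the kernel has a knob for that (run as root on the sending side):
# sync; echo 3 > /proc/sys/vm/drop_caches
…though, as noted, the read rate exceeds gigabit either way, so caching shouldn’t change the conclusion.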
So I run the listener on the Jetson this time, and send system.img.raw to the Jetson (I don’t use “-q 0” on the host because its Fedora netcat doesn’t have that option):
# time nc 192.168.2.30 12345 < system.img.raw
real 4m7.859s
user 0m9.581s
sys 0m52.392s
…clearly the TX1 receives slower than it sends when a remote host is involved (15288238080 bytes in 247.859 s works out to roughly 61.7 MB/s, or about 493 Mbit/s). The loss of throughput is real. The problem is that when doing all of this directly on the Jetson the same loss of throughput is not seen. So the problem isn’t the Realtek driver by itself, nor the TCP stack by itself, nor how the driver is running. Something else is getting in the way…perhaps an interaction between two parts of the software which does not show up when testing each part individually. As one example, ARP and other negotiations go on with a remote host which do not go on against localhost or the local NIC’s address.
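If ARP really were part of it, that should be visible in the neighbor cache on each end:
# arp -n
# ip neigh show
…a single stable entry for the peer would suggest ARP resolves once and isn’t being constantly renegotiated.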
In no case did the ifconfig on eth0 of the Jetson ever show any drops or errors of any kind. I suspect the previously seen drops were from UDP. So now I’ll force UDP.
Listening on the TX1:
# nc -p 12345 -l -u > /dev/null
Sending on the host to the TX1:
# time nc -u 192.168.2.30 12345 < system.img.raw
real 3m44.611s
user 0m8.875s
sys 0m49.094s
…this works out to about 68.1 MB/s, or roughly 545 Mbit/s. Sending from host to Jetson is slower than the other direction, but it isn’t as dramatic as what shows up under iperf3.
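UDP losses won’t show up as retries anywhere, so after a run like this the receiver’s UDP statistics are the place to look:
# netstat -su
…the “packet receive errors” and “receive buffer errors” counts under Udp would indicate the TX1 dropping datagrams it couldn’t consume fast enough (exact wording varies with the net-tools version).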
I believe someone needs to throw a network analyzer between the outside host and the Jetson and run either netcat or iperf3 to see where the inefficiencies are. It gets too complicated without this, and there is no clear single cause. Perhaps it is something simple like MTU/MRU behavior, or an interaction between two things which is only an issue when they occur simultaneously, yet not an issue one at a time.
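Short of real analyzer hardware, a software capture on both ends might at least show retransmission patterns or MTU/fragmentation behavior (tcpdump should be available or installable on both sides):
# tcpdump -i eth0 -w /tmp/capture.pcap host 192.168.2.2
…then look at the capture in wireshark for retransmits, window sizes, and packet sizes. It isn’t as trustworthy as a hardware tap, since the capture competes for the same CPU, but it is a start.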