Performance of multiple GigE cameras through TX2 dev kit board

I am looking at the TX2 and carrier boards for a prototype requiring three GigE cameras running at the highest achievable throughput, specifically these:
1x Allied Vision Mako G-503C
2x Allied Vision Mako G-030C

MTU for each camera is 8228, and they will ideally be pushing up to 14-15K packets/s with minimal loss under sustained traffic.
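
(As a rough sanity check, if that packet rate is per camera: 8228 bytes/packet × ~15,000 packets/s ≈ 123 MB/s ≈ 0.99 Gbit/s, so each link would be running close to GigE line rate.)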

I have looked at ConnectTech’s Cogswell board, which looks like it may work - I am trying to get real-world performance test results from them.

I also would like to know if it’s possible with the board that comes with the dev kit. I would think we could use the dev kit along with a PCIe quad PoE NIC like this one: https://www.neousys-tech.com/en/product/application/machine-vision/pcie-poe354at

And perhaps a PCIe riser if needed.

Has anyone driven a similar scenario or concurrent throughput test through the TX2 or TX1 dev kit and survived to tell the tale?

Thanks for your help!

Jeremy

Just a comment: PCIe x4 is a good idea, but PoE, when the power comes directly from the PCIe slot, could complicate your life. I would look for solutions which allow power to come from something other than the PCIe slot.

Thanks - would you mind expanding on this? On a previous attempt with an ODROID XU4 we did end up with a separate power PCB, but given that we had to replace the board anyway, I was a bit excited to have PoE integrated for simplicity - less space, fewer wires, etc.

And any reason this would be different for the Cogswell? It is specifically marketed for PoE on 4 ports that go through an x4 PCIe switch. Is there something specific about the NVIDIA board that’s stinky in this regard?

Power delivery on the Jetson is not as beefed up as on a desktop PC. It is quite possible the board can provide enough power, but it just seems like asking for trouble considering up to four devices might need power. This would of course depend on the carrier board, but testing would be needed in every case…external power removes one compatibility variable from the test requirements.

Having four root ports on a PCIe x4 card would be far better than a 4-port switch in terms of data throughput, which is what I like about it. I do not have any recommendation as to which PCIe multi-port cards would work well though, or which would work without adding a driver module (I only have a single-port Realtek to look at).

Thanks for your insight.

I did end up ordering the TX2 dev kit and the PoE NIC I mentioned above (https://www.neousys-tech.com/en/product/application/machine-vision/pcie-poe354at). I will test it out with the three cameras and report back my findings. If the slot can’t power it, the NIC also has a 12V input we will use to at least standardize the power (since we can).

Will let you know…

The three cameras have been capturing on the NIC for hours straight at full bandwidth. So far, powering by the bus alone has been successful. (This also means the whole setup can likely be powered directly by a 4S LiPo without any additional regulation for separate PoE.)

bmon:

Interfaces                     x RX bps       pps     %x TX bps       pps     %
  enp1s0f0                     x 115.40MiB  14.71K     x    240B        4
    qdisc none (mq)            x      0         0      x    240B        4
      class :1 (mq)            x      0         0      x      0         0
        qdisc none (pfifo_fast)x      0         0      x 814.91KiB  14.08K
      class :2 (mq)            x      0         0      x      0         0
        qdisc none (pfifo_fast)x      0         0      x 398.90KiB   6.85K
      class :3 (mq)            x      0         0      x      0         0
        qdisc none (pfifo_fast)x      0         0      x 365.88KiB   6.58K
      class :4 (mq)            x      0         0      x      0         0
        qdisc none (pfifo_fast)x      0         0      x   3.11MiB  53.15K
      class :5 (mq)            x      0         0      x     65B        1
        qdisc none (pfifo_fast)x      0         0      x     65B        1
      class :6 (mq)            x      0         0      x    174B        3
        qdisc none (pfifo_fast)x      0         0      x    174B        3
      class :7 (mq)            x      0         0      x      0         0
      class :8 (mq)            x      0         0      x      0         0
  enp1s0f1                     x 112.96MiB  14.73K     x    225B        3
    qdisc none (mq)            x      0         0      x    262B        4
      class :1 (mq)            x      0         0      x      0         0
        qdisc none (pfifo_fast)x      0         0      x 814.32KiB  14.07K
      class :2 (mq)            x      0         0      x     57B        0
        qdisc none (pfifo_fast)x      0         0      x 398.62KiB   6.84K
      class :3 (mq)            x      0         0      x    167B        2
        qdisc none (pfifo_fast)x      0         0      x 365.67KiB   6.58K
      class :4 (mq)            x      0         0      x     37B        0
        qdisc none (pfifo_fast)x      0         0      x   3.11MiB  53.11K
      class :5 (mq)            x      0         0      x      0         0
        qdisc none (pfifo_fast)x      0         0      x     51B        0
      class :6 (mq)            x      0         0      x      0         0
        qdisc none (pfifo_fast)x      0         0      x    173B        2
      class :7 (mq)            x      0         0      x      0         0
      class :8 (mq)            x      0         0      x      0         0
  enp1s0f2                     x 112.84MiB  14.72K     x    262B        4
    qdisc none (mq)            x      0         0      x    262B        4
      class :1 (mq)            x      0         0      x      0         0
        qdisc none (pfifo_fast)x      0         0      x 814.32KiB  14.07K
      class :2 (mq)            x      0         0      x    153B        2
        qdisc none (pfifo_fast)x      0         0      x 398.62KiB   6.84K
      class :3 (mq)            x      0         0      x      0         0
        qdisc none (pfifo_fast)x      0         0      x 365.67KiB   6.58K
      class :4 (mq)            x      0         0      x    109B        1
        qdisc none (pfifo_fast)x      0         0      x   3.11MiB  53.11K
      class :5 (mq)            x      0         0      x      0         0
        qdisc none (pfifo_fast)x      0         0      x     51B        0
      class :6 (mq)            x      0         0      x      0         0
        qdisc none (pfifo_fast)x      0         0      x    173B        2
      class :7 (mq)            x      0         0      x      0         0
      class :8 (mq)            x      0         0      x      0         0

For reference, I pinned the soft interrupt handling for the rx queues to separate cores - f0 to core 3, f1 to core 4, f2 to core 5 - and assigned the receiving application instances to the same cores.
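
In case it’s useful to anyone, the steering was along these lines (a sketch, assuming RPS for the receive softirq work and taskset for the applications; the capture program names are placeholders):

# steer receive softirq processing for each port to its own core (core 3 = mask 8, core 4 = 10, core 5 = 20, in hex)
echo 8 | sudo tee /sys/class/net/enp1s0f0/queues/rx-0/rps_cpus
echo 10 | sudo tee /sys/class/net/enp1s0f1/queues/rx-0/rps_cpus
echo 20 | sudo tee /sys/class/net/enp1s0f2/queues/rx-0/rps_cpus
# repeat for each rx-N queue if the NIC exposes more than one per port
# run each receiving application instance on the matching core
taskset -c 3 ./capture_f0 &
taskset -c 4 ./capture_f1 &
taskset -c 5 ./capture_f2 &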

I also set rx-usecs to 1000 (it defaulted to 3). I may experiment with higher values. Latency is not very important in this case. I still get an occasional incomplete frame, but no longer a continuously growing backlog of packets.
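
For reference, that coalescing change is just ethtool on each port, e.g.:

# raise the rx interrupt coalescing interval to 1000 microseconds per port
sudo ethtool -C enp1s0f0 rx-usecs 1000
sudo ethtool -C enp1s0f1 rx-usecs 1000
sudo ethtool -C enp1s0f2 rx-usecs 1000
# verify the current setting
ethtool -c enp1s0f0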

I did try to set smp_affinity, and that fails with Input/output errors (??). From reading (maybe even a post from you, @linuxdev?), I’m wondering if this is a limitation of ARM for hardware interrupt balancing. In any case CPU 0 gets hit pretty hard, but apparently not overwhelmed. Having the applications and the ksoftirqd threads on separate cores may be helping with that - I’ve only tried the happy path at this point, trying to get positive results as fast as possible.
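
For completeness, the failed attempt was along these lines (the IRQ number is a placeholder; it is the write itself that returns the error):

# find the IRQ lines for the NIC ports, then try to move one off core 0
grep enp1s0 /proc/interrupts
echo 8 | sudo tee /proc/irq/<irq_number>/smp_affinity   # returns "Input/output error" here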

So if you need to prototype using GigE Vision I/O like this (at least with these AVT cameras), you can probably make do with the dev kit carrier.

Thanks for your help. I’ll update if it makes me cry.

It sounds like you are getting the best use of all of the cores…your combination of affinity for soft IRQs and user space applications would be correct.

Anything needing a hardware IRQ is likely to fail on any CPU core other than core 0. If there is no wiring to another core, then that core cannot service the interrupt (core 0 could have a driver which in turn generates a software IRQ sending the remainder of the work to another core, but the initial data from talking to hardware must go through core 0). On an Intel CPU there is a chip known as the I/O APIC (advanced programmable interrupt controller). This is responsible for receiving an interrupt and distributing it to cores other than core 0. Intel cores have some hardware support for interacting with the I/O APIC. On an AMD CPU the cores themselves have a somewhat different topology and directly support distributing hardware IRQs to different cores.

For ARM to use different cores there would need to be the equivalent of an I/O APIC, but none of the Tegra devices have this. I do not know if it is even possible…as mentioned above, an Intel core has some hardware support making it possible to add the I/O APIC, and the architecture under AMD directly supports this. If a Tegra SoC had this, performance would be leaps and bounds ahead of current technology without even making the cores themselves faster…I/O and latency would come much closer to high-quality soft real-time behavior under heavy load. Current Tegra/ARM devices are essentially still single-core when it comes to handling random I/O from the outside world.

If you were to watch “/proc/interrupts” via something like “watch -n 1 cat /proc/interrupts”, you would see a lot of interrupts going to core 0. If from an outside Linux host you were to run “sudo ping -f <address_of_jetson>” to hit the Jetson at the maximum rate, only core 0 would be handling the interrupts from the ethernet. If you were to do the equivalent with a highly threaded user space program making use of software drivers (e.g., protocol features), and if that program had no need of hardware access, I believe you would start seeing other cores servicing many more interrupts. Anything in user space, and anything in kernel space not requiring hardware access, can use those other cores. Some special hardware access may be wired to other cores, but that is a special case and not generally available for I/O.
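
Concretely, something like this (interface names are just examples):

# on the Jetson: per-core interrupt counts, filtered to the ethernet IRQ lines
watch -n 1 "grep -E 'CPU|eth|enp1s0' /proc/interrupts"
# on another Linux host: flood the Jetson to drive the counters up
sudo ping -f <address_of_jetson>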

I do sometimes see people wanting to convert a software-only process to use all cores, but other than offloading away from core 0, this may not be the best thing to do in some cases. The scheduler is doing the correct thing in these cases to use a single core with a particular process. A single-threaded app will get cache hits when going to a single core, but as soon as that one process starts migrating to different cores you will no longer have an ability to get a cache hit unless you are very lucky (e.g., perhaps an unrelated process accessed the same memory and the memory is not exclusive to the process…this is rare, not common). It is a good idea to keep core affinity in cases where cache might be of benefit; it is a good idea to distribute across cores when multiple threads will not use the same data (and thus there is no possibility of a cache hit when on a single core). I suspect your use of affinity is correct and best practice where affinity is possible.

Do I have to install any packages on the Jetson TX2 for a 4-Port x4 PCI-E Gigabit Power over Ethernet Frame Grabber Card?

Probably you would need a driver, but it depends on which chipset is used. Desktop distributions typically install more modules than embedded systems do (e.g., I have NVIDIA video hardware, yet I still get AMD modules…installing either on a Jetson would be a bit silly since the GPU is hard-wired to the memory controller and can’t use a PCI driver). Basically you need to find out what driver the desktop PC version of the card uses, and then see if that driver can be built for a Jetson. In most cases, if the driver works on a PC, then you can build it for a Jetson.
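
A minimal way to check (run on a PC with the card installed, or on the Jetson itself; the bus address and module name are placeholders):

# identify the ethernet controller chipset and the driver a desktop kernel binds to it
lspci -nn | grep -i ethernet
lspci -k -s <bus:dev.fn>        # the "Kernel driver in use:" line names the module
# then see whether that module already exists on the Jetson
find /lib/modules/$(uname -r) -name '<module_name>*'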

I am using the same PoE NIC mentioned above (https://www.neousys-tech.com/en/product/application/machine-vision/pcie-poe354at) by jeremy0mxxp. I am using GigE cameras from The Imaging Source (DFK 33GX236 - GigE color industrial camera).

Just wanted to check with jeremy0mxxp if he can share some reference materials on connecting GigE cameras to the PoE card, and the PoE card to the Jetson TX2.

I want to connect a camera through a PCIe ethernet controller on the Jetson TX2, but it is not working even though I have installed the drivers.

You would have to give a lot more detail on (a) what the PCIe hardware is, e.g., an ethernet controller or a USB controller, and (b) how you are trying to access the camera (for example, if you have software designed for a USB camera and “/dev/video0”, then this won’t work for an ethernet camera, where the driver provides a different kind of access).
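
As a starting point for that detail, something like the following shows whether the controller is even detected and has a driver bound (interface names will differ):

lspci -k                         # is the ethernet controller enumerated, with a "Kernel driver in use"?
dmesg | grep -i -e pcie -e eth   # probe and link-up messages
ip link                          # does a new interface (e.g. enp1s0) appear?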

So is the frame grabber necessary when using PoE cameras?

You will probably want to file a new topic for that question since (although both involve PoE cameras) the actual question is quite different from the original topic. Incidentally, PoE is just power over ethernet, and the camera (or any other device talking over ethernet) is an independent topic.