TX2 USB3 controller bug

Hi,

I’m an engineer at XIMEA, a manufacturer of industrial cameras,
and we have run into an issue with the USB3 controller in the TX2.
It is somewhat specific to XIMEA cameras, but we are
fairly confident that the bug lies in the TX2, not in our device.

The problem is as follows:
XIMEA USB cameras have 2 bulk endpoints (one IN for streaming image
data, and one IN/OUT for configuration and status readout) that are
used at the same time. After some time (less than a minute with the
test case we created), one of the IN transfers on the second endpoint
times out. Using a USB analyzer, we found that the camera didn’t
send any data because the host failed to send the ACK packet for the IN
transaction. The camera had notified the host that data was available
with an ERDY packet, and the incoming USB transfer had been queued using libusb.
A screenshot from the USB bus dump is attached. I can also send the data
file itself if needed, though it is quite big. To reproduce this issue on
your side you will probably need one of our cameras (any USB3 model will do);
this can be arranged. It’s not common for devices to have more than one IN/OUT
bulk endpoint, so I haven’t been able to find any other way to trigger
the issue. The source code of the test case is attached in case you already
have access to a XIMEA camera. Some additional notes:

  • issue is not present on TX1 (or with any other USB3 controller for
    that matter);
  • both L4T 27.1 and 28.1 are affected;
  • problem is reproducible on both Jetson development kit from TX1 and
    our custom carrier;
  • introducing a USB hub between the camera and the TX2 stops this issue
    from appearing (or at least it’s not easily triggered anymore).
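
For readers unfamiliar with SuperSpeed flow control, the handshake described above can be sketched as a toy model. This is only an illustration, not a capture from the bus dump: the initial ACK/NRDY exchange is assumed from the general SuperSpeed bulk IN protocol, and the function names are hypothetical.

```python
def bulk_in_exchange(host_resumes_after_erdy):
    """Return a packet trace for one flow-controlled SuperSpeed bulk IN transfer.

    Toy model only: each entry is (sender, packet type).
    """
    trace = [
        ("host", "ACK"),     # host requests data from the IN endpoint
        ("device", "NRDY"),  # device has no data yet; endpoint enters flow control
        ("device", "ERDY"),  # device announces that data is now available
    ]
    if host_resumes_after_erdy:
        trace.append(("host", "ACK"))   # host re-issues the request after ERDY
        trace.append(("device", "DP"))  # device answers with a data packet
    return trace


def transfer_completes(trace):
    """A transfer completes only if the device actually sent a data packet."""
    return ("device", "DP") in trace


if __name__ == "__main__":
    for resumes in (True, False):
        trace = bulk_in_exchange(resumes)
        print(resumes, transfer_completes(trace), trace)
```

With `host_resumes_after_erdy=False` the trace ends at the device's ERDY, which corresponds to the stalled state reported here: the device is ready, but no data moves because the host never asks again.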

Regards,
Igor.
xiSample.cpp (1.74 KB)

I am curious, bulk can interrupt and pause, and was not really meant as a control interface. Miscellaneous is usually used for control…is there a specific reason why control is with miscellaneous, e.g., large data sets for control?

Also, before the error hits, what do you see from this particular device’s “lsusb -vvv”? Do you use any custom control software, or is it entirely default Linux software?

What impact does this have on communications with the camera after the timeout? Does it result in lower frame rate, or is the end state more critical (e.g. reset required)?

I am curious about what happens before the timeout. As to what effect it might have on the camera, it depends on what the control commands are intended to accomplish, and what might happen if the pause is in the middle of sending a command. Does the command at the camera end proceed with part of the command? Does it wait for the full command?

Added note: When and how pauses occur might change depending on system resources.

Hi parafin,
When the issue happens, do you see anything suspicious in dmesg?

DaneLLL, no, no errors there. Only “tegra_xhci_mbox_work mailbox command 6” messages (increase memory frequency command), but they appear before the issue hits and are there even when everything works. I also want to note that running “sudo ./jetson_clocks.sh” doesn’t help.

linuxdev, yes, some commands are big; that’s one of the reasons a bulk endpoint is used. In any case it’s not an illegal configuration. As for a pause on the endpoint: there is no ACK even after several minutes, so I wouldn’t call it a pause, more like a stop ;) The software used is a proprietary userspace driver (API library) that uses libusb to communicate with the device.

AaronL, the software assumes that if the camera doesn’t answer, an unrecoverable error has occurred, and it resets the device.

lsusb -vvv output for one of the affected devices:

Bus 004 Device 004: ID 20f7:3001  
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               3.00
  bDeviceClass            0 
  bDeviceSubClass         0 
  bDeviceProtocol         0 
  bMaxPacketSize0         9
  idVendor           0x20f7 
  idProduct          0x3001 
  bcdDevice            0.00
  iManufacturer           1 XIMEA
  iProduct                2 www.ximea.com
  iSerial                 0 
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength           70
    bNumInterfaces          1
    bConfigurationValue     1
    iConfiguration          0 
    bmAttributes         0x80
      (Bus Powered)
    MaxPower              100mA
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      bNumEndpoints           4
      bInterfaceClass       255 Vendor Specific Class
      bInterfaceSubClass      0 
      bInterfaceProtocol      0 
      iInterface              0 
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0400  1x 1024 bytes
        bInterval               0
        bMaxBurst              15
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x02  EP 2 OUT
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0400  1x 1024 bytes
        bInterval               0
        bMaxBurst               0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x82  EP 2 IN
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0400  1x 1024 bytes
        bInterval               0
        bMaxBurst               0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x83  EP 3 IN
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0400  1x 1024 bytes
        bInterval               0
        bMaxBurst               0
Binary Object Store Descriptor:
  bLength                 5
  bDescriptorType        15
  wTotalLength           22
  bNumDeviceCaps          2
  USB 2.0 Extension Device Capability:
    bLength                 7
    bDescriptorType        16
    bDevCapabilityType      2
    bmAttributes   0x00000002
      HIRD Link Power Management (LPM) Supported
  SuperSpeed USB Device Capability:
    bLength                10
    bDescriptorType        16
    bDevCapabilityType      3
    bmAttributes         0x00
    wSpeedsSupported   0x000c
      Device can operate at High Speed (480Mbps)
      Device can operate at SuperSpeed (5Gbps)
    bFunctionalitySupport   3
      Lowest fully-functional device speed is SuperSpeed (5Gbps)
    bU1DevExitLat           0 micro seconds
    bU2DevExitLat           0 micro seconds
Device Status:     0x0000
  (Bus Powered)
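
To make the endpoint layout in this listing easier to compare against other devices, here is a minimal sketch that parses the endpoint descriptors out of `lsusb -v` text. The excerpt is trimmed from the output above. The helper names (`parse_endpoints`, `bulk_in`, `burst_bytes`) are hypothetical and not part of any XIMEA or libusb API. The burst calculation relies on the SuperSpeed rule that `bMaxBurst` is zero-based (a value of 15 means up to 16 packets per burst).

```python
# Fields copied from the lsusb listing above; other descriptor lines are omitted.
LSUSB_EXCERPT = """\
      Endpoint Descriptor:
        bEndpointAddress     0x81  EP 1 IN
          Transfer Type            Bulk
        wMaxPacketSize     0x0400  1x 1024 bytes
        bMaxBurst              15
      Endpoint Descriptor:
        bEndpointAddress     0x02  EP 2 OUT
          Transfer Type            Bulk
        wMaxPacketSize     0x0400  1x 1024 bytes
        bMaxBurst               0
      Endpoint Descriptor:
        bEndpointAddress     0x82  EP 2 IN
          Transfer Type            Bulk
        wMaxPacketSize     0x0400  1x 1024 bytes
        bMaxBurst               0
      Endpoint Descriptor:
        bEndpointAddress     0x83  EP 3 IN
          Transfer Type            Bulk
        wMaxPacketSize     0x0400  1x 1024 bytes
        bMaxBurst               0
"""


def parse_endpoints(text):
    """Collect one dict per 'Endpoint Descriptor:' section of lsusb -v output."""
    endpoints = []
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Endpoint Descriptor:"):
            current = {}
            endpoints.append(current)
        elif stripped.endswith("Descriptor:"):
            current = None  # some other descriptor; stop attributing fields
        elif current is None:
            continue
        elif stripped.startswith("bEndpointAddress"):
            addr = int(stripped.split()[1], 16)
            current["address"] = addr
            current["number"] = addr & 0x0F                       # bits 3..0
            current["direction"] = "IN" if addr & 0x80 else "OUT"  # bit 7
        elif stripped.startswith("Transfer Type"):
            current["type"] = stripped.split()[-1]
        elif stripped.startswith("wMaxPacketSize"):
            current["max_packet"] = int(stripped.split()[1], 16) & 0x07FF
        elif stripped.startswith("bMaxBurst"):
            current["max_burst"] = int(stripped.split()[1])
    return endpoints


def bulk_in(endpoints):
    """Return only the bulk IN endpoints."""
    return [e for e in endpoints
            if e.get("type") == "Bulk" and e.get("direction") == "IN"]


def burst_bytes(endpoint):
    """Maximum bytes per burst: (bMaxBurst + 1) packets of wMaxPacketSize."""
    return (endpoint["max_burst"] + 1) * endpoint["max_packet"]


if __name__ == "__main__":
    for ep in parse_endpoints(LSUSB_EXCERPT):
        print(hex(ep["address"]), ep["direction"], ep["type"], burst_bytes(ep))
```

Running the same parser on `lsusb -v` output from another device is one way to check whether it also exposes several bulk IN endpoints and could serve as an alternative reproduction target.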

@parafin
I have the exact same issue deploying a XIMEA camera on my TX2 board.
I haven’t tried the USB hub workaround, but at the moment the unknown ack error pops up for me as well.
@DaneLLL
If this is indeed a TX2 USB problem, please advise.

Below is the error I get:

xiAPI: u3VrReqBulk u3VrReq timeout
xiAPI: u3VrReq error: unknown ack. Reset endpoints.
xiAPI: USB transfer type 2 endpt 129 length 327680 (out of 1048576) failed with status 5!
ReadI2C: VR_READ_I2C err:0
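
As a reading aid for the log above, the numeric fields in the third line can be decoded with a short sketch. It assumes that `endpt` is a raw USB endpoint address and that `transfer type` follows the standard USB encoding (0 control, 1 isochronous, 2 bulk, 3 interrupt). Both assumptions are consistent with the `lsusb` listing earlier in the thread, but neither is confirmed by xiAPI documentation here. The meaning of `status 5` is not defined anywhere in this thread, so it is left as a raw number. The function names are hypothetical.

```python
import re

# Standard USB transfer type encoding (bmAttributes bits 1..0).
TRANSFER_TYPES = {0: "control", 1: "isochronous", 2: "bulk", 3: "interrupt"}

LOG_LINE = ("xiAPI: USB transfer type 2 endpt 129 length 327680 "
            "(out of 1048576) failed with status 5!")

LOG_PATTERN = re.compile(
    r"transfer type (\d+) endpt (\d+) length (\d+) "
    r"\(out of (\d+)\) failed with status (\d+)"
)


def decode_endpoint(address):
    """Split an endpoint address into its number (bits 3..0) and direction (bit 7)."""
    return {
        "number": address & 0x0F,
        "direction": "IN" if address & 0x80 else "OUT",
    }


def decode_log_line(line):
    """Extract the numeric fields from a xiAPI transfer-failure line."""
    match = LOG_PATTERN.search(line)
    if match is None:
        raise ValueError("line does not look like a transfer failure message")
    ttype, endpt, received, requested, status = (int(g) for g in match.groups())
    return {
        "transfer_type": TRANSFER_TYPES.get(ttype, "unknown"),
        "endpoint": decode_endpoint(endpt),
        "received": received,
        "requested": requested,
        "status": status,  # raw value; its meaning is not stated in the thread
    }


if __name__ == "__main__":
    print(decode_log_line(LOG_LINE))
```

Under these assumptions, `endpt 129` is address 0x81, that is EP 1 IN in the `lsusb` listing, and the failed transfer had received 327680 of the 1048576 bytes requested.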

Thanks.

There is an issue with the RealSense SR300 on r28.1:
https://devtalk.nvidia.com/default/topic/1024056/jetson-tx2/tx2-external-depth-camera-realsense-sr300/post/5210561/#5210561

It turned out that some formats are not supported and that commits need to be picked from the upstream uvcvideo driver. Could it be the same for XIMEA cameras?

Hi parafin, so the issue is not seen on r28.1/TX1? Or r24.2.1/TX1?

@DaneLLL
Do you have any suggestions for fixing this issue?

We need more information to investigate further. There are many cameras at https://www.ximea.com/. Which one is yours? Do you have others you could try?

The camera I have is the XiQ USB 3.0 Series. I have two of these units.

DaneLLL, XIMEA cameras aren’t related to the uvcvideo driver; as I said, the API library just uses libusb, which in turn uses usbfs in the kernel. This issue is not about camera support, but about the USB controller misbehaving with a particular USB device configuration. TX1 is not affected with any version of L4T. Any USB3 XIMEA camera can be used to reproduce the issue, specifically those from the xiQ and xiC families.

@parafin
I sent you a separate email about USB hub recommendations as a workaround. Which hubs do you think work reliably for this issue?

Hi parafin,

  • problem is reproducible on both Jetson development kit from TX1 and
    our custom carrier;

Is it a typo?

DaneLLL, no, it’s not. I meant the Jetson carrier board from the TX1 devkit with a TX2 module installed instead of the TX1, so P2597 with P3310.

gotenkscha, sorry, I didn’t write down the model of USB hub, once I find it, I will reply to you.

Hi parafin,
Please help provide more information:
Do all XIMEA cameras with 2 bulk endpoints fail on TX2, while all those with 1 bulk endpoint work?
Do all your cameras work on r28.1/TX1?
Do all your cameras go through the uvcvideo driver?
Do you know of any other vendor with cameras that have 2 bulk endpoints?

All XIMEA USB3 cameras have 2 bulk endpoints. They do work on TX1. No XIMEA cameras go through the uvcvideo driver. No, I don’t know of any other devices with 2 bulk endpoints.

Hi parafin,
You have mentioned

  • issue is not present on TX1 (or with any other USB3 controller for
    that matter);

Do the cameras work on TX1/r28.1?

Yes, they do.