PCIe silicon bug to cause denial of service on Tegra X1 and X2?
THIS IS A SEPARATE ISSUE from the "PCIe hangs the SoC" bug described here:
https://devtalk.nvidia.com/default/topic/1030004/jetson-tx2/pcie-silicon-bug-to-immediately-hang-a-tegra-x1-and-x2-/

We have found another strange behavior of the PCIe root complex on the Tegra platform.

Let's connect an FPGA with a simple debug core that can issue MRd32 requests and receive completions.

Reading address 0x80000000 initially works indefinitely on both TX1 and TX2. Now fire a read request to 0x08000000 (TX2) or 0x00800000 (TX1), the trigger address: this read will not return, and a completion timeout occurs (we use a 30 µs timeout). Subsequent reads of the originally working address (0x80000000) succeed only up to a point: every 128th read (TX2) or every 64th read (TX1) is not completed by the Tegra. Furthermore, after about 12000 reads (TX2) or 2700 reads (TX1) the PCIe host enters a blocked state and no further MRd32 commands can be issued. This was tested on R28.1.
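The test procedure above can be sketched as follows. This is only an illustration of the logic we run on the debug core, assuming a hypothetical `read32(addr)` interface that either returns completion data or raises on a completion timeout; it is not a real driver API.

```python
# Sketch of the test procedure. `read32` is a hypothetical debug-core
# interface: it returns the completion data for an MRd32 request, or
# raises CompletionTimeout if no completion arrives within 30 us.

class CompletionTimeout(Exception):
    """No completion received within the 30 us window."""

def run_test(read32, working_addr=0x80000000, trigger_addr=0x08000000,
             max_reads=20000):
    """Fire one read to the trigger address, then repeatedly read the
    previously working address and record which reads fail to complete."""
    failed = []
    try:
        read32(trigger_addr)   # expected to time out: the trigger
    except CompletionTimeout:
        pass
    for i in range(1, max_reads + 1):
        try:
            read32(working_addr)
        except CompletionTimeout:
            failed.append(i)   # on TX2 we observe every 128th read here
    return failed
```

On the failing hardware, `failed` fills with every 128th index (TX2) or every 64th (TX1) until the host blocks entirely.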

The address which triggers the issue is not listed in /proc/iomem.
A completion timeout on such an unassigned region would be acceptable, but that it breaks the whole PCIe subsystem is unexpected.
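For reference, whether a physical address falls inside any region listed in /proc/iomem can be checked with a short script. The region list below is illustrative only, not the actual Tegra address map; on the target you would read /proc/iomem itself (as root, since unprivileged reads show zeroed addresses).

```python
# Check whether a physical address is covered by any /proc/iomem region.

def iomem_regions(text):
    """Parse /proc/iomem-style lines 'start-end : name' into tuples."""
    regions = []
    for line in text.splitlines():
        rng, _, name = line.partition(" : ")
        start, _, end = rng.strip().partition("-")
        regions.append((int(start, 16), int(end, 16), name.strip()))
    return regions

def covering_region(regions, addr):
    """Return the name of the region containing addr, or None if unassigned."""
    for start, end, name in regions:
        if start <= addr <= end:
            return name
    return None

# Illustrative sample only -- NOT the real Tegra map.
# On the target: regions = iomem_regions(open("/proc/iomem").read())
sample = "70000000-7fffffff : sample-region\n80000000-ffffffff : System RAM"
regs = iomem_regions(sample)
```

With the real map loaded, `covering_region(regs, 0x08000000)` returning `None` confirms the trigger address is unassigned.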

Is there an errata document for the Tegra SoCs, so we can check whether this is a duplicate or a new bug?

#1
Posted 02/14/2018 05:32 PM   
Enabling SMMU for PCIe (R28.2 has it) would result in an SMMU error when an unassigned address is accessed by PCIe. We will check your observations and get back to you. Since real endpoints (I mean off-the-shelf devices available on the market) don't keep issuing further reads when the completion for a previous read has not been received, this issue was not seen earlier.

#2
Posted 02/14/2018 06:29 PM   