TX1 pci-tegra hotplug doesn't work

Dear Sir or Madam,
We have custom hardware based on the Jetson TX1, in which a Cyclone V SoC is connected to the TX1 over PCIe. Our software is currently based on L4T 28.1 with kernel 4.4. We would like to use the PCIe hotplug feature provided by the pci-tegra driver; the GPIO NFC_INT (PI1) is reserved for this purpose.
Our expectation is that the kernel loads the pci-tegra driver as soon as possible, but that our PCIe device driver (CycV) is loaded only after the hotplug GPIO becomes active. If the pci-tegra driver runs before the hotplug GPIO is active, the driver releases its own driver structure, and when the hotplug GPIO later triggers the ISR, the kernel panics.
Detailed sequence in pci-tegra.c:
- pcie->num_ports remains 0 in tegra_pcie_check_ports() because no PCIe endpoint is found right after OS boot
- because pcie->num_ports == 0, pcie_delayed_detect() releases the pcie driver structure
- once the hotplug GPIO becomes active, gpio_pcie_detect_isr is called
- the parameter “arg” passed to the ISR now points to released memory with random content, which can cause a kernel panic in the scheduled work
Is our expectation about the driver load sequence correct?
Thank you for your investigation.

Hi attila,
We don’t support PCIe hotplug in the Linux for Tegra software release. The developer kit also does not have PRSNT pin routing.

However, it may work if your board implements a GPIO for presence detection. You can grep for these two keywords in pci-tegra.c:

tegra_pcie_prsnt_map_override()
"nvidia,presence-detection-gpio"

It may work if you add the property marked with “+” below:

pci@1,0 {
	nvidia,num-lanes = <4>;
	status = "okay";
+	nvidia,presence-detection-gpio = <&_YOUR_GPIO_PIN_>;
};

Once again, please note that the result is not guaranteed.

Although some code is present for hotplug, it is not supported.
Based on my understanding, what you need in this case is to defer the probe function of the PCIe host controller.
If you know how long the probe needs to be deferred, you can set that time in the device tree through ‘nvidia,boot-detect-delay’.
If you want to make it dynamic, i.e. based on some other event, you may have to modify the code so that probing starts only after that event.
You can use the same framework: in pcie_delayed_detect(), call wait_for_completion() before proceeding further, and have the ISR registered for the GPIO that indicates endpoint readiness call complete().
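
To illustrate the wait/complete handshake described above, here is a minimal userspace sketch, not driver code. It assumes POSIX threads as a stand-in for the kernel: the `completion_model` type, the `model_*` helpers, `pcie_model`, `endpoint_ready_isr()` and `delayed_detect()` are all hypothetical names made up for this example, and setting `num_ports` to 1 merely stands in for the real port enumeration.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct completion (hypothetical). */
struct completion_model {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            done;
};

static void model_init_completion(struct completion_model *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->done = false;
}

/* Analogue of complete(): marks the event and wakes the waiter. */
static void model_complete(struct completion_model *c)
{
    pthread_mutex_lock(&c->lock);
    c->done = true;
    pthread_cond_signal(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

/* Analogue of wait_for_completion(): blocks until the event was marked. */
static void model_wait_for_completion(struct completion_model *c)
{
    pthread_mutex_lock(&c->lock);
    while (!c->done)
        pthread_cond_wait(&c->cond, &c->lock);
    pthread_mutex_unlock(&c->lock);
}

/* Hypothetical, heavily reduced model of the host controller state. */
struct pcie_model {
    struct completion_model ep_ready;
    int num_ports;
};

/* Models the GPIO ISR: it only signals that the endpoint is ready. */
static void *endpoint_ready_isr(void *arg)
{
    struct pcie_model *pcie = arg;
    model_complete(&pcie->ep_ready);
    return NULL;
}

/* Models the delayed detect work: wait first, enumerate afterwards. */
static void *delayed_detect(void *arg)
{
    struct pcie_model *pcie = arg;
    model_wait_for_completion(&pcie->ep_ready);
    pcie->num_ports = 1; /* stand-in for the real port enumeration */
    return NULL;
}

/* Runs the worker and the "ISR" concurrently; returns the ports found. */
static int run_model(void)
{
    struct pcie_model pcie = { .num_ports = 0 };
    pthread_t worker, isr;

    model_init_completion(&pcie.ep_ready);
    pthread_create(&worker, NULL, delayed_detect, &pcie);
    pthread_create(&isr, NULL, endpoint_ready_isr, &pcie);
    pthread_join(isr, NULL);
    pthread_join(worker, NULL);
    return pcie.num_ports;
}
```

Because the `done` flag is checked under the lock, the order in which the two threads start does not matter: if the "ISR" fires first, the waiter returns immediately. In the kernel, the corresponding primitives are init_completion(), wait_for_completion() and complete() from <linux/completion.h>. Since the worker no longer reaches the enumeration step before the endpoint has signalled readiness, it should not hit the num_ports == 0 path at boot.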

Hello,
Adding “nvidia,presence-detection-gpio” doesn’t prevent the kernel panic. I also tried the “nvidia,boot-detect-delay” option, but this again leads to a kernel panic, although with a different root cause:

[ 7.183674] PC is at reset_control_deassert+0x0/0x38
[ 7.188650] LR is at tegra_pcie_restore_device+0xb0/0x138 [pci_tegra]
[ 7.195075] pc : [] lr : [] pstate: 60000145
[ 7.202453] sp : ffffffc0f96d7bf0
[ 7.205758] x29: ffffffc0f96d7bf0 x28: ffffffc0f96d4000
[ 7.211075] x27: 0000000000000000 x26: ffffffc0f96d4000
[ 7.216391] x25: ffffffbffc00c1c8 x24: 0000000000000001
[ 7.221706] x23: ffffffc0f8442500 x22: 00000001a894a048
[ 7.227018] x21: 0000000000000000 x20: 0000000000000000
[ 7.232331] x19: ffffffc0f9710418 x18: 0000007fe3a3d5c0
[ 7.237644] x17: 000000000042b2f0 x16: ffffffc00011e484
[ 7.242958] x15: 0000007fb471db98 x14: 0ffffffffffffffd
[ 7.248271] x13: 0000000000000038 x12: 0101010101010101
[ 7.253583] x11: 0000000000000005 x10: 0000000000000870
[ 7.258896] x9 : ffffffc0f96d79b0 x8 : ffffffc07993ae50
[ 7.264210] x7 : 0000000000000001 x6 : 000000000e3b20dc
[ 7.269522] x5 : 0000000000000000 x4 : 00000001a8af601e
[ 7.274835] x3 : 0000000000001608 x2 : 0000000000000000
[ 7.280148] x1 : ffffffc0f9710648 x0 : 0000000000000000
[ 7.285460]
[ 7.286946] Process rmmod (pid: 857, stack limit = 0xffffffc0f96d4020)
[ 7.293456] Call trace:
[ 7.295899] [] reset_control_deassert+0x0/0x38
[ 7.301894] [] pm_generic_runtime_resume+0x30/0x50
[ 7.308237] [] pm_genpd_default_restore_state+0xa4/0xc0
[ 7.315011] [] genpd_restore_dev+0x4c/0x64
[ 7.320657] [] pm_genpd_runtime_resume+0xfc/0x1c0
[ 7.326910] [] __rpm_callback+0x38/0x68
[ 7.332296] [] rpm_callback+0x68/0x90
[ 7.337509] [] rpm_resume+0x360/0x568
[ 7.342721] [] __pm_runtime_resume+0x70/0x94
[ 7.348543] [] __device_release_driver+0x38/0xd8
[ 7.354710] [] driver_detach+0x8c/0xc0
[ 7.360010] [] bus_remove_driver+0x90/0xb8
[ 7.365658] [] driver_unregister+0x44/0x50
[ 7.371306] [] platform_driver_unregister+0x10/0x18
[ 7.377753] [] tegra_pcie_exit_driver+0x10/0x38 [pci_tegra]
[ 7.384877] [] SyS_delete_module+0x11c/0x1b8
[ 7.390698] [] __sys_trace_return+0x0/0x4
[ 7.396258] ---[ end trace 6a0424cb7498ef71 ]---

I would like to use the dynamic detection with wait_for_completion(), but something seems to go wrong in the power management sequence before pcie_delayed_detect() is called.

This is the only change you need to make:

diff --git a/kernel-dts/tegra210-soc/tegra210-soc-base.dtsi b/kernel-dts/tegra210-soc/tegra210-soc-base.dtsi
index 8322ffec5626..2dd4f1378373 100644
--- a/kernel-dts/tegra210-soc/tegra210-soc-base.dtsi
+++ b/kernel-dts/tegra210-soc/tegra210-soc-base.dtsi
@@ -1691,6 +1691,8 @@
 
         iommus = <&smmu TEGRA_SWGROUP_AFI>;
 
+        nvidia,boot-detect-delay = <10000>;    /* delay probe by 10 sec */
+
         bus-range = <0x00 0xff>;
         #address-cells = <3>;
         #size-cells = <2>;

Is this what you have done too?

I modified the code where the “nvidia,boot-detect-delay” value is parsed rather than the device tree, but that should have the same effect.

Hello,
I tried the dynamic delayed probing of pci-tegra based on wait_for_completion() again, and it now seems to work.
Thanks and regards