Ubuntu PCIe driver port to L4T gets unhandled context fault

Hello. We have a PCIe driver that works on Ubuntu 14.04 on a PC, and we’d like to get it running on the TX2. After compiling against the L4T 32.1 header files and finding that we needed to disable ASPM, we’re running into this unhandled context fault:

Apr  8 00:59:54 TX2 kernel: [17690.030234] arm-smmu 12000000.iommu: Unhandled context fault: iova=0x8f702000, fsynr=0x280001, cb=21, sid=17(0x11 - AFI), pgd=0, pud=0, pmd=0, pte=0
Apr  8 00:59:54 TX2 kernel: [17690.183971] irq 66: nobody cared (try booting with the "irqpoll" option)
Apr  8 00:59:54 TX2 kernel: [17690.190669] CPU: 0 PID: 11121 Comm: XA-160M DAQ Tainted: G           O    4.9.140-tegra #1
Apr  8 00:59:54 TX2 kernel: [17690.190671] Hardware name: quill (DT)
Apr  8 00:59:54 TX2 kernel: [17690.190673] Call trace:
Apr  8 00:59:54 TX2 kernel: [17690.190681] [<ffffff800808bdb8>] dump_backtrace+0x0/0x198
Apr  8 00:59:54 TX2 kernel: [17690.190685] [<ffffff800808c37c>] show_stack+0x24/0x30
Apr  8 00:59:54 TX2 kernel: [17690.190689] [<ffffff800845baa0>] dump_stack+0x98/0xc0
Apr  8 00:59:54 TX2 kernel: [17690.190694] [<ffffff800812595c>] __report_bad_irq+0x3c/0xf8
Apr  8 00:59:54 TX2 kernel: [17690.190696] [<ffffff8008125dc0>] note_interrupt+0x2c8/0x318
Apr  8 00:59:54 TX2 kernel: [17690.190699] [<ffffff8008122c58>] handle_irq_event_percpu+0x50/0x60
Apr  8 00:59:54 TX2 kernel: [17690.190700] [<ffffff8008122cb8>] handle_irq_event+0x50/0x80
Apr  8 00:59:54 TX2 kernel: [17690.190703] [<ffffff8008126a80>] handle_fasteoi_irq+0xc8/0x1b8
Apr  8 00:59:54 TX2 kernel: [17690.190705] [<ffffff80081219d4>] generic_handle_irq+0x34/0x50
Apr  8 00:59:54 TX2 kernel: [17690.190707] [<ffffff80081220b8>] __handle_domain_irq+0x68/0xc0
Apr  8 00:59:54 TX2 kernel: [17690.190709] [<ffffff8008080d44>] gic_handle_irq+0x5c/0xb0
Apr  8 00:59:54 TX2 kernel: [17690.190711] [<ffffff8008082be8>] el1_irq+0xe8/0x18c
Apr  8 00:59:54 TX2 kernel: [17690.190714] [<ffffff80080bb298>] irq_exit+0xd0/0x118
Apr  8 00:59:54 TX2 kernel: [17690.190716] [<ffffff80081220bc>] __handle_domain_irq+0x6c/0xc0
Apr  8 00:59:54 TX2 kernel: [17690.190718] [<ffffff8008080d44>] gic_handle_irq+0x5c/0xb0
Apr  8 00:59:54 TX2 kernel: [17690.190720] [<ffffff8008082be8>] el1_irq+0xe8/0x18c
Apr  8 00:59:54 TX2 kernel: [17690.190721] handlers:
Apr  8 00:59:54 TX2 kernel: [17690.192990] [<ffffff8008c39bf0>] tegra_mcerr_hard_irq threaded [<ffffff8008c39da0>] tegra_mcerr_thread
Apr  8 00:59:54 TX2 kernel: [17690.202298] Disabling IRQ #66
Apr  8 00:59:54 TX2 kernel: [17690.205312] mc-err: (255) csr_afir: EMEM address decode error
Apr  8 00:59:54 TX2 kernel: [17690.211126] mc-err:   status = 0x2032700e; addr = 0x3ffffffc0
Apr  8 00:59:54 TX2 kernel: [17690.216989] mc-err:   secure: yes, access-type: read

The driver uses DMA transfers to an ADC/DAC PCIe card. The TX2 is running L4T 32.1 from JetPack 4.2. Since the driver works on an Intel PC with Ubuntu, I thought there might be an endian issue. Code inspection has not found one yet, but I may have missed something. Is this an SMMU/IOMMU setup issue?

Note the iova address is the beginning of one of the two buffers allocated for DMA.
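
For anyone comparing a fault like this against their own buffer addresses, here is a rough Python sketch that pulls the fields out of the arm-smmu line above and checks which DMA buffer the iova lands in. The buffer names, the sizes, and the second buffer's base address are made up for illustration (only 0x8f702000 comes from the log), and the WNR bit position is taken from my reading of the Linux arm-smmu driver, so treat it as an assumption to verify against your kernel source.

```python
import re

# Fault line copied from the kernel log above.
FAULT_LINE = (
    "arm-smmu 12000000.iommu: Unhandled context fault: iova=0x8f702000, "
    "fsynr=0x280001, cb=21, sid=17(0x11 - AFI), pgd=0, pud=0, pmd=0, pte=0"
)

# Assumption: write-not-read flag is bit 4 of FSYNR0 (as in the Linux arm-smmu driver).
FSYNR0_WNR = 1 << 4

# Hypothetical buffer table: (name, base iova, size in bytes).
# Only the first base address is from the log; everything else is a placeholder.
DMA_BUFFERS = [
    ("buf0", 0x8F702000, 0x100000),
    ("buf1", 0x8F802000, 0x100000),
]


def parse_fault(line):
    """Return the key=value fields of an arm-smmu fault line as integers."""
    fields = {}
    for key, value in re.findall(r"(\w+)=(0x[0-9a-fA-F]+|\d+)", line):
        fields[key] = int(value, 0)  # base 0 handles both hex and decimal
    return fields


def is_write(fields):
    """True if the faulting access was a write, False if it was a read."""
    return bool(fields["fsynr"] & FSYNR0_WNR)


def has_translation(fields):
    """True if any page-table level reported in the log is non-zero."""
    return any(fields[level] for level in ("pgd", "pud", "pmd", "pte"))


def find_buffer(iova, buffers):
    """Return (name, offset) of the buffer containing iova, or None."""
    for name, base, size in buffers:
        if base <= iova < base + size:
            return name, iova - base
    return None


if __name__ == "__main__":
    f = parse_fault(FAULT_LINE)
    print("stream id:", f["sid"], "context bank:", f["cb"])
    print("access:", "write" if is_write(f) else "read")
    print("page tables populated:", has_translation(f))
    print("buffer hit:", find_buffer(f["iova"], DMA_BUFFERS))
```

With the line from this log, the access decodes as a read (matching the "access-type: read" in the mc-err output) and every page-table level is zero. That would be consistent with the address handed to the card never having been mapped through the SMMU for the AFI stream, though I can't confirm that from the log alone.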

Unfortunately, the driver author is no longer with the company, and I’m new to both the project and kernel-level debugging. I’d appreciate any tips on debugging this problem. Is there information available on how the TX2 hardware differs from a typical PC, specifically regarding memory management and how to manage DMA buffers?

Thanks in advance.

Whoops, I forgot to mark this as resolved. We disabled the SMMU for PCIe using the .dtsi file change shown in comment #7 of https://devtalk.nvidia.com/default/topic/1026334/pcie-dma-problem-between-tx2-amp-fpga/