What is the current status of PCIe DMA?

Hi

We have a board which has an FPGA connected via PCIe.

We have a working driver and DMA solution on R23.2.

We have updated to R24.2 and now our DMA doesn’t work.

I have searched the forum for PCIe-related posts, and there are so many that it looks like there has been a monumental screw-up with the PCIe DMA/SMMU. Can somebody please explain to me exactly which patches are required, or when NVIDIA plans to fix this?

Thanks

We have enabled SMMU for PCIe in R24.2, and that should be transparent to the PCIe endpoint’s driver.
What exactly is the issue you are facing, and would you be able to share the logs? (You may need to enable a higher log level with ‘dmesg -n 8’ before loading your driver.)

Hi Vidyas

I have checked back and it turns out that I was using R24.1, not R23.2. However, the issue still remains when upgrading from R24.1 to R24.2.1: my PCIe driver does not work for some reason :-(

Apologies in advance for the long post but I will try to make things as clear as I can to help debug the issue.

During development we were using a Jetson TX1 board with a Topic Miami Xilinx Zynq 7030 PCIe card in the x4 PCIe slot, running Jetpack 2.2 with 64-bit userspace, which uses the R24.1 kernel from May 2016.

I’ve just gone back to the Jetson TX1 setup, verified that everything still works, and supplied the correct output in the paragraphs below…

I will try to describe the way the system works a little bit…

We have an FPGA implementation based on Xilinx application note xapp1052, which describes a bus-mastering DMA endpoint. It uses the hardware PCIe core on the Xilinx Zynq 7030 to present an endpoint that can bus master the TX1 memory. Our FPGA implementation accepts raw input video frames from the TX1 over PCIe, analyses them and returns the results to the TX1 over PCIe (or, for test purposes, simply inverts 32-bit words).

The driver is very simple. It gets the PCIe device context from the vendor and device id that our endpoint reports.

R24.1

sudo lspci
00:01.0 PCI bridge: NVIDIA Corporation Device 0fae (rev a1)
01:00.0 Memory controller: Xilinx Corporation Device 7028

After some initialisation the driver allocates some contiguous DMA-coherent memory using pci_alloc_consistent() for the endpoint to DMA in and out of.

Some debug output of my driver during loading

[  149.771331] xbmd: Init: Base hw val 13000000
[  149.771346] xbmd: Init: Base hw len 2048
[  149.771483] xbmd: Init: Virt HW address FFFFFF800AD76000
[  149.771491] xbmd: Init: Device IRQ: 130
[  149.771506] xbmd: Init: Initialize Hardware Done..
[  149.771514] xbmd: ISR Setup..(Forcing to 130)
[  149.771550] PCI: enabling device 0000:01:00.0 (0140 -> 0142)
[  149.771576] xbmd: Read Buffer Allocation: FFFFFFC05E800000->DE800000
[  149.771586] xbmd: Write Buffer Allocation: FFFFFFC05EC00000->DEC00000
[  149.771599] xbmd: Init: module registered
[  149.771605] xbmd driver is loaded

The FPGA implementation is quite simple: for reading, you set a read address and a length in the FPGA’s PCIe config registers, then trigger the read by setting a bit in another register. Any write() calls into the driver simply fill the DMA-coherent write buffer with the input data, and any read() calls copy the contents of the DMA-coherent read buffer into the output data buffer. ioctl() calls are used to start a transfer once the write buffer is filled, and also to get the return data back from the FPGA before reading the read buffer.

In our test example we send a frame of I420 YUV 1080p (4MB) video over PCIe to the Zynq FPGA in 4096-byte chunks. Once the whole frame is received, the FPGA inverts the first 32-bit data value of each chunk (XORs it with 0xffffffff), then sends the whole frame back to the TX1 over PCIe in 4096-byte chunks. The video data is not real video data but a completely blank frame (all 0x00s) with an incrementing count value inserted every 4096 bytes. This means that the first four bytes of any 4096-byte PCIe data block can be printed in the driver to make sure the data is correct.

For example the first blocks out will be

0x00 0x00 0x00 0x00 0x00 0x00 0x00 ...
0x01 0x00 0x00 0x00 0x00 0x00 0x00 ...
0x02 0x00 0x00 0x00 0x00 0x00 0x00 ...
etc etc

On the way back the sequence bytes are inverted so the data is

0xff 0xff 0xff 0xff 0x00 0x00 0x00 ...
0xfe 0xff 0xff 0xff 0x00 0x00 0x00 ... 
0xfd 0xff 0xff 0xff 0x00 0x00 0x00 ... 
etc etc
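
The pattern described above can be sketched in userspace C. This is a minimal illustration, not the actual driver or FPGA code: the function names (fill_test_frame, invert_chunk_headers, first_bad_chunk) are hypothetical, and the counter is assumed to be stored little-endian, as the byte order in the examples suggests.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_SIZE 4096u /* PCIe transfer chunk size used in the test */

/* Write a 32-bit value in little-endian byte order (assumed layout). */
static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)((v >> 8) & 0xff);
    p[2] = (uint8_t)((v >> 16) & 0xff);
    p[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Read a 32-bit little-endian value. */
static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Blank frame with an incrementing counter at the start of each chunk. */
static void fill_test_frame(uint8_t *buf, size_t len)
{
    uint32_t count = 0;

    memset(buf, 0x00, len);
    for (size_t off = 0; off + 4 <= len; off += CHUNK_SIZE)
        put_le32(buf + off, count++);
}

/* Model of the FPGA loopback: XOR the first word of each chunk with 0xffffffff. */
static void invert_chunk_headers(uint8_t *buf, size_t len)
{
    for (size_t off = 0; off + 4 <= len; off += CHUNK_SIZE)
        put_le32(buf + off, get_le32(buf + off) ^ 0xffffffffu);
}

/* Return the index of the first chunk whose header is not the inverted
 * counter, or -1 if every chunk matches. */
static long first_bad_chunk(const uint8_t *buf, size_t len)
{
    uint32_t count = 0;

    for (size_t off = 0; off + 4 <= len; off += CHUNK_SIZE, count++)
        if (get_le32(buf + off) != (count ^ 0xffffffffu))
            return (long)count;
    return -1;
}
```

For a 4149248-byte frame this gives 1013 chunks, so the last counter value is 1012 (0x3f4), which inverts to 0xfffffc0b.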

I added dmesg output from a working driver during transfer to validate the PCIe transfers

Write (TX1->FPGA)

[  175.080223] Doing a block write of 4149248 bytes from 0xde800000 in 4096 byte chunks from X1 into FPGA endpoint
[  175.094384] (0000) DONE 4096 bytes from X1 to FPGA at 0xde800000 - 0x00 0x00 0x00 0x00
[  175.102513] (0001) DONE 4096 bytes from X1 to FPGA at 0xde801000 - 0x01 0x00 0x00 0x00
[  175.110611] (0002) DONE 4096 bytes from X1 to FPGA at 0xde802000 - 0x02 0x00 0x00 0x00
[  175.118704] (0003) DONE 4096 bytes from X1 to FPGA at 0xde803000 - 0x03 0x00 0x00 0x00
...
[  183.320414] (1009) DONE 4096 bytes from X1 to FPGA at 0xdebf1000 - 0xf1 0x03 0x00 0x00
[  183.328859] (1010) DONE 4096 bytes from X1 to FPGA at 0xdebf2000 - 0xf2 0x03 0x00 0x00
[  183.336996] (1011) DONE 4096 bytes from X1 to FPGA at 0xdebf3000 - 0xf3 0x03 0x00 0x00
[  183.345077] (1012) DONE 4096 bytes from X1 to FPGA at 0xdebf4000 - 0xf4 0x03 0x00 0x00
[  183.353029] DONE 4149248 bytes from X1 to FPGA at 0x003f5000 in in 4096 byte chunks

In the FPGA I can verify that the count sequence is received by inspecting the data in the debugger

Read (FPGA->TX1)

[  183.360723] Doing a block read of 4149248 bytes from 0xdec00000 in 4096 byte chunks from FPGA endpoint to X1
[  183.370746] (0000) DONE 4096 bytes read into X1 at 0xdec00000 - 0xff 0xff 0xff 0xff
[  183.378596] (0001) DONE 4096 bytes read into X1 at 0xdec01000 - 0xfe 0xff 0xff 0xff
[  183.386776] (0002) DONE 4096 bytes read into X1 at 0xdec02000 - 0xfd 0xff 0xff 0xff
[  183.394807] (0003) DONE 4096 bytes read into X1 at 0xdec03000 - 0xfc 0xff 0xff 0xff
...
[  191.342456] (1009) DONE 4096 bytes read into X1 at 0xdeff1000 - 0x0e 0xfc 0xff 0xff
[  191.350307] (1010) DONE 4096 bytes read into X1 at 0xdeff2000 - 0x0d 0xfc 0xff 0xff
[  191.358155] (1011) DONE 4096 bytes read into X1 at 0xdeff3000 - 0x0c 0xfc 0xff 0xff
[  191.366007] (1012) DONE 4096 bytes read into X1 at 0xdeff4000 - 0x0b 0xfc 0xff 0xff
[  191.373654] DONE 4149248 bytes read into X1 at 0xdec00000 in in 4096 byte chunks

As you can see the data is being received in the FPGA, the counter value is inverted and data is arriving back in the TX1 memory all in good order.

R24.2.1

So I reflashed the MB1 board with Jetpack 2.3.1

# R24 (release), REVISION: 2.1, GCID: 8028265, BOARD: t210ref, EABI: aarch64, DATE: Thu Nov 10 03:51:59 UTC 2016
sudo lspci
00:01.0 PCI bridge: NVIDIA Corporation Device 0fae (rev a1)
01:00.0 Memory controller: Xilinx Corporation Device 7028

Driver output during load

[  282.946485] xbmd: Init: Base hw val 13000000
[  282.946495] xbmd: Init: Base hw len 2048
[  282.946746] xbmd: Init: Virt HW address FFFFFF8009FFE000
[  282.946753] xbmd: Init: Device IRQ: 130
[  282.946762] xbmd: Init: Initialize Hardware Done..
[  282.946766] xbmd: ISR Setup..(Forcing to 130)
[  282.946795] PCI: enabling device 0000:01:00.0 (0140 -> 0142)
[  282.946812] xbmd: Read Buffer Allocation: FFFFFFC07E000000->FE000000
[  282.946818] xbmd: Write Buffer Allocation: FFFFFFC07E400000->FE400000
[  282.946826] xbmd: Init: module registered
[  282.946830] xbmd driver is loaded

So I guess the only real change is the physical address of the coherent DMA buffer.

Write (TX1->FPGA)

[  396.510420] Doing a block write of 4149248 bytes from 0xfe000000 in 4096 byte chunks from X1 into FPGA endpoint
[  396.524091] (0000) DONE 4096 bytes from X1 to FPGA at 0xfe000000 - 0x00 0x00 0x00 0x00
[  396.524093] smmu_dump_pagetable(): fault_address=0x00000000fe000000 pa=0xffffffffffffffff bytes=ffffffffffffffff #pte=0 in L2
[  396.524098] mc-err: (0) csr_afir: EMEM decode error on PDE or PTE entry
[  396.524100] mc-err:   status = 0x6000000e; addr = 0xfe000000
[  396.524104] mc-err:   secure: no, access-type: read, SMMU fault: nr-nw-s
[  396.562349] smmu_dump_pagetable(): fault_address=0x00000000fe001000 pa=0xffffffffffffffff bytes=ffffffffffffffff #pte=0 in L2
[  396.562467] (0001) DONE 4096 bytes from X1 to FPGA at 0xfe001000 - 0x01 0x00 0x00 0x00
[  396.562643] (0002) DONE 4096 bytes from X1 to FPGA at 0xfe002000 - 0x02 0x00 0x00 0x00
[  396.562817] (0003) DONE 4096 bytes from X1 to FPGA at 0xfe003000 - 0x03 0x00 0x00 0x00
[  396.562991] (0004) DONE 4096 bytes from X1 to FPGA at 0xfe004000 - 0x04 0x00 0x00 0x00

I guess alarm bells should start ringing because of the page faults but I don’t really understand what is happening here.

In the FPGA, instead of the count sequence, all of the data reads as 0xFFFFFFFF.

Because of this I changed the FPGA code to regenerate the count sequence instead of just inverting the received buffer so that I could send something I could verify on the TX1 side

Read (FPGA->TX1)

[  399.572657] Doing a block read of 4149248 bytes from 0xfe400000 in 4096 byte chunks from FPGA endpoint to X1
[  399.572903] (0000) DONE 4096 bytes read into X1 at 0xfe400000 - 0x00 0x00 0x00 0x00
[  399.573140] (0001) DONE 4096 bytes read into X1 at 0xfe401000 - 0x00 0x00 0x00 0x00
[  399.573422] (0002) DONE 4096 bytes read into X1 at 0xfe402000 - 0x00 0x00 0x00 0x00
[  399.573674] (0003) DONE 4096 bytes read into X1 at 0xfe403000 - 0x00 0x00 0x00 0x00

So in the TX1 instead of the count sequence all I read back is zeros. :-(

So…

With kernel R24.1 my driver works perfectly.

With Kernel R24.2.1 my driver does not work with DMA reads or writes.

Any suggestions?

Robert

Can I just add that if I comment out the following line from the pcie-controller section of the device tree in R24.2.1 it all starts working again. :-)

iommus = <&smmu TEGRA_SWGROUP_AFI>;

So I guess either that’s the end of things, or could I get some explanation of how to get my driver working with the pcie-controller enabled in the SMMU?

Robert

Removing “iommus = <&smmu TEGRA_SWGROUP_AFI>;” from the PCIe node disables the SMMU for PCIe, which is equivalent to R23.2 as far as PCIe is concerned. So, no wonder that is working :-)
Even with the SMMU enabled, ideally, there shouldn’t be any issue.
Can you please confirm the following?
When memory is allocated using dma_alloc_coherent(), you have to give the PCIe device the bus address for its data reads/writes, i.e. ‘dma_handle’, the third argument of the API.
From your update in #3, it appears to me that you might be using the physical address instead of the bus address. (NOTE: if the SMMU is not enabled for PCIe, the physical address and the bus address are the same. With the SMMU enabled for PCIe, the endpoint should access the TX1’s system memory using the bus address, which the SMMU then translates to a physical address.)
Another reason for my doubt: once the SMMU is enabled for PCIe, the bus address given to the endpoint’s DMA will always be in the 0x80000000~0xFFF00000 region. In your case the addresses are 0xFE000000 and 0xFE400000, which are within that region, but I wonder why the allocation would happen towards the end of the region when no other big-chunk allocations are happening in the system. That suggests you might be giving the physical address to the DMA instead of the bus address.
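
To illustrate the bus address versus physical address distinction, here is a toy userspace model of a page-granular translation table. It is only a sketch under stated assumptions: it is not the Tegra SMMU or any kernel API, and the type and function names (toy_smmu, toy_smmu_map, toy_smmu_translate) and the example addresses are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12            /* 4K pages */
#define TOY_PAGE_OFFSET_MASK 0xfffu  /* low 12 bits: offset within a page */
#define TOY_MAX_MAPPINGS 16

/* One page mapping: bus (I/O virtual) page number -> physical page number. */
struct toy_mapping {
    uint64_t bus_page;
    uint64_t phys_page;
};

struct toy_smmu {
    struct toy_mapping map[TOY_MAX_MAPPINGS];
    size_t count;
};

/* Record that the page at bus address `bus` maps to physical address `phys`.
 * Returns 0 on success, -1 if the table is full. */
static int toy_smmu_map(struct toy_smmu *s, uint64_t bus, uint64_t phys)
{
    if (s->count == TOY_MAX_MAPPINGS)
        return -1;
    s->map[s->count].bus_page = bus >> TOY_PAGE_SHIFT;
    s->map[s->count].phys_page = phys >> TOY_PAGE_SHIFT;
    s->count++;
    return 0;
}

/* Translate a bus address issued by the device. Returns 0 and stores the
 * physical address on success, or -1 (a "fault") if no mapping exists. */
static int toy_smmu_translate(const struct toy_smmu *s, uint64_t bus,
                              uint64_t *phys_out)
{
    for (size_t i = 0; i < s->count; i++) {
        if (s->map[i].bus_page == (bus >> TOY_PAGE_SHIFT)) {
            *phys_out = (s->map[i].phys_page << TOY_PAGE_SHIFT) |
                        (bus & TOY_PAGE_OFFSET_MASK);
            return 0;
        }
    }
    return -1;
}
```

In this model, if a buffer at physical address 0xfe000000 is mapped at bus address 0x80000000, a device access to 0x80000000 translates successfully, while a device access to 0xfe000000 finds no mapping and faults.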

Hi Vidyas

I’m pretty sure I’m using the correct address - here is a snippet of the code

// Allocate the read buffer with size BUF_SIZE and return the starting address
  gReadBuffer = pci_alloc_consistent(gDev, BUF_SIZE, &gReadHWAddr);
  if (NULL == gReadBuffer) {
    printk(KERN_CRIT"%s: Init: Unable to allocate gBuffer.\n",gDrvrName);
    return (CRIT_ERR);
  }
  // Print Read buffer size and address to kernel log
  printk(KERN_INFO"%s: Read Buffer Allocation: %llX->%llX\n", gDrvrName, (u64)gReadBuffer, (u64)gReadHWAddr);

  // Allocate the write buffer with size BUF_SIZE and return the starting address
  gWriteBuffer = pci_alloc_consistent(gDev, BUF_SIZE, &gWriteHWAddr);
  if (NULL == gWriteBuffer) {
    printk(KERN_CRIT"%s: Init: Unable to allocate gBuffer.\n",gDrvrName);
    return (CRIT_ERR);
  }
  // Print Write buffer size and address to kernel log  
  printk(KERN_INFO"%s: Write Buffer Allocation: %llX->%llX\n", gDrvrName, (u64)gWriteBuffer, (u64)gWriteHWAddr);

If you look at the debug you will see gReadHWAddr and gWriteHWAddr are passed as dma_handle and it’s these that I am using.

What is the difference between physical addresses and bus addresses? As far as I was aware there are virtual kernel addresses e.g. 0xFFFFFFC07E000000 and their corresponding physical address 0xFE000000 for the read buffer shown below?

[  282.946812] xbmd: Read Buffer Allocation: FFFFFFC07E000000->FE000000
[  282.946818] xbmd: Write Buffer Allocation: FFFFFFC07E400000->FE400000

I didn’t mention this before but I also pass coherent_pool=16M on the kernel command line. Would this explain anything?

Hi mrbmcg,
Your code looks fine. Can you also confirm that you are passing a valid ‘gDev’ here? That is key for creating an SMMU mapping for PCIe. I don’t think coherent_pool=16M plays any role here at this point.

Hi Vidyas

I’m pretty sure the gDev is valid.

# lspci
00:01.0 PCI bridge: NVIDIA Corporation Device 0fae (rev a1)
01:00.0 Memory controller: Xilinx Corporation Device 7028

# lspci -n
00:01.0 0604: 10de:0fae (rev a1)
01:00.0 0580: 10ee:7028

The code below is used to obtain it

// Defines the Vendor ID.  Must be changed if core generated did not set the Vendor ID to the same value
#define PCI_VENDOR_ID_XILINX      0x10ee

// Defines the Device ID.  Must be changed if core generated did not set the Device ID to the same value
#define PCI_DEVICE_ID_XILINX_PCIE 0x7028

...

  // Find the Xilinx EP device.
  gDev = pci_get_device (PCI_VENDOR_ID_XILINX, PCI_DEVICE_ID_XILINX_PCIE, gDev);
  if (NULL == gDev) {

    // If a matching device or vendor ID is not found, return failure and update kernel log. 
    // NOTE: In fedora systems, the kernel log is located at: /var/log/messages
    printk(KERN_WARNING"%s: Init: Hardware not found.\n", gDrvrName);
    return (CRIT_ERR);
  }

...

Robert

Sorry for the delay in response.
Can you please dump the IOVA info after your driver is loaded, using

cat /sys/kernel/debug/70019000.iommu/as008/iovainfo

Before dumping, please check that as008 is the correct one for PCIe. Going into that directory and looking at the entries will tell us which module it represents.

Hi Vidyas

I’m working on another project for a day or so but I will get back to you when I set the board up again.

Robert

Hi

This is for when the SMMU is enabled I guess?

My driver is called xbmd so it is shown loaded below…

# lsmod
Module                  Size  Used by
xbmd                   15394  0
GobiSerial             13068  0
GobiNet                81896  0
bcmdhd               7464928  0
cfg80211              450459  1 bcmdhd
bluedroid_pm           11196  0
# cd /sys/kernel/debug/70019000.iommu/
# cd as008/
# ls
0000:00:01.0             iova_dump                iovainfo
1003000.pcie-controller  iova_to_phys
# cat /sys/kernel/debug/70019000.iommu/as008/iovainfo
---[ 0x0000:0000 ]---
---[ 0x4000:0000 ]---
---[ 0x8000:0000 ]---
---[ 0xc000:0000 ]---

Edited to add…

With the SMMU disabled

# cat /sys/kernel/debug/70019000.iommu/as008/iovainfo
---[ 0x0000:0000 ]---
---[ 0x4000:0000 ]---
---[ 0x8000:0000 ]---
0x80000000-0x80001000           4K RW-
0x80002000-0x80003000           4K RW-
---[ 0xc000:0000 ]---
#

---[ 0x0000:0000 ]---
---[ 0x4000:0000 ]---
---[ 0x8000:0000 ]---
---[ 0xc000:0000 ]---
The above looks like the case of the SMMU being disabled, and
---[ 0x0000:0000 ]---
---[ 0x4000:0000 ]---
---[ 0x8000:0000 ]---
0x80000000-0x80001000 4K RW-
0x80002000-0x80003000 4K RW-
---[ 0xc000:0000 ]---
seems to be the case of the SMMU being enabled, with mappings created at 0x80000000 and 0x80002000, each 4K in size.

The mappings present in the SMMU records don’t match the error addresses observed here, i.e. 0xfe000000 and 0xfe001000.
Would it be possible to share your driver code offline?
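
The comparison of mapped ranges against a faulting address can be sketched as a small userspace helper. This is an illustration only: the function name iova_line_covers is hypothetical, and it assumes each mapping line has the "0xSTART-0xEND ..." shape shown in the dumps in this thread, with the end address exclusive.

```c
#include <stdio.h>

/* Return 1 if `line` is a range line ("0xSTART-0xEND ...") whose range
 * contains `addr`, otherwise 0. The end address is treated as exclusive
 * (assumption based on the 4K entries shown in this thread). */
static int iova_line_covers(const char *line, unsigned long long addr)
{
    unsigned long long start, end;

    /* Header lines such as "---[ 0x8000:0000 ]---" do not parse as a range. */
    if (sscanf(line, "%llx-%llx", &start, &end) != 2)
        return 0;
    return addr >= start && addr < end;
}
```

Feeding each line of the dump to this helper with 0xfe000000 as the address returns 0 for both mapping lines quoted in this thread, which is the mismatch being discussed.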

Hi Vidyas

The output with the mappings that you quote in your last post is from when the pcie-controller is disabled in the SMMU.

When the pcie-controller is enabled in the SMMU I get nothing in the iovainfo ???

I’ve sent the code via PM, but it will not load without the PCIe hardware present.

Cheers
Robert

I’ve modified your driver, fixing the way it gets registered as a PCIe driver with the sub-system and the way it gets the pdev handle. I’ve sent a PM with the modified driver. Please check that and let us know the result.

Hi mrbmcg,

Have you tried the suggestion?
Please let us know the result.

Thanks

No, not yet sorry

Hi mrbmcg,

Have you managed to try the suggestion?
Any further assistance required?

Thanks

Hi

I haven’t, unfortunately, but you can close this as I believe another engineer here has verified that it worked.

Robert

Hello,

I think we are having a similar problem, could you publish the parts that were changed?

Thanks,
kabraham

Sorry… I don’t have that history.
@mrbmcg, can you please post the changes?