Hi Vidyas
I have checked back and it turns out that I was using R24.1, not R23.2. However, the issue still remains when upgrading from R24.1 to R24.2.1: my PCIe driver does not work for some reason :-(
Apologies in advance for the long post but I will try to make things as clear as I can to help debug the issue.
During development we were using a Jetson TX1 board with a Topic Miami Xilinx Zynq 7030 PCIe card in the x4 PCIe slot, running Jetpack 2.2 with a 64-bit userspace, which uses kernel R24.1 from May 2016.
I’ve just gone back to the Jetson TX1 setup, verified that everything still works, and supplied the correct output in the paragraphs below…
I will try to describe the way the system works a little bit…
We have an FPGA implementation that is based on Xilinx application note xapp1052, which is for a bus mastering DMA endpoint. This utilises the hardware PCIe core on the Xilinx Zynq 7030 to present an endpoint that can bus master the TX1 memory. Our FPGA implementation accepts raw input video frames from the TX1 over PCIe, analyses them and returns the results to the TX1 over PCIe (or, for test purposes, simply inverts 32-bit words).
The driver is very simple. It gets the PCIe device context from the vendor and device IDs that our endpoint reports.
R24.1
sudo lspci
00:01.0 PCI bridge: NVIDIA Corporation Device 0fae (rev a1)
01:00.0 Memory controller: Xilinx Corporation Device 7028
After some initialisation the driver allocates some contiguous memory using pci_alloc_consistent() for the endpoint to DMA into and out of.
Some debug output from my driver during loading:
[ 149.771331] xbmd: Init: Base hw val 13000000
[ 149.771346] xbmd: Init: Base hw len 2048
[ 149.771483] xbmd: Init: Virt HW address FFFFFF800AD76000
[ 149.771491] xbmd: Init: Device IRQ: 130
[ 149.771506] xbmd: Init: Initialize Hardware Done..
[ 149.771514] xbmd: ISR Setup..(Forcing to 130)
[ 149.771550] PCI: enabling device 0000:01:00.0 (0140 -> 0142)
[ 149.771576] xbmd: Read Buffer Allocation: FFFFFFC05E800000->DE800000
[ 149.771586] xbmd: Write Buffer Allocation: FFFFFFC05EC00000->DEC00000
[ 149.771599] xbmd: Init: module registered
[ 149.771605] xbmd driver is loaded
The FPGA implementation is quite simple. For reading, you set a read address and a length in the FPGA's PCIe config registers, then trigger the read by setting a bit in another register. Any write() calls into the driver simply fill the DMA-coherent write buffer with the input data, and any read() calls copy the contents of the DMA-coherent read buffer into the output data buffer. ioctl() calls are used to start a transfer once the write buffer is filled, and also to get the return data back from the FPGA before reading the read buffer.
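To make the write() / ioctl() / read() flow a bit more concrete, here is a rough userspace model of it. This is only a sketch and not the actual xbmd code: the struct, the function names and the buffer size are all made up for illustration, and the "endpoint" is just a memcpy loopback standing in for the real register programming and DMA.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_BUF_LEN 8192u  /* small stand-in for the real 4 MB coherent buffers */

/* Hypothetical model of the driver state; none of these names come from xbmd. */
struct xfer_model {
    uint8_t wr_buf[MODEL_BUF_LEN];  /* models the coherent write buffer (TX1 -> FPGA) */
    uint8_t rd_buf[MODEL_BUF_LEN];  /* models the coherent read buffer (FPGA -> TX1) */
    size_t  wr_len;                 /* bytes staged by the last write() */
    size_t  rd_len;                 /* bytes made available by the last transfer */
};

/* Models write(): copy caller data into the write buffer; nothing is sent yet. */
size_t model_write(struct xfer_model *m, const void *src, size_t len)
{
    if (len > MODEL_BUF_LEN)
        len = MODEL_BUF_LEN;
    memcpy(m->wr_buf, src, len);
    m->wr_len = len;
    return len;
}

/* Models the "start transfer" ioctl(). In the real driver this is where the
 * address and length registers are set and the start bit is written; here
 * the endpoint is assumed to simply loop the data back unchanged. */
void model_ioctl_start(struct xfer_model *m)
{
    memcpy(m->rd_buf, m->wr_buf, m->wr_len);
    m->rd_len = m->wr_len;
}

/* Models read(): copy whatever the endpoint returned out to the caller. */
size_t model_read(const struct xfer_model *m, void *dst, size_t len)
{
    if (len > m->rd_len)
        len = m->rd_len;
    memcpy(dst, m->rd_buf, len);
    return len;
}
```

The point of the model is the ordering: model_read() returns nothing until model_ioctl_start() has run, just as the read buffer only holds valid data after a transfer has been triggered.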
In our test example we send a frame of I420 YUV 1080p (4MB) video over PCIe to the Zynq FPGA in 4096 byte chunks. Once the whole frame is received, the FPGA inverts the first 32-bit data value of each chunk (XORs it with 0xffffffff), then sends the whole frame back to the TX1 via PCIe in 4096 byte chunks. The video data is not real video data but a completely blank frame (all 0x00s) with an incrementing count value inserted every 4096 bytes. This means that the first four bytes of any 4096 byte PCIe data block can be printed in the driver to make sure the data is correct.
For example the first blocks out will be
0x00 0x00 0x00 0x00 0x00 0x00 0x00 ...
0x01 0x00 0x00 0x00 0x00 0x00 0x00 ...
0x02 0x00 0x00 0x00 0x00 0x00 0x00 ...
etc etc
On the way back the sequence bytes are inverted so the data is
0xff 0xff 0xff 0xff 0x00 0x00 0x00 ...
0xfe 0xff 0xff 0xff 0x00 0x00 0x00 ...
0xfd 0xff 0xff 0xff 0x00 0x00 0x00 ...
etc etc
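For anyone wanting to reproduce the pattern, a minimal sketch of how such a frame could be built and inverted in plain C is below. The function names are hypothetical and this is not the code from our test application or FPGA; it assumes the count is stored little-endian, which matches the byte order in the examples above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_BYTES 4096u

/* Zero the whole frame, then store the chunk index as a little-endian
 * 32-bit value in the first four bytes of every 4096-byte chunk. */
void fill_test_frame(uint8_t *frame, size_t len)
{
    size_t off, idx;

    memset(frame, 0, len);
    for (off = 0, idx = 0; off + 4 <= len; off += CHUNK_BYTES, idx++) {
        frame[off + 0] = (uint8_t)(idx & 0xff);
        frame[off + 1] = (uint8_t)((idx >> 8) & 0xff);
        frame[off + 2] = (uint8_t)((idx >> 16) & 0xff);
        frame[off + 3] = (uint8_t)((idx >> 24) & 0xff);
    }
}

/* Software model of the FPGA test step: XOR the first 32-bit word of
 * each chunk with 0xffffffff and leave the rest of the chunk untouched. */
void invert_chunk_headers(uint8_t *frame, size_t len)
{
    size_t off, i;

    for (off = 0; off + 4 <= len; off += CHUNK_BYTES)
        for (i = 0; i < 4; i++)
            frame[off + i] ^= 0xff;
}
```

Calling fill_test_frame() and then invert_chunk_headers() on the same buffer gives the two byte sequences listed above for the outbound and return directions.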
I added dmesg output from a working driver during transfer to validate the PCIe transfers
Write (TX1->FPGA)
[ 175.080223] Doing a block write of 4149248 bytes from 0xde800000 in 4096 byte chunks from X1 into FPGA endpoint
[ 175.094384] (0000) DONE 4096 bytes from X1 to FPGA at 0xde800000 - 0x00 0x00 0x00 0x00
[ 175.102513] (0001) DONE 4096 bytes from X1 to FPGA at 0xde801000 - 0x01 0x00 0x00 0x00
[ 175.110611] (0002) DONE 4096 bytes from X1 to FPGA at 0xde802000 - 0x02 0x00 0x00 0x00
[ 175.118704] (0003) DONE 4096 bytes from X1 to FPGA at 0xde803000 - 0x03 0x00 0x00 0x00
...
[ 183.320414] (1009) DONE 4096 bytes from X1 to FPGA at 0xdebf1000 - 0xf1 0x03 0x00 0x00
[ 183.328859] (1010) DONE 4096 bytes from X1 to FPGA at 0xdebf2000 - 0xf2 0x03 0x00 0x00
[ 183.336996] (1011) DONE 4096 bytes from X1 to FPGA at 0xdebf3000 - 0xf3 0x03 0x00 0x00
[ 183.345077] (1012) DONE 4096 bytes from X1 to FPGA at 0xdebf4000 - 0xf4 0x03 0x00 0x00
[ 183.353029] DONE 4149248 bytes from X1 to FPGA at 0x003f5000 in in 4096 byte chunks
In the FPGA I can verify that the count sequence is received by inspecting the data in the debugger
Read (FPGA->TX1)
[ 183.360723] Doing a block read of 4149248 bytes from 0xdec00000 in 4096 byte chunks from FPGA endpoint to X1
[ 183.370746] (0000) DONE 4096 bytes read into X1 at 0xdec00000 - 0xff 0xff 0xff 0xff
[ 183.378596] (0001) DONE 4096 bytes read into X1 at 0xdec01000 - 0xfe 0xff 0xff 0xff
[ 183.386776] (0002) DONE 4096 bytes read into X1 at 0xdec02000 - 0xfd 0xff 0xff 0xff
[ 183.394807] (0003) DONE 4096 bytes read into X1 at 0xdec03000 - 0xfc 0xff 0xff 0xff
...
[ 191.342456] (1009) DONE 4096 bytes read into X1 at 0xdeff1000 - 0x0e 0xfc 0xff 0xff
[ 191.350307] (1010) DONE 4096 bytes read into X1 at 0xdeff2000 - 0x0d 0xfc 0xff 0xff
[ 191.358155] (1011) DONE 4096 bytes read into X1 at 0xdeff3000 - 0x0c 0xfc 0xff 0xff
[ 191.366007] (1012) DONE 4096 bytes read into X1 at 0xdeff4000 - 0x0b 0xfc 0xff 0xff
[ 191.373654] DONE 4149248 bytes read into X1 at 0xdec00000 in in 4096 byte chunks
As you can see the data is being received in the FPGA, the counter value is inverted and data is arriving back in the TX1 memory all in good order.
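The same check can be done over the whole returned buffer instead of by eye. The sketch below is one way to do it under the assumptions already stated (4096 byte chunks, little-endian count, first word inverted on the way back); the helper names are hypothetical and not part of the xbmd driver.

```c
#include <stddef.h>
#include <stdint.h>

#define CHUNK_BYTES 4096u

/* Number of whole 4096-byte chunks in a transfer of len bytes. */
size_t chunk_count(size_t len)
{
    return len / CHUNK_BYTES;
}

/* Address of chunk idx, given the 32-bit base address of the buffer. */
uint32_t chunk_addr(uint32_t base, size_t idx)
{
    return base + (uint32_t)(idx * CHUNK_BYTES);
}

/* Return the index of the first chunk whose first 32-bit word is not the
 * bitwise inverse of its chunk index, or -1 if every chunk matches. */
long first_bad_chunk(const uint8_t *buf, size_t len)
{
    size_t idx;

    for (idx = 0; idx < chunk_count(len); idx++) {
        const uint8_t *p = buf + idx * CHUNK_BYTES;
        uint32_t word = (uint32_t)p[0] |
                        ((uint32_t)p[1] << 8) |
                        ((uint32_t)p[2] << 16) |
                        ((uint32_t)p[3] << 24);

        if (word != ~(uint32_t)idx)
            return (long)idx;
    }
    return -1;
}
```

For the 4149248 byte transfer in the logs, chunk_count() gives 1013 chunks (0000 to 1012), and chunk_addr(0xdec00000, 1012) gives 0xdeff4000, which matches the last line of the read log. A buffer that came back as all zeros would fail at chunk 0.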
R24.2.1
So I reflashed the MB1 board with Jetpack 2.3.1
# R24 (release), REVISION: 2.1, GCID: 8028265, BOARD: t210ref, EABI: aarch64, DATE: Thu Nov 10 03:51:59 UTC 2016
sudo lspci
00:01.0 PCI bridge: NVIDIA Corporation Device 0fae (rev a1)
01:00.0 Memory controller: Xilinx Corporation Device 7028
Driver output during load
[ 282.946485] xbmd: Init: Base hw val 13000000
[ 282.946495] xbmd: Init: Base hw len 2048
[ 282.946746] xbmd: Init: Virt HW address FFFFFF8009FFE000
[ 282.946753] xbmd: Init: Device IRQ: 130
[ 282.946762] xbmd: Init: Initialize Hardware Done..
[ 282.946766] xbmd: ISR Setup..(Forcing to 130)
[ 282.946795] PCI: enabling device 0000:01:00.0 (0140 -> 0142)
[ 282.946812] xbmd: Read Buffer Allocation: FFFFFFC07E000000->FE000000
[ 282.946818] xbmd: Write Buffer Allocation: FFFFFFC07E400000->FE400000
[ 282.946826] xbmd: Init: module registered
[ 282.946830] xbmd driver is loaded
So the only real change, I guess, is the physical address of the coherent DMA buffer.
Write (TX1->FPGA)
[ 396.510420] Doing a block write of 4149248 bytes from 0xfe000000 in 4096 byte chunks from X1 into FPGA endpoint
[ 396.524091] (0000) DONE 4096 bytes from X1 to FPGA at 0xfe000000 - 0x00 0x00 0x00 0x00
[ 396.524093] smmu_dump_pagetable(): fault_address=0x00000000fe000000 pa=0xffffffffffffffff bytes=ffffffffffffffff #pte=0 in L2
[ 396.524098] mc-err: (0) csr_afir: EMEM decode error on PDE or PTE entry
[ 396.524100] mc-err: status = 0x6000000e; addr = 0xfe000000
[ 396.524104] mc-err: secure: no, access-type: read, SMMU fault: nr-nw-s
[ 396.562349] smmu_dump_pagetable(): fault_address=0x00000000fe001000 pa=0xffffffffffffffff bytes=ffffffffffffffff #pte=0 in L2
[ 396.562467] (0001) DONE 4096 bytes from X1 to FPGA at 0xfe001000 - 0x01 0x00 0x00 0x00
[ 396.562643] (0002) DONE 4096 bytes from X1 to FPGA at 0xfe002000 - 0x02 0x00 0x00 0x00
[ 396.562817] (0003) DONE 4096 bytes from X1 to FPGA at 0xfe003000 - 0x03 0x00 0x00 0x00
[ 396.562991] (0004) DONE 4096 bytes from X1 to FPGA at 0xfe004000 - 0x04 0x00 0x00 0x00
I guess alarm bells should start ringing because of the page faults, but I don’t really understand what is happening here.
In the FPGA, instead of the count sequence, all of the data reads as 0xFFFFFFFF.
Because of this I changed the FPGA code to regenerate the count sequence instead of just inverting the received buffer, so that I could send something I could verify on the TX1 side.
Read (FPGA->TX1)
[ 399.572657] Doing a block read of 4149248 bytes from 0xfe400000 in 4096 byte chunks from FPGA endpoint to X1
[ 399.572903] (0000) DONE 4096 bytes read into X1 at 0xfe400000 - 0x00 0x00 0x00 0x00
[ 399.573140] (0001) DONE 4096 bytes read into X1 at 0xfe401000 - 0x00 0x00 0x00 0x00
[ 399.573422] (0002) DONE 4096 bytes read into X1 at 0xfe402000 - 0x00 0x00 0x00 0x00
[ 399.573674] (0003) DONE 4096 bytes read into X1 at 0xfe403000 - 0x00 0x00 0x00 0x00
So in the TX1, instead of the count sequence, all I read back is zeros. :-(
So…
With kernel R24.1 my driver works perfectly.
With kernel R24.2.1 my driver does not work for DMA reads or writes.
Any suggestions?
Robert