DMA transfer between Jetson TK1 and PCIe

frankilgallo · December 9, 2015, 3:18pm

Hello, I am currently writing a driver for a custom card based on an Artix-7 FPGA board, connected through the mPCIe slot on the Jetson TK1. I am at a point where I can successfully create a char device and read and write FPGA memory over the PCIe link.
By doing so I can reach a bandwidth of around 280 MB/s, which is enough for what I need to do. My next goal is to offload the CPU of the memory copying task by using DMA transfers.

In particular what I want to do is to copy a fixed amount of memory (around 1 MB) every 10 milliseconds or so. I have some experience in using DMAs, albeit on bare metal in custom systems, therefore I understand the flow but I have no experience with Linux DMAs.
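As a rough check of the numbers above, here is a small sketch of the arithmetic. It assumes 1 MB means 10^6 bytes and that the 280 MB/s figure is sustained; the function names are made up for illustration.

```c
#include <assert.h>

/* Sustained throughput in MB/s needed to move `bytes` once every `period_ms`. */
double required_mb_per_s(double bytes, double period_ms)
{
    assert(period_ms > 0.0);
    return (bytes / 1e6) / (period_ms / 1e3);
}

/* Fraction of each period the CPU would spend copying at `link_mb_per_s`. */
double copy_duty_cycle(double bytes, double period_ms, double link_mb_per_s)
{
    assert(link_mb_per_s > 0.0);
    return required_mb_per_s(bytes, period_ms) / link_mb_per_s;
}
```

With 1 MB every 10 ms the requirement is about 100 MB/s, so a CPU-driven copy at 280 MB/s would keep a core busy for roughly a third of every period, which is the load DMA is meant to remove.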

I have read Linux Device Drivers (LDD3), DMA-API.txt and DMA-HOWTO.txt, which focus on allocating DMA-able memory in kernel modules, and I am able to allocate a DMA buffer (I have tried both streaming and consistent).

After this I got a little bit lost on how to actually instantiate a DMA channel and use it to start a transfer… I have a reference implementation of doing so from a TI processor which uses eDMA, but I cannot use that on the Jetson.

I realize I have to use the specific DMA controller on the Tegra, and I started looking at arch/arm/mach-tegra/apbio.c and lately arch/arm/mach-tegra/include/mach/dma.h, which oddly enough is not included in the kernel downloaded for the R21.4 release on the Nvidia website, which makes me think I am barking up completely the wrong tree…

So I am not sure how to go forward, hopefully somebody can give me a hint or direct me towards some useful example or documentation… thanks in advance.

dusty_nv · December 10, 2015, 3:15am

Hi frankilgallo, normally the FPGA’s PCIe endpoint is the device which contains the DMA engine. A PCIe BAR on the FPGA (base address register, a region mapped to the device’s registers) generally contains a ring buffer of addresses in system RAM which the FPGA’s DMA engine is supposed to access. The Linux kernel driver allocates memory with __get_free_pages() and uploads these memory addresses to the FPGA’s ring buffer. Then another register in the BAR is generally used to kick off the transfers. You might find these additional references useful:

Linux Device Drivers, 2nd Edition: Chapter 13: mmap and DMA
Linux Device Drivers, Second Edition [LWN.net]
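The ring-buffer-plus-doorbell scheme described in this post can be sketched as a user-space model. The register layout, the ring depth and the names `fpga_dma_regs`, `ring_post` and `ring_kick` are all hypothetical; a real FPGA design defines its own layout.

```c
#include <assert.h>
#include <stdint.h>

#define RING_ENTRIES 8u   /* hypothetical ring depth */

/* Hypothetical register block exposed through a BAR. */
struct fpga_dma_regs {
    uint64_t ring[RING_ENTRIES]; /* bus addresses of host buffers */
    uint32_t tail;               /* next slot the driver will fill */
    uint32_t doorbell;           /* driver writes the number of new entries here */
};

/*
 * Queue one host buffer address. `head` is the device's consumer index.
 * Returns 0 on success, -1 if the ring is full.
 */
int ring_post(volatile struct fpga_dma_regs *regs, uint32_t head, uint64_t bus_addr)
{
    assert(head < RING_ENTRIES);
    uint32_t next = (regs->tail + 1u) % RING_ENTRIES;
    if (next == head)
        return -1;               /* one slot stays empty to tell full from empty */
    regs->ring[regs->tail] = bus_addr;
    regs->tail = next;
    return 0;
}

/* Tell the device how many descriptors are ready to be processed. */
void ring_kick(volatile struct fpga_dma_regs *regs, uint32_t count)
{
    regs->doorbell = count;
}
```

In a kernel driver the `regs` pointer would come from mapping the BAR (for example with `pci_iomap()`), the stores would go through accessors such as `iowrite32()`, and the addresses posted would be DMA bus addresses from the DMA mapping API rather than kernel virtual addresses.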

frankilgallo · December 10, 2015, 12:22pm

Thanks a lot for the quick reply and for the references dusty_nv. Yes, this is how I understand DMA towards an FPGA should work - the driver allocates a buffer, then writes the address of the buffer to a specific register on the device, kicks off the transfer by writing to another device register, and receives an interrupt when the transfer is done. I am already at the point where I can read and write to BARs using my driver, but at the moment the FPGA design is very simple so they do not contain any useful information.

I’ll give some more background info in order to explain why I am trying to do something different.
We are trying to port to the Jetson a system where the FPGA was working in conjunction with a TI DM3730 processor, and we would like to keep the same FPGA design. As far as we can see, there’s no DMA engine in the original FPGA design, and in the original reference driver all the transfers are handled through the eDMA interface. As I dig further into the documentation, it appears that the eDMA engine is on the DSP of the TI processor, which explains why we cannot find it in the FPGA design.

So I’ll rephrase my question - is there something like the eDMA engine available on the TK1, or another DMA engine that I can use for this? Or is the only solution to implement a DMA engine on the FPGA?

dusty_nv · December 15, 2015, 3:34am

There is no eDMA engine available on the TK1, and no other on-chip DMA engine is available for this. The FPGA will need to implement the DMA engine.

frankilgallo · December 17, 2015, 3:30pm

Thanks a lot, we ended up inserting an AXI CDMA engine in the FPGA design.

I have added an interface to control the CDMA registers with the kernel module and mapped a portion of the FPGA memory on the PC side using a BAR. I then reserved a portion of the Tegra SDRAM at 0xF8000000 to be accessible by the FPGA - I did that using the boot command line, not sure if there is a more proper way to do that by tweaking the device tree - and remapped it with ioremap_nocache. I can then feed these addresses to the DMA engine and successfully transfer data between the systems using the CDMA engine.
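For readers following along, programming a CDMA engine in simple (non scatter-gather) mode might look like the sketch below. It is a user-space model over a plain array, not the driver from this post. The register offsets and status bits are assumptions taken from memory of the Xilinx AXI CDMA product guide (PG034) and should be checked against the IP version in use; the helper names are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Register byte offsets in simple mode (assumed from PG034, verify first). */
#define CDMA_SR   0x04u  /* status */
#define CDMA_SA   0x18u  /* source address, low 32 bits */
#define CDMA_DA   0x20u  /* destination address, low 32 bits */
#define CDMA_BTT  0x28u  /* bytes to transfer; writing it starts the copy */

#define CDMA_SR_IDLE     (1u << 1)                         /* assumed bit position */
#define CDMA_SR_ERR_MASK ((1u << 4) | (1u << 5) | (1u << 6)) /* assumed error bits */

static void reg_write(volatile uint32_t *base, uint32_t off, uint32_t val)
{
    base[off / 4u] = val;
}

static uint32_t reg_read(volatile uint32_t *base, uint32_t off)
{
    return base[off / 4u];
}

/* Program one transfer. Returns -1 if the engine is busy or len is zero. */
int cdma_start(volatile uint32_t *base, uint32_t src, uint32_t dst, uint32_t len)
{
    assert(base != NULL);
    if (len == 0u || !(reg_read(base, CDMA_SR) & CDMA_SR_IDLE))
        return -1;
    reg_write(base, CDMA_SA, src);
    reg_write(base, CDMA_DA, dst);
    reg_write(base, CDMA_BTT, len);  /* written last because it triggers the engine */
    return 0;
}

/* Returns 1 when the engine is idle, 0 while busy, -1 if an error bit is set. */
int cdma_poll(volatile uint32_t *base)
{
    uint32_t sr = reg_read(base, CDMA_SR);
    if (sr & CDMA_SR_ERR_MASK)
        return -1;
    return (sr & CDMA_SR_IDLE) ? 1 : 0;
}
```

A call such as `cdma_start(regs, fpga_src, 0xF8000000u, 1u << 20)` would request a 1 MiB copy into the reserved region, where `fpga_src` is a placeholder. Two caveats: the addresses are in the CDMA's own AXI address space, so how 0xF8000000 appears there depends on the PCIe bridge's address translation, and the width of the BTT field is a build-time option of the IP, so a 1 MiB transfer may have to be split into smaller pieces.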

Thanks for all the help.

dereks443_cp · December 30, 2015, 1:18am

no other on-chip DMA engine is available.

dusty_nv, thank you for the information. I want to confirm that I understand your post correctly.

QUESTION: Is it possible to use DMA from my custom PCIe card (which happens to be an FPGA) to CPU system memory? I am using a Tegra TX-1.

Your post above makes me think DMA is not possible when using a PCIe card. Please confirm.

I found there is a Linux driver called “tegra-apb” which provides DMA controller support for the AMBA APB bus, but I don’t think that helps when we need to use the PCIe bus.

I also see your other forum post about the DMA on the RTL8111 GigE controller, but I am not using ethernet so it does not help me.

Any advice or assistance is appreciated.

Thank you,
Derek

dereks443_cp · December 30, 2015, 6:08pm

Just to clarify: I meant to ask if the Tegra-X1 has a DMA controller on board that can be used across the PCIe bus.

Similar to frankilgallo, we have an existing FPGA board and do not want to modify it (by adding a DMA controller to it).

dusty_nv · December 31, 2015, 3:37pm

The PCIe endpoint (in your case, the FPGA - in the other case, the GbE controller) should implement the DMA engine.

You are correct, the APB bus is only for peripherals like SPI, I2C, UART etc. The PCIe root complex does not require a bridge and sits directly on the memory interface, but DMA does require the engines to be implemented in the PCIe endpoint (which is typical for PCIe, mostly for scalability).