We would like to utilize GPUDirect RDMA between the Xavier GPU and a PCIe card inserted in the PCIe slot. Is this possible with the Jetson AGX Xavier that we just received?
If it is not possible today then where is this functionality placed on the roadmap?
The GPUDirect interface (as exposed by the NVIDIA discrete GPU PCIe driver on desktop) isn’t supported on Jetson at this time; however, we are tentatively planning it for a future JetPack release. Currently the Jetson integrated GPU driver lives in userspace, unlike the desktop discrete GPU driver, which is a PCIe kernel module.
However, since Jetson’s GPU and CPU share the same physical memory, in theory you could allocate mapped memory with cudaHostAlloc(), pass the pointer to your PCIe device’s driver via an ioctl, and have the driver translate the virtual address to a physical address. That would allow your PCIe device to DMA into the same memory that is mapped to the GPU, accomplishing zero copy between your device and the GPU. I am not sure which functions perform the virtual-to-physical address translation in recent kernels.
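For the user-space half of that idea, a minimal sketch might look like the following. The `cudaHostAlloc()`/`cudaHostGetDevicePointer()` calls are real CUDA runtime APIs; the device node name and the `MYDEV_IOC_PIN_BUFFER` ioctl are hypothetical placeholders for whatever interface your PCIe driver would expose.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t len = 1 << 20;  // 1 MiB buffer
    void *host_ptr = nullptr;

    // Allocate page-locked host memory that is also mapped into the
    // GPU's address space.
    if (cudaHostAlloc(&host_ptr, len, cudaHostAllocMapped) != cudaSuccess) {
        fprintf(stderr, "cudaHostAlloc failed\n");
        return 1;
    }

    // Get the device-side pointer to the same pages; on Jetson the GPU and
    // CPU share physical DRAM, so kernels can read what the PCIe device
    // DMAs into host_ptr, with no copy.
    void *dev_ptr = nullptr;
    cudaHostGetDevicePointer(&dev_ptr, host_ptr, 0);

    // Hypothetical step: hand the virtual address to your PCIe driver,
    // which must pin the pages and resolve virtual -> physical on its side.
    // int fd = open("/dev/mypcidev", O_RDWR);              // made-up node
    // struct pin_req req = { (uintptr_t)host_ptr, len };   // made-up struct
    // ioctl(fd, MYDEV_IOC_PIN_BUFFER, &req);               // made-up ioctl

    cudaFreeHost(host_ptr);
    return 0;
}
```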
Thanks for the prompt, concise, and helpful response. I’m grateful not to head into the weekend still wondering about this. :-)
I will read your suggested approach again on Monday and may…ummm…probably will have further questions about how I tackle this. Your jump straight to zero-copy was apropos, because that is exactly why I wanted to be able to do this…
Based on your answer from Friday, I need to take a step back from my assumptions and regain a collection of first principles.
Toward that end, is there some documentation (and graphics) I can peruse that outlines how analyzing video frames coming in from a PCIe card is supposed to work?
Hi johnu, the normal way is via your typical Linux PCIe kernel driver with DMA. The driver usually allocates DMA buffers and mmaps them into the user-space process, at which point the user can send the data to the GPU. However, if you modify your PCIe driver to accept pointers to DMA to, instead of allocating its own buffers, then you can provide it with pointers that have already been mapped to the GPU by the CUDA memory APIs.
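On the kernel side, the modification might look roughly like this. `get_user_pages_fast()`, `dma_map_page()`, and `dma_mapping_error()` are standard kernel APIs for pinning user pages and mapping them for device DMA, though the exact `get_user_pages_fast()` signature has changed across kernel versions; the function and device names here are hypothetical, and error cleanup is omitted for brevity.

```c
/* Sketch of an ioctl-handler helper that pins a user buffer (e.g. one
 * allocated with cudaHostAlloc() in user space) and maps it for device
 * DMA. Unpinning/unmapping on error or teardown is left out. */
#include <linux/mm.h>
#include <linux/dma-mapping.h>

static int mydev_pin_buffer(struct device *dev, unsigned long vaddr,
                            struct page **pages, int nr_pages,
                            dma_addr_t *dma_addrs)
{
    int i, pinned;

    /* Pin the user pages so they cannot be swapped out or migrated
     * while the device is DMAing into them. */
    pinned = get_user_pages_fast(vaddr, nr_pages, FOLL_WRITE, pages);
    if (pinned < nr_pages)
        return -EFAULT;

    /* Map each page for DMA; the returned dma_addr_t values are what
     * you program into the device's DMA descriptors. */
    for (i = 0; i < nr_pages; i++) {
        dma_addrs[i] = dma_map_page(dev, pages[i], 0, PAGE_SIZE,
                                    DMA_FROM_DEVICE);
        if (dma_mapping_error(dev, dma_addrs[i]))
            return -EIO;
    }
    return 0;
}
```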
Which L4T release are you using (only L4T r32.1 is supported), and which Jetson board are you compiling the code on (only Jetson AGX Xavier is supported)?
Please show the complete log from the build command.
You don’t need to use anything in this example to connect two Xaviers over PCIe. However, you likely can use GPUDirect RDMA between two Xaviers, and so portions of this example may be useful in setting that up. The root port Xavier would export its memory using RDMA, and the endpoint Xavier would then access it. You’d need to develop some custom protocol (and drivers) to pass the memory pointers etc. back and forth between the two systems.