nvidia-smi topo SOC

Hi,

I’ve seen one similar post where someone couldn’t get nvidia-smi topo -m to report PHB links instead of SOC links. I have the same problem:
nvidia-smi topo -m

"topo"  GPU0    mlx5_0  CPU Affinity
GPU0     X      SOC     0-0,2-2,4-4,6-6,8-8,10-10,12-12,14-14,16-16,18-18 [SNIP]
mlx5_0  SOC      X

My Mellanox EDR card and P100 GPUs are on the same PCIe bus (the other post was solved by discovering that the manufacturer documentation incorrectly listed the mappings between CPU and PCIe slots; I’ve tried all possible combinations while ignoring what’s printed on the risers), and my nv_peer_mem module built cleanly and loads correctly.

Could anyone point me towards either BIOS settings that might affect this (I did check and couldn’t spot anything) or OS configuration issues?

Really, anything else I could try would be much appreciated!
Regards,
MA

I have a little more information on the same problem. I put two GPUs into the same chassis and got the following:

        GPU0    GPU1    mlx5_0  CPU Affinity
GPU0     X      SOC     SOC     [SNIP]
GPU1    SOC      X      SOC
mlx5_0  SOC     SOC      X

GPU0 and GPU1 are both on x16 slots attached to the same CPU.
The SOC links on the mlx5_0 card are expected. I’ve taken another wander through the BIOS and couldn’t find anything obvious. These are Dell R740 nodes running the latest BIOS (v1.1.7).

The Dell R740 is a Skylake CPU system. Intel’s Skylake processors introduced the possibility of multiple PCIe root complexes on a single CPU. Previously, Intel Xeon processors had a single PCIe root complex to which all of the PCIe lanes were attached; this is no longer the case with Skylake.

PCIe traffic flowing on a PCIe fabric connected to a single root complex is what is normally required for proper P2P usage, and that is what the nvidia-smi topo -m tool will report as a PHB link.

Previously, because of the 1:1 correspondence between PCIe root complexes and CPU sockets, such a statement was equivalent to saying “PCIe devices attached to a single socket”, so that was commonly the heuristic for determining whether P2P traffic would be supported between two PCIe endpoint devices.

However, PCIe P2P traffic is not, by definition, supported between PCIe endpoint devices attached to separate PCIe root complexes, whether those root complexes are on separate CPU sockets (as would have been the case historically) or on the same CPU socket (as is now possible with Skylake).

Therefore, at the current time, your nvidia-smi topo -m output is the expected behavior for that system, there are no BIOS settings that can modify it, and you should not expect P2P behavior between devices attached to separate root complexes.
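
As a sanity check, you can ask the CUDA runtime directly whether it will allow peer access between each pair of GPUs. The following is just a minimal sketch (not something from this thread); on GPUs attached to separate root complexes I would expect it to report that P2P is not supported, consistent with the SOC entries in your topo output.

// p2p_check.cu -- query pairwise P2P capability as reported by the CUDA runtime
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int n = 0;
    cudaGetDeviceCount(&n);
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            if (i == j) continue;
            int canAccess = 0;
            // 1 means the runtime would allow cudaDeviceEnablePeerAccess between i and j
            cudaDeviceCanAccessPeer(&canAccess, i, j);
            printf("GPU%d -> GPU%d : P2P %s\n", i, j, canAccess ? "supported" : "not supported");
        }
    }
    return 0;
}

Build it with nvcc (e.g. nvcc p2p_check.cu -o p2p_check) and run it on the node in question.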

It’s entirely possible, of course, that a “technological breakthrough” could occur (let’s say between NVIDIA and Intel) that would allow enough information exchange to document a method by which P2P PCIe traffic could be reliably supported between PCIe endpoint devices attached to separate root complexes, but such activity has not happened yet, to my knowledge.

You should be able, with sufficient effort and knowledge, to confirm the outlines of the above assertion using a Linux tool like lspci, in particular that the GPUs in question are attached to separate root complexes. You should see lspci enumerate multiple host bridges, and with the tree display form of lspci (lspci -t) you should be able to identify that the GPUs in question are attached to separate host bridges.
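
If it helps with the cross-referencing, here is another minimal sketch (again, not from this thread) that prints each CUDA device’s PCI bus ID so it can be matched against the host bridges shown by lspci -t; GPUs whose bus IDs fall under different host bridges in that tree are attached to separate root complexes.

// pci_busid.cu -- print PCI bus IDs of CUDA devices for comparison with lspci -t
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int n = 0;
    cudaGetDeviceCount(&n);
    for (int i = 0; i < n; ++i) {
        char busId[32] = {0};
        cudaDeviceGetPCIBusId(busId, sizeof(busId), i);   // e.g. "0000:81:00.0"
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("GPU%d (%s) : %s\n", i, prop.name, busId);
    }
    return 0;
}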

Since this is a Dell platform, I’d also encourage you to get in contact with Dell for confirmation of this and/or further support inquiries.

Thanks for pointing that out; that was news to me.

Hello,

Thank you for the detailed response. I’m following your advice and contacting Dell for confirmation, but based on the configuration I selected (4x16, 3x8) I can’t imagine that it wasn’t put together as you outline.

If this is the output expected to confirm that the devices share a PCIe root complex:
$ lspci -t
+-[0000:80]-+-02.0-[81]----00.0
|           +-03.0-[82]----00.0
Where 81 and 82 are the bus identifiers (this example output isn’t from my system)

Then mine does not look like that, and the devices are definitely on separate root complexes.

I think that your scenario whereby a “technological breakthrough” could occur is substantially unlikely based on the companies and technologies involved ;)

Thanks again for your time
MA

txbob, do you expect Skylake to have a performance decrease due to this? Intel has said Skylake has improved PCIe performance, so I wasn’t sure if this is better or worse.

What specifically did Intel say they improved with respect to PCIe on Skylake? I cannot find any relevant press release or reports in the trade press hinting at higher performance. Skylake supports PCIe gen3, the same as preceding generations of CPUs. I did, however, find a post in Intel’s forums from a user who says PCIe performance got worse on Skylake for their use case.
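
For what it’s worth, one straightforward way to compare raw PCIe copy bandwidth between a Skylake node and an older platform is a pinned-memory host-to-device copy timed with CUDA events. This is only a rough sketch of my own (buffer size and iteration count are arbitrary), similar in spirit to the bandwidthTest CUDA sample.

// pcie_bw.cu -- rough host-to-device bandwidth measurement over PCIe
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 256 << 20;   // 256 MiB per transfer (arbitrary choice)
    const int iters = 20;

    void *hbuf = nullptr, *dbuf = nullptr;
    cudaMallocHost(&hbuf, bytes);     // pinned host memory, so the copy is a pure DMA transfer
    cudaMalloc(&dbuf, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaMemcpy(dbuf, hbuf, bytes, cudaMemcpyHostToDevice);   // warm-up

    cudaEventRecord(start);
    for (int i = 0; i < iters; ++i)
        cudaMemcpy(dbuf, hbuf, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("Host->Device: %.2f GB/s\n", (double)bytes * iters / (ms / 1e3) / 1e9);

    cudaFree(dbuf);
    cudaFreeHost(hbuf);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return 0;
}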

I’ll have to dig it up, but I could be wrong. I’m using Skylake right now and was confused by the nvidia-smi output, much as the original post in this thread describes.