I am testing deep-learning training on multiple GPU cards with TensorFlow, and a single GPU performs better than two GPUs.
Here is the output of the simpleP2P utility:
/usr/local/cuda/samples/0_Simple/simpleP2P# ./simpleP2P
[./simpleP2P] - Starting...
Checking for multiple GPUs...
CUDA-capable device count: 2
GPU0 = "GeForce GTX 750 Ti" IS capable of Peer-to-Peer (P2P)
GPU1 = "GeForce GTX 750 Ti" IS capable of Peer-to-Peer (P2P)
Checking GPU(s) for support of peer to peer memory access...
Peer access from GeForce GTX 750 Ti (GPU0) -> GeForce GTX 750 Ti (GPU1) : No
Peer access from GeForce GTX 750 Ti (GPU1) -> GeForce GTX 750 Ti (GPU0) : No
Two or more GPUs with SM 2.0 or higher capability are required for ./simpleP2P.
Peer to Peer access is not available amongst GPUs in the system, waiving test.
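As I understand it, the sample probes peer access through the CUDA runtime API. This is a minimal sketch (not the sample's actual source) of the check and the enable step, using `cudaDeviceCanAccessPeer` and `cudaDeviceEnablePeerAccess`, which reproduces the same "No" result on my machine:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);
    if (deviceCount < 2) {
        printf("At least 2 GPUs are required\n");
        return 1;
    }

    // Ask the runtime whether each GPU can directly address the other's memory.
    int canAccess01 = 0, canAccess10 = 0;
    cudaDeviceCanAccessPeer(&canAccess01, 0, 1);
    cudaDeviceCanAccessPeer(&canAccess10, 1, 0);
    printf("GPU0 -> GPU1: %s\n", canAccess01 ? "Yes" : "No");
    printf("GPU1 -> GPU0: %s\n", canAccess10 ? "Yes" : "No");

    // If both directions are supported, P2P must be enabled from each side.
    if (canAccess01 && canAccess10) {
        cudaSetDevice(0);
        cudaDeviceEnablePeerAccess(1, 0);  // second argument (flags) must be 0
        cudaSetDevice(1);
        cudaDeviceEnablePeerAccess(0, 0);
    }
    return 0;
}
```

On my system `cudaDeviceCanAccessPeer` returns 0 in both directions, so the enable calls are never reached.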
Here is the output of the simpleMultiGPU utility:
/usr/local/cuda/samples/0_Simple/simpleMultiGPU# ./simpleMultiGPU
Starting simpleMultiGPU
CUDA-capable device count: 2
Generating input data...
Computing with 2 GPUs...
GPU Processing time: 13.510000 (ms)
Computing with Host CPU...
Comparing GPU and Host CPU results...
GPU sum: 16777280.000000
CPU sum: 16777294.395033
Relative difference: 8.580068E-07
Could anyone suggest how to enable peer-to-peer (P2P) memory access between these GPUs?