After installing tensorflow/keras on my Dell XPS/GTX960M (640 CUDA cores) running Ubuntu 16.04, I also installed it on an Ubuntu 16.04 desktop (i7-3930 with a GeForce 1080Ti with 11GB GDDR5X and 64GB of DDR3 memory). Running 25 epochs of the MNIST demo code takes 0.64 sec on the laptop but twice as long on the desktop with the 1080Ti. Similarly, the Boston house-prices example takes 2x longer on the 1080Ti.
Initially I thought the problem was which PCI Express slot the 1080Ti card was placed in, but that does not seem to be the cause: after 3 complete re-installations of Ubuntu, the 2x slower performance on the 1080Ti has not improved. The 1080Ti card now sits in slot 0 (PCIE16_1); the only other card is my wifi card, which sits in a PCIE16_2 slot.
While googling performance concerns about TensorFlow, I’ve read that the input pipeline may need to be optimized or that one may need to compile TensorFlow from source with additional nvcc options. Since I’m running the exact same R-Keras code on both machines, I suspect compiling from source may be needed. Can anyone advise on how to do this? There are no such options in the install_keras() function. Alternatively, any other suggestions to boost the performance of the 1080Ti are welcome.
Below I’ve pasted the first few lines of the output from the R-Keras code - my naive conclusion is that the 1080Ti card is not running as fast as it could be (highlighted in orange). Note: the nvidia-smi output shows that “Volatile GPU-Util” hits 58% on the laptop but never goes above 10% on the 1080Ti, though I don’t know if this is a true reflection of the load.
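To check whether the low “Volatile GPU-Util” reading goes together with low clocks or a low-power state, the utilization could be sampled together with the SM clock and performance state while the training runs. This is only a minimal sketch, assuming nvidia-smi is on the PATH; the helper names `query_gpu` and `parse_gpu_csv` and the choice of fields are mine, not part of any library.

```python
import csv
import io
import subprocess

# Fields passed to `nvidia-smi --query-gpu`; this selection is an assumption
# about what is useful here, other fields could be added.
FIELDS = ["utilization.gpu", "clocks.sm", "pstate"]


def query_gpu():
    """Run nvidia-smi once and return its raw CSV text (needs an NVIDIA driver)."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(FIELDS),
        "--format=csv,noheader,nounits",
    ]
    return subprocess.check_output(cmd, universal_newlines=True)


def parse_gpu_csv(text):
    """Turn the CSV text into one dict per GPU, keyed by field name."""
    rows = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [dict(zip(FIELDS, (v.strip() for v in row))) for row in rows if row]


# Usage while a training run is active in another terminal:
#   print(parse_gpu_csv(query_gpu()))
```

Calling this a few times during an epoch on each machine would show whether the 1080Ti stays at low utilization for the whole run or only between batches.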
Laptop with 960M:
2018-04-16 20:02:06.924282: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-04-16 20:02:06.992276: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-04-16 20:02:06.992639: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties:
name: GeForce GTX 960M major: 5 minor: 0 memoryClockRate(GHz): 1.0975
pciBusID: 0000:01:00.0
totalMemory: 1.96GiB freeMemory: 1.59GiB
2018-04-16 20:02:06.992653: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-04-16 20:02:07.451438: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-04-16 20:02:07.451460: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917] 0
2018-04-16 20:02:07.451485: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0: N
2018-04-16 20:02:07.451681: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1351 MB memory) → physical GPU (device: 0, name: GeForce GTX 960M, pci bus id: 0000:01:00.0, compute capability: 5.0)
60000/60000 [==============================] - 2s 36us/step - loss: 0.2539 - acc: 0.9266
Epoch 2/25
60000/60000 [==============================] - 1s 23us/step - loss: 0.1053 - acc: 0.9692
Desktop with 1080Ti:
Epoch 1/25
2018-04-16 20:04:48.824036: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-04-16 20:04:48.824472: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.645
pciBusID: 0000:02:00.0
totalMemory: 10.91GiB freeMemory: 10.39GiB
2018-04-16 20:04:48.824493: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-04-16 20:04:49.075027: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-04-16 20:04:49.075071: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917] 0
2018-04-16 20:04:49.075081: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0: N
2018-04-16 20:04:49.075346: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10058 MB memory) → physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1)
60000/60000 [==============================] - 3s 56us/step - loss: 0.2586 - acc: 0.9253
Epoch 2/25
60000/60000 [==============================] - 3s 45us/step
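To put a number on the slowdown, the per-step times in the two "Epoch 2/25" progress lines above can be converted into seconds per epoch. This is a rough sketch: the regular expression and the helper name `epoch_seconds` are my own, and it assumes the count before the progress bar is the number of samples, so that "us/step" is the time per sample.

```python
import re

# Matches a Keras progress line such as
# "60000/60000 [=====] - 1s 23us/step" and captures samples and us/step.
STEP_RE = re.compile(r"(\d+)/\d+ \[=*\] - \d+s (\d+)us/step")


def epoch_seconds(line):
    """Return the epoch duration in seconds implied by a progress line, or None."""
    match = STEP_RE.search(line)
    if match is None:
        return None
    samples, us_per_step = int(match.group(1)), int(match.group(2))
    return samples * us_per_step / 1e6


# The "Epoch 2/25" lines copied from the two logs above.
laptop = "60000/60000 [==============================] - 1s 23us/step - loss: 0.1053 - acc: 0.9692"
desktop = "60000/60000 [==============================] - 3s 45us/step"

ratio = epoch_seconds(desktop) / epoch_seconds(laptop)
print("laptop %.2f s, desktop %.2f s, ratio %.2fx"
      % (epoch_seconds(laptop), epoch_seconds(desktop), ratio))
```

For the second epoch this gives about 1.4 s on the laptop versus 2.7 s on the desktop, which is consistent with the roughly 2x slowdown described above.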
Also, when I run TensorFlow on the GPU vs. the CPU on both the laptop and the desktop with the 1080Ti