GPU running at only 16% utilization with CUDA and cuDNN installed (GeForce GTX 750 Ti)

Hi,

I have Python 3.6.4 and tensorflow-gpu installed, with CUDA 9.0 and cuDNN 7.0.5.

After installing everything necessary and checking that TensorFlow recognises the GPU, everything looks fine.

However, when I train an ANN (it has 2 hidden layers), training is very slow; it only uses 16% of the GPU, which is an NVIDIA GeForce GTX 750 Ti.

Why could that be? It should go like a rocket.

I see that the GPU's dedicated memory is 2 GB, but I also have 16 GB of RAM, which is not fully used when I'm running the script. Could you help me with this?

It should run much faster on the GPU, but it doesn't.

Here is the check I run, the output it gives, and an attachment showing the process:

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 16959792111093298775
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 168158002
locality {
  bus_id: 1
}
incarnation: 1201157812246464469
physical_device_desc: "device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:01:00.0, compute capability: 5.0"
]

Screenshot.pdf (278 KB)
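
In case it helps anyone checking the same thing: a minimal sketch (assuming a TF 1.x build, which is what matches CUDA 9.0 / cuDNN 7.0.5) that logs device placement, so you can see whether ops really land on the GPU rather than the CPU:

# Minimal sketch (TF 1.x): log where each op runs, to confirm that the
# compute ops are actually placed on /device:GPU:0.
import tensorflow as tf

a = tf.random_normal([1000, 1000])
b = tf.random_normal([1000, 1000])
c = tf.matmul(a, b)  # should be placed on the GPU if the GPU build works

with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    sess.run(c)  # the console output lists the device chosen for each op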

Why do you expect more than 16%?

I’m currently writing custom CUDA code which gets 100% utilization WHEN the kernel is executing, but that is somewhat unusual and TF is not going to keep the GPU nearly that busy.

16% is actually quite good for non-custom CUDA driven by Python, which is relatively slow compared to C/C++.
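
If you want to see that for yourself, you can watch the utilization numbers while your script trains. A rough sketch that polls nvidia-smi from Python (assuming nvidia-smi is on your PATH):

# Rough sketch: sample GPU utilization and memory use once per second
# while a training script is running in another terminal.
import subprocess
import time

for _ in range(30):  # sample for roughly 30 seconds
    out = subprocess.check_output(
        ["nvidia-smi",
         "--query-gpu=utilization.gpu,memory.used",
         "--format=csv,noheader"]
    )
    print(out.decode().strip())  # e.g. "16 %, 1500 MiB"
    time.sleep(1)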

Hi robosmith, and thanks for answering.

The thing is, I tried it on another PC (a laptop) that only has an older CPU, and training was faster on that machine.

Because of that, I doubted that my machine was using the GPU correctly.

However, when we artificially increased the architecture to 10x the layers and used a higher batch size, my GPU usage went up to 38% and training was very fast. The other laptop just couldn't cope; it was very, very slow because of the overload on the CPU.

Could you confirm my thinking on this? I believe the GPU only uses as many resources as the workload demands, and the situation where it outperforms the CPU is when the batch size and the architecture grow, because of its capacity for doing more calculations in parallel. As I increase the batch size and the architecture, the load on the GPU keeps going up.
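
To sanity-check that idea, an experiment along these lines (a sketch using tf.keras with random stand-in data, not your actual model) shows utilization climbing as depth and batch size grow:

# Sketch: train the same kind of dense ANN at two scales and watch GPU utilization.
# Uses tf.keras (available in the TF 1.x builds of this era) with random data.
import numpy as np
import tensorflow as tf

def make_model(n_hidden_layers, width=512):
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Dense(width, activation="relu", input_shape=(100,)))
    for _ in range(n_hidden_layers - 1):
        model.add(tf.keras.layers.Dense(width, activation="relu"))
    model.add(tf.keras.layers.Dense(1, activation="sigmoid"))
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model

x = np.random.rand(20000, 100).astype(np.float32)
y = np.random.randint(0, 2, size=(20000, 1))

# Small network, small batch: the GPU sits idle between many tiny kernels.
make_model(2).fit(x, y, batch_size=32, epochs=1)

# Deeper network, bigger batch: each step launches larger kernels, so utilization rises.
make_model(10).fit(x, y, batch_size=1024, epochs=1)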

Thanks in advance,

In TensorFlow, the GPU is mostly used only for the layers' tensor operations, such as sums of products, etc.

That is why utilization increases as more layers are added to your network.
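
For a concrete picture, here is a tiny sketch (TF 1.x style) that pins a matrix multiply, i.e. the kind of sum-of-products op a dense layer boils down to, to the GPU explicitly; everything around it (the Python loop, data feeding, bookkeeping) stays on the CPU:

# Sketch (TF 1.x): the heavy per-layer math is what actually runs on the GPU.
import tensorflow as tf

with tf.device("/device:GPU:0"):
    x = tf.random_normal([2048, 1024])           # a batch of activations
    w = tf.random_normal([1024, 512])            # a layer's weight matrix
    b = tf.zeros([512])
    layer_out = tf.nn.relu(tf.matmul(x, w) + b)  # one dense layer's forward pass

with tf.Session() as sess:
    sess.run(layer_out)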

Hi again,

Alright, thanks for the explanation.

Is there any way to use the full power of the GPU for all the computations and in all situations, not only some of them?

I would really appreciate any hints on this.
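
One common lever, given what was said above about Python-side feeding being slow relative to the GPU, is to keep the GPU supplied with data: a tf.data pipeline with prefetching plus a larger batch size, as you already observed. A sketch (assuming a TensorFlow version recent enough that model.fit accepts a tf.data.Dataset directly, and random stand-in data):

# Sketch: overlap CPU-side batch preparation with GPU compute via prefetch,
# and give each training step more parallel work with a bigger batch.
import numpy as np
import tensorflow as tf

x = np.random.rand(20000, 100).astype(np.float32)
y = np.random.randint(0, 2, size=(20000, 1))

dataset = (tf.data.Dataset.from_tensor_slices((x, y))
           .shuffle(10000)
           .batch(512)
           .prefetch(1))  # prepare the next batch while the GPU works on the current one

model = tf.keras.Sequential([
    tf.keras.layers.Dense(512, activation="relu", input_shape=(100,)),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(dataset, epochs=1, steps_per_epoch=20000 // 512)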

Thanks in advance,

I have the same problem… Any suggestions?