Tesla M40 runs extremely slowly

My Keras code with the TensorFlow backend runs extremely slowly on my Tesla M40 GPUs. At first I suspected there were bugs in my code, but the same code runs very fast on a 1080 Ti GPU. I tested the same code on a group of 3 Tesla M40s and on a single 1080 Ti, and the single 1080 Ti runs much faster than the 3 Tesla M40s. Sometimes the Volatile GPU-Util shows 100% even though no process is running. Furthermore, the power usage never exceeds 100 W. Is this a hardware problem?

The NVIDIA GPU Cloud images are only intended to be used with Pascal or Volta GPUs. The M40 is an earlier generation and is not supported by NGC, while the 1080 Ti, being a Pascal-based GPU, is supported. NGC only supports Pascal and Volta GPUs because they are far better suited to deep learning workloads and would be expected to be much faster.

Unless your M40 is showing some other symptoms, it is unlikely to be a hardware issue.

My guess is that this is a cross-posting of your similar question here:

[url]https://stackoverflow.com/questions/47483099/tesla-m40-run-extremely-slowly[/url]

Unless you are using an NGC container, posting that question here may give rise to confusion.

Apart from that subject, I would in general expect a DL framework or code to run faster on a 1080 Ti than on a single Tesla M40. The 1080 Ti has more compute throughput, as well as more memory bandwidth, than that older GPU.
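For scale, the per-GPU gap you would expect from the hardware alone can be sanity-checked against the published spec-sheet numbers. The figures below are approximate values from NVIDIA's public datasheets, used here as assumptions rather than measurements:

```python
# Rough per-GPU comparison from public spec sheets (approximate values;
# real training throughput depends heavily on the model and framework).
specs = {
    # GPU:         (FP32 TFLOPS, memory bandwidth GB/s)
    "Tesla M40":   (6.8, 288),
    "GTX 1080 Ti": (11.3, 484),
}

flops_ratio = specs["GTX 1080 Ti"][0] / specs["Tesla M40"][0]
bw_ratio = specs["GTX 1080 Ti"][1] / specs["Tesla M40"][1]

print(f"1080 Ti vs M40, compute:   {flops_ratio:.2f}x")  # ~1.66x
print(f"1080 Ti vs M40, bandwidth: {bw_ratio:.2f}x")     # ~1.68x
```

On paper the 1080 Ti is less than 2x faster per GPU on these metrics; real workloads can diverge from that, but it gives a rough baseline for how much of a slowdown the architecture alone should account for.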

Whether or not a code might run faster on 3 Tesla M40s compared to a single 1080 Ti will have a lot to do with the code itself. Unless your Keras/TensorFlow code is explicitly written to use multiple GPUs, running it on a machine with multiple GPUs may not give any benefit over a single GPU.
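To illustrate why multi-GPU speedup depends on the code: the usual data-parallel pattern splits each batch across the GPUs and then synchronizes (averages) their gradients. The following is a toy NumPy sketch of that pattern only, not how Keras actually implements it, with plain arrays standing in for GPUs:

```python
import numpy as np

# Toy sketch of data parallelism: each "replica" (standing in for one GPU)
# gets a shard of the batch, computes a gradient for a linear model, and
# the per-shard gradients are averaged -- the synchronization step whose
# overhead can eat the benefit of adding GPUs.
def grad(w, x, y):
    # Gradient of mean squared error for the model y ~ x @ w
    return 2 * x.T @ (x @ w - y) / len(x)

rng = np.random.default_rng(0)
x = rng.normal(size=(12, 4))   # one full batch of 12 samples
y = rng.normal(size=(12, 1))
w = np.zeros((4, 1))

n_replicas = 3                 # e.g. 3 M40s
shards = zip(np.array_split(x, n_replicas), np.array_split(y, n_replicas))
per_replica = [grad(w, xs, ys) for xs, ys in shards]
synced = np.mean(per_replica, axis=0)  # the all-reduce / averaging step

# With equal shard sizes this matches the single-GPU full-batch gradient.
assert np.allclose(synced, grad(w, x, y))
```

Each step still has to wait for the slowest replica and for the averaging, so if the per-GPU compute or the interconnect is slow, 3 GPUs can easily lose to 1 fast one.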

Thanks for your reply. I am using multiple GPUs through the official Keras API in [url]https://github.com/fchollet/keras/blob/master/keras/utils/training_utils.py[/url], so I do not think it is an issue with my code. In addition, one training epoch that takes half an hour on one 1080 Ti takes about 4 hours on the 3 GPUs, and the power usage never exceeds 100 W. As you can see, the GPU state shown at [url]https://stackoverflow.com/questions/47483099/tesla-m40-run-extremely-slowly[/url] is really strange, so is the large performance difference (0.5 hours per epoch on the 1080 Ti vs. 4 hours per epoch on 3 Tesla M40s) really just because of the GPU architecture?