The performance of GTX Titan Z is abnormal.

I have two kinds of CUDA devices: a K40m in a server machine and a Titan Z in my PC.
I expected the two devices to perform similarly, because their hardware specifications are similar.

According to the benchmark results below, their performance is similar:
http://ambermd.org/gpus/benchmarks.htm

But in my setup, the Titan Z's performance is almost half of the K40m’s.
I suspect something is wrong with my Titan Z setup.
What can I check to solve this problem?

My setup for the “GeForce GTX TITAN Z”:

  • Ubuntu 14.04.1
  • Installed the driver with apt-get:
    $ sudo apt-get install nvidia-346 -y
  • Downloaded the CUDA SDK installer cuda_6.5.14_linux_64.run and installed it:
    $ chmod 755 cuda_6.5.14_linux_64.run
    $ ./cuda_6.5.14_linux_64.run -extract=/downloads/nvidia_installers
    $ cd /downloads/nvidia_installers
    $ sudo ./cuda-linux64-rel-6.5.14-18749181.run

deviceQuery result:

Device 0: “Tesla K40m”
CUDA Driver Version / Runtime Version 6.5 / 6.5
CUDA Capability Major/Minor version number: 3.5
Total amount of global memory: 11520 MBytes (12079136768 bytes)
(15) Multiprocessors, (192) CUDA Cores/MP: 2880 CUDA Cores
GPU Clock rate: 745 MHz (0.75 GHz)
Memory Clock rate: 3004 Mhz

Device 0: “GeForce GTX TITAN Z”
CUDA Driver Version / Runtime Version 7.0 / 6.5
CUDA Capability Major/Minor version number: 3.5
Total amount of global memory: 6144 MBytes (6442254336 bytes)
(15) Multiprocessors, (192) CUDA Cores/MP: 2880 CUDA Cores
GPU Clock rate: 876 MHz (0.88 GHz)
Memory Clock rate: 3505 Mhz

What can I do?

What specifically are you running which shows the performance difference?

Is it a memory-bound or compute-bound application? For a memory-bound application the Titan-Z should be faster in most cases (K40: 288 GB/s; Titan-Z: 336 GB/s per GPU, 672 GB/s in total).
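
The bandwidth figures quoted above can be reproduced from the memory clocks in the deviceQuery output. A minimal sketch, assuming a 384-bit memory bus on both GPUs (the bus width is not shown in the posted output) and double-data-rate memory; the function name is only illustrative:

```python
def peak_bandwidth_gb_s(memory_clock_mhz, bus_width_bits):
    """Theoretical peak memory bandwidth in GB/s.

    clock (Hz) * 2 transfers per clock (DDR) * bus width in bytes.
    """
    bytes_per_transfer = bus_width_bits / 8
    return memory_clock_mhz * 1e6 * 2 * bytes_per_transfer / 1e9


# Memory clocks come from the deviceQuery output; the 384-bit bus is an assumption.
k40m = peak_bandwidth_gb_s(3004, 384)
titan_z = peak_bandwidth_gb_s(3505, 384)

print(f"Tesla K40m:        {k40m:.1f} GB/s")
print(f"TITAN Z (per GPU): {titan_z:.1f} GB/s")
```

These are theoretical peaks; measured bandwidth is normally lower on both cards.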

Is it operating on 32-bit or 64-bit values? I believe the Titan-Z has to be put into ‘DP’ mode for faster 64-bit operations (though I have no experience with those 2-in-1 GPUs, so I am not sure whether what is true for a ‘regular’ Titan applies to this situation).
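
To see why DP mode could matter this much, here is a rough peak-throughput estimate. It is a sketch under assumptions not stated in this thread: that each GK110 multiprocessor has 64 double-precision units, that a Titan outside DP mode runs double precision at 1/24 of its single-precision rate, and that the clock stays at the value deviceQuery reports (in DP mode the card may clock lower).

```python
def peak_gflops(multiprocessors, units_per_mp, clock_ghz):
    """Peak GFLOPS, counting one fused multiply-add (2 FLOPs) per unit per clock."""
    return multiprocessors * units_per_mp * 2 * clock_ghz


SP_CORES_PER_MP = 192                      # from the deviceQuery output
DP_UNITS_FULL = 64                         # assumption: 1/3 of the SP rate
DP_UNITS_DEFAULT = SP_CORES_PER_MP // 24   # assumption: 1/24 of the SP rate

# Multiprocessor counts and clocks are taken from the deviceQuery output.
k40m_dp = peak_gflops(15, DP_UNITS_FULL, 0.745)
titan_default_dp = peak_gflops(15, DP_UNITS_DEFAULT, 0.876)
titan_dp_mode = peak_gflops(15, DP_UNITS_FULL, 0.876)  # upper bound, same clock

print(f"K40m DP peak:            {k40m_dp:.0f} GFLOPS")
print(f"TITAN Z DP, default:     {titan_default_dp:.0f} GFLOPS per GPU")
print(f"TITAN Z DP, DP mode (<=): {titan_dp_mode:.0f} GFLOPS per GPU")
```

Under these assumptions a single Titan-Z GPU in its default mode would trail the K40m in double precision by a wide margin, and could match or exceed it once DP mode is enabled.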

Are you using only one of the two GPUs in the Titan-Z, while expecting the performance of two concurrent GPUs?
I would imagine that the Amber folks know how to split up work between the two GPUs, while you may not have made that code adjustment.

More specific information is needed to answer your question. Even then, only someone who has experience with that GPU will be able to comment.

  1. My code performs deep learning operations on image data. Most of the work is matrix multiplication with cuBLAS, so I think it is close to compute-bound.

  2. I tested with 64-bit (double) values.
    How can I change the Titan-Z to ‘DP’ mode?

  3. I am using only one of the two GPUs in the Titan-Z.
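
Whether a matrix multiplication is compute-bound can be estimated by comparing its arithmetic intensity with the GPU's ratio of peak FLOPs to memory bandwidth. A back-of-the-envelope sketch, assuming square n×n double-precision matrices that are each read or written exactly once, and illustrative peak figures (1430 GFLOPS, 288 GB/s) that are assumptions rather than measurements:

```python
def gemm_intensity(n, bytes_per_element=8):
    """FLOPs per byte for C = A @ B with square n x n matrices.

    2*n^3 FLOPs; at minimum A and B are read once and C is written once.
    """
    flops = 2 * n**3
    bytes_moved = 3 * n**2 * bytes_per_element
    return flops / bytes_moved


def is_compute_bound(n, peak_gflops, peak_gb_s):
    """True if the kernel's intensity exceeds the machine balance (FLOPs/byte)."""
    return gemm_intensity(n) > peak_gflops / peak_gb_s


# Hypothetical peak values, used only for illustration.
for n in (16, 256, 4096):
    print(n, round(gemm_intensity(n), 2), is_compute_bound(n, 1430.0, 288.0))
```

For the matrix sizes typical of deep learning layers this estimate lands on the compute-bound side, which is why the double-precision rate of the GPU dominates the result.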

What package(s) are you using? If it’s anything based off of Theano, I don’t believe 64-bit is even accelerated. At least it wasn’t when I looked into it a while back; perhaps things have changed. You might want to double-check before putting in the effort for DP mode.

On Linux you can use the Linux control panel (nvidia-settings) to enable DP mode on a Titan product.

(On Windows you can use the Windows control panel to enable DP mode on a Titan product.)

I haven’t used a Titan in quite a while, but I believe you can modify the product (on Linux) via nvidia-settings, and the modification should persist even if you don’t run X on the next reboot.

This is a long-winded way to address the question “how can I run nvidia-settings on Linux if I don’t have X running or if I don’t have an X-server running on the GPU?” One possible solution is to temporarily enable an X-server on the Titan GPU, then run nvidia-settings and make the modification for DP mode. After that, the DP mode selection should be persistent AFAIK, even if you reboot or subsequently disable the X-server.
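
If you would rather script the change than click through the GUI, something like the following might work. It is only a sketch: the attribute name GPUDoublePrecisionBoostImmediate is an assumption not given in this thread (check the output of `nvidia-settings -q all` on your system), the helper name is hypothetical, and an X-server still has to be running on the GPU as described above.

```python
import shutil
import subprocess
import sys


def dp_mode_command(gpu_index=0, enable=True):
    """Build the nvidia-settings call that toggles DP mode on one GPU.

    ASSUMPTION: the attribute name below is not confirmed in this thread;
    verify it with `nvidia-settings -q all` before relying on it.
    """
    value = 1 if enable else 0
    attribute = f"[gpu:{gpu_index}]/GPUDoublePrecisionBoostImmediate={value}"
    return ["nvidia-settings", "-a", attribute]


if __name__ == "__main__":
    cmd = dp_mode_command(gpu_index=0)
    print("Command:", " ".join(cmd))
    # Dry run by default; pass --apply to actually change the setting.
    if "--apply" in sys.argv:
        if shutil.which("nvidia-settings") is None:
            sys.exit("nvidia-settings not found on PATH")
        subprocess.run(cmd, check=True)
```

Run it once without arguments to see the command it would issue, then again with `--apply` from a session where the X-server is up. On a Titan-Z you would presumably repeat this for each GPU index.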

Thank you so much.

After enabling DP mode on the Titan, its performance exceeds the K40m’s.

I solved my problem. Thank you.