Final word on Titan X and TCC?

Ailleur · September 23, 2015, 4:20pm

Through various threads, it seems that some people can enable TCC on the Titan X, and some can’t. Is there a final say on whether it is possible to run the Titan X in TCC mode and, most importantly to me, to run calculations through remote desktop?

Thanks!

CudaaduC · September 23, 2015, 5:50pm

Got this to work with the Titan X on Windows 7 x64.

Open a command prompt as administrator, go to C:\Program Files\NVIDIA Corporation\NVSMI and type

nvidia-smi -g 1 -dm 1

This assumes that another GPU has the video out and the Titan X is in slot #1 (based on PCI-E location rather than compute capability).
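For anyone scripting this step, here is a minimal sketch that builds the same kind of command from Python. The helper names `build_dm_command` and `set_driver_model` are hypothetical. The sketch uses the `-i` spelling that appears later in this thread rather than `-g`, and it assumes `nvidia-smi` is on the PATH (otherwise pass the full path as `exe`).

```python
import subprocess

# Driver-model codes passed to `nvidia-smi -dm`: 0 = WDDM, 1 = TCC.
DRIVER_MODELS = {"WDDM": 0, "TCC": 1}


def build_dm_command(gpu_index, model="TCC", exe="nvidia-smi"):
    """Return the argv list asking nvidia-smi to switch one GPU's driver model."""
    if model not in DRIVER_MODELS:
        raise ValueError(f"unknown driver model: {model!r}")
    if gpu_index < 0:
        raise ValueError("gpu_index must be non-negative")
    # gpu_index is the nvidia-smi index (PCI-E order), not the CUDA device number.
    return [exe, "-i", str(gpu_index), "-dm", str(DRIVER_MODELS[model])]


def set_driver_model(gpu_index, model="TCC"):
    """Run the command. Needs an administrator prompt; reboot afterwards."""
    return subprocess.run(build_dm_command(gpu_index, model), check=True)
```

Calling `build_dm_command(1)` only builds the argument list, so it is safe to inspect before running anything; `set_driver_model(1)` actually invokes nvidia-smi.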

Have not noticed any performance benefit with TCC mode yet, but it does work for the Titan X.

robosmith · September 23, 2015, 6:27pm

Does TCC/Titan X only work with v7.5?

Cause I definitely need to use TCC drivers with my Titan X.

Ailleur · September 23, 2015, 7:28pm

CUDA toolkit version and TCC behavior should not be linked. The TCC property is dictated by the driver only as far as I know.
In other words, if it works for 7.5 it should work for 6.5, as long as the driver allows it.

Correct me if I’m wrong!

Edit: to make myself clearer, it could be that only the driver that shipped with CUDA 7.5 enables TCC for the Titan X; I do not know.

Also, based on http://devblogs.nvidia.com/parallelforall/new-features-cuda-7-5/, it would seem that all (?) cards now allow CUDA applications to be run through remote desktop, even GeForces. This would make the 980 Ti a cheaper alternative to the Titan X if you don’t need 12 GB of VRAM or other TCC features.

CudaaduC · September 23, 2015, 8:11pm

After switching to TCC mode and rebooting, I ran my usual benchmarks compiled against CUDA 6.5 and CUDA 7.5.

Like before, for integer-heavy applications CUDA 6.5 is faster than 7.5, and the TCC driver made no difference in running time. If anything it seems to be a bit slower now, but I am not sure if that is due to the updated driver or the TCC mode (which is unlikely).

For some floating point applications CUDA 7.5 is faster, but overall a mixed bag. I am still more concerned about the long cudaMalloc times associated with the later Nvidia drivers, as that really is a bigger issue.

robosmith · September 23, 2015, 9:01pm

So TCC “mode” is not the same as TCC drivers for a Tesla card.

Disappointing cause we’re getting much better performance on 1/2 x K80 than on Titan X for some functions (all float). I assume it is due to drivers.

You were able to run TCC mode in CUDA v6.5?

CudaaduC · September 23, 2015, 9:13pm

When I said TCC “mode” I meant that this was the driver used by the Titan X.

I have both CUDA 6.5 and CUDA 7.5 on my Windows 7 system, and the driver dll version as shown by CUDA-Z is for 7.5.

That said, when I compile code I can link against either CUDA 6.5 or CUDA 7.5, and both executables will run using the TCC driver for the Titan X.

Not sure if that answers your question. I will be playing with this more today, and if I find anything interesting or useful I will report back.

Robert_Crovella · September 24, 2015, 1:21am

The ability to set TCC mode on a Titan family product is dependent on having a new (enough) driver. The driver that ships with CUDA 7.5 production release should support this capability. It should not be dependent on CUDA toolkit version.

CudaaduC · October 1, 2015, 4:41pm

One thing I noticed when I made the switch to the TCC driver for the Titan X is that the pageable memory copy speed increased by about 50% in both directions (device-host and host-device).

This measurement was made via CUDA-Z for the Titan X in my current system:

http://imgur.com/f5HEGQB

The pinned number did not change but the pageable numbers went up from ~5000 each direction up to 7000-8000.

A positive result assuming it is being measured correctly. Any obvious reason for this increase?

njuffa · October 1, 2015, 5:09pm

I assume you have made sure that these are not measurement artifacts (e.g. running with a dual socket system without specifying CPU and memory affinity, so the CPU memory used could either be attached to the near or the far CPU)? How big are the individual transfers used to measure the transfer speed? Blocks of 16 MB or more are usually needed to measure close to peak rates.

I do not have a good understanding of WDDM. I am under the impression that it virtualizes GPU memory, creating a backing store in CPU memory that then allows the OS to demand page GPU memory. One could imagine that such virtualization causes additional overhead when doing CPU/GPU transfers, e.g. by breaking up a transfer into multiple smaller DMA operations.

A transfer using pinned memory benefits from the fact that there is a single contiguous mapping of virtual to physical memory, meaning a transfer requires a single DMA operation and nothing more. So it makes sense that WDDM vs TCC doesn’t matter for those transfers.
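To illustrate the two points above (small blocks under-report peak rate, and splitting a copy into several DMA operations adds overhead), here is a toy model, not a measurement. It assumes every DMA operation costs a fixed setup time plus size divided by peak rate. The 12 GB/s peak, the 10 µs overhead, and the 64 KiB chunk size are made-up illustrative numbers, and `effective_bandwidth` is a hypothetical helper.

```python
import math


def effective_bandwidth(size_bytes, peak_bytes_per_s, overhead_s, chunk_bytes=None):
    """Bytes per second under the toy model: time = n_ops * overhead + size / peak."""
    # One DMA operation for the whole copy, or one per chunk if the copy is split.
    n_ops = 1 if chunk_bytes is None else math.ceil(size_bytes / chunk_bytes)
    elapsed = n_ops * overhead_s + size_bytes / peak_bytes_per_s
    return size_bytes / elapsed


PEAK = 12e9        # assumed peak rate in bytes/s (illustrative only)
OVERHEAD = 10e-6   # assumed fixed cost per DMA operation in seconds (illustrative only)

for size in (4 * 1024, 1024 ** 2, 16 * 1024 ** 2):
    single = effective_bandwidth(size, PEAK, OVERHEAD)
    split = effective_bandwidth(size, PEAK, OVERHEAD, chunk_bytes=64 * 1024)
    print(f"{size:>10d} bytes: single {single / PEAK:6.1%}, chunked {split / PEAK:6.1%} of peak")
```

Under these assumptions a single large copy approaches the peak rate while the chunked copy stays well below it. Whether WDDM actually splits pageable transfers this way is the open question in this post; the model only shows that such splitting would be enough to explain a gap.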

mirkoj · January 30, 2016, 10:40pm

Hello,
does anyone know if there is any way to control fan speed when a Titan or Titan X is in TCC mode?

Right now I’m able to turn on TCC mode, but when I do GPU rendering the cards get hot quickly, the fans only go up to 40%, and throttling starts really fast.
Any options there, please?
Thanks

sobe118 · April 29, 2016, 5:32am

I noticed that everyone mentioned they used Windows 7 for TCC.
I attempted to use Windows 8.1 and on boot, with TCC enabled on 3 out of 4 GPUs, the computer crashed about 9 times before it started up stable. (The only reason I tried ~9 times was that I couldn’t find the safe mode boot option.)

I found a page that tells the versions of windows that it runs on.
http://www.nvidia.com/object/software-for-tesla-products.html

It doesn’t mention Windows 8 versions. Has anyone had success on Windows 8 with TCC?

Robert_Crovella · April 29, 2016, 1:39pm

Windows 7, 8.1, and 10 should all be supported.

The specific support matrix is covered in the windows installation guide:

Installation Guide Windows :: CUDA Toolkit Documentation

Many people use CUDA successfully on Windows 8.1.

sobe118 · April 30, 2016, 1:46am

My question was about the TCC mode of CUDA specifically.
I see I left that out of the question, my fault, but this thread is about TCC.

TCC mode on the 3 GPUs is working well in Windows 8.1.
The code is about 50% faster, as someone else noticed. (The code was previously bound by PCI-E transfers.)
I’m just scared to restart my PC as it was hard to get a stable boot.

CudaaduC · April 30, 2016, 1:55am

While I have not done this myself with Windows 8.1, it should be the same process as Windows 7.

Keep in mind that NVSMI orders the GPUs by PCI-E slot rather than compute capability, so the number that you may see in the device query can be different than the actual PCI-E layout which is used by NVSMI.

That may or may not be your issue. One way to check whether it worked is the CUDA-Z utility: it shows the ‘Driver Version’ for each Titan X, and if that particular Titan X is in ‘TCC’ mode you will see (TCC) after the driver version.
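Another way to check, sketched here under some assumptions, is to ask nvidia-smi itself through its `--query-gpu` interface, which exposes a `driver_model.current` field. The helper names are hypothetical, and the sketch assumes that field reports the strings `WDDM` or `TCC`; the two sample rows in the usage note are made up to show the expected shape, not captured output.

```python
import csv
import subprocess

QUERY_FIELDS = ["index", "name", "pci.bus_id", "driver_model.current"]


def query_command(exe="nvidia-smi"):
    """Build the nvidia-smi query that lists each GPU with its current driver model."""
    return [exe, "--query-gpu=" + ",".join(QUERY_FIELDS), "--format=csv,noheader"]


def parse_driver_models(csv_text):
    """Turn nvidia-smi CSV rows into dicts keyed by the queried field names."""
    rows = csv.reader(csv_text.strip().splitlines(), skipinitialspace=True)
    return [dict(zip(QUERY_FIELDS, (cell.strip() for cell in row))) for row in rows if row]


def gpus_in_tcc(csv_text):
    """Return the nvidia-smi indices of GPUs whose current driver model is TCC."""
    return [int(gpu["index"]) for gpu in parse_driver_models(csv_text)
            if gpu["driver_model.current"] == "TCC"]


def read_driver_models():
    """Run the query on this machine and return the parsed rows."""
    result = subprocess.run(query_command(), capture_output=True, text=True, check=True)
    return parse_driver_models(result.stdout)
```

For example, feeding `gpus_in_tcc` the made-up text `"0, GeForce GTX TITAN X, 00000000:01:00.0, WDDM\n1, GeForce GTX TITAN X, 00000000:02:00.0, TCC"` returns `[1]`. The indices here follow the nvidia-smi (PCI-E) ordering, not the CUDA device numbering.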

CudaaduC · April 30, 2016, 2:04am

Another point would be that if all the GPUs are of the same type, the CUDA numbering seems to favor the GPU in TCC mode (or at least that is what happens on my system).

On my Windows 7 system I have two GTX Titan X GPUs. The one in PCI-E slot 0 is connected to the display and in WDDM mode, while the Titan X in PCI-E slot 1 I put in TCC mode.
When I run the device query it numbers the Titan X in TCC mode as “GPU #0” and the other Titan X as “GPU #1”, even though that does not match the PCI-E numbering. This also shows up the same way in the CUDA-Z output.
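If the mismatch between CUDA numbering and PCI-E numbering is a problem, one option worth trying is the `CUDA_DEVICE_ORDER` environment variable: setting it to `PCI_BUS_ID` asks CUDA to enumerate devices by PCI bus ID instead of its default ordering. A minimal sketch follows, with hypothetical helper names. It assumes the installed driver and toolkit honor the variable, which is worth verifying on older setups such as CUDA 6.5.

```python
import os
import subprocess


def env_with_pci_order(base_env=None):
    """Copy an environment and ask CUDA to enumerate devices by PCI bus ID."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    return env


def run_with_pci_order(argv):
    """Launch a CUDA program (argv is supplied by the caller) with PCI-ordered numbering."""
    return subprocess.run(argv, env=env_with_pci_order(), check=True)
```

The variable has to be set before the CUDA program starts, which is why the sketch passes it through the child's environment rather than changing it inside an already running process.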

wangyu099508 · September 3, 2018, 9:56am

Sorry to bother you guys. I have a Titan V on Ubuntu 16.04, and the card driver is 387.34. I want to put the Titan V into TCC mode by using “nvidia-smi -i 0 -dm 1”, but the system showed:

Changing driver models is not supported for GPU 00000000:65:00.0 on this platform.

I tried running the command as root, but it still does not work. I use the Titan V for deep learning, not for games. Please help me.

cbuchner1 · September 3, 2018, 12:38pm

TCC is a Windows-only feature. It is meant for OSes that have made the WDDM driver model mandatory for graphics drivers (Windows 8 and later AFAIK). This means Windows manages the GPU resources (including memory), which comes with some implicit performance problems and restrictions on the largest GPU memory allocations.

TCC is a Windows driver that does not enable graphics output and is only compatible with some higher end GPUs (Tesla, some Titan cards).

The closest thing you can do on Linux is to simply not run an X server on the specific GPU. This should disable any kernel launch timeouts.