Through various threads, it seems that some people can enable TCC on the Titan X and some can't. Is there a final word on whether it is possible to run the Titan X in TCC mode and, most importantly to me, to run calculations through Remote Desktop?
CUDA toolkit version and TCC behavior should not be linked. The TCC property is dictated by the driver only as far as I know.
In other words, if it works for 7.5 it should work for 6.5, as long as the driver allows it.
Correct me if I’m wrong!
Edit, to make myself clearer: it could be that only the driver that shipped with CUDA 7.5 enables TCC for the Titan X; I do not know.
Also, based on http://devblogs.nvidia.com/parallelforall/new-features-cuda-7-5/, it would seem that all (?) cards now allow CUDA applications to be run through Remote Desktop, even GeForces. This would make the 980 Ti a cheaper alternative to the Titan X if you don't need 12 GB of VRAM or the other TCC features.
After switching to TCC mode and rebooting, I ran my usual benchmarks compiled against CUDA 6.5 and CUDA 7.5.
As before, CUDA 6.5 is faster than 7.5 for integer-heavy applications, and the TCC driver made no difference in running time. If anything it seems to be a bit slower now, but I am not sure whether that is due to the updated driver or to TCC mode (which is unlikely).
For some floating-point applications CUDA 7.5 is faster, but overall it is a mixed bag. I am still more concerned about the long cudaMalloc times associated with the later NVIDIA drivers, as that really is the bigger issue.
The ability to set TCC mode on a Titan-family product depends on having a new enough driver. The driver that ships with the CUDA 7.5 production release should support this capability. It should not depend on the CUDA toolkit version.
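For reference, a minimal sketch of the switch using nvidia-smi (the device index 0 is an assumption; check your own layout first, and run from an elevated prompt):

```shell
# List GPUs and their current driver model (WDDM or TCC)
nvidia-smi --query-gpu=index,name,driver_model.current --format=csv

# Put GPU 0 into TCC mode (Administrator prompt required), then reboot
nvidia-smi -i 0 -dm 1

# To revert to WDDM later:
# nvidia-smi -i 0 -dm 0
```

A reboot is needed before the new driver model takes effect.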
One thing I noticed when I made the switch to the TCC driver for the Titan X is that the pageable memory copy speed increased by about 50% in both directions (device-to-host and host-to-device).
This measurement was made via CUDA-Z for the Titan X in my current system.
I assume you have made sure that these are not measurement artifacts (e.g. running with a dual socket system without specifying CPU and memory affinity, so the CPU memory used could either be attached to the near or the far CPU)? How big are the individual transfers used to measure the transfer speed? Blocks of 16 MB or more are usually needed to measure close to peak rates.
I do not have a good understanding of WDDM. I am under the impression that it virtualizes GPU memory, creating a backing store in CPU memory that then allows the OS to demand page GPU memory. One could imagine that such virtualization causes additional overhead when doing CPU/GPU transfers, e.g. by breaking up a transfer into multiple smaller DMA operations.
A transfer using pinned memory benefits from the fact that there is a single contiguous mapping of virtual to physical memory, meaning a transfer requires a single DMA operation and nothing more. So it makes sense that WDDM vs TCC doesn’t matter for those transfers.
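One way to check this on your own machine is the bandwidthTest sample that ships with the CUDA toolkit, comparing pageable against pinned host memory (the executable location varies by install; the device index is an assumption):

```shell
# From the CUDA samples' bandwidthTest build directory.
# Pageable host memory: transfers go through intermediate staging copies
bandwidthTest --memory=pageable --device=0

# Pinned host memory: a single contiguous DMA transfer,
# typically much closer to peak PCI-E rates
bandwidthTest --memory=pinned --device=0
```

Running both under WDDM and then under TCC should make any driver-model effect on pageable transfers visible.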
Hello,
does anyone know of any way to control the fan speed when a Titan or Titan X is in TCC mode?
Right now I am able to turn on TCC mode, but when I do GPU rendering the cards get hot quickly, the fans only go up to 40%, and throttling starts really fast.
Any options there, please?
Thanks
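I am not aware of a supported way to force the fan curve through nvidia-smi on these cards, but one workaround is to lower the board power limit so the card heats up more slowly and throttles less abruptly. A sketch (the 200 W value is an assumption; stay within the range your card reports):

```shell
# Show the supported power-limit range for GPU 0
nvidia-smi -i 0 -q -d POWER

# Lower the power limit (Administrator/root required; the value must be
# within the Min/Max Power Limit reported above)
nvidia-smi -i 0 -pl 200
```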
I noticed that everyone who mentioned using TCC here was on Windows 7.
I attempted it on Windows 8.1, and on boot, with TCC enabled on 3 out of 4 GPUs, the computer crashed about 9 times before it started up stable. (The only reason I tried ~9 times was that I didn't find the safe-mode boot option.)
My question was about the TCC mode of CUDA specifically. I see I left that out of the question, my fault, but this thread is about TCC.
TCC mode on the 3 GPUs is working well in Windows 8.1.
The code is about 50% faster, as someone else noticed. (The code was previously bound by PCI-E transfers.)
I'm just scared to restart my PC, as it was hard to get a stable boot.
While I have not done this myself with Windows 8.1, it should be the same process as on Windows 7.
Keep in mind that NVSMI orders the GPUs by PCI-E slot rather than by compute capability, so the numbering you see in the device query can differ from the PCI-E layout that NVSMI uses.
That may or may not be your issue. One way to check whether it worked is the CUDA-Z utility: it specifies the 'Driver Version' for each Titan X, and if that particular Titan X is in TCC mode you will see '(TCC)' after the driver version.
Another point: if all the GPUs are of the same type, the CUDA numbering seems to favor the GPU in TCC mode (or at least that is what happens on my system).
On my Windows 7 system I have two GTX Titan X GPUs. In PCI-E slot 0 is the one connected to the display, running in WDDM mode, while the Titan X in PCI-E slot 1 I put in TCC mode.
In the device query, the Titan X in TCC mode is numbered "GPU 0" and the other Titan X "GPU 1", even though that does not match the PCI-E numbering. This also shows up the same way in the CUDA-Z output.
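If that mismatch is confusing, the CUDA runtime can be told to enumerate devices by PCI bus ID (matching the nvidia-smi ordering) instead of the default fastest-first order, via the CUDA_DEVICE_ORDER environment variable. A sketch for a Windows command prompt (`your_app.exe` is a placeholder):

```shell
rem Default ordering is FASTEST_FIRST; force PCI bus-ID ordering instead,
rem so CUDA device numbers line up with the nvidia-smi/PCI-E layout
set CUDA_DEVICE_ORDER=PCI_BUS_ID
your_app.exe
```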
Sorry to bother you guys. I have a Titan V on Ubuntu 16.04; the card driver is 387.34. I want to put the Titan V into TCC mode using "nvidia-smi -i 0 -dm 1", but the system showed:
Changing driver models is not supported for GPU 00000000:65:00.0 on this platform.
I tried running the command as the root user, but it still does not work. I use the Titan V for deep learning, not for gaming. Please help me.
TCC is a Windows-only feature. It is meant for OSes that have made the WDDM driver model mandatory for graphics drivers (Windows 8 and later AFAIK). This means Windows manages the GPU resources (including memory), which comes with some implicit performance problems and restrictions on the largest GPU memory allocations.
TCC is a Windows driver that does not enable graphics output and is only compatible with some higher end GPUs (Tesla, some Titan cards).
The closest thing you can do on Linux is to simply not run an X server on the specific GPU. This should disable any kernel launch timeouts.
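As a sketch of the Linux-side checks (GPU index 0 is an assumption): confirm that no display is attached to the compute GPU, and optionally enable persistence mode so the driver stays initialized between jobs:

```shell
# Check whether a display holds the GPU and whether persistence is on
nvidia-smi -i 0 -q | grep -i -E "Display|Persistence"

# Enable persistence mode (root required) so the driver stays loaded
# even when no process is using the GPU
sudo nvidia-smi -i 0 -pm 1
```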