Two Quadro P4000. deviceProp.cooperativeLaunch only true if both in TCC. Windows 10, CUDA 9.2

Hi,

my computer has two Quadro P4000 cards installed, so far both in WDDM mode. I successfully run CUDA computations on the second one by calling cudaSetDevice(1), and Nsight Visual Studio Edition confirms that CUDA only runs on the second device.
Now I would like to use grid synchronization, for which, as far as I understand, the device needs to be in TCC mode (which is also reported to reduce latency, something my program would benefit from). I switched the second device to TCC by running

C:\Program Files\NVIDIA Corporation\NVSMI>nvidia-smi -g 1 -dm 1

and rebooting. I then compiled the conjugateGradientMultiBlockCG sample using Visual Studio 2017 and added

devID = 1;
	checkCudaErrors(cudaSetDevice(devID));

after line 396 to use device 1. However, I get the following error:
Selected GPU (1) does not support Cooperative Kernel Launch, Waiving the run
caused by

!deviceProp.cooperativeLaunch

Out of curiosity, I also tried with both GPUs in TCC mode, using my onboard GPU for graphics. In that case, conjugateGradientMultiBlockCG runs successfully out of the box on the first GPU, and on the second GPU after adding the two lines mentioned above. Since I also need graphical output from one of the GPUs, having both in TCC is not feasible.

Is this expected behavior? Is there a way to enable TCC on only one GPU and have it support Cooperative Kernel Launch?

Also, I noticed that with TCC enabled on the second GPU, performance is worse than with both GPUs in WDDM. In Nsight it looked like something else was being run on the second GPU and breaking up the tight packing of my kernels there. Could that be caused by DirectX being used on the first GPU in WDDM?

Some more details:

I noticed that nvidia-smi reports CUDA 10.2 and an infoROM corruption:

C:\Program Files\NVIDIA Corporation\NVSMI>nvidia-smi
Tue Jul 16 16:39:49 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 431.02       Driver Version: 431.02       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro P4000       WDDM  | 00000000:01:00.0  On |                  N/A |
| 47%   45C    P0    29W / 105W |    428MiB /  8192MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Quadro P4000        TCC  | 00000000:03:00.0 Off |                  N/A |
| 47%   40C    P8     5W / 105W |      0MiB /  8117MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1504    C+G   Insufficient Permissions                   N/A      |
|    0      4772    C+G   ...t_cw5n1h2txyewy\ShellExperienceHost.exe N/A      |
|    0      8800    C+G   ...11411.0_x64__8wekyb3d8bbwe\Video.UI.exe N/A      |
|    0      9200    C+G   ...48.51.0_x64__kzf8qxf38zg5c\SkypeApp.exe N/A      |
|    0      9572    C+G   C:\Windows\explorer.exe                    N/A      |
|    0     10436    C+G   ....451.0_x64__8wekyb3d8bbwe\YourPhone.exe N/A      |
|    0     10468    C+G   ...dows.Cortana_cw5n1h2txyewy\SearchUI.exe N/A      |
|    0     13640    C+G   ...0076.0_x64__8wekyb3d8bbwe\onenoteim.exe N/A      |
|    0     13868    C+G   ...4.0_x64__8wekyb3d8bbwe\WinStore.App.exe N/A      |
|    0     14304    C+G   ...hell.Experiences.TextInput.InputApp.exe N/A      |
+-----------------------------------------------------------------------------+
WARNING: infoROM is corrupted at gpu 0000:01:00.0

However, I am pretty sure I uninstalled CUDA 10 using a Windows System Restore point.
For example, the Visual Studio project configuration reports a CUDA C/C++ > Command Line of

# Driver API (NVCC Compilation Type is .cubin, .gpu, or .ptx)
set CUDAFE_FLAGS=--sdk_dir "C:\Program Files (x86)\Windows Kits\10\"
"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.2\bin\nvcc.exe" --use-local-env -ccbin "C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\VC\Tools\MSVC\14.16.27023\bin\HostX86\x64" -x cu -rdc=true -I./ -I../../common/inc   -G   --keep-dir x64\Debug -maxrregcount=0  --machine 64 --compile -cudart static  -o x64/Debug/%(Filename)%(Extension).obj "%(FullPath)"

# Runtime API (NVCC Compilation Type is hybrid object or .c file)
set CUDAFE_FLAGS=--sdk_dir "C:\Program Files (x86)\Windows Kits\10\"
"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.2\bin\nvcc.exe" --use-local-env -ccbin "C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\VC\Tools\MSVC\14.16.27023\bin\HostX86\x64" -x cu -rdc=true -I./ -I../../common/inc   -G   --keep-dir x64\Debug -maxrregcount=0  --machine 64 --compile -cudart static  -g   -DWIN32 -Xcompiler "/EHsc  /nologo  /FS /Zi  /MTd " -o x64/Debug/%(Filename)%(Extension).obj "%(FullPath)"

You might need to set devID to zero. Changing from WDDM to TCC may cause a modification of the CUDA enumeration order.

I wouldn’t be surprised if your TCC device is at devID 0 now, and your WDDM device has moved to 1. (Note that this is not necessarily the same as the order in nvidia-smi, which is PCI enumeration order.)

You can confirm CUDA enumeration order using deviceQuery sample code.
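
If you would rather not build the whole deviceQuery sample, a small standalone program along these lines should show the same thing. This is only a sketch that I have not run on your configuration: it assumes CUDA 9.0 or later (where cudaDeviceProp has the cooperativeLaunch field), and the variable name firstCooperative is just something made up for illustration. It lists each device in CUDA enumeration order together with its PCI bus ID, so you can match it against the Bus-Id column of nvidia-smi, and it remembers the first device that reports cooperative launch support.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Hypothetical helper variable: index of the first device that
    // reports cooperative launch support, or -1 if there is none.
    int firstCooperative = -1;

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "cudaGetDeviceProperties(%d) failed: %s\n",
                         dev, cudaGetErrorString(err));
            return 1;
        }

        // 'dev' is the CUDA enumeration index (what cudaSetDevice expects).
        // The PCI bus ID is what nvidia-smi shows in its Bus-Id column.
        std::printf("CUDA device %d: %s, PCI bus 0x%02x, tccDriver=%d, cooperativeLaunch=%d\n",
                    dev, prop.name, prop.pciBusID, prop.tccDriver, prop.cooperativeLaunch);

        if (firstCooperative < 0 && prop.cooperativeLaunch) {
            firstCooperative = dev;
        }
    }

    if (firstCooperative < 0) {
        std::printf("No device reports cooperative launch support.\n");
        return 0;
    }

    // Select the device by capability instead of by a hard-coded index.
    err = cudaSetDevice(firstCooperative);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaSetDevice(%d) failed: %s\n",
                     firstCooperative, cudaGetErrorString(err));
        return 1;
    }
    std::printf("Selected CUDA device %d for cooperative launch.\n", firstCooperative);
    return 0;
}
```

If the device with tccDriver=1 shows up at index 0 here, then devID = 1 in your modified sample is selecting the WDDM card, which would explain the "does not support Cooperative Kernel Launch" message.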