Hi,
We are currently migrating our platform from AWS to Azure and have run into an issue with multiple contexts on Azure M60s using CUDA 7.5. All the CUDA code we have tried, even a simple 'Hello World', hangs when the 9th or later context is opened on the same GPU.
The test case is a 'Hello World' that opens a context and then pauses. The first 8 processes initialise OK, show as using 73 MiB on the card and run normally. The 9th and later processes block and show as using only 1 MiB. If the first 8 processes exit, the blocked processes do not recover. I can find no resource limitation that could be causing this. All suggestions welcome!
```
Sun Mar 26 14:57:33 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.26                 Driver Version: 375.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M60           Off  | 89AE:00:00.0     Off |                  Off |
| N/A   43C    P8    15W / 150W |      2MiB /  8123MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M60           Off  | 8C45:00:00.0     Off |                  Off |
| N/A   43C    P0    39W / 150W |    601MiB /  8123MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    1     12387    C   ./a.out                                         73MiB |
|    1     12396    C   ./a.out                                         73MiB |
|    1     12403    C   ./a.out                                         73MiB |
|    1     12412    C   ./a.out                                         73MiB |
|    1     12419    C   ./a.out                                         73MiB |
|    1     12424    C   ./a.out                                         73MiB |
|    1     12431    C   ./a.out                                         73MiB |
|    1     12438    C   ./a.out                                         73MiB |
|    1     12445    C   ./a.out                                          1MiB |
|    1     12480    C   ./a.out                                          1MiB |
|    1     12652    C   a.out                                            1MiB |
|    1     12659    C   a.out                                            1MiB |
+-----------------------------------------------------------------------------+
```
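For reference, here is a minimal sketch of a test of this kind. It is an approximation, not the exact program we ran: the kernel, the helper `open_context_and_run` and the optional device-index argument are illustrative assumptions. Each instance selects a device, forces context creation with `cudaFree(0)` (the runtime creates contexts lazily), runs one trivial kernel, prints the free memory reported by `cudaMemGetInfo`, and then blocks in `pause()` so the context stays open.

```cuda
// hello_context.cu: sketch of an "open a context, then pause" test.
// Assumed build: nvcc -o a.out hello_context.cu
#include <cstdio>
#include <cstdlib>
#include <unistd.h>        // getpid(), pause()
#include <cuda_runtime.h>

// Trivial kernel: each thread writes its own index.
__global__ void hello_kernel(int *out) {
    out[threadIdx.x] = threadIdx.x;
}

// Prints the failing call and returns true if err is not cudaSuccess.
static bool failed(cudaError_t err, const char *what) {
    if (err == cudaSuccess) return false;
    fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
    return true;
}

// Hypothetical helper: creates a context on `device` and runs one kernel.
// Returns 0 on success, 1 on any CUDA error. The context is left open.
int open_context_and_run(int device) {
    if (failed(cudaSetDevice(device), "cudaSetDevice")) return 1;
    // Contexts are created lazily; cudaFree(0) forces creation here,
    // so a hang during context creation would show up at this call.
    if (failed(cudaFree(0), "cudaFree(0)")) return 1;

    int *d_out = nullptr;
    if (failed(cudaMalloc(&d_out, 32 * sizeof(int)), "cudaMalloc")) return 1;
    hello_kernel<<<1, 32>>>(d_out);
    if (failed(cudaGetLastError(), "kernel launch") ||
        failed(cudaDeviceSynchronize(), "cudaDeviceSynchronize")) {
        cudaFree(d_out);
        return 1;
    }
    cudaFree(d_out);

    // Report free/total device memory as seen from this process.
    size_t free_b = 0, total_b = 0;
    if (failed(cudaMemGetInfo(&free_b, &total_b), "cudaMemGetInfo")) return 1;
    printf("pid %d: device %d OK, %zu of %zu MiB free\n",
           (int)getpid(), device, free_b >> 20, total_b >> 20);
    return 0;
}

int main(int argc, char **argv) {
    // Device index as seen by CUDA (may differ from nvidia-smi's numbering).
    int device = (argc > 1) ? atoi(argv[1]) : 0;
    if (open_context_and_run(device) != 0) return EXIT_FAILURE;
    fflush(stdout);
    pause();  // hold the context open until the process is signalled
    return EXIT_SUCCESS;
}
```

Starting several copies in the background (e.g. `./a.out 1 &` a dozen times) and then running `nvidia-smi` gives a per-process listing comparable to the one above. CUDA's device numbering can differ from nvidia-smi's unless `CUDA_DEVICE_ORDER=PCI_BUS_ID` is set.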