Single GPU local kernel debug

I tried to debug an example project (matrixMul) using CUDA Toolkit 5 and Nsight 3 (VS2012, GTS 450, driver 310.90). VS connects to the Nsight Monitor correctly, and execution stops at the breakpoint set in the kernel. However, single-stepping does not work: CPU usage goes to 100% and nothing happens. Stopping the debugging session makes the system unresponsive (probably because TDR is turned off).
Does anyone have experience with single-GPU local debugging?

Do you see the same issue when debugging the CUDA samples? And I’m assuming you are using VS2010?

Yes, I use VS2010, sorry.

I tried the matrixMul sample from the CUDA Toolkit and from the Nsight samples zip file. Debugging fails for both. I also tried the same software setup on a dual-GPU system (Intel + Optimus GT 540M), where everything works correctly. I will also try a different single-GPU system soon.

What brand is your GTS 450, and which version and bitness of Windows are you running?

My GTS 450 is made by Gigabyte, and I use Win7 32-bit. I tried reinstalling everything, including the OS, but it still does not work.

I also tried it on my home desktop, which has a GTX 460 (integrated Intel GPU disabled) and Win7 64-bit; debugging works fine there.

Update: kernel debugging works on Win7 64-bit with the same computer and software versions.

Hi Tessier,
Sorry you are running into this issue.
So you think you narrowed it down to the following:
Config:

  • Win 7 32-bit
  • GTS 450 (only card on the system)
  • Driver: 310.90

Repro: any debug application (32-bit of course) hits the breakpoint you set (anywhere specific in the code?) without a problem, but single stepping fails (system hang).

Can you confirm?

Yes, breakpoint is hit, I can watch variable values, etc.
Clarification: after pressing single step, nothing happens, but in my latest tests only the execution “hangs”: CUDA debugging can be stopped and the system remains usable. I could not systematically reproduce the complete system hang. After the OS reinstall, TDR was no longer disabled, but it did not seem to trigger either, so the complete hang may not be a real issue.

I used the simple matrixMul example, with the breakpoint set in the inner for loop (line 103, Csub += As[ty][k] * Bs[k][tx];). I commented out the unroll pragma, just in case. I am using CUDA Toolkit 5 and Nsight 3.
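
To make the breakpoint location concrete, here is a minimal host-side C++ sketch of the same tiled accumulation. It is not the sample's source: the name tiledElement, the tile width kTile, and the flat row-major layout are assumptions made for illustration, and the shared-memory tiles As and Bs are replaced by direct reads from A and B.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tile width for this sketch (not taken from the sample).
constexpr std::size_t kTile = 4;

// Host-side emulation of the work one GPU thread does for element
// (row, col) of C = A * B, with square n x n row-major matrices.
float tiledElement(const std::vector<float>& A, const std::vector<float>& B,
                   std::size_t n, std::size_t row, std::size_t col) {
    assert(n % kTile == 0);
    assert(A.size() == n * n && B.size() == n * n);
    assert(row < n && col < n);

    float Csub = 0.0f;
    // Outer loop: walk across the tiles of A's row band and B's column band.
    for (std::size_t tile = 0; tile < n; tile += kTile) {
        // In a CUDA kernel, the block would stage these tiles in shared
        // memory (As, Bs) and synchronize before the inner loop.
        for (std::size_t k = 0; k < kTile; ++k) {
            // Counterpart of: Csub += As[ty][k] * Bs[k][tx];
            Csub += A[row * n + tile + k] * B[(tile + k) * n + col];
        }
    }
    return Csub;
}
```

Stepping through the inner loop with a host debugger shows Csub growing by one product per iteration, which is the behavior single-stepping in the kernel should show for a single thread.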

I had a similar problem. I could set breakpoints but couldn’t step. I could see variable values, but if I stepped, the program would end.

Now it’s worse: when I hit a breakpoint, my screen goes black and I can’t see anything until TDR kicks in. When the screen comes back, I see that I’m at the CUDA breakpoint I set, but it’s not possible to do anything useful, as the GPU has been reset by Windows at that point.

Profiling seems to work OK, though; I just can’t debug.

System info:

  • Win7 Enterprise 64-bit, SP1
  • GeForce GTX 480 (from GTC 2012)
  • Driver version 314.07
  • Nsight Visual Studio Edition 2.2.0.12313
  • Visual Studio 2010 Pro, SP1

Never mind. I just realized that Nsight 2.2 is not compatible with CUDA 5.0. I’m installing Nsight 3.0 now…