Nsight VSE 5.3 cannot debug in VS2015
Hello, I have
Win7 Home SP1 x64
Visual Studio Community 2015 (14.0.25431.01 Update 3)
CUDA 8.0
Nsight VSE 5.3 (build 5.3.0.17162)
CPU: i5-3570K @3.4GHz
Prim. GPU: Intel HD 4000 (on CPU)
Sec. GPU : GeForce GTX 660 (2GB)
Driver: NVIDIA 384.76 = (22.21.13.8476)

I'm trying the CUDA Debugger tutorial:
http://docs.nvidia.com/nsight-visual-studio-edition/5.3/Content/Using_CUDA_Debugger.htm
=> matrixMul_vc100.vcxproj

I did all steps incl.
- Rebuild matrixMul (Debug, win32)
- set breakpoints
- Ex3: Start Nsight Monitor
- Start CUDA Debugging

Output from Nsight:

CUDA context created : 0053e8d8
CUDA module loaded: 058d0e08 matrixMul.cu


Output in CMD window:

[Matrix Multiply Using CUDA] - Starting...
GPU Device 0: "GeForce GTX 660" with compute capability 3.0

MatrixA(320,320), MatrixB(640,320)
Computing result using CUDA Kernel...


and nothing happened for 10 minutes.

If I run it without the debugger, it immediately shows:

done

and after a few seconds it shows the rest:

Performance= 42.83 GFlop/s, Time= 3.060 msec, Size= 131072000 Ops, WorkgroupSize= 1024 threads/block
Checking computed result for correctness: OK

Note: For peak performance, please refer to the matrixMulCUBLAS example.
Press any key to continue . . .


Any hint on what to do to be able to debug the CUDA code? To see Locals etc.?

Beginner in CUDA

#1
Posted 08/19/2017 12:45 PM   
Hi,

Sorry for the delay. Does your issue still exist? matrixMul is fully tested on our systems.

Best Regards
Harry

#2
Posted 08/29/2017 02:20 AM   
Hi,
The issue is with Nsight. When I did "Start CUDA Debugging", it started the app, but something looked wrong with the debugging: after 10 minutes no breakpoint had been reached and no debug info like Locals was shown in VS.
After 14-16 minutes it reset the GPU (like TDR), but the TDR delay is 3600 s.
What should the screen look like?
Is there another example for Nsight debugging? Some screenshots would be appreciated.
Thanks.
Martin

Beginner in CUDA

#3
Posted 08/29/2017 08:43 AM   
It looks like you are using the CUDA samples in the Nsight folder. Could you try the CUDA samples from the CUDA SDK, which are located at "c:\programdata\nvidia corporation\cuda samples\v8.0"?

I will try to reproduce it with the samples in the Nsight folder tomorrow.

#4
Posted 08/29/2017 10:06 AM   
Here is what it should look like. I can debug matrixMul from the Nsight samples on my GTX 660.

I'm not sure, but could you try disabling the Intel GPU in the BIOS? The Intel GPU may interfere with the debugging.
Attachments

Untitled.png

#5
Posted 08/30/2017 07:04 AM   
I tried matrixMul_vs2015.sln from C:\ProgramData\NVIDIA Corporation\CUDA Samples\v8.0\0_Simple\matrixMul, but it's the same as before. :(

Beginner in CUDA

#6
Posted 08/31/2017 05:13 PM   
Quote: Here is what it should look like, I can debug matrixMul in nsight samples on my GTX 660.

I'm not sure, but could you try to disable the intel gpu in bios, intel gpu may interrupt the debugging


Regarding disabling the Intel GPU: I set the Intel GPU as primary because when the NVIDIA GPU was primary (with the LCD connected), the picture on the LCD flickered and Win7 was unresponsive while CUDA code was running (until TDR reset the GPU).
Currently the NVIDIA GPU has no LCD connected. Both LCDs are on the Intel GPU.

Also, ref. "Setup Local Headless GPU Debugging":
On Windows 7, it's recommended that users run their CUDA applications on a headless GPU.
That's my case. The NVIDIA GPU is headless.

Beginner in CUDA

#7
Posted 08/31/2017 05:34 PM   
What about the Nsight VSE user settings?
Launch: Launch Project or External Program?

Beginner in CUDA

#8
Posted 08/31/2017 06:00 PM   
# sorry, duplicated update. It can be removed.

Beginner in CUDA

#9
Posted 08/31/2017 06:20 PM   
Right-click your project and start CUDA debugging; it should work.

You can use 1_Utilities\deviceQuery from the CUDA samples to find out how many CUDA devices you have.
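
For reference, the core of what deviceQuery reports can be approximated with a few CUDA runtime calls. This is only a minimal sketch, not the sample's actual source; the helper name `driverModeName` is made up for illustration, and the attribute queries are assumed to match the "Run time limit on kernels" and "CUDA Device Driver Mode" lines of the sample's output.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not part of the CUDA samples): label for the driver model flag.
static const char* driverModeName(int tccDriver) {
    return tccDriver ? "TCC" : "WDDM";
}

int main() {
    int count = 0;
    cudaError_t status = cudaGetDeviceCount(&count);
    if (status != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(status));
        return 1;
    }
    std::printf("Detected %d CUDA capable device(s)\n", count);

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) {
            continue;  // skip devices whose properties cannot be read
        }
        std::printf("Device %d: \"%s\", compute capability %d.%d\n",
                    dev, prop.name, prop.major, prop.minor);

        // Non-zero when the driver enforces a run time limit on kernels.
        int timeoutEnabled = 0;
        cudaDeviceGetAttribute(&timeoutEnabled, cudaDevAttrKernelExecTimeout, dev);
        std::printf("  Run time limit on kernels: %s\n", timeoutEnabled ? "Yes" : "No");

        // Non-zero when the device runs under the TCC driver instead of WDDM.
        int tccDriver = 0;
        cudaDeviceGetAttribute(&tccDriver, cudaDevAttrTccDriver, dev);
        std::printf("  Driver mode: %s\n", driverModeName(tccDriver));
    }
    return 0;
}
```

If the count printed here differs from what you expect, the application and the debugger may not be seeing the same GPU.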

#10
Posted 09/01/2017 02:19 AM   
Both the right-click and the Nsight menu do the same thing: they open a cmd window and show Nsight as connected, but nothing more.

Output from deviceQuery:

C:\ProgramData\NVIDIA Corporation\CUDA Samples\v8.0\1_Utilities\deviceQuery\../../bin/win64/Debug/deviceQuery.exe Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 660"
CUDA Driver Version / Runtime Version 9.0 / 8.0
CUDA Capability Major/Minor version number: 3.0
Total amount of global memory: 2048 MBytes (2147483648 bytes)
( 5) Multiprocessors, (192) CUDA Cores/MP: 960 CUDA Cores
GPU Max Clock rate: 1098 MHz (1.10 GHz)
Memory Clock rate: 3004 Mhz
Memory Bus Width: 192-bit
L2 Cache Size: 393216 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536),
3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
CUDA Device Driver Mode (TCC or WDDM): WDDM (Windows Display Driver Model)
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GeForce GTX 660
Result = PASS
Press any key to continue . . .

Beginner in CUDA

#11
Posted 09/01/2017 09:07 AM   
Here are pics from CUDA Info 1:

Beginner in CUDA

#12
Posted 09/01/2017 09:15 AM   
Yeah, according to your picture, the debugger should work. Did you build your app in debug mode? Also, only breakpoints in __global__ and __device__ functions can be hit; you cannot debug the CPU code in Nsight.
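
To illustrate where device breakpoints can land, here is a minimal sketch that is not taken from the matrixMul sample; the names `scaleValue`, `scaleKernel`, and `runScaleKernel` are made up. It assumes a debug build in which nvcc generates device debug information (the `-G` option).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Device-only helper: a breakpoint set here can be hit by the CUDA debugger.
__device__ float scaleValue(float x, float factor) {
    return x * factor;
}

// Kernel: a breakpoint set here can be hit as well.
__global__ void scaleKernel(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] = scaleValue(data[i], factor);
    }
}

// Host code: runs on the CPU, so breakpoints here are not hit by the CUDA debugger.
bool runScaleKernel() {
    const int n = 256;
    float host[n];
    for (int i = 0; i < n; ++i) {
        host[i] = static_cast<float>(i);
    }

    float* dev = nullptr;
    if (cudaMalloc(&dev, n * sizeof(float)) != cudaSuccess) {
        return false;
    }
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

    // Two blocks of 128 threads cover all 256 elements.
    scaleKernel<<<(n + 127) / 128, 128>>>(dev, 2.0f, n);
    cudaError_t status = cudaDeviceSynchronize();

    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (status != cudaSuccess) {
        return false;
    }

    // Every element should have been doubled by the kernel.
    for (int i = 0; i < n; ++i) {
        if (host[i] != 2.0f * i) {
            return false;
        }
    }
    return true;
}

int main() {
    std::printf("scaleKernel %s\n", runScaleKernel() ? "OK" : "FAILED");
    return 0;
}
```

A small program like this can be a quicker check than matrixMul: set one breakpoint inside `scaleKernel` and see whether the debugger stops there.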

#13
Posted 09/01/2017 09:22 AM   
Yes, I built it as Debug, x64 - see pic.

Beginner in CUDA

#14
Posted 09/01/2017 09:26 AM   
Should I try to reinstall Nsight or the CUDA Toolkit?
I installed it when the NVIDIA GPU was the primary GPU; then I found out that it must be headless, so I changed the BIOS setting to make the Intel GPU primary and the NVIDIA GPU secondary.

Beginner in CUDA

#15
Posted 09/01/2017 09:29 AM   