CUDA-gdb changes SASS code

Hi everyone,

I am trying to debug SASS code with CUDA-gdb but I found out when you add -g -G options for nvcc compiler, the generated SASS code will be changed dramatically. Does anyone know how to debug or trace original SASS without intrusive behavior? Thanks!

Hi, echobrad

cuda-gdb can debug optimized SASS code in a limited fashion, just like gdb can debug optimized cpu applications in a limited fashion. You can still “set cuda break_on_launch application” and single-step instructions with “stepi”, and disassemble instructions with “x/i $pc”. You can also break on line numbers if the application is compiled with the “-lineinfo” option for nvcc.

http://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/#options-for-altering-compiler-linker-behavior
http://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/#ptxas-options

Please note: when debugging optimized code with the “-lineinfo” option, symbol information may be inaccurate or unavailable, and line number information may also be inaccurate.

Thanks, Veraj,

I tried debugging with optimized SASS code. It is cool! However, can I also debug library code for optimized binary. I saw “set cuda break_on_launch all” in CUDA-GDB documents to supporting debug for cuda library code such as memset. However, it seems that I cannot do it for optimized binary, while debug binary is fine.

Hi, echobrad

“set cuda break_on_launch all” will break on all kernels launched, including system ones. Try just “set cuda break_on_launch application” as I suggested, and provide a full working example of what you are trying to do.

What is the exact issue you are running into?

Hi Veraj,

I would like to trace all SASS kernels GPU launched including both application and system(most mean drivers). So I tried with “set cuda break_on_launch all”, but I found there is not any other system kernel launched. I tried with vectorAdd and it just break on vectorAdd kernel and I tried simpleCUBLAS and it does not stop on any kernels. I just guess that vectorAdd and simpleCUBLAS should invoke other system(driver) kernels but I did not find them with cuda-gdb.

Hi,
cuDNN and cuBLAS kernels are application kernels, not system kernels. System kernels are kernels like memcpy, memset, etc, mostly provided for tracing. These should not be relied upon for tracing as, depending on the size of the memory transaction, CUDA may take a different path to perform the transaction that does not involve launching a kernel, like using a copy engine. In this case, you won’t see a breakpoint or any notification with cuda-gdb. If you need to trace these calls, I recommend placing a CPU breakpoint or use gdb’s tracepoint feature in cuda-gdb on cudaMalloc, cudaMemset, etc.

That said, I am unable to reproduce the inability to break on kernel with the simpleCUBLAS example 2 listed here:
http://docs.nvidia.com/cuda/pdf/CUBLAS_Library.pdf
Compiled with:
nvcc -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -O3 -Xcompiler -msse -ccbin gcc -m64 -I/usr/local/cuda-8.0/include cublas.cu -o cublas -L/usr/local/cuda-8.0/lib64 -lcublas -lm -lstdc++ -lpthread

(cuda-gdb) set cuda break_on_launch application
(cuda-gdb) r
Starting program: ./cublas
[Thread debugging using libthread_db enabled]
Using host libthread_db library “/lib/x86_64-linux-gnu/libthread_db.so.1”.
[New Thread 0x7ffff29fc700 (LWP 25157)]
[New Thread 0x7ffff21fb700 (LWP 25158)]
[New Thread 0x7ffff19fa700 (LWP 25159)]
[Switching focus to CUDA kernel 0, grid 1, block (0,0,0), thread (0,0,0), device 0, sm 0, warp 0, lane 0]
0x0000000000c0ac88 in void scal_kernel_val<float, float, 0>(cublasScalParamsVal<float, float>)<<<(1,1,1),(256,1,1)>>> ()
(cuda-gdb) bt
#0 0x0000000000c0ac88 in void scal_kernel_val<float, float, 0>(cublasScalParamsVal<float, float>)<<<(1,1,1),(256,1,1)>>> ()
(cuda-gdb) x/i $pc
=> 0xc0ac88 <_Z15scal_kernel_valIffLi0EEv19cublasScalParamsValIT_T0_E+8>: MOV R1, c[0x0][0x20]

Please keep in mind that break_on_launch is a performance heavy command that increases your execution time by an amount that scales with the number of kernels compiled for your system. If you plan to use this, please be patient, cuBlas is rather large. We are aware of the issue and are working to fix this in a future release.