cuda-gdb 4.1: no local variables visible, can't see anything other than globals at kernel breakpoints
Breakpoints can be set in kernels and the code stops there, but the only things I can inspect are CUDA built-in values like blockDim, and global variables. If I keep asking for information, or continue in some contexts, the whole cuda-gdb process eventually crashes. A breakpoint set in the kernel "characterize" looks like this:

characterize(unsigned char*, Asset*, MetricsGroup*, MetricsGroup*, bool*)<<<(128,1,1),(8,56,1)>>> (x=warning: Variable is not live at this point. Value is undetermined.
0) at gapms.cu:551

"info locals" at this point tells you that there are no local variables, and trying to print specific variables is the same way.

All the code was compiled with [code]nvcc -g -G[/code].

I don't think debugging has worked since I upgraded to CUDA 4.1 with a clean install of Ubuntu 11.04. It worked fine with 4.0 on Ubuntu 10, in both cases on a Quadro 5000.
I've just installed the latest devdriver as well, but to no effect. I recall that cuda-gdb used to give a warning about garbage values for things that had been optimized out, but this message is different.

#1
Posted 04/11/2012 07:02 PM   
A possible reason for the change in behavior is that the variables are not live at the point where you are trying to access them. Unlike with CPU debugging, GPU variable contents are not saved to memory at all times for the debugger to access: that would take too many resources and make some programs impossible to run. Instead, the compiler only guarantees that a variable is accessible as long as it is live in the user source code. Are the same variables visible (with info locals or print) later in the same routine?

Also, it is highly recommended to compile the application with the --gencode option to force the generation of SASS code at compile time (and avoid using the JIT compiler). Assuming you are using an NVIDIA card with SM 2.0 compute capability (Fermi), do you compile your application with:

[code]nvcc -g -G -gencode arch=compute_20,code=sm_20[/code]
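As a toy illustration of the liveness point above (this is a made-up kernel, not code from the project in question): the debugger is only guaranteed to be able to print a variable between its definition and its last use.

[code]
__global__ void example(int *out)   // hypothetical kernel, for illustration only
{
    int tmp = threadIdx.x * 2;      // tmp becomes live here
    out[threadIdx.x] = tmp;         // last use of tmp: it may no longer be live past this line
    // a breakpoint after this point may report tmp as "not live / value undetermined"
}
[/code]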

#2
Posted 04/16/2012 05:28 PM   
Hello, and thank you for your reply.

I have been using the following compiler arguments for nvcc: [code]nvcc --use_fast_math -gencode arch=compute_20,code=sm_20 --ptxas-options=-v -g -G --compiler-options "-fopenmp -fexceptions -ffast-math -g -Wall"[/code] and am specifically unable to access local variables at any point in kernel code, even stepping line by line.

This error suggests to me that the value has not simply gone out of scope, but that the debugger really doesn't see the local variables at all. It [i]is[/i] able to print shared and global variables. I just did some more testing and realized it isn't restricted to CUDA-specific values: anything in __shared__ memory works, but stepping through and past a variable definition like:
[code]short i = (threadIdx.x * blockDim.y) + threadIdx.y; [/code] still gives the error [code]No symbol "i" in current context.[/code]

[quote name='Alban Douillet' date='16 April 2012 - 12:28 PM' timestamp='1334597288' post='1397108']
A possible reason for the change in behavior is that the variables are not live at the point you are trying to access them. Unlike with CPU debugging, the GPU variable contents are not saved into memory at all times for the debugger to access. It would be taking too much resources and make the program impossible to run in some cases. Instead, the compiler only guarantees that a variable is accessible as long as it is live in the user source code. Are the same variables visible (with info locals or print) later in the same routine?

Also, it is highly recommended to compile the application with the --gencode option to force the generation of SASS code at compile time (and avoid using the JIT compiler). Assuming you are using a NVIDIA card with SM20 compute capability (Fermi), do you compiler your application with:

[code]nvcc -g -G -gencode arch=compute_20,code=sm_20[/code]
[/quote]

#3
Posted 04/16/2012 07:01 PM   
Are you a registered developer? If yes, did you have the chance to try with the early 4.2 release?

Also, if you are a registered developer, could you file a bug against CUDA GDB? It would be very helpful if you could include the source for the function in question, and the output of "cuobjdump -elf" applied to the binary or object file containing the function. This would help determine whether the problem comes from missing/corrupted information generated by the compiler or whether it is an issue in the debugger itself.
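For reference, the dump could be captured with something along these lines (the object file name here is just a placeholder; substitute your own):

[code]
cuobjdump -elf gapms.o > object.txt
[/code]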

#4
Posted 04/17/2012 04:27 PM   
I've just tried it with the 4.2 release candidate and get the same behavior as in 4.1. Since this seems to be true of all my kernels, it still seems unlikely to me that there is an outright bug this large in both 4.1 and 4.2.

The output of cuobjdump -elf is attached [attachment=25329:object.txt] for the kernels I've written in the project. Does it appear correct? I don't know what to make of it myself. There are also many Thrust-generated kernel calls in a different object, but I haven't attempted to debug those, and only included the dump of the object file containing my own kernels.

#5
Posted 04/18/2012 12:34 AM   
I do not see any attached files. Did it upload properly?

#6
Posted 04/18/2012 06:54 PM   
I edited the post with a (re)uploaded file. I didn't realize that I had to provide a link to the file (the Flash uploader didn't give me the option).

Thanks again.

[quote name='Alban Douillet' date='18 April 2012 - 01:54 PM' timestamp='1334775289' post='1397963']
I do not see any attached files. Did it upload properly?
[/quote]

#7
Posted 04/19/2012 12:32 AM   
Got the file.

The DWARF indicates that variable 'i' is live from offset 0x148 to offset 0x2a70 in function _Z12characterizePhP5AssetP12MetricsGroupS3_Pb (demangled name: characterize). The value is stored in hardware register R14 (and PTX register %rs2). So the information generated by the compiler seems to make sense.

Can you print the value of 'i' at several points in the function and issue an "x/i $pc" at each step? The output of x/i will show you the offset within the function. If the shown offset is within the range I mentioned above, the value of i should be accessible. If not, I recommend filing a bug against the CUDA GDB team (you must be a registered developer).
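For illustration, a session following that suggestion might look roughly like this (the address and offset shown are invented, not taken from the actual binary; only the commands matter):

[code]
(cuda-gdb) break gapms.cu:551
(cuda-gdb) run
...
(cuda-gdb) x/i $pc
=> 0x1000148 <_Z12characterizePhP5AssetP12MetricsGroupS3_Pb+0x148>: ...
(cuda-gdb) print i
[/code]

If the +0xNNN offset in the x/i output falls inside 0x148..0x2a70, the print should succeed.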

#8
Posted 04/19/2012 05:20 PM   
It does appear that my build is somehow off: the offsets do not line up. I'm still working on an isolated test case to try to pin it down.

[quote name='Alban Douillet' date='19 April 2012 - 12:20 PM' timestamp='1334856040' post='1398390']
Got the file.

The DWARF indicates that variable 'i' is live from offset 0x148 to offset 0x2a70 in function _Z12characterizePhP5AssetP12MetricsGroupS3_Pb (demangled name: characterize). The value is stored in hardware register R14 (and PTX register %rs2). So the information generated by the compiler seems to make sense.

Can you print the value of 'i' at several points in the function and issue a "x/i $pc" at each step. That output of x/i would show you the offset within the function. If the shown offset is within the range I mentioned above, the value of i should be accessible. If not, I recommend filing a bug against the CUDA GDB team (must a be a registered developer).
[/quote]

#9
Posted 04/26/2012 02:32 PM   