For anyone who is interested, I recently added an interactive debugger to the Ocelot PTX Emulator to help me track down bugs in ocelot. It may also be useful to people doing cuda development so I thought I would mention it here. To try it out, pull the latest revision of ocelot from subversion ( svn checkout https://gpuocelot.googlecode.com/svn/trunk/ gpuocelot ), enter the ptx 2.1 branch (gpuocelot/branches/ocelot-ptx2.1) and build ocelot (libtoolize; aclocal; autoconf; automake; sudo make install)
To enable the debugger, turn it on in the config file (configure.ocelot):
trace: {
.
.
.
debugger: {
enabled: true,
kernelFilter: "testShr",
alwaysAttach: false
},
.
.
.
}
Enabled determines whether or not the debugger is on, kernelFilter specifies the name of a specific kernel to attach the debugger to, and alwaysAttach will cause the debugger to attach to every kernel that is launched.
Here’s an example of using it.
Launch a CUDA program linked against ocelot.
normal@atom:~/checkout/gpuocelot/branches/ocelot-ptx-2.1$ ./TestCudaSequence
It should attach the debugger to a kernel (here I specified the kernel ‘testShr’). You should see something like this
(ocelot-dbg) Attaching debugger to kernel 'testShr'
(ocelot-dbg)
Get a help message:
(ocelot-dbg) Attaching debugger to kernel 'testShr'
(ocelot-dbg)
(ocelot-dbg) h
_ ___ _.--.
\`.|\..----...-'` `-._.-'_.-'`
/ ' ` , __.--'
)/' _/ \ `-_, /
`-'" `"\_ ,_.-;_.-\_ ',
_.-'_./ {_.' ; /
{_.-``-' {_/
This is the Ocelot Interactive PTX Debugger
Commands:
help (h) - Print this message.
jump (j) - Jump current warp to the specified PC.
remove (r) - Remove a breakpoint from a specific PC.
break (b) - Set a breakpoint at the specified PC.
print (p) - Print the value of a memory resource.
asm (a) - Print instructions near the specified PC.
reg (r) - Print the value of a register.
mem (m) - Print the values near an address.
warp (w) - Print the current warp status.
pc - Print the PC of the current warp.
loc (l) - Print the nearest CUDA source line.
step (s) - Execute the next instruction.
continue (c) - Run until the next breakpoint.
quit (q) - Detach the debugger, resume execution.
You can print out the kernel instructions
(ocelot-dbg) print asm
(0) - mov.u32 %r0, __cuda___cuda_local_var_24552_30_storage16
(1) - mov.u32 %r1, %ctaid.x
(2) - mov.u32 %r2, %ntid.x
(3) - mul.lo.u32 %r3, %r1, %r2
(4) - mov.u32 %r4, %tid.x
(5) - add.u32 %r5, %r3, %r4
(6) - mul.lo.u32 %r6, %r5, 4
(7) - ld.param.u32 %r7, [__cudaparm_testShr_B + 4]
(8) - add.u32 %r8, %r7, %r6
(9) - ld.global.s32 %r9, [%r8 + 0]
Step through the program:
(ocelot-dbg) s
(0) - mov.u32 %r0, __cuda___cuda_local_var_24552_30_storage16
(ocelot-dbg) s
(1) - mov.u32 %r1, %ctaid.x
(ocelot-dbg) s
(2) - mov.u32 %r2, %ntid.x
(ocelot-dbg) s
(3) - mul.lo.u32 %r3, %r1, %r2
(ocelot-dbg) s
(4) - mov.u32 %r4, %tid.x
(ocelot-dbg) s
(5) - add.u32 %r5, %r3, %r4
(ocelot-dbg) s
(6) - mul.lo.u32 %r6, %r5, 4
(ocelot-dbg) s
(7) - ld.param.u32 %r7, [__cudaparm_testShr_B + 4]
(ocelot-dbg) s
(8) - add.u32 %r8, %r7, %r6
(ocelot-dbg) s
(9) - ld.global.s32 %r9, [%r8 + 0]
View the state of the current warp
CTA ID: (0, 0, 0)
Warp ID: 0
PC: 23
Context Stack Depth: 3
Active Mask: [10101010101010101010101010101010]
View the current register file contents:
(ocelot-dbg) print r
THREAD 0 THREAD 1 THREAD 2 THREAD 3 THREAD 4
R0 0 0 0 0 0
R1 0 0 0 0 0
R2 20 20 20 20 20
R3 0 0 0 0 0
R4 0 1 2 3 4
R5 0 1 2 3 4
R6 0 4 8 c 10
R7 b3934c0 b3934c0 b3934c0 b3934c0 b3934c0
R8 b3934c0 b3934c4 b3934c8 b3934cc b3934d0
R9 0 2 4 6 8
View memory (X values are out of range)
(ocelot-dbg) p m b3934b0
000000000b3934b0 | XXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXX
000000000b3934b8 | XXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXX
000000000b3934c0 | 0000000200000000 0000000200000000 0000000200000000 0000000200000000 0000000200000000
000000000b3934c8 | 0000000600000004 0000000600000004 0000000600000004 0000000600000004 0000000600000004
000000000b3934d0 | 0000000a00000008 0000000a00000008 0000000a00000008 0000000a00000008 0000000a00000008
000000000b3934d8 | 0000000e0000000c 0000000e0000000c 0000000e0000000c 0000000e0000000c 0000000e0000000c
000000000b3934e0 | 0000001200000010 0000001200000010 0000001200000010 0000001200000010 0000001200000010
000000000b3934e8 | 0000001600000014 0000001600000014 0000001600000014 0000001600000014 0000001600000014
000000000b3934f0 | 0000001a00000018 0000001a00000018 0000001a00000018 0000001a00000018 0000001a00000018
000000000b3934f8 | 0000001e0000001c 0000001e0000001c 0000001e0000001c 0000001e0000001c 0000001e0000001c
Also, the debugger will automatically attach to programs that segfault. For example:
__global__ void segfault(int* value)
{
*value = 0;
}
int main()
{
segfault<<<1, 1>>>((int*)0xdeadbeef);
return 0;
}
Will generate the following message:
normal@atom:~/checkout/gpuocelot/branches/ocelot-ptx-2.1$ ./errors
(ocelot-dbg) Global memory access violation at 0xdeadbeef
(ocelot-dbg) Attaching ocelot debugger.
(ocelot-dbg) Breaking into program now!
This should be in the main ocelot release around september, I’ll also add support for inspecting other memory spaces and the function call stack relatively soon. Any suggestions or feature requests would be welcome.