Attaching cuda-gdb to a running process fails

Hello,

I have a problem attaching cuda-gdb to a running process.
It does attach to the process, but I can’t run regular gdb commands like bt, p, etc. It prints this error:

cuda-gdb/7.12/gdb/cuda-coords.c:1093: internal-error: cuda_current_device: Assertion `cuda_focus_is_device ()’ failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.

I’ve tried both my own application and the CUDA samples that come with the release, and all of them fail with this error. What can I do about it? The normal “launch program under cuda-gdb” way works, but I need to attach dynamically.

Thanks!

I’m running on Red Hat and have no problem attaching plain gdb to a running process. The CUDA version is 9.0, and the single GPU I’m using is a K20c.

Hi, hzhang86

I cannot reproduce the issue with a Tesla K80 on Ubuntu 16.04:

ubuntu@ip-172-31-40-48:/usr/local/cuda/samples/6_Advanced/c++11_cuda$ ./c++11_cuda &
[1] 6131
ubuntu@ip-172-31-40-48:/usr/local/cuda/samples/6_Advanced/c++11_cuda$
ubuntu@ip-172-31-40-48:/usr/local/cuda/samples/6_Advanced/c++11_cuda$
ubuntu@ip-172-31-40-48:/usr/local/cuda/samples/6_Advanced/c++11_cuda$ cuda-gdb GPU Device 0: “Tesla K80” with compute capability 3.7

Read 3223503 byte corpus from ./warandpeace.txt
--pid=6131
NVIDIA (R) CUDA Debugger
9.1 release
Portions Copyright (C) 2007-2017 NVIDIA Corporation
GNU gdb (GDB) 7.12
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type “show copying”
and “show warranty” for details.
This GDB was configured as “x86_64-pc-linux-gnu”.
Type “show configuration” for configuration details.
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:
http://www.gnu.org/software/gdb/documentation/.
For help, type “help”.
Type “apropos word” to search for commands related to “word”.
Attaching to process 6131
[New LWP 6136]
[New LWP 6137]
Reading symbols from /usr/local/cuda-9.1/samples/6_Advanced/c++11_cuda/c++11_cuda…done.
Reading symbols from /lib/x86_64-linux-gnu/librt.so.1…(no debugging symbols found)…done.
Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0…(no debugging symbols found)…done.
[Thread debugging using libthread_db enabled]
Using host libthread_db library “/lib/x86_64-linux-gnu/libthread_db.so.1”.
Reading symbols from /lib/x86_64-linux-gnu/libdl.so.2…(no debugging symbols found)…done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libstdc++.so.6…(no debugging symbols found)…done.
Reading symbols from /lib/x86_64-linux-gnu/libgcc_s.so.1…(no debugging symbols found)…done.
Reading symbols from /lib/x86_64-linux-gnu/libc.so.6…(no debugging symbols found)…done.
Reading symbols from /lib64/ld-linux-x86-64.so.2…(no debugging symbols found)…done.
Reading symbols from /lib/x86_64-linux-gnu/libm.so.6…(no debugging symbols found)…done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libcuda.so.1…(no debugging symbols found)…done.
Reading symbols from /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.387.26…(no debugging symbols found)…done.
0x00007ffdc2f6dbdf in clock_gettime ()
$1 = -193949872

Thread 1 “c++11_cuda” received signal SIGURG, Urgent I/O condition.
[Switching focus to CUDA kernel 0, grid 2, block (0,0,0), thread (0,0,0), device 0, sm 12, warp 1, lane 0]
0x00000000022fcf40 in xyzw_frequency_thrust_device(int*, char*, int)::{lambda(char)#1}::operator()(char) const (this=0x3fff9e803fff9eb, c=32 ’ ') at c++11_cuda.cu:84
84 if (c == x) return true;
(cuda-gdb) p
$2 = -193949872
(cuda-gdb) bt
#0 0x00000000022fcf40 in xyzw_frequency_thrust_device(int*, char*, int)::{lambda(char)#1}::operator()(char) const (this=0x3fff9e803fff9eb, c=32 ’ ') at c++11_cuda.cu:84
#1 0x000000000263a858 in thrust::cuda_cub::transform_input_iterator_t<long, char*, xyzw_frequency_thrust_device(int*, char*, int)::{lambda(char)#1}>::operator*() (this=0x3fff9e0)
at /usr/local/cuda-9.1/bin/…//include/thrust/system/cuda/detail/util.h:252
#2 thrust::system::detail::sequential::reduce<thrust::detail::seq_t, thrust::cuda_cub::transform_input_iterator_t<long, char*, xyzw_frequency_thrust_device(int*, char*, int)::{lambda(char)#1}>, long, thrust::plus >(thrust::system::detail::sequential::execution_policythrust::detail::seq_t&, thrust::cuda_cub::transform_input_iterator_t<long, char*, xyzw_frequency_thrust_device(int*, char*, int)::{lambda(char)#1}>, thrust::system::detail::sequential::execution_policythrust::detail::seq_t&, thrust::plus, thrust::plus) (begin=…, end=…, init=, binary_op=…)
at /usr/local/cuda-9.1/bin/…//include/thrust/system/detail/sequential/reduce.h:61
#3 0x00000000023f0d40 in thrust::reduce<thrust::detail::seq_t, thrust::cuda_cub::transform_input_iterator_t<long, char*, xyzw_frequency_thrust_device(int*, char*, int)::{lambda(char)#1}>, long, thrust::plus >(thrust::detail::execution_policy_basethrust::detail::seq_t const&, thrust::cuda_cub::transform_input_iterator_t<long, char*, xyzw_frequency_thrust_device(int*, char*, int)::{lambda(char)#1}>, thrust::detail::execution_policy_basethrust::detail::seq_t const&, thrust::plus, thrust::plus) (exec=, first=…, last=…, init=, binary_op=…)
at /usr/local/cuda-9.1/bin/…//include/thrust/detail/reduce.inl:71
#4 0x000000000227ba90 in thrust::cuda_cub::reduce_n<thrust::cuda_cub::par_t, thrust::cuda_cub::transform_input_iterator_t<long, char*, xyzw_frequency_thrust_device(int*, char*, int)::{lambda(char)#1}>, long, long, thrust::plus >(thrust::cuda_cub::execution_policythrust::cuda_cub::par_t&, thrust::cuda_cub::transform_input_iterator_t<long, char*, xyzw_frequency_thrust_device(int*, char*, int)::{lambda(char)#1}>, long, thrust::plus, thrust::plus) (policy=, first=…, num_items=, init=, binary_op=…)
at /usr/local/cuda-9.1/bin/…//include/thrust/system/cuda/detail/reduce.h:981
#5 0x00000000022e6bb8 in thrust::cuda_cub::count_if<thrust::cuda_cub::par_t, char*, xyzw_frequency_thrust_device(int*, char*, int)::{lambda(char)#1}>(thrust::cuda_cub::execution_policythrust::cuda_cub::par_t&, thrust::iterator_traits, thrust::iterator_traits, xyzw_frequency_thrust_device(int*, char*, int)::{lambda(char)#1}) (policy=, first=0x3fff9ec “”, last=0x3fff9eb “w”, unary_pred=…)
at /usr/local/cuda-9.1/bin/…//include/thrust/system/cuda/detail/count.h:55
#6 0x00000000023116d0 in thrust::count_if<thrust::cuda_cub::par_t, char*, xyzw_frequency_thrust_device(int*, char*, int)::{lambda(char)#1}>(thrust::detail::execution_policy_basethrust::cuda_cub::par_t const&, thrust::iterator_traits, thrust::iterator_traits, xyzw_frequency_thrust_device(int*, char*, int)::{lambda(char)#1}) (exec=, first=0x3fff9ec “”, last=0x3fff9eb “w”, pred=…)
at /usr/local/cuda-9.1/bin/…//include/thrust/detail/count.inl:51
#7 0x0000000002473a58 in xyzw_frequency_thrust_device<<<(1,1,1),(1,1,1)>>> (count=0x12062c0000,
text=0x12052c0000 “The Project Gutenberg EBook of War and Peace, by Leo Tolstoy\n\nThis eBook is for the use of anyone anywhere at no cost and with\nalmost no restrictions whatsoever. You may copy it, give it away or\nre-u”, n=3223503) at c++11_cuda.cu:82
(cuda-gdb) n
85 return false;
(cuda-gdb) n
thrust::system::detail::sequential::reduce<thrust::detail::seq_t, thrust::cuda_cub::transform_input_iterator_t<long, char*, xyzw_frequency_thrust_device(int*, char*, int)::{lambda(char)#1}>, long, thrust::plus >(thrust::system::detail::sequential::execution_policythrust::detail::seq_t&, thrust::cuda_cub::transform_input_iterator_t<long, char*, xyzw_frequency_thrust_device(int*, char*, int)::{lambda(char)#1}>, thrust::system::detail::sequential::execution_policythrust::detail::seq_t&, thrust::plus, thrust::plus) (begin=…, end=…, init=0, binary_op=…)
at /usr/local/cuda-9.1/bin/…//include/thrust/system/detail/sequential/reduce.h:61
61 result = wrapped_binary_op(result, *begin);

Hi, hzhang86

Please check whether an X server is running, using “ps -ef|grep X”.

Also, please try attaching to 4_Finance/binomialOptions and paste the whole output, from the launch of the sample through the attach procedure. Thanks!

./binomialOptions &

It would be great if you could also check on another GPU, if you have one.
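
A note on the X check requested above: a plain grep for X can also match the grep command itself, so matching on the process name may be easier to read. A minimal sketch, assuming the X server runs under the process name X or Xorg (the function name x_server_status is made up for illustration):

```shell
#!/usr/bin/env bash
# Sketch: report whether an X server process is running.
# Assumption: the X server shows up under the process name "X" or "Xorg".
x_server_status() {
    # pgrep -x matches the whole process name, so commands that merely
    # contain an "X" somewhere (including a grep for it) are not reported.
    # If pgrep itself is missing, this also falls through to "no X server found".
    if pgrep -x 'X|Xorg' >/dev/null 2>&1; then
        echo "X server running"
    else
        echo "no X server found"
    fi
}

x_server_status
```

If the process name differs on a given system, the pattern passed to pgrep would need to be adjusted.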

Hello, veraj

I tried another GPU machine (also a K20c) with the CUDA 7.5 toolkit installed, and it works there! But on my machine, it doesn’t. Attaching to ./binomialOptions succeeds and gives me a backtrace, but the stack trace is always on the CPU side, not the GPU side. That’s probably because the GPU part runs so fast that by the time I finish typing the PID, the program is already back in CPU code, which is pretty much like attaching gdb to a CPU-only program, and that did work. Running ps -ef|grep X returns nothing but “hzhang86 19392 16611 0 16:05 pts/1 00:00:00 grep --color=auto X”.
Here’s the whole failing session with the c++11_cuda example:

[hzhang86@nexcor:~/samples/6_Advanced/c++11_cuda]$GPU Device 0: “Tesla K20c” with compute capability 3.5

Read 3223503 byte corpus from ./warandpeace.txt
cuda-gdb --pid=19394
NVIDIA (R) CUDA Debugger
9.0 release
Portions Copyright (C) 2007-2017 NVIDIA Corporation
GNU gdb (GDB) 7.12
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type “show copying”
and “show warranty” for details.
This GDB was configured as “x86_64-pc-linux-gnu”.
Type “show configuration” for configuration details.
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:
http://www.gnu.org/software/gdb/documentation/.
For help, type “help”.
Type “apropos word” to search for commands related to “word”.
Attaching to process 19394
[New LWP 19398]
[New LWP 19399]
Reading symbols from /home/hzhang86/samples/6_Advanced/c++11_cuda/c++11_cuda…done.
Reading symbols from /lib64/librt.so.1…(no debugging symbols found)…done.
Reading symbols from /lib64/libpthread.so.0…(no debugging symbols found)…done.
[Thread debugging using libthread_db enabled]
Using host libthread_db library “/lib64/libthread_db.so.1”.
Reading symbols from /lib64/libdl.so.2…(no debugging symbols found)…done.
Reading symbols from /lib64/libstdc++.so.6…(no debugging symbols found)…done.
Reading symbols from /lib64/libm.so.6…(no debugging symbols found)…done.
Reading symbols from /lib64/libgcc_s.so.1…(no debugging symbols found)…done.
Reading symbols from /lib64/libc.so.6…(no debugging symbols found)…done.
Reading symbols from /lib64/ld-linux-x86-64.so.2…(no debugging symbols found)…done.
Reading symbols from /lib64/libcuda.so.1…(no debugging symbols found)…done.
Reading symbols from /lib64/libnvidia-fatbinaryloader.so.384.81…(no debugging symbols found)…done.
0x00007fff8eb9b7c2 in clock_gettime ()
$1 = -309646848

Thread 1 “c++11_cuda” received signal SIGURG, Urgent I/O condition.
[Switching focus to CUDA kernel 0, grid 2, block (0,0,0), thread (0,0,0), device 0, sm 12, warp 1, lane 0]
0x00000000034adbf0 in thrust::raw_pointer_cast<long*> (ptr=0x3fffa28)
at /usr/local/packages/cuda-9.0/bin/…//include/thrust/detail/raw_pointer_cast.h:29
29 return thrust::detail::pointer_traits::get(ptr);
(cuda-gdb) bt
#0 0x00000000034adbf0 in thrust::raw_pointer_cast<long*> (
cuda-gdb/7.12/gdb/cuda-coords.c:1093: internal-error: cuda_current_device: Assertion `cuda_focus_is_device ()’ failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) n

This is a bug, please report it. For instructions, see:
http://www.gnu.org/software/gdb/bugs/.

cuda-gdb/7.12/gdb/cuda-coords.c:1093: internal-error: cuda_current_device: Assertion `cuda_focus_is_device ()’ failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Create a core file of GDB? (y or n)
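
On the timing problem described above (the GPU work finishing before the PID has been typed), one option is to capture the PID in the shell and attach right away. A minimal sketch, assuming the sample binary sits in the current directory; SAMPLE and attach_command are placeholder names, not something from the thread:

```shell
#!/usr/bin/env bash
# Sketch: start a sample in the background and attach cuda-gdb immediately.
# SAMPLE is a placeholder; point it at any built sample binary.
SAMPLE="${SAMPLE:-./c++11_cuda}"

# Print the attach command for a given PID (the same --pid form used above).
attach_command() {
    printf 'cuda-gdb --pid=%s\n' "$1"
}

if [ -x "$SAMPLE" ] && command -v cuda-gdb >/dev/null 2>&1; then
    "$SAMPLE" &                  # run the sample in the background
    pid=$!                       # PID of that background job
    echo "attaching with: $(attach_command "$pid")"
    cuda-gdb --pid="$pid"        # interactive session from here on
else
    # Nothing to attach to on this machine; just show the command.
    echo "sample or cuda-gdb not found; would run: $(attach_command '<PID>')"
fi
```

This only shortens the delay between launch and attach; whether the attach lands in device code still depends on how long the kernel runs.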

Hi, hzhang86

I happened to reproduce the same issue today.
I will report an internal bug to the dev team and get back to you ASAP.

Thanks !

Hi, hzhang86

I double-checked the issue and found that it is easily reproduced with CUDA 9.0,
but I cannot reproduce it with CUDA 9.1.

I think this issue is already fixed in CUDA 9.1.
Please wait for the 9.1 release.
It should be ready soon.

Hello, Veraj

Cool. Thank you very much for the help!

Hello, Veraj

While I’m waiting for 9.1, which older version would you suggest I try for now?

Before 9.0, there are CUDA 7.5 and CUDA 8.0.
You can give them a try.

You also mentioned in an earlier comment that CUDA 7.5 worked properly for you,
so I think it is OK to use that.

Hello, Veraj

I have a question about the kernel ID that cuda-gdb shows (for example, with info cuda kernels). Is it unique for each kernel? If the same kernel is launched twice, will it have two different IDs? Will those IDs be reused once a kernel has finished? Can we know the ID of a particular kernel on the CPU side (right before it’s launched)?

thanks