The remote connection is then closed, and gdbserver prints the following error:
Process csr_bfs created; pid = 114385
Listening on port 2345
Remote debugging from host 127.0.0.1
cuda-gdb/7.12/gdb/gdbserver/regcache.c:264: A problem internal to GDBserver has been detected.
Unknown register ymm0h requested
Spec:
OS: Ubuntu 16.04 / Ubuntu 18.04
CUDA toolkit: 9.2
CPU: Intel(R) Xeon(R) Gold 6140 CPU / Silver 4108
GPU: Titan V / V100 PCIE 16G
Several of our machines with the above specs hit this error.
After exploring this issue further, I can confirm that it comes from gdb itself (that is, it is not introduced by cuda-gdb).
On my side, gdb versions lower than 8.1 reproduce the issue on my machine. I strongly suspect it is related to the Xeon® Scalable Processors family.
I have tried building gdb 7.12, 8.0 and the latest 8.11. Only the latest 8.11 does NOT have this issue.
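To make the version comparison above easier to repeat on other machines, here is a rough sketch that reads the gdb version from `gdb --version` style output and flags anything below 8.1. The function names are made up for illustration, and the 8.1 threshold is only what I observed on my own machines, not a confirmed boundary.

```python
import re
import subprocess


def parse_gdb_version(version_output):
    """Return (major, minor) from `gdb --version` style text, or None."""
    for line in version_output.splitlines():
        # The upstream banner line looks like "GNU gdb (GDB) 7.12".
        if line.startswith("GNU gdb"):
            match = re.match(r"(\d+)\.(\d+)", line.split()[-1])
            if match:
                return int(match.group(1)), int(match.group(2))
    return None


def may_hit_ymm0h_error(version):
    """Assumption: versions below 8.1 are affected, as observed in this thread."""
    return version is not None and version < (8, 1)


def installed_gdb_version(binary="gdb"):
    """Run `<binary> --version`; return None if the binary cannot be run."""
    try:
        result = subprocess.run(
            [binary, "--version"], capture_output=True, text=True
        )
    except OSError:
        return None
    return parse_gdb_version(result.stdout)
```

Calling `may_hit_ymm0h_error(installed_gdb_version("cuda-gdb"))` on a target machine gives a quick yes/no, assuming the binary prints a `GNU gdb` banner line somewhere in its version output.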
So please try to reproduce this issue on the Xeon® Scalable Processors family. If it can be reproduced, a fix as soon as possible would be appreciated.
But I am not sure whether ymm0h and pkru are the same issue. Since the current cuda-gdb is based on gdb 7.12 (and reports the error on ymm0h), I am not sure whether this bug is related to what we are discussing.
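Since ymm0h is the upper half of an AVX register and pkru is the protection-keys register, one way to compare a working machine with a failing one is to look at the CPU feature flags Linux reports. The sketch below parses `/proc/cpuinfo` text and reports a few flags. The choice of flags (`avx`, `avx512f`, `pku`) is my own guess at what is relevant, and the helper names are hypothetical.

```python
# Assumption: these flags are the ones worth comparing between machines.
FEATURES_OF_INTEREST = ("avx", "avx512f", "pku")


def cpu_flags(cpuinfo_text):
    """Return the feature flags of the first CPU found in /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def summarize(cpuinfo_text):
    """Map each flag of interest to whether the CPU reports it."""
    flags = cpu_flags(cpuinfo_text)
    return {name: name in flags for name in FEATURES_OF_INTEREST}
```

On a Linux target, `summarize(open("/proc/cpuinfo").read())` returns a small dictionary that can be posted alongside the gdbserver log, so the failing and working machines can be compared side by side.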
cuda-gdbserver :5000 test_particles
Process test_particles created; pid = 18774
Listening on port 5000
Remote debugging from host 192.168.10.86
cuda-gdb/7.12/gdb/gdbserver/regcache.c:264: A problem internal to GDBserver has been detected.
Unknown register ymm0h requested
Just an update: we are actively working on the underlying issue that is causing this problem and, while not imminent, a fix is expected in an upcoming release (200439277).
I got the same error. I have three machines: one host and two targets. When I use remote debugging, one target machine works fine but the other does not, and the console shows the same message: Unknown register ymm0h requested
My host and target machines are configured as follows:
OS: Ubuntu 16.04
CUDA Toolkit: CUDA 10.0
GPU: Tesla V100
cuda-gdb version: 7.12
Unfortunately there isn’t a workaround for this problem on late-model CPUs. The update to resolve this is still pending and will be in an upcoming release.