the lazy debugger: catching and debugging occurrences of cudbgReportDriverInternalError()

this is not the first time this has occurred
and not the first time it has caused me much frustration and plenty of wasted effort
i am beginning to wonder why one should even bother with sound development practices and use the debugger, when the debugger itself is lazy

why does the debugger not halt/terminate on internal errors? in my opinion such an error is fatal - as fatal as a segmentation fault

look at the stack:

poll() at 0x7ffff6a8c8ad	
cudbgApiDetach() at 0x7ffff23cbc22	
cudbgReportDriverInternalError() at 0x7ffff23c6180	
cudbgReportDriverInternalError() at 0x7ffff23c7cff	
cuMemGetAttribute_v2() at 0x7ffff232cf8a	
cudbgGetAPIVersion() at 0x7ffff24485d8	
cudbgGetAPIVersion() at 0x7ffff2448a78	
cuMemGetAttribute_v2() at 0x7ffff2368a04	
cuMemGetAttribute_v2() at 0x7ffff2331c7c	
cuMemGetAttribute_v2() at 0x7ffff2332278	
cuMemGetAttribute_v2() at 0x7ffff22a13f2	
cuMemGetAttribute_v2() at 0x7ffff22a6685	
cuMemcpyDtoDAsync_v2() at 0x7ffff2283759	
cudart::cudaApiMemcpyAsync() at 0x43a47a
cudaMemcpyAsync() at 0x463526

did the debugger halt on registering a driver internal error? no
should the debugger have halted? absolutely
why did the debugger not halt? good question

below, another one
i am truly fortunate to even pick these up: at present, i suspend the debugger whenever execution takes longer than expected, knowing that i will likely be greeted by a cudbgReportDriverInternalError()
and it is fatal: it destabilizes the device and causes erroneous results
conventional error checking, like cudaGetLastError(), seems to ignore cudbgReportDriverInternalError()
the debugger certainly does not stop
i am not sure how one is supposed to debug the causes of cudbgReportDriverInternalError(), when cudbgReportDriverInternalError() is hardly reported

poll() at 0x7ffff6a8c8ad
cudbgApiDetach() at 0x7ffff23cbc22
cudbgReportDriverInternalError() at 0x7ffff23c6180
cudbgReportDriverInternalError() at 0x7ffff23c7cff
cuMemGetAttribute_v2() at 0x7ffff232cf8a
cuMemGetAttribute_v2() at 0x7ffff2349896
cuVDPAUCtxCreate() at 0x7ffff229dc26
cuVDPAUCtxCreate() at 0x7ffff229de43
cuLaunchKernel() at 0x7ffff2286cad
cudart::cudaApiLaunch() at 0x43f2a8
cudaLaunch() at 0x468523
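
for reference, this is the kind of conventional error checking i mean - a minimal sketch of my usual pattern (CUDA_CHECK is my own hypothetical macro, the device-to-device copy is arbitrary); in my experience every call here comes back cudaSuccess even while the debugger is registering internal errors in the background:

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// wrap every runtime call; this is the conventional checking that
// never seems to see cudbgReportDriverInternalError()
#define CUDA_CHECK(call)                                             \
    do {                                                             \
        cudaError_t e_ = (call);                                     \
        if (e_ != cudaSuccess) {                                     \
            fprintf(stderr, "%s:%d: %s\n",                           \
                    __FILE__, __LINE__, cudaGetErrorString(e_));     \
            exit(EXIT_FAILURE);                                      \
        }                                                            \
    } while (0)

int main() {
    const size_t nbytes = 1 << 20;
    void *src = nullptr, *dst = nullptr;
    cudaStream_t stream;

    CUDA_CHECK(cudaMalloc(&src, nbytes));
    CUDA_CHECK(cudaMalloc(&dst, nbytes));
    CUDA_CHECK(cudaStreamCreate(&stream));

    CUDA_CHECK(cudaMemcpyAsync(dst, src, nbytes,
                               cudaMemcpyDeviceToDevice, stream));
    CUDA_CHECK(cudaGetLastError());            // catches sticky/launch errors
    CUDA_CHECK(cudaStreamSynchronize(stream)); // catches async execution errors

    CUDA_CHECK(cudaStreamDestroy(stream));
    CUDA_CHECK(cudaFree(src));
    CUDA_CHECK(cudaFree(dst));
    return 0;
}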

Hi little_jimmy,

Did you solve this? I’m running into a similar problem:

“Error: Internal error reported by CUDA debugger API (error=10). The application cannot be further debugged.”

Which renders cuda-gdb unusable for me.

I’m working on Ubuntu 14.04 with cuda-gdb version 6.5. My code is compiled with the CUDA 7.0 toolkit and my driver version is:

NVRM version: NVIDIA UNIX x86_64 Kernel Module 346.82 Wed Jun 17 10:37:46 PDT 2015
GCC version: gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1)

thanks,
Thomas Loockx

no, not yet

i made an effort to raise this internally with nvidia as well, but, honestly, nvidia’s software development team seems ‘over-stretched’ at present

i have found some correlation between stream races (races at the stream level) and this error, to the extent that, whenever i encounter such an instance, i now double-check my code for potential stream races - see the sketch below
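
to illustrate the kind of stream race i mean (a contrived sketch, not my actual code - writer/reader are hypothetical kernels): two streams touch the same buffer, with and without an ordering dependency:

#include <cuda_runtime.h>

__global__ void writer(float *buf, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = 1.0f;
}

__global__ void reader(const float *buf, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = buf[i] * 2.0f;
}

// RACY: nothing guarantees writer has finished before reader starts,
// because the two kernels sit in different streams
void racy(float *buf, float *out, int n, cudaStream_t s0, cudaStream_t s1) {
    writer<<<(n + 255) / 256, 256, 0, s0>>>(buf, n);
    reader<<<(n + 255) / 256, 256, 0, s1>>>(buf, out, n);
}

// ORDERED: an event recorded in s0 makes s1 wait for the write
void ordered(float *buf, float *out, int n,
             cudaStream_t s0, cudaStream_t s1, cudaEvent_t done) {
    writer<<<(n + 255) / 256, 256, 0, s0>>>(buf, n);
    cudaEventRecord(done, s0);
    cudaStreamWaitEvent(s1, done, 0);
    reader<<<(n + 255) / 256, 256, 0, s1>>>(buf, out, n);
}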

also, the debugger seems to build and dump some trace or log in the background; if that grows too large, the error also tends to occur

equally, i have found myself wondering whether the debugger is simply poorly equipped to handle massively parallel streams involving too many asynchronous synchronization calls - the sort of pattern sketched below
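
the sort of pattern i have in mind looks roughly like this (a schematic sketch - the kernel, stream count, and dependency web are all made up for illustration):

#include <cuda_runtime.h>

__global__ void step(float *buf, int n) {  // stand-in for the real per-stream work
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] += 1.0f;
}

int main() {
    const int NSTREAMS = 32, N = 1 << 16;
    float *buf;
    cudaMalloc(&buf, (size_t)NSTREAMS * N * sizeof(float));

    cudaStream_t streams[NSTREAMS];
    cudaEvent_t events[NSTREAMS];
    for (int i = 0; i < NSTREAMS; ++i) {
        cudaStreamCreate(&streams[i]);
        cudaEventCreateWithFlags(&events[i], cudaEventDisableTiming);
    }

    // many concurrent streams, cross-synchronized with events: each
    // stream's next launch waits on its neighbour, so the debugger has
    // to track a large web of asynchronous dependencies
    for (int iter = 0; iter < 100; ++iter) {
        for (int i = 0; i < NSTREAMS; ++i) {
            step<<<(N + 255) / 256, 256, 0, streams[i]>>>(buf + i * N, N);
            cudaEventRecord(events[i], streams[i]);
            cudaStreamWaitEvent(streams[(i + 1) % NSTREAMS], events[i], 0);
        }
    }
    cudaDeviceSynchronize();

    for (int i = 0; i < NSTREAMS; ++i) {
        cudaStreamDestroy(streams[i]);
        cudaEventDestroy(events[i]);
    }
    cudaFree(buf);
    return 0;
}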