This is not the first time this has occurred, and not the first time it has caused me a great deal of frustration and wasted effort. I am beginning to wonder why one should even bother with sound development practices and use the debugger, when the debugger itself is lazy.
Why does the debugger not halt or terminate on internal errors? In my opinion, such an error is fatal: as fatal as a segmentation fault.
Look at the stack:
poll() at 0x7ffff6a8c8ad
cudbgApiDetach() at 0x7ffff23cbc22
cudbgReportDriverInternalError() at 0x7ffff23c6180
cudbgReportDriverInternalError() at 0x7ffff23c7cff
cuMemGetAttribute_v2() at 0x7ffff232cf8a
cudbgGetAPIVersion() at 0x7ffff24485d8
cudbgGetAPIVersion() at 0x7ffff2448a78
cuMemGetAttribute_v2() at 0x7ffff2368a04
cuMemGetAttribute_v2() at 0x7ffff2331c7c
cuMemGetAttribute_v2() at 0x7ffff2332278
cuMemGetAttribute_v2() at 0x7ffff22a13f2
cuMemGetAttribute_v2() at 0x7ffff22a6685
cuMemcpyDtoDAsync_v2() at 0x7ffff2283759
cudart::cudaApiMemcpyAsync() at 0x43a47a
cudaMemcpyAsync() at 0x463526
Did the debugger halt on registering a driver internal error? No.
Should the debugger have halted? It certainly should have.
Why did the debugger not halt? Good question.
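One stopgap I can suggest, assuming the symbol is resolvable in your build of the tools, is to make the debugger break on the reporting function by hand rather than waiting for it to volunteer:

```
(cuda-gdb) break cudbgReportDriverInternalError
```

This at least converts the silent report into a stop, though it says nothing about why the debugger does not do this by default.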
Below is another instance:
I am truly fortunate to even catch these: at present, I suspend the debugger whenever execution takes longer than expected, knowing that I will likely be greeted by cudbgReportDriverInternalError().
And it is fatal: it destabilizes the device and causes erroneous results.
Conventional error checking, such as cudaGetLastError(), seems to ignore cudbgReportDriverInternalError(), and the debugger certainly does not stop.
I am not sure how one is supposed to debug the causes of cudbgReportDriverInternalError() when cudbgReportDriverInternalError() is hardly reported in the first place.
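To be clear about what "conventional error checking" means here, a minimal sketch (buffer names and sizes are hypothetical): every runtime call is wrapped, and the stream is synchronized and re-checked after the async copy that appears at the bottom of the stack above. Even so, the driver internal error reported inside the debugger does not appear to surface as a cudaError_t.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

// Wrap every runtime call and abort loudly on any reported error.
#define CUDA_CHECK(call)                                            \
    do {                                                            \
        cudaError_t err__ = (call);                                 \
        if (err__ != cudaSuccess) {                                 \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,      \
                    cudaGetErrorString(err__));                     \
            exit(EXIT_FAILURE);                                     \
        }                                                           \
    } while (0)

int main() {
    const size_t n = 1024;  // hypothetical size
    float *d_a = nullptr, *d_b = nullptr;
    cudaStream_t stream;

    CUDA_CHECK(cudaStreamCreate(&stream));
    CUDA_CHECK(cudaMalloc(&d_a, n * sizeof(float)));
    CUDA_CHECK(cudaMalloc(&d_b, n * sizeof(float)));

    // the kind of async device-to-device copy seen in the stack trace
    CUDA_CHECK(cudaMemcpyAsync(d_b, d_a, n * sizeof(float),
                               cudaMemcpyDeviceToDevice, stream));

    // synchronize, then re-check: still no sign of the internal error
    CUDA_CHECK(cudaStreamSynchronize(stream));
    CUDA_CHECK(cudaGetLastError());

    CUDA_CHECK(cudaFree(d_a));
    CUDA_CHECK(cudaFree(d_b));
    CUDA_CHECK(cudaStreamDestroy(stream));
    return 0;
}
```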
poll() at 0x7ffff6a8c8ad
cudbgApiDetach() at 0x7ffff23cbc22
cudbgReportDriverInternalError() at 0x7ffff23c6180
cudbgReportDriverInternalError() at 0x7ffff23c7cff
cuMemGetAttribute_v2() at 0x7ffff232cf8a
cuMemGetAttribute_v2() at 0x7ffff2349896
cuVDPAUCtxCreate() at 0x7ffff229dc26
cuVDPAUCtxCreate() at 0x7ffff229de43
cuLaunchKernel() at 0x7ffff2286cad
cudart::cudaApiLaunch() at 0x43f2a8
cudaLaunch() at 0x468523
I made an effort to raise this internally with NVIDIA as well, but, honestly, NVIDIA's software development team seems 'over-stretched' at present.
I have found some correlation between stream races (races at the stream level) and this occurrence, to the extent that, whenever I encounter such an instance, I double-check my code for potential stream races.
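To illustrate the kind of stream race I check for (kernel names and sizes are hypothetical): one stream consuming a buffer that another stream may still be writing, with no event ordering the two. The sketch below shows the racy launch commented out and the corrected version using cudaStreamWaitEvent().

```cuda
#include <cuda_runtime.h>

__global__ void produce(float* buf) { buf[threadIdx.x] = 1.0f; }
__global__ void consume(const float* buf, float* out) {
    out[threadIdx.x] = buf[threadIdx.x] * 2.0f;
}

int main() {
    float *d_buf = nullptr, *d_out = nullptr;
    cudaMalloc(&d_buf, 256 * sizeof(float));
    cudaMalloc(&d_out, 256 * sizeof(float));

    cudaStream_t s0, s1;
    cudaEvent_t done;
    cudaStreamCreate(&s0);
    cudaStreamCreate(&s1);
    cudaEventCreate(&done);

    produce<<<1, 256, 0, s0>>>(d_buf);

    // racy: s1 may read d_buf while s0 is still writing it
    // consume<<<1, 256, 0, s1>>>(d_buf, d_out);

    // corrected: make s1 wait on s0's work before consuming
    cudaEventRecord(done, s0);
    cudaStreamWaitEvent(s1, done, 0);
    consume<<<1, 256, 0, s1>>>(d_buf, d_out);

    cudaDeviceSynchronize();

    cudaEventDestroy(done);
    cudaStreamDestroy(s0);
    cudaStreamDestroy(s1);
    cudaFree(d_buf);
    cudaFree(d_out);
    return 0;
}
```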
The debugger also seems to be building and dumping some trace or log in the background; if that grows too large, this error likewise tends to occur.
Equally, I have found myself wondering whether the debugger is simply poorly equipped to handle massively parallel streams issuing too many asynchronous synchronization calls.