I think this is a problem in combination with NVRTC. Before NVRTC, I used NVCC to compile kernels into PTX and then loaded them at runtime. Since switching to NVRTC, I no longer see line info in the cuda-memcheck log. My NVRTC compilation flags for debug builds are:
const char* options[] =
{
    arch.c_str().AsChar(), // Device architecture string
    "--include-path=CUDA", // Directory with some includes
    "-std=c++11",
    "-lineinfo",
    "-G",
};
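For reference, here is a minimal sketch of how an option list like this might be assembled and checked before it is handed to NVRTC. The helper names `buildDebugOptions` and `toArgv` are hypothetical, and the architecture string in the usage note is an assumption (compute capability 3.5 for a GTX 780 Ti). Keeping each option as its own array element matters, since adjacent string literals without a comma between them are merged into a single string.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: collects the NVRTC debug options as separate strings.
// `arch` is assumed to be a complete option such as
// "--gpu-architecture=compute_35".
std::vector<std::string> buildDebugOptions(const std::string& arch)
{
    return {
        arch,                  // Device architecture string
        "--include-path=CUDA", // Directory with some includes
        "-std=c++11",
        "-lineinfo",           // Short form of --generate-line-info
        "-G",                  // Short form of --device-debug
    };
}

// Hypothetical helper: converts the options to the array-of-C-strings form
// that nvrtcCompileProgram takes. The pointers stay valid only while
// `options` is alive and unmodified.
std::vector<const char*> toArgv(const std::vector<std::string>& options)
{
    std::vector<const char*> argv;
    argv.reserve(options.size());
    for (const std::string& opt : options)
        argv.push_back(opt.c_str());
    return argv;
}
```

With an existing `nvrtcProgram prog`, the result would be passed as `nvrtcCompileProgram(prog, static_cast<int>(argv.size()), argv.data())`. Printing the size and each element first is a cheap way to confirm that all five options arrive as separate entries.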
In the log, I see stuff like:
Invalid shared write of size 4 at 0x00000b18 in someKernel.
But when I compiled the same kernels with NVCC, the line info showed up. I'm running on a GTX 780 Ti, Windows 10 Pro x64, driver 376.53, CUDA 8.0.