Illegal memory access with scalar variables

My CUDA program behaves strangely.

I detected a memory problem with cuda-memcheck, so I ran the program under cuda-gdb with CUDA memcheck enabled to understand the problem better, and the following error appeared:

Program received signal CUDA_EXCEPTION_10, Device Illegal Address.

What I don’t understand is that the instruction that generates this error is a simple addition of a local scalar variable and a shared scalar variable (no arrays, no pointers).

If I inspect the kernel’s stack, one of these variables shows the following warning:

warning: Variable is not live at this point. Returning garbage value.

I initialized the local variables at declaration and the shared variables at the beginning of the kernel with this code:

if (tid == 0) {
   // initialization
}
__syncthreads();
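For reference, the initialization pattern described above can be written out as a small complete example. This is only a sketch under stated assumptions, not the original kernel: the names add_offset and run_add_offset, the offset value 10, and the block size 128 are all hypothetical. The point it illustrates is that every thread of the block has to reach __syncthreads(), so the bounds check comes after the barrier rather than as an early return before it.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: adds a per-block shared scalar to a per-thread local scalar.
__global__ void add_offset(const int *in, int *out, int n)
{
    __shared__ int offset;   // shared scalar, one copy per block
    int tid = threadIdx.x;
    int local = 0;           // local scalar, initialized at declaration

    if (tid == 0) {
        offset = 10;         // only thread 0 writes the shared scalar
    }
    __syncthreads();         // every thread of the block reaches this barrier

    int i = blockIdx.x * blockDim.x + tid;
    if (i < n) {             // bounds check placed after the barrier
        local = in[i];
        out[i] = local + offset;
    }
}

// Hypothetical host helper: returns 0 if the kernel ran without a CUDA error
// and every output equals input + 10, nonzero otherwise. Assumes n > 0.
int run_add_offset(int n)
{
    std::vector<int> h_in(n), h_out(n, 0);
    for (int i = 0; i < n; ++i) h_in[i] = i;

    int *d_in = NULL, *d_out = NULL;
    size_t bytes = n * sizeof(int);
    if (cudaMalloc((void **)&d_in, bytes) != cudaSuccess) return 1;
    if (cudaMalloc((void **)&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return 1; }

    cudaError_t err = cudaMemcpy(d_in, &h_in[0], bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int threads = 128;                        // assumed block size
        int blocks = (n + threads - 1) / threads;
        add_offset<<<blocks, threads>>>(d_in, d_out, n);
        err = cudaGetLastError();                 // launch errors
    }
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // execution errors
    if (err == cudaSuccess) err = cudaMemcpy(&h_out[0], d_out, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);

    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int i = 0; i < n; ++i) {
        if (h_out[i] != h_in[i] + 10) return 2;
    }
    return 0;
}
```

Calling run_add_offset(1000) from main and checking for a zero return is enough to exercise it. Checking both cudaGetLastError() and the result of cudaDeviceSynchronize() helps separate launch failures from faults raised while the kernel executes.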

Can anyone explain the reason for this behaviour, or give me some suggestions?

I use CUDA 4.0 on a Tesla M2070 (I’m not the administrator of the system, so I cannot update it to version 4.1 or 4.2, but I can also use CUDA 3.2 or 3.1 if needed). The operating system is Red Hat Enterprise Linux 5.5, 64-bit.

Why don’t you post the relevant code?

You’re right, but I would have to post the whole kernel here, and I don’t know whether my boss would allow it.

I’ll check whether I can post part of it.

Is your code using recursion? You might be running out of stack space.
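If stack space is the suspect, one way to test the idea is to query the per-thread stack limit and raise it before launching the kernel. The sketch below assumes the CUDA 4.0 runtime API (cudaDeviceGetLimit and cudaDeviceSetLimit with cudaLimitStackSize); the helper name raise_stack_limit and the 4096-byte request are hypothetical and would need tuning for the real kernel.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: raises the per-thread stack size to at least
// requested_bytes and returns the limit now in effect, or 0 on error.
size_t raise_stack_limit(size_t requested_bytes)
{
    size_t current = 0;
    if (cudaDeviceGetLimit(&current, cudaLimitStackSize) != cudaSuccess) {
        return 0;
    }
    printf("stack size per thread before: %lu bytes\n", (unsigned long)current);

    if (requested_bytes > current) {
        // Must be called before launching the kernel that needs the larger stack.
        if (cudaDeviceSetLimit(cudaLimitStackSize, requested_bytes) != cudaSuccess) {
            return 0;
        }
    }

    // Read the limit back, since the runtime may adjust the requested value.
    if (cudaDeviceGetLimit(&current, cudaLimitStackSize) != cudaSuccess) {
        return 0;
    }
    printf("stack size per thread after:  %lu bytes\n", (unsigned long)current);
    return current;
}
```

If the illegal address error disappears after a call such as raise_stack_limit(4096), that points toward stack exhaustion; if it persists, the cause is probably elsewhere.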