Illegal memory access with scalar variables
My CUDA program is behaving strangely.
I detected a memory problem with the cuda-memcheck tool, so I ran the program under cuda-gdb with memcheck enabled to better understand the problem, and got the following error:
[code]
Program received signal CUDA_EXCEPTION_10, Device Illegal Address.
[/code]
What I don't understand is that the instruction that generates this error is a simple addition between a local scalar variable and a shared scalar variable (no arrays, no pointers).
If I inspect the kernel's stack, one of these variables shows the following warning:
[code]
warning: Variable is not live at this point. Returning garbage value.
[/code]
I initialize the local variables at declaration and the shared variables at the beginning of the kernel with this code:
[code]
if (tid == 0) {
    // initialization
}
__syncthreads();
[/code]
Can anyone explain the reason for this behaviour, or give me some suggestions?

I use CUDA 4.0 on a Tesla M2070 (I'm not the administrator of the system, so I cannot update it to version 4.1 or 4.2, but I could also use CUDA 3.2 or 3.1). The operating system is Red Hat Enterprise Linux 5.5, 64 bit.
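
For concreteness, here is a minimal sketch of the pattern I'm describing (the variable names and values are placeholders, not my real kernel):
[code]
// Minimal sketch of the pattern above; names and values are placeholders.
__global__ void kernel(float *out)
{
    int tid = threadIdx.x;

    // Local scalar, initialized at declaration.
    float local_val = 1.0f;

    // Shared scalar, initialized by thread 0 only.
    __shared__ float shared_val;
    if (tid == 0) {
        shared_val = 2.0f;
    }
    __syncthreads();   // make the initialization visible to all threads

    // The failing instruction is a plain addition like this one.
    out[tid] = local_val + shared_val;
}
[/code]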

#1
Posted 04/22/2012 10:06 PM   
Why don't you give the relevant code?

#2
Posted 04/23/2012 05:30 AM   
[quote name='pasoleatis' date='23 April 2012 - 07:30 AM' timestamp='1335159023' post='1399691']
Why don't you give the relevant code?
[/quote]
You're right, but I would have to post the whole kernel here and I don't know whether my boss allows me to do that.
I'm now checking whether I can post part of it.

#3
Posted 04/23/2012 08:33 AM   
Is your code using recursion? You might be running out of stack space.
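If recursion (or a deep call chain) does turn out to be the issue, the per-thread device stack can be enlarged from the host before launching the kernel. A rough sketch, where the 16 KB figure is only an example value, not a recommendation:
[code]
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    // Raise the per-thread device stack to 16 KB (example value only).
    cudaError_t err = cudaDeviceSetLimit(cudaLimitStackSize, 16 * 1024);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDeviceSetLimit failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Read the limit back to confirm what the driver actually set.
    size_t stackSize = 0;
    cudaDeviceGetLimit(&stackSize, cudaLimitStackSize);
    printf("Device stack size per thread: %zu bytes\n", stackSize);

    // ... launch the kernel here ...
    return 0;
}
[/code]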

Always check return codes of CUDA calls for errors. Do not use __syncthreads() in conditional code unless the condition is guaranteed to evaluate identically for all threads of each block. Run your program under cuda-memcheck to detect stray memory accesses. If your kernel dies for larger problem sizes, it might exceed the runtime limit and trigger the watchdog timer.
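On the first point, one common way to check return codes is to wrap every runtime call and kernel launch in a small macro, for example (just one way to do it):
[code]
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Wrap every CUDA runtime call so errors are reported immediately.
#define CUDA_CHECK(call)                                                   \
    do {                                                                   \
        cudaError_t err__ = (call);                                        \
        if (err__ != cudaSuccess) {                                        \
            fprintf(stderr, "CUDA error %s at %s:%d\n",                    \
                    cudaGetErrorString(err__), __FILE__, __LINE__);        \
            exit(EXIT_FAILURE);                                            \
        }                                                                  \
    } while (0)

// Usage after a kernel launch: catch both launch and execution errors.
//   myKernel<<<grid, block>>>(args);
//   CUDA_CHECK(cudaGetLastError());       // launch errors
//   CUDA_CHECK(cudaDeviceSynchronize());  // errors during execution
[/code]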

#4
Posted 04/23/2012 09:07 AM   