Bizarre variable value changes in the middle of a kernel run?

Greetings. I’m currently finishing off a GPU-based kd-tree range search, and the Visual Studio CUDA debugger is showing some bizarre behaviour I don’t understand.

My code takes an element out of an array at the position equal to its thread ID: thread 0 loads array[0], thread 1 loads array[1], etc. The value loaded from the array is then used as an index into other arrays throughout the kernel.
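For reference, a minimal sketch of that access pattern (the kernel and array names here are placeholders, not my actual code):

    __global__ void rangeSearchSketch(const int *indices, const float *points,
                                      float *results, int n)
    {
        int tid = blockIdx.x * blockDim.x + threadIdx.x;
        if (tid >= n) return;

        // Thread 0 loads indices[0], thread 1 loads indices[1], etc.
        int idx = indices[tid];

        // idx is then used to address other global-memory arrays
        // for the rest of the kernel.
        results[tid] = points[idx];
    }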

When running the debugger, about halfway through the kernel, the variable holding the value loaded from the array will magically change to a value from a different array position. So for the first half of the kernel my code is dealing with the value from array[4], and upon entering a fixed-size for loop the debugger states the value has changed to that of array[32] - without re-hitting any of the breakpoints higher up in the kernel.

The first time this happened was in a branching if statement later in the code. I replaced the entire branch with a single XOR statement, and now the change happens higher up in the code (in a for loop).

Does anyone know:
a) Whether the CUDA debugger ever jumps between different kernel runs mid-kernel instead of finishing the kernel being debugged?
b) Of any factors which would cause variables to change without any statements that modify them?
c) Of any false assumptions I’ve made about the CUDA debugger? The arrays I’m using reside in global memory, and the local variables in each kernel are small enough to fit in per-thread storage. I can post the code the error happens in, but it’s a very long piece to work through. :/

Edit: If I instruct the Visual Studio CUDA debugger to continue to the next breakpoint, it will swap the array position. If I manually proceed to the next line of code, one line at a time, the code behaves as it should. If anyone knows what this behaviour is, I’d appreciate it.

You might try right-clicking on the Nsight CUDA Info panel and explicitly “freezing” the warps you’re not interested in.

The debugger also has a set of “global freeze” options for controlling stepping through warps.

I’ve found the “Scheduler Locking Resume Warp” option to be quite useful.

“If I instruct the Visual Studio CUDA debugger to continue to the next breakpoint, it will swap the array position. If I manually proceed to the next line of code, one line at a time, the code behaves as it should”

So, are you saying that when you single-step your code it works fine, but if you let it run from breakpoint A to breakpoint B it misbehaves?

It may be a race, I would think: single-stepping interferes with normal execution and scheduling, so it can change what gets written where and when.
Hence, confirm that no races exist.

Thanks for responding. Through extensive (clumsy) testing, it appears that the debugger jumps between random threads of the running kernel (mid-kernel). I found this out by repeatedly checking the “CUDA Debug Focus”. The odd parts are that:

1) The thread I’m interested in does not hit any errors, and I can switch back to it with the Debug Focus every time the debugger switches to a different thread - which I end up doing extremely often.

2) There are no race conditions or even shared resources in the kernel - one shared resource, actually: I keep a counter which I atomically increment so different threads can get a unique address to print results to if they need to (sketched below). Other than that, there seems to be no reason for the constant thread swapping.
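For what it’s worth, a minimal sketch of that counter pattern, assuming a device-side global counter and an output buffer (both names are hypothetical):

    // Hypothetical global counter, zeroed from the host before launch.
    __device__ unsigned int outCounter;

    // Each thread that needs to print a result reserves a unique slot.
    __device__ void emitResult(float *outBuffer, float value)
    {
        // atomicAdd returns the counter's old value, so every caller
        // gets a distinct output address.
        unsigned int slot = atomicAdd(&outCounter, 1u);
        outBuffer[slot] = value;
    }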

The global freeze options seemed to have no effect, though they look like a nice feature. It turns out that single-stepping my code still causes random swaps on occasion.

What seems to be working for me is placing an array == value condition on every breakpoint in my kernel. This seems to keep the debugger looking at the thread I’m interested in. I’ll keep at it and post if I figure out what’s causing the jumping about.

A few more suggestions…

(1) Turn on the CUDA memory checker (see the sketch after this list for an in-kernel check that complements it).

(2) If you’re not already, switch to a 64-bit CUDA build and host application. I only say this because my very recent experience is that 32-bit kernels do not debug precisely or report memory-checker errors correctly under CUDA 7.5 RC.

(3) Look at the CUDA Info window “warp view” and see what’s happening with your warp lanes.
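As a complement to (1), a device-side assert can stop the kernel right where a bad index first shows up instead of silently reading from the wrong place later. A minimal sketch, reusing placeholder array names (device-side assert needs #include <assert.h> and compute capability 2.0+):

    #include <assert.h>

    __global__ void checkedAccessSketch(const int *indices, const float *points,
                                        float *results, int n, int numPoints)
    {
        int tid = blockIdx.x * blockDim.x + threadIdx.x;
        if (tid >= n) return;

        int idx = indices[tid];

        // The debugger / memory checker stops here if the loaded index
        // is out of range.
        assert(idx >= 0 && idx < numPoints);

        results[tid] = points[idx];
    }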

I find I had to reread this, and I find that I am still confused.

Initially I thought you were noting/describing a potential bug in your code, more or less resembling undefined behaviour, as a thread seemed to misbehave.
Now I find myself led to believe that you might (simply) be describing ‘difficulties’ with the debugger.
You note threads jumping, but now it seems to be more in the context of the debugger than the code.

If you wish to follow a single thread, and you have multiple breakpoints and/or breakpoints that may be triggered multiple times (likely because of loops), you may want to disable (not necessarily remove) breakpoints when they are reached, so that other threads do not trigger them again later on.
The debugger normally jumps to the latest breakpoint being triggered, and to the thread that triggered it.
The debugger allows you to both disable and remove breakpoints.

@little_jimmy: The single-sentence answer to the issues I’ve been having would (probably) read: when running multiple threads of a single kernel, any thread that hits a breakpoint can update the debugger’s watched values.

When I first posted, I thought I had an error in my code, as the values which were suddenly changing always changed to the same unrelated values. By the time I typed my second post here (and after looking at the “CUDA Debug Focus”), I was pretty sure it was just me trying to use the debugger in a way that doesn’t work.

I’ve managed to (probably) keep the debugger looking at the thread I’m interested in by placing a unique-value condition on all of my breakpoints - I do still need all the breakpoints active to see what my code is doing. Manually stepping through usually works, unless the code hits an actual error, in which case I think the debugger switches to a still-working thread.
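For illustration, a minimal sketch of what such a conditional breakpoint can anchor to (the names gid/idx and the value 4 are just examples, not my actual code):

    __global__ void kernelSketch(const int *indices, float *results, int n)
    {
        // Keep the global thread index in a named local so it can be used
        // in a breakpoint condition.
        int gid = blockIdx.x * blockDim.x + threadIdx.x;
        if (gid >= n) return;

        int idx = indices[gid];

        // Set breakpoints on the lines below with a condition such as
        //   idx == 4   (the "array == value" condition mentioned above)
        // or gid == 0, so the debugger only stops for the thread being followed.
        results[gid] = (float)idx;
    }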

Still, I think my question has been fully answered here. You and allanmac really helped me understand how the debugger works - it hadn’t occurred to me that the debugger itself could suffer from race conditions. Thanks to you both for taking the time to reply.