Problem with threadIdx not being set (or always zero)

I am in the process of writing a new CUDA program, and I am seeing a problem I have not
seen in any of the programs I have written before.

The symptom is that threadIdx.x, threadIdx.y, blockIdx.x, blockIdx.y, etc. are always zero. The other
built-in variables seem to be fine, e.g. blockDim, gridDim, and warpSize.

Other programs that I have written seem to compile and run fine. My guess is that I am
somehow clobbering something in the kernel, or not initializing something, but it's not
obvious what. Has anyone else seen this? Does anyone have any suggestions?

I am running CUDA 3.0 on Red Hat Enterprise Linux 5.4 with a GeForce GTX 260 (216 cores).

Thank you !
MW

Every thread in your kernel gets the same value of threadIdx and blockIdx?

That is correct, both threadIdx.x and blockIdx.x are set to 0 for every thread.
I have tried playing with other related items, such as changing gridDim and
blockDim, and those come over to the thread just fine. I just tried this on a
minimal kernel, which only sets the values of a debug array that I copy back to
the host, and I am getting the same result. So I have to believe that I am doing
something very screwy elsewhere that is causing this. But I first wanted to see
if anyone else has ever seen this symptom.

Thank you to those who have read this and for any replies.

MW

I have never seen this before. Are you sure your kernel is executing at all?
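
One way to answer that question is to make "kernel never ran" distinguishable from "indices are zero". The sketch below is only an illustration, not the poster's code: the kernel name recordIndices, the CHECK macro, and the launch sizes are all made up for the example. It pre-fills the debug arrays with -1 and checks for errors after the launch, so a buffer that still holds -1 means the kernel did not write to it. Note that cudaDeviceSynchronize was introduced in CUDA 4.0; on CUDA 3.0 the equivalent call would be cudaThreadSynchronize.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call failed.
#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,         \
                    cudaGetErrorString(err_));                         \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

// Hypothetical debug kernel: each thread records its own indices.
__global__ void recordIndices(int *blockOut, int *threadOut)
{
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    blockOut[gid]  = blockIdx.x;
    threadOut[gid] = threadIdx.x;
}

int main()
{
    const int blocks = 4, threads = 8, n = blocks * threads;  // arbitrary sizes
    int hBlock[n], hThread[n];
    int *dBlock = NULL, *dThread = NULL;

    CHECK(cudaMalloc((void **)&dBlock,  n * sizeof(int)));
    CHECK(cudaMalloc((void **)&dThread, n * sizeof(int)));

    // Fill with 0xFF bytes so every int reads back as -1 unless the kernel wrote it.
    CHECK(cudaMemset(dBlock,  0xFF, n * sizeof(int)));
    CHECK(cudaMemset(dThread, 0xFF, n * sizeof(int)));

    recordIndices<<<blocks, threads>>>(dBlock, dThread);
    CHECK(cudaGetLastError());       // reports launch and configuration errors
    CHECK(cudaDeviceSynchronize());  // reports errors raised while the kernel ran

    CHECK(cudaMemcpy(hBlock,  dBlock,  n * sizeof(int), cudaMemcpyDeviceToHost));
    CHECK(cudaMemcpy(hThread, dThread, n * sizeof(int), cudaMemcpyDeviceToHost));

    for (int i = 0; i < n; ++i)
        printf("%2d: blockIdx.x=%d threadIdx.x=%d\n", i, hBlock[i], hThread[i]);

    CHECK(cudaFree(dBlock));
    CHECK(cudaFree(dThread));
    return 0;
}
```

If every entry comes back as -1, or one of the checks fails, the problem is with the launch rather than with threadIdx or blockIdx.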

This problem appears to have something to do with using threadIdx, etc. to initialize a
variable. If you do a normal assignment to a variable, everything seems to work. This
is CUDA 3.0.

int x = blockIdx.x; // does not work but
int x; x = blockIdx.x; // works.

Thank you again for all your input !

MW

Could you post a small code example that has the same problem as your application? Device as well as host code would be useful here, as it does sound like the kernel itself is being optimized out.
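
A reproduction along these lines might be a useful starting point. This is a sketch under assumptions, not the original program: the kernel name compareForms, the array names, and the launch sizes are invented for illustration. It stores blockIdx.x once through an initialized variable and once through a declared-then-assigned variable, then counts on the host how often the two disagree. For a plain int the two forms are equivalent in C and C++, so any mismatch would point at the toolchain or at something else in the surrounding code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: record blockIdx.x using both forms from the post above.
__global__ void compareForms(int *viaInit, int *viaAssign)
{
    int a = blockIdx.x;   // initialization (reported as not working)
    int b;
    b = blockIdx.x;       // declaration followed by assignment (reported as working)

    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    viaInit[gid]   = a;
    viaAssign[gid] = b;
}

int main()
{
    const int blocks = 4, threads = 8, n = blocks * threads;  // arbitrary sizes
    int hInit[n], hAssign[n];
    int *dInit = NULL, *dAssign = NULL;

    cudaMalloc((void **)&dInit,   n * sizeof(int));
    cudaMalloc((void **)&dAssign, n * sizeof(int));

    compareForms<<<blocks, threads>>>(dInit, dAssign);
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // cudaMemcpy waits for the kernel to finish before copying.
    cudaMemcpy(hInit,   dInit,   n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(hAssign, dAssign, n * sizeof(int), cudaMemcpyDeviceToHost);

    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        int expected = i / threads;  // block index that owns element i
        if (hInit[i] != expected || hAssign[i] != expected) {
            printf("element %d: init=%d assign=%d expected=%d\n",
                   i, hInit[i], hAssign[i], expected);
            ++mismatches;
        }
    }
    printf("%d mismatches out of %d elements\n", mismatches, n);

    cudaFree(dInit);
    cudaFree(dAssign);
    return mismatches != 0;
}
```

Posting something this small, together with the exact nvcc command line used, would make it much easier for others to try it on their own setup.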