Hello,
I wrote the following (very simple) kernel:
__global__ void MyKernel(float *out)
{
// Launched as <<<gridDim.x, blockDim.x>>>
// Global thread index
int id = blockIdx.x * blockDim.x + threadIdx.x;
// Compute results
out[id] = id;
}
#define N_VECTORS 5
#define N_SAMPLES 7
Then I run the kernel with:
float *devOut;
cudaMalloc((void**)&devOut, N_VECTORS * N_SAMPLES * sizeof(float));
MyKernel<<<N_VECTORS,N_SAMPLES>>>(devOut);
I also ran it with:
MyKernel<<<N_SAMPLES,N_VECTORS>>>(devOut);
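To spell out what I think each launch computes, here is a CPU-only sketch of the index arithmetic. `emulateLaunch` is a hypothetical helper I made up for illustration (it is not part of my real code or of the CUDA API); it just loops over every block and thread index of a 1-D launch and does the same write as MyKernel.

```cpp
#include <vector>

// CPU-only emulation (hypothetical helper, not a CUDA API call).
// Visits every (blockIdx.x, threadIdx.x) pair of a 1-D launch with
// gridDimX blocks of blockDimX threads and performs the same write
// as MyKernel: out[id] = id.
std::vector<float> emulateLaunch(int gridDimX, int blockDimX)
{
    // One slot per launched thread; -1 marks "never written".
    std::vector<float> out(gridDimX * blockDimX, -1.0f);
    for (int blockIdxX = 0; blockIdxX < gridDimX; ++blockIdxX) {
        for (int threadIdxX = 0; threadIdxX < blockDimX; ++threadIdxX) {
            // Same formula as in the kernel.
            int id = blockIdxX * blockDimX + threadIdxX;
            out[id] = static_cast<float>(id);
        }
    }
    return out;
}
```

With N_VECTORS = 5 and N_SAMPLES = 7, `emulateLaunch(5, 7)` and `emulateLaunch(7, 5)` stand in for my two launches; either way 35 threads run in total.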
Can you tell me why I'm getting the same output in both cases?
Best regards,
Z.V