What's the problem in my code?

I wrote a simple CUDA program to illustrate the problem. The code below runs and gives the correct result (65535 65534 …), but if I uncomment the line sdata[y] = 5.0f; in the kernel, the results become wrong (0 1 2 …).
What is wrong with this way of writing a value to shared memory?
Thank you.

#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

// Helper that exposes the kernel's dynamically allocated shared memory as a float*
struct SharedMemory{
	__device__ inline operator float *()
	{
		extern __shared__ int __smem[];
		return (float *)__smem;
	}

	__device__ inline operator const float *() const
	{
		extern __shared__ int __smem[];
		return (float *)__smem;
	}
};

__global__ void kTestFunc(int* arr){
	unsigned int x = threadIdx.x;
	unsigned int y = blockIdx.x;
	unsigned int w = blockDim.x;
	unsigned int offset = w*y + x;

	float* sdata = SharedMemory();
	//sdata[y] = 5.0f;

	arr[offset] = 65535 - arr[offset];
}

int main(void){
	int* hData = new int[65536];
	int* dData;
	cudaMalloc((void**)&dData, sizeof(int) * 65536);
	for (int i = 0; i < 65536; i++)
		hData[i] = i;

	cudaMemcpy(dData, hData, 65536 * sizeof(int), cudaMemcpyHostToDevice);
	kTestFunc<<<256, 256>>>(dData);
	cudaMemcpy(hData, dData, 65536 * sizeof(int), cudaMemcpyDeviceToHost);

	for (int i = 0; i < 10; i++)
		printf("%d ", hData[i]);

	cudaFree(dData);
	delete[] hData;

	return 0;
}

When using dynamically allocated shared memory:

extern __shared__ int __smem[];

you need to pass the required size as the 3rd kernel launch configuration parameter:

kTestFunc<<<256, 256, ????????>>>(dData);
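
For illustration only (the size here is my assumption, not something your code states): your kernel indexes sdata by blockIdx.x, which runs from 0 to 255 with this launch, so reserving 256 floats per block would be enough:

kTestFunc<<<256, 256, 256 * sizeof(float)>>>(dData);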

This is covered in the programming guide, and there are a great many sample codes that demonstrate proper usage of dynamically allocated shared memory.

If you had used proper CUDA error checking and run your code with cuda-memcheck, you would have discovered that errors are being thrown. Doing these basic debug steps before asking for help on a public forum is good practice. If you don’t know what proper CUDA error checking is, please google “proper cuda error checking”, take the first hit, read it, and apply it to your code. If you don’t know what cuda-memcheck is, please google it or refer to the cuda-memcheck documentation available at docs.nvidia.com
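
For concreteness, here is a minimal sketch of what such error checking might look like (the checkCuda macro name and the shared-memory size are just illustrative; any equivalent wrapper works):

#define checkCuda(call) do { \
	cudaError_t err = (call); \
	if (err != cudaSuccess) { \
		fprintf(stderr, "CUDA error: %s at %s:%d\n", \
				cudaGetErrorString(err), __FILE__, __LINE__); \
		exit(1); \
	} \
} while (0)

// Around the kernel launch:
kTestFunc<<<256, 256, 256 * sizeof(float)>>>(dData);
checkCuda(cudaGetLastError());        // catches launch/configuration errors
checkCuda(cudaDeviceSynchronize());   // catches errors raised while the kernel runs
checkCuda(cudaMemcpy(hData, dData, 65536 * sizeof(int), cudaMemcpyDeviceToHost));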

Even if you don’t understand the error output from these basic debug steps (cuda-memcheck and error checking), it will be useful to others trying to help you.

Oh, yes, I forgot to set the size for the shared memory… Thank you! And thanks for the advice on error checking.