Matrix multiplication does not work

Hello. I’m trying to use CUDA to multiply large matrices, but I’ve run into the following problem:
I have two square matrices of dimension 2632x2632. When I try to multiply them, the code does not perform the multiplication, and the result matrix simply comes back empty. I am using shared memory, and after some tests I think the problem may be in this part of the code:
for (k = 0; k < blocksize; k++) value += a_sub[ty][k] * b_sub[k][tx];
What happens is that when I use a “for” loop, the code seems to skip the multiplication, so the result matrix is empty. When I change the loop to a “while”, the multiplication does get performed, but the result matrix is incorrect.
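For anyone comparing notes, here is a minimal sketch of a shared-memory tiled kernel built around the same inner loop. It is not the code from this post: the names `matmul_tiled` and `matmul`, a `BLOCKSIZE` of 16, and row-major `float` storage are all assumptions. It guards the tile loads and the final store because 2632 is not a multiple of 16 (2632 / 16 = 164.5), and it returns the CUDA error code, since a kernel that fails to launch or faults can leave the output buffer untouched, which may look like an empty result.

```cuda
#include <cuda_runtime.h>

#define BLOCKSIZE 16  // assumed tile width; the post does not state it

// C = A * B for square n x n row-major matrices, using shared-memory tiles.
__global__ void matmul_tiled(const float *a, const float *b, float *c, int n)
{
    __shared__ float a_sub[BLOCKSIZE][BLOCKSIZE];
    __shared__ float b_sub[BLOCKSIZE][BLOCKSIZE];

    int tx = threadIdx.x, ty = threadIdx.y;
    int row = blockIdx.y * BLOCKSIZE + ty;
    int col = blockIdx.x * BLOCKSIZE + tx;
    float value = 0.0f;

    int tiles = (n + BLOCKSIZE - 1) / BLOCKSIZE;  // round up for partial tiles
    for (int t = 0; t < tiles; ++t) {
        int a_col = t * BLOCKSIZE + tx;
        int b_row = t * BLOCKSIZE + ty;
        // Out-of-range elements are loaded as zero so they add nothing.
        a_sub[ty][tx] = (row < n && a_col < n) ? a[row * n + a_col] : 0.0f;
        b_sub[ty][tx] = (b_row < n && col < n) ? b[b_row * n + col] : 0.0f;
        __syncthreads();  // both tiles must be fully loaded before use

        for (int k = 0; k < BLOCKSIZE; ++k)
            value += a_sub[ty][k] * b_sub[k][tx];
        __syncthreads();  // finish reading before the next load overwrites
    }

    if (row < n && col < n)
        c[row * n + col] = value;
}

// Hypothetical host wrapper: copies data, launches the kernel, and returns
// the first CUDA error encountered instead of ignoring it.
cudaError_t matmul(const float *h_a, const float *h_b, float *h_c, int n)
{
    size_t bytes = (size_t)n * n * sizeof(float);
    float *d_a = nullptr, *d_b = nullptr, *d_c = nullptr;

    cudaError_t err = cudaMalloc(&d_a, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&d_b, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&d_c, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        dim3 block(BLOCKSIZE, BLOCKSIZE);
        dim3 grid((n + BLOCKSIZE - 1) / BLOCKSIZE, (n + BLOCKSIZE - 1) / BLOCKSIZE);
        matmul_tiled<<<grid, block>>>(d_a, d_b, d_c, n);
        err = cudaGetLastError();                            // launch errors
    }
    if (err == cudaSuccess) err = cudaDeviceSynchronize();   // execution errors
    if (err == cudaSuccess) err = cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_a);  // cudaFree on a null pointer is a no-op
    cudaFree(d_b);
    cudaFree(d_c);
    return err;
}
```

If `matmul` returns anything other than `cudaSuccess`, passing that value to `cudaGetErrorString` shows which step failed, which is usually more informative than inspecting the output matrix.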
Would anyone know what could be happening?
Thank you!
PS: Sorry for my English, I used Google Translate. =)