[SOLVED] Memory leak issues

Hi guys, I’m quite new to CUDA and I’m having some issues with cudaMalloc and cudaFree.
I get this runtime error:

*** glibc detected *** ./eigenvalues: malloc(): memory corruption: 0x0000000000e265a0 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x75e66)[0x7fbea8147e66]
/lib64/libc.so.6(+0x79904)[0x7fbea814b904]
/lib64/libc.so.6(__libc_malloc+0x71)[0x7fbea814c6b1]
/usr/lib64/libcuda.so(+0x1cc289)[0x7fbea708e289]
/usr/lib64/libcuda.so(+0x1ccd46)[0x7fbea708ed46]
/usr/lib64/libcuda.so(+0xe6f52)[0x7fbea6fa8f52]
/usr/lib64/libcuda.so(+0xe71a5)[0x7fbea6fa91a5]
/usr/lib64/libcuda.so(cuMemAlloc_v2+0x71)[0x7fbea6f98741]
/usr/local/cuda/lib64/libcudart.so.5.0(+0x246df)[0x7fbea870e6df]
/usr/local/cuda/lib64/libcudart.so.5.0(+0x112f4)[0x7fbea86fb2f4]
/usr/local/cuda/lib64/libcudart.so.5.0(cudaMalloc+0x68)[0x7fbea8721518]
./eigenvalues[0x401265]
./eigenvalues[0x4017e5]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x7fbea80f0d5d]
./eigenvalues[0x400ca9]
======= Memory map: ========
00400000-00406000 r-xp 00000000 00:13 89456701                           /home/andreapapaluca/cuda/progetto/eigenvalues
00605000-00606000 rw-p 00005000 00:13 89456701                           /home/andreapapaluca/cuda/progetto/eigenvalues
00daa000-00e50000 rw-p 00000000 00:00 0                                  [heap]
.
.
.

which is caused by this cudaMalloc in my code:

void QR_step(float* dev_A,int size){

     float* dev_x;
     float norm;
     cudaMalloc((void**)&dev_x,size*sizeof(float));
     norm=u_gen(dev_A,dev_x,size,0);

     float* dev_P;
     cudaMalloc((void**)&dev_P,size*size*sizeof(float));
     P_gen<<<blocksPerGrid,threadsPerBlock>>>(dev_P,dev_x,norm,size,0);


     float* P_test=new float(size*size);	
     cudaMemcpy(P_test,dev_P,size*size*sizeof(float),cudaMemcpyDeviceToHost);
     cout<<"P0"<<endl;
     print(P_test,size);   
     //delete[] P_test;	  

	/*float res[9];
	float* dev_res;
	//cudaMalloc((void**)&dev_res,size*size*sizeof(float));
	//matrix_product<<<blocksPerGrid,threadsPerBlock>>>(dev_P,dev_A,dev_res,size);  
	//cudaMemcpy(res,dev_res,size*size*sizeof(float),cudaMemcpyDeviceToHost);	
	cout<<"P0*A"<<endl;
	print(res,3);
	//delete[] res;*/
	
     float* dev_Q;
     cudaMalloc((void**)&dev_Q,size*size*sizeof(float));     <-- this one i think
	cout<<"pippo"<<endl;
     copy<<<blocksPerGrid,threadsPerBlock>>>(dev_Q,dev_P,size);		
     matrix_product<<<blocksPerGrid,threadsPerBlock>>>(dev_P,dev_A,dev_A,size);	

//////////////////////////////////////////////////////////////////////////////////////

	P_gen<<<blocksPerGrid,threadsPerBlock>>>(dev_P,dev_x,norm,size,1);
	float* P=new float(size*size);
	cout<<"P1"<<endl;
	cudaMemcpy(P,dev_P,size*size*sizeof(float),cudaMemcpyDeviceToHost);
	print(P,size);
        .
        .
        .

I think it is the one marked by the arrow, because the string “pippo” is not printed to the screen, but I had the same issue with the earlier “cudaMalloc((void**)&dev_res,size*size*sizeof(float));” (that’s why it is commented out). I hope I’ve explained myself despite my poor English, and thanks in advance for your help!

Try cuda-memcheck. This tool gives more detailed information to help you identify memory access errors.

Also, in your code at line 37, P should be allocated as float(size*size*sizeof(float)) instead of float(size*size).

LongY, you probably want to study how new works.

JakeTheDog, this is very likely (host) heap corruption of some sort. Once the heap is corrupted, a later library call that allocates memory will often expose the corruption. The fact that you had a problem at a particular point, commented that out, and the problem moved to the next library call is also a strong indicator of corruption that is waiting to be “exposed” by a call of some sort.

Heap corruption can be introduced at almost any point in a C or C++ program if you make a mistake. I think it’s unlikely that anyone will be able to tell you what is wrong from the code you have posted; it will probably require a complete program to test. (At least, I don’t see any obvious problems with what you have posted.) Here’s a recent example of the type of investigation that is needed:

https://devtalk.nvidia.com/default/topic/831500/cudaeventrecord-segmentation-fault/

:( I haven’t used the new operator for a while, and I confused new with malloc.
My ignorance must have been seen by many people :) At least I will not forget the difference
between these two similar operators.

Thanks guys! I found an error at line 13: I changed the () brackets to [] brackets. I thought there was no difference; actually I’m wondering which constructor is called with (). Anyway, now the memory corruption has moved:

andreapapaluca@spyro:~/cuda/progetto$ ./eigenvalues 
bpg: 32

7  2  -2  
2  3  -2  
-2  -2  3  

P0
-0.927173  -0.264906  0.264906  
-0.264906  0.963586  0.0364137  
0.264906  0.0364137  0.963586  

pippo
P1
1  0  0  
0  nan  nan  
0  nan  nan  

*** glibc detected *** ./eigenvalues: malloc(): memory corruption: 0x00000000009e08e0 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x75be6)[0x7f3acc974be6]
/lib/x86_64-linux-gnu/libc.so.6(+0x78c53)[0x7f3acc977c53]
/lib/x86_64-linux-gnu/libc.so.6(__libc_calloc+0xc2)[0x7f3acc979012]
./eigenvalues[0x42f093]
./eigenvalues[0x430824]
./eigenvalues[0x423673]
./eigenvalues[0x423cb7]
./eigenvalues[0x403629]
/lib/x86_64-linux-gnu/libc.so.6(+0x36ae2)[0x7f3acc935ae2]
/lib/x86_64-linux-gnu/libc.so.6(+0x36b35)[0x7f3acc935b35]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x104)[0x7f3acc91deb4]
./eigenvalues[0x402c39]
======= Memory map: ========
00400000-0045f000 r-xp 00000000 00:15 89456699                           /home/andreapapaluca/cuda/progetto/eigenvalues
0065e000-00662000 rw-p 0005e000 00:15 89456699                           /home/andreapapaluca/cuda/progetto/eigenvalues
00818000-009fb000 rw-p 00000000 00:00 0                                  [heap]
.
.
.

The nan matrix is a bit scary; could it be related to the memory corruption?
If I run cuda-memcheck I get:

andreapapaluca@spyro:~/cuda/progetto$ cuda-memcheck ./eigenvalues
========= CUDA-MEMCHECK
bpg: 32

7  2  -2  
2  3  -2  
-2  -2  3  

P0
-0.927173  -0.264906  0.264906  
-0.264906  0.963586  0.0364137  
0.264906  0.0364137  0.963586  

pippo
========= Invalid __global__ read of size 4
=========     at 0x00000140 in matrix_product(float*, float*, float*, int)
=========     by thread (31,0,0) in block (10,0,0)
=========     Address 0x50312067c is out of bounds
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/usr/lib/libcuda.so (cuLaunchKernel + 0x2c5) [0x14ad95]
=========     Host Frame:./eigenvalues [0x1b768]
=========     Host Frame:./eigenvalues [0x3c0a3]
=========     Host Frame:./eigenvalues [0x3ead]
=========     Host Frame:./eigenvalues [0x37d6]
=========     Host Frame:./eigenvalues [0x3812]
=========     Host Frame:./eigenvalues [0x3388]
=========     Host Frame:./eigenvalues [0x35da]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xfd) [0x1eead]
=========     Host Frame:./eigenvalues [0x2c39]
=========
========= Invalid __global__ read of size 4
=========     at 0x00000140 in matrix_product(float*, float*, float*, int)
=========     by thread (30,0,0) in block (10,0,0)
=========     Address 0x503120678 is out of bounds
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/usr/lib/libcuda.so (cuLaunchKernel + 0x2c5) [0x14ad95]
=========     Host Frame:./eigenvalues [0x1b768]
=========     Host Frame:./eigenvalues [0x3c0a3]
=========     Host Frame:./eigenvalues [0x3ead]
=========     Host Frame:./eigenvalues [0x37d6]
=========     Host Frame:./eigenvalues [0x3812]
=========     Host Frame:./eigenvalues [0x3388]
=========     Host Frame:./eigenvalues [0x35da]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xfd) [0x1eead]
=========     Host Frame:./eigenvalues [0x2c39]
=========
========= Invalid __global__ read of size 4
=========     at 0x00000140 in matrix_product(float*, float*, float*, int)
=========     by thread (29,0,0) in block (10,0,0)
=========     Address 0x503120674 is out of bounds
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/usr/lib/libcuda.so (cuLaunchKernel + 0x2c5) [0x14ad95]
=========     Host Frame:./eigenvalues [0x1b768]
=========     Host Frame:./eigenvalues [0x3c0a3]
=========     Host Frame:./eigenvalues [0x3ead]
=========     Host Frame:./eigenvalues [0x37d6]
=========     Host Frame:./eigenvalues [0x3812]
=========     Host Frame:./eigenvalues [0x3388]
=========     Host Frame:./eigenvalues [0x35da]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xfd) [0x1eead]
=========     Host Frame:./eigenvalues [0x2c39]
=========
========= Invalid __global__ read of size 4
=========     at 0x00000140 in matrix_product(float*, float*, float*, int)
=========     by thread (28,0,0) in block (10,0,0)
=========     Address 0x503120670 is out of bounds
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/usr/lib/libcuda.so (cuLaunchKernel + 0x2c5) [0x14ad95]
=========     Host Frame:./eigenvalues [0x1b768]
=========     Host Frame:./eigenvalues [0x3c0a3]
=========     Host Frame:./eigenvalues [0x3ead]
=========     Host Frame:./eigenvalues [0x37d6]
=========     Host Frame:./eigenvalues [0x3812]
=========     Host Frame:./eigenvalues [0x3388]
=========     Host Frame:./eigenvalues [0x35da]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xfd) [0x1eead]
=========     Host Frame:./eigenvalues [0x2c39]
=========
========= Invalid __global__ read of size 4
=========     at 0x00000140 in matrix_product(float*, float*, float*, int)
=========     by thread (27,0,0) in block (10,0,0)
=========     Address 0x50312066c is out of bounds
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/usr/lib/libcuda.so (cuLaunchKernel + 0x2c5) [0x14ad95]
=========     Host Frame:./eigenvalues [0x1b768]
=========     Host Frame:./eigenvalues [0x3c0a3]
=========     Host Frame:./eigenvalues [0x3ead]
=========     Host Frame:./eigenvalues [0x37d6]
=========     Host Frame:./eigenvalues [0x3812]
=========     Host Frame:./eigenvalues [0x3388]
=========     Host Frame:./eigenvalues [0x35da]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xfd) [0x1eead]
=========     Host Frame:./eigenvalues [0x2c39]
=========
========= Invalid __global__ read of size 4
=========     at 0x00000140 in matrix_product(float*, float*, float*, int)
=========     by thread (26,0,0) in block (10,0,0)
=========     Address 0x503120668 is out of bounds
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/usr/lib/libcuda.so (cuLaunchKernel + 0x2c5) [0x14ad95]
=========     Host Frame:./eigenvalues [0x1b768]
=========     Host Frame:./eigenvalues [0x3c0a3]
=========     Host Frame:./eigenvalues [0x3ead]
=========     Host Frame:./eigenvalues [0x37d6]
=========     Host Frame:./eigenvalues [0x3812]
=========     Host Frame:./eigenvalues [0x3388]
=========     Host Frame:./eigenvalues [0x35da]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xfd) [0x1eead]
=========     Host Frame:./eigenvalues [0x2c39]
=========
========= Invalid __global__ read of size 4
=========     at 0x00000140 in matrix_product(float*, float*, float*, int)
=========     by thread (25,0,0) in block (10,0,0)
=========     Address 0x503120664 is out of bounds
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/usr/lib/libcuda.so (cuLaunchKernel + 0x2c5) [0x14ad95]
=========     Host Frame:./eigenvalues [0x1b768]
=========     Host Frame:./eigenvalues [0x3c0a3]
=========     Host Frame:./eigenvalues [0x3ead]
=========     Host Frame:./eigenvalues [0x37d6]
=========     Host Frame:./eigenvalues [0x3812]
=========     Host Frame:./eigenvalues [0x3388]
=========     Host Frame:./eigenvalues [0x35da]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xfd) [0x1eead]
=========     Host Frame:./eigenvalues [0x2c39]
=========
========= Invalid __global__ read of size 4
=========     at 0x00000140 in matrix_product(float*, float*, float*, int)
=========     by thread (24,0,0) in block (10,0,0)
=========     Address 0x503120660 is out of bounds
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/usr/lib/libcuda.so (cuLaunchKernel + 0x2c5) [0x14ad95]
=========     Host Frame:./eigenvalues [0x1b768]
=========     Host Frame:./eigenvalues [0x3c0a3]
=========     Host Frame:./eigenvalues [0x3ead]
=========     Host Frame:./eigenvalues [0x37d6]
=========     Host Frame:./eigenvalues [0x3812]
=========     Host Frame:./eigenvalues [0x3388]
=========     Host Frame:./eigenvalues [0x35da]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xfd) [0x1eead]
=========     Host Frame:./eigenvalues [0x2c39]
=========
========= Invalid __global__ read of size 4
=========     at 0x00000140 in matrix_product(float*, float*, float*, int)
=========     by thread (23,0,0) in block (10,0,0)
=========     Address 0x50312065c is out of bounds
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/usr/lib/libcuda.so (cuLaunchKernel + 0x2c5) [0x14ad95]
=========     Host Frame:./eigenvalues [0x1b768]
=========     Host Frame:./eigenvalues [0x3c0a3]
=========     Host Frame:./eigenvalues [0x3ead]
=========     Host Frame:./eigenvalues [0x37d6]
=========     Host Frame:./eigenvalues [0x3812]
=========     Host Frame:./eigenvalues [0x3388]
=========     Host Frame:./eigenvalues [0x35da]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xfd) [0x1eead]
=========     Host Frame:./eigenvalues [0x2c39]
=========
========= Invalid __global__ read of size 4
=========     at 0x00000140 in matrix_product(float*, float*, float*, int)
=========     by thread (22,0,0) in block (10,0,0)
=========     Address 0x503120658 is out of bounds
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/usr/lib/libcuda.so (cuLaunchKernel + 0x2c5) [0x14ad95]
=========     Host Frame:./eigenvalues [0x1b768]
=========     Host Frame:./eigenvalues [0x3c0a3]
=========     Host Frame:./eigenvalues [0x3ead]
=========     Host Frame:./eigenvalues [0x37d6]
=========     Host Frame:./eigenvalues [0x3812]
=========     Host Frame:./eigenvalues [0x3388]
=========     Host Frame:./eigenvalues [0x35da]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xfd) [0x1eead]
=========     Host Frame:./eigenvalues [0x2c39]
=========
========= Invalid __global__ read of size 4
=========     at 0x00000140 in matrix_product(float*, float*, float*, int)
=========     by thread (21,0,0) in block (10,0,0)
=========     Address 0x503120654 is out of bounds
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/usr/lib/libcuda.so (cuLaunchKernel + 0x2c5) [0x14ad95]
=========     Host Frame:./eigenvalues [0x1b768]
=========     Host Frame:./eigenvalues [0x3c0a3]
=========     Host Frame:./eigenvalues [0x3ead]
=========     Host Frame:./eigenvalues [0x37d6]
=========     Host Frame:./eigenvalues [0x3812]
=========     Host Frame:./eigenvalues [0x3388]
=========     Host Frame:./eigenvalues [0x35da]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xfd) [0x1eead]
=========     Host Frame:./eigenvalues [0x2c39]
=========
========= Invalid __global__ read of size 4
=========     at 0x00000140 in matrix_product(float*, float*, float*, int)
=========     by thread (20,0,0) in block (10,0,0)
=========     Address 0x503120650 is out of bounds
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/usr/lib/libcuda.so (cuLaunchKernel + 0x2c5) [0x14ad95]
=========     Host Frame:./eigenvalues [0x1b768]
=========     Host Frame:./eigenvalues [0x3c0a3]
=========     Host Frame:./eigenvalues [0x3ead]
=========     Host Frame:./eigenvalues [0x37d6]
=========     Host Frame:./eigenvalues [0x3812]
=========     Host Frame:./eigenvalues [0x3388]
=========     Host Frame:./eigenvalues [0x35da]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xfd) [0x1eead]
=========     Host Frame:./eigenvalues [0x2c39]
=========
========= Invalid __global__ read of size 4
=========     at 0x00000140 in matrix_product(float*, float*, float*, int)
=========     by thread (19,0,0) in block (10,0,0)
=========     Address 0x50312064c is out of bounds
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/usr/lib/libcuda.so (cuLaunchKernel + 0x2c5) [0x14ad95]
=========     Host Frame:./eigenvalues [0x1b768]
=========     Host Frame:./eigenvalues [0x3c0a3]
=========     Host Frame:./eigenvalues [0x3ead]
=========     Host Frame:./eigenvalues [0x37d6]
=========     Host Frame:./eigenvalues [0x3812]
=========     Host Frame:./eigenvalues [0x3388]
=========     Host Frame:./eigenvalues [0x35da]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xfd) [0x1eead]
=========     Host Frame:./eigenvalues [0x2c39]
=========
========= Invalid __global__ read of size 4
=========     at 0x00000140 in matrix_product(float*, float*, float*, int)
=========     by thread (18,0,0) in block (10,0,0)
=========     Address 0x503120648 is out of bounds
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/usr/lib/libcuda.so (cuLaunchKernel + 0x2c5) [0x14ad95]
=========     Host Frame:./eigenvalues [0x1b768]
=========     Host Frame:./eigenvalues [0x3c0a3]
=========     Host Frame:./eigenvalues [0x3ead]
=========     Host Frame:./eigenvalues [0x37d6]
=========     Host Frame:./eigenvalues [0x3812]
=========     Host Frame:./eigenvalues [0x3388]
=========     Host Frame:./eigenvalues [0x35da]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xfd) [0x1eead]
=========     Host Frame:./eigenvalues [0x2c39]
=========
========= Invalid __global__ read of size 4
=========     at 0x00000140 in matrix_product(float*, float*, float*, int)
=========     by thread (17,0,0) in block (10,0,0)
=========     Address 0x503120644 is out of bounds
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/usr/lib/libcuda.so (cuLaunchKernel + 0x2c5) [0x14ad95]
=========     Host Frame:./eigenvalues [0x1b768]
=========     Host Frame:./eigenvalues [0x3c0a3]
=========     Host Frame:./eigenvalues [0x3ead]
=========     Host Frame:./eigenvalues [0x37d6]
=========     Host Frame:./eigenvalues [0x3812]
=========     Host Frame:./eigenvalues [0x3388]
=========     Host Frame:./eigenvalues [0x35da]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xfd) [0x1eead]
=========     Host Frame:./eigenvalues [0x2c39]
=========
========= Invalid __global__ read of size 4
=========     at 0x00000140 in matrix_product(float*, float*, float*, int)
=========     by thread (16,0,0) in block (10,0,0)
=========     Address 0x503120640 is out of bounds
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/usr/lib/libcuda.so (cuLaunchKernel + 0x2c5) [0x14ad95]
=========     Host Frame:./eigenvalues [0x1b768]
=========     Host Frame:./eigenvalues [0x3c0a3]
=========     Host Frame:./eigenvalues [0x3ead]
=========     Host Frame:./eigenvalues [0x37d6]
=========     Host Frame:./eigenvalues [0x3812]
=========     Host Frame:./eigenvalues [0x3388]
=========     Host Frame:./eigenvalues [0x35da]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xfd) [0x1eead]
=========     Host Frame:./eigenvalues [0x2c39]
=========
========= Invalid __global__ read of size 4
=========     at 0x00000140 in matrix_product(float*, float*, float*, int)
=========     by thread (15,0,0) in block (10,0,0)
=========     Address 0x50312063c is out of bounds
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/usr/lib/libcuda.so (cuLaunchKernel + 0x2c5) [0x14ad95]
=========     Host Frame:./eigenvalues [0x1b768]
=========     Host Frame:./eigenvalues [0x3c0a3]
=========     Host Frame:./eigenvalues [0x3ead]
=========     Host Frame:./eigenvalues [0x37d6]
=========     Host Frame:./eigenvalues [0x3812]
=========     Host Frame:./eigenvalues [0x3388]
=========     Host Frame:./eigenvalues [0x35da]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xfd) [0x1eead]
=========     Host Frame:./eigenvalues [0x2c39]
=========
========= Invalid __global__ read of size 4
=========     at 0x00000140 in matrix_product(float*, float*, float*, int)
=========     by thread (14,0,0) in block (10,0,0)
=========     Address 0x503120638 is out of bounds
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/usr/lib/libcuda.so (cuLaunchKernel + 0x2c5) [0x14ad95]
=========     Host Frame:./eigenvalues [0x1b768]
=========     Host Frame:./eigenvalues [0x3c0a3]
=========     Host Frame:./eigenvalues [0x3ead]
=========     Host Frame:./eigenvalues [0x37d6]
=========     Host Frame:./eigenvalues [0x3812]
=========     Host Frame:./eigenvalues [0x3388]
=========     Host Frame:./eigenvalues [0x35da]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xfd) [0x1eead]
=========     Host Frame:./eigenvalues [0x2c39]
=========
========= Invalid __global__ read of size 4
=========     at 0x00000140 in matrix_product(float*, float*, float*, int)
=========     by thread (13,0,0) in block (10,0,0)
=========     Address 0x503120634 is out of bounds
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/usr/lib/libcuda.so (cuLaunchKernel + 0x2c5) [0x14ad95]
=========     Host Frame:./eigenvalues [0x1b768]
=========     Host Frame:./eigenvalues [0x3c0a3]
=========     Host Frame:./eigenvalues [0x3ead]
=========     Host Frame:./eigenvalues [0x37d6]
=========     Host Frame:./eigenvalues [0x3812]
=========     Host Frame:./eigenvalues [0x3388]
=========     Host Frame:./eigenvalues [0x35da]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xfd) [0x1eead]
=========     Host Frame:./eigenvalues [0x2c39]
=========
========= Invalid __global__ read of size 4
=========     at 0x00000140 in matrix_product(float*, float*, float*, int)
=========     by thread (12,0,0) in block (10,0,0)
=========     Address 0x503120630 is out of bounds
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/usr/lib/libcuda.so (cuLaunchKernel + 0x2c5) [0x14ad95]
=========     Host Frame:./eigenvalues [0x1b768]
=========     Host Frame:./eigenvalues [0x3c0a3]
=========     Host Frame:./eigenvalues [0x3ead]
=========     Host Frame:./eigenvalues [0x37d6]
=========     Host Frame:./eigenvalues [0x3812]
=========     Host Frame:./eigenvalues [0x3388]
=========     Host Frame:./eigenvalues [0x35da]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xfd) [0x1eead]
=========     Host Frame:./eigenvalues [0x2c39]
=========
========= Invalid __global__ read of size 4
=========     at 0x00000140 in matrix_product(float*, float*, float*, int)
=========     by thread (11,0,0) in block (10,0,0)
=========     Address 0x50312062c is out of bounds
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/usr/lib/libcuda.so (cuLaunchKernel + 0x2c5) [0x14ad95]
=========     Host Frame:./eigenvalues [0x1b768]
=========     Host Frame:./eigenvalues [0x3c0a3]
=========     Host Frame:./eigenvalues [0x3ead]
=========     Host Frame:./eigenvalues [0x37d6]
=========     Host Frame:./eigenvalues [0x3812]
=========     Host Frame:./eigenvalues [0x3388]
=========     Host Frame:./eigenvalues [0x35da]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xfd) [0x1eead]
=========     Host Frame:./eigenvalues [0x2c39]
=========
========= Invalid __global__ read of size 4
=========     at 0x00000140 in matrix_product(float*, float*, float*, int)
=========     by thread (10,0,0) in block (10,0,0)
=========     Address 0x503120628 is out of bounds
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/usr/lib/libcuda.so (cuLaunchKernel + 0x2c5) [0x14ad95]
=========     Host Frame:./eigenvalues [0x1b768]
=========     Host Frame:./eigenvalues [0x3c0a3]
=========     Host Frame:./eigenvalues [0x3ead]
=========     Host Frame:./eigenvalues [0x37d6]
=========     Host Frame:./eigenvalues [0x3812]
=========     Host Frame:./eigenvalues [0x3388]
=========     Host Frame:./eigenvalues [0x35da]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xfd) [0x1eead]
=========     Host Frame:./eigenvalues [0x2c39]
=========
========= Invalid __global__ read of size 4
=========     at 0x00000140 in matrix_product(float*, float*, float*, int)
=========     by thread (9,0,0) in block (10,0,0)
=========     Address 0x503120624 is out of bounds
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/usr/lib/libcuda.so (cuLaunchKernel + 0x2c5) [0x14ad95]
=========     Host Frame:./eigenvalues [0x1b768]
=========     Host Frame:./eigenvalues [0x3c0a3]
=========     Host Frame:./eigenvalues [0x3ead]
=========     Host Frame:./eigenvalues [0x37d6]
=========     Host Frame:./eigenvalues [0x3812]
=========     Host Frame:./eigenvalues [0x3388]
=========     Host Frame:./eigenvalues [0x35da]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xfd) [0x1eead]
=========     Host Frame:./eigenvalues [0x2c39]
=========
P1
9  4.57762e-41  -1.19245e-07  
4.57762e-41  2.10195e-44  0  
5.1848e-44  0  0  

========= ERROR SUMMARY: 23 errors

I’m not really familiar with this tool but it seems to me that matrix_product is the problem, right?

Yes, your matrix_product kernel is making an invalid access (out-of-bounds) to global memory.

:( That’s the only function I haven’t written myself…
Anyway, thank you very much for your support!

It’s possible that the out-of-bounds access is occurring because you did not properly allocate memory for that kernel, in the host code you wrote.

What does u_gen() do with device memory dev_x?

norm=u_gen(dev_A,dev_x,size,0);

Here is u_gen:

float u_gen(float* dev_A,float* dev_x,int size,int i){

     load_x<<<blocksPerGrid,threadsPerBlock>>>(dev_A,dev_x,size,i);     
     //float* test=new float(size);
     //cudaMemcpy(test,dev_x,(size-i)*sizeof(float),cudaMemcpyDeviceToHost);
     //for(int j=0;j<size-i;++j)	
	//cout<<test[j]<<endl;						//test of load_x
	
     float v_norm[blocksPerGrid];
     float norm=0;

     float* dev_v_norm;
     cudaMalloc((void**)&dev_v_norm,blocksPerGrid*sizeof(float));
     
     scalar_prod<<<blocksPerGrid,threadsPerBlock>>>(dev_x,dev_x,dev_v_norm,size-i);
     cudaMemcpy(v_norm,dev_v_norm,blocksPerGrid*sizeof(float),cudaMemcpyDeviceToHost);

     for(int j=0;j<blocksPerGrid;++j){
     	     norm+=v_norm[j];
     }
     norm=sqrt(norm);
     //cout<<"norm: "<<norm<<endl;		//norm test

     float* x;
     x=new float(1);	
     cudaMemcpy(x,dev_x,sizeof(float),cudaMemcpyDeviceToHost);
     float a=*x;						//save the original x0 to compute |u|
     if(*x>0)
	(*x)+=norm;
     else
	(*x)-=norm;
     
     //cout<<"nuovo x[0]: "<<*x<<endl; //test 
		
     cudaMemcpy(dev_x,x,sizeof(float),cudaMemcpyHostToDevice);
     
     float u_norm=2*(pow(norm,2)+a*norm);		//|u|^2

     return u_norm; //return the norm because P_gen needs it

}

and load_x :

__global__ void load_x(float* dev_A,float* dev_x,int size,int i){

	   int tid=threadIdx.x;
	   int bid=blockIdx.x;

	   if(tid+bid*threadsPerBlock<size-i)							// TODO: implement thread striding
		dev_x[tid+bid*threadsPerBlock]=dev_A[tid*size+bid*threadsPerBlock+i*(size+1)];	// in case the matrix dimension
												// is too large
}

I still have to implement it for matrices with dimension larger than the number of threads, but my test matrix is a 3x3.

Guys, I’m going crazy. I’ve commented out every single kernel:

void QR_step(float* dev_A,int size){

     float* dev_x;
     float norm;
     cudaMalloc((void**)&dev_x,size*sizeof(float));
     //norm=u_gen(dev_A,dev_x,size,0);

     float* dev_P;
     cudaMalloc((void**)&dev_P,size*size*sizeof(float));
     //P_gen<<<blocksPerGrid,threadsPerBlock>>>(dev_P,dev_x,norm,size,0);

     float* P_test=new float[size*size];	
     cudaMemcpy(P_test,dev_P,size*size*sizeof(float),cudaMemcpyDeviceToHost);
     cout<<"P0"<<endl;
     print(P_test,size);   
     delete[] P_test;	  

	/*float* res=new float[9];
	float* dev_res;
	cudaMalloc((void**)&dev_res,size*size*sizeof(float));
	matrix_product<<<blocksPerGrid,threadsPerBlock>>>(dev_P,dev_A,dev_res,size);  
	cudaMemcpy(res,dev_res,size*size*sizeof(float),cudaMemcpyDeviceToHost);	
	cout<<"P0*A"<<endl;
	print(res,3);
	delete[] res;*/
	
     float* dev_Q;
     cudaMalloc((void**)&dev_Q,size*size*sizeof(float));
	cout<<"pippo"<<endl;
     //copy<<<blocksPerGrid,threadsPerBlock>>>(dev_Q,dev_P,size);	
		
     //matrix_product<<<blocksPerGrid,threadsPerBlock>>>(dev_P,dev_A,dev_A,size);	
cout<<"pluto"<<endl;
//////////////////////////////////////////////////////////////////////////////////////

	//P_gen<<<blocksPerGrid,threadsPerBlock>>>(dev_P,dev_x,norm,size,1);
	float* P=new float(size*size);
	cout<<"P1"<<endl;
	cudaMemcpy(P,dev_P,size*size*sizeof(float),cudaMemcpyDeviceToHost);
	print(P,size);
	
// after this point everything is commented
/*
.
.
.
*/
}

But I still get the memory corruption error. If I instead comment out line 39, everything goes well, so I think something bad happens between that line and line 28. I really can’t understand why the corruption appears; essentially I’m only allocating memory and copying it to and from the device…

Well, you changed this to use square brackets:

 float* P_test=new float[size*size];	

Maybe you should change this one to use square brackets too:

float* P=new float(size*size);

As you already discovered previously, those two constructs don’t do the same thing.


I’m a fool…
Thanks txbob, solved :)