Printf and thread idx problems
Hi everyone,
I wrote a simple program to measure the execution time of a kernel. The kernel may look useless on its own, but it makes sense within my project, and I want to improve its code.
But I have two problems:
- printf output doesn't appear in the shell
- The value of rsolfi is not changed by the kernel.

Here is my code:

[code]
#include <stdio.h>
#include <stdlib.h>
#include <time.h>


#define XBLOCK 256
#define YBLOCK 1
#define XGRID 84
#define YGRID 1

/** Kernel function **/
__global__ void KernelTest(float rsol1, float rsol2, float rsol3, float rsol4, float* rsolfi)
{
    // Thread index
    int idx = (blockIdx.x * YGRID + blockIdx.y) * XBLOCK * YBLOCK + (threadIdx.x * YBLOCK + threadIdx.y);

    // To test the printf
    printf("idx=%d\n",idx);

    float rsolA, rsolB, rsolfiinter;
    int icouche = 5, icouchefi;

    if( rsol1<0.f ){
        if( rsol2>0.f ) rsolA= rsol2;
        else rsolA= -800.e6f;
    }
    else{
        if( rsol2>0.f ) rsolA= fmin(rsol1,rsol2);
        else rsolA= rsol1;
    }


    if( rsol3<0.f ){
        if( rsol4>0.f ) rsolB= rsol4;
        else rsolB= -800.e6f;
    }
    else{
        if( rsol4>0.f ) rsolB= fmin(rsol3,rsol4);
        else rsolB= rsol3;
    }


    if( rsolA>0.f ){
        if( rsolB>0.f ){
            if( rsolA<rsolB ){
                rsolfiinter= rsolA;
                icouchefi= icouche;
            }
            else{
                rsolfiinter= rsolB;
                icouchefi= icouche-1;
            }
        }
        else{
            rsolfiinter= rsolA;
            icouchefi= icouche;
        }
    }
    else{
        rsolfiinter=rsolB;
        icouchefi= icouche-1;
    }

    if(idx==0){
        printf("Save of rsolfi\n");
        *rsolfi = rsolfiinter;
    }

}

int main(){

    /** Initialisation **/
    float rsol1 = 65.f;
    float rsol2 = 3.f;
    float rsol3 = -800.e6f;
    float rsol4 = -4.21f;
    float rsolfi=666.f;

    clock_t debut, fin;
    double temps;

    /** Launch of the kernel **/
    dim3 blockSize(XBLOCK,1);
    dim3 gridSize(XGRID,1);

    debut = clock();
    KernelTest<<<gridSize, blockSize>>>(rsol1,rsol2, rsol3, rsol4, &rsolfi);
    fin = clock();

    temps = (double) (fin-debut)/CLOCKS_PER_SEC;
    printf("\nKernel execution time: %lf\n", temps);

    // Look at the solution
    printf("rsol1=%f - rsol2=%f - rsol3=%f - rsol4=%f - rsolfi=%f\n",rsol1, rsol2, rsol3,rsol4,rsolfi);

}
[/code]

Can you see any problems?

And by the way, if you have good ideas for improving my code, let me know (I am trying to reduce the number of 'if' statements, but it is not as easy as it seems).

Thanks
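On the question of reducing the number of 'if' statements: since every thread receives the same four scalar arguments, all threads of a warp take the same path, so the original branches do not cause divergence. A select-based rewrite is therefore mainly a readability change. The sketch below is one possible rewrite, not the poster's code; the names positiveMinOrSentinel, keepA and KernelTestSelect are hypothetical. Both helpers are written to give the same results as the nested if/else blocks (including NaN inputs), and the compiler can usually turn short conditional expressions like these into select instructions; inspecting the SASS with cuobjdump -sass is the way to confirm this for a given build.

```cuda
#include <stdio.h>
#include <math.h>

// Same result as the nested if/else blocks that compute rsolA and rsolB:
//   b > 0  -> (a < 0) ? b : fminf(a, b)
//   b <= 0 -> (a < 0) ? -800e6 : a
__host__ __device__ inline float positiveMinOrSentinel(float a, float b)
{
    return (b > 0.f) ? ((a < 0.f) ? b : fminf(a, b))
                     : ((a < 0.f) ? -800.e6f : a);
}

// Same result as the final if/else cascade: keep rsolA when it is positive
// and rsolB is either not positive or not smaller; otherwise keep rsolB.
__host__ __device__ inline bool keepA(float rsolA, float rsolB)
{
    return (rsolA > 0.f) && (!(rsolB > 0.f) || rsolA < rsolB);
}

// Hypothetical select-based variant of KernelTest; rsolfi and icouchefi
// must be device pointers.
__global__ void KernelTestSelect(float rsol1, float rsol2, float rsol3, float rsol4,
                                 float* rsolfi, int* icouchefi)
{
    const int icouche = 5;
    int idx = blockIdx.x * blockDim.x + threadIdx.x;

    float rsolA = positiveMinOrSentinel(rsol1, rsol2);
    float rsolB = positiveMinOrSentinel(rsol3, rsol4);
    bool useA = keepA(rsolA, rsolB);

    if (idx == 0) {                       // a single thread stores the result
        *rsolfi    = useA ? rsolA : rsolB;
        *icouchefi = useA ? icouche : icouche - 1;
    }
}

int main()
{
    // Host-side check of the helpers with the values from the question
    float rsolA = positiveMinOrSentinel(65.f, 3.f);
    float rsolB = positiveMinOrSentinel(-800.e6f, -4.21f);
    printf("rsolA=%f rsolB=%f keepA=%d\n", rsolA, rsolB, keepA(rsolA, rsolB) ? 1 : 0);
    return 0;
}
```

Because the helpers are __host__ __device__, they can be checked on the CPU against the original if/else version before being used in the kernel.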

#1
Posted 03/27/2012 10:07 AM   
Call cudaDeviceSynchronize() after the kernel and before the program exits. Device-side printf() output is buffered on the GPU and only flushed to the host at certain points, such as synchronization calls. See Appendix B.15.2 of the Programming Guide.
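A minimal sketch of that advice, assuming a device of compute capability 2.0 or later (device-side printf is not available on earlier GPUs). helloKernel and runHello are hypothetical names used only for illustration.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

__global__ void helloKernel()
{
    // Device-side printf writes into a buffer on the GPU; nothing reaches
    // the terminal until that buffer is flushed to the host.
    printf("block %u, thread %u\n", blockIdx.x, threadIdx.x);
}

// Launches the kernel and waits for it; returns the runtime's error code.
cudaError_t runHello()
{
    helloKernel<<<2, 4>>>();
    // The launch returns immediately. Synchronizing waits for the kernel
    // to finish and flushes the device printf buffer to stdout.
    return cudaDeviceSynchronize();
}

int main()
{
    cudaError_t err = runHello();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Without the synchronization, main() can return before the kernel has finished, and the buffered lines are never printed.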

Always check return codes of CUDA calls for errors. Do not use __syncthreads() in conditional code unless the condition is guaranteed to evaluate identically for all threads of each block. Run your program under cuda-memcheck to detect stray memory accesses. If your kernel dies for larger problem sizes, it might exceed the runtime limit and trigger the watchdog timer.

#2
Posted 03/27/2012 10:24 AM   
Thanks for your answer, but it changes nothing.
Does it work on your computer?
Also, the final value of rsolfi (the float*) is still not changed by the kernel.

Here is the makefile:

[code]
CC = nvcc
EXEC = Prog

#=============Options============#
CFLAGS = -g -G -arch=sm_20 -O3 # -Xptxas -v
IFLAGS = -I /usr/local/cuda/include
IFLAGS += -I /usr/local/NVIDIA_GPU_Computing_SDK/C/common/inc
IFLAGS += -I /usr/include/hdf/

IIFLAGS = -ldf -lmfhdf -ljpeg

LFLAGS =
LFLAGS += -L /usr/local/NVIDIA_GPU_Computing_SDK/C/common/lib/linux
LFLAGS += -L /usr/local/cuda/lib64
LFLAGS += -L /usr/local/cuda/lib
LFLAGS += -L /usr/local/cuda/bin
LFLAGS += -L /usr/local/NVIDIA_GPU_Computing_SDK/C/lib
LFLAGS += -lcuda -lcudart

#####################################################################################

all: $(EXEC)

$(EXEC): test.o
	$(CC) $^ $(IFLAGS) $(IIFLAGS) $(LFLAGS) -o $(EXEC)

test.o: test_solution.cu
	$(CC) -c $< $(CFLAGS) $(IFLAGS) $(DFLAGS) -o $@


clean:
	rm -f *.o *~ ./Prog
[/code]

I am working on CentOS 6.
The installed driver is 295.33.

#3
Posted 03/27/2012 10:32 AM   
Then it's likely your kernel doesn't execute at all. Check return codes for errors.
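To illustrate how a kernel that never ran can be detected: a launch with an invalid configuration returns immediately without running anything, and the error only shows up if it is queried. The sketch below assumes compute capability 2.0 or later, where the limit is 1024 threads per block, so a 2048-thread block is rejected. fillKernel and launchStatus are hypothetical names.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

__global__ void fillKernel(float* out)
{
    out[blockIdx.x * blockDim.x + threadIdx.x] = 1.f;
}

// Launches fillKernel with the given block size and returns the first
// error reported, either by the launch itself or while the kernel ran.
cudaError_t launchStatus(int threadsPerBlock)
{
    float* d_out = NULL;
    cudaError_t err = cudaMalloc((void**)&d_out, threadsPerBlock * sizeof(float));
    if (err != cudaSuccess) return err;

    fillKernel<<<1, threadsPerBlock>>>(d_out);
    err = cudaGetLastError();                    // e.g. cudaErrorInvalidConfiguration
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    cudaFree(d_out);
    return err;
}

int main()
{
    const int sizes[2] = {256, 2048};            // 2048 exceeds the per-block limit
    for (int i = 0; i < 2; ++i) {
        cudaError_t err = launchStatus(sizes[i]);
        printf("%d threads per block: %s\n", sizes[i], cudaGetErrorString(err));
    }
    return 0;
}
```

A failed launch produces no printf output and leaves all results untouched, which matches the symptoms described in the question.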

#4
Posted 03/27/2012 10:57 AM   
[quote name='tera' date='27 March 2012 - 12:57 PM' timestamp='1332845858' post='1388501']
Then it's likely your kernel doesn't execute at all. Check return codes for errors.
[/quote]

My kernel only contains conditional statements, so I don't know which errors I could check.

#5
Posted 03/27/2012 11:43 AM   
Now I get some very strange behaviour.
The printf output now displays as I want (I don't know where the miracle came from), but the result is not saved through the pointer rsolfi. And the execution time is now 0 or 0.01 s, whereas it was 4 s before.
My kernel only prints and does not seem to compute anything.

Do you know where my mistake is? Do you have any hints?
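A note on the timing numbers: clock() returns processor time used by the host process, and a kernel launch returns before the kernel has finished, so clock() around the launch alone mostly measures launch overhead (and, because the launch is the program's first CUDA call, possibly context setup as well). A minimal sketch of timing with CUDA events instead; busyKernel is a hypothetical stand-in workload and timeKernelMs a hypothetical helper.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical stand-in workload
__global__ void busyKernel(float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = sqrtf((float)i);
}

// Returns the kernel's elapsed GPU time in milliseconds, or -1 on error.
float timeKernelMs(int n)
{
    float* d_out = NULL;
    if (cudaMalloc((void**)&d_out, n * sizeof(float)) != cudaSuccess) return -1.f;

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);                       // enqueued before the kernel
    busyKernel<<<(n + 255) / 256, 256>>>(d_out, n);
    cudaEventRecord(stop);                        // enqueued after the kernel
    cudaError_t err = cudaEventSynchronize(stop); // wait until stop has been reached

    float ms = -1.f;
    if (err == cudaSuccess && cudaGetLastError() == cudaSuccess)
        cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_out);
    return ms;
}

int main()
{
    float ms = timeKernelMs(84 * 256);            // same thread count as the question
    if (ms < 0.f) { fprintf(stderr, "timing failed\n"); return 1; }
    printf("Kernel execution time: %f ms\n", ms);
    return 0;
}
```

The events are recorded in the same stream as the kernel, so the elapsed time covers only the work queued between them.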

#6
Posted 03/27/2012 12:36 PM   
[quote name='Dext' date='27 March 2012 - 11:43 AM' timestamp='1332848582' post='1388513']
My kernel only contains conditional statements, so I don't know which errors I could check.
[/quote]
I don't know what that means, but check the return code of cudaDeviceSynchronize() after the kernel launch.
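One common way to apply "check every return code" without cluttering the code is a small wrapper macro. CUDA_CHECK below is a hypothetical helper, not part of the CUDA API; writeOne and runChecked are likewise illustrative names. The check on cudaGetLastError() catches launch failures, and the check on cudaDeviceSynchronize() catches errors raised while the kernel was running.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Hypothetical helper macro: prints where a CUDA call failed and the
// runtime's error string, then exits.
#define CUDA_CHECK(call)                                                    \
    do {                                                                    \
        cudaError_t err_ = (call);                                          \
        if (err_ != cudaSuccess) {                                          \
            fprintf(stderr, "%s:%d: %s returned %s\n", __FILE__, __LINE__,  \
                    #call, cudaGetErrorString(err_));                       \
            exit(EXIT_FAILURE);                                             \
        }                                                                   \
    } while (0)

__global__ void writeOne(float* out)
{
    if (blockIdx.x == 0 && threadIdx.x == 0) *out = 1.f;
}

float runChecked()
{
    float h_out = 0.f;
    float* d_out = NULL;
    CUDA_CHECK(cudaMalloc((void**)&d_out, sizeof(float)));
    writeOne<<<84, 256>>>(d_out);
    CUDA_CHECK(cudaGetLastError());       // did the launch itself succeed?
    CUDA_CHECK(cudaDeviceSynchronize());  // did the kernel run to completion?
    CUDA_CHECK(cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_out));
    return h_out;
}

int main()
{
    printf("result = %f\n", runChecked());
    return 0;
}
```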

#7
Posted 03/28/2012 03:05 PM   
[quote name='Dext' date='27 March 2012 - 05:36 AM' timestamp='1332851776' post='1388522']
Now I get some very strange behaviour.
The printf output now displays as I want (I don't know where the miracle came from), but the result is not saved through the pointer rsolfi. And the execution time is now 0 or 0.01 s, whereas it was 4 s before.
My kernel only prints and does not seem to compute anything.

Do you know where my mistake is? Do you have any hints?
[/quote]

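You're passing a host pointer, &rsolfi, into the kernel, and the device cannot write through it. You need to allocate memory on the device (cudaMalloc()), pass that pointer to the kernel, and then copy the result back to the host (cudaMemcpy()).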
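A minimal sketch of that fix, assuming a single float result. The kernel is condensed from the question (same selection logic; the debug printf and the unused icouchefi bookkeeping are left out), and computeRsolfi is a hypothetical wrapper that returns the value as seen on the host.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

#define XBLOCK 256
#define XGRID 84

__global__ void KernelTest(float rsol1, float rsol2, float rsol3, float rsol4, float* rsolfi)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    float rsolA, rsolB, rsolfiinter;

    if (rsol1 < 0.f) rsolA = (rsol2 > 0.f) ? rsol2 : -800.e6f;
    else             rsolA = (rsol2 > 0.f) ? fminf(rsol1, rsol2) : rsol1;

    if (rsol3 < 0.f) rsolB = (rsol4 > 0.f) ? rsol4 : -800.e6f;
    else             rsolB = (rsol4 > 0.f) ? fminf(rsol3, rsol4) : rsol3;

    if (rsolA > 0.f && rsolB > 0.f) rsolfiinter = (rsolA < rsolB) ? rsolA : rsolB;
    else if (rsolA > 0.f)           rsolfiinter = rsolA;
    else                            rsolfiinter = rsolB;

    if (idx == 0) *rsolfi = rsolfiinter;  // rsolfi must be a device pointer
}

// Runs KernelTest and returns rsolfi as seen on the host. On a CUDA error
// it prints a message and returns the initial value 666.
float computeRsolfi(float rsol1, float rsol2, float rsol3, float rsol4)
{
    float rsolfi = 666.f;       // host copy
    float* d_rsolfi = NULL;     // device copy the kernel writes to

    cudaError_t err = cudaMalloc((void**)&d_rsolfi, sizeof(float));
    if (err == cudaSuccess) {
        KernelTest<<<XGRID, XBLOCK>>>(rsol1, rsol2, rsol3, rsol4, d_rsolfi);
        err = cudaGetLastError();
        // A blocking cudaMemcpy waits for the kernel before copying back
        if (err == cudaSuccess)
            err = cudaMemcpy(&rsolfi, d_rsolfi, sizeof(float), cudaMemcpyDeviceToHost);
        cudaFree(d_rsolfi);
    }
    if (err != cudaSuccess)
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    return rsolfi;
}

int main()
{
    float rsolfi = computeRsolfi(65.f, 3.f, -800.e6f, -4.21f);
    printf("rsolfi=%f\n", rsolfi);  // expected 3.000000 for these inputs
    return 0;
}
```

With the question's inputs, rsolA is fminf(65, 3) = 3 and rsolB is the -800e6 sentinel, so the kernel stores 3 and the host reads it back through the cudaMemcpy.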

#8
Posted 03/28/2012 04:20 PM   
[quote name='eelsen' date='28 March 2012 - 06:20 PM' timestamp='1332951606' post='1389110']
You're passing a host pointer, &rsolfi, into a device kernel. You need to allocate memory on the device and then copy the result back.
[/quote]

Thank you very much; I wasn't careful enough.

#9
Posted 03/30/2012 10:04 AM   