Printf and thread idx problems

Hi everyone,

I wrote a simple program to measure the execution time of a kernel. The kernel may look useless on its own, but it makes sense within my project, and I want to improve the code inside it.

But I have two problems:

  • printf output doesn’t appear in the shell.

  • The value of rsolfi isn’t changed by the kernel.

Here is my code:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define XBLOCK 256
#define YBLOCK 1
#define XGRID 84
#define YGRID 1

/** Kernel function **/
__global__ void KernelTest(float rsol1, float rsol2, float rsol3, float rsol4, float* rsolfi)
{
	// Thread index
	int idx = (blockIdx.x * YGRID + blockIdx.y) * XBLOCK * YBLOCK + (threadIdx.x * YBLOCK + threadIdx.y);

	// To test the printf
	printf("idx=%d\n",idx);

	float rsolA, rsolB, rsolfiinter;
	int icouche = 5, icouchefi;

	if( rsol1<0.f ){
		if( rsol2>0.f ) rsolA= rsol2;
		else 			rsolA= -800.e6f;
	}
	else{
		if( rsol2>0.f ) rsolA= fmin(rsol1,rsol2);
		else 			rsolA= rsol1;
	}

	if( rsol3<0.f ){
		if( rsol4>0.f ) rsolB= rsol4;
		else 			rsolB= -800.e6f;
	}
	else{
		if( rsol4>0.f ) rsolB= fmin(rsol3,rsol4);
		else 			rsolB= rsol3;
	}

	if( rsolA>0.f ){
		if( rsolB>0.f ){
			if( rsolA<rsolB ){
				rsolfiinter= rsolA;
				icouchefi= icouche;
			}
			else{
				rsolfiinter= rsolB;
				icouchefi= icouche-1;
			}
		}
		else{
			rsolfiinter= rsolA;
			icouchefi= icouche;
		}
	}
	else{
		rsolfiinter=rsolB;
		icouchefi= icouche-1;
	}

	if(idx==0){
		printf("Save of rsolfi\n");
		*rsolfi = rsolfiinter;
	}
}

int main(){

	/** Initialisation **/
	float rsol1 = 65.f;
	float rsol2 = 3.f;
	float rsol3 = -800.e6f;
	float rsol4 = -4.21f;
	float rsolfi=666.f;

	clock_t debut, fin;
	double temps;

	/** Launch of the kernel **/
	dim3 blockSize(XBLOCK,1);
	dim3 gridSize(XGRID,1);

	debut = clock();
	KernelTest<<<gridSize, blockSize>>>(rsol1,rsol2, rsol3, rsol4, &rsolfi);
	fin = clock();

	temps = (double) (fin-debut)/CLOCKS_PER_SEC;
	printf("\nKernel execution time: %lf\n", temps);

	// Look at the solution
	printf("rsol1=%f - rsol2=%f - rsol3=%f - rsol4=%f - rsolfi=%f\n",rsol1, rsol2, rsol3,rsol4,rsolfi);
}

Can you see any problems?

By the way, if you have ideas for improving my code, let me know (I am trying to reduce the number of ‘if’ statements, but it is not as easy as it seems).
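A possible way to flatten the nested ifs, offered as an untested sketch rather than a drop-in replacement: write each selection step with the ternary operator and fminf, which the compiler can often turn into select/min instructions instead of branches. The names Selection, pickRoot and selectLayer are hypothetical; the helpers are intended to reproduce the original logic exactly, including the -800.e6f sentinel and the asymmetric treatment of zero (rsol1 == 0 counts as usable, rsol2 == 0 does not).

```cuda
#include <math.h>

// Hypothetical result type: the selected root and its layer index.
struct Selection {
    float rsolfi;
    int icouchefi;
};

// Same result as the nested ifs that compute rsolA (or rsolB):
//   r1 <  0: r2 if r2 > 0, otherwise the -800.e6f sentinel
//   r1 >= 0: fminf(r1, r2) if r2 > 0, otherwise r1
__host__ __device__ inline float pickRoot(float r1, float r2)
{
    float ifR2Valid   = (r1 < 0.f) ? r2 : fminf(r1, r2);
    float ifR2Invalid = (r1 < 0.f) ? -800.e6f : r1;
    return (r2 > 0.f) ? ifR2Valid : ifR2Invalid;
}

// Chooses between rsolA (layer icouche) and rsolB (layer icouche-1):
// rsolA wins only if it is positive and rsolB is either not positive
// or strictly larger than rsolA.
__host__ __device__ inline Selection selectLayer(float rsol1, float rsol2,
                                                 float rsol3, float rsol4,
                                                 int icouche)
{
    float rsolA = pickRoot(rsol1, rsol2);
    float rsolB = pickRoot(rsol3, rsol4);
    bool takeA = (rsolA > 0.f) && (!(rsolB > 0.f) || rsolA < rsolB);

    Selection s;
    s.rsolfi = takeA ? rsolA : rsolB;
    s.icouchefi = takeA ? icouche : icouche - 1;
    return s;
}
```

Inside KernelTest this would reduce the body to something like `Selection s = selectLayer(rsol1, rsol2, rsol3, rsol4, 5);` followed by the `idx == 0` store of `s.rsolfi`. Because the helpers are `__host__ __device__`, they can also be checked against the original nested ifs on the CPU.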

Thanks

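Call cudaDeviceSynchronize() after the kernel and before the program exits. Device-side printf() output is buffered and only sent to the host at certain points, such as kernel launches, device synchronization and blocking memory copies. See Appendix B.15.2 of the Programming Guide.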
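A minimal sketch of that advice, assuming a reasonably recent toolkit: device-side printf() writes into a buffer, and cudaDeviceSynchronize() both waits for the kernel and is one of the points at which that buffer is flushed to stdout. helloKernel and launchAndFlush are hypothetical names.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Each thread prints its global index; the output is buffered on the device.
__global__ void helloKernel(void)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    printf("idx=%d\n", idx);
}

// Hypothetical helper: launches the kernel and waits for it.
// cudaDeviceSynchronize() is one of the points at which the device
// printf buffer is flushed to the host's stdout.
cudaError_t launchAndFlush(int blocks, int threads)
{
    helloKernel<<<blocks, threads>>>();
    cudaError_t err = cudaGetLastError();   // launch/configuration errors
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();         // wait, flush printf, report run-time errors
}
```

Calling launchAndFlush(84, 256) from main before it returns should make the idx lines appear. With the question's 84 × 256 grid that is up to 21,504 lines, and their order is not guaranteed.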

Thanks for your answer, but it changes nothing.

Does it work on your computer?

And the final value of rsolfi (the float*) still isn’t changed by the kernel.

Here is the makefile:

CC = nvcc
EXEC = Prog

#=============Options============#
CFLAGS = -g -G -arch=sm_20 -O3 # -Xptxas -v
IFLAGS = -I /usr/local/cuda/include
IFLAGS += -I /usr/local/NVIDIA_GPU_Computing_SDK/C/common/inc
IFLAGS += -I /usr/include/hdf/
IIFLAGS = -ldf -lmfhdf -ljpeg
LFLAGS =
LFLAGS += -L /usr/local/NVIDIA_GPU_Computing_SDK/C/common/lib/linux
LFLAGS += -L /usr/local/cuda/lib64
LFLAGS += -L /usr/local/cuda/lib
LFLAGS += -L /usr/local/cuda/bin
LFLAGS += -L /usr/local/NVIDIA_GPU_Computing_SDK/C/lib
LFLAGS += -lcuda -lcudart
#####################################################################################

all: $(EXEC)

$(EXEC): test.o
	$(CC) $^ $(IFLAGS) $(IIFLAGS) $(LFLAGS) -o $(EXEC)

test.o: test_solution.cu
	$(CC) -c $< $(CFLAGS) $(IFLAGS) $(DFLAGS) -o $@

clean:
	rm -f *.o *~ ./Prog

I am working on CentOS 6.

The installed driver is version 295.33.

Then it’s likely your kernel doesn’t execute at all. Check the return codes of the CUDA calls for errors.

I only implemented conditional structures, so I don’t know which errors I could check.

Now I get very strange behaviour.
printf now displays what I want (I don’t know where the miracle came from), but the result is not saved in the pointer rsolfi. And the execution time is now 0 or 0.01 s, whereas it was 4 s before.
It seems my kernel only prints and does not compute anything.

Do you know where my mistake is? Do you have any hints?
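On the timing question, one point worth knowing: kernel launches are asynchronous, so the clock() calls around the launch in main() measure only how long the launch call takes (for the first CUDA call in a program, possibly also the runtime's one-time initialization), not the kernel's run time. A sketch that times the GPU work with CUDA events instead; workKernel and timeKernel are hypothetical names, and the kernel is only a stand-in.

```cuda
#include <cuda_runtime.h>

// Stand-in kernel (hypothetical) so the sketch compiles on its own.
__global__ void workKernel(float* out, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) out[idx] = idx * 0.5f;
}

// Times one launch with CUDA events. The launch returns immediately, so the
// stop event is recorded behind the kernel in the same (default) stream and
// we wait for that event before reading the elapsed time.
cudaError_t timeKernel(int blocks, int threads, float* elapsedMs)
{
    int n = blocks * threads;
    float* d_out = NULL;
    cudaError_t err = cudaMalloc((void**)&d_out, n * sizeof(float));
    if (err != cudaSuccess) return err;

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    workKernel<<<blocks, threads>>>(d_out, n);
    cudaEventRecord(stop);
    err = cudaEventSynchronize(stop);       // block until the kernel and stop event finish
    if (err == cudaSuccess) err = cudaGetLastError();   // catch launch errors
    if (err == cudaSuccess) err = cudaEventElapsedTime(elapsedMs, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_out);
    return err;
}
```

cudaEventElapsedTime() reports milliseconds. The first launch in a process may still be slower than later ones, so timing a second launch may give a more representative figure.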

I don’t know what that means, but check the return code of cudaDeviceSynchronize() after the kernel launch.
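Even a kernel made only of conditionals can fail to launch, for example because of an invalid launch configuration or a driver that is too old for the toolkit, and such failures are only visible through the runtime's return codes. A sketch of the usual pattern, where CUDA_CHECK, emptyKernel and launchChecked are hypothetical names: cudaGetLastError() right after the launch reports launch errors, and cudaDeviceSynchronize() reports errors that occur while the kernel runs.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical helper macro: prints file, line, the failing call and the
// error string, then returns the error from the enclosing function.
#define CUDA_CHECK(call)                                                   \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__, __LINE__,  \
                    #call, cudaGetErrorString(err_));                      \
            return err_;                                                   \
        }                                                                  \
    } while (0)

__global__ void emptyKernel(void) {}

// Checks both kinds of errors a kernel launch can produce.
cudaError_t launchChecked(int blocks, int threads)
{
    emptyKernel<<<blocks, threads>>>();
    CUDA_CHECK(cudaGetLastError());        // launch errors (configuration, driver, ...)
    CUDA_CHECK(cudaDeviceSynchronize());   // errors raised while the kernel runs
    return cudaSuccess;
}
```

The macro returns from the enclosing function, so it only fits functions that return cudaError_t; cudaGetErrorString() turns the code into a readable message.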

You’re passing a host pointer, &rsolfi, into a device kernel. You need to allocate memory on the device (cudaMalloc()), pass that pointer to the kernel, and then copy the result back to the host (cudaMemcpy()).
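A minimal sketch of that fix, assuming the same single-float result: allocate a float on the device with cudaMalloc(), pass the device pointer to the kernel, copy the value back with cudaMemcpy(), and free it. storeResult stands in for the question's KernelTest, and computeOnDevice is a hypothetical helper.

```cuda
#include <cuda_runtime.h>

// Stand-in for KernelTest: thread 0 writes one result through a pointer
// that must point to device memory.
__global__ void storeResult(float value, float* rsolfi)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx == 0) *rsolfi = value;
}

// Allocates a device float, launches the kernel with the device pointer,
// and copies the result back into *hostResult.
cudaError_t computeOnDevice(float value, float* hostResult)
{
    float* d_rsolfi = NULL;
    cudaError_t err = cudaMalloc((void**)&d_rsolfi, sizeof(float));
    if (err != cudaSuccess) return err;

    storeResult<<<84, 256>>>(value, d_rsolfi);   // device pointer, not &rsolfi
    err = cudaGetLastError();
    if (err == cudaSuccess)
        // Blocking copy: waits for the kernel, then copies device -> host.
        err = cudaMemcpy(hostResult, d_rsolfi, sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_rsolfi);
    return err;
}
```

In the question's main this amounts to launching KernelTest with d_rsolfi instead of &rsolfi and copying into rsolfi afterwards. Since the blocking cudaMemcpy() waits for the kernel to finish, it also flushes the pending printf output.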

Thank you very much, I wasn’t careful enough.