How to use printf for Fermi

I see that we can use printf in kernel functions on Fermi cards. But when I compile the example from the programming guide, I get an error: identifier “printf” is undefined

__global__ void helloCUDA(float f)
{
    printf("Hello thread %d, f=%f\n", threadIdx.x, f);
}

void main()
{
    helloCUDA<<<1, 5>>>(1.2345f); cudaThreadExit();
}

I compile the program using “nvcc -arch sm_20 test.cu -o test”

Is there any problem?

Add:
#include <stdio.h>

That should do the trick.

It doesn’t work. The error message changes to:

error: calling a host function from a device/global function is only allowed in device emulation mode

Which version of CUDA are you using?

I can confirm that it doesn’t work with either CUDA 3.1 or the current 3.2rc. I have gotten printf to work on Fermi in other code before, and judging from the intermediate outputs of nvcc, this looks like an nvcc bug of some sort. Some magic combinations of code features get the printf definitions propagated into the correct phases of the compilation trajectory; other code (like this one) doesn’t.

I’m using CUDA 3.0.

I can’t believe it’s just a bug in nvcc. How can NVIDIA say they provide this feature when even an example in the guide doesn’t work?

And this seems too simple to be a bug.

[codebox]#include <stdio.h>

__device__ void callme(int i, float f)
{
    printf("Hello device thread %d, f=%f\n", i, f);
}

__global__ void helloCUDA(float f)
{
    printf("Hello thread %d, f=%f\n", threadIdx.x, f);
    callme(threadIdx.x, f);
}

int main()
{
    helloCUDA<<<1, 5>>>(1.2345f); cudaThreadExit();
    return 0;
}

using GTX460 / 3.2RC

nvcc -arch sm_21 -o pfermi printfermi.cu

paehler@nvidia> pfermi
Hello thread 0, f=1.234500
Hello thread 1, f=1.234500
Hello thread 2, f=1.234500
Hello thread 3, f=1.234500
Hello thread 4, f=1.234500
Hello device thread 0, f=1.234500
Hello device thread 1, f=1.234500
Hello device thread 2, f=1.234500
Hello device thread 3, f=1.234500
Hello device thread 4, f=1.234500
[/codebox]

I’ve tested this code on a GTX480 with CUDA 3.0 and I get the same errors.

Maybe a 3.0 issue? At least, as my example shows, it works for 3.2RC.
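
If it is unclear which toolkit a build actually picks up, one way to check is to print the version numbers from inside the program. The following is only a sketch: `decodeCudaVersion` is a hypothetical helper name, and it assumes the usual encoding of CUDA version numbers as 1000 * major + 10 * minor (so 3020 means 3.2). `CUDART_VERSION`, `cudaRuntimeGetVersion` and `cudaDriverGetVersion` come from the CUDA runtime headers.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical helper: split an encoded CUDA version
// (1000 * major + 10 * minor) into its two parts.
static void decodeCudaVersion(int encoded, int *major, int *minor)
{
    *major = encoded / 1000;
    *minor = (encoded % 1000) / 10;
}

int main()
{
    int runtimeVersion = 0, driverVersion = 0;
    int major = 0, minor = 0;

    // Version of the toolkit headers this file was compiled against.
    decodeCudaVersion(CUDART_VERSION, &major, &minor);
    printf("Compiled against CUDA %d.%d\n", major, minor);

    // Version of the runtime library found at run time.
    if (cudaRuntimeGetVersion(&runtimeVersion) == cudaSuccess) {
        decodeCudaVersion(runtimeVersion, &major, &minor);
        printf("Runtime library: CUDA %d.%d\n", major, minor);
    }

    // Latest CUDA version supported by the installed driver.
    if (cudaDriverGetVersion(&driverVersion) == cudaSuccess) {
        decodeCudaVersion(driverVersion, &major, &minor);
        printf("Driver supports up to CUDA %d.%d\n", major, minor);
    }
    return 0;
}
```

Running "nvcc --version" on the command line reports the compiler release as well, which helps when several toolkits are installed side by side.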

You definitely need at least CUDA 3.1, since printf was not supported in 3.0. From the release notes of CUDA 3.1:

CUDA 3.2rc (Ubuntu) and CUDA 3.1 (Fedora 12) give:

$ nvcc test2.cu -o test2 -arch=sm_20
$ ./test2
Hello thread 0, f=1.234500
Hello thread 1, f=1.234500
Hello thread 2, f=1.234500
Hello thread 3, f=1.234500
Hello thread 4, f=1.234500

#include <stdio.h>

__global__ void helloCUDA(float f)
{
    printf("Hello thread %d, f=%f\n", threadIdx.x, f);
}

int main()
{
    helloCUDA<<<1, 5>>>(1.2345f); cudaThreadExit();
    return 0;
}
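
For anyone who wants the same program to fail more gracefully on older hardware, here is a possible variation of the example above. It is a sketch under a few assumptions, not a tested recipe from this thread: `supportsDevicePrintf` is a hypothetical helper, the `__CUDA_ARCH__` guard simply compiles the printf call out when targeting anything below sm_20, and `cudaDeviceSynchronize` assumes a toolkit of version 4.0 or later (on the 3.x toolkits discussed here, `cudaThreadExit` as used in the posts above plays that role).

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical helper: device-side printf needs compute capability 2.0
// (Fermi) or later, so only the major number matters here.
static bool supportsDevicePrintf(int major, int minor)
{
    (void)minor;
    return major >= 2;
}

__global__ void helloCUDA(float f)
{
    // Only emit the printf call when compiling device code for sm_20+.
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 200)
    printf("Hello thread %d, f=%f\n", threadIdx.x, f);
#endif
}

int main()
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        fprintf(stderr, "Could not query device 0\n");
        return 1;
    }
    if (!supportsDevicePrintf(prop.major, prop.minor)) {
        fprintf(stderr, "Device 0 is sm_%d%d; kernel printf needs sm_20 or later\n",
                prop.major, prop.minor);
        return 1;
    }

    helloCUDA<<<1, 5>>>(1.2345f);

    // Check for launch errors first, then synchronize; the synchronization
    // also flushes the device-side printf buffer to stdout.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

As in the posts above, this still has to be compiled with "-arch sm_20" (or higher) for the printf call to be included at all.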
