Passing variables by reference to device function

betete · August 27, 2010, 12:59pm

Hi, I need help with passing arguments to device functions. Simply, what is giving me trouble is the following case:

I have 2 device functions:
device void foo1( float* a )
{
…
foo2( a ); //Notice I have not used the “&” this time as “a” is already a pointer.
…
}
device void foo2( float* b )
{
…
*b = *b + 2;
…
}

In the kernel I have the following:
global void kernel ( … )
{
float x;
…
foo1( &x);
…
}

Well, the code doesn’t work. While debugging I noticed that when calling foo1(), the parameter (address of x) is passed ok, but when foo1() calls foo2() there is a problem. What I see is that b is placed in a register. I can’t see its content.

Any idea? Is it recommended not to pass arguments by reference?

Cliff_Woolley · August 27, 2010, 10:30pm

Can you post a complete example that reproduces the issue? Also, what version of the CUDA Toolkit are you using?

Thanks,

Cliff

Cliff_Woolley · August 27, 2010, 10:30pm

Can you post a complete example that reproduces the issue? Also, what version of the CUDA Toolkit are you using?

Thanks,

Cliff

betete · August 28, 2010, 1:51pm

Thanks Cliff for your reply. The code is too long to put it here. I’m using CUDA Toolkit 3.1. I just want to know if passing arguments this way should work. When debugging, I see that some arguments (pointers actually) are in a register, and I can’t see their content, so I don’t know whether they are being passed ok, or not. I’ll try to put a portion of the code here.

betete · August 28, 2010, 1:51pm

Thanks Cliff for your reply. The code is too long to put it here. I’m using CUDA Toolkit 3.1. I just want to know if passing arguments this way should work. When debugging, I see that some arguments (pointers actually) are in a register, and I can’t see their content, so I don’t know whether they are being passed ok, or not. I’ll try to put a portion of the code here.

jan.heckman · August 28, 2010, 4:52pm

Hi,

The following code works afaik, so is there really a problem?

EDIT: replaced the pointers in the device routines with references, which is more easily readable,

since you can use the passed variables without derefencing.

This is not available for global functions, the compiler will issue an error.

[codebox]include <stdio.h>

include <cuda.h>

include “cutil_inline.h”

device void foo2( float &b )

{

//...

b = b + 2;

//...

}

device void foo1( float &a )

{

//.....

foo2( a ); //Notice you don't need an & as the compiler understands the reference.

//.....

}

global void kernel ( float *result, float kp )

{

//...

foo1( kp);

*result=kp;                      // result cannot be passed by reference (__global__ function)

//..

}

int main()

{

float a=3.f,*d_result,h_result;

cudaMalloc(&d_result,sizeof(*d_result));

kernel <<<1,1>>>(d_result,a);

cutilCheckMsg( "Kernel execution failed" );

cudaThreadSynchronize();

cudaMemcpy(&h_result,d_result,sizeof(h_result),cudaMemcpyDeviceToHos

t);

printf("3+2=%.0f\n",h_result);

return 0;

}

[/codebox]

jan.heckman · August 28, 2010, 4:52pm

Hi,

The following code works afaik, so is there really a problem?

EDIT: replaced the pointers in the device routines with references, which is more easily readable,

since you can use the passed variables without derefencing.

This is not available for global functions, the compiler will issue an error.

[codebox]include <stdio.h>

include <cuda.h>

include “cutil_inline.h”

device void foo2( float &b )

{

//...

b = b + 2;

//...

}

device void foo1( float &a )

{

//.....

foo2( a ); //Notice you don't need an & as the compiler understands the reference.

//.....

}

global void kernel ( float *result, float kp )

{

//...

foo1( kp);

*result=kp;                      // result cannot be passed by reference (__global__ function)

//..

}

int main()

{

float a=3.f,*d_result,h_result;

cudaMalloc(&d_result,sizeof(*d_result));

kernel <<<1,1>>>(d_result,a);

cutilCheckMsg( "Kernel execution failed" );

cudaThreadSynchronize();

cudaMemcpy(&h_result,d_result,sizeof(h_result),cudaMemcpyDeviceToHos

t);

printf("3+2=%.0f\n",h_result);

return 0;

}

[/codebox]

tera · August 28, 2010, 5:18pm

What happens if you move foo2 in front of foo1? Inlined functions are supposed to precede the calling functions, and all functions are inlined by default under CUDA.

Apart from the inlining issue, you might need a function prototype for foo2 to indicate it takes a float* as argument, not an int.

tera · August 28, 2010, 5:18pm

What happens if you move foo2 in front of foo1? Inlined functions are supposed to precede the calling functions, and all functions are inlined by default under CUDA.

Apart from the inlining issue, you might need a function prototype for foo2 to indicate it takes a float* as argument, not an int.

Cliff_Woolley · August 28, 2010, 5:31pm

Ah, yes, I agree this is probably what was going on: foo2()'s prototype was missing.

Cliff_Woolley · August 28, 2010, 5:31pm

Ah, yes, I agree this is probably what was going on: foo2()'s prototype was missing.

betete · August 30, 2010, 8:03pm

Sorry for not posting the complete code. Actually, the prototypes are defined before first function call for all the functions. I had once that problem and learnt the lesson.

I think I solved the problem. It was just some constant values that had wrong values.

And I say “I THINK”, because I have another problem. I cannot debug the hole code. If I compile the hole code with de -g and -G options, everything goes ok, but when I try to debug the code, it seams symbols are missing, because when breaking in the kernel, it says “Single stepping until exit from function Z3LORP3argPfS1_S1_S1”, and I cannot trace the code. So, what I have to do is remove parts of the code, calls to some “device functions”, and that way I can single step. The problem is that I need all the code to check that everything works ok.

Any idea why the compiler may not be including debug symbols?

betete · August 30, 2010, 8:03pm

Sorry for not posting the complete code. Actually, the prototypes are defined before first function call for all the functions. I had once that problem and learnt the lesson.

I think I solved the problem. It was just some constant values that had wrong values.

And I say “I THINK”, because I have another problem. I cannot debug the hole code. If I compile the hole code with de -g and -G options, everything goes ok, but when I try to debug the code, it seams symbols are missing, because when breaking in the kernel, it says “Single stepping until exit from function Z3LORP3argPfS1_S1_S1”, and I cannot trace the code. So, what I have to do is remove parts of the code, calls to some “device functions”, and that way I can single step. The problem is that I need all the code to check that everything works ok.

Any idea why the compiler may not be including debug symbols?

Passing variables by reference to __device__ function

Passing variables by reference to device function