Virtual functions in kernels

Hey, so I’m using CUDA 4.0 with a compute capability 2.1 card, and I want to use objects with virtual functions in my kernel. If I create an object and call a virtual function on it inside the same kernel, it works exactly as it should. However, I’d like to move my data structure containing a variety of objects up to the device and then run a kernel that references it. When I do this, calling a virtual function crashes the kernel. I’ve tried two approaches, and here is what I think is going on with each:

  1. Copy the objects from the host into global device memory using cudaMalloc and cudaMemcpy. I think the problem here is that the objects are created on the host, so their virtual function tables point into host memory; when the device tries to follow those pointers, it crashes.

  2. Create the objects on the device and save the pointers so a different kernel can use them later. I think the problem here is that the virtual function tables reference the kernels I call to create the objects (which basically just contain a single ‘new’ statement), while the kernel I actually want to do the work in has those functions at different addresses, resulting in a crash.

All the data in the objects transfers fine and is correct when I run a kernel later, so I’m fairly sure the objects themselves are reaching the device correctly otherwise.

Is it possible to call a virtual function on an object inside a kernel other than the one the object was created in? Could I somehow reference my working kernel from my object-creation kernels to get the right function addresses? Ideally I’d like to be able to do it from the host. Is this even what the problem actually is?

Thanks.

What about static pointers? It depends on how you use the object in question…

You can have a separate kernel do a placement new on a device buffer you allocated from the host with cudaMalloc. That overwrites the host virtual table pointer with the device one, after which you can call virtual functions in kernels.

I am not sure if it is portable though.

Something like this:

on host:

MyObject *obj_host = new MyObject();
MyObject *obj_dvce;
cudaMalloc((void **)&obj_dvce, sizeof(MyObject));
cudaMemcpy(obj_dvce, obj_host, sizeof(MyObject), cudaMemcpyHostToDevice); // obj_dvce’s virtual table pointer still refers to host memory
initObject<<<1, 1>>>(obj_dvce); // after this kernel, obj_dvce’s virtual table pointer refers to device memory
other_kernels_that_call_virtualfunctions<<<…>>>(obj_dvce);

and the kernel (placement new needs #include <new>):

__global__ void initObject(MyObject *obj)
{
    if (threadIdx.x == 0 && blockIdx.x == 0)
        new (obj) MyObject(); // placement new: reconstructs the object in place, installing the device vtable pointer
}
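For completeness, here is roughly what a self-contained version might look like; MyObject, its member, and the kernel names are placeholders of mine, not from the original question:

```cuda
#include <new>     // placement new
#include <cstdio>

class MyObject {
public:
    __host__ __device__ MyObject() : value(42) {}
    __host__ __device__ virtual int getValue() const { return value; }
private:
    int value;
};

__global__ void initObject(MyObject *obj)
{
    if (threadIdx.x == 0 && blockIdx.x == 0)
        new (obj) MyObject(); // rebuilds the object in place with a device vtable pointer
}

__global__ void useObject(MyObject *obj)
{
    printf("%d\n", obj->getValue()); // virtual call now resolves on the device
}

int main()
{
    MyObject *obj_host = new MyObject();
    MyObject *obj_dvce;
    cudaMalloc((void **)&obj_dvce, sizeof(MyObject));
    cudaMemcpy(obj_dvce, obj_host, sizeof(MyObject), cudaMemcpyHostToDevice);
    initObject<<<1, 1>>>(obj_dvce); // fix up the vtable pointer on the device
    useObject<<<1, 1>>>(obj_dvce);
    cudaDeviceSynchronize();
    cudaFree(obj_dvce);
    delete obj_host;
    return 0;
}
```

One caveat: placement new re-runs the constructor, so any member data you copied up from the host will be overwritten unless the constructor leaves those members untouched (here it doesn’t matter because the constructor produces the same state).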

You can allocate space for a pointer on the device first, use an initialization kernel to assign to it with new, and pass the pointer on to other kernels:

BaseClass **d_ptr;

cudaMalloc((void **)&d_ptr, sizeof(BaseClass *));
init_kernel<<<1, 1>>>(d_ptr);
kernel_that_uses_virtual_calls<<<g, b>>>(d_ptr);
clean_kernel<<<1, 1>>>(d_ptr);
cudaFree(d_ptr);

__global__ void init_kernel(BaseClass **p)
{
    *p = new DerivedClass(); // device-side new: the object lives in the device heap, so its vtable is valid on the device
}

__global__ void kernel_that_uses_virtual_calls(BaseClass **p)
{
    (*p)->someVirtualFunc();
}

__global__ void clean_kernel(BaseClass **p)
{
    delete *p; // an object created with device-side new must be freed with device-side delete, not cudaFree
}
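One thing to watch with device-side new: it allocates from the device heap, which defaults to a fairly small size (8 MB). If you create many objects this way, you may need to enlarge it from the host before launching any kernels, e.g. (the 64 MB figure is just an illustrative choice):

```cuda
// Enlarge the heap used by in-kernel new/malloc; must be called
// before any kernel that allocates has been launched.
cudaDeviceSetLimit(cudaLimitMallocHeapSize, 64 * 1024 * 1024);
```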