Copying objects to device with virtual functions

Hi,
I'm writing my first CUDA program, and I'm a bit stuck.

The issue is that I need to get a couple of classes onto the GPU that have virtual functions, which I need to call on the GPU.
It took me a while, but I figured out that I was crashing because I can't simply copy those objects: the virtual function table pointer is stored inside the object, so it gets copied too.

To circumvent this, I created a kernel that I call after every single copy, which recreates the object from the copied data.

template<typename T>
__global__ void fixVirtualPointers(T *other) {
	*other = T(*other);
}

Unfortunately this didn't work; I still crash at the same spot, even though the kernel executed properly beforehand.

How am I supposed to solve this?

The limitation is pointed out in the programming guide.

One suggestion would be to create your objects on the device, rather than in host code. Whatever method you use to initialize the objects in host code could presumably be reimplemented as a device function. So pass your ordinary initialization data to the device. Create your objects on the device from raw data, similarly to how you are creating them on the host. I acknowledge that this doesn’t allow for easy interoperability of such objects between host and device code, but that is the nature of the limitation.
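To make the suggestion above concrete, here is a minimal sketch of device-side construction. The class names (Shape, Circle) and the kernels are illustrative, not from the original code; the point is that only raw data (here, a radius) crosses the host/device boundary, and the object itself is built on the device with placement new, so its virtual function table pointer is valid in device code.

```cuda
#include <cstdio>
#include <new>  // device-side placement new

struct Shape {
    __device__ virtual float area() const = 0;
};

struct Circle : public Shape {
    float r;
    __device__ Circle(float r_) : r(r_) {}
    __device__ float area() const override { return 3.14159265f * r * r; }
};

// Build the object on the device from raw host data (just the radius).
__global__ void makeCircle(void *buf, float r) {
    new (buf) Circle(r);  // placement-new into preallocated device memory
}

__global__ void useShape(Shape *s) {
    // Virtual dispatch is safe: the vtable pointer was set in device code.
    printf("area = %f\n", s->area());
}

int main() {
    void *d_buf = nullptr;
    cudaMalloc(&d_buf, sizeof(Circle));
    makeCircle<<<1, 1>>>(d_buf, 2.0f);               // construct on device
    useShape<<<1, 1>>>(static_cast<Shape *>(d_buf)); // safe virtual call
    cudaDeviceSynchronize();
    cudaFree(d_buf);
    return 0;
}
```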

Another obvious suggestion of course, is to redesign your processing flow/approach so that copying of objects with virtual functions between host and device is not needed. The previous paragraph is just a specific possibility within the more general suggestion here.

Read beyond this point at your own risk.

Since the mechanics of class virtual functions are (AFAIK) implementation-specific, I hesitate to suggest another option. However, the following ugly mod to your fixup code seems to work for me:

template<typename T>
__global__ void fixVirtualPointers(T *other) {
	T temp = T(*other);
	memcpy(other, &temp, sizeof(T));
}

This is based on my own experimentation, which suggests that an ordinary object copy never copies the virtual function pointer table. A small modification to the above (change __global__ to __host__ __device__) would probably allow for a "fixup" function that could work in either direction (H->D or D->H).
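As a hedged sketch of that modification: the same fixup, compiled for both host and device, rebuilds the vtable pointer for whichever side it runs on. The copy-construction of a local gives the local a vtable pointer that is correct for the current side, and the memcpy bit-copies it back over the stale one. This relies on the same implementation-specific behavior described above and is equally "use at your own risk".

```cuda
#include <cstring>

// Runs on either side; fixes up the object in place for that side.
template <typename T>
__host__ __device__ void fixVirtualPointer(T *other) {
    T temp = T(*other);              // local copy: vtable pointer is correct
                                     // for the side this code runs on
    memcpy(other, &temp, sizeof(T)); // bit-copy back, overwriting the stale
                                     // vtable pointer along with the data
}

// Thin kernel wrapper for the device direction (H->D); for D->H, call
// fixVirtualPointer directly in host code after the cudaMemcpy.
template <typename T>
__global__ void fixVirtualPointersKernel(T *other) {
    fixVirtualPointer(other);
}
```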

Someone may come along and say that this is a really bad idea. Use at your own risk. I'm reasonably sure this suggestion is exploring undefined behavior. The only sense of "correctness" I can attempt to ascribe to this code is that the difference between the host object and the device object presumably exists only at the data level, not at the structural/organizational/size level (otherwise a general H->D or D->H cudaMemcpy of an object would not work); that this idea probably applies to the virtual function pointer table as well (thin ice here); and that the difference in virtual function pointers must be contained within the object itself, not external to it (seems self-evident). Which is a bunch of hand-waving. YMMV.

Thanks, this seems to have solved the issue. I finally managed to get the code to run successfully.
It still crashes above a certain thread count for some reason, so I should probably solve that properly, in case it is related to this.

I just realized that I am using a virtual function to copy objects to the device.
Is that an issue too, even if I don't have a device equivalent and, as such, never call it on the device?

If you never copy a particular object to the device, then there are no (CUDA-imposed) limitations on the usage of that object in host code.

virtual functions/polymorphism should work in host code exactly as you would expect, for objects created in host code. This should be more or less orthogonal to CUDA.

The concern arises, as stated in the documentation, when you copy an object created on the host to the device, or conversely, when you create an object on the device and then copy it to host code. In either of those cases, the virtual function pointer table in the object is no longer coherent.
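To illustrate the hazard just described (with illustrative names, not from the original code): the object below is constructed on the host, so its vtable pointer refers to host code, and after cudaMemcpy the device copy still carries that host pointer. Plain data access works; virtual dispatch does not.

```cuda
struct Base {
    int x = 0;
    virtual void touch() { ++x; }  // host-side virtual; no device version
};

__global__ void badCall(Base *b) {
    // b->touch();  // would dispatch through a host vtable pointer: broken
    b->x = 42;      // plain data member access is the part that still works
}

int main() {
    Base h;                       // vtable pointer set for host code
    Base *d = nullptr;
    cudaMalloc(&d, sizeof(Base));
    cudaMemcpy(d, &h, sizeof(Base), cudaMemcpyHostToDevice);
    badCall<<<1, 1>>>(d);         // data members fine; virtual calls are not
    cudaDeviceSynchronize();
    cudaFree(d);
    return 0;
}
```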

After rereading your question, it seems that the above description is not what you were asking. If you are copying an object to the device but you never invoke a virtual function (*), it should be OK. However, this is a gray area, as the programming guide does not actually read this way. It simply says:

Programming Guide :: CUDA Toolkit Documentation

“It is not allowed to pass as an argument to a global function an object of a class with virtual functions.”

I cannot abrogate the programming guide. If you want to follow the letter of the law, then do not do this, and I refer you to my original comment previously in this thread.

(*) It also seems to me that, for complex object-manipulation code, guaranteeing this might be difficult, non-obvious, or non-trivial, and the burden of doing so falls entirely on the programmer.

My issue is that I'm trying to port fairly complex C++ code to the GPU, and it doesn't work, and I don't know why.
I'm really just trying to make sure I didn't misunderstand anything.

Though now it seems I have issues with code unrelated to this, and I just need to manually debug what I messed up.

Anyway, thanks for the help.