1. __device__ functions are always inlined. So if I pass a struct as an input argument, I don't have to pass a pointer to it to reduce the overhead, right?
2. Likewise, because of inlining, passing a result pointer to return complex data doesn't really improve anything, right? For example, foo2 and foo1 are not really different overhead-wise?
3. Are references (&) allowed in kernel code? Like this..
4. I consulted the "programming guide" and "reference manual". Is there any other documentation that I can read? There is very little information in the programming guide.
Well, there is a wealth of information in the programming guide. 90% of all questions on this forum could be answered if people simply read it before posting. Your question isn't one of those, though: for whatever reason, NVIDIA has chosen not to document which C/C++ features are allowed in kernels and which are not.
FYI, here are a few more undocumented features:
Templated kernel code works very well, though it is technically unsupported as far as I know.
Simple classes with __device__ member functions also work if you are very careful in how you write them (i.e. only simple data members, all members inlined, no requirement for dynamic memory, no polymorphism, and a few other gotchas I can't think of at the moment).
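To make the two features above concrete, here is a minimal sketch of what such code could look like. Everything in it (Vec2, axpy, axpy_kernel, check_helpers) is a hypothetical name made up for illustration, not something from the programming guide, and the macro guard is only there so the same file also builds with a plain host compiler. Treat it as an assumption about what tends to work, not as documented behavior.

```cpp
#include <cassert>

// Assumption: when this is not compiled by nvcc, make the CUDA
// qualifiers expand to nothing so a host compiler accepts the file.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical "simple class": plain data members only, every member
// defined inline, no virtual functions, no dynamic memory.
struct Vec2 {
    float x, y;

    __host__ __device__ Vec2(float x_, float y_) : x(x_), y(y_) {}

    __host__ __device__ Vec2 operator+(const Vec2& o) const {
        return Vec2(x + o.x, y + o.y);
    }

    __host__ __device__ float dot(const Vec2& o) const {
        return x * o.x + y * o.y;
    }
};

// Hypothetical templated helper usable from host and device code.
template <typename T>
__host__ __device__ T axpy(T a, T x, T y) {
    return a * x + y;
}

#ifdef __CUDACC__
// Hypothetical templated kernel; only seen by nvcc.
template <typename T>
__global__ void axpy_kernel(T a, const T* x, T* y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        y[i] = axpy(a, x[i], y[i]);
    }
}
#endif

// Host-side sanity check of the helpers.
inline void check_helpers() {
    Vec2 s = Vec2(1.0f, 2.0f) + Vec2(3.0f, 4.0f);
    assert(s.x == 4.0f && s.y == 6.0f);
    assert(axpy(2, 3, 4) == 10);
}
```

Under nvcc the kernel would be launched with an explicit template argument, e.g. axpy_kernel<float><<<blocks, threads>>>(a, d_x, d_y, n), where the launch configuration and device pointers are again placeholders.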
Taking a pointer and dereferencing it is a fundamental C feature that I should not have to worry about... it's disappointing. What should I expect to work? :">
I don't know how a compiler inlines a function, but since I don't do any pointer arithmetic, it should be able to figure that out, eliminate the & and * operators, and substitute regular variables for them.
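As a sketch of the situation described here, the three functions below compute the same thing with a result pointer, by value, and with references. The names (Sample, sum_diff_ptr, sum_diff_val, sum_diff_ref, check_forms) are hypothetical and are not the foo1/foo2 from the original question. The assumption is that once these are inlined and no address escapes, a compiler can usually keep the struct in registers, so the three forms would be expected to cost about the same; that is something to confirm by looking at the generated PTX rather than to take for granted.

```cpp
#include <cassert>

// Assumption: when this is not compiled by nvcc, make the CUDA
// qualifiers expand to nothing so a host compiler accepts the file.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical small struct used as both input and result.
struct Sample {
    float a, b;
};

// Form 1: input and result passed by pointer.
__host__ __device__ inline void sum_diff_ptr(const Sample* in, Sample* out) {
    // Read both fields first so the function is safe even if in == out.
    float s = in->a + in->b;
    float d = in->a - in->b;
    out->a = s;
    out->b = d;
}

// Form 2: input passed by value, result returned by value.
__host__ __device__ inline Sample sum_diff_val(Sample in) {
    Sample out;
    out.a = in.a + in.b;
    out.b = in.a - in.b;
    return out;
}

// Form 3: input and result passed by reference.
__host__ __device__ inline void sum_diff_ref(const Sample& in, Sample& out) {
    float s = in.a + in.b;
    float d = in.a - in.b;
    out.a = s;
    out.b = d;
}

// Host-side check that all three forms agree on one input.
inline void check_forms() {
    Sample in = {5.0f, 3.0f};
    Sample p, r;
    sum_diff_ptr(&in, &p);
    Sample v = sum_diff_val(in);
    sum_diff_ref(in, r);
    assert(p.a == v.a && p.b == v.b);
    assert(r.a == v.a && r.b == v.b);
}
```

None of the functions does pointer arithmetic or stores an address anywhere, which is the case where the & and * are easiest for a compiler to remove after inlining.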
Again, I don't know how people debug a kernel when it runs in emulation mode but not on the device. I cannot use gdb (well, there's gdb 2.1 now) or printf to see what's going on in it, and ... I can use any feature with peace of mind.