[solved] weird behaviour with RT_CALLABLE_PROGRAM

Hi,
I’m opening a new thread so as not to mix topics, although I’m slowly starting to feel like a spammer given the number of problems I have (and I’m not even posting all of them).

Well, because of the context->compile problem, I switched to using callable programs. I’m not using program objects and rtCallableProgram(…) though, only calling them as functions from the ray generation program.

..
RT_CALLABLE_PROGRAM float G_includingV(const BiDirSubPathVertex& v1, const BiDirSubPathVertex& v2) {
    ..
    if(cosV1 > 0.f && cosV2 > 0.f) {
        ..
        rtTrace(..);
        ..
        if(!shadowPrd.inShadow) {
            return cosV1 * cosV2 / (distance * distance);
        }
        return 0.f;
    }
    else {
        return 0.f;
    }
}

..

RT_PROGRAM void pathtrace_camera()
{
   ...
   // the following code gives the correct result
   float3 c_st;
   c_st = brdf(...);
   float G = G_includingV(lightVertices[j], eyeVertices[i]);
   c_st *= G;
   c_st *= brdf(...);

   // replacing it with the following gives a wrong result
   float3 c_st;
   c_st = brdf(...);
   c_st *= G_includingV(lightVertices[j], eyeVertices[i]);
   c_st *= brdf(...);
   ...
}

I found the variant that gives the correct result only by chance while trying to debug, and I honestly can’t explain this behaviour.

I’m quite experienced with C++, and in C++ this would look to me like a corrupted stack. But I see no way my program could corrupt the stack in OptiX. There are no mallocs and no while loops; the only places where pointers are used are several for loops indexing into 2 arrays.

Any other ideas about what could be happening here? I’m reluctant to post the full project since it has several compilation dependencies and isn’t very usable on its own.

Did you enable exceptions? If not, you should do so.

yes, I enabled them already when having problems with context->compile() :)

m_context->setExceptionEnabled(RT_EXCEPTION_ALL, true)

You didn’t specify your GPU (GeForce?) or the rest of your setup.
As I saw in the Programming Guide, GPUs with compute capability 1.x require different handling of pointers.

“I’m not using program objects and rtCallableProgram(…) though, only calling them as functions from the ray generation program.”

You’re doing it wrong. ;-)

Since callable programs are not inlined this should not work at all, because OptiX doesn’t support CUDA function calls outside the provided callable program mechanism. Any abuse of the mechanism should result in undefined behavior.

You must declare a callable program variable, assign your G_includingV() program to that, and call the declared callable program variable inside your ray generation program. That will result in the proper function calls inside the PTX code.

  • Programs need to be handled via program objects.
  • Functions you call should be of the type “__forceinline__ __device__”.
    (I use a define RT_FUNCTION __forceinline__ __device__ for that.)
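
For anyone reading along, here is a minimal sketch of the second bullet, i.e. a helper marked __forceinline__ __device__ through an RT_FUNCTION define. The plain-C++ fallback branch and the helper name geometryTerm are assumptions made for illustration so the snippet also builds with a host compiler; the helper only mirrors the cosine and distance part of G_includingV from the first post and leaves out the rtTrace() visibility test.

```cpp
// Under nvcc this expands to the qualifiers from the bullet above.
// The host fallback is an assumption of this sketch, not part of OptiX.
#if defined(__CUDACC__)
#define RT_FUNCTION __forceinline__ __device__
#else
#define RT_FUNCTION inline
#endif

// Hypothetical helper: the geometry term without the shadow-ray test.
// Returns 0 when either cosine is not positive, as in the original post.
RT_FUNCTION float geometryTerm(float cosV1, float cosV2, float distance)
{
    if (cosV1 > 0.f && cosV2 > 0.f) {
        return cosV1 * cosV2 / (distance * distance);
    }
    return 0.f;
}
```

A helper like this is called directly from a program, e.g. c_st *= geometryTerm(cosV1, cosV2, distance); since it is inlined at the call site, no real function call ends up in the PTX.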

Ok, in that case I apparently misunderstood the documentation as it said “making it a RT_CALLABLE_PROGRAM can reduce code replication and compile time” (: but cool, that makes sense then.

I had __device__ inline before (but many other things were failing at that time). Is inline also unsupported, so that I need to use __forceinline__ instead?

I’ll test it after breakfast (:

Yes, callable programs can reduce compilation time and PTX code size when used via the correct rtCallableProgram() mechanism. You’re effectively implementing real function calls with that. The amount of savings depends on your use case.
Note, though, that in the currently shipping OptiX versions there is still a runtime impact when using callable programs.

inline works in most cases but is just a compiler hint. I saw a customer case where CUDA 4.2 stopped inlining a function once more function parameters were added, and __forceinline__ solved that.

ok, using rtCallableProgram solved the problem :)

Thanks for the help and the info about __forceinline__. I’ll start using it instead of inline.