Is it possible to store a bindless callable program ID in a buffer or ray payload for later use?

The programming guide says it’s possible to use buffer IDs in arbitrary structs and as data members of the ray payload, so I thought that bindless callable program IDs could work as well.

Is it possible?

I tried a small test that stores the ID (or a pointer to the ID) in the ray payload in the closest hit program and then calls it afterwards in the ray generation program, but I got this:
“Assertion failed: “var != vlm->m_variables.end””

Here is how it was used:

typedef rtCallableProgramId<int()> BsdfEvalProgramID;
struct SubpathPRD
{
    BsdfEvalProgramID bsdfEvalProgID; // also tried storing a pointer
};

// --- some_material.cu
RT_CALLABLE_PROGRAM int bsdfEvalProg() {
  rtPrintf("Print from callable program\n");
  return 0; // declared to return int, so return a value
}

rtDeclareVariable(BsdfEvalProgramID, bsdfEval, ,);
RT_PROGRAM void closestHit() {
  subpathPrd.bsdfEvalProgID = bsdfEval; // call bsdfEval() works here
  return;
}

// --- entry_point.cu - tried to use callable program as
BsdfEvalProgramID callable = BsdfEvalProgramID(lightPrd.bsdfEvalProgID);
//or
BsdfEvalProgramID(lightPrd.bsdfEvalProgID) callable;
callable();

I’m working on an implementation of Vertex Connection and Merging (an extension of bidirectional PT and PPM). I need to store some per-hit vertex data during the first pass from the light source; that data is later combined with the camera pass data to construct full paths and PDFs.

I can register callable programs for each material at context scope and later call the proper one based on the material ID stored in the per-hit vertex data. But if I could store callable program IDs in that data, I wouldn’t need to bother with material IDs at all.
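The material-ID fallback described above can be sketched in plain host-side C++. This is only an analogy, not OptiX device code: the function-pointer table stands in for callable programs registered at context scope, and the names `BsdfEvalFn`, `evalDiffuse`, `evalGlossy`, `kBsdfTable`, `LightVertex` and `evalBsdf` are hypothetical, introduced purely for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a BSDF evaluation callable with signature int().
using BsdfEvalFn = int (*)();

// Two illustrative "materials"; the return values are arbitrary markers.
int evalDiffuse() { return 1; }
int evalGlossy()  { return 2; }

// Plays the role of callables registered once at context scope,
// indexed by material ID.
const std::array<BsdfEvalFn, 2> kBsdfTable = {{ evalDiffuse, evalGlossy }};

// Per-hit vertex record written during the light pass.
// It stores only the material ID, not the callable itself.
struct LightVertex {
    std::size_t materialId;
};

// Later pass: look up the callable by the stored material ID and invoke it.
int evalBsdf(const LightVertex& v) {
    assert(v.materialId < kBsdfTable.size());
    return kBsdfTable[v.materialId]();
}
```

The extra indirection through the table is what storing the callable program ID directly in the vertex data would remove.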

What’s your compiler’s streaming multiprocessor target version?

As usual, at least the following system information would be helpful to investigate this:
OS version, OS bitness, installed GPU(s), display driver version, OptiX version, CUDA Toolkit version.

So is it supposed to be possible to use callable programs as I described? The programming guide doesn’t really explicitly say that.

Here are the code generation parameters
-gencode=arch=compute_20,code="sm_20,compute_20"

Specs:
Windows 8.1 x64, OptiX 3.6.0, CUDA 6.0, GeForce GTX 770, driver 337.88
VS2012 64bit, VS2010 Win32 programs builds

Initially, when I reported this, I had driver 332.88 (before checking I was sure I had the latest). Apparently I had forgotten to update the one Windows installed automatically (my motherboard went bad a week ago, so I set up a temporary replacement).

Until the driver update I tried to use callable programs simply defined at the context scope and found that they didn’t work either. They got called, but the parameters had wrong values, zeros or QNANQs (pass by reference crashed with error 700). The shadetree OptiX sample did work fine though.

After your comment I checked the driver version and concluded that it was outdated (the OptiX 3.6 release notes specify version 335 as the minimum). Well, after the driver update the callables still don’t work properly; now the printfs produce zeros with occasional weird values. Pass by reference now crashes with error 716.

CORRECTION: After writing the previous paragraph, I did a full rebuild of the test project, and now the printfs only contain zeros. Really weird, I can’t explain it. I’m using Visual Studio 2012, and it was an x64 build.
CORRECTION 2: After a few rebuilds the weird numbers and QNANQs are back.

A small failing sample with a callable attached to the context and called from the ray generation program (add this to the CotextInitTest solution on GitHub):

// creating the program on host
Program program = m_context->createProgramFromPTXFile( "test_generator.cu.ptx", "callableTest" );
m_context["callableTest"]->setProgramId(program);

// entry point
rtDeclareVariable(rtCallableProgramId<float(float)>, callable, ,);
RT_PROGRAM void generator()
{
  // prints mostly 0.00000f with occasional random values, but does return 42
  callable(1234.f);
  return;
}

RT_CALLABLE_PROGRAM float callableTest(float n)
{
    rtPrintf("Callable: %f\n", n);
    return 42.f;
}

Output:

I tried a Win32 build and it doesn’t even launch (it does if the callable call is removed from the entry point function; the declaration can be left in):

Also I found out that a bound callable attached to the context or entry point program results in this:

Apparently I’ll have to drop the idea of using callable programs. They don’t seem reliable currently. I see that nljones, in another currently active thread, experienced exactly the same issue.

For isolation of the issue:
Your code comment // prints mostly 0.00000f with occasional random values, but does return 42 says that the callable program result is correct and only the rtPrintf() is behaving erratically.
That is, when not using rtPrintf() inside the callable program, does the application work as expected?

No, it doesn’t. The printf in the callable doesn’t seem to affect the result.
I changed the callable to return the sum of 42 and the parameter and added a printf checking the returned value:

RT_CALLABLE_PROGRAM float callableTest(float const n)
{
    //rtPrintf("Callable: %f\n", n);
    return 42.f + n;
}

RT_PROGRAM void generator()
{
    float a = callable(1234.f);
    rtPrintf("Callable returned: %f\n", a);
    return;
}

Without printf in the callable I get this:

One of the results with printf in the callable was similar:

Without the printf there were also cases of all 42s or QNANQs being returned, and also something like this:

Btw, you didn’t clarify whether my initial idea of storing a callable program ID is intended to work?

A possible solution is to disable RT_EXCEPTION_PROGRAM_ID_INVALID.
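On the host side, that suggestion might look like the configuration fragment below, using the OptiX C++ wrapper's `setExceptionEnabled` call. It assumes the OptiX SDK headers are available and that `context` is an already created `optix::Context`; the helper name `disableInvalidProgramIdCheck` is hypothetical. Whether this actually resolves the behavior reported in this thread is not confirmed here.

```cpp
#include <optixu/optixpp_namespace.h>

// Hypothetical helper: turn off the invalid-program-ID exception check
// on an existing context. Call it before launching.
void disableInvalidProgramIdCheck(optix::Context context)
{
    context->setExceptionEnabled(RT_EXCEPTION_PROGRAM_ID_INVALID, false);
}
```

With the C API, the equivalent call would be rtContextSetExceptionEnabled(context, RT_EXCEPTION_PROGRAM_ID_INVALID, 0).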