[Solved] Weird ray generation hang (really simple code)

voldemarz · June 9, 2014, 9:14pm

I’m stuck on a weird hang in (or after?) the ray generation program, which is driving me mad… It seems to be stuck in a loop, but it shouldn’t be. It only stops when Windows resets the graphics driver. TdrDelay is set to 10 seconds. Please help me understand what is going on!

Initially I used a loop of the form for(;;), then added a counter for debug output. I also tried a do-while loop.

Just before posting this I found out that if I add a stopping condition LOWER THAN 8, e.g. for(int i=0;i<7;i++), there is NO HANG!! What the hell?!

The error (the returned code is sometimes also 702, and I have also seen 716 with the non-stripped-down code):
Unkonwn error (Details: Function “_rtContextLaunch2D” caught exception: Encountered a CUDA error: Kernel launch returned (700): Launch failed, [6619200])

A minimal example (underscores added for indentation) where one thread traces a ray and stops after the first hit or miss, while the rest of the threads break out early:

RT_PROGRAM void generator()
{
  SubpathPRD lightPrd;
  lightPrd.done = 0;

  float3 rayOrigin = make_float3( 343.0f, 548.0f, 227.0f);
  float3 rayDirection = make_float3( .0f, -1.0f, .0f);
  Ray lightRay = Ray(rayOrigin, rayDirection, RayType::LIGHT, 0.0001, RT_DEFAULT_MAX );

  for (int i=0;;i++)
  {
    // Stop all threads except one
    if ((launchIndex.x + launchIndex.y) != 0)
    {
      printf("Gen %d - idx %d,%d - break\n", i, launchIndex.x, launchIndex.y);
      break;
    }
       
    // Launch ray
    printf("Gen %d - idx %d,%d - Dir %f %f %f\n", i, launchIndex.x, launchIndex.y, 
rayDirection.x, rayDirection.y, rayDirection.z);
    rtTrace( sceneRootObject, lightRay, lightPrd );

    // Escape loop
    if (lightPrd.done) 
    {
      printf("Gen %d - idx %d,%d - break\n", i, launchIndex.x, launchIndex.y);
      break;
    }
  }
  printf("Done idx %d,%d \n", launchIndex.x, launchIndex.y);
}

// Stop tracing loop at first hit or miss
RT_PROGRAM void miss()
{
  lightPrd.done = 1;
  printf("Miss %d,%d\n", launchIndex.x, launchIndex.y);
}

RT_PROGRAM void closestHitLight()
{
  lightPrd.done = 1;
  printf("  Diffuse hit idx %d,%d \n", launchIndex.x, launchIndex.y);
}

Output (used 2x2 launch dimensions to limit output)
Gen 0 - idx 1,0 - break
Gen 0 - idx 0,1 - break
Gen 0 - idx 1,1 - break
Done idx 1,0
Done idx 0,1
Done idx 1,1
Gen 0 - idx 0,0 - Dir 0.000000 -1.000000 0.000000
Diffuse hit idx 0,0
Gen 0 - idx 0,0 - break
Done idx 0,0

Issue reproduced on two machines (desktop with GTX 770, driver 337.88 and laptop with GT525M, driver 335.23):
Windows 8.1 x64
Optix 3.5.1
Cuda v5.5
VS2012 64bit application build
VS2010 32bit application build

In this same project there is a working path tracing solution that iteratively generates rays in a for loop with the stopping condition i<maxPathLenght. This makes my current issue even weirder…

HamzaC · June 10, 2014, 7:44am

rtPrintf should be used instead of printf in OptiX programs.
I don’t know if it will solve your problem but last time I accidentally tried to use printf… Well, now, I’m very careful and I use rtPrintf :)

droettger · June 10, 2014, 7:59am

Instead of the CUDA printf have you tried the OptiX rtPrintf()?
printf() isn’t officially supported, see OptiX Programming Guide Chapter 12. Caveats.

voldemarz · June 10, 2014, 12:11pm

YES! That was it! Thanks!
I had seen the note that rtPrintf should be used, but didn’t think it was an issue, since printf was used in other kernels for debug output and worked just fine before. Apparently I was too frustrated to realize that this was the problem.

Also, the use of printf didn’t produce the illegal symbol error that Programming Guide Chapter 12 describes. It would be better if it did…

By the way, is there a reference for the meaning of the error codes returned by kernel launches? E.g. 700, 702, 716

m_sch · June 10, 2014, 12:22pm

Have a look at the cuda.h header file:

/**
 * Error codes
 */
typedef enum cudaError_enum {
    ...
} CUresult;

voldemarz · June 10, 2014, 12:29pm

Thanks! Interestingly, 716 is not there. I’m pretty sure I saw it a few times.

droettger · June 10, 2014, 12:44pm

Right, for some reason the headers up to CUDA 5.5 didn’t contain that. The CUDA 6.0 headers have it, but they also changed the meaning of some other codes for unknown reasons.

(BTW, as you see from m_sch’s post, the forum supports the “code” tag. There is no need to add some characters to preserve indentation and that also puts it into a scrollable window for smaller real estate per post.)

m_sch · June 10, 2014, 12:46pm

CUDA 6 has error code 716:

CUDA_ERROR_MISALIGNED_ADDRESS             = 716,

Are you looking at the correct header file?
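For quick reference, here is a minimal lookup sketch for the launch-related codes mentioned in this thread. The names are taken from the CUresult enum as I understand it from the CUDA 6.0 cuda.h; the function name cuda6LaunchErrorName is hypothetical and not part of any NVIDIA API. Note that, to my understanding, the CUDA 5.5 header assigned 700 to CUDA_ERROR_LAUNCH_FAILED (which matches the "(700): Launch failed" message in the first post), so always check the header of the toolkit you actually build against.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: maps a few CUresult values to their enum names
// as listed in the CUDA 6.0 cuda.h. Only codes discussed in this thread
// are covered; everything else is reported as unknown instead of guessed.
inline std::string cuda6LaunchErrorName(int code)
{
    assert(code >= 0); // CUresult values are non-negative
    switch (code) {
    case 700: return "CUDA_ERROR_ILLEGAL_ADDRESS";
    case 701: return "CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES";
    case 702: return "CUDA_ERROR_LAUNCH_TIMEOUT";
    case 716: return "CUDA_ERROR_MISALIGNED_ADDRESS";
    case 719: return "CUDA_ERROR_LAUNCH_FAILED";
    default:  return "unknown (check cuda.h for your toolkit version)";
    }
}
```

A call such as cuda6LaunchErrorName(702) returns the timeout name, which is the code you would expect when the Windows TDR watchdog kills a kernel that runs too long.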

voldemarz · June 10, 2014, 2:00pm

Yes. I have Cuda v5.5. Can I use Cuda 6? I read that it’s recommended to use the version of Cuda that the particular version of OptiX (3.5.1) was compiled against.

@Detlef sorry, didn’t notice that tag.

m_sch · June 10, 2014, 2:07pm

No, you would need OptiX 3.6 for CUDA 6 support.

voldemarz · June 11, 2014, 4:12am

The joy was short-lived… Switching from printf to rtPrintf made it work only for the case where tracing stops after the first hit. Here’s a version without prints that hangs (shooting rays back and forth until a random break):

RT_PROGRAM void generatorDbg()
{
  SubpathPRD lightPrd;
  lightPrd.depth = 0;
  lightPrd.done = 0;
  lightPrd.randomState = randomStates[launchIndex]; // curand states

  float3 rayOrigin = make_float3( 343.0f, 548.0f, 227.0f);
  float3 rayDirection = make_float3( .0f, -1.0f, .0f);
  Ray lightRay = Ray(rayOrigin, rayDirection, RayType::LIGHT, 0.0001, RT_DEFAULT_MAX );
  
  for (int i=0;;i++)
  {
    rtTrace( sceneRootObject, lightRay, lightPrd );
    if (lightPrd.done) 
      break;

    lightRay.origin = lightPrd.origin;
    lightRay.direction = lightPrd.direction;
  }

  randomStates[launchIndex] = lightPrd.randomState;
}

RT_PROGRAM void miss()
{
  lightPrd.done = 1;
}

RT_PROGRAM void closestHitLightDbg()
{
  lightPrd.depth++;    
  if (0.5f < getRandomUniformFloat(&lightPrd.randomState))  // random stop
  {
    lightPrd.done = 1;
    return;
  }

  lightPrd.origin = ray.origin + tHit*ray.direction;
  lightPrd.direction = -ray.direction;                      // shoot back
}

Also, when I switched to rtPrintf I added a macro that first prints the launch index, then a number of spaces equal to the ray depth, and only then the string passed to the macro, so basically 3 rtPrintf calls. This is when I was hit by the issue where two consecutive rtPrintf calls sometimes cause an “error in format string” exception:

OptiX Error: Invalid value (Details: Function “RTresult _rtContextLaunch2D(RTcontext, unsigned int, RTsize, RTsize)” caught exception: Error in rtPrintf format string: “”, [7995632])

This issue with rtPrintf has been seen by multiple people already 1-2 years ago:
http://celarek.at/2014/05/why-you-should-never-use-nvidia-optix/
https://devtalk.nvidia.com/default/topic/545986/bug-in-optix-when-using-exceptions-/?offset=8#4199465

Things I tried:

Ran an equivalent set of programs in the path tracing sample, and they worked.
Made sure that the parameters passed to NVCC are the same as for projects generated by the OptiX wizard.
Moved the programs for this entry point to separate files (previously all ray generation programs were in one file and all closest hit programs for a material were together in another). Now context validation passes, but compilation doesn't return and I get a "driver stopped responding" message after ~4 seconds (TdrDelay is set to 10).

It seems like there is something screwed up in the context. I will try to remove all other algorithm entry points and see if that helps to pinpoint the issue.

Any idea what could be wrong? How can I find it?

voldemarz · June 11, 2014, 7:46pm

Ok, so I separated the initialization so that only the variables/buffers needed for a given algorithm are initialized. Now on context compilation I get an initialization exception complaining about an invalid value for a buffer used by another algorithm (the context shouldn’t even know about it!?)

For example, for Path Tracing I get an error about the “photons” buffer used in Photon Mapping, even though context[“photons”] was never set:

Invalid value (Details: Function "_rtContextCompile" caught exception:
Initalization of non-primitive type photons:  Buffer object, [1769653])"

// This is never called
m_photons = m_context->createBuffer(RT_BUFFER_OUTPUT);
m_context["photons"]->set( m_photons );

Similarly, for Photon Mapping I get an error about the “raytracePassOutputBuffer” buffer used in Path Tracing:

Invalid value (Details: Function "_rtContextCompile" caught exception: 
Initalization of non-primitive type raytracePassOutputBuffer:  Buffer object, [1769653])"

// This is never called
m_raytracePassOutputBuffer = m_context->createBuffer( RT_BUFFER_INPUT_OUTPUT );
m_context["raytracePassOutputBuffer"]->set( m_raytracePassOutputBuffer );

How is this possible ?!? Seems like I’m getting some recycled version of context.

Could clean driver reinstallation help? I guess I’ll have to try…

nljones · June 11, 2014, 9:41pm

It sounds like you might have removed the references from your .cpp files but not from your .cu files. If you still have “rtBuffer photons;” at the top of any of your .cu files, OptiX will expect you to initialize that buffer.

voldemarz · June 11, 2014, 9:59pm

Ohh, thanks! That was it. Will split those and see what happens then.

voldemarz · June 12, 2014, 2:02am

So, I split all the CUDA files so that I could load only those programs that are relevant for a given algorithm. The other two algorithms (progressive photon mapping and path tracing) were working fine, but this small sample was still hanging if the trace depth was bigger than 1.

I had already written a post with more code that included the host code to show how none of it makes sense…

Then I remembered about the new Trbvh (GPU-based BVH) acceleration structure builder that I spotted in the docs today. So I changed to that and it WORKED ! And both print functions also seem to work fine now.

Previously it was Sbvh (Split-BVH). I don’t know what it didn’t like in my stupid little example and standard Cornell scene, it worked perfectly fine with other algorithms and different scenes.

I tried to reproduce it in the path_trace sample, since the scene has exactly the same dimensions, but no luck. I will try again some day when I have more time.

voldemarz · June 12, 2014, 10:22pm

Well… the joy was short-lived again. Simply shooting a bunch of rays reflected back to the origin worked. But when I put back the cosine-weighted hemisphere sampling, it blew up again. The computation of the values and the debug output work (with issues explained later), but when the values are assigned to the PRD it hangs (error 702). I verified that the vector components are valid numbers and that the length equals 1.

I changed the sampling to match the code used in the OptiX samples: it hangs (error 702). It works when I simply negate ray.direction
newDir = - oldDir
or reflect
newDir = 2*normal + oldDir

In all cases vectors are normalized. How could this be an issue ?
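As a side note on the "reflect" variant: 2*normal + oldDir matches the usual mirror-reflection formula r = d - 2(d·n)n only when the ray arrives head-on (oldDir = -normal), which is the situation for the first hit in this test, where the ray goes straight down. The host-side sketch below illustrates this under that assumption; Vec3 and reflectDir are hypothetical stand-ins, not OptiX types or functions.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for optix::float3 so the example runs on the host.
struct Vec3 { float x, y, z; };

inline float dot(const Vec3& a, const Vec3& b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Mirror reflection of incoming direction d about unit normal n:
// r = d - 2 (d . n) n
inline Vec3 reflectDir(const Vec3& d, const Vec3& n)
{
    assert(std::fabs(dot(n, n) - 1.0f) < 1e-4f); // n must be unit length
    const float k = 2.0f * dot(d, n);
    return Vec3{ d.x - k * n.x, d.y - k * n.y, d.z - k * n.z };
}
```

For d = (0, -1, 0) and n = (0, 1, 0) this gives (0, 1, 0), the same as 2*normal + oldDir. For an oblique ray the two expressions differ, so the shortcut should not be relied on beyond this debugging setup.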

Closest hit and sampling code:

RT_PROGRAM void closestHitLightDbg()
{
  lightPrd.depth++;
  float3 worldShadingNormal = normalize( rtTransformNormal( RT_OBJECT_TO_WORLD, shadingNormal ) );
  float3 hitPoint = ray.origin + tHit*ray.direction;

  // Russian Roulette
  float rrSample = rnd(lightPrd.seed); // using random numbers same as path_trace sample
  if (0.5f < rrSample)
  {
    lightPrd.done = 1;
    return;
  }

  float2 sample = make_float2(rnd(lightPrd.seed),rnd(lightPrd.seed));
  float3 dir = sampleUnitHemisphereCos(worldShadingNormal, sample);  // doesn't work
  dir = sampleUnitHemisphereCosOptix(worldShadingNormal, sample);    // doesn't work
  dir = normalize(2*worldShadingNormal + ray.direction);          // works
  dir = -ray.direction;                                           // works
  lightPrd.direction = normalize(dir);

  lightPrd.origin = hitPoint;
}

// --------- Original hemisphere sampling version
static __device__ __inline__ optix::float3 sampleUnitHemisphereCos(
  const optix::float3 & normal, const optix::float2& sample)
{
  using namespace optix;

  float theta = acosf(sqrtf(sample.x));
  float phi = 2.0f * M_PIf *sample.y;
  float xs = sinf(theta) * cosf(phi);
  float ys = cosf(theta);
  float zs = sinf(theta) * sinf(phi);

  float3 U, V;
  createCoordinateSystem(normal, U, V);
  return optix::normalize(xs*U + ys*normal + zs*V);
}

// Create ONB from normalized normal (code: Physically Based Rendering, Pharr & Humphreys pg. 63)
static  __device__ __inline__ void createCoordinateSystem( 
  const optix::float3& N, optix::float3& U, optix::float3& V )

{
  using namespace optix;
  if(fabs(N.x) > fabs(N.y))
  {
      float invLength = 1.f/sqrtf(N.x*N.x + N.z*N.z);
      U = make_float3(-N.z*invLength, 0.f, N.x*invLength);
  }
  else
  {
      float invLength = 1.f/sqrtf(N.y*N.y + N.z*N.z);
      U = make_float3(0.f, N.z*invLength, -N.y*invLength);
  }
  V = cross(N, U);
}

// ------- Optix samples based hemisphere sampling
float3 __device__ __inline__ sampleUnitHemisphereCosOptix(float3 normal, float2 rnd)
{
    float3 p;
    cosine_sample_hemisphere(rnd.x, rnd.y, p);
    float3 v1, v2;
    createONB(normal, v1, v2);
    return v1 * p.x + v2 * p.y + normal * p.z;  
}

// Create ONB from normalized vector
static __device__ __inline__ void createONB( 
    const optix::float3& n, optix::float3& U, optix::float3& V)
{
  using namespace optix;

  U = cross( n, make_float3( 0.0f, 1.0f, 0.0f ) );
  if ( dot(U, U) < 1.e-3f )
      U = cross( n, make_float3( 1.0f, 0.0f, 0.0f ) );
  U = normalize( U );
  V = cross( n, U );
}
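One way to rule out the sampling math itself is to run the same formulas on the host and check every result. The sketch below is a CPU re-implementation of the first sampling variant above (same basis construction, same angle formulas) with a small validity check. Vec3, makeBasis, sampleCosHemisphere and isValidDirection are hypothetical host-side names, not OptiX API, and the check says nothing about what the device compiler does with the code.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for optix::float3 so the check runs on the host.
struct Vec3 { float x, y, z; };

inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 operator*(float s, Vec3 a) { return {s * a.x, s * a.y, s * a.z}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline Vec3 cross(Vec3 a, Vec3 b)
{
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
inline Vec3 normalize(Vec3 a) { return (1.0f / std::sqrt(dot(a, a))) * a; }

// Same branch structure as createCoordinateSystem in the post.
inline void makeBasis(Vec3 n, Vec3& u, Vec3& v)
{
    if (std::fabs(n.x) > std::fabs(n.y)) {
        const float inv = 1.0f / std::sqrt(n.x * n.x + n.z * n.z);
        u = {-n.z * inv, 0.0f, n.x * inv};
    } else {
        const float inv = 1.0f / std::sqrt(n.y * n.y + n.z * n.z);
        u = {0.0f, n.z * inv, -n.y * inv};
    }
    v = cross(n, u);
}

// Cosine-weighted direction around unit normal n, from u1, u2 in [0, 1].
inline Vec3 sampleCosHemisphere(Vec3 n, float u1, float u2)
{
    assert(u1 >= 0.0f && u1 <= 1.0f && u2 >= 0.0f && u2 <= 1.0f);
    const float cosTheta = std::sqrt(u1);        // cos(acos(sqrt(u1)))
    const float sinTheta = std::sqrt(1.0f - u1); // sin of the same angle
    const float phi = 2.0f * 3.14159265358979f * u2;
    Vec3 u, v;
    makeBasis(n, u, v);
    return normalize((sinTheta * std::cos(phi)) * u + cosTheta * n +
                     (sinTheta * std::sin(phi)) * v);
}

// True if d is finite, unit length, and not below the surface defined by n.
inline bool isValidDirection(Vec3 d, Vec3 n)
{
    const bool finite = std::isfinite(d.x) && std::isfinite(d.y) && std::isfinite(d.z);
    return finite && std::fabs(dot(d, d) - 1.0f) < 1e-4f && dot(d, n) > -1e-5f;
}
```

If every sampled direction passes isValidDirection on the host, the hang is unlikely to come from NaNs or unnormalized vectors produced by these formulas, which points the search back at the context setup or the runtime.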

Then there are also still issues with rtPrintf(), independent of the issues with ray directions. The regular printf() always seems to work fine. The behavior somehow seems to depend on the way the fields of launchIndex are used. Below I identified all the funky cases where rtPrintf causes a hang or a format exception. There are four cases, and their behavior depends on whether the line marked #2 is commented out or not.

RT_PROGRAM void generatorDbg()
{
  SubpathPRD lightPrd;
  lightPrd.depth = 0;
  lightPrd.done = 0;
  lightPrd.seed = tea<16>(1200*launchIndex.y+launchIndex.x, 1);

  float3 rayOrigin = make_float3( 343.0f, 548.0f, 227.0f);
  float3 rayDirection = make_float3( .0f, -1.0f, .0f);
  Ray lightRay = Ray(rayOrigin, rayDirection, RayType::LIGHT_VCM, 0.0001, RT_DEFAULT_MAX );

  int a = launchIndex.x; // #1 using launchIndex.x in the loop doesn't produce same effect

  for (int i=0;;i++)
  {
    // Example 1
    // Without #2 - output in first iteration, then hang
    // With    #2 - works
    rtPrintf("Output\n");

    // Example 2
    // Without #2 - output in first iteration, then "Error in rtPrintf format string"
    // With    #2 - works
    if (launchIndex.x == 0 && launchIndex.y == 0)
    {
      rtPrintf("Outputs\n");
    }

    // Example 3
    // Without #2 - works
    // With    #2 - works
    //if (launchIndex.x == 0 && launchIndex.y == 0)
    {
      rtPrintf("i %d", launchIndex.x);
      rtPrintf("Outputs\n");
    }

    rtTrace( sceneRootObject, lightRay, lightPrd );

    if (lightPrd.done) 
    {
      lightPrd.done += a; // #2
      break;
    }

    lightRay.origin = lightPrd.origin;
    lightRay.direction = lightPrd.direction;

    // Example 4
    // Without #2 - output in first iteration, then "Error in rtPrintf format string"
    // With    #2 - works
    rtPrintf("Output\n");
  }
}

I couldn’t reproduce this in the OptiX sample projects. User GL_Kyle said in the thread below that he has seen rtPrintf cover up unrelated memory corruption issues. He didn’t specify, though, whether he meant host or device memory.
https://devtalk.nvidia.com/default/topic/734914/optix/optix-bug-crash-with-cuda-error-kernel-ret-700-when-not-rtprinting-anything-small-demo-code-/

How could that happen before the first kernel launch? By setting some context buffers/variables multiple times?
I’d really appreciate it if someone could look through the context manipulation code below and see if there is anything suspicious.

I collected all the code that affects the context, except the loading of anyhit programs for a given material / render method. Everything is common to all methods, except the parts marked METHOD SPECIFIC (less than 10 lines). Also, the call to initializeRandomStates(), which initializes curandStates in GPU memory, is commented out to eliminate the possibility that it corrupts something; I changed the kernels to use simple random numbers as in the OptiX samples.
Cornell::getSceneRootGroup() creates the scene geometry exactly the same way as the path_trace sample, hence most of it is omitted. Only the last part is shown, where the GeometryGroup is added as a child to a Group, which is then returned and used as the top level object.

void OptixRenderer::initialize(const ComputeDevice & device, RenderMethod::E renderMethod)
{    
  initDevice(device); // Sets OptiX device, simple hence won't include
  
  m_context["localIterationNumber"]->setUint(0); // context created in OptixRenderer constructor

  // An empty scene root node
  optix::Group group = m_context->createGroup();
  m_context["sceneRootObject"]->set(group);
  
  // Output Buffer
  m_outputBuffer = m_context->createBuffer( RT_BUFFER_OUTPUT, RT_FORMAT_FLOAT3, m_width, m_height );
  m_context["outputBuffer"]->set(m_outputBuffer);

  // Random state buffer (must be large enough to give states to both photons and image pixels)
  m_randomStatesBuffer = m_context->createBuffer(RT_BUFFER_INPUT_OUTPUT|RT_BUFFER_GPU_LOCAL);
  m_randomStatesBuffer->setFormat( RT_FORMAT_USER );
  m_randomStatesBuffer->setElementSize( sizeof( RandomState ) );              // typedef curandState RandomState
  m_randomStatesBuffer->setSize( PHOTON_LAUNCH_WIDTH, PHOTON_LAUNCH_HEIGHT ); // use common size for all render methods
  m_context["randomStates"]->set(m_randomStatesBuffer);

  // Light sources buffer
  m_lightBuffer = m_context->createBuffer(RT_BUFFER_INPUT);
  m_lightBuffer->setFormat(RT_FORMAT_USER);
  m_lightBuffer->setElementSize(sizeof(Light));
  m_lightBuffer->setSize(1);
  m_context["lights"]->set( m_lightBuffer );
  
  m_context->setRayTypeCount(RayType::NUM_RAY_TYPES);
  m_context->setStackSize(ENABLE_PARTICIPATING_MEDIA ? 3000 : 1596);

  m_initialized = true;
  m_contextCompiled = false;

  initializeRenderMethod(renderMethod); // specific to each method
}

void OptixRenderer::initializeRenderMethod(RenderMethod::E renderMethod)
{
  // METHOD SPECIFIC
  Program generatorProgram = m_context->createProgramFromPTXFile( "LightPathGeneratorVCMDbg.cu.ptx", "generatorDbg" );
  Program exceptionProgram = m_context->createProgramFromPTXFile( "LightPathGeneratorVCMDbg.cu.ptx", "exception" );
  Program missProgram = m_context->createProgramFromPTXFile( "LightPathGeneratorVCMDbg.cu.ptx", "miss");
  m_context->setRayGenerationProgram(OptixEntryPointVCM::LIGHT_ESTIMATE_PASS, generatorProgram);
  m_context->setMissProgram(OptixEntryPointVCM::LIGHT_ESTIMATE_PASS, missProgram);
  m_context->setExceptionProgram(OptixEntryPointVCM::LIGHT_ESTIMATE_PASS, exceptionProgram);
}

// Called after OptixRenderer::initialize()
void OptixRenderer::initScene( IScene & scene )
{
  m_sceneRootGroup = scene.getSceneRootGroup(m_context, m_initializedRenderMethod);  // code for Cornell::getSceneRootGroup below
  m_context["sceneRootObject"]->set(m_sceneRootGroup);

  // Add the lights from the scene to the light buffer
  m_lightBuffer->setSize(lights.size());
  Light* lights_host = (Light*)m_lightBuffer->map();
  memcpy(lights_host, scene.getSceneLights().constData(), sizeof(Light)*lights.size());
  m_lightBuffer->unmap();
  m_contextCompiled = false;
}

optix::Group Cornell::getSceneRootGroup(optix::Context & context, RenderMethod::E renderMethod)
{
  // GeometryGroup built exactly as in the path_trace sample, except it is added
  // as a single child to a Group which is then returned (by the convention that
  // loaded scenes contain a root group and multiple child GeometryGroups)
  geometry_group->setAcceleration(context->createAcceleration("Trbvh", "Bvh"));

  optix::Group gro = context->createGroup();
  gro->setChildCount(1);
  gro->setChild(0, geometry_group);
  optix::Acceleration acceleration = context->createAcceleration("Trbvh", "Bvh");
  gro->setAcceleration(acceleration);
  return gro;
}

// called before rendering a frame if m_contextCompiled is false
void OptixRenderer::compile()
{
  m_context->validate();
  m_context->compile();
  m_contextCompiled = true;
}

void OptixRenderer::renderNextIteration(unsigned long long localIterationNumber, const RenderServerRenderRequestDetails & details)
{
  if( details.getWidth() != m_width || details.getHeight() != m_height)
  {
      this->resizeBuffers(details.getWidth(), details.getHeight());
  }
  
  // Omitted logging and branching to entry point for current render method
  // METHOD SPECIFIC
  m_context->launch( OptixEntryPointVCM::LIGHT_ESTIMATE_PASS,
    static_cast<unsigned int>(SUBPATH_LENGHT_ESTIMATE_LAUNCH_WIDTH),
    static_cast<unsigned int>(SUBPATH_LENGHT_ESTIMATE_LAUNCH_HEIGHT) );  
}

void OptixRenderer::resizeBuffers(unsigned int width, unsigned int height)
{
  m_outputBuffer->setSize( width, height );
  m_randomStatesBuffer->setSize(max(PHOTON_LAUNCH_WIDTH, (unsigned int)1280),
                                max(PHOTON_LAUNCH_HEIGHT, (unsigned int)768));
  // Calls cuda kernel and initializes curandStates in GPU memory,
  // commented out to eliminate as possible cause of memory corruption.
  // Using simple random numbers as in OptiX path_trace sample
  //initializeRandomStates();
  m_width = width;
  m_height = height;
}

By the way, this is the renderer that I’m trying to extend:

I guess I’ll try to revert to OptiX 3.0, which was used when developing this project, and see if that works better.
EDIT: Apparently it is not available for download. There seems to be a fresh 3.6 release, so I will try that.

voldemarz · June 13, 2014, 3:26am

Updating to OptiX 3.6 and CUDA 6 didn’t help. :( Still seeing the same issues.
The difference is that the rtPrintf issues now always result in the “format” exception, whereas previously there was a case where it hung (Example 1).

Also, I forgot to mention that the hangs occasionally resulted in a video driver error/crash (?) with this error message popping up on screen:

Nvidia OpenGL driver detected a problem with the display driver and is unable to continue. The application must close.
Error code 3.

Now it happens 90% of the time, instead of a hang followed by an OptiX exception. Once, when I did get an exception, I noticed an error code I hadn’t seen before:
CUDA_ERROR_LAUNCH_FAILED = 719

cuda.h says
/**
* An exception occurred on the device while executing a kernel. Common
* causes include dereferencing an invalid device pointer and accessing
* out of bounds shared memory. The context cannot be used, so it must
* be destroyed (and a new one should be created). All existing device
* memory allocations from this context are invalid and must be
* reconstructed if the program is to continue using CUDA.
*/

I fail to see what could cause such behavior based on the description above. When I generate random numbers the same way as the OptiX samples do, the only memory I use is the ray payload and local variables (e.g. no preinitialized states as for curand).

droettger · June 13, 2014, 7:27am

Are you doing accumulations inside that output buffer? Then it would need to be of type RT_BUFFER_INPUT_OUTPUT.
Avoid RT_FORMAT_FLOAT3 type buffers for output, they will be slower than RT_FORMAT_FLOAT4. See the performance notes inside the programming guide.

If you experience any issues related to acceleration structure builders, always try Bvh first.

Your loop doesn’t have an upper limit. Looking at the closestHitLightDbg(), if the Russian Roulette termination for a single thread is not terminating early enough you will hit the timeout.
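To illustrate the point about the missing upper limit, here is a host-side sketch of the same loop shape with a hard depth cap in addition to the Russian Roulette exit. It is only a model: the rtTrace call and the closest hit program are replaced by a counter, and tracePathBounded, continueProbability and maxDepth are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <random>

// Models one thread's trace loop: each iteration stands in for
// rtTrace plus the closest hit program. Returns the number of
// trace steps actually taken.
inline int tracePathBounded(std::mt19937& rng, float continueProbability, int maxDepth)
{
    assert(maxDepth > 0);
    std::uniform_real_distribution<float> uniform(0.0f, 1.0f);
    int depth = 0;
    for (int i = 0; i < maxDepth; ++i) {   // hard upper bound on the loop
        ++depth;                           // "trace" one segment
        if (uniform(rng) >= continueProbability) {
            break;                         // roulette says stop (done = 1)
        }
    }
    return depth;
}
```

With the cap in place, no single thread can exceed maxDepth iterations regardless of the random sequence, so a hang that persists afterwards cannot be blamed on an unlucky roulette streak.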

What happens when commenting out all rtPrintf() calls?

In either case, if you could provide a minimal reproducer in a failing state, I’d be able to file a bug report. Please don’t use file sharing sites for that, they are blocked. You can also contact the OptiX-Help e-mail you’ll find inside the release notes if there is confidential information involved.

voldemarz · June 13, 2014, 3:43pm

Thanks for looking into this.
I updated the driver to 337.88. Now it doesn’t crash with the OpenGL issue, it just hangs.

Yes, the path tracer and photon mapper were accumulating, but they worked anyway. In my case, with the generation program I posted, I’m not touching the output buffer at all yet, though it has been initialized.

I tried Bvh as well, same behavior.

It’s a simple kernel and I’m launching only 4 threads. TdrDelay is now set to 5, which should be plenty. A path usually bounces at most 10 times, and far fewer if the termination probability is set to 0.5. It also doesn’t work if I simply stop at the second hit by adding a depth check at the end of the anyhit program. If I set lightPrd.direction to the value computed by hemisphere sampling, it hangs, even though this value is then never used in the generation program to trace the next ray.

float3 dir;
dir = sampleHemisphereCosOptix(worldShadingNormal, bsdfSample);// doesn't work
dir = -ray.direction;                                          // works
lightPrd.direction = normalize(dir);

if (lightPrd.depth == 2)
{
  lightPrd.done = 1;
  return;
}

Still hangs. The issues with rtPrintf and prd.direction don’t affect each other. If I set prd.direction to -ray.direction, all rtPrintf examples (used one at a time) still behave as described in the comments of the generation program.

As I mentioned before, I couldn’t reproduce this in the path_trace sample (same scene) by changing the generation and hit programs to match my example. So something specific to the way the context is initialized in my case causes the issue. Weirdly, the only differences from the working solutions in this same project are the generation and hit programs, which are far simpler than the ones in the other solutions.

I’ll try to make a minimal sample.

voldemarz · June 15, 2014, 8:08pm

I made a minimal project reproducing the issue. It’s here in a branch “CotextInitTest” (yes, typo in the name of the branch).

There’s a ContextTest solution with a static library project, ContextInitializer, that initializes the context, and a simple console application project, ContextLauncher, that uses it. In the ContextInitializer project there’s also a readme.txt where I summarized the issues and linked to this thread, though I may have forgotten something mentioned here.

To compile, it requires the CUDA_PATH and OPTIX_PATH environment variables to be defined and pointing to the respective installation directories.

It’s kept together with the rest of the project, since I will use this minimal context initialization library within the main project to test if/when this gets fixed.

As a third issue in that readme.txt I mentioned that output from rtPrintf doesn’t show up if the program output is redirected to a file. I couldn’t find a solution to this online or on this forum.

Can you give any estimate of how soon I could at least learn what the problem is and whether there is a way to work around the hanging issues?