rtContextLaunch2D failed (719)

Paveway · January 23, 2017, 4:53pm

Hi,

I’m trying to calculate data for multiple meshes.
So I’ve created a program (with OpenGL output) which loads the data from mesh files (we’re only using STL files at the moment), which works fine for one mesh.
All data is calculated as expected.
But when I try to load a new mesh, I get this exception on the next call of my OptiX context’s launch:

OptiX Error: ‘Unknown error (Details: Function “_rtContextLaunch2D” caught exception: Encountered a CUDA error: driver().cuMemcpyDtoHAsync( dstHost, srcDevice, byteCount, hStream.get() ) returned (719): Launch failed)’

I’ve tried the following:

- Destroy the old geometry group when loading a new mesh: same exception when calling launch the next time.
- Destroy and recreate the whole context when loading a new mesh: also the same exception.

We’re using OptiX 4.0.2, and I have not found any clues as to what that exception really means or how to fix it.

Any suggestions?

Thx a lot

dlacewell · January 24, 2017, 7:20pm

So you get a crash when you load file #1, process it with an OptiX launch, then load file #2 and try to process it? Does the order matter? What if you load file #2 on its own, does that work fine or does it crash? I’m trying to narrow down if any two files crash, or there’s something specific about file #2 causing the crash.

The OptiX error you posted is a generic error that means the kernel crashed during the launch. You can try turning on all OptiX exceptions and installing a user exception program to see if OptiX throws an exception with a more helpful error message before crashing.

Paveway · January 25, 2017, 8:50am

Hi, thx for your reply.

The mesh doesn’t matter. On its own, every mesh I tried worked.

The program flow is as follows:

1. Create a context and a root group.
2. When loading the first mesh, we create a geometry group with a geometry instance as a child, and finally we add the geometry group as a child to the root group.
3. We call the launch function; everything works fine.
4. When switching to the next mesh, we remove the old geometry instance from the root group via the removeChild() function of our root group.
5. Jump to step 2.
6. Step 3 produces the mentioned exception.

Hope this helps with further investigation of our issue.

We’ll add a user exception program to check whether there is another error just before that exception gets thrown.

Kind regards

dlacewell · January 25, 2017, 11:17pm

I tested this sequence of operations with 4.0.2 on Linux and saw expected behavior, no crash.

If enabling exceptions also doesn’t shed any light on this, please send mail to optix-help and we’ll reply with instructions for collecting an OptiX API trace (it’s pretty easy). We can debug from the trace. Or if you have sample code that shows the crash and you don’t mind sharing, that’s even better.

Paveway · February 13, 2017, 9:27am

Hi dlacewell,

I’ve implemented an exception program and enabled device printing:

context->setPrintEnabled(true);
context->setPrintBufferSize(1024);

and also enabled all exceptions:

context->setExceptionEnabled(RT_EXCEPTION_ALL, true);

I still get no more error information than what was stated in the original post.
So I tried to force some exceptions by adding a user exception to our launch program:

#define TEST_EXCEPTION_0 (RT_EXCEPTION_USER + 0)

RT_PROGRAM void launch_program(void)
{
    rtThrow(TEST_EXCEPTION_0);
    ...
}

and the exception program works as expected:

RT_PROGRAM void exception(void)
{
	const unsigned int code = rtGetExceptionCode();

	if (code == RT_EXCEPTION_STACK_OVERFLOW)
		output_buffer[launch_index] = make_color(bad_color);
	else
	{
		rtPrintf("Caught exception 0x%X at launch index (%d,%d)\n", code, launch_index.x, launch_index.y);
		rtPrintExceptionDetails();
	}
}

I get the exception output in our console window as intended.
So I removed the throw and tested the program with a single mesh, which works great.
Then I switched to the next mesh:

void loadMesh(const std::string& filename)
{
	if (optixRootGroup->getChildCount())
	{
		auto child = optixRootGroup->getChild<GeometryGroup>(0);
		optixRootGroup->removeChild(0);
		child->destroy();
	}

	// ... loading the next stl like the first one did ...

	GeometryGroup geometry_group = context->createGeometryGroup();
	geometry_group = context->createGeometryGroup();
	geometry_group->addChild(mesh.geom_instance);

	// ... setting acceleration mode and so on goes here ...
}

But then the _rtContextLaunch2D failed exception pops up.
As I discovered in the meantime, this also happens with some of the meshes from our mesh pool when I load them first, but I don’t know if the internal error is the same there, as I again only get the rtContextLaunch2D error.

I’m working on:
Windows 10 Pro, 64 bit
Intel Xeon E3-1245
32 GB RAM
Quadro M2000 using driver version 376.62
CUDA 8 SDK
OptiX 4.0.2 SDK
VS2015

I’m building a 64-bit console application running OpenGL for visualization.

Thx for your help.

droettger · February 13, 2017, 1:39pm

Not sure if it helps, but note that the OptiX C++ wrapper functions are not reference counting the OptiX objects in the scene hierarchy, they just wrap the C-API calls.

That means destroying the root Group node’s child is not automatically destroying the whole sub-tree below that GeometryGroup child as well! GeometryInstance, Material, Geometry, and Acceleration structure are still there afterwards, but possibly orphaned, if you didn’t track them otherwise.
Your algorithm is generating memory leaks. You need to track and destroy these resources individually. Try to fix that first and see if it solves the problem.

If I interpret your code snippets correctly, you have the following scene structure:
Group (Acceleration) → GeometryGroup (Acceleration) → GeometryInstance (Material) → Geometry.

First, if there isn’t anything else under the root Group node, there is no need for that node at all. The GeometryGroup could be the root then.

In your code block you delete the top-level GeometryGroup under the root node, create a new one and assign the mesh GeometryInstance. That is more work than necessary. Your initial description contained the better algorithm.
There would be no need to destroy and re-create that top-level GeometryGroup itself. You could exchange only the sub-tree starting at the GeometryInstance instead.
Assuming there is only one GeometryInstance you could simply call gg->setChild(0, mesh.geom_instance) and then call markDirty() on the existing Acceleration nodes at the GeometryGroup and root Group to rebuild the acceleration structures from the exchanged Geometry.
Try that only after cleanly destroying the previously attached sub-tree nodes.
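
A minimal sketch of the exchange described above: swap child 0 of the existing GeometryGroup, destroy the previously attached sub-tree node by node, and mark both acceleration structures dirty. The helpers are templates that use only getChild(), setChild(), getAcceleration(), getGeometry(), getMaterialCount(), getMaterial(), markDirty() and destroy(). The stub types and the names liveObjects, destroyInstanceSubtree and swapMesh are hypothetical stand-ins invented for this illustration so it builds without the SDK; they are not part of OptiX. The sketch also assumes that no other node shares the old Geometry or Materials.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// ---- Stand-in types (hypothetical, NOT the OptiX API) ----
// They imitate only the wrapper methods used below. liveObjects counts
// objects that were created but not yet destroyed, i.e. potential orphans.
static int liveObjects = 0;

struct NodeStub {
    bool destroyed = false;
    NodeStub() { ++liveObjects; }
    void destroy() { if (!destroyed) { destroyed = true; --liveObjects; } }
};
struct AccelerationStub : NodeStub {
    bool dirty = false;
    void markDirty() { dirty = true; }
};
struct GeometryStub : NodeStub {};
struct MaterialStub : NodeStub {};
struct InstanceStub : NodeStub {
    std::shared_ptr<GeometryStub> geometry = std::make_shared<GeometryStub>();
    std::vector<std::shared_ptr<MaterialStub>> materials{std::make_shared<MaterialStub>()};
    std::shared_ptr<GeometryStub> getGeometry() const { return geometry; }
    unsigned getMaterialCount() const { return static_cast<unsigned>(materials.size()); }
    std::shared_ptr<MaterialStub> getMaterial(unsigned i) const { return materials[i]; }
};
struct GroupStub : NodeStub {
    std::shared_ptr<AccelerationStub> acceleration = std::make_shared<AccelerationStub>();
    std::vector<std::shared_ptr<InstanceStub>> children;
    std::shared_ptr<AccelerationStub> getAcceleration() const { return acceleration; }
    std::shared_ptr<InstanceStub> getChild(unsigned i) const { return children[i]; }
    void setChild(unsigned i, std::shared_ptr<InstanceStub> c) { children[i] = c; }
};

// Destroy one GeometryInstance together with its Materials and Geometry.
// Assumption: none of these objects is referenced elsewhere in the scene.
template <typename InstanceHandle>
void destroyInstanceSubtree(InstanceHandle gi) {
    for (unsigned m = 0; m < gi->getMaterialCount(); ++m)
        gi->getMaterial(m)->destroy();
    gi->getGeometry()->destroy();
    gi->destroy();
}

// Exchange child 0 of an existing GeometryGroup, clean up the old sub-tree,
// and request a rebuild of the acceleration structures above it.
template <typename RootHandle, typename GroupHandle, typename InstanceHandle>
void swapMesh(RootHandle root, GroupHandle gg, InstanceHandle newInstance) {
    auto oldInstance = gg->getChild(0);
    gg->setChild(0, newInstance);          // detach the old instance first
    destroyInstanceSubtree(oldInstance);   // then destroy it node by node
    gg->getAcceleration()->markDirty();
    root->getAcceleration()->markDirty();
}
```

In a real application the same two helpers would presumably be called with the wrapper handles for the root Group, the GeometryGroup and the new GeometryInstance. Any buffers or programs attached only to the old Geometry would also have to be tracked and destroyed separately, which this sketch does not cover.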

If that also happens when completely destroying and recreating the OptiX context, which would clean up all orphaned OptiX objects, this sounds more like a CUDA Driver issue. In that case trying different display drivers would be recommended.

There is not much more to analyze with the given information. If you’re able to capture this behaviour with an OptiX API Capture trace (please search the OptiX forum for how to generate that), we could try to reproduce your case in-house.

dlacewell · February 13, 2017, 4:44pm

If following Detlef’s advice you’re still not able to find the root cause of the crash, can you create a complete sample that demonstrates this behavior, that you can package up and send to optix-help? It sounds like two generic meshes, e.g., cow and teapot, should show the same behavior; it’s not specific to any proprietary scene data, correct?

If you don’t want to share code, can you modify an SDK sample like optixMeshViewer or optixSphere and get the same crash? That’s the test I did earlier, and it wasn’t crashing for me. Maybe we can work toward the middle – start with your code that crashes and a modified SDK sample that does not, and figure out the significant difference.