optiXTutorial 11 - remove (free)GLUT

Hi People,

I have a question regarding the ‘optixTutorial.cpp’ file. I’d like to comment out the GLUT part,
as I don’t need a picture to be rendered and shown. I only want OptiX to compute
values for me.

My question is whether it is possible to remove the GLUT part completely, because I have the following problem:

  1. The modified createContext() and createObject() functions do not cause any problems, however

  2. When I start the OptiX calculation for my desired context by calling

    context->launch(0, width, height, depth);

    it crashes with

    (i) optix::shared::AssertionFailure at memory location 0x000000000029E830
    (ii) optix::Exception at memory location 0x000000000029F740

Kind Regards
Robert

Hi Robert, the “optixConsole” sample does not depend on GLUT/GL and might be a better starting point for you.

Hi dlacewell,

thank you very much for your answer.

May I ask you or the other forum members a few more questions?

  1. How exactly does OptiX determine whether a face is intersected by a ray?
    1.1 What does the user have to define in the first place?
    I mean, e.g., you are given 3 vertices (float3) in space and the normal vector of the
    triangle face. I had a look at the SDK examples, especially the intersection programs, but
    I couldn’t find any part in the code which calculates whether a certain object was hit or not.

  2. I have a question about the following code line: shading_normal = geometric_normal = n;
    Usually in C you cannot have two assignments in just one statement, right?
    I suppose in OptiX the rightmost operand is assigned to all variables on its left-hand side, right?

  3. In the recommended optixConsole example (with parallelogram.cu),
    the dot product of the normal vector of the parallelogram and the ‘anchor’ of its
    face is calculated and assigned to the variable ‘plane’. How can this be interpreted? I mean, it is not the area of the face, is it?

  4. In parallelogram.cu this variable is then used in the line
    float t = (plane.w - dot(n, ray.origin)) / dt;
    What is being calculated here?

Last but not least:

  5. float3 t0 = (boxmin - ray.origin) / ray.direction;

What does the ‘/’ operator do in this context? It cannot be a usual division, can it?

Kind Regards
Robert

  1. That’s the cool thing about OptiX, you control that!

1.1 You provide a bounding box program and an intersection program per primitive type which are assigned to the Geometry node.
Your bounding box program is used by OptiX to build a bounding volume hierarchy (BVH) which is used to speed up intersection tests in a scene.
That hierarchy is traversed by OptiX when calling rtTrace and if a leaf BVH node is hit, your intersection program is called to determine if a primitive is hit, whatever that primitive might be.
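A minimal host-side sketch of that wiring (assuming an existing optix::Context named context; the file and program names are just placeholders):

optix::Geometry geometry = context->createGeometry();
geometry->setPrimitiveCount( 1u );  // number of primitives this Geometry node holds
// Both programs are compiled from your .cu file to PTX beforehand:
geometry->setBoundingBoxProgram(  context->createProgramFromPTXFile( "my_primitive.ptx", "bounds"    ) );
geometry->setIntersectionProgram( context->createProgramFromPTXFile( "my_primitive.ptx", "intersect" ) );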

“I couldn’t find any part in the code which calculates whether a certain object was hit or not.”
Look again. You already found the intersection program. The actual intersection calculation always happens before the rtPotentialIntersection() call.
For example, for triangles that happens inside the provided intersect_triangle() function, which you can find in the header OptiX SDK 4.0.2\include\optixu\optixu_math_namespace.h by doing a “Find in Files” over all SDK sources and headers.
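For illustration, here is a minimal sketch of a triangle intersection program built around intersect_triangle(). This is not the SDK’s code; the vertex_buffer layout (three consecutive vertices per triangle) and the program name are just assumptions for this example:

#include <optix_world.h>
using namespace optix;

rtBuffer<float3> vertex_buffer;   // three consecutive vertices per triangle (example layout)
rtDeclareVariable(optix::Ray, ray, rtCurrentRay, );
rtDeclareVariable(float3, geometric_normal, attribute geometric_normal, );
rtDeclareVariable(float3, shading_normal,   attribute shading_normal, );

RT_PROGRAM void intersect( int primIdx )
{
  const float3 p0 = vertex_buffer[primIdx * 3 + 0];
  const float3 p1 = vertex_buffer[primIdx * 3 + 1];
  const float3 p2 = vertex_buffer[primIdx * 3 + 2];

  float3 n;               // unnormalized geometric normal, filled by intersect_triangle()
  float  t, beta, gamma;  // ray parameter and barycentric coordinates of the hit
  if ( intersect_triangle( ray, p0, p1, p2, n, t, beta, gamma ) )  // the actual hit calculation
  {
    if ( rtPotentialIntersection( t ) )  // is t inside the current valid [tmin, tmax] interval?
    {
      geometric_normal = normalize( n ); // attributes handed over to the any/closest hit programs
      shading_normal   = geometric_normal;
      rtReportIntersection( 0 );         // material index 0
    }
  }
}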

  2. Of course you can have multiple assignment operators in one statement. The assignment operator associates right to left.
    There is a nice operator precedence and associativity table on Microsoft Docs.

  3. and 4. Please have a look at the point-normal form of the plane equation, for example here: [url]https://en.wikipedia.org/wiki/Plane_(geometry)[/url]
    The “d” is the projection of the anchor vector onto the normal direction and is used to determine distances to the plane (see the short sketch after this list).
    Computer graphics programming requires some serious skills in geometry and linear algebra!
  5. Most operators are overloaded for the vector types (float2, float3, float4, and such).
    The division float3 / float3 in this case happens on all three components .xyz individually.
    A similar question was answered here: [url]https://devtalk.nvidia.com/default/topic/976822/?comment=5020278[/url]
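To tie points 3 and 4 together, here is the same math spelled out as a small sketch (names follow parallelogram.cu, but this is not the SDK code itself):

#include <optix_world.h>
using namespace optix;

// parallelogram.cu stores d = dot(n, anchor) in plane.w: the plane equation is dot(n, x) == d
// for every point x on the plane. It is a distance term, not the area of the face.
static __device__ float ray_plane_t( const float3& n, const float3& anchor, const optix::Ray& ray )
{
  const float d  = dot( n, anchor );         // what ends up in plane.w
  const float dt = dot( n, ray.direction );
  // Insert x = ray.origin + t * ray.direction into dot(n, x) == d and solve for t:
  return ( d - dot( n, ray.origin ) ) / dt;  // caller must reject dt == 0 (ray parallel to the plane)
}

The float3 / float3 division in the box test is the same kind of component-wise overload, i.e. three scalar divisions (t0.x = (boxmin.x - ray.origin.x) / ray.direction.x, and so on).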

Wow, thank you very much for your detailed explanations!!!

Your answer to point 1. opened a new question.

  1. In your *.cpp file you set up all CUDA variables etc.
  2. You launch the calculations by calling ‘context->launch([…]);’
  3. When I add some debugging printouts with rtPrintf(), different RT_PROGRAMs are called
    depending on whether the ray hits the object or not.

Now my question: How does OptiX know that an object has been hit before the user has checked
this fact with his own user-defined intersection test?

What does the program flow of an OptiX launch look like?
Is the following correct?

  1. context->launch() => 2. intersect_program(), resp. 2. miss_program() if the ray didn’t hit anything
    => 3. … => 4. …???

Another question: When I define a ray (camera) that does not hit my object,
the program ‘void miss()’ is called, but not just once; it is called many, many times. Why?
There are only one or two rays defined in this example.

  1. As I understood your answer, I have to write the code which calculates the intersection points
    and/or finds the nearest intersection point myself.

But in this case I don’t understand why OptiX has these programs called
(i) Any_Hit
(ii) Closest_Hit if I have to write down these algorithms myself. And again,
how does OptiX know which of those programs is going to be called?

And another question: when I have, e.g., two rtPrintf() calls directly one after another, like

rtPrintf("String_1");
rtPrintf("String_2");

Why is the output not printed in program order, but grouped like

“String_1”
“String_1”
“String_1”

“String_2”
“String_2”
“String_2”
?

With this I encountered another thing. When calling the launch function in the
optixPrimitiveIndexOffsets example I get more rtPrintf output than cmd.exe can show,
even though only the single ray described above is set up.

Additionally, when I disable this printf and enable the printf in the any hit program in
the file “phong.cu”, cmd.exe doesn’t show anything. Why is that? I mean, in the intersection program
the correct hit point was found according to my setup.

So the any hit program should be called afterwards, shouldn’t it?

Kind Regards
Robert

That’s explained inside the OptiX Programming Guide!
Chapter 2.2. Programs explains when which program is called by OptiX.
The order in which program domains are invoked depends on the BVH traversal and intersection test results.

Also see Chapter 3.4.2 Material.
The closest hit and any hit programs are attached to Material nodes per ray type. You define which programs are used via the C-API functions rtMaterialSetClosestHitProgram() and rtMaterialSetAnyHitProgram(), resp. the C++ wrappers setClosestHitProgram() and setAnyHitProgram() in the optix::Material class.
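A minimal host-side sketch of that assignment (assuming ray type 0 is the radiance ray, ray type 1 the shadow ray, and that the named programs exist in your phong.cu):

optix::Material material = context->createMaterial();
material->setClosestHitProgram( 0u, context->createProgramFromPTXFile( "phong.ptx", "closest_hit_radiance" ) );
material->setAnyHitProgram(     1u, context->createProgramFromPTXFile( "phong.ptx", "any_hit_shadow" ) );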

While OptiX uses a single ray programming model, rays inside the ray generation program are launched in parallel. The GPU has thousands of threads running simultaneously.

That’s also the reason why your rtPrintf results are not sequential.
If you want to know which launch index printed what, you would need to print the launch index as well and sort later, or, much easier, simply limit the printing to only one launch index, which helps a lot with debugging.

You can find code showing how to do that here:
https://devtalk.nvidia.com/default/topic/899658/?comment=4740108
https://devtalk.nvidia.com/default/topic/872080/?comment=4656683
https://devtalk.nvidia.com/default/topic/936762/?comment=4882450
https://devtalk.nvidia.com/default/topic/973192/?comment=5004330
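To summarize the approach from those links, a minimal host-side sketch (the index values are just examples; pick a launch index that actually covers your geometry):

context->setPrintEnabled( true );         // enable rtPrintf output at all
context->setPrintBufferSize( 4096 );      // bytes reserved for the print output
context->setPrintLaunchIndex( 256, 256 ); // only launch index (256, 256) prints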

The forum has a search function on the top right of this page.
Search for something, then press the “Show” link on the results page to limit your search to the OptiX forum.

OptiX uses an acceleration structure to figure out if the ray hits a bounding box for your object. For more on this topic, consult a textbook such as Realistic Ray Tracing, by Shirley and Morley.

Some of your other questions imply that you may not have much experience with parallel programming. The free “Introduction to Parallel Programming With CUDA” online course given by David Luebke is a good resource, but there are others.

Thank you very much for your answers!!!

@dlacewell Yeah, that’s right, I have never dealt with parallel computing until now, so thank you very much
for your advice!

@Detlef
Thank you very much for your reply to the question about rtTrace(); I think being able to bind
the output to a specific thread (launch index) is, indeed, a powerful means.
But, in fact, I still have an issue regarding this function.

Again I’m talking about the optixPrimitiveIndexOffsets.cpp example.
I have three different test cases: (i) setPrintLaunchIndex(0, 0);
(ii) setPrintLaunchIndex(0);
and (iii) the same line simply commented out.

In cases (i) and (ii) nothing is printed at all. In the third case
the rtPrintf prints the values:
launch_index.x == 0 (usually always)
launch_index.y == -0.3… i.e. values in ]-1; 0[, most of the time around -0.3.

This leads to the problem that I cannot filter down to just one precise launch index.

  1. Question: Is there a workaround for this?
  2. Why can one define 2- and higher-dimensional launch indices?

Usually one index represents one thread on a GPU core, so shouldn’t it just be interpreted as
an unsigned int value?

B: struct Ray_Payload{};

Ray_Payload payload;

How can I access payload.any_member from the closest hit program for example?

Kind Regards
Robert

Try this pattern to debug a single launch index:

#include <stdio.h>
...
rtDeclareVariable(uint2,         launch_index, rtLaunchIndex, );
...
if ( launch_index.x == 100 && launch_index.y == 100 ) {
    printf("hello from launch index %d %d\n", launch_index.x, launch_index.y);
}

For the question about how to access a member of the payload in a closest hit program, this is done throughout the samples. For example, optixSpherePP uses the closest_hit_radiance() program in file “normal_shader.cu”, which writes to the “result” field of the payload.

Sorry, but it’s unclear how you changed the optixPrimitiveIndexOffsets.cpp to contain the setPrintLaunchIndex() calls, or where you printed what information with which formatting inside the device code to arrive at that output.
Maybe nothing is printed because the launch index (0, 0) isn’t reaching your rtPrintf instruction, for example because it’s in a closest hit program and the ray missed.
The forum supports code blocks. Please provide the relevant code changes with enough context to be able to answer.

Accessing the current per ray payload you set on an rtTrace call in other domains is possible via the user defined variable name with the rtPayload semantic.
The OptiX Programming Guide chapter 4.1.3. Internally Provided Semantics explains which semantic variable can be accessed where.
The chapters 4.5.4. Example Closest Hit Program, 4.6.4. Example Any Hit Program, and 4.7.2. Example Miss Program explain rtPayload accesses.
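A minimal device-side sketch of such an rtPayload access (struct, member, and program names are just examples; in practice the struct definition lives in a header shared by all .cu files which use it):

#include <optix_world.h>
using namespace optix;

struct PerRayData            // normally defined once in a shared header
{
  float4 result;
};

// rtPayload binds "prd" to the payload the caller passed to rtTrace() for the current ray:
rtDeclareVariable(PerRayData, prd, rtPayload, );

RT_PROGRAM void closest_hit_example()
{
  prd.result = make_float4( 1.0f, 2.0f, 3.0f, 4.0f );  // visible to the ray generation program after rtTrace() returns
}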

Hello again,

rtPrintf() is working fine now. I think the problem with the incorrect values shown for the
2-dimensional launch_index probably came from the wrong formatting: I used %f for launch_index.x
and launch_index.y, which are unsigned integers. Still, I had expected that it would work anyway.

Still, can you please tell me why one can define 2- or 3-dimensional launch indices?
I mean, usually a 1-dimensional index with a uintXX_t would just do it, wouldn’t it?

Also, I can now access the ray payload. I’m just wondering why you have to declare
the payload struct several times (meaning in each *.cu file) in the same way. I would have expected that the compiler couldn’t associate the ‘different’ structs properly.

Now I just have one question left. I’m trying to write something to the output buffer, and
I think in principle I did it (almost) the same way as in the optixBuffersOfBuffers.cpp project.

  1. I declared two global variables in optixPrimitiveIndexOffsets.cpp:
Buffer    Out_Buffer_Hit_Points_Buffer;
Variable  Hit_Points_Variable;
  2. In the function ‘createScene([…])’ I added the following lines in order to link the CUDA world
    with the host world. The width and height variables are the same as in the original SDK sample.
Hit_Points_Variable = context["output_buffer_Hit_Points_float4_dim2"];
Out_Buffer_Hit_Points_Buffer = context->createBuffer(RT_BUFFER_OUTPUT, RT_FORMAT_FLOAT4, width, height);
Hit_Points_Variable->set(Out_Buffer_Hit_Points_Buffer);
  3. Here is the code of the edited phong.cu file:
rtBuffer<float4, 2> output_buffer_Hit_Points_float4_dim2;  // declared globally at the top of this file
// I added the following lines to the closest_hit_radiance() program
uint2 test = make_uint2(0, 0);
output_buffer_Hit_Points_float4_dim2[test] = make_float4( 0.0f, 0.0f, 0.0f, 0.0f );

Unfortunately, when trying to run this code I get the following error message when the debugger
reaches the context->launch(0, width, height); instruction.

Exception thrown at 0x00000000774FF401 (ntdll.dll) in optixHello.exe: 0xC0000005:
Access violation reading location 0x0000000006CF2540.

Could you please help with this issue?

Kind Regards
Robert

>>Still, can you please tell me why one can define 2- or 3-dimensional launch indices?
I mean, usually a 1-dimensional index with a uintXX_t would just do it, wouldn’t it?<<

Because there are 1D, 2D, and 3D buffers, and allowing developers to use indices of the same dimension is more natural and efficient than calculating everything as a linear index. You’re free to use whatever you like; a sketch of the manual flattening follows below.
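If you prefer a 1D buffer anyway, flattening the 2D launch index manually is straightforward. A small device-side sketch (the buffer and program names are just examples):

#include <optix_world.h>
using namespace optix;

rtBuffer<float4, 1> output_buffer_1d;                     // 1D buffer, indexed linearly
rtDeclareVariable(uint2, launch_index, rtLaunchIndex, );  // 2D launch
rtDeclareVariable(uint2, launch_dim,   rtLaunchDim, );

RT_PROGRAM void ray_generation_example()
{
  const unsigned int linear_index = launch_index.y * launch_dim.x + launch_index.x;
  output_buffer_1d[linear_index] = make_float4( 0.0f );   // one element per launch index
}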

>>Also, I can now access the ray payload. I’m just wondering why you have to declare
the payload struct several times (meaning in each *.cu file) in the same way. I would have expected that the compiler couldn’t associate the ‘different’ structs properly.<<

Simply put your per ray payload structure definition into a header and include that where needed to make sure the definition matches across files.
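For example, such a shared header could look like this (the file, struct, and member names are just placeholders):

// per_ray_data.h -- include this in pinhole_camera.cu, phong.cu, and every other file using the payload.
#pragma once
#include <optixu/optixu_vector_types.h>

struct PerRayData_radiance
{
  optix::float4 result;  // example member; add whatever your rays need to carry
};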

>>Exception thrown at 0x00000000774FF401 (ntdll.dll) in optixHello.exe: 0xC0000005:
Access violation reading location 0x0000000006CF2540.<<

Sorry, you get a crash in optixHello after changing code in optixPrimitiveIndexOffsets.cpp, the shared createScene(), and phong.cu, none of which are used inside the optixHello example, which is C-API based? Did you rename the app?

What is your system configuration:
OS version, installed GPU(s), display driver version, OptiX SDK version, CUDA toolkit version used to translate the CUDA code to PTX?

Hi Detlef,

first of all, thank you for those explanations.

  1. Yes, indeed, I’m using the original project ‘optixPrimitiveIndexOffsets.cpp’ and made all the changes as explained. Right, optixHello.exe is confusing in that sense, because I compiled it under this name via CMake.

I’m using Windows 7 Enterprise, NVIDIA GeForce GTX 745, OptiX SDK 4.0.2, CUDA v8.0.

Driver Version: 21.21.13.7633

In the end, if it works: can I, in principle, access the data of the output buffer by mapping
the buffer to a pointer like

float4 *ptr_float4 = reinterpret_cast<float4*>( Out_Buffer_Hit_Points_Buffer->map() );

?

Kind Regards
Robert

That’s not what you would normally do. Wrong place.
Instead add a float4 member to your per ray data payload structure and write that member inside the closest hit program to be able to communicate that per ray result back to the ray generation program, which then finally writes the data to your new output buffer once per launch index.
That means the code above belongs into the ray generation program.
Your implementation would write the data on every closest hit and the final result would not be deterministic when using multiple values or when there are only misses.
Again, think parallel programming here.

Yes, map(), read, and unmap() an output buffer to access its data, as shown in all examples.
Mind that the mapped pointer is only valid until unmap() and you need to unmap() before the next launch().
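A sketch of that pattern inside the ray generation program (camera math omitted; buffer, payload, and program names are just examples following this thread, not SDK code):

#include <optix_world.h>
using namespace optix;

struct PerRayData_radiance { float4 result; };  // normally from the shared per-ray-data header

rtBuffer<float4, 2> output_buffer_Hit_Points_float4_dim2;
rtDeclareVariable(uint2,    launch_index, rtLaunchIndex, );
rtDeclareVariable(rtObject, top_object, , );

RT_PROGRAM void ray_generation_sketch()
{
  // Replace these two lines with the real pinhole camera ray setup:
  const float3 origin    = make_float3( 0.0f );
  const float3 direction = make_float3( 0.0f, 0.0f, -1.0f );

  optix::Ray ray = make_Ray( origin, direction, 0u, 0.0f, RT_DEFAULT_MAX );

  PerRayData_radiance prd;
  prd.result = make_float4( 0.0f );  // default value in case of a miss

  rtTrace( top_object, ray, prd );   // the closest hit program fills prd.result

  // Exactly one write per launch index; deterministic, unlike writing inside a hit program:
  output_buffer_Hit_Points_float4_dim2[launch_index] = prd.result;
}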

Hi Detlef,

now it is actually working fine apart from one thing: I can’t really access the buffer contents.
The content behind the float4* pointer is just 0 when using

context->createBuffer(RT_BUFFER_OUTPUT, RT_FORMAT_FLOAT4, width, height);

When using

context->createBuffer(RT_BUFFER_OUTPUT, RT_FORMAT_FLOAT4, sizeof(float4), sizeof(float4));

I get some weird content when reading this variable.

What basically does the user have to consider besides this map/unmap call?
For a first test I would say even the unmap call is not necessary, as the launch function
is only called once for now.

  1. I think the correct width and height have to be set, right? Otherwise the
    assignment to the float4* pointer will be wrong (e.g. it will start at a wrong location
    in memory), which means the output buffer array cannot be read correctly, right?

  2. Do I have to use the markDirty() function? I read something about it and wanted to try it out,
    but I get an error saying that, first, the function
    setDevicePointer() or getDevicePointer() has to be called.
    Unfortunately, I couldn’t figure out how to obtain the optix_device_number which
    getDevicePointer() requires.

  3. A colleague of mine working with CUDA told me it is perhaps possible to copy the data
    from the buffer to the host with the cuda_runtime.h library as well.
    Actually, I would like to avoid this, especially because it should also work with OptiX alone,
    but if it only works like this, then that’s fine as well.

  4. By the way, I added the code line

context["output_buffer_Hit_Points_float4_dim2"]->setBuffer( Out_Buffer_Hit_Points_Buffer );
    In sum, the CUDA world is actually working correctly, but after the

context->launch([...]);

    call my variables, especially my float4 *pt_to_linked_OUTPUT_Buffer_in_OptiX_World, just show zeros.
    So I think something is missing to define this link (device<->host) correctly.

  5. In principle it should be possible to loop through all elements with the float4* pointer,
    shouldn’t it? So if I increment the pointer by 1 in each loop iteration I should get the whole buffer content?

Have you got any ideas?

Many Thanks!

Kind Regards
Robert

You mean the pointer is nullptr or the data behind it is 0.0f?
In the latter case, you wrote make_float4(0.0f) in your given code, then 0.0f would be the correct result.

That’s the same as context->createBuffer(RT_BUFFER_OUTPUT, RT_FORMAT_FLOAT4, 16, 16); and makes no sense when all access is via width and height which are possibly different.

Do not access the pointer outside the map/unmap pair. Do not access the memory out of bounds. Nothing more. It’s that simple.

  1. Correct.

  2. No, that is only for buffer updates through interoperability with CUDA when OptiX cannot track if a buffer has changed.
    You have to use map() and unmap() which will always notify OptiX of potential buffer changes and trigger the required upload.
    That’s also one reason why you need the unmap() before a launch().

  3. You really don’t need CUDA interoperability for this.
    Since OptiX is using CUDA internally, the map() and unmap() functions are actually using such CUDA mechanisms internally already.

  4. Impossible to say without the complete code.

  5. Exactly. It’s that simple!
    You should be able to get this worked out by looking at all the existing examples which fill an output buffer.

Hi Detlef,

I’m sorry but it is still not working.

  1. No, just to test it out I did the following:

1.1. In order to have values different from the default zeros I added the following line
in ‘pinhole_camera.cu’:

prd.test_output_var_float_4 = make_float4( 0.0f, 2.0f, 2.0f, 0.0f );

1.2. Then, BEFORE the rtTrace([…]); call, I added an rtPrintf call => cmd.exe shows the right values:

rtPrintf("Function RT_PROGRAM void pinhole_camera() before rtTrace has been called: \n  %f \n %f \n %f \n %f : \n ", prd.test_output_var_float_4.x, prd.test_output_var_float_4.y, prd.test_output_var_float_4.z, prd.test_output_var_float_4.w);

1.3 The same rtPrintf as in 1.2 I added AFTER the rtTrace call.
What happens in between is that OptiX calls
‘RT_PROGRAM void closest_hit_radiance()’ in phong.cu.

In that file I added the following line in order to see whether the value is handed back to pinhole_camera.cu
correctly, and, as described in 1.3, the value (1, 2, 3, 4) is printed correctly in cmd.exe.

prd.test_output_var_float_4 = make_float4(1.0f, 2.0f, 3.0f, 4.0f);

1.4 Then I added the following line to write those values into the output buffer
and tested whether the correct values had been written into the buffer by calling
rtPrintf again, and they were (1, 2, 3, 4).

output_buffer_Hit_Points_float4_dim2[launch_index] = prd.test_output_var_float_4;

So far so good, but now I’m getting nowhere with reading those float4 values back into my *.cpp file.
I even tried the CUDA interop functionality today, but it showed the same behaviour.

  2. Global definitions at the top of the *.cpp file:
Variable		 st_OUTPUT_rtBuffer_Variable;
Buffer			 st_OUTPUT_rtBuffer_Buffer;
float4			*pt_DATA_OUTPUT_rtBuffer_float4;

2.1 Following the SDK sample, I created the buffer linkage right after the context
has been created in createContext():

st_OUTPUT_rtBuffer_Variable = context["output_buffer_Hit_Points_float4_dim2"];
// Here I tried several things -- nothing led to a solution 
// Now with | RT_BUFFER_GPU_LOCAL
st_OUTPUT_rtBuffer_Buffer = context->createBuffer(RT_BUFFER_INPUT_OUTPUT, RT_FORMAT_FLOAT4, width, height);
pt_DATA_OUTPUT_rtBuffer_float4	= reinterpret_cast<float4*>(st_OUTPUT_rtBuffer_Buffer->map());
context["output_buffer_Hit_Points_float4_dim2"]->set(st_OUTPUT_rtBuffer_Buffer);
st_OUTPUT_rtBuffer_Variable->set(st_OUTPUT_rtBuffer_Buffer);
st_OUTPUT_rtBuffer_Buffer->unmap();

2.2. Finally, AFTER the context->launch([…]); call I added the following debug code
in order to see whether I can read my (1, 2, 3, 4) anywhere, but I can’t.

int     tryout(0);
float4  laufvariable_float4;
double  float_sizeof_d;

float_sizeof_d = sizeof(float);

for (tryout = 0; tryout <= width + height; tryout++)
{
    laufvariable_float4 = *pt_DATA_OUTPUT_rtBuffer_float4;

    pt_DATA_OUTPUT_rtBuffer_float4 = pt_DATA_OUTPUT_rtBuffer_float4 + 1;
}

According to everything I read in this forum (old posts) and on the internet,
this code should actually work.

Have you got any idea what’s happening here, or whether there is a structural error?

Kind Regards
Robert

Read my post again!

Do NOT access the pointer outside the map/unmap pair. Do not access the memory out of bounds.

That’s exactly what you do incorrectly! You get a pointer to the buffer with map() then unmap() the buffer. Now your pointer is invalid. Don’t use it anymore! Even crashes are possible if you do.
Put the map/unmap around the code where you read the buffer.
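To make that concrete, a host-side sketch using the names from your posts (assuming the buffer was created with width x height float4 elements):

#include <optixu/optixpp_namespace.h>
#include <optixu/optixu_math_namespace.h>
using namespace optix;

void readHitPoints( Context context, Buffer buffer, unsigned int width, unsigned int height )
{
  context->launch( 0, width, height );

  // map() AFTER the launch, read, then unmap(); never keep or advance the pointer beyond this block.
  const float4* data = static_cast<const float4*>( buffer->map() );
  for ( unsigned int i = 0; i < width * height; ++i )  // width * height elements, not width + height
  {
    const float4 v = data[i];
    // ... consume v here (print it, copy it into your own array, etc.) ...
    (void) v;  // silences "unused variable" warnings in this sketch
  }
  buffer->unmap();  // the mapped pointer is invalid from here on; also required before the next launch()
}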

Hi Detlef,

many thanks. Yeah this problem was, in the end, very easy to resolve.

Now another question came up. Intersection programs are called internally by OptiX itself,
for example with the signature

intersection_program( int primIdx ) {}

In my code I’m getting an exception with exception code 0x3FD. As I don’t know which of my
rtBuffers is accessed out of bounds, I’d like to print the related

primIdx

to make sure everything is OK with those values. As it is not possible to use
user-defined attribute semantics in an exception program, and as it is not
possible to access the current ray and therefore its payload, I have no idea how to
access the current primIdx in an exception program.

  1. Do you have any idea?

  2. How do you set the interval of primIdx? primIdx should be an element of [0; 5[.

  3. A question regarding this topic as well:

A: mesh->setPrimitiveCount( uint noe );
B: mesh->setPrimitiveIndexOffset( uint offset );

To A: e.g. if I have 2 triangles defined by 4 vertices, do I have to set noe = 2 or noe = 3?
To B: the offset should be 3, as triangles consist of three vertices, right?

Kind Regards,
Robert

If you want to add an assertion to check that conditions are what you expect them to be, you can use the code from one of the links I posted in this thread before. It shows how to implement your own assertion function which uses rtThrow to trigger a user exception with a code you can use to identify what went wrong. Here it is again: [url]https://devtalk.nvidia.com/default/topic/973192/?comment=5004330[/url]
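As a minimal sketch of that idea (the exception code, condition, and program names are just examples):

#include <optix_world.h>
using namespace optix;

#define EXCEPTION_PRIMIDX_RANGE (RT_EXCEPTION_USER + 0)  // your own exception code

RT_PROGRAM void robust_intersect( int primIdx )
{
  if ( primIdx < 0 || primIdx >= 2 )      // whatever condition you want to assert
    rtThrow( EXCEPTION_PRIMIDX_RANGE );
  // ... the normal intersection code follows here ...
}

// Exception program, set with context->setExceptionProgram() and enabled via
// context->setExceptionEnabled(RT_EXCEPTION_USER, true) on the host side:
RT_PROGRAM void exception()
{
  const unsigned int code = rtGetExceptionCode();
  if ( code == EXCEPTION_PRIMIDX_RANGE )
    rtPrintf( "primIdx was outside the expected range\n" );
  else
    rtPrintExceptionDetails();
}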

I’m assuming you use indexed triangles here, e.g. with a vertex buffer with four float3 positions describing a quad and an index buffer with two uint3 indices, one per triangle, like (0,1,2) and (2,3,0) to render that quad.

If you have indices to two primitives, your primitive count obviously needs to be 2.
The valid primitive indices which should occur inside the intersection program are therefore 0 and 1.

If you only use one Geometry node, the primitive offset shouldn’t be needed at all.
See the OptiX API Reference for what it actually does: [url]http://docs.nvidia.com/gameworks/content/gameworkslibrary/optix/optixapireference/optix__host_8h.html#af070db26aafc71410897ba0d7ea50fbc[/url]

The primitive index offset is only needed if you want to put multiple primitives into one buffer and share that among multiple Geometry nodes which each could use a separate range of the indices defined by the index offset and count.
That’s an optimization option you normally don’t need, esp. not when just starting with OptiX.

If you had set that index offset to 3, the accesses would have been beyond your two triangle indices, which would explain the out-of-bounds exception.
Remove that setPrimitiveIndexOffset(3) call or set the offset to 0 and check whether your primitive indices start to work.
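A minimal sketch of such an indexed quad setup (buffer and variable names are just examples; it assumes the usual using namespace optix; from the SDK samples and a Geometry node named mesh):

// Four vertices of the quad:
Buffer vbuf = context->createBuffer( RT_BUFFER_INPUT, RT_FORMAT_FLOAT3, 4 );
float3* v = static_cast<float3*>( vbuf->map() );
v[0] = make_float3( 0.0f, 0.0f, 0.0f );
v[1] = make_float3( 1.0f, 0.0f, 0.0f );
v[2] = make_float3( 1.0f, 1.0f, 0.0f );
v[3] = make_float3( 0.0f, 1.0f, 0.0f );
vbuf->unmap();

// Two triangles referencing those vertices:
Buffer ibuf = context->createBuffer( RT_BUFFER_INPUT, RT_FORMAT_UNSIGNED_INT3, 2 );
uint3* idx = static_cast<uint3*>( ibuf->map() );
idx[0] = make_uint3( 0, 1, 2 );
idx[1] = make_uint3( 2, 3, 0 );
ibuf->unmap();

mesh->setPrimitiveCount( 2u );  // two primitives, so primIdx is 0 or 1 inside the intersection program
// No setPrimitiveIndexOffset() call needed; the default offset of 0 is correct here.
mesh["vertex_buffer"]->setBuffer( vbuf );
mesh["index_buffer"]->setBuffer( ibuf );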