Problem with turning off RTX mode on GTX 1080

Hello everyone, first time posting here,

I’m currently tasked with porting an older OptiX 5.0 project into Unreal Engine (it was a standard OpenGL renderer before). While that was going better than expected, I also wanted to upgrade to OptiX 6.0, and now I’m running into a few weird issues. I need to turn off RTX mode, but when I execute the following before creating the context

int rtxon = 0;
rtGlobalSetAttribute(RT_GLOBAL_ATTRIBUTE_ENABLE_RTX, sizeof(rtxon), &rtxon);

my application crashes with

OptiX Error: 'Unknown error (Details: Function "_rtContextLaunch2D" caught exception: Encountered a CUDA error: ptxas application ptx input, line 64; error   : Feature 'activemask' requires PTX ISA .version 6.2 or later
ptxas application ptx input, line 68; error   : Feature 'shfl.sync' requires PTX ISA .version 6.0 or later
ptxas application ptx input, line 72; error   : Feature 'shfl.sync' requires PTX ISA .version 6.0 or later
ptxas application ptx input, line 76; error   : Feature 'shfl.sync' requires PTX ISA .version 6.0 or later
ptxas application ptx input, line 80; error   : Feature 'shfl.sync' requires PTX ISA .version 6.0 or later
ptxas application ptx input, line 84; error   : Feature 'shfl.sync' requires PTX ISA .version 6.0 or later
ptxas application ptx input, line 1000; error   : Feature 'shfl.sync' requires PTX ISA .version 6.0 or later
ptxas application ptx input, line 1185; error   : Feature 'shfl.sync' requires PTX ISA .version 6.0 or later
ptxas application ptx input, line 1187; error   : Feature 'shfl.sync' requires PTX ISA .version 6.0 or later
ptxas application ptx input, line 1189; error   : Feature 'shfl.sync' requires PTX ISA .version 6.0 or later
ptxas application ptx input, line 1191; error   : Feature 'shfl.sync' requires PTX ISA .version 6.0 or later
ptxas application ptx input, line 1193; error   : Feature 'shfl.sync' requires PTX ISA .version 6.0 or later
ptxas fatal   : Ptx assembly aborted due to errors returned (218): Invalid ptx)'
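
For context, the call order in my code is essentially this (trimmed sketch, not a verbatim copy; the return-code check is only added here for illustration, since the crash itself only shows up later, at the launch):

#include <optix.h>
#include <optixu/optixpp_namespace.h>
#include <iostream>

optix::Context createContextWithRtxOff()
{
  int rtxon = 0;
  RTresult rc = rtGlobalSetAttribute(RT_GLOBAL_ATTRIBUTE_ENABLE_RTX, sizeof(rtxon), &rtxon);
  if (rc != RT_SUCCESS)
    std::cerr << "Disabling RTX mode failed, RTresult = " << rc << std::endl;

  // The context is created only after the attribute is set;
  // the error above is only reported later, inside rtContextLaunch2D.
  return optix::Context::create();
}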

I tried to track this error down and ended up testing it in the standard OptiX 6.0 samples, in the optixTutorial sample to be exact.

It works on one computer with a GTX 1070 and the recent Creator Ready driver.
It throws the error on a machine with a GTX 1080, which has the Studio driver installed.
Both machines have the same CUDA version installed (v10.1).

This seems to be a driver issue - I will try different versions on both machines and try to track it down, but it’s a bit limiting to be locked into very specific driver versions.

I don’t know much about CUDA, but is there a way to check the installed/used PTX ISA version? The actual PTX files start with this header, both the ones I compiled with nvcc manually and the ones in the sample directory:

//
// Generated by NVIDIA NVVM Compiler
//
// Compiler Build ID: CL-25769353
// Cuda compilation tools, release 10.1, V10.1.105
// Based on LLVM 3.4svn
//

.version 6.4
.target sm_30
.address_size 64
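
In case it helps with narrowing this down: the closest thing I found is to query which CUDA version the installed driver reports, which should at least hint at the PTX ISA it can accept. A minimal sketch using the CUDA runtime API (I’m not sure this is the proper way to check the PTX ISA version itself):

#include <cuda_runtime.h>
#include <cstdio>

int main()
{
  int driverVersion = 0, runtimeVersion = 0;
  cudaDriverGetVersion(&driverVersion);   // CUDA version supported by the installed driver
  cudaRuntimeGetVersion(&runtimeVersion); // CUDA version of the runtime the program was built against
  printf("Driver supports CUDA %d.%d, runtime is CUDA %d.%d\n",
         driverVersion / 1000, (driverVersion % 100) / 10,
         runtimeVersion / 1000, (runtimeVersion % 100) / 10);
  return 0;
}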

The weird thing is that the samples (and my project) work with RTX mode on. I have some other odd problems with RTX mode enabled, though; I will post those in a different thread as they’re hard to isolate.

I would appreciate any tips or hints regarding driver/CUDA versions.

Thanks a lot,
David

Quick follow up:

If I install the Creator Ready Driver 419.67 (Windows 10 64-bit) from NVIDIA, I can turn off RTX mode on the 1080 machine as well. This seems to be an issue with the newest Game Ready / Studio drivers.

Just so I am absolutely clear which driver is causing the problem, will you post the version numbers or links to the exact drivers you installed, i.e. the one that works and the ones that are causing you trouble?


David.

Sure,

the one that is working (for both my program and the OptiX samples I tried) is this one:

The ones that are NOT working (throwing the above CUDA error when creating the context) for me are:

I will double-check and confirm this again in an hour or so on a third computer with a 1060.

Just tried it on a third machine (GTX 1060), which had the 425.31 Game Ready driver installed, and I could disable RTX mode there.

I updated the driver to the latest 430.86 Game Ready driver and get the same issue when trying to turn off RTX mode:

OptiX Error: 'Unknown error (Details: Function "_rtContextLaunch2D" caught exception: Encountered a CUDA error: ptxas application ptx input, line 64; error   : Feature 'activemask' requires PTX ISA .version 6.2 or later
ptxas application ptx input, line 68; error   : Feature 'shfl.sync' requires PTX ISA .version 6.0 or later
ptxas application ptx input, line 72; error   : Feature 'shfl.sync' requires PTX ISA .version 6.0 or later
ptxas application ptx input, line 76; error   : Feature 'shfl.sync' requires PTX ISA .version 6.0 or later
ptxas application ptx input, line 80; error   : Feature 'shfl.sync' requires PTX ISA .version 6.0 or later
ptxas application ptx input, line 84; error   : Feature 'shfl.sync' requires PTX ISA .version 6.0 or later
ptxas application ptx input, line 1000; error   : Feature 'shfl.sync' requires PTX ISA .version 6.0 or later
ptxas application ptx input, line 1185; error   : Feature 'shfl.sync' requires PTX ISA .version 6.0 or later
ptxas application ptx input, line 1187; error   : Feature 'shfl.sync' requires PTX ISA .version 6.0 or later
ptxas application ptx input, line 1189; error   : Feature 'shfl.sync' requires PTX ISA .version 6.0 or later
ptxas application ptx input, line 1191; error   : Feature 'shfl.sync' requires PTX ISA .version 6.0 or later
ptxas application ptx input, line 1193; error   : Feature 'shfl.sync' requires PTX ISA .version 6.0 or later
ptxas fatal   : Ptx assembly aborted due to errors returned (218): Invalid ptx)'

C:\ProgramData\NVIDIA Corporation\OptiX SDK 6.0.0\SDK\build\bin\Debug\optixTutorial.exe (process 8068) exited with code 1.

I then tried the version before that (430.64 Game Ready) and that one also seems to work.

So it looks like it’s an issue with the most recent driver only.

Hi, I can reproduce this, so I’ve filed a bug report. I believe turning RTX mode off is still supposed to work in OptiX 6. Apologies for the snag. In the meantime, let me know if we can help you port the OptiX 5 code to OptiX 6 in a way that works with RTX mode. Were you using selector nodes in your OptiX 5 code?


David.

Hi, thanks for following up on this!

One of the issues I had with porting the code was related to boolean flags in the payload data.
The actual codebase is a bit convoluted, so I never managed to boil it down to a minimal example, hence I didn’t post about it here. If I have some time, I will do that, but the actual problem manifested like this:

I was tracking whether a ray hit an object (a lens) and got refracted or reflected with a bool hit_lens in the payload. Depending on the negation of this boolean (!hit_lens), the alpha value of the resulting color was set.
This worked fine in OptiX 5, but absolutely not in OptiX 6. I was getting almost the opposite of what I expected. Turning on the exceptions (RT_EXCEPTION_PAYLOAD_ACCESS_OUT_OF_BOUNDS especially) did not lead to anything being thrown, but fixed the problem completely. Turning the exception back off reintroduced the problem.

In the end, I swapped the boolean for an int and used 1 - value as the negation. This worked with exceptions on and off and gave me the desired results. I still have absolutely no clue why it didn’t work with booleans.
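
Roughly, the workaround looks like this (simplified sketch, not the actual payload or code from the project):

struct PerRayData
{
  float3 result;
  float  importance;
  int    depth;
  int    hit_lens;   // was: bool hit_lens
};

// negation, formerly !hit_lens:
// int not_hit_lens = 1 - prd.hit_lens;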

If you think it’s worth it, I can try and adjust one of the samples to see if I can reproduce it again.

Okay, so I did a quick check and I also get strange behavior when modifying the optixTutorial sample - wall of text incoming.

Again, this uses the standard OptiX 6 SDK samples: optixTutorial, tutorial #10.

I made the following modifications:

tutorial.h

struct PerRayData_radiance
{
  float3 result;
  float  importance;
  int depth;

  bool miss; // new
  bool hit_lens; // new
};

tutorial10.cu - pinhole_camera()

//...

  optix::Ray ray(ray_origin, ray_direction, RADIANCE_RAY_TYPE, scene_epsilon);

  PerRayData_radiance prd;
  prd.importance = 1.f;
  prd.depth = 0;
  prd.miss = false; // new
  prd.hit_lens = false; // new

  rtTrace(top_object, ray, prd);

  uchar4 res = make_color(make_float3(prd.miss, 0, 0)); // new, just for debugging
  output_buffer[launch_index] = res;

Alternatively, a standard if clause can be used as well; the results are equivalent:

  float r = 0;
  if (prd.miss)
  {
    r = 1;
  }
  else
  {
    r = 0;
  }
  uchar4 res = make_color(make_float3(r, 0, 0));

Additionally, all other recursively created payload data obviously needs to be initialized as well (this happens 4 times in the file):

PerRayData_radiance refl_prd;
refl_prd.miss = prd_radiance.miss;
refl_prd.hit_lens = prd_radiance.hit_lens;

Now, in envmap_miss():

//...
float v     = 0.5f * ( 1.0f + sin(phi) );	
prd_radiance.miss = !prd_radiance.hit_lens; // new - considering that hit_lens is never set true, miss should always be true.
prd_radiance.result = make_float3(tex2D(envmap, u, v));

Running the above yields a few different cases:

Case 1 - RTX Mode disabled:

int rtxon = 0;
rtGlobalSetAttribute(RT_GLOBAL_ATTRIBUTE_ENABLE_RTX, sizeof(rtxon), &rtxon);

This works as expected and yields the following image:

Case 2 - RTX Mode enabled:

This just results in a black screen:

Now the interesting part: my actual payload is slightly larger, so I added padding to make the tutorial payload similar:

struct PerRayData_radiance
{
  float3 result;
  float  importance;
  float padding;
  int depth;

  bool miss;
  bool hit_lens;
};

This yields the exact same image with RTX mode disabled, as expected.
With RTX mode enabled, the image is again just black, as above.

However, I noticed that turning on exceptions makes it work correctly again:

//rtGlobalSetAttribute(RT_GLOBAL_ATTRIBUTE_ENABLE_RTX, sizeof(rtxon), &rtxon);
context->setExceptionEnabled(RT_EXCEPTION_ALL, true);

No actual exceptions are thrown.

Tracking it a bit further, it seems to be tied to the RT_EXCEPTION_PAYLOAD_ACCESS_OUT_OF_BOUNDS exception.

context->setExceptionEnabled(RT_EXCEPTION_PAYLOAD_ACCESS_OUT_OF_BOUNDS, true);

As I said in the previous post, just swapping to int instead of bool seems to do the trick - the image looks as expected in all cases.

struct PerRayData_radiance
{
  float3 result;
  float  importance;
  float padding;
  int depth;

  int miss;
  int hit_lens;
};

Am I doing something obviously wrong here, or is this really strange behavior? As I said, I’m not experienced with OptiX and just inherited this codebase, which, as described, runs fine with RTX mode disabled.

Thanks a lot for your patience!
David

Sorry about the temporarily hidden posts.
The forum software uses some heuristics to detect SPAM and isn’t always correct.

This is strange behaviour and would need to be analyzed.

If you have more of these boolean variables, I would recommend packing them into an unsigned int as a bitfield.
I have quite a few in my renderers, which I combined like this:
optix_advanced_samples/per_ray_data.h at master · nvpro-samples/optix_advanced_samples · GitHub
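
The idea is roughly this (flag names made up here; see the linked per_ray_data.h for the real ones):

// Pack all boolean ray state into one unsigned int.
#define FLAG_MISS      0x00000001
#define FLAG_HIT_LENS  0x00000002

struct PerRayData_radiance
{
  float3       result;
  float        importance;
  int          depth;
  unsigned int flags;   // replaces the individual bools
};

// Set, clear and test the bits like this:
// prd.flags |= FLAG_HIT_LENS;              // set
// prd.flags &= ~FLAG_HIT_LENS;             // clear
// if (prd.flags & FLAG_MISS) { /* ... */ } // test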

It would be interesting to see what happens when you pad the initial structure to a nicer 4-byte alignment.
The problem with bool is that its size is implementation-specific. It should be a char in CUDA, but it isn’t even listed in this table:
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#vector-types__alignment-requirements-in-device-code

Your images are locked at 60 fps. For benchmarks it’s recommended to disable vsync in the NVIDIA Control Panel under Manage 3D Settings, or use the OpenGL swap interval extension and set an interval of 0.
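
For example, on Windows with OpenGL, something like this (just a sketch; proper checking for the extension is omitted):

#include <windows.h>

typedef BOOL (WINAPI *PFNWGLSWAPINTERVALEXTPROC)(int interval);

void disableVsync()
{
  PFNWGLSWAPINTERVALEXTPROC wglSwapIntervalEXT =
      (PFNWGLSWAPINTERVALEXTPROC) wglGetProcAddress("wglSwapIntervalEXT");
  if (wglSwapIntervalEXT)
    wglSwapIntervalEXT(0); // swap interval 0 = vsync off
}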

Thanks for un-hiding the posts again!

I only have two of the booleans (two ints now), so moving them to a bitfield would save me a bit of memory - I will do that, thanks.

As for the alignment, I just checked and moving the bools to ints actually creates a nice 32-byte struct.
Interestingly enough, if I keep the two bools but add 6 bytes of padding to end up at 32 bytes, the whole thing works again without enabling exceptions.

struct PerRayData_radiance
{
  float3 result;
  float  importance;
  float padding;
  int depth;

  bool miss;
  bool hit_lens;

  uchar2 padding2;
  uchar4 padding3;

};

If I only use 30 bytes, it still breaks.
What I really don’t understand is how this is related to turning said exception on and off…

As for the images - I just used the tutorial demo to demonstrate this, not really caring about performance in this case. My actual project is rendered in Unreal Engine anyway, which takes care of vsync etc.

Again, thanks for the answer; the memory alignment is a good hint that at least fixes the acute problems.

The reason padding causes things to work is that your first payload is small enough that OptiX uses only registers for it. When your payload gets bigger, it turns into a little memory buffer, and the pointer to that memory is passed around rather than the values. It seems like what’s going wrong here is that when bools are used in a small payload, the packing of the bools into registers might not be working correctly. This may be a bug in OptiX, so we’ll investigate, but in the meantime follow Detlef’s advice and use ints instead.

I’m not certain why turning on the bounds checking exception fixes the problem, but I will go ahead and speculate that in order for it to function and do the bounds calculation, small payloads are put into memory instead of using registers, so it has the same effect as using a larger payload. I’ll verify this theory soon.

In any case, you definitely still want to use a small payload when you can; it is going to be more performant. We just want to make sure the code compiles correctly in that case.

Hi there, apologies for hijacking the thread, but sadly I started receiving the same error message when I switched from OptiX 5.0 to 6.0. Unfortunately, the codebase relies on selectors, so I had to disable RTX mode. I was wondering if that bug (?) is still planned to be tackled in the foreseeable future (maybe OptiX 6.5?), so that I don’t need to port the animation code and can wait it out and focus on other tasks. I am using the latest CUDA 10.1 Update 2, a GeForce GTX 970, and the 431.60 Game Ready driver. Thank you for your time.

Hi @starkr,

We don’t have any specific plans to restore selectors to RTX mode, and I can confirm that OptiX 6.5 will not bring selectors over. Removing selectors was an intentional breaking change we made for performance reasons in the new RTX mode.

Have you started looking into visibility masks and/or OptiX 7? Visibility masks are currently limited to 8 sets, so they don’t work for everyone, but so far some people have been able to switch their use of selectors to visibility masks and enable RTX mode. You may be able to use multiple top-level acceleration structures to work around the 8-set limitation, as long as your number of visible permutations is low-ish, and not in the thousands or millions. Visibility masks are available in OptiX 6 with RTX mode as well as in OptiX 7.
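
The basic pattern looks roughly like this (just a sketch with made-up names; please check the OptiX 6.x headers for the exact signatures):

// Host side: give each selectable set of geometry its own GeometryGroup and one mask bit.
optix::GeometryGroup lensGroup  = context->createGeometryGroup();
lensGroup->setVisibilityMask( 1u << 0 );
optix::GeometryGroup otherGroup = context->createGeometryGroup();
otherGroup->setVisibilityMask( 1u << 1 );

// Device side: choose which sets a ray can see via the mask argument of rtTrace.
// RTvisibilitymask mask = ...;  // e.g. computed per ray or passed in via a launch variable
// rtTrace( top_object, ray, prd, mask, RT_RAY_FLAG_NONE );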

If visibility masks won’t work for you, we could discuss recipes for emulating selector node behavior in OptiX 6 or 7, either now or when you’re ready to port your code.


David.