Access violation when creating program from PTX file

I am encountering a confusing problem when creating programs from a ptx file. Sadly, my current project isn’t in any sort of version control and I’ve changed a bit too much since it last worked (restructuring), so it’s hard to say what introduced it. The function calls that are giving me the problem never gave any issues before, though.

When creating programs from a ptx file using the C++ API (so ContextObj::createProgramFromPTXFile) I get an access violation. However, this only happens on some calls of this function, not all.

This is on Windows 7 64-bit using OptiX 3.9.0 with CUDA 7.5. GPU is a 970, driver 364.72 (I just updated to try to fix this, the problem already existed on the previous driver).

Relevant code snippet below. The error occurs on the last of these four calls. I am sure all four Program objects are declared and all four programs exist by the correct name in the .cu file.

ignore_hit = cont->createProgramFromPTXFile(pars.ptx_dir + "hitprogs.cu.ptx", "ignore_hit");
terminate_hit = cont->createProgramFromPTXFile(pars.ptx_dir + "hitprogs.cu.ptx", "terminate_on_hit");
done_hit = cont->createProgramFromPTXFile(pars.ptx_dir + "hitprogs.cu.ptx", "ray_done");
rec_hit = cont->createProgramFromPTXFile(pars.ptx_dir + "hitprogs.cu.ptx", "receiver_hit");

Output:

First-chance exception at 0x000007FEE2682EF3 (optix.1.dll) in a.exe: 0xC0000005: Access violation reading location 0x0000000000000030.

OAC trace (with paths stripped):

6
64
Platform: Windows
CUDA driver version: 8000
OptiX Version:[3.9.0] Build Number:[20240475Branch%] CUDA Version:[7.5] 64-bit 2015-12-09 
Command line: a.exe
Capture time: 2016-04-13 10:45:54
%%
rtContextCreate( 00000000001AF870 )
  res = 0
  hdl = 0000000000757140
rtContextSetRayTypeCount( 0000000000757140, 2 )
  res = 0
rtContextSetEntryPointCount( 0000000000757140, 1 )
  res = 0
rtProgramCreateFromPTXFile( 0000000000757140, C:\...\ptx\raygen.cu.ptx, raygen, 00000000001AF898 )
  file = oac.ptx.000000.potx
  res = 0
  hdl = 00000000066ADAA0
rtContextSetRayGenerationProgram( 0000000000757140, 0, 00000000066ADAA0 )
  res = 0
rtContextQueryVariable( 0000000000757140, ray_openangle, 00000000001AF7F8 )
  res = 0
rtContextDeclareVariable( 0000000000757140, ray_openangle, 00000000001AF7F8 )
  res = 0
  hdl = 000000000083BB70
rtVariableSet1f( 000000000083BB70, 0.000174533 )
  res = 0
rtContextQueryVariable( 0000000000757140, tx_pos, 00000000001AF7F8 )
  res = 0
rtContextDeclareVariable( 0000000000757140, tx_pos, 00000000001AF7F8 )
  res = 0
  hdl = 000000000083BC70
rtVariableSet3fv( 000000000083BC70, 00000000001AF900 )
  val = 100 0 0
  res = 0
rtContextQueryVariable( 0000000000757140, tx_normal, 00000000001AF7F8 )
  res = 0
rtContextDeclareVariable( 0000000000757140, tx_normal, 00000000001AF7F8 )
  res = 0
  hdl = 000000000083BD70
rtVariableSet3fv( 000000000083BD70, 00000000001AF900 )
  val = -1 0 0
  res = 0
rtContextQueryVariable( 0000000000757140, tx_pol, 00000000001AF7F8 )
  res = 0
rtContextDeclareVariable( 0000000000757140, tx_pol, 00000000001AF7F8 )
  res = 0
  hdl = 000000000083BE70
rtVariableSet3fv( 000000000083BE70, 00000000001AF900 )
  val = 0 0 1
  res = 0
rtContextQueryVariable( 0000000000757140, tx_k, 00000000001AF7F8 )
  res = 0
rtContextDeclareVariable( 0000000000757140, tx_k, 00000000001AF7F8 )
  res = 0
  hdl = 000000000083BF70
rtVariableSet1f( 000000000083BF70, 1611.07 )
  res = 0
rtProgramCreateFromPTXFile( 0000000000757140, C:\...\ptx\hitprogs.cu.ptx, ignore_hit, 00000000001AF898 )
  file = oac.ptx.000001.potx
  res = 0
  hdl = 0000000006B5D040
rtProgramCreateFromPTXFile( 0000000000757140, C:\...\ptx\hitprogs.cu.ptx, terminate_on_hit, 00000000001AF898 )
  file = oac.ptx.000001.potx
  res = 0
  hdl = 0000000006B5D110
rtProgramCreateFromPTXFile( 0000000000757140, C:\...\ptx\hitprogs.cu.ptx, ray_done, 00000000001AF898 )
  file = oac.ptx.000001.potx
  res = 0
  hdl = 0000000006B5D1E0
rtProgramCreateFromPTXFile( 0000000000757140, C:\...\ptx\hitprogs.cu.ptx, receiver_hit, 00000000001AF898 )
  file = oac.ptx.000001.potx

It’s not possible to analyze this without a running application or complete OAC trace archive.
The error is an access violation because the pointer 0x0000000000000030 is outside the program’s address space (in a protected system area).

You could test if re-ordering the four createProgramFromPTXFile() calls moves the error with the “receiver_hit” or if it’s always the last, but that’s about it.
If it moves with the program, check what is different in that program compared to the others.

Other than that, you could try OptiX 4.0 beta.

Thanks for the quick response. Just to confirm: you are saying this can be caused by something inside the program? For some reason I thought the createProgramFromPTXFile() would just make my kernel programs accessible to OptiX, but not actually interact with the CUDA code. If it can be something inside the “receiver_hit” program, that gives me some more things to try. Quick testing shows that the error does move with that program.

edit: Some quick commenting of parts of “receiver_hit” shows that the problem is indeed inside that function, and specifically with a function I’m calling inside that function. I should be able to figure out the problem from here. Thanks again, I thought the CUDA code wouldn’t be relevant at this point in setting up my context, but I guess creating a callable OptiX program from a compile PTX file is a more involved process than I thought.

OptiX doesn’t support function call instructions inside PTX code unless they have been created via OptiX callable program mechanism.
Other function calls are always inlined. To make sure that happens in the CUDA to PTX compilation process already I’m declaring all functions in OptiX CUDA C++ with forceinline device

inline alone is not enough, because that’s just a hint and the CUDA compiler might decide to not inline a function if it’s too big or has too many arguments. I’ve seen that happening.

I’m using a define for that which looks like this:

#ifndef RT_FUNCTION
#define RT_FUNCTION __forceinline__ __device__
#endif

// Example: 
RT_FUNCTION int random()
{
  return 4; // Random number, determined by fair dice roll! ;-)
}

Depending on how you use the callable program (bound to a variable or bindless via ID) there are also different scopes you can access!
Bound callable programs inherit the scope of the caller, bindless callable programs only have the context and themselves, the program, as scope. Means bindless callable programs cannot call rtTrace() or rtTransform*() functions for example.
Refer to the OptiX Programming Guide for more information.

The function in question was intended to be an OptiX callable program, but there was an error in the way I created it. It just didn’t occur to me to look at that, because I did not expect OptiX to interact with the internals of my PTX files at this stage in setting up my context, so thanks again for pointing me in the right direction. The forceinline way of declaring functions is also a useful tip, I think I’ll use that in the future for simple-ish functions.