I’m running into issues when I try to use the following inline PTX assembly in a closest-hit program:
int laneId; asm("mov.s32 %0, %laneid;" : "=r"(laneId) );
What I’m trying to do is use the thread’s lane index as a “swizzle” factor to reduce atomicAdd contention. The overall situation looks something like this:
rtDeclareVariable(int, nvals, , );
rtBuffer<float, 1> vals; //Accumulates data over every ray. (size==nvals)
RT_PROGRAM void closestHit() {
    //...
    int swizzle;
    asm("mov.s32 %0, %laneid;" : "=r"(swizzle) ); // Fails with segfault if I do this.
    //swizzle = launch_index.x; // No segfault if I do something like this instead.
    for (int idx = swizzle; idx < (nvals + swizzle); ++idx) {
        int wrappedIdx = idx % nvals;
        float someVal = 2.0f * wrappedIdx;
        atomicAdd(&vals[wrappedIdx], someVal);
    }
    //... (recursively spawn child rays that use this same closest-hit program)
}
The program compiles fine, but execution fails with a segfault whenever I use %laneid as the swizzle factor. The other swizzle factors I tried worked fine, so I know this isn’t a simple array indexing issue.
So here’s my question: is retrieving the lane ID via inline asm simply not supported in OptiX programs, which would explain this error? Or is this something I should file a bug report for? I’m using OptiX 3.9 / CUDA 7.5 on Ubuntu 14.04 Linux (64-bit).
Thank you for your help.