With the code below, when I try to load the resulting module (compiled with ptxas -v -arch sm_30), cuModuleLoadData returns CUDA_ERROR_INVALID_IMAGE. If I comment out the rem instruction, it loads just fine. I'm running CUDA 7.5 on a GTX 690 under Windows 10.
.version 3.2
.target sm_30
.address_size 64

.entry renderKernel
(
    .param.u64 _bufferAddr,
    .param.s32 _bufferWidth
)
{
    .reg.s64 bufferAddr;
    .reg.u32 bufferWidth, t0;

    ld.param.u64 bufferAddr, [_bufferAddr];
    cvta.to.global.u64 bufferAddr, bufferAddr;
    ld.param.u32 bufferWidth, [_bufferWidth];

    mov.u32 t0, %ntid.x;
    mad.wide.u32 bufferAddr, t0, 4, bufferAddr;
    rem.u32 t0, t0, bufferWidth; // comment this line out to make this work
    st.global.u32 [bufferAddr], t0;
}