OptiX Bug? crash with CUDA error: Kernel ret (700) when not rtPrinting anything (small demo code)
Hi,
I have a crash with the following exception:
OptiX Error: Unknown error (Details: Function "RTresult _rtContextLaunch2D(RTcontext, unsigned int, RTsize, RTsize)" caught exception: Encountered a CUDA error: Kernel launch returned (700): Launch failed, [6619200])


This is the source code of the CUDA file (edit: for the cpp file, see below):
#include <optix_world.h>
struct BiDirSubPathVertex {bool existing;};
using namespace optix;

rtCallableProgram(void, sampleLightPath, ());
rtCallableProgram(void, sampleEye, ());
rtDeclareVariable(uint2, launch_index, rtLaunchIndex, );
rtBuffer<float4, 2> output_buffer;
RT_CALLABLE_PROGRAM void sampleLightPath_f() {}
RT_CALLABLE_PROGRAM void sampleEye_f() {}

RT_PROGRAM void pathtrace_camera() {
    BiDirSubPathVertex lightVertices[2];
    lightVertices[0].existing = false;
    lightVertices[1].existing = false;

    for(unsigned int i=0; i<2; i++) {
        sampleLightPath();
        if(!(lightVertices[i].existing)) break;
    }
    // rtPrintf("cztery\n");
    sampleEye();

    output_buffer[launch_index] = make_float4(1.f, 1.f, 1.f, 1.f);
}

RT_PROGRAM void exception()
{
    rtPrintExceptionDetails();
    output_buffer[launch_index] = make_float4(1.f, 1.f, 0.f, 0.0f);
}


See the rtPrintf? If it is uncommented, the program doesn't crash. So while the crash is simple to work around in this specific place, it would be hard to track down if the print were not already there.

The original code file was about 700 lines long; the functions had parameters and traced rays. While I removed more and more code, whether it crashed or not always depended only on this one rtPrintf. In the original code I had exceptions and printing enabled, but it didn't make any difference.

I verified the crash on Win7 64 (vs12 compiler, NVIDIA driver around 336 WHQL) and openSUSE Linux 13.1 64 (gcc 4.8, NVIDIA driver 331.49). Both systems had CUDA 5.5 and OptiX 3.5 installed. The workstation has an AMD quad-core and a GeForce GTX 550 Ti.

edit:
>>Additionally verified on Win7 64bit, vs12 compiler, NVIDIA driver 332.76, CUDA 5.0 and OptiX 3.0. The workstation has an Intel Xeon quad core and Quadro 2000 graphics.<<

minimal example: http://xibo.at/meine/optixCrashBugPrintMinimalExample.zip (superseded)
new file with less code: http://xibo.at/meine/optixCrashBugPrintMinimalExample2.zip
It's based on sutil; the same build steps as for the OptiX examples are necessary.

Is anybody able to reproduce?

thanks,
adam
#1
Posted 04/24/2014 10:15 PM   
Not sure if this will help in your particular case, but I have seen rtPrint cover up unrelated memory corruption problems.

#2
Posted 04/25/2014 07:00 PM   
Memory corruption on the host or the device side?

Yes, I was thinking of memory corruption all the time. That's also why I shortened the program to these 31 lines; the host side, apart from sutil, is also just 65 lines long.

If anything, it could be memory corruption on the host side inside sutil. Maybe I should stop using this behemoth.

edit:
I got rid of the SampleScene class. sutilSamplesPtxDir() is now the only sutil function I'm calling, and the behaviour is still the same.
Here is the updated full code: http://xibo.at/meine/optixCrashBugPrintMinimalExample2.zip

edit2:
Now the minimal project contains 2 source files (apart from sutil, cmake etc.). The CUDA file is already posted above; the other one is here (I also updated the zip):
#include <optixu/optixpp_namespace.h>
#include <sutil.h>
#include <stdlib.h>
#include <string>   // std::string (was <string.h>, which does not declare it)

// Builds "<ptx dir>/<target>_generated_<base>.ptx".
// The returned pointer refers to a static buffer that the next call overwrites.
const char* const ptxpath( const std::string& target, const std::string& base ) {
    static std::string path;
    path = std::string(sutilSamplesPtxDir()) + "/" + target + "_generated_" + base + ".ptx";
    return path.c_str();
}

int main( int argc, char** argv ) {
    try {
        optix::Context context = optix::Context::create();
        context->setEntryPointCount( 1 );

        // 512x512 float4 output buffer, written by both device programs
        optix::Buffer buffer = context->createBuffer( RT_BUFFER_OUTPUT, RT_FORMAT_FLOAT4, 512, 512);
        context["output_buffer"]->set(buffer);

        optix::Program exceptionProgram = context->createProgramFromPTXFile(ptxpath("helsinki", "BiDirCamera.cu"), "exception");
        optix::Program ray_gen_program = context->createProgramFromPTXFile(ptxpath( "helsinki", "BiDirCamera.cu" ), "pathtrace_camera");
        // bind the two callable programs to the variables declared with rtCallableProgram
        ray_gen_program["sampleLightPath"] ->set(context->createProgramFromPTXFile(ptxpath( "helsinki", "BiDirCamera.cu" ), "sampleLightPath_f"));
        ray_gen_program["sampleEye"] ->set(context->createProgramFromPTXFile(ptxpath( "helsinki", "BiDirCamera.cu" ), "sampleEye_f"));

        context->setRayGenerationProgram(0, ray_gen_program);
        context->setExceptionProgram(0, exceptionProgram);
        context->validate();
        context->compile();
        context->launch(0, 512, 512);   // the crash is reported here
    } catch( optix::Exception& e ){
        sutilReportError( e.getErrorString().c_str() );
        exit(1);
    }

    return 0;
}
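
A side note on ptxpath above: it returns a pointer into a function-local static std::string, so every call overwrites the buffer that the previous return value points to. In this main that is harmless, because each pointer is consumed before the next call, and it is probably unrelated to the crash. Still, a by-value variant avoids the question entirely. This is a minimal sketch, not part of the posted project; fakeSamplesPtxDir() is a hypothetical stand-in for sutilSamplesPtxDir() so that it compiles without the SDK, and the directory it returns is made up.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for sutilSamplesPtxDir(); the path is invented
// for illustration only.
inline const char* fakeSamplesPtxDir() { return "/opt/optix/ptx"; }

// Builds "<dir>/<target>_generated_<base>.ptx" and returns it by value,
// so each caller owns its own string and no static buffer is shared.
inline std::string ptxPath(const std::string& target, const std::string& base) {
    // Both parts are needed to form a meaningful file name.
    assert(!target.empty() && !base.empty());
    return std::string(fakeSamplesPtxDir()) + "/" + target + "_generated_" + base + ".ptx";
}
```

As far as I know, createProgramFromPTXFile in optixpp takes the file name as a std::string, so the result could be passed to it directly, without going through c_str().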


Still the same behavior.

edit3: made the code even shorter for the post. Tested, but the zip is not updated.
#3
Posted 04/25/2014 09:16 PM   
I had the chance to test the program on one of my university's workstations. Again, it's the same behaviour.

These are the specs:
Win7 64bit, vs10 compiler, NVIDIA driver 332.76 WHQL, CUDA 5.0 and OptiX 3.0. The workstation has an Intel Xeon quad core and Quadro 2000 graphics.

Are there actually any OptiX developers on this forum? Any chance of this being investigated?

Or does anybody see a possibility for corrupting the stack?
#4
Posted 04/28/2014 07:51 AM   
Not that it's of much help but I can confirm the issue. Specs: Win 8.1 x64, VS2012, driver 337.88, Cuda 5.5, Optix 3.5.1, Intel i7 4770K, GTX770.

#5
Posted 06/11/2014 02:36 AM   
I know your post is over 3 years old, but I think I am facing the same issue today that you had back then.

I am also trying to achieve a hybrid result with OptiX. In your blog you said that you thought a particular rtPrintf call was the problem.
I eliminated that function completely. I don't use it, but I still get the same exception 700.

OptiX Error: 'Unknown error (Details: Function "_rtContextLaunch2D" caught exception: Encountered a CUDA error: cudaDriver().CuMemcpyDtoHAsync( dstHost, srcDevice, byteCount, hStream.get() ) returned (700): Illegal address)


and:
OptiX Error: 'Unknown error (Details: Function "_rtContextLaunch2D" caught exception: Encountered a CUDA error: cudaDriver().CuMemcpyDtoHAsync( dstHost, srcDevice, byteCount, hStream.get() ) returned (716): Misaligned address)


see my current state: https://devtalk.nvidia.com/default/topic/1028487/optix/exception-after-rtcontextlaunch2d-failure/post/5233815/#5233815

rtContextLaunch2D does the kernel launch, but why does a device-to-host writeback (CuMemcpyDtoHAsync) occur then? Normally you run the kernel and the results remain on the GPU; a "download" from GPU to CPU happens only when requested. What is written to the host there?


UPDATE: solved. TDR was not enabled. And I use CUDA 9.0 on driver 388.59 now.

Disclaimer: No warranty. No legal advice.

#6
Posted 01/20/2018 07:01 AM   
What is written to host is a status byte (or couple of bytes) indicating whether the launch succeeded. When the launch crashes you get the error you pasted above -- it is very generic and does not necessarily have anything to do with rtPrintf.
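
To make the ordering concrete, here is a toy model of that behaviour. It is only a sketch under stated assumptions: ToyDevice, launchAsync and copyStatusToHost are hypothetical names, not CUDA or OptiX API, and the two numeric codes are simply the ones quoted in the error messages in this thread.

```cpp
#include <cassert>

// Toy model, NOT the CUDA or OptiX API: every name here is hypothetical.
// It only illustrates why a fault inside a kernel is reported by a later call.
enum ToyResult { TOY_SUCCESS = 0, TOY_ILLEGAL_ADDRESS = 700, TOY_MISALIGNED_ADDRESS = 716 };

struct ToyDevice {
    ToyResult pending = TOY_SUCCESS;  // fault recorded by the "kernel", not yet seen by the host

    // Asynchronous launch: queues the work and returns immediately,
    // so the host cannot learn about a fault from this call.
    ToyResult launchAsync(ToyResult faultDuringKernel) {
        pending = faultDuringKernel;
        return TOY_SUCCESS;
    }

    // Device-to-host copy of the status: the first call that has to wait
    // for the kernel, and therefore the one that reports its fault.
    ToyResult copyStatusToHost() const { return pending; }
};

// Usage: the launch itself "succeeds"; the status copy reports 716.
inline void demo() {
    ToyDevice dev;
    assert(dev.launchAsync(TOY_MISALIGNED_ADDRESS) == TOY_SUCCESS);
    assert(dev.copyStatusToHost() == TOY_MISALIGNED_ADDRESS);
}
```

In this picture the copy is only the messenger: the call named in the error text is where the fault became visible, not where it happened.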

#7
Posted 01/22/2018 06:14 PM   
@dlacewell
Thank you very much for this clarification.

So "CuMemcpyDtoHAsync" itself did not actually crash with code 716; it only reports that a misalignment occurred during kernel execution.
Is there documentation somewhere covering all the cases in which misalignment can happen?
I found http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#vector-types
Are there more guidelines?
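
For the vector types in the linked section, the requirement can be illustrated on the host side. The sketch below assumes the alignments listed in that table (8 bytes for float2, 16 bytes for float4); Vec2, Vec4 and Plain4 are hypothetical stand-ins written for this example, not the real CUDA types.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical host-side stand-ins that mimic the alignment of CUDA's
// built-in float2 (8 bytes) and float4 (16 bytes); not the real types.
struct alignas(8)  Vec2 { float x, y; };
struct alignas(16) Vec4 { float x, y, z, w; };

// A plain struct of four floats only needs the alignment of float, so a
// pointer to it is not guaranteed to be valid for a 16-byte vector access.
struct Plain4 { float x, y, z, w; };

static_assert(alignof(Vec2) == 8, "mimics float2");
static_assert(alignof(Vec4) == 16, "mimics float4");
static_assert(alignof(Plain4) == alignof(float), "no extra alignment");
static_assert(sizeof(Plain4) == sizeof(Vec4), "same size, different alignment");
```

The point is that two types with the same size and layout can still differ in required alignment, which is one way a reinterpreting cast or a hand-computed byte offset into a buffer can end up producing a misaligned access.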


#8
Posted 01/22/2018 07:40 PM   
A misaligned address is another very generic error. It occurs when you're reading from an unexpected memory location that doesn't satisfy certain conditions for the read instruction, and it usually indicates an error in user code. It is very roughly analogous to a segfault in host code.
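
As a rough host-side illustration of an address that "doesn't satisfy certain conditions": an access of N bytes typically requires the address to be a multiple of N. The helper below is a minimal sketch, with isAligned and offsetBreaksAlignment being names made up for this example; it shows how stepping 4 bytes into a 16-byte-aligned buffer yields an address that is still fine for a float but not for a 16-byte access.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true if p is a multiple of `alignment` bytes.
// `alignment` must be a power of two.
inline bool isAligned(const void* p, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Example: a 16-byte-aligned buffer, and the same buffer entered 4 bytes in.
inline bool offsetBreaksAlignment() {
    alignas(16) unsigned char storage[64];
    const void* base = storage;         // suitable for a 16-byte access
    const void* shifted = storage + 4;  // fine for a 4-byte float, not for 16 bytes
    return isAligned(base, 16) && isAligned(shifted, 4) && !isAligned(shifted, 16);
}
```

A check like this on the host cannot find the faulty device access by itself, but it can help rule out offsets and struct layouts that were computed on the host and handed to the device.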

I would follow Detlef's advice on the other thread to try and narrow it down, rather than continuing to cross-post here on this thread.

#9
Posted 01/22/2018 09:50 PM   