[SOLVED] Exception after rtContextLaunch2D failure
EDIT: This post was originally called "Shadow Issue After Denoising Render Result of Photon Mapper Sample".
That part is SOLVED. (My usage of "appendLaunch" was wrong; see post #7 of this thread.)
The thread was then renamed because a related exception was still present.
SOLVED: Using one top group instead of a group hierarchy solved it! Thanks, Detlef.



I tried 3 different settings:

Test 1. Adding the Denoiser + ToneMapper stages (from the Denoiser sample in the OptiX 5.0.0 SDK) to the Photon Mapper of
the OptiX Advanced Samples (see the result in picture1.jpg).
As you can see, a gamma issue occurs and no visible shadows are present.
When using sutil::displayBufferGL(getTonemappedBuffer(), BUFFER_PIXEL_FORMAT_DEFAULT, true);
the result got even worse: see picture2.jpg.

Test 2. Pure denoiser (see picture3.jpg). I simply removed the tone mapper stage and used
denoiserStage->declareVariable("input_buffer")->set(getOutputBuffer());

(ignoring the 2.2 gamma issue with the training data described in "6.4.1 Deep-learning based Denoiser" in the docs)
=> The color seems to be OK, but the shadows are still removed.

Test 3. a) Tone mapping with gamma 2.2 (as in the original Denoiser sample)
b) denoiser (as in the original Denoiser sample)
c) a second tone mapping with gamma 1.0 / 2.2:
tonemapStage2->declareVariable("input_buffer")->set(denoisedBuffer);
tonemapStage2->declareVariable("output_buffer")->set(getOutputBuffer());

=> Nearly the same result as test 2; only a very slight shadow is present (picture4.jpg).

I tried 0.0, 0.5, and 1.0 for Variable(denoiserStage->queryVariable("blend"))->setFloat(denoiseBlend);
always with exposure = 1.0.
But there was no difference.

The photon mapper always finishes before denoising starts. I use:
int PhotonMappingFrames = 5;
isEarlyFrame  = (accumulation_frame <= (unsigned int)(numNonDenoisedFrames + PhotonMappingFrames));
skipDenoising = (accumulation_frame <= (unsigned int)PhotonMappingFrames);
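The gating above can be sketched as a tiny self-contained helper (the function name is mine, not from the sample):

```cpp
#include <cassert>

// Sketch of the frame gating: the first PhotonMappingFrames accumulation
// frames skip the denoiser entirely, so the photon map can converge
// before any post-processing runs.
bool shouldSkipDenoising( unsigned int accumulationFrame,
                          unsigned int photonMappingFrames )
{
    return accumulationFrame <= photonMappingFrames;
}
```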



In all cases the output is smooth, but all the shadows disappear almost completely.
Is there an option to configure the denoiser to avoid this?
The docs do not mention this as a limitation. Is it a limitation?

Disclaimer: No warranty. No legal advice.

#1
Posted 01/09/2018 10:57 PM   
Would you be able to provide the full changes to the optixProgressivePhotonMap.cpp file instead of just the code excerpts above?
Then I could drop that into the advanced OptiX sample locally and try to reproduce this.
That would speed up the turnaround time and avoid code differences when guessing about the rest of the necessary changes.

Please always provide the system configuration when reporting issues:
OS version, installed GPU(s), display driver version, OptiX version, CUDA toolkit version used to produce the input PTX code.

#2
Posted 01/11/2018 02:44 PM   
Thank you for your answer, Detlef.

In the attachment I added a file TestApp0.zip with a full VS2017 solution. REMOVED
Please let me know when you have downloaded the attachment so that I can remove it from this post, or simply remove it from here yourself (if possible); it was made only for this issue and should not remain online.


The file #info.txt in that zip contains more information.
The main cpp file is "optixMeshViewer.cpp" in the folder D:\TestApp0\optixMeshViewer.


-------------------------------------------------------
My current system info:

Device: GTX 1050 Driver: 388.71
OptiX 5.0 with CUDA Toolkit 9.1.85 on Visual Studio 2017 Community 15.5.2 (toolset v140 of VS2015)
on Windows 10 Pro 64-bit (still Anniversary Update, version 1607, build 14393.1593)
Win SDK target platform 10.0.15063.0 (Creators Update SDK); however, the same happens with 10.0.14393.0.

OptiX 5.0.0 is installed in C:\ProgramData\NVIDIA Corporation\OptiX SDK 5.0.0
CUDA 9.1.85 is installed in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1
Windows 10 Kits are installed in C:\Program Files (x86)\Windows Kits\10\Include\10.0.15063.0
C:\Program Files (x86)\Windows Kits\10\Include\10.0.14393.0
C:\Program Files (x86)\Windows Kits\10\Include\10.0.10240.0


#3
Posted 01/12/2018 12:16 AM   
Thanks for providing the complete project. I downloaded and removed it from the post.

In the future, if you need things to stay confidential, you can either attach it to a private message, look into the OptiX release notes for the OptiX-Help e-mail address if attachments are smaller than 10 MB, or for bigger data ask for a temporary FTP account I can setup and send to your registered e-mail address.

Unfortunately your project setup is a little too hardcoded to work with on different systems.
We'd really need a minimal reproducer which is not limited to a specific hard-drive location, Visual Studio version, or even operating system.

That's why I asked if you'd be able to reproduce the problem by adding just your post-processing code the same way into the original progressive photon mapper sources, because the CMake-based solutions there allow working with other system setups (including Linux).

It might simply be that the post-processing does not work nicely on the photon mapper because it produces low-frequency noise, while the denoiser was trained on path tracing images, which contain high-frequency noise. That's why I'd like to keep the necessary reproduction effort low.

#4
Posted 01/12/2018 10:48 AM   
I managed to create such a modified version of the optixProgressivePhotonMap.cpp file for you.
In addition to that file, the include file "Denoising.h" must be present in the original "Progressive Photon Mapping" OptiX Advanced Sample.
Both files are in the attachment of this post. REMOVED

Thank you very much!

Next time I will use that email address for submitting projects. Thank you.

In http://www.ci.i.u-tokyo.ac.jp/~hachisuka/ppm.pdf I found:
[..] To avoid noise it is necessary to use a large number of photons [..]
so I also increased MAX_PHOTON_COUNT = 20u; => 512 * 512 * 20 = 5.2 million photons!
But there was no difference at all.
Different denoiseBlend values (0.0, 0.5, and 1.0) also do not change anything.

With denoiseBlend = 1.0 the original image should be visible, shouldn't it? But the shadows are missing there as well. The docs say at "6.4.1 Deep-learning based Denoiser": [..] value of 1.0 means that the original image is written to the output buffer. [..]
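A minimal sketch of the documented blend semantics (the helper name is my own; the actual stage blends internally, per pixel):

```cpp
#include <cassert>

// Documented "blend" semantics: 0.0 writes the fully denoised result,
// 1.0 writes the original (noisy) input to the output buffer.
float blendPixel( float denoised, float original, float blend )
{
    return blend * original + ( 1.0f - blend ) * denoised;
}
```

So at blend = 1.0, the denoiser itself should pass the input through unchanged; if the shadows are still missing there, something other than the denoising pass is removing them.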

The docs also say "Simple tonemapper". What exactly does that mean? What algorithm does it use?
If it only did gamma correction and exposure (which is 1.0f), it should not affect the picture at all.
UPDATE:
mail7hfx9's question on https://devtalk.nvidia.com/default/topic/1028868/optix/tonemapper-doing-weird-things/ said:
[..]Even if exposure and gamma are set to 1.0f it modifies the image.
I would expect it to not change the "original" image, but it does get darker here.[..]
Detlef Roettger's answer:
[..] I've confirmed what you're experiencing and the reason is that the TonemapperSimple is a three stage implementation of exposure, tonemapping, and gamma. Means even if the exposure and gamma values are set to default values, the tonemapper step is still applied and it's inspired by what analog films or real-time games, for that matter, do to preserve a natural look as much as possible. [..]
The documentation touches on that by saying that the stage does tonemapping and gamma correction. That explanation is going to be clarified in the future. Sorry for the confusion. [..]
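Based on that answer, the "simple" tonemapper can be pictured as three stages. The actual curve is not documented; the Reinhard operator below is purely an assumed stand-in to illustrate why the image changes even with exposure = 1.0 and gamma = 1.0:

```cpp
#include <cmath>
#include <cassert>

// Three-stage sketch: exposure scale, tonemap curve, gamma encode.
// NOTE: the real TonemapperSimple curve is undocumented; Reinhard
// (x / (1 + x)) is only a stand-in here.
float tonemapSimple( float x, float exposure, float gamma )
{
    x *= exposure;                       // stage 1: exposure
    x  = x / ( 1.0f + x );               // stage 2: tonemap curve (always applied)
    return std::pow( x, 1.0f / gamma );  // stage 3: gamma correction
}
```

Even with exposure and gamma both at 1.0f, stage 2 still darkens the value, which matches the "it does get darker" observation.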


--------------------------------------
--------------------------------------
However, here are 2 issues which do not affect the denoiser/tonemapper, but which I wondered about while testing the samples:

-------------------
1. A very strange exception; see:
TestApp0.zip\TestApp0\bin\Debug\CU\optixMeshViewer\pinhole_camera.cu line 150
Expressions containing a variable of type "rtIntersectionDistance" in a ray generation program can cause an OS freeze/crash via a process that cannot be closed:
OptiX Error: 'Unknown error (Details: Function "_rtContextLaunch2D" caught exception: Encountered a CUDA error: cudaDriver().CuMemcpyDtoHAsync( dstHost, srcDevice, byteCount, hStream.get() ) returned (700): Illegal address)'
In a post here on devtalk I found: "the error about cuMemcpyDtoHAsync is generic and just means the launch failed. At the end of the launch we read back some status info from device to host, which is an illegal read if the kernel is no longer running."

It is fine for the app to quit on such errors, but the entire system freezing is not so great.
The device only resets by unplugging and replugging the power.
The docs ("4.1.3 Internally provided semantics") say that "rtIntersectionDistance" is not supported in ray generation programs (there is actually no check mark for it in table 5, "Semantic Variables").
There is no exception if such a variable is used in an expression in a closest-hit program.
But unfortunately there is also no compiler error when "rtIntersectionDistance" is used within a ray generation program.

Some people a few years ago here on devtalk also said that the rtPrintf function could be a problem. Was there really a problem with that function? The system freeze in my case also occurred directly after the exception message (which was written by rtPrintf). It was not possible to close the app via the task manager (access denied), and on trying to reboot, the OS froze.



-------------------
2. Something I just wondered about in the "Photon Mapper" sample:

In the original file optixProgressivePhotonMap.cpp, line 243:
context->setMissProgram( rtpass, context->createProgramFromPTXFile( ptx_path, "rtpass_miss" ) );

"rtpass" is an "entry_point_index" (enum "ProgramEnum").
BUT the function void setMissProgram(unsigned int ray_type_index, Program program); (in optixpp_namespace.h in the optixu subfolder: C:\ProgramData\NVIDIA Corporation\OptiX SDK 5.0.0\include\optixu) expects a "ray_type_index".
rtpass is not a "ray_type_index"; however, it is 0 anyway, so within this sample it makes no difference whether it is an entry point index or a ray type index.

in optix_host.h:
[..] rtContextSetMissProgram sets a context's miss program associated with ray type.[..]
ray_type_index : The ray type the program will be associated with[..]

So using rtpass as a "ray_type_index" is not 100% logical, is it?


#5
Posted 01/12/2018 10:40 PM   
I've pruned your changes down to get an isolated implementation of post-processing stages applied to the photon mapper example explicitly.

The two main problems in your code were:
1.) You must not change the original number of entry points and ray types of the photon mapper algorithm.
2.) The post-processing CommandLists were set up with the incorrect entry point for the photon mapper.
That means in the command list setup, the append() of OptiX launches needs to be done with the "gather" entry point (== 2), which does the photon map rendering.

Please note that resizing is broken with those changes and fails with an illegal address error. Possibly because of the hardcoded width and height defines. I didn't look into that.


You're right, the miss program is per ray type and the proper index in the photon mapper example code should have been "rtpass_ray_type" (== 0). It happened to work because the entry point index "rtpass" was 0 as well.

If you have generic CUDA launch issues, the first step is to check if that happens with newer drivers as well, then follow recommendations in my earlier forum posts where I described debugging by using the exception program and enabled exceptions, rtThrow with user defined codes, and then rtPrintf to see if anything can be detected by OptiX already before CUDA reports a failure.

#6
Posted 01/15/2018 10:58 AM   
Detlef, thank you very much! Great answer. Now it works!
I obviously did not understand the meaning of "appendLaunch" correctly. Now it's clear: the post-processing stage actually performs the launch. Doing the launch twice was incorrect in my code. Thank you again.

The final output is a bit brighter when the denoiser is active.
I tried to change the gamma correction, which seems to be OK.
Reducing the exposure to 0.75f or 0.25f changes it a bit, but the result is sort of "washed out".
I changed saturation and contrast to come close to the expected result
(see ColorContrastAndSaturation.jpg in the attachment).

This brightening does not occur when denoising is applied to the path tracer "glass" advanced sample. There, exposure 1.0f gives the expected result.

So I think I need another kernel launch after the built-in denoiser post-processing to execute a brightness/saturation/contrast pass after the denoising.



I have successfully combined a kernel for an output- and depth-buffer reset, the modified MeshViewer "diffuse ray tracer", and the Photon Mapper sample in such a way that both scenes are "hybrid-rendered" into the same output_buffer.
See CombinedRenderRayTracerAndPhotonMapping.jpg in the attachment.
The small "boat" is a diffuse object within the photon mapper; that renders fine, but the big boat has some noise on the edges between the photon mapper and the diffuse ray tracing, so in post-processing these should also be blurred by the denoiser.
After that denoising I then want to add another kernel for brightness/saturation/contrast.
I have not created that last one yet, because there is another problem:

When I add the denoiser the following exception occurs:
OptiX Error: 'Unknown error (Details: Function "_rtContextLaunch2D" caught exception:
Encountered a CUDA error: cudaDriver().CuMemcpyDtoHAsync( dstHost, srcDevice, byteCount, hStream.get() ) returned (716): Misaligned address)'

To reproduce this exception, simply insert the files from "TestCombined.zip" in the attachment REMOVED
optixProgressivePhotonMap.cpp
Denoising.h
DiffuseRayTracer.h

into the original OptiX photon map advanced sample,
and put the .cu and all other files in the "toEXE" subfolder into the same folder where the EXE is located.
NOTE: In my application the photon mapper also uses the depth handling, which is not in these .cu files; however, that only changes the visual appearance, and the exception is the same without it.
See the video MisAligned_Exception.mp4

Is this the tone mapper related exception you referred to:
"The tonemapper has a known bug where not having a standard launch throws an exception on a pinned memory assertion."?



I used driver 388.71 for all the tests; I will try 390.65 soon.
Currently I have commented out all rtPrintf() calls and now use the workaround from this post:
https://devtalk.nvidia.com/default/topic/545986/optix/bug-in-optix-when-using-exceptions-/post/5038050/#5038050
Once, instead of an entire system freeze, this gave me a Windows blue screen STOP error,
but in nearly all other cases it really only quits the application.



The "updateEntryPointCount"... macros don't unconditionally overwrite the value;
they only assign it when the "x" parameter value is higher than the current one, which ensures that the highest value is always assigned. At startup the initial values are all zero, so they are assigned at least once. I still use them, and everything works great now.


#7
Posted 01/15/2018 09:32 PM   
Let's try some dry analysis based on the attached images alone.

My guess is that the washed out result of the minimal code example inside the tonemapper is most likely just gamma related.
Either the tonemapper doesn't need to apply the gamma of 2.2 or the display should or should not be done with sRGB enabled.
Experiments to isolate that would be to keep the exposure at 1.0 and test what effect different gamma values have while checking what the final display routines do.
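A sketch of why one of the combinations washes out: if the tonemapper applies gamma 2.2 and the display path also encodes to sRGB (approximated here as another pure 2.2 power curve), the midtones are lifted twice:

```cpp
#include <cmath>
#include <cassert>

// Single gamma encode, as a tonemapper or an sRGB-enabled display would
// do (sRGB is approximated as a pure 2.2 power curve for illustration).
float gammaEncode( float linear, float gamma )
{
    return std::pow( linear, 1.0f / gamma );
}
```

Linear 0.5 becomes about 0.73 after one gamma-2.2 encode and about 0.87 after two, which is exactly the lifted, washed-out look.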

"but the big boat has some noise on the edges between the photon mapper and the diffuse raytracing."
The frayed edges in the CombinedRenderRayTracerAndPhotonMapping.jpg look like depth fighting of intersecting surfaces to me. Hard to tell from a still image.
If that's the case, that isn't noise from Monte Carlo sampling but a geometric artifact which would also be present in your final rendering and nothing the denoiser is meant to handle.
The necessary changes would need to happen to the scene geometry instead until that depth fighting is resolved.
Also check the scene_epsilon, which is set differently among the individual OptiX Advanced Samples. It's scene-size dependent and is meant to prevent self-intersections, especially for shadow testing rays. Set it to the smallest possible value which does not show shadow artifacts from self-intersections anymore.

Mind that no notification gets sent by the forum when you edit posts. That only happens when posting something new.

#8
Posted 01/16/2018 05:35 PM   
The attachment of this post shows all 4 combinations of applying gamma correction
in GammaCorrectionTestsUpdated2.jpg (all with Exposure = 1.0f).

The one with [Gamma 1.0f on the ToneMapper and output buffer Gamma 2.2]
and the one with [Gamma 2.2 on the ToneMapper and output buffer Gamma 1.0] are similar,
and compared to the not-denoised one, it's not "washed out". You are right.
But they are not bright enough...

I finally set the exposure to 3.0, which gives a somewhat similar result (see my later posts).

But even though saturation/contrast/brightness adjustment is no longer required,
what did I do wrong when launching additional kernels?



I updated to driver 390.65, but the same "Misaligned address" exception occurs.

Disclaimer: No warranty. No legal advice.

#9
Posted 01/17/2018 01:01 AM   
I updated the last post and replaced the attachments due to an error.

The correct gamma compare is GammaCorrectionTestsUpdated2.jpg

And TestCombined3.zip is the corrected project. REMOVED


#10
Posted 01/17/2018 02:01 AM   
Detlef Roettger said:If you have generic CUDA launch issues, the first step is to check if that happens with newer drivers as well, then follow recommendations in my earlier forum posts where I described debugging by using the exception program and enabled exceptions, rtThrow with user defined codes, and then rtPrintf to see if anything can be detected by OptiX already before CUDA reports a failure.


It also happens with the newest driver, 390.65.

The exception is raised when the diffuse ray tracer is launched the second time. The only difference from the other combined case (-r -pm) is
that an additional kernel is launched after the photon mapper. That BCS.cu file is very simple: it only reads the output_buffer and writes back to it. That shouldn't be a problem.
Or did I do something wrong with the buffer handling?


When commenting out the depth_buffer[launch_index] assignments in ClearBuffers.cu, I get:
OptiX Error: 'Unknown error (Details: Function "_rtContextLaunch2D" caught exception: Encountered a CUDA error: cudaDriver().CuMemcpyDtoHAsync( dstHost, srcDevice, byteCount, hStream.get() ) returned (700): Illegal address)'
and the app cannot be closed. Only powering the system off and on again helped;
the OS freezes when clicking on reboot.


I also tried changing the buffer types to "input output" and even the GPU_LOCAL flag, but the exception occurs in all cases.


#11
Posted 01/17/2018 05:08 AM   
Finally I found a somewhat similar result when denoising the photon mapper.

But what am I doing wrong when launching additional kernels?
Buffer handling? Payload conflicts?
I want to read from / write to several buffers for my depth handling.


#12
Posted 01/17/2018 06:37 AM   
Sorry, there are too many hardcoded assumptions inside your test program code again. There are absolute paths to the OptiX and CUDA SDKs and some more. For example, the *.cu files didn't load from the executable working directory locations at runtime with the given getCuStringFromFile() edits and your instructions. Accumulation never progresses. I only get a black window after adjusting the hardcoded assumptions.

I cannot exclude potential bugs inside the denoiser implementation, but I won't be able to look at any of your projects if they are not minimal and complete reproducers for just the failing case which live inside the OptiX SDK samples or OptiX Advanced Samples frameworks by using the same CMakeList.txt mechanism there.

The other option to provide a reproducing test case would be an OptiX API Capture (OAC).
Please see this post for instructions on how to produce that:
https://devtalk.nvidia.com/default/topic/803116/?comment=4436953
Again, the minimal failing case is all we need. The smaller the better.

#13
Posted 01/17/2018 11:55 AM   
Detlef Roettger said:Sorry, there are too many hardcoded assumptions inside your test program code again.

Sorry that the project again did not work for you. I tried to provide source code only.
I'll try to reduce more of the code to remove the dependencies.

Please post the changes you had to make for the hard-coded parts.


Detlef Roettger said: I only get a black window after adjusting the hardcoded assumptions.


Does the console show this output?
0 launch buf clear
0 launch buf clear
0 launch buf clear
0 launch buf clear

...
If this actually is the case, the app successfully loaded all required data.
NOTE: This test app (TestCombined with or without the small_set changes) loads ALL data and only renders the selected parts based on the passed command line options.
It launches the ClearBuffers.cu program successfully and continues with the next frame. This is the minimal test.
It only clears the depth buffer and the output_buffer.
In DirectX 11 you would do that with ID3D11DeviceContext::ClearRenderTargetView and
ID3D11DeviceContext::ClearDepthStencilView, but I did not find such a function in OptiX 5.0.0, so I simply created my own CUDA program "ClearBuffers.cu".

Now, by adding command line parameters, the other handling can be tested.
So several tests can be done with the same source code.

To reproduce the issue, the command line parameters -r -pm -bcs must be used.




Detlef Roettger said:The other option to provide a reproducing test case would be an OptiX API Capture (OAC).
Please see this post for instructions how to produce that:
https://devtalk.nvidia.com/default/topic/803116/?comment=4436953
Again, the minimal failing case is all we need. The smaller the better.


Thank you for the suggestion of the OAC:

The attachment contains a zip with the OAC (EDIT: FILE REMOVED) of this run (TestCombined with the small_set changes, without the denoiser):
D:\TestBuildAdvancedOptiX\bin\Debug>set OPTIX_API_CAPTURE=1
D:\TestBuildAdvancedOptiX\bin\Debug>optixProgressivePhotonMap -r -pm -bcs
top_groupDefault->getChildCount() = 1, launch rays
OptiX Error: 'Unknown error (Details: Function "_rtContextLaunch2D" caught exception: Encountered a CUDA error: cudaDriver().CuMemcpyDtoHAsync( dstHost, srcDevice, byteCount, hStream.get() ) returned (716): Misaligned address)'


#14
Posted 01/17/2018 08:34 PM   
MiniTest.zip (in the attachment) hopefully has no hard-coded dependencies. REMOVED

A test (with the denoiser ON) caused a full system freeze:
DefRay children = 1, launch rays
OptiX Error: 'Unknown error (Details: Function "_rtContextLaunch2D" caught exception: Encountered a CUDA error: cudaDriver().CuMemcpyDtoHAsync( dstHost, srcDevice, byteCount, hStream.get() ) returned (700): Illegal address)'
(The system still works, but "optixProgressivePhotonMap.exe" cannot be closed, and the freeze occurs when trying to reboot.)

This OAC was recorded: EDIT: FILE REMOVED
oac00002_denoiser.zip

(NOTE: the OAC from the previous post was recorded without the denoiser on.)


If a "return" is added in line 110 of pinhole_camera.cu, the exception
does not occur.


The denoiser can be switched off by commenting out #define DENOISER_TEST in GlobalHelpers.h.


Obviously the problem does not depend on the denoiser, and not even
on the photon mapper; it also occurs when a path tracer (glass) sample
is used instead of the photon mapper, and also when no denoising happens
(the diffuse ray tracer is not present in the test project in the attachment).


NOTE: for the diffuse ray tracer in this test sample project there is no
visual output (all the diffuse ray tracer .cu files simply do a pure "return" for the test,
so they are eliminated as the source of the exception).


#15
Posted 01/17/2018 11:46 PM   