Unordered Access Views and poor performance with SLI
I posted this question last week in the SLI forum, but I don't think that was the right place for it, so I'm trying GPU Computing.

I'm currently writing some code that uses UAVs for order-independent transparency.
I've noticed that binding the UAVs in DX11 with OMSetRenderTargetsAndUnorderedAccessViews causes significant performance degradation when SLI is enabled and the application runs in windowed mode.
Judging from the SLI developer documentation from three years ago, I'm guessing that I have to clear the UAVs to some magic value to prevent them from being copied over the SLI bridge.
I have tried clearing the UAVs to 0 and to MAX_UINT, but the performance problem remains.

Is there a magic value I have to clear the UAVs to?
Is this a known DX11 SLI bug?
Will the developer team be issuing a new "best practices guide" for SLI and the new Fermi-class hardware?

Is there somewhere else I should be posting this?

Hardware:
2x GTX 480 (driver release 258.96)
Intel Core i7-980X (stock)
64-bit Windows 7

Thanks.

matt

GPU: EVGA GTX690. Factory clock.
CPU: i7-3970X. Factory clock.
Memory: 32 GB
OS: Windows 7 Ultimate 64 Bit
Audio: Realtek HD Elcheapo.

#1
Posted 09/08/2010 03:56 PM   
DX Compute does work with SLI. SLI almost always works in AFR (alternate frame rendering) mode these days, so you should get a speed up as long as your frames are completely independent.

It's hard to say what your problem is without seeing the code. You can post here, or register as a developer and file a bug there.

#3
Posted 09/09/2010 08:31 AM   
[quote name='Simon Green' post='1114832' date='Sep 9 2010, 08:31 AM']DX Compute does work with SLI. SLI almost always works in AFR (alternate frame rendering) mode these days, so you should get a speed up as long as your frames are completely independent.

It's hard to say what your problem is without seeing the code. You can post here, or register as a developer and file a bug there.[/quote]

Hi Simon,
Thanks for the response. This is actually quite easy to reproduce.
Yakiimo3d (http://www.yakiimo3d.com/) has a demo with source at
http://yakiimo3d.codeplex.com/releases/view/49570 that can be used to demonstrate, and maybe debug, the problem.

Performance results:
Windowed 640x480, SLI disabled: 3000 fps
Windowed 640x480, AFR1: 90 fps
Windowed 640x480, AFR2: 90 fps
Fullscreen 2560x1600, AFR1: 6.97 fps
Fullscreen 2560x1600, AFR2: 590 fps

In my application I'm doing a much heavier render than this demo, and I go from 90 fps fullscreen at 2560x1600 to 10 fps windowed at 640x480 with AFR2, and to a fraction of a frame per second (0.3 to 0.4 fps) with AFR1.
If I disable SLI, I get about 110 fps windowed with the same scene, though I am currently bound in other places too.
As I stated earlier, I think the performance regression can be traced back to the call to OMSetRenderTargetsAndUnorderedAccessViews.

If I disable the UnorderedAccessView bind:
Windowed, AFR1: 1945 fps
Windowed, AFR2: 1943 fps
Windowed, SLI disabled: 4500 fps
Fullscreen 2560x1600, SLI disabled: 550 fps
Fullscreen 2560x1600, AFR1: 780 fps
Fullscreen 2560x1600, AFR2: 780 fps

I hope that this helps.

> matt

#5
Posted 09/11/2010 05:09 PM   
[quote name='mdavidson' date='11 September 2010 - 10:09 AM' timestamp='1284224967' post='1115820']
Hi Simon,
Thanks for the response. This is actually quite easy to reproduce.
Yakiimo3d (http://www.yakiimo3d.com/) has a demo with source at
http://yakiimo3d.codeplex.com/releases/view/49570 that can be used to demonstrate, and maybe debug, the problem.

Performance results:
Windowed 640x480, SLI disabled: 3000 fps
Windowed 640x480, AFR1: 90 fps
Windowed 640x480, AFR2: 90 fps
Fullscreen 2560x1600, AFR1: 6.97 fps
Fullscreen 2560x1600, AFR2: 590 fps

In my application I'm doing a much heavier render than this demo, and I go from 90 fps fullscreen at 2560x1600 to 10 fps windowed at 640x480 with AFR2, and to a fraction of a frame per second (0.3 to 0.4 fps) with AFR1.
If I disable SLI, I get about 110 fps windowed with the same scene, though I am currently bound in other places too.
As I stated earlier, I think the performance regression can be traced back to the call to OMSetRenderTargetsAndUnorderedAccessViews.

If I disable the UnorderedAccessView bind:
Windowed, AFR1: 1945 fps
Windowed, AFR2: 1943 fps
Windowed, SLI disabled: 4500 fps
Fullscreen 2560x1600, SLI disabled: 550 fps
Fullscreen 2560x1600, AFR1: 780 fps
Fullscreen 2560x1600, AFR2: 780 fps

I hope that this helps.

> matt
[/quote]

#7
Posted 09/06/2011 04:35 AM   