Direct3D9 rendering of Decoded Frames without Device to Host copy

Dear All,

Is it possible to render a decoded surface with Direct3D9 directly from GPU memory, without mapping it and copying it back to CPU memory? The idea is to avoid the device-to-host copy latency.

Any help will be appreciated.

Regards
Paul

Yes, it is possible to do that and the sample apps demonstrate it!
Taking the NVDecodeD3D9 sample as a reference:
Create a D3D ARGB texture to render into. This is done in the ImageDX constructor.
Map the texture so that CUDA can access it. See ImageDX::map(), which maps the D3D texture for use by CUDA.
The decoded output is NV12, while the texture to be rendered is ARGB, so run a kernel that does the conversion (nv12toargb) and writes into the mapped texture. See the function cudaPostProcessFrame.
Use the texture for rendering into a window.
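
To make the conversion step more concrete, here is a rough CPU-side sketch in C++ of the per-pixel arithmetic an NV12 to ARGB conversion might perform. It is not the kernel from the sample. The function names (yuvToArgb, nv12ToArgb) are made up for illustration, the coefficients assume 8-bit BT.601 limited-range video, and the chroma plane is assumed to start right after `height` rows of luma (a real decoder surface may pad the height, so check the actual layout).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Clamp an intermediate result into the 0..255 range of one colour channel.
static inline std::uint8_t clampToByte(int v) {
    return static_cast<std::uint8_t>(std::min(255, std::max(0, v)));
}

// Convert one YUV sample to packed ARGB (0xAARRGGBB), alpha forced to 255.
// Assumption: 8-bit BT.601 limited range (Y in 16..235, U/V centred on 128).
std::uint32_t yuvToArgb(std::uint8_t y, std::uint8_t u, std::uint8_t v) {
    const int c = static_cast<int>(y) - 16;
    const int d = static_cast<int>(u) - 128;
    const int e = static_cast<int>(v) - 128;
    const std::uint32_t r = clampToByte((298 * c + 409 * e + 128) / 256);
    const std::uint32_t g = clampToByte((298 * c - 100 * d - 208 * e + 128) / 256);
    const std::uint32_t b = clampToByte((298 * c + 516 * d + 128) / 256);
    return 0xFF000000u | (r << 16) | (g << 8) | b;
}

// Convert a whole NV12 frame to ARGB.
// src:            luma plane (height rows) followed by the interleaved UV plane
// srcPitch:       bytes per row, shared by both planes
// dst:            output pixels, one 32-bit value each
// dstPitchPixels: output row stride, in pixels
void nv12ToArgb(const std::uint8_t* src, std::size_t srcPitch,
                int width, int height,
                std::uint32_t* dst, std::size_t dstPitchPixels) {
    // Assumption: no padding rows between the luma and chroma planes.
    const std::uint8_t* chroma = src + srcPitch * static_cast<std::size_t>(height);
    for (int y = 0; y < height; ++y) {
        const std::uint8_t* lumaRow = src + srcPitch * static_cast<std::size_t>(y);
        // Each chroma row is shared by two luma rows.
        const std::uint8_t* uvRow = chroma + srcPitch * static_cast<std::size_t>(y / 2);
        for (int x = 0; x < width; ++x) {
            // Each U,V pair is shared by two horizontally adjacent pixels.
            const std::size_t uvIndex = static_cast<std::size_t>(x / 2) * 2;
            dst[static_cast<std::size_t>(y) * dstPitchPixels + x] =
                yuvToArgb(lumaRow[x], uvRow[uvIndex], uvRow[uvIndex + 1]);
        }
    }
}
```

In a CUDA version the two loops would typically be replaced by one thread per pixel (or per pixel pair), reading from the decoded surface and writing straight into the mapped D3D texture, so the frame never has to leave GPU memory.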