vDWS Output Bandwidth to End User

In a document published by NVIDIA titled “NVIDIA QUADRO VIRTUAL DATA CENTER WORKSTATION” (searchable file name 185532_Nvidia_Quadro_vDWS_SolutionOverview_NV_US_WEB.pdf), page 3, under the heading “NVIDIA Quadro vDWS Features”, states that the maximum hardware-rendered display is four 4K displays at 4096x2160 resolution.

When that output is processed by Quadro vDWS and pushed out to the user, whether local or remote, what is the bandwidth of the four 4K streams after they leave the server? At what refresh rate? And how is that data rate measured, e.g. in pps (packets per second)?

NVIDIA finally tells the truth about the “Turing”-generation encoder: “Pascal” is about 50%-100% faster in low-latency scenarios because it has two encoders on the chip (see https://developer.nvidia.com/nvidia-video-codec-sdk, hidden under “Additional Performance Results”).

FYI: Be careful with the new “NVIDIA vGPU Software 10” release (Linux driver 440.43). It is the 8K-resolution release with “NVIDIA engineered” limits (see https://gridforums.nvidia.com/default/topic/258/nvidia-virtual-gpu-technology/documentation-for-vgpu-configs/post/16127/#16127). There are “changes” in the low-latency encoder behavior, and the problem shows up with low-bandwidth, low-frame-rate transfers. For example, if you press “any key”, the output can be delayed by up to 12 frames in the decoder, which is 2 seconds at 6 FPS (tested with the Raspberry Pi hardware OMX decoder). This is a very bad UX!

Older drivers mark NVENC-encoded H.264 (with NV_ENC_PRESET_LOW_LATENCY_DEFAULT_GUID) with the following SPS/VUI (from a video stream analyzer):

...
 num_ref_frames : 1
 vui_parameters_present_flag : 1
...
 bitstream_restriction_flag : 1
   motion_vectors_over_pic_boundaries_flag : 1
   max_bytes_per_pic_denom : 0
   max_bits_per_mb_denom : 0
   log2_max_mv_length_horizontal : 1
   log2_max_mv_length_vertical : 1
   num_reorder_frames : 0
   max_dec_frame_buffering : 1
...

New drivers, with the same binary using NVENC:

...
 num_ref_frames : 3
 vui_parameters_present_flag : 1
...
 bitstream_restriction_flag : 0
   motion_vectors_over_pic_boundaries_flag : 0 
   max_bytes_per_pic_denom : 0 
   max_bits_per_mb_denom : 0 
   log2_max_mv_length_horizontal : 0 
   log2_max_mv_length_vertical : 0 
   num_reorder_frames : 0 
   max_dec_frame_buffering : 0 
...

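So the decoder is allowed to buffer decoded frames before output: with bitstream_restriction_flag = 0, max_dec_frame_buffering is not signaled at all and is inferred from the level's DPB size limit (MaxDpbFrames) instead of being 1. The Raspberry Pi uses 9-12 additional buffered frames, and this cannot be changed with OMX_IndexParamImagePoolSize!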

Now you must explicitly enable “bitstreamRestrictionFlag” and set “numRefL0” on the encoder side to roll back to the old low-latency encoder/decoder behavior (and build against the headers from the new Video Codec SDK):

...encodeCodecConfig.h264Config.h264VUIParameters.bitstreamRestrictionFlag = 1;
...encodeCodecConfig.h264Config.numRefL0 = NV_ENC_NUM_REF_FRAMES_1;

PF 2020 (Happy New Year)!