HEVC / AVC encoder capabilities

Hi,

We are considering a product design based on the TX2, and have some preliminary questions about the capabilities of the HEVC and AVC encoders.

Our application transmits live HD-quality video over networks that have rapidly varying throughput. It has real-time requirements, meaning it is not acceptable for overall playback latency to vary, so the video cannot pause to buffer.

As a result, our requirements from a video encoder (both HEVC and AVC) are as follows:

  1. Must support a real-time, low latency mode - i.e. a mode where the time required to encode a 1080p60 frame is less than the nominal frame duration (~16.7ms).
  2. While configured in this mode, it must also support:
  • Infinite GOP (i.e. IPPPPP... frame structure).
  • Requesting I-frames on demand.
  • Dynamically changing the bitrate without generating an I-frame. We potentially need to change bitrate as often as every frame in order to match the available network throughput.
  • Not exceeding the requested bitrate over every sliding window of frames equal to the frame rate. e.g. For 1080p60 and a requested bitrate of B, the average bitrate over every sliding window of 60 frames must not exceed B.
  • Setting a hard cap on the maximum size of an encoded frame (we need to adjust this cap at the same time that we dynamically change the bitrate).
  • Interlaced field encoding (1080i60)
  • Changing the encode resolution, framerate, and interlaced/progressive without having to destroy/restart the encoder.
  3. The AVC encoder (while configured in a mode that satisfies all of #2) must have an R-D curve (where distortion is measured by a metric such as MS-SSIM) that equals or betters that of a best-in-class AVC encoder, such as x264, configured the same way.
  4. The HEVC encoder (while configured in a mode that satisfies all of #2) must have an R-D curve (again using MS-SSIM) that beats its own AVC performance by 30% - i.e. for the same distortion, bitrate should be at least 30% smaller.
  5. Must have an optional video preprocessing pipeline that can perform adaptive deinterlacing (1080i60 => 1080p60) and/or resizing prior to encoding.
  6. Must provide an API to add custom SEI NALUs to the encoded output (e.g. for closed captioning).

Can someone please confirm whether the HEVC and AVC encoders in the TX2 can meet the above requirements?

    Thanks,

    Hi dpks,
    We support gstreamer and tegra_multimedia_api. Please refer to the user guide:
    https://developer.nvidia.com/embedded/dlc/l4t-documentation-28-1

    Both the HEVC and AVC encoders can achieve 1080p60, but the time between starting encoding and receiving the first encoded frame is not within 16.7ms. In other words, the initial delay is longer than 16.7ms.

    Please refer to the document. We don’t support interlaced encoding. The HEVC encoder cannot encode B-frames.

    I think we don’t support it, but I would like to double-check. How is it configured/enabled in x264? Is there any sample code for reference?

    For an h264 stream that is interlace encoded, decoding and de-interlacing are performed internally and progressive frames are output.
    For a camera source, de-interlacing is not supported.

    It is not supported, but it can be implemented in the application by attaching SEI NALUs to the encoded h264 frames.
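    As a rough illustration of the application-side approach mentioned above, here is a minimal sketch that builds an H.264 SEI NAL unit in Annex B byte-stream form. It assumes the simplest SEI message type (payload type 5, user_data_unregistered: a 16-byte UUID followed by arbitrary bytes). Closed captions normally use a different message type (payload type 4 with an ITU-T T.35 payload), which is not shown. The function names are hypothetical and nothing here uses an NVIDIA API.

```python
import uuid


def _ff_coded(value: int) -> bytes:
    """Encode an SEI payload type or size: a run of 0xFF bytes plus a final byte < 255."""
    out = bytearray()
    while value >= 255:
        out.append(255)
        value -= 255
    out.append(value)
    return bytes(out)


def _add_emulation_prevention(rbsp: bytes) -> bytes:
    """Insert 0x03 after any two zero bytes that are followed by a byte <= 0x03."""
    out = bytearray()
    zeros = 0
    for b in rbsp:
        if zeros >= 2 and b <= 3:
            out.append(3)
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


def build_h264_sei_unregistered(payload_uuid: uuid.UUID, data: bytes) -> bytes:
    """Return start code + SEI NAL unit carrying one user_data_unregistered message."""
    payload = payload_uuid.bytes + data
    # payload type 5, payload size, payload, then the RBSP trailing bits (0x80)
    rbsp = _ff_coded(5) + _ff_coded(len(payload)) + payload + b"\x80"
    nal_header = b"\x06"  # nal_ref_idc = 0, nal_unit_type = 6 (SEI)
    return b"\x00\x00\x00\x01" + nal_header + _add_emulation_prevention(rbsp)


if __name__ == "__main__":
    nal = build_h264_sei_unregistered(uuid.uuid4(), b"hello")
    print(nal.hex())
```

    In a byte-stream output, the resulting NAL unit would be placed in the access unit before the first slice NAL unit of the frame it describes. How the application gets access to the encoded buffers is outside the scope of this sketch.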

    Hi DaneLLL,

    Thanks for the reply - followup questions below:

    OK - does that initial delay also occur for any of the operations I listed in the original question (i.e. I-frame on demand, bitrate change, change of encode resolution/framerate)?

    I looked at the document, but it doesn’t give definitive answers for most of my questions. Can you clarify the following:

    It looks like setting num-B-Frames=0 and iframeinterval might be usable for this, but the doc doesn’t say anything about the valid range for iframeinterval. Is there a valid value for iframeinterval that results in infinite GOP?

    It looks like vbv-size is the option that would control this, but the description isn’t specific - can you confirm that: a) It can be used to limit the max size of any one encoded frame; and b) That the value of vbv-size can be changed dynamically, every time we change the bitrate?

    I don’t see anything in the doc that answers the above items.

    An R-D curve (rate-distortion curve) is just a graph where the X-axis is bitrate and the Y-axis is distortion (as measured by an objective metric such as MS-SSIM). It is a visual way to show the quality of a video encoder over a range of bitrates.

    Essentially what I’m asking for is an objective comparison that demonstrates that the TX2’s AVC encoder generates video quality that is as good as (or better than) a best-in-class AVC encoder like x264, and that the TX2 HEVC encoder is at least 30% better than its AVC encoder.
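    To make the "30% better" criterion concrete, the following sketch estimates the bitrate saving of one encoder over another at a fixed quality level from two measured R-D curves. It assumes each curve is a list of (bitrate, quality score) points with quality increasing with bitrate (e.g. MS-SSIM, where higher is better), and it interpolates linearly in log-bitrate between measured points. The function names and the sample numbers are made up for illustration; they are not measurements of any encoder.

```python
import math


def bitrate_at_quality(curve, target_quality):
    """Interpolate the bitrate needed to reach target_quality.

    curve: list of (bitrate_bps, quality) points sorted by increasing bitrate,
    with strictly increasing quality.
    """
    for (r0, q0), (r1, q1) in zip(curve, curve[1:]):
        if q1 > q0 and q0 <= target_quality <= q1:
            t = (target_quality - q0) / (q1 - q0)
            # Interpolate in the log-bitrate domain, as is common for R-D comparisons.
            return math.exp(math.log(r0) + t * (math.log(r1) - math.log(r0)))
    raise ValueError("target quality is outside the measured range")


def bitrate_saving(reference_curve, test_curve, target_quality):
    """Fraction of bitrate saved by the test encoder at equal quality (0.30 means 30%)."""
    ref = bitrate_at_quality(reference_curve, target_quality)
    test = bitrate_at_quality(test_curve, target_quality)
    return 1.0 - test / ref


if __name__ == "__main__":
    # Hypothetical points, for illustration only.
    avc = [(4e6, 0.90), (8e6, 0.95)]
    hevc = [(2e6, 0.90), (4e6, 0.95)]
    print(f"saving at 0.925: {bitrate_saving(avc, hevc, 0.925):.0%}")
```

    A single target quality gives only one point of comparison. Averaging the saving over the overlapping quality range of both curves gives a more representative figure.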

    Hi dpks,
    Below is encoder property:

    nvidia@tegra-ubuntu:~$ gst-inspect-1.0 omxh264enc
    
    (......skip)
    
    Element Properties:
      name                : The name of the object
                            flags: readable, writable
                            String. Default: "omxh264enc-omxh264enc0"
      parent              : The parent of the object
                            flags: readable, writable
                            Object of type "GstObject"
      control-rate        : Bitrate control method
                            flags: readable, writable, changeable only in NULL or READY state
                            Enum "GstOMXVideoEncControlRate" Default: 1, "variable"
                               (0): disable          - Disable
                               (1): variable         - Variable
                               (2): constant         - Constant
                               (3): variable-skip-frames - Variable Skip Frames
                               (4): constant-skip-frames - Constant Skip Frames
      bitrate             : Target bitrate
                            flags: readable, writable, changeable in NULL, READY, PAUSED or PLAYING state
                            Unsigned Integer. Range: 0 - 4294967295 Default: 4000000
      quant-i-frames      : Quantization parameter for I-frames (0xffffffff=component default)
                            flags: readable, writable, changeable only in NULL or READY state
                            Unsigned Integer. Range: 0 - 4294967295 Default: 4294967295
      quant-p-frames      : Quantization parameter for P-frames (0xffffffff=component default)
                            flags: readable, writable, changeable only in NULL or READY state
                            Unsigned Integer. Range: 0 - 4294967295 Default: 4294967295
      quant-b-frames      : Quantization parameter for B-frames (0xffffffff=component default)
                            flags: readable, writable, changeable only in NULL or READY state
                            Unsigned Integer. Range: 0 - 4294967295 Default: 4294967295
      iframeinterval      : Encoding Intra Frame occurance frequency
                            flags: readable, writable, changeable only in NULL or READY state
                            Unsigned Integer. Range: 0 - 4294967295 Default: 0
      SliceIntraRefreshEnable: Enable Slice Intra Refresh while encoding
                            flags: readable, writable, changeable only in NULL or READY state
                            Boolean. Default: false
      SliceIntraRefreshInterval: Set SliceIntraRefreshInterval
                            flags: readable, writable, changeable only in NULL or READY state
                            Unsigned Integer. Range: 0 - 4294967295 Default: 60
      bit-packetization   : Whether or not Packet size is based upon Number Of bits
                            flags: readable, writable, changeable only in NULL or READY state
                            Boolean. Default: false
      vbv-size            : virtual buffer size = vbv-size * (bitrate/fps)
                            flags: readable, writable, changeable only in NULL or READY state
                            Unsigned Integer. Range: 0 - 30 Default: 10
      temporal-tradeoff   : Temporal Tradeoff value for encoder
                            flags: readable, writable, changeable only in NULL or READY state
                            Enum "GstOmxVideoEncTemporalTradeoffType" Default: 0, "Do not drop frames"
                               (0): Do not drop frames - GST_OMX_VIDENC_DROP_NO_FRAMES
                               (1): Drop 1 in 5 frames - GST_OMX_VIDENC_DROP_1_IN_5_FRAMES
                               (2): Drop 1 in 3 frames - GST_OMX_VIDENC_DROP_1_IN_3_FRAMES
                               (3): Drop 1 in 2 frames - GST_OMX_VIDENC_DROP_1_IN_2_FRAMES
                               (4): Drop 2 in 3 frames - GST_OMX_VIDENC_DROP_2_IN_3_FRAMES
      EnableMVBufferMeta  : Enable Motion Vector Meta data for encoding
                            flags: readable, writable, changeable only in NULL or READY state
                            Boolean. Default: false
      qp-range            : Qunatization range for P and I frame,
                             Use string with values of Qunatization Range
                             in MinQpP-MaxQpP:MinQpI-MaxQpP:MinQpB-MaxQpB order, to set the property.
                            flags: readable, writable
                            String. Default: "-1,-1:-1,-1:-1,-1"
      MeasureEncoderLatency: Enable Measure Encoder latency Per Frame
                            flags: readable, writable, changeable only in NULL or READY state
                            Boolean. Default: false
      EnableTwopassCBR    : Enable two pass CBR while encoding
                            flags: readable, writable, changeable only in NULL or READY state
                            Boolean. Default: false
      preset-level        : HW preset level for encoder
                            flags: readable, writable, changeable only in NULL or READY state
                            Enum "GstOMXVideoEncHwPreset" Default: 0, "UltraFastPreset"
                               (0): UltraFastPreset  - UltraFastPreset for high perf
                               (1): FastPreset       - FastPreset
                               (2): MediumPreset     - MediumPreset
                               (3): SlowPreset       - SlowPreset
      EnableStringentBitrate: Enable Stringent Bitrate Mode
                            flags: readable, writable, changeable only in NULL or READY state
                            Boolean. Default: false
      insert-sps-pps      : Insert H.264 SPS, PPS at every IDR frame
                            flags: readable, writable
                            Boolean. Default: false
      num-B-Frames        : Number of B Frames between two reference frames (not recommended)
                            flags: readable, writable, changeable only in NULL or READY state
                            Unsigned Integer. Range: 0 - 2 Default: 0
      slice-header-spacing: Slice Header Spacing number of macroblocks/bits in one packet
                            flags: readable, writable, changeable only in NULL or READY state
                            Unsigned Long. Range: 0 - 18446744073709551615 Default: 0
      profile             : Set profile for encode
                            flags: readable, writable, changeable only in NULL or READY state
                            Enum "GstOmxVideoEncProfileType" Default: 1, "baseline"
                               (1): baseline         - GST_OMX_VIDENC_BASELINE_PROFILE
                               (2): main             - GST_OMX_VIDENC_MAIN_PROFILE
                               (8): high             - GST_OMX_VIDENC_HIGH_PROFILE
      insert-aud          : Insert H.264 Access Unit Delimiter(AUD)
                            flags: readable, writable
                            Boolean. Default: false
    
    Element Actions:
      "force-IDR" :  void user_function (GstElement* object);
    nvidia@tegra-ubuntu:~$
    

    No, it does not occur for I-frame on demand, bitrate change, or change of encode framerate.
    Changing the encode resolution requires destroying/restarting the encoder.

    Set iframeinterval=-2

    It will be supported in the next release.

    Here is sample code for I-frames on demand:
    https://devtalk.nvidia.com/default/topic/1020558/jetson-tx1/h265-decode-failed/post/5196041/#5196041
    The same approach can also be applied to change the bitrate on the fly.
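    The linked sample is not reproduced here. As a separate sketch based only on the property listing above (the bitrate property is changeable in the PLAYING state, and the element exposes a "force-IDR" action), a helper along these lines could apply both operations to a running encoder element. The helper name and its arguments are hypothetical. It relies only on the set_property and emit methods that GObject-based elements expose in the Python bindings.

```python
def update_encoder(encoder, bitrate_bps=None, force_idr=False):
    """Apply a runtime change to an encoder element.

    encoder: any object exposing set_property(name, value) and emit(signal),
    such as a GStreamer element obtained through the PyGObject bindings.
    """
    if bitrate_bps is not None:
        # "bitrate" is listed as changeable in the PLAYING state.
        encoder.set_property("bitrate", int(bitrate_bps))
    if force_idr:
        # "force-IDR" is listed under Element Actions.
        encoder.emit("force-IDR")
```

    With PyGObject, the element could be looked up with something like pipeline.get_by_name("enc") for a pipeline description containing "omxh264enc name=enc" (the name "enc" is an assumption). The two operations are independent, so changing the bitrate alone does not request an I-frame in this sketch.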

    Do you mean that for 1080p60 at 6Mbps, each frame must be within 0.1Mbits?

    We don’t support interlace encoding.
    To change resolution, it needs to destroy/restart the encoder.
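    For reference, the vbv-size description in the property listing above ("virtual buffer size = vbv-size * (bitrate/fps)") can be evaluated directly. The sketch below only computes that formula. It makes no claim about how the encoder uses the buffer or whether it bounds the size of an individual frame. The function name is hypothetical.

```python
def vbv_buffer_bits(vbv_size, bitrate_bps, fps):
    """Virtual buffer size in bits, per the gst-inspect description of vbv-size."""
    return vbv_size * (bitrate_bps / fps)


if __name__ == "__main__":
    # Default vbv-size=10 at 6 Mbps and 60 fps: ten average-sized frames.
    print(vbv_buffer_bits(10, 6_000_000, 60))
    # Maximum listed value, vbv-size=30.
    print(vbv_buffer_bits(30, 6_000_000, 60))
```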

    Hi DaneLLL,

    Thanks, can you also show the output for omxh265enc?

    Do you have an approximate timeframe for the next release? Also, what will be the name of the option (so that we know what to keep an eye out for)?

    To clarify that last point, when dynamically changing the bitrate, we don’t want an I-frame to be generated. We’ve seen other encoders where an I-frame is generated every time the bitrate is changed.

    No, I mean that if I look at every sliding window of 60 frames over the entire stream of output, the sum of the sizes of all frames in each of those windows does not exceed 6Mbits (i.e. the average bitrate over each one-second window does not exceed 6Mbps).

    Do you have an answer to this question?

    Hi dpks,

    nvidia@tegra-ubuntu:~$ gst-inspect-1.0 omxh265enc
    (skip.....)
    Element Properties:
      name                : The name of the object
                            flags: readable, writable
                            String. Default: "omxh265enc-omxh265enc0"
      parent              : The parent of the object
                            flags: readable, writable
                            Object of type "GstObject"
      control-rate        : Bitrate control method
                            flags: readable, writable, changeable only in NULL or READY state
                            Enum "GstOMXVideoEncControlRate" Default: 1, "variable"
                               (0): disable          - Disable
                               (1): variable         - Variable
                               (2): constant         - Constant
                               (3): variable-skip-frames - Variable Skip Frames
                               (4): constant-skip-frames - Constant Skip Frames
      bitrate             : Target bitrate
                            flags: readable, writable, changeable in NULL, READY, PAUSED or PLAYING state
                            Unsigned Integer. Range: 0 - 4294967295 Default: 4000000
      quant-i-frames      : Quantization parameter for I-frames (0xffffffff=component default)
                            flags: readable, writable, changeable only in NULL or READY state
                            Unsigned Integer. Range: 0 - 4294967295 Default: 4294967295
      quant-p-frames      : Quantization parameter for P-frames (0xffffffff=component default)
                            flags: readable, writable, changeable only in NULL or READY state
                            Unsigned Integer. Range: 0 - 4294967295 Default: 4294967295
      quant-b-frames      : Quantization parameter for B-frames (0xffffffff=component default)
                            flags: readable, writable, changeable only in NULL or READY state
                            Unsigned Integer. Range: 0 - 4294967295 Default: 4294967295
      iframeinterval      : Encoding Intra Frame occurance frequency
                            flags: readable, writable, changeable only in NULL or READY state
                            Unsigned Integer. Range: 0 - 4294967295 Default: 0
      SliceIntraRefreshEnable: Enable Slice Intra Refresh while encoding
                            flags: readable, writable, changeable only in NULL or READY state
                            Boolean. Default: false
      SliceIntraRefreshInterval: Set SliceIntraRefreshInterval
                            flags: readable, writable, changeable only in NULL or READY state
                            Unsigned Integer. Range: 0 - 4294967295 Default: 60
      bit-packetization   : Whether or not Packet size is based upon Number Of bits
                            flags: readable, writable, changeable only in NULL or READY state
                            Boolean. Default: false
      vbv-size            : virtual buffer size = vbv-size * (bitrate/fps)
                            flags: readable, writable, changeable only in NULL or READY state
                            Unsigned Integer. Range: 0 - 30 Default: 10
      temporal-tradeoff   : Temporal Tradeoff value for encoder
                            flags: readable, writable, changeable only in NULL or READY state
                            Enum "GstOmxVideoEncTemporalTradeoffType" Default: 0, "Do not drop frames"
                               (0): Do not drop frames - GST_OMX_VIDENC_DROP_NO_FRAMES
                               (1): Drop 1 in 5 frames - GST_OMX_VIDENC_DROP_1_IN_5_FRAMES
                               (2): Drop 1 in 3 frames - GST_OMX_VIDENC_DROP_1_IN_3_FRAMES
                               (3): Drop 1 in 2 frames - GST_OMX_VIDENC_DROP_1_IN_2_FRAMES
                               (4): Drop 2 in 3 frames - GST_OMX_VIDENC_DROP_2_IN_3_FRAMES
      EnableMVBufferMeta  : Enable Motion Vector Meta data for encoding
                            flags: readable, writable, changeable only in NULL or READY state
                            Boolean. Default: false
      qp-range            : Qunatization range for P and I frame,
                             Use string with values of Qunatization Range
                             in MinQpP-MaxQpP:MinQpI-MaxQpP:MinQpB-MaxQpB order, to set the property.
                            flags: readable, writable
                            String. Default: "-1,-1:-1,-1:-1,-1"
      MeasureEncoderLatency: Enable Measure Encoder latency Per Frame
                            flags: readable, writable, changeable only in NULL or READY state
                            Boolean. Default: false
      EnableTwopassCBR    : Enable two pass CBR while encoding
                            flags: readable, writable, changeable only in NULL or READY state
                            Boolean. Default: false
      preset-level        : HW preset level for encoder
                            flags: readable, writable, changeable only in NULL or READY state
                            Enum "GstOMXVideoEncHwPreset" Default: 0, "UltraFastPreset"
                               (0): UltraFastPreset  - UltraFastPreset for high perf
                               (1): FastPreset       - FastPreset
                               (2): MediumPreset     - MediumPreset
                               (3): SlowPreset       - SlowPreset
      EnableStringentBitrate: Enable Stringent Bitrate Mode
                            flags: readable, writable, changeable only in NULL or READY state
                            Boolean. Default: false
      slice-header-spacing: Slice Header Spacing number of macroblocks/bits in one packet
                            flags: readable, writable, changeable only in NULL or READY state
                            Unsigned Long. Range: 0 - 18446744073709551615 Default: 0
      insert-aud          : Insert H.265 Access Unit Delimiter(AUD)
                            flags: readable, writable
                            Boolean. Default: false
    
    Element Actions:
      "force-IDR" :  void user_function (GstElement* object);
    

    We are working on it. The property is named ‘peak-bitrate’, which is valid in VBR mode.

    In our implementation, inserting I-frame and changing bitrate are two individual functions.

    It should work in CBR mode. In CBR mode, the encoder generates a bitrate within +/-5% of the target. For VBR, the max bitrate setting will limit the bitrate as requested, but the encoder could generate a much lower bitrate. We recommend using CBR mode and setting the bitrate to 90% of the desired bitrate.
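    The arithmetic behind that recommendation can be sketched as follows, taking the +/-5% figure and the 90% rule at face value. The function name is hypothetical, and the averaging window over which the +/-5% holds is not stated in the thread.

```python
def cbr_setting(desired_bps, headroom=0.90, tolerance=0.05):
    """Return (bitrate to configure, expected minimum, expected maximum) in bits/sec."""
    target = desired_bps * headroom
    return target, target * (1 - tolerance), target * (1 + tolerance)


if __name__ == "__main__":
    target, low, high = cbr_setting(6_000_000)
    # For a 6 Mbps ceiling: configure 5.4 Mbps, expect roughly 5.13 to 5.67 Mbps.
    print(round(target), round(low), round(high))
```

    Under these assumptions the upper end of the range (about 5.67 Mbps) stays below the 6 Mbps ceiling, which is presumably the reason for the 10% margin.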

    We don’t have an R-D curve for public reference.

    Hi DaneLLL,

    Does the name ‘peak-bitrate’ imply that the units are bits/sec? If so, over what time period is the limit enforced? We are looking for it to be enforced on a per-frame basis, which is what the same setting on other encoders does.

    Even if you can’t share a full curve, can you at least share the min/max range of bitrate improvement we can expect with the TX2 HEVC encoder, as compared to its AVC encoder, when both are configured for low latency/realtime?

    Do you mean that for 1080p60 at 6Mbps, if peak-bitrate is set to 9Mbps, each frame must be within 0.15Mbits?

    Comparison of AVC and HEVC encoders
    [input stream]
    https://media.xiph.org/video/derf/y4m/park_joy_1080p50.y4m
    [configuration]
    Hw preset level = FastPreset

    [input stream]
    https://media.xiph.org/video/derf/y4m/pedestrian_area_1080p25.y4m
    [configuration]
    Hw preset level = FastPreset

    [External media: results image not preserved]

    No, I mean other encoders use units of “bits” (or bytes) for this setting, rather than bits/sec, and the setting is named something like “MaxFrameSize”. So for 1080p60 6Mbps, we can set the MaxFrameSize to something like 50 kilobytes, and the encoder then satisfies the conditions:

    1. No single frame exceeds 50 kilobytes in size
    2. Over every sliding window of 60 frames, the sum of the sizes of all frames within each window does not exceed 6Mbits (i.e. 6Mbps averaged over the window)
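    The two conditions above can be checked offline against the frame sizes of a recorded stream. The sketch below is one way to do that. It assumes frame sizes are given in bytes, that a window of fps consecutive frames spans one second, and that "50 kilobytes" means 50,000 bytes. The function name and the tuple format of the result are hypothetical.

```python
def check_stream(frame_sizes_bytes, fps, bitrate_bps, max_frame_bytes):
    """Return a list of violations found in a sequence of encoded frame sizes.

    ("frame", i)  : frame i is larger than max_frame_bytes
    ("window", s) : the fps frames starting at frame s total more than bitrate_bps bits
    """
    violations = []
    window_bits = 0
    for i, size in enumerate(frame_sizes_bytes):
        if size > max_frame_bytes:
            violations.append(("frame", i))
        # Maintain a running sum over the most recent fps frames.
        window_bits += size * 8
        if i >= fps:
            window_bits -= frame_sizes_bytes[i - fps] * 8
        if i >= fps - 1 and window_bits > bitrate_bps:
            violations.append(("window", i - fps + 1))
    return violations


if __name__ == "__main__":
    # 120 frames of 12,500 bytes each is exactly 6 Mbit per 60-frame window.
    sizes = [12_500] * 120
    print(check_stream(sizes, fps=60, bitrate_bps=6_000_000, max_frame_bytes=50_000))
```

    An empty list means both conditions hold for every frame and every full window in the sequence.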

    Excellent, thanks. Since PSNR has poor correlation with how humans perceive video quality, do you have the same table using MS-SSIM as the metric rather than PSNR? If not, can you provide the encoded video files at each bitrate for download, and I can calculate it myself?

    Hi dpks,

    This is not supported. peak-bitrate is in bits/sec

    You may do this by running

    gst-launch-1.0 filesrc location= pedestrian_area_1080p25.y4m ! y4mdec ! omxh264enc preset-level=1 bitrate=16000000 ! qtmux ! filesink location=test.mp4
    

    Then repeat with the bitrate set to 4M, 6M, 8M, 10M, 12M, 14M, and 16M for all cases.
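    To avoid typing the command once per bitrate, the command lines can be generated from a short script. The sketch below only builds the strings and does not run them. It reuses the pipeline from the command above. The output file naming scheme and the function name are assumptions.

```python
BITRATES_MBPS = [4, 6, 8, 10, 12, 14, 16]


def encode_command(input_y4m, bitrate_mbps, preset_level=1):
    """Build the gst-launch-1.0 command line for one bitrate point."""
    output = f"h264_{bitrate_mbps}M.mp4"  # hypothetical naming scheme
    return (
        f"gst-launch-1.0 filesrc location={input_y4m} ! y4mdec ! "
        f"omxh264enc preset-level={preset_level} bitrate={bitrate_mbps * 1_000_000} ! "
        f"qtmux ! filesink location={output}"
    )


if __name__ == "__main__":
    for mbps in BITRATES_MBPS:
        print(encode_command("pedestrian_area_1080p25.y4m", mbps))
```

    The printed lines can be pasted into a shell on the device. The resulting files can then be decoded and scored with whichever quality metric is preferred.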

    We hope you can get a TX1 or TX2 developer kit and try these suggestions.

    Hi DaneLLL,

    Sure, but then it goes back to my original question - over what time period is the peak enforced? Or to ask it another way, if bitrate=6Mbps and peak-bitrate=9Mbps, what is the worst case in terms of frame sizes that I can expect for a sequence of 60 frames?

    Hi dpks,

    This functionality is not supported in the current release (r28.1), so you may want to wait and try it in the next release.