Trends and Recent Developments in Video Coding Standardization

Authors: Jens-Rainer Ohm and Mathias Wien.
Slides of our tutorial at ICME 2018



  1. 1. Trends and Recent Developments in Video Coding Standardization ICME 2018 Tutorial, San Diego, 23.07.2018 Jens-Rainer Ohm Mathias Wien Institute of Communication Engineering Institute of Imaging and Computer Vision RWTH Aachen University, Germany RWTH Aachen University, Germany ohm@ient.rwth-aachen.de wien@lfb.rwth-aachen.de
  2. 2. Trends and Recent Developments in Video Coding Standardization | Tutorial at ICME 2018 | San Diego, CA, USA | Jens-Rainer Ohm and Mathias Wien | RWTH Aachen University | Institut für Nachrichtentechnik | Lehrstuhl für Bildverarbeitung | 23.07.2018. Outline: 1. Introduction and history of video coding standardization (Jens) 2. Source formats and resolutions (Mathias) 3. State of the art in video compression (Mathias) 4. Versatile Video Coding (Jens) 5. Exploratory trends and perspectives (Jens) 6. Coding tools for multi-camera captures (Jens) 7. Summary and outlook
  3. 3. Part I: Introduction and history of video coding standardization
  4. 4. Video coding standardization organisations • ISO/IEC MPEG = “Moving Picture Experts Group” (ISO/IEC JTC 1/SC 29/WG 11 = International Organization for Standardization and International Electrotechnical Commission, Joint Technical Committee 1, Subcommittee 29, Working Group 11) • ITU-T VCEG = “Video Coding Experts Group” (ITU-T SG16/Q6 = International Telecommunication Union, Telecommunication Standardization Sector (ITU-T, a United Nations organization, formerly CCITT), Study Group 16, Working Party 3, Question 6) • JVT = “Joint Video Team”: collaborative team of MPEG & VCEG, responsible for developing AVC (discontinued in 2009) • JCT-VC = “Joint Collaborative Team on Video Coding”: team of MPEG & VCEG, responsible for developing HEVC (established January 2010) • JVET = “Joint Video Experts Team”: exploring potential for new technology beyond HEVC (established Oct. 2015 as Joint Video Exploration Team, renamed Apr. 2018)
  5. 5. History of international video coding standardization (1985 to 2020) • ITU-T: H.120 (1984-1988); H.261 (1990+); H.263/+/++ (1995-2000+) • ISO/IEC: MPEG-1 (1993); MPEG-4 Visual (1998-2001+) • Joint ISO/IEC and ITU-T: H.262 / 13818-2 (MPEG-2) (1994/95-1998+); H.264 / 14496-10 AVC (2003-2018+) (Advanced Video Coding, developed by JVT); H.265 / 23008-2 HEVC (2013-2018+) (High Efficiency Video Coding, developed by JCT-VC); H.26x / 23090-3 VVC (2020-...) (Versatile Video Coding, to be developed by JVET) • Application range over time: videotelephony, computer, SD, HD, 4K UHD, 8K, 360, ...
  6. 6. The scope of video standardization • Only the specifications of the bitstream, syntax, and decoder are standardized: permits optimization beyond the obvious; permits complexity reduction for implementability; provides no guarantees of quality [Figure: processing chain Source → Pre-Processing → Encoding → Decoding → Post-Processing & Error Recovery → Destination, with the scope of the standard marked]
  7. 7. Hybrid Coding Concept: basis of every standard since H.261
  8. 8. Hybrid video coding concept [Figure: input signal, current stage] Used since the early days of video compression standards, e.g. H.261, MPEG-1/-2/-4, H.263, AVS, H.264/AVC, HEVC, and also in most proprietary codecs (VC1, VP8 etc.)
  9. 9. Hybrid video coding concept [Figure: input signal → DCT]
  10. 10. Hybrid video coding concept [Figure: input signal → DCT → quantized → bitstream 010011101001…]
  11. 11. Hybrid video coding concept [Figure: input signal → DCT → quantized → bitstream 010011101001… → inverse DCT]
  12. 12. Hybrid video coding concept [Figure: next input signal vs. reconstruction]
  13. 13. Hybrid video coding concept [Figure: next input signal vs. reconstruction, with bitstream 010011101001…]
  14. 14. Hybrid video coding concept [Figure: input signal – MC prediction = residual, shown next to the residual w/o MC]
  15. 15. Hybrid video coding concept [Figure: residual → DCT]
  16. 16. Hybrid video coding concept [Figure: residual → DCT → quantized → bitstream 010011101001…]
  17. 17. Hybrid video coding concept [Figure: residual → DCT → quantized → inverse DCT]
  18. 18. Hybrid video coding concept [Figure: residual + MC prediction = reconstruction, etc.]
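The loop walked through on the preceding slides (predict, form the residual, quantize, reconstruct, reuse the reconstruction as the next reference) can be sketched in a few lines. This is a minimal illustration under strong assumptions that are not in the slides: the transform stage is omitted, the "motion-compensated" prediction is simply the previous reconstruction (zero motion), and the names `encode_sequence`, `decode_sequence`, and `step` are hypothetical.

```python
def encode_sequence(frames, step):
    """Closed-loop predictive coding of equally sized lists of samples.

    Assumption: prediction = previous reconstruction, no transform stage.
    """
    reference = [0] * len(frames[0])  # start state also known to the decoder
    levels_per_frame = []
    reconstructions = []
    for frame in frames:
        prediction = reference
        residual = [s - p for s, p in zip(frame, prediction)]
        # Quantization to integer levels is the only lossy step.
        levels = [round(r / step) for r in residual]
        # The encoder replicates the decoder so both use the same reference.
        recon = [p + lv * step for p, lv in zip(prediction, levels)]
        levels_per_frame.append(levels)
        reconstructions.append(recon)
        reference = recon
    return levels_per_frame, reconstructions


def decode_sequence(levels_per_frame, step):
    """Decoder: adds the dequantized residual to its own prediction."""
    reference = [0] * len(levels_per_frame[0])
    decoded = []
    for levels in levels_per_frame:
        recon = [p + lv * step for p, lv in zip(reference, levels)]
        decoded.append(recon)
        reference = recon
    return decoded


frames = [[10, 20, 30], [12, 21, 33]]
levels, encoder_recon = encode_sequence(frames, step=4)
decoder_recon = decode_sequence(levels, step=4)
```

Because the encoder predicts from its own reconstruction rather than from the original input, encoder and decoder stay in sync and the quantization error does not accumulate from picture to picture.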
  19. 19. Performance history of standard generations [Figure: PSNR (dB) versus bit rate (kbit/s) for Foreman, QCIF, 10 Hz, 100 frames; curves for JPEG, H.261, H.262/MPEG-2, H.263 + MPEG-4 Visual, AVC, and HEVC; annotation at 35 dB: bit-rate reduction 50%]
  20. 20. What made this happen over the years? • Improvements of motion compensation: variable partitions & merged partitions; flexible frame referencing & combined prediction; sub-sample precision and high-performance sub-sample interpolation; more efficient vector prediction & coding, supporting large vector ranges • Improvements of 2D coding: efficient intra prediction and intra mode coding; design of transform bases and variable transform block sizes • Loop filtering for artifact reduction: deblocking, sample-adaptive offset • Improvements of entropy coding: flexible binarization of syntax elements; arithmetic coding; adaptation and usage of context information • These are coupled with encoder optimization: rate-distortion optimization (spend bits where they give the best benefit in terms of distortion reduction); adaptive rate control and perceptually tuned quantization
  21. 21. Reference picture structures • Group of Picture (GoP) structures allowing random access (used since MPEG-1) • Bi-(directional) prediction for better compression performance (used since MPEG-1) [Figure: (a) uni-directional prediction and (b) bi-directional prediction over pictures 1 to 7, with previous-picture and pre-previous-picture references; picture types I|P, P, B]
  22. 22. Reference picture structures • Hierarchical prediction structures for frame rate scalability and further improved compression performance (used in AVC and HEVC) [Figure: hierarchical structures with prediction levels 0 to 3, (a) using P pictures and (b) using B pictures, anchored by I/P pictures at level 0]
  23. 23. Coder control • Coder control is a non-normative part of video codecs • Choose coding parameters at the encoder side: “What part of the video signal should be coded using what method and parameter settings?” • Constrained problem: $\mathbf{p}_{\mathrm{opt}} = \arg\min_{\mathbf{p}} D(\mathbf{p}) \;\text{s.t.}\; R(\mathbf{p}) \le R_{\mathrm{Target}}$ • Unconstrained Lagrangian formulation: $\mathbf{p}_{\mathrm{opt}} = \arg\min_{\mathbf{p}} \left[ D(\mathbf{p}) + \lambda R(\mathbf{p}) \right]$ • $\lambda$ depends on the slope of the rate-distortion function: small value: high rate, low distortion; high value: low rate, high distortion • Can be applied in motion parameter estimation, mode decision, transform coefficient quantization, …, typically with a set relationship between $\lambda$ and the QP value • Notation: $D$ = distortion, $R$ = rate, $\mathbf{p}$ = parameter vector
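As a small illustration of the unconstrained Lagrangian formulation, the sketch below picks, among a few candidate coding modes, the one minimizing J = D + λ·R. The candidate names and their distortion and rate numbers are made up for illustration, and `choose_mode` is a hypothetical helper, not part of any reference encoder.

```python
def choose_mode(candidates, lam):
    """Return the candidate that minimizes the Lagrangian cost D + lam * R."""
    return min(candidates, key=lambda c: c["D"] + lam * c["R"])


# Hypothetical (distortion, rate) measurements for one block.
candidates = [
    {"name": "skip", "D": 120.0, "R": 1},
    {"name": "inter", "D": 40.0, "R": 24},
    {"name": "intra", "D": 25.0, "R": 60},
]

low_lambda_choice = choose_mode(candidates, lam=0.1)["name"]
mid_lambda_choice = choose_mode(candidates, lam=1.0)["name"]
high_lambda_choice = choose_mode(candidates, lam=10.0)["name"]
```

With these numbers a small λ selects the expensive, low-distortion mode and a large λ selects the cheap, high-distortion mode, which matches the small-value and high-value cases listed on the slide.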
  24. 24. Motivation for improved video compression • Video resolution is continually increasing: HD established, UHD (4Kx2K, 8Kx4K) appearing; mobile services moving towards HD/UHD; stereo, multi-view, 360° video • Devices are available to record and display ultra-high resolutions and are becoming affordable for home and mobile consumers • Video has multiple dimensions along which the data rate grows: frame resolution, temporal resolution; color resolution, bit depth; multi-view; visible distortion is still an issue with existing networks • The necessary video data rate grows faster than feasible network transport capacities: better video compression (than current HEVC) is needed in the next decade, even after availability of 5G
  25. 25. Part II: Source formats and resolutions
  26. 26. Structure of a Video Sequence • Sequence of pictures successively captured or rendered • Progressive and interlaced formats • Picture rate measured in pictures per second, unit hertz (Hz) • Minimum picture rate of 24 Hz for the impression of fluent motion [Po12]: Standard Definition TV at 50/60 Hz interlaced; High Definition (HD) video at 50/60 Hz progressive; Ultra HD (UHD) video up to 120 Hz; up to 300 Hz considered. [Po12] Charles Poynton. Digital Video and HD: Algorithms and Interfaces. Waltham, MA, USA: Morgan Kaufmann Publishers, 2012.
  27. 27. Pictures, Frames, and Fields • Picture: set of arrays or a single array of samples with intensity values; monochrome picture: single intensity array; color video: usually three intensity arrays ⇒ three color components representing the color; color sample (all three components) also referred to as a pixel (derived from picture element, sometimes also denoted as pel); optional alpha channel to indicate opaqueness (transparency) for mixing applications
  28. 28. Pixel Shape • Picture: set of pixel lines with a defined number of pixels per line; shape of pixels not necessarily square, depends on picture format [Figure: pixel shape examples]
  29. 29. Chroma Sub-Sampling • Human visual system less sensitive to color than to structure and texture ⇒ full-resolution luma, lower-resolution chroma • Chroma sub-sampling types commonly specified by the relation between the number of luma and chroma samples: YCbCr Y : X1 : X2 • Y: number of luma pixels • Sub-sampling format of the chroma components specified by X1 and X2 • X1: horizontal sub-sampling • X2 = 0: vertical sub-sampling identical to horizontal sub-sampling • X2 = X1: no vertical sub-sampling
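The Y : X1 : X2 rules above can be turned into a chroma plane size as follows. This is a sketch under assumptions: it implements only the two cases stated on the slide (X2 = 0 and X2 = X1), assumes the luma dimensions divide evenly, and the function name `chroma_plane_size` is hypothetical.

```python
def chroma_plane_size(luma_width, luma_height, y, x1, x2):
    """Chroma plane size for a Y:X1:X2 format, using only the slide's two rules."""
    if x1 <= 0 or y % x1 != 0:
        raise ValueError("X1 must be a positive divisor of Y")
    horizontal_factor = y // x1  # e.g. 4:2:x means half horizontal resolution
    if x2 == 0:
        vertical_factor = horizontal_factor  # same sub-sampling vertically
    elif x2 == x1:
        vertical_factor = 1  # no vertical sub-sampling
    else:
        raise ValueError("only X2 = 0 and X2 = X1 are covered by this sketch")
    # Assumption: luma dimensions are multiples of the sub-sampling factors.
    return luma_width // horizontal_factor, luma_height // vertical_factor


size_420 = chroma_plane_size(1920, 1080, 4, 2, 0)
size_422 = chroma_plane_size(1920, 1080, 4, 2, 2)
size_444 = chroma_plane_size(1920, 1080, 4, 4, 4)
```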
  30. 30. Representation of Color • Color impression: visible range of the spectrum from 380 nm to 780 nm; impression of color: intensity density distribution over the visible spectral range; colors corresponding to a single wavelength: spectral colors or primary colors; the human visual system has three color receptors (cone cells) with maximum sensitivity in the wavelength areas of red, green and blue; additional ’gray-scale’ receptors (rod cells), responsive in low lighting conditions. Picture source: Wikipedia, artwork by Holly Fischer
  31. 31. The CIE Standard Observer • Visual perception split into perception of brightness (light and dark) and chromaticity (color impression): brightness is driven by the summarized intensity of the observed spectrum; color impression is driven by the shape of the intensity distribution • Functional expression to represent perceived color by a mathematical description, first standardized in the CIE 1931 Standard Observer • Color as a point in a three-dimensional XYZ space • X, Y, Z values derived from the observed spectrum • Three color matching functions. CIE: Commission internationale de l’éclairage, http://www.cie.co.at. Standard Observer specified in ISO 11664-1
  32. 32. The CIE Standard Observer [figure-only slide]
  33. 33. The CIE Standard Observer • Normalization to express the chromaticity independently of the observed brightness: $x = X/(X+Y+Z)$, $y = Y/(X+Y+Z)$, $z = Z/(X+Y+Z)$ • Since $x + y + z = 1$, therefore $z = 1 - x - y$ • Chromaticity specified by the (x,y)-pair • Definition of a standardized white point, e.g. ’white C’, ’white D65’. [Po12] Charles Poynton. Digital Video and HD: Algorithms and Interfaces. Waltham, MA, USA: Morgan Kaufmann Publishers, 2012. [Hu04] Robert G.W. Hunt. The Reproduction of Colour. 6th ed. Chichester, West Sussex, England: Wiley-VCH, 2004.
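The normalization to an (x,y) chromaticity pair is the standard CIE projection x = X/(X+Y+Z), y = Y/(X+Y+Z). A minimal sketch follows; the function name `xy_chromaticity` is hypothetical, and the D65 tristimulus values used in the example (for the 2° observer, with Y normalized to 1) are commonly quoted figures that do not come from the slides.

```python
def xy_chromaticity(X, Y, Z):
    """Project CIE XYZ tristimulus values onto the (x, y) chromaticity plane."""
    total = X + Y + Z
    if total == 0:
        raise ValueError("chromaticity is undefined for X + Y + Z = 0")
    return X / total, Y / total


# Commonly quoted XYZ values of the D65 white point (assumption, see lead-in).
d65_x, d65_y = xy_chromaticity(0.95047, 1.0, 1.08883)

# Scaling the brightness leaves the chromaticity unchanged.
bright_x, bright_y = xy_chromaticity(2 * 0.95047, 2 * 1.0, 2 * 1.08883)
```

The second call shows why the normalization separates chromaticity from brightness: doubling all three tristimulus values gives the same (x,y) pair.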
  34. 34. Color Spaces: Standard and High Dynamic Range / Wide Color Gamut • Colour space: Standard Dynamic Range (SDR) video: contrast approx. 1000 : 1, ITU-R BT.709 colour space; High Dynamic Range (HDR) video: contrast approx. 1000000 : 1, ITU-R BT.2100 colour space. Figure from N15084: Ajay Luthra, Edouard Francois, and Walt Husak (Eds.). Requirements and Use Cases for HDR and WCG Content Coding. Doc. N15084. Geneva, CH, 111th meeting: MPEG, Feb. 2015. ITU-R BT.709: Parameter values for the HDTV standards for production and international programme exchange. ITU-R, Apr. 2004. URL: http://www.itu.int/rec/R-REC-BT.709/en. ITU-R BT.2020: Parameter values for ultra-high definition television systems for production and international programme exchange. ITU-R, Oct. 2015. URL: http://www.itu.int/rec/R-REC-BT.2020/en. ITU-R BT.2100: Image parameter values for high dynamic range television for use in production and international programme exchange. ITU-R, Jun. 2017. URL: http://www.itu.int/rec/R-REC-BT.2100-1-201706-I/en
  35. 35. Color Spaces: Standard and High Dynamic Range / Wide Color Gamut [figure-only slide] Figure from: Ajay Luthra, Edouard Francois, and Walt Husak (Eds.). Requirements and Use Cases for HDR and WCG Content Coding. Doc. N15084. Geneva, CH, 111th meeting: MPEG, Feb. 2015.
  36. 36. HDR/WCG Conversion Practices: Scope. ITU-T H Suppl. 15 | ISO/IEC TR 23008-14, Conversion and Coding Practices for HDR/WCG Y′CbCr 4:2:0 Video with PQ Transfer Characteristics. ITU-T H Suppl. 18 | ISO/IEC TR 23008-15, Signalling, backward compatibility and display adaptation for HDR/WCG video coding. Figure from: Jonatan Samuelsson et al. Conversion and Coding Practices for HDR/WCG Y′CbCr 4:2:0 Video with PQ Transfer Characteristics (Draft 4). Doc. JCTVC-Z1017. Geneva, CH, 26th meeting: Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, Jan. 2017.
  37. 37. Part III: State of the Art in Video Compression
  38. 38. Outline and Concept for Part III • Comparison of HEVC and the Joint Exploration Test Model (JEM) of JVET • A glimpse at high-level syntax (HEVC) • Coding structures • Walk-through of the coding loop: intra coding; inter coding; transform coding; loop filters; entropy coding
  39. 39. Network Abstraction Layer and Video Coding Layer • Coded Video Sequence (CVS): starts with a random access point (intra-coded picture); one or more CVSs in a bitstream → Coded Video Sequence Group (CVSG) • Network Abstraction Layer (NAL): encapsulation of the coded video sequence for transport and storage; Video Coding Layer (VCL) NAL units: information directly for reconstruction of samples and pictures; non-VCL NAL units: parameter sets, supplemental enhancement information, ...
  40. 40. NAL Unit Structure • RBSP: Raw byte sequence payload: sequence of bytes comprising the coded NAL unit payload; RBSP stop bit (=’1’) plus zero bits for byte alignment • SODB: String of data bits: concatenation of the bits in the RBSP bytes from MSB to LSB; all bits needed for the decoding process, and only the bits needed for the decoding process [Figure: NAL unit structure with NAL unit header]
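The relation between RBSP and SODB described above can be illustrated by stripping the trailing bits: everything before the last '1' bit (the RBSP stop bit) is the SODB, and the zero bits after it are only byte alignment. This is a sketch, not a conforming parser; the function name `rbsp_to_sodb_bits` is hypothetical, and it assumes the input is already a bare RBSP without NAL unit header.

```python
def rbsp_to_sodb_bits(rbsp: bytes) -> str:
    """Return the SODB as a string of '0'/'1' characters, MSB first."""
    bits = "".join(f"{byte:08b}" for byte in rbsp)
    stop_bit_position = bits.rfind("1")  # the last '1' is the RBSP stop bit
    if stop_bit_position < 0:
        raise ValueError("no RBSP stop bit found")
    # Drop the stop bit and the alignment zeros that follow it.
    return bits[:stop_bit_position]


one_byte_sodb = rbsp_to_sodb_bits(bytes([0b10110100]))
aligned_sodb = rbsp_to_sodb_bits(bytes([0xA5, 0x80]))
```

In the second example the data bits already fill a whole byte, so the stop bit and seven alignment zeros occupy a byte of their own.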
  41. 41. HEVC Spatial Coding Structures • Blocks and units: block: square or rectangular area in a color component array; unit: collocated blocks of the (three) color components, associated syntax elements and prediction data (e.g. motion vectors) • Picture partitioning: Coding Tree Blocks / Coding Tree Units (CTBs / CTUs); each CTU in exactly one slice segment; independent slice segment: full header, independently decodable; dependent slice segment: very short header, relies on the corresponding independent slice, inherits CABAC state • Slice types: I-slice: intra prediction only; P-slice: intra prediction and motion compensation with one reference picture list; B-slice: intra prediction and motion compensation with one or two reference picture lists. CABAC: Context-based Adaptive Binary Arithmetic Coding
  42. 42. Tiles in HEVC • Change scanning order of CTBs in picture • Slices in tiles, or tiles in slices • Reset of prediction and entropy coding → parallel processing
  43. 43. HEVC: Coding Tree Blocks and Coding Blocks (CBs) • Maximum CTU size: 64×64 pixels • Quadtree partitioning of CTB into CBs • If the picture size is not an integer multiple of the CTB size: implicit CTB partitioning to meet the picture size (must be a multiple of 8×8 pixels)
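To make the picture-boundary case concrete, the sketch below counts the CTUs needed to cover a picture and reports how much of the last CTU column and row actually lies inside the picture. The helper names `ctu_grid` and `last_ctu_coverage` are hypothetical; the 1920×1080 picture with 64×64 CTUs is just an example.

```python
def ctu_grid(pic_width, pic_height, ctu_size=64):
    """Number of CTU columns and rows needed to cover the picture."""
    cols = -(-pic_width // ctu_size)   # ceiling division
    rows = -(-pic_height // ctu_size)
    return cols, rows


def last_ctu_coverage(pic_width, pic_height, ctu_size=64):
    """Width and height of the picture area inside the bottom-right CTU."""
    cols, rows = ctu_grid(pic_width, pic_height, ctu_size)
    covered_width = pic_width - (cols - 1) * ctu_size
    covered_height = pic_height - (rows - 1) * ctu_size
    return covered_width, covered_height


grid = ctu_grid(1920, 1080)
coverage = last_ctu_coverage(1920, 1080)
```

For 1920×1080 the bottom CTU row covers only 56 of its 64 sample rows, so those CTUs have to be partitioned implicitly until the resulting blocks fit inside the picture.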
  44. 44. HEVC: Prediction Blocks (PBs) and Transform Blocks (TBs) • Prediction block partitioning of a 2N×2N CB • Transform block partitioning of a CB: quadtree partitioning of CB → Residual Quad Tree (RQT); transform size 4×4 to 32×32; TB size 4×4 to 64×64; PB boundaries inside TBs allowed
  45. 45. JEM: Quad-Tree plus Binary Tree Partitioning (QTBT) • QTBT structure removes the concept of multiple partition types (TU = PU = CU) • Maximum CTU size: 256×256 pixels (128×128 used in common testing conditions) • Binary trees starting from the leaves of the quad-tree (with horizontal / vertical split indication) → CU can have either square or rectangular shape • Configuration: MinQTSize, MaxBTSize: minimum quadtree leaf node size / maximum binary tree root node size; MaxBTDepth, MinBTSize: maximum binary tree depth / minimum binary tree leaf node size. Figure from: Jianle Chen et al. Algorithm Description of Joint Exploration Test Model 7. Doc. JVET-G1001. Torino, IT, 7th meeting: Joint Video Exploration Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, Jul. 2017.
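How the four configuration parameters constrain the partitioning can be sketched as a function listing the splits still allowed for a node. This is a simplified reading of the constraints, not the JEM implementation: the default values are illustrative assumptions, the names `QtbtConfig` and `allowed_splits` are hypothetical, and the binary splits are named by the dimension they halve to sidestep the horizontal/vertical naming convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QtbtConfig:
    # Illustrative values only (assumption); real encoders configure these.
    min_qt_size: int = 16   # MinQTSize: minimum quadtree leaf node size
    max_bt_size: int = 64   # MaxBTSize: maximum binary tree root node size
    max_bt_depth: int = 4   # MaxBTDepth: maximum binary tree depth
    min_bt_size: int = 4    # MinBTSize: minimum binary tree leaf node size


def allowed_splits(width, height, bt_depth, cfg):
    """List the splits permitted for a node under the simplified rules."""
    splits = []
    # Quadtree splitting is only possible before any binary split happened.
    if bt_depth == 0 and width == height and width > cfg.min_qt_size:
        splits.append("quad")
    # Binary splitting starts at nodes no larger than MaxBTSize.
    if max(width, height) <= cfg.max_bt_size and bt_depth < cfg.max_bt_depth:
        if width > cfg.min_bt_size:
            splits.append("halve_width")
        if height > cfg.min_bt_size:
            splits.append("halve_height")
    return splits


cfg = QtbtConfig()
ctu_splits = allowed_splits(128, 128, 0, cfg)
leaf_splits = allowed_splits(64, 64, 0, cfg)
narrow_splits = allowed_splits(4, 8, 2, cfg)
```

Under these assumed values a 128×128 CTU can only be quad-split, because it exceeds MaxBTSize, while a 64×64 quadtree node may continue with either tree type.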
  46. 46. Intra Prediction
  47. 47. HEVC Intra Prediction Modes • Intra prediction modes: planar prediction: mode 0; DC intra prediction: mode 1; numbering from diagonal-up to diagonal-down; modes 2 – 18: horizontal; modes 19 – 34: vertical; horizontal: mode 10, vertical: mode 26 • Intra prediction block size: intra prediction mode coded per CU; prediction block size derived from residual quadtree; boundary samples of neighboring block used for prediction; efficient representation; local update of prediction source
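The mode numbering on this slide maps directly to a small lookup. The sketch below uses exactly the ranges stated on the slide; the function name `classify_intra_mode` and the returned labels are hypothetical.

```python
def classify_intra_mode(mode):
    """Classify an HEVC intra prediction mode index using the slide's ranges."""
    if mode == 0:
        return "planar"
    if mode == 1:
        return "DC"
    if 2 <= mode <= 18:
        return "angular-horizontal"
    if 19 <= mode <= 34:
        return "angular-vertical"
    raise ValueError("HEVC intra modes are numbered 0 to 34")


pure_horizontal = classify_intra_mode(10)
pure_vertical = classify_intra_mode(26)
```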
  48. 48. JEM Intra Prediction Modes • Concept of HEVC as basis: higher number of prediction modes; larger maximum block size • Chroma: prediction modes from neighbors; derived modes from collocated luma. Figure from: Jianle Chen et al. Algorithm Description of Joint Exploration Test Model 7. Doc. JVET-G1001. Torino, IT, 7th meeting: Joint Video Exploration Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, Jul. 2017.
  49. 49. Interpolation Filters for Directional Intra Prediction Modes • HEVC: 2-tap filters; weight derived from prediction direction • JEM: 4-tap filters; cubic interpolation for blocks with ≤ 64 samples; Gaussian interpolation filters elsewhere; parameters fixed according to block size; same filter for all predicted samples, all modes
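The 2-tap filter used by HEVC is a linear interpolation between two neighbouring reference samples, weighted by the fractional position that the prediction direction hits. The sketch below assumes 1/32-sample precision and the usual integer rounding, neither of which is stated on the slide; `interpolate_2tap` and its parameters are hypothetical names.

```python
def interpolate_2tap(ref, idx, frac):
    """Linear interpolation between ref[idx] and ref[idx + 1].

    frac is the fractional position in 1/32 sample units (0..31).
    """
    if not 0 <= frac < 32:
        raise ValueError("frac must be in the range 0..31")
    return ((32 - frac) * ref[idx] + frac * ref[idx + 1] + 16) >> 5


reference_row = [100, 132]
at_integer_position = interpolate_2tap(reference_row, 0, 0)
at_quarter_position = interpolate_2tap(reference_row, 0, 8)
at_half_position = interpolate_2tap(reference_row, 0, 16)
```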
  50. 50. Intra Prediction Boundary Filtering • HEVC: boundary sample filtering for intra prediction modes 10, 26 (horizontal / vertical); local, 1-sample update at the boundary perpendicular to the prediction direction • JEM: extended to directional modes; boundary samples up to four columns or rows; 2-tap filter for intra modes 2 & 34; 3-tap filter for intra modes 3–6 & 30–33. Figure from: JVET-G1001: Algorithm Description of Joint Exploration Test Model 7.
  51. 51. JEM: Cross-Component Linear Model Prediction (CCLM) • Chroma samples predicted using corresponding reconstructed luma samples: $\mathrm{pred}_C(i,j) = \alpha \cdot \mathrm{rec}_L'(i,j) + \beta$ • Parameters $\alpha$ and $\beta$: minimize the regression error between neighbouring reconstructed luma and chroma samples around the current block • Further prediction between chroma components with updated parameters: $\mathrm{pred}_{Cr}^{*}(i,j) = \mathrm{pred}_{Cr}(i,j) + \alpha \cdot \mathrm{resi}_{Cb}'(i,j)$ • Multiple model CCLM mode (MMLM): neighbouring luma samples and neighbouring chroma samples classified into two groups; linear model for each group. Figures from: JVET-G1001: Algorithm Description of Joint Exploration Test Model 7.
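The regression that yields α and β can be sketched with an ordinary least-squares fit over the neighbouring sample pairs, followed by applying the linear model to the reconstructed luma block. This is an illustration under assumptions: it uses floating-point arithmetic rather than the integer arithmetic of a codec, assumes the luma samples are already resampled to the chroma grid, and the names `fit_linear_model` and `predict_chroma` are hypothetical.

```python
def fit_linear_model(luma_neighbours, chroma_neighbours):
    """Least-squares fit of chroma = alpha * luma + beta over neighbour pairs."""
    n = len(luma_neighbours)
    sum_l = sum(luma_neighbours)
    sum_c = sum(chroma_neighbours)
    sum_ll = sum(l * l for l in luma_neighbours)
    sum_lc = sum(l * c for l, c in zip(luma_neighbours, chroma_neighbours))
    denominator = n * sum_ll - sum_l * sum_l
    if denominator == 0:
        # Flat luma neighbourhood: fall back to the mean chroma value.
        return 0.0, sum_c / n
    alpha = (n * sum_lc - sum_l * sum_c) / denominator
    beta = (sum_c - alpha * sum_l) / n
    return alpha, beta


def predict_chroma(rec_luma_block, alpha, beta):
    """Apply pred_C(i, j) = alpha * rec_L'(i, j) + beta to every sample."""
    return [[alpha * l + beta for l in row] for row in rec_luma_block]


alpha, beta = fit_linear_model([10, 20, 30, 40], [8, 13, 18, 23])
prediction = predict_chroma([[50, 60]], alpha, beta)
```

Since both encoder and decoder can derive α and β from already reconstructed neighbours, no model parameters need to be transmitted in this scheme.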
  52. 52. JEM: Position Dependent Intra Prediction Combination for Planar Mode (PDPC) • Combination of the unfiltered boundary reference samples and HEVC-style intra prediction with filtered boundary reference samples: position-dependent weighting of filtered and unfiltered reference, configurable by four weighting parameters (hor/ver + corner); filtered reference: linear combination of unfiltered reference and lowpass, configurable weight; three predefined lowpass filters selectable (3-tap, 5-tap, 7-tap); prediction parameters stored per block size. Figure from: JVET-G1001: Algorithm Description of Joint Exploration Test Model 7.
  53. Mode-dependent Intra Reference Sample Smoothing (MDIS)
  • HEVC
    - Bi-linear smoothing
    - Depending on prediction block size
  • Temporarily adopted in JEM (removed in JEM7)
    - Adaptive reference sample smoothing (ARSS)
    - 3-tap LPF with the coefficients [1, 2, 1] / 4
    - 5-tap LPF with the coefficients [2, 3, 6, 3, 2] / 16
  Figure from: JVET-F1001: Algorithm Description of Joint Exploration Test Model 6.
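The two ARSS lowpass kernels listed above can be tried out on a line of reference samples with a small integer filter. This is a sketch under assumptions: the helper name `lowpass` is hypothetical, and clamping at the ends of the sample line plus round-to-nearest are illustrative choices, not taken from the JEM description.

```python
def lowpass(samples, coeffs):
    """Filter a 1-D line of reference samples with an odd-length integer kernel."""
    norm = sum(coeffs)            # 4 for [1, 2, 1], 16 for [2, 3, 6, 3, 2]
    half = len(coeffs) // 2
    last = len(samples) - 1
    out = []
    for i in range(len(samples)):
        acc = 0
        for k, c in enumerate(coeffs):
            j = min(max(i + k - half, 0), last)   # assumption: repeat end samples
            acc += c * samples[j]
        out.append((acc + norm // 2) // norm)     # integer division with rounding
    return out


three_tap = lowpass([0, 0, 8, 0, 0], [1, 2, 1])
five_tap = lowpass([0, 0, 16, 0, 0], [2, 3, 6, 3, 2])
```

Feeding an impulse through each kernel, as in the two calls above, shows how far a single outlier in the reference line is spread by the 3-tap and the 5-tap filter.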
  54. Inter Prediction
  55. Motion Compensated Prediction
  Prediction from reference picture lists
  • Uni-prediction
    - P-slices only with List0, B-slices with List0 or List1
    - HEVC: Minimum PB size 8×4 or 4×8
  • Bi-prediction, only in B-slices
    - One predictor from List0, one predictor from List1
    - HEVC: Minimum prediction block size 8×8
  56. HEVC: Motion Vector Representation
  • Merge mode
    - Motion vector (MV) derived from candidate set (spatial and temporal neighborhood)
    - Merge mode candidate index coded
    - No motion vector difference encoded
  • Advanced motion vector prediction
    - Predictor derived from candidate set (spatial and temporal neighborhood)
    - Predictor index coded
    - Motion vector difference encoded
  • Skip mode
    - Only merge candidate signaled, no residual
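The difference between the three representations can be made concrete with a small decoder-side sketch. The function `reconstruct_mv`, the mode strings, and the tuple layout of the candidates are hypothetical illustration names; candidate list construction itself is not modelled.

```python
def reconstruct_mv(mode, candidates, index, mvd=(0, 0)):
    """Return the motion vector implied by the signaled mode, index and MVD."""
    cand = candidates[index]
    if mode in ("merge", "skip"):
        # Merge and skip: the candidate is used as is, no MVD is transmitted.
        return cand
    if mode == "amvp":
        # AMVP: the candidate is only a predictor, the coded MVD is added.
        return (cand[0] + mvd[0], cand[1] + mvd[1])
    raise ValueError("unknown mode: " + mode)


candidates = [(4, -2), (0, 8)]
mv_merge = reconstruct_mv("merge", candidates, 1)
mv_amvp = reconstruct_mv("amvp", candidates, 0, mvd=(1, 3))
```

Skip differs from merge only in that no residual follows, which is outside the scope of this motion vector sketch.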
  57. JEM: Sub-CU based motion vector prediction
  • CU: at most one set of motion parameters for each prediction direction
  • Option to split large CU into sub-CUs
    - Alternative temporal motion vector prediction (ATMVP): fetch multiple sets of motion information from multiple blocks in collocated reference picture
    - Spatial-temporal motion vector prediction (STMVP): derive sub-CU motion recursively from temporal motion vector predictor and spatial neighbouring motion vectors
  • ATMVP and STMVP: additional merge candidates (list extended to max 7)
  Figures from: JVET-F1001: Algorithm Description of Joint Exploration Test Model 6.
  58. JEM Motion Vector Representation
  • Locally adaptive motion vector resolution (LAMVR): motion vector difference (MVD) coded in units of
    - quarter luma samples,
    - integer luma samples, or
    - four luma samples
  • Higher motion vector storage accuracy
    - Internal motion vector storage and merge candidate at 1/16 pel (skip and merge modes only)
    - SHVC upsampling interpolation filters for the additional fractional pel positions
  SHVC: Scalable High Efficiency Video Coding, HEVC Annex H
  59. JEM: Overlapped Block Motion Compensation
  • Overlapped Block Motion Compensation (OBMC) has previously been used in ITU-T H.263
  • Switchable on CU level
    - Applied at motion compensation block boundaries except the right and bottom boundaries of the CU
    - Applied for both the luma and chroma components
    - Performed at sub-block level for all MC block boundaries
  Figure from: JVET-F1001: Algorithm Description of Joint Exploration Test Model 6.
  60. JEM: Local Illumination Compensation (LIC)
  • Linear model for illumination changes, using a scaling factor a and an offset b
    - Concept taken from 3D-HEVC
  • Enabled or disabled adaptively for each inter-mode coded coding unit (CU)
  • Least square error method employed to derive the parameters a and b
  • CU in 2N×2N merge mode
    - LIC flag copied from neighbouring blocks (like merge)
    - Otherwise, LIC flag signaled at CU level
  Figure from: JVET-F1001: Algorithm Description of Joint Exploration Test Model 6.
  61. JEM: Affine Motion Vector Derivation for MC
  • Motion vector field (MVF) for CU, applicable MV derived for each 4×4 block at 1/16 pel resolution
    - Control point motion vector (CPMV)
  • AF INTER mode
    - Signaling CPMV difference from predictor
    - Block width and height ≥ 8 required
  • AF MERGE mode
    - Derivation of CPMV from neighborhood
  • Motion vector field, with $(v_{0x}, v_{0y})$ and $(v_{1x}, v_{1y})$ the top-left and top-right control point MVs and $w$ the block width:
    $$\begin{cases} v_x = \dfrac{v_{1x}-v_{0x}}{w}\,x - \dfrac{v_{1y}-v_{0y}}{w}\,y + v_{0x} \\[4pt] v_y = \dfrac{v_{1y}-v_{0y}}{w}\,x + \dfrac{v_{1x}-v_{0x}}{w}\,y + v_{0y} \end{cases}$$
  Figure from: JVET-F1001: Algorithm Description of Joint Exploration Test Model 6.
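As an illustration of how a per-block motion vector field follows from two control point MVs, the sketch below evaluates the four-parameter affine model at each 4×4 sub-block and rounds to 1/16 pel. Several things are assumptions made for the example: the names `affine_mv` and `affine_mv_field`, motion vectors expressed as floats in luma samples, and evaluation at the sub-block centre.

```python
def affine_mv(x, y, v0, v1, w):
    """Four-parameter affine model: v0 = top-left CPMV, v1 = top-right CPMV, w = block width."""
    a = (v1[0] - v0[0]) / w
    b = (v1[1] - v0[1]) / w
    vx = a * x - b * y + v0[0]
    vy = b * x + a * y + v0[1]
    return vx, vy


def affine_mv_field(width, height, v0, v1, sub=4):
    """One MV per sub x sub block, rounded to 1/16 sample accuracy."""
    field = []
    for by in range(0, height, sub):
        row = []
        for bx in range(0, width, sub):
            # Assumption: the model is evaluated at the centre of each sub-block.
            vx, vy = affine_mv(bx + sub / 2, by + sub / 2, v0, v1, width)
            row.append((round(vx * 16) / 16, round(vy * 16) / 16))
        field.append(row)
    return field


field = affine_mv_field(16, 16, v0=(1.0, -2.0), v1=(2.0, -1.0))
```

When both control point MVs are identical, the field degenerates to plain translational motion, which is a quick sanity check for the formula.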
  62. JEM: Pattern Matched Motion Vector Derivation (PMMVD)
  • Special merge mode based on Frame-Rate Up Conversion (FRUC) techniques
    Options for
    - Bilateral matching
    - Template matching (applicable also for AMVP mode, CU level only)
  • Motion vector derivation process
    - Initial motion vector for CU of size $W \times H$
    - Sub-CU motion refinement for blocks of size $M \times M$, with
      $M = \max\left\{4, \min\left\{\dfrac{W}{2^D}, \dfrac{H}{2^D}\right\}\right\}$
  Figures (bilateral matching) from: JVET-F1001: Algorithm Description of Joint Exploration Test Model 6.
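The sub-CU size rule is short enough to check by hand; the following sketch evaluates it for a few CU sizes. The function name `sub_cu_size` is hypothetical, and D is treated as a free parameter (the splitting depth) because the slide does not state its value.

```python
def sub_cu_size(width, height, depth):
    """M = max(4, min(W / 2^D, H / 2^D)) for power-of-two CU dimensions."""
    return max(4, min(width >> depth, height >> depth))


sizes = {(w, h): sub_cu_size(w, h, depth=3) for (w, h) in [(128, 128), (64, 32), (16, 16)]}
```

The lower bound of 4 keeps the refinement from operating on blocks smaller than 4×4, regardless of how small the CU is.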
  63. JEM: Bi-directional optical flow (BIO)
  • Sample-wise motion refinement on top of block-wise motion compensation for bi-prediction
  • No extra signaling, applied on 4×4 block basis
  • MVF determined by minimizing difference $\Delta$ between points $A$ and $B$ on trajectory by Taylor expansion
    $\Delta = I^{(0)} - I^{(1)} + v_x\left(\tau_1 \dfrac{\partial I^{(1)}}{\partial x} + \tau_0 \dfrac{\partial I^{(0)}}{\partial x}\right) + v_y\left(\tau_1 \dfrac{\partial I^{(1)}}{\partial y} + \tau_0 \dfrac{\partial I^{(0)}}{\partial y}\right)$
  • Limited search window
  • Optimized search
    - First vertical, then horizontal search
    - Memory usage: only access samples inside block
  Figures from: JVET-F1001: Algorithm Description of Joint Exploration Test Model 6.
  64. JEM: Decoder-side Motion Vector Refinement (DMVR)
  • MVs of bi-prediction refined by bilateral template matching process
  • Search between bilateral template and reference pictures ⇒ refined MV without further signaling
  • Applied only when one reference picture precedes and the other follows the current picture: $\mathrm{POC}_{\mathrm{Ref}_i} < \mathrm{POC}_{\mathrm{curr}} < \mathrm{POC}_{\mathrm{Ref}_j}$
  • Not applied if any of the following is enabled in the CU:
    - LIC,
    - Affine motion,
    - FRUC, or
    - sub-CU merge candidate
  Figures from: JVET-F1001: Algorithm Description of Joint Exploration Test Model 6.
  65. Residual Coding
  66. HEVC Core Transforms
  • Transform block sizes 4×4, 8×8, 16×16, and 32×32
    - Integer approximations of the DCT-II transform matrix
  • Additionally, integer approximation of 4×4 DST-VII transform matrix
  • 'Single-norm' design per transform block size → simple quantizer implementation
  • Not all perfectly orthogonal, leakage below normalization threshold
  67. Quantizer Implementation
  • Quantizer step size $\Delta_q$ derived from quantization parameter QP
  • Exponential relation of quantizer step sizes
  • Step size doubles every 6 QP: $\Delta_q(\mathrm{QP}+1) = \sqrt[6]{2}\,\Delta_q(\mathrm{QP})$
  • Definition: $\Delta_q = 1$ for QP = 4, thereby $\Delta_{q,0} = \{2^{-4/6}, 2^{-3/6}, 2^{-2/6}, 2^{-1/6}, 1, 2^{1/6}\}$
  • Quantizer step size for given QP: $\Delta_q(\mathrm{QP}) = \Delta_{q,0}(\mathrm{QP} \bmod 6) \cdot 2^{\lfloor \mathrm{QP}/6 \rfloor}$
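The step-size rule can be checked numerically with a few lines. This is a floating-point sketch of the relation stated on the slide, not the integer scaling tables of an actual codec; the names `BASE_STEPS` and `step_size` are hypothetical.

```python
# Six base step sizes, normalized so that the step size is exactly 1 at QP = 4.
BASE_STEPS = [2.0 ** ((k - 4) / 6.0) for k in range(6)]


def step_size(qp):
    """Quantizer step size for a non-negative QP: base value times 2^floor(QP / 6)."""
    if qp < 0:
        raise ValueError("this sketch only covers non-negative QP")
    return BASE_STEPS[qp % 6] * 2 ** (qp // 6)


steps = [step_size(qp) for qp in (4, 10, 16, 22)]
```

Because only six base values are needed, an implementation can hold them in a small table and realize the factor 2^floor(QP/6) as a shift.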
  68. JEM Transforms
  • Large block-size transforms with high-frequency zeroing
    - Maximum transform size up to 128 × 128
    - Coefficients with column / row index > 32 set to 0 if block width > 64 / block height > 64, respectively
  • Adaptive multiple core transform (AMT)
    - Transform matrices quantized more accurately
    - Applicable for block sizes ≤ 64 × 64
    - Indicated by CU flag
    - Mode-dependent transform-set selection for intra prediction modes
  Tables from: JVET-F1001: Algorithm Description of Joint Exploration Test Model 6.
  69. JEM: Mode-Dependent Non-separable Secondary Transforms (MDNSST)
  • Motivation
    - Remaining correlation between coefficients after primary transform
    - Dependency on intra prediction mode
  • Approach: mode-dependent transforms (have been studied as a tool for HEVC)
  • MDNSST structure:
    - 35×3 non-separable secondary transforms for both 4×4 and 8×8 block size
    - 3 NSST candidates for each intra prediction mode
    - Application of transposed transform blocks for modes > 34
  Figure from: JVET-F1001: Algorithm Description of Joint Exploration Test Model 6.
  70. JEM: Mode-Dependent Non-separable Secondary Transforms (MDNSST), continued
  • Only applied to the low-frequency coefficients after the primary transform
    - For blocks ≥ 8 × 8, application of 8 × 8 transform to lowest frequency coefficients of primary transform
    - For blocks < 8 × 8, application of 4 × 4 transform to lowest frequency coefficients of primary transform
  • Implementation by Hypercube-Givens Transform (HyGT)
  • Two rounds for 4 × 4, four rounds for 8 × 8 secondary transforms
  Figures from: JVET-F1001: Algorithm Description of Joint Exploration Test Model 6.
  71. JEM: Signal dependent transform
  • Searching $N$ similar patches in reconstructed region of picture, based on template
  • Scheme of KLT matrix derivation:
    - Collection of $N$ prediction residuals: $\boldsymbol{U} = (\boldsymbol{u}_1, \boldsymbol{u}_2, \ldots, \boldsymbol{u}_N)$
    - Covariance matrix $\Sigma = \boldsymbol{U}\boldsymbol{U}^T$
    - Eigenvectors are KLT bases
  • Application of proposed KLT on 4×4, 8×8, 16×16 and 32×32 coding blocks
  • Note: Tool not activated in JVET Common Testing Conditions [JVET-G1010]
  Figure from: JVET-F1001: Algorithm Description of Joint Exploration Test Model 6.
  72. Loop Filtering
  73. Deblocking Filter
  • HEVC deblocking filter also used in JEM
    - Filtering at prediction and transform block edges on an 8 × 8 grid
    - Independent operation on 8 × 8 blocks possible → parallel processing enabled
  • Deblocking filtering
    - Boundary processed in 4-sample sections (edges)
    - Filter strength determined based on analysis of top and bottom rows of edge
    - Normal: filtering of maximum two samples into block
    - Strong: up to four samples into block
  74. Sample Adaptive Offset Filter (SAO)
  • HEVC SAO filtering also used in JEM
  • Local processing of samples
    - Depending on local neighborhood (edge offset): direction signaled, smoothing only
    - Depending on sample value (band offset): configurable correction of sample intensity values for four transition bands
  • Operation independent of processed samples → parallel processing
  • Local filter parameter adaptation
  • Four different offset values available (plus SAO off)
  • Dedicated SAO parameters for Y, Cb, Cr
    - Common SAO mode for chroma components
  Figure panels: edge offset, band offset
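The edge-offset branch of SAO can be illustrated with a small classifier that compares each sample with its two neighbours along the signaled direction. The sketch follows the HEVC edge categories as commonly described; the function names, the restriction to a horizontal direction, and the 8-bit clipping range are assumptions made for the example.

```python
def edge_category(a, c, b):
    """Classify sample c against its two neighbours a and b along the signaled direction."""
    if c < a and c < b:
        return 1  # local minimum
    if (c < a and c == b) or (c == a and c < b):
        return 2  # concave corner
    if (c > a and c == b) or (c == a and c > b):
        return 3  # convex corner
    if c > a and c > b:
        return 4  # local maximum
    return 0      # flat or monotonic: no offset applied


def apply_edge_offset(row, magnitudes, max_value=255):
    """Apply edge offsets along the horizontal direction to one row of samples.

    magnitudes holds four non-negative offset magnitudes for categories 1..4.
    Categories 1 and 2 are raised, 3 and 4 are lowered, so the filter can only smooth.
    """
    signs = (0, 1, 1, -1, -1)
    out = list(row)  # the two border samples lack a neighbour here and stay unchanged
    for x in range(1, len(row) - 1):
        # Classification always uses the unfiltered input, never already-corrected samples.
        cat = edge_category(row[x - 1], row[x], row[x + 1])
        if cat:
            value = row[x] + signs[cat] * magnitudes[cat - 1]
            out[x] = min(max(value, 0), max_value)
    return out


smoothed = apply_edge_offset([10, 8, 10, 12, 10], [2, 1, 1, 2])
```

Classifying on the unfiltered input is what makes the operation independent of already processed samples, so rows or blocks can be handled in parallel.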
  75. JEM: Bilateral filter
  • First loop filter in the decoding process chain of JEM
  • Each luma sample in reconstructed TU is replaced by weighted average of itself and its neighbours within TU
    - Sample located at $(i, j)$, neighbouring sample at $(k, l)$
    - $I(i, j)$ and $I(k, l)$: reconstructed intensity values
    - $\sigma_d$: spatial parameter (transform size, pred. mode)
    - $\sigma_r$: range parameter (QP)
    $\omega(i,j,k,l) = \exp\left(-\dfrac{(i-k)^2 + (j-l)^2}{2\sigma_d^2} - \dfrac{\left(I(i,j) - I(k,l)\right)^2}{2\sigma_r^2}\right)$
    $I_F(i,j) = \dfrac{\sum_{k,l} I(k,l) \cdot \omega(i,j,k,l)}{\sum_{k,l} \omega(i,j,k,l)}$
    - Integer implementation with look-up table for division
  Figure from: JVET-F1001: Algorithm Description of Joint Exploration Test Model 6.
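A direct floating-point evaluation of the two bilateral filter formulas might look as follows. The names `bilateral_weight` and `bilateral_sample` are hypothetical, the plus-shaped neighbourhood (centre plus four direct neighbours inside the block) is an assumption, and the look-up-table integer implementation mentioned on the slide is not reproduced.

```python
import math


def bilateral_weight(i, j, k, l, img, sigma_d, sigma_r):
    """Weight of neighbour (k, l) for the sample at (i, j)."""
    spatial = ((i - k) ** 2 + (j - l) ** 2) / (2 * sigma_d ** 2)
    intensity = (img[i][j] - img[k][l]) ** 2 / (2 * sigma_r ** 2)
    return math.exp(-spatial - intensity)


def bilateral_sample(img, i, j, sigma_d, sigma_r):
    """Weighted average of the sample and its neighbours inside the block img."""
    height, width = len(img), len(img[0])
    num = 0.0
    den = 0.0
    # Assumption: plus-shaped aperture, neighbours outside the block are skipped.
    for k, l in [(i, j), (i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]:
        if 0 <= k < height and 0 <= l < width:
            w = bilateral_weight(i, j, k, l, img, sigma_d, sigma_r)
            num += img[k][l] * w
            den += w
    return num / den


block = [[100, 100, 100], [100, 140, 100], [100, 100, 100]]
filtered_centre = bilateral_sample(block, 1, 1, sigma_d=1.0, sigma_r=20.0)
```

A larger range parameter lets neighbours with very different intensity contribute more, which fits the slide's coupling of the range parameter to QP.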
  76. JEM: Adaptive loop filter (ALF)
  • Luma component
    - 25 filters available for each 2×2 block, based on direction and activity of local gradients
    - Diamond filter shapes (3 × 3, 5 × 5, 7 × 7)
    - Classification into 25 classes, based on activity index and directionality index
  • Chroma components
    - Diamond filter shape 5 × 5
    - No classification
    - Single set of filter coefficients
  • Geometric transformations based on data from classification
    - Transpose, vertical flip, rotation
  • Filter coefficients signaled with 1st CTU, FIFO buffering for temporal prediction in inter pictures, 16 candidate sets for intra pictures
  77. Entropy Coding
  78. Entropy Coding
  • Fixed length and variable length codes (FLC, VLC)
    - High-level syntax
    - Parameter sets, slice segment header
    - SEI messages
    - Fixed-length codes, Exp-Golomb codes
  • Arithmetic coding
    - Slice level, CTUs
    - Context-based adaptive coding
    - Bypass coding (complexity, throughput)
  CTU = Coding Tree Unit, SEI = Supplemental Enhancement Information
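Since Exp-Golomb codes carry most of the variable-length high-level syntax, a minimal sketch of the unsigned zeroth-order code may help. Bit strings are represented as Python strings of "0" and "1" purely for readability; the function names `ue_encode` and `ue_decode` are hypothetical.

```python
def ue_encode(value):
    """Unsigned zeroth-order Exp-Golomb code: leading zeros, then value + 1 in binary."""
    if value < 0:
        raise ValueError("unsigned Exp-Golomb codes need a non-negative value")
    code = bin(value + 1)[2:]
    return "0" * (len(code) - 1) + code


def ue_decode(bits):
    """Decode one codeword from the start of bits; returns (value, bits consumed)."""
    zeros = 0
    while bits[zeros] == "0":
        zeros += 1
    length = 2 * zeros + 1
    return int(bits[zeros:length], 2) - 1, length


codewords = [ue_encode(v) for v in range(5)]
value, used = ue_decode("00100" + "1")
```

Small values get short codewords, which suits header fields that are usually zero or close to it.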
  79. Fixed and Variable Length Coding
  • VCL NAL Unit
    - FLC, VLC for header information
    - CABAC for CTUs
    - Byte alignment in case of multiple tiles, or with wavefront parallel processing (not present otherwise)
  NAL = Network Abstraction Layer, VCL = Video Coding Layer, CABAC = Context-based Adaptive Binary Arithmetic Coding, ba = byte alignment
  80. Context-Based Adaptive Binary Arithmetic Coding (CABAC)
  • Arithmetic coding engine
    - Binarization
    - Context model selection
    - Binary arithmetic coding
    - Optimized binarization design
    - Reduced number of non-bypass bins compared to H.264 | AVC
  • JEM
    - Modified context modeling for transform coefficients
    - Multi-hypothesis probability estimation with context-dependent updating speed
    - Adaptive initialization for context models
  81. Part IV: Versatile Video Coding
  82. Steps towards next generation standard – Versatile Video Coding (VVC)
  • Experimental software "Joint Exploration Model" (JEM) developed by JVET
    - Intended to investigate potential for better compression beyond HEVC
    - Initially started by extending the HEVC software with additional compression tools, or by replacing existing tools (see previous section)
  • Substantial benefit was shown over HEVC, both in subjective quality and objective metrics
    - Proven in "Call for Evidence" (July 2017)
    - JEM was however not designed for becoming a standard (regarding all design tradeoffs)
    - Call for Proposals was issued by MPEG and VCEG (October 2017)
  • Call for Proposals very successful (responses received by April 2018)
    - 32 companies in 21 proponent groups responded
    - 46 category-specific submissions: 22 in SDR, 12 each in HDR and 360° video
    - All responses clearly better than HEVC, some evidently better than JEM
    - This marked the starting point for VVC development
  83. Joint Call for Proposals (CfP) on Video Compression with Capability beyond HEVC
  • Document JVET-H1002
  • Test categories
    - Standard dynamic range (SDR): 5 UHD and 5 HD sequences
    - High dynamic range (HDR): 3 HLG and 5 PQ sequences
    - 360° video (360): 5 sequences in ERP format
  • Constraint sets
    - Constraint set 1 (C1): Random access configuration; max 1.1 s random access intervals, structural delay max 16 pictures
    - Constraint set 2 (C2): Low delay configuration, only evaluated for SDR HD sequences; no picture reordering between input and output
  • Encoding constraints
    - No pre-processing, post-processing only within the coding loop
    - Static quantizer setting with one-time change to meet target bitrate
    - Relevant optimization methods to be reported
  UHD = Ultra High Definition, HD = High Definition, HLG = Hybrid Log Gamma, PQ = Perceptual Quantization (ITU-R BT.2100), ERP = Equirectangular Projection
  84. VVC CfP Test Sequences
  • SDR-A (3840×2160): FoodMarket4 60p, CatRobot1 60p, DaylightRoad2 60p, ParkRunning3 50p, Campfire 30p
  • SDR-B (1920×1080): BasketballDrive 50p, Cactus 50p, BQTerrace 60p, RitualDance 60p, MarketPlace 60p
  • HDR (PQ HD, HLG 4K): Market3 HD50p, Hurdles HD50p, Starting HD50p, ShowGirls2 HD25p, Cosmos1 HD24p, DayStreet 60p, PeopleInShop..., SunsetBeach 60p
  • 360 Video (8K, 6K): ChairliftRide 30p, KiteFlite 30p, Harbor 30p, Trolley 30p, Balboa 60p
  85. VVC CfP Responses
  • Category-specific submissions (total 46):
    - SDR: 22 submissions (8 of which are registered only in this category)
    - HDR: 12 submissions
    - 360°: 12 submissions (2 of which are registered only in this category)
    For all categories: HEVC anchors (HM) and JEM anchors
  • Proposals
    - Described in JVET input documents JVET-J0011...JVET-J0033
    - Participation of 32 institutions
  JVET documents available at http://phenix.it-sudparis.eu/jvet
  86. Performance
  • Submissions had to provide coded/decoded sequences
    - 4 rate points each, two constraint conditions "low delay" (LD) and "random access" (RA)
    - SDR: 5x HD (both LD and RA), 5x UHD-4K (only RA)
    - HDR: 5x HD (PQ grading), 3x UHD-4K (HLG grading)
    - 360°: 5 sequences 6K/8K for the full panorama
  • Double stimulus test with two hidden anchors, HEVC-HM & JEM
    - Rate points defined such that quality at the lowest rate was typically less than "fair" for HEVC, but still possible to code
    - Quality was judged to be distinguishable when confidence intervals were non-overlapping
  • Evaluation: three ways of judging benefit:
    - Mean MOS over all test cases (28x4 test points: 23x4 C1, 5x4 C2)
    - Count cases where a proposal was visually better/worse than JEM
    - Count cases where a proposal was visually better than HEVC (HEVC at higher rate point)
  • Reports: Input subjective test [JVET-J0080], output CfP results [JVET-J1003]
  87. Performance
  • Measured by objective performance (PSNR), best performers report >40% bit rate reduction compared to HEVC, >10% compared to JEM (for SDR case)
    - Similar ranges for HDR and 360°
    - Obviously, proposals with more elements show better performance
    - Some proposals showed similar performance as JEM with significant complexity/run time reduction
    - 2 proposals used some degree of subjective optimization, not measurable by PSNR
  • Results of subjective tests generally show similar (or even better) tendency
    - Benefit over HEVC very clear
    - Benefit over JEM visible at various points
    - Proposals with subjective optimization also showing benefit in some cases
  88. Performance
  • JVET-J1003: Report of subjective evaluation contains 28 plots as shown, one per sequence
  • Count significant cases of positive/negative benefit with non-overlapping confidence interval against JEM
  Figure: example plot with HM, JEM and proposals ranked by MOS (per rate point); significant cases marked as +1 credit / -1 credit
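The significance count described above can be sketched as a comparison of confidence intervals per test point. The data layout (MOS value plus confidence-interval half-width) and the names `credit` and `significance_count` are assumptions for illustration; the numbers in the usage lines are made up and are not CfP results.

```python
def credit(mos, ci, mos_ref, ci_ref):
    """+1 / -1 if the confidence intervals do not overlap, 0 otherwise.

    ci and ci_ref are half-widths of the confidence intervals.
    """
    if mos - ci > mos_ref + ci_ref:
        return 1    # proposal significantly better than the reference
    if mos + ci < mos_ref - ci_ref:
        return -1   # proposal significantly worse than the reference
    return 0


def significance_count(points):
    """Sum credits over test points given as (mos, ci, mos_ref, ci_ref) tuples."""
    return sum(credit(*p) for p in points)


# Made-up example values for three test points of one proposal against JEM.
example = [(6.5, 0.2, 6.0, 0.2), (6.1, 0.2, 6.0, 0.2), (5.5, 0.2, 6.0, 0.2)]
total = significance_count(example)
```

With 28 sequences at 4 rate points each, the count per proposal is bounded by the number of test points in the respective category.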
  89. Performance SDR
  • "Mean" and "significance-count" method suggested at least 7 proposals that were obviously better than JEM
  Chart data (anonymized proposals, labelled Pxx / Pnn as on the slide):
  • Significance vs. JEM (scale -60 ... +60): Pxx 10, Pxx 8, Pxx 8, Pxx 6, Pxx 6, Pxx 6, Pxx 6, Pnn 3, Pnn 3, Pnn 2, Pnn 2, Pnn 1, Pnn 1, JEM 0, Pnn 0, Pnn -1, Pnn -1, Pnn -1, Pnn -2, Pnn -2, Pnn -2, Pnn -3, Pnn -4, HM -36
  • Mean MOS: Pxx 6.53, Pxx 6.46, Pxx 6.41, Pxx 6.37, Pxx 6.33, Pxx 6.33, Pxx 6.26, Pnn 6.23, Pnn 6.17, Pnn 6.15, Pnn 6.13, Pnn 6.11, Pnn 6.04, Pnn 6.04, Pnn 6.03, Pnn 6.03, Pnn 6.01, JEM 6.01, Pnn 6.00, Pnn 5.96, Pnn 5.94, Pnn 5.88, Pnn 5.86, HM 4.57
  90. Performance HDR / 360°
  • Similar tendency in HDR and 360° categories
  • Mostly same coding tools as in SDR provide good benefit
  Chart data (anonymized proposals, labelled Pxx / Pnn as on the slide):
  • HDR, Mean MOS: Pxx 6.04, Pxx 6.00, Pxx 5.94, Pxx 5.93, Pxx 5.86, Pnn 5.85, Pnn 5.80, Pnn 5.67, JEM 5.62, Pnn 5.60, Pnn 5.59, Pnn 5.45, Pnn 5.11, HM 4.14
  • HDR, Signif. vs. JEM (scale -32 ... +32): Pxx 7, Pxx 3, Pxx 2, Pxx 2, Pxx 2, Pnn 1, Pnn 1, JEM 0, Pnn 0, Pnn 0, Pnn -1, Pnn -1, Pnn -6, HM -20
  • 360°, Mean MOS: Pxx 6.20, Pxx 6.19, Pxx 6.06, Pxx 6.03, Pxx 5.99, Pxx 5.96, Pxx 5.86, Pnn 5.69, Pnn 5.67, Pnn 5.51, Pnn 5.45, JEM 5.11, HM 3.79, Pnn 3.45
  • 360°, Signif. vs. JEM (scale -20 ... +20): Pxx 9, Pxx 9, Pxx 8, Pnn 7, Pxx 7, Pxx 6, Pxx 5, Pxx 4, Pnn 2, Pnn 1, Pnn 1, JEM 0, HM -9, Pnn -12
  91. Performance compared to HEVC
  • How often are best performing proposals better than HEVC at higher rate?
  • Note: R1: 1 Mbit/s; R2: 1.6 Mbit/s; R3: 2.8 Mbit/s; R4: 4.6 Mbit/s
  Pbest vs HM   R1 vs R2  R1 vs R3  R1 vs R4  R2 vs R3  R2 vs R4  R3 vs R4
  SDR UHD       60%       40%       0%        80%       0%        20%
  SDR HD/RA     40%       0%        0%        20%       0%        20%
  SDR HD/LD     40%       0%        0%        0%        0%        0%
  HLG           67%       0%        0%        67%       0%        33%
  PQ            40%       0%        0%        40%       0%        20%
  360°          40%       20%       0%        20%       0%        60%
  Rate saving   37.5%     65%       78%       43%       65%       39%
  92. Performance compared to HEVC
  • How often is HEVC better than best performing proposals at lower rate?
    - Note: (100 − xx)% means that best performing proposal is equal or better
  • Note: R1: 1 Mbit/s; R2: 1.6 Mbit/s; R3: 2.8 Mbit/s; R4: 4.6 Mbit/s
  HM vs Pbest   R1 vs R2  R1 vs R3  R1 vs R4  R2 vs R3  R2 vs R4  R3 vs R4
  SDR UHD       0%        0%        60%       0%        0%        0%
  SDR HD/RA     0%        60%       100%      0%        80%       0%
  SDR HD/LD     0%        60%       80%       0%        80%       0%
  HLG           0%        0%        100%      0%        67%       0%
  PQ            0%        60%       100%      0%        60%       0%
  360°          0%        40%       80%       0%        40%       0%
  Rate saving   37.5%     65%       78%       43%       65%       39%
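The "Rate saving" row follows directly from the four rate points given in the note, as one minus the ratio of the lower to the higher rate. A short sketch, with `RATES` and `rate_saving` as hypothetical names:

```python
# Rate points from the note on the slide, in Mbit/s.
RATES = {"R1": 1.0, "R2": 1.6, "R3": 2.8, "R4": 4.6}


def rate_saving(low, high):
    """Fraction of bit rate saved when the proposal at `low` matches HEVC at `high`."""
    return 1.0 - RATES[low] / RATES[high]


pairs = [("R1", "R2"), ("R1", "R3"), ("R1", "R4"), ("R2", "R3"), ("R2", "R4"), ("R3", "R4")]
savings_percent = {pair: round(100 * rate_saving(*pair), 1) for pair in pairs}
```

Evaluating all six pairs gives 37.5, 64.3, 78.3, 42.9, 65.2 and 39.1 percent, which the slide rounds to 37.5%, 65%, 78%, 43%, 65% and 39%.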
  93. Performance compared to HEVC
  • The subjective quality of best performing proposals is always equal to, and sometimes (~1/3 of cases) better than, HEVC at the next higher rate point, over all categories (with approx. 40% less rate)
  • The subjective quality of best performing proposals is always equal to, and sometimes (~1/5 of cases) better than, HEVC at the 2nd higher rate point, in the SDR-UHD category (with approx. 65% less rate)
  • Though it is not always the same proposal that performs best at a given rate point, it can be anticipated that merits of different proposals can be combined
  • 50% (or more) bit rate reduction with same quality will probably be achievable by the new standard
  94. CfP analysis: What was proposed?
  • New elements (some come with high complexity):
    - Decoder side estimation for mode/MV derivation and sample prediction both in intra and inter coding (JEM)
    - Finer partitioning: asymmetric, geometric
    - Neural networks for prediction, loop filtering, upsampling, (encoder control)
    - Additional elements using template matching
    - Intra block copy / current picture referencing
    - Additional non-linear, de-noising and statistics-based loop filters
    - Additional linear and non-linear elements in prediction
  • HDR specific:
    - New adaptive reshaping and quantization, also in-loop
    - HDR-specific modifications of existing tools, e.g. deblocking
  • 360-video specific:
    - Variants of projection formats, geometry-corrected face boundary padding
    - Modification and disabling of existing tools at face boundaries
  95. VVC Working Draft and Test Model 1
  • VVC Working Draft 1 / Test Model 1 (VTM1): basic approach built on "reduced HEVC" starting point
  • VTM block structure
    - Unified tree (coding block unites prediction and transform)
    - CTU size 128x128, rectangular blocks (dyadic sizes), smallest luma size 4x4
    - Maximum transform size 64x64
  • VTM: some removed elements of HEVC:
    - Mode dependent transform (DST-VII), mode dependent scan
    - Strong intra smoothing
    - Sign data hiding in transform coding
    - Unnecessary high-level syntax (e.g. VPS)
    - Tiles and wavefront
    - Quantization weighting
  96. Documents issued after CfP Results
  • Report of Results from the Call for Proposals on Video Compression with Capability beyond HEVC [JVET-J1003]
    - Documentation of results per sequence, marking HM and JEM anchors, not identifying individual proponents
    - Assessment of qualitative (and as far as possible quantitative) benefit of submitted technology compared to anchors
  • Working Draft 1 of Versatile Video Coding [JVET-J1001]
    - "Reduced" HEVC plus quad/binary/ternary tree structure
  • Test Model 1 of Versatile Video Coding (VTM 1) [JVET-J1002]
    - Corresponding encoder and algorithm description
  97. 97. Benchmark Set and its role
      • Benchmark Set (BMS) was defined in addition to VTM, including the following well-known JEM tools:
        - 65 intra prediction modes
        - Coefficient coding
        - AMT + 4x4 NSST
        - Affine motion
        - Geometry based adaptive loop filter
        - Subblock merge candidate (ATMVP)
        - Adaptive motion vector precision
        - Decoder motion vector refinement
        - LM Chroma mode
      • Purpose: Testing benefit of technology against better performing set
        - Holding extra potential features we aren't so sure about yet
        - Superset of VTM; should have significant gain over the VTM
        - Unveils in CEs whether gains are independent, or how much gain remains when a tool is combined with a set of more performant tools
        - Can be a common basis for further CE tests of modified versions of features
        - Not necessarily ultra-low complexity, but encoder needs to be runnable in reasonable amount of time
  98. 98. Quad/binary/ternary partitioning
      • The only fundamental new element of version 1
      • Simple multi-type tree split, can be alternated
      [Figures: partitioning example, from JVET-J1001]
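The quad/binary/ternary split named on this slide can be sketched as a small recursive block splitter. This is only an illustration: the function name `split`, the mode strings, and the block tuple layout are hypothetical, and the 1:2:1 ratio for ternary splits is taken from the later AV1 comparison slide. It does not reproduce the syntax or the split restrictions of JVET-J1001.

```python
from typing import List, Tuple

Block = Tuple[int, int, int, int]  # (x, y, width, height), hypothetical layout


def split(block: Block, mode: str) -> List[Block]:
    """Split one block with a quad, binary, or 1:2:1 ternary split."""
    x, y, w, h = block
    if mode == "quad":
        hw, hh = w // 2, h // 2
        return [(x, y, hw, hh), (x + hw, y, hw, hh),
                (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]
    if mode == "binary_ver":   # vertical split line: left and right halves
        return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    if mode == "binary_hor":   # horizontal split line: top and bottom halves
        return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    if mode == "ternary_ver":  # widths in ratio 1:2:1
        q = w // 4
        return [(x, y, q, h), (x + q, y, 2 * q, h), (x + 3 * q, y, q, h)]
    if mode == "ternary_hor":  # heights in ratio 1:2:1
        q = h // 4
        return [(x, y, w, q), (x, y + q, w, 2 * q), (x, y + 3 * q, w, q)]
    raise ValueError(f"unknown split mode: {mode}")


# Alternating split types below a 128x128 CTU.
ctu = (0, 0, 128, 128)
quads = split(ctu, "quad")
leaves = split(quads[0], "ternary_ver") + split(quads[1], "binary_hor") + quads[2:]
```

Because every split returns rectangular blocks with dyadic sizes, the result of one split can be fed into any other split type, which is what "can be alternated" refers to.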
  99. 99. Performance of VTM1 and initial BMS compared to HEVC
      • PSNR-based Common Test Conditions (CTC) BD-rate savings relative to HEVC reference software (10 bit)

        vs HM16.18     VTM     BMS
        4k UHD         10%     28%
        1080p           8%     22%
        WVGA            6%     19%
        Average         8%     23%
        Decode time    0.8×     2×
        Encode time     2×      9×

      • Note that gain over HEVC with CTC is lower than with CfP test set (other sequences, higher rates, lower resolutions)
  100. 100. Latest status (from last week)
      • Working Draft 2 of Versatile Video Coding [JVET-K1001]
        - Normative text specification
        - No descriptive text of building blocks "borrowed" from HEVC: these would anyway be placeholders which are likely to be replaced later
        - Starting from this meeting, precise specification of more substantial newly adopted building blocks is being added (see subsequent slides)
      • Test Model 2 of Versatile Video Coding (VTM 2) [JVET-K1002]
        - Encoder and algorithm description
        - Has corresponding software implementation
  101. 101. Latest status (from last week): New elements of WD2 / VTM2
      • QT/BT/TT no longer "placeholder"
      • Remove unnecessary partitioning restrictions
      • Implicit splitting at picture boundaries
      • Separate trees for intra slices
      • Position Dependent Prediction Combination
      • Cross Component Linear Model
      • 87 intra modes (wide angles included), 3 MPM, TU binarization
      • Affine MC (4x4 fixed subblock size, 4/6 parameter model switching at CU level)
      • Affine MV coding
        - list construction contains inheritance and derivation spatial/temporal
        - improved difference coding
      • Adaptive motion vector resolution (AMVR)
      • Subblock MC (4x4) from ATMVP merge, 8x8 granularity motion vector storage [High precision]
  102. 102. Latest status (from last week): New elements of WD2 / VTM2
      • Multiple transform selection (all are DCT/DST types) for intra and inter
      • Increase max QP from 51 to 63
      • Modified entropy coding supporting dependent quantization
      • Sign data hiding reinvoked from HEVC
      • Adaptive loop filter
        - 4x4 classification based (gradient strength & orientation) for luma
        - 7x7 luma, 5x5 chroma filters
        - enabling flag at CTU level
      • Basic high-level syntax (SPS, PPS, slice)
      • Update of BMS contains
        - Generalized bi-prediction (kind of local weighted prediction)
        - Decoder-side estimation: BIO, simplified bilateral matching
        - Current picture referencing (aka intra block copy)
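The affine MC item in the WD2 / VTM2 list (4x4 fixed subblock size, 4/6 parameter model) can be illustrated with a floating-point sketch that derives one motion vector per 4x4 subblock from control-point motion vectors. The function name `affine_subblock_mvs`, its arguments, and the evaluation at subblock centres are assumptions for illustration. The actual draft uses fixed-point arithmetic and rounding that are not reproduced here.

```python
def affine_subblock_mvs(v0, v1, width, height, v2=None, sub=4):
    """Return {(x, y): (mvx, mvy)} for each sub x sub subblock of a CU.

    v0, v1: control-point MVs at the top-left and top-right corners.
    v2:     optional control-point MV at the bottom-left corner
            (6-parameter model); if None, the 4-parameter model is used.
    """
    a = (v1[0] - v0[0]) / width   # horizontal gradient of mvx
    b = (v1[1] - v0[1]) / width   # horizontal gradient of mvy
    if v2 is None:
        # 4-parameter model (rotation and zoom): vertical gradients follow
        # from the horizontal ones.
        c, d = -b, a
    else:
        # 6-parameter model: vertical gradients from the third control point.
        c = (v2[0] - v0[0]) / height
        d = (v2[1] - v0[1]) / height
    field = {}
    for y in range(0, height, sub):
        for x in range(0, width, sub):
            cx, cy = x + sub / 2, y + sub / 2   # subblock centre (assumed)
            field[(x, y)] = (v0[0] + a * cx + c * cy,
                             v0[1] + b * cx + d * cy)
    return field


# Pure translation: every subblock inherits the same vector.
translation = affine_subblock_mvs((3, -1), (3, -1), 16, 16)
# Zoom: vectors grow linearly with the distance from the top-left corner.
zoom = affine_subblock_mvs((0, 0), (16, 0), 16, 16)
```

Each subblock is then motion compensated with its single vector, which keeps the cost close to ordinary block-based prediction.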
  103. 103. Wide angular modes
      • For rectangular blocks, prediction directions with angles beyond 45/135 degrees are reasonable
      • This can be implemented by adding modes at both ends
      • VTM2 uses a total of 85 directional intra modes now (plus DC and planar)
      [Figures from JVET-K0500]
  104. 104. Dependent quantization
      • Alternating between two quantizers (Q0, Q1) based on a state transition rule allows selecting an optimum sequence of reconstruction values (e.g. by trellis-like search)
      • Decoder needs to implement the sequential state transition rule
      • CABAC contexts need to be modified as well for this case (greater than 0/1/2/... would have different meaning depending on Q0/Q1)
      • State transition rule (k is the quantization index):

        current state    next state for (k & 1) == 0    next state for (k & 1) == 1
        0                0                              2
        1                2                              0
        2                1                              3
        3                3                              1

      [Figures from JVET-K0071: state machine with start state and states 0 to 3; reconstruction levels of Q0 and Q1 on a grid from -9Δ to 9Δ]
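The decoder-side state machine can be sketched as follows. The transition table is the one given on the slide. Two things are assumptions based on the JVET-K0071 design and are not legible in this transcript: states 0 and 1 select Q0 while states 2 and 3 select Q1, and Q0 reconstructs to even multiples of Δ while Q1 reconstructs to odd multiples of Δ plus zero. The function name `dequantize` is hypothetical.

```python
# next_state = STATE_TRANSITION[current_state][k & 1], as on the slide
STATE_TRANSITION = {0: (0, 2), 1: (2, 0), 2: (1, 3), 3: (3, 1)}


def sign(k):
    return (k > 0) - (k < 0)


def dequantize(levels, delta):
    """Reconstruct values from quantization indices in coding order."""
    state = 0  # start state
    out = []
    for k in levels:
        if state < 2:
            # Q0 (assumed for states 0 and 1): even multiples of delta
            t = 2 * k * delta
        else:
            # Q1 (assumed for states 2 and 3): odd multiples of delta, and zero
            t = (2 * k - sign(k)) * delta
        out.append(t)
        # The parity of the current index drives the state transition.
        state = STATE_TRANSITION[state][k & 1]
    return out


reconstructed = dequantize([1, 1, 0, -2], 1)
```

Since the quantizer for each coefficient depends on the parities of all previous indices, the encoder cannot quantize coefficients independently, which is why the slide mentions a trellis-like search.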
  105. 105. Further promising fields
      • Ongoing investigations on
        - Improved merge, intra prediction, etc.
        - Decoder-side estimation with low complexity
        - Multi-hypothesis prediction and OBMC
        - Diagonal and other geometric partitioning
        - Secondary transforms
        - New approaches of loop filtering, reconstruction and prediction filtering (denoising, non-local, diffusion based, bilateral, etc.)
        - Current picture referencing, template matching, palette mode
        - Neural networks for loop filtering and prediction
      • Core experiments (CE) process
        - coordinated effort to investigate performance and complexity impact of proposed elements
        - typically based on a specific technology proposed, or combination of several technologies
        - allows detailed study / cross-checks by other interested parties
        - allows identifying which elements of a proposal are useful, if it is not useful at all, or if further improvements are needed
  106. 106. Geometric Partitioning (GEO)
      • Motivation: Towards object-oriented coding
        - Follow object boundaries more closely
        - Fewer coding artifacts where it matters
      • Prediction, transform and coding driven by actual object shape under RD-constraint
        - Inter- and intra-predicted segments for handling of disocclusions
        - Overlapped wedge based filtering at partition boundary
        - Shape-adaptive DCT for spatially localized transform coding
      Source: M. Bläser, J. Sauer, and M. Wien, "Description of SDR and 360° video coding technology proposal by RWTH Aachen University," Doc. JVET-J0023, Joint Video Experts Team of ITU-T VCEG and ISO/IEC MPEG, San Diego, USA, 10th meeting, Apr. 2018
  107. 107. GEO: Partitioning Coding and Prediction
      • GEO available for all block sizes ≥ 8×8 luma samples
      • Partitioning is represented by two coordinate points P0 and P1 on the block boundary
      • Prediction of the two coordinate points P0 and P1 from 16 pre-defined templates (scaled for non-square blocks)
        - Alternative: spatial or temporal prediction
        - Refinement: block size dependent offset
      • Integration with AMVP, MERGE, FRUC (no AFFINE (yet))
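How two boundary points P0 and P1 define a partition can be illustrated by a hard segment mask. This is a sketch under assumptions: the function name `geo_mask`, the sample-centre convention, and the side-of-line test are illustrative, and the overlapped filtering at the partition boundary mentioned on the previous slide is left out.

```python
def geo_mask(width, height, p0, p1):
    """Return a height x width mask: 1 on one side of the line P0-P1, else 0."""
    (x0, y0), (x1, y1) = p0, p1
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            # Sign of the 2-D cross product between the line direction and
            # the vector from P0 to the sample centre (x + 0.5, y + 0.5).
            side = (x1 - x0) * (y + 0.5 - y0) - (y1 - y0) * (x + 0.5 - x0)
            row.append(1 if side > 0 else 0)
        mask.append(row)
    return mask


# Diagonal partition of an 8x8 block, and a vertical partition for comparison.
diagonal = geo_mask(8, 8, (0, 0), (8, 8))
vertical = geo_mask(8, 8, (4, 0), (4, 8))
```

Each of the two segments would then get its own prediction, and only the two points (or a template index plus refinement offset) have to be signalled.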
  108. 108. Results for GEO
      • Visual improvements at object boundaries
        - Sharper contours
        - Less staircase effect
        - More background details
      • Objective gains (BD-rate savings)
        - Against HEVC: ~33% on C1, ~25% on C2
        - Against JEM: ~0.8% for both C1 and C2
      [Figure: JEM 7.0]
  109. 109. Results for GEO
      • Visual improvements at object boundaries
        - Sharper contours
        - Less staircase effect
        - More background details
      • Objective gains (BD-rate savings)
        - Against HEVC: ~33% on C1, ~25% on C2
        - Against JEM: ~0.8% for both C1 and C2
      [Figure: JEM 7.0 + GEO]
  110. 110. Current Core Experiments
      • CE1: Partitioning
      • CE2: Adaptive loop filter
      • CE3: Intra prediction and mode coding
      • CE4: Inter prediction and MV coding
      • CE5: Arithmetic coding engine
      • CE6: Transforms and transform signalling
      • CE7: Quantization and coefficient coding
      • CE8: Current picture referencing
      • CE9: Decoder side MV derivation
      • CE10: Combined and multi-hypothesis prediction
      • CE11: Deblocking
      • CE12: Mapping for HDR content
      • CE13: Coding tools for omnidirectional video
      • CE14: Post-reconstruction filtering
      • CE15: Palette mode
  111. 111. AOM's AV1
      • Technically similar elements to HEVC/JEM/VVC or JVET study
        - Partitioning: 128x128 "superblock" with equivalent to quad/binary sub-splits (no 1:2:1 ternary)
        - Directional intra prediction, 56 directional modes, DC and "true motion" mode
        - Chroma from luma prediction
        - Intra block copy
        - Up to 7 reference frames (allows similar structure to hierarchical B)
        - Spatial/temporal motion vector referencing
        - Affine motion compensation (pixel based)
        - OBMC
        - DCT/DST based transforms, and skip
        - Adaptive arithmetic coder
        - Context-based transform coefficient coding
        - Film grain synthesis
        - Adaptive loop filter (Wiener like)
        - Deblocking
  112. 112. AOM's AV1
      • Other elements
        - Recursive-filtering intra predictor
        - Prediction based on color palette
        - Wedge-based prediction, 16 diagonal/asymmetric modes for square/rectangular blocks, similar to GEO
        - Difference-modulated prediction (based on difference between two references)
        - Contrast enhancement/deringing loop filter
        - Self-guided filter (somewhat similar to bilateral & diffusion filters)
        - Super-resolution coding mode (with coding at lower resolution)
      • Performance
        - Owners report 20% average bit rate reduction (PSNR based) compared to X.265-style HEVC encoder, on a set of full HD sequences
        - Other reports indicate much less gain, or even losses compared to HM encoder (using sequences from JVET's CTC)
        - According to the same reports, JEM performs significantly better than AV1
        - Some of those may not have used the newest JEM version, though
  113. 113. Part V: Exploratory trends and perspectives
  114. 114. Quality metrics
      • PSNR mostly used for video quality assessment
        - targeting pixel fidelity, which does not necessarily reflect subjective quality
      • Specific artifacts produced by video codecs:
        - blockiness, blur and banding
        - motion jerkiness
        - time-varying edge noise ("mosquito effect")
      • Alternative metrics may be clustered into
        - full reference quality metrics
        - reduced reference quality metrics
        - no-reference quality metrics
      • Note that subjective testing methods also require some reference (e.g. impairment compared to original or another anchor)
        - full reference metrics are most reliable and are also typically used for encoder decisions
      • Note: Subsequent slide gives an example (SSIM) – not claimed that this is the best!
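PSNR, the pixel-fidelity metric named on this slide, can be sketched in a few lines. The function name `psnr`, the flat sample lists, and the default peak value of 255 (8-bit samples) are assumptions for illustration. For 10-bit video the peak would be 1023.

```python
import math


def psnr(reference, reconstruction, max_value=255):
    """Peak signal-to-noise ratio in dB between two equally long sample lists."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(max_value ** 2 / mse)


ref = [16, 64, 128, 235]
rec = [18, 62, 130, 233]   # every sample off by 2, so MSE = 4
value = psnr(ref, rec)
```

The measure depends only on the mean squared sample error, which is why it cannot tell structured artifacts such as blockiness from equally strong but less visible noise.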
  115. 115. Perceptually adapted quality metrics example: SSIM
      • Example of another full-reference metric which better matches subjective quality, at least for images
      • Structural SIMilarity Index (SSIM) [Wang et al. 2004] measures the structural distortion by exploring three components: luminance, contrast and structural changes.
        - Luminance: $l(x,y) = \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}$
        - Contrast: $c(x,y) = \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}$
        - Structure comparison: $s(x,y) = \frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3}$
        - Combination: $\mathrm{SSIM}(x,y) = [l(x,y)]^{\alpha} \cdot [c(x,y)]^{\beta} \cdot [s(x,y)]^{\gamma}$
      • Numerous variants:
        - Computation separately for regions
        - Weighting by amount of motion and frame averaging for video
        - Computation in complex wavelet domain for frequency weighting (MS-SSIM, multi-scale)
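A minimal sketch of the SSIM components, computed once over two whole sample lists instead of over local windows. The constants C1 = (0.01 L)^2 and C2 = (0.03 L)^2 follow Wang et al. 2004. The choice C3 = C2 / 2, the exponents alpha = beta = gamma = 1, the function name `ssim`, and the global (non-windowed) statistics are assumptions for illustration.

```python
import math


def ssim(x, y, max_value=255.0, alpha=1.0, beta=1.0, gamma=1.0):
    """Global SSIM of two equally long sample lists (no local windows)."""
    n = len(x)
    c1 = (0.01 * max_value) ** 2
    c2 = (0.03 * max_value) ** 2
    c3 = c2 / 2                      # common choice, assumed here
    mx, my = sum(x) / n, sum(y) / n  # means
    vx = sum((a - mx) ** 2 for a in x) / n           # variances
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx, sy = math.sqrt(vx), math.sqrt(vy)
    lum = (2 * mx * my + c1) / (mx * mx + my * my + c1)   # luminance term
    con = (2 * sx * sy + c2) / (vx + vy + c2)             # contrast term
    struct = (cov + c3) / (sx * sy + c3)                  # structure term
    return (lum ** alpha) * (con ** beta) * (struct ** gamma)


original = [10, 50, 90, 130, 170, 210, 250, 30]
distorted = [12, 45, 95, 120, 180, 200, 255, 20]
score = ssim(original, distorted)
```

Practical implementations evaluate these statistics in sliding windows and average the local scores, which corresponds to the "computation separately for regions" variant listed on the slide.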
  116. 116. Perceptual coding: Texture analysis and synthesis
      • Textures with large amount of detail and/or motion are often extremely challenging for video codecs
      • On the other hand, the exact pixel-wise appearance is largely irrelevant for human observers, whereas degradation of visual quality is critical
      • Textures in videos can be static or dynamically changing over time
        - Static textures are basically rigid (but may be moving globally)
        - Dynamic textures have a high amount of irregular local motion
        - Examples: water, smoke, head-and-shoulder sequences
      • Both categories should have some stationarity properties in space and/or time, to allow modelling as a random process expressed by a parametric description – examples:
        - Spectral properties
        - Moments (marginal statistics and covariance statistics)
        - Random field models
      • In case of dynamic texture, modelling the motion properties is relevant as well; the motion can also be understood as a random field with a certain amount of variation
  117. 117. Static texture synthesis
      • Example below is based on a parametric statistical description in complex wavelet domain (steerable pyramid), with lowpass baseband and four directional orientations in bandpass layers [Portilla, Simoncelli 2000]
      • Efficient coding of parameters needed for synthesis by [Thakur, Ray 2016]
      • Marginal statistics expressed as scalar values
      • Auto and cross correlation statistics compressed via DCT
      [Figure: Reference; HEVC intra coding at 0.223 bpp; Thakur et al. at 0.213 bpp]
  118. 118. Dynamic texture synthesis method
      [Figure: block diagram, source: Chubach et al. 2017.
       Analysis blocks: dense optical flow (OF) between adjacent frames of the T original frames; analyse motion distribution; discard non-probable MV combinations; discard intermediate frames.
       Synthesis blocks: derive motion vectors; invert MVF; frame warping and blending, yielding T-2 synthesized frames.
       Signals: motion vector fields (MVF) with vectors MV_T(i,j) and MV_T'(i,j); MCM M and compressed MCM Mc; synthesized MVF.]
  119. 119. Dynamic texture synthesis vs. HEVC at same rate
      [Figure: HEVC compared with a version in which 6 of 8 frames are synthesized]
  120. 120. Learning based approaches: Overview
      • Recently, many signal processing tasks are solved by employing machine learning, deep learning and convolutional neural networks (CNN)
      • Advantages for video compression could be as follows:
        - Systematic approach of optimizing with big data sets (rather than hand-crafted design)
        - Detection and exploitation of nonlinear dependencies in images and video
        - Inclusion of perceptual criteria by mimicking human observer behaviour
      • On the downside, both training and running CNN algorithms, e.g. for encoder decisions or at the decoder, may be overly complex
      • Types of NN that have been proposed for image/video compression:
        - Autoencoders
        - Adversarial networks
        - Recurrent networks, particularly based on LSTM (long short-term memory) elements
  121. 121. Convolutional Neural Networks: Autoencoders (AE)
      • An autoencoder is a deep (convolutional) neural network with a sparse hidden layer that represents the code
      • The encoder typically performs successive filtering and downsampling steps on input x per layer (note conceptual similarity with transform coding!)
      • The decoder performs complementary upsampling steps and generates output y
      • Encoder and decoder are trained jointly such that
        - Difference between x and y is minimized w.r.t. some distortion
        - Code z is as sparse (minimum amount of information) as possible
      • Use Bayes' formula P(z|x) ∝ P(x|z)P(z) and minimize the Kullback-Leibler divergence of conditional probabilities to achieve the latter [Kingma, Welling 2014]
      [Figure: autoencoder with input x, code z=F(x) and output y=G(z); source: Wikipedia]
  122. 122. Convolutional Neural Networks: Generative Adversarial Networks (GAN)
      • Generator net G generates samples y from random variables z (G would be the decoder, z the code)
      • Discriminator net D decides whether the samples could match with real-world images x which stem from an unknown distribution P(x)
      • Generator and discriminator nets are trained iteratively, optimizing a value function V(D, G)
      • Minimax optimization:
        - Train D such that V is maximized
        - Train G such that V is minimized
      • Problem: There is no corresponding mapping from x to z (no encoder)
      • Solution (e.g. [Santurkar et al. 2017]): Combination of AE and GAN, i.e. train F(x) from the AE jointly with G(z) and D(⋅)
      [Figure: z → G(z) → y; D(x) or D(y); source: Slideshare.net – K. McGuinness]
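A hedged reconstruction of the value function V used in the minimax optimization: assuming the slide follows the original GAN formulation by Goodfellow et al. (2014), which the transcript does not confirm, the objective is usually written as below. P(z) denotes the prior from which the code z is drawn, a symbol introduced here for illustration.

```latex
\min_{G} \max_{D} V(D, G)
  = \mathbb{E}_{x \sim P(x)} \bigl[ \log D(x) \bigr]
  + \mathbb{E}_{z \sim P(z)} \bigl[ \log \bigl( 1 - D(G(z)) \bigr) \bigr]
```

D is trained to push D(x) towards 1 for real images and D(G(z)) towards 0 for generated ones, which maximizes V. G is trained to make D(G(z)) large, which minimizes V.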
  123. 123. Convolutional Neural Networks: General problems and possible solutions
      • Variable-rate and variable-size coding not straightforward
        - Option to operate over small patches / blocks
        - Train separately for different content complexity
        - Code residual differences
      • Cost functions for rate distortion optimization not straightforward to implement
        - Option to re-formulate rate constraint as energy minimization problem
        - Hybrid solutions where conventional entropy coding is operated after network output at encoder
      • None of these solutions may lead to a consistent optimum, and they may need to be driven by some external decision mechanism
  124. 124. Trained non-linear transforms
      • Autoencoder could be interpreted as a monolithic non-linear transform (though operating with local kernels); the figure marks the previously used notation in light green
      • A similar approach is proposed in [Ballé et al. 2017], with additional criteria for rate distortion optimization and quantization / entropy coding on the sparse representation (called y here)
      • Perceptual optimization based on nonlinear "generalized divisive normalization" and L2 norm minimization in nonlinear space
      • Authors report significant improvement on detail structures, also improved MS-SSIM compared to conventional codecs; the transform is optimized based on the cost criterion shown with the figure
      [Figure from Ballé et al. 2017, annotated with the previously used notation (x), (y), (z), (z'), F(x), G(z')]
  125. 125. NN for video
      • All methods discussed so far were developed for still image coding, and could be used in intra coding for video
      • Main problem: Motion compensation is a very effective tool, and can hardly be trained into a network (or would be tremendously more complex than conventional motion estimation)
      • Some work on using CNN for
        - Sub-pel interpolation
        - Resolution up-conversion
        - Post-processing
        - Texture synthesis and inpainting
      • It is also not as simple to train for perceptual criteria in video
  126. 126. NN for video
      • NN-based approaches were so far more successful in still image coding than in video coding
        - Perceptual criteria also better understood for images
      • In video coding, motion compensation is a most effective key component
        - Requires motion estimation for which "conventional" algorithms appear to be less complex
        - Analogy: Eye tracking – the brain processes a motion compensated input
      • CNN have been demonstrated to provide benefit in context of video coding for
        - Resolution up-conversion
        - Post-processing and loop filtering
        - Intra coding
        - Encoder optimization, in particular partitioning, which is basically a segmentation problem
  127. 127. Variable-resolution coding
      • Switching to lower resolution is common (and necessary) when data rate is low
      • Video is locally varying by detail, and may not require encoding at full resolution everywhere
      • Lower resolution may also be useful with high motion, motion blur, etc.
      • Needing to code less information in such irrelevant areas can save data rate
      • Tools "Reduced Resolution Update" or "Dynamic Resolution Conversion" were included in MPEG-4 part 2 and H.263+, but were not well understood at that time
      • Requires tools for
        - downsampling when generating prediction from reference
        - signalling the coding with variable resolution
        - upsampling for generating full-resolution picture
      • Three examples shown subsequently:
        - Down/up-sampling using neural networks / conventional filters
        - Coding B pictures of dynamic texture with low resolution
        - Dictionary-based super-resolution upsampling
  128. 128. CNN for resolution up-conversion
      • Basic idea of dynamic resolution coding:
        - Downsample and code at lower resolution (less bitrate cost)
        - Upsample at decoder side to full resolution
        - Encoder decides between full resolution, conventional, or CNN-based down- and upsampling
        - CNN-based could generate super-resolution upsampling, sharper edges, etc.
      • Can be implemented in combination with intra and inter prediction coding
      • Operated on block by block basis
      [Figure from JVET-J0032]
  129. 129. CNN for loop filtering
      • Loop filtering is common in video coding
        - removes compression artifacts from reconstruction
        - improves prediction from reconstructed frames
      • Generally, signal-adaptive and non-linear filters
        - e.g., de-blocking, de-ringing, de-banding
        - edge-adaptive & Wiener optimized
        - bi-lateral filters
        - ...
      • CNN reconstruction provides additional gain (3-5% rate reduction) and might replace some conventional filters
      • Can be operated on block basis, parallel processing possible
      [Figures from JVET-I0022: (1) block-wise process units (Block1 to Block20) with padding_size margins; (2) network structure with inputs normalized Y/U/V and normalized QP map, concat and summation stages, and layers Conv1 (5, 5, 45), Conv2 (3, 3, 54), Conv3 (3, 3, 58), Conv4 (3, 3, 48), Conv5 (3, 3, 51), Conv6 (3, 3, 40), Conv7 (3, 3, 31), Convolution8 (3, 3, 1). ConvL (M, N, KL) denotes ConvolutionL (M, N, KL) followed by ReLU, with M: kernel width, N: kernel height, KL: kernel number.]
  130. 130. Neural networks for intra prediction
      • Neural networks were demonstrated to provide improved intra prediction, compared to conventional directional and planar modes
      • Mostly fully connected networks have been used for this purpose (no convolutional layers)
      • Average rate reductions of 4-5% (for intra coding) have been reported
      • Examples of prediction demonstrate the benefit of non-linear processing
      [Figure from JVET-J0037; figures from Li et al., IEEE-TCSVT, July 2018]
  131. 131. Variable-resolution coding for dynamic texture (Thakur et al. 2017)
      • Key pictures coded with full resolution
      • Non-key pictures coded with reduced resolution
      • Upsampling based on motion-compensated steerable pyramid
      [Figure: original pictures, reconstructed key pictures (reference pictures L0 and L1, lowpass filtered), and prediction of non-key pictures]
  132. 132. Variable-resolution coding for dynamic texture (Thakur et al. 2017)
      • Motion vectors initially estimated from downsampled lowpass key pictures, refined and applied in bandpass and highpass components of non-key pictures
      • Authors report significant bit rate saving (20-30% average) for dynamic texture content, whereas subjective quality is preserved compared to full-resolution coding
      [Figure: motion estimation between reference lowpass (key picture) and current lowpass (non-key picture); motion compensation of the bandpass and highpass components]
  133. 133. Low-resolution coding with dictionary-based up-conversion (Schneider et al. 2017)
      • Low and high-resolution dictionaries trained jointly with sparsity constraint (large database)
      • Up-converter searches a low number of matching dictionary bases in low resolution, and applies the corresponding bases from the high-resolution dictionary
  134. 134. Low-resolution coding with dictionary-based up-conversion (Schneider et al. 2017)
      • Scheme run with overlapping blocks
      • Provides sharp reconstruction of structures and edges
      • Authors report 2-3% rate gain when used in upsampling for HEVC scalable coding
