In October 2017, ISO/IEC JTC 1/SC 29/WG 11 (MPEG) and ITU-T SG16/Q6 (VCEG) jointly published a Call for Proposals on Video Compression with Capability beyond HEVC and its current extensions, targeting a new generation of video compression technology with substantially higher compression capability than the existing HEVC standard. The responses to the call were evaluated in April 2018, kicking off a new standardization activity in the Joint Video Experts Team (JVET) of VCEG and MPEG, with finalization targeted for the end of 2020. Three categories of video are addressed: standard dynamic range (SDR), high dynamic range (HDR), and 360° video. While SDR and HDR cover variants of conventional video to be displayed, e.g., on a suitable TV screen at very high resolution (UHD), the 360° category targets videos capturing a full surround view of the scene. This enables an immersive video experience with the possibility to look around in the rendered scene, e.g. when viewed using a head-mounted display. This application raises various technical challenges that need to be addressed in terms of compression, encoding, transport, and rendering. The talk summarizes the current state of the complete standardization project. Focusing on the SDR and 360° video categories, it highlights the development of selected coding tools compared to the state of the art. Representative examples of the new technological challenges, as well as corresponding proposed solutions, are presented.
This document provides an overview and comparison of the H.264 and HEVC video coding standards. It describes the key features and innovations that allow each standard to compress video more efficiently than previous standards. H.264 introduced features like adaptive block sizes, multi-frame prediction, quarter-pixel motion compensation and loop filtering that improved compression performance over prior standards. HEVC aims to further increase compression efficiency through innovations such as larger coding tree blocks, additional intra-prediction modes, and improved entropy coding. The document analyzes these standards to understand how their new coding tools enable significantly higher compression ratios and support for new applications like higher resolution video.
Video Compression, Part 3-Section 1, Some Standard Video Codecs, Dr. Mohieddin Moradi
- ISO/IEC JTC 1/SC 29 and ITU-T are the main organizations that develop video coding standards through working groups like MPEG and VCEG.
- Early standards include H.261 for video telephony and conferencing, and MPEG-1 for Video CD quality video at CD-ROM bit rates.
- Later standards like H.264/AVC, HEVC, and future VVC provide increasingly higher compression through use of block transforms, motion compensation, and entropy coding in a hybrid video codec framework.
- Key organizations periodically collaborate through joint teams like JVT and JCT-VC to develop standards like AVC and HEVC.
Introduction to H.264 Advanced Video Compression, Iain Richardson
The document discusses H.264 advanced video compression. It provides an agenda that covers what H.264 is, how it works through prediction, transform and quantization techniques, its syntax, examples, and going deeper into its implementation. H.264 is widely used for video compression in broadcast digital TV, DVDs/Blu-Rays, IPTV, web video and mobile video. It works by predicting pixels from previous frames, applying transforms and quantization to remove redundant information, and using entropy coding techniques to further compress the data. The document provides resources to learn more about H.264 standards, implementations, and extensions.
Video coding is an essential component of video streaming, digital TV, video chat and many other technologies. This presentation, an invited lecture to the US Patent and Trademark Office, describes some of the key developments in the history of video coding.
Many of the components of present-day video codecs were originally developed before 1990. From 1990 onwards, developments in video coding were closely associated with industry standards such as MPEG-2, H.264 and H.265/HEVC.
The presentation covers:
- Basic concepts of video coding
- Fundamental inventions prior to 1990
- Industry standards from 1990 to 2014
- Video coding patents and patent pools.
The document discusses high dynamic range (HDR) video technology including:
- Different HDR formats such as SMPTE ST 2084 (PQ) and ARIB STD-B67/ITU-R BT.2100 (HLG)
- Code value ranges for 10-bit and 12-bit RGB and color difference signals in narrow and full ranges
- Recommendations for using narrow versus full signal ranges for PQ and HLG
- Transcoding concepts when converting between PQ and HLG formats
- Considerations for including standard dynamic range (SDR) content in HDR programs
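The narrow versus full range distinction above can be made concrete with a small sketch. Assuming the standard narrow-range quantization formulas (luma D = (219·E + 16)·2^(b−8), colour difference D = (224·E + 128)·2^(b−8), as in ITU-R BT.2100), the familiar 10-bit and 12-bit code value limits fall out directly; the function names here are illustrative, not taken from the document.

```python
def narrow_range_luma(e, bits=10):
    """Map a normalized luma value e in [0, 1] to a narrow-range
    integer code value: D = round((219*e + 16) * 2**(bits - 8))."""
    return round((219 * e + 16) * 2 ** (bits - 8))

def narrow_range_chroma(e, bits=10):
    """Map a normalized colour-difference value e in [-0.5, 0.5] to a
    narrow-range code value centred on mid-grey:
    D = round((224*e + 128) * 2**(bits - 8))."""
    return round((224 * e + 128) * 2 ** (bits - 8))

# 10-bit narrow range: luma spans 64..940, chroma spans 64..960
lo, hi = narrow_range_luma(0.0), narrow_range_luma(1.0)       # 64, 940
c_lo, c_hi = narrow_range_chroma(-0.5), narrow_range_chroma(0.5)  # 64, 960
```

At 12 bits the same formulas scale by a further factor of 4, giving a luma range of 256..3760, which matches the code-value tables such documents typically reproduce.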
This document provides information about quality control testing of audiovisual content. It discusses various quality control tests that can be performed, including tests for analogue frame synchronization errors, black bars, constant colour frames, flashing video, macroblocking, video deinterlacing artifacts, and digital tape dropouts. Examples are provided for how each test can be configured and what results might look like. The goal of the quality control tests is to help broadcasters optimize their automated quality control systems and cope with increasing amounts of digital content.
Dr. Mohieddin Moradi provides an outline on high dynamic range (HDR) technology. The 3-page document covers various topics related to HDR including different HDR technologies, tone mapping, color representation, and HDR standards. It discusses concepts such as scene-referred vs display-referred conversions, and direct mapping vs tone mapping when converting between HDR and SDR formats. The document also examines potential side effects when mixing different conversion techniques in a production workflow.
The document provides an overview of the High Efficiency Video Coding (HEVC) standard. Some key points:
- HEVC was created as a new video compression standard to address the growing needs of higher resolution video content and more efficient compression compared to prior standards like H.264.
- It achieves 50% bitrate reduction over H.264 for the same visual quality or improved quality at the same bitrate.
- The standard uses a block-based coding structure with coding tree units and supports intra-frame and inter-frame coding with motion estimation/compensation.
- It introduces more intra-prediction modes and block sizes along with improved transforms, quantization, and entropy coding.
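One of the intra-prediction modes mentioned above can be sketched in a few lines. This is a minimal illustration of DC mode (the simplest of HEVC's 35 intra modes), written as an assumption-laden toy rather than the standard's exact reconstruction process, which also involves reference-sample filtering and boundary smoothing.

```python
def dc_intra_predict(top, left):
    """DC intra prediction: fill the block with the rounded integer mean
    of the already-reconstructed neighbour samples above and to the left."""
    ref = list(top) + list(left)
    dc = (sum(ref) + len(ref) // 2) // len(ref)  # integer mean with rounding
    size = len(top)
    return [[dc] * size for _ in range(size)]

# A 4x4 block whose neighbours average to mid-grey predicts as flat mid-grey;
# only the (ideally small) residual then needs to be transformed and coded.
block = dc_intra_predict([90, 90, 90, 90], [110, 110, 110, 110])
```

The encoder tries each mode, keeps the one whose residual codes cheapest, and signals the mode index to the decoder, which repeats the same prediction.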
This document provides an overview of high dynamic range (HDR) technology and workflows for HDR video production and mastering. It discusses HDR standards like SMPTE ST 2084 and ARIB STD-B67, camera log curves, luminance levels, and tools for setting up HDR monitoring including waveform monitors. Specific topics covered include HDR graticules, setting luminance levels for highlights and grey points, and using zebra patterns and zoom modes to evaluate highlight levels in HDR images.
The document discusses video compression history and standards, including codecs such as H.261, H.262/MPEG-2, H.263, H.264/AVC, H.265/HEVC, and the roles of organizations like MPEG, VCEG, and ITU-T in developing video coding standards to ensure interoperability. It also covers video encoding and decoding principles, as well as common container formats and their applications in areas like broadcasting, streaming, and storage.
Versatile Video Coding: Compression Tools for UHD and 360° Video, Mathias Wien
The document discusses the development of the Versatile Video Coding (VVC) standard. It describes how a call for proposals was issued to develop coding tools beyond HEVC. 46 proposals were submitted across the standard dynamic range, high dynamic range, and 360-degree video categories. The proposals were evaluated through subjective testing and shown to provide over 40% bitrate reduction compared to HEVC and over 10% reduction compared to the Joint Exploration Model, with the best proposals demonstrating visual quality equal to or better than that of HEVC at higher bitrates. Seven proposals were identified as significantly better than the Joint Exploration Model. This marked the starting point for developing the VVC standard based on the selected coding tools from the top-performing proposals.
An Overview of High Efficiency Video Codec HEVC (H.265), Varun Ravi
The document provides an overview of the High Efficiency Video Coding (HEVC) H.265 standard. It discusses the need for improved video compression standards due to increasing video content and limited bandwidth. HEVC was developed to meet this need by providing around 50% better compression over its predecessor H.264 while still maintaining high video quality. The document describes the various techniques used in HEVC such as improved block partitioning, transform sizes, prediction modes, and entropy coding that help achieve its compression gains. Both hardware and software implementations of HEVC decoders and encoders are discussed.
MIPI DevCon 2021: Meeting the Needs of Next-Generation Displays with a High-P..., MIPI Alliance
Presented by Alain Legault, Hardent Inc.; Joe Rodriguez, Rambus Inc.; and Justin Endo, Mixel, Inc.
Next-generation display applications have an insatiable appetite for bandwidth. Using a combination of VESA Display Stream Compression (DSC) and MIPI DSI-2℠ technology, designers can achieve display resolutions up to 8K without compromise to video quality, battery life or cost. This presentation discusses a fully integrated, off-the-shelf display IP subsystem solution, consisting of Mixel (MIPI C-PHY℠/D-PHY℠ combo), Rambus (MIPI DSI-2® controller) and Hardent (VESA DSC) IP, that can deliver this state-of-the-art performance in a power-efficient and compact footprint.
Video compression techniques exploit various types of redundancy in video signals to reduce the data required to represent them. Key techniques include intra-frame compression which uses spatial redundancy within frames via DCT, inter-frame compression which uses temporal redundancy between consecutive frames by encoding differences, and motion compensation which accounts for motion between frames. Popular video compression standards like MPEG use a combination of these techniques including I, P and B frames along with motion estimation to achieve much higher compression ratios than image compression alone.
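The motion-compensation idea described above can be demonstrated with a toy one-dimensional block-matching search. This is a hypothetical sketch (the function names and the SAD cost are common practice, not taken from the document): the encoder searches the previous frame for the shift that best predicts the current block, then only the motion vector and the (here zero) residual need to be coded.

```python
def sad(a, b):
    """Sum of absolute differences: the usual block-matching cost."""
    return sum(abs(x - y) for x, y in zip(a, b))

def motion_search(prev_row, cur_block, pos, search=3):
    """Exhaustively test shifts d in [-search, search] and return the shift
    whose reference block in the previous frame best matches cur_block."""
    n = len(cur_block)
    best_d, best_cost = 0, float("inf")
    for d in range(-search, search + 1):
        start = pos + d
        if start < 0 or start + n > len(prev_row):
            continue  # candidate block would fall outside the frame
        cost = sad(prev_row[start:start + n], cur_block)
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d, best_cost

prev_row = [10, 10, 10, 80, 80, 80, 10, 10, 10, 10, 10, 10]
cur_row  = [10, 10, 10, 10, 10, 80, 80, 80, 10, 10, 10, 10]  # bar moved right by 2
d, cost = motion_search(prev_row, cur_row[4:8], pos=4)
# best match points two samples back (d == -2) with a perfectly zero residual
```

Real codecs do this in two dimensions at quarter-pixel precision with smarter-than-exhaustive search, but the principle is exactly this.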
This document provides definitions and explanations of various optical terminology related to light passing through a lens, including:
- Dispersion, refraction, diffraction, reflection, focal point, focal length, principal point, image circle, aperture ratio, numerical aperture, optical axis, and more. It discusses concepts such as entrance pupil, exit pupil, angular aperture, and how they relate to lens performance. The document also covers topics like vignetting, the cosine law, and flare. Overall, it serves as a comprehensive reference for understanding optical and photographic lens terminology.
Video Compression Standards - History & Introduction, Champ Yen
This document provides an overview of several video compression standards including MPEG-1/2, MPEG-4, H.264, and HEVC/H.265. It discusses the key concepts of video coding such as entropy coding, quantization, transformation, and intra- and inter-prediction. For each standard, it describes the main coding tools and improvements over previous standards, focusing on techniques for more efficient prediction and extraction of redundant spatial and temporal information while maintaining quality. The development of these standards has moved towards more fine-grained partitioning and new coding ideas and tools to reduce bitrates further.
Video Compression, Part 4 Section 1, Video Quality Assessment, Dr. Mohieddin Moradi
This document provides an overview of video compression artifacts that can occur when video is compressed for streaming or storage. It discusses both spatial artifacts, such as blurring, blocking, ringing, and color bleeding, as well as temporal artifacts like flickering and mosquito noise. For each artifact, it describes the visual appearance and potential causes from factors like quantization during compression, motion compensation between frames, and chroma subsampling. The document aims to help understand how compression can degrade perceptual video quality and different types of artifacts that may be evaluated both objectively and subjectively.
HEVC/H.265 is a video compression standard that provides around 50% better compression than H.264/AVC for the same level of video quality. It was finalized in 2013 through the joint collaboration of MPEG and ITU-T. Key features of HEVC include support for higher resolutions like 4K and 8K, improved parallel processing capabilities, and increased coding efficiency through larger block sizes and an expanded set of prediction modes.
Machine learning approaches are being explored for video compression. Conservative approaches replace individual MPEG blocks with deep learning blocks, while disruptive end-to-end approaches aim to replace the entire MPEG chain. Optical flow networks can exploit temporal redundancy by estimating motion between frames. Fully neural network-based video compression models jointly learn motion estimation, motion compression, and residual compression in an end-to-end optimized framework. However, performance gains must be balanced against increased complexity, and neural network approaches are not yet mature enough to be included in video compression standards.
The document provides a history of the development of television technology from the late 1800s through the 1920s. Some key developments include:
- In 1873, experiments showed that selenium is light-sensitive, a discovery that formed the basis for early television.
- In 1884, the Nipkow disk laid down many basic concepts like scanning and synchronization.
- In 1923, Vladimir Zworykin filed a patent application for the iconoscope, an electronic camera tube.
- In 1924, John Logie Baird transmitted the first television image.
- In 1925, Vladimir Zworykin demonstrated 60-line television using a curved-line image structure typical of mechanical television at the time.
The document provides an introduction to video compression. It discusses key concepts such as lossy vs lossless compression, encoders, decoders, and codecs. It also covers techniques used in video compression like sampling, quantization, model-based transforms, the human visual system, color space transforms, block-based coding, and the discrete cosine transform. Video compression standards like MPEG compress video using techniques like motion estimation, motion compensation, and encoding frames individually.
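The quantization step mentioned above is where lossy compression actually discards information, and it is simple enough to show directly. This is a generic uniform-quantizer sketch (the step size and coefficient values are made up for illustration), not a specific codec's quantizer design:

```python
def quantize(coeff, step):
    """Forward quantization: map a transform coefficient to a small integer
    level -- this is the step where information is irreversibly discarded."""
    return round(coeff / step)

def dequantize(level, step):
    """Decoder-side reconstruction: scale the level back up; the best guess
    is the centre of the quantization bin."""
    return level * step

# With step size 8, coefficient 37 becomes level 5 and reconstructs as 40;
# the reconstruction error is always bounded by half the step size.
level = quantize(37, 8)   # 5 -- this small integer is what gets entropy-coded
rec = dequantize(level, 8)  # 40
```

Larger step sizes mean smaller levels (cheaper to entropy-code) but larger reconstruction error, which is exactly the rate-distortion trade-off an encoder's quality setting controls.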
The latest video compression standard, H.264 (also known as MPEG-4 Part 10/AVC, for Advanced Video Coding), is expected to become the video standard of choice in the coming years.

H.264 is an open, licensed standard that supports the most efficient video compression techniques available today. Without compromising image quality, an H.264 encoder can reduce the size of a digital video file by more than 80% compared with the Motion JPEG format and by as much as 50% compared with the MPEG-4 Part 2 standard. This means that much less network bandwidth and storage space are required for a video file; seen another way, much higher video quality can be achieved for a given bit rate.
Video Compression, Part 3-Section 2, Some Standard Video Codecs, Dr. Mohieddin Moradi
This document discusses MPEG-2 Transport Streams and Packetized Elementary Streams. It describes how MPEG-2 Transport Streams use fixed-length 188-byte packets containing compressed video, audio or data from one or more programs, identified by Packet IDs. These packets can carry Packetized Elementary Stream packets, which contain compressed elementary streams with timestamps for synchronization. The document also discusses how Transport Streams allow synchronous multiplexing of multiple programs from independent time bases into a single stream.
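The fixed 188-byte packet layout described above has a 4-byte header that is easy to parse. The sketch below decodes the fields most relevant to the summary (the 0x47 sync byte, the 13-bit PID, and the flag marking the start of a PES packet); the dictionary keys are illustrative names, not from the document.

```python
def parse_ts_header(packet):
    """Parse the 4-byte header of an MPEG-2 Transport Stream packet.
    Every TS packet is exactly 188 bytes and begins with sync byte 0x47."""
    if len(packet) != 188:
        raise ValueError("TS packets are exactly 188 bytes")
    if packet[0] != 0x47:
        raise ValueError("missing 0x47 sync byte")
    return {
        # set when a Packetized Elementary Stream packet begins in this payload
        "payload_unit_start": bool(packet[1] & 0x40),
        # 13-bit Packet ID: 5 low bits of byte 1, all 8 bits of byte 2
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],
        # 4-bit counter used to detect lost or duplicated packets
        "continuity_counter": packet[3] & 0x0F,
    }

# A hypothetical packet carrying PID 0x100 with continuity counter 7
pkt = bytes([0x47, 0x41, 0x00, 0x17]) + bytes(184)
hdr = parse_ts_header(pkt)
```

A demultiplexer simply filters packets by PID to pull one program's video or audio elementary stream out of the multiplex.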
The document discusses video compression basics and MPEG-2 video compression. It explains that video frames contain redundant spatial and temporal data that can be compressed. MPEG-2 uses three frame types (I, P, B frames) and compresses frames using intra-frame and inter-frame encoding techniques like DCT, quantization, and entropy encoding to remove redundancy. The encoding process transforms raw video frames to compressed bitstreams for efficient storage and transmission.
Trends and Recent Developments in Video Coding Standardization, Mathias Wien
This document summarizes a tutorial on trends and recent developments in video coding standardization. It discusses the history of video coding standards organizations and the standards they have developed. These include MPEG-1, H.261, H.262, H.264, H.265 and the upcoming H.266 Versatile Video Coding standard. The document outlines the tutorial, which will cover topics like video resolutions, current compression techniques, VVC, and future trends in areas like multi-camera coding.
The document summarizes an upcoming webinar on new developments in MPEG standards. It will discuss Versatile Video Coding (VVC), MPEG-H 3D Audio Baseline Profile, video-based point cloud compression (V-PCC), and MPEG Immersive Video (MIV). The webinar will provide overviews of each standard and their applications, as well as results from recent verification tests that evaluated subjective quality and performance. Speakers will include leaders from MPEG working groups and the Joint Video Experts Team.
1) The document discusses video compression and streaming technologies, including standards like H.264 and challenges of streaming over heterogeneous networks.
2) It outlines objectives to develop versatile encoder and decoder architectures, efficient compression algorithms, and new concepts for adaptive streaming over IP networks.
3) Key outcomes included advanced encoder and decoder architectures, improved video processing algorithms, an end-to-end H.264 streaming system, and a scalable video coding scheme.
The document discusses standardization in multimedia technologies. It describes how standardization is a strategic process that requires early collaboration between researchers, industry, and policymakers. It then provides details on some key international and European standardization organizations and processes. Examples of standardized multimedia technologies like JPEG, MPEG, and H.264 are also summarized. The document concludes by discussing intellectual property issues in standardization and the importance of having a strategic, multi-lateral approach to participation in standardization.
Tutorial: High Efficiency Video Coding - Coding Tools and Specification.pdf - ssuserc5a4dd
This document provides an outline for a tutorial on High Efficiency Video Coding (HEVC). It discusses the motivation for developing a new video coding standard to support higher resolutions and bandwidth efficiency. It describes the formation of the Joint Collaborative Team on Video Coding (JCT-VC) by MPEG and VCEG to develop the HEVC specification. It also gives an overview of the hybrid coding scheme used in HEVC and other video coding standards, including prediction, transform coding of residuals, and entropy coding.
The document discusses dynamics and trends in mobile video, including:
1) Mobile devices have become the dominant platform for media interactions, with over 1/3 of daily interactions occurring on smartphones.
2) Advancements in mobile devices like higher resolution displays and better cameras, as well as increased availability of digital content, have led to mobile video becoming the dominant source of mobile data traffic growth.
3) However, the need for higher resolution video like 4K and 8K will require data capacities to increase 64-fold in the next decade, posing challenges unless video compression efficiency improves beyond current standards.
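The 64-fold figure follows from simple pixel-rate arithmetic. A sketch under illustrative assumptions (8K at 120 fps against Full HD at 30 fps; the frame rates are chosen for round numbers and constant bits per pixel is assumed, neither is stated in the document):

```python
# Back-of-envelope pixel-rate comparison behind the capacity claim above.
# Frame rates are illustrative assumptions, not figures from the document.

def pixel_rate(width, height, fps):
    """Pixels that must be delivered per second."""
    return width * height * fps

full_hd = pixel_rate(1920, 1080, 30)    # common baseline today
uhd_8k = pixel_rate(7680, 4320, 120)    # 8K at a high frame rate

print(uhd_8k / full_hd)  # 64.0 -- a 64-fold increase at constant bits per pixel
```

This is why the paragraph above argues that capacity growth alone cannot absorb the jump, and compression efficiency must improve beyond current standards.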
The H.264/AVC Advanced Video Coding Standard: Overview and ... - Videoguy
This document provides an overview of the H.264/AVC video coding standard and its Fidelity Range Extensions (FRExt). It discusses how H.264/AVC was developed jointly by ISO/IEC MPEG and ITU-T VCEG to improve coding efficiency over prior standards. The FRExt amendment adds support for higher chroma sampling, bit depths, and other capabilities for demanding professional applications. Initial industry feedback indicates rapid adoption of the High Profile added in FRExt.
The document discusses challenges and recommendations for delivering 4K and virtual reality content. It begins by defining key terms like resolution, codecs, formats, and encoding for 4K and VR. It then examines use cases for delivering 360-degree VR videos and live 4K experiences. Challenges include high bandwidth requirements, ensuring low latency and quality for VR, and achieving resilient ingest for live streams. The document recommends techniques like tiled encoding for VR and using Akamai's media services for live 4K delivery to address these challenges. It emphasizes that new technologies are emerging in over-the-top media before traditional broadcast.
The document discusses the H.265/HEVC video coding standard. It provides an overview of HEVC version 1.0 and its extensions, including its coding efficiency compared to prior standards. Studies show HEVC achieves 50% bitrate reduction over H.264/AVC for the same subjective quality. For low delay applications, HEVC requires 48-73% fewer bits than VP9 or H.264/AVC encoders. While reference encoders are slow, real-time encoders have approached the coding efficiency of the reference within 1.5 years. Future extensions include higher color depths, multiview, and scalable coding.
R&S. UH. Practical aspects of the implementation - Sergii Pedorenko
The document discusses Sky's implementation of a 24/7 UHD television service. It provides timelines for the project starting in March 2016 and going live in October 2016. It describes the equipment used for ingest, production, playout, distribution and monitoring of UHD content. This includes servers, storage, encoding and monitoring solutions from Rohde & Schwarz.
MPEG Immersive Media
By Thomas, Director of Technical Standards at Qualcomm
Presented at the 2nd ITU-T Mini-Workshop on Immersive Live Experience (ILE), 19 January 2017
This document provides an overview of MPEG Immersive Video (MIV) including:
- MIV enables 6 degrees of freedom immersive video playback through compression of multi-view or multi-plane video, geometry, and texture.
- It specifies a bitstream format that leverages existing 2D video codecs for storage and distribution of immersive video over networks.
- The MIV test model includes an encoder, conforming decoder, and renderer to experiment with different MIV configurations.
Video services are evolving from traditional two-dimensional video to virtual reality and holograms, which offer six degrees of freedom to users, enabling them to freely move around in a scene and change focus as desired. However, this increase in freedom translates into stringent requirements in terms of ultra-high bandwidth (in the order of Gigabits per second) and minimal latency (in the order of milliseconds). To realize such immersive services, the network transport, as well as the video representation and encoding, have to be fundamentally enhanced. The purpose of this tutorial article is to provide an elaborate introduction to the creation, streaming, and evaluation of immersive video. Moreover, it aims to provide lessons learned and to point at promising research paths to enable truly interactive immersive video applications toward holography.
The document describes a real-time JVT SD encoder demo that will be shown at the 4th JVT meeting in Klagenfurt, Austria from July 22-26, 2002. The demo uses an MPEG-4 Part 10/H.264 compliant real-time video encoder that can encode standard definition video at 30 frames per second near 1 Mbps with DVD quality. The document invites attendees to see the demo and provides contact information. It also provides diagrams of the demo system and describes the hardware and software functions and components as well as future plans.
The document summarizes improvements made to the Video Conferencing (VIC) tool to support high definition video for access grids. Key points:
1) The updated VIC tool leverages existing open source resources like FFmpeg to incorporate modern video codecs like MPEG-4 and H.264, allowing for higher quality video streaming in HD resolutions.
2) It adds features for error resilience, efficient color conversion, scaling viewing windows, and full-screen snapshots not previously supported by VIC.
3) Performance tests show the MPEG-4 codec can achieve television quality at around 1 Mbps for 720x480 video streams, using less bandwidth than older codecs like H.261, and can sustain 720p at 25 frames per second at roughly the same bit rate while maintaining good quality.
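To put the roughly 1 Mbps figures above in context, compare them with the uncompressed rate for 720x480 at 30 fps. A minimal sketch assuming 8-bit 4:2:0 sampling (the chroma format is an assumption; the summary does not state it):

```python
# Uncompressed bandwidth for 720x480 at 30 fps, assuming 8-bit 4:2:0
# sampling (1 luma byte + 0.5 chroma bytes per pixel). Illustrative only.

def raw_bitrate_bps(width, height, fps, bytes_per_pixel=1.5):
    return width * height * fps * bytes_per_pixel * 8

raw = raw_bitrate_bps(720, 480, 30)
print(raw / 1e6)               # 124.416 -- about 124 Mbps uncompressed
print(round(raw / 1_000_000))  # vs a ~1 Mbps stream: roughly a 124:1 ratio
```

That two-orders-of-magnitude gap is why modern codecs, rather than raw transport, made HD conferencing feasible on ordinary links.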
Similar to Versatile Video Coding – Video Compression beyond HEVC: Coding Tools for SDR and 360° Video (20)
A recent direction in Business Process Management has studied methodologies to control the execution of business processes under several sources of uncertainty, so that execution always reaches the end while satisfying all constraints. Current approaches encode business processes into temporal constraint networks or timed game automata in order to exploit their associated strategy-synthesis algorithms. However, the proposed encodings can only synthesize single strategies and fail to handle loops. To overcome these limits I will discuss a recent approach based on supervisory control. The approach considers structured business processes with resources, parallel and mutually exclusive branches, loops, and uncertainty. I will discuss an encoding into finite state automata and prove that their concurrent behavior models exactly all possible executions of the process. After that, I will introduce tentative commitment constraints as a new class of constraints restricting the executions of a process. Finally, I will discuss a tree decomposition of the process that plays a central role in modular supervisory control.
In his ignite talk "The Digital Transformation of Education: A Hyper-Disruptive Era through Blockchain and Generative AI," Dr. Alexander Pfeiffer delves into the intricate challenges and potential benefits associated with integrating blockchain technologies and generative AI into the educational landscape. He scrutinizes consensus algorithms and explores sustainable methods of operating blockchain systems, while also examining how smart contracts and transactions can be tailored to meet the specific needs of the educational sector. Alexander underscores the importance of establishing secure digital identities and ensuring robust data protection, while simultaneously casting a critical eye on potential risks and vulnerabilities. The topic of digital identities, facilitated through tokenization, forms a bridge between storing data using blockchain-based databases and the increasingly urgent need for content verification of AI-generated material.
Alexander explores the profound alterations occurring in teaching methodologies, assignment creation, and evaluation processes, shedding light on the hyper-disruptive impact these changes are having on both research and practical applications in education. The production of textual content by educators and students is analyzed with a focus on ensuring clear traceability of content sources and editors, and its proper citation, a critical aspect in the responsible use of AI. In addition to generative text and graphics, AI plays a crucial role in future learning and assignment practices, particularly through adaptive game-based learning and assessment. Alexander will provide a brief glimpse into his game "Gallery-Defender," a prototype demonstrating how AI and blockchain can be effectively implemented in serious gaming scenarios.
Furthermore, he emphasizes the imperative for ongoing education and professional development for educational personnel, advocating for a proactive stance in addressing the (legal) challenges associated with AI-generated images and text. This ignite talk aims to provide a balanced and critically reflective perspective on hyper-disruptive technologies, setting the stage for further discourse and exploration in the subsequent discussion.
The simulation of melee combat is central to many contemporary and traditional strategic games and simulations. In order to elevate this element of play from mere exercises of stats-comparison and dice rolling to a meaningful experience of play, strategy games rely on a rich plethora of cultural motives as deciding factors of their mechanic design. On the example of Samurai-themed skirmishing games, my talk elaborates on the impact that (popular) culture and other inspirations have on gaming experiences. It provides concrete examples from Japanese history, its traditional cinema, and postmodern Western reflections of Japanese cultural practices. Based on these insights, it compares four tabletop strategy games, muses on which phenomena they have adapted in their mechanics, and asks why or why not they may succeed in capturing a cultural essence via their rules.
Ultimately, this comparative approach shall serve to decipher the interplay of dice mechanics and aesthetic properties as the longing for a dramatic ideal in tabletop gaming and encourage participants to reflect on the idea in a subsequent, shared gaming experience.
How does a development team expand on an already existing game?
We will look at the two community driven and committee led expansions to the abandoned Tabletop game 'GuildBall' and explore the stages of development that the game went through. The art and lore driven approach employed will show us how rough sketches and concept ideas become a fully fledged ruleset and ultimately miniatures that can be put on the table. We will also explore pitfalls in rules design like over complicating abilities, the lack of streamlining across the game or simply creating expansions who break the game instead of the mold.
The document discusses Ben Calvert-Lee's work developing miniatures for tabletop games. It begins with an introduction to Ben's background and current role as a freelance lead sculptor. It then outlines the typical development pipeline for miniatures, from initial concepts and artwork to production. The document also discusses different miniature production methods. A case study details Ben's process for developing the Tengu faction for a game, including exploring species archetypes and incorporating unexpected developments into the designs.
In recent years, we have experienced an exponential growth in the amount of data generated by IoT devices. Data have to be processed under strict low-latency constraints that cannot be addressed by conventional computing paradigms and architectures. On top of this, considering that we have recently hit the limit codified by Moore's law, satisfying the low-latency requirements of modern applications will become even more challenging in the future. In this talk, we discuss challenges and possibilities of heterogeneous distributed systems in the post-Moore era.
In the modern world, we are permanently using, leveraging, interacting with, and relying upon systems of ever higher sophistication, ranging from our cars, recommender systems in eCommerce, and networks when we go online, to integrated circuits when using our PCs and smartphones, security-critical software when accessing our bank accounts, and spreadsheets for financial planning and decision making. The complexity of these systems coupled with our high dependency on them implies both a non-negligible likelihood of system failures, and a high potential that such failures have significant negative effects on our everyday life. For that reason, it is a vital requirement to keep the harm of emerging failures to a minimum, which means minimizing the system downtime as well as the cost of system repair. This is where model-based diagnosis comes into play.
Model-based diagnosis is a principled, domain-independent approach that can be generally applied to troubleshoot systems of a wide variety of types, including all the ones mentioned above. It exploits and orchestrates techniques for knowledge representation, automated reasoning, heuristic problem solving, intelligent search, learning, stochastics, statistics, decision making under uncertainty, as well as combinatorics and set theory to detect, localize, and fix faults in abnormally behaving systems.
In this talk, we will give an introduction to the topic of model-based diagnosis, point out the major challenges in the field, and discuss a selection of approaches from our research addressing these challenges. For instance, we will present methods for the optimization of the time and memory performance of diagnosis systems, show efficient techniques for a semi-automatic debugging by interacting with a user or expert, and demonstrate how our algorithms can be effectively leveraged in important application domains such as scheduling or the Semantic Web.
Function-as-a-Service (FaaS) is the latest paradigm of cloud computing, in which developers deploy their code as serverless functions while the entire underlying platform and infrastructure is managed by cloud providers. Each cloud provider offers a huge set of cloud services and many libraries to simplify development and deployment, but only inside their own cloud, often in a single cloud region. With such "help" from cloud providers, users are locked into the resources and services of the selected provider, which are often limited. Moreover, such a heterogeneous and distributed environment of multiple cloud regions and providers challenges scientists engineering cloud applications, often in the form of serverless workflows. In this talk, I will present our design principle "code once, run everywhere, with everything". In particular, I will present challenges and our approaches and techniques for programming, modeling, orchestrating, and running distributed serverless workflow applications in federated FaaS.
This document summarizes a presentation on machine learning and fluid network planes. It begins with an agenda and introduction to fluid network planes and instances. It then discusses the role of machine learning in fluid network planes, including applications such as optimization, virtual network embedding problems, run-time operations, and intent-based closed-loop automation. Recent research is presented on machine learning-based YouTube QoE estimation using real 4G/5G network traces to predict video quality and inform control actions. Results are shown comparing 4G and 5G networks in terms of radio parameters, stalling events, handovers, and video resolutions under different mobility conditions.
The dynamics of networks enables the function of a variety of systems we rely on every day, from gene regulation and metabolism in the cell to the distribution of electric power and communication of information. Understanding, steering and predicting the function of interacting nonlinear dynamical systems, in particular if they are externally driven out of equilibrium, relies on obtaining and evaluating suitable models, posing at least two major challenges. First, how can we extract key structural system features of networks if only time series data provide information about the dynamics of (some) units? Second, how can we characterize nonlinear responses of nonlinear multi-dimensional systems externally driven by fluctuations, and consequently, predict tipping points at which normal operational states may be lost? Here we report recent progress on nonlinear response theory extended to predict tipping points and on model-free inference of network structural features from observed dynamics.
When it comes to integrating digital technologies into the classroom in higher education, many teachers face similar challenges. Nevertheless, it is difficult for teachers to share experiences because it is usually not possible to transfer successful teaching scenarios directly from one area to another, as subject-specific characteristics make it difficult to reuse them. To address this problem, instructional scenarios can be described as patterns that have been used previously in educational contexts. Patterns can capture proven teaching strategies and describe instructional scenarios in a consistent structure that can be reused. Because priorities for content, methods, and tools are different in each domain, a consensus-tested taxonomy was first developed with the goal of modeling a domain-independent database to collect digital instructional practices. In addition, this presentation will present preliminary insights into a data-driven approach to identifying effective instructional practices from interdisciplinary data as patterns. A web-based application will be developed for this that can both collect teaching/learning scenarios and individually extract scenarios from patterns for a learning platform.
The document discusses performance characterization across a computing continuum from the edge to the cloud. It evaluates the performance of video encoding and machine learning tasks on different devices. For video encoding, older single-board computers had significantly higher encoding times than other resources but provided lower data transfer times. For machine learning, training a convolutional neural network took much longer than a simpler model. Cloud and fog resources generally outperformed edge devices for more complex tasks. The document recommends offloading large or complex tasks to more powerful resources when possible.
An east–west oriented photovoltaic power system is a new trend in orienting photovoltaic systems. This lecture presents an evaluation of east–west oriented photovoltaic power systems, including a comparison between east–west and south oriented photovoltaic systems in terms of cost of energy and technical requirements. In addition, the benefits of using an east–west oriented photovoltaic system are discussed.
The document discusses using randomized recurrent neural networks and signature-based methods for machine learning in finance. It proposes splitting the input-output map of a dynamical system into a "reservoir" part and a linear "readout" part. The signature of the input signal provides a natural candidate for the reservoir, as it is point-separating, and linear functions on the signature can approximate continuous functionals via the universal approximation theorem. The goal of the talk is to prove how dynamical systems can be approximated using randomized recurrent networks, with precise convergence rates, and to view randomized deep networks through this lens.
We live in a “digital” world, the separation between physical and virtual makes (almost) no sense anymore. Here, the Corona pandemic has also acted as an accelerator/magnifier demonstrating that the future of our digital society is here with all its possibilities, but also shortcomings.
In his talk, Hannes Werthner will briefly reflect on the history of computer science and then discuss the need for an interdisciplinary response to these shortcomings. One such answer is Digital Humanism, which analyzes and, most importantly, tries to influence the complex interplay of technology and humankind, for a better society and life. In the second part he will discuss this approach and show what has been achieved since its first workshop in 2019, and what lies ahead.
In recent years, we have witnessed a growing number of media files transmitted and stored on computers and mobile devices. For this reason, there is a real need to employ smart compression algorithms to reduce the size of our media files. However, such techniques are often responsible for severe reductions in user-perceived quality. In this talk we present several approaches we have developed to restore degraded images and videos to match their original quality, making use of Generative Adversarial Networks. The aim of the talk is to highlight the main features of our research work, including the advantages of our solution, the current challenges, and possible directions for future improvements.
Recommendation systems today are widely used across many applications such as in multimedia content platforms, social networks, and ecommerce, to provide suggestions to users that are most likely to fulfill their needs, thereby improving the user experience. Academic research, to date, largely focuses on the performance of recommendation models in terms of ranking quality or accuracy measures, which often don’t directly translate into improvements in the real-world. In this talk, we present some of the most interesting challenges that we face in the personalization efforts at Netflix. The goal of this talk is to sunshine challenging research problems in industrial recommendation systems and start a conversation about exciting areas of future research.
The document discusses the evolution to 5G networks and their benefits. It covers 5G principles like enhanced mobile broadband, massive machine-type communication, and ultra-reliable low-latency communications. Statistics are provided on 5G subscriptions, deployments, and expected growth in mobile data traffic. Use cases like smart cities, VR/AR, and autonomous vehicles are described. The presentation outlines Ericsson's 5G experience and global footprint.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many of these features provide convenience and capability at the expense of security. This best-practices guide outlines steps users can take to better protect personal devices and information.
Power Grid Model
The global energy transition is placing new and unprecedented demands on Distribution System Operators (DSOs). Alongside upgrades to grid capacity, processes such as digitization, capacity optimization, and congestion management are becoming vital for delivering reliable services.
Power Grid Model is an open source project from Linux Foundation Energy and provides a calculation engine that is increasingly essential for DSOs. It offers a standards-based foundation enabling real-time power systems analysis, simulations of electrical power grids, and sophisticated what-if analysis. In addition, it enables in-depth studies and analysis of the electrical power grid’s behavior and performance. This comprehensive model incorporates essential factors such as power generation capacity, electrical losses, voltage levels, power flows, and system stability.
Power Grid Model is currently being applied in a wide variety of use cases, including grid planning, expansion, reliability, and congestion studies. It can also help in analyzing the impact of renewable energy integration, assessing the effects of disturbances or faults, and developing strategies for grid control and optimization.
What to expect
For the upcoming meetup we are organizing, we have an exciting lineup of activities planned:
-Insightful presentations covering two practical applications of the Power Grid Model.
-An update on the latest advancements in Power Grid -Model technology during the first and second quarters of 2024.
-An interactive brainstorming session to discuss and propose new feature requests.
-An opportunity to connect with fellow Power Grid Model enthusiasts and users.
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slackshyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
Skybuffer SAM4U tool for SAP license adoptionTatiana Kojar
Manage and optimize your license adoption and consumption with SAM4U, an SAP free customer software asset management tool.
SAM4U, an SAP complimentary software asset management tool for customers, delivers a detailed and well-structured overview of license inventory and usage with a user-friendly interface. We offer a hosted, cost-effective, and performance-optimized SAM4U setup in the Skybuffer Cloud environment. You retain ownership of the system and data, while we manage the ABAP 7.58 infrastructure, ensuring fixed Total Cost of Ownership (TCO) and exceptional services through the SAP Fiori interface.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer’s life and facilitate a rapid transition from concept to production-ready applications.He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
Fueling AI with Great Data with Airbyte WebinarZilliz
This talk will focus on how to collect data from a variety of sources, leveraging this data for RAG and other GenAI use cases, and finally charting your course to productionalization.
Versatile Video Coding – Video Compression beyond HEVC: Coding Tools for SDR and 360° Video
1. Versatile Video Coding – Video Compression beyond HEVC: Coding Tools for SDR and 360° Video
AAU Klagenfurt, May 14th, 2018
Mathias Wien
Institut für Nachrichtentechnik
RWTH Aachen University
wien@ient.rwth-aachen.de
2. Versatile Video Coding – Video Compression beyond HEVC: Coding Tools for SDR and 360°Video |
Mathias Wien | RWTH Aachen University | Institut für Nachrichtentechnik | 14.05.2018
2
Outline
• Introduction
Standardization development and process
• Versatile Video Coding Development
Joint Call for Proposals Outcome
• Coding Tools
Versatile Video Coding Test Model
• Tools proposed by RWTH
Geometric Partitioning
360° Tools
• Summary and Outlook
Next steps
3.
INTRODUCTION
4.
Video coding standardization organisations
• ISO/IEC MPEG = “Moving Picture Experts Group”
(ISO/IEC JTC 1/SC 29/WG 11 = International Organization for Standardization and International Electrotechnical Commission, Joint Technical Committee 1, Subcommittee 29, Working Group 11)
• ITU-T VCEG = “Video Coding Experts Group”
(ITU-T SG16/Q6 = International Telecommunication Union – Telecommunication Standardization Sector [United Nations organization, formerly CCITT], Study Group 16, Working Party 3, Question 6)
• JVT = “Joint Video Team”: collaborative team of MPEG & VCEG, responsible for developing Advanced Video Coding (AVC); discontinued in 2009; documents and software publicly available
• JCT-VC = “Joint Collaborative Team on Video Coding”: team of MPEG & VCEG responsible for developing High Efficiency Video Coding (HEVC); established January 2010; documents and software publicly available
• JVET = “Joint Video Exploration Team”: exploring the potential for new technology beyond HEVC (established Oct. 2015); renamed “Joint Video Experts Team”, responsible for developing Versatile Video Coding (VVC), from April 2018; documents and software publicly available
5.
History of international video coding standardization
ITU-T: H.120 (1984-1988) → H.261 (1990+) → H.263/+/++ (1995-2000+)
ISO/IEC: MPEG-1 (1993) → MPEG-4 Visual (1998-2001+)
Joint ITU-T / ISO/IEC: H.262 / 13818-2 (MPEG-2) (1994/95-1998+) → H.264 / 14496-10 AVC (Advanced Video Coding) (2003-2008+) → H.265 / 23008-2 HEVC (High Efficiency Video Coding) (2013-2016+) → H.26x / 23090-3 VVC (Versatile Video Coding) (2020-...)
Target applications over time: videotelephony, computer video → SD → HD → 4K UHD → 8K, 360°, ...
6.
The scope of video standardization
• Only the specifications of the bitstream, syntax, and decoder are standardized:
• Permits optimization beyond the obvious
• Permits complexity reduction for implementability
• Provides no guarantees of quality
(Figure: Source → Pre-Processing → Encoding → channel → Decoding → Post-Processing & Error Recovery → Destination; the scope of the standard covers only the bitstream and the decoding process.)
7.
Hybrid Coding Concept
Basis of every standard since H.261
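The hybrid coding principle named above (prediction followed by transform coding of the quantized residual) can be sketched in a few lines. This toy version assumes an orthonormal DCT-II, a given predictor block, and uniform quantization; it is illustrative only and not the exact design of any standard:

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix (rows = basis vectors)."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0, :] = np.sqrt(1.0 / n)
    return m

def hybrid_encode_block(cur: np.ndarray, pred: np.ndarray, qstep: float):
    """One hybrid-coding step: predict, transform the residual, quantize."""
    t = dct_matrix(cur.shape[0])
    residual = cur.astype(float) - pred.astype(float)
    coeffs = t @ residual @ t.T        # 2-D DCT of the prediction residual
    levels = np.round(coeffs / qstep)  # uniform scalar quantization
    return levels

def hybrid_decode_block(levels: np.ndarray, pred: np.ndarray, qstep: float):
    """Decoder mirror: dequantize, inverse transform, add the prediction."""
    t = dct_matrix(levels.shape[0])
    coeffs = levels * qstep            # dequantize
    residual = t.T @ coeffs @ t        # inverse 2-D DCT
    return pred.astype(float) + residual

# toy 8x8 block predicted from the co-located block of the previous frame
rng = np.random.default_rng(0)
prev = rng.integers(0, 256, (8, 8))
cur = np.clip(prev + rng.integers(-4, 5, (8, 8)), 0, 255)
levels = hybrid_encode_block(cur, prev, qstep=8.0)
rec = hybrid_decode_block(levels, prev, qstep=8.0)
```

The decoder reconstructs the block from the prediction plus the dequantized residual alone, which is why only the bitstream and decoding process need to be standardized.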
8.
Performance history of standard generations
(Plot: PSNR (dB) over bit rate (kbit/s) for the Foreman sequence, QCIF, 10 Hz, 100 frames; rate-distortion curves for JPEG, H.261, H.262/MPEG-2, H.263+ / MPEG-4 Visual, AVC, and HEVC, each generation shifting the curve towards lower rates.)
Bit-rate reduction per generation: ~50% at equal quality
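The per-generation comparison above is made at matched objective quality, measured as PSNR. For reference, a minimal PSNR helper for 8-bit frames (the standard textbook formula, not tied to any JVET tool):

```python
import numpy as np

def psnr(ref: np.ndarray, rec: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between a reference and a
    reconstructed frame; max_val = 255 for 8-bit video."""
    mse = np.mean((ref.astype(float) - rec.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)
```

At equal PSNR, each standard generation has roughly halved the required rate, which is the target VVC again sets against HEVC.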
9.
• Video resolution is continually increasing
HD established, UHD (4K×2K, 8K×4K) appearing
Mobile services moving towards HD/UHD
Stereo, multi-view, 360° video
• Devices to record and display ultra-high resolutions are available
Becoming affordable for home and mobile consumers
• Video has multiple dimensions along which the data rate grows
Frame resolution, temporal resolution
Color resolution, bit depth
Multi-view
Visible distortion still an issue on existing networks
• The necessary video data rate grows faster than feasible network transport capacities
Better video compression (50% of the current HEVC rate) needed, even once 5G is available
Motivation for improved video compression
10.
VVC DEVELOPMENT
11.
• Exploration activity in the JVET, starting Oct. 2015
Investigation of tools integrated on top of HEVC: Joint Exploration Model (JEM) software
Larger block sizes / transforms, improved intra / inter prediction tools, decoder-side derivation / refinement methods, adaptive loop filters, ... (*)
Objective gains >25% measurable on the test set
But: evaluation software only, significant increase of encoder run times
• Joint Call for Evidence (issued Mar. 2017, evaluated Jul. 2017): significant compression gains asserted
• Joint Call for Proposals (issued Oct. 2017, evaluated Apr. 2018): kick-off for Versatile Video Coding (VVC)
Steps towards the next-generation standard – Versatile Video Coding (VVC)
Figure from: JVET AHG report: Tool evaluation (AHG1) [JVET-H0001]
(*) Details on JEM coding tools in the VCAS lecture
12.
• VVC should be applicable for many types of data
SDR and HDR up to extremely high resolutions
All kinds of camera-generated content
Computer-generated content
Non-camera video modalities, e.g. medical data
360°, lightfield, depth, and volumetric video
• VVC should support flexible random and localized access
Low delay, random access, trick modes
Error resilience, video buffer, system layer interface
Possible support for scalability and multi-view
Steps towards next generation standard – Versatile Video Coding (VVC)
13.
• Document JVET-H1002
• Test categories
Standard dynamic range (SDR): 5 UHD and 5 HD sequences
High dynamic range (HDR): 3 HLG and 5 PQ sequences
360° video (360): 5 sequences in ERP format
• Constraint sets
Constraint set 1 (C1): Random access configuration
Max 1.1s random access intervals, structural delay max 16 pictures
Constraint set 2 (C2): Low delay configuration only evaluated for SDR HD sequences
No picture reordering between input and output
• Encoding constraints
No pre-processing, post-processing only within the coding loop
Static quantizer setting with one-time change to meet target bitrate
Relevant optimization methods to be reported
Joint Call for Proposals (CfP) on Video Compression with Capability beyond HEVC
UHD = Ultra High Definition, HD = High Definition, HLG = Hybrid Log-Gamma, PQ = Perceptual Quantizer (ITU-R BT.2100), ERP = Equirectangular Projection
15.
• Category-specific submissions (total 46):
SDR: 22 submissions (8 of which are registered only in this category)
HDR: 12 submissions
360°: 12 submissions (2 of which are registered only in this category)
For all categories: HEVC anchors (HM) and JEM anchors
• Proposals described in input documents JVET-J0011...JVET-J0033
Participation of 32 institutions
• Evaluation: Double stimulus test
Rate points: lowest rate was typically less than "fair" quality for HEVC, but still possible to code
Three ways of judging benefit:
Mean MOS over all test cases (28×4 test points: 23×4 C1, 5×4 C2)
Count cases where a proposal was visually better/worse than JEM
Count cases where a proposal was visually better than HEVC (HEVC at higher rate point)
• Reports: Input subjective test [JVET-J0080], output CfP results [JVET-J1003]
VVC CfP Responses
16.
• Objective performance: best performers report >40% bit-rate reduction compared to HEVC, >10% compared to JEM (for the SDR case)
2 proposals used some degree of subjective optimization
1 proposal used large-segment multipass encoding
Similar ranges for HDR and 360°
Obviously, proposals with more elements show better performance
Nevertheless, some proposals show performance similar to JEM with significant complexity / run-time reduction vs. JEM
• Subjective tests generally show a similar (or even better) tendency
Benefit over HEVC very clear
Benefit over JEM visible at various points
Performance
17.
• JVET-J1003: the report of the subjective evaluation contains 28 plots as shown, one per sequence
• Count significant cases of positive/negative benefit with non-overlapping confidence interval against JEM
Performance
(Plot: MOS with confidence intervals per rate point Rate1...4; proposals ranked by MOS, HM and JEM anchors marked; a proposal significantly better than JEM counts +1 credit, significantly worse −1 credit.)
18.
• The "mean" and "significance-count" methods suggested at least 7 proposals that were obviously better than JEM
Performance SDR
Mean MOS: Pxx 6.53, Pxx 6.46, Pxx 6.41, Pxx 6.37, Pxx 6.33, Pxx 6.33, Pxx 6.26, Pnn 6.23, Pnn 6.17, Pnn 6.15, Pnn 6.13, Pnn 6.11, Pnn 6.04, Pnn 6.04, Pnn 6.03, Pnn 6.03, Pnn 6.01, JEM 6.01, Pnn 6.00, Pnn 5.96, Pnn 5.94, Pnn 5.88, Pnn 5.86, HM 4.57
Significance count vs. JEM (range −60...+60): Pxx 10, Pxx 8, Pxx 8, Pxx 6, Pxx 6, Pxx 6, Pxx 6, Pnn 3, Pnn 3, Pnn 2, Pnn 2, Pnn 1, Pnn 1, JEM 0, Pnn 0, Pnn −1, Pnn −1, Pnn −1, Pnn −2, Pnn −2, Pnn −2, Pnn −3, Pnn −4, HM −36
19.
• Similar tendency in the HDR and 360° categories
• Mostly the same coding tools as in SDR provide good benefit
Performance HDR / 360°
HDR – Mean MOS: Pxx 6.04, Pxx 6.00, Pxx 5.94, Pxx 5.93, Pxx 5.86, Pnn 5.85, Pnn 5.80, Pnn 5.67, JEM 5.62, Pnn 5.60, Pnn 5.59, Pnn 5.45, Pnn 5.11, HM 4.14
HDR – Significance count vs. JEM (range −32...+32): Pxx 7, Pxx 3, Pxx 2, Pxx 2, Pxx 2, Pnn 1, Pnn 1, JEM 0, Pnn 0, Pnn 0, Pnn −1, Pnn −1, Pnn −6, HM −20
360° – Mean MOS: Pxx 6.20, Pxx 6.19, Pxx 6.06, Pxx 6.03, Pxx 5.99, Pxx 5.96, Pxx 5.86, Pnn 5.69, Pnn 5.67, Pnn 5.51, Pnn 5.45, JEM 5.11, HM 3.79, Pnn 3.45
360° – Significance count vs. JEM (range −20...+20): Pxx 9, Pxx 9, Pxx 8, Pnn 7, Pxx 7, Pxx 6, Pxx 5, Pxx 4, Pnn 2, Pnn 1, Pnn 1, JEM 0, HM −9, Pnn −12
20.
• Comparison of proposals to HEVC at higher rate points
The subjective quality of the best performing proposals is always equal to, or even better than (in about 1/3 of cases), HEVC at the next higher rate point, over all categories (i.e., with approx. 40% less rate)
The subjective quality of the best performing proposals is always equal to, or even better than (in about 1/5 of cases), HEVC at the second next higher rate point, in the SDR-UHD category (i.e., with approx. 65% less rate)
• The highest HEVC rate point may be close to transparent quality in many cases, making it difficult to do better
• Though not always the same proposal performs best at a given rate point, it can be anticipated that the merits of different proposals could be combined
50% (or more) bit-rate reduction at the same quality will probably be achievable
Performance compared to HEVC
21.
CODING TOOLS
22.
• In terms of the overall architecture: most proposals are similar, with no deviation from the hybrid coding mainstream
• Most improvements come from further refinements of well-known building blocks of HEVC and JEM
Partitioning: quad/binary, augmented by ternary tree and finer
Intra prediction using directional modes, DC and planar; sample smoothing with various adaptations; inheritance of chroma modes and chroma sample prediction from luma
Inter prediction: advanced motion vector prediction, affine models, sub-block partitioning
Switchable primary transforms, mostly DCT/DST variants; secondary transforms targeting specific cases of prediction residual characteristics
Adaptive loop filter based on local classification, some new variants
Quantization / context-adaptive arithmetic coding
CfP analysis: What was proposed?
23.
• New elements (some come with high complexity):
Decoder side estimation for mode/MV derivation and sample prediction both in intra and inter coding (JEM)
Finer partitioning: Asymmetric, geometric
Neural networks for prediction, loop filtering, upsampling, (encoder control)
Additional elements using template matching
Intra block copy / current picture referencing
Additional non-linear, de-noising and statistics-based loop filters
Additional linear and non-linear elements in prediction
• HDR specific:
New adaptive reshaping and quantization, also in-loop
HDR-specific modifications of existing tools, e.g. deblocking
• 360-video specific:
Variants of projection formats, geometry-corrected face boundary padding
Modification and disabling of existing tools at face boundaries
CfP analysis: What was proposed?
24.
• VVC Working Draft 1 / Test Model 1 (VTM1): basic approach
• VTM block structure
Unified tree (coding block unites prediction and transform)
CTU size 128×128, rectangular blocks (dyadic sizes), smallest luma size 4×4
Maximum transform size 64×64
• VTM: some removed elements of HEVC:
Mode-dependent transform (DST-VII), mode-dependent scan
Strong intra smoothing
Sign data hiding in transform coding
Unnecessary high-level syntax (e.g. VPS)
Tiles and wavefront
Quantization weighting
VVC Test Model and Benchmark Set
• Benchmark Set defined in addition to VTM, including the following well-known JEM tools:
65 intra prediction modes
Coefficient coding
AMT + 4x4 NSST
Affine motion
GALF
Subblock merge candidate (ATMVP)
Adaptive motion vector precision
Decoder motion vector refinement
LM chroma mode
Purpose: testing the benefit of new technology against a better performing set
25.
• Prediction block partitioning of a 2N×2N CB
• Transform block partitioning of a CB
Quadtree partitioning of the CB → Residual Quad Tree (RQT)
Transform (TB) sizes 4×4 to 32×32
PB boundaries inside TBs allowed
HEVC: Prediction Blocks (PBs) and Transform Blocks (TBs)
26.
• A simple ternary-tree split was used in several proposals; it can be alternated with binary splits
• Further proposed variants of partitioning included
Asymmetric binary split modes
Diagonal and geometric (wedge-shaped) split modes
Block Partitioning: Quadtree – Ternary Tree – Binary Tree
Example (source: JVET-J1002)
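The split modes above can be sketched as child-rectangle generators. The 1:2:1 ratio for the ternary split follows the multi-type-tree description in JVET-J1002; the function itself is an illustrative sketch, not codec code:

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height)

def split(block: Rect, mode: str) -> List[Rect]:
    """Child blocks for quad, binary, and ternary split modes.
    Ternary splits use the 1:2:1 ratio of the VVC multi-type tree."""
    x, y, w, h = block
    if mode == "quad":
        hw, hh = w // 2, h // 2
        return [(x, y, hw, hh), (x + hw, y, hw, hh),
                (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]
    if mode == "bt_h":   # binary split along a horizontal line
        return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    if mode == "bt_v":   # binary split along a vertical line
        return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    if mode == "tt_h":   # horizontal ternary split, 1:2:1
        q = h // 4
        return [(x, y, w, q), (x, y + q, w, 2 * q), (x, y + 3 * q, w, q)]
    if mode == "tt_v":   # vertical ternary split, 1:2:1
        q = w // 4
        return [(x, y, q, h), (x + q, y, 2 * q, h), (x + 3 * q, y, q, h)]
    raise ValueError(f"unknown split mode: {mode}")
```

Recursively applying these modes to a 128×128 CTU yields the quad/binary/ternary partitionings discussed above.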
27.
TOOLS PROPOSED BY RWTH
28.
• Motivation: towards object-oriented coding
Follow object boundaries more closely
Fewer coding artifacts where it matters
• Prediction, transform, and coding driven by the actual object shape under an RD constraint
Inter- and intra-predicted segments for handling of disocclusions
Overlapped wedge-based filtering at the partition boundary
Shape-adaptive DCT for spatially localized transform coding
RWTH Proposal: Geometric Partitioning (GEO)
Source: M. Bläser, J. Sauer, and M. Wien, “Description of SDR and 360° video coding technology proposal by RWTH Aachen University,” Doc. JVET-J0023, Joint Video Experts Team of ITU-T VCEG and ISO/IEC MPEG, San Diego, USA, 10th meeting, Apr. 2018
29.
• GEO available for all block sizes ≥ 8×8 luma samples
• The partitioning is represented by two coordinate points P0 and P1 on the block boundary
• Prediction of the two coordinate points P0 and P1 from 16 pre-defined templates (scaled for non-square blocks)
Alternative: spatial or temporal prediction
Refinement: block-size-dependent offset
• Integration with AMVP, MERGE, FRUC (no AFFINE (yet))
GEO: Partitioning Coding and Prediction
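A geometric partition defined by two boundary points P0 and P1 amounts to classifying each sample by which side of the line through the two points it lies on. The sketch below is a simplified illustration of that idea (hard 0/1 mask and a plain selection between the two segment predictions; the actual JVET-J0023 design additionally applies overlapped wedge-based filtering at the boundary):

```python
import numpy as np

def geo_mask(w: int, h: int, p0, p1) -> np.ndarray:
    """Binary segmentation mask for a geometric partition given two
    points p0, p1 on the block boundary (illustrative, not the spec)."""
    x0, y0 = p0
    x1, y1 = p1
    ys, xs = np.mgrid[0:h, 0:w]
    # the sign of the cross product tells on which side of the
    # line P0->P1 each sample centre (x+0.5, y+0.5) lies
    side = (x1 - x0) * (ys + 0.5 - y0) - (y1 - y0) * (xs + 0.5 - x0)
    return (side > 0).astype(np.uint8)

def geo_predict(pred0: np.ndarray, pred1: np.ndarray, mask: np.ndarray):
    """Combine the two segment predictions; a real codec would blend
    (overlapped filtering) along the partition boundary."""
    return np.where(mask == 1, pred0, pred1)
```

For a diagonal partition of an 8×8 block, `geo_mask(8, 8, (0, 0), (8, 8))` selects the samples strictly above the main diagonal for segment 0.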
30.
• No transform-tree in JEM 7.0 → localization of the residual error for larger blocks required
• ΔSA-DCT adapted from the MPEG-4 software for blocks up to 128×128
• Currently a floating-point implementation – integer transform targeted
• SA-DCT signaled as an additional transform choice next to the full-block DCT (→ 4 GEO transform modes in total)
• Coding of transform coefficients (TSBs, significance flags) with regard to the shape
GEO: Shape-Adaptive DCT for Geometric Partitions
(Figure: example of a 64×32 residual block; one segment with high prediction error, one segment with low prediction error)
31.
Results for GEO
(Visual comparison: JEM 7.0 vs. JEM 7.0 + GEO)
• Visual improvements at object boundaries
Sharper contours
Less staircase-effect
More background details
• Objective gains (BD-rate savings)
Against HEVC: ~33% on C1, ~25% on C2
Against JEM: ~0.8% for both, C1 and C2
JEM 7.0
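The BD-rate savings quoted above follow the Bjøntegaard metric: fit log-rate as a cubic polynomial in PSNR for both codecs and average the gap over the overlapping quality range. A compact sketch, assuming four rate/PSNR points per codec as in common test conditions:

```python
import numpy as np

def bd_rate(rates_a, psnr_a, rates_b, psnr_b) -> float:
    """Bjoentegaard delta rate (%): average bit-rate change of codec B
    relative to codec A at equal PSNR, from four rate/PSNR points each."""
    la, lb = np.log(rates_a), np.log(rates_b)
    pa = np.polyfit(psnr_a, la, 3)   # log-rate as a cubic in PSNR
    pb = np.polyfit(psnr_b, lb, 3)
    lo = max(min(psnr_a), min(psnr_b))   # overlapping PSNR interval
    hi = min(max(psnr_a), max(psnr_b))
    ia = np.polyval(np.polyint(pa), hi) - np.polyval(np.polyint(pa), lo)
    ib = np.polyval(np.polyint(pb), hi) - np.polyval(np.polyint(pb), lo)
    avg_log_diff = (ib - ia) / (hi - lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0
```

A negative BD-rate means codec B needs less rate for the same quality; e.g. a codec that matches the reference quality at half the rate yields −50%.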
32.
JEM 7.0 + GEO
33.
• Motivation: special characteristics of 360° content
360° symmetry not exploited by current codecs
Motion across face boundaries is possible
Geometric distortions:
Motion compensation suboptimal
Not treated correctly by loop filters
Shown here for the cube, but similar problems arise for other coding formats
• Proposal: doing things correctly that “broke” for 360° content
Face extension for motion estimation and compensation
Loop filtering over continuous boundaries according to the 3D arrangement
RWTH Proposal: 360° Coding Tools
Source: M. Bläser, J. Sauer, and M. Wien, “Description of SDR and 360° video coding technology proposal by RWTH Aachen University,” Doc. JVET-J0023, Joint Video Experts Team of ITU-T VCEG and ISO/IEC MPEG, San Diego, USA, 10th meeting, Apr. 2018
34.
360° coding tools – Face extension for cube projections (EAC/CMP/ACP)

H_B2A = [  0   0   f² ]
        [  0   f   0  ]
        [ −1   0   0  ]   with f = face width / 2

• The approach can be transferred to other coding formats
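Applying H_B2A with a perspective division maps sample positions on the neighbouring face B into the coordinate frame of face A. The sketch below assumes face-centred coordinates in the range −f..f (a convention chosen for illustration, not taken from the proposal):

```python
import numpy as np

def face_extension_coord(u: float, v: float, f: float):
    """Map a sample position (u, v) on a neighbouring cube face B into
    the coordinate frame of face A using the homography from the slide,
    H_B2A = [[0, 0, f**2], [0, f, 0], [-1, 0, 0]] with f = face_width / 2.
    Assumes face-centred coordinates with u < 0 inside face B."""
    h = np.array([[0.0,  0.0, f * f],
                  [0.0,  f,   0.0],
                  [-1.0, 0.0, 0.0]])
    p = h @ np.array([u, v, 1.0])
    return p[0] / p[2], p[1] / p[2]   # perspective division
```

A quick sanity check under these conventions: the shared edge u = −f maps onto face A's edge x = f with v unchanged, so the extended reference area used for motion compensation is continuous across the face boundary.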
35.
• Reference samples of blocks at face boundaries are changed
Original: samples from the top or left block are used
Modified: samples are chosen according to the 3D cube geometry
• The approach can be transferred to other coding formats
360° coding tools – Corrected deblocking filter (DBF)
36.
• Objective gains (BD-rate savings)
Against the HEVC anchor: ~31% (E2E WS-PSNR)
Against JEM (same projection format): ~1.6%
Gains higher for sequences with high motion
Results for 360° coding tools
(Visual comparison: JEM deblocking vs. proposed deblocking)
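WS-PSNR, the quality metric used above, weights each ERP row by the cosine of its latitude so that the oversampled polar regions do not dominate the error. A compact sketch for single-channel frames, with the row-weight definition following the JVET 360° common test conditions and the rest illustrative:

```python
import numpy as np

def ws_psnr(ref: np.ndarray, rec: np.ndarray, max_val: float = 255.0) -> float:
    """Weighted-spherical PSNR for equirectangular (ERP) frames: row j is
    weighted by cos((j + 0.5 - H/2) * pi / H), i.e. the cosine of its
    latitude, before averaging the squared error."""
    h, w = ref.shape
    j = np.arange(h).reshape(-1, 1)
    weight = np.cos((j + 0.5 - h / 2) * np.pi / h) * np.ones((1, w))
    err = (ref.astype(float) - rec.astype(float)) ** 2
    wmse = np.sum(weight * err) / np.sum(weight)
    return float("inf") if wmse == 0 else 10.0 * np.log10(max_val ** 2 / wmse)
```

With this weighting, the same coding error near the poles costs far less WS-PSNR than the identical error at the equator, matching the perceived quality after rendering to the sphere.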
37.
SUMMARY AND OUTLOOK
38.
• Report of results from the Call for Proposals on Video Compression with Capability beyond HEVC [JVET-J1003]
Documentation of results per sequence, marking the HM and JEM anchors, without identifying individual proponents
Assessment of the qualitative (and, as far as possible, quantitative) benefit of the submitted technology compared to the anchors
• Working Draft 1 of Versatile Video Coding [JVET-J1001]
"Reduced" HEVC plus quad/binary/ternary tree structure
• Test Model 1 of Versatile Video Coding (VTM 1) [JVET-J1002]
Corresponding encoder and algorithm description
Documents issued after CfP Results
39.
• CE1: Partitioning
• CE2: In-loop filters
• CE3: Intra prediction and mode coding
• CE4: Inter prediction and MV coding
• CE5: Arithmetic coding engine
• CE6: Transforms and transform signalling
• CE7: Quantization and coefficient coding
• CE8: Current picture referencing
• CE9: Decoder side MV derivation
• CE10: Combined and multi-hypothesis prediction
• CE11: Composite reference pictures
• CE12: Mapping for HDR content
• CE13: Projection formats
Core Experiments defined by JVET
40.
• The Call for Proposals demonstrated the availability of significant compression benefit
HEVC out-performed by virtually all proposals
Subjective results suggest initial rate savings of 40+% over HEVC at this starting point
• Versatile Video Coding (VVC): first Working Draft and Test Model defined
Reduced initial tool set
Step-by-step integration of tools
Evaluation of competing variants of tools
Consideration of algorithmic complexity
Further fast progress expected; goal: finalization in 2020
Summary and Outlook
41.
• Document archives (publicly accessible)
JVET / VVC:
http://phenix.it-sudparis.eu/jvet
http://ftp3.itu.ch/av-arch/jvet-site
JCT-VC / HEVC:
http://phenix.it-sudparis.eu/jct
http://ftp3.itu.ch/av-arch/jctvc-site
• Software for HEVC, JEM, and 360 Video (publicly accessible):
https://jvet.hhi.fraunhofer.de/svn/svn_VVCSoftware_VTM
https://jvet.hhi.fraunhofer.de/svn/svn_VVCSoftware_BMS
https://jvet.hhi.fraunhofer.de/svn/svn_HMJEMSoftware/
https://jvet.hhi.fraunhofer.de/svn/svn_360Lib/
https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware/
Further Information