In live streaming applications, a fixed set of bitrate-resolution pairs (known as a bitrate ladder) is typically used for simplicity and efficiency, avoiding the additional encoding run-time required to find optimum bitrate-resolution pairs for every video content. However, an optimized bitrate ladder may result in (i) decreased storage or delivery costs and/or (ii) increased Quality of Experience (QoE). This paper introduces a perceptually-aware per-title encoding (PPTE) scheme for video streaming applications. In this scheme, optimized bitrate-resolution pairs are predicted online based on the Just Noticeable Difference (JND) in quality perception to avoid adding perceptually similar representations to the bitrate ladder. To this end, Discrete Cosine Transform (DCT)-energy-based low-complexity spatial and temporal features are used for each video segment. Experimental results show that, on average, PPTE yields bitrate savings of 16.47% and 27.02% while maintaining the same PSNR and VMAF, respectively, compared to the reference HTTP Live Streaming (HLS) bitrate ladder, without any noticeable additional latency in streaming, accompanied by a 30.69% cumulative decrease in storage space across representations.
Perceptually-aware Per-title Encoding for Adaptive Video Streaming
1. Perceptually-aware Per-title Encoding for Adaptive Video Streaming
Vignesh V Menon1, Hadi Amirpour1, Mohammad Ghanbari1,2, and Christian Timmerer1
1 Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität, Klagenfurt, Austria
2 School of Computer Science and Electronic Engineering, University of Essex, UK
19 July 2022
Vignesh V Menon Perceptually-aware Per-title Encoding for Adaptive Video Streaming 1
2. Outline
1 Introduction
2 PPTE
3 Results
4 Conclusion
3. Introduction
Per-title Encoding
In HAS, each video is encoded at a fixed set of bitrate-resolution pairs, referred to as the bitrate ladder.
This "one-size-fits-all" ladder can be optimized per title to increase the Quality of Experience (QoE) or to decrease the bitrate of the representations, as introduced for VoD services.1
Figure: Rate-Distortion (RD) curves, using VMAF as the quality metric, of the Dolls and Park sequences of the MCML dataset encoded at 540p and 1080p resolutions.
1 J. De Cock et al. "Complexity-based consistent-quality encoding in the cloud". In: 2016 IEEE International Conference on Image Processing (ICIP). 2016, pp. 1484–1488. doi: 10.1109/ICIP.2016.7532605.
4. Introduction
Motivation for Perceptually-aware Per-title Encoding
The selection of bitrate-resolution pairs (i.e., (rt, bt) where t ≥ 0) from the convex hull is a challenging task.
Increasing the number of selected bitrate-resolution pairs in the bitrate ladder may improve QoE, but it also increases storage and bandwidth requirements.2
Furthermore, the bitrate-resolution pairs selected from the convex hull for the bitrate ladder may not always be perceptually different in video quality.
Figure: The HLS bitrate ladder (360p–2160p) of the Characters sequence of the MCML dataset.
2 Tianchi Huang et al. "Deep Reinforced Bitrate Ladders for Adaptive Video Streaming". In: NOSSDAV '21. Istanbul, Turkey: Association for Computing Machinery, 2021, pp. 66–73. isbn: 9781450384353. doi: 10.1145/3458306.3458873.
5. Introduction
Target of the paper
Figure: The ideal bitrate ladder envisioned in this paper, with bitrates b0, ..., b6 at resolutions r0, ..., r6 and VMAF targets v0 and vt = vt−1 + vJ(vt−1) for t ≥ 1, i.e., successive points one JND apart. The blue line denotes the corresponding rate-distortion curve, while the red dotted line denotes VMAF = vmax. When the VMAF value is greater than vmax, the video stream is deemed to be perceptually lossless.
6. PPTE
Perceptually-aware Per-Title Encoding (PPTE)
Input Title → Segments → Feature Extraction → (E, h) pairs → Bitrate Ladder Prediction → (r, b) pairs → Per-title Encoding.
Inputs to Bitrate Ladder Prediction: resolutions (R), average JND (vJ), bitrate range {bmin, bmax}, and maximum VMAF (vmax).
Figure: PPTE architecture.
7. PPTE Phase 1: Feature Extraction
PPTE
Phase 1: Feature Extraction
Compute texture energy per block
A DCT-based energy function is used to determine the block-wise feature of each frame, defined as:

Hk = Σ(i=0 to w−1) Σ(j=0 to w−1) e^(|(ij/wh)² − 1|) · |DCT(i, j)|    (1)

where w×w is the size of the block, and DCT(i, j) is the (i, j)th DCT component when i + j > 0, and 0 otherwise.
The energy values of the blocks in a frame are averaged to determine the energy per frame:3

E = Σ(k=0 to C−1) Hp,k / (C · w²)    (2)
3 Michael King et al. "A New Energy Function for Segmentation and Compression". In: 2007 IEEE International Conference on Multimedia and Expo. 2007, pp. 1647–1650. doi: 10.1109/ICME.2007.4284983.
8. PPTE Phase 1: Feature Extraction
PPTE
Phase 1: Feature Extraction
hp: the SAD of the block-level energy values of frame p with respect to those of the previous frame p − 1:

hp = Σ(k=0 to C−1) |Hp,k − Hp−1,k| / (C · w²)    (3)

where C denotes the number of blocks in frame p.
Latency
Feature extraction speed: 370 fps for UHD video with 8 CPU threads and x86 SIMD optimization.
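A minimal, unoptimized sketch of the feature extraction in Eqs. (1)–(3), assuming 32×32 blocks and a naive matrix-based DCT-II (the production implementation is SIMD-optimized and far faster; block size and normalization here are illustrative assumptions):

```python
import numpy as np

def dct2(block):
    """2-D DCT-II of a square block via matrix multiplication (unnormalized)."""
    n = block.shape[0]
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    D = np.cos(np.pi * (2 * i + 1) * k / (2 * n))  # DCT-II basis rows
    return D @ block @ D.T

def block_energy(block):
    """Texture energy H_k of one w x w block (Eq. 1); the DC term is excluded."""
    w = block.shape[0]
    coeffs = np.abs(dct2(block.astype(np.float64)))
    i, j = np.meshgrid(np.arange(w), np.arange(w), indexing="ij")
    weights = np.exp(np.abs((i * j / (w * w)) ** 2 - 1))
    weights[0, 0] = 0.0  # DCT(0, 0) is ignored (i + j > 0 condition)
    return float(np.sum(weights * coeffs))

def frame_features(frame, prev_frame=None, w=32):
    """Spatial energy E (Eq. 2) and temporal SAD h (Eq. 3) for one frame."""
    height, width = frame.shape
    energies, sad = [], []
    for y in range(0, height - w + 1, w):
        for x in range(0, width - w + 1, w):
            e = block_energy(frame[y:y + w, x:x + w])
            energies.append(e)
            if prev_frame is not None:
                sad.append(abs(e - block_energy(prev_frame[y:y + w, x:x + w])))
    C = len(energies)                       # number of blocks in the frame
    E = sum(energies) / (C * w * w)         # Eq. 2
    h = sum(sad) / (C * w * w) if prev_frame is not None else None  # Eq. 3
    return E, h
```

A flat frame has zero texture energy (all non-DC coefficients vanish), while textured frames yield positive E, and h is zero when consecutive frames are identical.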
9. PPTE Phase 2: Bitrate Ladder Prediction
PPTE
Phase 2: Bitrate Ladder Prediction
Step 1: b0 = bmin

vr,b0 = A0,r · log(√(h/E) · b0²) + A1,r

v0 = max r∈R (vr,b0)
r0 = arg max r∈R (vr,b0)

(r0, b0) is the first point of the bitrate ladder.
A0,r and A1,r are parameters trained using linear regression.
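Step 1 can be sketched as follows. The regression parameters A0,r and A1,r below are hypothetical placeholders (the trained values are not given on this slide), and the model form follows the reconstructed equation above:

```python
import math

def first_ladder_point(E, h, b_min, resolutions, A0, A1):
    """Step 1: predict the VMAF of each resolution at b0 = b_min and pick
    the resolution with the highest predicted VMAF as the first ladder point.
    A0[r] and A1[r] stand for the per-resolution parameters A0,r and A1,r."""
    v = {r: A0[r] * math.log(math.sqrt(h / E) * b_min ** 2) + A1[r]
         for r in resolutions}
    r0 = max(v, key=v.get)    # arg max over r in R
    return r0, b_min, v[r0]   # (r0, b0) and the predicted VMAF v0
```

For example, with the hypothetical parameters A0 = {540: 8, 1080: 10} and A1 = {540: 40, 1080: 30}, a segment with E = 50 and h = 20 starts its ladder at 540p, since the lower resolution is predicted to score higher at the minimum bitrate.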
10. PPTE Phase 2: Bitrate Ladder Prediction
PPTE
Phase 2: Bitrate Ladder Prediction
Step 2: t = 1
for t ≥ 1 do
    vt = vt−1 + vJ(vt−1)
    br,vt = √( √(E/h) · e^((vt − A1,r)/A0,r) )
    bt = min r∈R (br,vt)
    rt = arg min r∈R (br,vt)
    if bt ≥ bmax or vt ≥ vmax then
        end of the algorithm
    else
        (rt, bt) is the (t + 1)th point of the bitrate ladder; t = t + 1
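Steps 1 and 2 combine into a short sketch. All parameter values and the constant JND function below are hypothetical placeholders, not values from the paper, and the termination conditions are assumed to be bt ≥ bmax or vt ≥ vmax as reconstructed above:

```python
import math

def build_ladder(E, h, resolutions, A0, A1, b_min, b_max, v_max, v_jnd):
    """Construct a JND-aware bitrate ladder following Steps 1 and 2.
    v_jnd(v) returns the JND step at VMAF v; A0[r]/A1[r] are per-resolution
    model parameters (hypothetical values in the demo below)."""
    predict_v = lambda r, b: A0[r] * math.log(math.sqrt(h / E) * b * b) + A1[r]
    # Inverse of predict_v: the bitrate reaching VMAF v at resolution r.
    predict_b = lambda r, v: math.sqrt(
        math.sqrt(E / h) * math.exp((v - A1[r]) / A0[r]))
    # Step 1: start at b_min with the resolution maximizing predicted VMAF.
    v = max(predict_v(r, b_min) for r in resolutions)
    ladder = [(max(resolutions, key=lambda r: predict_v(r, b_min)), b_min)]
    # Step 2: raise the VMAF target by one JND per new representation.
    while True:
        v += v_jnd(v)
        b = min(predict_b(r, v) for r in resolutions)
        if b >= b_max or v >= v_max:
            return ladder
        ladder.append((min(resolutions, key=lambda r: predict_b(r, v)), b))
```

Because predict_b inverts predict_v exactly, each new point lands one JND above the previous one on the predicted rate-quality curve, and bitrates grow monotonically until the bitrate or quality ceiling is reached.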
11. Results
Results
Figure: Comparison of RD curves for encoding the (a) IntoTree, (b) DaylightRoad2, and (c) TreeShade sequences using the HLS bitrate ladder and PPTE.
12. Results
Results
Figure: |ΔS| results for various values of E and h.
Figure: Bjøntegaard delta rate w.r.t. VMAF (|BDRV|) results for various values of E and h.
ΔS = 1 − (Σ bopt) / (Σ bref)    (4)

where bref and bopt denote the bitrates of the representations in the fixed bitrate ladder and the optimized bitrate ladder, respectively.
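Equation (4) is straightforward to compute; the two ladders below are hypothetical examples (not the paper's measured ladders) used only to illustrate the formula:

```python
def storage_reduction(ref_bitrates, opt_bitrates):
    """Relative storage reduction DeltaS (Eq. 4): one minus the ratio of the
    optimized ladder's total bitrate to the reference ladder's total bitrate."""
    return 1.0 - sum(opt_bitrates) / sum(ref_bitrates)

# Hypothetical example ladders (bitrates in Mbps):
hls_ladder = [0.2, 0.5, 1.2, 4.5, 16.8]
optimized_ladder = [0.2, 0.7, 2.1, 8.0]
delta_s = storage_reduction(hls_ladder, optimized_ladder)  # ~0.526, i.e. ~52.6% less storage
```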
13. Results
Results
Table: Results of PPTE against the HLS bitrate ladder (BDRV/BDRP: Bjøntegaard delta rate w.r.t. VMAF/PSNR; ΔS: relative storage reduction).
Dataset Video SI TI E h BDRV BDRP ΔS Avg. JND
JVET4 DaylightRoad2 40.51 16.21 54.78 20.35 -23.84% -10.88% -40.32% 6.99
JVET FoodMarket4 38.26 17.68 60.61 22.67 -19.22% -6.21% -28.13% 6.72
MCML5 Characters 50.43 29.85 42.66 21.06 -74.60% -71.70% -53.69% 3.82
MCML Crowd 33.76 10.13 56.74 15.89 -30.12% -15.63% -31.06% 7.85
MCML Lake 42.04 11.84 47.89 21.11 -38.00% -0.37% -44.83% 5.03
MCML Park 22.63 8.17 40.55 9.22 -10.47% -10.50% -15.35% 6.28
SJTU6 Fountains 43.37 11.42 63.30 26.83 -32.73% -2.18% -29.65% 5.80
SJTU RushHour 29.14 16.21 56.12 25.11 -20.50% -7.34% -42.73% 6.92
SJTU TrafficFlow 33.57 13.8 56.64 28.00 -53.34% -42.89% -44.83% 5.95
SJTU TreeShade 52.88 5.29 60.24 11.31 -48.38% -39.02% -31.06% 6.74
SVT7 IntoTree 324.41 12.09 45.77 30.94 -26.23% -7.08% -40.32% 4.92
SVT OldTownCross 29.66 11.62 50.31 27.64 -33.77% -25.07% -28.13% 5.86
SVT ParkJoy 62.78 27.00 76.32 41.10 -15.68% -2.39% -18.16% 5.19
Average -27.02% -16.47% -30.69% 5.85
*These sequences were used for training.
4 Jill Boyce et al. JVET-J1010: JVET common test conditions and software reference configurations. July 2018.
5 Manri Cheon and Jong-Seok Lee. "Subjective and Objective Quality Assessment of Compressed 4K UHD Videos for Immersive Experience". In: IEEE Transactions on Circuits and Systems for Video Technology 28.7 (2018), pp. 1467–1480. doi: 10.1109/TCSVT.2017.2683504.
6 L. Song et al. "The SJTU 4K Video Sequence Dataset". In: Fifth International Workshop on Quality of Multimedia Experience (QoMEX 2013). July 2013.
7 European Broadcasting Union (EBU). "The SVT High Definition Multi Format Test Set". Feb. 2006. url: https://tech.ebu.ch/docs/hdtv/svt-multiformat-conditions-v10.pdf.
14. Conclusion
Conclusion
This paper proposed a perceptually-aware online per-title encoding (PPTE) scheme for live
streaming applications.
PPTE includes an algorithm that predicts the optimal resolution-bitrate pairs for every video
segment based on JND in visual quality perception.
Live streaming using PPTE requires 16.47% fewer bits to maintain the same PSNR and
27.02% fewer bits to maintain the same VMAF compared to the reference HLS bitrate
ladder.
The improvement in compression efficiency is achieved with an average storage reduction of 30.69% compared to the reference HLS bitrate ladder.
15. Q&A
Q&A
Thank you for your attention!
Vignesh V Menon (vignesh.menon@aau.at)