In live streaming applications, a fixed set of bitrate-resolution pairs (known as a bitrate ladder) is typically used for simplicity and efficiency, avoiding the additional encoding run-time required to find optimum bitrate-resolution pairs for every video content. However, an optimized bitrate ladder may result in (i) decreased storage or delivery costs and/or (ii) increased Quality of Experience (QoE). This paper introduces a perceptually-aware per-title encoding (PPTE) scheme for video streaming applications. In this scheme, optimized bitrate-resolution pairs are predicted online based on the Just Noticeable Difference (JND) in quality perception to avoid adding perceptually similar representations to the bitrate ladder. To this end, Discrete Cosine Transform (DCT)-energy-based low-complexity spatial and temporal features are used for each video segment. Experimental results show that, on average, PPTE yields bitrate savings of 16.47% and 27.02% while maintaining the same PSNR and VMAF, respectively, compared to the reference HTTP Live Streaming (HLS) bitrate ladder, without any noticeable additional latency in streaming, accompanied by a 30.69% cumulative decrease in storage space across representations.
Perceptually-aware Per-title Encoding for Adaptive Video Streaming
1. Perceptually-aware Per-title Encoding for Adaptive Video Streaming
Vignesh V Menon1, Hadi Amirpour1, Mohammad Ghanbari1,2, and Christian Timmerer1
1 Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität, Klagenfurt, Austria
2 School of Computer Science and Electronic Engineering, University of Essex, UK
19 July 2022
Vignesh V Menon Perceptually-aware Per-title Encoding for Adaptive Video Streaming 1
2. Outline
1 Introduction
2 PPTE
3 Results
4 Conclusion
3. Introduction
Per-title Encoding
In HAS, each video is encoded at a fixed set of bitrate-resolution pairs, referred to as the bitrate ladder.
This "one-size-fits-all" ladder can be optimized per title to increase the Quality of Experience (QoE) or to decrease the bitrate of the representations, as introduced for VoD services.1
Figure: Rate-Distortion (RD) curves, using VMAF as the quality metric, of the Dolls and Park sequences of the MCML dataset encoded at 540p and 1080p resolutions.
1 J. De Cock et al. "Complexity-based consistent-quality encoding in the cloud". In: 2016 IEEE International Conference on Image Processing (ICIP). 2016, pp. 1484–1488. doi: 10.1109/ICIP.2016.7532605.
4. Introduction
Motivation for Perceptually-aware Per-title Encoding
The selection of bitrate-resolution pairs (i.e., (rt, bt) where t ≥ 0) from the convex hull is a challenging task.
Increasing the number of selected bitrate-resolution pairs in the bitrate ladder may improve QoE, but it also increases storage and bandwidth requirements.2
Furthermore, the bitrate-resolution pairs selected from the convex hull for the bitrate ladder may not always be perceptually different in video quality.
Figure: The HLS bitrate ladder (360p–2160p) of the Characters sequence of the MCML dataset.
2 Tianchi Huang et al. "Deep Reinforced Bitrate Ladders for Adaptive Video Streaming". In: NOSSDAV '21. Istanbul, Turkey: Association for Computing Machinery, 2021, pp. 66–73. isbn: 9781450384353. doi: 10.1145/3458306.3458873.
5. Introduction
Target of the paper
Figure: The ideal bitrate ladder envisioned in this paper, with bitrates b0, ..., b6 at resolutions r0, ..., r6 and VMAF targets v0 and vt = vt−1 + vJ(vt−1) for t ≥ 1, i.e., successive points one JND apart. The blue line denotes the corresponding rate-distortion curve, while the red dotted line denotes VMAF = vmax. When the VMAF value is greater than vmax, the video stream is deemed to be perceptually lossless.
6. PPTE
Perceptually-aware Per-Title Encoding (PPTE)
Input Title → Segments → Feature Extraction → (E, h) pairs → Bitrate Ladder Prediction → (r, b) pairs → Per-title Encoding.
Inputs to Bitrate Ladder Prediction: resolutions (R), average JND (vJ), bitrate range {bmin, bmax}, and maximum VMAF (vmax).
Figure: PPTE architecture.
7. PPTE Phase 1: Feature Extraction
PPTE
Phase 1: Feature Extraction
Compute texture energy per block
A DCT-based energy function is used to determine the block-wise feature of each frame, defined as:

Hk = Σ(i=0 to w−1) Σ(j=0 to w−1) e^(|(ij/wh)² − 1|) · |DCT(i, j)|    (1)

where w×w is the size of the block, and DCT(i, j) is the (i, j)th DCT component when i + j > 0, and 0 otherwise.
The energy values of the blocks in a frame are averaged to determine the energy per frame:3

E = Σ(k=0 to C−1) Hp,k / (C · w²)    (2)
3 Michael King et al. "A New Energy Function for Segmentation and Compression". In: 2007 IEEE International Conference on Multimedia and Expo. 2007, pp. 1647–1650. doi: 10.1109/ICME.2007.4284983.
8. PPTE Phase 1: Feature Extraction
PPTE
Phase 1: Feature Extraction
hp: the SAD of the block-level energy values of frame p with respect to those of the previous frame p − 1:

hp = Σ(k=0 to C−1) |Hp,k − Hp−1,k| / (C · w²)    (3)

where C denotes the number of blocks in frame p.
Latency
Feature extraction speed: 370 fps for UHD video with 8 CPU threads and x86 SIMD optimization.
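A minimal, unoptimized sketch of the feature extraction in Eqs. (1)–(3), assuming 32×32 blocks and a naive matrix-based DCT-II (the production implementation is SIMD-optimized and far faster; block size and normalization here are illustrative assumptions):

```python
import numpy as np

def dct2(block):
    """2-D DCT-II of a square block via matrix multiplication (unnormalized)."""
    n = block.shape[0]
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    D = np.cos(np.pi * (2 * i + 1) * k / (2 * n))  # DCT-II basis rows
    return D @ block @ D.T

def block_energy(block):
    """Texture energy H_k of one w x w block (Eq. 1); the DC term is excluded."""
    w = block.shape[0]
    coeffs = np.abs(dct2(block.astype(np.float64)))
    i, j = np.meshgrid(np.arange(w), np.arange(w), indexing="ij")
    weights = np.exp(np.abs((i * j / (w * w)) ** 2 - 1))
    weights[0, 0] = 0.0  # DCT(0, 0) is ignored (i + j > 0 condition)
    return float(np.sum(weights * coeffs))

def frame_features(frame, prev_frame=None, w=32):
    """Spatial energy E (Eq. 2) and temporal SAD h (Eq. 3) for one frame."""
    height, width = frame.shape
    energies, sad = [], []
    for y in range(0, height - w + 1, w):
        for x in range(0, width - w + 1, w):
            e = block_energy(frame[y:y + w, x:x + w])
            energies.append(e)
            if prev_frame is not None:
                sad.append(abs(e - block_energy(prev_frame[y:y + w, x:x + w])))
    C = len(energies)                       # number of blocks in the frame
    E = sum(energies) / (C * w * w)         # Eq. 2
    h = sum(sad) / (C * w * w) if prev_frame is not None else None  # Eq. 3
    return E, h
```

A flat frame has zero texture energy (all non-DC coefficients vanish), while textured frames yield positive E, and h is zero when consecutive frames are identical.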
9. PPTE Phase 2: Bitrate Ladder Prediction
PPTE
Phase 2: Bitrate Ladder Prediction
Step 1: b0 = bmin

vr,b0 = A0,r · log(√(h/E) · b0²) + A1,r

v0 = max r∈R (vr,b0)
r0 = arg max r∈R (vr,b0)

(r0, b0) is the first point of the bitrate ladder.
A0,r and A1,r are parameters trained using linear regression.
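Step 1 can be sketched as follows. The regression parameters A0,r and A1,r below are hypothetical placeholders (the trained values are not given on this slide), and the model form follows the reconstructed equation above:

```python
import math

def first_ladder_point(E, h, b_min, resolutions, A0, A1):
    """Step 1: predict the VMAF of each resolution at b0 = b_min and pick
    the resolution with the highest predicted VMAF as the first ladder point.
    A0[r] and A1[r] stand for the per-resolution parameters A0,r and A1,r."""
    v = {r: A0[r] * math.log(math.sqrt(h / E) * b_min ** 2) + A1[r]
         for r in resolutions}
    r0 = max(v, key=v.get)    # arg max over r in R
    return r0, b_min, v[r0]   # (r0, b0) and the predicted VMAF v0
```

For example, with the hypothetical parameters A0 = {540: 8, 1080: 10} and A1 = {540: 40, 1080: 30}, a segment with E = 50 and h = 20 starts its ladder at 540p, since the lower resolution is predicted to score higher at the minimum bitrate.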
10. PPTE Phase 2: Bitrate Ladder Prediction
PPTE
Phase 2: Bitrate Ladder Prediction
Step 2: t = 1
for t ≥ 1 do
    vt = vt−1 + vJ(vt−1)
    br,vt = √( √(E/h) · e^((vt − A1,r)/A0,r) )
    bt = min r∈R (br,vt)
    rt = arg min r∈R (br,vt)
    if bt ≥ bmax or vt ≥ vmax then
        end of the algorithm
    else
        (rt, bt) is the (t + 1)th point of the bitrate ladder; t = t + 1
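Steps 1 and 2 combine into a short sketch. All parameter values and the constant JND function below are hypothetical placeholders, not values from the paper, and the termination conditions are assumed to be bt ≥ bmax or vt ≥ vmax as reconstructed above:

```python
import math

def build_ladder(E, h, resolutions, A0, A1, b_min, b_max, v_max, v_jnd):
    """Construct a JND-aware bitrate ladder following Steps 1 and 2.
    v_jnd(v) returns the JND step at VMAF v; A0[r]/A1[r] are per-resolution
    model parameters (hypothetical values in the demo below)."""
    predict_v = lambda r, b: A0[r] * math.log(math.sqrt(h / E) * b * b) + A1[r]
    # Inverse of predict_v: the bitrate reaching VMAF v at resolution r.
    predict_b = lambda r, v: math.sqrt(
        math.sqrt(E / h) * math.exp((v - A1[r]) / A0[r]))
    # Step 1: start at b_min with the resolution maximizing predicted VMAF.
    v = max(predict_v(r, b_min) for r in resolutions)
    ladder = [(max(resolutions, key=lambda r: predict_v(r, b_min)), b_min)]
    # Step 2: raise the VMAF target by one JND per new representation.
    while True:
        v += v_jnd(v)
        b = min(predict_b(r, v) for r in resolutions)
        if b >= b_max or v >= v_max:
            return ladder
        ladder.append((min(resolutions, key=lambda r: predict_b(r, v)), b))
```

Because predict_b inverts predict_v exactly, each new point lands one JND above the previous one on the predicted rate-quality curve, and bitrates grow monotonically until the bitrate or quality ceiling is reached.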
11. Results
Results
Figure: Comparison of RD curves for encoding the (a) IntoTree, (b) DaylightRoad2, and (c) TreeShade sequences using the HLS bitrate ladder and PPTE.
12. Results
Results
Figure: |ΔS| results for various values of E and h.
Figure: Bjøntegaard delta rate w.r.t. VMAF (|BDRV|) results for various values of E and h.
ΔS = 1 − (Σ bopt) / (Σ bref)    (4)

where bref and bopt denote the bitrates of the representations in the fixed bitrate ladder and the optimized bitrate ladder, respectively.
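Equation (4) is straightforward to compute; the two ladders below are hypothetical examples (not the paper's measured ladders) used only to illustrate the formula:

```python
def storage_reduction(ref_bitrates, opt_bitrates):
    """Relative storage reduction DeltaS (Eq. 4): one minus the ratio of the
    optimized ladder's total bitrate to the reference ladder's total bitrate."""
    return 1.0 - sum(opt_bitrates) / sum(ref_bitrates)

# Hypothetical example ladders (bitrates in Mbps):
hls_ladder = [0.2, 0.5, 1.2, 4.5, 16.8]
optimized_ladder = [0.2, 0.7, 2.1, 8.0]
delta_s = storage_reduction(hls_ladder, optimized_ladder)  # ~0.526, i.e. ~52.6% less storage
```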
13. Results
Results
Table: Results of PPTE against the HLS bitrate ladder (BDRV/BDRP: Bjøntegaard delta rate w.r.t. VMAF/PSNR; ΔS: relative storage reduction).
Dataset Video SI TI E h BDRV BDRP ΔS Avg. JND
JVET4 DaylightRoad2 40.51 16.21 54.78 20.35 -23.84% -10.88% -40.32% 6.99
JVET FoodMarket4 38.26 17.68 60.61 22.67 -19.22% -6.21% -28.13% 6.72
MCML5 Characters 50.43 29.85 42.66 21.06 -74.60% -71.70% -53.69% 3.82
MCML Crowd 33.76 10.13 56.74 15.89 -30.12% -15.63% -31.06% 7.85
MCML Lake 42.04 11.84 47.89 21.11 -38.00% -0.37% -44.83% 5.03
MCML Park 22.63 8.17 40.55 9.22 -10.47% -10.50% -15.35% 6.28
SJTU6 Fountains 43.37 11.42 63.30 26.83 -32.73% -2.18% -29.65% 5.80
SJTU RushHour 29.14 16.21 56.12 25.11 -20.50% -7.34% -42.73% 6.92
SJTU TrafficFlow 33.57 13.8 56.64 28.00 -53.34% -42.89% -44.83% 5.95
SJTU TreeShade 52.88 5.29 60.24 11.31 -48.38% -39.02% -31.06% 6.74
SVT7 IntoTree 324.41 12.09 45.77 30.94 -26.23% -7.08% -40.32% 4.92
SVT OldTownCross 29.66 11.62 50.31 27.64 -33.77% -25.07% -28.13% 5.86
SVT ParkJoy 62.78 27.00 76.32 41.10 -15.68% -2.39% -18.16% 5.19
Average -27.02% -16.47% -30.69% 5.85
*These sequences were used for training.
4 Jill Boyce et al. JVET-J1010: JVET common test conditions and software reference configurations. July 2018.
5 Manri Cheon and Jong-Seok Lee. "Subjective and Objective Quality Assessment of Compressed 4K UHD Videos for Immersive Experience". In: IEEE Transactions on Circuits and Systems for Video Technology 28.7 (2018), pp. 1467–1480. doi: 10.1109/TCSVT.2017.2683504.
6 L. Song et al. "The SJTU 4K Video Sequence Dataset". In: Fifth International Workshop on Quality of Multimedia Experience (QoMEX 2013). July 2013.
7 European Broadcasting Union (EBU). "The SVT High Definition Multi Format Test Set". Feb. 2006. url: https://tech.ebu.ch/docs/hdtv/svt-multiformat-conditions-v10.pdf.
14. Conclusion
Conclusion
This paper proposed a perceptually-aware online per-title encoding (PPTE) scheme for live
streaming applications.
PPTE includes an algorithm that predicts the optimal resolution-bitrate pairs for every video
segment based on JND in visual quality perception.
Live streaming using PPTE requires 16.47% fewer bits to maintain the same PSNR and
27.02% fewer bits to maintain the same VMAF compared to the reference HLS bitrate
ladder.
The improvement in compression efficiency is achieved with an average storage reduction of 30.69% compared to the reference HLS bitrate ladder.
15. Q&A
Q&A
Thank you for your attention!
Vignesh V Menon (vignesh.menon@aau.at)