Content-adaptive Video Coding for HTTP Adaptive Streaming
Vignesh V Menon
Alpen-Adria-Universität, Klagenfurt, Austria
Supervisor: Univ.-Prof. DI Dr. Christian Timmerer, Alpen-Adria-Universität, Klagenfurt, Austria
Advisor: Assoc.-Prof. DI Dr. Klaus Schoeffmann, Alpen-Adria-Universität, Klagenfurt, Austria
Date: Jan 15, 2024
Outline
1 Introduction
2 Video complexity analysis
3 Online per-title encoding
4 Live variable bitrate encoding
5 Conclusions and Future Directions
Introduction
HTTP Adaptive Streaming (HAS)1
Source: https://bitmovin.com/adaptive-streaming/
Why Adaptive Streaming?
Adapt for a wide range of devices.
Adapt for a broad set of Internet speeds.
1
A. Bentaleb et al. “A Survey on Bitrate Adaptation Schemes for Streaming Media Over HTTP”. In: IEEE Communications Surveys Tutorials 21.1 (2019),
pp. 562–585.
HTTP Adaptive Streaming (HAS)
Figure: HTTP adaptive streaming (HAS) concept.
What does HAS do?
Each source video is split into segments.
Each segment is encoded at multiple bitrates, resolutions, and codecs.
Segments are delivered to the client based on device capability, network speed, etc. (a minimal example ladder is sketched below).
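To make this concrete, here is a minimal, purely illustrative sketch of a bitrate ladder as it might be configured on the server side; the (resolution, bitrate) pairs are examples, not the ladder used in this work.

```python
# Illustrative bitrate ladder: one entry per representation of each segment,
# given as (resolution height in pixels, target bitrate in kbps). Example values only.
BITRATE_LADDER = [
    (360, 145),
    (540, 600),
    (720, 1600),
    (1080, 3400),
    (1440, 8100),
    (2160, 16800),
]

for height, kbps in BITRATE_LADDER:
    print(f"encode segment at {height}p, {kbps} kbps")
```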
Research questions
1 How to adapt the speed and compression performance of the video encoder based on content complexity?
Determine the video's content-adaptive spatial and temporal features, which are used to influence encoder decisions such as slice type, quantization parameter, and block partitioning.
By leveraging CAE algorithms, adaptive bitrate encoding, and intelligent analysis, an encoder's speed and compression performance can be adapted effectively to the complexity of the video content, leading to optimized encoding results.2
2 How to improve the compression efficiency of bitrate ladder encoding in live-streaming applications?
Minimize the time to compute the convex hull for each title by analyzing the video content complexity features.3
Dynamically configure the encoding parameters on the fly to sustain a target encoding speed according to the content for efficient live streaming.4
2
Sriram Sethuraman, Nithya V. S., and Venkata Narayanababu Laveti D. “Non-iterative Content-Adaptive Distributed Encoding Through ML Techniques”. In:
SMPTE 2017 Annual Technical Conference and Exhibition. 2017, pp. 1–8. doi: 10.5594/M001783. url: https://doi.org/10.5594/M001783.
3
J. De Cock et al. “Complexity-based consistent-quality encoding in the cloud”. In: 2016 IEEE International Conference on Image Processing (ICIP). 2016,
pp. 1484–1488. doi: 10.1109/ICIP.2016.7532605. url: https://doi.org/10.1109/ICIP.2016.7532605.
4
Pradeep Ramachandran et al. “Content adaptive live encoding with open source codecs”. In: Proceedings of the 11th ACM Multimedia Systems Conference.
May 2020, pp. 345–348. doi: 10.1145/3339825.3393580. url: https://doi.org/10.1145/3339825.3393580.
Research questions
3 How to provide fast and rate-efficient multi-bitrate and multi-resolution bitrate ladder encoding in adaptive streaming applications?
Sharing encoder analysis information such as motion vectors, scene complexity, and frame types across different representations can avoid redundant calculations and minimize encoding operations, improving encoding efficiency and reducing computational overhead.5,6
Facilitate adaptive streaming optimizations on the server side, enabling efficient resource allocation and bandwidth management.
5
J. De Praeter et al. “Fast simultaneous video encoder for adaptive streaming”. In: 2015 IEEE 17th International Workshop on Multimedia Signal Processing
(MMSP). Oct. 2015, pp. 1–6. doi: 10.1109/MMSP.2015.7340802. url: https://doi.org/10.1109/MMSP.2015.7340802.
6
Vignesh V Menon et al. “EMES: Efficient Multi-Encoding Schemes for HEVC-Based Adaptive Bitrate Streaming”. In: ACM Trans. Multimedia Comput.
Commun. Appl. New York, NY, USA: Association for Computing Machinery, Dec. 2022. doi: 10.1145/3575659. url: https://doi.org/10.1145/3575659.
Target of this study
[Block diagram: scene detection splits the input video into scenes; video complexity feature extraction feeds an optimized bitrate ladder and encoding parameter prediction (annotated with RQ 1-3), configured by the set of resolutions and framerates, minimum/maximum bitrate, target JND function, maximum quality, target encoding speed, encoders, encoding parameters, and target encoder/codec; the outputs are the bitrate ladder and the encoded representations.]
Figure: The ideal video compression system for HAS targeted in this dissertation.
Thesis organization
[Diagram: the thesis comprises four contribution classes with nine contributions, targeting improved compression efficiency and encoding speed:
Contribution class 1 - Video complexity analysis (RQ 1, 2): Chapter 2, video complexity analyzer.
Contribution class 2 - Content-adaptive encoding optimizations (RQ 1): Chapter 3, scene detection algorithm and fast intra CU depth prediction algorithm.
Contribution class 3 - Online per-title encoding optimizations (RQ 1, 2): Chapter 4, online resolution, framerate, and encoding preset prediction schemes and a just noticeable difference aware bitrate ladder prediction scheme; Chapter 5, live variable bitrate encoding scheme.
Contribution class 4 - Multi-encoding optimizations (RQ 3): Chapter 6, efficient multi-encoding schemes.]
Figure: Thesis organization.
Video complexity analysis
Video complexity analysis Introduction
Introduction
Video complexity analysis is a critical step for numerous applications.
Content-based retrieval: in multimedia archives, surveillance systems, and digital libraries7
Video summarization: in video browsing, news aggregation, and event documentation8,9
Action recognition and scene understanding: in domains ranging from sports analytics and surveillance to robotics and human-computer interaction10
Quality assessment: in streaming services, video conferencing, and multimedia content distribution11,12
7
Wei Jiang et al. “Similarity-based online feature selection in content-based image retrieval”. In: IEEE Transactions on Image Processing. Vol. 15. 3. 2006,
pp. 702–712. doi: 10.1109/TIP.2005.863105. url: https://doi.org/10.1109/TIP.2005.863105.
8
Parul Saini et al. “Video summarization using deep learning techniques: a detailed analysis and investigation”. In: Artificial Intelligence Review. Mar. 2023.
doi: 10.1007/s10462-023-10444-0. url: https://doi.org/10.1007/s10462-023-10444-0.
9
Naveed Ejaz, Tayyab Bin Tariq, and Sung Wook Baik. “Adaptive key frame extraction for video summarization using an aggregation mechanism”. In: Journal
of Visual Communication and Image Representation. Vol. 23. 7. 2012, pp. 1031–1040. doi: https://doi.org/10.1016/j.jvcir.2012.06.013. url:
https://www.sciencedirect.com/science/article/pii/S1047320312001095.
10
N. Barman et al. “No-Reference Video Quality Estimation Based on Machine Learning for Passive Gaming Video Streaming Applications”. In: IEEE Access.
Vol. 7. 2019, pp. 74511–74527. doi: 10.1109/ACCESS.2019.2920477. url: https://doi.org/10.1109/ACCESS.2019.2920477.
11
S. Zadtootaghaj et al. “NR-GVQM: A No Reference Gaming Video Quality Metric”. In: 2018 IEEE International Symposium on Multimedia (ISM). 2018,
pp. 131–134. doi: 10.1109/ISM.2018.00031. url: https://doi.org/10.1109/ISM.2018.00031.
12
S. Göring, R. Rao, and A. Raake. “nofu — A Lightweight No-Reference Pixel Based Video Quality Model for Gaming Content”. In: 2019 Eleventh
International Conference on Quality of Multimedia Experience (QoMEX). 2019, pp. 1–6. doi: 10.1109/QoMEX.2019.8743262. url:
https://doi.org/10.1109/QoMEX.2019.8743262.
State-of-the-art features13
Spatial information (SI)
SI captures the maximum spatial detail in a video: the standard deviation of each Sobel-filtered frame, maximized over all frames.
SI = max{std[Sobel(F(p))]} (1)
Temporal information (TI)
TI captures the maximum temporal activity in a video: the standard deviation of the difference between consecutive frames, maximized over all frames (a computation sketch follows below).
D(p) = F(p) − F(p − 1) (2)
TI = max{std[D(p)]} (3)
13
ITU-T. “P.910 : Subjective video quality assessment methods for multimedia applications”. In: Nov. 2021. url:
https://www.itu.int/rec/T-REC-P.910-202111-I/en.
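A minimal sketch of Eqs. (1)-(3) using OpenCV and NumPy, assuming the input is a list of 2-D luma frames; it follows the P.910 definitions above rather than any particular reference implementation.

```python
import cv2
import numpy as np

def si_ti(frames):
    """ITU-T P.910-style SI/TI sketch; frames are 2-D luma arrays (uint8 or float)."""
    si_vals, ti_vals, prev = [], [], None
    for f in frames:
        f = f.astype(np.float64)
        # SI: standard deviation of the Sobel-filtered frame (Eq. 1)
        grad = np.hypot(cv2.Sobel(f, cv2.CV_64F, 1, 0), cv2.Sobel(f, cv2.CV_64F, 0, 1))
        si_vals.append(grad.std())
        # TI: standard deviation of the frame difference (Eqs. 2-3)
        if prev is not None:
            ti_vals.append((f - prev).std())
        prev = f
    return max(si_vals), (max(ti_vals) if ti_vals else 0.0)
```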
Proposed features
The luma and chroma brightness of non-overlapping blocks k of size w × w pixels of each frame p is defined as:
Lc,p,k = √(DCTc(1, 1))   ∀c ∈ [0, 2]   (4)
A DCT-energy function is introduced to determine the luma and chroma texture of every non-overlapping block k in each frame p, defined as:
Hc,p,k = Σ_{i=1}^{w} Σ_{j=1}^{w} e^[((ij)/w²)² − 1] · |DCTc(i − 1, j − 1)|   ∀c ∈ [0, 2]   (5)
where DCTc(i, j) is the (i, j)-th DCT coefficient when i + j > 2, and 0 otherwise.14 The block-wise texture per frame is averaged to obtain the luma and chroma texture features (EY, EU, EV) per video segment.
14
Vignesh V Menon et al. “VCA: Video Complexity Analyzer”. In: Proceedings of the 13th ACM Multimedia Systems Conference. MMSys ’22. Athlone,
Ireland: Association for Computing Machinery, 2022, 259–264. isbn: 9781450392839. doi: 10.1145/3524273.3532896. url:
https://doi.org/10.1145/3524273.3532896.
Proposed features
The block-wise difference of the luma texture energy of each frame compared to its previous frame is calculated as:
hp,k = |HY,p,k − HY,p−1,k| / w²   (6)
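A minimal NumPy/SciPy sketch of Eqs. (4)-(6) for the luma channel, assuming a block size of w = 32 and per-frame averaging of the block-wise values; the open-source VCA tool uses optimized SIMD code rather than this Python form.

```python
import numpy as np
from scipy.fft import dctn  # 2-D DCT (type II)

W = 32  # assumed block size; the analyzer uses fixed non-overlapping blocks

# Frequency weights e^[((i*j)/W^2)^2 - 1] per Eq. (5) as printed; the DC term is excluded.
i, j = np.meshgrid(np.arange(1, W + 1), np.arange(1, W + 1), indexing="ij")
WEIGHTS = np.exp(((i * j) / W**2) ** 2 - 1)
WEIGHTS[0, 0] = 0.0  # DCT(1,1), i.e. i + j = 2, does not contribute to texture energy

def frame_features(luma, prev_block_energy=None):
    """Return (E_Y, h, L_Y, block_energy) for one luma frame (edge blocks are skipped)."""
    hgt, wid = luma.shape
    energies, brightness = [], []
    for y in range(0, hgt - W + 1, W):
        for x in range(0, wid - W + 1, W):
            coeffs = dctn(luma[y:y + W, x:x + W].astype(np.float64), norm="ortho")
            energies.append(np.sum(WEIGHTS * np.abs(coeffs)))   # H_{Y,p,k}, Eq. (5)
            brightness.append(np.sqrt(np.abs(coeffs[0, 0])))    # L_{Y,p,k}, Eq. (4)
    block_energy = np.array(energies)
    E_Y, L_Y = block_energy.mean(), float(np.mean(brightness))
    # h: block-wise texture-energy difference vs. the previous frame, Eq. (6),
    # averaged over blocks here (averaging is an assumption of this sketch).
    h = 0.0 if prev_block_energy is None else float(
        np.mean(np.abs(block_energy - prev_block_energy) / W**2))
    return E_Y, h, L_Y, block_energy
```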
Figure: Heatmap depiction of the luma texture information {EY, h, LY} extracted from the second frame of the CoverSong 1080P 0a86 video of the YouTube UGC dataset:15 (a) original frame, (b) EY, (c) h, (d) LY.
15
Yilin Wang, Sasi Inguva, and Balu Adsumilli. “YouTube UGC Dataset for Video Compression Research”. In: 2019 IEEE 21st International Workshop on
Multimedia Signal Processing (MMSP). Sept. 2019. doi: 10.1109/mmsp.2019.8901772. url: https://doi.org/10.1109/mmsp.2019.8901772.
Analysis of features
               QP22  QP27  QP32  QP37
(a) x264  SI   0.17  0.24  0.30  0.34
          EY   0.86  0.86  0.85  0.84
(b) x265  SI   0.18  0.24  0.32  0.37
          EY   0.86  0.88  0.87  0.85
Figure: PCC between the spatial complexity features (SI and EY) and bitrate in All Intra configuration16
with medium preset of x264 and x265 encoders for the VCD dataset.17
Bitrate in AI configuration is considered the spatial complexity’s ground truth.
EY correlates better with the spatial complexity than the state-of-the-art SI feature.
16
F. Bossen. “Common test conditions and software reference configurations”. In: JCTVC-L1100. Vol. 12. 2013, p. 7. url:
http://phenix.it-sudparis.eu/jct/doc_end_user/current_document.php?id=7281.
17
Hadi Amirpour et al. “VCD: Video Complexity Dataset”. In: Proceedings of the 13th ACM Multimedia Systems Conference. MMSys ’22. New York, NY,
USA: Association for Computing Machinery, 2022. isbn: 9781450392839. doi: 10.1145/3524273.3532892. url: https://doi.org/10.1145/3524273.3532892.
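For reference, a minimal sketch of how such a Pearson correlation coefficient can be computed with SciPy; the feature and bitrate vectors below are hypothetical placeholders, not data from the VCD dataset.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical per-sequence measurements: one E_Y value and the All-Intra bitrate
# (in kbps) obtained at a fixed QP for each sequence in the dataset.
e_y     = np.array([23.1, 41.7, 55.2, 78.9, 96.4])
bitrate = np.array([1800, 3400, 5200, 8100, 11500])

pcc, p_value = pearsonr(e_y, bitrate)
print(f"PCC(E_Y, bitrate) = {pcc:.2f} (p = {p_value:.3f})")
```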
Analysis of features
                    QP22  QP27  QP32  QP37
(a) veryslow   SI   0.00  0.01  0.02  0.03
               TI   0.53  0.53  0.55  0.55
               EY   0.59  0.48  0.45  0.40
               h    0.55  0.61  0.72  0.76
(b) medium     SI   0.01  0.01  0.03  0.04
               TI   0.54  0.55  0.56  0.57
               EY   0.60  0.52  0.47  0.43
               h    0.57  0.64  0.73  0.76
(c) ultrafast  SI   0.06  0.06  0.08  0.08
               TI   0.54  0.56  0.58  0.61
               EY   0.67  0.58  0.52  0.47
               h    0.58  0.65  0.73  0.78
Figure: PCC between the spatial complexity features (SI and EY) and temporal features (TI and h) with bitrate
in the Low Delay P picture (LDP) configuration with various presets of x265 encoder for the VCD dataset.
The correlation of EY with bitrate increases as QP decreases, whereas the correlation of h with bitrate decreases as QP decreases.
EY and h correlate well with the RD complexity and encoding run-time complexity of the LDP configuration.
Analysis of features
                    QP22  QP27  QP32  QP37
(a) veryslow   SI   0.20  0.21  0.22  0.23
               TI   0.71  0.72  0.74  0.76
               EY   0.47  0.35  0.26  0.21
               h    0.67  0.71  0.77  0.81
(b) medium     SI   0.22  0.23  0.26  0.28
               TI   0.66  0.68  0.73  0.77
               EY   0.41  0.33  0.28  0.27
               h    0.59  0.66  0.74  0.78
(c) ultrafast  SI   0.25  0.28  0.32  0.35
               TI   0.69  0.71  0.73  0.73
               EY   0.50  0.43  0.38  0.35
               h    0.61  0.62  0.64  0.63
Figure: PCC between the spatial complexity features (SI and EY) and temporal features (TI and h) with
encoding time in the Low Delay P picture (LDP) configuration with various presets of x265 encoder for
the VCD dataset.
Analysis of features
(a) SI        2160p  1080p  720p
    2160p     1.00   0.43   0.45
    1080p     0.43   1.00   0.82
    720p      0.45   0.82   1.00
(b) EY        2160p  1080p  720p
    2160p     1.00   0.94   0.91
    1080p     0.94   1.00   0.99
    720p      0.91   0.99   1.00
Figure: PCC between the spatial complexity features across multiple resolutions for the VCD dataset.
EY exhibits a stronger correlation across resolutions, facilitating optimizations such as carrying out the analysis at lower resolutions.
Performance optimizations18
1 x86 SIMD optimization
2 Multi-threading optimization
3 Low-pass analysis optimization
[Bar chart of analysis speed (in fps) for: SITI; VCA without optimization; VCA with SIMD optimization; VCA with SIMD optimization and 2, 4, or 8 threads; and VCA with SIMD optimization, 8 threads, and low-pass DCT optimization.]
Figure: Speed of the proposed video complexity analysis using various performance optimizations.
18
Vignesh V Menon et al. “Green Video Complexity Analysis for Efficient Encoding in Adaptive Video Streaming”. In: Proceedings of the First International
Workshop on Green Multimedia Systems. GMSys ’23. Vancouver, BC, Canada: Association for Computing Machinery, 2023, 16–18. isbn: 9798400701962. doi:
10.1145/3593908.3593942. url: https://doi.org/10.1145/3593908.3593942.
Online per-title encoding
Dynamic resolution encoding
Dynamic resolution per-title encoding schemes exploit the fact that, for a given bitrate range, one resolution performs better than the others in a scene, and these ranges depend on the video complexity.
Dynamic resolution prediction seeks to strike a balance between delivering optimal visual quality and conserving bandwidth resources.
The adaptive streaming system can anticipate the ideal resolution for each segment in real time by harnessing predictive models, often based on historical data19 or machine learning techniques.20
Dynamic resolution optimization approaches developed in the industry, e.g., by Bitmovin,21 MUX,22 and CAMBRIA,23 are proprietary.
19
Venkata Phani Kumar M, Christian Timmerer, and Hermann Hellwagner. “MiPSO: Multi-Period Per-Scene Optimization For HTTP Adaptive Streaming”. In:
2020 IEEE International Conference on Multimedia and Expo (ICME). 2020, pp. 1–6. doi: 10.1109/ICME46284.2020.9102775.
20
Madhukar Bhat, Jean-Marc Thiesse, and Patrick Le Callet. “Combining Video Quality Metrics To Select Perceptually Accurate Resolution In A Wide Quality
Range: A Case Study”. In: 2021 IEEE International Conference on Image Processing (ICIP). 2021, pp. 2164–2168. doi: 10.1109/ICIP42928.2021.9506310.
21
Gernot Zwantschko. “What is Per-Title Encoding? How to Efficiently Compress Video”. In: Bitmovin Developers Blog. Nov. 2020. url:
https://bitmovin.com/what-is-per-title-encoding/.
22
Jon Dahl. “Instant Per-Title Encoding”. In: Mux Video Education Blog. Apr. 2018. url: https://www.mux.com/blog/instant-per-title-encoding.
23
Capella. “Save Bandwidth and Improve Viewer Quality of Experience with Source Adaptive Bitrate Ladders”. In: CAMBRIA FTC. url:
https://capellasystems.net/wp-content/uploads/2021/01/CambriaFTC_SABL.pdf.
Convex-hull estimation
Figure: Rate-Distortion (RD) curves using VMAF24
as the quality metric of (a) Beauty and Golf sequences
of UVG and BVI datasets encoded at 540p and 1080p resolutions, and (b) Lake sequence of MCML
dataset encoded at a set of bitrates and resolutions to determine the convex hull.
24
Zhi Li et al. “VMAF: The Journey Continues”. In: Netflix Technology Blog. Oct. 2018. url:
https://netflixtechblog.com/vmaf-the-journey-continues-44b51ee9ed12.
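A minimal sketch of how such a convex hull (here, its staircase/Pareto-front approximation) can be extracted from pooled rate-quality points of test encodings; the function name and tuple layout are assumptions of this sketch, not the thesis implementation.

```python
def rd_pareto_front(rd_points):
    """Upper envelope of rate-quality points pooled across resolutions.

    rd_points: iterable of (bitrate_kbps, vmaf, resolution) tuples from test encodings.
    Returns the points that are not dominated by a cheaper point of equal or higher
    quality, a staircase approximation of the convex hull used in per-title encoding.
    """
    hull, best_quality = [], float("-inf")
    for bitrate, vmaf, resolution in sorted(rd_points):   # ascending bitrate
        if vmaf > best_quality:                           # keep only quality-improving points
            hull.append((bitrate, vmaf, resolution))
            best_quality = vmaf
    return hull
```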
Online Resolution Prediction Scheme (ORPS) architecture25
[Pipeline: (1) video complexity feature extraction on the input video segment; (2) optimized resolution prediction from the extracted features, the set of resolutions, the set of bitrates, and the target encoder/codec; (3) CBR encoding of the predicted representations.]
Figure: Encoding using ORPS for adaptive live streaming.
Each segment is encoded only at the predicted bitrate-resolution pairs using constant bitrate (CBR) encoding, eliminating the need to encode at all bitrates and resolutions to find the bitrate-resolution pairs that yield the maximum VMAF.
25
V. V. Menon et al. “OPTE: Online Per-Title Encoding for Live Video Streaming”. In: ICASSP 2022 - 2022 IEEE International Conference on Acoustics,
Speech and Signal Processing (ICASSP). 2022, pp. 1865–1869. doi: 10.1109/ICASSP43922.2022.9746745. url:
https://doi.org/10.1109/ICASSP43922.2022.9746745.
Proposed optimized resolution estimation
Figure: Optimized resolution (r̂t) prediction for a given target bitrate (bt). v̂t is the maximum value among the v̂_{r,b̂t} values output by the prediction models trained for resolutions r1, . . . , rr̃. The resolution corresponding to the maximum predicted VMAF is chosen as r̂t.
VMAF is modeled as a function of the spatiotemporal features {EY, h, LY}, target resolution (rt), target encoding bitrate (bt), encoding framerate (ft), and encoding preset (pt):
v(rt, bt, ft, pt) = fV(EY, h, LY, rt, bt, ft, pt)   (7)
Random Forest regression models are trained for every supported resolution to predict VMAF.
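A minimal scikit-learn sketch of this per-resolution modeling and the resolution selection; the feature layout, hyperparameters, and function names are assumptions of this sketch, not the thesis configuration (framerate and preset are held fixed here).

```python
from sklearn.ensemble import RandomForestRegressor

RESOLUTIONS = [360, 540, 720, 1080, 1440, 2160]

# One VMAF predictor per resolution; inputs are the segment features and the target bitrate.
models = {r: RandomForestRegressor(n_estimators=100, random_state=0) for r in RESOLUTIONS}

def train(training_data):
    """training_data[r] = (X, y) with X = [[E_Y, h, L_Y, bitrate_kbps], ...], y = measured VMAF."""
    for r, (X, y) in training_data.items():
        models[r].fit(X, y)

def predict_resolution(e_y, h, l_y, target_bitrate_kbps):
    """Pick the resolution whose model predicts the highest VMAF at the target bitrate."""
    preds = {r: models[r].predict([[e_y, h, l_y, target_bitrate_kbps]])[0] for r in RESOLUTIONS}
    return max(preds, key=preds.get), preds
```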
Evaluation of ORPS
Table: Experimental parameters used to evaluate ORPS.
Parameter                              Symbol  Values
Set of resolution heights (in pixels)  R       {360, 432, 540, 720, 1080, 1440, 2160}
Set of target bitrates (in Mbps)       B       {0.145, 0.300, 0.600, 0.900, 1.600, 2.400, 3.400, 4.500, 5.800, 8.100, 11.600, 16.800}
Set of framerates (in fps)             F       {30, 50, 60}
Set of presets [x265]                  P       {0 (ultrafast)}
[Plots of the predicted resolution (height in pixels) over bitrate for (a) Bunny s000, (b) Characters s000, (c) HoneyBee s000, and (d) Wood s000.]
Figure: The resolution predictions of representative video segments. HLS CBR encoding is represented
using the green line, and ORPS CBR encoding is represented using the red line.
Evaluation of ORPS
[VMAF over bitrate for (a) Bunny s000, (b) Characters s000, (c) HoneyBee s000, and (d) Wood s000, comparing HLS CBR and ORPS CBR encoding.]
Figure: RD curves of representative segments.
Table: Results of ORPS against HLS bitrate ladder CBR encoding.
Dataset  Video            f   SI     TI     EY      h      BDRV     BDRP
SJTU     BundNightScape   30  48.82   7.06   54.90  11.62  -61.22%  -60.86%
SJTU     Fountains        30  43.37  11.42   60.90  23.02  -32.93%   -8.49%
SJTU     TrafficFlow      30  33.57  13.80   58.93  15.83  -50.54%  -40.90%
SJTU     TreeShade        30  52.88   5.29   80.19   8.83  -47.76%  -38.55%
SVT      CrowdRun         50  50.77  22.33   96.55  33.33   -8.50%   -1.90%
SVT      DucksTakeOff     50  47.77  15.10  119.12  30.88   -2.99%   -2.79%
SVT      IntoTree         50  24.41  12.09   74.45  21.95  -26.50%   -5.75%
SVT      OldTownCross     50  29.66  11.62   92.75  22.06  -30.91%  -22.53%
SVT      ParkJoy          50  62.78  27.00  102.80  52.10  -12.08%   -2.62%
JVET     CatRobot         60  44.45  11.84   56.36  14.25  -13.43%   -5.95%
JVET     DaylightRoad2    60  40.51  16.21   66.40  20.13  -27.52%   -9.35%
JVET     FoodMarket4      60  38.26  17.68   50.71  20.71  -18.11%   -3.74%
On average, ORPS requires 17.28% fewer bits to maintain the same PSNR and 22.79% fewer bits to maintain the same VMAF compared to the HLS bitrate ladder CBR encoding.
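BDRP and BDRV are Bjontegaard delta rates computed against PSNR and VMAF, respectively. A standard computation sketch (not the exact script used in the thesis), assuming at least four RD points per curve:

```python
import numpy as np

def bd_rate(rate_ref, dist_ref, rate_test, dist_test):
    """Bjontegaard delta rate (%) between two RD curves: cubic fit of log-rate as a
    function of distortion (PSNR or VMAF), integrated over the overlapping range."""
    p_ref = np.polyfit(dist_ref, np.log(rate_ref), 3)
    p_test = np.polyfit(dist_test, np.log(rate_test), 3)
    lo = max(min(dist_ref), min(dist_test))
    hi = min(max(dist_ref), max(dist_test))
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    avg_log_diff = (int_test - int_ref) / (hi - lo)
    return (np.exp(avg_log_diff) - 1) * 100  # negative values mean bitrate savings

# e.g. bd_rate(hls_bitrates, hls_vmaf, orps_bitrates, orps_vmaf) -> BDRV in %
```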
Motivation for variable framerate (VFR)26 encoding
[VMAF over bitrate for UHD encodings of (a) HoneyBee and (b) Lips at 24, 30, 60, and 120 fps.]
Figure: RD curves of UHD encodings of two representative HFR sequences from UVG dataset for
multiple framerates.
Dynamic framerate per-title encoding schemes are based on the fact that one framerate performs
better than others in a scene for a given bitrate range, and these regions depend on the video
complexity.
26
Alex Mackin et al. “Investigating the impact of high frame rates on video compression”. In: 2017 IEEE International Conference on Image Processing (ICIP).
2017, pp. 295–299. doi: 10.1109/ICIP.2017.8296290. url: https://doi.org/10.1109/ICIP.2017.8296290.
State-of-the-art VFR encoding
Figure: Block diagram of a variable framerate (VFR) coding scheme27
in the context of video encoding
for adaptive streaming. This example encodes a video segment of UHD resolution and native framerate
120fps in representation (1080p, 8.1 Mbps) with the selected framerate of 60 fps. It also illustrates the
corresponding operations on the client side. Red dashed blocks indicate the additional steps introduced
compared to the traditional bitrate ladder encoding.
27
G. Herrou et al. “Quality-driven Variable Frame-Rate for Green Video Coding in Broadcast Applications”. In: IEEE Transactions on Circuits and Systems for
Video Technology. 2020, pp. 1–1. doi: 10.1109/TCSVT.2020.3046881. url: https://doi.org/10.1109/TCSVT.2020.3046881.
Online Framerate Prediction Scheme (OFPS) architecture28
[Pipeline: (1) video complexity feature extraction on the input video segment; (2) optimized framerate prediction from the extracted features, the set of representations, the set of framerates, and the target encoder/codec; (3) CBR encoding of the representations.]
Figure: Encoding architecture using OFPS for adaptive live streaming.
28
V. V. Menon et al. “CODA: Content-aware Frame Dropping Algorithm for High Frame-rate Video Streaming”. In: 2022 Data Compression Conference
(DCC). 2022, pp. 475–475. doi: 10.1109/DCC52660.2022.00086. url: https://doi.org/10.1109/DCC52660.2022.00086.
Proposed optimized framerate estimation
Figure: Optimized framerate (f̂t) prediction for a given target representation (rt, bt). v̂t is the maximum value among the v̂_{rt,bt,f} values output by the prediction models trained for framerates f1, . . . , ff̃. The framerate corresponding to the maximum predicted VMAF is chosen as f̂t.
Evaluation of OFPS
Table: Experimental parameters used to evaluate OFPS.
Parameter                              Symbol  Values
Set of resolution heights (in pixels)  R       {1080, 2160}
Set of target bitrates (in Mbps)       B       {0.145, 0.300, 0.600, 0.900, 1.600, 2.400, 3.400, 4.500, 5.800, 8.100, 11.600, 16.800}
Set of framerates (in fps)             F       {20, 24, 30, 40, 60, 90, 120}
Set of presets [x265]                  P       {8 (veryslow)}
[Predicted framerate (f̂) and ground-truth framerate (fG) over bitrate for (a) Beauty, (b) HoneyBee, and (c) ShakeNDry.]
Figure: Optimized framerate prediction results for representative sequences of the UVG dataset. Note that the optimized framerate at a given bitrate differs depending on the content complexity.
Evaluation of OFPS
[VMAF and encoding time over bitrate for (a) Beauty, (b) HoneyBee, and (c) ShakeNDry, comparing the default 120 fps encoding with OFPS (VFR).]
Figure: RD curves and encoding times of representative sequences of UVG dataset.
Evaluation of OFPS
Table: Results of OFPS-based encodings.
Dataset  Resolution  Video           fmax  BDRP     BDRV     ∆T
JVET     1080p       CatRobot        60    -11.23%  -12.22%  -23.93%
JVET     1080p       DaylightRoad2   60     -9.24%   -8.93%   -9.33%
JVET     1080p       FoodMarket4     60     -5.73%   -7.12%  -10.80%
JVET     2160p       CatRobot        60    -13.91%  -15.43%  -27.62%
JVET     2160p       DaylightRoad2   60    -10.36%  -11.21%  -12.97%
JVET     2160p       FoodMarket4     60     -7.37%   -6.91%  -12.01%
UVG      1080p       Beauty          120    -8.18%  -20.01%  -18.01%
UVG      1080p       Bosphorus       120   -15.66%  -17.58%  -23.03%
UVG      1080p       Lips            120     0.00%    0.00%    0.00%
UVG      1080p       HoneyBee        120   -16.96%  -10.87%  -30.11%
UVG      1080p       Jockey          120    -0.10%   -1.22%   -5.45%
UVG      1080p       ReadySteadyGo   120    -2.32%   -5.00%  -19.76%
UVG      1080p       ShakeNDry       120   -11.15%  -34.41%  -25.59%
UVG      1080p       YachtRide       120   -18.35%   -9.15%  -12.17%
UVG      2160p       Beauty          120   -18.97%  -24.83%  -38.43%
UVG      2160p       Bosphorus       120   -27.63%  -26.93%  -23.90%
UVG      2160p       Lips            120   -27.12%  -34.22%  -19.13%
UVG      2160p       HoneyBee        120   -36.14%  -42.37%  -28.91%
UVG      2160p       Jockey          120    -2.92%   -2.20%  -13.47%
UVG      2160p       ReadySteadyGo   120    -0.36%   -2.92%  -18.30%
UVG      2160p       ShakeNDry       120   -22.82%  -28.46%  -33.66%
UVG      2160p       YachtRide       120    -7.01%   -4.69%  -11.60%
BVI-HFR  1080p       catch           120    -7.84%   -8.99%  -14.88%
BVI-HFR  1080p       golf side       120    -4.10%   -3.23%   -9.87%
Average (1080p)                             -8.53%  -10.67%  -15.61%
Average (2160p)                            -15.87%  -18.20%  -21.82%
On average, UHD encoding using OFPS requires 15.87% fewer bits to maintain the same PSNR and 18.20% fewer bits to keep the same VMAF as compared to the original framerate encoding.
An overall encoding time reduction of 21.82% is also observed.
Perceptual redundancy in bitrate ladder
[Panel (a): VMAF over bitrate for the HLS bitrate ladder encoding; panel (b): the ideal ladder, where successive rungs at bitrates b1, . . . , b7 satisfy v2 = v1 + vJ(v1), v3 = v2 + vJ(v2), and so on, up to vmax.]
Figure: Rate distortion curves of (a) the HLS bitrate ladder encoding of Characters sequence of MCML dataset,29
(b) the ideal bitrate ladder targeted.
Having many perceptually redundant representations for the bitrate ladder may not result in improved quality
of experience, but it may lead to increased storage and bandwidth costs.30
29
Manri Cheon and Jong-Seok Lee. “Subjective and Objective Quality Assessment of Compressed 4K UHD Videos for Immersive Experience”. In: IEEE
Transactions on Circuits and Systems for Video Technology. Vol. 28. 7. 2018, pp. 1467–1480. doi: 10.1109/TCSVT.2017.2683504. url:
https://doi.org/10.1109/TCSVT.2017.2683504.
30
Tianchi Huang, Rui-Xiao Zhang, and Lifeng Sun. “Deep Reinforced Bitrate Ladders for Adaptive Video Streaming”. In: NOSSDAV ’21. Istanbul, Turkey:
Association for Computing Machinery, 2021, 66–73. isbn: 9781450384353. doi: 10.1145/3458306.3458873. url:
https://doi.org/10.1145/3458306.3458873.
JND-aware bitrate ladder prediction scheme (JBLS) architecture31
[Pipeline: (1) video complexity feature extraction on the input video segment; (2) bitrate ladder prediction from the extracted features, the set of resolutions and framerates, the minimum and maximum bitrate, the target JND function, the maximum VMAF, and the target encoder/codec; the predicted representations are then CBR encoded.]
Figure: Online encoding architecture using JBLS for adaptive streaming.
31
V. V. Menon et al. “Perceptually-Aware Per-Title Encoding for Adaptive Video Streaming”. In: 2022 IEEE International Conference on Multimedia and Expo
(ICME). Los Alamitos, CA, USA: IEEE Computer Society, July 2022, pp. 1–6. doi: 10.1109/ICME52920.2022.9859744. url:
https://doi.ieeecomputersociety.org/10.1109/ICME52920.2022.9859744.
First RD point estimation
Figure: Estimation of the first point of the bitrate ladder. v̂1 is the maximum value among the v̂_{r,f,b̂1} values output by the prediction models trained for resolutions r1, .., rr̃ in R and framerates f1, .., ff̃ in F. The resolution-framerate pair corresponding to the VMAF v̂1 is chosen as (r̂1, f̂1).

Step 1:
b̂1 ← bmin
Determine v̂_{r,f,b̂1} ∀ r ∈ R, f ∈ F
v̂1 ← max(v̂_{r,f,b̂1})
(r̂1, f̂1) ← argmax_{r∈R, f∈F}(v̂_{r,f,b̂1})
(r̂1, f̂1, b̂1) is the first point of the bitrate ladder.
Remaining RD points estimation
Figure: Estimation of the t-th point (t ≥ 2) of the bitrate ladder. log(b̂t) is the minimum value among the log(b̂_{r,v̂t}) values output by the prediction models trained for resolutions r1, .., rM. The resolution corresponding to log(b̂t) is chosen as r̂t.

Step 2:
while b̂_{t−1} < bmax and v̂_{t−1} < vmax do
  v̂t ← v̂_{t−1} + vJ(v̂_{t−1})
  Determine b̂_{r,f,v̂t} ∀ r ∈ R, f ∈ F
  b̂t ← min(b̂_{r,f,v̂t})
  (r̂t, f̂t) ← argmin_{r∈R, f∈F}(b̂_{r,f,v̂t})
  (r̂t, f̂t, b̂t) is the t-th point of the bitrate ladder.
  t ← t + 1
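A minimal Python sketch of Steps 1 and 2, assuming callables vmaf_model(r, f, b) and bitrate_model(r, f, v) that wrap the trained prediction models and a target-JND function jnd(v); these names are hypothetical, not the thesis API.

```python
def build_jnd_ladder(R, F, b_min, b_max, v_max, jnd, vmaf_model, bitrate_model):
    """JND-aware ladder sketch: Step 1 places the first rung at b_min; Step 2 adds
    rungs one JND apart in predicted VMAF until b_max or v_max is reached."""
    # Step 1: first point at the minimum bitrate with the best (resolution, framerate) pair
    r1, f1 = max(((r, f) for r in R for f in F), key=lambda rf: vmaf_model(*rf, b_min))
    v, b = vmaf_model(r1, f1, b_min), b_min
    ladder = [(r1, f1, b_min)]
    # Step 2: each next rung targets one JND higher VMAF at the cheapest predicted bitrate
    while b < b_max and v < v_max:
        v = v + jnd(v)                                      # target quality of the next rung
        r, f = min(((r, f) for r in R for f in F), key=lambda rf: bitrate_model(*rf, v))
        b = bitrate_model(r, f, v)
        ladder.append((r, f, b))
    return ladder
```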
Evaluation of JBLS
Table: Experimental parameters used to evaluate JBLS.
Parameter                              Symbol  Values
Set of resolution heights (in pixels)  R       {360, 432, 540, 720, 1080, 1440, 2160}
Set of framerates (in fps)             F       {30}
Set of presets [x265]                  P       {0 (ultrafast)}
Minimum target bitrate (in Mbps)       bmin    0.145
Maximum target bitrate (in Mbps)       bmax    16.8
Average target JND                     vJ      {2, 4, 6}
Maximum VMAF threshold                 vmax    {98, 96, 94}
[VMAF over bitrate for (a) Bunny s000, (b) Characters s000, (c) HoneyBee s000, and (d) Wood s000, comparing HLS CBR encoding and JBLS.]
Figure: RD curves of representative video sequences (segments) using the HLS bitrate ladder CBR
encoding (green line), and JBLS encoding (red line). JND is considered as six VMAF points in these
plots.
Evaluation of JBLS
Table: Average results of JBLS compared to the HLS bitrate ladder CBR encoding.
Method BDRP BDRV BD-PSNR BD-VMAF ∆S ∆T
JBLS (vJ =2)32 -11.06% -16.65% 0.87 dB 2.18 10.18% 105.73%
JBLS (vJ =4) -10.44% -15.13% 0.91 dB 2.39 -27.03% 10.19%
JBLS (vJ =6)33 -12.94% -17.94% 0.94 dB 2.32 -42.48% -25.35%
Live streaming using JBLS requires 12.94% fewer bits to maintain the same PSNR and 17.94% fewer bits to maintain the same VMAF compared to the reference HLS bitrate ladder.
The improvement in compression efficiency is achieved with an average storage reduction of 42.48% and an average encoding time reduction of 25.35% compared to HLS bitrate ladder CBR encoding, considering a JND of six VMAF points.
32
Andreas Kah et al. “Fundamental relationships between subjective quality, user acceptance, and the VMAF metric for a quality-based bit-rate ladder design for
over-the-top video streaming services”. In: Applications of Digital Image Processing XLIV. vol. 11842. International Society for Optics and Photonics. SPIE,
2021, 118420Z. doi: 10.1117/12.2593952. url: https://doi.org/10.1117/12.2593952.
33
Jan Ozer. “Finding the Just Noticeable Difference with Netflix VMAF”. In: Sept. 2017. url:
https://streaminglearningcenter.com/codecs/finding-the-just-noticeable-difference-with-netflix-vmaf.html.
Live variable bitrate encoding
Two-pass encoding
Figure: Two-pass encoding architecture.
Two-pass encoding introduces adaptability and content awareness into the encoding process.34
In the first pass, the encoder analyzes the entire video sequence to gain insights into its complexity, motion, and spatial detail.
In the second pass, based on the insights from the first pass, the encoder dynamically adjusts the bitrate allocation for each segment, prioritizing quality where needed and optimizing compression elsewhere.35 (A command-line sketch follows below the references.)
34
Chengsheng Que, Guobin Chen, and Jilin Liu. “An Efficient Two-Pass VBR Encoding Algorithm for H.264”. In: 2006 International Conference on
Communications, Circuits and Systems. Vol. 1. 2006, pp. 118–122. doi: 10.1109/ICCCAS.2006.284599.
35
Ivan Zupancic et al. “Two-pass rate control for UHDTV delivery with HEVC”. In: 2016 Picture Coding Symposium (PCS). 2016, pp. 1–5. doi:
10.1109/PCS.2016.7906322.
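A minimal sketch of a two-pass encode driven from Python with ffmpeg and libx265; the flags follow common ffmpeg/x265 usage and the bitrate and scaling values are illustrative, not the settings used in the thesis experiments.

```python
import subprocess

def two_pass_encode(src, out, bitrate="3400k", height=1080):
    """Two-pass ABR encoding sketch with ffmpeg/libx265.
    Pass 1 gathers rate-control statistics; pass 2 reuses them to allocate bits."""
    common = ["ffmpeg", "-y", "-i", src, "-c:v", "libx265", "-b:v", bitrate,
              "-vf", f"scale=-2:{height}", "-an"]
    # First pass: analysis only, the bitstream is discarded (use NUL instead of /dev/null on Windows)
    subprocess.run(common + ["-x265-params", "pass=1", "-f", "null", "/dev/null"], check=True)
    # Second pass: rate control is guided by the statistics written in pass 1
    subprocess.run(common + ["-x265-params", "pass=2", out], check=True)
```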
LiveVBR architecture36
[Two-pass pipeline: first pass = (1) video complexity feature extraction on the input video segment, (2) perceptually-optimized bitrate ladder prediction, (3) optimized CRF prediction; second pass = (4) cVBR encoding of the representations. Inputs: set of resolutions and framerates, minimum and maximum bitrate, target JND function, maximum VMAF, and target encoder/codec.]
Figure: Live encoding architecture featuring LiveVBR envisioned in this chapter.
36
Vignesh V Menon et al. “JND-aware Two-pass Per-title Encoding Scheme for Adaptive Live Streaming”. In: IEEE Transactions on Circuits and Systems for
Video Technology. 2023, pp. 1–1. doi: 10.1109/TCSVT.2023.3290725. url: https://doi.org/10.1109/TCSVT.2023.3290725.
LiveVBR
cVBR encoding of the bitrate ladder37
Figure: Optimized CRF estimation for the t-th representation to achieve the target bitrate b̂t, using a prediction model trained for resolution r̂t and framerate f̂t.
The optimized CRF is determined for the selected (r, b, f) pairs.
cVBR encoding is then performed for the resulting (r, b, f, c) tuples (a minimal sketch follows below the reference).
37
Vignesh V Menon et al. “ETPS: Efficient Two-Pass Encoding Scheme for Adaptive Live Streaming”. In: 2022 IEEE International Conference on Image
Processing (ICIP). 2022, pp. 1516–1520. doi: 10.1109/ICIP46576.2022.9897768. url: https://doi.org/10.1109/ICIP46576.2022.9897768.
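A minimal sketch of the idea, assuming a scikit-learn regressor trained offline on segment features and target bitrates mapped to the CRF that achieved them (hypothetical variable names); the VBV caps shown are illustrative, not the thesis settings.

```python
from sklearn.ensemble import RandomForestRegressor

# One CRF predictor per (resolution, framerate) pair; inputs are the segment features
# and the target bitrate. Training data layout is an assumption of this sketch.
crf_model = RandomForestRegressor(n_estimators=100, random_state=0)
# crf_model.fit(X_train, y_train)   # X = [[E_Y, h, L_Y, target_bitrate_kbps], ...], y = CRF

def cvbr_params(e_y, h, l_y, target_bitrate_kbps):
    """Predict the CRF for the target bitrate and cap it with VBV settings (cVBR).
    The 1.1x maxrate / 2x bufsize caps are illustrative values."""
    crf = float(crf_model.predict([[e_y, h, l_y, target_bitrate_kbps]])[0])
    return {
        "crf": round(crf, 1),
        "vbv-maxrate": int(1.1 * target_bitrate_kbps),   # caps the instantaneous bitrate
        "vbv-bufsize": int(2.0 * target_bitrate_kbps),
    }
```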
Evaluation of LiveVBR
Table: Input parameters of LiveVBR used in the experiments.
Parameter Symbol Values
Set of resolutions (height in pixels) R { 360, 432, 540, 720, 1080, 1440, 2160 }
Set of framerates (in fps) F { 30 }
Set of presets P { ultrafast }
Minimum bitrate (in Mbps) bmin 0.145
Maximum bitrate (in Mbps) bmax 16.80
Average target JND vJ {2, 4, 6}
Maximum VMAF threshold vmax {98, 96, 94}
Table: Comparison of other per-title encoding methods with LiveVBR, regarding the target scenario, number of pre-encodings, encoding type, and the additional computational overhead to determine the convex hull.
Method             Target scenario  Number of pre-encodings  Encoding type  ∆TC
Bruteforce38       VoD              r̃ × c̃                    cVBR           4596.77%
Katsenou et al.39  VoD              (r̃ − 1) × 2              CQP            120.57%
FAUST40            VoD              1                        CBR            48.65%
Bhat et al.41      VoD              1                        CBR            67.82%
ORPS               Live             0                        CBR            0.30%
JBLS               Live             0                        CBR            0.33%
LiveVBR            Live             0                        cVBR           0.41%
38
De Cock et al., “Complexity-based consistent-quality encoding in the cloud”.
39
A. V. Katsenou, J. Sole, and D. R. Bull. “Content-gnostic Bitrate Ladder Prediction for Adaptive Video Streaming”. In: 2019 Picture Coding Symposium
(PCS). 2019. doi: 10.1109/PCS48520.2019.8954529.
40
Anatoliy Zabrovskiy et al. “FAUST: Fast Per-Scene Encoding Using Entropy-Based Scene Detection and Machine Learning”. In: 2021 30th Conference of
Open Innovations Association FRUCT. 2021, pp. 292–302. doi: 10.23919/FRUCT53335.2021.9599963. url:
https://doi.org/10.23919/FRUCT53335.2021.9599963.
41
M. Bhat, Jean-Marc Thiesse, and Patrick Le Callet. “A Case Study of Machine Learning Classifiers for Real-Time Adaptive Resolution Prediction in Video
Coding”. In: 2020 IEEE International Conference on Multimedia and Expo (ICME). 2020, pp. 1–6. doi: 10.1109/ICME46284.2020.9102934.
Evaluation of LiveVBR
Table: Average results of the encoding schemes compared to the HLS bitrate ladder CBR encoding.
Method BDRP BDRV BD-PSNR BD-VMAF ∆S ∆T
Bruteforce (vJ =2) -23.09% -43.23% 1.34 dB 10.61 -25.99% 4732.33%
Bruteforce (vJ =4) -28.15% -42.75% 1.70 dB 10.08 -59.07% 4732.33%
Bruteforce (vJ =6) -25.36% -40.73% 1.67 dB 9.19 -70.50% 4732.33%
ORPS CBR -17.28% -22.79% 0.98 dB 3.79 0.07% 15.74%
JBLS (vJ =2) -11.06% -16.65% 0.87 dB 2.18 10.18% 105.73%
JBLS (vJ =4) -10.44% -15.13% 0.91 dB 2.39 -27.03% 10.19%
JBLS (vJ =6) -12.94% -17.94% 0.94 dB 2.32 -42.48% -25.35%
HLS cVBR -35.25% -32.33% 2.09 dB 6.59 -9.39% 1.64%
ORPS cVBR -34.42% -42.67% 2.90 dB 9.51 -1.34% 62.73%
LiveVBR (vJ =2) -14.25% -29.14% 1.36 dB 7.82 23.57% 184.62%
LiveVBR (vJ =4) -18.41% -32.48% 1.41 dB 8.31 -56.38% 26.14%
LiveVBR (vJ =6) -18.80% -32.59% 1.34 dB 8.34 -68.96% -18.58%
Evaluation of LiveVBR
[VMAF over bitrate for (a) Bunny s000, (b) Characters s000, (c) HoneyBee s000, and (d) Wood s000, comparing Bruteforce, HLS CBR, ORPS CBR, JBLS, HLS cVBR, ORPS cVBR, and LiveVBR.]
Figure: RD curves of representative video sequences (segments) using the considered encoding schemes.
JND is considered as six VMAF points in these plots.
Conclusions and Future Directions
Contributions
Video complexity analysis:
Efficient DCT-energy-based spatial and temporal complexity features are proposed to analyze video complexity accurately and quickly. These features are suitable for live-streaming applications, as they are low complexity and correlate significantly with video coding parameters.
Online per-title encoding optimizations:
The online resolution prediction scheme (ORPS) predicts the optimized resolution yielding the highest perceptual quality using the video content complexity of the segment and the predefined set of target bitrates.
The online framerate prediction scheme (OFPS) predicts the optimized framerate yielding the highest perceptual quality using the video content complexity of the segment and the predefined set of target bitrates.
Contributions
The just noticeable difference (JND)-aware bitrate ladder prediction scheme (JBLS) predicts optimized bitrate-resolution-framerate pairs such that there is a perceptual quality difference of one JND between representations.
The constrained variable bitrate (cVBR) implementation of JBLS, i.e., LiveVBR, yields an average bitrate reduction of 18.80% and 32.59% for the same PSNR and VMAF, respectively, compared to the HLS CBR bitrate ladder encoding using x265. For a target JND of six VMAF points, LiveVBR resulted in a 68.96% reduction in storage space and an 18.58% reduction in encoding time, with a negligible impact on streaming latency.
Reproducibility
VCA is available at https://github.com/cd-athena/VCA. This initiative translates the
proposed video complexity analysis into a practical open-source implementation.
The open-source Python code of LiveVBR is available at https://github.com/cd-athena/LiveVBR.
Figure: Content-adaptive encoding using VCA.
Limitations
1 Dynamic network conditions: The interplay between content complexity and real-time network fluctuations is not extensively addressed.
2 Generalization across video genres: The generalization of the framework to highly specialized genres or unique content types may present challenges.
3 Real-time implementation challenges: While developed and evaluated offline, the content-adaptive video coding framework poses challenges in real-time implementation, considering computational efficiency and latency constraints.
4 Subjective quality assessment: Incorporating subjective quality assessment methods, such as user studies or crowdsourced evaluations, could offer a more comprehensive understanding of the framework's impact on viewer satisfaction.
Future directions
1 Address the escalated runtime complexity inherent in encoding representations for multiple codecs and representations, all while upholding the compression efficiency of the system.
2 Achieve zero-latency encoding in adaptive live-streaming scenarios using new-generation codecs by synergizing dynamic resolution, bitrate, framerate, and encoding resource configuration.42
3 Extend the per-title encoding schemes proposed in this dissertation to scenarios involving transcoding in networking servers.43
42
Vignesh V Menon et al. “Content-adaptive Encoder Preset Prediction for Adaptive Live Streaming”. In: 2022 Picture Coding Symposium (PCS). 2022,
pp. 253–257. doi: 10.1109/PCS56426.2022.10018034. url: https://doi.org/10.1109/PCS56426.2022.10018034.
43
Reza Farahani. “CDN and SDN Support and Player Interaction for HTTP Adaptive Video Streaming”. In: Proceedings of the 12th ACM Multimedia Systems
Conference. Istanbul, Turkey: Association for Computing Machinery, 2021, 398–402. isbn: 9781450384346. doi: 10.1145/3458305.3478464. url:
https://doi.org/10.1145/3458305.3478464.
Q & A
Thank you for your attention!
Vignesh V Menon (vignesh.menon@ieee.org)
Network-Assisted Delivery of Adaptive Video Streaming Services through CDN, S...Alpen-Adria-Universität
 
Multi-access Edge Computing for Adaptive Video Streaming
Multi-access Edge Computing for Adaptive Video StreamingMulti-access Edge Computing for Adaptive Video Streaming
Multi-access Edge Computing for Adaptive Video StreamingAlpen-Adria-Universität
 
Policy-Driven Dynamic HTTP Adaptive Streaming Player Environment
Policy-Driven Dynamic HTTP Adaptive Streaming Player EnvironmentPolicy-Driven Dynamic HTTP Adaptive Streaming Player Environment
Policy-Driven Dynamic HTTP Adaptive Streaming Player EnvironmentAlpen-Adria-Universität
 
VE-Match: Video Encoding Matching-based Model for Cloud and Edge Computing In...
VE-Match: Video Encoding Matching-based Model for Cloud and Edge Computing In...VE-Match: Video Encoding Matching-based Model for Cloud and Edge Computing In...
VE-Match: Video Encoding Matching-based Model for Cloud and Edge Computing In...Alpen-Adria-Universität
 
Energy Consumption in Video Streaming: Components, Measurements, and Strategies
Energy Consumption in Video Streaming: Components, Measurements, and StrategiesEnergy Consumption in Video Streaming: Components, Measurements, and Strategies
Energy Consumption in Video Streaming: Components, Measurements, and StrategiesAlpen-Adria-Universität
 
Exploring the Energy Consumption of Video Streaming: Components, Challenges, ...
Exploring the Energy Consumption of Video Streaming: Components, Challenges, ...Exploring the Energy Consumption of Video Streaming: Components, Challenges, ...
Exploring the Energy Consumption of Video Streaming: Components, Challenges, ...Alpen-Adria-Universität
 
Video Coding Enhancements for HTTP Adaptive Streaming Using Machine Learning
Video Coding Enhancements for HTTP Adaptive Streaming Using Machine LearningVideo Coding Enhancements for HTTP Adaptive Streaming Using Machine Learning
Video Coding Enhancements for HTTP Adaptive Streaming Using Machine LearningAlpen-Adria-Universität
 
Optimizing QoE and Latency of Live Video Streaming Using Edge Computing a...
Optimizing  QoE and Latency of  Live Video Streaming Using  Edge Computing  a...Optimizing  QoE and Latency of  Live Video Streaming Using  Edge Computing  a...
Optimizing QoE and Latency of Live Video Streaming Using Edge Computing a...Alpen-Adria-Universität
 
SARENA: SFC-Enabled Architecture for Adaptive Video Streaming Applications
SARENA: SFC-Enabled Architecture for Adaptive Video Streaming ApplicationsSARENA: SFC-Enabled Architecture for Adaptive Video Streaming Applications
SARENA: SFC-Enabled Architecture for Adaptive Video Streaming ApplicationsAlpen-Adria-Universität
 
Immersive Video Delivery: From Omnidirectional Video to Holography
Immersive Video Delivery: From Omnidirectional Video to HolographyImmersive Video Delivery: From Omnidirectional Video to Holography
Immersive Video Delivery: From Omnidirectional Video to HolographyAlpen-Adria-Universität
 
LLL-CAdViSE: Live Low-Latency Cloud-based Adaptive Video Streaming Evaluation...
LLL-CAdViSE: Live Low-Latency Cloud-based Adaptive Video Streaming Evaluation...LLL-CAdViSE: Live Low-Latency Cloud-based Adaptive Video Streaming Evaluation...
LLL-CAdViSE: Live Low-Latency Cloud-based Adaptive Video Streaming Evaluation...Alpen-Adria-Universität
 

More from Alpen-Adria-Universität (20)

VEED: Video Encoding Energy and CO2 Emissions Dataset for AWS EC2 instances
VEED: Video Encoding Energy and CO2 Emissions Dataset for AWS EC2 instancesVEED: Video Encoding Energy and CO2 Emissions Dataset for AWS EC2 instances
VEED: Video Encoding Energy and CO2 Emissions Dataset for AWS EC2 instances
 
GREEM: An Open-Source Energy Measurement Tool for Video Processing
GREEM: An Open-Source Energy Measurement Tool for Video ProcessingGREEM: An Open-Source Energy Measurement Tool for Video Processing
GREEM: An Open-Source Energy Measurement Tool for Video Processing
 
Optimal Quality and Efficiency in Adaptive Live Streaming with JND-Aware Low ...
Optimal Quality and Efficiency in Adaptive Live Streaming with JND-Aware Low ...Optimal Quality and Efficiency in Adaptive Live Streaming with JND-Aware Low ...
Optimal Quality and Efficiency in Adaptive Live Streaming with JND-Aware Low ...
 
VEEP: Video Encoding Energy and CO₂ Emission Prediction
VEEP: Video Encoding Energy and CO₂ Emission PredictionVEEP: Video Encoding Energy and CO₂ Emission Prediction
VEEP: Video Encoding Energy and CO₂ Emission Prediction
 
Empowerment of Atypical Viewers via Low-Effort Personalized Modeling of Video...
Empowerment of Atypical Viewers via Low-Effort Personalized Modeling of Video...Empowerment of Atypical Viewers via Low-Effort Personalized Modeling of Video...
Empowerment of Atypical Viewers via Low-Effort Personalized Modeling of Video...
 
Empowerment of Atypical Viewers via Low-Effort Personalized Modeling of Vid...
Empowerment of Atypical Viewers  via Low-Effort Personalized Modeling  of Vid...Empowerment of Atypical Viewers  via Low-Effort Personalized Modeling  of Vid...
Empowerment of Atypical Viewers via Low-Effort Personalized Modeling of Vid...
 
Optimizing Video Streaming for Sustainability and Quality: The Role of Prese...
Optimizing Video Streaming  for Sustainability and Quality: The Role of Prese...Optimizing Video Streaming  for Sustainability and Quality: The Role of Prese...
Optimizing Video Streaming for Sustainability and Quality: The Role of Prese...
 
Machine Learning Based Resource Utilization Prediction in the Computing Conti...
Machine Learning Based Resource Utilization Prediction in the Computing Conti...Machine Learning Based Resource Utilization Prediction in the Computing Conti...
Machine Learning Based Resource Utilization Prediction in the Computing Conti...
 
Evaluation of Quality of Experience of ABR Schemes in Gaming Stream
Evaluation of Quality of Experience of ABR Schemes in Gaming StreamEvaluation of Quality of Experience of ABR Schemes in Gaming Stream
Evaluation of Quality of Experience of ABR Schemes in Gaming Stream
 
Network-Assisted Delivery of Adaptive Video Streaming Services through CDN, S...
Network-Assisted Delivery of Adaptive Video Streaming Services through CDN, S...Network-Assisted Delivery of Adaptive Video Streaming Services through CDN, S...
Network-Assisted Delivery of Adaptive Video Streaming Services through CDN, S...
 
Multi-access Edge Computing for Adaptive Video Streaming
Multi-access Edge Computing for Adaptive Video StreamingMulti-access Edge Computing for Adaptive Video Streaming
Multi-access Edge Computing for Adaptive Video Streaming
 
Policy-Driven Dynamic HTTP Adaptive Streaming Player Environment
Policy-Driven Dynamic HTTP Adaptive Streaming Player EnvironmentPolicy-Driven Dynamic HTTP Adaptive Streaming Player Environment
Policy-Driven Dynamic HTTP Adaptive Streaming Player Environment
 
VE-Match: Video Encoding Matching-based Model for Cloud and Edge Computing In...
VE-Match: Video Encoding Matching-based Model for Cloud and Edge Computing In...VE-Match: Video Encoding Matching-based Model for Cloud and Edge Computing In...
VE-Match: Video Encoding Matching-based Model for Cloud and Edge Computing In...
 
Energy Consumption in Video Streaming: Components, Measurements, and Strategies
Energy Consumption in Video Streaming: Components, Measurements, and StrategiesEnergy Consumption in Video Streaming: Components, Measurements, and Strategies
Energy Consumption in Video Streaming: Components, Measurements, and Strategies
 
Exploring the Energy Consumption of Video Streaming: Components, Challenges, ...
Exploring the Energy Consumption of Video Streaming: Components, Challenges, ...Exploring the Energy Consumption of Video Streaming: Components, Challenges, ...
Exploring the Energy Consumption of Video Streaming: Components, Challenges, ...
 
Video Coding Enhancements for HTTP Adaptive Streaming Using Machine Learning
Video Coding Enhancements for HTTP Adaptive Streaming Using Machine LearningVideo Coding Enhancements for HTTP Adaptive Streaming Using Machine Learning
Video Coding Enhancements for HTTP Adaptive Streaming Using Machine Learning
 
Optimizing QoE and Latency of Live Video Streaming Using Edge Computing a...
Optimizing  QoE and Latency of  Live Video Streaming Using  Edge Computing  a...Optimizing  QoE and Latency of  Live Video Streaming Using  Edge Computing  a...
Optimizing QoE and Latency of Live Video Streaming Using Edge Computing a...
 
SARENA: SFC-Enabled Architecture for Adaptive Video Streaming Applications
SARENA: SFC-Enabled Architecture for Adaptive Video Streaming ApplicationsSARENA: SFC-Enabled Architecture for Adaptive Video Streaming Applications
SARENA: SFC-Enabled Architecture for Adaptive Video Streaming Applications
 
Immersive Video Delivery: From Omnidirectional Video to Holography
Immersive Video Delivery: From Omnidirectional Video to HolographyImmersive Video Delivery: From Omnidirectional Video to Holography
Immersive Video Delivery: From Omnidirectional Video to Holography
 
LLL-CAdViSE: Live Low-Latency Cloud-based Adaptive Video Streaming Evaluation...
LLL-CAdViSE: Live Low-Latency Cloud-based Adaptive Video Streaming Evaluation...LLL-CAdViSE: Live Low-Latency Cloud-based Adaptive Video Streaming Evaluation...
LLL-CAdViSE: Live Low-Latency Cloud-based Adaptive Video Streaming Evaluation...
 

Recently uploaded

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 

Recently uploaded (20)

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 

Content-adaptive Video Coding for HTTP Adaptive Streaming

  • 1. Content-adaptive Video Coding for HTTP Adaptive Streaming Vignesh V Menon Alpen-Adria-Universität, Klagenfurt, Austria Supervisor : Univ.-Prof. DI Dr. Christian Timmerer, Alpen-Adria-Universität, Klagenfurt, Austria Advisor : Assoc.-Prof. DI Dr. Klaus Schoeffmann, Alpen-Adria-Universität, Klagenfurt, Austria Date : Jan 15, 2024 Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 1
  • 2. Outline 1 Introduction 2 Video complexity analysis 3 Online per-title encoding 4 Live variable bitrate encoding 5 Conclusions and Future Directions Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 2
  • 3. Introduction Introduction Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 3
  • 4. Introduction Introduction HTTP Adaptive Streaming (HAS)1 Source: https://bitmovin.com/adaptive-streaming/ Why Adaptive Streaming? Adapt for a wide range of devices. Adapt for a broad set of Internet speeds. 1 A. Bentaleb et al. “A Survey on Bitrate Adaptation Schemes for Streaming Media Over HTTP”. In: IEEE Communications Surveys Tutorials 21.1 (2019), pp. 562–585. Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 4
  • 5. Introduction Introduction HTTP Adaptive Streaming (HAS) Network Bandwidth Time Display Received representation HTTP server Bitrate ladder Encoders . . . . . . . . . . . . Encoded representations bitrate increase Video source Input video segment Figure: HTTP adaptive streaming (HAS) concept. What HAS does? Each source video is split into segments. Encoded at multiple bitrates, resolutions, and codecs. Delivered to the client based on the device capability, network speed etc. Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 5
  • 6. Introduction Research questions 1 How to adapt the speed and compression performance of the video encoder based on content complexity? Determine the video’s content-adaptive spatial and temporal features, which are used to influ- ence encoder decisions like slice-type, quantization parameter, block partitioning, etc. By leveraging CAE algorithms, adaptive bitrate encoding, and intelligent analysis, an encoder’s speed, and compression performance can be effectively adapted to the complexity of the video content, leading to optimized encoding results.2 2 How to improve the compression efficiency of bitrate ladder encoding in live-streaming applications? Minimize the time to compute the convex hull for each title by analyzing the video content complexity features.3 Dynamically configure the encoding parameters on the fly to sustain a target encoding speed according to the content for efficient live streaming.4 2 Sriram Sethuraman, Nithya V. S., and Venkata Narayanababu Laveti D. “Non-iterative Content-Adaptive Distributed Encoding Through ML Techniques”. In: SMPTE 2017 Annual Technical Conference and Exhibition. 2017, pp. 1–8. doi: 10.5594/M001783. url: https://doi.org/10.5594/M001783. 3 J. De Cock et al. “Complexity-based consistent-quality encoding in the cloud”. In: 2016 IEEE International Conference on Image Processing (ICIP). 2016, pp. 1484–1488. doi: 10.1109/ICIP.2016.7532605. url: https://doi.org/10.1109/ICIP.2016.7532605. 4 Pradeep Ramachandran et al. “Content adaptive live encoding with open source codecs”. In: Proceedings of the 11th ACM Multimedia Systems Conference. May 2020, pp. 345–348. doi: 10.1145/3339825.3393580. url: https://doi.org/10.1145/3339825.3393580. Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 6
  • 7. Introduction Research questions Research questions 3 How to provide fast and rate-efficient multi-bitrate and multi-resolution bitrate ladder en- coding in adaptive streaming applications? Sharing encoder analysis information such as motion vectors, scene complexity, and frame types across different representations can avoid redundant calculations and minimize encoding operations, improving encoding efficiency and reducing computational overhead5, .6 Facilitate adaptive streaming optimizations on the server side, enabling efficient resource allo- cation and bandwidth management. 5 J. De Praeter et al. “Fast simultaneous video encoder for adaptive streaming”. In: 2015 IEEE 17th International Workshop on Multimedia Signal Processing (MMSP). Oct. 2015, pp. 1–6. doi: 10.1109/MMSP.2015.7340802. url: https://doi.org/10.1109/MMSP.2015.7340802. 6 Vignesh V Menon et al. “EMES: Efficient Multi-Encoding Schemes for HEVC-Based Adaptive Bitrate Streaming”. In: ACM Trans. Multimedia Comput. Commun. Appl. New York, NY, USA: Association for Computing Machinery, Dec. 2022. doi: 10.1145/3575659. url: https://doi.org/10.1145/3575659. Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 7
  • 8. Introduction Target of this study Target of this study Figure: The ideal video compression system for HAS targeted in this dissertation. The input video segment undergoes scene detection and video complexity feature extraction; from the extracted features and the configured targets (set of resolutions, set of framerates, minimum and maximum bitrate, maximum quality, target JND function, target encoding speed, and target encoder/codec), the system predicts an optimized bitrate ladder and encoding parameters for the encoders (RQ 1, 2, and 3). Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 8
  • 9. Introduction Thesis organization Thesis organization Figure: Thesis organization. Contribution class 1, video complexity analysis (RQ 1, 2): Chapter 2, video complexity analyzer. Contribution class 2, content-adaptive encoding optimizations (RQ 1): Chapter 3, scene detection algorithm and fast intra CU depth prediction algorithm. Contribution class 3, online per-title encoding optimizations (RQ 1, 2): Chapter 4, online resolution prediction scheme, online framerate prediction scheme, online encoding preset prediction scheme, and just noticeable difference aware bitrate ladder prediction scheme; Chapter 5, live variable bitrate encoding scheme. Contribution class 4, multi-encoding optimizations (RQ 3): Chapter 6, efficient multi-encoding schemes. The nine contributions target improved compression efficiency and improved encoding speed. Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 9
  • 10. Video complexity analysis Video complexity analysis Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 10
  • 11. Video complexity analysis Introduction Introduction Video complexity analysis is a critical step for numerous applications. Content-based retrieval: in multimedia archives, surveillance systems, and digital li- braries7 Video summarization: in video browsing, news aggregation, and event documentation8,9 Action recognition and scene understanding: in domains ranging from sports analytics and surveillance to robotics and human-computer interaction10 Quality assessment: in streaming services, video conferencing, and multimedia content distribution11,12 7 Wei Jiang et al. “Similarity-based online feature selection in content-based image retrieval”. In: IEEE Transactions on Image Processing. Vol. 15. 3. 2006, pp. 702–712. doi: 10.1109/TIP.2005.863105. url: https://doi.org/10.1109/TIP.2005.863105. 8 Parul Saini et al. “Video summarization using deep learning techniques: a detailed analysis and investigation”. In: Artificial Intelligence Review. Mar. 2023. doi: 10.1007/s10462-023-10444-0. url: https://doi.org/10.1007/s10462-023-10444-0. 9 Naveed Ejaz, Tayyab Bin Tariq, and Sung Wook Baik. “Adaptive key frame extraction for video summarization using an aggregation mechanism”. In: Journal of Visual Communication and Image Representation. Vol. 23. 7. 2012, pp. 1031–1040. doi: https://doi.org/10.1016/j.jvcir.2012.06.013. url: https://www.sciencedirect.com/science/article/pii/S1047320312001095. 10 N. Barman et al. “No-Reference Video Quality Estimation Based on Machine Learning for Passive Gaming Video Streaming Applications”. In: IEEE Access. Vol. 7. 2019, pp. 74511–74527. doi: 10.1109/ACCESS.2019.2920477. url: https://doi.org/10.1109/ACCESS.2019.2920477. 11 S. Zadtootaghaj et al. “NR-GVQM: A No Reference Gaming Video Quality Metric”. In: 2018 IEEE International Symposium on Multimedia (ISM). 2018, pp. 131–134. doi: 10.1109/ISM.2018.00031. url: https://doi.org/10.1109/ISM.2018.00031. 12 S. Göring, R. Rao, and A. Raake. “nofu — A Lightweight No-Reference Pixel Based Video Quality Model for Gaming Content”. In: 2019 Eleventh International Conference on Quality of Multimedia Experience (QoMEX). 2019, pp. 1–6. doi: 10.1109/QoMEX.2019.8743262. url: https://doi.org/10.1109/QoMEX.2019.8743262. Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 11
  • 12. Video complexity analysis State-of-the-art features State-of-the-art features13 Spatial information (SI) SI indicates the peak spatial detail present within a video. SI = max{std[Sobel(F(p))]} (1) Temporal information (TI) TI is the maximum temporal variation observed between consecutive frames of a video sequence. D(p) = F(p) − F(p − 1) (2) TI = max{std[D(p)]} (3) 13 ITU-T. “P.910 : Subjective video quality assessment methods for multimedia applications”. In: Nov. 2021. url: https://www.itu.int/rec/T-REC-P.910-202111-I/en. Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 12
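For readers who want to reproduce the SI/TI baseline of Eqs. (1)-(3), a minimal Python sketch is given below; the function name and the assumption that frames arrive as 2-D luma arrays are illustrative choices, not part of the dissertation's tooling.

import numpy as np
from scipy import ndimage

def si_ti(frames):
    """Compute ITU-T P.910 SI and TI over an iterable of 2-D float luma frames."""
    si_values, ti_values = [], []
    prev = None
    for f in frames:
        gx = ndimage.sobel(f, axis=0)
        gy = ndimage.sobel(f, axis=1)
        si_values.append(np.std(np.hypot(gx, gy)))      # std of Sobel magnitude per frame, Eq. (1)
        if prev is not None:
            ti_values.append(np.std(f - prev))           # std of the frame difference, Eqs. (2)-(3)
        prev = f
    return max(si_values), max(ti_values) if ti_values else 0.0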
  • 13. Video complexity analysis Proposed features Proposed features The luma and chroma brightness of non-overlapping blocks k of size w × w pixels for each frame p is defined as: L_{c,p,k} = sqrt(DCT_c(1, 1)), ∀c ∈ [0, 2] (4) A DCT-energy function is introduced to determine the luma and chroma texture of every non-overlapping block k in each frame p, which is defined as: H_{c,p,k} = Σ_{i=1}^{w} Σ_{j=1}^{w} e^{((i·j / w²)² − 1)} |DCT_c(i − 1, j − 1)|, ∀c ∈ [0, 2] (5) where DCT_c(i, j) is the (i, j)th DCT coefficient when i + j > 2, and 0 otherwise.14 The block-wise texture per frame is averaged to determine the luma and chroma texture features (EY, EU, EV) per video segment. 14 Vignesh V Menon et al. “VCA: Video Complexity Analyzer”. In: Proceedings of the 13th ACM Multimedia Systems Conference. MMSys ’22. Athlone, Ireland: Association for Computing Machinery, 2022, 259–264. isbn: 9781450392839. doi: 10.1145/3524273.3532896. url: https://doi.org/10.1145/3524273.3532896. Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 13
  • 14. Video complexity analysis Proposed features Proposed features The block-wise difference of the luma texture energy of each frame compared to its previous frame is calculated as: h_{p,k} = |H_{Y,p,k} − H_{Y,p−1,k}| / w² (6) (a) original frame (b) EY (c) h (d) LY Figure: Heatmap depiction of the luma texture information {EY, h, LY} extracted from the second frame of CoverSong 1080P 0a86 video of Youtube UGC Dataset.15 15 Yilin Wang, Sasi Inguva, and Balu Adsumilli. “YouTube UGC Dataset for Video Compression Research”. In: 2019 IEEE 21st International Workshop on Multimedia Signal Processing (MMSP). Sept. 2019. doi: 10.1109/mmsp.2019.8901772. url: https://doi.org/10.1109/mmsp.2019.8901772. Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 14
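A minimal sketch of Eqs. (4)-(6) for the luma channel follows, assuming frames arrive as 2-D numpy arrays and using scipy's dctn for the block DCT; the block size, helper names, and the per-frame averaging into EY, h, and LY are illustrative simplifications, not the exact VCA implementation.

import numpy as np
from scipy.fft import dctn

def luma_features(frames, w=32):
    """Return per-segment (E_Y, h, L_Y), averaged over blocks and frames."""
    i, j = np.meshgrid(np.arange(1, w + 1), np.arange(1, w + 1), indexing="ij")
    weights = np.exp(((i * j) / w**2) ** 2 - 1)               # frequency weighting of Eq. (5)
    E, hs, L = [], [], []
    prev = None
    for f in frames:
        rows, cols = f.shape
        energies, brightness = [], []
        for y in range(0, rows - rows % w, w):
            for x in range(0, cols - cols % w, w):
                d = dctn(f[y:y + w, x:x + w].astype(np.float64), norm="ortho")
                brightness.append(np.sqrt(np.abs(d[0, 0])))    # block brightness, Eq. (4)
                d[0, 0] = 0.0                                   # DC term excluded in Eq. (5)
                energies.append(np.sum(weights * np.abs(d)))    # block texture H_{Y,p,k}, Eq. (5)
        energies = np.asarray(energies)
        E.append(energies.mean())
        L.append(float(np.mean(brightness)))
        if prev is not None:
            hs.append(np.mean(np.abs(energies - prev)) / w**2)  # temporal texture difference h, Eq. (6)
        prev = energies
    return float(np.mean(E)), float(np.mean(hs)) if hs else 0.0, float(np.mean(L))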
  • 15. Video complexity analysis Analysis of features Analysis of features Figure: PCC between the spatial complexity features (SI and EY) and bitrate in All Intra configuration16 with medium preset of x264 and x265 encoders for the VCD dataset.17 (a) x264: SI = 0.17/0.24/0.30/0.34 and EY = 0.86/0.86/0.85/0.84 at QP22/QP27/QP32/QP37; (b) x265: SI = 0.18/0.24/0.32/0.37 and EY = 0.86/0.88/0.87/0.85. Bitrate in AI configuration is considered the spatial complexity’s ground truth. EY correlates better with the spatial complexity than the state-of-the-art SI feature. 16 F. Bossen. “Common test conditions and software reference configurations”. In: JCTVC-L1100. Vol. 12. 2013, p. 7. url: http://phenix.it-sudparis.eu/jct/doc_end_user/current_document.php?id=7281. 17 Hadi Amirpour et al. “VCD: Video Complexity Dataset”. In: Proceedings of the 13th ACM Multimedia Systems Conference. MMSys ’22. New York, NY, USA: Association for Computing Machinery, 2022. isbn: 9781450392839. doi: 10.1145/3524273.3532892. url: https://doi.org/10.1145/3524273.3532892. Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 15
  • 16. Video complexity analysis Analysis of features Analysis of features Figure: PCC between the spatial complexity features (SI and EY) and temporal features (TI and h) with bitrate in the Low Delay P picture (LDP) configuration with the (a) veryslow, (b) medium, and (c) ultrafast presets of the x265 encoder for the VCD dataset. The correlation of EY with bitrate increases as QP decreases. Conversely, the correlation of h with bitrate decreases as QP decreases. EY and h correlate well with the LDP configuration’s RD complexity and encoding run-time complexity. Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 16
  • 17. Video complexity analysis Analysis of features Analysis of features Figure: PCC between the spatial complexity features (SI and EY) and temporal features (TI and h) with encoding time in the Low Delay P picture (LDP) configuration with the (a) veryslow, (b) medium, and (c) ultrafast presets of the x265 encoder for the VCD dataset. Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 17
  • 18. Video complexity analysis Analysis of features Analysis of features Figure: PCC between the spatial complexity features across multiple resolutions (2160p, 1080p, 720p) for the VCD dataset. (a) SI: 2160p-1080p 0.43, 2160p-720p 0.45, 1080p-720p 0.82; (b) EY: 2160p-1080p 0.94, 2160p-720p 0.91, 1080p-720p 0.99. EY exhibits better correlation across resolutions, facilitating optimizations, including computations in lower resolutions. Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 18
  • 19. Video complexity analysis Performance optimizations Performance optimizations18 1 x86 SIMD optimization 2 Multi-threading optimization 3 Low-pass analysis optimization Figure: Speed (in fps) of the proposed video complexity analysis using various performance optimizations, comparing SITI, VCA without optimizations, VCA with SIMD optimization, VCA with SIMD optimization and 2, 4, or 8 threads, and VCA with SIMD optimization, 8 threads, and low-pass DCT optimization. 18 Vignesh V Menon et al. “Green Video Complexity Analysis for Efficient Encoding in Adaptive Video Streaming”. In: Proceedings of the First International Workshop on Green Multimedia Systems. GMSys ’23. Vancouver, BC, Canada: Association for Computing Machinery, 2023, 16–18. isbn: 9798400701962. doi: 10.1145/3593908.3593942. url: https://doi.org/10.1145/3593908.3593942. Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 19
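The low-pass analysis idea can be pictured as evaluating only the lowest-frequency portion of each block's DCT; the sketch below is an assumption-laden illustration of that principle (it still computes the full transform, whereas a real implementation would compute only the retained coefficients), not the optimization implemented in VCA.

import numpy as np
from scipy.fft import dctn

def lowpass_block_energy(block, keep=0.5):
    """Approximate block texture energy from the low-frequency DCT quadrant only."""
    w = block.shape[0]
    k = max(1, int(w * keep))                 # number of low-frequency rows/columns to keep
    d = dctn(block.astype(np.float64), norm="ortho")[:k, :k]
    d[0, 0] = 0.0                             # ignore the DC (brightness) term
    return float(np.sum(np.abs(d)))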
  • 20. Online per-title encoding Online per-title encoding Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 20
  • 21. Online per-title encoding Dynamic resolution encoding Dynamic resolution encoding Dynamic resolution per-title encoding schemes are based on the fact that one resolution performs better than others in a scene for a given bitrate range, and these regions depend on the video complexity. Dynamic resolution prediction seeks to strike an equilibrium between delivering optimal visual quality and conserving bandwidth resources. The adaptive streaming system can anticipate the ideal resolution for each segment in real- time by harnessing predictive models, often based on historical data19 or machine learning techniques.20 Dynamic resolution optimization approaches developed in the industry– from Bitmovin,21 MUX,22 and CAMBRIA23 are proprietary. 19 Venkata Phani Kumar M, Christian Timmerer, and Hermann Hellwagner. “MiPSO: Multi-Period Per-Scene Optimization For HTTP Adaptive Streaming”. In: 2020 IEEE International Conference on Multimedia and Expo (ICME). 2020, pp. 1–6. doi: 10.1109/ICME46284.2020.9102775. 20 Madhukar Bhat, Jean-Marc Thiesse, and Patrick Le Callet. “Combining Video Quality Metrics To Select Perceptually Accurate Resolution In A Wide Quality Range: A Case Study”. In: 2021 IEEE International Conference on Image Processing (ICIP). 2021, pp. 2164–2168. doi: 10.1109/ICIP42928.2021.9506310. 21 Gernot Zwantschko. “What is Per-Title Encoding? How to Efficiently Compress Video”. In: Bitmovin Developers Blog. Nov. 2020. url: https://bitmovin.com/what-is-per-title-encoding/. 22 Jon Dahl. “Instant Per-Title Encoding”. In: Mux Video Education Blog. Apr. 2018. url: https://www.mux.com/blog/instant-per-title-encoding. 23 Capella. “Save Bandwidth and Improve Viewer Quality of Experience with Source Adaptive Bitrate Ladders”. In: CAMBRIA FTC. url: https://capellasystems.net/wp-content/uploads/2021/01/CambriaFTC_SABL.pdf. Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 21
  • 22. Online per-title encoding Dynamic resolution encoding Convex-hull estimation (a) (b) Figure: Rate-Distortion (RD) curves using VMAF24 as the quality metric of (a) Beauty and Golf sequences of UVG and BVI datasets encoded at 540p and 1080p resolutions, and (b) Lake sequence of MCML dataset encoded at a set of bitrates and resolutions to determine the convex hull. 24 Zhi Li et al. “VMAF: The Journey Continues”. In: Netflix Technology Blog. Oct. 2018. url: https://netflixtechblog.com/vmaf-the-journey-continues-44b51ee9ed12. Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 22
  • 23. Online per-title encoding Dynamic resolution encoding Online Resolution Prediction Scheme (ORPS) architecture25 Figure: Encoding using ORPS for adaptive live streaming: (1) video complexity feature extraction on the input video segment, (2) optimized resolution prediction from the extracted features, the set of resolutions, the set of bitrates, and the target encoder/codec, and (3) CBR encoding of the predicted representations. The encoding process is carried out only for the predicted bitrate-resolution pairs for each segment as constant bitrate (CBR) encodings, thereby eliminating the need to encode at all bitrates and resolutions to find the optimized bitrate-resolution pairs that yield maximum VMAF. 25 V. V. Menon et al. “OPTE: Online Per-Title Encoding for Live Video Streaming”. In: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2022, pp. 1865–1869. doi: 10.1109/ICASSP43922.2022.9746745. url: https://doi.org/10.1109/ICASSP43922.2022.9746745. Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 23
  • 24. Online per-title encoding Dynamic resolution encoding Proposed optimized resolution estimation Figure: Optimized resolution (r̂t) prediction for a given target bitrate (bt). v̂t is the maximum value among the v̂(r, bt) values output by the prediction models trained for resolutions r1, . . . , rr̃. The resolution corresponding to the maximum predicted VMAF is chosen as r̂t. VMAF is modeled as a function of the spatiotemporal features {EY, h, LY}, target resolution (rt), target encoding bitrate (bt), encoding framerate (ft), and encoding preset (pt): v(rt, bt, ft, pt) = fV(EY, h, LY, rt, bt, ft, pt) (7) Random Forest regression models are trained for every supported resolution to predict VMAF. Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 24
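A sketch of how such per-resolution Random Forest models could be queried at run time is shown below; the model files, feature ordering, and function names are hypothetical, and the same argmax pattern also underlies the framerate prediction presented later.

import joblib
import numpy as np

RESOLUTIONS = [360, 432, 540, 720, 1080, 1440, 2160]
# hypothetical per-resolution regressors (e.g. sklearn RandomForestRegressor) trained offline
MODELS = {r: joblib.load(f"vmaf_model_{r}p.joblib") for r in RESOLUTIONS}

def predict_resolution(E_Y, h, L_Y, target_bitrate, framerate, preset):
    """Return (resolution, predicted VMAF) for one target bitrate, Eq. (7) as a learned function."""
    best_r, best_v = None, -1.0
    for r in RESOLUTIONS:
        x = np.array([[E_Y, h, L_Y, r, target_bitrate, framerate, preset]])
        v_hat = float(MODELS[r].predict(x)[0])      # predicted VMAF for this candidate resolution
        if v_hat > best_v:
            best_r, best_v = r, v_hat
    return best_r, best_v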
  • 25. Online per-title encoding Dynamic resolution encoding Evaluation of ORPS Table: Experimental parameters used to evaluate ORPS. Set of resolution heights (in pixels) R: {360, 432, 540, 720, 1080, 1440, 2160}; target bitrates (in Mbps) B: {0.145, 0.300, 0.600, 0.900, 1.600, 2.400, 3.400, 4.500, 5.800, 8.100, 11.600, 16.800}; set of framerates (in fps) F: {30, 50, 60}; set of presets [x265] P: {0 (ultrafast)}. Figure: The resolution predictions of representative video segments (a) Bunny s000, (b) Characters s000, (c) HoneyBee s000, and (d) Wood s000. HLS CBR encoding is represented using the green line, and ORPS CBR encoding is represented using the red line. Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 25
  • 26. Online per-title encoding Dynamic resolution encoding Evaluation of ORPS Figure: RD curves of representative segments (a) Bunny s000, (b) Characters s000, (c) HoneyBee s000, and (d) Wood s000 for HLS CBR and ORPS CBR encoding. Table: Results of ORPS against HLS bitrate ladder CBR encoding.
Dataset | Video | f | SI | TI | EY | h | BDRV | BDRP
SJTU | BundNightScape | 30 | 48.82 | 7.06 | 54.90 | 11.62 | -61.22% | -60.86%
SJTU | Fountains | 30 | 43.37 | 11.42 | 60.90 | 23.02 | -32.93% | -8.49%
SJTU | TrafficFlow | 30 | 33.57 | 13.80 | 58.93 | 15.83 | -50.54% | -40.90%
SJTU | TreeShade | 30 | 52.88 | 5.29 | 80.19 | 8.83 | -47.76% | -38.55%
SVT | CrowdRun | 50 | 50.77 | 22.33 | 96.55 | 33.33 | -8.50% | -1.90%
SVT | DucksTakeOff | 50 | 47.77 | 15.10 | 119.12 | 30.88 | -2.99% | -2.79%
SVT | IntoTree | 50 | 24.41 | 12.09 | 74.45 | 21.95 | -26.50% | -5.75%
SVT | OldTownCross | 50 | 29.66 | 11.62 | 92.75 | 22.06 | -30.91% | -22.53%
SVT | ParkJoy | 50 | 62.78 | 27.00 | 102.80 | 52.1 | -12.08% | -2.62%
JVET | CatRobot | 60 | 44.45 | 11.84 | 56.36 | 14.25 | -13.43% | -5.95%
JVET | DaylightRoad2 | 60 | 40.51 | 16.21 | 66.40 | 20.13 | -27.52% | -9.35%
JVET | FoodMarket4 | 60 | 38.26 | 17.68 | 50.71 | 20.71 | -18.11% | -3.74%
On average, ORPS necessitates 17.28% fewer bits to uphold identical PSNR values while requiring 22.79% fewer bits to retain the same VMAF values compared to the HLS bitrate ladder CBR encoding. Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 26
  • 27. Online per-title encoding Dynamic framerate encoding Motivation for variable framerate (VFR)26 encoding Figure: RD curves of UHD encodings of two representative HFR sequences, (a) HoneyBee and (b) Lips, from the UVG dataset at framerates of 24, 30, 60, and 120 fps. Dynamic framerate per-title encoding schemes are based on the fact that one framerate performs better than others in a scene for a given bitrate range, and these regions depend on the video complexity. 26 Alex Mackin et al. “Investigating the impact of high frame rates on video compression”. In: 2017 IEEE International Conference on Image Processing (ICIP). 2017, pp. 295–299. doi: 10.1109/ICIP.2017.8296290. url: https://doi.org/10.1109/ICIP.2017.8296290. Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 27
  • 28. Online per-title encoding Dynamic framerate encoding State-of-the-art VFR encoding Figure: Block diagram of a variable framerate (VFR) coding scheme27 in the context of video encoding for adaptive streaming. This example encodes a video segment of UHD resolution and native framerate 120fps in representation (1080p, 8.1 Mbps) with the selected framerate of 60 fps. It also illustrates the corresponding operations on the client side. Red dashed blocks indicate the additional steps introduced compared to the traditional bitrate ladder encoding. 27 G. Herrou et al. “Quality-driven Variable Frame-Rate for Green Video Coding in Broadcast Applications”. In: IEEE Transactions on Circuits and Systems for Video Technology. 2020, pp. 1–1. doi: 10.1109/TCSVT.2020.3046881. url: https://doi.org/10.1109/TCSVT.2020.3046881. Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 28
  • 29. Online per-title encoding Dynamic framerate encoding Online Framerate Prediction Scheme (OFPS) architecture28 Figure: Encoding architecture using OFPS for adaptive live streaming: (1) video complexity feature extraction on the input video segment, (2) optimized framerate prediction from the extracted features, the set of representations, the set of framerates, and the target encoder/codec, and (3) CBR encoding of the resulting representations. 28 V. V. Menon et al. “CODA: Content-aware Frame Dropping Algorithm for High Frame-rate Video Streaming”. In: 2022 Data Compression Conference (DCC). 2022, pp. 475–475. doi: 10.1109/DCC52660.2022.00086. url: https://doi.org/10.1109/DCC52660.2022.00086. Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 29
  • 30. Online per-title encoding Dynamic framerate encoding Proposed optimized framerate estimation Figure: Optimized framerate (f̂t) prediction for a given target representation (rt, bt). v̂t is the maximum value among the v̂(rt, bt, f) values output by the prediction models trained for framerates f1, . . . , ff̃. The framerate corresponding to the maximum predicted VMAF is chosen as f̂t. Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 30
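Analogously to the resolution case, the framerate decision can be sketched as below, with hypothetical per-framerate models and the constraint that a segment is only temporally downsampled, never upsampled beyond its native framerate; names and model files are assumptions for illustration.

import joblib
import numpy as np

FRAMERATES = [20, 24, 30, 40, 60, 90, 120]
# hypothetical per-framerate VMAF regressors trained offline
FR_MODELS = {f: joblib.load(f"vmaf_model_{f}fps.joblib") for f in FRAMERATES}

def predict_framerate(features, resolution, target_bitrate, native_fps):
    """Return the framerate with the highest predicted VMAF for one (resolution, bitrate) pair."""
    candidates = [f for f in FRAMERATES if f <= native_fps]   # never exceed the source framerate
    def score(f):
        x = np.array([[*features, resolution, target_bitrate, f]])
        return float(FR_MODELS[f].predict(x)[0])
    return max(candidates, key=score)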
  • 31. Online per-title encoding Dynamic framerate encoding Evaluation of OFPS Table: Experimental parameters used to evaluate OFPS. Set of resolution heights (in pixels) R: {1080, 2160}; set of target bitrates (in Mbps) B: {0.145, 0.300, 0.600, 0.900, 1.600, 2.400, 3.400, 4.500, 5.800, 8.100, 11.600, 16.800}; set of framerates (in fps) F: {20, 24, 30, 40, 60, 90, 120}; set of presets [x265] P: {8 (veryslow)}. Figure: Optimized framerate prediction results (ground truth fG and predicted framerate) of representative sequences (a) Beauty, (b) HoneyBee, and (c) ShakeNDry of the UVG dataset. Please note that the optimized framerate at various bitrates differs depending on the content complexity. Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 31
  • 32. Online per-title encoding Dynamic framerate encoding Evaluation of OFPS Figure: RD curves and encoding times of representative sequences (a) Beauty, (b) HoneyBee, and (c) ShakeNDry of the UVG dataset, comparing the default 120fps encoding with OFPS (VFR) encoding. Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 32
  • 33. Online per-title encoding Dynamic framerate encoding Evaluation of OFPS Table: Results of OFPS-based encodings.
Dataset | Resolution | Video | fmax | BDRP | BDRV | ∆T
JVET | 1080p | CatRobot | 60 | -11.23% | -12.22% | -23.93%
JVET | 1080p | DaylightRoad2 | 60 | -9.24% | -8.93% | -9.33%
JVET | 1080p | FoodMarket4 | 60 | -5.73% | -7.12% | -10.80%
JVET | 2160p | CatRobot | 60 | -13.91% | -15.43% | -27.62%
JVET | 2160p | DaylightRoad2 | 60 | -10.36% | -11.21% | -12.97%
JVET | 2160p | FoodMarket4 | 60 | -7.37% | -6.91% | -12.01%
UVG | 1080p | Beauty | 120 | -8.18% | -20.01% | -18.01%
UVG | 1080p | Bosphorus | 120 | -15.66% | -17.58% | -23.03%
UVG | 1080p | Lips | 120 | 0.00% | 0.00% | 0.00%
UVG | 1080p | HoneyBee | 120 | -16.96% | -10.87% | -30.11%
UVG | 1080p | Jockey | 120 | -0.10% | -1.22% | -5.45%
UVG | 1080p | ReadySteadyGo | 120 | -2.32% | -5.00% | -19.76%
UVG | 1080p | ShakeNDry | 120 | -11.15% | -34.41% | -25.59%
UVG | 1080p | YachtRide | 120 | -18.35% | -9.15% | -12.17%
UVG | 2160p | Beauty | 120 | -18.97% | -24.83% | -38.43%
UVG | 2160p | Bosphorus | 120 | -27.63% | -26.93% | -23.90%
UVG | 2160p | Lips | 120 | -27.12% | -34.22% | -19.13%
UVG | 2160p | HoneyBee | 120 | -36.14% | -42.37% | -28.91%
UVG | 2160p | Jockey | 120 | -2.92% | -2.20% | -13.47%
UVG | 2160p | ReadySteadyGo | 120 | -0.36% | -2.92% | -18.30%
UVG | 2160p | ShakeNDry | 120 | -22.82% | -28.46% | -33.66%
UVG | 2160p | YachtRide | 120 | -7.01% | -4.69% | -11.60%
BVI-HFR | 1080p | catch | 120 | -7.84% | -8.99% | -14.88%
BVI-HFR | 1080p | golf side | 120 | -4.10% | -3.23% | -9.87%
Average (1080p) | | | | -8.53% | -10.67% | -15.61%
Average (2160p) | | | | -15.87% | -18.20% | -21.82%
On average, UHD encoding using OFPS requires 15.87% fewer bits to maintain the same PSNR and 18.20% fewer bits to keep the same VMAF as compared to the original framerate encoding. An overall encoding time reduction of 21.82% is also observed. Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 33
  • 34. Online per-title encoding Perceptually-aware bitrate ladder prediction Perceptual redundancy in bitrate ladder Figure: Rate-distortion curves of (a) the HLS bitrate ladder encoding of Characters sequence of MCML dataset,29 and (b) the ideal bitrate ladder targeted, in which the quality of successive ladder points (b1, . . . , b7) increases by one JND at a time, i.e., vt = vt−1 + vJ(vt−1), up to vmax. Having many perceptually redundant representations for the bitrate ladder may not result in improved quality of experience, but it may lead to increased storage and bandwidth costs.30 29 Manri Cheon and Jong-Seok Lee. “Subjective and Objective Quality Assessment of Compressed 4K UHD Videos for Immersive Experience”. In: IEEE Transactions on Circuits and Systems for Video Technology. Vol. 28. 7. 2018, pp. 1467–1480. doi: 10.1109/TCSVT.2017.2683504. url: https://doi.org/10.1109/TCSVT.2017.2683504. 30 Tianchi Huang, Rui-Xiao Zhang, and Lifeng Sun. “Deep Reinforced Bitrate Ladders for Adaptive Video Streaming”. In: NOSSDAV ’21. Istanbul, Turkey: Association for Computing Machinery, 2021, 66–73. isbn: 9781450384353. doi: 10.1145/3458306.3458873. url: https://doi.org/10.1145/3458306.3458873. Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 34
  • 35. Online per-title encoding Perceptually-aware bitrate ladder prediction JND-aware bitrate ladder prediction scheme (JBLS) architecture31 Figure: Online encoding architecture using JBLS for adaptive streaming: (1) video complexity feature extraction on the input video segment, (2) bitrate ladder prediction from the extracted features, the sets of resolutions and framerates, the minimum and maximum bitrate, the maximum VMAF, the target JND function, and the target encoder/codec, followed by CBR encoding of the predicted representations. 31 V. V. Menon et al. “Perceptually-Aware Per-Title Encoding for Adaptive Video Streaming”. In: 2022 IEEE International Conference on Multimedia and Expo (ICME). Los Alamitos, CA, USA: IEEE Computer Society, July 2022, pp. 1–6. doi: 10.1109/ICME52920.2022.9859744. url: https://doi.ieeecomputersociety.org/10.1109/ICME52920.2022.9859744. Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 35
  • 36. Online per-title encoding Perceptually-aware bitrate ladder prediction First RD point estimation Figure: Estimation of the first point of the bitrate ladder. v̂1 is the maximum value among the v̂(r, f, b̂1) values output by the prediction models trained for resolutions r1, . . . , rr̃ in R and framerates f1, . . . , ff̃ in F. The resolution-framerate pair corresponding to the VMAF v̂1 is chosen as (r̂1, f̂1). Step 1: b̂1 ← bmin; determine v̂(r, f, b̂1) ∀r ∈ R, f ∈ F; v̂1 ← max(v̂(r, f, b̂1)); (r̂1, f̂1) ← arg max over r ∈ R, f ∈ F of v̂(r, f, b̂1). (r̂1, f̂1, b̂1) is the first point of the bitrate ladder. Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 36
  • 37. Online per-title encoding Perceptually-aware bitrate ladder prediction Remaining RD points estimation Figure: Estimation of the tth point (t ≥ 2) of the bitrate ladder. log(b̂t) is the minimum value among the log(b̂(r, v̂t)) values output by the prediction models trained for resolutions r1, . . . , rM. The resolution corresponding to log(b̂t) is chosen as r̂t. Step 2: while b̂t−1 < bmax and v̂t−1 < vmax do: v̂t ← v̂t−1 + vJ(v̂t−1); determine b̂(r, f, v̂t) ∀r ∈ R, f ∈ F; b̂t ← min(b̂(r, f, v̂t)); (r̂t, f̂t) ← arg min over r ∈ R, f ∈ F of b̂(r, f, v̂t); (r̂t, f̂t, b̂t) is the tth point of the bitrate ladder; t ← t + 1. Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 37
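Putting Step 1 and Step 2 together, the ladder construction reduces to the loop sketched below; predict_vmaf and predict_bitrate stand in for the trained models, and all names and the stopping details are placeholders rather than the exact JBLS interface.

def build_ladder(R, F, b_min, b_max, v_max, jnd, predict_vmaf, predict_bitrate):
    """Return a list of (resolution, framerate, bitrate, target VMAF) ladder points."""
    # Step 1: at the minimum bitrate, pick the (resolution, framerate) with the highest predicted VMAF.
    r1, f1 = max(((r, f) for r in R for f in F),
                 key=lambda rf: predict_vmaf(rf[0], rf[1], b_min))
    v = predict_vmaf(r1, f1, b_min)
    ladder = [(r1, f1, b_min, v)]
    b = b_min
    # Step 2: raise the target quality by one JND per point and pick the cheapest (resolution, framerate).
    while b < b_max and v < v_max:
        v = v + jnd(v)
        r, f = min(((r, f) for r in R for f in F),
                   key=lambda rf: predict_bitrate(rf[0], rf[1], v))
        b = predict_bitrate(r, f, v)
        if b > b_max:
            break
        ladder.append((r, f, b, v))
    return ladder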
  • 38. Online per-title encoding Perceptually-aware bitrate ladder prediction Evaluation of JBLS Table: Experimental parameters used to evaluate JBLS. Set of resolution heights (in pixels) R: {360, 432, 540, 720, 1080, 1440, 2160}; set of framerates (in fps) F: {30}; set of presets [x265] P: {0 (ultrafast)}; minimum target bitrate (in Mbps) bmin: 0.145; maximum target bitrate (in Mbps) bmax: 16.8; average target JND vJ: {2, 4, 6}; maximum VMAF threshold vmax: {98, 96, 94}. Figure: RD curves of representative video segments (a) Bunny s000, (b) Characters s000, (c) HoneyBee s000, and (d) Wood s000 using the HLS bitrate ladder CBR encoding (green line) and JBLS encoding (red line). JND is considered as six VMAF points in these plots. Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 38
  • 39. Online per-title encoding Perceptually-aware bitrate ladder prediction Evaluation of JBLS Table: Average results of JBLS compared to the HLS bitrate ladder CBR encoding.
Method | BDRP | BDRV | BD-PSNR | BD-VMAF | ∆S | ∆T
JBLS (vJ=2)32 | -11.06% | -16.65% | 0.87 dB | 2.18 | 10.18% | 105.73%
JBLS (vJ=4) | -10.44% | -15.13% | 0.91 dB | 2.39 | -27.03% | 10.19%
JBLS (vJ=6)33 | -12.94% | -17.94% | 0.94 dB | 2.32 | -42.48% | -25.35%
Live streaming using JBLS requires 12.94% fewer bits to maintain the same PSNR and 17.94% fewer bits to maintain the same VMAF compared to the reference HLS bitrate ladder. The improvement in the compression efficiency is achieved with an average storage reduction of 42.48% and an average encoding time reduction of 25.35% compared to HLS bitrate ladder CBR encoding, considering a JND of six VMAF points. 32 Andreas Kah et al. “Fundamental relationships between subjective quality, user acceptance, and the VMAF metric for a quality-based bit-rate ladder design for over-the-top video streaming services”. In: Applications of Digital Image Processing XLIV. vol. 11842. International Society for Optics and Photonics. SPIE, 2021, 118420Z. doi: 10.1117/12.2593952. url: https://doi.org/10.1117/12.2593952. 33 Jan Ozer. “Finding the Just Noticeable Difference with Netflix VMAF”. In: Sept. 2017. url: https://streaminglearningcenter.com/codecs/finding-the-just-noticeable-difference-with-netflix-vmaf.html. Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 39
  • 40. Live variable bitrate encoding Live variable bitrate encoding Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 40
• 41. Live variable bitrate encoding Two-pass encoding Two-pass encoding
Figure: Two-pass encoding architecture. In the first pass, the encoder produces statistics from the input video; analysis of the first-pass statistics yields the control parameters of the second pass, which produces the final bitstream.
Two-pass encoding introduces adaptability and content awareness into the encoding process.34
In the first pass, the encoder analyzes the entire video sequence to gain insights into its complexity, motion, and spatial detail.
In the second pass, based on the insights from the first pass, the encoder dynamically adjusts the bitrate allocation for each segment, prioritizing quality where needed and optimizing compression elsewhere.35
34 Chengsheng Que, Guobin Chen, and Jilin Liu. “An Efficient Two-Pass VBR Encoding Algorithm for H.264”. In: 2006 International Conference on Communications, Circuits and Systems. Vol. 1. 2006, pp. 118–122. doi: 10.1109/ICCCAS.2006.284599.
35 Ivan Zupancic et al. “Two-pass rate control for UHDTV delivery with HEVC”. In: 2016 Picture Coding Symposium (PCS). 2016, pp. 1–5. doi: 10.1109/PCS.2016.7906322.
Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 41
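For illustration only, a conventional two-pass VBR encode with x265 through FFmpeg can be scripted as below; this is generic two-pass encoding, not the LiveVBR scheme, and the file names and target bitrate are placeholders.

```python
# Illustrative two-pass VBR encode with FFmpeg/libx265 (not the LiveVBR scheme):
# pass 1 analyzes the segment and writes a stats file, pass 2 reuses the stats
# to distribute the target bitrate according to the observed content complexity.
import subprocess

SRC, OUT, BITRATE = "segment.y4m", "segment_2pass.mp4", "3000k"  # placeholders

# First pass: analysis only, no output bitstream is kept.
subprocess.run([
    "ffmpeg", "-y", "-i", SRC, "-c:v", "libx265", "-b:v", BITRATE,
    "-x265-params", "pass=1:stats=x265_stats.log",
    "-f", "null", "/dev/null"
], check=True)

# Second pass: encode using the statistics gathered in the first pass.
subprocess.run([
    "ffmpeg", "-y", "-i", SRC, "-c:v", "libx265", "-b:v", BITRATE,
    "-x265-params", "pass=2:stats=x265_stats.log",
    OUT
], check=True)
```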
• 42. Live variable bitrate encoding Two-pass encoding LiveVBR architecture36
Figure: Live encoding architecture featuring LiveVBR envisioned in this chapter. First pass: (1) video complexity feature extraction on the input video segment. Second pass, using the extracted features together with the set of resolutions, set of framerates, minimum and maximum bitrate, target JND function, and maximum VMAF: (2) perceptually-optimized bitrate ladder prediction, (3) optimized CRF prediction, and (4) cVBR encoding of the representations with the target encoder/codec.
36 Vignesh V Menon et al. “JND-aware Two-pass Per-title Encoding Scheme for Adaptive Live Streaming”. In: IEEE Transactions on Circuits and Systems for Video Technology. 2023, pp. 1–1. doi: 10.1109/TCSVT.2023.3290725. url: https://doi.org/10.1109/TCSVT.2023.3290725.
Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 42
• 43. Live variable bitrate encoding Optimized CRF prediction LiveVBR cVBR encoding of the bitrate ladder37
Figure: Optimized CRF estimation for the t-th representation to achieve the target bitrate b̂_t, using a prediction model trained for resolution r̂_t and framerate f̂_t.
The optimized CRF is determined for each selected (r, b, f) triple; cVBR encoding is then performed for the resulting (r, b, f, c) tuples.
37 Vignesh V Menon et al. “ETPS: Efficient Two-Pass Encoding Scheme for Adaptive Live Streaming”. In: 2022 IEEE International Conference on Image Processing (ICIP). 2022, pp. 1516–1520. doi: 10.1109/ICIP46576.2022.9897768. url: https://doi.org/10.1109/ICIP46576.2022.9897768.
Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 43
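The snippet below is a minimal sketch of how one (r, b, f, c) tuple could be encoded as capped VBR with x265 through FFmpeg: rate control is CRF-driven while VBV settings cap the bitrate at the representation's target. It is an illustration with placeholder paths and example values, not the actual LiveVBR code; the two-times VBV buffer size is an assumption.

```python
# Illustrative cVBR (capped VBR) encode of one ladder representation with
# FFmpeg/libx265: CRF-based rate control with VBV capping at the target bitrate.
import subprocess

def encode_cvbr(src, out, height, fps, crf, max_bitrate_kbps, preset="ultrafast"):
    vbv = f"vbv-maxrate={max_bitrate_kbps}:vbv-bufsize={2 * max_bitrate_kbps}"
    subprocess.run([
        "ffmpeg", "-y", "-i", src,
        "-vf", f"scale=-2:{height},fps={fps}",   # rescale and resample framerate
        "-c:v", "libx265", "-preset", preset,
        "-crf", str(crf),
        "-x265-params", vbv,
        out
    ], check=True)

# Hypothetical usage for one ladder point, e.g. 1080p at 30 fps, CRF 26,
# capped at 3 Mbps:
encode_cvbr("segment.y4m", "rep_1080p30.mp4", 1080, 30, 26, 3000)
```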
• 44. Live variable bitrate encoding Optimized CRF prediction Evaluation of LiveVBR
Table: Input parameters of LiveVBR used in the experiments.
  Parameter                               Symbol   Values
  Set of resolutions (height in pixels)   R        {360, 432, 540, 720, 1080, 1440, 2160}
  Set of framerates (in fps)              F        {30}
  Set of presets                          P        {ultrafast}
  Minimum bitrate (in Mbps)               b_min    0.145
  Maximum bitrate (in Mbps)               b_max    16.80
  Average target JND                      v_J      {2, 4, 6}
  Maximum VMAF threshold                  v_max    {98, 96, 94}
Table: Comparison of other per-title encoding methods with LiveVBR, regarding the target scenario, number of pre-encodings, encoding type, and the additional computational overhead ∆TC to determine the convex hull.
  Method              Target scenario   Number of pre-encodings   Encoding type   ∆TC
  Bruteforce38        VoD               r̃ × c̃                     cVBR            4596.77%
  Katsenou et al.39   VoD               (r̃ − 1) × 2               CQP             120.57%
  FAUST40             VoD               1                         CBR             48.65%
  Bhat et al.41       VoD               1                         CBR             67.82%
  ORPS                Live              0                         CBR             0.30%
  JBLS                Live              0                         CBR             0.33%
  LiveVBR             Live              0                         cVBR            0.41%
38 De Cock et al., “Complexity-based consistent-quality encoding in the cloud”.
39 A. V. Katsenou, J. Sole, and D. R. Bull. “Content-gnostic Bitrate Ladder Prediction for Adaptive Video Streaming”. In: 2019 Picture Coding Symposium (PCS). 2019. doi: 10.1109/PCS48520.2019.8954529.
40 Anatoliy Zabrovskiy et al. “FAUST: Fast Per-Scene Encoding Using Entropy-Based Scene Detection and Machine Learning”. In: 2021 30th Conference of Open Innovations Association FRUCT. 2021, pp. 292–302. doi: 10.23919/FRUCT53335.2021.9599963. url: https://doi.org/10.23919/FRUCT53335.2021.9599963.
41 M. Bhat, Jean-Marc Thiesse, and Patrick Le Callet. “A Case Study of Machine Learning Classifiers for Real-Time Adaptive Resolution Prediction in Video Coding”. In: 2020 IEEE International Conference on Multimedia and Expo (ICME). 2020, pp. 1–6. doi: 10.1109/ICME46284.2020.9102934.
Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 44
• 45. Live variable bitrate encoding Optimized CRF prediction Evaluation of LiveVBR
Table: Average results of the encoding schemes compared to the HLS bitrate ladder CBR encoding.
  Method               BDRP      BDRV      BD-PSNR   BD-VMAF   ∆S        ∆T
  Bruteforce (v_J=2)   -23.09%   -43.23%   1.34 dB   10.61     -25.99%   4732.33%
  Bruteforce (v_J=4)   -28.15%   -42.75%   1.70 dB   10.08     -59.07%   4732.33%
  Bruteforce (v_J=6)   -25.36%   -40.73%   1.67 dB    9.19     -70.50%   4732.33%
  ORPS CBR             -17.28%   -22.79%   0.98 dB    3.79       0.07%     15.74%
  JBLS (v_J=2)         -11.06%   -16.65%   0.87 dB    2.18      10.18%    105.73%
  JBLS (v_J=4)         -10.44%   -15.13%   0.91 dB    2.39     -27.03%     10.19%
  JBLS (v_J=6)         -12.94%   -17.94%   0.94 dB    2.32     -42.48%    -25.35%
  HLS cVBR             -35.25%   -32.33%   2.09 dB    6.59      -9.39%      1.64%
  ORPS cVBR            -34.42%   -42.67%   2.90 dB    9.51      -1.34%     62.73%
  LiveVBR (v_J=2)      -14.25%   -29.14%   1.36 dB    7.82      23.57%    184.62%
  LiveVBR (v_J=4)      -18.41%   -32.48%   1.41 dB    8.31     -56.38%     26.14%
  LiveVBR (v_J=6)      -18.80%   -32.59%   1.34 dB    8.34     -68.96%    -18.58%
Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 45
• 46. Live variable bitrate encoding Optimized CRF prediction Evaluation of LiveVBR
Figure: RD curves (VMAF versus bitrate) of representative video sequences (segments) using the considered encoding schemes (Bruteforce, HLS CBR, ORPS CBR, JBLS, HLS cVBR, ORPS cVBR, LiveVBR): (a) Bunny s000, (b) Characters s000, (c) HoneyBee s000, (d) Wood s000. JND is considered as six VMAF points in these plots.
Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 46
  • 47. Conclusions and Future Directions Conclusions and Future Directions Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 47
• 48. Conclusions and Future Directions Contributions Contributions
Video complexity analysis:
  Efficient DCT-energy-based spatial and temporal complexity features are proposed to analyze video complexity accurately and quickly. These features are suitable for live-streaming applications as they are low complexity and significantly correlate with video coding parameters.
Online per-title encoding optimizations:
  Online resolution prediction scheme (ORPS) predicts the optimized resolution yielding the highest perceptual quality using the video content complexity of the segment and the predefined set of target bitrates.
  Online framerate prediction scheme (OFPS) predicts the optimized framerate yielding the highest perceptual quality using the video content complexity of the segment and the predefined set of target bitrates.
Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 48
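As a rough illustration of the idea behind these features (a simplified sketch only, not the exact VCA definition, which applies a specific frequency weighting and normalization), the spatial complexity of a frame can be taken as its average block-wise DCT texture energy and the temporal complexity as the average absolute difference of block energies between consecutive frames:

```python
# Simplified sketch of DCT-energy-based complexity features (not the exact VCA
# definition): E_k is the mean block-wise DCT texture energy of frame k (DC
# excluded), h_k is the mean absolute difference of block energies between
# frames k and k-1.
import numpy as np
from scipy.fft import dctn

def block_energies(frame, block=32):
    h, w = frame.shape
    energies = []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            coeffs = dctn(frame[y:y + block, x:x + block].astype(np.float64),
                          norm="ortho")
            coeffs[0, 0] = 0.0                      # drop the DC coefficient
            energies.append(np.abs(coeffs).sum())   # unweighted texture energy
    return np.array(energies)

def complexity_features(frames, block=32):
    """Yield (E_k, h_k) per frame for a sequence of 2-D luma arrays."""
    prev = None
    for frame in frames:
        e = block_energies(frame, block)
        E_k = e.mean()
        h_k = 0.0 if prev is None else np.abs(e - prev).mean()
        prev = e
        yield E_k, h_k
```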
• 49. Conclusions and Future Directions Contributions Contributions
Just noticeable difference (JND)-aware bitrate ladder prediction scheme (JBLS) predicts optimized bitrate-resolution-framerate triples such that there is a perceptual quality difference of one JND between representations.
The constrained variable bitrate (cVBR) implementation of JBLS, i.e., LiveVBR, yields an average bitrate reduction of 18.80% and 32.59% for the same PSNR and VMAF, respectively, compared to the HLS CBR bitrate ladder encoding using x265. For a target JND of six VMAF points, the application of LiveVBR resulted in a 68.96% reduction in storage space and an 18.58% reduction in encoding time, with a negligible impact on streaming latency.
Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 49
• 50. Conclusions and Future Directions Reproducibility Reproducibility
VCA is available at https://github.com/cd-athena/VCA. This initiative translates the proposed video complexity analysis into a practical open-source implementation.
The open-source Python code of LiveVBR is available at https://github.com/cd-athena/LiveVBR.
Figure: Content-adaptive encoding using VCA. Video complexity analysis extracts features from the input video segment; the features steer the encoder, and the encoded bitstream is delivered to the application.
Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 50
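A toy glue sketch of the workflow in the figure is shown below. It assumes that VCA has already been run on the segment and produced a per-frame CSV with columns named E and h (an assumed layout, not necessarily VCA's actual output format), and the CRF mapping is a made-up heuristic rather than the LiveVBR prediction model.

```python
# Toy glue code for the VCA-to-encoder workflow: read per-frame complexity
# features from a CSV produced by a prior VCA run (column names E and h are
# assumed) and derive a single encoder parameter for the segment.
import csv

def mean_features(csv_path):
    E_sum, h_sum, n = 0.0, 0.0, 0
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            E_sum += float(row["E"])
            h_sum += float(row["h"])
            n += 1
    return E_sum / n, h_sum / n

def pick_crf(E, h, base_crf=28.0, w_spatial=4.0, w_temporal=6.0):
    # Made-up heuristic (not the LiveVBR model): spend more bits, i.e. use a
    # lower CRF, on segments with higher complexity (features assumed scaled
    # to the range [0, 1]).
    crf = base_crf - w_spatial * E - w_temporal * h
    return max(18.0, min(40.0, crf))

E, h = mean_features("segment_features.csv")   # hypothetical file name
print(pick_crf(E, h))
```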
• 51. Conclusions and Future Directions Limitations Limitations
1 Dynamic network conditions: The interplay between content complexity and real-time network fluctuations is not extensively addressed.
2 Generalization across video genres: Generalizing the framework to highly specialized genres or unique content types may present challenges.
3 Real-time implementation challenges: While developed and evaluated offline, the content-adaptive video coding framework poses challenges in real-time implementation, considering computational efficiency and latency constraints.
4 Subjective quality assessment: The evaluation relies on objective metrics; incorporating subjective quality assessment methods, such as user studies or crowdsourced evaluations, could offer a more comprehensive understanding of the framework's impact on viewer satisfaction.
Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 51
• 52. Conclusions and Future Directions Future directions Future directions
1 Address the escalated runtime complexity of encoding bitrate ladder representations for multiple codecs while upholding the compression efficiency of the system.
2 Achieve zero-latency encoding in adaptive live-streaming scenarios using new-generation codecs by synergizing dynamic resolution, bitrate, framerate, and encoding resource configuration.42
3 Extend the per-title encoding schemes proposed in this dissertation to scenarios involving transcoding in networking servers.43
42 Vignesh V Menon et al. “Content-adaptive Encoder Preset Prediction for Adaptive Live Streaming”. In: 2022 Picture Coding Symposium (PCS). 2022, pp. 253–257. doi: 10.1109/PCS56426.2022.10018034. url: https://doi.org/10.1109/PCS56426.2022.10018034.
43 Reza Farahani. “CDN and SDN Support and Player Interaction for HTTP Adaptive Video Streaming”. In: Proceedings of the 12th ACM Multimedia Systems Conference. Istanbul, Turkey: Association for Computing Machinery, 2021, pp. 398–402. isbn: 9781450384346. doi: 10.1145/3458305.3478464. url: https://doi.org/10.1145/3458305.3478464.
Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 52
  • 53. Q & A Q & A Thank you for your attention! Vignesh V Menon (vignesh.menon@ieee.org) Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 53