In today’s dynamic streaming landscape, where viewers access content on various devices and encounter fluctuating network conditions, optimizing video delivery for each unique scenario is imperative. Video content complexity analysis, content-adaptive video coding, and multi-encoding methods are fundamental to the success of adaptive video streaming, as they serve crucial roles in delivering high-quality video experiences to a diverse audience. Video content complexity analysis allows us to comprehend the video content’s intricacies, such as motion, texture, and detail, providing valuable insights to enhance encoding decisions. By understanding the content’s characteristics, we can efficiently allocate bandwidth and encoding resources, thereby improving compression efficiency without compromising quality. Content-adaptive video coding techniques built upon this analysis dynamically adjust encoding parameters based on the content complexity. This adaptability ensures that the video stream remains visually appealing and artifacts are minimized, even under challenging network conditions. Multi-encoding methods further bolster adaptive streaming by offering faster encoding of multiple representations of the same video at different bitrates, which reduces computational overhead and enables efficient resource allocation on the server side. Collectively, these technologies empower adaptive video streaming to deliver optimal visual quality and uninterrupted viewing experiences, catering to viewers’ diverse needs and preferences across a wide range of devices and network conditions. Embracing video content complexity analysis, content-adaptive video coding, and multi-encoding methods is essential to meet the evolving demands of modern video streaming platforms and to create immersive experiences that captivate and engage audiences. In this light, this dissertation proposes contributions categorized into four classes.
1. Content-adaptive Video Coding for HTTP Adaptive Streaming
Vignesh V Menon
Alpen-Adria-Universität, Klagenfurt, Austria
Supervisor: Univ.-Prof. DI Dr. Christian Timmerer, Alpen-Adria-Universität, Klagenfurt, Austria
Advisor: Assoc.-Prof. DI Dr. Klaus Schoeffmann, Alpen-Adria-Universität, Klagenfurt, Austria
Date: Jan 15, 2024
Vignesh V Menon Content-adaptive Video Coding for HTTP Adaptive Streaming 1
2. Outline
1 Introduction
2 Video complexity analysis
3 Online per-title encoding
4 Live variable bitrate encoding
5 Conclusions and Future Directions
4. Introduction
HTTP Adaptive Streaming (HAS)1
Source: https://bitmovin.com/adaptive-streaming/
Why Adaptive Streaming?
Adapt for a wide range of devices.
Adapt for a broad set of Internet speeds.
1 A. Bentaleb et al. “A Survey on Bitrate Adaptation Schemes for Streaming Media Over HTTP”. In: IEEE Communications Surveys & Tutorials 21.1 (2019), pp. 562–585.
5. Introduction
HTTP Adaptive Streaming (HAS)
Network
Bandwidth
Time
Display
Received
representation
HTTP server
Bitrate
ladder
Encoders
.
.
.
.
.
.
.
.
.
.
.
.
Encoded
representations
bitrate
increase
Video source Input video
segment
Figure: HTTP adaptive streaming (HAS) concept.
What does HAS do?
Each source video is split into segments.
Each segment is encoded at multiple bitrates, resolutions, and codecs.
Segments are delivered to the client based on the device capability, network speed, etc.
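The delivery step above can be sketched as a minimal throughput-based representation selector (a hypothetical illustration: the ladder values and safety margin below are invented, and real ABR schemes, such as those surveyed by Bentaleb et al., are considerably more sophisticated):

```python
# Minimal throughput-based representation selection for HAS (illustrative sketch).
# The bitrate ladder entries and the safety margin are hypothetical values.

LADDER = [  # (resolution height, bitrate in kbps), one entry per representation
    (360, 145), (540, 600), (720, 1600), (1080, 3400), (2160, 16800),
]

def select_representation(measured_throughput_kbps, margin=0.8):
    """Pick the highest-bitrate representation that fits the measured
    throughput, scaled by a safety margin to absorb bandwidth fluctuation."""
    budget = measured_throughput_kbps * margin
    chosen = LADDER[0]  # always fall back to the lowest representation
    for height, bitrate in LADDER:
        if bitrate <= budget:
            chosen = (height, bitrate)
    return chosen

print(select_representation(5000))  # 5 Mbps throughput -> (1080, 3400)
```

The margin mimics the conservative behaviour of rate-based adaptation: requesting slightly below the measured throughput reduces the risk of rebuffering when bandwidth drops mid-segment.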
6. Introduction
Research questions
1 How to adapt the speed and compression performance of the video encoder based on content complexity?
Determine the video’s content-adaptive spatial and temporal features, which are used to influence encoder decisions such as slice type, quantization parameter, and block partitioning.
By leveraging CAE algorithms, adaptive bitrate encoding, and intelligent analysis, an encoder’s speed and compression performance can be effectively adapted to the complexity of the video content, leading to optimized encoding results.2
2 How to improve the compression efficiency of bitrate ladder encoding in live-streaming applications?
Minimize the time to compute the convex hull for each title by analyzing the video content complexity features.3
Dynamically configure the encoding parameters on the fly to sustain a target encoding speed according to the content for efficient live streaming.4
2 Sriram Sethuraman, Nithya V. S., and Venkata Narayanababu Laveti D. “Non-iterative Content-Adaptive Distributed Encoding Through ML Techniques”. In: SMPTE 2017 Annual Technical Conference and Exhibition. 2017, pp. 1–8. doi: 10.5594/M001783.
3 J. De Cock et al. “Complexity-based consistent-quality encoding in the cloud”. In: 2016 IEEE International Conference on Image Processing (ICIP). 2016, pp. 1484–1488. doi: 10.1109/ICIP.2016.7532605.
4 Pradeep Ramachandran et al. “Content adaptive live encoding with open source codecs”. In: Proceedings of the 11th ACM Multimedia Systems Conference. May 2020, pp. 345–348. doi: 10.1145/3339825.3393580.
7. Introduction Research questions
3 How to provide fast and rate-efficient multi-bitrate and multi-resolution bitrate ladder encoding in adaptive streaming applications?
Sharing encoder analysis information such as motion vectors, scene complexity, and frame types across different representations can avoid redundant calculations and minimize encoding operations, improving encoding efficiency and reducing computational overhead.5,6
Facilitate adaptive streaming optimizations on the server side, enabling efficient resource allocation and bandwidth management.
5 J. De Praeter et al. “Fast simultaneous video encoder for adaptive streaming”. In: 2015 IEEE 17th International Workshop on Multimedia Signal Processing (MMSP). Oct. 2015, pp. 1–6. doi: 10.1109/MMSP.2015.7340802.
6 Vignesh V Menon et al. “EMES: Efficient Multi-Encoding Schemes for HEVC-Based Adaptive Bitrate Streaming”. In: ACM Trans. Multimedia Comput. Commun. Appl. New York, NY, USA: Association for Computing Machinery, Dec. 2022. doi: 10.1145/3575659.
8. Introduction Target of this study
[Diagram: scene detection splits the input video into scenes; video complexity feature extraction feeds an optimized bitrate ladder and encoding parameter prediction (RQ 1, 2), given the target encoder/codec, the sets of resolutions and framerates, the minimum and maximum bitrate, the maximum quality, the target JND function, and the target encoding speed; the predicted bitrate ladder and encoding parameters drive the encoders producing the representations (RQ 3).]
Figure: The ideal video compression system for HAS targeted in this dissertation.
9. Introduction Thesis organization
[Diagram: contribution classes — (1) video complexity analysis (RQ 1, 2); (2) content-adaptive encoding optimizations (RQ 1); (3) online per-title encoding optimizations (RQ 1, 2); (4) multi-encoding optimizations (RQ 3) — mapped to chapters: Chapter 2, video complexity analyzer; Chapter 3, scene detection algorithm and fast intra CU depth prediction algorithm; Chapter 4, online resolution, framerate, and encoding preset prediction schemes and just-noticeable-difference-aware bitrate ladder prediction scheme; Chapter 5, live variable bitrate encoding scheme; Chapter 6, efficient multi-encoding schemes. The nine contributions target improved compression efficiency and improved encoding speed.]
Figure: Thesis organization.
10. Video complexity analysis
11. Video complexity analysis Introduction
Video complexity analysis is a critical step for numerous applications:
Content-based retrieval: in multimedia archives, surveillance systems, and digital libraries7
Video summarization: in video browsing, news aggregation, and event documentation8,9
Action recognition and scene understanding: in domains ranging from sports analytics and surveillance to robotics and human-computer interaction10
Quality assessment: in streaming services, video conferencing, and multimedia content distribution11,12
7 Wei Jiang et al. “Similarity-based online feature selection in content-based image retrieval”. In: IEEE Transactions on Image Processing 15.3 (2006), pp. 702–712. doi: 10.1109/TIP.2005.863105.
8 Parul Saini et al. “Video summarization using deep learning techniques: a detailed analysis and investigation”. In: Artificial Intelligence Review. Mar. 2023. doi: 10.1007/s10462-023-10444-0.
9 Naveed Ejaz, Tayyab Bin Tariq, and Sung Wook Baik. “Adaptive key frame extraction for video summarization using an aggregation mechanism”. In: Journal of Visual Communication and Image Representation 23.7 (2012), pp. 1031–1040. doi: 10.1016/j.jvcir.2012.06.013.
10 N. Barman et al. “No-Reference Video Quality Estimation Based on Machine Learning for Passive Gaming Video Streaming Applications”. In: IEEE Access 7 (2019), pp. 74511–74527. doi: 10.1109/ACCESS.2019.2920477.
11 S. Zadtootaghaj et al. “NR-GVQM: A No Reference Gaming Video Quality Metric”. In: 2018 IEEE International Symposium on Multimedia (ISM). 2018, pp. 131–134. doi: 10.1109/ISM.2018.00031.
12 S. Göring, R. Rao, and A. Raake. “nofu — A Lightweight No-Reference Pixel Based Video Quality Model for Gaming Content”. In: 2019 Eleventh International Conference on Quality of Multimedia Experience (QoMEX). 2019, pp. 1–6. doi: 10.1109/QoMEX.2019.8743262.
12. Video complexity analysis State-of-the-art features
State-of-the-art features13
Spatial information (SI)
SI is a prominent indicator, portraying the peak spatial intricacy present within a video.
SI = max{std[Sobel(F(p))]} (1)
Temporal information (TI)
TI manifests as the maximum temporal variance observable between consecutive frames in a
video sequence.
D(p) = F(p) − F(p − 1) (2)
TI = max{std[D(p)]} (3)
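The P.910 definitions above can be sketched in plain NumPy (a minimal illustration; frames are assumed to be 2-D luma arrays, and the Sobel filter is applied only to the valid interior region):

```python
import numpy as np

def sobel_magnitude(frame):
    """Gradient magnitude via 3x3 Sobel kernels (interior pixels only)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
    ky = kx.T
    h, w = frame.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for i in range(3):          # accumulate the 3x3 correlation explicitly
        for j in range(3):
            patch = frame[i:i + h - 2, j:j + w - 2]
            gx += kx[i, j] * patch
            gy += ky[i, j] * patch
    return np.sqrt(gx ** 2 + gy ** 2)

def si_ti(frames):
    """SI and TI per ITU-T P.910: SI is the maximum over frames of the std of
    the Sobel-filtered luma; TI is the maximum over frames of the std of the
    pixel-wise difference D(p) = F(p) - F(p - 1)."""
    frames = [np.asarray(f, dtype=np.float64) for f in frames]
    si = max(float(np.std(sobel_magnitude(f))) for f in frames)
    ti = max(float(np.std(f1 - f0)) for f0, f1 in zip(frames, frames[1:]))
    return si, ti
```

Note that both features collapse a whole sequence into a single max-over-frames scalar, which is precisely the limitation the per-segment features proposed later address.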
13 ITU-T. “P.910: Subjective video quality assessment methods for multimedia applications”. Nov. 2021. url: https://www.itu.int/rec/T-REC-P.910-202111-I/en.
13. Video complexity analysis Proposed features
The luma and chroma brightness of non-overlapping blocks k of size w × w pixels for each frame p is defined as:

L_{c,p,k} = √(DCT_c(1, 1)) ∀ c ∈ [0, 2] (4)

A DCT-energy function is introduced to determine the luma and chroma texture of every non-overlapping block k in each frame p, which is defined as:

H_{c,p,k} = Σ_{i=1}^{w} Σ_{j=1}^{w} e^{[(ij/w²)² − 1]} · |DCT_c(i − 1, j − 1)| ∀ c ∈ [0, 2] (5)

where DCT_c(i, j) is the (i, j)-th DCT coefficient when i + j > 2, and 0 otherwise.14 The block-wise texture per frame is averaged to determine the luma and chroma texture features (EY, EU, EV) per video segment.
14 Vignesh V Menon et al. “VCA: Video Complexity Analyzer”. In: Proceedings of the 13th ACM Multimedia Systems Conference. MMSys ’22. Athlone, Ireland: Association for Computing Machinery, 2022, pp. 259–264. doi: 10.1145/3524273.3532896.
14. Video complexity analysis Proposed features
The block-wise difference of the luma texture energy of each frame compared to its previous frame is calculated as:

h_{p,k} = | H_{Y,p,k} − H_{Y,p−1,k} | / w² (6)
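Equations (4)–(6) can be sketched for a single block as follows (a minimal NumPy illustration; the per-channel application over Y, U, and V and the averaging into segment-level features EY, EU, EV are omitted, and an orthonormal 2-D DCT-II is assumed):

```python
import numpy as np

def dct2(block):
    """Orthonormal 2-D DCT-II of a square block via the DCT matrix."""
    w = block.shape[0]
    n = np.arange(w)
    C = np.sqrt(2.0 / w) * np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * w))
    C[0, :] = np.sqrt(1.0 / w)
    return C @ block @ C.T

def block_features(block):
    """Brightness L (Eq. 4) and DCT-energy texture H (Eq. 5) of one w x w
    block of a single channel (the channel loop is omitted)."""
    w = block.shape[0]
    coeffs = dct2(np.asarray(block, dtype=np.float64))
    L = np.sqrt(abs(coeffs[0, 0]))              # DCT(1, 1) in 1-based notation
    i, j = np.meshgrid(np.arange(1, w + 1), np.arange(1, w + 1), indexing="ij")
    weights = np.exp(((i * j) / w ** 2) ** 2 - 1.0)
    weights[i + j <= 2] = 0.0                   # DC coefficient excluded (i + j > 2)
    H = float(np.sum(weights * np.abs(coeffs)))
    return float(L), H

def h_feature(prev_H, cur_H, w):
    """Temporal luma-texture difference h (Eq. 6) for one block."""
    return abs(cur_H - prev_H) / w ** 2
```

The exponential weighting emphasizes higher-frequency coefficients, so H grows with fine texture while a flat block yields H ≈ 0.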
Figure: Heatmap depiction of the luma texture information {EY, h, LY} extracted from the second frame of the CoverSong 1080P 0a86 video of the YouTube UGC dataset.15 Panels: (a) original frame, (b) EY, (c) h, (d) LY.
15 Yilin Wang, Sasi Inguva, and Balu Adsumilli. “YouTube UGC Dataset for Video Compression Research”. In: 2019 IEEE 21st International Workshop on Multimedia Signal Processing (MMSP). Sept. 2019. doi: 10.1109/mmsp.2019.8901772.
15. Video complexity analysis Analysis of features
(a) x264    QP22  QP27  QP32  QP37
SI          0.17  0.24  0.30  0.34
EY          0.86  0.86  0.85  0.84

(b) x265    QP22  QP27  QP32  QP37
SI          0.18  0.24  0.32  0.37
EY          0.86  0.88  0.87  0.85
Figure: PCC between the spatial complexity features (SI and EY) and bitrate in the All Intra (AI) configuration16 with the medium preset of the x264 and x265 encoders for the VCD dataset.17
Bitrate in the AI configuration is considered the ground truth for spatial complexity.
EY correlates better with the spatial complexity than the state-of-the-art SI feature.
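The PCC values in such figures could be computed as follows (the feature and bitrate values below are invented placeholders, not data from the VCD experiments):

```python
import numpy as np

def pearson_cc(x, y):
    """Pearson correlation coefficient between two 1-D sequences."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    xm, ym = x - x.mean(), y - y.mean()
    return float(np.sum(xm * ym) / np.sqrt(np.sum(xm ** 2) * np.sum(ym ** 2)))

# Hypothetical per-sequence values: a complexity feature (e.g. EY) and the
# bitrate of an All-Intra encode at a fixed QP for the same sequences.
feature = [12.1, 33.4, 27.8, 55.0, 41.2]
bitrate = [1.8, 4.9, 4.1, 8.2, 6.0]
print(round(pearson_cc(feature, bitrate), 3))
```

One PCC value is obtained per (feature, QP, encoder) combination, which is exactly how each cell of the heatmaps is filled.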
16 F. Bossen. “Common test conditions and software reference configurations”. In: JCTVC-L1100. Vol. 12. 2013, p. 7. url: http://phenix.it-sudparis.eu/jct/doc_end_user/current_document.php?id=7281.
17 Hadi Amirpour et al. “VCD: Video Complexity Dataset”. In: Proceedings of the 13th ACM Multimedia Systems Conference. MMSys ’22. New York, NY, USA: Association for Computing Machinery, 2022. doi: 10.1145/3524273.3532892.
16. Video complexity analysis Analysis of features
(a) veryslow    QP22  QP27  QP32  QP37
SI              0.00  0.01  0.02  0.03
TI              0.53  0.53  0.55  0.55
EY              0.59  0.48  0.45  0.40
h               0.55  0.61  0.72  0.76

(b) medium      QP22  QP27  QP32  QP37
SI              0.01  0.01  0.03  0.04
TI              0.54  0.55  0.56  0.57
EY              0.60  0.52  0.47  0.43
h               0.57  0.64  0.73  0.76

(c) ultrafast   QP22  QP27  QP32  QP37
SI              0.06  0.06  0.08  0.08
TI              0.54  0.56  0.58  0.61
EY              0.67  0.58  0.52  0.47
h               0.58  0.65  0.73  0.78

Figure: PCC between the spatial complexity features (SI and EY) and the temporal features (TI and h) with bitrate in the Low Delay P picture (LDP) configuration with various presets of the x265 encoder for the VCD dataset.
The correlation of EY with bitrate increases as QP decreases. Similarly, the correlation of
h with bitrate decreases as QP decreases.
EY and h correlate well with the LDP configuration’s RD complexity and encoding run-time
complexity.
17. Video complexity analysis Analysis of features
(a) veryslow    QP22  QP27  QP32  QP37
SI              0.20  0.21  0.22  0.23
TI              0.71  0.72  0.74  0.76
EY              0.47  0.35  0.26  0.21
h               0.67  0.71  0.77  0.81

(b) medium      QP22  QP27  QP32  QP37
SI              0.22  0.23  0.26  0.28
TI              0.66  0.68  0.73  0.77
EY              0.41  0.33  0.28  0.27
h               0.59  0.66  0.74  0.78

(c) ultrafast   QP22  QP27  QP32  QP37
SI              0.25  0.28  0.32  0.35
TI              0.69  0.71  0.73  0.73
EY              0.50  0.43  0.38  0.35
h               0.61  0.62  0.64  0.63

Figure: PCC between the spatial complexity features (SI and EY) and the temporal features (TI and h) with encoding time in the Low Delay P picture (LDP) configuration with various presets of the x265 encoder for the VCD dataset.
18. Video complexity analysis Analysis of features
(a) SI      2160p  1080p  720p
2160p       1.00   0.43   0.45
1080p       0.43   1.00   0.82
720p        0.45   0.82   1.00

(b) EY      2160p  1080p  720p
2160p       1.00   0.94   0.91
1080p       0.94   1.00   0.99
720p        0.91   0.99   1.00

Figure: PCC between the spatial complexity features across multiple resolutions for the VCD dataset.
EY exhibits better correlation across resolutions, facilitating optimizations, including computations at lower resolutions.
19. Video complexity analysis Performance optimizations
Performance optimizations18
1 x86 SIMD optimization
2 Multi-threading optimization
3 Low-pass analysis optimization
[Bar chart: analysis speed in fps for SITI versus VCA with no optimization, VCA with SIMD optimization, VCA with SIMD and 2, 4, or 8 threads, and VCA with SIMD, 8 threads, and the low-pass DCT optimization.]
Figure: Speed of the proposed video complexity analysis using various performance optimizations.
18 Vignesh V Menon et al. “Green Video Complexity Analysis for Efficient Encoding in Adaptive Video Streaming”. In: Proceedings of the First International Workshop on Green Multimedia Systems. GMSys ’23. Vancouver, BC, Canada: Association for Computing Machinery, 2023, pp. 16–18. doi: 10.1145/3593908.3593942.
20. Online per-title encoding
21. Online per-title encoding Dynamic resolution encoding
Dynamic resolution per-title encoding schemes are based on the fact that one resolution performs better than the others in a scene for a given bitrate range, and these ranges depend on the video complexity.
Dynamic resolution prediction seeks to strike a balance between delivering optimal visual quality and conserving bandwidth resources.
The adaptive streaming system can anticipate the ideal resolution for each segment in real time by harnessing predictive models, often based on historical data19 or machine learning techniques.20
Dynamic resolution optimization approaches developed in the industry (by Bitmovin,21 MUX,22 and CAMBRIA23) are proprietary.
19 Venkata Phani Kumar M, Christian Timmerer, and Hermann Hellwagner. “MiPSO: Multi-Period Per-Scene Optimization For HTTP Adaptive Streaming”. In: 2020 IEEE International Conference on Multimedia and Expo (ICME). 2020, pp. 1–6. doi: 10.1109/ICME46284.2020.9102775.
20 Madhukar Bhat, Jean-Marc Thiesse, and Patrick Le Callet. “Combining Video Quality Metrics To Select Perceptually Accurate Resolution In A Wide Quality Range: A Case Study”. In: 2021 IEEE International Conference on Image Processing (ICIP). 2021, pp. 2164–2168. doi: 10.1109/ICIP42928.2021.9506310.
21 Gernot Zwantschko. “What is Per-Title Encoding? How to Efficiently Compress Video”. In: Bitmovin Developers Blog. Nov. 2020. url: https://bitmovin.com/what-is-per-title-encoding/.
22 Jon Dahl. “Instant Per-Title Encoding”. In: Mux Video Education Blog. Apr. 2018. url: https://www.mux.com/blog/instant-per-title-encoding.
23 Capella. “Save Bandwidth and Improve Viewer Quality of Experience with Source Adaptive Bitrate Ladders”. In: CAMBRIA FTC. url: https://capellasystems.net/wp-content/uploads/2021/01/CambriaFTC_SABL.pdf.
22. Online per-title encoding Dynamic resolution encoding
Convex-hull estimation
Figure: Rate-Distortion (RD) curves using VMAF24 as the quality metric of (a) the Beauty and Golf sequences of the UVG and BVI datasets encoded at 540p and 1080p resolutions, and (b) the Lake sequence of the MCML dataset encoded at a set of bitrates and resolutions to determine the convex hull.
24 Zhi Li et al. “VMAF: The Journey Continues”. In: Netflix Technology Blog. Oct. 2018. url: https://netflixtechblog.com/vmaf-the-journey-continues-44b51ee9ed12.
23. Online per-title encoding Dynamic resolution encoding
Online Resolution Prediction Scheme (ORPS) architecture25
[Diagram: (1) video complexity feature extraction from the input video segment; (2) optimized resolution prediction from the features, the set of resolutions, the set of bitrates, and the target encoder/codec; (3) CBR encoding of the predicted representations.]
Figure: Encoding using ORPS for adaptive live streaming.
The encoding process is carried out only for the predicted bitrate-resolution pairs for each segment as constant
bitrate (CBR) encodings, thereby eliminating the need to encode in all bitrates and resolutions to find the
optimized bitrate-resolution pairs to yield maximum VMAF.
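The CBR encoding step could be driven by command lines like the following (a hypothetical ffmpeg/libx265 invocation; the predicted bitrate-resolution pairs and output file names are illustrative, and the commands are only built here, not executed):

```python
def cbr_encode_cmd(src, height, bitrate_kbps, framerate, preset="ultrafast"):
    """Build an ffmpeg command for one CBR-style x265 encoding of a predicted
    (bitrate, resolution) pair. Illustrative sketch only."""
    return [
        "ffmpeg", "-i", src,
        "-vf", f"scale=-2:{height}",        # scale to the predicted height
        "-r", str(framerate),
        "-c:v", "libx265", "-preset", preset,
        "-b:v", f"{bitrate_kbps}k",         # target bitrate
        "-maxrate", f"{bitrate_kbps}k",     # cap the rate for CBR-like behaviour
        "-bufsize", f"{bitrate_kbps // 2}k",
        f"out_{height}p_{bitrate_kbps}k.mp4",
    ]

# Hypothetical predicted pairs for one segment: (height, bitrate in kbps).
for height, kbps in [(540, 600), (1080, 3400), (2160, 16800)]:
    print(" ".join(cbr_encode_cmd("segment.mp4", height, kbps, 30)))
```

Only the predicted pairs are encoded, which is where the saving over exhaustively encoding every bitrate-resolution combination comes from.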
25 V. V. Menon et al. “OPTE: Online Per-Title Encoding for Live Video Streaming”. In: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2022, pp. 1865–1869. doi: 10.1109/ICASSP43922.2022.9746745.
24. Online per-title encoding Dynamic resolution encoding
Proposed optimized resolution estimation
Figure: Optimized resolution (r̂t) prediction for a given target bitrate (bt). v̂t is the maximum value among the v̂_{r,b̂t} values output from the prediction models trained for resolutions r1, …, rr̃. The resolution corresponding to the maximum predicted VMAF is chosen as r̂t.
VMAF is modeled as a function of the spatiotemporal features {EY, h, LY}, target resolution (rt), target encoding bitrate (bt), encoding framerate (ft), and encoding preset (pt):

v_{(rt,bt,ft,pt)} = fV(EY, h, LY, rt, bt, ft, pt) (7)
Random Forest regression models are trained for every supported resolution to predict VMAF.
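The prediction-and-argmax step can be sketched with stand-in models (the closed-form stubs below are invented for illustration; the actual scheme trains one Random Forest per resolution on the features of Eq. (7)):

```python
import math

# Hypothetical stand-ins for the per-resolution VMAF predictors v = fV(...).
# A real system would train one regression model per resolution on
# {EY, h, LY}, bitrate, framerate, and preset.
def make_stub_predictor(height):
    def predict(E_Y, h, L_Y, bitrate_mbps, framerate, preset):
        # Toy RD shape: quality saturates toward a resolution-dependent
        # ceiling, and higher content complexity E_Y slows the growth.
        ceiling = 100.0 * height / 2160
        tau = (E_Y / 20.0) * (height / 540.0) ** 2
        return ceiling * (1.0 - math.exp(-bitrate_mbps / tau))
    return predict

MODELS = {r: make_stub_predictor(r) for r in (360, 540, 720, 1080, 2160)}

def predict_resolution(features, bitrate_mbps, framerate=30, preset=0):
    """ORPS core step: evaluate every per-resolution model at the target
    bitrate and return the resolution with the highest predicted VMAF."""
    E_Y, h, L_Y = features
    scores = {r: m(E_Y, h, L_Y, bitrate_mbps, framerate, preset)
              for r, m in MODELS.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

Even with these toy models, the selected resolution climbs as the target bitrate grows, reproducing the crossover behaviour of the convex hull.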
25. Online per-title encoding Dynamic resolution encoding
Evaluation of ORPS
Table: Experimental parameters used to evaluate ORPS.
Parameter (Symbol): Values
Set of resolution heights in pixels (R): {360, 432, 540, 720, 1080, 1440, 2160}
Set of target bitrates in Mbps (B): {0.145, 0.300, 0.600, 0.900, 1.600, 2.400, 3.400, 4.500, 5.800, 8.100, 11.600, 16.800}
Set of framerates in fps (F): {30, 50, 60}
Set of presets [x265] (P): {0 (ultrafast)}
[Plots: predicted resolution over bitrate for (a) Bunny s000, (b) Characters s000, (c) HoneyBee s000, and (d) Wood s000.]
Figure: The resolution predictions of representative video segments. HLS CBR encoding is represented
using the green line, and ORPS CBR encoding is represented using the red line.
27. Online per-title encoding Dynamic framerate encoding
Motivation for variable framerate (VFR) encoding26
[Plots: VMAF over bitrate at 24, 30, 60, and 120 fps for (a) HoneyBee and (b) Lips.]
Figure: RD curves of UHD encodings of two representative HFR sequences from UVG dataset for
multiple framerates.
Dynamic framerate per-title encoding schemes are based on the fact that one framerate performs
better than others in a scene for a given bitrate range, and these regions depend on the video
complexity.
26 Alex Mackin et al. “Investigating the impact of high frame rates on video compression”. In: 2017 IEEE International Conference on Image Processing (ICIP). 2017, pp. 295–299. doi: 10.1109/ICIP.2017.8296290.
28. Online per-title encoding Dynamic framerate encoding
State-of-the-art VFR encoding
[Diagram: framerate selection (e.g., 60 fps for a 2160p, 120 fps source at CBR 8.1 Mbps) → temporal downsampling → spatial downsampling → encoder; on the client: decoder → spatial upsampling → temporal upsampling → display.]
Figure: Block diagram of a variable framerate (VFR) coding scheme27 in the context of video encoding for adaptive streaming. This example encodes a video segment of UHD resolution and native framerate 120 fps in the representation (1080p, 8.1 Mbps) with the selected framerate of 60 fps. It also illustrates the corresponding operations on the client side. Red dashed blocks indicate the additional steps introduced compared to the traditional bitrate ladder encoding.
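The temporal downsampling and upsampling blocks can be sketched as plain frame dropping and frame repetition (a simplification; practical systems may instead use motion-compensated frame-rate conversion):

```python
def temporal_downsample(frames, factor):
    """Keep every `factor`-th frame, e.g. 120 fps -> 60 fps for factor=2."""
    return frames[::factor]

def temporal_upsample(frames, factor):
    """Restore the original frame count by repeating each decoded frame."""
    return [f for f in frames for _ in range(factor)]

frames = list(range(8))                  # stand-in for 8 video frames
low = temporal_downsample(frames, 2)     # [0, 2, 4, 6]
print(temporal_upsample(low, 2))         # [0, 0, 2, 2, 4, 4, 6, 6]
```

Halving the framerate halves the number of frames the encoder must process, which is where the encoding-time savings reported later come from.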
27 G. Herrou et al. “Quality-driven Variable Frame-Rate for Green Video Coding in Broadcast Applications”. In: IEEE Transactions on Circuits and Systems for Video Technology. 2020. doi: 10.1109/TCSVT.2020.3046881.
29. Online per-title encoding Dynamic framerate encoding
Online Framerate Prediction Scheme (OFPS) architecture28
[Diagram: (1) video complexity feature extraction from the input video segment; (2) optimized framerate prediction from the features, the set of representations, the set of framerates, and the target encoder/codec; (3) CBR encoding of the representations.]
Figure: Encoding architecture using OFPS for adaptive live streaming.
28 V. V. Menon et al. “CODA: Content-aware Frame Dropping Algorithm for High Frame-rate Video Streaming”. In: 2022 Data Compression Conference (DCC). 2022, p. 475. doi: 10.1109/DCC52660.2022.00086.
30. Online per-title encoding Dynamic framerate encoding
Proposed optimized framerate estimation
Figure: Optimized framerate (f̂t) prediction for a given target representation (rt, bt). v̂t is the maximum value among the v̂_{rt,bt,f} values output from the prediction models trained for framerates f1, …, ff̃. The framerate corresponding to the maximum predicted VMAF is chosen as f̂t.
31. Online per-title encoding Dynamic framerate encoding
Evaluation of OFPS
Table: Experimental parameters used to evaluate OFPS.
Parameter (Symbol): Values
Set of resolution heights in pixels (R): {1080, 2160}
Set of target bitrates in Mbps (B): {0.145, 0.300, 0.600, 0.900, 1.600, 2.400, 3.400, 4.500, 5.800, 8.100, 11.600, 16.800}
Set of framerates in fps (F): {20, 24, 30, 40, 60, 90, 120}
Set of presets [x265] (P): {8 (veryslow)}
[Plots: ground-truth (fG) and predicted (f̂) optimized framerates over bitrate for (a) Beauty, (b) HoneyBee, and (c) ShakeNDry.]
Figure: Optimized framerate prediction results for representative sequences of the UVG dataset. Note that the optimized framerate at different bitrates varies depending on the content complexity.
32. Online per-title encoding Dynamic framerate encoding
Evaluation of OFPS
[Plots: VMAF and encoding time over bitrate, comparing the default 120 fps encoding with OFPS (VFR), for (a) Beauty, (b) HoneyBee, and (c) ShakeNDry.]
Figure: RD curves and encoding times of representative sequences of UVG dataset.
33. Online per-title encoding Dynamic framerate encoding
Evaluation of OFPS
Table: Results of OFPS-based encodings.
Dataset Resolution Video fmax BDRP BDRV ∆T (BDRP/BDRV: BD-rate with PSNR/VMAF as the quality metric; ∆T: change in encoding time)
JVET
1080p CatRobot 60 -11.23% -12.22% -23.93%
1080p DaylightRoad2 60 -9.24% -8.93% -9.33%
1080p FoodMarket4 60 -5.73% -7.12% -10.80%
2160p CatRobot 60 -13.91% -15.43% -27.62%
2160p DaylightRoad2 60 -10.36% -11.21% -12.97%
2160p FoodMarket4 60 -7.37% -6.91% -12.01%
UVG
1080p Beauty 120 -8.18% -20.01% -18.01%
1080p Bosphorus 120 -15.66% -17.58% -23.03%
1080p Lips 120 0.00% 0.00% 0.00%
1080p HoneyBee 120 -16.96% -10.87% -30.11%
1080p Jockey 120 -0.10% -1.22% -5.45%
1080p ReadySteadyGo 120 -2.32% -5.00% -19.76%
1080p ShakeNDry 120 -11.15% -34.41% -25.59%
1080p YachtRide 120 -18.35% -9.15% -12.17%
2160p Beauty 120 -18.97% -24.83% -38.43%
2160p Bosphorus 120 -27.63% -26.93% -23.90%
2160p Lips 120 -27.12% -34.22% -19.13%
2160p HoneyBee 120 -36.14% -42.37% -28.91%
2160p Jockey 120 -2.92% -2.20% -13.47%
2160p ReadySteadyGo 120 -0.36% -2.92% -18.30%
2160p ShakeNDry 120 -22.82% -28.46% -33.66%
2160p YachtRide 120 -7.01% -4.69% -11.60%
BVI-HFR
1080p catch 120 -7.84% -8.99% -14.88%
1080p golf side 120 -4.10% -3.23% -9.87%
Average (1080p) -8.53% -10.67% -15.61%
Average (2160p) -15.87% -18.20% -21.82%
On average, UHD encoding using
OFPS requires 15.87% fewer bits to
maintain the same PSNR and 18.20%
fewer bits to keep the same VMAF as
compared to the original framerate
encoding.
An overall encoding time reduction of
21.82% is also observed.
34. Online per-title encoding Perceptually-aware bitrate ladder prediction
Perceptual redundancy in bitrate ladder
[Plots: (a) RD curve of the HLS bitrate ladder encoding; (b) the ideal ladder, in which successive quality targets are spaced one JND apart, vt = vt−1 + vJ(vt−1), at bitrates b1 < … < b7 up to vmax.]
Figure: Rate distortion curves of (a) the HLS bitrate ladder encoding of the Characters sequence of the MCML dataset,29 and (b) the ideal bitrate ladder targeted.
Having many perceptually redundant representations in the bitrate ladder may not result in improved quality of experience, but it may lead to increased storage and bandwidth costs.30
29 Manri Cheon and Jong-Seok Lee. “Subjective and Objective Quality Assessment of Compressed 4K UHD Videos for Immersive Experience”. In: IEEE Transactions on Circuits and Systems for Video Technology 28.7 (2018), pp. 1467–1480. doi: 10.1109/TCSVT.2017.2683504.
30 Tianchi Huang, Rui-Xiao Zhang, and Lifeng Sun. “Deep Reinforced Bitrate Ladders for Adaptive Video Streaming”. In: NOSSDAV ’21. Istanbul, Turkey: Association for Computing Machinery, 2021, pp. 66–73. doi: 10.1145/3458306.3458873.
35. Online per-title encoding Perceptually-aware bitrate ladder prediction
JND-aware bitrate ladder prediction scheme (JBLS) architecture31
[Diagram: (1) video complexity feature extraction from the input video segment; (2) bitrate ladder prediction from the features, the set of resolutions and framerates, the minimum and maximum bitrate, the maximum VMAF, the target JND function, and the target encoder/codec; followed by CBR encoding of the predicted representations.]
Figure: Online encoding architecture using JBLS for adaptive streaming.
31 V. V. Menon et al. "Perceptually-Aware Per-Title Encoding for Adaptive Video Streaming". In: 2022 IEEE International Conference on Multimedia and Expo (ICME). Los Alamitos, CA, USA: IEEE Computer Society, July 2022, pp. 1–6. doi: 10.1109/ICME52920.2022.9859744.
Online per-title encoding: Perceptually-aware bitrate ladder prediction
First RD point estimation
Figure: Estimation of the first point of the bitrate ladder.
v̂1 is the maximum among the v̂(r,f,b̂1) values output by the prediction models trained for resolutions r1, ..., r_r̃ in R and framerates f1, ..., f_f̃ in F. The resolution-framerate pair corresponding to the VMAF v̂1 is chosen as (r̂1, f̂1).
Step 1:
  b̂1 ← bmin
  Determine v̂(r,f,b̂1) ∀ r ∈ R, f ∈ F
  v̂1 ← max(v̂(r,f,b̂1))
  (r̂1, f̂1) ← arg max over r ∈ R, f ∈ F of v̂(r,f,b̂1)
(r̂1, f̂1, b̂1) is the first point of the bitrate ladder.
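Step 1 can be sketched as below; `predict_vmaf` is a hypothetical callable standing in for the trained VMAF prediction models, not the actual model interface:

```python
def first_rd_point(predict_vmaf, resolutions, framerates, b_min):
    """Step 1: at bitrate b_min, pick the (resolution, framerate) pair with
    the highest predicted VMAF; that triple is the ladder's first point."""
    r1, f1 = max(
        ((r, f) for r in resolutions for f in framerates),
        key=lambda rf: predict_vmaf(rf[0], rf[1], b_min),
    )
    return r1, f1, b_min
```

At the lowest bitrate, lower resolutions typically score higher (less blocking at the same bits), so the maximization naturally lands on a downscaled representation for the first rung.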
Online per-title encoding: Perceptually-aware bitrate ladder prediction
Remaining RD points estimation
Figure: Estimation of the t-th point (t ≥ 2) of the bitrate ladder. log(b̂t) is the minimum among the log(b̂(r,v̂t)) values output by the prediction models trained for resolutions r1, ..., rM. The resolution corresponding to log(b̂t) is chosen as r̂t.
Step 2:
  while b̂t−1 < bmax and v̂t−1 < vmax do
    v̂t ← v̂t−1 + vJ(v̂t−1)
    Determine b̂(r,f,v̂t) ∀ r ∈ R, f ∈ F
    b̂t ← min(b̂(r,f,v̂t))
    (r̂t, f̂t) ← arg min over r ∈ R, f ∈ F of b̂(r,f,v̂t)
    (r̂t, f̂t, b̂t) is the t-th point of the bitrate ladder.
    t ← t + 1
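The Step 2 loop can be sketched in the same spirit; `predict_bitrate` is again a hypothetical stand-in for the trained bitrate prediction models, and `v_jnd` for the target JND function:

```python
def remaining_rd_points(predict_bitrate, resolutions, framerates,
                        first_point, v_jnd, b_max, v_max):
    """Step 2: raise the quality target by one JND per representation and pick
    the (resolution, framerate) pair reaching it at the lowest predicted bitrate."""
    ladder = [first_point]            # entries are (r, f, bitrate, vmaf)
    _, _, b, v = first_point
    while b < b_max and v < v_max:
        v = v + v_jnd(v)              # one JND above the previous target
        r, f = min(((rr, ff) for rr in resolutions for ff in framerates),
                   key=lambda rf: predict_bitrate(rf[0], rf[1], v))
        b = predict_bitrate(r, f, v)
        ladder.append((r, f, b, v))
    return ladder
```

Because each rung is placed at a quality target rather than a fixed bitrate, the resulting ladder contains no two representations closer than one JND, which is exactly what removes the perceptual redundancy discussed earlier.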
Online per-title encoding: Perceptually-aware bitrate ladder prediction
Evaluation of JBLS
Table: Experimental parameters used to evaluate JBLS.
Parameter Symbol Values
Set of resolution heights (in pixels) R {360, 432, 540, 720, 1080, 1440, 2160}
Set of framerates (in fps) F {30}
Set of presets [x265] P {0 (ultrafast)}
Minimum target bitrate (in Mbps) bmin 0.145
Maximum target bitrate (in Mbps) bmax 16.8
Average target JND vJ {2, 4, 6}
Maximum VMAF threshold vmax {98, 96, 94}
Figure: RD curves (VMAF versus bitrate in Mbps) of representative video sequences (segments) using the HLS bitrate ladder CBR encoding (green line) and JBLS encoding (red line): (a) Bunny s000, (b) Characters s000, (c) HoneyBee s000, (d) Wood s000. JND is considered as six VMAF points in these plots.
Online per-title encoding: Perceptually-aware bitrate ladder prediction
Evaluation of JBLS
Table: Average results of JBLS compared to the HLS bitrate ladder CBR encoding.
Method BDRP BDRV BD-PSNR BD-VMAF ∆S ∆T
JBLS (vJ =2)32 -11.06% -16.65% 0.87 dB 2.18 10.18% 105.73%
JBLS (vJ =4) -10.44% -15.13% 0.91 dB 2.39 -27.03% 10.19%
JBLS (vJ =6)33 -12.94% -17.94% 0.94 dB 2.32 -42.48% -25.35%
Live streaming using JBLS requires 12.94% fewer bits to maintain the same PSNR and
17.94% fewer bits to maintain the same VMAF compared to the reference HLS bitrate
ladder.
The improvement in compression efficiency is achieved with an average storage reduction of 42.48% and an average encoding time reduction of 25.35% compared to HLS bitrate ladder CBR encoding, considering a JND of six VMAF points.
32 Andreas Kah et al. "Fundamental relationships between subjective quality, user acceptance, and the VMAF metric for a quality-based bit-rate ladder design for over-the-top video streaming services". In: Applications of Digital Image Processing XLIV. Vol. 11842. SPIE, 2021, 118420Z. doi: 10.1117/12.2593952.
33 Jan Ozer. "Finding the Just Noticeable Difference with Netflix VMAF". Streaming Learning Center, Sept. 2017. url: https://streaminglearningcenter.com/codecs/finding-the-just-noticeable-difference-with-netflix-vmaf.html.
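The BDRP and BDRV columns above are Bjøntegaard-delta rates. A sketch of the standard computation (cubic fit of log-bitrate over quality, integrated over the overlapping quality range); this is the textbook method, not code from this work:

```python
import numpy as np

def bd_rate(rates_ref, quals_ref, rates_test, quals_test):
    """Bjontegaard delta rate: average bitrate difference (in percent) of the
    test RD curve versus the reference curve at equal quality."""
    p_ref = np.polyfit(quals_ref, np.log(rates_ref), 3)
    p_test = np.polyfit(quals_test, np.log(rates_test), 3)
    lo = max(min(quals_ref), min(quals_test))   # overlapping quality range
    hi = min(max(quals_ref), max(quals_test))
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    avg_log_diff = (int_test - int_ref) / (hi - lo)
    return (np.exp(avg_log_diff) - 1) * 100
```

A negative value means the test ladder needs fewer bits for the same quality; for instance, the −17.94% BDRV of JBLS (vJ = 6) above is an average 17.94% bitrate saving at equal VMAF.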
Live variable bitrate encoding
Live variable bitrate encoding: Two-pass encoding
Two-pass encoding
Figure: Two-pass encoding architecture. The first pass analyzes the input video and produces encoder statistics; analysis of the first-pass statistics sets the encoder control parameters for the second pass, which outputs the final bitstream.
Two-pass encoding introduces adaptability and content awareness into the encoding process.34
In the first pass, the encoder analyzes the entire video sequence to gain insights into its
complexity, motion, and spatial detail.
In the second pass, based on the insights from the first pass, the encoder dynamically adjusts
the bitrate allocation for each segment, prioritizing quality where needed and optimizing
compression elsewhere.35
34 Chengsheng Que, Guobin Chen, and Jilin Liu. "An Efficient Two-Pass VBR Encoding Algorithm for H.264". In: 2006 International Conference on Communications, Circuits and Systems. Vol. 1. 2006, pp. 118–122. doi: 10.1109/ICCCAS.2006.284599.
35 Ivan Zupancic et al. "Two-pass rate control for UHDTV delivery with HEVC". In: 2016 Picture Coding Symposium (PCS). 2016, pp. 1–5. doi: 10.1109/PCS.2016.7906322.
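The second-pass idea can be illustrated with a toy budget allocator in which bits follow first-pass complexity. This is only a conceptual sketch, not the rate-control algorithm of x265 or any real encoder:

```python
def second_pass_allocation(complexities, total_bitrate):
    """Distribute a bitrate budget over segments proportionally to their
    first-pass complexity estimates (toy model of two-pass VBR)."""
    total = sum(complexities)
    return [total_bitrate * c / total for c in complexities]
```

A high-motion segment with twice the complexity of its neighbors receives twice their bitrate, instead of the flat allocation a single-pass CBR encoder would apply.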
Live variable bitrate encoding: Two-pass encoding
LiveVBR architecture36
Figure: Live encoding architecture featuring LiveVBR envisioned in this chapter. (1) Video complexity feature extraction from the input video segment (first pass); (2) perceptually-optimized bitrate ladder prediction from the extracted features, the sets of resolutions and framerates, the minimum and maximum bitrates, the target JND function, and the maximum VMAF; (3) optimized CRF prediction for the predicted representations; (4) cVBR encoding with the target encoder/codec (second pass).
36 Vignesh V Menon et al. "JND-aware Two-pass Per-title Encoding Scheme for Adaptive Live Streaming". In: IEEE Transactions on Circuits and Systems for Video Technology (2023). doi: 10.1109/TCSVT.2023.3290725.
Live variable bitrate encoding: Optimized CRF prediction
LiveVBR
cVBR encoding of the bitrate ladder37
Figure: Optimized CRF estimation for the t-th representation to achieve the target bitrate b̂t, using a prediction model trained for resolution r̂t and framerate f̂t.
The optimized CRF is determined for the selected (r, b, f) pairs.
cVBR encoding is performed for the resulting (r, b, f, c) pairs.
37 Vignesh V Menon et al. "ETPS: Efficient Two-Pass Encoding Scheme for Adaptive Live Streaming". In: 2022 IEEE International Conference on Image Processing (ICIP). 2022, pp. 1516–1520. doi: 10.1109/ICIP46576.2022.9897768.
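Inverting a rate model to obtain the CRF can be sketched as a search over the CRF range; `rate_of_crf` is a hypothetical callable standing in for the trained prediction model for (r̂t, f̂t), and the default range reflects x265's CRF scale:

```python
def crf_for_target_bitrate(rate_of_crf, target_bitrate, crf_range=(10, 51)):
    """Pick the integer CRF whose predicted bitrate is closest to the target,
    assuming rate_of_crf(crf) decreases monotonically with CRF."""
    lo, hi = crf_range
    return min(range(lo, hi + 1),
               key=lambda crf: abs(rate_of_crf(crf) - target_bitrate))
```

Encoding each rung in cVBR mode with this per-representation CRF (plus a bitrate cap) keeps quality consistent within a segment while honoring the ladder's bitrate targets.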
Live variable bitrate encoding: Optimized CRF prediction
Evaluation of LiveVBR
Table: Input parameters of LiveVBR used in the experiments.
Parameter Symbol Values
Set of resolutions (height in pixels) R { 360, 432, 540, 720, 1080, 1440, 2160 }
Set of framerates (in fps) F { 30 }
Set of presets P { ultrafast }
Minimum bitrate (in Mbps) bmin 0.145
Maximum bitrate (in Mbps) bmax 16.80
Average target JND vJ {2, 4, 6}
Maximum VMAF threshold vmax {98, 96, 94}
Table: Comparison of other per-title encoding methods with LiveVBR, regarding the target scenario, number of pre-encodings, encoding type, and the additional computational overhead to determine the convex hull.
Method Target scenario Number of pre-encodings Encoding type ∆TC
Bruteforce38 VoD r̃ × c̃ cVBR 4596.77%
Katsenou et al.39 VoD (r̃ − 1) × 2 CQP 120.57%
FAUST40 VoD 1 CBR 48.65%
Bhat et al.41 VoD 1 CBR 67.82%
ORPS Live 0 CBR 0.30%
JBLS Live 0 CBR 0.33%
LiveVBR Live 0 cVBR 0.41%
38 De Cock et al., "Complexity-based consistent-quality encoding in the cloud".
39 A. V. Katsenou, J. Sole, and D. R. Bull. "Content-gnostic Bitrate Ladder Prediction for Adaptive Video Streaming". In: 2019 Picture Coding Symposium (PCS). 2019. doi: 10.1109/PCS48520.2019.8954529.
40 Anatoliy Zabrovskiy et al. "FAUST: Fast Per-Scene Encoding Using Entropy-Based Scene Detection and Machine Learning". In: 2021 30th Conference of Open Innovations Association FRUCT. 2021, pp. 292–302. doi: 10.23919/FRUCT53335.2021.9599963.
41 M. Bhat, Jean-Marc Thiesse, and Patrick Le Callet. "A Case Study of Machine Learning Classifiers for Real-Time Adaptive Resolution Prediction in Video Coding". In: 2020 IEEE International Conference on Multimedia and Expo (ICME). 2020, pp. 1–6. doi: 10.1109/ICME46284.2020.9102934.
Live variable bitrate encoding: Optimized CRF prediction
Evaluation of LiveVBR
Table: Average results of the encoding schemes compared to the HLS bitrate ladder CBR encoding.
Method BDRP BDRV BD-PSNR BD-VMAF ∆S ∆T
Bruteforce (vJ =2) -23.09% -43.23% 1.34 dB 10.61 -25.99% 4732.33%
Bruteforce (vJ =4) -28.15% -42.75% 1.70 dB 10.08 -59.07% 4732.33%
Bruteforce (vJ =6) -25.36% -40.73% 1.67 dB 9.19 -70.50% 4732.33%
ORPS CBR -17.28% -22.79% 0.98 dB 3.79 0.07% 15.74%
JBLS (vJ =2) -11.06% -16.65% 0.87 dB 2.18 10.18% 105.73%
JBLS (vJ =4) -10.44% -15.13% 0.91 dB 2.39 -27.03% 10.19%
JBLS (vJ =6) -12.94% -17.94% 0.94 dB 2.32 -42.48% -25.35%
HLS cVBR -35.25% -32.33% 2.09 dB 6.59 -9.39% 1.64%
ORPS cVBR -34.42% -42.67% 2.90 dB 9.51 -1.34% 62.73%
LiveVBR (vJ =2) -14.25% -29.14% 1.36 dB 7.82 23.57% 184.62%
LiveVBR (vJ =4) -18.41% -32.48% 1.41 dB 8.31 -56.38% 26.14%
LiveVBR (vJ =6) -18.80% -32.59% 1.34 dB 8.34 -68.96% -18.58%
Live variable bitrate encoding: Optimized CRF prediction
Evaluation of LiveVBR
Figure: RD curves (VMAF versus bitrate in Mbps) of representative video sequences (segments) using the considered encoding schemes (Bruteforce, HLS CBR, ORPS CBR, JBLS, HLS cVBR, ORPS cVBR, and LiveVBR): (a) Bunny s000, (b) Characters s000, (c) HoneyBee s000, (d) Wood s000. JND is considered as six VMAF points in these plots.
Conclusions and Future Directions
Conclusions and Future Directions: Contributions
Contributions
Video complexity analysis:
Efficient DCT-energy-based spatial and temporal complexity features are proposed to analyze video complexity accurately and quickly. These features are suitable for live-streaming applications, as they are low-complexity and correlate significantly with video coding parameters.
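As a rough illustration of a DCT-energy texture feature (a simplified stand-in; VCA's actual block size, coefficient weighting, and temporal features differ), the non-DC DCT energy of luma blocks can be computed as:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)
    m = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n)) * np.sqrt(2 / n)
    m[0] /= np.sqrt(2)
    return m

def spatial_texture_energy(luma, block=32):
    """Average non-DC 2-D DCT energy over block x block luma tiles: a toy
    spatial complexity feature (flat areas score ~0, detailed areas high)."""
    d = dct_matrix(block)
    h, w = luma.shape
    energies = []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            coeffs = d @ luma[y:y + block, x:x + block] @ d.T
            energies.append(np.sum(coeffs ** 2) - coeffs[0, 0] ** 2)  # drop DC
    return float(np.mean(energies))
```

A flat frame scores near zero while a noisy or textured frame scores high, which is the kind of signal used to steer resolution, framerate, and bitrate decisions throughout this work.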
Online per-title encoding optimizations:
Online resolution prediction scheme (ORPS) predicts the optimized resolution yielding the highest perceptual quality, using the video content complexity of the segment and the predefined set of target bitrates.
Online framerate prediction scheme (OFPS) predicts the optimized framerate yielding the highest perceptual quality, using the video content complexity of the segment and the predefined set of target bitrates.
Conclusions and Future Directions: Contributions
Contributions
Just noticeable difference (JND)-aware bitrate ladder prediction scheme (JBLS) predicts
optimized bitrate-resolution-framerate pairs such that there is a perceptual quality difference
of one JND between representations.
Constrained variable bitrate (cVBR) implementation of JBLS, i.e., LiveVBR, yields average bitrate reductions of 18.80% and 32.59% for the same PSNR and VMAF, respectively, compared to the HLS CBR bitrate ladder encoding using x265. For a target JND of six VMAF points, LiveVBR resulted in a 68.96% reduction in storage space and an 18.58% reduction in encoding time, with negligible impact on streaming latency.
Conclusions and Future Directions: Reproducibility
Reproducibility
VCA is available at https://github.com/cd-athena/VCA. This initiative translates the
proposed video complexity analysis into a practical open-source implementation.
The open-source Python code of LiveVBR is available at https://github.com/cd-athena/LiveVBR.
Figure: Content-adaptive encoding using VCA. Video complexity analysis extracts features from the input video segment; the encoder, or another application, uses these features to produce the encoded bitstream.
Conclusions and Future Directions: Limitations
Limitations
1 Dynamic network conditions: The interplay between content complexity and real-time network fluctuations is not extensively addressed.
2 Generalization across video genres: Generalizing the framework to highly specialized genres or unique content types may present challenges.
3 Real-time implementation challenges: While developed and evaluated offline, the content-adaptive video coding framework poses challenges for real-time implementation, given computational efficiency and latency constraints.
4 Subjective quality assessment: Incorporating subjective quality assessment methods, such as user studies or crowdsourced evaluations, could offer a more comprehensive understanding of the framework's impact on viewer satisfaction.
Conclusions and Future Directions: Future directions
Future directions
1 Address the escalated runtime complexity inherent in encoding representations for multiple codecs and representations, while upholding the compression efficiency of the system.
2 Achieve zero-latency encoding in adaptive live-streaming scenarios using new-generation codecs by synergizing dynamic resolution, bitrate, framerate, and encoding resource configuration.42
3 Extend the per-title encoding schemes proposed in this dissertation to scenarios involving transcoding in networking servers.43
42 Vignesh V Menon et al. "Content-adaptive Encoder Preset Prediction for Adaptive Live Streaming". In: 2022 Picture Coding Symposium (PCS). 2022, pp. 253–257. doi: 10.1109/PCS56426.2022.10018034.
43 Reza Farahani. "CDN and SDN Support and Player Interaction for HTTP Adaptive Video Streaming". In: Proceedings of the 12th ACM Multimedia Systems Conference. Istanbul, Turkey: Association for Computing Machinery, 2021, pp. 398–402. isbn: 9781450384346. doi: 10.1145/3458305.3478464.
Q & A
Thank you for your attention!
Vignesh V Menon (vignesh.menon@ieee.org)