Transcoding Quality Prediction for Adaptive Video Streaming
Vignesh V Menon¹, Reza Farahani¹, Prajit T Rajendran², Mohammad Ghanbari³, Hermann Hellwagner¹, Christian Timmerer¹
¹ Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität, Klagenfurt, Austria
² CEA, List, F-91120 Palaiseau, Université Paris-Saclay, France
³ School of Computer Science and Electronic Engineering, University of Essex, UK
08 May 2023
Outline
1 Introduction
2 M-stage transcoding model
3 TQPM Architecture
4 Evaluation
5 Conclusions
Introduction
HTTP Adaptive Streaming (HAS) [1]
Source: https://bitmovin.com/adaptive-streaming/
Why Adaptive Streaming?
Adapt for a wide range of devices.
Adapt for a broad set of Internet speeds.
What does HAS do?
Each source video is split into segments.
Each segment is encoded at multiple bitrates, resolutions, and codecs.
Segments are delivered to the client based on device capability, network speed, etc.
[1] A. Bentaleb et al. “A Survey on Bitrate Adaptation Schemes for Streaming Media Over HTTP”. In: IEEE Communications Surveys & Tutorials 21.1 (2019), pp. 562–585.
Motivation
Video transcoding has been considered a prevalent solution for reconstructing video sequences at
in-network servers (deployed at cloud or edge) in latency-sensitive video streaming applications.
Figure: An example scenario of video quality assessment (VQA) in adaptive streaming applications. The origin server encodes the original video segment into the highest bitrate representation (2160p, 16.8 Mbps); the edge server transcodes it into lower bitrate representations (2160p at 11.6 Mbps, 1080p at 5.8 Mbps, and 720p at 2.4 Mbps). Clients A and B receive the highest bitrate representation of the bitrate ladder, encoded at the origin server (single-stage transcoding). In contrast, Clients C, D, and E receive lower bitrate representations transcoded at the edge server (two-stage transcoding).
Motivation
Figure: Workflow of state-of-the-art VQA methods. Features are extracted from both the input video and the reconstructed video (the output of video processing), and a processor derives the quality scores from the two feature sets.
VQA is cumbersome in most video streaming applications where:
The original input video segment is not available as the reference at the destination
The final reconstructed video segment is not available at the source
Slow quality-based decision-making is not acceptable for online latency-sensitive services.
M-stage transcoding model
[M-stage transcoding system model: input video segment → e_1, d_1 (Stage 1, target bitrate b̃_1) → e_2, d_2 (Stage 2, target bitrate b̃_2) → ... → e_M, d_M (Stage M, target bitrate b̃_M) → reconstructed video segment]
Figure: M-stage transcoding model considered in this paper. Here, e_i and d_i represent the encoding and decoding in the i-th stage of transcoding, while b̃_i denotes the target bitrate of e_i, where i ∈ [1, M].
The generalized M-stage transcoding model for HAS consists of a chain of M encoders and M decoders.
M=1 corresponds to single-stage transcoding, while M=2 corresponds to two-stage transcoding.
\[ \tau_T = \sum_{i=1}^{M} \left( \tau_{e_i} + \tau_{d_i} \right) + 2 \cdot \tau_f \qquad (1) \]
Performing VQA at the source, by predicting the video quality from the input video segment characteristics and the transcoding system characteristics, addresses the problems discussed above.
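As a minimal sketch, Eq. (1) can be evaluated as below. The function name and the example timings are hypothetical, and τ_f is passed through as an opaque value because the slides do not define it.

```python
def total_time(enc_times, dec_times, tau_f):
    """Evaluate Eq. (1): per-stage encode + decode times, plus 2 * tau_f."""
    if len(enc_times) != len(dec_times):
        raise ValueError("need one encoder time and one decoder time per stage")
    return sum(e + d for e, d in zip(enc_times, dec_times)) + 2 * tau_f


# Hypothetical timings in seconds for a two-stage (M = 2) chain.
tau_T = total_time(enc_times=[1.0, 0.5], dec_times=[0.25, 0.25], tau_f=0.5)
print(tau_T)  # 3.0
```

The sum grows with every extra stage, which is why waiting for the final reconstructed segment before assessing quality is costly in a latency-sensitive chain.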
TQPM Architecture
Figure: TQPM architecture. (I) Input video segment characteristics: the E, h, and L features of the T chunks of the input video segment. (II) M-stage transcoding model characteristics: the target bitrates b̃_1, b̃_2, ..., b̃_M. Both are combined into a [T x (M+3)] input X̃ to the video quality prediction model, which outputs the predicted quality v̂.
The TQPM architecture comprises three steps:
input video segment characterization
transcoding model characterization
video quality prediction
Phase 1: Input video segment characterization
Compute texture energy per block
A DCT-based energy function is used to determine the block-wise texture of each frame, defined as:
\[ H_k = \sum_{i=0}^{w-1} \sum_{j=0}^{w-1} e^{\left| \left( \frac{ij}{wh} \right)^2 - 1 \right|} \, |DCT(i, j)| \qquad (2) \]
where w × w is the size of the block (so the product wh in the exponent equals w^2), and DCT(i, j) is the (i, j)-th DCT component when i + j > 0, and 0 otherwise.
The energy values of blocks in a frame are averaged to determine the energy per frame [2, 3]:
\[ E_s = \sum_{k=0}^{K-1} \frac{H_{s,k}}{K \cdot w^2} \qquad (3) \]
where H_{s,k} is the energy of block k in frame s.
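Eqs. (2) and (3) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the VCA implementation: the DCT is taken to be an orthonormal 2D DCT-II (the slides do not give the normalization), the exponent's denominator is taken as w² for a square block, and all function names are hypothetical.

```python
import math


def dct2(block):
    """Orthonormal 2D DCT-II of a square block (list of rows). Assumed normalization."""
    w = len(block)
    scale = [math.sqrt(1.0 / w)] + [math.sqrt(2.0 / w)] * (w - 1)
    # basis[u][x] = cos((2x + 1) * u * pi / (2w))
    basis = [[math.cos((2 * x + 1) * u * math.pi / (2 * w)) for x in range(w)]
             for u in range(w)]
    out = [[0.0] * w for _ in range(w)]
    for u in range(w):
        for v in range(w):
            acc = 0.0
            for x in range(w):
                for y in range(w):
                    acc += block[x][y] * basis[u][x] * basis[v][y]
            out[u][v] = scale[u] * scale[v] * acc
    return out


def block_energy(block):
    """Eq. (2): weighted sum of |DCT(i, j)|, with the DC term (i = j = 0) excluded."""
    w = len(block)
    coeffs = dct2(block)
    energy = 0.0
    for i in range(w):
        for j in range(w):
            if i + j == 0:
                continue  # DCT(0, 0) counts as 0 in Eq. (2)
            weight = math.exp(abs((i * j / (w * w)) ** 2 - 1))
            energy += weight * abs(coeffs[i][j])
    return energy


def frame_energy(blocks):
    """Eq. (3): sum of the K block energies divided by K * w^2."""
    K = len(blocks)
    w = len(blocks[0])
    return sum(block_energy(b) for b in blocks) / (K * w * w)


# A flat block has (numerically) no texture energy; a checkerboard block has plenty.
flat = [[5.0] * 4 for _ in range(4)]
checker = [[255.0 if (x + y) % 2 else 0.0 for y in range(4)] for x in range(4)]
print(block_energy(flat) < 1e-9, block_energy(checker) > 0)
```

The direct four-loop DCT is only meant to keep the sketch dependency-free; a real extractor would use an optimized transform.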
[2] Michael King, Zinovi Tauber, and Ze-Nian Li. “A New Energy Function for Segmentation and Compression”. In: 2007 IEEE International Conference on Multimedia and Expo. 2007, pp. 1647–1650. doi: 10.1109/ICME.2007.4284983.
[3] Vignesh V Menon et al. “Efficient Content-Adaptive Feature-Based Shot Detection for HTTP Adaptive Streaming”. In: 2021 IEEE International Conference on Image Processing (ICIP). 2021, pp. 2174–2178. doi: 10.1109/ICIP42928.2021.9506092.
h_s is the sum of absolute differences (SAD) between the block-level energy values of frame s and those of the previous frame s − 1:
\[ h_s = \sum_{k=0}^{K-1} \frac{\left| H_{s,k} - H_{s-1,k} \right|}{K \cdot w^2} \qquad (4) \]
where K denotes the number of blocks in frame s.
The luminescence of the non-overlapping blocks k of the s-th frame is defined as:
\[ L_{s,k} = \sqrt{DCT(0, 0)} \qquad (5) \]
The block-wise luminescence is averaged per frame, denoted as L_s [4]:
\[ L_s = \sum_{k=0}^{K-1} \frac{L_{s,k}}{K \cdot w^2} \qquad (6) \]
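Given per-block energies and DC coefficients, Eqs. (4) to (6) reduce to a few lines. The sketch below reads Eq. (4) as a sum of absolute differences, uses hypothetical function names and input values, and assumes the DC coefficients are non-negative (as they are for non-negative pixel values).

```python
import math


def temporal_feature(H_curr, H_prev, w):
    """Eq. (4): SAD of block energies between frame s and frame s-1, over K * w^2."""
    K = len(H_curr)
    if len(H_prev) != K:
        raise ValueError("both frames must have the same number of blocks")
    return sum(abs(a - b) for a, b in zip(H_curr, H_prev)) / (K * w * w)


def frame_luminescence(dc_coeffs, w):
    """Eqs. (5)-(6): L_{s,k} = sqrt(DCT(0, 0)) per block, summed and divided by K * w^2."""
    K = len(dc_coeffs)
    return sum(math.sqrt(dc) for dc in dc_coeffs) / (K * w * w)


# Hypothetical values for a frame with K = 2 blocks of size 2 x 2.
h_s = temporal_feature(H_curr=[8.0, 4.0], H_prev=[6.0, 8.0], w=2)
L_s = frame_luminescence(dc_coeffs=[16.0, 64.0], w=2)
print(h_s, L_s)  # 0.75 1.5
```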
[4] Vignesh V Menon et al. “VCA: Video Complexity Analyzer”. In: Proceedings of the 13th ACM Multimedia Systems Conference. MMSys ’22. Athlone, Ireland: Association for Computing Machinery, 2022, pp. 259–264. isbn: 9781450392839. doi: 10.1145/3524273.3532896. url: https://doi.org/10.1145/3524273.3532896.
The video segment is divided into T chunks with a fixed number of frames, f_c, in each chunk. The averages of the E, h, and L features of each chunk are computed to obtain the reduced reference representation of the input video segment, expressed as:
\[ X = \{x_1, x_2, \ldots, x_T\} \qquad (7) \]
where x_i is the feature set of the i-th chunk, represented as:
\[ x_i = [E_i, h_i, L_i] \quad \forall i \in [1, T] \qquad (8) \]
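The chunk averaging of Eqs. (7) and (8) might look like the sketch below. The function name is hypothetical, and dropping trailing frames that do not fill a whole chunk is an assumption; the slides do not say how a remainder is handled.

```python
def chunk_features(E, h, L, fc):
    """Average per-frame E, h, L over consecutive chunks of fc frames (Eqs. 7-8)."""
    if not (len(E) == len(h) == len(L)):
        raise ValueError("E, h, and L must have one value per frame")
    T = len(E) // fc  # assumption: frames beyond the last full chunk are ignored
    X = []
    for i in range(T):
        lo, hi = i * fc, (i + 1) * fc
        X.append([sum(E[lo:hi]) / fc, sum(h[lo:hi]) / fc, sum(L[lo:hi]) / fc])
    return X


# Hypothetical per-frame features for a 4-frame segment, split into chunks of 2 frames.
X = chunk_features(E=[1, 3, 5, 7], h=[0, 2, 4, 6], L=[10, 10, 20, 20], fc=2)
print(X)  # [[2.0, 1.0, 10.0], [6.0, 5.0, 20.0]]
```

Averaging per chunk keeps the sequence length at T regardless of the frame count, which is what makes X a compact reduced reference.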
Phase 2: Transcoding model characterization
The settings of the encoders in the M-stage transcoding process, except the target bitrate-resolution pair, are assumed identical [5].
The resolutions corresponding to the target bitrates in the bitrate ladder are also assumed to be fixed.
The transcoding model can be characterized as follows:
\[ \tilde{B} = [\tilde{b}_1, \tilde{b}_2, \ldots, \tilde{b}_M] \qquad (9) \]
where b̃_i represents the target bitrate of the encoder e_i.
[5] Vignesh V Menon et al. “EMES: Efficient Multi-Encoding Schemes for HEVC-Based Adaptive Bitrate Streaming”. In: ACM Trans. Multimedia Comput. Commun. Appl. 19.3s (2023). issn: 1551-6857. doi: 10.1145/3575659. url: https://doi.org/10.1145/3575659.
Phase 3: Video quality prediction
B̃ is appended to x_i, which is determined during the input video segment characterization phase, to obtain:
\[ \tilde{x}_i = [x_i \,|\, \tilde{B}]^T \quad \forall \tilde{x}_i \in \tilde{X},\ i \in [1, T] \qquad (10) \]
The predicted quality v̂ for the chain of target bitrates b̃_1, ..., b̃_M can be presented as:
\[ \hat{v}_{\tilde{b}_M | \ldots | \tilde{b}_1} = f(\tilde{X}) \qquad (11) \]
The feature sequences in the series X̃ are input to the LSTM model, which predicts the visual quality v̂ for the corresponding input video segment and chain of encoders in the transcoding process.
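Building the model input of Eq. (10) amounts to appending the same bitrate vector to every chunk's feature set, which yields T rows of M + 3 values. The sketch below shows only that step, with a hypothetical function name and hypothetical values. Any scaling of the bitrates and the LSTM itself are not covered by the slides, so they are left out.

```python
def build_model_input(X, B):
    """Eq. (10): append the bitrate vector B to every chunk feature set x_i."""
    return [list(x) + list(B) for x in X]


X = [[2.0, 1.0, 10.0], [6.0, 5.0, 20.0]]  # T = 2 chunks of [E, h, L]
B = [16.8, 5.8]                           # M = 2 target bitrates in Mbps
X_tilde = build_model_input(X, B)

# Each of the T rows now has M + 3 entries.
print(len(X_tilde), len(X_tilde[0]))  # 2 5
```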
Evaluation
Experimental Setup
Dataset: JVET [6], MCML [7], SJTU [8], Berlin [9], UVG [10], BVI [11]
Framerate: 30 fps
Encoder: x265 v3.5
Preset: ultrafast
Table: Representations considered in this paper.
Representation ID     01     02     03     04     05     06     07     08     09     10     11     12
r (height in pixels)  360    432    540    540    540    720    720    1080   1080   1440   2160   2160
b (in Mbps)           0.145  0.300  0.600  0.900  1.600  2.400  3.400  4.500  5.800  8.100  11.600 16.800
E, h, and L features are extracted using VCA v2.0.
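For experimentation, the ladder in the table above can be held as plain data. The sketch below also enumerates candidate two-stage chains under the assumption that the second stage always targets a lower bitrate than the first. That matches the edge scenario in the motivation, but the slides do not state which pairs were actually evaluated. `LADDER` and `two_stage_chains` are hypothetical names.

```python
# (resolution label in pixels, target bitrate in Mbps), copied from the table above.
LADDER = [
    (360, 0.145), (432, 0.300), (540, 0.600), (540, 0.900),
    (540, 1.600), (720, 2.400), (720, 3.400), (1080, 4.500),
    (1080, 5.800), (1440, 8.100), (2160, 11.600), (2160, 16.800),
]


def two_stage_chains(ladder):
    """List (b1, b2) pairs with b2 < b1. Assumption: stage 2 only transcodes down."""
    bitrates = [b for _, b in ladder]
    return [(b1, b2) for b1 in bitrates for b2 in bitrates if b2 < b1]


chains = two_stage_chains(LADDER)
print(len(chains))  # 66
```

Under this assumption the lowest representation (0.145 Mbps) never appears as a first-stage bitrate, since there is nothing below it to transcode to.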
[6] Jill Boyce et al. JVET-J1010: JVET common test conditions and software reference configurations. July 2018.
[7] Manri Cheon and Jong-Seok Lee. “Subjective and Objective Quality Assessment of Compressed 4K UHD Videos for Immersive Experience”. In: IEEE Transactions on Circuits and Systems for Video Technology 28.7 (2018), pp. 1467–1480. doi: 10.1109/TCSVT.2017.2683504.
[8] Li Song et al. “The SJTU 4K Video Sequence Dataset”. In: Fifth International Workshop on Quality of Multimedia Experience (QoMEX2013) (July 2013).
[9] B. Bross et al. “AHG4 Multiformat Berlin Test Sequences”. In: JVET-Q0791. 2020.
[10] Alexandre Mercat, Marko Viitanen, and Jarno Vanne. “UVG Dataset: 50/120fps 4K Sequences for Video Codec Analysis and Development”. In: Proceedings of the 11th ACM Multimedia Systems Conference. New York, NY, USA: Association for Computing Machinery, 2020, pp. 297–302. isbn: 9781450368452. url: https://doi.org/10.1145/3339825.3394937.
[11] Alex Mackin, Fan Zhang, and David R. Bull. “A study of subjective video quality at various frame rates”. In: 2015 IEEE International Conference on Image Processing (ICIP). 2015, pp. 3407–3411. doi: 10.1109/ICIP.2015.7351436.
Experimental Results
Figure: Scatterplots of the actual quality (x-axis) and predicted quality (y-axis) for M=1 ((a) PSNR in dB, (b) SSIM in dB, and (c) VMAF, respectively) and M=2 transcoding ((d) PSNR in dB, (e) SSIM in dB, and (f) VMAF, respectively). The VMAF panels (c) and (f) mark the JND threshold.
Table: Prediction accuracy of TQPM for M=1 and M=2, for each b̃_1 representation considered in this paper, encoded using the x265 HEVC encoder.
                          PSNR prediction               SSIM prediction               VMAF prediction
                          M=1            M=2            M=1            M=2            M=1          M=2
b̃_1                       R²    MAE      R²    MAE      R²    MAE      R²    MAE      R²    MAE    R²    MAE
b1   360p   0.145 Mbps    0.82  1.20 dB  -     -        0.89  1.08 dB  -     -        0.87  3.35   -     -
b2   432p   0.300 Mbps    0.83  1.19 dB  0.84  1.37 dB  0.89  1.14 dB  0.87  1.34 dB  0.87  3.51   0.76  3.38
b3   540p   0.600 Mbps    0.83  1.19 dB  0.85  1.28 dB  0.88  1.18 dB  0.85  1.21 dB  0.90  4.05   0.84  3.55
b4   540p   0.900 Mbps    0.83  1.19 dB  0.83  1.22 dB  0.86  1.17 dB  0.86  1.11 dB  0.90  3.83   0.89  3.53
b5   540p   1.600 Mbps    0.82  1.22 dB  0.82  1.15 dB  0.84  1.19 dB  0.85  1.38 dB  0.90  3.45   0.90  3.44
b6   720p   2.400 Mbps    0.83  1.26 dB  0.83  1.28 dB  0.82  1.18 dB  0.83  1.57 dB  0.88  2.88   0.91  3.45
b7   720p   3.400 Mbps    0.81  1.30 dB  0.85  1.23 dB  0.83  1.20 dB  0.82  1.35 dB  0.84  2.89   0.94  3.03
b8   1080p  4.500 Mbps    0.84  1.28 dB  0.83  1.28 dB  0.88  1.23 dB  0.82  1.34 dB  0.87  2.28   0.95  3.03
b9   1080p  5.800 Mbps    0.86  1.31 dB  0.87  1.42 dB  0.83  1.29 dB  0.86  1.30 dB  0.87  2.23   0.95  3.34
b10  1440p  8.100 Mbps    0.84  1.39 dB  0.81  1.41 dB  0.87  1.29 dB  0.87  1.32 dB  0.85  2.73   0.96  2.96
b11  2160p  11.600 Mbps   0.79  1.50 dB  0.82  1.31 dB  0.88  1.17 dB  0.84  1.32 dB  0.82  2.58   0.96  3.02
b12  2160p  16.800 Mbps   0.84  1.49 dB  0.79  1.26 dB  0.88  1.19 dB  0.86  1.35 dB  0.86  2.38   0.96  2.99
Average                   0.83  1.31 dB  0.84  1.32 dB  0.85  1.19 dB  0.86  1.33 dB  0.87  3.01   0.91  3.25
The average processing time of TQPM for a 4 s segment is 0.328 s.
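The two accuracy measures reported above can be computed as in the sketch below. It assumes R² is the standard coefficient of determination and MAE is the mean absolute error; the slides do not spell out either definition. The sample scores are hypothetical, not taken from the evaluation.

```python
def mae(actual, predicted):
    """Mean absolute error between actual and predicted quality scores."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical PSNR values in dB.
actual = [30.0, 35.0, 40.0, 45.0]
predicted = [31.0, 34.0, 41.0, 44.0]
print(mae(actual, predicted), r2_score(actual, predicted))
```

MAE is in the unit of the metric (dB for PSNR and SSIM here, VMAF points for VMAF), while R² is unitless, so only R² is directly comparable across the three metrics.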
Conclusions
This paper proposed TQPM, an online transcoding quality prediction model for video streaming applications.
The proposed LSTM-based model uses DCT-energy-based features as a reduced reference to characterize the input video segment, which is then used to predict the visual quality of an M-stage transcoding process.
The performance of TQPM is validated on Apple HLS bitrate ladder encoding and transcoding using the open-source x265 HEVC encoder.
On average, for single-stage transcoding, TQPM predicts PSNR, SSIM, and VMAF with an MAE of 1.31 dB, 1.19 dB, and 3.01, respectively.
For two-stage transcoding, PSNR, SSIM, and VMAF are predicted with an average MAE of 1.32 dB, 1.33 dB, and 3.25, respectively.
Future Directions
In the future, transcoding between bitrate ladder representations of various codecs shall be
investigated.
Another future direction is defining a decision-making component based on the proposed
model in an end-to-end live streaming system.
Q & A
Thank you for your attention!
Vignesh V Menon (vignesh.menon@aau.at)

NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptxNEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
 
REASIGNACION 2024 UGEL CHUPACA 2024 UGEL CHUPACA.pdf
REASIGNACION 2024 UGEL CHUPACA 2024 UGEL CHUPACA.pdfREASIGNACION 2024 UGEL CHUPACA 2024 UGEL CHUPACA.pdf
REASIGNACION 2024 UGEL CHUPACA 2024 UGEL CHUPACA.pdf
 
Level 3 NCEA - NZ: A Nation In the Making 1872 - 1900 SML.ppt
Level 3 NCEA - NZ: A  Nation In the Making 1872 - 1900 SML.pptLevel 3 NCEA - NZ: A  Nation In the Making 1872 - 1900 SML.ppt
Level 3 NCEA - NZ: A Nation In the Making 1872 - 1900 SML.ppt
 
SWOT analysis in the project Keeping the Memory @live.pptx
SWOT analysis in the project Keeping the Memory @live.pptxSWOT analysis in the project Keeping the Memory @live.pptx
SWOT analysis in the project Keeping the Memory @live.pptx
 
THE SACRIFICE HOW PRO-PALESTINE PROTESTS STUDENTS ARE SACRIFICING TO CHANGE T...
THE SACRIFICE HOW PRO-PALESTINE PROTESTS STUDENTS ARE SACRIFICING TO CHANGE T...THE SACRIFICE HOW PRO-PALESTINE PROTESTS STUDENTS ARE SACRIFICING TO CHANGE T...
THE SACRIFICE HOW PRO-PALESTINE PROTESTS STUDENTS ARE SACRIFICING TO CHANGE T...
 
Stack Memory Organization of 8086 Microprocessor
Stack Memory Organization of 8086 MicroprocessorStack Memory Organization of 8086 Microprocessor
Stack Memory Organization of 8086 Microprocessor
 
Pharmaceutics Pharmaceuticals best of brub
Pharmaceutics Pharmaceuticals best of brubPharmaceutics Pharmaceuticals best of brub
Pharmaceutics Pharmaceuticals best of brub
 
BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...
BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...
BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...
 
skeleton System.pdf (skeleton system wow)
skeleton System.pdf (skeleton system wow)skeleton System.pdf (skeleton system wow)
skeleton System.pdf (skeleton system wow)
 
A Visual Guide to 1 Samuel | A Tale of Two Hearts
A Visual Guide to 1 Samuel | A Tale of Two HeartsA Visual Guide to 1 Samuel | A Tale of Two Hearts
A Visual Guide to 1 Samuel | A Tale of Two Hearts
 
Juneteenth Freedom Day 2024 David Douglas School District
Juneteenth Freedom Day 2024 David Douglas School DistrictJuneteenth Freedom Day 2024 David Douglas School District
Juneteenth Freedom Day 2024 David Douglas School District
 

TQPM.pdf

  • 1. Transcoding Quality Prediction for Adaptive Video Streaming. Vignesh V Menon (1), Reza Farahani (1), Prajit T Rajendran (2), Mohammad Ghanbari (3), Hermann Hellwagner (1), Christian Timmerer (1). Affiliations: (1) Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität, Klagenfurt, Austria; (2) CEA, List, F-91120 Palaiseau, Université Paris-Saclay, France; (3) School of Computer Science and Electronic Engineering, University of Essex, UK. 08 May 2023.
  • 2. Outline: (1) Introduction; (2) M-stage transcoding model; (3) TQPM Architecture; (4) Evaluation; (5) Conclusions.
  • 3. Introduction: HTTP Adaptive Streaming (HAS) [1]. Source: https://bitmovin.com/adaptive-streaming/
    Why adaptive streaming? It adapts to a wide range of devices and to a broad set of Internet speeds.
    What does HAS do? Each source video is split into segments, encoded at multiple bitrates, resolutions, and codecs, and delivered to the client based on the device capability, network speed, etc.
    [1] A. Bentaleb et al. “A Survey on Bitrate Adaptation Schemes for Streaming Media Over HTTP”. In: IEEE Communications Surveys & Tutorials 21.1 (2019), pp. 562–585.
  • 4. Introduction: Motivation. Video transcoding is considered a prevalent solution for reconstructing video sequences at in-network servers (deployed at the cloud or edge) in latency-sensitive video streaming applications.
    [Figure: An example scenario of video quality assessment (VQA) in adaptive streaming applications. Labels: Original Video Segment, Highest Bitrate Representation, Origin Server, Edge Server, Transcoding, Clients A to E; representations shown: 2160p at 16.8 Mbps, 2160p at 11.6 Mbps, 1080p at 5.8 Mbps, 720p at 2.4 Mbps. Clients A and B receive the highest bitrate representation of the bitrate ladder, encoded at the origin server (single-stage transcoding). In contrast, Clients C, D, and E receive lower bitrate representations transcoded at the edge server (two-stage transcoding).]
  • 5. Introduction: Motivation.
    [Figure: Workflow of state-of-the-art VQA methods. Blocks: Input Video, Video Processing, Reconstructed Video, Features, Processor, Quality Scores.]
    VQA is cumbersome in most video streaming applications, where: (i) the original input video segment is not available as the reference at the destination; (ii) the final reconstructed video segment is not available at the source; (iii) slow quality-based decision-making is not acceptable for online latency-sensitive services.
  • 6. M-stage transcoding model.
    [Figure: M-stage transcoding model considered in this paper. The input video segment passes through Stage 1 to Stage M, each an encoder-decoder pair, to yield the reconstructed video segment.]
    Here, $e_i$ and $d_i$ represent the encoding and decoding in the $i$th stage of transcoding, while $\tilde{b}_i$ denotes the target bitrate of $e_i$, where $i \in [1, M]$. The generalized M-stage transcoding model for HAS consists of a series of M encoders and M decoders in a chain. M=1 corresponds to single-stage transcoding, while M=2 corresponds to two-stage transcoding.
  • 7. M-stage transcoding model.
    [Figure: M-stage transcoding system model, with encoder-decoder pairs $(e_i, d_i)$ and target bitrates $\tilde{b}_i$ for Stage 1 to Stage M.]
      $\tau_T = \sum_{i=1}^{M} (\tau_{e_i} + \tau_{d_i}) + 2 \cdot \tau_f$ (1)
    Performing VQA at the source, by predicting the video quality from the input video segment characteristics and the transcoding system characteristics, solves the problems discussed above.
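The timing relation in Eq. (1) can be illustrated with a short sketch. The function name and all timing values below are hypothetical, and since the slide does not define $\tau_f$, it is treated here as an opaque per-step cost supplied by the caller.

```python
def total_vqa_time(encode_times, decode_times, tau_f):
    """Eq. (1): tau_T = sum_i (tau_e_i + tau_d_i) + 2 * tau_f."""
    if len(encode_times) != len(decode_times):
        raise ValueError("each stage needs one encode time and one decode time")
    return sum(e + d for e, d in zip(encode_times, decode_times)) + 2 * tau_f


# Hypothetical two-stage chain (M = 2); all values are illustrative seconds.
tau_T = total_vqa_time([1.5, 0.75], [0.25, 0.25], tau_f=0.5)
print(tau_T)  # 3.75
```

The sum grows with every added stage, which is the cost a source-side prediction avoids.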
  • 8. TQPM Architecture.
    [Figure: TQPM architecture. (I) Input video segment characteristics: the E, h, and L features of chunks 1 to T of the input video segment; (II) M-stage transcoding model characteristics: the target bitrates $\tilde{b}_1, \tilde{b}_2, \ldots, \tilde{b}_M$. The combined input of size [T x (M+3)] feeds the video quality prediction model, which outputs $\hat{v}$.]
    The TQPM architecture comprises three steps: (1) input video segment characterization; (2) transcoding model characterization; (3) video quality prediction.
  • 9. TQPM Architecture, Phase 1: Input video segment characterization. Compute the texture energy per block: a DCT-based energy function is used to determine the block-wise feature of each frame, defined as:
      $H_k = \sum_{i=0}^{w-1} \sum_{j=0}^{w-1} e^{\left|\left(\frac{ij}{wh}\right)^2 - 1\right|} \, |DCT(i, j)|$ (2)
    where $w \times w$ is the size of the block, and $DCT(i, j)$ is the $(i, j)$th DCT component when $i + j > 0$, and 0 otherwise. The energy values of the blocks in a frame are averaged to determine the energy per frame [2, 3]:
      $E_s = \sum_{k=0}^{K-1} \frac{H_{s,k}}{K \cdot w^2}$ (3)
    where $H_{s,k}$ is the energy $H_k$ of block $k$ in frame $s$.
    [2] Michael King, Zinovi Tauber, and Ze-Nian Li. “A New Energy Function for Segmentation and Compression”. In: 2007 IEEE International Conference on Multimedia and Expo. 2007, pp. 1647–1650. doi: 10.1109/ICME.2007.4284983.
    [3] Vignesh V Menon et al. “Efficient Content-Adaptive Feature-Based Shot Detection for HTTP Adaptive Streaming”. In: 2021 IEEE International Conference on Image Processing (ICIP). 2021, pp. 2174–2178. doi: 10.1109/ICIP42928.2021.9506092.
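A minimal pure-Python sketch of Eqs. (2) and (3) follows. It is not the VCA implementation. It assumes an orthonormal 2-D DCT-II, takes $h = w$ in the exponent weight because the blocks are square, and uses hypothetical function names throughout.

```python
import math


def dct2(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of rows."""
    w = len(block)

    def alpha(u):
        return math.sqrt(1.0 / w) if u == 0 else math.sqrt(2.0 / w)

    # cos_table[u][x] = cos((2x + 1) * u * pi / (2w))
    cos_table = [[math.cos((2 * x + 1) * u * math.pi / (2 * w)) for x in range(w)]
                 for u in range(w)]
    out = [[0.0] * w for _ in range(w)]
    for u in range(w):
        for v in range(w):
            acc = 0.0
            for x in range(w):
                for y in range(w):
                    acc += block[x][y] * cos_table[u][x] * cos_table[v][y]
            out[u][v] = alpha(u) * alpha(v) * acc
    return out


def block_energy(block):
    """Eq. (2): weighted sum of absolute AC DCT coefficients of one block."""
    w = len(block)
    coeffs = dct2(block)
    energy = 0.0
    for i in range(w):
        for j in range(w):
            if i + j == 0:
                continue  # the DC term counts as 0 in Eq. (2)
            # Assumption: h = w, so the weight uses ij / w^2.
            weight = math.exp(abs((i * j / (w * w)) ** 2 - 1))
            energy += weight * abs(coeffs[i][j])
    return energy


def frame_energy(blocks):
    """Eq. (3): E_s = sum_k H_{s,k} / (K * w^2)."""
    K = len(blocks)
    w = len(blocks[0])
    return sum(block_energy(b) for b in blocks) / (K * w * w)


# A flat block has (numerically) zero texture energy; a checkerboard does not.
flat = [[128.0] * 4 for _ in range(4)]
checker = [[255.0 if (x + y) % 2 == 0 else 0.0 for y in range(4)] for x in range(4)]
print(block_energy(flat), block_energy(checker), frame_energy([flat, checker]))
```

The direct quadruple loop is only for readability on small blocks; a production analyzer would use a fast DCT.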
  • 10. TQPM Architecture, Phase 1: Input video segment characterization. $h_s$ is the sum of absolute differences (SAD) between the block-level energy values of frame $s$ and those of the previous frame $s - 1$:
      $h_s = \sum_{k=0}^{K-1} \frac{|H_{s,k} - H_{s-1,k}|}{K \cdot w^2}$ (4)
    where $K$ denotes the number of blocks in frame $s$. The luminescence of non-overlapping block $k$ of the $s$th frame is defined as:
      $L_{s,k} = \sqrt{DCT(0, 0)}$ (5)
    The block-wise luminescence is averaged per frame, denoted as $L_s$ [4]:
      $L_s = \sum_{k=0}^{K-1} \frac{L_{s,k}}{K \cdot w^2}$ (6)
    [4] Vignesh V Menon et al. “VCA: Video Complexity Analyzer”. In: Proceedings of the 13th ACM Multimedia Systems Conference. MMSys ’22. Athlone, Ireland: Association for Computing Machinery, 2022, pp. 259–264. isbn: 9781450392839. doi: 10.1145/3524273.3532896.
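Eqs. (4) to (6) can be sketched as below, reading the SAD in Eq. (4) as a sum of absolute differences between co-located block energies. The function names and sample values are hypothetical, and the DC coefficients are assumed non-negative (as they are for non-negative pixel data).

```python
import math


def temporal_energy(H_curr, H_prev, w):
    """Eq. (4): SAD of block energies of frames s and s-1, divided by K * w^2."""
    if len(H_curr) != len(H_prev):
        raise ValueError("both frames must have the same number of blocks")
    K = len(H_curr)
    return sum(abs(a - b) for a, b in zip(H_curr, H_prev)) / (K * w * w)


def frame_luminescence(dc_coeffs, w):
    """Eqs. (5)-(6): L_{s,k} = sqrt(DCT(0,0)), then sum_k L_{s,k} / (K * w^2)."""
    K = len(dc_coeffs)
    return sum(math.sqrt(dc) for dc in dc_coeffs) / (K * w * w)


# Illustrative values for a frame with K = 4 blocks of size 2 x 2.
H_prev = [8.0, 16.0, 4.0, 12.0]
H_curr = [10.0, 16.0, 1.0, 15.0]
print(temporal_energy(H_curr, H_prev, w=2))               # 0.5
print(frame_luminescence([16.0, 64.0, 16.0, 64.0], w=2))  # 1.5
```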
  • 11. TQPM Architecture, Phase 1: Input video segment characterization. The video segment is divided into $T$ chunks with a fixed number of frames (i.e., $f_c$) in each chunk. The averages of the E, h, and L features of each chunk are computed to obtain the reduced-reference representation of the input video segment, expressed as:
      $X = \{x_1, x_2, \ldots, x_T\}$ (7)
    where $x_i$ is the feature set of the $i$th chunk, represented as:
      $x_i = [E_i, h_i, L_i] \quad \forall i \in [1, T]$ (8)
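The chunk-wise averaging behind Eqs. (7) and (8) might look like the following sketch. The function name is hypothetical, and dropping trailing frames that do not fill a whole chunk is an assumption, since the slide does not say how a remainder is handled.

```python
def chunk_features(E, h, L, fc):
    """Average per-frame E, h, L over consecutive chunks of fc frames (Eqs. 7-8)."""
    if not (len(E) == len(h) == len(L)):
        raise ValueError("E, h and L must have one value per frame")
    T = len(E) // fc  # assumption: an incomplete trailing chunk is dropped
    X = []
    for t in range(T):
        lo, hi = t * fc, (t + 1) * fc
        X.append([sum(E[lo:hi]) / fc, sum(h[lo:hi]) / fc, sum(L[lo:hi]) / fc])
    return X


# Four frames, two frames per chunk, so T = 2 feature sets [E_i, h_i, L_i].
X = chunk_features(E=[1, 3, 5, 7], h=[0, 2, 4, 6], L=[10, 10, 20, 20], fc=2)
print(X)  # [[2.0, 1.0, 10.0], [6.0, 5.0, 20.0]]
```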
  • 12. TQPM Architecture, Phase 2: Transcoding model characterization.
    [Figure: M-stage transcoding system model, with encoder-decoder pairs $(e_i, d_i)$ and target bitrates $\tilde{b}_i$ for Stage 1 to Stage M.]
    The settings of the encoders in the M-stage transcoding process, except the target bitrate-resolution pair, are assumed identical [5]. The resolutions corresponding to the target bitrates in the bitrate ladder are also assumed to be fixed. The transcoding model can be characterized as follows:
      $\tilde{B} = [\tilde{b}_1, \tilde{b}_2, \ldots, \tilde{b}_M]$ (9)
    where $\tilde{b}_i$ represents the target bitrate of the $e_i$ encoder.
    [5] Vignesh V Menon et al. “EMES: Efficient Multi-Encoding Schemes for HEVC-Based Adaptive Bitrate Streaming”. In: ACM Trans. Multimedia Comput. Commun. Appl. 19.3s (2023). issn: 1551-6857. doi: 10.1145/3575659.
  • 13. TQPM Architecture, Phase 3: Video quality prediction. $\tilde{B}$ is appended to $x_i$, which is determined during the input video segment characterization phase, to obtain:
      $\tilde{x}_i = [x_i \,|\, \tilde{B}]^T \quad \forall \tilde{x}_i \in \tilde{X}, \; i \in [1, T]$ (10)
    The predicted quality $\hat{v}_{\tilde{b}_M | \ldots | \tilde{b}_1}$ can be presented as:
      $\hat{v}_{\tilde{b}_M | \ldots | \tilde{b}_1} = f(\tilde{X})$ (11)
    The feature sequences in the series $\tilde{X}$ are input to the LSTM model, which predicts the visual quality $\hat{v}$ for the corresponding input video segment and chain of encoders in the transcoding process.
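Building the model input of Eq. (10) amounts to appending the same bitrate vector to every chunk feature set, giving a sequence of T rows with M+3 entries each. The sketch below uses a hypothetical function name and illustrative values; the LSTM itself and any feature scaling are left out because the slides do not specify them.

```python
def build_model_input(X, B):
    """Eq. (10): append the target-bitrate vector B to every chunk feature set x_i."""
    return [list(x) + list(B) for x in X]


X = [[2.0, 1.0, 10.0], [6.0, 5.0, 20.0]]  # T = 2 chunks of [E_i, h_i, L_i]
B = [16.8, 5.8]                           # hypothetical two-stage chain (M = 2), in Mbps
X_tilde = build_model_input(X, B)
print(len(X_tilde), len(X_tilde[0]))      # 2 5, i.e. T x (M + 3)
```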
  • 14. Evaluation: Experimental Setup.
    Dataset: JVET [6], MCML [7], SJTU [8], Berlin [9], UVG [10], BVI [11]
    Framerate: 30 fps
    Encoder: x265 v3.5
    Preset: ultrafast
    Table: Representations considered in this paper.
    Representation ID      01     02     03     04     05     06     07     08     09     10     11      12
    r (height in pixels)   360    432    540    540    540    720    720    1080   1080   1440   2160    2160
    b (in Mbps)            0.145  0.300  0.600  0.900  1.600  2.400  3.400  4.500  5.800  8.100  11.600  16.800
    The E, h, and L features are extracted using VCA v2.0.
    [6] Jill Boyce et al. JVET-J1010: JVET common test conditions and software reference configurations. July 2018.
    [7] Manri Cheon and Jong-Seok Lee. “Subjective and Objective Quality Assessment of Compressed 4K UHD Videos for Immersive Experience”. In: IEEE Transactions on Circuits and Systems for Video Technology 28.7 (2018), pp. 1467–1480. doi: 10.1109/TCSVT.2017.2683504.
    [8] Li Song et al. “The SJTU 4K Video Sequence Dataset”. In: Fifth International Workshop on Quality of Multimedia Experience (QoMEX2013) (July 2013).
    [9] B. Bross et al. “AHG4 Multiformat Berlin Test Sequences”. In: JVET-Q0791. 2020.
    [10] Alexandre Mercat, Marko Viitanen, and Jarno Vanne. “UVG Dataset: 50/120fps 4K Sequences for Video Codec Analysis and Development”. In: Proceedings of the 11th ACM Multimedia Systems Conference. New York, NY, USA: Association for Computing Machinery, 2020, pp. 297–302. isbn: 9781450368452. url: https://doi.org/10.1145/3339825.3394937.
    [11] Alex Mackin, Fan Zhang, and David R. Bull. “A study of subjective video quality at various frame rates”. In: 2015 IEEE International Conference on Image Processing (ICIP). 2015, pp. 3407–3411. doi: 10.1109/ICIP.2015.7351436.
  • 15. Evaluation: Experimental Results.
    [Figure: Scatterplots of the actual quality (x-axis) and predicted quality (y-axis) for M=1 ((a) PSNR in dB, (b) SSIM in dB, and (c) VMAF, respectively) and M=2 transcoding ((d) PSNR in dB, (e) SSIM in dB, and (f) VMAF, respectively). The VMAF panels (c) and (f) mark a JND threshold.]
  • 16. Evaluation: Experimental Results.
    Table: Prediction accuracy of TQPM when M=1 and M=2, respectively, for the b̃1 representations considered in this paper, encoded using the x265 HEVC encoder. MAE is in dB for PSNR and SSIM.
                                 PSNR prediction         SSIM prediction         VMAF prediction
                                 M=1         M=2         M=1         M=2         M=1         M=2
    b̃1                           R2    MAE   R2    MAE   R2    MAE   R2    MAE   R2    MAE   R2    MAE
    b1       360p   0.145 Mbps   0.82  1.20  -     -     0.89  1.08  -     -     0.87  3.35  -     -
    b2       432p   0.300 Mbps   0.83  1.19  0.84  1.37  0.89  1.14  0.87  1.34  0.87  3.51  0.76  3.38
    b3       540p   0.600 Mbps   0.83  1.19  0.85  1.28  0.88  1.18  0.85  1.21  0.90  4.05  0.84  3.55
    b4       540p   0.900 Mbps   0.83  1.19  0.83  1.22  0.86  1.17  0.86  1.11  0.90  3.83  0.89  3.53
    b5       540p   1.600 Mbps   0.82  1.22  0.82  1.15  0.84  1.19  0.85  1.38  0.90  3.45  0.90  3.44
    b6       720p   2.400 Mbps   0.83  1.26  0.83  1.28  0.82  1.18  0.83  1.57  0.88  2.88  0.91  3.45
    b7       720p   3.400 Mbps   0.81  1.30  0.85  1.23  0.83  1.20  0.82  1.35  0.84  2.89  0.94  3.03
    b8       1080p  4.500 Mbps   0.84  1.28  0.83  1.28  0.88  1.23  0.82  1.34  0.87  2.28  0.95  3.03
    b9       1080p  5.800 Mbps   0.86  1.31  0.87  1.42  0.83  1.29  0.86  1.30  0.87  2.23  0.95  3.34
    b10      1440p  8.100 Mbps   0.84  1.39  0.81  1.41  0.87  1.29  0.87  1.32  0.85  2.73  0.96  2.96
    b11      2160p  11.600 Mbps  0.79  1.50  0.82  1.31  0.88  1.17  0.84  1.32  0.82  2.58  0.96  3.02
    b12      2160p  16.800 Mbps  0.84  1.49  0.79  1.26  0.88  1.19  0.86  1.35  0.86  2.38  0.96  2.99
    Average                      0.83  1.31  0.84  1.32  0.85  1.19  0.86  1.33  0.87  3.01  0.91  3.25
    The average processing time of TQPM for a 4 s segment is 0.328 s.
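For readers who want to reproduce the two accuracy measures reported in the table, a minimal sketch follows. It assumes the standard definitions of mean absolute error and the coefficient of determination, which the slides do not spell out, and the PSNR values are made up for illustration.

```python
def mae(actual, predicted):
    """Mean absolute error between actual and predicted quality scores."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Illustrative PSNR values in dB (not taken from the paper).
actual = [30.0, 35.0, 40.0, 45.0]
predicted = [31.0, 34.0, 41.0, 44.0]
print(mae(actual, predicted))                 # 1.0
print(round(r2_score(actual, predicted), 3))  # 0.968
```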
  • 17. Conclusions. This paper proposed TQPM, an online transcoding quality prediction model for video streaming applications. The proposed LSTM-based model uses DCT-energy-based features as a reduced reference to characterize the input video segment, which is used to predict the visual quality of an M-stage transcoding process. The performance of TQPM is validated by the Apple HLS bitrate ladder encoding and transcoding using the x265 open-source HEVC encoder. On average, for single-stage transcoding, TQPM predicts PSNR, SSIM, and VMAF with an MAE of 1.31 dB, 1.19 dB, and 3.01, respectively. Furthermore, PSNR, SSIM, and VMAF are predicted for two-stage transcoding with an average MAE of 1.32 dB, 1.33 dB, and 3.25, respectively.
  • 18. Conclusions: Future Directions. In the future, transcoding between bitrate ladder representations of various codecs shall be investigated. Another future direction is defining a decision-making component based on the proposed model in an end-to-end live streaming system.
  • 19. Q & A. Thank you for your attention! Vignesh V Menon (vignesh.menon@aau.at)