Improving Per-title Encoding for HTTP Adaptive Streaming by Utilizing Video Super-resolution

Hadi Amirpour 1, Hannaneh Barahouei Pasandi 2, Christian Timmerer 1, Mohammad Ghanbari 1,3
1 Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität, Klagenfurt, Austria
2 Department of Computer Science, Virginia Commonwealth University, Richmond, USA
3 School of Computer Science and Electronic Engineering, University of Essex, UK
Paper ID: 092
VCIP 2021
Introduction
In HTTP Adaptive Streaming (HAS), the same video content is provided as a set of bitrate-resolution
pairs (also referred to as a bitrate ladder) to adapt to the end-user's network conditions and to
support heterogeneous environments.
An example bitrate ladder from HLS is shown in Table 1.
Table 1. HLS Bitrate Ladder.
Bitrate (kbps)   Resolution   Framerate
145              640 x 360    ≤ 30 fps
300              768 x 432    ≤ 30 fps
600              960 x 540    ≤ 30 fps
900              960 x 540    ≤ 30 fps
1600             960 x 540    Same as source
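To illustrate how a player might use such a ladder, here is a minimal sketch. The rungs are copied from Table 1. The selection rule (highest bitrate that fits the measured throughput) is an assumption made for illustration. The poster does not describe client-side adaptation logic, and the names `Rung` and `select_rung` are hypothetical.

```python
from typing import List, NamedTuple


class Rung(NamedTuple):
    bitrate_kbps: int
    width: int
    height: int


# Rungs copied from Table 1 (framerate column omitted for brevity).
LADDER = [
    Rung(145, 640, 360),
    Rung(300, 768, 432),
    Rung(600, 960, 540),
    Rung(900, 960, 540),
    Rung(1600, 960, 540),
]


def select_rung(ladder: List[Rung], throughput_kbps: float) -> Rung:
    """Assumed rule: pick the highest-bitrate rung that fits the throughput.

    Falls back to the lowest rung when nothing fits.
    """
    fitting = [r for r in ladder if r.bitrate_kbps <= throughput_kbps]
    if fitting:
        return max(fitting, key=lambda r: r.bitrate_kbps)
    return min(ladder, key=lambda r: r.bitrate_kbps)
```

With this rule, a measured throughput of 1000 kbps maps to the 900 kbps rung, regardless of how easy or hard the content is to encode.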
Such a “one-size-fits-all” bitrate ladder can result in low-quality video delivery and a lower QoE.
Bitrate ladders are therefore usually optimized per title over bitrate and resolution.
Bitrate:
“easy to encode” videos may suffer from over-allocating bitrate.
“hard to encode” videos may suffer from under-allocating bitrate.
Figure 1a shows RD curves of an “easy to encode” video and a “hard to encode” video.
Bitrates of an optimized bitrate ladder are therefore selected based on the content.
Additionally, the context (network) may have an impact on selecting the optimal bitrates for a
bitrate ladder.
Resolution:
Figure 1b shows RD curves of two videos encoded at two resolutions (540p and 1080p).
The optimal resolution for a given bitrate depends on the content.
This content-dependent behavior shows that the bitrate ladder should be optimized over
spatial resolution per title.
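As a minimal sketch of this idea, the function below picks, for each bitrate, the resolution with the highest measured quality. The function name and the quality values in `example` are made up for illustration and are not measurements from the poster.

```python
from typing import Dict


def best_resolution_per_bitrate(
    rd_points: Dict[str, Dict[int, float]]
) -> Dict[int, str]:
    """Map each bitrate (kbps) to the resolution with the highest quality.

    rd_points maps a resolution label to {bitrate_kbps: quality}, where
    quality is measured after upscaling to the original resolution.
    """
    bitrates = sorted(set().union(*(q.keys() for q in rd_points.values())))
    best = {}
    for b in bitrates:
        candidates = {res: q[b] for res, q in rd_points.items() if b in q}
        best[b] = max(candidates, key=candidates.get)
    return best


# Hypothetical PSNR values (dB), for illustration only.
example = {
    "540p": {1000: 36.0, 2000: 38.0, 4000: 39.0},
    "1080p": {1000: 34.5, 2000: 38.2, 4000: 40.5},
}
choice = best_resolution_per_bitrate(example)
```

For these made-up numbers the lower resolution wins at 1000 kbps and the higher resolution wins from 2000 kbps upward. A different title could give a different split.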
[Figure 1: (a) RD curves (PSNR in dB vs. bitrate in kbps) of Typing and YachtRide. (b) RD curves (PSNR in dB vs. bitrate in kbps) of Beauty and HoneyBee, each at 1080p and 540p.]
Figure 1. Videos show different behaviour under compression.
Additionally, bitrate ladders may optimize over framerate [1], codec, and display size.
Per-Title Encoding using Deep Neural Networks
In this paper, we optimize bitrate ladders over a new dimension, upscaling.
As shown in Figure 1b, two videos were encoded at two resolutions (i.e., 1080p and 540p).
The 540p versions are upscaled to 1080p to compare their quality with the original video.
For HoneyBee, there is a crossover at approx. 2000 kbps between the RD curves of the 540p
and 1080p resolutions.
For Beauty, 540p remains superior over the given bitrate range.
Deep learning-based Video Super-Resolution (VSR) algorithms show significant improvements in
upscaling videos over traditional approaches.
Figure 2a compares the quality (VMAF) of a video encoded at 270p and upscaled by the traditional
bicubic method and the deep learning-based EDVR [2] method.
The method used to upscale a low-resolution video to the original resolution may shift the
intersection point of the RD curves of two resolutions and, consequently, the optimal resolution
selected for each bitrate.
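The effect of the upscaler on the intersection point can be sketched as follows. The function estimates the crossover bitrate between a low-resolution and a high-resolution RD curve by linear interpolation between sample points. Both the interpolation scheme and the VMAF values are assumptions for illustration, not the poster's procedure or data.

```python
from typing import List, Optional


def crossover_bitrate(
    bitrates: List[float], q_low: List[float], q_high: List[float]
) -> Optional[float]:
    """Bitrate at which the high-resolution curve first reaches the low one.

    All three lists are sampled at the same, increasing bitrates.
    Returns None if the high-resolution curve never catches up.
    """
    diffs = [h - l for l, h in zip(q_low, q_high)]
    if diffs[0] >= 0:
        return bitrates[0]  # high resolution is already at least as good
    for i in range(1, len(diffs)):
        if diffs[i] >= 0:
            # diffs[i-1] < 0 <= diffs[i]: interpolate the zero crossing.
            t = -diffs[i - 1] / (diffs[i] - diffs[i - 1])
            return bitrates[i - 1] + t * (bitrates[i] - bitrates[i - 1])
    return None


# Hypothetical VMAF scores, for illustration only.
rates = [1000.0, 2000.0, 3000.0]
vmaf_1080p = [50.0, 68.0, 80.0]
vmaf_270p_bicubic = [60.0, 70.0, 76.0]
vmaf_270p_vsr = [63.0, 73.0, 78.0]  # a better upscaler lifts the 270p curve

x_bicubic = crossover_bitrate(rates, vmaf_270p_bicubic, vmaf_1080p)
x_vsr = crossover_bitrate(rates, vmaf_270p_vsr, vmaf_1080p)
```

In this made-up example the better upscaler moves the crossover to a higher bitrate, so the low resolution stays the preferred choice over a wider bitrate range.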
[Figure 2: panel (b) plots VMAF vs. bitrate (kbps) for 270p - bicubic, 270p - EDVR, and 1080p, with the bicubic and EDVR cross-over points marked.]
Figure 2. (a) RD curves for the Beauty sequence encoded at 270p and upscaled by bicubic and EDVR methods. (b) RD curves for the
Beauty sequence encoded at 270p and upscaled by bicubic and EDVR methods.
In the proposed method, PTED, videos are encoded at three resolutions, each at a set of
bitrates. EDVR [2] is used to upscale the low-resolution videos to the original resolution.
Figure 3a shows an example of the convex hulls formed from the qualities of the upscaled videos
using the bicubic and EDVR methods.
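A convex hull over the (bitrate, quality) points of all resolutions could be built as sketched below. The sketch first drops dominated encodings and then keeps the upper convex hull with a monotone-chain scan. This is one common construction, assumed here for illustration. The poster does not spell out its exact procedure, and the sample points are made up.

```python
from typing import List, Tuple

Point = Tuple[float, float, str]  # (bitrate_kbps, quality, resolution label)


def _cross(o: Point, a: Point, b: Point) -> float:
    """Cross product of vectors o->a and o->b in the (bitrate, quality) plane."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def rd_convex_hull(points: List[Point]) -> List[Point]:
    """Upper convex hull of RD points, sorted by increasing bitrate."""
    # Step 1: keep only encodings that beat every cheaper encoding in quality.
    pareto: List[Point] = []
    for p in sorted(points, key=lambda p: (p[0], -p[1])):
        if not pareto or p[1] > pareto[-1][1]:
            pareto.append(p)
    # Step 2: monotone-chain scan; drop points on or below the chord
    # between their neighbours.
    hull: List[Point] = []
    for p in pareto:
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


# Hypothetical VMAF measurements after upscaling, for illustration only.
encodings = [
    (500, 55, "270p"), (1000, 68, "270p"), (2000, 74, "270p"),
    (1000, 62, "540p"), (2000, 80, "540p"), (3000, 82, "540p"),
    (4000, 86, "540p"), (2000, 72, "1080p"), (4000, 90, "1080p"),
    (5800, 94, "1080p"),
]
hull = rd_convex_hull(encodings)
```

Running the same construction once with bicubic-upscaled qualities and once with VSR-upscaled qualities would give the two hulls that are compared in the text.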
Increasing bitrate typically leads to increased quality.
This increased quality might not be noticeable for the end-users.
Consequently, the increased bitrate may lead to bandwidth wastage.
Therefore, to select encodings in the range $[Q_{min}, Q_{max}]$, we consider one Just Noticeable
Difference (JND) as the quality step between encodings. Quality differences of less than one JND
between encodings are imperceptible to viewers.
Therefore, we select $\frac{Q_{max} - Q_{min}}{JND} + 1$ encodings for each bitrate ladder.
$Q_{max}$ and $Q_{min}$ are set to 90 VMAF and 50 VMAF, respectively, and JND is set to six VMAF
points in our experiments.
Figure 3b shows the selected encodings based on one JND between $Q_{max}$ and $Q_{min}$.
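The JND-based selection can be sketched as below. Two assumptions go beyond the text: the encoding count is rounded down to an integer, and each quality target is served by the cheapest hull encoding that reaches it. The function names and the hull values are hypothetical.

```python
import math
from typing import List, Tuple


def jnd_targets(q_min: float, q_max: float, jnd: float) -> List[float]:
    """Quality targets spaced one JND apart, starting at q_min.

    Assumption: (q_max - q_min) / jnd is rounded down before adding 1.
    """
    n = math.floor((q_max - q_min) / jnd) + 1
    return [q_min + i * jnd for i in range(n)]


def select_encodings(
    hull: List[Tuple[float, float]],
    q_min: float = 50.0,
    q_max: float = 90.0,
    jnd: float = 6.0,
) -> List[Tuple[float, float]]:
    """Pick hull encodings for the JND-spaced targets.

    hull holds (bitrate_kbps, vmaf) pairs sorted by bitrate with
    increasing vmaf. Assumption: each target takes the lowest-bitrate
    encoding whose quality reaches it; duplicates are kept once.
    """
    selected: List[Tuple[float, float]] = []
    for target in jnd_targets(q_min, q_max, jnd):
        match = next((p for p in hull if p[1] >= target), None)
        if match is not None and match not in selected:
            selected.append(match)
    return selected


# Hypothetical convex-hull points, for illustration only.
hull_points = [(300, 48), (500, 55), (800, 63), (1200, 70), (2000, 81), (4000, 90)]
ladder = select_encodings(hull_points)
```

With the settings from the text (50 to 90 VMAF, six VMAF points per JND), the rounded-down count is 7 targets. Fewer encodings are kept when one hull point serves two neighbouring targets.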
[Figure 3: (a) VMAF vs. bitrate (kbps) for the 1080p RD curve, the convex hull w/ bicubic, and the convex hull w/ VSR. (b) Selected encodings.]
Figure 3. (a) Convex hulls for SOTA (bicubic upscaling) [3] and the proposed method (VSR upscaling). (b) Selected encodings from the convex
hull based on one JND (6 VMAF) between $Q_{max}$ = 90 VMAF and $Q_{min}$ = 50 VMAF.
Results
Figure 4 shows the first frame of the Golf sequence encoded at 660 kbps and upscaled by the bicubic and EDVR
methods for subjective evaluation.
[Figure 4: image panels labelled HR, SR, and LR.]
Figure 4. Subjective evaluation of the bicubic (LR) and EDVR (SR) upscaling methods when the first frame of the 270p Golf sequence
encoded at 660 kbps is 4x upscaled.
Table 2 summarizes BD-Rate and BD-VMAF values for convex hulls compared to the anchor. PTED (upscaling
with VSR) shows a considerable bitrate saving compared to SOTA (upscaling with bicubic) [3].
The storage reduction is also given in Table 2.
Table 2. Bitrate saving (BD-Rate (%)) and BD-VMAF against encoding at 1080p.
Sequence     BD-Rate (%)        BD-VMAF         Storage Reduction (%)
             SOTA     PTED      SOTA    PTED
Beauty       -23.97   -31.19    5.83    7.84    22.95
Bosphorus     -6.43   -19.30    1.36    3.95    45.64
Flowers       -4.15   -15.00    0.90    3.55    47.36
Golf         -14.70   -36.31    3.46    9.43    47.36
HoneyBee      -8.90   -15.86    3.40    6.51    76.54
Jockey       -13.63   -23.35    3.21    6.14    75.65
Pond          -6.29   -29.14    1.50    7.68    66.13
Typing       -11.28   -28.92    2.98    7.61    82.19
YachtRide     -4.53   -15.55    1.67    4.17     9.14
Average      -10.43   -23.84    2.70    6.32    53.13
Acknowledgment
The financial support of the Austrian Federal Ministry for Digital and Economic Affairs, the National Foundation for
Research, Technology, and Development, and the Christian Doppler Research Association is gratefully acknowledged.
Christian Doppler Laboratory ATHENA: https://athena.itec.aau.at/.
References
[1] H. Amirpour et al., "PSTR: Per-Title Encoding Using Spatio-Temporal Resolutions," in IEEE International Conference on Multimedia and Expo (ICME), pp. 1-6, 2021.
[2] X. Wang et al., "EDVR: Video Restoration With Enhanced Deformable Convolutional Networks," in IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 1954-1963, 2019.
[3] J. De Cock et al., "Complexity-based Consistent-Quality Encoding in the Cloud," in IEEE International Conference on Image Processing (ICIP), pp. 1484-1488, 2016.
https://www.athena.itec.aau.at | IEEE International Conference on Visual Communications and Image Processing 2021 (VCIP2021) | hadi.amirpour@aau.at