Improving Per-title Encoding for HTTP Adaptive Streaming by Utilizing Video Super-resolution
Hadi Amirpour 1
Hannaneh Barahouei Pasandi 2
Christian Timmerer 1
Mohammad Ghanbari 1,3
1 Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität, Klagenfurt, Austria
2 Department of Computer Science, Virginia Commonwealth University, Richmond, USA
3 School of Computer Science and Electronic Engineering, University of Essex, UK
Paper ID: 092
VCIP 2021
Introduction
In HTTP Adaptive Streaming (HAS), the same video content is provided as a set of bitrate-resolution pairs (also referred to as a bitrate ladder) to adapt to the end-user’s network conditions and to support heterogeneous environments.
An example bitrate ladder, taken from HLS, is shown in Table 1.
Table 1. HLS Bitrate Ladder.
Bitrate (kbps)   Resolution    Framerate
145              640 x 360     ≤ 30 fps
300              768 x 432     ≤ 30 fps
600              960 x 540     ≤ 30 fps
900              960 x 540     ≤ 30 fps
1600             960 x 540     Same as source
2400             1280 x 720    Same as source
3400             1280 x 720    Same as source
4500             1920 x 1080   Same as source
5800             1920 x 1080   Same as source
Because this “one-size-fits-all” bitrate ladder ignores the content, it can result in low-quality video delivery and a lower QoE; bitrate ladders are therefore usually optimized over bitrate and resolution.
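As an illustration of how a client might use such a ladder, the following minimal sketch stores the bitrate-resolution pairs of Table 1 and picks a rendition for a measured throughput. The names `Rendition`, `HLS_LADDER`, and `pick_rendition` are hypothetical, and the simple "highest bitrate that fits" rule is an assumption made for illustration; real players typically use more elaborate adaptation logic.

```python
from typing import NamedTuple


class Rendition(NamedTuple):
    bitrate_kbps: int
    width: int
    height: int


# Bitrate-resolution pairs copied from Table 1.
HLS_LADDER = [
    Rendition(145, 640, 360),
    Rendition(300, 768, 432),
    Rendition(600, 960, 540),
    Rendition(900, 960, 540),
    Rendition(1600, 960, 540),
    Rendition(2400, 1280, 720),
    Rendition(3400, 1280, 720),
    Rendition(4500, 1920, 1080),
    Rendition(5800, 1920, 1080),
]


def pick_rendition(ladder, throughput_kbps):
    """Return the highest-bitrate rendition that fits the throughput.

    Assumed rule (not from the poster): fall back to the lowest
    rendition when even that one exceeds the available throughput.
    """
    ordered = sorted(ladder, key=lambda r: r.bitrate_kbps)
    fitting = [r for r in ordered if r.bitrate_kbps <= throughput_kbps]
    return fitting[-1] if fitting else ordered[0]


# A 2000 kbps connection gets the 1600 kbps / 960 x 540 rendition.
choice = pick_rendition(HLS_LADDER, 2000)
```

Note that the ladder is the same for every title: the selection depends only on the throughput, never on how easy or hard the content is to encode.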
Bitrate:
“easy to encode” videos may suffer from over-allocating bitrate.
“hard to encode” videos may suffer from under-allocating bitrate.
Figure 1a shows RD curves of an “easy to encode” video and a “hard to encode” video.
The bitrates of an optimized bitrate ladder are selected based on the content.
Additionally, the context (network) may have an impact on the optimal bitrates for a bitrate ladder.
Resolution:
Figure 1b shows RD curves of two videos encoded at two resolutions (540p and 1080p).
The optimal resolution for a given bitrate depends on the content.
This content-dependent behavior shows that the bitrate ladder should be optimized over
spatial resolution per title.
[Figure 1: (a) RD curves (PSNR in dB vs. bitrate in kbps) for Typing and YachtRide. (b) RD curves (PSNR in dB vs. bitrate in kbps) for Beauty and HoneyBee, each at 1080p and 540p.]
Figure 1. Videos show different behaviour under compression.
Additionally, bitrate ladders may be optimized over framerate [1], codec, and display size.
Per-Title Encoding using Deep Neural Networks
In this paper, we optimize bitrate ladders over a new dimension: upscaling.
As shown in Figure 1b, two videos were encoded at two resolutions (i.e., 1080p and 540p).
The 540p versions are upscaled to 1080p to compare their quality with that of the original video.
For HoneyBee, there is a cross-over at approx. 2000 kbps between the RD curves of the 540p and 1080p resolutions.
For Beauty, 540p remains superior at the given bitrates.
Deep learning-based Video Super-Resolution (VSR) algorithms show significant improvements in upscaling videos over traditional approaches.
Figure 2a compares the quality (VMAF) of a video encoded at 270p and upscaled by the traditional bicubic method and the deep learning-based EDVR [2] method.
The method used to upscale a low-resolution video to the original resolution may affect the intersection point of two resolutions and, consequently, the optimal resolution for each bitrate.
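The effect of the upscaler on the intersection point can be sketched as follows. The function name `crossover_bitrate` is hypothetical, the linear interpolation between sampled bitrates is an assumption, and the quality values in the usage lines are invented for illustration only; they are not measurements from this work.

```python
def crossover_bitrate(bitrates, q_low, q_high):
    """Bitrate at which the high-resolution curve overtakes the low one.

    bitrates: increasing list of bitrates (kbps) shared by both curves.
    q_low:    quality of the upscaled low-resolution encodings.
    q_high:   quality of the native high-resolution encodings.
    Returns None when the low resolution never changes from better to
    not better within the sampled range.
    """
    diffs = [h - l for l, h in zip(q_low, q_high)]
    for i in range(1, len(diffs)):
        d0, d1 = diffs[i - 1], diffs[i]
        if d0 < 0 <= d1:
            # Linear interpolation of the zero crossing of (q_high - q_low).
            t = -d0 / (d1 - d0)
            return bitrates[i - 1] + t * (bitrates[i] - bitrates[i - 1])
    return None


# Invented example values (NOT from the poster).
rates = [1000, 2000, 3000, 4000]
q_1080p = [60, 72, 80, 86]
q_270p_bicubic = [66, 72, 76, 79]  # weaker upscaler
q_270p_vsr = [70, 76, 80, 82]      # stronger upscaler

x_bicubic = crossover_bitrate(rates, q_270p_bicubic, q_1080p)
x_vsr = crossover_bitrate(rates, q_270p_vsr, q_1080p)
```

With these invented numbers the stronger upscaler moves the cross-over to a higher bitrate, so the low resolution stays the better choice over a wider bitrate range.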
[Figure 2: RD curves (VMAF vs. bitrate in kbps) for 270p - bicubic, 270p - EDVR, and 1080p, with the bicubic and EDVR cross-over points marked.]
Figure 2. (a) RD curves for the Beauty sequence encoded at 270p and upscaled by bicubic and EDVR methods. (b) RD curves for the Beauty sequence encoded at 270p and upscaled by bicubic and EDVR methods.
In the proposed method, PTED, each video is encoded at three resolutions, and each resolution is encoded at a set of bitrates. EDVR [2] is used to upscale the low-resolution videos to the original resolution. Figure 3a shows an example of the convex hulls formed from the qualities obtained after upscaling with the bicubic and EDVR methods.
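One way to form such a convex hull from a set of encodings is sketched below. This is a generic upper-convex-hull construction (monotone chain) over (bitrate, quality) points, given under the assumption that the quality of each low-resolution encoding has already been measured after upscaling; the function names and the sample points are hypothetical and not taken from this work.

```python
def _cross(o, a, b):
    """2D cross product of vectors o->a and o->b in the (bitrate, quality) plane."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def upper_convex_hull(points):
    """Upper convex hull of (bitrate, quality, resolution) points.

    Keeps the best quality per bitrate, then removes every point that lies
    on or below the line joining its neighbours, and finally drops any tail
    beyond the peak quality.
    """
    best = {}
    for bitrate, quality, label in points:
        if bitrate not in best or quality > best[bitrate][1]:
            best[bitrate] = (bitrate, quality, label)
    hull = []
    for p in sorted(best.values()):
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    if not hull:
        return []
    peak = max(range(len(hull)), key=lambda i: hull[i][1])
    return hull[: peak + 1]


# Invented sample encodings (NOT from the poster): (kbps, VMAF, resolution).
encodings = [
    (500, 40.0, "270p"), (1000, 55.0, "270p"), (2000, 62.0, "270p"),
    (500, 30.0, "540p"), (1000, 58.0, "540p"), (1500, 60.0, "540p"),
    (2000, 75.0, "540p"),
    (500, 20.0, "1080p"), (1000, 45.0, "1080p"), (2000, 80.0, "1080p"),
    (4000, 92.0, "1080p"),
]
hull = upper_convex_hull(encodings)
```

Running the same construction twice, once with bicubic-upscaled qualities and once with VSR-upscaled qualities, would give the two hulls compared in Figure 3a.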
Increasing bitrate typically leads to increased quality.
This increased quality might not be noticeable to end users.
Consequently, the increased bitrate may lead to wasted bandwidth.
Therefore, to select encodings in the range [Qmin, Qmax], we consider one Just Noticeable Difference (JND) as the quality step between encodings. Encodings whose quality differs by less than one JND are perceptually indistinguishable to viewers.
Therefore, we select $\frac{Q_{max} - Q_{min}}{JND} + 1$ encodings for each bitrate ladder.
Qmax and Qmin are set to 90 VMAF and 50 VMAF, respectively, and the JND is set to six VMAF points in our experiments.
Figure 3b shows the encodings selected with a step of one JND between Qmin and Qmax.
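The JND-based selection can be sketched as follows. Two points are assumptions rather than statements from this work: the count is rounded down when (Qmax - Qmin) / JND is not an integer, and the bitrate for each target quality is found by linear interpolation along the convex hull. The function names and the sample hull are hypothetical.

```python
import math


def jnd_targets(q_min, q_max, jnd):
    """Target qualities spaced one JND apart, starting at q_min.

    Assumption: the number of encodings is floor((q_max - q_min) / jnd) + 1.
    """
    n = math.floor((q_max - q_min) / jnd) + 1
    return [q_min + i * jnd for i in range(n)]


def bitrate_for_quality(hull, target):
    """Bitrate at which the hull reaches the target quality.

    hull: list of (bitrate, quality) pairs, increasing in both values.
    Uses linear interpolation (an assumption); returns None when the
    target lies outside the quality range covered by the hull.
    """
    for (b0, q0), (b1, q1) in zip(hull, hull[1:]):
        if q0 <= target <= q1:
            if q1 == q0:
                return b0
            return b0 + (target - q0) / (q1 - q0) * (b1 - b0)
    return None


# Settings from the text: Qmin = 50, Qmax = 90, JND = 6 VMAF points.
targets = jnd_targets(50, 90, 6)  # [50, 56, 62, 68, 74, 80, 86]

# Invented hull (NOT from the poster): (kbps, VMAF).
sample_hull = [(500, 40), (1000, 60), (2000, 80), (4000, 92)]
ladder = [(bitrate_for_quality(sample_hull, q), q) for q in targets]
```

With the settings above, (90 - 50) / 6 is not an integer, so under the rounding-down assumption the ladder has seven encodings, the highest at 86 VMAF.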
[Figure 3: (a) VMAF vs. bitrate in kbps for 1080p, the convex hull w/ bicubic, and the convex hull w/ VSR. (b) Selected encodings.]
Figure 3. (a) Convex hulls for SOTA (bicubic upscaling) [3] and the proposed method (VSR upscaling). (b) Selected encodings from the convex hull based on one JND (6 VMAF) between Qmax = 90 VMAF and Qmin = 50 VMAF.
Results
Figure 4 shows the first frame of the Golf sequence encoded at 660 kbps and upscaled by the bicubic and EDVR methods for subjective evaluation.
[Figure 4: panels HR, SR, LR.]
Figure 4. Subjective evaluation of bicubic (LR) and EDVR (SR) upscaling methods when the first frame of the 270p Golf sequence encoded at 660 kbps is 4x upscaled.
Table 2 summarizes the BD-Rate and BD-VMAF values of the convex hulls compared to the anchor (encoding at 1080p). PTED (upscaling with VSR) shows a considerable bitrate saving compared to SOTA (upscaling with bicubic) [3].
The storage reduction is also given in Table 2.
Table 2. Bitrate saving (BD-Rate (%)) and BD-VMAF against encoding with 1080p.
             BD-Rate (%)        BD-VMAF         Storage
Sequence     SOTA     PTED      SOTA    PTED    Reduction (%)
Beauty       -23.97   -31.19    5.83    7.84    22.95
Bosphorus    -6.43    -19.30    1.36    3.95    45.64
Flowers      -4.15    -15.00    0.90    3.55    47.36
Golf         -14.70   -36.31    3.46    9.43    47.36
HoneyBee     -8.90    -15.86    3.40    6.51    76.54
Jockey       -13.63   -23.35    3.21    6.14    75.65
Pond         -6.29    -29.14    1.50    7.68    66.13
Typing       -11.28   -28.92    2.98    7.61    82.19
YachtRide    -4.53    -15.55    1.67    4.17    9.14
Average      -10.43   -23.84    2.7     6.32    53.13
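For readers unfamiliar with the metric, the sketch below shows how a BD-Rate style number can be obtained from two RD curves. It is a simplified variant: it averages the log-bitrate difference over the common quality range using piecewise-linear interpolation, whereas BD-Rate is commonly computed with a cubic fit. How the values in Table 2 were computed is not stated here, so this is not a reproduction of that computation; the function names and sample curves are hypothetical.

```python
import math


def _integrate_log_rate(curve, q_lo, q_hi):
    """Integrate log10(bitrate) over quality in [q_lo, q_hi].

    curve: list of (bitrate, quality) with strictly increasing quality.
    Uses piecewise-linear interpolation, so trapezoids are exact.
    """
    pts = sorted((q, math.log10(r)) for r, q in curve)

    def at(q):
        for (q0, l0), (q1, l1) in zip(pts, pts[1:]):
            if q0 <= q <= q1:
                return l0 + (q - q0) / (q1 - q0) * (l1 - l0)
        raise ValueError("quality outside curve range")

    knots = [q_lo] + [q for q, _ in pts if q_lo < q < q_hi] + [q_hi]
    area = 0.0
    for a, b in zip(knots, knots[1:]):
        area += (at(a) + at(b)) / 2 * (b - a)
    return area


def bd_rate(anchor, test):
    """Average bitrate difference (%) of test vs. anchor at equal quality.

    Negative values mean the test curve needs less bitrate.
    """
    q_lo = max(min(q for _, q in anchor), min(q for _, q in test))
    q_hi = min(max(q for _, q in anchor), max(q for _, q in test))
    if q_hi <= q_lo:
        raise ValueError("curves do not overlap in quality")
    diff = _integrate_log_rate(test, q_lo, q_hi) - _integrate_log_rate(anchor, q_lo, q_hi)
    return (10 ** (diff / (q_hi - q_lo)) - 1) * 100


# Invented curves (NOT from the poster): the test curve reaches each
# quality at half the anchor's bitrate, i.e. a saving of about 50 %.
anchor_curve = [(1000, 50), (2000, 60), (4000, 70)]
test_curve = [(500, 50), (1000, 60), (2000, 70)]
saving = bd_rate(anchor_curve, test_curve)
```

A negative result, as in the BD-Rate columns of Table 2, indicates that the tested ladder needs less bitrate than the anchor for the same quality.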
Acknowledgment
The financial support of the Austrian Federal Ministry for Digital and Economic Affairs, the National Foundation for Research, Technology, and Development, and the Christian Doppler Research Association is gratefully acknowledged. Christian Doppler Laboratory ATHENA: https://athena.itec.aau.at/.
References
[1] H. Amirpour et al., “PSTR: Per-Title Encoding Using Spatio-Temporal Resolutions,” in IEEE International Conference on Multimedia and Expo (ICME), pp. 1-6, 2021.
[2] X. Wang et al., “EDVR: Video Restoration With Enhanced Deformable Convolutional Networks,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 1954-1963, 2019.
[3] J. De Cock et al., “Complexity-based Consistent-Quality Encoding in the Cloud,” in IEEE International Conference on Image Processing (ICIP), pp. 1484-1488, 2016.
https://www.athena.itec.aau.at   IEEE International Conference on Visual Communications and Image Processing 2021 (VCIP2021)   hadi.amirpour@aau.at
