Just Noticeable Difference-aware
Per-Scene Bitrate-laddering for Adaptive Video Streaming
Vignesh V Menon1, Jingwen Zhu2, Prajit T Rajendran3, Hadi Amirpour1, Patrick Le Callet2,
Christian Timmerer1
1
Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität, Klagenfurt, Austria
2
Nantes Universite, Ecole Centrale Nantes, CAPACITES SAS, CNRS, LS2N, UMR 6004, F-44000 Nantes, France
3
CEA, List, F-91120 Palaiseau, Université Paris-Saclay, France
Vignesh V Menon Just Noticeable Difference-aware Per-Scene Bitrate-laddering for Adaptive Video Streaming 1
Outline
1 Introduction
2 JASLA architecture
3 Results
4 Conclusions
Vignesh V Menon Just Noticeable Difference-aware Per-Scene Bitrate-laddering for Adaptive Video Streaming 2
Introduction
Motivation for per-scene encoding
0.2 0.5 1.2 3.0 9.0
Bitrate (in Mbps)
10
30
50
70
90
VMAF
Dolls_s000 540p
Dolls_s000 1080p
RushHour_s000 540p
RushHour_s000 1080p
Figure: RD curve of 540p and 1080p CBR encodings of Dolls s000 and RushHour s0001
video
sequences using x265 HEVC encoder at slower preset.
Per-scene encoding schemes are based on the fact that one resolution performs better than
others in a scene for a given bitrate range, and these regions depend on the video complexity.2
1
Hadi Amirpour et al. “VCD: Video Complexity Dataset”. In: Proceedings of the 13th ACM Multimedia Systems Conference. 2022. isbn: 9781450392839.
doi: 10.1145/3524273.3532892.
2
Vignesh V Menon et al. “JND-aware Two-pass Per-title Encoding Scheme for Adaptive Live Streaming”. In: IEEE Transactions on Circuits and Systems for
Video Technology (2023), pp. 1–1. doi: 10.1109/TCSVT.2023.3290725.
Vignesh V Menon Just Noticeable Difference-aware Per-Scene Bitrate-laddering for Adaptive Video Streaming 3
Introduction
Motivation for JND-aware bitrate ladder
0.2 0.5 1.2 3.0 9.0
Bitrate (in Mbps)
40
50
60
70
80
90
100
VMAF
360p
432p
540p
720p
1080p
Figure: RD curve of HLS CBR encoding of Characters s000 video sequence (segment) of VCD dataset
using x265 HEVC encoder at slower preset. The points with a bitrate greater than 3.6 Mbps are in the
perceptually lossless region.
Having many perceptually redundant representations for the bitrate ladder may not result in
improved quality of experience, but it may lead to increased storage and bandwidth costs.3
3
Tianchi Huang et al. “Deep Reinforced Bitrate Ladders for Adaptive Video Streaming”. In: Istanbul, Turkey: Association for Computing Machinery, 2021,
66–73. isbn: 9781450384353. doi: 10.1145/3458306.3458873.
Vignesh V Menon Just Noticeable Difference-aware Per-Scene Bitrate-laddering for Adaptive Video Streaming 4
JASLA architecture
JASLA architecture
Input scene
Encoding
representations
Scene Complexity
Feature Extraction
Representation
elimination
Input
Parameters
Set of target
Bitrates (B)
Set of target
resolutions (R)
Supplementary
information (e.g.,
encoder, preset)
Bitstream
Feature Extraction JND prediction
Resolution
prediction
CRF
prediction
Figure: JASLA architecture.
JASLA comprises three phases:
(i) scene complexity features extraction
(ii) optimized resolution and CRF prediction
(iii) JND threshold prediction.
Vignesh V Menon Just Noticeable Difference-aware Per-Scene Bitrate-laddering for Adaptive Video Streaming 5
JASLA architecture Scene complexity feature extraction
Phase 1: Scene complexity feature extraction
Accomplished using VCA.4
EY : the average texture energy
h: the average gradient of the texture energy
LY : the average luminescence
(a) Original frame (b) Heatmap of L (c) Heatmap of E (d) Heatmap of h
Figure: Example heatmap of Luminescence (L), spatial texture (E) and temporal activity (h) features of
the 2nd
frame of CoverSong 1080P 0a86 video of Youtube UGC dataset extracted using VCA.
4
V. V. Menon et al. “Green Video Complexity Analysis for Efficient Encoding in Adaptive Video Streaming”. In: First International ACM Green Multimedia
Systems Workshop (GMSys ’23). 2023. isbn: 9798400701962. doi: 10.1145/3593908.3593942.
Vignesh V Menon Just Noticeable Difference-aware Per-Scene Bitrate-laddering for Adaptive Video Streaming 6
JASLA architecture Optimized resolution and CRF prediction
Phase 2: Optimized resolution and CRF prediction
Inputs:
R : set of all resolutions ˜
rm ∀ m ∈ [1, M]
M : number of resolutions in R
B : set of all bitrates bt ∀ t ∈ [1, N]
N : number of bitrates in B
EY , h, LY : average scene complexity
Output: (ˆ
r, b, ĉ) pairs of the bitrate ladder
for t ∈ [1, N] do
for m ∈ [1, M] do
Determine v˜
rm,bt with [EY , h, LY , log(bt)], using the model trained for ˜
rm.
ˆ
rt = arg max˜
rm∈R(v˜
r,bt )
Determine ĉt with [EY , h, LY , log(bt)], using the model trained for ˆ
rt.
(ˆ
rt, bt, ĉt) is the tth point of the bitrate ladder
Random forest models are trained to predict VMAF and CRF for every resolution sup-
ported by the streaming service provider.
Vignesh V Menon Just Noticeable Difference-aware Per-Scene Bitrate-laddering for Adaptive Video Streaming 7
JASLA architecture JND-based representation elimination
JND-based representation elimination
JND threshold estimation
Near lossless encoded
bitstream features
Scene complexity features
GLCM features
EY ... EY
EY ... EY hY ... hY
hY ... hY
Features extractor
Feature
selection
Concatenate
...
...
LY ... LY
LY ... LY ...
SVR
Input scene
EY
...
fr
...
C
...
EY
...
fr
...
C
...
...
mean ... skew
mean ... skew mean ... skew
mean ... skew mean ... skew
mean ... skew
framerate
bitrate
...
framerate
bitrate
...
Temporal pooling
MX ... MX
MX ... MX
mean ... skew
mean ... skew
MY ... MY
MY ... MY
mean ... skew
mean ... skew
Spatial pooling
mean ... skew
mean ... skew
mean ... skew
mean ... skew
mean ... skew mean ... skew
mean ... skew
Contrast ... Contrast
T
c
ˆ
S
X
ˆ
G
X
ˆ
B
X
X̂
...
...
mean ... skew
mean ... skew
mean
...
mean ... skew
mean
...
...
Figure: JND threshold prediction model architecture.
A reduced complexity JND prediction model is derived from [5], which predicts the minimum
CRF where perceptual distortion is introduced.
(i) Scene complexity features, (ii) bitstream features, and (iii) Gray-Level Co-occurrence
Matrix (GLCM) features are extracted from the input video scene to predict the JND
threshold CRF (cT ).
5
J. Zhu et al. “Subjective test methodology optimization and prediction framework for Just Noticeable Difference and Satisfied User Ratio for compressed HD
video”. In: 2022 Picture Coding Symposium. 2022.
Vignesh V Menon Just Noticeable Difference-aware Per-Scene Bitrate-laddering for Adaptive Video Streaming 8
JASLA architecture JND-based representation elimination
JND-based representation elimination
JND threshold estimation
Table: List of the fifteen features fed to SVR.
X̂S = Ft(XS ) X̂B = Ft(XB) X̂G = Ft(Fs(XG ))
max(LY ) kurt(AvMotionX) mean(mean(dissimilarity))
max(LU) kurt(AvMotionY) kurt(kurt(dissimilarity))
kurt(SpatialComplexity) max(mean(homogeneity))
mean(mean(homogeneity))
skew(std(angular second moment))
kurt(std(angular second moment))
kurt(skew(angular second moment))
mean(skew((energy))
std(max((correlation))
kurt(max((contrast))
All pooled features are concatenated into one feature vector, and Forward-Sequential Fea-
ture Selection (F-SFS)6 selects 15 features.
These features are fed into a Support Vector Regression (SVR) for predicting the minimum
CRF (cT ) where noticeable quality distortion (first JND) is observed.
6
Francesc J Ferri et al. “Comparative study of techniques for large-scale feature selection”. In: Machine Intelligence and Pattern Recognition. Vol. 16. Elsevier,
1994, pp. 403–413.
Vignesh V Menon Just Noticeable Difference-aware Per-Scene Bitrate-laddering for Adaptive Video Streaming 9
JASLA architecture JND-based representation elimination
Phase 3: JND-based representation elimination
Representation elimination
Inputs:
N : number of bitrates in B
(ˆ
r, b, ĉ) pairs of the bitrate ladder
cT : JND threshold CRF
rmax : maximum resolution in R
Output: (ˆ
r, b, ĉ) pairs for encoding
t = 1, flag = 0
while t ≤ N do
if ˆ
rt == rmax and ĉt < cT then
flag + +
if flag > 1 then
Eliminate (ˆ
rt, bt, ĉt) from the ladder.
t + +
Vignesh V Menon Just Noticeable Difference-aware Per-Scene Bitrate-laddering for Adaptive Video Streaming 10
Results
Results
Performance of the prediction models
Average R2 score of the VMAF prediction models: 0.93
Average R2 score of the CRF prediction models : 0.97
Average MAE of the VMAF prediction models : 3.25
Average MAE of the CRF prediction models : 1.86
MAE of JND threshold prediction model : 0.96
Bitrate saving and storage reduction results7
BDRP = -34.42%
BDRV = -42.67%
∆S = -54.34%
BD-PSNR = 2.90 dB
BD-VMAF = 9.51
7
G. Bjontegaard. “Calculation of average PSNR differences between RD-curves”. In: VCEG-M33 (2001).
Vignesh V Menon Just Noticeable Difference-aware Per-Scene Bitrate-laddering for Adaptive Video Streaming 11
Results
Results
RD curves
0.2 0.5 1.2 3.0 9.0
Bitrate (in Mbps)
40
60
80
VMAF
HLS CBR
JASLA
(a) Bunny s000
0.2 0.5 1.2 3.0 9.0
Bitrate (in Mbps)
30
40
50
60
70
80
90
VMAF HLS CBR
JASLA
(b) Bosphorus s000
0.2 0.5 1.2 3.0 9.0
Bitrate (in Mbps)
30
40
50
60
70
80
90
VMAF
HLS CBR
JASLA
(c) HoneyBee s000
0.2 0.5 1.2 3.0 9.0
Bitrate (in Mbps)
20
40
60
80
VMAF
HLS CBR
JASLA
(d) RushHour s000
Figure: Comparison of RD curves of representative scenes (a) Bunny s000 (EY =22.40, h=4.70,
LY =129.21), (b) Bosphorus s000 (EY =26.77, h=16.08, LY =140.54), (c) HoneyBee s000 (EY =42.93,
h=7.91, LY =103.00), (d) RushHour s000 (EY =47.75, h=19.70, LY =101.66) using HLS CBR
encoding (blue line), JASLA encoding (red line).
Vignesh V Menon Just Noticeable Difference-aware Per-Scene Bitrate-laddering for Adaptive Video Streaming 12
Results
Results
RD curves
0.2 0.5 1.2 3.0 9.0
Bitrate (in Mbps)
40
60
80
100
VMAF
HLS CBR
JASLA
(a) Characters s000
0.2 0.5 1.2 3.0 9.0
Bitrate (in Mbps)
40
60
80
VMAF
HLS CBR
JASLA
(b) Eldorado s005
0.2 0.5 1.2 3.0 9.0
Bitrate (in Mbps)
20
40
60
80
VMAF
HLS CBR
JASLA
(c) Runners s000
0.2 0.5 1.2 3.0 9.0
Bitrate (in Mbps)
40
60
80
VMAF
HLS CBR
JASLA
(d) Wood s000
Figure: Comparison of RD curves of representative scenes (a) Characters s000 (EY =45.42, h=36.88,
LY =134.56), (b) Eldorado s005 (EY =100.37, h=9.23, LY =109.06), (c) Runners s000 (EY =105.85,
h=22.48, LY =126.60), (d) Wood s000 (EY =124.72, h=47.03, LY =119.57) using HLS CBR encoding
(blue line), JASLA encoding (red line).
Vignesh V Menon Just Noticeable Difference-aware Per-Scene Bitrate-laddering for Adaptive Video Streaming 13
Conclusions
Conclusions
This paper proposed a JND-aware per-scene bitrate ladder prediction scheme (JASLA) for
adaptive video-on-demand streaming applications.
JASLA predicts the optimized resolution and corresponding CRF for given target bitrates
for every video scene based on content-aware spatial and temporal complexity features.
A JND threshold prediction scheme is proposed, eliminating representations that yield
distortion lower than one JND from the bitrate ladder.
On average, streaming using JASLA requires 34.42% and 42.67% fewer bits to maintain the
same PSNR and VMAF, respectively, compared to the reference HLS bitrate ladder, along
with a 54.34% cumulative decrease in the storage space needed to store representations,
using x265 HEVC encoder.
Vignesh V Menon Just Noticeable Difference-aware Per-Scene Bitrate-laddering for Adaptive Video Streaming 14
Q & A
Q & A
Thank you for your attention!
Vignesh V Menon (vignesh.menon@aau.at)
Jingwen Zhu (jingwen.zhu@univ-nantes.fr)
Vignesh V Menon Just Noticeable Difference-aware Per-Scene Bitrate-laddering for Adaptive Video Streaming 15

JASLA_presentation.pdf

  • 1.
    Just Noticeable Difference-aware Per-SceneBitrate-laddering for Adaptive Video Streaming Vignesh V Menon1, Jingwen Zhu2, Prajit T Rajendran3, Hadi Amirpour1, Patrick Le Callet2, Christian Timmerer1 1 Christian Doppler Laboratory ATHENA, Alpen-Adria-Universität, Klagenfurt, Austria 2 Nantes Universite, Ecole Centrale Nantes, CAPACITES SAS, CNRS, LS2N, UMR 6004, F-44000 Nantes, France 3 CEA, List, F-91120 Palaiseau, Université Paris-Saclay, France Vignesh V Menon Just Noticeable Difference-aware Per-Scene Bitrate-laddering for Adaptive Video Streaming 1
  • 2.
    Outline 1 Introduction 2 JASLAarchitecture 3 Results 4 Conclusions Vignesh V Menon Just Noticeable Difference-aware Per-Scene Bitrate-laddering for Adaptive Video Streaming 2
  • 3.
    Introduction Motivation for per-sceneencoding 0.2 0.5 1.2 3.0 9.0 Bitrate (in Mbps) 10 30 50 70 90 VMAF Dolls_s000 540p Dolls_s000 1080p RushHour_s000 540p RushHour_s000 1080p Figure: RD curve of 540p and 1080p CBR encodings of Dolls s000 and RushHour s0001 video sequences using x265 HEVC encoder at slower preset. Per-scene encoding schemes are based on the fact that one resolution performs better than others in a scene for a given bitrate range, and these regions depend on the video complexity.2 1 Hadi Amirpour et al. “VCD: Video Complexity Dataset”. In: Proceedings of the 13th ACM Multimedia Systems Conference. 2022. isbn: 9781450392839. doi: 10.1145/3524273.3532892. 2 Vignesh V Menon et al. “JND-aware Two-pass Per-title Encoding Scheme for Adaptive Live Streaming”. In: IEEE Transactions on Circuits and Systems for Video Technology (2023), pp. 1–1. doi: 10.1109/TCSVT.2023.3290725. Vignesh V Menon Just Noticeable Difference-aware Per-Scene Bitrate-laddering for Adaptive Video Streaming 3
  • 4.
    Introduction Motivation for JND-awarebitrate ladder 0.2 0.5 1.2 3.0 9.0 Bitrate (in Mbps) 40 50 60 70 80 90 100 VMAF 360p 432p 540p 720p 1080p Figure: RD curve of HLS CBR encoding of Characters s000 video sequence (segment) of VCD dataset using x265 HEVC encoder at slower preset. The points with a bitrate greater than 3.6 Mbps are in the perceptually lossless region. Having many perceptually redundant representations for the bitrate ladder may not result in improved quality of experience, but it may lead to increased storage and bandwidth costs.3 3 Tianchi Huang et al. “Deep Reinforced Bitrate Ladders for Adaptive Video Streaming”. In: Istanbul, Turkey: Association for Computing Machinery, 2021, 66–73. isbn: 9781450384353. doi: 10.1145/3458306.3458873. Vignesh V Menon Just Noticeable Difference-aware Per-Scene Bitrate-laddering for Adaptive Video Streaming 4
  • 5.
    JASLA architecture JASLA architecture Inputscene Encoding representations Scene Complexity Feature Extraction Representation elimination Input Parameters Set of target Bitrates (B) Set of target resolutions (R) Supplementary information (e.g., encoder, preset) Bitstream Feature Extraction JND prediction Resolution prediction CRF prediction Figure: JASLA architecture. JASLA comprises three phases: (i) scene complexity features extraction (ii) optimized resolution and CRF prediction (iii) JND threshold prediction. Vignesh V Menon Just Noticeable Difference-aware Per-Scene Bitrate-laddering for Adaptive Video Streaming 5
  • 6.
    JASLA architecture Scenecomplexity feature extraction Phase 1: Scene complexity feature extraction Accomplished using VCA.4 EY : the average texture energy h: the average gradient of the texture energy LY : the average luminescence (a) Original frame (b) Heatmap of L (c) Heatmap of E (d) Heatmap of h Figure: Example heatmap of Luminescence (L), spatial texture (E) and temporal activity (h) features of the 2nd frame of CoverSong 1080P 0a86 video of Youtube UGC dataset extracted using VCA. 4 V. V. Menon et al. “Green Video Complexity Analysis for Efficient Encoding in Adaptive Video Streaming”. In: First International ACM Green Multimedia Systems Workshop (GMSys ’23). 2023. isbn: 9798400701962. doi: 10.1145/3593908.3593942. Vignesh V Menon Just Noticeable Difference-aware Per-Scene Bitrate-laddering for Adaptive Video Streaming 6
  • 7.
    JASLA architecture Optimizedresolution and CRF prediction Phase 2: Optimized resolution and CRF prediction Inputs: R : set of all resolutions ˜ rm ∀ m ∈ [1, M] M : number of resolutions in R B : set of all bitrates bt ∀ t ∈ [1, N] N : number of bitrates in B EY , h, LY : average scene complexity Output: (ˆ r, b, ĉ) pairs of the bitrate ladder for t ∈ [1, N] do for m ∈ [1, M] do Determine v˜ rm,bt with [EY , h, LY , log(bt)], using the model trained for ˜ rm. ˆ rt = arg max˜ rm∈R(v˜ r,bt ) Determine ĉt with [EY , h, LY , log(bt)], using the model trained for ˆ rt. (ˆ rt, bt, ĉt) is the tth point of the bitrate ladder Random forest models are trained to predict VMAF and CRF for every resolution sup- ported by the streaming service provider. Vignesh V Menon Just Noticeable Difference-aware Per-Scene Bitrate-laddering for Adaptive Video Streaming 7
  • 8.
    JASLA architecture JND-basedrepresentation elimination JND-based representation elimination JND threshold estimation Near lossless encoded bitstream features Scene complexity features GLCM features EY ... EY EY ... EY hY ... hY hY ... hY Features extractor Feature selection Concatenate ... ... LY ... LY LY ... LY ... SVR Input scene EY ... fr ... C ... EY ... fr ... C ... ... mean ... skew mean ... skew mean ... skew mean ... skew mean ... skew mean ... skew framerate bitrate ... framerate bitrate ... Temporal pooling MX ... MX MX ... MX mean ... skew mean ... skew MY ... MY MY ... MY mean ... skew mean ... skew Spatial pooling mean ... skew mean ... skew mean ... skew mean ... skew mean ... skew mean ... skew mean ... skew Contrast ... Contrast T c ˆ S X ˆ G X ˆ B X X̂ ... ... mean ... skew mean ... skew mean ... mean ... skew mean ... ... Figure: JND threshold prediction model architecture. A reduced complexity JND prediction model is derived from [5], which predicts the minimum CRF where perceptual distortion is introduced. (i) Scene complexity features, (ii) bitstream features, and (iii) Gray-Level Co-occurrence Matrix (GLCM) features are extracted from the input video scene to predict the JND threshold CRF (cT ). 5 J. Zhu et al. “Subjective test methodology optimization and prediction framework for Just Noticeable Difference and Satisfied User Ratio for compressed HD video”. In: 2022 Picture Coding Symposium. 2022. Vignesh V Menon Just Noticeable Difference-aware Per-Scene Bitrate-laddering for Adaptive Video Streaming 8
  • 9.
    JASLA architecture JND-basedrepresentation elimination JND-based representation elimination JND threshold estimation Table: List of the fifteen features fed to SVR. X̂S = Ft(XS ) X̂B = Ft(XB) X̂G = Ft(Fs(XG )) max(LY ) kurt(AvMotionX) mean(mean(dissimilarity)) max(LU) kurt(AvMotionY) kurt(kurt(dissimilarity)) kurt(SpatialComplexity) max(mean(homogeneity)) mean(mean(homogeneity)) skew(std(angular second moment)) kurt(std(angular second moment)) kurt(skew(angular second moment)) mean(skew((energy)) std(max((correlation)) kurt(max((contrast)) All pooled features are concatenated into one feature vector, and Forward-Sequential Fea- ture Selection (F-SFS)6 selects 15 features. These features are fed into a Support Vector Regression (SVR) for predicting the minimum CRF (cT ) where noticeable quality distortion (first JND) is observed. 6 Francesc J Ferri et al. “Comparative study of techniques for large-scale feature selection”. In: Machine Intelligence and Pattern Recognition. Vol. 16. Elsevier, 1994, pp. 403–413. Vignesh V Menon Just Noticeable Difference-aware Per-Scene Bitrate-laddering for Adaptive Video Streaming 9
  • 10.
    JASLA architecture JND-basedrepresentation elimination Phase 3: JND-based representation elimination Representation elimination Inputs: N : number of bitrates in B (ˆ r, b, ĉ) pairs of the bitrate ladder cT : JND threshold CRF rmax : maximum resolution in R Output: (ˆ r, b, ĉ) pairs for encoding t = 1, flag = 0 while t ≤ N do if ˆ rt == rmax and ĉt < cT then flag + + if flag > 1 then Eliminate (ˆ rt, bt, ĉt) from the ladder. t + + Vignesh V Menon Just Noticeable Difference-aware Per-Scene Bitrate-laddering for Adaptive Video Streaming 10
  • 11.
    Results Results Performance of theprediction models Average R2 score of the VMAF prediction models: 0.93 Average R2 score of the CRF prediction models : 0.97 Average MAE of the VMAF prediction models : 3.25 Average MAE of the CRF prediction models : 1.86 MAE of JND threshold prediction model : 0.96 Bitrate saving and storage reduction results7 BDRP = -34.42% BDRV = -42.67% ∆S = -54.34% BD-PSNR = 2.90 dB BD-VMAF = 9.51 7 G. Bjontegaard. “Calculation of average PSNR differences between RD-curves”. In: VCEG-M33 (2001). Vignesh V Menon Just Noticeable Difference-aware Per-Scene Bitrate-laddering for Adaptive Video Streaming 11
  • 12.
    Results Results RD curves 0.2 0.51.2 3.0 9.0 Bitrate (in Mbps) 40 60 80 VMAF HLS CBR JASLA (a) Bunny s000 0.2 0.5 1.2 3.0 9.0 Bitrate (in Mbps) 30 40 50 60 70 80 90 VMAF HLS CBR JASLA (b) Bosphorus s000 0.2 0.5 1.2 3.0 9.0 Bitrate (in Mbps) 30 40 50 60 70 80 90 VMAF HLS CBR JASLA (c) HoneyBee s000 0.2 0.5 1.2 3.0 9.0 Bitrate (in Mbps) 20 40 60 80 VMAF HLS CBR JASLA (d) RushHour s000 Figure: Comparison of RD curves of representative scenes (a) Bunny s000 (EY =22.40, h=4.70, LY =129.21), (b) Bosphorus s000 (EY =26.77, h=16.08, LY =140.54), (c) HoneyBee s000 (EY =42.93, h=7.91, LY =103.00), (d) RushHour s000 (EY =47.75, h=19.70, LY =101.66) using HLS CBR encoding (blue line), JASLA encoding (red line). Vignesh V Menon Just Noticeable Difference-aware Per-Scene Bitrate-laddering for Adaptive Video Streaming 12
  • 13.
    Results Results RD curves 0.2 0.51.2 3.0 9.0 Bitrate (in Mbps) 40 60 80 100 VMAF HLS CBR JASLA (a) Characters s000 0.2 0.5 1.2 3.0 9.0 Bitrate (in Mbps) 40 60 80 VMAF HLS CBR JASLA (b) Eldorado s005 0.2 0.5 1.2 3.0 9.0 Bitrate (in Mbps) 20 40 60 80 VMAF HLS CBR JASLA (c) Runners s000 0.2 0.5 1.2 3.0 9.0 Bitrate (in Mbps) 40 60 80 VMAF HLS CBR JASLA (d) Wood s000 Figure: Comparison of RD curves of representative scenes (a) Characters s000 (EY =45.42, h=36.88, LY =134.56), (b) Eldorado s005 (EY =100.37, h=9.23, LY =109.06), (c) Runners s000 (EY =105.85, h=22.48, LY =126.60), (d) Wood s000 (EY =124.72, h=47.03, LY =119.57) using HLS CBR encoding (blue line), JASLA encoding (red line). Vignesh V Menon Just Noticeable Difference-aware Per-Scene Bitrate-laddering for Adaptive Video Streaming 13
  • 14.
    Conclusions Conclusions This paper proposeda JND-aware per-scene bitrate ladder prediction scheme (JASLA) for adaptive video-on-demand streaming applications. JASLA predicts the optimized resolution and corresponding CRF for given target bitrates for every video scene based on content-aware spatial and temporal complexity features. A JND threshold prediction scheme is proposed, eliminating representations that yield distortion lower than one JND from the bitrate ladder. On average, streaming using JASLA requires 34.42% and 42.67% fewer bits to maintain the same PSNR and VMAF, respectively, compared to the reference HLS bitrate ladder, along with a 54.34% cumulative decrease in the storage space needed to store representations, using x265 HEVC encoder. Vignesh V Menon Just Noticeable Difference-aware Per-Scene Bitrate-laddering for Adaptive Video Streaming 14
  • 15.
    Q & A Q& A Thank you for your attention! Vignesh V Menon (vignesh.menon@aau.at) Jingwen Zhu (jingwen.zhu@univ-nantes.fr) Vignesh V Menon Just Noticeable Difference-aware Per-Scene Bitrate-laddering for Adaptive Video Streaming 15