
Content-gnostic Bitrate Ladder Prediction for Adaptive Video Streaming


Earlier Cisco reports projected that the video share of internet data would reach 80% by 2023. With the pandemic and the recently imposed remote-work lifestyle, this figure is expected to rise even further. Beyond on-demand and conferencing services, a growing number of users generate, store, and share their own content, usually through social media or video-sharing platforms. Meanwhile, from the video coding perspective, as video technologies evolve towards improved compression performance, their complexity increases as well.

A challenge that many video service providers face is the heterogeneity of networks and display devices for streaming, along with a wide variety of content with different encoding performance. In the past, a fixed bitrate ladder based on a “one-size-fits-all” approach has been employed. A ladder tailored to each video avoids this mismatch, but such a content-tailored solution is highly demanding: the computational and financial cost of constructing the convex hull per video by encoding at all resolutions and quantization levels is huge. In this talk, we present a content-gnostic approach that exploits machine learning to predict the bitrate ladder with only a small number of encodes required.

Published in: Technology


  1. Content-gnostic Bitrate Ladder Prediction for Adaptive Video Streaming. 29/09/2020. Angeliki Katsenou
  2. Structure: I. Motivation; II. Compression and Content; III. Proposed Framework; IV. Results; V. Conclusion and Future Work
  3. I. Motivation. Cisco's reports on internet data traffic estimate that the video share of data will reach 80% by 2022 and will keep growing [1]. With the pandemic and the recent shift towards a remote-work lifestyle, this figure is probably almost a reality. Video providers employ adaptive streaming to address users' specifications. Traditionally, this is achieved by creating several versions of a video sequence using different encoding parameters, such as resolution. This, however, requires a huge number of encodes, which impacts time, cost, and energy (increased CO2 footprint). [Chart: Distribution of energy consumption for production and use in 2017. TVs (production) 11%; Computers (production) 17%; Smartphones (production) 11%; Others 6%; Terminals (use) 20%; Networks (use) 16%; Data Centers (use) 19%.] “…as of the end of December last year, the maximum number of daily meeting participants, both free and paid, conducted on Zoom was approximately 10 million. In March this year, we reached more than 200 million daily meeting participants, both free and paid.” [2] (Eric S. Yuan, Founder and CEO, Zoom)
  4. I. Motivation. One ladder does not fit all! [Fig. 1: Sample frames of the 100-sequence 4K dataset.] [Fig. 2: PSNR-log(Rate) curves across resolutions (4K, FHD, and HD RQ curves); axes: bitrate (kbps, log scale) versus PSNR (dB).] [Table 1: The encoding ladder presented in Apple Tech Note TN2224.]
  5. I. Motivation. How can we find the “best” bitrate ladder per content so that we do not compromise the quality of experience? How could we make this process more computationally efficient without degrading the delivered video quality? [Table 1: The encoding ladder presented in Apple Tech Note TN2224.] [Table 2: Netflix's per-title encoding can change both the number of rungs and their resolution [3, 4].] Other per-title approaches: Bitmovin, Mux, CAMBRIA, etc.
  6. I. Motivation. How can we find the “best” bitrate ladder per content so that we do not compromise the quality of experience? How could we make this process more computationally efficient without degrading the delivered video quality? [Fig. 3: RD curves and convex hull; annotations: convex hull (optimal encoding solution), sub-optimal encoding solutions, practical approach.] Ideally, the optimal solution would be to build the ladder by sampling the convex hull of the RQ curves across resolutions. We propose a content-gnostic, machine-learning-based approach that predicts the bitrate ladder.
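To make the convex-hull idea on this slide concrete, here is a minimal sketch that pools (bitrate, quality) points from several resolutions and keeps only those on the upper convex hull. The function name, the tuple layout, and the sample numbers are illustrative assumptions, not taken from the talk.

```python
def upper_convex_hull(points):
    """Return the points on the upper convex hull, ordered by bitrate.

    points: iterable of (bitrate_kbps, quality, label) tuples pooled over
    all resolutions. This layout is a hypothetical choice for illustration.
    """
    pts = sorted(points, key=lambda p: (p[0], p[1]))
    hull = []
    for p in pts:
        # Drop the last hull point while it lies on or below the segment
        # joining its predecessor to the new point p.
        while len(hull) >= 2:
            (x1, y1, _), (x2, y2, _) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


# Made-up rate-quality samples (kbps, PSNR in dB) for three resolutions.
samples = [
    (1000, 30.0, "720p"), (2000, 34.0, "720p"), (4000, 36.0, "720p"),
    (2000, 32.0, "1080p"), (4000, 37.0, "1080p"), (8000, 39.5, "1080p"),
    (4000, 35.0, "2160p"), (8000, 40.0, "2160p"), (16000, 43.0, "2160p"),
]
hull = upper_convex_hull(samples)
for rate, quality, label in hull:
    print(rate, quality, label)
```

In this toy data the hull switches from 720p to 1080p and then to 2160p as the bitrate grows, which is the resolution-switching behaviour a per-title ladder tries to capture.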
  7. II. Content Features and Compression. [Fig. 4: Correlation matrix of HM coding statistics to spatio-temporal features [5].] [Fig. 5: Examples of predicted PSNR-Rate curves [5].]
  8. III. Proposed Framework. [Fig. 2: PSNR-log(Rate) curves across resolutions (4K, FHD, and HD RQ curves); axes: bitrate (kbps, log scale) versus PSNR (dB).] [Fig. 6: Example of the RQ curves' intersections; curves: 4K, FHD, HD, and the convex hull; cross-over points labelled $\{QP_{4K}, QP^{low}_{FHD}\}$ and $\{QP^{high}_{FHD}, QP_{HD}\}$.] Finding the cross-over points helps define where the resolution switches on the convex hull. We assume that the RQ curves intersect in an ordered, monotonic fashion (e.g., 2160p intersects with 1080p, 1080p with 720p, etc.).
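The cross-over idea can be sketched as follows: given two sampled RQ curves, find the bitrate at which the higher resolution starts to deliver better quality than the lower one. This sketch assumes linear interpolation in log2(bitrate) and a single crossing in the shared range; the function names and sample values are hypothetical, and mapping the cross-over bitrate back to a QP on each curve is left out.

```python
import math


def interp_quality(curve, log_rate):
    """Linearly interpolate quality at a given log2(bitrate).

    curve: list of (bitrate_kbps, quality) pairs sorted by bitrate.
    """
    xs = [math.log2(r) for r, _ in curve]
    ys = [q for _, q in curve]
    if not xs[0] <= log_rate <= xs[-1]:
        raise ValueError("log_rate outside the sampled range")
    for i in range(1, len(xs)):
        if log_rate <= xs[i]:
            t = (log_rate - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + t * (ys[i] - ys[i - 1])


def crossover_bitrate(low_res, high_res, tol=1e-6):
    """Bisect on log2(bitrate) for the point where high_res overtakes low_res."""
    lo = max(math.log2(low_res[0][0]), math.log2(high_res[0][0]))
    hi = min(math.log2(low_res[-1][0]), math.log2(high_res[-1][0]))

    def diff(x):
        return interp_quality(high_res, x) - interp_quality(low_res, x)

    if diff(lo) > 0 or diff(hi) < 0:
        return None  # no cross-over inside the shared bitrate range
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if diff(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 2 ** (0.5 * (lo + hi))


# Made-up curves (kbps, PSNR in dB).
fhd = [(1000, 30.0), (2000, 34.0), (4000, 37.0), (8000, 39.0)]
uhd = [(2000, 32.0), (4000, 36.5), (8000, 40.0), (16000, 43.0)]
switch_rate = crossover_bitrate(fhd, uhd)
print(round(switch_rate, 1))
```

With these toy curves the switch from FHD to 4K falls between 4000 and 8000 kbps; below it the lower resolution is the better choice.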
  9. III. Proposed Framework. [Fig. 7: Scatterplots of cross-over QPs. $QP_{4K}$ versus $QP^{low}_{FHD}$: PCC .9917, SROCC .9888. $QP^{high}_{FHD}$ versus $QP_{HD}$: PCC .9817, SROCC .9538.] This relation can be used to improve cross-over QP predictions.
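For readers unfamiliar with the two statistics quoted on this slide, the sketch below computes the Pearson correlation coefficient (PCC) and the Spearman rank-order correlation coefficient (SROCC) in plain Python. The QP values are made up for illustration and are not the data behind Fig. 7.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical cross-over QPs for six sequences.
qp_4k = [20, 24, 27, 31, 35, 40]
qp_fhd_low = [18, 23, 25, 30, 33, 39]
print(round(pearson(qp_4k, qp_fhd_low), 4), round(spearman(qp_4k, qp_fhd_low), 4))
```

A PCC close to 1 indicates a nearly linear relation, while an SROCC close to 1 indicates that the ordering of sequences is preserved, which is what makes one cross-over QP a useful predictor of the next.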
  10. III. Proposed Framework. [Fig. 8: Proposed method. Block diagram with a training process and a testing process. Processing blocks: Content Features Extraction; Downscaling Resolution; Video Codec; Upscaling Resolution; Quality Metrics Computation; RQ Convex Hull Fitting; Machine Learning-based Regression. Training data: Training Videos @ Native Resolution; Training Videos @ all considered resolutions; Decoded Training Videos @ all considered resolutions; Upscaled Training Videos @ Native Resolution; Upscaled Decoded Training Videos @ Native Resolution; Quality Metric Values for Training Videos; Ground-truth RQ Convex Hull; Training Videos Cross-over QPs; Spatio-temporal Features of Training Videos. Testing data: Testing Videos @ Native Spatial Resolution; Spatio-temporal Features of Testing Videos; Predicted Cross-over QPs per Resolution; Decoded Testing Videos @ Cross-over QPs; Upscaled Decoded Testing Videos @ Cross-over QPs; Quality Metric Values for Testing Videos at Cross-over Points; Bitrate of Cross-over Points. Output: Predicted Bitrate Ladder (RQ convex hull equation, rate-QP equation, resolution-switching rate points).]
  11. III. Proposed Framework. [Fig. 9: RQ convex hulls (blue: 2160p, red: 1080p, yellow: 720p, purple: 540p, green: 480p).]
  12. III. Proposed Framework. We fitted the convex hull with a third-order polynomial. A third-order polynomial has four coefficients, so after determining the cross-over QPs we need four encodes to determine the polynomial parameters. Then, we can sample the convex hull and build the bitrate ladder. [Table 3: Fitted models.]
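The "four encodes, four parameters" argument can be illustrated with a small sketch that solves for the unique cubic through four (x, quality) points. It assumes x is log2(bitrate), as on the axes of Figs. 10 and 11; the slide does not state the fitting variable, and the function names and sample values are hypothetical.

```python
def fit_cubic(xs, ys):
    """Return [a0, a1, a2, a3] with a0 + a1*x + a2*x^2 + a3*x^3 = y at four points."""
    if len(xs) != 4 or len(ys) != 4:
        raise ValueError("exactly four points are needed")
    n = 4
    # Augmented Vandermonde matrix: rows are [1, x, x^2, x^3 | y].
    a = [[1.0, x, x ** 2, x ** 3, y] for x, y in zip(xs, ys)]
    for col in range(n):
        # Partial pivoting keeps the elimination numerically stable.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    # Back substitution.
    coeffs = [0.0] * n
    for r in reversed(range(n)):
        s = a[r][n] - sum(a[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = s / a[r][r]
    return coeffs


def eval_cubic(coeffs, x):
    """Evaluate the cubic with Horner's scheme."""
    a0, a1, a2, a3 = coeffs
    return a0 + x * (a1 + x * (a2 + x * a3))


# Made-up results of four encodes: log2(bitrate) and PSNR in dB.
log_rates = [10.0, 11.0, 12.0, 13.0]
psnr = [30.0, 34.0, 37.0, 39.5]
coeffs = fit_cubic(log_rates, psnr)
print(round(eval_cubic(coeffs, 11.5), 3))
```

With fewer than four points the cubic is underdetermined; with more, a least-squares fit would be needed instead of an exact solve.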
  13. III. Proposed Framework. Building the bitrate ladder:
      1. Determine the operational bitrate range $(R_{\min}, R_{\max})$.
      2. Sample the bitrate: $R_{L,i} \simeq 2R_{L,i-1}$, or equivalently $\log_2(R_{L,i}) \simeq 1 + \log_2(R_{L,i-1})$, where $R_{L,i} \in (R_{\min}, R_{\max})$.
      3. Sample the quality: $Q_{L,i}(R_{L,i}) \leq Q_{\max}$ and $\frac{dQ_{L,i}}{dR_{L}} > \epsilon$, where $\epsilon \to 0$.
      [Fig. 10: PSNR-Rate ladder; axes: log2(Bitrate) versus PSNR (dB).] [Fig. 11: VMAF-Rate ladder; axes: log2(Bitrate) versus VMAF.]
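The three steps above can be sketched as a short loop. This is a minimal illustration under stated assumptions: `quality_at` stands for any fitted hull model evaluated at log2(bitrate), the quality gain per bitrate doubling is used as a stand-in for the derivative condition, and the toy quality model and thresholds are invented for the example.

```python
import math


def build_ladder(quality_at, r_min, r_max, q_max, eps=1e-3):
    """Sample ladder rungs by doubling the bitrate from r_min up to r_max.

    quality_at: callable mapping log2(bitrate) to quality (hypothetical hull model).
    Sampling stops once quality exceeds q_max or the gain per doubling is <= eps.
    """
    ladder = []
    rate = r_min
    prev_q = None
    while rate <= r_max:
        q = quality_at(math.log2(rate))
        if q > q_max:
            break  # quality cap reached
        if prev_q is not None and q - prev_q <= eps:
            break  # curve has saturated; extra bitrate buys nothing
        ladder.append((rate, q))
        prev_q = q
        rate *= 2  # R_i = 2 * R_{i-1}
    return ladder


def toy_quality(log_rate):
    """Toy concave quality model in dB, for illustration only."""
    return 45.0 - 15.0 * 0.5 ** ((log_rate - 10.0) / 2.0)


ladder = build_ladder(toy_quality, r_min=1024, r_max=16384, q_max=40.0)
for rate, q in ladder:
    print(rate, round(q, 2))
```

Here the 16384 kbps rung is dropped because the toy model exceeds the 40 dB cap there, so the ladder ends at 8192 kbps.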
  14. IV. Results. [Table 4: List of features [5].] [Fig. 12: Example of content dependency of cross-over QPs; axes: a feature labelled “stdTC std” versus QP 4K.]
  15. IV. Results. [Fig. 1: Sample frames of the 100-sequence 4K dataset.] [Fig. 13: Spatial and temporal information of the dataset.]
  16. IV. Results. From the PCS 2019 paper [6]: We tested the proposed framework with HM16.20, considering the resolutions {2160p, 1080p, 720p}. A Lanczos-3 filter (ffmpeg implementation) was used for the spatial down- and up-sampling. We compare our method against two state-of-the-art solutions:
      • Brute-force method: we performed encodings with a QP step equal to 1. The brute-force method theoretically creates the optimal convex hull and is considered our ground truth.
      • Interpolation-based method: 7 encodings per resolution (using equidistant QPs to cover the range), with piecewise cubic Hermite interpolation for the in-between QPs. This method constructs a suboptimal convex hull, but it can provide a good approximation while significantly reducing the number of pre-encodes.
  17. IV. Results. We applied feature selection, specifically Recursive Feature Elimination, to the set of spatio-temporal features. We perform a sequential prediction of the QPs, starting from the highest resolution:
      • For the QP4K prediction, we relied only on spatio-temporal features.
      • For the rest of the predictions, we made use of the identified relations and considered the previously predicted QPs (of the higher resolutions) as features.
      We tested various regression methods, such as SVMs with different kernels, RFs, etc., but GPs were the best-performing models. To avoid overfitting, we performed 10-fold cross-validation.
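The sequential prediction described above can be sketched as a two-stage chain. To stay self-contained, this sketch swaps in ordinary least-squares lines for the Gaussian process regressors named in the talk, uses a single made-up content feature, and feeds only the predicted 4K QP into the second stage; all names and numbers are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y ~ slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


def predict(model, x):
    slope, intercept = model
    return slope * x + intercept


# Made-up training data: one content feature per video and its cross-over QPs.
feature = [0.02, 0.03, 0.05, 0.06, 0.08]
qp_4k = [22.0, 25.0, 31.0, 34.0, 40.0]
qp_fhd_low = [20.0, 23.0, 29.0, 32.0, 38.0]

# Stage 1: content feature -> cross-over QP at 4K.
stage1 = fit_line(feature, qp_4k)
# Stage 2: cross-over QP at 4K -> cross-over QP at FHD (exploits their correlation).
stage2 = fit_line(qp_4k, qp_fhd_low)


def predict_chain(f):
    """Predict both cross-over QPs for a new video with content feature f."""
    q4k = predict(stage1, f)
    qfhd = predict(stage2, q4k)  # uses the predicted, not the true, 4K QP
    return q4k, qfhd


print(predict_chain(0.04))
```

Because the second stage consumes the first stage's output, any error in the 4K prediction propagates down the chain, which is one reason the strong correlation between adjacent cross-over QPs matters.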
  18. IV. Results. [Fig. 14: Predicted cross-over QP 4K; axes: true versus predicted $QP_{4K}$.] [Fig. 15: Predicted cross-over QP FHD high; axes: true versus predicted $QP^{high}_{FHD}$.] [Fig. 16: Predicted cross-over QP HD; axes: true versus predicted $QP^{low}_{HD}$.] [Table 5: Results on cross-over QP prediction.]
  19. IV. Results. [Fig. 17: BD-Rate histogram.] [Fig. 18: BD-PSNR histogram.] The different distributions are due to the different reference convex hulls. Most outliers are sequences that do not comply with the hypothesis that the RQ curves intersect in a resolution-monotonic manner.
  20. IV. Results. [Fig. 19: Examples of results. Each example pairs a plot of the per-resolution RQ curves and their convex hull with a plot of the ground-truth versus predicted convex hull; axes: bitrate (kbps) versus PSNR (dB). Sequence “campfireparty gop1”: BD-Rate = 0.18%, BD-PSNR = -0.004 dB. Sequence “barscene gop1”: BD-Rate = 2.009%, BD-PSNR = -0.009 dB.]
  21. IV. Results. The proposed method needs 94.2% fewer encodings than the brute-force method and 80.95% fewer than the interpolation-based method. Proposed method overhead: the ratio of the average feature-extraction time for a sequence at 4K resolution to the average 4K encoding time for a sequence at QP=27 is 0.18. [Table 6: Comparison of the number of encodes required per method.]
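The reported savings can be checked with simple arithmetic from figures given elsewhere in the slides: 7 encodes per resolution for the interpolation-based method over three resolutions, and two encodes per cross-over point for the proposed method. Table 6 is not reproduced in this transcript, so the brute-force count below is an inference from the reported 94.2%, not a value read from the source.

```python
def reduction_percent(proposed, baseline):
    """Percentage of encodes saved by the proposed method relative to a baseline."""
    return 100.0 * (1.0 - proposed / baseline)


resolutions = 3                       # {2160p, 1080p, 720p}
interpolation = 7 * resolutions       # 7 encodes per resolution -> 21
crossovers = resolutions - 1          # intersections between adjacent resolutions
proposed = 2 * crossovers             # two encodes per cross-over point -> 4

saving_vs_interp = reduction_percent(proposed, interpolation)
print(round(saving_vs_interp, 2))     # 80.95

# Assumption: back out the brute-force encode count from the reported 94.2%.
brute_force_estimate = proposed / (1 - 0.942)
print(round(brute_force_estimate))    # 69
```

An estimate of about 69 encodes would correspond to roughly 23 QP values per resolution at a QP step of 1, though the exact QP range is not stated in the slides.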
  22. V. Conclusion and Future Work. Conclusions: We proposed a method that predicts the bitrate ladder over the considered resolutions from spatio-temporal features extracted from the uncompressed videos at their native resolution, together with a few video encodes (two per RQ intersection point). The first results are promising compared to the ground truth, while requiring 94.2% and 81% fewer pre-encodes than the brute-force and interpolation-based methods, respectively. Future Work: Our focus will be on validating the presented method across different codecs. We will also work on cross-codec optimization of bitrate ladders.
  23. References
      1. “Global Mobile Data Traffic Forecast Update 2017-2022,” White Paper, Cisco, 2018.
      2. E. S. Yuan, “A message to our users,” https://blog.zoom.us/a-message-to-our-users/
      3. J. De Cock, Z. Li, M. Manohara, and A. Aaron, “Complexity-based consistent quality encoding in the Cloud,” IEEE ICIP 2016.
      4. J. Sole, L. Guo, A. Norkin, M. Afonso, K. Swanson, and A. Aaron, “Performance comparison of video coding standards: an adaptive streaming perspective,” https://medium.com/netflix-techblog/performance-comparison-of-video-coding-standards-an-adaptive-streaming-perspective-d45d0183ca95, 2018.
      5. A. Katsenou, M. Afonso, D. Agrafiotis, and D. R. Bull, “Predicting Video Rate-Distortion Curves using Textural Features,” in PCS 2016.
      6. A. V. Katsenou, J. Sole, and D. R. Bull, “Content-gnostic Bitrate Ladder Prediction for Adaptive Video Streaming,” in PCS 2019.
  24. Thanks to Dr Joel Sole, Dr Mariana Afonso, and Prof David Bull.
  25. You are invited! pcs2021.org
