
FaME-ML: Fast Multirate Encoding for HTTP Adaptive Streaming Using Machine Learning

HTTP Adaptive Streaming (HAS) is the most common approach for delivering video content over the Internet. The requirement to encode the same content at different quality levels (i.e., representations) in HAS is a challenging problem for content providers. Fast multirate encoding approaches try to accelerate this process by reusing information from previously encoded representations. In this paper, we propose to use convolutional neural networks (CNNs) to speed up the encoding of multiple representations, with a specific focus on parallel encoding. In parallel encoding, the overall time-complexity is limited to the maximum time-complexity of one of the representations that are encoded in parallel. Therefore, instead of reducing the time-complexity for all representations, the highest time-complexities are reduced. Experimental results show that FaME-ML achieves significant time-complexity savings in parallel encoding scenarios (41% on average) with a slight increase in bitrate and slight quality degradation compared to the HEVC reference software.


  1. FaME-ML: Fast Multirate Encoding for HTTP Adaptive Streaming Using Machine Learning. December 2nd, 2020, IEEE VCIP. Ekrem Çetinkaya, Hadi Amirpour, Christian Timmerer, and Mohammad Ghanbari
  2. Agenda ● Introduction ● FaME-ML ● Experimental Results ● Conclusion ● Q & A
  3. Introduction
  4. Video Streaming ● 82% share of the Internet traffic ● 1 million minutes of video streamed every second (as of 2021) ● Content characteristics * Cisco VNI Forecast Highlights (2021)
  9. HTTP Adaptive Streaming (HAS) (animated example: a client plays "Very Nice Video", first requesting the 3500 Kbps representation and later switching down to the 1500 Kbps representation)
  25. Multi-rate Encoding: the source video is encoded into four representations (1500 kbps, 2000 kbps, 3500 kbps, and 5000 kbps; encoding x4) and stored on an HTTP server
  27. Block Partitioning (animated example: a block at depth 0 is recursively split into depth 1, 2, and 3 sub-blocks, illustrated alongside a PSNR vs. bitrate plot)
  36. CTU Search Window Bound ● Finding: CTUs tend to get higher depth levels as the quality goes up ● Upper1 and Lower2 bound approaches restrict the depth search window (example with QP 22, QP 30, and QP 38; Depth = [0 1 2 3]) 1 Schroeder, Damien, et al. "Efficient multi-rate video encoding for HEVC-based adaptive HTTP streaming." IEEE Transactions on Circuits and Systems for Video Technology 28.1 (2016): 143-157. 2 H. Amirpour, E. Çetinkaya, C. Timmerer and M. Ghanbari, "Fast Multi-rate Encoding for Adaptive HTTP Streaming," 2020 Data Compression Conference (DCC), Snowbird, UT, USA, 2020, pp. 358-358.
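The depth-search bounding idea on this slide can be sketched as follows. This is a minimal illustration, not the HM implementation: the per-depth RD-cost function is a hypothetical stub, whereas the real encoder evaluates rate-distortion costs during encoding.

```python
# Sketch of a lower-bounded CTU depth search. Assumption: rd_cost is an
# illustrative stub mapping depth -> cost; in HM the cost comes from RDO.
def best_depth(rd_cost, min_depth=0, max_depth=3):
    """Exhaustively search depth levels [min_depth, max_depth]."""
    costs = {d: rd_cost(d) for d in range(min_depth, max_depth + 1)}
    return min(costs, key=costs.get)

def bounded_best_depth(rd_cost, reference_depth, max_depth=3):
    """Lower-bound approach: the depth chosen in the lowest-quality
    (highest-QP) reference encode prunes all shallower depths, because
    CTUs tend to get higher depth levels as the quality goes up."""
    return best_depth(rd_cost, min_depth=reference_depth, max_depth=max_depth)

# Toy RD costs favouring depth 2 for this hypothetical CTU.
toy_cost = {0: 9.0, 1: 6.5, 2: 4.2, 3: 4.8}.get
print(best_depth(toy_cost))                             # 2 (full search)
print(bounded_best_depth(toy_cost, reference_depth=1))  # 2 (depth 0 skipped)
```

The pruned search reaches the same decision while evaluating fewer depth levels, which is where the encoding-time saving comes from.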
  43. Problem & Solution ● Existing methods typically use the highest quality representation as the reference ○ Cannot reduce the parallel encoding time ○ The highest quality representation is the bottleneck ● Proposed: use the lowest quality representation as the reference ○ Utilize machine learning for better performance ○ Focus on the parallel encoding time: reduce the encoding time of the highest-complexity representations and eliminate the encoding-time bottleneck (Figure: normalized time-complexity of different quality representations using three different encoding methods1,2)
  44. FaME-ML
  45. Features ● RD cost (FRD, 5 values): number of bits to encode the given CU and its four sub-CUs ● Variance of pixels (FV, 5 values): variance of the pixels inside the CU ● Motion vectors (FMV, 1 value): average magnitude of the MVs inside the CU ● Depth level (FD, 1 value): CU split decision for the given depth level ● Frame-level QP (FQP, 1 value): QP value for the given frame ● PU decision (FPU, 1 value): PU split decision for the given CU ● F = [FRD FV FMV FD FQP FPU], 14 values in total
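Assembling the 14-value feature vector F from the slide can be sketched as below. All function names and example values are illustrative, not taken from the paper's code; only the feature names and their sizes (5, 5, 1, 1, 1, 1) come from the slide.

```python
# Sketch of building the feature vector F = [FRD FV FMV FD FQP FPU].
def build_feature_vector(f_rd, f_v, f_mv, f_d, f_qp, f_pu):
    """Concatenate: RD costs (5), pixel variances (5), avg MV magnitude (1),
    CU split decision (1), frame-level QP (1), PU split decision (1)."""
    assert len(f_rd) == 5 and len(f_v) == 5
    return list(f_rd) + list(f_v) + [f_mv, f_d, f_qp, f_pu]

F = build_feature_vector(
    f_rd=[120, 40, 35, 38, 33],               # bits: CU and its four sub-CUs
    f_v=[812.5, 790.1, 840.3, 777.7, 805.0],  # illustrative pixel variances
    f_mv=2.4,                                 # average MV magnitude
    f_d=1,                                    # CU split decision at this depth
    f_qp=38,                                  # frame-level QP
    f_pu=0,                                   # PU split decision
)
print(len(F))  # 14
```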
  53. Training Dataset ● 12 test sequences defined in the HEVC CTC3 ● YUV information is extracted for each CU ○ 64x64 size for D0 and 32x32 size for D1 ● Sequences are encoded with the HEVC reference software (HM 16.21)4 ○ Encoding information is extracted and saved for QP38 ○ Features are individually min-max normalized at the video level ○ Depth values are saved as targets for the remaining QPs ● 90% of frames for the training set (259,200 CTUs) ● 10% of frames for the validation set (28,800 CTUs) 3 F. Bossen et al., "Common test conditions and software reference configurations," JCTVC-L1100, vol. 12, p. 7, 2013. 4 https://vcgit.hhi.fraunhofer.de/jct-vc/HM
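The per-video ("video level") min-max normalization mentioned above might look like this. This is a sketch under the assumption that each feature is collected as a flat list over one video's CTUs; the handling of constant features is a choice of this sketch, not specified on the slide.

```python
# Sketch of video-level min-max normalization: each feature is scaled to
# [0, 1] independently, using that feature's min/max over one video's CTUs.
def min_max_normalize(values):
    lo, hi = min(values), max(values)
    if hi == lo:                     # constant feature: map everything to 0.0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

rd_costs = [120.0, 40.0, 80.0]      # one feature over a video's CTUs
print(min_max_normalize(rd_costs))  # [1.0, 0.0, 0.5]
```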
  54. Convolutional Neural Network (CNN): the Y, U, V input sizes are halved and the red part is omitted in the Depth 1 classifier.
  55. Overall Method ● Encode the lowest quality representation (QP38) with the HEVC reference software ○ Save the encoding information ● Pass the YUV information to the texture-processing CNN to get an intermediate decision ● Combine the intermediate decision with the feature vector and pass it through a fully connected layer to get the final decision ● Apply the CNN for the bottleneck quality levels in the parallel encoding scenario ○ Depth 0 and Depth 1 for QP22 ○ Depth 0 for QP26 (Diagram: QP38 is encoded with HEVC first; QP30 and QP34 with HEVC, QP22 and QP26 with CNN-assisted HEVC, in parallel)
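The overall flow above can be sketched as follows. The encoder and CNN calls are stubs and all function names are hypothetical; the real system invokes HM and the trained CNNs.

```python
# Sketch of the parallel-encoding flow: the lowest-quality representation
# (QP38) is encoded first, then its information drives the remaining
# encodes, with CNN assistance only for the bottleneck representations.
from concurrent.futures import ThreadPoolExecutor

def encode_reference(qp):                    # stub: unmodified HM encode
    return {"qp": qp, "info": f"ref-info-qp{qp}"}

def encode_dependent(qp, ref, cnn_depths):   # stub: HM + CNN split decisions
    mode = f"CNN@D{cnn_depths}" if cnn_depths else "HM"
    return f"qp{qp}:{mode}:uses({ref['info']})"

ref = encode_reference(38)                   # step 1: sequential reference encode
jobs = {22: [0, 1], 26: [0], 30: [], 34: []} # CNN depths per bottleneck QP
with ThreadPoolExecutor() as pool:           # step 2: remaining QPs in parallel
    futures = {qp: pool.submit(encode_dependent, qp, ref, d)
               for qp, d in jobs.items()}
    results = {qp: f.result() for qp, f in futures.items()}
print(results[22])  # qp22:CNN@D[0, 1]:uses(ref-info-qp38)
```

Only QP22 (Depth 0 and 1) and QP26 (Depth 0) take the CNN path, matching the slide's bottleneck assignment.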
  60. Experimental Results
  61. Experiment Settings ● 8 test sequences from the SVT5 and JVET6 datasets ● Five QP levels [38, 34, 30, 26, 22] ● Low-Delay P configuration ● Bjontegaard Delta Rate7 with PSNR and VMAF8 is calculated ● Encoding performance is compared with the HEVC reference software (HM 16.21)4 and the lower bound approach2 ○ Lower bound = the minimum CTU depth search value is limited by the lowest quality representation 5 L. Haglund, "The SVT high definition multi format test set," Swedish Television Stockholm, 2006. 6 K. Suehring and X. Li, "JVET common test conditions and software reference configurations," JVET-B1010, 2016. 7 G. Bjontegaard, "Calculation of average PSNR differences between RD-curves," VCEG-M33, 2001. 8 Z. Li, A. Aaron, I. Katsavounidis, A. Moorthy, and M. Manohara, "Toward a practical perceptual video quality metric," [Online] https://netflixtechblog.com/toward-a-practical-perceptual-video-quality-metric-653f208b9652, 2016.
  62. Encoding Results ● Compared with the HM, calculated over five QP levels ● ΔT is the difference between the maximum time-complexities of the methods ● BDRP and BDRV are the Bjontegaard Delta rates with PSNR and VMAF, respectively ● 41% time saving for parallel encoding (the difference between the highest time-complexity representations)
  63. Encoding Time Graph (Figure: normalized maximum encoding times of 0.59, 0.88, and 1.00; the 0.59 value reflects FaME-ML's 41% saving over the HM baseline at 1.00)
  64. Conclusion
  65. Conclusion ● Machine learning based approach for fast multi-rate encoding ○ Focus on the parallel encoding performance ● The lowest quality representation is used as the reference ● A CNN makes the CU split decision for a given depth level ● The method is applied to the two highest-complexity representations ○ Bottleneck encoding times are reduced with minimal quality degradation ● 41% time saving for parallel encoding with a 0.88% bitrate increase on average
  66. Thank you ekrem@itec.aau.at
