Perceptual Video Coding

Latest research progress about perceptual video coding

Published in: Technology

  1. Perceptual Video Coding: Research Progress. Dr. Li Song, Associate Professor, SJTU; Visiting Associate Professor, SCU. 2012.09
  2. Outline: Introduction (Perceptual Cues in Video Coding); Recent Research (JND based RDO, SSIM based RDO, Analysis-Completion Framework); Summary & References
  3. Perceptually Lossless Images. (Figure: the original image beside the PIC-coded image at 0.914 bits/pixel.) [T. Pappas, Visual Signal Analysis and Compression, ICIP 2010]
  4. Perceptual Video Coding Technique
     (Diagram: (Digital) Video -> Codec (Encoder + Decoder) -> Human Visual System (HVS), the end recipient. Dimensions of coder performance: D (distortion) and R (rate).)
     Basic principle of perceptual coding: treat all data that humans cannot perceive as superfluous, and discard it.
  5. Rate-Distortion Theory
     Quantizer: $x \to Q \to \hat{x}$
     Quantization noise: $e = X - \hat{X}$
     Distortion: $D = \sum_{i=1}^{N} p_i (x_i - \hat{x}_i)^2$, where the $p_i$ are probabilities
     If $X$ is Gaussian, $N(0, \sigma^2)$: $D = \sigma^2 \, 2^{-2R}$
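
As a worked check of the Gaussian case on the slide above, here is a minimal Python sketch that evaluates D = σ² 2^(-2R) and its inverse. The function names are illustrative and do not come from the slides. Under this model, each additional bit per sample cuts the mean squared error by a factor of four, which is about 6.02 dB.

```python
import math


def gaussian_distortion(rate_bits, variance=1.0):
    """D(R) = sigma^2 * 2^(-2R) for an i.i.d. Gaussian source under MSE."""
    return variance * 2.0 ** (-2.0 * rate_bits)


def gaussian_rate(distortion, variance=1.0):
    """Inverse relation R(D) = 0.5 * log2(sigma^2 / D), zero when D >= sigma^2."""
    if distortion >= variance:
        return 0.0
    return 0.5 * math.log2(variance / distortion)


def snr_gain_db_per_bit():
    """Distortion ratio between R = 0 and R = 1, expressed in decibels."""
    return 10.0 * math.log10(gaussian_distortion(0) / gaussian_distortion(1))
```

For a unit-variance source, `gaussian_distortion(1)` is 0.25 and `gaussian_rate(0.25)` recovers one bit per sample.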
  6. Gap between theory and real codec
     SPIHT can beat the Shannon bound! The Gaussian prior is not valid for images!
     Rate-distortion curves achieved with the SPIHT coder (dashed line) and with the Shannon R-D theoretical bounds (solid line) corresponding to an i.i.d. zero-mean Gaussian model for each wavelet subband (Gaussian vector source) [A. Ortega et al., IEEE Signal Processing Magazine, 1998]
  7. HEVC: MSE vs MOS

                          Random Access   Low Delay
     Class A              −36.9%          -
     Class B              −39.4%          −40.3%
     Class C              −30.1%          −31.5%
     Class D              −28.3%          −29.2%
     Class E              -               −41.2%
     Class F              −26.2%          −28.8%
     Average              −32.5%          −34.2%
     Average without F    −34.0%          −35.5%

     [from: JCTVC-I0409, 2012] [from: JCT-VC Summary, 8th JCT-VC]
     There is a >20% gap between MSE and MOS!
  8. Ideal perceptual metric: half a century of effort, and still an open problem! Many metrics have been proposed: SSIM/M-SSIM/CW-SSIM, VIF, VQM, … [Figure from: N. Jayant, Proceedings of the IEEE, 1993]
  9. What about the popular SSIM? [JCTVC-H0063, 2012]
  10. Outline: Introduction (Perceptual Cues in Video Coding); Recent Research (JND based RDO, SSIM based RDO, Analysis-Completion Framework); Summary & References
  11. Where do we currently use perceptual models? [Pourazad, IEEE Consumer Electronics Magazine, 2012]
  12. Frequency Masking for JPEG
     A DCT-based encoder incorporating human visual frequency weighting (L. W. Chang, 2001), expressed as a Modulation Transfer Function (MTF) or a Quantization Matrix (QM).
     We can do better with a fine adjustment factor!
  13. HEVC QM Design: HEVC default quantization matrix
     • Intra 8x8 QM: uses the same QM developed for JPEG in 1999.
     • Intra 4x4 QM: sub-sampled from the 8x8 Intra QM.
     • Intra 16x16 QM and Intra 32x32 QM: up-sampled from the 8x8 Intra QM.
     • Inter QMs: predicted from the Intra QMs, using the linear relationship between the Intra QMs and the corresponding Inter QMs in AVC/H.264.
     [JCT-VC I012] & [L. W. Chang 2001]
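
The size derivation described on the slide above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 8x8 matrix is a made-up example rather than the HEVC default, the sub-sampling grid (every second entry) is assumed, and the up-sampling is plain replication. The function names are hypothetical.

```python
def subsample_8x8_to_4x4(qm8):
    """Keep every second entry in each direction (assumed sampling grid)."""
    return [[qm8[2 * r][2 * c] for c in range(4)] for r in range(4)]


def upsample_8x8(qm8, size):
    """Replicate each 8x8 entry into a (size/8) x (size/8) block."""
    if size % 8 != 0:
        raise ValueError("size must be a multiple of 8")
    factor = size // 8
    return [[qm8[r // factor][c // factor] for c in range(size)]
            for r in range(size)]


# Hypothetical 8x8 matrix whose step sizes grow with frequency.
# These values are NOT the HEVC default matrix.
qm8 = [[16 + 2 * (r + c) for c in range(8)] for r in range(8)]

qm4 = subsample_8x8_to_4x4(qm8)
qm16 = upsample_8x8(qm8, 16)
qm32 = upsample_8x8(qm8, 32)
```

Deriving every size from one 8x8 matrix means only that matrix has to be designed and tuned perceptually.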
  14. Local spatial-temporal contrast sensitivity of luminance perception
  15. JND in the classic DCT domain
     $T_{JND}(n,i,j) = T_{basic}(n,i,j) \cdot F_{lum}(n) \cdot F_{contrast}(n,i,j) \cdot F_{temporal}(n,i,j)$
     • $T_{basic}$: the basic threshold (spatial frequency)
     • $F_{lum}$: the luminance adaptation factor (luminance sensitivity)
     • $F_{contrast}$: the contrast masking factor (plane, edge, texture, etc.)
     • $F_{temporal}$: the temporal modulation factor (motion, frame rate, etc.)
     [Zhenyu Wei et al., IEEE T-CSVT, 2009]
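
To make the product form of the JND threshold concrete, the sketch below combines the four factors for a single block. It assumes that n indexes the block and (i, j) the DCT coefficient position. The numeric factor values are invented for illustration and are not taken from the cited model.

```python
def jnd_threshold(t_basic, f_lum, f_contrast, f_temporal):
    """Per-coefficient threshold for one block:
    T_JND(i, j) = T_basic(i, j) * F_lum * F_contrast(i, j) * F_temporal(i, j).
    f_lum is a scalar because it depends only on the block, not on (i, j).
    """
    size = len(t_basic)
    return [[t_basic[i][j] * f_lum * f_contrast[i][j] * f_temporal[i][j]
             for j in range(size)]
            for i in range(size)]


# Hypothetical 2x2 example (a real DCT block would be 4x4 or 8x8).
t_basic = [[2.0, 4.0], [4.0, 8.0]]
f_lum = 1.5
f_contrast = [[1.0, 2.0], [2.0, 1.0]]
f_temporal = [[1.0, 1.0], [1.0, 0.5]]

t_jnd = jnd_threshold(t_basic, f_lum, f_contrast, f_temporal)
```

With these made-up inputs, `t_jnd` is `[[3.0, 12.0], [12.0, 6.0]]`: a factor above one raises the threshold and lets more distortion go unnoticed at that coefficient.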
  16. Different Embedded Schemes: [X. Yang, TCSVT, 2005]; [Ours, ISCAS 2010] & [TCSVT (accepted)]; [Z. Chen, TCSVT, 2010] & [M. Naccari, TCSVT, 2011]
  17. The Proposed Coding Framework
     (Block diagram; labels recovered from the figure: Input, T, Q, Entropy Coding, Output, Q-1, T-1, Intra or Inter Prediction, Frame Buffer, JND Calculation and Translation, Adjustment Threshold Calculation, Adaptive Suppression, Lagrange Multiplier Adaptation, Motion Vector Scaling.)
     Distortion: D = D1(Q) + D2(JND)
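
The slide above does not spell out the suppression rule, so the following is only a sketch of one common form of JND-guided coefficient suppression, not necessarily the rule used in the proposed framework. Coefficients whose magnitude does not exceed the JND threshold are zeroed, and the rest are shrunk toward zero by the threshold. The function name and sample values are hypothetical.

```python
def suppress(coeffs, thresholds):
    """Zero coefficients at or below their JND threshold and shrink the
    others toward zero by the threshold (an assumed rule, for illustration)."""
    out = []
    for row_c, row_t in zip(coeffs, thresholds):
        out_row = []
        for c, t in zip(row_c, row_t):
            if abs(c) <= t:
                out_row.append(0.0)      # change is assumed imperceptible
            elif c > 0:
                out_row.append(c - t)    # shrink positive coefficient
            else:
                out_row.append(c + t)    # shrink negative coefficient
        out.append(out_row)
    return out


# Hypothetical residual coefficients and a flat threshold of 2.0.
residual = [[5.0, -1.0], [-6.0, 2.0]]
jnd = [[2.0, 2.0], [2.0, 2.0]]
suppressed = suppress(residual, jnd)
```

Smaller and sparser coefficients cost fewer bits after quantization and entropy coding, which is where a scheme of this kind gets its bitrate saving.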
  18. Bit Saving

                               Bitrate (kbps)                  Bitrate Reduction Against JM 14.2 (%)
     Sequence   Preset QP   JM 14.2    Chen's     Proposed     Chen's    Proposed
     Cyclists   20           7945.83    6889.50    5149.85     13.29     35.19
                24           3165.17    2660.42    2436.40     15.95     23.02
                28           1343.73    1103.82    1138.30     17.85     15.29
                32            658.92     543.16     612.40     17.57      7.06
     Harbour    20          25104.43   23734.86   15822.41      5.46     36.97
                24          13496.66   12290.08    8843.39      8.94     34.48
                28           6054.17    5336.50    4557.15     11.85     24.73
                32           2909.30    2607.64    2588.25     10.37     11.04
     Night      20          20306.64   18749.84   11330.19      7.67     44.20
                24           9688.57    8714.15    6239.72     10.06     35.60
                28           4507.60    4036.23    3430.19     10.46     23.90
                32           2311.90    2088.36    2050.42      9.67     11.31
  19. Bit Saving (continued)

                                   Bitrate (kbps)                  Bitrate Reduction Against JM 14.2 (%)
     Sequence       Preset QP   JM 14.2    Chen's     Proposed     Chen's    Proposed
     Raven          20           7135.21    6568.93    4147.18      7.94     41.88
                    24           3193.59    2850.05    2201.83     10.76     31.05
                    28           1537.32    1346.20    1189.10     12.43     22.65
                    32            803.07     705.19     710.89     12.19     11.48
     Sheriff        20          13951.79   12986.99    7317.07      6.92     47.55
                    24           6472.74    5838.45    3739.43      9.80     42.23
                    28           2665.81    2361.96    1817.07     11.40     31.84
                    32           1159.36    1032.24     963.12     10.96     16.93
     SpinCalendar   20          25071.25   21394.72   11108.62     14.66     55.69
                    24           7878.49    5930.58    4548.43     24.72     42.27
                    28           2653.01    2194.53    2046.35     17.28     22.87
                    32           1315.22    1129.24    1177.62     14.14     10.46
     Average                                                       12.18     28.32
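
The reduction columns in the tables above appear to be the relative bitrate saving against the JM 14.2 anchor. The short Python check below reproduces the Raven, QP 20 row under that assumption; the function name is illustrative.

```python
def bitrate_reduction_percent(anchor_kbps, test_kbps):
    """Relative bitrate saving of a test encoder against the anchor, in percent."""
    return 100.0 * (anchor_kbps - test_kbps) / anchor_kbps


# Raven, QP 20, bitrates in kbps as listed in the table.
raven_qp20 = {"jm": 7135.21, "chen": 6568.93, "proposed": 4147.18}

chen_saving = round(
    bitrate_reduction_percent(raven_qp20["jm"], raven_qp20["chen"]), 2)
proposed_saving = round(
    bitrate_reduction_percent(raven_qp20["jm"], raven_qp20["proposed"]), 2)
```

The two results, 7.94 and 41.88, match the table entries for that row.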
  20. Frame Differences. JM 14.2: QP=20, 88th frame
  21. Frame Differences. Ours: QP=20, 88th frame
  22. Frame Differences. Differences: QP=20, 88th frame
  23. Frame Differences. JM 14.2: QP=20, 102nd frame
  24. Frame Differences. Ours: QP=20, 102nd frame
  25. Frame Differences. Differences: QP=20, 102nd frame
  26. SSIM-Motivated Perceptual Coding
     Yi-Hsin Huang et al., “Perceptual Rate-Distortion Optimization Using Structural Similarity Index as Quality Metric”, IEEE T-CSVT, vol. 20, no. 11, pp. 1614-1624, Nov. 2010.
     • Replace PSNR with SSIM
     • Empirically estimate a Rate-SSIM model
     • Reuse the classical Lagrange multiplier method for mode selection and motion estimation
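
The idea of reusing the Lagrangian cost with SSIM as the quality term can be sketched as follows. This is a simplified illustration, not the method of the cited paper: SSIM is computed over a single window with the usual 8-bit constants, distortion is taken as 1 - SSIM, and the multiplier `lam` is a hypothetical input, whereas the paper derives it from an estimated Rate-SSIM model. All names and sample values are assumptions.

```python
def ssim(x, y, c1=6.5025, c2=58.5225):
    """Single-window SSIM between two equal-length pixel lists.
    c1 = (0.01 * 255)^2 and c2 = (0.03 * 255)^2 for 8-bit content."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx * mx + my * my + c1) * (vx + vy + c2))


def best_mode(original, candidates, lam):
    """Pick the candidate minimizing J = (1 - SSIM) + lam * rate."""
    def cost(cand):
        return (1.0 - ssim(original, cand["recon"])) + lam * cand["rate_bits"]
    return min(candidates, key=cost)["name"]


# Hypothetical block and two candidate reconstructions.
original = [10, 50, 90, 130]
candidates = [
    {"name": "skip", "recon": [70, 70, 70, 70], "rate_bits": 1},
    {"name": "intra", "recon": [10, 50, 90, 130], "rate_bits": 40},
]
```

A small multiplier favors the structurally faithful but expensive candidate, while a large one favors the cheap one: `best_mode(original, candidates, 0.001)` selects "intra" and `best_mode(original, candidates, 0.1)` selects "skip".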
  27. Improved SSIM Perceptual Coding
     • Shiqi Wang et al., “SSIM-Motivated Rate-Distortion Optimization for Video Coding”, IEEE T-CSVT, vol. 22, no. 4, pp. 516-529, April 2012. They derive an analytical model for the Rate-SSIM relationship.
     • Chuohao Yeo et al., “On Rate Distortion Optimization using SSIM”, ICASSP 2012.
     • Abdul Rehman et al., “SSIM-Inspired Perceptual Video Coding for HEVC”, ICME 2012.
     • Xi Wang et al., “Motion Based Perceptual Distortion and Rate Optimization for Video Coding”, ICME 2012.
  28. Basic Analysis-Completion Structure [P. Ndjiki-Nya, Signal Processing: Image Communication, 2012]
  29. Abstract+Detail Framework [Z. Yuan, H. Xiong and Li Song, ICASSP 2009]
     • Key frame: Abstract + Detail
     • Non-key frame: Abstract only
     • Use bilateral filtering to remove details
     • Use ME to find the matching block to recover details
  30. Super-resolution Framework [Q. Zhou and Li Song, IEEE PCM 2010]
     (Figure: encoder and decoder block diagrams.)
     • Symmetric coding complexity
     • 5~10% bit saving at the same quality
  31. Outline: Introduction (Perceptual Cues in Video Coding); Recent Research (JND based RDO, SSIM based RDO, Analysis-Completion Framework); Summary & References
  32. Personal Perspective
     Can we do much better than HEVC?
     • Yes, the next generation of video coding will probably need more perception-related techniques.
     Some preliminary work:
     • “On Just Noticeable Distortion Quantization in the HEVC Codec”, JCTVC-H0477, Feb. 2012. Claims 3%~25% bitrate saving at the same quality.
     • “A joint JND model based on luminance and frequency masking for HEVC”, JCTVC-I0163, May 2012. Claims 3%~30% bitrate saving at the same quality.
  33. Personal Perspective: future research
     • Advanced computational HVS models: suprathreshold vs. subthreshold; other masking models, such as attention
     • Exploiting new distortion metrics: image statistical properties; learning from large-scale datasets
     • Generic R-D optimization: R-D relationship and RDO for video coding
  34. References: important papers
     • J. L. Mannos and D. J. Sakrison, “The Effects of a Visual Fidelity Criterion on the Encoding of Images”, IEEE Trans. on Information Theory, vol. 20, no. 4, July 1974. (Cited by 776)
     • N. Jayant, J. Johnston and R. Safranek, “Signal Compression Based on Models of Human Perception”, Proceedings of the IEEE, vol. 81, no. 10, Oct. 1993. (Cited by 761)
     • A. Ortega and K. Ramchandran, “Rate-distortion methods for image and video compression”, IEEE Signal Processing Magazine, vol. 15, no. 6, pp. 23-50, 1998. (Cited by 597)
     • Z. Wang and A. C. Bovik, “Mean Squared Error: love it or leave it? A new look at Signal Fidelity Measures”, IEEE Signal Processing Magazine, vol. 26, no. 1, pp. 98-117, Jan. 2009. (Cited by 353)
     • Ching Yang Wang, Shiuh Ming Lee and Long-Wen Chang, “Designing JPEG quantization tables based on human visual system”, Signal Processing: Image Communication, vol. 16, no. 5, pp. 501-506, 2001.
     • Wenjun Zeng, Scott Daly and Shawmin Lei, “An Overview of the Visual Optimization Tools in JPEG 2000”, Signal Processing: Image Communication, vol. 17, pp. 85-104, 2002.
  35. References: JND related
     • X. Yang, W. Lin, Z. Lu, E. Ong and S. Yao, “Motion-compensated Residue Pre-processing in Video Coding Based on Just-noticeable-distortion Profile”, IEEE Trans. Circuits and Systems for Video Technology, vol. 15, no. 6, pp. 742-750, June 2005.
     • Z. Chen and C. Guillemot, “Perceptually-friendly H.264/AVC video coding based on foveated Just-Noticeable-Distortion model”, IEEE Trans. Circuits and Systems for Video Technology, vol. 20, no. 6, pp. 806-819, June 2010.
     • M. Naccari and F. Pereira, “Advanced H.264/AVC based perceptual video coding: architecture, tools and assessment”, IEEE Trans. Circuits and Systems for Video Technology, vol. 21, no. 6, pp. 766-782, June 2011.
     • M. Naccari and M. Mrak, “On Just Noticeable Distortion Quantization in the HEVC codec”, JCTVC-H0477, JCT-VC 8th Meeting, San Jose, Feb. 2012.
     • Z. Luo, Li Song and S. Zheng, “Improving H.264/AVC Video Coding with Adaptive Coefficient Suppression”, IEEE International Symposium on Circuits and Systems (ISCAS 2010), May 30-June 2, 2010, France.
  36. References: SSIM or other metrics as distortion
     • Yi-Hsin Huang, Tao-Sheng Ou, Po-Yen Su and H. H. Chen, “Perceptual Rate-Distortion Optimization Using Structural Similarity Index as Quality Metric”, IEEE Trans. Circuits and Systems for Video Technology, vol. 20, no. 11, pp. 1614-1624, Nov. 2010.
     • Yi-Hsin Huang, Tao-Sheng Ou, Po-Yen Su and H. H. Chen, “SSIM-Based Perceptual Rate Control for Video Coding”, IEEE Trans. Circuits and Systems for Video Technology, vol. 21, no. 5, pp. 682-691, May 2011.
     • Shiqi Wang, A. Rehman, Zhou Wang, Siwei Ma and Wen Gao, “SSIM-Motivated Rate-Distortion Optimization for Video Coding”, IEEE Trans. Circuits and Systems for Video Technology, vol. 22, no. 4, pp. 516-529, April 2012.
     • Chuohao Yeo, Huili Tan and Yihhan Tan, “On Rate Distortion Optimization using SSIM”, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), March 2012.
     • Abdul Rehman and Zhou Wang, “SSIM-Inspired Perceptual Video Coding for HEVC”, IEEE International Conference on Multimedia and Expo, June 2012.
     • Xi Wang, Li Su, Qingming Huang, Chunxi Liu and Ling-yu Duan, “Motion Based Perceptual Distortion and Rate Optimization for Video Coding”, IEEE International Conference on Multimedia and Expo, 2012.
  37. References: Analysis-Completion Framework
     • Minmin Shen, Ping Xue and Ci Wang, “Down-Sampling Based Video Coding Using Super-Resolution Technique”, IEEE Trans. Circuits and Systems for Video Technology, vol. 21, no. 6, pp. 755-765, June 2011.
     • P. Ndjiki-Nya, D. Doshkov, H. Kaprykowsky, F. Zhang, D. Bull and T. Wiegand, “Perception-oriented video coding based on image analysis and completion: A review”, Signal Processing: Image Communication, vol. 27, pp. 579–594, 2012.
     • F. Zhang and D. R. Bull, “A parametric framework for video compression using region-based texture models”, IEEE Journal of Selected Topics in Signal Processing, vol. 5, no. 7, pp. 1378–1392, 2011.
     • Q. Zhou, Li Song and W. Zhang, “Video Coding With Key Frames Guided Super Resolution”, IEEE Pacific-Rim Conference on Multimedia (PCM 2010), September 21-24, Shanghai, China.
     • Z. Yuan, H. Xiong and Li Song, “Generic Video Coding With Abstraction And Detail Completion”, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2009), April 19-24, 2009, Taipei, Taiwan.
  38. Thanks!