A METRIC FOR NO-REFERENCE VIDEO QUALITY ASSESSMENT FOR HD TV DELIVERY BASED ON SALIENCY MAPS H. BOUJUT*, J. BENOIS-PINEAU*, T. AHMED*, O. HADAR** & P. BONNET*** *LaBRI UMR CNRS 5800, University of Bordeaux, France **Communication Systems Engineering Dept., Ben Gurion University of the Negev, Israel ***AudematWorldCast Systems Group, France ICME 2011 – Workshop on Hot Topics in Multimedia Delivery (HotMD’11) 2011-07-11
Overview
- Introduction
- Focus of Attention and Saliency Maps
- Our approach: the Weighted Macro Block Error Rate (WMBER), a no-reference video quality metric based on saliency maps
- Prediction of subjective quality metrics from objective quality metrics
- Evaluation and results
- Conclusion and future work
Introduction
Motivation:
- VQA for HD broadcast applications
- Measure the influence of transmission loss on perceived quality
Video quality assessment protocols:
- Full Reference (FR): SSIM (Z. Wang, A. Bovik); A novel perceptual metric for video compression (A. Bhat, I. Richardson), PCS'09; Evaluation of temporal variation of video quality in packet loss networks (C. Yim, A. C. Bovik), Image Communication 26 (2011)
- Reduced Reference (RR): A Convolutional Neural Network Approach for Objective Video Quality Assessment (P. Le Callet, C. Viard-Gaudin, D. Barba), IEEE Transactions on Neural Networks 17
- No Reference (NR): No-reference image and video quality estimation: Applications and human-motivated design (S. Hemami, A. Reibman), Image Communication 25 (2010)
In this work: NR VQA with visual saliency in the H.264/AVC framework.
Contributions:
- Visual saliency map computed during the compression process
- WMBER NR quality metric
- Prediction of subjective quality metrics from objective quality metrics
Focus of Attention and Saliency maps
FOA is mostly attracted by salient areas which stand out from the visual scene, and is sequentially grabbed over these salient areas. Salient stimuli are mainly due to:
- High color contrast
- Motion
- Edge orientation
Figure: original frame and its saliency map, Tractor sequence (TUM/VQEG).
Saliency maps (1/2)
Several methods for saliency map extraction already exist in the literature. All of them work in the same way [O. Brouard, V. Ricordel and D. Barba, 2009], [S. Marat et al., 2009]:
- Extraction of the spatial saliency map (static pathway)
- Extraction of the temporal saliency map (dynamic pathway)
- Fusion of the spatial and the temporal saliency maps
Figure: temporal, spatial, and spatio-temporal saliency maps.
Saliency maps (2/2)
In this work we re-used the saliency map extraction method published at IS&T Electronic Imaging 2011:
- Based on the saliency model of O. Brouard, V. Ricordel and D. Barba
- Uses partial decoding of the H.264 stream to reach real-time performance
- A fusion method to combine the spatial and temporal saliency maps was proposed there
Here, we propose a new fusion method.
Saliency map fusion (1/2)
As baselines, we use the multiplication fusion method and the logarithm fusion method, both weighted with a 2D Gaussian of 5 visual degrees, 2DGauss(s), to compare with our proposed fusion method.
Figure: spatio-temporal saliency map.
Saliency map fusion (2/2)
To produce the spatio-temporal saliency map, we also propose a new fusion method, the square sum fusion. It has similar fusion properties to the baselines, but:
- Gives more weight to regions that have both high spatial saliency and high temporal saliency
- Does not produce a null spatio-temporal saliency when the temporal saliency is very low
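The three fusion rules can be sketched as follows. The exact expressions (the log-sum form, the square-sum form, and the Gaussian bandwidth in pixels) are illustrative assumptions, not the paper's exact formulas; only the qualitative behavior matters here.

```python
import numpy as np

def gauss2d(shape, sigma_px):
    """Centered 2D Gaussian weighting, standing in for the 5-visual-degree
    foveal weighting 2DGauss(s); sigma_px is an assumed pixel bandwidth."""
    h, w = shape
    y, x = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    return np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma_px ** 2))

def fuse_mul(s_sp, s_tmp, g):
    """Multiplication fusion: null wherever either input map is null."""
    return g * s_sp * s_tmp

def fuse_log(s_sp, s_tmp, g):
    """Logarithm fusion (one plausible log-sum form)."""
    return g * np.log1p(s_sp + s_tmp)

def fuse_square(s_sp, s_tmp, g):
    """Square-sum fusion (assumed form): emphasizes regions salient in both
    maps, yet stays non-null when the temporal saliency is very low."""
    return g * (s_sp ** 2 + s_tmp ** 2)
```

On a purely static scene (temporal saliency near zero), `fuse_mul` collapses to zero everywhere, while `fuse_square` retains the spatial saliency, which is exactly the property the square sum fusion is designed for.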
WMBER Vq metric based on saliency maps (1/3)
The Weighted Macro Block Error Rate (WMBER) is a no-reference metric.
- Visual attention is modeled by the saliency map
- Video transmission artifacts may change the saliency map, so we extract the saliency maps from the already broadcast, disturbed video stream
WMBER also relies on MB error detection in the bit stream:
- DC/AC and MV error detection
- Error propagation according to the H.264 decoding process
WMBER is thus based on MB error detection weighted by saliency maps.
Figure: original transmission error and propagation of transmission errors.
WMBER Vq metric based on saliency maps (2/3)
Figure: WMBER computation block diagram (decoder → MB error map and decoded frame; gradient energy of the decoded frame multiplied by the MB error map, weighted by the saliency map and summed (GME); normalized by the sum of the saliency map → WMBER).
WMBER Vq metric based on saliency maps (3/3)
- When MB errors cover the whole frame and the gradient energy is high, the WMBER is high (near 1.0)
- When there are no MB errors or the gradient energy is low, the WMBER is low (near 0.0)
The WMBER of a video sequence is the average WMBER over its frames.
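One plausible reading of the block diagram and of the boundary cases above (the exact normalization is an assumption, not the authors' published formula) is a saliency-normalized, gradient-weighted macro-block error sum:

```python
import numpy as np

def wmber_frame(mb_error, grad_energy, saliency, eps=1e-9):
    """Assumed per-frame WMBER: MB errors (0/1 map), weighted by the
    normalized gradient energy and the saliency map, then normalized by
    the total saliency so the score lies in [0, 1]."""
    g = grad_energy / (grad_energy.max() + eps)  # normalize gradient energy
    num = np.sum(mb_error * g * saliency)
    den = np.sum(saliency) + eps
    return float(num / den)

def wmber_sequence(frames):
    """Sequence-level WMBER: average of the per-frame values, as stated
    on the slide. `frames` is an iterable of (error, gradient, saliency)."""
    return float(np.mean([wmber_frame(e, g, s) for e, g, s in frames]))
```

This sketch reproduces the two limit behaviors described above: all-ones error map with high gradient energy gives a score near 1.0, and a zero error map (or near-zero gradient energy) gives a score near 0.0.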
Subjective Experiment
Conducted according to:
- VQEG Report on Validation of the Video Quality Models for High Definition Video Content (June 2010)
- ITU-R Rec. BT.500-11
20 HDTV (1920x1080 pixels) video sources (SRC) from: The Open Video Project (www.open-video.org), NTIA/ITS, TUM/Taurus Media Technik, French HDTV.
Goal: measure the influence of transmission loss on perceived quality.
- 2 loss models: IP model (ITU-T Rec. G.1050) and RF (Radio Frequency) model
- 8 loss profiles were compared
- 160 Processed Video Streams (PVS)
- 35 participants
MOS values were computed for each SRC and PVS.
Figure: experiment room.
Prediction of subjective quality metrics from objective quality metrics
We propose to use a supervised learning method to predict MOS values from WMBER or MSE. This prediction method, called the similarity-weighted average, requires a training data set of n known pairs (xi, yi) to predict y from x:
- Here, the (xi, yi) pairs are WMBER or MSE values associated with MOS values
- y is the MOS predicted from a given WMBER/MSE value x
The prediction is performed using a weighted mean classifier.
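A minimal sketch of such a similarity-weighted average, assuming a Gaussian similarity kernel with bandwidth `sigma` (both the kernel choice and the bandwidth value are assumptions; the paper's exact similarity function may differ):

```python
import numpy as np

def predict_mos(x, train_x, train_y, sigma=0.05):
    """Weighted-mean predictor: the predicted MOS is the average of the
    training MOS values train_y, weighted by the similarity between the
    query objective score x and each training score in train_x."""
    tx = np.asarray(train_x, dtype=float)
    ty = np.asarray(train_y, dtype=float)
    w = np.exp(-((tx - x) ** 2) / (2.0 * sigma ** 2))  # Gaussian similarity
    return float(np.sum(w * ty) / (np.sum(w) + 1e-12))
```

Training pairs whose WMBER/MSE value is close to the query dominate the average, so the prediction smoothly interpolates the MOS values of the most similar training examples.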
Evaluation and results
We compare 6 objective video quality metrics:
- MSE
- WMBER using the 5 v/deg 2D Gaussian (WMBER2DGauss)
- WMBER using the multiplication fusion (WMBERmul)
- WMBER using the log sum fusion (WMBERlog)
- WMBER using the square sum fusion (WMBERsquare)
- WMBER using the spatial saliency map only (WMBERsp)
All metrics are computed for each of the 160 PVS + 20 SRC, yielding 6 data sets of 180 Objective Metric/MOS pairs. Each data set is split into 2 equal parts, a training set and an evaluation set, with cross-validation. The Pearson Correlation Coefficient (PCC) is used for the evaluation.
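The PCC used for the evaluation is the standard Pearson correlation between the predicted and the subjective MOS values:

```python
import numpy as np

def pearson_cc(pred, mos):
    """Pearson Correlation Coefficient between predicted MOS values
    and the subjective MOS values from the experiment."""
    p = np.asarray(pred, dtype=float)
    m = np.asarray(mos, dtype=float)
    p = p - p.mean()  # center both series
    m = m - m.mean()
    return float(np.sum(p * m) / np.sqrt(np.sum(p ** 2) * np.sum(m ** 2)))
```

A PCC near 1.0 means the objective metric orders and scales the sequences like the human observers did; this is the score compared across the 6 metrics above.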
Conclusion and future work
- We addressed the problem of objective video quality assessment over lossy channels, following recent trends in the definition of spatio-temporal saliency maps for FOA.
- New no-reference metric: the WMBER, based on saliency maps.
- We brought a new solution for saliency map fusion: the square sum fusion.
- We proposed a supervised learning method, the similarity-weighted average, to predict the subjective quality metric MOS from objective quality metrics. It gives better results than the conventional approach, polynomial fitting.
Future work:
- Improve the saliency model to better account for transmission artifacts and for the masking effect in the neighborhood of high-saliency areas.
- Evaluate the WMBER on the IRCCyN/IVC Eyetracker SD 2009_12 Database.