Shahid presentation
1. 1/42
Methods for Objective and Subjective Video Quality Assessment and for Speech Enhancement
Muhammad Shahid Dec 2, 2014 PhD Degree Defense
2. 2/42
Disposition
•Introduction
•Part 1
oOn classification and review of No-Reference (NR) visual quality assessment
•Part 2
oOn NR and Reduced-Reference (RR) methods of video quality assessment
•Part 3
oOn subjective methods of video quality assessment
•Part 4
oOn speech enhancement in modulation domain
3. 3/42
Introduction
Two key areas in multimedia services
•Video Quality Assessment (VQA)
•Speech Enhancement
•Video Quality
oWhat? Perceptual quality of a video
oWho? Subjective matter
oWhy? We want it better
oHow? Assessment: Subjective and Objective
Both are covered in the thesis
•Speech Enhancement
oNoisy environments; remove noise or boost speech
oEvaluate established technique, in a different domain
4. 4/42
Introduction Quality
Aristotle categorized every object of human apprehension into:
•Substance
•Quantity
•Quality (from Latin – qualitas)
•Relation
•Place
•Time
•Position
•State
•Action
•Affection
Quality (Merriam-Webster): how good or bad something is
6. 6/42
Introduction Motivation
•Video Quality: Why do we bother?
•We:
•Consumers, Service providers, Content providers etc.
•Bother: Low quality → Dissatisfaction → Churn
•Is that all?
Decrease quality (resources) without losing users?
Charge more for better quality?
Provide various quality options? Larger consumer base!
• Many reasons for Video Quality Assessment
7. 7/42
Introduction Video Quality
System Influence Factors that can degrade video quality
In order to avoid/minimize any degradation, its impact on quality has to be measured!
8. 8/42
Introduction Video Quality Assessment
•How to measure (assess)?
•Compared to (b), the ‘original’: MSE of (a) = 42, MSE of (c) = 25
[Figure: panels (a), (b), (c)]
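A minimal sketch of how an MSE figure like the ones above is computed, and why it can mislead. The pixel values are made up for illustration and are not the images from the slide; `mse` and `psnr` are hypothetical helper names.

```python
import math

def mse(reference, distorted):
    """Mean squared error between two equally sized images (lists of rows)."""
    total, count = 0.0, 0
    for ref_row, dis_row in zip(reference, distorted):
        for r, d in zip(ref_row, dis_row):
            total += (r - d) ** 2
            count += 1
    return total / count

def psnr(reference, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB, derived from the MSE."""
    err = mse(reference, distorted)
    return math.inf if err == 0 else 10.0 * math.log10(peak ** 2 / err)

# Illustrative 2x2 grayscale patches (made-up values).
original = [[100, 100], [100, 100]]
shifted = [[105, 105], [105, 105]]   # uniform brightness shift
spiky = [[100, 100], [100, 110]]     # a single strongly wrong pixel

print(mse(original, shifted))  # 25.0
print(mse(original, spiky))    # 25.0
```

Both distortions get the same MSE although they look very different, which is one reason pixel-wise error alone says little about perceived quality.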
9. 9/42
Introduction Video Quality Assessment
•Simple pixel comparisons are not adequate
•Humans should be the assessors
•Subjective assessment (mean opinion score: MOS)
Laboratory based, standardized by ITU-T
Crowdsourcing based, loosely controlled
BUT: impracticable for many practical applications
•Objective assessment, computational models that mimic subjective assessment
’Original’ available → Full-Reference (FR)
Features of ’original’ available → Reduced-Reference (RR)
No access to ’original’ → No-Reference (NR)
•Real-time, online, practical application scenarios → Preferably NR or RR
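The mean opinion score mentioned above is the average of the viewers' ratings for one test video, usually reported with a confidence interval. A minimal sketch, assuming a 5-point rating scale and a normal approximation for the 95% interval; the ratings are made up and `mos_with_ci` is a hypothetical helper name.

```python
import math
import statistics

def mos_with_ci(ratings, z=1.96):
    """Return (MOS, half-width of an approximate 95% confidence interval)."""
    mean = statistics.mean(ratings)
    std = statistics.stdev(ratings)  # sample standard deviation, needs >= 2 ratings
    return mean, z * std / math.sqrt(len(ratings))

# Made-up ratings from eight viewers (1 = bad, 5 = excellent).
ratings = [4, 5, 3, 4, 4, 5, 3, 4]
mos, ci = mos_with_ci(ratings)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```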
10. 10/30
Disposition
•Introduction
•Part 1
•On classification and review of NR visual quality assessment
•Part 2
•On NR and RR methods of video quality assessment
•Part 3
•On Subjective methods of video quality assessment
•Part 4
•On Speech enhancement in modulation domain
11. 11/30
Part 1 Classification and review of NR visual quality assessment
12. 12/30
Part 1 Classification and Review of NR visual quality assessment
•QP: Quantization Parameter
•DCT: Discrete Cosine Transform
13. 13/30
Part 1 Conclusions and Observations
•Over 170 references reviewed
•Pixel-based methods in majority; designed for images
•Many image-based techniques have been adapted for videos
•Joint impact of different artifacts / methods of ’global’ quality assessment: scarce
•NR VQA gaining interest
•P.NAMS and P.NBAMS standardized by ITU-T
•Bitstream-based approaches more popular (computationally less complex but still offer competitive performance)
14. 14/42
Disposition
•Introduction
•Part 1
•On classification and review of NR visual quality assessment
•Part 2
•NR and RR methods of video quality assessment
•ANN based NR method
•LS-SVM based NR method
•LASSO based NR and RR methods
•Part 3
•On subjective methods of video quality assessment
•Part 4
•On speech enhancement in modulation domain
15. 15/42
Part 2
A general framework of NR video quality prediction/estimation
16. 16/42
Part 2 Artificial Neural Network (ANN) based NR method
•ANN used in image processing, found useful in different applications
•Bitstream-based video features
•P16x16, P8x8, and P4x4: percentages of blocks coded with the respective partition sizes
•Avg = Average
•Perceptual Evaluation of Video Quality (PEVQ)
• Peak Signal to Noise Ratio (PSNR)
• Structural SIMilarity (SSIM)
17. 17/42
Part 2 Artificial Neural Network (ANN) based NR method
•Two layer ANN with Levenberg-Marquardt backpropagation
•H.264/AVC encoded test stimuli, QCIF resolution
•7 SRCs, 6 bitrates, 4 frame-rates = 168 for training
•5 clips from 1 SRC, 6 bitrates, 4 frame-rates = 120 for testing
18. 18/42
Part 2 Artificial Neural Network (ANN) based NR method Results
•Competitive performance was observed as compared to linear regression
•Possible improvements/extensions: MOS prediction, better regression technique…
19. 19/42
Part 2 Least Squares-Support Vector Machine (LS-SVM) based NR method
•SVM is a popular machine-learning technique for regression
•LS-SVM is computationally simpler than SVM
•Quadratic programming → a set of linear equations
•Test-stimuli chosen based on Spatial and Temporal perceptual Information (SI and TI)
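The point that LS-SVM replaces quadratic programming with a set of linear equations can be sketched as follows. This is a generic textbook-style LS-SVM regressor with an RBF kernel, not the configuration used in the thesis; the kernel choice, the values of `gamma` and `sigma`, the toy data, and all function names are assumptions for illustration.

```python
import math

def rbf(u, v, sigma):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(u, v)) / (2.0 * sigma ** 2))

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def lssvm_fit(X, y, gamma=100.0, sigma=1.0):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [rbf(X[i], X[j], sigma) for j in range(n)]
        row[i + 1] += 1.0 / gamma  # regularization on the kernel diagonal
        A.append(row)
    sol = solve(A, [0.0] + list(y))
    return sol[0], sol[1:]  # bias b, dual weights alpha

def lssvm_predict(X_train, b, alpha, x, sigma=1.0):
    return b + sum(a * rbf(xi, x, sigma) for a, xi in zip(alpha, X_train))

# Toy data: one made-up feature per video and a made-up quality score.
X = [[0.0], [1.0], [2.0], [3.0], [4.0]]
y = [1.0, 1.8, 3.1, 3.9, 4.6]
b, alpha = lssvm_fit(X, y)
print(lssvm_predict(X, b, alpha, [2.5]))
```

Training is a single linear solve; the price is that every training sample gets a nonzero weight, so the model is not sparse like a standard SVM.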
20. 20/42
Part 2 LS-SVM based NR method
•6 SRCs, 2 resolutions, each @ 2 frame-rates and 5 bitrates = 120
•20 sec videos, last 10s considered
•Randomly chosen 80 for training and rest for testing
•17 Bitstream-based features representing the impact of coding distortions and content characteristics
21. 21/42
Part 2 LS-SVM based NR method
•VQEG recommended performance statistics used
•LS-SVM performed slightly better than or similar to ANN
•Much better than linear regression
•Was it useful to have more features?
•Which of the features are more significant?
•Impact on performance if RR features are added?
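Performance statistics of the kind used to compare predicted and subjective scores commonly include the Pearson correlation (linearity), the Spearman rank correlation (monotonicity), and the RMSE (accuracy). A minimal sketch with made-up scores; which statistics were actually reported is not stated on the slide, so this selection is an assumption.

```python
import math

def pearson(x, y):
    """Pearson linear correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def rmse(x, y):
    """Root mean squared error between two score lists."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

# Made-up subjective scores and model predictions for five videos.
subjective = [1.5, 2.0, 3.2, 4.1, 4.8]
predicted = [1.7, 2.4, 3.0, 4.3, 4.5]
print(pearson(subjective, predicted))
print(spearman(subjective, predicted))
print(rmse(subjective, predicted))
```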
22. 22/42
Part 2 LASSO based NR and RR methods
•Least Absolute Shrinkage and Selection Operator (LASSO) regression not used for VQA before
•Offers a linear solution for regression, along with collinearity removal and dimensionality reduction
•Ridge regression used for baseline performance
•In LASSO:
• The task is to minimize the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant
• For a given non-negative λ (tuning parameter) value, it solves the following minimization problem
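In its standard Lagrangian form (an assumption here; the scaling used in the thesis may differ), the minimization problem for N observations with response y_i, feature vector x_i, intercept β_0, and p coefficients β_j reads:

```latex
\min_{\beta_0,\,\beta}\ \left( \frac{1}{2N} \sum_{i=1}^{N} \left( y_i - \beta_0 - x_i^{T}\beta \right)^2 + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert \right)
```

A larger λ drives more coefficients to exactly zero, which is how LASSO selects features; the constrained form in the previous bullet is equivalent for a suitable constant.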
23. 23/42
Part 2 LASSO based NR and RR methods
•144 H.264/AVC encoded videos from École Polytechnique Fédérale de Lausanne (EPFL), impaired with simulated packet loss at different loss rates
•The selected features (51) represent motion and structural contents of a video, the energy of the video signal, the impact of the packet losses, and the impact of error concealment
•Feature values were standardized (using zscore)
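A minimal sketch of z-score standardization followed by LASSO fitted with cyclic coordinate descent. The solver, the value of `lam`, the data, and the feature names are assumptions for illustration; the slides do not say which solver was used.

```python
import statistics

def zscore(column):
    """Standardize one feature to zero mean and unit sample standard deviation."""
    mean, std = statistics.mean(column), statistics.stdev(column)
    return [(v - mean) / std for v in column]

def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso(columns, y, lam, sweeps=200):
    """Coordinate descent for (1/2N) * RSS + lam * sum(|beta_j|)."""
    n = len(y)
    intercept = sum(y) / n
    residual = [v - intercept for v in y]  # all coefficients start at zero
    beta = [0.0] * len(columns)
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            z = sum(x * x for x in col) / n
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(x * r for x, r in zip(col, residual)) / n + z * beta[j]
            new = soft_threshold(rho, lam) / z
            delta = new - beta[j]
            if delta:
                residual = [r - delta * x for r, x in zip(residual, col)]
                beta[j] = new
    return intercept, beta

# Made-up data: three hypothetical features for four videos.
motion = [1.0, 1.0, -1.0, -1.0]    # strongly related to the score
texture = [1.0, -1.0, 1.0, -1.0]   # weakly related
noise = [1.0, -1.0, -1.0, 1.0]     # unrelated
score = [5.1, 4.9, 1.1, 0.9]

features = [zscore(c) for c in (motion, texture, noise)]
intercept, beta = lasso(features, score, lam=0.2)
print(intercept, beta)
```

With this penalty only the first coefficient stays nonzero: the weak and unrelated features are dropped, which illustrates feature selection and estimation happening in one step.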
25. 25/42
Part 2 LASSO based NR and RR methods
•12-fold cross validation (CV) for training and testing
•Nested 10-fold CV for determining optimal λ
[Figure panels: NR, RR]
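The evaluation scheme above can be sketched as index bookkeeping: an outer 12-fold split for testing and, inside each outer training set, an inner 10-fold split for choosing λ. The round-robin assignment without shuffling and the names `k_folds` and `nested_cv` are assumptions; a real experiment would typically shuffle the videos first.

```python
def k_folds(indices, k):
    """Split indices into k disjoint, nearly equal folds (round-robin)."""
    return [indices[i::k] for i in range(k)]

def nested_cv(n_samples, outer_k=12, inner_k=10):
    """Yield (train, test, inner_splits) for every outer fold."""
    outer = k_folds(list(range(n_samples)), outer_k)
    for i, test in enumerate(outer):
        train = [idx for j, fold in enumerate(outer) if j != i for idx in fold]
        inner = []
        for val in k_folds(train, inner_k):
            held_out = set(val)
            fit = [idx for idx in train if idx not in held_out]
            inner.append((fit, val))  # fit on `fit`, score lambda on `val`
        yield train, test, inner

splits = list(nested_cv(144))  # 144 videos, as in the dataset above
train, test, inner = splits[0]
print(len(splits), len(train), len(test), len(inner))  # 12 132 12 10
```

The test fold of each outer split never appears in the inner splits, so the chosen λ is not tuned on the videos used for the final score.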
26. 26/42
Part 2 LASSO based NR and RR methods
Summary and Conclusions
•A variety of video features investigated for quality estimation; LASSO uses far fewer features but offers performance competitive with Ridge, VQM, PEVQ, and 5 reference methods
•Reported the perceptual preference of block partitioning
•Feature selection and quality estimation performed together offer promising results
•NR approach is competitive with RR, in our case
•Future work – Evaluate for HEVC coded videos
27. 27/42
Disposition
•Introduction
•Part 1
•On classification and review of NR visual quality assessment
•Part 2
•On NR and RR methods of video quality assessment
•RR methods
•Part 3
oSubjective VQA
Low-resolution videos
Temporal, spatial, and quantization variations based videos
Adaptive streaming videos, crowdsourcing based
•Part 4
•On speech enhancement in modulation domain
28. 28/42
Part 3 Subjective VQA of Low Resolution Videos
•SRC videos selected on the basis of spatio-temporal perceptual information variety (SI & TI values)
•H.264/AVC encoding was used
•6 SRCs, 2 resolutions, each @ 2 frame-rates and 5 bitrates = 120 test-stimuli
29. 29/42
Part 3 Subjective VQA of Low Resolution Videos
•Subjective assessment performed in an ITU standards-compliant lab
•21 subjects participated; MOS computed from 18 subjects
•Obtained results conform to previously reported trends
•Bitstreams and MOS published online, used in the study of Chapter 4 of thesis
[Figure panels: CIF, QCIF]
30. 30/42
Part 3 Temporal, spatial, and quantization variations
?
32. 32/42
Part 3 Temporal, spatial, and quantization variations Results
•MOS vs. bitrate values were plotted for all SRCs
•For low-TI SRCs (Elisa, City), frame resolution is significant
•For high-TI SRCs (Soccer, Ice), similar trend but to a lesser extent
•ANOVA : Perceptual preference in the order of frame-resolution, bits per pixel, and frame-rate
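Bits per pixel ties the three varied parameters together. A minimal sketch, assuming the common definition (bitrate divided by pixels per second) and the standard CIF (352x288) and QCIF (176x144) frame sizes; the bitrate and frame rate below are made up.

```python
def bits_per_pixel(bitrate_bps, width, height, fps):
    """Average number of coded bits spent on each pixel."""
    return bitrate_bps / (width * height * fps)

# Same hypothetical bitrate and frame rate at two resolutions.
cif = bits_per_pixel(256_000, 352, 288, 30)
qcif = bits_per_pixel(256_000, 176, 144, 30)
print(f"CIF: {cif:.4f} bpp, QCIF: {qcif:.4f} bpp")
```

At a fixed bitrate, halving both frame dimensions quadruples the bits available per pixel, so resolution, frame rate, and quantization trade off against each other.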
33. 33/42
Part 3 Adaptive streaming videos, "crowdsourcing" based VQA
•7 HD (1280x720) videos encoded at {5,3,1,0.6} Mbps
•Subjective assessment of Acreo lab
34. 34/42
Part 3 Adaptive streaming videos, "crowdsourcing" based VQA
•215 workers participated, 6 removed
•Larger subject diversity than typically in lab-based VQA
35. 35/42
Part 3 Adaptive streaming videos crowdsourcing based Results
•Promising correlation with lab-based tests
•Crowdsourcing potentially an alternative?
•Verified already reported trends
•Constant (lower) quality preferred over freezing events
36. 36/42
Disposition
•Introduction
•Part 1
• On classification and review of NR visual quality assessment
•Part 2
• On NR and RR methods of video quality assessment
•Part 3
• On subjective methods of video quality assessment
•Part 4
•Speech Enhancement
•Spectral center-of-gravity based demodulation
•Convex optimization based demodulation
37. 37/42
Part 4 Spectral center-of-gravity based demodulation
•Speech enhancement done by Adaptive Gain Equalizer (AGE)
•AGE boosts the speech signal, leaving noise unchanged
•Speech signal decomposed into modulator and carrier for modulation-frequency domain processing
•Modulation frequency domain processing employed in many applications
•Spectral center-of-gravity based demodulation preferred
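The spectral center of gravity of a sub-band frame is the power-weighted mean frequency of its spectrum, which can serve as a carrier-frequency estimate for demodulation. A minimal sketch using a direct DFT; the frame length, sample rate, test tone, and function names are assumptions for illustration, not the thesis implementation.

```python
import cmath
import math

def power_spectrum(frame):
    """One-sided power spectrum of a real frame via a direct DFT (O(N^2))."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        spectrum.append(abs(coeff) ** 2)
    return spectrum

def spectral_cog(frame, sample_rate):
    """Power-weighted mean frequency of the frame, in Hz."""
    power = power_spectrum(frame)
    total = sum(power)
    if total == 0.0:
        return 0.0  # silent frame: no meaningful estimate
    bin_hz = sample_rate / len(frame)
    return sum(k * bin_hz * p for k, p in enumerate(power)) / total

# A 1 kHz tone sampled at 8 kHz: the center of gravity sits at the tone.
fs, n = 8000, 64
tone = [math.cos(2 * math.pi * 1000 * t / fs) for t in range(n)]
print(spectral_cog(tone, fs))
```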
38. 38/42
Part 4 Spectral center-of-gravity based demodulation
•Procedure includes:
•Filter bank used to get sub-bands
•Demodulation of each sub-band
•Processing of modulators
•Re-modulation of sub-bands
•Signal synthesis
Gain function of AGE
•AGE performed well in modulation domain
•Max SNRI of 9 dB obtained
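SNR improvement (SNRI) is the output SNR minus the input SNR in dB. A minimal sketch, assuming the speech and noise components can be passed through the enhancer separately so that their powers are measurable; the signals and the fixed gain of 2 are made up and stand in for a real enhancer.

```python
import math

def power(signal):
    """Mean power of a signal."""
    return sum(s * s for s in signal) / len(signal)

def snr_db(speech, noise):
    """Signal-to-noise ratio in dB."""
    return 10.0 * math.log10(power(speech) / power(noise))

def snri_db(speech_in, noise_in, speech_out, noise_out):
    """SNR improvement: output SNR minus input SNR, in dB."""
    return snr_db(speech_out, noise_out) - snr_db(speech_in, noise_in)

# Made-up components: the stand-in enhancer doubles the speech amplitude
# and leaves the noise unchanged.
speech = [math.sin(0.3 * t) for t in range(200)]
noise = [0.5 * math.sin(1.7 * t + 0.4) for t in range(200)]
boosted = [2.0 * s for s in speech]
print(f"SNRI = {snri_db(speech, noise, boosted, noise):.2f} dB")
```

Doubling the speech amplitude with the noise untouched gives about 6 dB of SNRI, which puts the reported 9 dB maximum into perspective.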
39. 39/42
Part 4 Convex optimization based demodulation
•Traditional methods of demodulation may not provide unique modulator-carrier pair
•Convex optimization proven useful: SNRI, Spectral Distortion, PESQ MOS, and spectrogram analysis indicate its superior performance
40. 40/42
Main Contributions
•A detailed review of recent publications in NR visual quality assessment can be instrumental for research: a handbook for experts as well as for young researchers
•Examined different regression techniques and proposed different methods of NR and RR VQA based on a variety of video features
•Lab-based and crowdsourcing-based experiments contributed to subjective VQA
•Evaluation of AGE in modulation domain investigated the usefulness of modulation frequency domain
42. 42/42
In physical science the first essential step in the direction of learning any subject is to find principles of numerical reckoning and practicable methods for measuring some quality connected with it. I often say that when you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind; it may be the beginning of knowledge, but you have scarcely in your thoughts advanced to the state of Science, whatever the matter may be.
Sir William Thomson (Lord Kelvin), 1889 [PLA, vol. 1, "Electrical Units of Measurement", 1883-05-03]