This document presents research using genetic programming to develop non-intrusive models for estimating voice over IP (VoIP) quality. Researchers used a VoIP simulation environment to generate distorted speech files under different network conditions and trained genetic programs to map transport layer metrics like packet loss and delay to mean opinion scores. The best models achieved good accuracy compared to the intrusive PESQ standard with only 1-3 variables, making them suitable for real-time VoIP quality monitoring. Future work aims to include wideband codecs and develop a unified quality estimation model.
Genetic programming approach for non-intrusive VoIP speech quality estimation
1. Motivation VoIP Simulation Environment GP Experiments Test Results Conclusions
AN EVOLUTIONARY APPROACH TO SPEECH
QUALITY ESTIMATION
USING GENETIC PROGRAMMING
A. Raja (1), A. Azad (2), C. Flanagan (1), C. Ryan (2)
(1) Wireless Access Research Centre,
Department of Electronic and Computer Engineering
(2) Bio-Computing and Developmental Systems,
Department of Computer Science and Information Systems
University of Limerick, Limerick, Ireland
OUTLINE
1 MOTIVATION
The Problem of Speech Quality Assessment
Research Goal
2 VOIP SIMULATION ENVIRONMENT
Simulation System
Network Traffic Characteristics
3 GP EXPERIMENTS
4 TEST RESULTS
5 CONCLUSIONS
SPEECH QUALITY ASSESSMENT METHODOLOGIES
Two approaches to speech quality assessment:
1 Subjective Assessment
2 Objective Assessment
SUBJECTIVE ASSESSMENT OF SPEECH QUALITY
Speech quality is estimated by humans.
Advantage – Reliable results.
Limitations
1 Expensive
2 Time Consuming
3 Laborious
4 Lack of Repeatability
Mean Opinion Score (MOS) is the measure of quality.
1 – Bad
5 – Excellent
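As a minimal sketch of how a MOS could be computed, the snippet below averages listener ratings on the five-point scale. The intermediate labels (Poor, Fair, Good) follow the usual absolute category rating scale and are not listed on the slide. The function name and the ratings are hypothetical.

```python
from statistics import mean

# Five-point listening-quality scale; only 1 and 5 are named on the slide.
ACR_LABELS = {1: "Bad", 2: "Poor", 3: "Fair", 4: "Good", 5: "Excellent"}


def mean_opinion_score(ratings):
    """Average a list of listener ratings (each 1..5) into a MOS."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r not in ACR_LABELS for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    return mean(ratings)


# Hypothetical ratings from eight listeners for one speech sample.
ratings = [4, 3, 4, 5, 3, 4, 4, 3]
mos = mean_opinion_score(ratings)
print(f"MOS = {mos:.2f}")
```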
OBJECTIVE ASSESSMENT OF SPEECH QUALITY
A fast, reliable, automated computer program is used to
estimate human perception of speech quality.
Two approaches:
1 Intrusive Assessment
2 Non-Intrusive Assessment
OBJECTIVE ASSESSMENT OF SPEECH QUALITY
INTRUSIVE ASSESSMENT
The signal under test is compared against a corresponding
reference signal.
Advantages:
1 The most reliable artificial means of estimating speech
quality
2 Tests can be repeated easily
Limitations:
1 Consumes considerable computing resources.
2 Not suitable for continuous quality monitoring, because a
reference signal is required.
OBJECTIVE ASSESSMENT OF SPEECH QUALITY
ITU-T P.862 (PESQ)
The PESQ algorithm is the current ITU-T Recommendation for
intrusive speech quality estimation.
The speech signal is mapped from the time domain to a
time-frequency representation using the psychophysical
equivalents of frequency and intensity.
PESQ has shown a high correlation with various ITU-T
benchmark tests.
Across 30 ITU-T subjective tests, the Pearson correlation
coefficient (R) was 0.935.
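For reference, the Pearson correlation coefficient between subjective and objective scores can be sketched as follows. The score lists are hypothetical and are not taken from the ITU-T tests.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical subjective MOS values and objective estimates for five conditions.
subjective = [1.8, 2.5, 3.1, 3.6, 4.2]
objective = [2.0, 2.4, 3.3, 3.5, 4.1]
r = pearson_r(subjective, objective)
print(f"R = {r:.3f}")
```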
OBJECTIVE ASSESSMENT OF SPEECH QUALITY
NON-INTRUSIVE ASSESSMENT
A challenging problem since a reference is not available.
Two approaches exist
1 Signal-based models
2 Parametric models
Signal-based models
Recent approaches are based on emulating:
1 The human speech production model
2 The psychoacoustic processing of the human ear
ITU-T P.563 is the current Recommendation.
OBJECTIVE ASSESSMENT OF SPEECH QUALITY
PARAMETRIC MEASUREMENT OF VOIP QUALITY
Quality is modelled as a function of transport layer metrics
and other measurable quantities.
Relevant metrics may include:
Packet Loss Rate
Variable delay – jitter
End-to-end delay
. . .
Aimed at real-time, continuous evaluation of quality
RESEARCH GOAL
Derive a VoIP listening quality estimation model as a
function of transport layer metrics.
Genetic programming based symbolic regression is used,
with the PESQ algorithm as the reference system.
VOIP SIMULATION ENVIRONMENT
NETWORK TRAFFIC PARAMETERS
No.  Parameter Name               Abbreviation
1    Bit-rate (kbps)              br
2    Mean loss rate               mlr
3    Mean burst length            mbl
4    Packetization interval (ms)  PI
5    Frame duration (ms)          fd
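The slides do not define how mlr and mbl are measured. As a rough sketch, assuming mlr is the fraction of packets lost and mbl is the average length of a run of consecutive losses (in packets), both could be computed from a per-packet loss trace as below. The trace and the function name are hypothetical.

```python
from itertools import groupby


def loss_statistics(trace):
    """Return (mlr, mbl) for a loss trace where True marks a lost packet.

    Assumed definitions (not stated in the slides):
      mlr = lost packets / total packets
      mbl = mean length of runs of consecutive lost packets
    """
    if not trace:
        raise ValueError("trace must not be empty")
    # Length of every run of consecutive losses.
    bursts = [sum(1 for _ in run) for lost, run in groupby(trace) if lost]
    mlr = sum(bursts) / len(trace)
    mbl = sum(bursts) / len(bursts) if bursts else 0.0
    return mlr, mbl


# Hypothetical trace of 12 packets containing three loss bursts (2, 1 and 3 packets).
trace = [False, True, True, False, False, True,
         False, False, False, True, True, True]
mlr, mbl = loss_statistics(trace)
print(f"mlr = {mlr:.2f}, mbl = {mbl:.1f} packets")
```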
NETWORK TRAFFIC SCENARIOS
No.  Parameter  Range
1    br         G.729 (8 kbps), G.723.1 (6.3 kbps), AMR (7.4 and 12.2 kbps)
2    mlr        [0, 2.5, 3.5, ..., 15, 20, 25, ..., 40] %
3    mbl        10, 50, 60, 70 and 80 %
4    PI         10-60 ms
5    fd         10, 20, 30 ms
EXPERIMENTAL SETUP
GPLab
Four GP experiments were performed with different
configurations.
Commonalities:
Each experiment comprised 50 runs
Each run spanned 50 generations
GP EXPERIMENTS
COMMON PARAMETERS
Parameter                Value
Initial population size  300
Selection                LPP tournament
Tournament size          2
Genetic operators        Crossover and subtree mutation
Operator probabilities   0.5 initially, adaptive thereafter
Survival                 Half elitism
Function set             +, -, *, /, sin, cos, log2, log10, loge, sqrt, power
Terminal set             Random numbers in [0.0, 1.0]; integers in [2, 10];
                         mlrVAD, mblVAD, PI, br, fd
GP EXPERIMENTS
EXPERIMENTAL DETAILS
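Experiment 1:
Fitness function – Mean Squared Error (MSE)
Experiment 2:
Fitness function – MSE with linear scaling (MSE_s)
\[ MSE_s(y, t) = \frac{1}{n} \sum_{i=1}^{n} \bigl(t_i - (a + b\,y_i)\bigr)^2 \qquad (1) \]
\[ a = \bar{t} - b\,\bar{y}, \qquad b = \frac{\mathrm{cov}(t, y)}{\mathrm{var}(y)} \qquad (2) \]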
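Equations (1) and (2) can be sketched in a few lines: the slope b and intercept a are fitted by least squares to the raw outputs y of an evolved expression, and the error is measured after that rescaling. The handling of a constant output (b set to 0) is an assumption of this sketch, and the sample data are hypothetical.

```python
def linear_scaled_mse(y, t):
    """Return (mse_s, a, b) for outputs y and targets t, per Eqs. (1)-(2)."""
    if len(y) != len(t) or not y:
        raise ValueError("y and t must be non-empty and of equal length")
    n = len(y)
    mean_y = sum(y) / n
    mean_t = sum(t) / n
    var_y = sum((yi - mean_y) ** 2 for yi in y) / n
    cov_ty = sum((ti - mean_t) * (yi - mean_y) for yi, ti in zip(y, t)) / n
    # Assumption: a constant output gets slope 0, so a becomes the target mean.
    b = cov_ty / var_y if var_y > 0 else 0.0
    a = mean_t - b * mean_y
    mse_s = sum((ti - (a + b * yi)) ** 2 for yi, ti in zip(y, t)) / n
    return mse_s, a, b


# Hypothetical case: the evolved expression has the right shape (t = 2*y + 1)
# but the wrong scale and offset, so plain MSE is large while MSE_s is zero.
y = [1.0, 2.0, 3.0, 4.0]
t = [3.0, 5.0, 7.0, 9.0]
plain_mse = sum((ti - yi) ** 2 for yi, ti in zip(y, t)) / len(y)
mse_s, a, b = linear_scaled_mse(y, t)
print(f"MSE = {plain_mse}, MSE_s = {mse_s}, a = {a}, b = {b}")
```

With linear scaling, the search only has to find an expression with the right shape, since the constants a and b are obtained directly from the data.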
GP EXPERIMENTS
EXPERIMENTAL DETAILS
Experiments 3 and 4
Selection criterion based on Gustafson et al. was used
Mating takes place between dissimilar individuals
Experiment 4:
The maximum tree depth was reduced from 17 to 7
The results were subjected to the Mann-Whitney-Wilcoxon test
for significance analysis
Experiment 4 was found to be significantly better overall.
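To illustrate the kind of comparison involved, the sketch below computes the Mann-Whitney U statistic for two samples of final errors by direct pair counting. The error values are hypothetical, and no p-value is computed here. In practice a statistics library, for example `scipy.stats.mannwhitneyu`, would typically be used for the significance level.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (u_a, u_b) by pair counting.

    u_a counts the pairs in which a value from sample_a exceeds a value
    from sample_b, with ties counted as one half; u_a + u_b = len_a * len_b.
    """
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    u_a = 0.0
    for x in sample_a:
        for y in sample_b:
            if x > y:
                u_a += 1.0
            elif x == y:
                u_a += 0.5
    u_b = len(sample_a) * len(sample_b) - u_a
    return u_a, u_b


# Hypothetical best-of-run errors from two experiments (lower is better).
errors_exp4 = [0.040, 0.038, 0.041, 0.039]
errors_exp3 = [0.052, 0.047, 0.050, 0.040]
u_exp4, u_exp3 = mann_whitney_u(errors_exp4, errors_exp3)
print(f"U(exp4) = {u_exp4}, U(exp3) = {u_exp3}")
```

A small U for one sample means its values rarely exceed those of the other sample, which for error values points towards the better-performing configuration.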
ON DATA COLLECTION
Nortel ND speech database containing high quality signals
with speech from 2 male and 2 female speakers was used
for analysis.
A total of 3360 distorted speech files were created, covering
the combinations of network traffic parameters.
1177 (35%) were used for training
503 (15%) were used for testing
1680 (50%) were used for speaker-independent validation
VOIP QUALITY MONITORING MODELS
\[ \text{MOS-LQO}_{GP} = -2.46 \times \log\bigl(\cos(\log(br)) + mlr_{VAD} \times (br + fd/10)\bigr) + 3.17 \qquad (3) \]
\[ \text{MOS-LQO}_{GP} = -2.99 \times \cos\bigl(0.91 \times \sin(mlr_{VAD}) + mlr_{VAD} + 8\bigr) + 4.20 \qquad (4) \]

             Equation (3)        Equation (4)
Data         MSEs     σ          MSEs     σ
Training     0.0370   0.9634     0.0520   0.9481
Testing      0.0387   0.9646     0.0541   0.9501
Validation   0.0382   0.9688     0.0541   0.9531
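Equations (3) and (4) can be evaluated directly from transport-layer values, as in the sketch below. Two points are assumptions of this sketch: the slides do not state the base of "log" (base 10 is used here) or the units of mlrVAD (a fraction between 0 and 1 is used in the example). The function names and input values are hypothetical.

```python
import math


def mos_lqo_eq3(br, mlr_vad, fd, log=math.log10):
    """Equation (3). The base of `log` is an assumption (base 10 by default)."""
    inner = math.cos(log(br)) + mlr_vad * (br + fd / 10)
    if inner <= 0:
        # The slides do not say how a non-positive argument is handled.
        raise ValueError("logarithm argument is not positive for these inputs")
    return -2.46 * log(inner) + 3.17


def mos_lqo_eq4(mlr_vad):
    """Equation (4), which depends on the loss rate only."""
    return -2.99 * math.cos(0.91 * math.sin(mlr_vad) + mlr_vad + 8) + 4.20


# Hypothetical operating point: 8 kbps codec, 10 ms frames, 5% loss as a fraction.
br, fd, mlr_vad = 8.0, 10.0, 0.05
print(f"Eq. (3): {mos_lqo_eq3(br, mlr_vad, fd):.2f}")
print(f"Eq. (4): {mos_lqo_eq4(mlr_vad):.2f}")
```

Since both expressions need only a handful of arithmetic operations per estimate and no reference signal, they can be re-evaluated whenever the measured network statistics change.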
SCATTER PLOTS
ON PERFORMANCE OF ITU-T P.563
CONCLUSIONS
1 The model is a good approximation to PESQ.
2 Suitable for real-time and non-intrusive estimation of
speech quality whereas PESQ is NOT.
3 The models are simple: Equations (3) and (4) depend on 3 variables and 1 variable, respectively.
4 Performs significantly better than ITU-T P.563
FUTURE GOALS
To include wideband codecs in the research.
To develop a unified quality estimation model for narrowband
and wideband telephony.