Realtime, Non-Intrusive Evaluation of VoIP Using Genetic Programming
1. Motivation VoIP Simulation Environment GP Experiments Test Results Conclusions
A. Raja (1), A. Azad (2), C. Flanagan (1), C. Ryan (2)
(1) Wireless Access Research Centre, Department of Electronic and Computer Engineering
(2) Bio-Computing and Developmental Systems, Department of Computer Science and Information Systems
University of Limerick, Limerick, Ireland
EuroGP 2007: 10th European Conference on Genetic Programming
OUTLINE
1 MOTIVATION
Preamble
The Problem of Speech Quality Assessment
Voice Over IP
Research Goal
2 VOIP SIMULATION ENVIRONMENT
Simulation System
Network Traffic Characteristics
3 GP EXPERIMENTS
4 TEST RESULTS
5 CONCLUSIONS
PREAMBLE
VoIP – A paradigm shift
Bandwidth redundancy exploitation
QoS remains dominated by network/transport layer degradations
Quality assessment reflects the operating conditions of the network
SPEECH QUALITY ASSESSMENT METHODOLOGIES
Two approaches to speech quality assessment:
1 Subjective Assessment
2 Objective Assessment
SUBJECTIVE ASSESSMENT OF SPEECH QUALITY
Speech quality is estimated by humans.
Advantage – Reliable results.
Limitations
1 Expensive
2 Time Consuming
3 Laborious
4 Lack of Repeatability
Mean Opinion Score (MOS) is the measure of quality, on a scale from 1 (bad) to 5 (excellent).
OBJECTIVE ASSESSMENT OF SPEECH QUALITY
A fast, reliable, automated computer program is used to estimate human perception of speech quality.
Two approaches:
1 Intrusive Assessment
2 Non-Intrusive Assessment
OBJECTIVE ASSESSMENT OF SPEECH QUALITY
INTRUSIVE ASSESSMENT
The signal under test is compared against a corresponding
reference signal.
Advantages:
1 The most reliable artificial means of estimating speech
quality
2 Tests can be repeated easily
Limitations:
1 Consumes considerable computing resources.
2 Not suitable for continuous quality monitoring, because a reference signal is required.
OBJECTIVE ASSESSMENT OF SPEECH QUALITY
ITU-T P.862 (PESQ)
The PESQ algorithm is the current ITU-T Recommendation for intrusive speech quality estimation.
The speech signal is mapped from the time domain to a time-frequency representation using the psychophysical equivalents of frequency and intensity.
PESQ has shown a high correlation with various ITU-T benchmark tests.
For 30 ITU-T subjective tests, the Pearson correlation coefficient (R) was 0.935.
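The correlation figure quoted above is Pearson's R between objective and subjective scores. The following is a minimal sketch of how such a coefficient can be computed for any two score lists; the function name `pearson_r` and its arguments are illustrative and are not taken from the slides or from the ITU-T material.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

Here `x` would hold the objective estimates and `y` the corresponding subjective MOS values; a value close to 1 indicates that the two rise and fall together.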
OBJECTIVE ASSESSMENT OF SPEECH QUALITY
NON-INTRUSIVE ASSESSMENT
A challenging problem since a reference is not available.
Two approaches exist
1 Signal-based models
2 Parametric models
Signal-based models
Recent approaches emulate:
1 The human speech production model
2 The psychoacoustic processing of the human ear
ITU-T P.563 is the current Recommendation.
OBJECTIVE ASSESSMENT OF SPEECH QUALITY
PARAMETRIC MEASUREMENT OF VOIP QUALITY
Quality is estimated as a function of transport layer metrics and other measurable quantities.
Relevant metrics include:
Packet loss rate
Variable delay (jitter)
End-to-end delay
. . .
Aimed at real-time, continuous evaluation of quality
VOICE OVER IP – VOIP
Packet-based communication channel
Uses wire-line speech codecs
Linear Predictive Coding (LPC) is in vogue
Coded frames are packetized into RTP/UDP
The Internet is used for transport
The receiver performs the reverse process
RESEARCH GOAL
Derive a VoIP listening quality estimation model as a function of transport layer metrics.
Genetic Programming based symbolic regression is used.
The PESQ algorithm serves as the reference system.
VOIP SIMULATION ENVIRONMENT
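PACKET LOSS SIMULATION – THE GILBERT-ELLIOTT MODEL
\[ mlr = \frac{p}{p+q} \tag{1} \]
\[ mbl = \frac{1}{q} \tag{2} \]
\[ clp = 1 - q \tag{3} \]
\[ mbl = \frac{1}{1 - clp} \tag{4} \]
Where
p – probability of moving from the no-loss state to the loss state
q – probability of moving from the loss state to the no-loss state
mlr – mean loss rate
mbl – mean burst length
clp – conditional loss probability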
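Equations (1) and (2) can be inverted to obtain the two transition probabilities from a target loss rate and burst length, after which a loss pattern can be drawn packet by packet. The sketch below illustrates this with a two-state chain; the function names, the seeded generator and the choice to start in the no-loss state are assumptions made for illustration and do not describe the authors' simulator.

```python
import random


def gilbert_params(mlr, mbl):
    """Invert equations (1) and (2): return (p, q) for a target mlr and mbl.

    mlr is a fraction in [0, 1); mbl is a burst length in packets (>= 1).
    """
    if not 0.0 <= mlr < 1.0:
        raise ValueError("mlr must be in [0, 1)")
    if mbl < 1.0:
        raise ValueError("mbl must be at least 1 packet")
    q = 1.0 / mbl                   # from mbl = 1 / q
    p = mlr * q / (1.0 - mlr)       # from mlr = p / (p + q)
    if p > 1.0:
        raise ValueError("this (mlr, mbl) pair gives p > 1")
    return p, q


def simulate_losses(p, q, n_packets, seed=0):
    """Return a list of booleans, True where a packet is lost."""
    rng = random.Random(seed)
    lost = False                    # assumption: start in the no-loss state
    pattern = []
    for _ in range(n_packets):
        if lost:
            lost = rng.random() >= q    # remain in the loss state with probability 1 - q
        else:
            lost = rng.random() < p     # enter the loss state with probability p
        pattern.append(lost)
    return pattern
```

For a long enough pattern, the fraction of `True` entries should approach the requested mlr and the average run of consecutive losses should approach the requested mbl.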
VOIP SIMULATION ENVIRONMENT
NETWORK TRAFFIC PARAMETERS
No.  Parameter name                Abbreviation
1    Bit-rate (kbps)               br
2    Mean loss rate                mlr
3    Mean burst length             mbl
4    Packetization interval (ms)   PI
5    Frame duration (ms)           fd
NETWORK TRAFFIC SCENARIOS
No.  Parameter  Range
1    br         G.729 (8 kbps), G.723.1 (6.3 kbps), AMR (7.4 and 12.2 kbps)
2    mlr        [0, 2.5, 3.5, . . . 15, 20, 25, . . . 40]%
3    mbl        10, 50, 60, 70 and 80%
4    PI         10-60 ms
5    fd         10, 20, 30 ms
EXPERIMENTAL SETUP
GP toolbox: GPLab
Four GP experiments were performed with various configurations
Commonalities:
Each experiment comprised 50 runs
Each run spanned 50 generations
GP EXPERIMENTS
COMMON PARAMETERS
Parameter                       Value
Initial population size         300
Selection                       LPP tournament
Tournament size                 2
Genetic operators               Crossover and subtree mutation
Initial operator probabilities  0.5 initially, adaptive thereafter
Survival                        Half elitism
Function set                    +, -, *, /, sin, cos, log2, log10, loge, sqrt, power
Terminal set                    Random numbers [0.0 . . . 1.0], integers [2 . . . 10], mlrVAD, mblVAD, PI, br, fd
GP EXPERIMENTS
EXPERIMENTAL DETAILS
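Experiment 1:
Fitness function – Mean Squared Error (MSE)
Experiment 2:
Fitness function – MSE with linear scaling (MSEs)
\[ MSE_s(y, t) = \frac{1}{n} \sum_{i=1}^{n} \left( t_i - (a + b\,y_i) \right)^2 \tag{5} \]
\[ a = \bar{t} - b\,\bar{y}, \qquad b = \frac{\operatorname{cov}(t, y)}{\operatorname{var}(y)} \tag{6} \]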
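Equations (5) and (6) amount to fitting the best straight line from the evolved output y to the target t before measuring the error. The sketch below shows one way to compute this; the function names are illustrative, and the handling of a constant output (slope set to zero) is an assumption, since the slides do not say how that case was treated.

```python
def linear_scaling(y, t):
    """Return (a, b) from equation (6) for outputs y and targets t."""
    if len(y) != len(t) or not y:
        raise ValueError("y and t must be non-empty and of equal length")
    n = len(y)
    y_mean = sum(y) / n
    t_mean = sum(t) / n
    cov = sum((ti - t_mean) * (yi - y_mean) for yi, ti in zip(y, t)) / n
    var = sum((yi - y_mean) ** 2 for yi in y) / n
    # Assumption: a constant output gets slope 0, so only the intercept is fitted.
    b = cov / var if var > 0 else 0.0
    a = t_mean - b * y_mean
    return a, b


def scaled_mse(y, t):
    """Linearly scaled mean squared error, equation (5)."""
    a, b = linear_scaling(y, t)
    return sum((ti - (a + b * yi)) ** 2 for yi, ti in zip(y, t)) / len(y)
```

Because a and b are the least-squares slope and intercept, the scaled error is never larger than the plain MSE of the same output, which leaves the search free to concentrate on the shape of the function rather than its scale and offset.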
GP EXPERIMENTS
EXPERIMENTAL DETAILS
Experiments 3 and 4:
A selection criterion based on Gustafson et al. was used
Mating takes place between dissimilar individuals
Experiment 4:
The maximum tree depth was reduced from 17 to 7
The results were subjected to the Mann-Whitney-Wilcoxon test for significance analysis
Experiment 4 was found to be significantly better overall.
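A rank-based comparison of two sets of run results, such as the best fitness from each of the 50 runs of two experiments, can be sketched as follows. This is a simplified illustration using the normal approximation with no tie or continuity correction; the slides do not state which variant or software was used, and the function names are hypothetical.

```python
import math


def mann_whitney_u(xs, ys):
    """U statistic for xs: count of pairs with x > y, ties counted as 0.5."""
    u = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


def two_sided_p(xs, ys):
    """Two-sided p-value from the normal approximation (no corrections)."""
    n, m = len(xs), len(ys)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    u = mann_whitney_u(xs, ys)
    mean_u = n * m / 2.0
    sd_u = math.sqrt(n * m * (n + m + 1) / 12.0)
    z = (u - mean_u) / sd_u
    # Two-sided tail probability of a standard normal variable.
    return math.erfc(abs(z) / math.sqrt(2.0))
```

A small p-value indicates that one sample tends to hold larger values than the other; which experiment is better then follows from the direction of the difference in error.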
ON DATA COLLECTION
The Nortel ND speech database, containing high-quality speech signals from 2 male and 2 female speakers, was used for analysis.
A total of 3360 distorted speech files were created for each combination of network traffic parameters.
1177 (35%) were used for training
503 (15%) were used for testing
1680 (50%) were used for speaker-independent validation
VOIP QUALITY MONITORING MODELS
\[ MOS\text{-}LQO_{GP} = -2.46 \times \log\left( \cos(\log(br)) + mlr_{VAD} \times (br + fd/10) \right) + 3.17 \tag{7} \]
\[ MOS\text{-}LQO_{GP} = -2.99 \times \cos\left( 0.91 \times \sin(mlr_{VAD}) + mlr_{VAD} + 8 \right) + 4.20 \tag{8} \]
            Equation (7)        Equation (8)
Data        MSEs     σ          MSEs     σ
Training    0.0370   0.9634     0.0520   0.9481
Testing     0.0387   0.9646     0.0541   0.9501
Validation  0.0382   0.9688     0.0541   0.9531
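To give a feel for how cheap such a model is to evaluate, equation (7) can be transcribed directly into code. The sketch below rests on two assumptions the slides do not confirm: that both logarithms are base 10, and that mlrVAD is expressed as a fraction between 0 and 1 rather than as a percentage. The function name is hypothetical, and the explicit error for a non-positive logarithm argument is a choice made here; the slides do not say how the GP system treated that case.

```python
import math


def mos_lqo_eq7(br, mlr_vad, fd):
    """Transcription of equation (7).

    br      -- codec bit-rate in kbps
    mlr_vad -- mean loss rate (assumed here to be a fraction in [0, 1])
    fd      -- frame duration in ms
    """
    if br <= 0:
        raise ValueError("bit-rate must be positive")
    # Assumption: 'log' in the slide is taken as base 10 in both places.
    inner = math.cos(math.log10(br)) + mlr_vad * (br + fd / 10.0)
    if inner <= 0.0:
        raise ValueError("logarithm argument is not positive for these inputs")
    return -2.46 * math.log10(inner) + 3.17
```

The estimate needs only a handful of arithmetic operations per call, which is what makes a parametric model of this kind attractive for continuous monitoring.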
SCATTER PLOTS
ON PERFORMANCE OF ITU-T P.563
CONCLUSIONS
1 The models are a good approximation to PESQ.
2 They are suitable for real-time, non-intrusive estimation of speech quality, whereas PESQ is not.
3 The models are simple: equations (7) and (8) depend on three variables and one variable respectively.
4 They perform significantly better than ITU-T P.563.
FUTURE GOALS
To include wideband codecs in the research.
To develop a unified quality estimation model for narrowband and wideband telephony.