Genetic programming approach for non-intrusive VoIP speech quality estimation

AN EVOLUTIONARY APPROACH TO SPEECH
QUALITY ESTIMATION
USING GENETIC PROGRAMMING
A. Raja (1), A. Azad (2), C. Flanagan (1), C. Ryan (2)
(1) Wireless Access Research Centre
Department of Electronic and Computer Engineering
(2) Bio-Computing and Developmental Systems
Department of Computer Science and Information Systems
University of Limerick, Limerick, Ireland

OUTLINE
1 MOTIVATION
The Problem of Speech Quality Assessment
Research Goal
2 VOIP SIMULATION ENVIRONMENT
Simulation System
Network Traffic Characteristics
3 GP EXPERIMENTS
4 TEST RESULTS
5 CONCLUSIONS

SPEECH QUALITY ASSESSMENT METHODOLOGIES
Two approaches to speech quality assessment:
1 Subjective Assessment
2 Objective Assessment

SUBJECTIVE ASSESSMENT OF SPEECH QUALITY
Speech quality is estimated by humans.
Advantage – Reliable results.
Limitations
1 Expensive
2 Time Consuming
3 Laborious
4 Lack of Repeatability
Mean Opinion Score (MOS) is the measure of quality.
1 – bad
5 – Excellent
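
As a minimal sketch of how a MOS is obtained, assuming a five-point scale and a hypothetical list of listener ratings, the score is simply the arithmetic mean of the individual opinions. The labels for grades 2 to 4 are the usual wording of the five-point listening scale and are an assumption here, since the slide names only the two end points.

```python
from statistics import mean

# Five-point listening-quality scale. Labels for 2-4 are assumed;
# the slide only names 1 (bad) and 5 (excellent).
ACR_LABELS = {1: "bad", 2: "poor", 3: "fair", 4: "good", 5: "excellent"}


def mean_opinion_score(ratings):
    """Return the MOS for one test condition from ratings in 1..5."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r not in ACR_LABELS for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    return mean(ratings)


# Hypothetical ratings from eight listeners for a single speech sample.
example_ratings = [4, 3, 4, 5, 3, 4, 4, 3]
mos = mean_opinion_score(example_ratings)
print(f"MOS = {mos:.2f}")
```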

OBJECTIVE ASSESSMENT OF SPEECH QUALITY
A fast, automated computer program is used to estimate the
human perception of speech quality.
Two approaches:
1 Intrusive Assessment
2 Non-Intrusive Assessment

INTRUSIVE ASSESSMENT
The signal under test is compared against a corresponding
reference signal.
Advantages:
1 The most reliable artificial means of estimating speech
quality.
2 Tests can be repeated easily.
Limitations:
1 Consumes considerable computing resources.
2 Not suitable for continuous monitoring of quality, because a
reference signal is required.

ITU-T P.862 (PESQ)
The PESQ algorithm is the current ITU-T Recommendation for
intrusive speech quality estimation.
The speech signal is mapped from time domain to
time-frequency representation using the psychophysical
equivalents of frequency and intensity.

ITU-T P.862 (PESQ)
It has shown a high correlation with various ITU-T
benchmark tests.
For 30 ITU-T subjective tests, the Pearson correlation
coefficient (R) was 0.935.
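
For reference, the Pearson correlation coefficient between subjective scores and objective estimates can be computed as in the sketch below. The two score lists are hypothetical values chosen for illustration only; they are not taken from the ITU-T benchmark tests.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical subjective MOS values and objective estimates per condition.
subjective = [1.8, 2.5, 3.1, 3.6, 4.2]
objective = [2.0, 2.4, 3.3, 3.5, 4.1]
r = pearson_r(subjective, objective)
print(f"R = {r:.3f}")
```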

NON-INTRUSIVE ASSESSMENT
A challenging problem since a reference is not available.
Two approaches exist
1 Signal-based models
2 Parametric models
Signal-based models
Recent approaches are based on emulating:
1 The human speech production model
2 The psychoacoustic processing of the human ear
ITU-T P.563 is the current Recommendation.

PARAMETRIC MEASUREMENT OF VOIP QUALITY
Quality is modelled as a function of transport layer metrics and
other measurable quantities.
Relevant metrics may be:
Packet Loss Rate
Variable delay – jitter
End-to-end delay
. . .
Aimed at real-time and continuous evaluation of quality.
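
To illustrate how such metrics can be measured at the receiver, the sketch below derives the packet loss rate and the mean burst length from a per-packet loss trace. The trace format (one boolean per expected packet) and the definition of burst length as the mean run length of consecutive losses, counted in packets, are assumptions made for this example.

```python
from itertools import groupby


def loss_metrics(lost):
    """Return (loss_rate, mean_burst_length) for a loss trace.

    lost: sequence of booleans, True where the packet was lost.
    Burst length is counted in packets (assumed definition).
    """
    if not lost:
        raise ValueError("the trace must contain at least one packet")
    n_lost = sum(1 for flag in lost if flag)
    # Length of each run of consecutive lost packets.
    bursts = [sum(1 for _ in run) for is_lost, run in groupby(lost) if is_lost]
    loss_rate = n_lost / len(lost)
    mean_burst = n_lost / len(bursts) if bursts else 0.0
    return loss_rate, mean_burst


# Hypothetical trace of ten packets with one burst of 2 and one of 1.
trace = [False, False, True, True, False, False, False, True, False, False]
rate, burst = loss_metrics(trace)
print(f"loss rate = {rate:.1%}, mean burst length = {burst:.2f} packets")
```

Both quantities need only sequence numbers observed at the receiver, which is what makes a parametric model usable for continuous monitoring.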

RESEARCH GOAL
Derive a VoIP listening quality estimation model as a
function of transport layer metrics.
Genetic programming based symbolic regression is used.
The PESQ algorithm serves as the reference system.

VOIP SIMULATION ENVIRONMENT

NETWORK TRAFFIC PARAMETERS
No.  Parameter name               Abbreviation
1    Bit-rate (kbps)              br
2    Mean loss rate               mlr
3    Mean burst length            mbl
4    Packetization interval (ms)  PI
5    Frame duration (ms)          fd

NETWORK TRAFFIC SCENARIOS
No.  Parameter  Range
1    br         G.729 (8 kbps), G.723.1 (6.3 kbps), AMR 7.4 and 12.2 kbps
2    mlr        [0, 2.5, 3.5, ..., 15, 20, 25, ..., 40] %
3    mbl        10, 50, 60, 70 and 80 %
4    PI         10-60 ms
5    fd         10, 20, 30 ms

EXPERIMENTAL SETUP
GPLab
Four GP experiments were performed with various
configurations.
Common to all experiments:
Each experiment comprised 50 runs.
Each run spanned 50 generations.

GP EXPERIMENTS
COMMON PARAMETERS
Parameter                       Value
Initial population size         300
Selection                       LPP tournament
Tournament size                 2
Genetic operators               Crossover and subtree mutation
Initial operator probabilities  0.5 initial, adaptive onwards
Survival                        Half elitism
Function set                    +, -, *, /, sin, cos, log2, log10, loge, sqrt, power
Terminal set                    Random numbers [0.0 ... 1.0], integers [2 ... 10],
                                mlrVAD, mblVAD, PI, br, fd

GP EXPERIMENTS
EXPERIMENTAL DETAILS
Experiment 1:
Fitness function: mean squared error (MSE)
Experiment 2:
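Fitness function: linearly scaled MSE (MSE_s)
\[ \mathrm{MSE}_s(y, t) = \frac{1}{n} \sum_{i=1}^{n} \bigl(t_i - (a + b\,y_i)\bigr)^2 \tag{1} \]
\[ a = \bar{t} - b\,\bar{y}, \qquad b = \frac{\operatorname{cov}(t, y)}{\operatorname{var}(y)} \tag{2} \]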
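
The linearly scaled fitness can be sketched in a few lines of Python. Here y is the raw output of an evolved expression and t is the target (the PESQ score in this work). Returning b = 0 when y is constant is an assumption made to avoid division by zero; the slides do not say how that case is handled.

```python
def linear_scaling(y, t):
    """Return the intercept a and slope b that best map y onto t."""
    n = len(y)
    y_mean = sum(y) / n
    t_mean = sum(t) / n
    var_y = sum((yi - y_mean) ** 2 for yi in y) / n
    cov_ty = sum((ti - t_mean) * (yi - y_mean) for yi, ti in zip(y, t)) / n
    # Assumed convention: a constant output gets slope 0.
    b = cov_ty / var_y if var_y > 0 else 0.0
    a = t_mean - b * y_mean
    return a, b


def scaled_mse(y, t):
    """Mean squared error after optimal linear scaling of y."""
    a, b = linear_scaling(y, t)
    return sum((ti - (a + b * yi)) ** 2 for yi, ti in zip(y, t)) / len(y)


# Hypothetical outputs that match the target up to slope and intercept.
outputs = [1.0, 2.0, 3.0, 4.0]
targets = [3.0, 5.0, 7.0, 9.0]
print(linear_scaling(outputs, targets), scaled_mse(outputs, targets))
```

Because a and b are computed in closed form, an expression that has the right shape but the wrong scale or offset is not penalised, so the search can concentrate on the shape of the function.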

GP EXPERIMENTS
EXPERIMENTAL DETAILS
Experiments 3 and 4
Selection criterion based on Gustafson et al. was used
Mating takes place between dissimilar individuals
Experiment 4:
The maximum tree depth was reduced from 17 to 7.
The results were subjected to the Mann-Whitney-Wilcoxon test
for significance analysis.
Experiment 4 was found to be significantly better overall.
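
A minimal sketch of the Mann-Whitney-Wilcoxon comparison between two sets of runs is given below. It uses average ranks for ties and a normal approximation without tie correction for the two-sided p-value, which is a simplification. The fitness values are hypothetical and serve only to show the call.

```python
from math import erfc, sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Return (U_a, U_b, p) with a two-sided normal-approximation p-value."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (min(u1, u2) - mu) / sigma
    return u1, u2, erfc(abs(z) / sqrt(2))


# Hypothetical final-generation errors from two experiments.
runs_a = [0.041, 0.039, 0.044, 0.040, 0.043, 0.042]
runs_b = [0.052, 0.049, 0.055, 0.051, 0.050, 0.054]
print(mann_whitney_u(runs_a, runs_b))
```

A rank-based test of this kind makes no normality assumption about the distribution of final fitness values across runs.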

ON DATA COLLECTION
The Nortel ND speech database, containing high quality signals
with speech from 2 male and 2 female speakers, was used
for analysis.
A total of 3360 distorted speech files were created across the
combinations of network traffic parameters.
1177 files (35%) were used for training
503 files (15%) were used for testing
1680 files (50%) were used for speaker independent validation

VOIP QUALITY MONITORING MODELS
\[ \text{MOS-LQO}_{GP} = -2.46 \times \log\bigl(\cos(\log(br)) + mlr_{VAD} \times (br + fd/10)\bigr) + 3.17 \tag{3} \]
\[ \text{MOS-LQO}_{GP} = -2.99 \times \cos\bigl(0.91 \times \sin(mlr_{VAD}) + mlr_{VAD} + 8\bigr) + 4.20 \tag{4} \]

            Equation (3)        Equation (4)
Data        MSE_s    σ          MSE_s    σ
Training    0.0370   0.9634     0.0520   0.9481
Testing     0.0387   0.9646     0.0541   0.9501
Validation  0.0382   0.9688     0.0541   0.9531
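
The two models can be transcribed directly into code, as in the sketch below. Several details are assumptions, because the slide does not state them: the base of "log" (the function set contains log2, log10 and loge, so the base is left as a parameter and base 10 is only a placeholder default) and the units of the inputs (br in kbps, fd in ms, and mlrVAD on the same scale as the mlr values in the scenario table). The function names are hypothetical.

```python
import math


def mos_lqo_gp_eq3(br, mlr_vad, fd, log=math.log10):
    """Transcription of equation (3).

    The log base is not given on the slide; log10 is an assumed default.
    Raises ValueError if the argument of the outer log is not positive.
    """
    inner = math.cos(log(br)) + mlr_vad * (br + fd / 10)
    return -2.46 * log(inner) + 3.17


def mos_lqo_gp_eq4(mlr_vad):
    """Transcription of equation (4), which depends on mlrVAD only."""
    return -2.99 * math.cos(0.91 * math.sin(mlr_vad) + mlr_vad + 8) + 4.20


# Hypothetical inputs: 8 kbps codec, no loss, 10 ms frames.
print(mos_lqo_gp_eq3(br=8.0, mlr_vad=0.0, fd=10.0))
print(mos_lqo_gp_eq4(mlr_vad=0.0))
```

Either model needs only a handful of arithmetic operations per estimate, in contrast to an intrusive method that must process both a reference and a degraded signal.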

SCATTER PLOTS

SCATTER PLOTS
ON PERFORMANCE OF ITU-T P.563

CONCLUSIONS
1 The model is a good approximation to PESQ.
2 It is suitable for real-time, non-intrusive estimation of
speech quality, whereas PESQ is not.
3 The models are simple; equations (3) and (4) depend on 3 variables
and 1 variable respectively.
4 It performs significantly better than ITU-T P.563.

FUTURE GOALS
To include wide band codecs in the research.
To develop a unified quality estimation model for narrow
and wide band telephony.

QUESTIONS
