Real-Time Non-Intrusive Speech Quality Estimation for VoIP

Adil Raja

Wireless Access Research Group, University of Limerick
Outline
• Research Milestones
• Theoretical Aspects
• Evaluation Platforms
• Conclusion

Research Milestones
• Problem Statement.
• Objectives.
• Related Work.
• Current Status of the Project.
• Future Work.

Problem Statement
• Lack of real-time, non-intrusive speech quality estimation at mid-network points.

Objectives
• To develop a real-time, non-intrusive speech quality estimation model for VoIP networks.
• Particular emphasis is on the effectiveness of the model at "mid-network" points.
• The model should assess the overall speech quality by evaluating:
  ▪ Transport layer metrics.
  ▪ Speech layer metrics.
• Effective implementation of a perceptual model is crucial.

Related Work
• Standards – E-Model, P.563 {ITU-T}.
• Industry.
  ▪ PsyVoIP for gateways {Psytechnics}.
  ▪ VQmon/EP {Telchemy}.
  ▪ 3SQM {OPTICOM}.
  ▪ PSM {Psytechnics}.
• Theoretical Research.
  ▪ Transport layer assessments.
  ▪ Perceptual Models.
  ▪ Cognitive Models.

Current Status of the Project
• Transport layer metrics can be captured using RTP packets and RTCP reports.
• The metrics include:
  ▪ Packet loss – from RTP packets.
  ▪ Jitter – from RTP packets.
  ▪ Round-trip delay – from RTCP-SR/RR reports.
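
As an illustration of how packet loss and jitter can be derived from RTP headers alone, here is a minimal Python sketch that follows the receiver-side bookkeeping described in RFC 3550. The class name `RtpStreamStats` and its fields are hypothetical, the arrival time is assumed to be expressed in the same clock units as the RTP timestamp (160 units per 20 ms frame for G.711 at 8 kHz), and duplicate packets and timestamp wraparound are ignored. It is not the MicroEngine code used in the project.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RtpStreamStats:
    """Running receiver-side statistics for a single RTP stream (illustrative)."""
    base_seq: Optional[int] = None      # first sequence number observed
    max_seq: int = 0                    # highest extended sequence number observed
    received: int = 0                   # number of packets observed
    jitter: float = 0.0                 # smoothed interarrival jitter, timestamp units
    prev_transit: Optional[float] = None

    def update(self, seq: int, rtp_ts: float, arrival: float) -> None:
        """Account for one packet; rtp_ts and arrival share the RTP clock units."""
        if self.base_seq is None:
            self.base_seq = self.max_seq = seq
        else:
            # Signed 16-bit distance from the current maximum handles wraparound.
            delta = ((seq - self.max_seq + 0x8000) & 0xFFFF) - 0x8000
            if delta > 0:
                self.max_seq += delta
        self.received += 1
        transit = arrival - rtp_ts
        if self.prev_transit is not None:
            d = abs(transit - self.prev_transit)
            # RFC 3550 smoothing: J = J + (|D| - J) / 16
            self.jitter += (d - self.jitter) / 16.0
        self.prev_transit = transit

    def loss_fraction(self) -> float:
        """Fraction of expected packets that were never observed."""
        if self.base_seq is None:
            return 0.0
        expected = self.max_seq - self.base_seq + 1
        return max(expected - self.received, 0) / expected
```

Calling `update()` once per observed packet keeps both metrics current, so a monitoring point only needs the sequence number, the RTP timestamp and a local arrival clock.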

Current Status of the Project
• A perceptual model based on Perceptual Linear Prediction (and MFCC) has been ported to the IXP2400 XScale processor.
• SOM_PAK has been ported to the IXP2400 XScale processor.
• MicroEngine code for buffering packets in SRAM has been written.
• The overall model design is based on a single VoIP call.

Future Work
• Integration of the speech layer model with the transport layer model.
• Testing under various packet delay and loss scenarios.
• Evaluation of the model for low bit-rate codecs.
• Scalability testing for multiple VoIP calls.

Theoretical Aspects
• Packet Loss and Jitter Evaluation.
• Effect of Packet Loss Distribution.
• Unordered and Missing Packets.
• Computational Lag.
• Methodology.
• Perceptual Evaluation of Low Bit-Rate Vocoders.
• Self-Organizing Maps.
• Hidden Markov Models.

Packet Loss and Jitter Evaluation
• Performance on mid-network points.
[Figure: ENDPOINT-A and ENDPOINT-B with an IXP2400 NPU between them. RTP packets are used to compute the values of jitter and packet loss; RTCP-SR and RTCP-RR packets are used to compute round-trip delay.]

[Figure: two computers connected through a network of routers.]

• Reasons
  ▪ Routing table updates.
  ▪ Traffic engineering.

• Capture packet loss and jitter from RTCP-SR/RR packets.
• Other advantages.
  ▪ RTCP-SR/RR packets report the fraction of packets lost over a certain interval of time.
  ▪ This provides the mean loss rate for a call in the current time frame as opposed to the overall loss rate.
  ▪ Some computation is offloaded from the IXP2400.
  ▪ End-to-end transport layer metrics as opposed to end-to-mid-network-point metrics.
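
The fields carried in an RTCP receiver report can be decoded with very little arithmetic. The following Python sketch assumes the report block has already been parsed into integers; the function names are hypothetical. It uses the RFC 3550 definitions: the fraction lost is an 8-bit fixed-point number, and the round-trip time is the report arrival time minus LSR (last SR timestamp) minus DLSR (delay since last SR), all in units of 1/65536 second. RFC 3550 defines this calculation for the endpoint that sent the SR, so how a mid-network observer obtains the arrival time is left open here.

```python
def fraction_lost(byte_value: int) -> float:
    """Convert the 8-bit 'fraction lost' field of a report block to a ratio."""
    return (byte_value & 0xFF) / 256.0


def round_trip_seconds(arrival_ntp_mid32: int, lsr: int, dlsr: int) -> float:
    """Round-trip time from RTCP RR fields, as defined in RFC 3550.

    arrival_ntp_mid32: middle 32 bits of the NTP time at which the RR arrived
    lsr:               'last SR timestamp' field of the report block
    dlsr:              'delay since last SR' field of the report block
    All three values are in units of 1/65536 second.
    """
    rtt_units = (arrival_ntp_mid32 - lsr - dlsr) & 0xFFFFFFFF  # 32-bit wraparound
    return rtt_units / 65536.0


# Worked example from RFC 3550: A = 0xb710:8000, LSR = 0xb705:2000, DLSR = 0x0005:4000
print(round_trip_seconds(0xB7108000, 0xB7052000, 0x00054000))  # 6.125
```

Because the fraction lost covers only the interval since the previous report, successive reports give the per-interval loss rate mentioned above rather than a call-wide average.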

Effect of Packet Loss Distribution
• Most models assess the impact of packet loss on speech quality in terms of mean loss rate.
• Packet loss is bursty in nature.
• Packet loss location has a variable effect on the quality of speech {H. Schulzrinne}.
• The impact of packet loss distribution should be used as a QoS metric.
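
To make the difference between mean loss rate and loss distribution concrete, the Python sketch below summarises a loss trace by its burst lengths and by the two transition probabilities of a two-state (Gilbert-style) loss model. The function name and the dictionary keys are hypothetical, and the two-state model is only one common way of describing bursty loss; the slides do not say which distribution measure the project uses.

```python
from typing import Dict, List, Sequence


def loss_burst_stats(trace: Sequence[bool]) -> Dict[str, float]:
    """Summarise a loss trace, where True means the packet was lost."""
    bursts: List[int] = []
    run = 0
    for lost in trace:
        if lost:
            run += 1
        elif run:
            bursts.append(run)
            run = 0
    if run:
        bursts.append(run)

    # Count state transitions between consecutive packets.
    from_good = from_bad = good_to_bad = bad_to_good = 0
    for prev, cur in zip(trace, trace[1:]):
        if prev:
            from_bad += 1
            bad_to_good += not cur
        else:
            from_good += 1
            good_to_bad += cur

    return {
        "mean_loss_rate": sum(bursts) / len(trace) if trace else 0.0,
        "mean_burst_length": sum(bursts) / len(bursts) if bursts else 0.0,
        "p_good_to_bad": good_to_bad / from_good if from_good else 0.0,
        "q_bad_to_good": bad_to_good / from_bad if from_bad else 0.0,
    }


isolated = [bool(x) for x in (0, 1, 0, 0, 1, 0, 0, 1, 0, 0)]
bursty = [bool(x) for x in (0, 0, 1, 1, 1, 0, 0, 0, 0, 0)]
print(loss_burst_stats(isolated)["mean_burst_length"])  # 1.0
print(loss_burst_stats(bursty)["mean_burst_length"])    # 3.0
```

Both example traces lose 3 packets out of 10, so a model driven only by mean loss rate cannot tell them apart, while the burst statistics can.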

Unordered and Missing Packets
• Packets arrive out of order.
• Some packets are lost and some take alternative paths.
• These factors can have adverse effects when the acoustic back-end is an HMM (for instance).
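
A sequence-sensitive back-end needs its frames in order, with gaps marked explicitly. The Python sketch below shows one simple way to do this for a fixed buffering window; `reorder_window` is a hypothetical helper, and sequence-number wraparound is deliberately ignored to keep the example short.

```python
from typing import List, Optional, Sequence, Tuple


def reorder_window(
    packets: Sequence[Tuple[int, bytes]], first_seq: int, window: int
) -> Tuple[List[Optional[bytes]], List[int]]:
    """Place packets in sequence order and report the sequence numbers missing.

    packets:   (sequence_number, payload) pairs in arrival order
    first_seq: sequence number expected in the first slot of the window
    window:    number of consecutive sequence numbers the window covers
    """
    slots: List[Optional[bytes]] = [None] * window
    for seq, payload in packets:
        index = seq - first_seq
        # Ignore packets outside the window and duplicates of a filled slot.
        if 0 <= index < window and slots[index] is None:
            slots[index] = payload
    missing = [first_seq + i for i, slot in enumerate(slots) if slot is None]
    return slots, missing


frames, gaps = reorder_window([(102, b"c"), (100, b"a"), (103, b"d")], 100, 4)
print(frames)  # [b'a', None, b'c', b'd']
print(gaps)    # [101]
```

The `None` entries let a later stage decide whether to conceal, skip or penalise the missing frame instead of silently concatenating non-adjacent speech.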

Computational Lag
[Figure: timeline from T0 to TN comparing Speech Layer Processing with Transport Layer Processing.]
• The perceptual model reports the results of past samples.
• The computational lag between the speech layer model and the perceptual model increases as time progresses.
• Some samples have to be skipped to overcome this lag.
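
The growth of the lag, and the effect of skipping, can be seen in a small simulation. In the Python sketch below, frames arrive at a fixed interval and each takes longer to process than the interval; a frame is skipped when it would have to wait longer than a chosen threshold. The function name, the timings and the skip policy are assumptions made for illustration, not measurements from the IXP2400.

```python
from typing import List, Tuple


def simulate_lag(
    n_frames: int, frame_ms: float, proc_ms: float, max_lag_ms: float
) -> Tuple[List[int], float]:
    """Return (indices of skipped frames, worst lag seen by a processed frame)."""
    busy_until = 0.0          # time at which the processor becomes free
    skipped: List[int] = []
    worst_lag = 0.0
    for i in range(n_frames):
        arrival = i * frame_ms
        start = max(arrival, busy_until)
        lag = start - arrival
        if lag > max_lag_ms:
            skipped.append(i)  # drop this frame so the backlog can drain
            continue
        busy_until = start + proc_ms
        worst_lag = max(worst_lag, lag)
    return skipped, worst_lag


# 20 ms frames that each take 25 ms to process.
print(simulate_lag(20, 20, 25, float("inf")))  # ([], 95.0)
print(simulate_lag(20, 20, 25, 20))            # ([5, 10, 15], 20.0)
```

Without skipping, the lag grows by 5 ms per frame and never recovers; with a 20 ms threshold, every fifth frame is dropped and the lag stays bounded.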

Methodology
• Transport Layer Model
  ▪ Jitter, Loss, Delay.
• Speech Layer Model
  ▪ Perceptual Model.
  ▪ Perceptual Linear Prediction.
  ▪ Mel Frequency Cepstral Coefficients.
  ▪ Bark Spectral Distortion.
  ▪ Codebook of Clean Speech Feature Vectors.
  ▪ Self-Organizing Maps – Vector Quantization.
  ▪ Hidden Markov Models – Probabilistic.
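
Of the speech layer measures listed, Bark Spectral Distortion is the simplest to write down. The Python sketch below uses one common formulation: the squared difference between reference and test Bark loudness spectra, summed over all frames and bands and normalised by the energy of the reference. It assumes the Bark loudness vectors have already been computed by a perceptual front end; the function name is hypothetical and the slides do not state which variant the project uses.

```python
from typing import Sequence


def bark_spectral_distortion(
    reference: Sequence[Sequence[float]], test: Sequence[Sequence[float]]
) -> float:
    """Normalised squared distance between two sets of Bark loudness frames.

    reference, test: one loudness vector per frame, same shape for both inputs.
    """
    if len(reference) != len(test):
        raise ValueError("reference and test must have the same number of frames")
    distortion = 0.0
    energy = 0.0
    for ref_frame, test_frame in zip(reference, test):
        if len(ref_frame) != len(test_frame):
            raise ValueError("frames must have the same number of Bark bands")
        for ref_value, test_value in zip(ref_frame, test_frame):
            distortion += (ref_value - test_value) ** 2
            energy += ref_value ** 2
    return distortion / energy if energy else 0.0


clean = [[1.0, 2.0], [3.0, 4.0]]
degraded = [[1.0, 2.0], [3.0, 2.0]]
print(bark_spectral_distortion(clean, clean))     # 0.0
print(bark_spectral_distortion(clean, degraded))  # 4 / 30
```

In a non-intrusive setting no clean reference signal is available, which is where the codebook of clean speech feature vectors comes in: the nearest codebook entry stands in for the reference.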

[Figure: IXP2400 block diagram. Components shown: Intel XScale core; packet receive/transmit MEs; packet processing MEs; SHaC (scratchpad memory, hash unit, CAP); PCI controller (PCI, 64 bit, 33/66 MHz) to an optional host CPU and PCI bus devices; Media Switch Fabric (MSF) interface (SPI-4, CSIX) to external media device(s); SRAM controllers 0 and 1 (QDR) to SRAM; DRAM controller 0 (DDR) to DRAM.]
• Packet receive/transmit MEs: these MEs receive the packets from the MSF interface and forward them to the DRAM controller on reception, and do the opposite for transmission.
• Packet processing MEs: parse various header fields of VoIP packets, calculate packet-based QoS metrics and place the results in SRAM. They buffer the speech frames in SRAM at addresses known to the perceptual model.
• Intel XScale core: the perceptual model calculates distortions due to encoding and bit errors and places the result in SRAM. A further module calculates the objective score from all the values accumulated in SRAM: ObjectiveScore = S(s,c,e).

• At any given time, a number of (contiguous) packets are buffered as input to the perceptual model.
• Statistical Analysis.

• Optimum number of packets to be buffered?
• Optimum buffering interval?
• The overall speech quality is a function of both auditory distance and transport layer distortions.
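
The E-Model listed under Related Work gives one standard way of turning separate impairments into a single score. The Python sketch below is not the combining function used in this project; it only shows a commonly used simplification of the E-Model rating, R = 93.2 - Id - Ie, together with the ITU-T G.107 mapping from R to an estimated MOS. The delay impairment Id and equipment impairment Ie are assumed to be supplied by the caller, and the function names are hypothetical.

```python
def simplified_r(delay_impairment: float, equipment_impairment: float,
                 base_r: float = 93.2) -> float:
    """Simplified E-Model rating with all other parameters at default values."""
    return base_r - delay_impairment - equipment_impairment


def r_to_mos(r: float) -> float:
    """ITU-T G.107 mapping from the rating factor R to an estimated MOS."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


print(round(r_to_mos(simplified_r(0.0, 0.0)), 2))   # 4.41
print(round(r_to_mos(simplified_r(0.0, 20.0)), 2))  # 3.74
```

A model that also uses auditory distance would need its own mapping for the speech layer term, fitted against subjective scores during training.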

• Assessment of the model for a single VoIP call scenario.
• G.711 is the preferred codec.
• Simulate packet loss rate, packet loss distribution, delay and jitter (fine tuning).
• Analysis of low bit-rate codecs.
• Scale the model for multiple VoIP calls.
• The IXP2400 NPU is the target hardware platform.

Perceptual Evaluation of Low Bit-Rate Vocoders
• Real-time speech quality estimation for low bit-rate codecs (G.729, G.723.1) without decoding the frames.
• {Carmen Peláez-Moreno, Ascensión Gallardo-Antolín, and Fernando} perform speech recognition by extracting only the LP coefficients.

SOM

SOM Training

SOM
• What is the average quantization error (QE)?
• Auditory Distance = Distortion + QE.
• How to deal with QE?
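
The quantization error question can be made concrete with a few lines of Python. The sketch below treats a trained map as a plain list of codebook vectors, finds the best-matching unit (BMU) for an input vector, and averages the BMU distance over a set of clean vectors to obtain the average QE. The function names are hypothetical and the Euclidean distance is an assumption; SOM_PAK itself is not called here.

```python
import math
from typing import List, Sequence, Tuple


def best_matching_unit(
    codebook: Sequence[Sequence[float]], vector: Sequence[float]
) -> Tuple[int, float]:
    """Return (index, distance) of the codebook vector closest to the input."""
    best_index, best_distance = -1, math.inf
    for index, unit in enumerate(codebook):
        distance = math.dist(unit, vector)
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index, best_distance


def average_quantization_error(
    codebook: Sequence[Sequence[float]], data: Sequence[Sequence[float]]
) -> float:
    """Mean distance between each data vector and its best-matching unit."""
    if not data:
        return 0.0
    return sum(best_matching_unit(codebook, v)[1] for v in data) / len(data)


codebook: List[List[float]] = [[0.0, 0.0], [10.0, 0.0]]
clean_vectors = [[1.0, 0.0], [9.0, 0.0], [0.0, 2.0]]
print(best_matching_unit(codebook, [9.0, 0.0]))             # (1, 1.0)
print(average_quantization_error(codebook, clean_vectors))  # 1.333...
```

The example shows why QE matters: even clean vectors sit at a non-zero distance from the codebook, so the distance measured for a distorted vector mixes genuine distortion with quantization error.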

SOM – Quantization Error
• SOM discretizes data.

SOM – Data Distribution
{Timo Kostiainen}

Growing Hierarchical SOM
[Figure: GHSOM hierarchy showing Layer 0, Layer 1 and Layer 2.]

GHSOM - Advantages
• A desired level of granularity in discriminating input data is achievable.
• Horizontal Expansion.
• Vertical Expansion.
• As the SOM is hierarchical, the search time is reduced.
• What if a distorted signal of class A has a lower AD with class B?
$mqe_i < \tau_2 \cdot mqe_0$
$MQE_m < \tau_1 \cdot mqe_0$

Hidden Markov Models
• Auditory scores based on a logically connected sequence of feature vectors.
• λ = (A, B, π)
• A – transition probability matrix from one phonemic class to the next.
• B – emission probability of a phonemic vector.
• π – initial state probability.
• Parameters are learnt during training.
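
Given a parameter set λ = (A, B, π), the likelihood of an observed sequence can be computed with the forward algorithm. The Python sketch below is a simplified, discrete-observation version in which each feature vector is assumed to have been quantised to a symbol index first (for example by a codebook); a continuous HMM would replace the lookup in B with a probability density. The function name and the toy parameters are illustrative only.

```python
from typing import List, Sequence


def forward_likelihood(
    A: Sequence[Sequence[float]],
    B: Sequence[Sequence[float]],
    pi: Sequence[float],
    observations: Sequence[int],
) -> float:
    """P(observations | lambda) for a discrete HMM, via the forward algorithm.

    A[r][s]: probability of moving from state r to state s
    B[s][o]: probability of emitting symbol o while in state s
    pi[s]:   probability of starting in state s
    """
    if not observations:
        return 1.0
    n_states = len(pi)
    # Initialisation: start in each state and emit the first symbol.
    alpha: List[float] = [pi[s] * B[s][observations[0]] for s in range(n_states)]
    # Induction: propagate through the transition matrix, then emit.
    for symbol in observations[1:]:
        alpha = [
            sum(alpha[r] * A[r][s] for r in range(n_states)) * B[s][symbol]
            for s in range(n_states)
        ]
    return sum(alpha)


A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.6, 0.4]
print(forward_likelihood(A, B, pi, [0, 1]))  # approximately 0.209
```

A low likelihood under a model trained on clean speech indicates a sequence the model finds unlikely, which is the sense in which an HMM can yield an auditory distance. For long sequences the probabilities should be scaled or kept in the log domain to avoid underflow.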

Hidden Markov Models
• A suitably trained HMM can be used to find auditory distance.
• Continuous HMM.
• Reliable Results.

Evaluation Platforms
• Cell Broadband Engine Processor Architecture.
• Programming the Cell.
• Some Concerns.

Cell Broadband Processor

Cell BE Processor …
Sr No  Feature                                    Qty
1      Power Processing Element (PPE)             1
2      Synergistic Processing Elements (SPEs)     8
3      Element Interconnect Bus (EIB)             1
4      Direct Memory Access Controller (DMAC)     1
5      Rambus XDR memory controllers              2
6      Rambus FlexIO interface
7      PCI Express x 4
8      256 GFLOPS (single precision at 4 GHz)
9      25 GFLOPS (double precision at 4 GHz)

Mercury Computer Systems
Cell Technology Evaluation System (CTES)

Programming the Cell
• The primary language is C; C++ is also supported to some extent.
• Programming Models
  ▪ Job Queue – the PPE schedules the jobs for the SPEs.
  ▪ Self-multitasking of SPEs – kernel and scheduling are distributed across the SPEs.
  ▪ Stream Processing – the SPEs use shared memory for all tasks.
• Development Platforms
  ▪ Cell BE SDK (alpha version) {IBM} – full system simulator.
  ▪ Yellow Dog Linux {Mercury Computer Systems}.

Some Concerns
• Software design is key to effective performance of the Cell.
• Multithreaded execution – key to effective execution.
• The code has to be vectorisable and parallelisable.
• To port code to the SPEs, it has to be partitioned from the rest of the code so that it is fully self-contained.
• Hardware abstraction.
• Learning curve effect.
  ▪ Support from IBM.
  ▪ Credibility of SDK/APIs.
  ▪ Comments from Peter Seebach.
• Cost.

Alternatives
• XScale.
• Offloading of compute-intensive tasks to another processor using the gigabit port.
• PCI with Pentium 4.
• PCI with a suitable graphics card.

Gigabit Alternative …
[Figure: two IP networks, an IXP2400 NPU and a workstation.]

Conclusions
• Preliminary work on the model is complete.
• Packet loss distribution.
• Evaluation of low bit-rate codecs.
• Evaluation platform.
• Overall Research Goals
  ▪ Real-time non-intrusive VoIP quality assessment model.
  ▪ Perceptual Distortion Measures.
  ▪ Model Training.

Thank you for Your Time
