seminar4

Multi-dimensional Approach to VoX
Voice Quality Measurement
Renshou Dai, Ph.D
Sage Instruments
240 Airport Blvd.
Freedom, CA 95019
December, 2003
Atlanta, Georgia

Purposes of this seminar:
• Understand what impairments affect voice quality
• Learn techniques to measure those impairments
• Apply your measurements to:
1. improve Quality-of-Service (QoS)
2. arbitrate Service-Level-Agreement (SLA)
3. optimize network configuration
4. troubleshoot network components

The converged voice network
• The network backbone is migrating to packet-switching,
bringing with it new problems.
• Despite convergence, voice services are getting more
chaotic, further complicating the QoS issue.
• The ultimate QoS metric for a converged network
should be measured from the end audio terminals.
[Diagram: telephones, PBX, analog loop, ISDN/T1, cell phone, base station, MSC, IAD, DSLAM, ATM switch, xDSL/cable/MMDS/LMDS access, Ethernet, IP phone, VoIP server, media gateway, softswitch, SS7 and the PSTN switch, all attached to the packet network cloud]
Figure 1: One side of the simplified converged voice network

Why VoP? Pros of VoP:
• Packet data network uses bandwidth more efficiently
than circuit-switched PSTN.
• Data traffic has surpassed voice traffic on the landline
network. The same will happen on the wireless network.
• In the “old” days, data was carried over the PSTN. But
the PSTN is not suitable for multi-media data streams.
• Now the trend is reversing. Voice is being carried
over the packet-switched data network.
• VoP also has the potential to offer sophisticated
service features.

VoP: Voice-over-Packet or Voice-of-Problem?
• Packet-switching was designed for delay-insensitive
bursty data traffic.
• Interactive real-time voice conversation requires:
1. low delay, which requires RTP/UDP and precludes TCP;
2. low jitter, which necessitates de-jitter buffering that in turn
adds delay;
3. no long-delayed echoes, which puts more stress on echo
cancellers;
4. constant bandwidth, which entices voice compression and
silence suppression at the cost of clarity.
• The connectionless, best-effort IP in particular has
no QoS control to guarantee voice stream delivery.
• VoP is by no means an easy or trivial task. VoP
requires intensive testing at all stages.

VoIP’s View of the OSI 7-layer Model
• For the convenience of later discussions, let’s review
the Open-System-Interconnect 7-layer model and its
relevance to VoIP application.
Layer              Role in VoIP
7: Application     Voice processing and playout: voice compression, silence suppression, echo cancellation, AC signaling detection/generation, etc.
6: Presentation,
5: Session         Voice frame re-assembly, connection/termination, RTP/RTCP wrapping of voice frames
4: Transport       UDP (versus TCP) transport control
3: Network         IP layer packetization, addressing, routing
2: Data Link       ATM/Ethernet/Frame Relay/etc.
1: Physical        DS1, DS3, OC1, OC3, OC12, OC48, OC192 over Fiber/Coax/UTP/Wireless
Figure 2: OSI 7-layer model and its relation to VoIP application

Forward voice processing impairments
• At the Tx side, the voice signal is impaired by:
1. analog distortion that affects voice level;
2. digitization that introduces quantization noise;
3. compression that degrades clarity and naturalness;
4. silence suppression that may result in voice clipping;
5. compression and packetization that introduce long delay.
[Diagram: voice/audio signal -> PCM line card (A/D, G.711 logarithmic encoding to PCM bytes; quantization noise) -> VAD silence suppression (voice clipping, noise “fidelity”) -> vocoder compression into compressed voice frames (clarity degradation and delay) -> IP-layer packetization into IP header plus payload (delay) -> ATM switch (voice frame/IP packet fragmentation into header/data cells) -> DS1/DS3/OC1/OC3/OC12/OC48/OC192 line driver or RF/xDSL/cable modem Tx]
Figure 3: Forward voice processing path
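The quantization noise added by logarithmic encoding can be estimated numerically. The sketch below is an illustration under stated assumptions, not the bit-exact G.711 codec: it uses the continuous mu-law curve (mu = 255) with a uniform 8-bit grid in the compressed domain, and the function names and the 1004 Hz test tone are choices made here for the demo.

```python
import math

MU = 255.0  # mu-law constant; the continuous curve approximates G.711 mu-law

def mu_compress(x):
    """Continuous mu-law compression of a sample x in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_expand(y):
    """Inverse of mu_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def quantize_mu_law(x, bits=8):
    """Compress, round to a uniform grid in the compressed domain, expand."""
    levels = 2 ** (bits - 1) - 1          # 127 steps per polarity for 8 bits
    code = round(mu_compress(x) * levels)
    return mu_expand(code / levels)

def snr_db(amplitude, n=8000, freq=1004.0, rate=8000.0):
    """Signal-to-quantization-noise ratio of a sine with the given peak level."""
    sig = [amplitude * math.sin(2 * math.pi * freq * i / rate) for i in range(n)]
    p_sig = sum(s * s for s in sig)
    p_err = sum((s - quantize_mu_law(s)) ** 2 for s in sig)
    return 10 * math.log10(p_sig / p_err)

loud = snr_db(0.5)     # tone 6 dB below full scale
quiet = snr_db(0.005)  # same tone 40 dB lower
```

Comparing `loud` and `quiet` shows why logarithmic encoding is used: the SNR should drop far less than the 40 dB it would lose with uniform 8-bit quantization.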

Reverse voice processing impairments
• At the Rx side, the voice signal is impaired by:
1. packet loss that results in voice frame erasure;
2. jitter buffering that introduces long delay;
3. buffer resizing that causes voice jitter (jerks and gaps in the voice);
4. CNG (comfort noise generation) that generates “uncomfortable” noise.
[Diagram: front-end receivers at DS1/DS3/OC1..OC192 rate, or RF/cable/xDSL modem receivers -> ATM switch (ATM cells back to IP header plus payload) -> packet jitter buffer (long delay and voice jitter) -> de-packetization, voice frame reassembly, lost/delayed packet handling -> vocoder decoder (errored/lost frame concealment) and comfort noise generator (noise level and noise spectrum) -> PCM stream -> PCM line driver and audio playout unit -> output speech signal]
Figure 4: Reverse voice processing path

Voice quality: a simplified 3D model
• Voice quality is a combined effect of the following:
1. Delay;
2. Echo;
3. Clarity.
• The 3 dimensions are orthogonal; therefore, each
of them needs to be characterized individually.
[Diagram: three axes labelled round-trip delay, echo(es) and clarity, with a voice quality plane]
Figure 5: A 3D view of voice quality

VQ dimension I: delay
• Interactive conversation requires short delay. If
short delay were not required, VoP would be an
easy task.
• In fact, short delay is the most challenging requirement
for VoP applications. The fundamental task of
VoP is to balance the conflicting needs for shorter
delay and less packet loss.
• Long delay affects both listener and talker:
1. Long delay causes hesitation and over-talk. A caller starts noticing
delay when it exceeds 250 ms. ITU-T G.114 specifies the maximum
desired round-trip delay as 300 ms. A delay over 500 ms makes
phone conversation impractical.
2. Long delay also exacerbates echo problems, as will be shown later.
[Plot “MOS vs delay”: MOS versus round-trip delay in ms, 0 to 800 ms]
Figure 6: Hypothetical round-trip delay impact on MOS

Causes of long delay in VoP
• Network delay. This includes the physical transmission
(propagation) delay and the store-and-forward
routing/switching delay, which depends on traffic
congestion.
• Jitter-buffering delay. A jitter buffer is needed to
counter the inherent jitter of packet networks, and
jitter buffering entails voice delay.
• Packet processing delay due to packetization, packet/cell
segmentation and re-assembly, etc.
• Voice processing delay due to voice compression
(vocoders), silence suppression (VAD) and even
echo cancellation.
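The four causes above add up to a delay budget. The sketch below sums a budget and compares the round trip against the 300 ms figure quoted earlier; every per-component value is an assumed, illustrative number (not a measurement), and the path is assumed symmetric.

```python
# One-way delay components in ms. All values are illustrative assumptions.
ONE_WAY_BUDGET_MS = {
    "network (propagation + switching)": 40,
    "jitter buffering": 60,
    "packet processing": 20,
    "voice processing (vocoder, VAD, EC)": 25,
}

ROUND_TRIP_TARGET_MS = 300  # maximum desired round-trip delay quoted earlier

one_way = sum(ONE_WAY_BUDGET_MS.values())
round_trip = 2 * one_way  # assumes both directions have the same budget
within_target = round_trip <= ROUND_TRIP_TARGET_MS

# Largest contributor, i.e. the first place to look when the budget is exceeded.
largest = max(ONE_WAY_BUDGET_MS, key=ONE_WAY_BUDGET_MS.get)
```

With these assumed numbers the jitter buffer dominates, which matches the trade-off between shorter delay and less packet loss described earlier.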

Measure round-trip delay
• Only round-trip delay needs to be measured as far
as VQ is concerned.
• Generally, two instruments are needed, especially
for real network tests across different geographic
locations.
• Once the call connection is established, one instrument
sends a special test signal on its Tx path
and keeps measuring the delayed signal on its Rx
path. The other instrument simply provides a signal
loopback with a known, fixed delay.
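[Diagram: a DSP-capable round-trip delay measurement instrument sends on Tx and receives on Rx through the network cloud; a signal loopback device with a known, fixed delay sits at the far end; the delay to be measured is the path through the cloud and back]
Figure 7: Configuration of measuring round-trip delay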

Measure one-way delay
• Occasionally, one-way delay also needs to be measured.
One-way delay can be measured in two
ways:
1. Simple one-box solution. If the test ports (call originator
and call terminator) are co-located (as in a lab environment),
then a single test instrument can be used. This
instrument must be capable of both originating and terminating
a call. The Tx then sends the test signal, and the
Rx measures the delay. The result is the one-way delay
from Tx to Rx.
2. Expensive two-box solution. If one-way delay needs to be
measured through a network with test ports located at a
large distance from each other, then two instruments are
needed, and the two instruments must be synchronized to
a common clock (via GPS, for example).
[Diagram: a single DSP-capable one-way delay measurement instrument sends on Tx through the network cloud and receives on Rx; the delay to be measured is the Tx-to-Rx path]
Figure 8: Configuration of measuring one-way delay in lab

DSP-based delay measurement
• Delay measurement is far more than a simple network ping.
“PING” only accounts for network delay. Other significant
delays caused by voice processing, jitter buffering and
packetization are not accounted for. A real delay measurement
must be performed end-to-end with an audio signal, employing
a DSP algorithm.
• The DSP algorithm must be cross-correlation-based for maximum
reliability across the “harsh” VoP environment. The test
signal and its processing must meet the following requirements:
1. The signal must be voice-like in PSD (power spectral density) so that it
can pass through the voice transmission media and lossy vocoders without
too much distortion.
2. The signal must have a special auto-correlation property so as to achieve
long-range measurement with fine resolution.
3. The cross-correlation must be performed over large signal frames
(which requires more real-time computation) to overcome potential
packet loss effects.
[Plot: three panels versus time in ms (0 to 250): Tx signal, Rx signal, and their cross correlation]
Figure 9: An exemplary DSP algorithm for delay measurement
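The cross-correlation idea can be sketched in a few lines. This is a minimal illustration, not the instrument's algorithm: the probe is white Gaussian noise standing in for a voice-like test signal, and the 50 ms delay, the 20 ms erased segment and the noise level are assumptions chosen for the demo.

```python
import random

RATE = 8000  # samples per second (telephony rate)

def estimate_delay(tx, rx, max_lag):
    """Return the lag in samples at which rx best matches tx."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        n = min(len(tx), len(rx) - lag)
        score = sum(tx[i] * rx[i + lag] for i in range(n))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

rng = random.Random(1)
tx = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # 250 ms noise-like probe
true_delay = 400                                   # 50 ms at 8 kHz (assumed)
rx = [0.0] * true_delay + [0.5 * s for s in tx]  # delayed, attenuated copy
for i in range(1000, 1160):                        # 20 ms erased, as by a lost packet
    rx[i] = 0.0
rx = [s + rng.gauss(0.0, 0.1) for s in rx]        # additive channel noise

lag = estimate_delay(tx, rx, max_lag=800)
delay_ms = 1000.0 * lag / RATE
```

Because the correlation is summed over the whole 250 ms frame, the erased 20 ms segment only lowers the peak slightly instead of hiding it, which is the point of requirement 3 above.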

VQ dimension II: echo
• Echo is a problem unique to voice communication.
• Echo has two dimensions: delay and level.
• VoP exacerbates the echo problem due to:
1. longer delay. Long-delayed echo is more annoying.
2. performance limitations of embedded echo cancellers.
3. the disappearance of natural echo attenuation on digital loops.
• Audible echoes are absolutely unacceptable in a
phone conversation. Measures must be taken to
control echo to meet the ITU-T G.131 requirement.
[Plot “Echo Tolerance Curve”: relative echo level (−60 to −10 dB) versus round-trip delay (50 to 600 ms), with curves labelled “Limiting Case” and “Acceptable”]
Figure 10: Echo tolerance curve adapted from ITU-T G.131. The vertical axis is the
echo level in dB, and the horizontal axis is the echo delay (round-trip) in ms. Any
combination of echo level and delay must fall below the limiting case in order
to meet the G.131 requirement.

Causes of echoes
• Echo will exist as long as there is one last 2-w analog
phone in the world.
• The analog 4-w to 2-w hybrid at the end switch causes
echo. No matter how well the hybrid is balanced,
each actual 2-wire loop has a different impedance
depending on its length, load coils and the phones
plugged in.
• Malfunctioning echo cancellers.
• Acoustic feedback on certain speaker phones, small
cellular phones, digital phones, etc.
• Echo mostly affects the talker. Once talker echo is under
control, listener echo automatically disappears.
[Diagram: analog 2-w loop -> hybrid at the Class 5 switch (echo source) -> 4-w analog network -> echo canceller -> digital T1/E1 network]
Figure 11: Source of echo as a result of 4-w to 2-w hybrid

Measure echoes with Echo Sounder
• Echoes must be measured from the telephony audio
interface, employing a complex DSP algorithm.
• Sage’s Echo Sounder measures the echo delay and echo
level of multiple echoes using a DSP algorithm based on
Code Domain Reflectometry. The algorithm
is robust enough to tolerate 50% packet loss
and to work under strong interference and harsh
compression.
• Echo Sounder is also an ideal tool to measure 2-way
and 1-way delays.
[Plot “Echoes detected by Echo Sounder”: echo level in dB (−100 to 0) versus echo delay in ms (0 to 300), showing the reference signal, a first echo and a second echo]
Figure 12: The TDR-equivalent results obtained through Echo Sounder, which uses
CDR (Code-Domain-Reflectometry)
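The source does not describe the Code Domain Reflectometry algorithm itself, so the following is only a generic sketch of the underlying idea: probe the path with a pseudo-random ±1 code and correlate the return against it, so that each echo appears as a peak whose position gives the delay and whose height gives the level. The two echo paths, the code length and the −30 dB detection floor are assumptions for the demo.

```python
import math
import random

RATE = 8000  # samples per second

def echo_profile(code, rx, max_lag):
    """Estimate the path gain at each lag by correlating rx with the probe code."""
    energy = sum(c * c for c in code)
    return [sum(c * rx[i + lag] for i, c in enumerate(code)) / energy
            for lag in range(max_lag + 1)]

def find_echoes(profile, floor_db=-30.0):
    """Return (delay_ms, level_db) for every lag whose gain exceeds floor_db."""
    floor = 10 ** (floor_db / 20)
    return [(1000.0 * lag / RATE, 20 * math.log10(abs(g)))
            for lag, g in enumerate(profile) if abs(g) > floor]

rng = random.Random(7)
code = [rng.choice((-1.0, 1.0)) for _ in range(4000)]  # 0.5 s pseudo-random probe
paths = [(240, 0.2), (800, 0.1)]  # (delay in samples, linear gain), assumed
max_lag = 1000                    # search window of 125 ms

# Simulated return path: the sum of delayed, attenuated copies of the probe.
rx = [0.0] * (len(code) + max_lag)
for delay, gain in paths:
    for i, c in enumerate(code):
        rx[i + delay] += gain * c

echoes = find_echoes(echo_profile(code, rx, max_lag))
```

The result is a list of (delay, level) pairs, the same kind of TDR-like picture as Figure 12. A longer code lowers the correlation noise floor, at the cost of more computation.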

ITU-T G.168 echo canceller tests
• G.168 specifies a total of 19 tests for echo canceller
design verification. The most notable tests are:
1. Steady-state cancellation depth and capacity (tail length).
2. Cancellation convergence time.
3. Double-talk detection.
• G.168 testing requires two test instruments. One performs
the G.168 tests. The other is an Echo Generator
that generates multiple echoes with selectable levels
and delays, and that can also emulate double talk.
[Diagram: the echo canceller under test sits between a G.168 test suite (or Echo Sounder) on one side and an Echo Generator with double-talk emulation on the other]
Figure 13: G.168 echo canceller performance test configuration

ITU-T O.22 ATME tests
• ITU-T O.22 ATME also recommends a suite of automatic
tests to verify the performance of echo
cancellers installed in a live network:
1. Cancellation depth and tail length.
2. EC disabling and enabling.
3. Echo level discrimination (quasi-double-talk test).
• ATME also performs some transmission tests:
1. Attenuation at 3 frequencies.
2. Quiet noise, notch noise, and S/TD.
3. DS0 BERT: 56 kbps Bit-Error-Rate Test.
• In a sense, ATME replaces 105 tests.
[Diagram: ATME Director -> near-end echo canceller under test -> network -> far-end echo canceller under test -> ATME Responder]
Figure 14: ATME test configuration. ATME stands for Automatic Transmission
Measuring Equipment

VQ dimension III: clarity
• Voice clarity itself is an m-D (multi-dimensional) metric determined
by both static and dynamic impairments.
• Static impairments are due to voice processing:
1. Lossy compression by low-bit-rate vocoders;
2. Voice clipping by VADs or vocoders;
3. Attenuation distortion that results in improper voice level;
4. Improper noise level due to CNG, BER and cross-talk.
• Dynamic impairments are unique to VoP:
1. Packet/cell/frame loss. Packet loss is largely a result of dynamic
traffic congestion.
2. Voice jitter. This is a sudden delay variation as a result of dynamic
resizing of jitter buffers.
• So, measuring clarity alone is an m-D problem.

Measure static impairments
• Psychoacoustic models. These models measure the
perceptual degradation, not the exact dimension of
each impairment:
1. ITU-T P.861 PSQM: excellent for lossy compression.
2. BT PAMS: strong in jitter removal and equalization.
3. Draft P.862 PESQ: hybrid of PSQM and PAMS.
4. P.861 MNBs: less popular plain mathematical model.
5. Other models based on pattern recognition, neural
networks, cepstral distance, etc.
6. None of these models can account for echo and delay
effects. Their validity for dynamic impairments (packet loss
and jitter) requires further study.
• Non-psychoacoustic approach:
1. attenuation distortion with 23-tone or 3-tone.
2. lossy compression and noise with SNR.
3. voice clipping and noise level with PVIT.

“PVITing” dynamic impairments
• No panacea model can summarize all impairments into one
magic number such as MOS. The actual dimension of each
impairment needs to be measured.
• Packet loss measured from the end audio terminal is most
relevant. Packet loss measured at the bit, byte and packet layers
is hard to correlate to end-user perception.
• Voice jitter is not related to network jitter, and must be
measured from the audio terminal.
• Sage’s PVIT (Packet-Voice-Impairment-Test) measures packet
loss and voice jitter from the end audio interface (analog 2-w, 4-w
and T1/E1). It also measures clipping and comfort noise.
[Plot: two panels versus voice samples (500 to 2500): the original voice signal, and the delayed and impaired signal with annotations for packet loss, jitter, missing noise and voice clipping]
Figure 15: Sample of packet loss, jitter, improper silence noise and voice clipping
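Measuring loss at the audio interface means looking for its audible footprint rather than counting packets. The sketch below is a deliberately simple stand-in, not the PVIT method (which the slides do not describe): it sends a steady test tone and flags frames whose energy collapses. It assumes lost packets play out as silence; a decoder with loss concealment would need a more elaborate detector. The frame size, threshold and gap positions are illustrative.

```python
import math
import statistics

def find_dropouts(samples, rate=8000, frame_ms=10, drop_db=20.0):
    """Return (start_ms, duration_ms) for runs of frames far below median energy."""
    frame_len = rate * frame_ms // 1000
    energies = [sum(s * s for s in samples[i:i + frame_len])
                for i in range(0, len(samples) - frame_len + 1, frame_len)]
    threshold = statistics.median(energies) / (10 ** (drop_db / 10))
    dropouts, start = [], None
    # The infinite sentinel closes a dropout that runs to the end of the signal.
    for idx, energy in enumerate(energies + [float("inf")]):
        if energy < threshold and start is None:
            start = idx
        elif energy >= threshold and start is not None:
            dropouts.append((start * frame_ms, (idx - start) * frame_ms))
            start = None
    return dropouts

# One second of 1004 Hz test tone with two simulated gaps (20 ms and 10 ms).
tone = [math.sin(2 * math.pi * 1004 * n / 8000) for n in range(8000)]
for lo, hi in ((2400, 2560), (5600, 5680)):
    for n in range(lo, hi):
        tone[n] = 0.0

dropouts = find_dropouts(tone)
loss_percent = 100.0 * sum(d for _, d in dropouts) / 1000.0
```

Here `loss_percent` is the share of the one-second signal that went missing, expressed the way an end user would experience it.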

SMOS for real live network testing
• For end-to-end tests through a real network that
has both static impairments (such as compression) and dynamic
impairments (such as voice jitter), SMOS provides
a complete solution.
• SMOS consists of an accurate psychoacoustic core and
robust de-jittering and equalization schemes.
• SMOS performs automated real-time measurements
with robust in-band telemetry and synchronization,
which makes it particularly useful for live network
testing.
• SMOS provides a complete picture of your network
performance by measuring:
1. MOS (Mean Opinion Score) on a scale of 1 to 5.
2. Round-trip delay.
3. Codec type (G.711 PCM, G.726 ADPCM, vocoder, etc.).
4. Voice jitter.
5. Voice level change.
6. Effective bandwidth (attenuation distortion).
7. Silence noise level.

One more dimension: signaling and protocol
• Signaling refers to the signal exchanges among CPEs and switches in
order to set up and tear down a call. Protocols dictate the meaning of
those signals.
• Signaling can range from DC (off-hook current), to AC (DTMF/MF/R2
digits and call progress tones), to in-band bits (CAS-based T1 robbed
bits) and bytes (CCS-based E1), to packets (ISDN and SS7) and to current
message-based stateless protocols (SIP, MGCP, H.323, etc.). In-depth
discussion of signaling and protocols is beyond the scope of this seminar.
• But DTMF digits are a special exception. DTMF digits are not only used
for call addressing before call connection; they are also used for IVR
systems during the call. VoP-based CPE, IADs and gateways need to detect
the DTMF digits during the call, specially encode them, and “faithfully”
regenerate the digits at the other end. This DTMF handling capability
needs to be tested with DTMF transmission distortion
analysis. Important parameters are DTMF dual-tone frequency accuracy,
level bias, and ON-OFF duration.
• Other QoS aspects are dial tone delay (how fast one can hear the dial
tone) and call connection time (how long it takes to route the call).
[Plots: (top) “DTMF digit sequence”, amplitude versus time in ms (0 to 180), showing DTMF digit 1 with its ON and OFF intervals; (bottom) “Frequency-domain analysis of digit 1”, magnitude versus frequency in Hz (0 to 2000)]
Figure 16: DTMF digit transmission distortion measurement
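A frequency-domain check of a received digit can be sketched with the Goertzel recursion, a standard way to evaluate signal power at a handful of known frequencies. This is an illustration rather than the instrument's analysis: it identifies the digit and reports the level difference between the high and low tone groups (commonly called twist), and it does not measure frequency accuracy or ON-OFF duration. The 50 ms equal-level test tone is an assumption.

```python
import math

RATE = 8000  # samples per second
LOW_HZ = (697, 770, 852, 941)        # DTMF row frequencies
HIGH_HZ = (1209, 1336, 1477, 1633)   # DTMF column frequencies
KEYPAD = ("123A", "456B", "789C", "*0#D")

def goertzel_power(samples, freq):
    """Squared DFT magnitude of samples at one frequency (Goertzel recursion)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / RATE)
    s1 = s2 = 0.0
    for x in samples:
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2

def analyze_dtmf(samples):
    """Return (digit, twist_db): the strongest row/column pair and high-minus-low level."""
    low = [goertzel_power(samples, f) for f in LOW_HZ]
    high = [goertzel_power(samples, f) for f in HIGH_HZ]
    row, col = low.index(max(low)), high.index(max(high))
    twist_db = 10.0 * math.log10(max(high) / max(low))
    return KEYPAD[row][col], twist_db

# 50 ms of digit "1" (697 Hz + 1209 Hz) with equal tone levels.
tone = [math.sin(2 * math.pi * 697 * n / RATE) + math.sin(2 * math.pi * 1209 * n / RATE)
        for n in range(400)]
digit, twist_db = analyze_dtmf(tone)
```

Running the same analysis on the digit before and after it crosses the VoP path shows whether the regenerated tones kept their relative levels.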

Application example I: QoS monitoring
• Unlike the PSTN, the QoS of a VoP application depends
on network traffic, which varies over the course
of a day.
• By continuously “PVITing” the traffic-associated
dynamic impairments (packet loss and voice jitter)
on a given call, one can obtain crucial statistics
about network usage and QoS. The results can be
used to improve data traffic control/routing and
prioritization schemes so as to improve QoS without
sacrificing overall network throughput (or usage).
[Plot “Hypothetical dynamic impairments vs local time”: occurrence of packet loss or voice jitter per minute (0 to 12) versus local time in hours]
Figure 17: Continuous monitoring of traffic-sensitive dynamic impairments
(packet loss and jitter) with PVIT (Packet-Voice-Impairment-Test) for 24
hours
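Turning a day of logged impairment events into the per-hour profile of Figure 17 is a small aggregation step. In this sketch the input format (event times in seconds since local midnight) and the function name are assumptions made for illustration.

```python
from collections import Counter

def events_per_minute_by_hour(event_times_s):
    """Average impairment events per minute for each local hour 0..23."""
    counts = Counter(int(t // 3600) for t in event_times_s)
    return [counts.get(hour, 0) / 60.0 for hour in range(24)]

# Hypothetical log: one event every 30 s during the 09:00 hour, nothing else.
events = [9 * 3600 + 30 * i for i in range(120)]
profile = events_per_minute_by_hour(events)
busiest_hour = max(range(24), key=lambda hour: profile[hour])
```

The busiest hours in such a profile are where routing and prioritization changes are most likely to pay off.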

Application example II: SLA arbitration
• SLA (Service-Level-Agreement): the essence of an SLA
is a contract between service providers and customers
that quantitatively specifies a reasonable
set of QoS parameters in accordance with the fees
paid.
• QoS is a multi-dimensional metric. Each of its dimensions
(delay, echo, clarity, dynamic impairments
and signaling) must be measured and tabulated to
check conformance to the prescribed contract.
QoS dimension            SLA-dictated target
Round-trip delay         < 250 ms
Talker echo level        < −45 dB
Clarity (MOS)            > 4.0
Packet loss and jitter   < 5%
Dial tone delay          < 500 ms
Call connection time     < 3 s
Table 1: Hypothetical list of SLA (Service-Level-Agreement) elements
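Checking a set of measurements against Table 1 is mechanical once the targets are written down as data. The sketch below encodes the hypothetical targets from the table; the key names and the sample measurements are made up for illustration.

```python
# Targets from Table 1, stored as (comparison, limit). Key names are assumptions.
SLA_TARGETS = {
    "round_trip_delay_ms": ("<", 250),
    "talker_echo_level_db": ("<", -45),
    "mos": (">", 4.0),
    "packet_loss_and_jitter_pct": ("<", 5),
    "dial_tone_delay_ms": ("<", 500),
    "call_connection_time_s": ("<", 3),
}

def check_sla(measured):
    """Return {name: (value, comparison, limit)} for every target that is missed."""
    violations = {}
    for name, (op, limit) in SLA_TARGETS.items():
        value = measured[name]
        ok = value < limit if op == "<" else value > limit
        if not ok:
            violations[name] = (value, op, limit)
    return violations

# Hypothetical measurement set: everything passes except clarity.
measured = {
    "round_trip_delay_ms": 180,
    "talker_echo_level_db": -50,
    "mos": 3.8,
    "packet_loss_and_jitter_pct": 2.0,
    "dial_tone_delay_ms": 300,
    "call_connection_time_s": 2.5,
}
violations = check_sla(measured)
```

An empty result means the call met every tabulated target; otherwise each entry names the dimension that needs attention.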

App. example III: configuration optimization
• A voice gateway or IAD typically has the following
configurable parameters:
1. Vocoding schemes such as G.711 PCM, G.726 ADPCM,
G.729 CS-ACELP and G.723.1 MP-MLQ.
2. Disable/enable VAD for silence suppression.
3. Packet/frame size.
4. Jitter buffer size. Static or dynamic re-sizing.
5. Disable/enable echo canceller and non-linear processor.
6. Configure echo canceller capacity (tail length) and density
(number of voice channels to be cancelled).
• The optimal configuration is achieved when the following
goals are met:
1. reasonably short delay (< 250 ms, for example).
2. absence of audible echoes.
3. decent voice intelligibility (MOS > 4.0, for example).
4. no excessive packet loss and voice jitter (< 3%, for example).
• Of course, test equipment is needed to verify that the
configuration is optimal.
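The interaction between vocoder choice and packet size can be made concrete with a little arithmetic. The sketch below computes the IP-layer bandwidth of one voice stream, assuming plain IPv4 + UDP + RTP headers (20 + 8 + 12 bytes) with no header compression and no link-layer overhead. The codec bit rates are the nominal ones (32 kbps is assumed for G.726, which also defines other rates).

```python
import math

# Nominal codec bit rates in kbps.
CODEC_KBPS = {"G.711": 64.0, "G.726": 32.0, "G.729": 8.0, "G.723.1": 6.3}
HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP, no header compression assumed

def ip_bandwidth_kbps(codec, packet_ms):
    """IP-layer bandwidth of one voice stream for a given packetization interval."""
    payload_bytes = math.ceil(CODEC_KBPS[codec] * packet_ms / 8)  # kbps * ms = bits
    return (payload_bytes + HEADER_BYTES) * 8 / packet_ms         # bits per ms = kbps

# Bandwidth for each codec at 20 ms per packet (30 ms for G.723.1 frames).
table = {codec: ip_bandwidth_kbps(codec, 30 if codec == "G.723.1" else 20)
         for codec in CODEC_KBPS}
```

Halving the packet interval halves the packetization delay but raises the header overhead: G.711 goes from 80 kbps at 20 ms to 96 kbps at 10 ms. That is one of the trade-offs the configuration has to settle.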

App. example IV: troubleshooting guidelines
• Follow these steps when using Sage’s testing
technologies.
1. Perform the SMOS test and examine the MOS number and codec
type.
2. If the MOS number falls short of theoretical expectation,
check for analog-type distortion as indicated by effective
bandwidth, silence noise and signal loss or gain.
3. If there is no analog-type distortion, perform the PVIT test to
measure the amount of dynamic impairments such as
packet loss and jitter.
4. If no dynamic impairments are found, the codec
implementation should be verified. For PCM and ADPCM,
use 23-tone’s SNR. For G.729 and G.723.1 vocoders, use
SMOS or PSQM’s MOS number along with the codec
detection.
5. Even if the MOS is good, check the total amount of
jitter reported by SMOS; the less, the
better.
6. Check the round-trip delay and make sure the delay is
less than 300 ms. If longer, the configuration needs to be
optimized.
7. Once delay is longer than 50 ms, echo becomes a concern.
One should perform the Echo Sounder test to make sure there
are no audible echoes, or that the echo level and delay are
within ITU specifications.
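The seven steps amount to a small decision procedure, sketched below. The measurement keys and action strings are hypothetical names chosen for illustration; only the ordering of the checks and the 300 ms and 50 ms thresholds come from the steps above.

```python
def troubleshoot(m):
    """Map one set of measurements (dict with hypothetical keys) to follow-up actions."""
    actions = []
    if m["mos"] < m["expected_mos"]:              # steps 1-2: MOS short of expectation
        if m["analog_distortion"]:
            actions.append("fix analog distortion")
        elif m["dynamic_impairments"]:            # step 3: PVIT finds loss or jitter
            actions.append("reduce packet loss and jitter")
        else:                                     # step 4: suspect the codec itself
            actions.append("verify codec implementation")
    if m["jitter_events"] > 0:                    # step 5: check jitter regardless of MOS
        actions.append("reduce voice jitter")
    if m["round_trip_delay_ms"] >= 300:           # step 6: delay must stay below 300 ms
        actions.append("optimize configuration for delay")
    if m["round_trip_delay_ms"] > 50:             # step 7: echo matters beyond 50 ms
        actions.append("run Echo Sounder test")
    return actions

sample = {"mos": 3.2, "expected_mos": 4.0, "analog_distortion": False,
          "dynamic_impairments": True, "jitter_events": 4,
          "round_trip_delay_ms": 120}
actions = troubleshoot(sample)
```

Returning a list rather than a single verdict reflects that several dimensions can need attention on the same call.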

Conclusions
• QoS is an m-D problem that requires an m-D approach.
• The 3 key dimensions of voice quality are delay, echo and clarity.
Each dimension needs to be characterized individually.
• Short delay is a challenging requirement for VoP applications.
• Long delay in VoP exacerbates the echo problem, which deserves
special care in testing.
• Clarity itself is an m-D metric determined by both static
impairments and dynamic impairments.
• Static impairments such as voice compression, clipping and
improper noise level can be measured through psychoacoustic
models.
• Dynamic impairments such as packet loss and voice jitter
need to be precisely determined from the audio interface to help
optimize network usage and improve voice quality.
• DTMF digit transmission distortion needs to be analyzed to
guarantee IVR functionality. Short dial tone delay and
short call connection time give the impression of better
service availability.
• Sage’s main test features (SMOS/PSQM, PVIT, Echo Sounder,
23-tone, etc.) cover all dimensions of voice quality. Results
from these tests also provide enough information for calculating
the ITU-T G.107 E-model’s R rating.
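Once an R rating is available, G.107 gives a closed-form mapping to an estimated MOS. The sketch below implements only that final conversion; computing R itself from delay, echo and equipment impairments is outside its scope.

```python
def r_to_mos(r):
    """Estimated MOS from an E-model R rating (ITU-T G.107 conversion)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7.0e-6

default_mos = r_to_mos(93.2)  # 93.2 is the E-model's default R value
```

The curve is steepest in the middle of the R range, so a given loss of R costs more MOS on a mediocre connection than on a very good one.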
