1. CCNC 2010 Tutorial: Towards Glitch Free VoIP and Video Conferencing 1/12/2010
TOWARDS GLITCH-FREE
VOIP AND VIDEO CONFERENCING
JIN LI
MICROSOFT RESEARCH
Outline
Introduction
Anatomy of VoIP and Video Conferencing Systems
Audio/Video Components
Network Components
Summary
Introduction
Booming of IP Based Communication
Advanced voice over IP (VoIP)
Web-, audio-, video-conferencing
Tele-presence
Instant messaging
Calendar and other PIM functions
Email, fax and voice mail
Worldwide VoIP subscribers
• Worldwide VoIP service revenue was $24.1B in 2007, up 52% over 2006.
• Worldwide VoIP service revenue is expected to more than double over the next 4 years, to $61.3B in 2011, an annual growth rate of 26%.
Source: 2008 Infonetics Research Inc
US Broadband Telephony Forecast, 2007-2013
The VoIP subscriber base is predicted to double from 2007 to 2013.
Source: Jupiter Research, US Broadband Telephony Forecast, 2008 to 2013
VoIP Trend
IP networks are the next-generation networks for all forms of communication.
Broadband penetration is a key driver of VoIP expansion
Worldwide DSL subscriptions were at 205.9M at the end of 2007, up 23% from the previous year, and are predicted to increase to 363.6M in 2011.
Cable subscriptions were up 15% annually to 68M at the end of 2007, and are predicted to climb to 97.3M in 2011.
Passive Optical Network (PON) subscribers were at 10.9M in
2007
Ethernet FTTH subscribers were at 1.7M in 2007
2004/2005 are breakthrough years for VoIP adoption
High End Systems – Tele-Presence
Cisco Telepresence: $299K
Tandberg Experia: $225K
HP Halo: $425K + $18K/mo
Polycom RPX210M: $269K + $18.5K/mo
Worldwide Tele-presence Forecast (2006-2012)
[Figure: number of end points and revenue forecast]
Source: 2008 IDC Research
Desktop Video Conferencing
Multiple solutions, often acting as an add-on to VoIP
Benefit
See faces of people you may not have met before
See facial expressions & gestures
Easier to follow a conversation
More interactive than phone
Get the general mood and ambience
See and show documents/objects
Drawback
Difficult to set up and plan
Network reliability
Without (or with poor) video, people talk; without (or with poor) audio, people walk.
Interpersonal factors
Anatomy of VoIP and Video
Conferencing Systems
Infrastructure vs. P2P
Infrastructure based P2P based
Microsoft Unified Skype
Communication
Cisco
Gtalk
Infrastructure Based VoIP:
Microsoft Unified Communication
Unified Communication: Architecture
Unified Communication: P2P Call
Key Steps
Alice calls Bob
Find Bob’s registered SIP endpoints
Unified Communication: To VoiceMail
Key Steps
Alice calls Bob
Find Bob’s registered SIP endpoints
Bob doesn’t answer after a certain period, so the call is re-routed to voicemail
The voicemail system plays a greeting, records Alice’s message, sends the message to Bob’s email, and uses the speech server to transcribe it
Unified Communication: PSTN UC
Key Steps
PSTN user Alice calls Bob
IP-PSTN gateway terminates the call
MS/Gateway routes the call to the mediation server, which performs transcoding, ICE, etc.
Through the director, the proper UC client is found
P2P VoIP: Skype
Information
Debut: 08/2003, by N. Zennstrom and J. Friis, who
founded KaZaA
A P2P overlay network for VoIP and other applications
Free intra-network VoIP and fee-based SkypeOut/SkypeIn
Skype Usage (Apr. 2008)
11 million concurrent Skype users on line in peak time
(180,000+ simultaneous calls)
309 million registered users worldwide, the largest
registered user base within eBay portfolio (33 million
added users for Q1FY08)
$126M revenue in Q1FY08 (61% YOY growth, 5.6
billion SkypeOut minutes in FY2007)
100 billion cumulative Skype-to-Skype minutes
Skype Share of International VoIP Traffic
Skype Gadget
IPDRUM mobile Skype Cable
Motorola CN620 WiFi Cellphone
IPEVO Free-1 USB Skype Phone
Netgear Skype Wi-Fi Phone
USB Mouse with Phone
50 hardware partners, 150+ Skype-certified devices.
Skype vs. VoIP
Public VoIP standard
H.323, SIP
Skype is a proprietary VoIP solution
Rely on P2P network for user directory
Scalable without costly infrastructure
Route calls through supernodes in Skype
Universal firewall/NAT traversal
Encrypted traffic (but you have to trust eBay/Skype)
Skype Ingredient (1)
User retrieves ID from
a skype server
Skype Network
Skype
Server
authentication
Supernode Overlay:
any computer w/ sufficient CPU, memory
& network bw & not behind firewall
For distributed directory service
Relay traffic for computer behind
NAT/firewall
NAT Traversal (Skype)
NAT/Firewall detection
Try UDP connection
Try TCP connection (arb port, 80 (http), 443(https) )
Traversal
Direct connection if a) both clients have no NAT, b) one
client has no NAT, and one behind cone-NAT
Relay by supernode otherwise
Since Skype doesn’t need to pay for relay cost
High bitrate wideband voice codec (>24kbps)
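The traversal rule above can be sketched as a small decision function. This is only an illustration of the two direct-connection cases listed on the slide; the `NatType` categories and the `choose_path` name are hypothetical, and Skype's actual (proprietary) logic is not public.

```python
from enum import Enum


class NatType(Enum):
    """Hypothetical NAT categories used only for this sketch."""
    NONE = "no NAT"
    CONE = "cone NAT"
    OTHER = "symmetric NAT or restrictive firewall"


def choose_path(a: NatType, b: NatType) -> str:
    """Return 'direct' or 'relay' following the two cases on the slide."""
    # Case a) both clients have no NAT.
    if a is NatType.NONE and b is NatType.NONE:
        return "direct"
    # Case b) one client has no NAT and the other is behind a cone NAT.
    if {a, b} == {NatType.NONE, NatType.CONE}:
        return "direct"
    # Otherwise the call is relayed by a supernode.
    return "relay"
```

For example, `choose_path(NatType.CONE, NatType.CONE)` falls through to the relay case under this reading of the slide.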
Skype : Call Routing Through Supernode
Skype
Server
authentication
Supernode Overlay:
Route call through
supernodes
High bitrate wideband voice
codec (>24kbps)
Skype Encryption
Peer 1
Peer 2
256-bit AES over 128 bit data block
1536/2048 RSA for key negotiation (2048/2048
for paid service)
Skype: Complete Black box
(Security by Obfuscation )
Almost everything is obfuscated
Many protections, anti-debugging tricks, ciphered code
Avoid static disassembly: xor binary with a hard-coded key, erase the beginning of the code, own packer
Code integrity check: use checksum to avoid breakpoint
Anti-debugging technique: anti-SoftICE, integrity check
Code obfuscation
Network obfuscation
Audio/Video Component
Audio Codec
Video Codec
Acoustic Echo Cancellation
Audio Codec
G.711 (PCM)
Still widely used today: PSTN interface
If uniform quantization
12 bits * 8 k/sec = 96 kbps
Non-uniform quantization
8 bits * 8 k/sec = 64 kbps (DS0 rate)
North America: µ-law
Other countries: A-law
MOS of about 4.3
µ = 255 , A = 87.6
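As a rough illustration of non-uniform quantization, the sketch below applies the continuous µ-law companding curve with µ = 255 and reproduces the bit-rate arithmetic (12 bits × 8 kHz = 96 kbps uniform, 8 bits × 8 kHz = 64 kbps companded). It is an assumption-laden sketch: G.711 itself uses a piecewise-linear approximation of this curve, and the function names are hypothetical.

```python
import math

MU = 255.0  # mu-law parameter given on the slide


def mu_law_compress(x: float, mu: float = MU) -> float:
    """Map a sample x in [-1, 1] to [-1, 1] with logarithmic companding."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y: float, mu: float = MU) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


def bit_rate_kbps(bits_per_sample: int, sample_rate_khz: int = 8) -> int:
    """Bit rate of a PCM stream: bits per sample times samples per second."""
    return bits_per_sample * sample_rate_khz
```

Small amplitudes get a larger share of the output range (for instance, an input of 0.01 maps to roughly 0.23), which is why 8 companded bits can stand in for about 12 uniform bits on speech.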
G.722.1: Siren
Audio bandwidth: 14 kHz
Sample rate: 32 kHz
Bit rate: 24, 32, and 48 kbit/s
Algorithm: Transform coding (Siren14TM)
Frame size: 20 ms
Algorithmic delay: 40 ms
Complexity: <11 WMOPS (enc/dec)
Available on royalty-free licensing terms (from Polycom)
Siren Encoder
Siren Decoder
Siren Codec
Audio sampled at 32kHz
Operates on frames of 20 ms corresponding to 640
samples
Based on transform coding, using a Modulated
Lapped Transform (MLT)
A Look-ahead of 20 ms due to 50% overlap between
frames
Total algorithmic delay of 40 ms
MLT - Modulated Lapped Transforms
Spatial Response Frequency Domain
Categorization & SQVH
Quantization Used by SQVH
Expected # of Bits For Each Category
Vector Property Used in SQVH
AMR-WB Basics
“Wideband coding of speech at around 16kbit/s using
adaptive multi-rate wideband (AMR-WB)”
Adopted as ITU-T G.722.2, and also as 3GPP spec TS 26.190.
“Foreseen applications are: VoIP and internet applications, Mobile Com., PSTN app, ISDN wideband telephony, ISDN videophone and videoconf.”
Sampling rate 16 kHz;
Bitrate: 6.60, 8.85, 12.65, 14.25, 15.85, 18.25, 19.85,
23.05, and 23.85 kbit/s.
20 ms frame.
ACELP (algebraic code excited LPC).
Pre-processing
Sampling rate conversion: 16 to 12.8 kHz (a 20 ms frame now has 256 samples)
HP filter (cut-off at 50 Hz)
Pre-emphasis filter: $1 - 0.68 z^{-1}$
LP analysis and Quant.
One 30 ms asymmetric window
5 ms look-ahead
Obtain LPC Coef.:
Compute correlation;
Multiply by window (adds 60 Hz BW expansion);
R(0) = 1.0001 R(0) (adds a noise floor at -40 dB);
Levinson-Durbin to compute LP coefficients.
LP to ISP
Quantize in ISP q-domain.
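The correlation, white-noise correction and Levinson-Durbin steps above can be sketched as follows. This is a simplified illustration rather than the AMR-WB reference code: the asymmetric analysis window and the 60 Hz bandwidth-expansion window are omitted, the default order of 16 is an assumption not stated on the slide, and all function names are hypothetical.

```python
def autocorrelation(x, order):
    """r[k] = sum over n of x[n] * x[n-k], for k = 0..order."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for A(z) = 1 + a1 z^-1 + ... + ap z^-p.

    Returns (a, err) where a[0] == 1.0 and err is the final prediction error.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def lp_coefficients(frame, order=16):
    """Autocorrelation method with the 1.0001 correction on R(0)."""
    r = autocorrelation(frame, order)
    r[0] *= 1.0001                          # noise floor at about -40 dB
    return levinson_durbin(r, order)
```

The factor 1.0001 adds 0.0001·R(0) of white noise, and 10·log10(0.0001) = -40 dB, which keeps the recursion well conditioned on nearly periodic frames.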
LP analysis and Quant. (2)
Quantization bottom line:
46 bits/frame on most modes;
36 bits/frame on 6.60 Kbps mode;
M.A. prediction with 1/3 gain;
Quantizer: S-MSVQ (split multistage VQ)
Both quantized and unquantized coefs will be used in
algorithm.
Subframes
Each 20 ms (256 samples) frame is divided into 4 subframes (64 samples each).
Interpolated LPC coefficients are obtained for each subframe
Interpolation done in ISP q-domain
Perceptual weighting
Weighting filter is:
$W(z) = A(z/\gamma_1)\,H_{\text{de-emph}}(z)$
This helps solve the tilt problem, which is worse in WB speech.
Excitation
Searched for each 5ms sub-frame.
Two components:
Adaptive codebook (past excitation)
Algebraic codebook
“target” signal obtained by filtering the LPC residual
(for the sub-frame) through the synthesis LPC filter
and weighting filter.
Adaptive codebook
Start with “open loop” pitch estimation
based on cross correlation;
Low-value bias;
‘last value’ bias (actually 5-frame median), if voiced.
Re-compute with “closed loop”, around initial value ±7, and up to ¼ sample precision.
“Analysis by synthesis” based;
Restrict to values allowed by encoding step.
Algebraic codebook
Remove contribution of (unquantized) prediction
from adaptive codebook from the “target signal”
to obtain new target.
Divide sub-frame into 4 alternating tracks.
Algebraic codebook (2)
Select best pulses, for a total of 24 (6),
18(5-4), 16 (4), 12(3), 10(3-2), 8(2), 4(1), 2(.5),
depending on bitrate.
Pulses + Two filters:
Periodicity enhancement: $1/(1 - 0.85 z^{-T})$;
Tilt: $1/(1 - \beta_1 z^{-1})$
Tricks to save bits in encoding pulse position;
Tricks to save computation on pulse search.
Wrap up
High pass, de-emphasis;
Upsample back to 16 kHz;
Add high frequency components.
High Freq. Components
Random noise used as excitation
LP filter is extended to 8 kHz.
Energy of excitation based on energy of base-band residual, and voicing info, except in highest bitrate mode.
Extension of LPC filter is equivalent to mapping the 5.1-5.6 kHz band to 6.4-7.0 kHz;
Band-pass filtered to 6-7 kHz, and added to output signal.
Video Codec
H.264/AVC Encoder
H.264/AVC Decoder
Reference Picture Management
Reference pictures are stored in decoded picture buffer (DPB)
Short/long term reference picture, a decoded frame may be
marked as
unused for reference
short term picture
long term picture
“Sliding Window” memory management
Keep #(long_term_pic+ short_term_pic)
Remove short term picture if lack of space
Adaptive memory control
issued by encoder
change the type of the ref frame
IDR (Instantaneous Decoder Refresh)
clear ref buffer
I frame
Slice Group
Formerly called “FMO” (Flexible Macroblock Ordering)
A slice group is a subset of the macroblocks and may contain one or more slices
Error resilience
Inter Prediction
Variable block size
¼ pixel motion compensation
Interpolation
Motion Vector (MV) Prediction
Efficiently encode correlated MV
Other than 16×8 and 8×16, MVp = median(MVA, MVB, MVC)
16×8: MVp of the upper = MVB; MVp of the lower = MVA
8×16: MVp of the left = MVA; MVp of the right = MVC
For skipped macroblocks, do as 16×16 Inter mode
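A minimal sketch of these prediction rules follows. It assumes, as in H.264, that the general case uses the component-wise median of the three neighbours, and that A, B and C are the left, above and above-right neighbours (the slide does not spell out their positions). The function names and the `partition`/`index` parameters are hypothetical.

```python
def median3(a, b, c):
    """Median of three numbers without sorting."""
    return max(min(a, b), min(max(a, b), c))


def predict_mv(mv_a, mv_b, mv_c, partition="other", index=0):
    """Predict a motion vector from neighbours A, B and C.

    mv_* are (x, y) tuples. For '16x8', index 0 is the upper partition and
    1 the lower; for '8x16', index 0 is the left partition and 1 the right.
    """
    if partition == "16x8":
        return mv_b if index == 0 else mv_a
    if partition == "8x16":
        return mv_a if index == 0 else mv_c
    # All other partition sizes: component-wise median of the neighbours.
    return (median3(mv_a[0], mv_b[0], mv_c[0]),
            median3(mv_a[1], mv_b[1], mv_c[1]))
```

Only the difference between the actual motion vector and this prediction needs to be coded, which is small when neighbouring motion is correlated.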
Intra Prediction
For Luma samples
4×4 block: 9 prediction modes
16×16 block: 4 modes
I_PCM: transmit the samples directly, without prediction or transform
Prediction Modes
4x4 Luma
Intra 16x16
8x8 Chroma is similar to 16x16 luma intra
Signaling of Intra Prediction Modes
Mode choices need to be signaled to the decoder, but compactly
The prediction mode for luma coded in Intra-16×16 mode or chroma coded in Intra mode is signaled in the macroblock header
Intra modes for neighboring 4×4 blocks are often correlated
With A the block to the left of the current block C and B the block above it, the most probable mode for C is:
If A and B are available, C = min (A,B)
else if (neither A nor B is available) C = 2 (DC)
else C = available (A,B)
Use prev_intra4x4_pred_mode flag & rem_intra4x4_pred_mode
flag to indicate mode selected.
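The signaling rule can be sketched as below. The most-probable-mode rule follows the slide as written; the way `rem_intra4x4_pred_mode` skips over the most probable mode (so that eight values cover the remaining eight modes) is my reading of the standard and should be treated as an assumption. Function names are hypothetical.

```python
DC_MODE = 2  # mode number of DC prediction, as on the slide


def most_probable_mode(mode_a, mode_b):
    """Most probable mode for block C; None marks an unavailable neighbour."""
    if mode_a is not None and mode_b is not None:
        return min(mode_a, mode_b)
    if mode_a is None and mode_b is None:
        return DC_MODE
    return mode_a if mode_a is not None else mode_b


def signal_mode(chosen, mode_a, mode_b):
    """Return (prev_intra4x4_pred_mode, rem_intra4x4_pred_mode or None)."""
    mpm = most_probable_mode(mode_a, mode_b)
    if chosen == mpm:
        return 1, None                       # one flag bit is enough
    rem = chosen if chosen < mpm else chosen - 1
    return 0, rem                            # rem is in 0..7


def decode_mode(prev_flag, rem, mode_a, mode_b):
    """Inverse of signal_mode."""
    mpm = most_probable_mode(mode_a, mode_b)
    if prev_flag:
        return mpm
    return rem if rem < mpm else rem + 1
```

When the encoder picks the most probable mode, a single flag suffices; otherwise the flag is followed by a 3-bit remainder.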
Deblocking filter
Filter 4 vertical/horizontal boundaries of luma
Filter 2 vertical/horizontal boundaries of chroma
Affect up to 3 samples on either side.
The filter is stronger at places where there is likely to be significant blocking distortion,
e.g., the boundary of an intra coded macroblock or a boundary between blocks that contain coded coefficients.
Transform and Quantisation
3 transforms
DCT-based transform for all 4×4 residual blocks
$a = 1/2$, $b = \sqrt{2/5}$, $d = 1/2$
Hadamard transform for 4×4 luma DC coefficients (in 16×16 intra)
Hadamard transform for 2×2 chroma DC coefficients
Combine Quantization into Scaling of Transform
4x4 DC Intra Luma
$|Z_D(i,j)| = \left(|Y_D(i,j)| \cdot MF(0,0) + 2f\right) \gg (qbits + 1)$
$\operatorname{sign}(Z_D(i,j)) = \operatorname{sign}(Y_D(i,j))$
CAVLC: Context-Based Adaptive Variable Length Coding
Characteristics:
Run-level coding to compact zero string
Trailing ones (+1, -1 after 0)
Number of nonzero coefficient in neighboring blocks is
correlated
Choice of VLC lookup table for the level parameter depends on level magnitude
CAVLC Encoding
1. Encode the number of coefficients and trailing ones (coeff token)
TotalCoeffs : 0 ~ 16
TrailingOnes : 0 ~ 3
if more than 3 TrailingOnes, only the last three are treated as ‘special cases’
Four lookup tables
Three variable-length, one fixed-length
Choice depends on neighboring blocks
2. Encode the sign of each TrailingOne: In reverse order
3. Encode the levels of the remaining nonzero coefficients
level_prefix, level_suffix
4. Encode the total number of zeros before the last coefficient
Zero-runs at the start of the array need not be encoded
5. Encode each run of zeros
If fewer than 3 TrailingOnes, the first nonzero coefficient is adjusted
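To make the five steps concrete, the sketch below extracts the symbols that CAVLC would encode from a zigzag-ordered coefficient array. It only derives the symbol values (no VLC tables or bit output), it does not model the level adjustment mentioned for fewer than three TrailingOnes, and the function and key names are hypothetical.

```python
def cavlc_symbols(coeffs):
    """Derive CAVLC symbols from a zigzag-ordered list of coefficients."""
    nz = [i for i, c in enumerate(coeffs) if c != 0]   # nonzero positions
    rev = list(reversed(nz))                           # highest frequency first

    # Step 1/2: up to three trailing +/-1 values, scanned in reverse order.
    t1_signs = []
    for i in rev:
        if abs(coeffs[i]) == 1 and len(t1_signs) < 3:
            t1_signs.append("+" if coeffs[i] > 0 else "-")
        else:
            break

    # Step 3: remaining levels, also in reverse order.
    levels = [coeffs[i] for i in rev][len(t1_signs):]

    # Step 4: zeros located before the last nonzero coefficient.
    total_zeros = (nz[-1] + 1 - len(nz)) if nz else 0

    # Step 5: run of zeros preceding each coefficient, in reverse order.
    # The final entry (zeros at the start of the array) is implied by
    # total_zeros and therefore needs no code of its own.
    runs = [cur - nxt - 1 for cur, nxt in zip(rev, rev[1:])]
    if rev:
        runs.append(rev[-1])

    return {
        "total_coeffs": len(nz),
        "trailing_ones": len(t1_signs),
        "trailing_one_signs": t1_signs,
        "levels": levels,
        "total_zeros": total_zeros,
        "run_before": runs,
    }
```

For the array `[0, 3, 0, 1, -1, -1, 0, 1]` followed by zeros, this yields 5 coefficients, 3 trailing ones with signs `+ - -`, remaining levels `1, 3`, and 3 zeros before the last coefficient.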
Acoustic Echo Cancellation
From Audio
Decoder
To Audio
Encoder
Acoustic Echo Cancellation
Acoustic Echo Cancellation Module
Adaptive Transversal Filter
FIR filter – inherently stable
Length of the filter affects convergence, accuracy, and complexity.
Filter introduces errors since it is trying to model an IIR response.
Short Filters
128 – 256 coefficients (taps)
Faster convergence, but final solution has more residual error
Less complex O(N).
Long Filters
512-1024
Slower convergence, but final solution has less error.
More complex, as algorithm can be O(N^2)
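A minimal sketch of an adaptive transversal (FIR) echo canceller is shown below. The slide does not say which adaptation rule is used; normalized LMS (NLMS) is assumed here because it is a common O(N)-per-sample choice. The function name, step size `mu` and regularizer `eps` are illustrative.

```python
def nlms_echo_canceller(far_end, mic, taps=128, mu=0.5, eps=1e-6):
    """Cancel the far-end echo in `mic` with an NLMS-adapted FIR filter.

    far_end: samples sent to the loudspeaker (from the audio decoder).
    mic:     microphone samples containing the echo.
    Returns (residual, weights); residual is what goes to the audio encoder.
    """
    w = [0.0] * taps          # adaptive filter coefficients (echo path model)
    buf = [0.0] * taps        # recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        buf = [x] + buf[:-1]
        echo_estimate = sum(wi * bi for wi, bi in zip(w, buf))
        e = d - echo_estimate                     # error = echo-free estimate
        norm = eps + sum(b * b for b in buf)      # input power normalization
        g = mu * e / norm
        w = [wi + g * bi for wi, bi in zip(w, buf)]
        residual.append(e)
    return residual, w
```

Each sample costs one pass over the N taps; a longer filter models a longer echo tail but takes more samples to converge, which is the trade-off listed above.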
Challenges
Dynamic range of the human ear = 120dB.
Even quiet echoes can be heard.
Longer delays from satellite (300-500ms), VoIP
Ear is more sensitive to longer delays.
More difficult to find the beginning of the echo.
Long filters (~1000 taps) are needed (complexity &
convergence)
Near-end noise: corrupts the echo, decreasing the canceller's ability to converge.
Acoustic echo paths can change rapidly
More difficult for the AEC to remain converged.
Nonlinear echo components
Speakers driven beyond linear region.
Network Component
IP-based VoIP / Video Conference
Internet Primer
Internet : Grand View
Impact on ISPs
Economics of ISP relationships
Sibling relationship: several ISPs belong to the same organization
Peering relationship: mutually beneficial agreement, free (to a certain extent)
Transit relationship: one ISP pays another
Inside ISP
ISP POP (Point of Presence)
Home Networking
Network Characteristics
Under-provisioned Links
Growth Trends
Packet Loss vs. Jitter (vs. Delay?)
The Usual Suspects
Packet Bursts
What kind of Enterprise User?
How QoS can help
QoS helps inside and between branches!
Observation
IP-based communication in the enterprise is growing
Empirical results show poor call quality for wireless and VPN users
QoS (DiffServ) is both used and useful!
Available Bandwidth Estimation
What is Available Bandwidth (ABW)?
ABW is the left-over capacity along an Internet
path
Why Is It Useful?
Maximizing QoE (Quality of Experience) in A/V
conferencing
Audio prefers minimum delay (high priority)
Video prefers maximum rate (low priority)
One Way Delay (OWD) = propagation delay (constant) + queuing delay (variable)
One solution: measure ABW, encode and send
video at the ABW rate
Typical Targeting Scenario
First hop is the bottleneck
Cable modem, DSL, high-speed link…
Timescale for the ABW estimation: 2-4 seconds
Why Is Measuring ABW Hard?
Available bandwidth changes over time
ABW measurements must be quick
Audio packets (along the same path) should
experience minimum delay
Measurement must be non-intrusive
Two Models
Probe Rate Model (PRM) based solutions
Pathload, TOPP, Pathchirp, Bfind, PTR …
Probe Gap Model (PGM) based solutions
Spruce, Delphi, IGI, Moseab …
Pathload (PRM) [Jain & Dovrolis]
Send probe trains at various rates
ABW is the probe rate at transition, where OWD is
increasing (queuing delay is observed)
Spruce (PGM) [Jacob et al.]
Send probe pairs/trains at rate Ri (Ri > A), measure the sending gaps and receiving gaps
Compute A directly
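The probe-gap computation can be sketched as follows. The slide only says that A is computed directly; the formula A = C · (1 − (Δout − Δin)/Δin), with C the bottleneck capacity assumed known in advance, is how I recall the Spruce estimator and should be treated as an assumption. The function names are hypothetical.

```python
def spruce_sample(capacity_bps, gap_in_s, gap_out_s):
    """One available-bandwidth sample from a single probe pair.

    gap_in_s:  spacing between the two probes at the sender.
    gap_out_s: spacing between the two probes at the receiver.
    The gap grows by the time the bottleneck spent serving cross traffic
    that arrived between the two probes.
    """
    cross_fraction = (gap_out_s - gap_in_s) / gap_in_s
    return max(0.0, capacity_bps * (1.0 - cross_fraction))


def spruce_estimate(capacity_bps, gaps):
    """Average the samples from several (gap_in, gap_out) probe pairs."""
    samples = [spruce_sample(capacity_bps, gi, go) for gi, go in gaps]
    return sum(samples) / len(samples)
```

For instance, on a 1 Mbps bottleneck a pair sent 12 ms apart and received 15 ms apart suggests that a quarter of the capacity was taken by cross traffic, leaving roughly 750 kbps available.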
Advantages/Disadvantages of the Approaches
Approach | Advantages | Disadvantages
PGM-based approaches | Fast estimation: can be done with a single probe | Assumptions are not easy to verify in practice
PRM-based approaches | No assumptions | Slow estimation: iterative probes
Forward Error Correction
Block Based Erasure Resilient Coding
Original data: 1 2 3 ... k (k messages)
ERC: 1 2 3 ... k k+1 ... n
[Figure: at a certain instance, some of the n coded blocks (marked X) are lost]
Some of the blocks may be lost in delivery. However, as long as there
are at least k blocks delivered, the original data can be reconstructed.
ERC in VoIP and Video Conferencing
VoIP
Mainly packet replication, due to small VoIP packet size
& low delay requirement
Video Conferencing
Packet loss protection (for I frame or P frame in HD)
Each frame is separated into k messages and protected by n-k additional messages. As long as no more than n-k messages are lost, the transmission succeeds
ERC Terms
Number of Original Blocks: k
Number of Coded Blocks: n
Rate of ERC: k/n
MDS: Maximum Distance Separable
Any k of the n coded blocks can recover the original
The theoretical optimal performance
Erasure Encoding: Mathematics
Original data: $x_1, x_2, \dots, x_k$
Coded data: $y_1, y_2, \dots, y_n$
$x_i$, $y_j$: vectors over a Galois field.
Example: ERC of 10MB
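Original data (10MB): $x_1, x_2, \dots, x_k$; k=10, GF($2^8$), each vector is 1MB.
Coded data (n=30): $y_1, y_2, \dots, y_n$
[Figure: a 30 × 10 generator matrix multiplies the 10 original vectors (1M symbols each) to produce 30 coded vectors]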
Erasure Decoding: Mathematics
Original data: $x_1, x_2, \dots, x_k$
Coded data: $y_1, y_2, \dots, y_n$
[Figure: the rows of the generator matrix that correspond to the available coded blocks are selected]
Original data can be recovered if the sub-generator matrix
has a full rank k.
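The encode/decode mathematics can be illustrated with a small MDS code. This sketch makes two simplifying assumptions that differ from the slides: it works over the prime field GF(257) rather than GF(2^8), so that plain modular arithmetic suffices, and it uses a non-systematic Vandermonde generator matrix. All names are hypothetical.

```python
P = 257  # prime modulus; an illustrative stand-in for GF(2^8)


def generator_row(j, k):
    """Row j of the Vandermonde generator matrix: powers of alpha = j + 1."""
    return [pow(j + 1, i, P) for i in range(k)]


def encode(blocks, n):
    """Encode k original blocks (lists of symbols in 0..256) into n coded blocks."""
    k, length = len(blocks), len(blocks[0])
    coded = []
    for j in range(n):
        row = generator_row(j, k)
        coded.append([sum(row[i] * blocks[i][s] for i in range(k)) % P
                      for s in range(length)])
    return coded


def solve_mod_p(matrix, rhs):
    """Gauss-Jordan elimination mod P; rhs holds one vector per row."""
    k = len(matrix)
    m = [list(matrix[r]) + list(rhs[r]) for r in range(k)]
    for col in range(k):
        pivot = next(r for r in range(col, k) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        inv = pow(m[col][col], P - 2, P)          # inverse via Fermat
        m[col] = [(v * inv) % P for v in m[col]]
        for r in range(k):
            if r != col and m[r][col]:
                f = m[r][col]
                m[r] = [(vr - f * vc) % P for vr, vc in zip(m[r], m[col])]
    return [row[k:] for row in m]


def decode(received, k):
    """Recover the k original blocks from any k received coded blocks.

    received: dict mapping coded-block index j to its block.
    """
    idx = sorted(received)[:k]
    sub_generator = [generator_row(j, k) for j in idx]   # full rank k
    return solve_mod_p(sub_generator, [received[j] for j in idx])
```

Because any k rows of a Vandermonde matrix with distinct evaluation points are linearly independent, every k-row sub-generator matrix has full rank, which is the MDS property. Note that coded symbols here can take the value 256, one reason practical systems prefer GF(2^8).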
Systematic vs Non-Systematic ERC
Original data: 1 2 3 ... k (k messages)
Non-systematic ERC: 1 2 3 ... k k+1 ... n
Systematic ERC: 1 2 3 ... k k+1 ... n (the first k coded blocks are the original messages)
Systematic ERC
Slightly lower encoding & decoding complexity
Even if the data cannot be fully recovered, the original messages that did arrive can still be used
Reed-Solomon
Has been around for decades
Has systematic form
Cauchy Reed-Solomon Code
Tutorial, Jin Li
Reed-Solomon Decoding
Inverse
Receive
Dejitter Buffer
Variable Delay & Dejitter Buffer
Queuing Queuing Queuing
Delay Delay Delay
Dejitter
Buffer
Queuing delay
Dejitter buffers
Variable packet sizes
Fixed Dejitter Buffer – Budget For Worst Case
Site A to Site B (128 kbps bandwidth)
Coder delay: 40 ms
Queuing delay: 4-50 ms
Propagation delay: 8 ms
Dejitter buffer: 50 ms
Total End-to-End Delay
Codec delay: 40ms
Propagation delay: 8ms
Dejitter buffer: 50ms
To accommodate queuing delay: 0-50 ms
Total delay: 98ms
Dejitter Buffer Size & Late Loss
late loss
buffering delay
Fixed playout deadline and jitter
Playout Jitter absorption:
The playout rate is constant
The tradeoff is between Dejitter
buffer size and late loss
Delay Packet Loss
Adaptive Playout and Dejitter Buffer Adaptation
Adaptive playout and jitter adaptation
Scaling of voice/video packets in a highly dynamic way
Playout schedule set according to past recorded delays
Usually the dejitter buffer expands quickly in response to late packet arrivals, and shrinks slowly when jitter reduces
Improved tradeoff between buffering delay and late loss
Playout rate is not constant
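The "expand quickly, shrink slowly" behaviour can be sketched with a small estimator. This is only an illustration of the idea: the class name, the safety margin and the shrink factor are assumptions, and a real system would also time-scale the audio when the target changes.

```python
class AdaptiveDejitterBuffer:
    """Track a target buffering delay from observed queuing delays."""

    def __init__(self, initial_ms=20.0, shrink=0.02, margin_ms=5.0):
        self.target_ms = initial_ms   # current buffering delay
        self.shrink = shrink          # fraction of the gap closed per packet
        self.margin_ms = margin_ms    # headroom above the observed delay

    def on_packet(self, queuing_delay_ms):
        """Update the target; return True if this packet arrived too late."""
        late = queuing_delay_ms > self.target_ms
        wanted = queuing_delay_ms + self.margin_ms
        if wanted > self.target_ms:
            # Expand immediately so the next delayed packets are not lost.
            self.target_ms = wanted
        else:
            # Shrink gradually to win back delay once jitter subsides.
            self.target_ms += self.shrink * (wanted - self.target_ms)
        return late
```

Compared with a fixed buffer budgeted for the worst case, the target here follows recent conditions, so delay is lower in calm periods at the cost of an occasional late loss when jitter rises suddenly.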
Adaptive Play Out
Audio Adaptive
Playout
Packets are pushed into the Adaptive Playout module
Renderer requests a new waveform segment for playout
Playout module passes packet to audio decoder
Packet Loss Concealment
Audio Packet Loss Concealment
[Figure: packets i-2, i-1, i (lost), i+1, i+2 on a time axis, before and after concealment; alignment found by correlation; segment lengths L, ∆L, 1.3 L, 2L]
Depends on voiced & unvoiced segments
Voiced segments
Unvoiced segments
Concealment as (bi-directional)
stretching
Video Packet Loss Concealment
Spatial Concealment
Use spatial correlation
E.g., bilinear interpolation
Projection onto convex sets
Temporal Concealment
Use the correlation that exists between consecutive frames
Temporal replacement
Boundary matching
Spatial-Temporal Concealment
Summary
VoIP/Video Conference Systems
Infrastructure based
P2P based
Audio/Video Components
Audio codec
Video codec
Acoustic echo cancellation
Network components
Primer of the Internet
Network characteristics
Available bandwidth estimation
Forward error correction (FEC)
Dejitter buffer
Packet loss concealment