Question(s): 17 Meeting, date: Luleå, March 2010
Study Group: 12 Working Party: 3 Intended type of document (R-C-TD): TD
Source: France Telecom
Title: Play-out buffer events as performance parameters
Contact: Jean-Raymond Louvion Tel: +33 2 9605 3707
Orange Labs Fax: +33 2 9605 1252
France Email: jeanraymond.louvion@orange-
Contact: Pierre Boyer Tel: +33 2 9605 2239
Orange Labs Fax: +33 2 9605 1252
France Email: pierre.boyer@orange-
This contribution addresses the impact of jitter-inducing networks (ATM, IP or Ethernet) on the
behaviour of real-time applications (VoIP, IPTV, VoD) implementing de-jittering (playout) buffers
in the reception side. It proposes to complement the delay-related performance parameters already
defined in Y.1540 and by the IETF with new ones based on events related to playout buffer behaviour.
ITU-T recommendation Y.1540 defines an end-to-end 2-point IP packet delay variation.
End-to-end 2-point IP packet delay variation (PDV) is defined based on the observations of
corresponding IP packet arrivals at ingress and egress MP (e.g., MPDST, MPSRC). These observations
characterize the variability in the pattern of IP packet arrival events at the egress MP and the pattern
of corresponding events at the ingress MP with respect to a reference delay.
The delay variation of an individual packet is naturally defined as the difference between the actual
delay experienced by that packet and a nominal (or reference) delay. The preferred reference (used
in Y.1541 IPDV objectives) is the minimum delay of the population of interest. This ensures that all
variations will be reported as positive values, and this simplifies reporting the range of variation
(the maximum value of variation is equal to the range).
The preferred method (used in Y.1541 objectives) for summarising the delay variation of a
population of interest is to select upper and lower quantiles of the delay variation distribution and
then measure the distance between those quantiles. For example, select the 1 − 10⁻³ quantile and the 0
quantile (or minimum), make measurements, and observe the difference between the delay variation
values at these two quantiles. This example would help application designers determine the de-jitter
buffer size for no more than 0.1% total buffer over-flow.
This parameter is referred to as the PDV (Packet Delay Variation) and is a 2-point metric.
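The PDV summary described above can be sketched as follows. This is only an illustration under stated assumptions: the function names (`pdv`, `quantile`, `pdv_range`) are hypothetical, the one-way delays are assumed to be already available from synchronized measurement points, and the quantile is taken as the simplest empirical (nearest-rank) estimate rather than any estimator prescribed by Y.1540 or Y.1541.

```python
import math


def pdv(delays):
    """Delay variation of each packet relative to the minimum delay."""
    d_min = min(delays)
    return [d - d_min for d in delays]


def quantile(values, q):
    """Nearest-rank empirical quantile (an assumed, simple estimator)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def pdv_range(delays, upper_q=1 - 1e-3, lower_q=0.0):
    """Distance between an upper and a lower quantile of the PDV distribution."""
    variation = pdv(delays)
    return quantile(variation, upper_q) - quantile(variation, lower_q)


# One-way delays in milliseconds (illustrative values only).
sample_delays = [10, 12, 11, 15]
sample_range = pdv_range(sample_delays)
```

Because the reference is the minimum delay, every value returned by `pdv` is non-negative and the lower quantile at 0 is always zero.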
In addition to this definition, IETF provides another definition: Inter-Packet Delay Variation, IPDV,
where the reference is the previous packet in the stream (according to sending sequence), and the
reference changes for each packet in the stream. This form was called Instantaneous Packet Delay
Variation in early IETF contributions, and is similar to the packet spacing difference metric used for
interarrival jitter calculations in [RFC3550].
Although it relies on the observation of the arrival times of consecutive packets, IPDV is a 2-point
metric since it is derived from delays between a source and a destination.
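As a minimal sketch of the IPDV definition, assuming that send and receive timestamps of the same packets are available in sending order (the function name `ipdv` and the sample values are illustrative, not taken from the IETF documents):

```python
def ipdv(send_times, recv_times):
    """Inter-packet delay variation: delay of packet i minus delay of packet i-1."""
    delays = [r - s for s, r in zip(send_times, recv_times)]
    return [delays[i] - delays[i - 1] for i in range(1, len(delays))]


# Timestamps in milliseconds (illustrative values only).
sent = [0, 20, 40, 60]
received = [5, 27, 44, 66]
variation = ipdv(sent, received)
```

Unlike PDV, the values can be negative, since the reference delay changes with every packet.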
RFC 5481 makes a very complete and exhaustive comparison of these two definitions in their
capability of being used for:
•Inferring queue occupation on a path,
•Determining de-jitter buffer size,
•Composing values obtained on different sub-paths in order to derive values for the entire path,
•Designing application-layer FEC.
As noted above, both IPDV and PDV are 2-point metrics, i.e. they are obtained from the comparison
of delays between two measurement points (e.g. a source and a destination). The clocks of these
measurement points must be accurately aligned through clock synchronization. Several options are
available (GPS, CDMA or NTP), which provide a relative accuracy of the order of 1 ms.
Clock synchronization may be inconvenient or subject to appreciable errors. Round-trip
measurements may give a cumulative indication of the delay variation present in both directions of
the path. But this solution is not satisfactory because delay distributions are rarely symmetrical, so it
is difficult to infer much about the one-way delay variation from round-trip measurements.
This contribution proposes and describes new metrics based on events related to playout buffer
behaviour (dry-out and overflow). These are 1-point metrics, which avoids the clock
synchronisation problem described above. In addition, these metrics are relevant to the needs of
real-time applications (VoIP, IPTV, VoD), since they mimic the behaviour of the playout buffers used
in these applications.
2. Dimensioning Playout Buffers (PoBs)
Real-time sources generate streams of audio signal or video images that are transmitted over packet-
based networks as IP packets or Ethernet frames.
In order to smooth out the natural effect of asynchronous networks (the so-called delay variation or
jitter), the arriving packets are temporarily stored in a PoB, i.e. a device designed to counter
the jitter introduced by the network, until the moment the audio signal or the image needs to
be delivered to the decoding scheme. In order to ensure continuous playout of streaming audio or
video, it is important to tune the PoB parameters such that, at the moment an audio sample or an image
is to be played out, all packets of that sample or image reside in the buffer.
The implementation of a playout scheme for streaming media over a packetized network (IP or
Ethernet) usually involves delaying the first packet of the stream at the PoB in the subscriber’s
set-top box (STB) for a sufficiently long period of time - the build-up time - so that the majority of
the packet delays incurred in the network can be absorbed. Adaptive playout strategies, such as
adaptation of the playout rate when the buffer is almost empty in an effort to avoid PoB starvation,
are often included.
Packet loss at the PoB originates from two different types of events: on the one hand there may be
overflow, which is due to having a full PoB upon arrival of a new packet, and on the other hand a
packet will be lost because of underflow when it arrives in the PoB after its designated playout time.
B. Steyaert et al. have studied and modelled the probabilities of these events, which should not
exceed some target levels. Dimensioning the PoB amounts to expressing the required values of the
build-up time and the buffer size in terms of these target levels.
In the ATM context, similar work was done based on the 2-pt CDV (Cell Delay Variation),
which was used to dimension the build-up time. It was shown that the build-up
time could be estimated by a given quantile of the 2-pt CDV distribution.
1-pt delay variation may also be used to dimension the buffer size for applications
that send a periodic flow (which is the case for most real-time applications). Even for variable
bitrate applications, the output rate of the buffer should be adapted to the quantity of data arriving
in the buffer (2 thresholds, EWMA,…).
With PDV, it is difficult to analyse the effects of delay variation on PoB behaviour and to take into
account network discrepancies (loss, reordering, path changes,…).
3. New metrics: PoB events
In terms of performance characterization, a Playout Buffer may be represented by 3 parameters: its
buffer size, the initial delay and the service rate.
The initial delay δ, also called build-up time, is the delay introduced by the buffer to play out the
first packet, in order to minimize the risk that the buffer would dry out.
The service rate T may be known by the destination application but it is usually not known inside
the network. Therefore this metric may be applied to packet flows containing a service clock. Well-
known examples are RTP flows and MPEG video flows.
3.1. IP packets containing a clock
In RTP flows, IP packets contain a 4-byte timestamp which enables the receiver to play
back the received samples at appropriate intervals. In addition this feature generalizes the use of this
protocol to variable bitrate flows.
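A possible way of reading this timestamp is sketched below. It assumes the 12-byte fixed RTP header layout of RFC 3550 (version and flags, marker and payload type, 16-bit sequence number, 32-bit timestamp, 32-bit SSRC) and that the UDP payload has already been isolated; the function name `rtp_timestamp` is hypothetical. Converting the timestamp to seconds requires the clock rate of the payload type, which is not carried in the header and is therefore not attempted here.

```python
import struct

RTP_FIXED_HEADER_LEN = 12  # bytes, per the RFC 3550 fixed header


def rtp_timestamp(packet: bytes) -> int:
    """Return the 32-bit RTP timestamp from the fixed header."""
    if len(packet) < RTP_FIXED_HEADER_LEN:
        raise ValueError("packet shorter than the RTP fixed header")
    first, _marker_pt, _seq, timestamp, _ssrc = struct.unpack(
        "!BBHII", packet[:RTP_FIXED_HEADER_LEN]
    )
    if first >> 6 != 2:  # the two most significant bits carry the version
        raise ValueError("not an RTP version 2 packet")
    return timestamp


# Illustrative header: version 2, payload type 96, sequence 1, timestamp 90000.
example = struct.pack("!BBHII", 0x80, 96, 1, 90000, 0x1234) + b"\x00" * 4
example_ts = rtp_timestamp(example)
```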
MPEG2-TS flows are organized in streams related to the different components of the audio-visual
application (video, audio,…). Each stream is packetized into PES (Packetized Elementary Stream)
packets, which are in turn carried in 188-byte transport stream (TS) packets. These TS packets are
then assembled in groups of 7 in the 1316-byte payload of an IP packet (1316 = 7 * 188).
TS packets may contain a Program Clock Reference (PCR) enabling the decoder to present
synchronized content, such as audio tracks matching the associated video. Usually the PCR is
embedded in the TS packets carrying the video.
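The extraction of the PCR from a 1316-byte payload can be sketched as follows. This is a simplified illustration assuming the H.222.0 layout of a 188-byte packet (sync byte 0x47, adaptation field control bits in the fourth byte, PCR flag in the adaptation field, 33-bit base and 9-bit extension); the function names are hypothetical, and PID filtering, PCR discontinuities and wrap-around are deliberately ignored.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


def extract_pcr(ts_packet: bytes):
    """Return the PCR in 27 MHz ticks, or None if the packet carries none."""
    if len(ts_packet) != TS_PACKET_SIZE or ts_packet[0] != SYNC_BYTE:
        raise ValueError("not a 188-byte packet starting with the sync byte")
    if not ts_packet[3] & 0x20:           # no adaptation field present
        return None
    adaptation_length = ts_packet[4]
    if adaptation_length < 7 or not ts_packet[5] & 0x10:  # PCR flag not set
        return None
    b = ts_packet[6:12]
    base = (b[0] << 25) | (b[1] << 17) | (b[2] << 9) | (b[3] << 1) | (b[4] >> 7)
    extension = ((b[4] & 0x01) << 8) | b[5]
    return base * 300 + extension         # base counts 90 kHz, extension 27 MHz


def pcrs_in_payload(payload: bytes):
    """Collect the PCR values found in the 188-byte packets of one IP payload."""
    found = []
    for offset in range(0, len(payload) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        pcr = extract_pcr(payload[offset:offset + TS_PACKET_SIZE])
        if pcr is not None:
            found.append(pcr)
    return found
```

A caller would typically pass the UDP payload (after any RTP header has been removed) to `pcrs_in_payload` and associate the result with the arrival timestamp of the IP packet.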
3.2. PoB events
H.222.0 (Information technology – Generic coding of moving pictures and associated audio
information: Systems) contains much material which can be useful in defining the PoB behaviour.
In particular, Annex D/H.222.0 (Systems timing model and application implications of this
Recommendation) gives some suggestions for implementing decoder systems to suit some typical
applications. It makes use of the clock reference timestamps, which are samples of the system time
clock and are applicable both to a decoder and to an encoder. They have a resolution of one part in
27 000 000 per second. As such, they can be utilized to implement clock reconstruction control loops
in decoders with sufficient accuracy for all identified applications.
In practice a decoder's free-running system clock frequency will not match the encoder's system
clock frequency which is sampled and indicated in the PCR values. The decoder's system time
clock can be made to slave its timing to the encoder using the received PCRs. The prototypical
method of slaving the decoder's clock to the received data stream is via a phase-locked loop (PLL).
This may be used in order to derive a generic PoB behaviour.
Annex J/H.222.0 (Interfacing jitter-inducing networks to MPEG-2 decoders) provides guidance and
insight to entities concerned with sending system streams over jitter-inducing networks.
Annex Q/H.222.0 (T-STD and P-STD buffer models for ISO/IEC 13818-7 ADTS) defines a buffer
model for audio streams which may be used for modeling PoBs.
The PoB may be modelled in order to determine the occurrence of a degraded situation. In PoB
behaviour, two undesirable situations may occur: the FIFO may dry out or the FIFO may
overflow. Both situations lead to QoS degradations.
In such a model, for each IP packet, the clock reference PCR may be extracted from one of the
188-byte packets contained in the IP packet (in case of MPEG2-TS) or the clock reference may be
extracted from the RTP packet embedded in the IP packet.
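One possible reading of such a model is sketched below. It is an assumption-laden illustration, not the model of H.222.0 or of any particular decoder: the first packet is played out after the build-up time, every later packet is scheduled according to its PCR, a negative waiting time is counted as a dry-out, and a waiting time above build-up time plus tolerance is counted as an overflow. The names (`PobEvents`, `emulate_pob`) and the 3 ms defaults are illustrative; clock recovery (PLL), PCR wrap-around and re-anchoring after an event are not modelled.

```python
from dataclasses import dataclass, field

PCR_TICKS_PER_US = 27  # the PCR counts a 27 MHz clock


@dataclass
class PobEvents:
    waiting_us: list = field(default_factory=list)
    dry_out_times_us: list = field(default_factory=list)
    overflow_times_us: list = field(default_factory=list)


def emulate_pob(arrivals, build_up_us=3000, tolerance_us=3000):
    """arrivals: iterable of (arrival_time_us, pcr_ticks) in arrival order."""
    events = PobEvents()
    t0 = pcr0 = None
    for arrival_us, pcr in arrivals:
        if t0 is None:
            t0, pcr0 = arrival_us, pcr  # the first packet anchors the playout clock
        playout_us = t0 + build_up_us + (pcr - pcr0) / PCR_TICKS_PER_US
        waiting = playout_us - arrival_us
        if waiting < 0:
            events.dry_out_times_us.append(arrival_us)    # arrived after playout time
        elif waiting > build_up_us + tolerance_us:
            events.overflow_times_us.append(arrival_us)   # arrived too early to store
        else:
            events.waiting_us.append(waiting)
    return events


# Nominal spacing of 10 ms (270 000 ticks); third packet late, fourth early.
trace = [(0, 0), (10000, 270000), (24000, 540000), (26000, 810000), (40000, 1080000)]
result = emulate_pob(trace)
```

Only arrival times at a single measurement point and the clock carried in the flow are needed, which is what makes the resulting events 1-point observations.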
4. Operational use of PoB events
4.1. Where should these measurements be performed?
These measurements could be performed at the source, i.e. at the output of a network head end, just
before the flow enters the IP network, before and/or after FEC capabilities (if any). Indeed an
operator should be in a position to check flows coming from service providers.
They could also be performed at network outputs (at DSLAMs or at routers connecting DSLAMs).
Indeed an operator should be in a position to check flows coming out of its own network and verify
that they conform to QoS expectations.
They could also be performed at INIs (Inter Network Interfaces), i.e. at interfaces where different
network operators interconnect. Indeed each operator should be in a position to verify that a flow
coming from another operator conforms to QoS expectations.
Different statistics may be associated with PoB events, for example:
•The number of PCR packets finding a saturated FIFO
•The time when the first FIFO saturation is observed
•The number of PCR packets finding an empty FIFO
•The time when the first FIFO dry-out is observed
Other statistics are being defined in the Broadband Forum.
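The statistics listed above can be derived from a list of event times, as in the following sketch. The function name `event_statistics` and the dictionary keys are hypothetical; the same function would be applied once to the saturation events and once to the dry-out events.

```python
def event_statistics(event_times, start_time=0):
    """Count, time of first occurrence and inter-event times for one event type."""
    ordered = sorted(event_times)
    return {
        "count": len(ordered),
        # Time of the first event, measured from the start of the observation.
        "first_time": ordered[0] - start_time if ordered else None,
        # Spacing between consecutive events of the same type.
        "inter_event_times": [b - a for a, b in zip(ordered, ordered[1:])],
    }


# Event times in seconds (illustrative values only).
stats = event_statistics([30, 12, 20], start_time=2)
```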
The following measurements have been obtained with a probe, named Amelie, designed by Orange
Labs in Lannion in 2005.
Amelie is a 2/4-port passive probe measuring the traffic passing on a 10/100/1000 Base-TX
Ethernet copper or optical interface and it complies with the 802.1Q (a.k.a. "VLAN tagging")
standard. Amelie can be connected directly to a network equipment port mirroring an ad hoc
selection of frames to be analyzed; alternatively, Amelie can receive the whole traffic carried by a
network link via an optical splitter —this latter option being preferred for traffic rate and jitter
measurements. Amelie is hosted by a 64-bit workstation supporting dual processors. The operating
system is based on a v2.6 Linux kernel. Amelie basically runs an "on-the-fly" time-stamping
process on every incoming Ethernet frame.
Amelie emulates the behaviour of the decoder in the Set Top Box (STB) when it smooths jitter out
of the digital TV flow. Figure 1 below shows the empirical waiting time distribution of the
MPEG2/4-TS packets in the end FIFO. Furthermore, Amelie studies the occurrence of FIFO
underflow and overflow by processing the successive program clock references (PCR) embedded in
the video elementary stream. It gives the time of first passage into the underflow and overflow
states and the subsequent inter-arrival times of such unexpected events.
End FIFO underflow and overflow induce image freezing and artefacts experienced by the client
and must therefore be counted against the network performance.
The end FIFO behaviour assumes that values have been set for a build-up time and an MPEG
packet delay variation tolerance. Default values are 3 ms for both. The digital TV flow bit rate is
derived from the program clock references; thus, it can be either constant or variable and does not
need to be specified.
Program clock references are computed by the encoder in a recursion loop involving successively
every preceding packet. As seen above, Amelie has no reliable information on which packets have
been lost in the network so that it cannot alter the program clock references to take loss into
account. Therefore, the queuing process in the end FIFO is only due to MPEG packet propagation
delay variations throughout the network.
Figure 1 shows an end FIFO waiting time empirical distribution for an SD TV digital flow. It can be
seen that the distribution spreads over [0, 6 ms], which is the full
range of possible values set by the operator.
Figure 1 - End FIFO waiting time empirical distribution for an SD TV digital flow
In this measurement, 2 728 731 PCR packets have been observed. Less than 1% (0.78%) of PCR
packets experience a waiting time in the FIFO lower than 537.9 µs and less than 1% (0.78%) of
PCR packets experience a waiting time in the FIFO greater than 5 828 µs.
No program clock discontinuity has been observed.
The number of PCR packets finding a saturated FIFO is equal to 961. The time when the first FIFO
saturation is observed is equal to 17 442 s.
The number of PCR packets finding an empty FIFO is equal to 2. The time when the first FIFO dry-
out is observed is equal to 55 981 s.
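To put these counts in proportion, the event ratios per observed PCR packet can be computed directly from the figures reported above. The snippet below is plain arithmetic on those figures; the variable names are illustrative.

```python
pcr_packets = 2_728_731   # PCR packets observed in the measurement
saturated = 961           # PCR packets finding a saturated FIFO
dry_outs = 2              # PCR packets finding an empty FIFO

saturation_ratio = saturated / pcr_packets
dry_out_ratio = dry_outs / pcr_packets

print(f"saturation ratio: {saturation_ratio:.3e}")
print(f"dry-out ratio:    {dry_out_ratio:.3e}")
```

Saturation thus affects roughly 0.035% of the PCR packets, and dry-out fewer than one PCR packet in a million.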
In addition to these measurements, it may be interesting to examine the discrepancies between the
respective densities of the Delay Factor and of the end FIFO waiting time. Indeed, the Delay Factor
does not take into account the behaviour of the specific algorithm (PLL: Phase-Locked Loop) which is
implemented in the decoder to slave its timing to the encoder. By contrast, the emulation of
the end FIFO addresses the case when the operational limits of this algorithm are met in the Set Top
Box due to clock inaccuracy or a shortage of network bandwidth.
ITU-T recommendation I.356 (2000), B-ISDN ATM layer cell transfer performance.
ITU-T recommendation Y.1540 (2007), Internet protocol data communication service – IP packet
transfer and availability performance parameters.
ITU-T recommendation H.222.0 (2006), Information technology – Generic coding of moving
pictures and associated audio information: Systems.
IETF RFC 5481 (March 2009), A. Morton, B. Claise, Packet delay variation applicability
B. Steyaert, K. Laevens, D. De Vleeschauwer, H. Bruneel, Analysis and design of a playout buffer
for VBR streaming video, Ann Oper Res (2008) n° 162.