ARTICLE IN PRESS
Signal Processing: Image Communication 22 (2007) 69–85
Accurate packet-by-packet measurement and analysis of video
streams across an Internet tight link
M. Paredes Farrera, M. Fleury*, M. Ghanbari
Electronic Systems Engineering Department, University of Essex, Wivenhoe Park, Colchester CO4 3SQ, UK
Received 15 March 2006; received in revised form 3 November 2006; accepted 14 November 2006
The response to a video stream across an Internet end-to-end path particularly depends on the performance at the path's tight link, which can be examined in a simple network testbed. A packet-by-packet (PbP) measurement methodology applied to tight link analysis requires a real-time operating system to gain the desired timing resolution during traffic generation experiments. If, as is common for other purposes, the analysis were simply in terms of average packet rate per second, no burst pattern would be apparent, and without packet-level measurement of instantaneous bandwidth the differing overheads would not be apparent. An illustrative case study, based upon the H.263+ video codec, confirms the advantage of the PbP methodology in determining received video characteristics according to packetization scheme, inter-packet gap, router response, and background traffic. Tests show that routers become unreliable if the packet arrival rate passes a critical threshold, one consequence of which is that reported router processor load also becomes unreliable. Video stream application programmers should take steps to reduce packet rates, and aggregate packet rates may be reduced through network management. In the case study, a burst of just nine packets increased the probability of packet loss, while the video quality could be improved by packing at least two slices into a packet. The paper demonstrates that an appropriate packetization scheme has an important role in ensuring received video quality, but a physical testbed and a precise measurement methodology are needed to identify that scheme.
© 2006 Elsevier B.V. All rights reserved.

Keywords: Video streaming; Packet-by-packet analysis; Router response
1. Introduction

The most recent Sprint IP backbone survey reported that 60% of traffic on some links is now generated by streaming or file sharing applications, as opposed to 30% by Web traffic. Current video applications include streaming of pre-encoded video, the exchange of personal video clips (peer-to-peer streaming), and the delivery of sports and news clips (possibly involving real-time (RT) generation of video). The Advanced Network and Services Surveyor Project monitors the large-scale behavior of the Internet to determine its characteristics. A key finding of this and other surveys is that most of the Internet core is relatively lightly loaded, with hotspots at intersections between

*Corresponding author. Tel.: +44 1026 872817; fax: +44 1026 872900.
E-mail addresses: firstname.lastname@example.org (M. Paredes Farrera), fleum@essex.ac.uk (M. Fleury), email@example.com (M. Ghanbari).
0923-5965/$ - see front matter © 2006 Elsevier B.V. All rights reserved.
networks and at access points. This should not be surprising, as over-provisioning on packet networks is a common way to protect against the failure of network elements and support traffic growth. Therefore, finding likely causes of video quality degradation needs a careful examination of video stream behavior at the tight or bottleneck link,¹ which normally occurs at network boundaries.

In this study, IP video traffic is measured in isolation on a network testbed, with competing Internet traffic being represented as generated background traffic across a critical bottleneck or tight link. A tight link is the link with the least available bandwidth on an entire end-to-end path. In a study to test the ability of the STAB bandwidth probing tool to locate thin links,² the link most likely to be tight in terms of available bandwidth across a 15/16 hop Internet path was found to be located close to the edge of the end-to-end path, which may well be a general conclusion. Modeling a tight link in a network testbed says nothing about overall delay or variation of delay across an entire path. Nor does the model necessarily represent realistic background traffic. Measurements taken directly from the Internet are needed for this purpose, especially if unusual traffic events are of interest. However, the testbed approach is designed to stress the video stream as it passes through typical routers working at the limits of their performance range. The intention is to identify potential problems that a video stream will encounter, with a view to guiding the design of a streaming application. Many simulation studies of video stream congestion control, e.g. [22,26], use a similar simple network topology to the one in this study. Consequently, the testbed has also been used as a means of calibrating network simulators, although this topic is not pursued in this paper.

As video and audio packets are often closely spaced, loss correlation is a more serious problem for video streaming than for other applications, for which losses may appear as essentially random. Video inter-packet spacing in time is typically closer than audio (averaging around 20 ms) and, unlike audio, video packets vary in length. For example, for a 30 frame/s stream, each frame must be delivered every 33 ms, and if individual Common Intermediate Format (CIF)-sized pictures are broken into 18 variable-sized slices, with one slice per IP packet, then the inter-packet gap (IPG) is 1.9 ms if packets are generated at equal intervals. Video bandwidths can clearly be much higher than other flows, which is a problem if the video traffic takes up a sizeable proportion of the bandwidth across a tight link. Video delivery should avoid regimes that result in significant packet losses at the router queues. However, some observers note that streaming applications often send a bursty stream, either for reasons of coding efficiency or when a non-RT operating system (o.s.) falls behind its schedule and releases a packet burst. This paper assumes that a bursty stream is present at the tight link, possibly resulting from one session amongst a series of parallel sessions generated by a server behind a fast link. Other sessions may not generate bursty traffic, and burstiness may be reduced if the server lies behind a slow link. In most reasonable o.s./driver implementations, the driver is relatively immune from scheduling, implying that, if application scheduling is not applied, coding efficiency is a cause of packet bursts.

Choice of metric is an important issue in video streaming research. For example, a standardized congestion control unit and a standardized set of reported metrics have been proposed. In particular, per-packet or instantaneous bandwidth is carefully defined there, as the author considers that "per-packet bandwidth reporting is the most appropriate for adaptive streaming applications", because of its responsiveness to changes in available bandwidth. That work is of a theoretical nature, whereas the measurement methodology developed on this paper's network testbed could be transferred directly to a congestion control unit. TCP (not generally used for video streaming) adjusts its window size according to the packet loss rate (from dropped acknowledgements) and round-trip time (from packet timers), and it is likely that higher-end routers adopt a similar strategy for queue management. The Cisco 7500 series invokes input buffering upon finding the output queue is congested. The lower-end Cisco router employed in this paper allows the value of metrics such as packet loss rate and CPU usage to be reported back to the user, and we have taken advantage of that in the experiments. In active queue management systems other metrics are possible, such as TCP goodput, TCP and User Datagram Protocol

¹Strictly, a bottleneck link also includes the possibility of a narrow link, one with minimum capacity on a network path. The link with minimum capacity is not always the same as the link with least available bandwidth on a network path.
²A thin link has less available bandwidth than all those preceding it on the path.
(UDP) loss rate, queueing delay and consecutive losses.

In summary, the main objective of this paper is to provide a measurement and analysis methodology that will aid the design of video streaming applications. If analysis is only in terms of average packet rate per second, as might be used for network dimensioning and similar purposes, no burst pattern would be apparent, and without packet-level measurement of instantaneous bandwidth the varying overheads would not be visible. In particular, the methodology is applied to the study of bottleneck links, and a case study on packetization schemes for an H.263+ encoded video stream demonstrates the value of the approach. The PSNR of the delivered video stream is significantly improved if an appropriate packetization scheme is selected.

The remainder of this paper is organized as follows. Section 2 details the video streaming network testbed, the software tools employed, and the measurement methodology. Section 3 illustrates the need for a testbed by examining router response at a tight link. Section 4 applies a video stream across the congested tight link and identifies the role of appropriate packetization in improving the delivered video quality, given the likely router response. Finally, Section 5 presents some conclusions.

2. Traffic measurement methodology

2.1. Network testbed configuration

Fig. 1 shows the simple network testbed employed in the experiments. Clearly, the bottleneck link is located in the 2 Mb/s serial link between the two Cisco 2600 routers. Otherwise, 100 Mb/s fast Ethernet links connect the testbed components to ensure no other source of congestion. The sender machine hosts the traffic generator stg (Section 2.3), while the Linux router monitors and stores traffic data flowing onto the bottleneck link with tcpdump (Section 2.2). Likewise, the receiver monitors and stores traffic data arriving at the receiver. To aid replication of the setup, the configuration details are given in Table 1. Small-sized output queues are employed at routers to avoid delay to TCP packets, as TCP (of course, the dominant Internet protocol) relies on round-trip times to govern flow control. Although the video stream generated in Section 4 is carried by UDP, the default buffer size settings of the Cisco router were initially retained, as these would be the likely sizes in a realistic Internet environment. In the interests of accurate scheduling of packets in time, the Linux sender o.s. kernel is run with the KURT RT patch, as further discussed in Section 2.3. Network planners commonly recommend to clients a
[Figure 1 topology: Linux Sender - 100 Mbit/s - Linux Router - 100 Mbit/s - Cisco Router_A - 2 Mbit/s - Cisco Router_B - 100 Mbit/s - Linux Receiver]
Fig. 1. Simple network testbed employed to model the effect of a tight link on an Internet path with Cisco routers.
Table 1
Network component settings

Linux machines:
  CPU: Pentium-IV 1.7 GHz
  NIC: Intel Pro 10/100
  Queue policy: Fast FIFO
  Queue length (QL): 100
  MTU: 1500
  OS: Linux kernel v. 2.4.9

Routers:
  Model: Cisco 2600
  Software: Version 12.2(13a) of Cisco IOS
  Queue policy: FIFO
  Queue length (QL): in 75, out 40
  MTU: 1500
T1 or E1 link with bandwidth of, respectively, 1.544 or 2.048 Mb/s between a LAN and the border gateway (or a satellite link, although with greater latency). Cost is also a significant consideration in the selection of a Cisco 2600 series router, and, hence, the same router is commonly found at the LAN edge in network plans. A Cisco 2600 series router has a Motorola MPC860 40 MHz CPU with a 20 MHz internal bus clock and a default 4 MB DRAM packet memory.

2.2. Traffic monitoring

The monitoring tool employed was the well-known software utility tcpdump, layered on the libpcap monitoring library, which runs on an Ethernet interface set to promiscuous mode. Monitoring points were set up on the three Linux PCs in the testbed. The tcpdump program may in some circumstances present 'bugs' and timing errors that will affect the accuracy (nearness to the true value) and precision (consistency of measurement) of timestamps. Example measures taken to avoid errors were:

- Placing tcpdump on a separate Linux router rather than the Linux sender, to avoid CPU overload of the sender machine, which would result in packet drops by the monitor process.
- Only taking relative time measurements, thus avoiding the need to synchronize clocks.
- Not using a high-speed link, which otherwise can also lead to timestamp repetitions.
- Monitoring the CPU load (Pentium-IV in Table 1) to avoid packet drops while monitoring with tcpdump.
- Making a sanity check to ensure that all packets sent could be accounted for.

Measurement errors under Linux can still occur if the time resolution is too brief. In order to establish confidence in the accuracy and precision of any timestamps, tests were carried out at the monitoring points to find the time range that curtailed errors in the measurements. Based on test results, we found that a safe range for the experiments was for time values greater than 90-100 μs. A DAG³ card with a Global Positioning System (GPS) module to create timestamps is an alternative solution that avoids tcpdump's vagaries. However, a DAG card may not be a cost-effective solution if used as part of a congestion control unit.

2.3. Traffic generator

In this paper's methodology, video and audio traffic patterns are classified into two types, constant bit-rate (CBR) and variable bit-rate (VBR), with CBR traffic being defined as "a traffic pattern with a steady bit rate during a given time interval" and VBR as "a traffic pattern with a changing bit rate during a given time interval". For the experiments, the traffic generator components were stg and rtg, respectively, for sending and receiving traffic. Both are part of the NCTUns network simulator package. NCTUns is a simulator that employs protocol stack re-entry to allow an application to be accurately emulated. As the TCP/IP protocol stack is directly incorporated into NCTUns, stg and rtg are easily transferred to work in a real network environment rather than within a simulator. The generator was modified⁴ to work on a normal Linux system (as NCTUns originally ran on the OpenBSD o.s.). One can create packet-by-packet (PbP) traffic patterns under UDP by establishing the behavior of the packet length (PL) and IPG in every packet through an input trace file.

A fundamental requirement of PbP analysis is the ability to create extremely precise and predictable traffic patterns. Generating packets with hard deadlines requires an RT o.s. Accordingly, the Linux kernel on the testbed machines was patched with the Kansas University RT (KURT) kernel. The KURT kernel modification allows event scheduling with a resolution of tens of microseconds. KURT decreases kernel latency by running Linux itself as a background process controlled by a small RT executive (RTE). The desired accuracy was obtained by running stg as a normal Linux process rather than a specifically RT process under the control of the RTE.

Precise event scheduling was established in order to perform reliable experiments. In live applications, the PIAT may vary from the desired value because of application-level scheduling; prior network jitter on previous links; and smoothing by decoder buffers. The experiments represent a ground truth, without these effects included.

³DAG is not an acronym.
⁴The modified version can be downloaded from: http://privatewww.essex.ac.uk/~mpared/perf-tools/srtg.tgz
2.4. Traffic metrics

For the PbP video experiments, the three metrics selected were: PL, Packet Inter-Arrival Time (PIAT),⁵ and Packet Throughput (PT). Surprisingly, PL is not widely utilized as a metric in measurement studies and traffic analysis, although this metric provides an insight into common application traffic patterns. In an encoded video streaming session, the PL varies depending on the packetization scheme employed and the headers added by the protocol used to transmit the video session. Apart from IP and UDP headers, a Real-time Transport Protocol (RTP) or equivalent header is added at the application layer. The PIAT is another important performance metric when observing packet spacing during a video streaming session. The PIAT metric is one of the most sensitive to network condition changes: transmission delays, queuing delays, packet loss, packets routed by different routes, fragmentation, and other hardware and software processes involved during packet transfer. Hence, it is not common to observe regular patterns for this metric. Finally, as mentioned in Section 1, it is important to define PT carefully. It represents the throughput arising from one packet. For application-level studies the PT affects a router's response and, hence, is more relevant than the available bandwidth. The PT was calculated in the following way. If a pair of packets is observed, the first packet's length is divided by the time difference between the second and first packet, i.e. by the 'PIAT'. This can be expressed for packet number n at arrival time t_n by Eq. (1):

PT_n = PL_n / (t_{n+1} - t_n) = PL_n / PIAT_n,    (1)

in which the t_i, i = 1, 2, ..., n are arrival times at the receiver. Again, although used in Paxson's well-known tcptrace tool, this metric is otherwise not common in traffic analysis.

2.5. Analysis tool

In order to analyze the tcpdump tracefiles, a specially designed tool called tcpflw was prepared. Tcpflw is actually applicable to UDP and not simply TCP. Tcpflw categorizes traffic based on its IP flow characteristics, as recommended by the Internet Engineering Task Force (IETF). An IP flow is defined as a group of packets that share some or all of the following characteristics: source and destination address; source and destination ports; and protocol. Tcpflw can read tcpdump and ns-2 tracefiles. Every flow is visualized linearly (by time) and by a frequency histogram. Second-order statistics are also obtained for every metric.

3. Testbed characterization

Two scenarios established, firstly, the accuracy that the traffic generator was capable of when generating packets and, secondly, the response to traffic of the Cisco routers on either side of the bottleneck link.

3.1. Traffic generator accuracy

The stg traffic generator was operated under UDP, and in tracefile mode. For this test, the two Cisco routers and the serial link of Fig. 1 were removed, so that the Linux sender was connected to the Linux router, which in turn was connected to the Linux receiver over the 100 Mb/s link. PIAT measurements were compared on a normal Linux kernel and then on the KURT-patched Linux kernel, becoming an RT kernel. The traffic pattern was CBR. The PL was fixed at 60 byte (B). The traffic generator generated streams of 2-min duration (a stream per data point in Fig. 2), with the source PIAT varying from 1 × 10^-4 to 1 × 10^-1 s. Therefore, each of the streams resulted in a minimum of 1200 packets transmitted. (Note also that the estimated plot in Fig. 2 is the ideal measured PIAT.) From Fig. 2, observe that the PIAT measured for a normal kernel is a constant value of 0.02 s for any value fed into the traffic generator less than 0.02 s. This implies that even if the traffic generator is instructed to deliver packets with (say) a 0.01 s PIAT, it will only be able to send packets at 0.02 s. The RT kernel improves the accuracy and stability of generated UDP CBR traffic. Detailed analysis shows that, over acceptable PIAT measurements, the error ranged from 0.15% to 13% for the RT kernel, while for equivalent measurements with the normal kernel the error is considerably greater. Hence, the RT kernel was employed for the video experiments. Further detailed analysis of the behavior of the RT kernel and stg can be found elsewhere.

⁵Note that, as elsewhere in the literature, PIAT refers to the desired PIAT as generated at the video packet source, and is synonymous with IPG.
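The normal-kernel plateau that Fig. 2 shows can be captured by a deliberately crude model, assumed here purely for illustration: requested spacings below the kernel's effective scheduling floor (0.02 s, as measured above) are simply not achievable.

```python
# Toy model of Fig. 2's normal-kernel behavior: the sender cannot
# space packets more closely than the kernel's floor (~0.02 s here),
# so any smaller requested PIAT collapses onto that constant value.

def achieved_piat(requested_s, floor_s=0.02):
    return max(requested_s, floor_s)

for req in (1e-4, 1e-3, 1e-2, 2e-2, 5e-2, 1e-1):
    print(req, achieved_piat(req))
```

The RT kernel's effect, in these terms, is to push the floor down to the tens-of-microseconds range.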
Fig. 2. PIAT replication by the traffic generator using a normal and RT kernel for measured (m) and estimated (e) PIAT, with limits to resolution of each kernel indicated.
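A useful companion to the router-response experiments of Section 3.2: for a fixed bit rate, packet rate scales inversely with PL, so the smallest frame sizes impose the highest packet rates. A quick sketch:

```python
# For a fixed bit rate, the packet rate a router must process is
# inversely proportional to packet length, which is why the larger
# frame sizes stress the router least.

def packet_rate(bit_rate_bps, pl_bytes):
    """Packets per second needed to carry bit_rate_bps with PL-byte packets."""
    return bit_rate_bps / (8 * pl_bytes)

# Saturating the 2 Mb/s serial link with the frame sizes of Section 3.2;
# only the smallest PL approaches the ~4000 packet/s instability region.
for pl in (65, 90, 130, 1200, 1500):
    print(pl, round(packet_rate(2_000_000, pl)))
```

At 65 B the link demands roughly 3846 packet/s, already close to the break point reported below, whereas 1500 B packets need only about 167 packet/s.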
3.2. Router response

The response of the two routers to injected traffic in the testbed network of Fig. 1 will affect the measurements. In the testbed, Router A works as a traffic shaper, receiving at its Ethernet interface at most 100 Mb/s and then reducing that traffic to 2 Mb/s at the serial interface. In order to reduce the rate, the router must drop any excessive packets, usually at its output queue, as this is where the bandwidth constriction occurs. The CPU processing load was recorded for both routers by setting "show processes cpu", taking only the "cpu utilization every minute" in the router configuration. The ideal setting for this analysis might be an average over a period shorter than 1 min, to increase the measurement resolution. However, more frequent timings actually put more stress upon the routers. The desired outcomes were: the traffic conditions when the router becomes unstable; and the best packet size for Internet applications, based on router response to UDP traffic. The following Ethernet frame sizes were generated: 65, 90, 130, 1200 and 1500 B. Thirty streams of 2-min duration were generated for each frame/packet size. Each stream had a constant PIAT. Then, the range 1 × 10^-4 to 1 × 10^-1 s was divided in equal portions across the 30 streams for each frame/packet size. For clarity, the 30 data points are not marked on the plots of Fig. 3. In fact, measurements were taken of the packet rate by simply counting packets over the 2-min duration, a procedure that increases accuracy. The CPU load reading was taken over the middle 1 min, ensuring that sufficient packets had passed through the router's buffers in the initial 30 s. If measurement metrics other than packet rate are employed, then a misleading impression results, as was analyzed in earlier work.

⁶The number of data points for the PL of 500 B is reduced for compatibility with later Fig. 10. No difference in behavior is masked by this change.

Fig. 3. Router A packet rate response with different PL.

Fig. 4. Packet loss with increasing packet rate.

In Fig. 3, the processing load at the router sharply increases when the arrival rate is in the region of 4000 packet/s (PIAT = 0.00025 s), for all PLs⁶ from 65 to 1500 B. Based on this result, it appears that the CPU load is largely dependent upon the packet rate and not on the PL. After the 4000 packet/s break point the router behaves erratically. Apparently the CPU load decreases with a high packet arrival rate. However, this is not the case: the router is under such stress that CPU performance reporting becomes erroneous. Other symptoms of this breakdown are reported 'failure on the serial link' and other alarms. Further characterization of the router behavior is restricted, as Cisco's IOS o.s. is proprietary software.

The queuing policy in router A of Fig. 1 is First-In-First-Out (FIFO), or drop tail, which means that when the output queue is full the router will become
busy discharging packets. Therefore, small packet bursts can trigger the same response as that created by a continuous rate of around 4000 packet/s and above. Packet loss is largely independent of CPU load, since, as Fig. 4 illustrates, the loss rates increase linearly for a given PL. (The plot for 1200 B at the resolution of Fig. 4 is superimposed on that for a PL of 1500 B.)

It is likely that Cisco 2600 routers are provisioned to cope with Voice-over-IP (VoIP) traffic. However, the 65 B frame size plot in Fig. 3 has a smaller payload, discounting headers (IP/UDP/RTP, 20 + 8 + 12 = 40 B), than a typical VoIP payload (30 B). Therefore, in practice the smaller frame size is unlikely to occur, and certainly will not normally occur for video streams, except for fragmented packets or feedback messages.

In summary, the CPU load response on typical (Cisco 2600) Internet routers was measured to determine how bit rate, PL, and packet rate affect the router's response. In making these observations, no special weakness of Cisco 2600 routers is
implied, as these routers are perfectly suitable for their tasks. The traffic characteristics determine the router's CPU load, and hence:

- The router CPU load response is largely related to the packet rate and not to the PL or bit rate, and it is this aggregate packet rate that should be checked in network management. For a given PL, packet loss is largely independent of CPU load, being linearly related to packet rate.
- The recorded CPU load response may be symptomatic of a general processing bottleneck, which may or may not be attributed to other sub-processors, such as the serial interface sub-processor.
- In the experiments, after the 4000 packet/s point the router became unstable for the default router configuration in use. However, practical video streaming applications are unlikely to require a sustained rate of 4000 packet/s or above, although small packet bursts may approach this rate.
- The best traffic conditions were found when the PL was larger, up to 1500 B for standard Ethernet frames, because larger packets require smaller packet rates to transmit the same data, which application programmers should bear in mind.

4. PbP analysis applied to video streaming

The packetization method used to stream video on an Internet plays a vital role in controlling packet loss and, hence, received video quality. This in turn will be affected by the likely router response. Some studies of packetization schemes for the H.263+ video codec tend to assume the one slice per packet recommendation contained in RFC 2429 Section 3. Similarly, in other work a single spatial row of macro-blocks or Group of Blocks (GOB) is assigned per packet, when the optional H.263 Annex K slice-structured mode is not applied. We have set a slice to correspond to a GOB, which is similar to the MPEG-2 definition of a slice. However, RFC 2429 points out the possibility of rectangular slices and other arrangements to aid error concealment. In one scheme, all even GOBs and all odd GOBs are packed into two different packets (called slice interleaving) at QCIF resolution. We have assumed a simple (perhaps oversimplified) packetization strategy, but the findings could be equally well applied to more sophisticated strategies. Although application of multi-slice packetization may appear an intuitive improvement (as it reduces header overhead), because of the possibility of packet loss bursts there is uncertainty as to the relative advantages of one scheme or another. Although not explored in this paper, what happens when a slice exceeds the frame size, or when two slices exceed the frame size, and so on, is an issue. The loss of part of a slice will nullify the successful reception of the other part. Elsewhere, the burst length has also been identified (for the H.264 codec) as a source of degradation, as much as the average packet loss rate.

In this case study, a VBR H.263+ coded video sequence represented the test video stream. Every CIF frame was split into the usual 18 macro-block row-wise slices, to prevent the propagation of channel errors within a picture by providing synchronization markers, and then transmitted using one or two slices per packet. If slices were to be split between packets, then the presence of the slice header in only one of the packets and the use of variable length coding would cause more data to be lost than is present in any single packet. The method of delivery was varied: either per-frame packet bursts, or uniform (constant) IPG.

4.1. Video characteristics

Table 2 shows the test video characteristics, which was an "Interview" recording. This recording is a "head and shoulders" video sequence in CIF format that results in suitable data for the desired packetization lengths without causing packet fragmentation. The frame rate was 30 frame/s, resulting in a 1-slice scheme generating a mean 540 packet/s, and a 2-slice scheme generating a mean 270 packet/s. Although the mean rate is below the maximum rate in Fig. 3 of 4000 packet/s, nevertheless, because of the burstiness of the source, small packet bursts easily exceed that rate. For example, in Fig. 5, for frame 298 of the sequence, an instantaneous rate of 115,384 packet/s occurs. A 17-B header was also added to each packet to keep track of the frame

Table 2
"Interview" encoded video stream characteristics

Average bit-rate (kb/s): 187
Frame size (CIF): 352 × 288
Frame rate (f/s): 30
Video duration (s): 60
Intra refresh period (f): 10
Fig. 5. Illustrative packet burst showing timings and packet lengths.
sequence number, media type, frame number, packet number and timestamp. All these fields are used to reconstruct the video at the receiver side. (An RTP header, which serves a similar purpose although with reduced functionality, would be 12 B in size. Cisco 2600 series routers do not support discriminatory treatment of RTP against UDP, although higher-end Cisco routers do, as do some Ethernet drivers which perform traffic analysis.) The 10-frame refresh period implies an Intra (I) picture inserted into the Predicted (P) pictures. No B-pictures were used in this experiment.

Figs. 6(a) and (b) show the PL frequency distributions for, respectively, the one- and two-slice schemes (as taken from encoder packet header information). (The 'Ethernet' bars are simply offset by 59 B, representing the extra UDP and frame header overhead.)

Two delivery techniques were applied in the experiments: (1) Uniform: IPG of 1/540 s for all packets in a frame, and (2) Burst: IPG of 1/30 s. In order to test video delivery under difficult conditions, for all experiments in this section we added background traffic at 1.8 Mb/s with a normal probability density function (pdf) for the PL (mean 1000 B, standard deviation 100 B) and constant IPG of 0.004444 s. In Figs. 7(a) and (b), observe the markedly different PL patterns between the two schemes. The larger packets (against the y-axis in Fig. 7(b)) are caused by the leading I-picture. The video statistics analyzed by picture type for the different experiments are shown in Table 3.

4.2. Measurement results

Statistics were collected at the network level, PbP with tcpdump and tcpflw, and at the video level, with the decoder and encoder information. It is of interest to observe how packet loss can affect objective video quality: luminance peak-signal-to-noise ratio (PSNR), taken on a frame-by-frame basis, comparing the source with the received decoded frame. Table 4 presents packet losses for the test Interview video analyzed by picture type. There was at the very least a twofold reduction in total packet losses when using the two-slice rather than the one-slice scheme. In part, this was due to the reduced header overhead, illustrated by the constant offset between the measured one- and two-slice scheme bit rates in Fig. 8 for a uniform delivery method. Table 4 also shows that the uniform method reduced packet losses, by 44% or 58%, depending on the packetization scheme. We postulate that this effect occurs due to router queue behavior when faced with a sudden rush of packets. Notice that the more important I-pictures are more favorably treated by the two-slice scheme.

Now compare the best- and the worst-case performance for this video communication. Fig. 9 plots the PSNR on a frame-by-frame basis of the worst (one-slice burst) and best (two-slice uniform) cases in terms of total packet loss. The plot marked "Source" is the PSNR of the source video clip without any loss but after passing through the codec. The best-case plot consistently tracks the
Fig. 6. PL frequency distribution comparison for (a) 1- and (b) 2-slice as at the encoder, and output to the network.
source PSNR curve. The behavior of the one-slice burst PSNR plot is erratic and most of the time remains below the best-case plot, in some frames being 20 dB below the source PSNR. At these PSNR levels, the one-slice video would be unwatchable. Appropriate error resilience techniques for H.263+ streams were applied, namely selection of H.263 Annex K slice structured mode, Annex R independent segment (slice) decoding mode, and Annex N reference picture selection mode. However, it is possible that appropriate error concealment techniques (not present in H.263), when applied, would significantly improve the quality of the one-slice plot.

4.3. Further measurement results

To check the impact of changing router queue length (QL) and of differing background traffic, a further set of experiments was conducted. Of
Fig. 7. PL distribution in time for (a) 1-slice uniform and (b) burst delivery schemes.
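The two delivery schedules compared in Fig. 7, together with the normal-pdf background load, are easy to express as timed packet sequences. The sketch below is a hedged reconstruction, not the authors' traffic generator: the function names, the 64–1500 B size clipping, and the idealized zero-gap burst are illustrative assumptions.

```python
import random

def uniform_schedule(n_packets, ipg_s=1 / 540):
    """Uniform delivery: a frame's packets spaced by a fixed IPG of 1/540 s."""
    return [i * ipg_s for i in range(n_packets)]

def burst_schedule(n_packets, frame_period_s=1 / 30):
    """Burst delivery: a frame's packets offered back-to-back once per 1/30 s
    frame period (back-to-back spacing is idealized as zero here)."""
    return [0.0 for _ in range(n_packets)]

def background_packets(n, mean_pl_b=1000, sd_pl_b=100, ipg_s=0.004444, seed=7):
    """Cross traffic with normally distributed packet lengths (PLs) and a
    constant IPG, giving roughly 1000 B * 8 / 0.004444 s = 1.8 Mb/s."""
    rng = random.Random(seed)
    sizes = [max(64, min(1500, round(rng.gauss(mean_pl_b, sd_pl_b))))
             for _ in range(n)]
    times = [i * ipg_s for i in range(n)]
    return list(zip(times, sizes))
```

Note that with 18 packets per frame the uniform scheme spreads them over exactly the 1/30 s frame period (18/540 = 1/30), whereas the burst scheme offers the same packets to the router queue essentially at once.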
practical necessity, the experimental setup varied, although the measurement methodology, equipment, and compressed video source remained the same. Linux kernel version 2.6.18 was installed, allowing timing by means of the Hrtimer from Linutronix, which is a successor to the UTIME employed by KURT. As higher background traffic rates are generated in some of the experiments, background traffic generation was delegated to a second Linux sender, allowing the original Linux

Table 3
Slice structure characteristics by I- and P-pictures

                   1-Slice I   1-Slice P   2-Slice I   2-Slice P
Total slices (n)   3240        29 160      1620        14 580
Min. size (B)      159         6           345         13
Max. size (B)      750         178         1123        346
Mean size (B)      281.5       16.9        563.1       33.7
Std. dev. (B)      89.3        18.9        163.2       36.7
Median (B)         266         11          544         23
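Table 3's figures can be cross-checked mechanically: relative to the one-slice scheme, the two-slice scheme halves the count of payload units and roughly doubles their mean size, for both picture types. A quick sanity check (values copied from the table):

```python
# (total slices, mean size in B) per picture type, from Table 3.
one_slice = {"I": (3240, 281.5), "P": (29160, 16.9)}
two_slice = {"I": (1620, 563.1), "P": (14580, 33.7)}

for pic in ("I", "P"):
    n1, mean1 = one_slice[pic]
    n2, mean2 = two_slice[pic]
    assert n2 * 2 == n1                  # half as many payload units
    assert abs(mean2 - 2 * mean1) < 1.0  # each roughly twice the size
```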
Table 4
Packet loss numbers by slice and delivery method, and picture type

                      1-Slice                      2-Slice
                      Burst         Uniform        Burst         Uniform
                      I      P      I      P       I      P      I      P
Packet loss (PLoss)   538    7992   304    4456    259    2946   41     1255
PLoss (%)             16.7   27.4   9.4    15.3    16.0   20.2   2.5    8.6
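The aggregate claims made about Table 4 can be recomputed directly from its entries (the precise percentages depend on how the losses are aggregated and rounded):

```python
# Total packet losses (I + P) from Table 4.
loss = {
    ("1-slice", "burst"):   538 + 7992,   # 8530
    ("1-slice", "uniform"): 304 + 4456,   # 4760
    ("2-slice", "burst"):   259 + 2946,   # 3205
    ("2-slice", "uniform"): 41 + 1255,    # 1296
}

# At least a twofold reduction from one-slice to two-slice, per delivery method.
assert loss[("1-slice", "burst")] >= 2 * loss[("2-slice", "burst")]
assert loss[("1-slice", "uniform")] >= 2 * loss[("2-slice", "uniform")]

# Uniform delivery reduces losses relative to burst delivery.
cut_1 = 1 - loss[("1-slice", "uniform")] / loss[("1-slice", "burst")]  # ~0.44
cut_2 = 1 - loss[("2-slice", "uniform")] / loss[("2-slice", "burst")]  # ~0.6
```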
Fig. 8. Bandwidth comparison (measurement at 1 s intervals) for 1-slice and 2-slice schemes.
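Fig. 8 plots bandwidth averaged over 1 s intervals; the packet-level instantaneous bandwidth that exposes burst structure is instead computed from consecutive arrival timestamps. A minimal sketch (the function name and sample values are illustrative):

```python
def instantaneous_bw(packets):
    """Per-packet bandwidth estimates (bit/s) from (timestamp_s, size_B) pairs:
    each packet's bits divided by the gap since the previous arrival."""
    return [size * 8 / (t1 - t0)
            for (t0, _), (t1, size) in zip(packets, packets[1:])
            if t1 > t0]

# A 1000 B packet arriving 1/540 s after its predecessor registers ~4.3 Mb/s
# instantaneously, far above the stream's 0.187 Mb/s one-second average.
```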
Fig. 9. PSNR comparison for the worst- and best-case packet loss schemes, over the range of frame numbers 800–900.
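The objective quality measure behind Fig. 9 (and Figs. 12 and 13) is luminance PSNR computed frame by frame between the source and the received decoded frame. A generic sketch over 8-bit luma samples, not the authors' tool:

```python
import math

def luma_psnr_db(src, rec):
    """Luminance PSNR in dB between two equal-length sequences of 8-bit
    luma samples (a flattened frame); identical frames give infinity."""
    assert src and len(src) == len(rec)
    mse = sum((a - b) ** 2 for a, b in zip(src, rec)) / len(src)
    return math.inf if mse == 0 else 10 * math.log10(255 ** 2 / mse)

# Frame-by-frame trace, as plotted in Fig. 9:
# trace = [luma_psnr_db(s, r) for s, r in zip(source_frames, decoded_frames)]
```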
Fig. 10. Router A packet rate response with PL ¼ 500 and differing QL.
Fig. 11. Packet loss with increasing packet rate and differing QL.
sender to specialize in video traffic generation. The Cisco routers' o.s. was upgraded to IOS C2600-I-M, version 12.2(13a), release fc2.

In Fig. 10, the same experiment as recorded in Fig. 3 was repeated, but with a fixed PL of 500 B. (The number of data points is less than that of Fig. 3, but the essential response pattern is retained.) This PL is close to the Maximum Transmission Unit (MTU) that must be supported by all routers without subsequent fragmentation. In Fig. 10, the default buffer size refers to the Cisco default setting of Table 1. The buffer size was then stepped at intervals of 100 packets until the pattern became apparent. As the buffer size is increased, the bit rate at which reporting becomes unstable (see Section 3.2) is lowered. The onset of this behavior also occurs at a lower recorded CPU load. We surmise that management of the buffer places a greater load on the CPU itself or on a sub-processor. Fig. 11 demonstrates that the packet loss rate is largely independent of buffer size and CPU load, as the
Table 5
Packet loss by queue length with 1.8 Mb/s normal pdf background traffic

QL (packets)                     75     100    200    300    400    500    600    700    800    900    1000
1-Slice burst PLoss (packets)    6845   5236   4011   4400   3894   3805   3609   3337   3387   2819   2938
2-Slice uniform PLoss (packets)  657    629    674    425    472    373    313    291    224    151    113
Table 6
Packet loss with normal and Pareto background traffic pdfs at various mean rates, with QL = 200

Background traffic               Normal 1.5 Mb/s  Normal 1.8 Mb/s  Normal 1.9 Mb/s  Pareto 1.5 Mb/s  Pareto 1.8 Mb/s  Pareto 1.9 Mb/s
1-Slice burst PLoss (packets)    1                3880             6593             2584             8194             8096
2-Slice uniform PLoss (packets)  0                608              1484             269              1688             1790
QL only has a temporary effect in stemming packet losses. In Fig. 11, the resolution of the plot does not show small variations in loss numbers.

Table 5 records the packet losses measured when altering both the input and output buffers (on both Cisco routers) to the given QL. Prior experiments established that altering the input buffer size (Table 1) alone did not impact on packet loss numbers. The same background traffic as in Section 4.2, 1.8 Mb/s normal pdf, was injected alongside the video stream. Setting the QLs to 75 packets equates to the experiments in Section 4.2, when it will be seen from Table 5 that the packet loss numbers are somewhat reduced on Table 4's figures, resulting from the changes described in the previous paragraph. A check with the input QL set to 75 and the output set to 100, exactly as in Table 1, did not appreciably alter the loss numbers. Clearly, when the QL is increased there is a decreasing trend in packet losses for both packetization methods.

In Table 6, for a QL of 200 packets, three different bitrates are selected along with two different background traffic densities. In other experiments, the trend of Table 6's results was repeated for other QLs. When aggregated to the mean input video rate of 0.187 Mb/s (Table 2), mean background traffic of 1.8 Mb/s closely approaches the bottleneck link capacity of 2.0 Mb/s, whereas injecting background traffic of 1.5 Mb/s does not. As an alternative to a normal pdf of PLs, a Pareto pdf with shape factor α = 1.3 and location k = 1 was applied to PIATs, followed by a linear scaling so that the mean was at the desired mean bitrates. The intention of applying a Pareto pdf to PIATs was to judge the effect of a different packet arrival pattern upon the router. No claim is made that this distribution mimics the effect of typical Web server traffic, for which an on-off model with a Pareto distribution of burst lengths has been applied.

From Table 6, a normal pdf background at a mean rate of 1.5 Mb/s results in just one packet loss with one slice per packet. Packet losses for normal pdf background at a mean rate of 1.8 Mb/s differ somewhat from those in Table 4, as is usual due to system effects such as process scheduling. The main effect of introducing the other density is that packet losses are much greater, including those for a mean rate of 1.5 Mb/s, which indicates the burstiness of the background traffic source. Burstiness also affects the relative packet losses at rates of 1.8 and 1.9 Mb/s, which are similar in the presence of Pareto background traffic, and in fact for 1-slice packetization and burst delivery actually result in more losses at the lower cross-traffic rate.

The effect of different background traffic rates on the same sequence as in Fig. 9, with the same default buffer setting from Table 1, is shown in Fig. 12 for a 1-slice per packet burst delivery method. A rate of 1.5 Mb/s, normally distributed, does not stress the router, and consequently the PSNR is close to that of the original encoded video stream. The result of the changes noted in the first paragraph is an
Fig. 12. PSNR comparison for differing background trafﬁc rates with a normal pdf, with 1-slice per packet and burst delivery.
Fig. 13. PSNR comparison for differing background trafﬁc rates with a Pareto pdf, with 1-slice per packet and burst delivery.
improvement in the PSNR for the 1.8 Mb/s background traffic rate. The results also differ because system 'noise' affecting the scheduling times of both video source and background traffic packets means that, unlike in a simulation, the same burst patterns are not repeated across successive runs. However, the PSNR still remains relatively low for much of the sample sequence at a rate of 1.8 Mb/s, and going beyond this rate to 1.9 Mb/s drastically reduces quality during this particular sequence (visual inspection showed that, coincidentally, degradation was particularly marked over these frames with this background rate). Fig. 13 illustrates the impact on the received PSNR of background traffic with the Pareto pdf at various input rates. As might be expected from the similarity in packet losses, there is no clear distinction between the PSNR in the face of the two higher background rates. Comparing the effect of 1.8 Mb/s background traffic between the two background traffic pdfs, the PSNR is lower for a Pareto density background for this configuration.
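The Pareto-distributed PIATs behind Table 6 and Fig. 13 can be reproduced by inverse-CDF sampling with the stated shape α = 1.3 and location k = 1, followed by the linear scaling to the desired mean rate. Whether the scaling targeted the sample or the theoretical mean is not stated; this sketch scales the sample, and the 1000 B mean packet length is an assumption carried over from the normal-pdf background.

```python
import random

def pareto_piats(n, rate_bps, mean_pl_b=1000, alpha=1.3, k=1.0, seed=1):
    """Packet inter-arrival times (s) with a Pareto(alpha, k) shape,
    linearly scaled so the sample mean matches the target bit rate."""
    rng = random.Random(seed)
    # Inverse CDF: X = k / U**(1/alpha), U uniform on (0, 1], is Pareto(alpha, k).
    raw = [k / (1.0 - rng.random()) ** (1.0 / alpha) for _ in range(n)]
    target_mean_s = mean_pl_b * 8 / rate_bps   # e.g. 0.004444 s at 1.8 Mb/s
    scale = target_mean_s / (sum(raw) / n)
    return [x * scale for x in raw]
```

With α = 1.3 the distribution is heavy-tailed (infinite variance), so occasional very long gaps are balanced by clusters of closely spaced packets — the burstiness to which the router queue reacts.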
ARTICLE IN PRESS
84 M. Paredes Farrera et al. / Signal Processing: Image Communication 22 (2007) 69–85
4.4. Discussion

H.264/AVC is the ITU's most recent video codec, and its picture segmentation scheme (slicing) builds upon the earlier H.263+ (and H.263++) standard. In H.264, a slice is normally formed from macro-blocks in raster scan order, without formal restriction on the number of macro-blocks in a slice. Additionally, flexible macro-block ordering (FMO) in the interests of error concealment is possible. Slice interleaving is also possible. Both FMO and slice interleaving have been experimented with, although the impact on delay is not formally analyzed, and this might affect conversational applications such as videotelephony and video conferencing. Ideally in H.264, a slice should match the MTU size, but the end-to-end MTU is very difficult to find and in the case of wireless networks could be as low as 255 B.

5. Conclusion

This paper has presented a PbP measurement methodology, describing key metrics, packet capture and analysis tools, and a network testbed configuration intended to model tight link responses. While individual findings in this paper have been anticipated in other works, the whole has not been previously collected into a methodology for video stream measurement and analysis. The single message that emerges from this study is that the selection of a packetization scheme has a considerable impact on delivered PSNR, which is best revealed by a physical testbed and a precise measurement methodology.

Tests indicate that routers can become unreliable if the packet arrival rate is too great. One consequence is that, once the critical rate is reached, measurements of the throughput also become unreliable, as the processor workload is too great. There are practical implications as well, for video streaming application programmers, who should seek to reduce the output packet rate, and for traffic managers, who should take steps to avoid excessive packet rates, for example by setting up additional routers. Measurement accuracy of experiments is not assured if the o.s. of the host machine is unable to support packet generation with the desired resolution. This result has implications for those measurement studies conducted without an RT o.s.

In a case study, a burst pattern increased the probability of packet loss, even if the burst was short, with just 9 or 18 packets (depending on packetization method), along with any background traffic packets. If the analysis was simply in terms of average packet rate per second, no burst pattern would be apparent, and without packet-level measurement of instantaneous bandwidth the differing overheads would not be visible. The case study indicated that a two-slice packetization scheme results in a significant improvement in PSNR over the conventional one-slice scheme for compressed H.263+ video at the bit-rates tested. It also reinforced the need to avoid short frame bursts if consistently high-quality video is to be delivered. Extensions to multiple-slice packing remain to be explored.

References

- A.K. Agrawala, D. Sanghi, Network dynamics: an experimental study of the Internet, in: IEEE Conference on Global Communication (GLOBECOM), 1992, pp. 782–786.
- P. Barford, M. Crovella, Generating representative Web workloads for network and server performance evaluation, in: ACM Sigmetrics/Performance, July 1998, pp. 151–160.
- C. Bormann, et al., RFC 2429—RTP Payload Format for the 1998 Version of ITU-T Rec. H.263 Video (H.263+), 1998.
- L. Breslau, D. Estrin, K. Fall, S. Floyd, J. Heidemann, A. Helmy, P. Huang, S. McCanne, K. Varadhan, Y. Xu, H. Yu, Advances in network simulation, IEEE Comput. 33 (5) (2000) 59–67.
- Cisco Systems, Inc., LAN Design Guide for the Midmarket, San Jose, CA, 2000.
- G. Côté, F. Kossentini, Optimal intra coding of macro-blocks for robust (H.263) video communication over the Internet, Image Commun. 15 (1) (1999) 25–34.
- C. Dovrolis, P. Ramanathan, D. Moore, Packet-dispersion techniques and a capacity-estimation methodology, IEEE/ACM Trans. Networking 12 (6) (2004) 963–977.
- C. Fraleigh, S. Moon, B. Lyles, C. Cotton, M. Khan, D. Moll, R. Rockell, T. Seely, C. Diot, Packet-level traffic measurements from the Sprint IP backbone, IEEE Network 17 (6) (2003) 6–17.
- R. Hill, B. Srinivasan, S. Pather, D. Niehaus, Temporal resolution and real-time extensions to Linux, Technical Report ITTC-FY98-TR-11510-03, University of Kansas, 1998.
- G. Iannacone, M. May, C. Diot, Aggregate traffic performance with active queue measurement and drop from tail, Comput. Commun. Rev. 31 (3) (2001) 4–13.
- S. Kalidindi, M.J. Zekauskas, Surveyor: an infrastructure of Internet performance measurements, in: Proceedings of the INET Conference, June 1999.
- S. Kieffer, W. Spicer, A. Schmidt, S. Lyszyk, Planning a Data Center, Technical Report, Network System Architects, Inc., Denver, CO, 2003.
- Y.J. Liang, J.G. Apostolopoulos, B. Girod, Analysis of packet loss for compressed video: does burst-length matter?, ICASSP V (2001) 684–687.
- E. Masala, H. Yuang, K. Rose, J.C. De Martin, Rate-distortion optimized slicing, packetization and coding for error resilient video transmission, in: Data Compression Conference, 2004, pp. 182–191.
- J. Micheel, I. Graham, N. Brownlee, The Auckland data set: an access link observed, in: Proceedings of the 14th ITC Specialist Seminar, 2000.
- A. Odlyzko, Data networks are lightly utilized, and will stay that way, Technical Report, AT&T Labs, 1998.
- M. Paredes Farrera, M. Fleury, M. Ghanbari, Precision and accuracy of network traffic generators for packet-by-packet traffic analysis, in: Proceedings of the IEEE TridentCom Conference, March 2006, pp. 32–37.
- M. Paredes Farrera, M. Fleury, M. Ghanbari, Router response to traffic at a bottleneck link, in: Proceedings of the IEEE TridentCom Conference, March 2006, pp. 38–46.
- V. Paxson, Measurement and analysis of end-to-end Internet dynamics, Ph.D. Dissertation, University of California, Berkeley, 1997.
- V. Paxson, Automated packet trace analysis of TCP implementations, in: Proceedings of ACM SIGCOMM '97, France, September 1997, pp. 167–179.
- V.J. Ribeiro, R.H. Riedi, R.G. Baraniuk, Locating available bandwidth bottlenecks, IEEE Internet Comput. 8 (5) (2004) 34–41.
- R. Rejaie, M. Handley, D. Estrin, RAP: an end-to-end rate-based congestion control mechanism for realtime streams in the Internet, in: IEEE INFOCOM '99, vol. 3, 1999, pp. 1337–1345.
- R. Rejaie, On integration of congestion control with Internet streaming applications, in: Proceedings of the PacketVideo Workshop, April 2003.
- G. Sackett, Cisco Router Handbook, second ed., McGraw-Hill, New York, 2000.
- H. Schulzrinne, IP networks, in: M.-T. Sun, A.R. Reibman (Eds.), Compressed Video Over Networks, Marcel Dekker, New York, 2001, pp. 81–138.
- D. Sisalem, A. Wolisz, LDA+ TCP-friendly adaptation: a measurement and comparison study, in: 10th International Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV), June 2000.
- Tcpdump Manual Pages, available from <http://www.tcpdump.org/tcpdump_man.html>.
- D. Turaga, T. Chen, Fundamentals of video compression: H.263 as an example, in: M.-T. Sun, A.R. Reibman (Eds.), Compressed Video Over Networks, Marcel Dekker, New York, 2001, pp. 3–33.
- S.Y. Wang, C.L. Chou, C.H. Huang, Z.M. Yang, C.C. Chiou, C.C. Lin, The design and implementation of the NCTUns 1.0 network simulator, Comput. Networks 42 (2) (2003) 175–197.
- S. Wenger, G. Côté, Using RFC2429 and H.263+ at low to medium bit-rates for low-latency applications, in: PacketVideo Workshop, New York, April 1999.
- S. Wenger, H.264/AVC over IP, IEEE Trans. Circuits Systems Video Technol. 13 (7) (July 2003) 645–655.
- S. Wenger, G. Knorr, J. Ott, F. Kossentini, Error resilience support in H.263+, IEEE Trans. Circuits Systems Video Technol. 8 (7) (November 1998) 867–877.
- M. Yajnik, J. Kurose, D. Towsley, Packet loss correlation in the Mbone multicast network, in: Proceedings of the Global Internet Conference, 1996.
- T. Zseby, J. Quittek, Standardizing IP traffic flow measurement at the IETF, in: Proceedings of the Second SCAMPI Workshop, 2003.