Accurate packet-by-packet measurement and analysis of video streams across an Internet tight link

M. Paredes Farrera, M. Fleury*, M. Ghanbari
Electronic Systems Engineering Department, University of Essex, Wivenhoe Park, Colchester CO4 3SQ, UK

*Corresponding author. Tel.: +44 1026 872817; fax: +44 1026 872900. E-mail addresses: mpared@essex.ac.uk (M. Paredes Farrera), fleum@essex.ac.uk (M. Fleury), ghan@essex.ac.uk (M. Ghanbari).

Signal Processing: Image Communication 22 (2007) 69-85, www.elsevier.com/locate/image
Received 15 March 2006; received in revised form 3 November 2006; accepted 14 November 2006.
doi:10.1016/j.image.2006.11.004. © 2006 Elsevier B.V. All rights reserved.

Abstract

The response to a video stream across an Internet end-to-end path particularly depends on the performance at the path's tight link, which can be examined in a simple network testbed. A packet-by-packet (PbP) measurement methodology applied to tight link analysis requires a real-time operating system to gain the desired timing resolution during traffic generation experiments. If, as is common for other purposes, the analysis were simply in terms of average packet rate per second, no burst pattern would be apparent, and without packet-level measurement of instantaneous bandwidth the differing overheads would not be apparent. An illustrative case study, based upon the H.263+ video codec, confirms the advantage of the PbP methodology in determining received video characteristics according to packetization scheme, inter-packet gap, router response, and background traffic. Tests show that routers become unreliable if the packet arrival rate passes a critical threshold, one consequence of which is that reported router processor load also becomes unreliable. Video stream application programmers should take steps to reduce packet rates, and aggregate packet rates may also be reduced through network management. In the case study, a burst of just nine packets increased the probability of packet loss, while the video quality could be improved by packing at least two slices into a packet. The paper demonstrates that an appropriate packetization scheme has an important role in ensuring received video quality, but a physical testbed and a precise measurement methodology are needed to identify that scheme.

Keywords: Video streaming; Packet-by-packet analysis; Router response

1. Introduction

The most recent Sprint IP backbone survey [8] reported that 60% of traffic on some links is now generated by streaming or file sharing applications, as opposed to 30% by Web traffic. Current video applications include streaming of pre-encoded video, the exchange of personal video clips (peer-to-peer streaming), and the delivery of sports and news clips (possibly involving real-time (RT) generation of video). The Advanced Network and Services Surveyor Project [11] monitors the large-scale behavior of the Internet to determine its characteristics. A key finding [16] of this and other surveys is that most of the Internet core is relatively lightly loaded, with hotspots at intersections between networks and at access points.
This should not be surprising, as over-provisioning on packet networks is a common way [8] to protect against the failure of network elements and to support traffic growth. Therefore, finding the likely causes of video quality degradation requires a careful examination of video stream behavior at the tight or bottleneck link,¹ which normally occurs at network boundaries.

In this study, IP video traffic is measured in isolation on a network testbed, with competing Internet traffic represented as generated background traffic across a critical bottleneck or tight link. A tight link is the link with the least available bandwidth on an entire end-to-end path. In a study to test the ability of the STAB bandwidth probing tool [21] to locate thin links,² the link most likely to be tight in terms of available bandwidth across a 15/16 hop Internet path was found to be located close to the edge of the end-to-end path, which may well be a general conclusion. Modeling a tight link in a network testbed says nothing about overall delay or variation of delay across an entire path. Nor does the model necessarily represent realistic background traffic. Measurements taken directly from the Internet are needed for that purpose, especially if unusual traffic events are of interest. However, the testbed approach is designed to stress the video stream as it passes through typical routers working at the limits of their performance range. The intention is to identify potential problems that a video stream will encounter, with a view to guiding the design of a streaming application. Many simulation studies of video stream congestion control, e.g. [22,26], use a simple network topology similar to the one in this study. Consequently, the testbed has also been used as a means of calibrating network simulators, although this topic is not pursued in this paper.

As video and audio packets are often closely spaced, loss correlation is a more serious problem for video streaming than for other applications [33], for which losses may appear as essentially random [1]. Video IP packet spacing in time is typically closer than audio (averaging around 20 ms) and, unlike audio, video packets vary in length. For example, for a 30 frame/s stream, each frame must be delivered every 33 ms, and if individual Common Intermediate Format (CIF)-sized pictures are broken into 18 variable-sized slices, with one slice per IP packet, then the inter-packet gap (IPG) is 1.9 ms if packets are generated at equal intervals. Video bandwidths can clearly be much higher than those of other flows [25], which is a problem if the video traffic takes up a sizeable proportion of the bandwidth across a tight link.

Video delivery should avoid regimes that result in significant packet losses at the router queues. However, some observers note [23] that streaming applications often send a bursty stream, either for reasons of coding efficiency or when a non-RT operating system (o.s.) falls behind its schedule and releases a packet burst. This paper assumes that a bursty stream is present at the tight link, possibly resulting from one session amongst a series of parallel sessions generated by a server behind a fast link. Other sessions may not generate bursty traffic, and burstiness may be reduced if the server lies behind a slow link. In most reasonable o.s./driver implementations, the driver is relatively immune from scheduling, implying that, if application-level scheduling is not applied, coding efficiency is a cause of packet bursts.

Choice of metric is an important issue in video streaming research. For example, in [23] a standardized congestion control unit and a standardized set of reported metrics are proposed. In particular, per-packet or instantaneous bandwidth is carefully defined in [23], as the author considers that "per-packet bandwidth reporting is the most appropriate for adaptive streaming applications", because of its responsiveness to changes in available bandwidth. The work in [23] is of a theoretical nature, whereas the measurement methodology developed on this paper's network testbed could be transferred directly to a congestion control unit. TCP (not generally used for video streaming) adjusts its window size according to the packet loss rate (from dropped acknowledgements) and round-trip time (from packet timers), and it is likely that higher-end routers adopt a similar strategy for queue management. The Cisco 7500 series invokes input buffering upon finding that the output queue is congested. The lower-end Cisco router employed in this paper allows the values of metrics such as packet loss rate and CPU usage to be reported back to the user, and we have taken advantage of that in the experiments. In active queue management systems [10] other metrics are possible, such as TCP goodput, TCP and User Datagram Protocol (UDP) loss rate, queueing delay and consecutive packet loss.

¹ Strictly, a bottleneck link also includes the possibility of a narrow link, one with minimum capacity on a network path. The link with minimum capacity is not always the same as the link with least available bandwidth on a network path [7].
² A thin link has less available bandwidth than all those preceding it on the path.
In summary, the main objective of this paper is to provide a measurement and analysis methodology that will aid the design of video streaming applications. If analysis is only in terms of average packet rate per second, as might be used for network dimensioning and similar purposes, no burst pattern would be apparent, and without packet-level measurement of instantaneous bandwidth the varying overheads would not be visible. In particular, the methodology is applied to the study of bottleneck links, and a case study on packetization schemes for an H.263+ [28] encoded video stream demonstrates the value of the approach. The PSNR of the delivered video stream is significantly improved if an appropriate packetization scheme is selected.

The remainder of this paper is organized as follows. Section 2 details the video streaming network testbed, the software tools employed, and the measurement methodology. Section 3 illustrates the need for a testbed by examining router response at a tight link. Section 4 applies a video stream across the congested tight link and identifies the role of appropriate packetization in improving the delivered video quality, given the likely router response. Finally, Section 5 presents some conclusions.

2. Traffic measurement methodology

2.1. Network testbed configuration

Fig. 1 shows the simple network testbed employed in the experiments. Clearly, the bottleneck link is the 2 Mb/s serial link between the two Cisco 2600 routers. Otherwise, 100 Mb/s fast Ethernet links connect the testbed components to ensure no other source of congestion. The sender machine hosts the traffic generator stg (Section 2.3), while the Linux router monitors and stores traffic data flowing onto the bottleneck link with tcpdump (Section 2.2). Likewise, the receiver monitors and stores traffic data arriving at the receiver. To aid replication of the setup, the configuration details are given in Table 1. Small-sized output queues are employed at the routers to avoid delay to TCP packets, as TCP, the dominant Internet protocol, relies on round-trip times to govern flow control. Although the video stream generated in Section 4 is carried by UDP, the default buffer size settings of the Cisco routers were initially retained, as these would be the likely sizes in a realistic Internet environment. In the interests of accurate scheduling of packets in time, the Linux sender o.s. kernel is run with the KURT RT patch, as further discussed in Section 2.3.

Fig. 1. Simple network testbed employed to model the effect of a tight link on an Internet path with Cisco routers. [Topology: Linux Sender -- 100 Mbit/s -- Linux Router -- 100 Mbit/s -- Cisco Router_A -- 2 Mbit/s -- Cisco Router_B -- 100 Mbit/s -- Linux Receiver.]

Table 1. Network component settings

  Linux machines                              Routers
  CPU                Pentium-IV 1.7 GHz       Model               Cisco 2600
  NIC                Intel Pro 10/100         Software            Version 12.2(13a) of Cisco IOS
  Queue policy       Fast FIFO                Queue policy        FIFO
  Queue length (QL)  100                      Queue length (QL)   In 75, out 40
  MTU                1500                     MTU                 1500
  OS                 Linux kernel v. 2.4.9
Network planners [12] commonly recommend to clients a T1 or E1 link, with a bandwidth of, respectively, 1.544 or 2.048 Mb/s, between a LAN and the border gateway (or a satellite link, although with greater latency). Cost is also a significant consideration in the selection of a Cisco 2600 series router, and, hence, the same router is commonly found at the LAN edge in network plans [5]. A Cisco 2600 series router has a Motorola MPC860 40 MHz CPU with a 20 MHz internal bus clock and a default 4 MB DRAM packet memory [24].

2.2. Traffic monitoring

The monitoring tool employed was the well-known software utility tcpdump [27], layered on the libpcap monitoring library, which runs on an Ethernet interface set to promiscuous mode. Monitoring points were set up on the three Linux PCs in the testbed. The tcpdump program may in some circumstances [19] exhibit 'bugs' and timing errors that affect the accuracy (nearness to the true value) and precision (consistency of measurement) of timestamps. Example measures taken to avoid errors were:

- Placing tcpdump on a separate Linux router rather than on the Linux sender, to avoid CPU overload of the sender machine, which would result in packet drops by the monitor process.
- Only taking relative time measurements, thus avoiding the need to synchronize clocks.
- Not using a high-speed link, which otherwise can also lead to timestamp repetitions.
- Monitoring the CPU load (Pentium-IV in Table 1) to avoid packet drops while monitoring with tcpdump.
- Making a sanity check to ensure that all packets sent could be accounted for.

Measurement errors under Linux can still occur if the time resolution is too brief. In order to establish confidence in the accuracy and precision of any timestamps, tests were carried out at the monitoring points to find the time range that curtailed errors in the measurements. Based on the test results, a safe range for the experiments was for time values greater than 90-100 μs. A DAG³ card with a Global Positioning System (GPS) module to create timestamps is an alternative solution that avoids tcpdump's vagaries. However, a DAG card may not be a cost-effective solution if used as part of a congestion control unit.

³ DAG is not an acronym.
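The relative-timestamp and sanity-check measures listed above are simple to apply to a capture file. The following sketch is not the authors' tool; it assumes the third-party dpkt library and hypothetical capture file names, rebases each trace's timestamps to its first captured packet, and reports any shortfall between the two monitoring points:

# Minimal sketch: cross-check two tcpdump traces for completeness and convert
# absolute capture timestamps to relative ones, so that clock synchronization
# between monitoring points is not required.
import dpkt

def relative_times(pcap_path):
    """Return a list of (relative_time_s, frame_length_bytes) per captured packet."""
    times = []
    first = None
    with open(pcap_path, "rb") as f:
        for ts, buf in dpkt.pcap.Reader(f):
            if first is None:
                first = ts                       # first packet becomes the time origin
            times.append((ts - first, len(buf)))
    return times

sent = relative_times("monitor_router_side.pcap")        # trace taken at the Linux router
received = relative_times("monitor_receiver_side.pcap")  # trace taken at the Linux receiver

# Sanity check: every packet seen entering the tight link should be accounted
# for downstream, otherwise either the network or the monitor dropped packets.
print(f"captured at router: {len(sent)}, at receiver: {len(received)}, "
      f"missing: {len(sent) - len(received)}")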
2.3. Traffic generator

In this paper's methodology, video and audio traffic patterns are classified into two types, constant bit-rate (CBR) and variable bit-rate (VBR), with CBR traffic defined as "a traffic pattern with a steady bit rate during a given time interval" and VBR as "a traffic pattern with a changing bit rate during a given time interval". For the experiments, the traffic generator components were stg and rtg, respectively for sending and receiving traffic. Both are part of the NCTUns [29] network simulator package. NCTUns is a simulator that employs protocol stack re-entry to allow an application to be accurately emulated. As the TCP/IP protocol stack is directly incorporated into NCTUns, stg and rtg are easily transferred to work in a real network environment rather than within a simulator. The generator was modified⁴ to work on a normal Linux system (as NCTUns originally ran on the OpenBSD o.s.). One can create packet-by-packet (PbP) traffic patterns over UDP by specifying the packet length (PL) and IPG of every packet through an input trace file.

A fundamental requirement of PbP analysis is the ability to create extremely precise and predictable traffic patterns. Generating packets with hard deadlines requires an RT o.s. Accordingly, the Linux kernel on the testbed machines was patched with the Kansas University RT (KURT) kernel [9]. The KURT kernel modification allows event scheduling with a resolution of tens of microseconds. KURT decreases kernel latency by running Linux itself as a background process controlled by a small RT executive (RTE). The desired accuracy was obtained by running stg as a normal Linux process rather than as a specifically RT process under the control of the RTE.

Precise event scheduling was established in order to perform reliable experiments. In live applications, the PIAT may vary from the desired value because of application-level scheduling, prior network jitter on previous links, and smoothing by decoder [15] buffers. The experiments represent a ground truth, without these effects included.

⁴ The modified version can be downloaded from: http://privatewww.essex.ac.uk/~mpared/perf-tools/srtg.tgz.
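The trace-file idea can be sketched in a few lines. The following is only an illustrative user-space stand-in for the modified stg (the destination address, port and trace file name are hypothetical); without an RT kernel its IPG accuracy is limited, as Section 3.1 shows, so it indicates the mechanism rather than the achievable precision:

# Replay a (packet_length, inter_packet_gap) trace over UDP.
import socket
import time

def replay_trace(trace_path, dest=("192.168.1.10", 5000)):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    next_send = time.monotonic()
    with open(trace_path) as f:
        for line in f:
            if not line.strip():
                continue
            length_str, ipg_str = line.split()           # e.g. "552 0.001852"
            length, ipg = int(length_str), float(ipg_str)
            payload = b"\0" * max(0, length - 28)         # subtract 20 B IP + 8 B UDP headers
            sock.sendto(payload, dest)
            next_send += ipg
            delay = next_send - time.monotonic()
            if delay > 0:                                 # late packets are sent immediately
                time.sleep(delay)

replay_trace("interview_1slice.trace")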
2.4. Traffic metrics

For the PbP video experiments, the three metrics selected were: PL, Packet Inter-Arrival Time (PIAT),⁵ and Packet Throughput (PT). Surprisingly, PL is not widely utilized as a metric in measurement studies and traffic analysis, although this metric provides an insight into common application traffic patterns. In an encoded video streaming session, the PL varies depending on the packetization scheme employed and the headers added by the protocol used to transmit the video session. Apart from IP and UDP headers, a Real-time Protocol (RTP) or equivalent header [25] is added at the application layer. The PIAT is another important performance metric when observing packet spacing during a video streaming session. The PIAT metric is one of the most sensitive to network condition changes: transmission delays, queuing delays, packet loss, packets routed by different routes, fragmentation, and other hardware and software processes involved during packet transfer. Hence, it is not common to observe regular patterns for this metric. Finally, as mentioned in Section 1, it is important [23] to define PT carefully. It represents the throughput arising from one packet. For application-level studies the PT affects a router's response and, hence, is more relevant than the available bandwidth. The PT was calculated in the following way. If a pair of packets is observed, the first packet's length is divided by the time difference between the second and first packet, i.e. by the 'PIAT'. This can be expressed for packet number n at arrival time t_n by Eq. (1):

  PT_n = PL_n / (t_{n+1} - t_n) = PL_n / PIAT_n,   (1)

in which the {t_i}, i = 1, 2, ..., n, are arrival times at the receiver. Again, although used in Paxson's well-known tcptrace tool [20], this metric is otherwise not common in traffic analysis.

⁵ Note that, as elsewhere in the literature, PIAT refers to the desired PIAT as generated at the video packet source, and is synonymous with IPG.
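As a minimal sketch of how these per-packet metrics follow from a captured trace (packets are assumed already grouped into one IP flow by source/destination address, ports and protocol, as tcpflw does; the data values below are made up for illustration):

# Per-packet metrics of Eq. (1) from (arrival_time_s, packet_length_bytes) pairs.
def per_packet_metrics(packets):
    """Yield (PL_n, PIAT_n, PT_n) for consecutive packet pairs of one flow."""
    for (t_n, pl_n), (t_next, _) in zip(packets, packets[1:]):
        piat = t_next - t_n                   # inter-arrival time between packet n and n+1
        if piat > 0:
            yield pl_n, piat, pl_n / piat     # PT_n = PL_n / PIAT_n, in bytes per second

# Example with three 1000-byte packets arriving 2 ms apart:
trace = [(0.000, 1000), (0.002, 1000), (0.004, 1000)]
for pl, piat, pt in per_packet_metrics(trace):
    print(f"PL={pl} B  PIAT={piat*1e3:.1f} ms  PT={pt*8/1e6:.1f} Mb/s")   # 4.0 Mb/s per packet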
2.5. Analysis tool

In order to analyze the tcpdump tracefiles, a specially designed tool called tcpflw was prepared. Tcpflw is applicable to UDP and not simply TCP. Tcpflw categorizes traffic based on IP flow characteristics, as recommended by the Internet Engineering Task Force (IETF) [34]. An IP flow is defined as a group of packets that share some or all of the following characteristics: source and destination address; source and destination ports; and protocol. Tcpflw can read tcpdump and ns-2 [4] tracefiles. Every flow is visualized linearly (by time) and by a frequency histogram. Second-order statistics are also obtained for every metric.

3. Testbed characterization

Two scenarios established, firstly, the accuracy that the traffic generator was capable of when generating packets and, secondly, the response to traffic of the Cisco routers on either side of the bottleneck link.

3.1. Traffic generator accuracy

The stg traffic generator was operated over UDP, and in tracefile mode. For this test, the two Cisco routers and the serial link of Fig. 1 were removed, so that the Linux sender was connected to the Linux router, which in turn was connected to the Linux receiver over the 100 Mb/s link. PIAT measurements were compared on a normal Linux kernel and then on the KURT-patched Linux kernel, i.e. an RT kernel. The traffic pattern was CBR. The PL was fixed at 60 byte (B). The traffic generator produced streams of 2-min duration (a stream per data point in Fig. 2), with the source PIAT varying from 1 × 10⁻⁴ to 1 × 10⁻¹ s. Therefore, each of the streams resulted in a minimum of 1200 packets transmitted. (Note also that the estimated plot in Fig. 2 is the ideal measured PIAT.) From Fig. 2, observe that the PIAT measured for a normal kernel is a constant value of 0.02 s for any value fed into the traffic generator less than 0.02 s. This implies that even if the traffic generator is instructed to deliver packets with (say) a 0.01 s PIAT, it will only be able to send packets at 0.02 s. The RT kernel improves the accuracy and stability of generated UDP CBR traffic. Detailed analysis shows that, over acceptable PIAT measurements, the error ranged from 0.15% to 13% for the RT kernel, while for equivalent measurements with the normal kernel the error is considerably greater. Hence, the RT kernel was employed for the video experiments. Further detailed analysis of the behavior of the RT kernel and stg can be found in [17].
Fig. 2. PIAT replication by the traffic generator using a normal and an RT kernel, for measured (m) and estimated (e) PIAT, with the limits to the resolution of each kernel indicated.

3.2. Router response

The response of the two routers to injected traffic in the testbed network of Fig. 1 will affect the measurements. In the testbed, Router A works as a traffic shaper, receiving at most 100 Mb/s at its Ethernet interface and then reducing that traffic to 2 Mb/s at the serial interface. In order to reduce the rate, the router must drop any excessive packets, usually at its output queue, as this is where the bandwidth constriction occurs. The CPU processing load was recorded for both routers by setting "show processes cpu" and taking only the "cpu utilization every minute" figure in the router configuration. The ideal setting for this analysis might be an average over a period shorter than 1 min, to increase the measurement resolution. However, more frequent timings actually put more stress upon the routers. The desired outcomes were: the traffic conditions under which the router becomes unstable; and the best packet size for Internet applications, based on router response to UDP traffic. The following Ethernet frame sizes were generated: 65, 90, 130, 1200 and 1500 B. Thirty streams of 2-min duration were generated for each frame/packet size. Each stream had a constant PIAT. Then, the range 1 × 10⁻⁴ to 1 × 10⁻¹ s was divided into equal portions across the 30 streams for each frame/packet size. For clarity, the 30 data points are not marked on the plots of Fig. 3. In fact, measurements of the packet rate were taken by simply counting packets over the 2-min duration, a procedure which increases accuracy. The CPU load reading was taken over the middle 1 min, ensuring that sufficient packets had passed through the router's buffers in the initial 30 s. If measurement metrics other than packet rate are employed then a misleading impression results, as was analyzed in [18].

In Fig. 3, the processing load at the router sharply increases when the arrival rate is in the region of 4000 packet/s (PIAT = 0.00025 s), for all PLs⁶ from 65 to 1500 B. Based on this result it appears that the CPU load is largely dependent upon the packet rate and not on the PL. After the 4000 packet/s break point the router behaves erratically. Apparently the CPU load decreases with a high packet arrival rate. However, this is not the case, as the router is under such stress that CPU performance reporting becomes erroneous. Other symptoms of this breakdown are reported 'failure on the serial link' and other alarms. Further characterization of the router behavior is restricted, as Cisco's IOS o.s. is proprietary software.

Fig. 3. Router A packet rate response with different PL.

The queuing policy in Router A of Fig. 1 is First-In-First-Out (FIFO), or drop tail, which means that when the output queue is full the router will become busy discharging packets. Therefore, small packet bursts can trigger the same response as that created by a continuous rate of around 4000 packet/s and above. Packet loss is largely independent of CPU load since, as Fig. 4 illustrates, the loss rates increase linearly for a given PL. (The plot for 1200 B at the resolution of Fig. 4 is superimposed on that for a PL of 1500 B.)

Fig. 4. Packet loss with increasing packet rate.

It is likely that Cisco 2600 routers are provisioned to cope with Voice-over-IP (VoIP) traffic. However, the 65 B frame size plot in Fig. 3 has a smaller payload, discounting headers (IP/UDP/RTP, 20+8+12 = 40 B), than a typical VoIP packet (30 B). Therefore, in practice the smaller frame size is unlikely to occur and certainly will not normally occur for video streams, except for fragmented packets or feedback messages.

⁶ The number of data points for the PL of 500 B is reduced for compatibility with the later Fig. 10. No difference in behavior is masked by this change.
In summary, the CPU load response of typical (Cisco 2600) Internet routers was measured to determine how bit rate, PL, and packet rate affect the router's response. In making these observations, no special weakness of Cisco 2600 routers is implied, as these routers are perfectly suitable for their tasks. The traffic characteristics determine the router's CPU load and hence:

- The router CPU load response is largely related to the packet rate and not to the PL or bit rate, and it is this aggregate packet rate that should be checked in network management. For a given PL, packet loss is largely independent of CPU load, being linearly related to the packet rate.
- The recorded CPU load response may be symptomatic of a general processing bottleneck, which may or may not be attributed to other sub-processors such as the serial interface sub-processor.
- In the experiments, after the 4000 packet/s point the router became unstable for the default router configuration in use. However, practical video streaming applications are unlikely to require a sustained rate of 4000 packet/s or above, although small packet bursts may approach this rate.
- The best traffic conditions were found when the PL was larger, up to the 1500 B of standard Ethernet frames, because larger packets require smaller packet rates to transmit the same data, which application programmers should bear in mind (the sketch below illustrates the arithmetic).
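A back-of-envelope sketch of that last point follows; the 4000 packet/s figure is the instability threshold observed for this particular router and configuration, not a general constant, and the link rates chosen are simply those of the testbed:

# Packet rate implied by a bit rate and packet length: the rate that drives
# the router CPU load (Fig. 3) falls as the packet length grows.
CRITICAL_RATE = 4000          # packet/s, from the Fig. 3 measurements

def packet_rate(bit_rate_bps, packet_length_bytes):
    return bit_rate_bps / (8 * packet_length_bytes)

for link_bps in (2_000_000, 100_000_000):       # tight link and Ethernet ingress
    for pl in (65, 500, 1500):
        rate = packet_rate(link_bps, pl)
        flag = "above" if rate > CRITICAL_RATE else "below"
        print(f"{link_bps/1e6:5.0f} Mb/s, PL={pl:4d} B -> {rate:8.0f} packet/s "
              f"({flag} the critical rate)")

Filling the 2 Mb/s tight link never exceeds the threshold even with 65 B packets, whereas traffic arriving on the 100 Mb/s ingress easily can, which is why the aggregate packet rate, rather than the bit rate, is the quantity to watch.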
4. PbP analysis applied to video streaming

The packetization method used to stream video over the Internet plays a vital role in controlling packet loss and, hence, received video quality. This in turn will be affected by the likely router response. Some studies of packetization schemes for the H.263+ video codec, for example [14], tend to assume the one slice per packet recommendation contained in RFC 2429 Section 3 [3]. Similarly, in [6] a single spatial row of macro-blocks, or Group of Blocks (GOB), is assigned per packet, when the optional H.263 Annex K slice-structured mode is not applied. We have set a slice to correspond to a GOB, which is similar to the MPEG-2 definition of a slice. However, RFC 2429 points out the possibility of rectangular slices and other arrangements to aid error concealment [30]. In [30], all even GOBs and all odd GOBs are packed into two different packets (called slice interleaving) at QCIF resolution. We have assumed a simple (perhaps oversimplified) packetization strategy, but the findings could equally well be applied to more sophisticated strategies. Although multi-slice packetization may appear to be an intuitive improvement (as it reduces header overhead), because of the possibility of packet loss bursts there is uncertainty as to the relative advantages of one scheme or another. Although not explored in this paper, what happens when a slice exceeds the frame size, or when two slices exceed the frame size, and so on, is an issue. The loss of part of a slice will nullify the successful reception of the other part. In [13], the burst length is also identified (for the H.264 codec) as a source of degradation, as much as the average packet loss rate.

In this case study, a VBR H.263+ coded video sequence represented the test video stream. Every CIF frame was split into the usual 18 macro-block row-wise slices, to prevent the propagation of channel errors within a picture by providing synchronization markers, and then transmitted using one or two slices per packet. If slices were to be split between packets, then the presence of the slice header in only one of the packets and the use of variable length coding would cause more data to be lost than is present in any single packet. The method of delivery was varied: either per-frame packet bursts, or a uniform (constant) IPG.
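The two packetization schemes and two delivery methods can be made concrete with a small sketch. The slice sizes below are illustrative, and the header arithmetic (a 17 B application header plus 8 B UDP and 20 B IP headers) follows Section 4.1:

# Pack 18 slices of one CIF picture into packets and assign send offsets.
FRAME_PERIOD = 1 / 30          # s, for a 30 frame/s stream
SLICES_PER_FRAME = 18
HEADER_BYTES = 17 + 8 + 20     # application + UDP + IP headers per packet

def packetize(slice_sizes, slices_per_packet):
    """Group one picture's slice sizes into packet sizes (payload + headers)."""
    packets = []
    for i in range(0, len(slice_sizes), slices_per_packet):
        packets.append(sum(slice_sizes[i:i + slices_per_packet]) + HEADER_BYTES)
    return packets

def send_offsets(num_packets, burst):
    """Send times (s) within one frame period: back-to-back burst, or the
    uniform 1/540 s gap used in the experiments (Section 4.2)."""
    gap = 0.0 if burst else 1 / 540
    return [n * gap for n in range(num_packets)]

slice_sizes = [280] * SLICES_PER_FRAME        # hypothetical equal slice sizes in bytes
for spp in (1, 2):
    pkts = packetize(slice_sizes, spp)
    offs = send_offsets(len(pkts), burst=False)
    print(f"{spp}-slice scheme: {len(pkts)} packets/frame of {pkts[0]} B, "
          f"mean rate {len(pkts) * 30} packet/s, "
          f"first uniform offsets (ms): {[round(o * 1e3, 2) for o in offs[:3]]}")

With these numbers the 1-slice scheme yields 18 packets per frame (540 packet/s on average) and the 2-slice scheme 9 packets per frame (270 packet/s), matching the rates reported in Section 4.1.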
4.1. Video characteristics

Table 2 shows the characteristics of the test video, an "Interview" recording. This is a "head and shoulders" video sequence in CIF format, which results in suitable data for the desired packetization lengths without causing packet fragmentation. The frame rate was 30 frame/s, resulting in a 1-slice scheme generating a mean 540 packet/s, and a 2-slice scheme generating a mean 270 packet/s. Although the mean rate is below the maximum rate of 4000 packet/s in Fig. 3, nevertheless, because of the burstiness of the source, small packet bursts easily exceed that rate. For example, in Fig. 5, for frame 298 of the sequence, an instantaneous rate of 115 384 packet/s occurs. A 17 B header was also added to each packet to keep track of the frame sequence number, media type, frame number, packet number and timestamp. All these fields are used to reconstruct the video at the receiver side. (An RTP header, which serves a similar purpose although with reduced functionality, would be 12 B in size. Cisco 2600 series routers do not support discriminatory treatment of RTP against UDP, although higher-end Cisco routers do, as do some Ethernet drivers which perform traffic analysis.) The 10-frame refresh period implies an Intra (I) picture inserted into the Predicted (P) pictures. No B-pictures were used in this experiment.

Table 2. "Interview" encoded video stream characteristics

  Average bit-rate (kb/s)     187
  Frame size (CIF)            352 × 288
  Frame rate (f/s)            30
  Video duration (s)          60
  Intra refresh period (f)    10

Fig. 5. Illustrative packet burst showing timings and packet lengths.

Figs. 6(a) and (b) show the PL frequency distributions for, respectively, the one- and two-slice schemes (as taken from encoder packet header information). (The 'Ethernet' bars are simply offset by 59 B, representing the extra UDP and frame header overhead.)

Fig. 6. PL frequency distribution comparison for (a) 1-slice and (b) 2-slice schemes, as at the encoder and as output to the network.

Two delivery techniques were applied in the experiments: (1) Uniform: an IPG of 1/540 s for all packets in a frame, and (2) Burst: an IPG of 1/30 s, i.e. each frame's packets are released together, once per frame period. In order to test video delivery under difficult conditions, for all experiments in this section we added background traffic at 1.8 Mb/s, with a normal probability density function (pdf) for the PL (mean 1000 B, standard deviation 100 B) and a constant IPG of 0.004444 s. In Fig. 7(a) and (b), observe the markedly different PL patterns between the two schemes. The larger packets (against the y-axis in Fig. 7(b)) are caused by the leading I-picture. The video statistics analyzed by picture type for the different experiments are shown in Table 3.

Table 3. Slice structure characteristics by I- and P-pictures

                   1-Slice I   1-Slice P   2-Slice I   2-Slice P
  Total slices (n)   3240       29 160       1620       14 580
  Min. size (B)       159            6        345           13
  Max. size (B)       750          178       1123          346
  Mean size (B)      281.5        16.9       563.1         33.7
  Std. dev. (B)       89.3        18.9       163.2         36.7
  Median (B)          266           11        544           23

Fig. 7. PL distribution in time for (a) 1-slice uniform and (b) burst delivery schemes.
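The 1.8 Mb/s background traffic described above (normal pdf of PL, constant IPG) could be expressed as a (PL, IPG) trace in the same format as the replay sketch of Section 2.3. The sketch below is illustrative only, not the generator actually used, and the file name is hypothetical:

# Write a background-traffic trace: normally distributed packet lengths
# (mean 1000 B, std. dev. 100 B) at a constant IPG of 0.004444 s, which gives
# roughly 1000 B * 8 / 0.004444 s = 1.8 Mb/s on average.
import random

def write_normal_background(path, duration_s=120, mean_pl=1000, sd_pl=100, ipg=0.004444):
    with open(path, "w") as f:
        t = 0.0
        while t < duration_s:
            pl = max(64, min(1500, int(random.gauss(mean_pl, sd_pl))))   # clamp to Ethernet limits
            f.write(f"{pl} {ipg}\n")
            t += ipg

write_normal_background("background_normal_1.8Mbps.trace")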
4.2. Measurement results

Statistics were collected at the network level, PbP with tcpdump and tcpflw, and at the video level, with the decoder and encoder information. It is of interest to observe how packet loss affects objective video quality, namely the luminance peak signal-to-noise ratio (PSNR) taken on a frame-by-frame basis, comparing the source frame with the received decoded frame. Table 4 presents the packet losses for the test Interview video analyzed by picture type. There was at the very least a twofold reduction in total packet losses when using the two-slice rather than the one-slice scheme. In part, this was due to the reduced header overhead, illustrated by the constant offset between the measured one- and two-slice scheme bit rates in Fig. 8 for the uniform delivery method. Table 4 also shows that the uniform method reduced packet losses, by 44% or 58%, depending on the packetization scheme. We postulate that this effect occurs due to router queue behavior when faced with a sudden rush of packets. Notice that the more important I-pictures are more favorably treated by the two-slice scheme.

Now compare the best- and the worst-case performance for this video communication. Fig. 9 plots the PSNR on a frame-by-frame basis of the worst (one-slice burst) and best (two-slice uniform) cases in terms of total packet loss. The plot marked "Source" is the PSNR of the source video clip without any loss but after passing through the codec. The best-case plot consistently tracks the source PSNR curve. The behavior of the one-slice burst PSNR plot is erratic and most of the time remains below the best-case plot, in some frames being 20 dB below the source PSNR. At these PSNR levels, the one-slice video would be unwatchable. Appropriate error resilience techniques for H.263 [32] streams were applied, namely the H.263 Annex K slice structured mode, the Annex R independent segment (slice) decoding mode, and the Annex N reference picture selection mode. However, it is possible that appropriate error concealment techniques (not present in H.263), when applied, would significantly improve the quality of the one-slice plot.
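The luminance PSNR used for Fig. 9 is a standard computation; a minimal numpy sketch (frame loading is omitted and the example values are made up) is:

# Frame-by-frame luminance PSNR between source and received decoded frames.
import numpy as np

def psnr_luma(source_y, received_y):
    """PSNR in dB between two 8-bit luminance frames of equal size."""
    diff = source_y.astype(np.float64) - received_y.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")                   # identical frames
    return 10 * np.log10(255.0 ** 2 / mse)

# Example: a CIF frame differing by 5 grey levels everywhere gives about 34.2 dB.
src = np.full((288, 352), 128, dtype=np.uint8)
rec = np.full((288, 352), 133, dtype=np.uint8)
print(f"{psnr_luma(src, rec):.1f} dB")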
4.3. Further measurement results

To check the impact of changing the router queue length (QL) and of differing background traffic, a further set of experiments was conducted. Of practical necessity, the experimental setup varied, although the measurement methodology, equipment, and compressed video source remained the same. Linux kernel version 2.6.18 was installed, allowing timing by means of the Hrtimer from Linutronix [2], which is a successor to the UTIME facility employed by KURT. As higher background traffic rates are generated in some of the experiments, background traffic generation was delegated to a second Linux sender, allowing the original Linux sender to specialize in video traffic generation.
The Cisco routers' o.s. was upgraded to IOS C2600-I-M, version 12.2(13a), release fc2.

In Fig. 10, the same experiment as recorded in Fig. 3 was repeated,⁷ but with a fixed PL of 500 B. This PL is close to the minimum Maximum Transmission Unit (MTU) that must be supported by all routers without subsequent fragmentation. In Fig. 10, the default buffer size refers to the Cisco default setting of Table 1. The buffer size was then stepped at intervals of 100 packets until the pattern became apparent. As the buffer size is increased, the bit rate at which reporting becomes unstable (see Section 3.2) is lowered. The onset of this behavior also occurs at a lower recorded CPU load. We surmise that management of the buffer places a greater load on the CPU itself or on a sub-processor. Fig. 11 demonstrates that the packet loss rate is largely independent of buffer size and CPU load, as the QL only has a temporary effect in stemming packet losses. In Fig. 11, the resolution of the plot does not show small variations in loss numbers.

Table 4. Packet loss numbers by slice and delivery method, and picture type

                          1-Slice                     2-Slice
                       Burst        Uniform        Burst        Uniform
                       I      P     I      P       I      P     I      P
  Packet loss (PLoss)  538   7992   304   4456     259   2946   41    1255
  PLoss (%)            16.7  27.4   9.4   15.3     16.0  20.2   2.5   8.6

Fig. 8. Bandwidth comparison (measured at 1 s intervals) for the 1-slice and 2-slice schemes.

Fig. 9. PSNR comparison for the worst- and best-case packet loss schemes, over the range of frame numbers 800-900.

Fig. 10. Router A packet rate response with PL = 500 B and differing QL.

Fig. 11. Packet loss with increasing packet rate and differing QL.

⁷ The number of data points is less than that of Fig. 3, but the essential response pattern is retained.
Table 5 records the packet losses measured when altering both the input and output buffers (on both Cisco routers) to the given QL. Prior experiments established that altering the input buffer size (Table 1) alone did not impact the packet loss numbers. The same background traffic as in Section 4.2, 1.8 Mb/s with a normal pdf, was injected alongside the video stream. Setting the QLs to 75 packets equates to the experiments in Section 4.2, and it will be seen from Table 5 that the packet loss numbers are somewhat reduced compared with Table 4's figures, resulting from the changes described in the previous paragraph. A check with the input QL set to 75 and the output set to 100, exactly as in Table 1, did not appreciably alter the loss numbers. Clearly, when the QL is increased there is a decreasing trend in packet losses for both packetization methods.

Table 5. Packet loss by queue length, with 1.8 Mb/s normal pdf background traffic

  QL (packets)                      75    100   200   300   400   500   600   700   800   900   1000
  1-Slice burst PLoss (packets)    6845  5236  4011  4400  3894  3805  3609  3337  3387  2819  2938
  2-Slice uniform PLoss (packets)   657   629   674   425   472   373   313   291   224   151   113

In Table 6, for a QL of 200 packets, three different bit rates are selected with two different background traffic densities. In other experiments, the trend of Table 6's results was repeated for other QLs. When aggregated with the mean input video rate of 0.187 Mb/s (Table 2), mean background traffic of 1.8 Mb/s closely approaches the bottleneck link capacity of 2.0 Mb/s, whereas injecting background traffic of 1.5 Mb/s does not.

Table 6. Packet loss with normal and Pareto background traffic pdfs at various mean rates, with QL = 200

  Background traffic               Normal     Normal     Normal     Pareto     Pareto     Pareto
                                   1.5 Mb/s   1.8 Mb/s   1.9 Mb/s   1.5 Mb/s   1.8 Mb/s   1.9 Mb/s
  1-Slice burst PLoss (packets)       1        3880       6593       2584       8194       8096
  2-Slice uniform PLoss (packets)     0         608       1484        269       1688       1790

As an alternative to background traffic with a normal pdf of PLs, a Pareto pdf with shape factor a = 1.3 and location k = 1 was applied to the PIATs, with a linear scaling so that the mean corresponded to the desired mean bit rates. The intention of applying a Pareto pdf to the PIATs was to judge the effect of a different packet arrival pattern upon the router. No claim is made that this distribution mimics the effect of typical Web server traffic, for which an on-off model with a Pareto distribution of burst length has been applied [2].
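The exact scaling procedure is not given in the text, so the following sketch is only one plausible reading: standard Pareto samples are rescaled so that the mean IPG matches the target bit rate for an assumed mean packet length of 1000 B.

# Pareto-distributed PIATs (shape a = 1.3, location k = 1), linearly scaled to a
# target mean bit rate. Note that with a shape this small the tail is very heavy,
# so the sample mean converges slowly and individual runs fluctuate.
import random

SHAPE = 1.3
RAW_MEAN = SHAPE / (SHAPE - 1)          # mean of a standard Pareto(k=1, a) is a*k/(a-1)

def pareto_ipgs(n, target_bps, mean_pl_bytes=1000):
    target_mean_ipg = mean_pl_bytes * 8 / target_bps       # s per packet at the target rate
    scale = target_mean_ipg / RAW_MEAN
    return [scale * random.paretovariate(SHAPE) for _ in range(n)]

ipgs = pareto_ipgs(100_000, target_bps=1_800_000)
mean_rate = 1000 * 8 / (sum(ipgs) / len(ipgs))
print(f"empirical mean rate: {mean_rate/1e6:.2f} Mb/s")     # close to 1.8 Mb/s on average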
From Table 6, a normal pdf background at a mean rate of 1.5 Mb/s results in just one packet loss with one slice per packet. Packet losses for the normal pdf background at a mean rate of 1.8 Mb/s differ somewhat from those in Table 4, as is usual, due to system effects such as process scheduling. The main effect of introducing the other density is that packet losses are much greater, including those for a mean rate of 1.5 Mb/s, which indicates the burstiness of the background traffic source. Burstiness also affects the relative packet losses at rates of 1.8 and 1.9 Mb/s, which are similar in the presence of Pareto background traffic; in fact, for 1-slice packetization and burst delivery there are actually more losses at the lower cross-traffic rate.

The effect of different background traffic rates on the same sequence as in Fig. 9, with the same default buffer setting from Table 1, is shown in Fig. 12 for the 1-slice per packet, burst delivery method. A rate of 1.5 Mb/s, normally distributed, does not stress the router, and consequently the PSNR is close to that of the original encoded video stream.
The result of the changes noted in the first paragraph of this section is an improvement in the PSNR for the 1.8 Mbit/s background traffic rate. The results also differ because system 'noise' affecting the scheduling times of both video source and background traffic packets means that, unlike in a simulation, the same burst patterns are not repeated across successive runs. However, the PSNR still remains relatively low for much of the sample sequence at a rate of 1.8 Mb/s, and going beyond this rate to 1.9 Mb/s drastically reduces quality during this particular sequence (visual inspection showed that, coincidentally, degradation was particularly marked over these frames with this background rate). Fig. 13 illustrates the impact on the received PSNR of background traffic with the Pareto pdf at various input rates. As might be expected from the similarity in packet losses, there is no clear distinction between the PSNR in the face of the two higher background rates. Comparing the effect of 1.8 Mb/s background traffic between the two background traffic pdfs, the PSNR is lower for a Pareto density background in this configuration.

Fig. 12. PSNR comparison for differing background traffic rates with a normal pdf, with 1-slice per packet and burst delivery.

Fig. 13. PSNR comparison for differing background traffic rates with a Pareto pdf, with 1-slice per packet and burst delivery.
4.4. Discussion

H.264/AVC is the ITU's most recent video codec, and its picture segmentation scheme (slicing) [31] builds upon the earlier H.263+ (and H.263++) standard. In H.264, a slice is normally formed from macro-blocks in raster scan order, without formal restriction on the number of macro-blocks in a slice. Additionally, flexible macro-block ordering (FMO), in the interests of error concealment, is possible. Slice interleaving is also possible. In [31], both FMO and slice interleaving are experimented with, although the impact on delay is not formally analyzed, and this might affect conversational applications such as videotelephony and video conferencing. Ideally, in H.264 a slice should match the MTU size, but the end-to-end MTU is very difficult to find [31] and in the case of wireless networks could be as low as 255 B.

5. Conclusion

This paper has presented a PbP measurement methodology, describing key metrics, packet capture and analysis tools, and a network testbed configuration intended to model tight link responses. While individual findings in this paper have been anticipated in other works, the whole has not previously been collected into a methodology for video stream measurement and analysis. The single message that emerges from this study is that selection of a packetization scheme has a considerable impact on delivered PSNR, and this is best revealed by a physical testbed and a precise measurement methodology.

Tests indicate that routers can become unreliable if the packet arrival rate is too great. One consequence is that once the critical rate is reached, measurements of the throughput also become unreliable, as the processor workload is too great. There are practical implications as well, for video streaming application programmers, who should seek to reduce the output packet rate, and for traffic managers, who should take steps to avoid excessive aggregate packet rates, for example by setting up additional routers. Measurement accuracy is not assured if the o.s. of the host machine is unable to support packet generation with the desired resolution. This result has implications for those measurement studies conducted without an RT o.s. In a case study, a burst pattern increased the probability of packet loss, even if the burst was short, with just 9 or 18 packets (depending on the packetization method) along with any background traffic packets. If the analysis were simply in terms of average packet rate per second, no burst pattern would be apparent, and without packet-level measurement of instantaneous bandwidth the differing overheads would not be visible. The case study indicated that a two-slice packetization scheme results in a significant improvement in PSNR over the conventional one-slice scheme for compressed H.263+ video at the bit-rates tested. It also reinforced the need to avoid short frame bursts if consistently high-quality video is to be delivered. Extensions to multiple-slice packing remain to be explored.

References

[1] A.K. Aggrawala, D. Sanghi, Network dynamics: an experimental study of the Internet, in: IEEE Conference on Global Communication (GLOBECOM), 1992, pp. 782-786.
[2] P. Barford, M. Crovella, Generating representative Web workloads for network and server performance evaluation, in: ACM Sigmetrics/Performance, July 1998, pp. 151-160.
[3] C. Borman, et al., RFC 2429: RTP Payload Format for the 1998 Version of ITU-T Rec. H.263 Video (H.263+), 1998.
[4] L. Breslau, D. Estrin, K. Fall, S. Floyd, J. Heidemann, A. Helmy, P. Huang, S. McCanne, K. Varadhan, Y. Xu, H. Yu, Advances in network simulation, IEEE Comput. 33 (5) (2000) 59-67.
[5] Cisco Systems, Inc., LAN Design Guide for the Midmarket, San Jose, CA, 2000.
[6] G. Côté, F. Kossentini, Optimal intra coding of macroblocks for robust (H.263) video communication over the Internet, Image Commun. 15 (1) (1999) 25-34.
[7] C. Dovrolis, P. Ramanathan, D. Moore, Packet-dispersion techniques and a capacity-estimation methodology, IEEE/ACM Trans. Networking 12 (6) (2004) 963-977.
[8] C. Fraleigh, S. Moon, B. Lyles, C. Cotton, M. Khan, D. Moll, R. Rockell, T. Seely, C. Diot, Packet-level traffic measurements from the Sprint IP backbone, IEEE Network 17 (6) (2003) 6-17.
[9] R. Hill, B. Srinivasan, S. Pather, D. Niehaus, Temporal resolution and real-time extensions to Linux, Technical Report ITTC-FY98-TR-11510-03, University of Kansas, 1998.
[10] G. Iannacone, M. May, C. Diot, Aggregate traffic performance with active queue management and drop from tail, Comput. Commun. Rev. 31 (3) (2001) 4-13.
[11] S. Kalidindi, M.J. Zekauskas, Surveyor: an infrastructure for Internet performance measurements, in: Proceedings of the INET Conference, June 1999.
[12] S. Kieffer, W. Spicer, A. Schmidt, S. Lyszyk, Planning a Data Center, Technical Report, Network System Architects, Inc., Denver, CO, 2003.
[13] Y.J. Liang, J.G. Apostolopoulos, B. Girod, Analysis of packet loss for compressed video: does burst-length matter?, in: ICASSP, vol. V, 2001, pp. 684-687.
[14] E. Masala, H. Yuang, K. Rose, J.C. De Martin, Rate-distortion optimized slicing, packetization and coding for error resilient video transmission, in: Data Compression Conference, 2004, pp. 182-191.
[15] J. Micheel, I. Graham, N. Brownlee, The Auckland data set: an access link observed, in: Proceedings of the 14th ITC Specialist Seminar, 2000.
[16] A. Odlyzko, Data networks are lightly utilized, and will stay that way, Technical Report, AT&T Labs, 1998.
[17] M. Paredes Farrera, M. Fleury, M. Ghanbari, Precision and accuracy of network traffic generators for packet-by-packet traffic analysis, in: Proceedings of the IEEE TridentCom Conference, March 2006, pp. 32-37.
[18] M. Paredes Farrera, M. Fleury, M. Ghanbari, Router response to traffic at a bottleneck link, in: Proceedings of the IEEE TridentCom Conference, March 2006, pp. 38-46.
[19] V. Paxson, Measurement and analysis of end-to-end Internet dynamics, Ph.D. Dissertation, University of California, Berkeley, 1997.
[20] V. Paxson, Automated packet trace analysis of TCP implementations, in: Proceedings of ACM SIGCOMM '97, France, September 1997, pp. 167-179.
[21] V.J. Ribeiro, R.H. Riedi, R.G. Baraniuk, Locating available bandwidth bottlenecks, IEEE Internet Comput. 8 (5) (2004) 34-41.
[22] R. Rejaie, M. Handley, D. Estrin, RAP: an end-to-end rate-based congestion control mechanism for realtime streams in the Internet, in: IEEE INFOCOM '99, vol. 3, 1999, pp. 1337-1345.
[23] R. Rejaie, On integration of congestion control with Internet streaming applications, in: Proceedings of the PacketVideo Workshop, April 2003.
[24] G. Sackett, Cisco Router Handbook, second ed., McGraw-Hill, New York, 2000.
[25] H. Schulzrinne, IP networks, in: M.-T. Sun, A.R. Reibman (Eds.), Compressed Video Over Networks, Marcel Dekker, New York, 2001, pp. 81-138.
[26] D. Sisalem, A. Wolisz, LDA+ TCP-friendly adaptation: a measurement and comparison study, in: 10th International Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV), June 2000.
[27] Tcpdump Manual Pages, available from: http://www.tcpdump.org/tcpdump_man.html.
[28] D. Turaga, T. Chen, Fundamentals of video compression: H.263 as an example, in: M.-T. Sun, A.R. Reibman (Eds.), Compressed Video Over Networks, Marcel Dekker, New York, 2001, pp. 3-33.
[29] S.Y. Wang, C.L. Chou, C.H. Huang, Z.M. Yang, C.C. Chiou, C.C. Lin, The design and implementation of the NCTUns 1.0 network simulator, Comput. Networks 42 (2) (2003) 175-197.
[30] S. Wenger, G. Côté, Using RFC2429 and H.263+ at low to medium bit-rates for low-latency applications, in: PacketVideo Workshop, New York, April 1999.
[31] S. Wenger, H.264/AVC over IP, IEEE Trans. Circuits Systems Video Technol. 13 (7) (July 2003) 645-655.
[32] S. Wenger, G. Knorr, J. Ott, F. Kossentini, Error resilience support in H.263+, IEEE Trans. Circuits Systems Video Technol. 8 (7) (November 1998) 867-877.
[33] M. Yajnik, J. Kurose, D. Towsley, Packet loss correlation in the MBone multicast network, in: Proceedings of the Global Internet Conference, 1996.
[34] T. Zseby, J. Quittek, Standardizing IP traffic flow measurement at the IETF, in: Proceedings of the Second SCAMPI Workshop, 2003.
