Adaptive Video Streaming: Pre-encoded MPEG-4 with Bandwidth ...
Adaptive Video Streaming:
Pre-encoded MPEG-4 with Bandwidth Scaling
A. Balk, M. Gerla, and M. Sanadidi
Network Research Laboratory, UCLA, Los Angeles, CA 90024 USA
abalk, gerla, medy @cs.ucla.edu ¡
Department of Informatics and Communication, Universit` degli Studi di Milano
Abstract the streamed media in the Internet. The quality of the content
delivered by these programs varies, but they are generally asso-
The increasing popularity of streaming video is a cause for ciated with low resolution, small frame size video. One reason
concern for the stability of the Internet because most streaming these contemporary streaming platforms exhibit limited quality
video content is currently delivered via UDP, without any end- streaming is their inability to dynamically adapt to trafﬁc condi-
to-end congestion control. Since the Internet relies on end sys- tions in the network during a streaming session. Although the
tems implementing transmit rate regulation, there has recently aforementioned applications claim to be adaptive, there is no
been signiﬁcant interest in congestion control mechanisms that conclusive evidence as to what degree of adaptivity they employ
are both fair to TCP and effective in delivering real-time streams. as they are proprietary, closed software . Their video streams
In this paper we design and implement a protocol that at- are usually delivered via UDP with no transport layer conges-
tempts to maximize the quality of real-time MPEG-4 video tion control. A large-scale increase in the amount of streaming
streams while simultaneously providing basic end-to-end con- audio/video trafﬁc in the Internet over a framework devoid of
gestion control. While several adaptive protocols have been pro- end-to-end congestion control will not scale, and could poten-
posed in the literature [28, 37], the unique feature of our proto- tially lead to congestion collapse.
col, the Video Transport Protocol (VTP), is its use of receiver UDP is the transport protocol of choice for video streaming
side bandwidth estimation. Such estimation is transmitted to the platforms mainly because the fully reliable and strict in-order
sender and enables it to adapt to network conditions by altering delivery semantics of TCP do not suit the real-time nature of
its sending rate and the bitrate of the transmitted video stream. video transmission. Video streams are loss tolerant and delay
We deploy our protocol in a real network testbed and extensively sensitive. Retransmissions by TCP to ensure reliability intro-
study its behavior under varying link speeds and background duce latency in the delivery of data to the application, which in
trafﬁc proﬁles using the FreeBSD Dummynet link emulator . turn leads to degradation of video image quality. Additionally,
Our results show that VTP delivers consistent quality video in the steady state behavior of TCP involves the repeated halving
moderately congested networks and fairly shares bandwidth with and growth of its congestion window, following the well known
TCP in all but a few extreme cases. We also describe some of Additive Increase/Multiplicative Decrease (AIMD) algorithm.
the challenges in implementing an adaptive video streaming pro- Hence, the throughput observed by a TCP receiver oscillates
tocol. under normal conditions. This presents another difﬁculty since
video is usually streamed at a constant rate (VTP streams are ac-
tually piecewise-constant). In order to provide the best quality
1 Introduction video with minimal buffering, a video stream receiver requires
relatively stable and predictable throughput.
As the Internet continues to grow and mature, transmission Our protocol, the Video Transport Protocol (VTP), is de-
of multimedia content is expected to increase and compose a signed with the primary goal of adapting an outgoing video
large portion of the overall data trafﬁc. Film and television dis- stream to the characteristics of the network path between sender
tribution, digitized lectures, and distributed interactive gaming and receiver. If it determines there is congestion, the VTP sender
applications have only begun to be realized in today’s Internet, will reduce its sending rate and the video encoding rate to a level
but are rapidly gaining popularity. Audio and video streaming the network can accommodate. This enables VTP to deliver a
capabilities will play an ever-increasing role in the multimedia- larger portion of the overall video stream and to achieve inter-
rich Internet of the near future. Real-time streaming has wide protocol fairness with competing TCP trafﬁc. A secondary goal
applicability beyond the public Internet as well. In military and of VTP is the minimal use of network and computer resources.
commercial wireless domains, virtual private networks, and cor- We make several trade-offs to limit processing overhead and
porate intra-nets audio and video are becoming commonplace buffering requirements in the receiver. In general, VTP follows
supplements to text and still image graphics. a conservative design philosophy by sparingly using bandwidth
Currently, commercial programs such as RealPlayer  and and memory during the streaming session.
Windows Media Player  provide the predominant amount of In essence, the VTP sender asks the receiver the question: are
you receiving at least as fast as I am sending? If so, the sender In the spirit of RAP, N. Feamster proposes SR-RTP [12, 13],
increases its rate by a small amount to probe the network for a backward compatible extension to the Real Time Protocol
unused bandwidth. If not, the sender immediately reduces its (RTP). SR-RTP uses a quality adaptation mechanism similar to
rate by an amount based on the receiver’s bandwidth, the current RAP, but “binomial” congestion control reduces the congestion
sending rate and video bitrate. window size proportional to the square root of its value rather
An important aspect of VTP is that it is completely end-to- than halving it in response to loss. This is shown to assuage os-
end. VTP does not rely on QoS functionality in routers, random cillations in the sending rate and produce smoother throughput.
early drop (RED), other active queue management (AQM) or ex- Binomial algorithms also display a reasonable amount of TCP
plicit congestion notiﬁcation (ECN). It could potentially beneﬁt fairness .
from such network level facilities, but in this paper we focus The main beneﬁts of SR-RTP come from its features of se-
only on the case of real-time streaming in a strictly best effort lective retransmission of certain video packets and decoder post-
network. Possible interactions between VTP and QoS routers, processing to conceal errors due to packet loss. However, the
AQM or ECN are areas of future work. effectiveness of selective retransmission depends strongly on the
VTP is implemented entirely in user space and designed round trip time (RTT) between sender and receiver. Further, in
around open video compression standards and codecs for which , the receiver post-processing is performed ofﬂine for ease
the source code is freely available. The functionality is split of analysis. It is not clear such recovery techniques are viable in
between two distinct components, each embodied in a separate real time or with limited processing resources.
software library with its own API. The components can be used The Stream Control Transmission Protocol (SCTP)  is
together or separately, and are designed to be extensible. VTP a recently proposed protocol with many novel features de-
sends packets using UDP, adding congestion control at the ap- signed to accommodate real-time streaming. SCTP supports
plication layer. multi-streaming, where a sender can multiplex several outgoing
This paper discusses related work in the next section and streams into one connection. This can potentially be very advan-
presents an overview of the MPEG-4 compression standard in tageous for compressed video formats since packets belonging to
Section 3. The VTP design is described in Section 4. Section different parts of the video stream can be treated differently with
5 covers the VTP implementation and receiver buffering strate- respect to retransmission and order of delivery. The congestion
gies. The experimental evaluation of VTP is treated in Section 6 control mechanism in SCTP is identical to TCP, where the con-
and is followed by the conclusion. gestion window is reduced by half in the event of packet loss.
Like TCP, SCTP employs slow start to initially seek out avail-
able bandwidth and congestion avoidance to adapt to changing
path conditions. This results in perfect fairness with TCP, but
2 Related Work leads to high variability in throughput at the receiver. An inves-
tigation of the applicability of SCTP to MPEG-4 streaming is the
Recent research approaches to address the lack of a suitable
subject of .
end-to-end service model for multimedia streaming generally
The work of J. Padhye, et. al.  has led to TCP-Friendly
fall into two categories: 1) modiﬁcations or enhancements to
Rate Control (TFRC) . TFRC is not itself a protocol, but an
AIMD congestion control to better accommodate streaming ap-
algorithm for maintaining the sending rate at the level of a TCP
plications, or 2) model-based ﬂow control based primarily on
ﬂow under the same conditions. The TFRC sender adjusts its
the results of . We give several examples of each technique
rate according to an equation that speciﬁes throughput in terms
before presenting the motivation and design of VTP.
of packet size, loss event rate, RTT, and the retransmission timer
The Rate Adaptation Protocol (RAP)  is a rate based value. TFRC is meant to serve as a congestion control frame-
AIMD protocol intended for transmitting real-time video. The work for any applications that do not require the full reliability
RAP sender uses receiver feedback about congestion conditions of TCP and would beneﬁt from low variation in sending rate.
to make decisions about its sending rate and the transmitted
Application domains appropriate for TFRC include multime-
video quality. The RAP algorithm does not result in fairness with
dia streaming, interactive distributed games, Internet telephony,
TCP in many cases, but router support in the form of Random
and video conferencing. Several authors have applied the TFRC
Early Drop (RED) can improve RAP’s inter-protocol behavior
model to video streaming. In , a new error-resilient video
to some extent.
compression method is developed which relies on simpliﬁed
A major difference between VTP and RAP is the degree to derivation of the TCP throughput equation. The relationship be-
which they comply to AIMD. While RAP is a full AIMD proto- tween the compression level and the congestion control model
col, VTP performs additive increase but it does not decrease its is examined. The Multimedia Streaming TCP-Friendly Protocol
sending rate multiplicatively. Rather, it adjusts its sending rate (MSTFP) is part of a comprehensive resource allocation strategy
to the rate perceived by the receiver. RAP and VTP also dif- proposed in  which uses a TFRC model to adapt streaming
fer in the type of video encoding they stream. RAP is based on MPEG-4 video.
layered video encoding where the sender can decide how many Ostensibly, any rate adjustment scheme derived from TCP
layers can be sent at any given time. On the other hand, VTP would suffer the same limitations of TCP itself.1 TCP’s behav-
assumes a discrete encoding scheme, where the sender chooses iors of poor link utilization in high-loss environments and un-
one of several pre-encoded streams and exclusively sends from
that stream until it decides to change the video quality. Video 1 The TCP throughput equation in TFRC is derived for TCP New
compression is described in further detail in the next section. Reno in particular.
fairness against ﬂows with large RTTs have been documented separate foreground VO while the background images compose
repeatedly (see, for example, ). Although VTP decreases its another object. VO motion is achieved by a progression of video
sending rate in response to packet loss, the decrease decision object planes (VOPs).
does not assume that all packet loss is a result of overﬂowed There are three different types of VOPs in the MPEG-4 for-
router buffers. At the same time, the amount of decrease is suf- mat: (1) Intra-coded VOPs (I-VOPs) that are encoded indepen-
ﬁcient to restrict the sending rate to within its fair share of the dently and can be considered “key” VOPs; (2) Predicted VOPs
network bandwidth. (P-VOPs) that depend on preceding I- or P-VOPs and contain
In this paper we argue that it is possible to build a stable and predicted motion data and information about the error in the
scalable network protocol that is not underpinned by AIMD. predicted values; and (3) Bi-directionally predicted VOPs (B-
VTP borrows the idea of additive increase from AIMD, but VOPs) that depend on both previous and next VOPs. Figure 1
its decrease step is not strictly multiplicative. VTP also uses shows a sequence of MPEG-4 VOPs, known as a Group of Video
network bandwidth estimation, but in a different way than the Object Planes (GOV), with the dependencies represented above
model-based approaches described above. By combining el- each plane. If a VOP upon which other VOPs depend is dam-
ements of AIMD and model-based congestion control while aged during network transmission, decoding errors will manifest
not directly following either, VTP attempts to beneﬁt from the in the damaged VOP as well as all its dependent VOPs, a phe-
strengths of each approach. VTP aims to be adaptive and ﬂexi- nomenon known as propagation of errors. RFC 30162 describes
ble by making minimal assumptions about the network and us- a structured packetization scheme that improves error resiliency,
ing network feedback as a rough indicator, not as rigorous set making error concealment and error recovery more effective to
of input parameters. These principles encompass the motivating counteract error propagation.
factors of the VTP design.
B B B B B B
3 MPEG-4 Background Enhancement
The MPEG-4 video compression speciﬁcation [18, 25] has
been developed as an open standard to encourage interoperabil-
ity and widespread use. MPEG-4 has enjoyed wide acceptance
in the research community as well as in commercial develop-
ment owing to its high bitrate scalability and compression ef- Base
ﬁciency. Packetization markers in the video bitstream are an- Layer
other feature which make MPEG-4 especially attractive for net-
work video transmission. MPEG-4 is a natural choice for VTP
I P P I
since abundant documentation exists and numerous codecs are
freely available. Like other MPEG video compression tech- Figure 2: 2 Layered MPEG-4 encoding, VOPs at the head of an arrow
niques, MPEG-4 takes advantage of spatial and temporal redun- depend on the VOPs at the tail.
dancy in individual frames of video to improve coding efﬁciency.
A unique capability of MPEG-4 is support for object-based en-
Each VO can be composed of “layers”. A base layer contains
the basic representation of the VO and additional enhancement
layers can be added by the codec to improve video resolution
if needed. Figure 2 depicts a simple 2-layered MPEG-4 encod-
I P B B P B B P ing, with B-VOPs comprising the enhancement layer. Since each
VOP sequence can be accessed and manipulated independently,
MPEG-4 encodes information about the scene composition in a
separate stream within the video bitstream. The decoder’s job is
somewhat complex: in order to assemble a frame, it must calcu-
late dependencies and perform the decoding algorithm for each
layer of each VOP, build the scene according to the composition
information, and synchronize between the independent VOP se-
quences, all while observing the play out time constraint.
The fundamental processing unit in MPEG-4 is a 16x16 block
of pixels called a macroblock. Figure 3 shows a typical VOP
composed of rows of macroblocks called slices. Macroblocks
Figure 1: Group of Visual Object Planes (GOV) in MPEG-4. from I-, P-, and B-VOPs contain different kinds of data that re-
ﬂect the particular dependency relationships of the VOP. A dis-
crete cosine transform (DCT) is applied to each macroblock, and
coding, where each scene is decomposed into separate video ob- the resulting 16x16 matrix is then quantized. The range of the
jects (VOs). A typical example of the use of object based encod-
ing is a news broadcast, where the news person is encoded as a 2 http://www.faqs.org/rfcs/rfc3016.html
I P B B P B B P I P B B P B B P I P B B P B B P I
I P B B P B B P I P B B P B B P I P B B P B B P I
I P B B P B B P I P B B P B B P I P B B P B B P I
Figure 4: Example of video level switching in discrete encoding.
16 the others. Each frame in the discrete encoded stream consists
slice of only one rectangular VOP of ﬁxed size,3 which implies a one
16 to one correspondence between VOPs and frames. In this sense,
... the MPEG-4 codec in VTP performs like a conventional frame-
based encoder. In the remainder of this paper the terms “VOP”
macroblock and “frame” are used interchangeably.
The VTP sender determines from which discrete stream to
send video data based on receiver feedback, and sends from that
level exclusively until a decision is made to change. The QPs
across all frames in a single level are all within a pre-deﬁned
... range. In effect, VTP adapts to one of the pre-encoded quantiza-
tion scales in the video source instead of computing the quantiz-
ers in real time during the streaming session.
In Figure 4, three discrete levels of an example streaming ses-
Figure 3: Macroblocks and slices in MPEG-4. sion are shown with corresponding average bitrates. The vari-
able in this diagram is frame size (in bytes); the frame rate and
the GOV pattern are ﬁxed between levels. The arrows indicate
video quality changes during the sent stream. The stream starts
quantization parameters (QPs) is normally from 1 to 31, with at the lowest level – 128 Kbps, and then progresses to 256 Kbps
higher values indicating more compression and lower quality. and 384 Kbps as VTP determines bandwidth share is available.
Ultimately, the bitrate of an MPEG-4 video stream is governed Later, VTP reduces the rate to 256 Kbps again as it notices con-
by the quantization scale of each DCT transformed macroblock. tention for the link. All three streams are synchronized by frame
Q. Zhang, et. al.  exploit this object based encoding struc- throughout the transmission, but only one stream is sent at any
ture by using network feedback to choose different quantizers for given time. The quality change occurs only on I-frames, since
each VOP in real time. Foreground (more important) and back- the data in the P- and B-frames is predicted from the base I-frame
ground (less important) VOPs are weighted unequally, with QP in each GOV.
values selected so that the quality of the background VOs is sac-
riﬁced ﬁrst in times of congestion. The ranges of all quantizer
values are such that the sum of bitrates of all the VOP streams 4 The Video Transport Protocol
equals the target bitrate of the whole video stream.
In contrast, VTP achieves adaptivity through a less complex A typical streaming server sends video data by dividing each
approach with considerably looser semantics and lighter pro- frame into ﬁxed size packets and adding a header containing,
cessing requirements. VTP is founded on the technique of dis-
crete video encoding, where each video level is independent of 3 That is, there is only one video object in every scene.
for example, a sequence number, the time the packet was sent
and the relative play out time of the associated frame. Upon
receiving the necessary packets to reassemble a frame, the re- TYPE
ceiver buffers the compressed frame for decoding. The decom- SEQUENCE NO.
pressed video data output from the decoder is then sent to the 32 bits
SENDER TIMESTAMP (secs)
output device. If the decoder is given an incomplete frame due to
SENDER TIMESTAMP (µsecs) TYPE
packet loss during the transmission, it may decide to discard the
frame. The mechanism used in the discarding decision is highly RECEIVER TIMESTAMP (secs) SEQUENCE NO.
decoder-speciﬁc, but the resulting playback jitter is a universal RECEIVER TIMESTAMP (µsecs) SENDER TIMESTAMP (secs)
effect. As predicted frames depend on key frames, discarding a SIZE SENDER TIMESTAMP (µsecs)
key frame can severely reduce the overall frame rate.
RECEIVER TIMESTAMP (secs)
The primary design goal of VTP is to adapt the outgoing video Video Data
RECEIVER TIMESTAMP (µsecs)
stream so that, in times of network congestion, less video data
is sent into the network and consequently fewer packets are lost SIZE
and fewer frames are discarded. VTP rests on the underlying
assumption that the smooth and timely play out of consecutive A) VTP Video Packet B) VTP Control Packet
frames is central to a human observer’s perception of video qual-
ity. Although a decrease in the video bitrate noticeably produces Figure 5: VTP packet formats for a) video packets and b) control packets.
images of coarser resolution, it is not nearly as detrimental to the
perceived video quality as inconsistent, start-stop play out. VTP
capitalizes on this idea by adjusting both the video bitrate and
its sending rate during the streaming session. In order to tailor
the video bitrate, VTP requires the same video sequence to be is used by the sender to explicitly request a control packet from
pre-encoded at several different compression levels. By switch- the receiver. For every video packets sent, the sender will mark
ing between levels during the stream, VTP makes a fundamental the TYPE ﬁeld with an ack request, to which the receiver will re-
trade-off by increasing the video compression in an effort to pre- spond with a control packet. The value of is a server option that
serve a consistent frame rate at the client. is conﬁgurable at run time by the user. The two timestamp ﬁelds
In addition to maintaining video quality, the other important for sender and receiver respectively are used for RTT measure-
factor for setting adaptivity as the main goal in the design is ment and bandwidth computation. VTP estimates the bandwidth
inter-protocol fairness. Unregulated network ﬂows pose a risk available to it on the path and then calibrates its sending rate to
to the stability and performance of the Internet in their tendency the estimate, as detailed in the following paragraphs.
to overpower TCP connections that carry the large majority of When the receiver receives a data packet with the TYPE ﬁeld
trafﬁc. While TCP halves its window in response to congestion, indicating it should send a control packet, it performs two simple
unconstrained ﬂows are under no restrictions with respect to the operations. First, it copies the header of the video packet and
amount of data they can have in the network at any time. VTP’s writes its timestamp into the appropriate ﬁelds. Second, it writes
adaptivity attempts to alleviate this problem by interacting fairly the number of bytes received since the last control packet was
with any competing TCP ﬂows. sent into the SIZE ﬁeld. The modiﬁed video packet header is
The principal features of this design, each described in the then sent back to the sender as a control packet. This minimal
following subsections, can be summarized as follows: processing absolves the receiver of bandwidth computation and
frees it for decoding and video playback, which are highly time
1. Communication between sender and receiver is a “closed constrained.
loop,” i.e. the receiver sends acknowledgments to the Upon receipt of the control packet, the sender extracts the
sender at regular intervals. value in the SIZE ﬁeld and the receiver timestamp. The sender is
2. The bandwidth of the forward path is estimated and used able to compute the time delta between control packets at the re-
by the sender to determine the sending rate. ceiver by keeping the value of one previous receiver timestamp
in memory and subtracting it from the timestamp in the most re-
3. VTP is rate based. There is no congestion window or slow cently received packet. The value of the SIZE ﬁeld divided by
start phase. this time delta is the rate currently being achieved by this stream.
This rate is also the “admissible” rate since it is the rate at which
4.1 Sender and Receiver Interaction data is getting through the path bottleneck. In essence, the mea-
sured rate is equal to the bandwidth available to the connection.
VTP follows a client/sever design where the client initiates a Thus, it is input as a bandwidth sample into the bandwidth esti-
session by requesting a video stream from the server. Once sev- mation algorithm described in the next section.
eral initialization steps are completed, the sender and receiver The sender uses its own timestamps to handle the RTT com-
communicate in a closed loop, with the sender using the ac- putation. When the sender sends a video packet with the TYPE
knowledgments to determine the bandwidth and RTT estimates. ﬁeld marked for acknowledgment, it remembers the sequence
The VTP video header and acknowledgment or “control number. If the sequence number on the returning control packet
packet” formats are shown in Figure 5. The symmetric design fa- matches the stored value (recall the receiver simply copies the
cilitates both bandwidth and RTT computation. The TYPE ﬁeld header into the control packet, changing only its own timestamp
handle the data rate increase.  examines the factors that need be tuned to compensate for large RTTs. The gradual rate in-
to be taken into account in quality changing decisions in detail. crease seeks out available bandwidth and enables VTP to “ramp
In a rate and quality decrease, the transition to DR is initiated up” the video quality if network conditions remain accommodat-
when the server receives a bandwidth estimate less than its cur- ing. This behavior parallels the additive increase phase of AIMD
rent sending rate. In DR, the server checks the reference rate so that rate increases in VTP and TCP are comparable.
of each constituent quality to ﬁnd the highest one that can ﬁt Throughout the duration of the connection, VTP estimates the
within the bandwidth estimate. The server sets its sending rate forward path bandwidth. If the bandwidth estimate falls below
to the bandwidth estimate and transitions to the state correspond- the sending rate, VTP takes this as an indication of network con-
ing to the video quality that can be supported. Unlike the state gestion and reduces its rate. In summary, the protocol behaves
transitions to increase quality levels, the decrease happens im- conservatively by slightly increasing the send rate every RTT
mediately, with no cycles or waits on the RTT timer. This con- and cutting the rate immediately upon the arrival of “bad news.”
servative behavior contributes greatly to the fairness properties
of VTP discussed in Section 6.2.
As the FSM suggests, the selection of the encoding bitrates is 5 VTP Implementation
important. VTP observes the rule that a particular video encod-
ing level must be transmitted at a rate greater than or equal to its We implemented VTP on the Linux platform and performed
bitrate and will not send slower than the rate of the lowest qual- extensive evaluations using the Dummynet link emulator .
ity encoding. This could potentially saturate the network and We developed a technique to smooth the bandwidth required by
exacerbate congestion if the lowest video bitrate is frequently the outgoing video stream and compute the client buffer require-
higher than the available bandwidth. Additionally, if the step ment for speciﬁc pre-encoded video segments. In this section we
size between each reference rate is large, more data buffering cover the software implementation of VTP and our approach to
is required at the receiver. This follows from the fact that large client buffering.
step sizes lead to the condition where VTP is sending at a rate
that is considerably higher than the video bitrate for long periods 5.1 Software Architecture
The VTP implementation effort has strived to build a fully
functioning video streaming platform. VTP software accepts
4.3 Rate Based Congestion Control standard Audio/Video Interleaved (AVI) ﬁles as input. For each
video segment, VTP requires multiple AVI ﬁles, each of a dif-
The stability of the Internet depends on the window based
ferent level of MPEG-4 compression. Two main functional units
AIMD algorithm of TCP. Any protocol that does not observe comprise the VTP architecture. A transport layer component
the AIMD scheme requires justiﬁcation to be considered viable, called NetPeer provides an interface that returns an estimate of
especially for large-scale deployment. VTP has no congestion
the bandwidth share of the connection. A middleware compo-
window, does not perform slow start, and does not halve its send- nent called FileSystemPeer manages the source video data and
ing rate on every packet loss. However, VTP uses resources in determines the sending rate based on the estimate provided by
a minimal way and relinquishes them on the ﬁrst indication of NetPeer.
congestion. Justiﬁcation for the plausibility of VTP is based
For each set of AVI ﬁles, a binary ﬁle is created that con-
mainly on the practical observation that the threat to Internet
tains the discrete encoded video along with packet delimiters to
stability is not posed by ﬂows using congestion control schemes
guide the server in selecting the right frame when a level change
that are non-compliant to AIMD, but rather by ﬂows under no
needs to be made. Figure 7 shows the ﬁelds in a single record
end-system control at all – ﬂows that are completely impervious
of the binary ﬁle. Within the ﬁle the video data is packetized
to network conditions.
and sorted ﬁrst by frame number, then by video encoding level,
It has not been proven that Internet stability requires AIMD, and ﬁnally by number of the packet within the frame. This or-
but some form of end-to-end congestion control is necessary in ganization enables the FileSystemPeer to ﬁnd the right packet
order to prevent congestion collapse . Even though VTP is on a video level change without performing “seeks” on the ﬁle.
not founded on AIMD, it is still able to fairly share links with Audio and video portions of the AVI ﬁles are de-multiplexed in
TCP competitors as evidenced by the experimental results of the process of creating the binary ﬁle and only the video data is
Section 6.2. Inter-protocol fairness of VTP notwithstanding, any stored and transmitted. Streaming audio and video in combina-
end-to-end mechanism that limits the ﬂow of the real-time trafﬁc tion with VTP is a subject of future research. Upon receiving the
in an environment where it competes with TCP is advantageous client’s request to start a stream, the FileSystemPeer opens the
from the perspective of fairness. Furthermore, unlike TCP, VTP binary ﬁle and begins to send data at the lowest quality encod-
is aimed at preserving minimum variance in delivery rate at the ing. As the session progresses, the FileSystemPeer changes the
receiver. Streaming applications that eschew TCP due to its os- video level in response to the NetPeer feedback.
cillatory steady state nature can beneﬁt from the smooth delivery The client and server communicate over two separate sockets:
rate of VTP while during times of congestion their data load on one UDP socket for data and one UDP socket for control infor-
the network will be judiciously constrained. mation. Timestamps are gathered using the Berkeley Packet Fil-
By default, VTP performs a type of congestion avoidance: it ter utility (BPF)4 running in a separate thread to minimize the in-
increases its rate by a small amount on every estimated RTT.
Normally the rate increase is one packet size per RTT, but it can 4 Available from http://www-nrg.ee.lbl.gov/.
Disk RTT Probe RTT Probe
Server FileSystemPeer Server NetPeer Client NetPeer Player
Figure 8: VTP Software Architecture.
ﬁeld bytes description video data.
timestamp 4 application level The FileSystemPeer API provides two major functions:
video playout time
sequence # 4 sequence number of packet is_eof = getPacket(qual, buffer, size);
within current frame rate = setRate(rtt_val, bw_est, qual);
frame # 4 frame number
size 4 number of bytes of video The getPacket function ﬁlls the buffer ﬁeld with a
data for this record header and size bytes of video data from video quality qual,
video level 4 video encoding level where qual corresponds to one of the pre-encoded compres-
rate 4 nominal sending rate sion levels in the binary ﬁle. A ﬂag is returned indicating if
this is the last packet in the ﬁle. The setRate function real-
for this packet
izes the algorithm in Section 4.2. The values for the parameters
frame done 1 indicates if this is the last
rtt val and bw est are provided by NetPeer (see NetPeer
packet for this frame API below). The last parameter, qual, is passed by reference
video level 4 video encoding level and is set by the setRate function and used as input in the next
I frame 1 indicates if this packet call to getPacket. It should be noted that both getPacket
belongs to an I frame and setRate maintain state between calls.
video data variable MPEG-4 compressed video data The NetPeer API provides three functions:
bw_est = getBWE();
Figure 7: Fields included in the VTP binary input ﬁle. The timestamp,
sequence number, and size ﬁelds are different from and independent of rtt_val = getRTT();
the ﬁelds of the video packet header. sendData(rate, buffer);
The sender uses getBWE to get the latest bandwidth estimate
ﬂuence of the data processing on the RTT value. The BPF allows from its NetPeer. Internally, NetPeer performs non-blocking
the user mode player and server processes to collect timestamps reads on the control socket to obtain the latest acknowledgment
at the network interface level that exclude operating system and from the receiver. From the information in the ack, it computes
protocol overhead time. The minimum measured RTT during the a bandwidth estimate which is the return value of the function.
connection is used as the RTT value in the rate adjustment algo- The sending rate can then be computed by calling the setRate
rithm. Figure 8 presents a functional diagram of the VTP soft- function of the FileSystemPeer with the bandwidth estimate as
ware architecture. Each of the two server components of VTP the second parameter. GetRTT returns the latest value of the
is independent and could potentially be used with other software RTT estimate. The sendData function determines the amount
modules. Similarly, the client NetPeer is intended to function as of time to wait from the rate parameter and then sends the
a generic plug-in to any software player that supports modular buffer containing the header and video data.
input. In this implementation we used the xine video player  In addition to these exported functions, several other func-
for Unix systems. tions are provided to handle connection initiation, opening the
A VTP software server may be implemented easily by linking source video ﬁles, and other initialization and conﬁguration
the FileSystemPeer and NetPeer modules and providing a main tasks. The parameter, the value of the heuristic variable (in
routine to form an executable. The client side NetPeer includes units of RTT), and the port numbers that VTP uses are all user
buffering capability to accommodate network level buffering of conﬁgurable.