Streaming Video over the Internet: Approaches and Directions



IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 11, NO. 3, MARCH 2001

Streaming Video over the Internet: Approaches and Directions

Dapeng Wu, Student Member, IEEE, Yiwei Thomas Hou, Member, IEEE, Wenwu Zhu, Member, IEEE, Ya-Qin Zhang, Fellow, IEEE, and Jon M. Peha, Senior Member, IEEE

Manuscript received June 15, 2000; revised Dec. 7, 2000. D. Wu and J. M. Peha are with Carnegie Mellon University, Dept. of Electrical & Computer Engineering, 5000 Forbes Ave., Pittsburgh, PA 15213, USA. Y. T. Hou is with Fujitsu Laboratories of America, 595 Lawrence Expressway, Sunnyvale, CA 94085, USA. W. Zhu and Y.-Q. Zhang are with Microsoft Research China, 5F, Beijing Sigma Center, No. 49 Zhichun Road, Haidian District, Beijing 100080, China.

Abstract—Due to the explosive growth of the Internet and increasing demand for multimedia information on the web, streaming video over the Internet has received tremendous attention from academia and industry. Transmission of real-time video typically has bandwidth, delay, and loss requirements. However, the current best-effort Internet does not offer any quality of service (QoS) guarantees to streaming video. Furthermore, for video multicast, it is difficult to achieve both efficiency and flexibility. Thus, Internet streaming video poses many challenges. To address these challenges, extensive research has been conducted. This special issue is aimed at dissemination of the contributions in the field of streaming video over the Internet. To introduce this special issue with the necessary background and provide an integral view of this field, we cover six key areas of streaming video. Specifically, we cover video compression, application-layer QoS control, continuous media distribution services, streaming servers, media synchronization mechanisms, and protocols for streaming media. For each area, we address the particular issues and review major approaches and mechanisms. We also discuss the trade-offs of the approaches and point out future research directions.

Keywords—Internet, streaming video, video compression, application-layer QoS control, continuous media distribution services, streaming server, synchronization, protocol.

I. Introduction

RECENT advances in computing technology, compression technology, high-bandwidth storage devices, and high-speed networks have made it feasible to provide real-time multimedia services over the Internet. Real-time multimedia, as the name implies, has timing constraints. For example, audio and video data must be played out continuously. If the data does not arrive in time, the playout process will pause, which is annoying to human ears and eyes.

Real-time transport of live or stored video is the predominant part of real-time multimedia. In this paper, we are concerned with video streaming, which refers to real-time transmission of stored video. There are two modes for transmission of stored video over the Internet, namely, the download mode and the streaming mode (i.e., video streaming). In the download mode, a user downloads the entire video file and then plays it back. However, full file transfer in the download mode usually suffers long and perhaps unacceptable transfer time. In contrast, in the streaming mode, the video content need not be downloaded in full, but is played out while parts of the content are being received and decoded. Due to its real-time nature, video streaming typically has bandwidth, delay, and loss requirements. However, the current best-effort Internet does not offer any quality of service (QoS) guarantees to streaming video over the Internet. In addition, it is difficult to efficiently support multicast video while providing service flexibility to meet a wide range of QoS requirements from the users. Thus, designing mechanisms and protocols for Internet streaming video poses many challenges.

To address these challenges, extensive research has been conducted. This special issue is aimed at dissemination of the contributions in the field of streaming video over the Internet. To introduce this special issue with the necessary background and give the reader a complete picture of this field, we cover six key areas of streaming video, namely, video compression, application-layer QoS control, continuous media distribution services, streaming servers, media synchronization mechanisms, and protocols for streaming media. Each of the six areas is a basic building block, with which an architecture for streaming video can be built. The relations among the six basic building blocks are illustrated in Fig. 1.

Fig. 1. An architecture for video streaming. (Raw video and audio are compressed and saved in a storage device at the streaming server; application-layer QoS control and transport protocols connect the server, through the Internet and its continuous media distribution services, to the client/receiver's transport protocols, application-layer QoS control, video/audio decoders, and media synchronization module.)

Figure 1 shows an architecture for video streaming. In Fig. 1, raw video and audio data are pre-compressed by video compression and audio compression algorithms and then saved in storage devices. Upon the client's request, a streaming server retrieves compressed video/audio data from the storage devices, and then the application-layer QoS control module adapts the video/audio bit-streams according to the network status and QoS requirements. After the adaptation, the transport protocols packetize the compressed bit-streams and send the video/audio packets to the Internet. Packets may be dropped or experience excessive delay inside the Internet due to congestion. To improve the quality of video/audio transmission, continuous media distribution services (e.g., caching) are deployed in the Internet. Packets that are successfully delivered to the receiver first pass through the transport layers and then are processed by the application layer before being decoded at the video/audio decoder. To achieve synchronization between video and audio presentations, media synchronization mechanisms are required. From Fig. 1, it can be seen that the six areas are closely related and are coherent constituents of the video streaming architecture. We briefly describe the six areas as follows.

1. Video compression. Raw video must be compressed before transmission to achieve efficiency. Video compression schemes can be classified into two categories: scalable and non-scalable video coding. Since scalable video is capable of gracefully coping with bandwidth fluctuations in the Internet [43], we are primarily concerned with scalable video coding techniques. We will also discuss the requirements imposed by streaming applications on the video encoder and decoder.

2. Application-layer QoS control. To cope with varying network conditions and the different presentation quality requested by users, various application-layer QoS control techniques have been proposed [17], [60], [66], [79]. The application-layer techniques include congestion control and error control. Their respective functions are as follows. Congestion control is employed to prevent packet loss and reduce delay. Error control, on the other hand, is to improve video presentation quality in the presence of packet loss. Error control mechanisms include forward error correction (FEC), retransmission, error-resilient encoding, and error concealment.

3. Continuous media distribution services. In order to provide quality multimedia presentations, adequate network support is crucial, because network support can reduce transport delay and packet loss ratio. Built on top of the Internet (IP protocol), continuous media distribution services are able to achieve QoS and efficiency for streaming video/audio over the best-effort Internet. Continuous media distribution services include network filtering, application-level multicast, and content replication.

4. Streaming servers. Streaming servers play a key role in providing streaming services. To offer quality streaming services, streaming servers are required to process multimedia data under timing constraints and support interactive control operations such as pause/resume, fast forward, and fast backward. Furthermore, streaming servers need to retrieve media components in a synchronous fashion. A streaming server typically consists of three subsystems, namely, a communicator (e.g., transport protocols), an operating system, and a storage system.

5. Media synchronization mechanisms. Media synchronization is a major feature that distinguishes multimedia applications from other traditional data applications. With media synchronization mechanisms, the application at the receiver side can present the various media streams in the same way as they were originally captured. An example of media synchronization is that the movements of a speaker's lips match the played-out audio.

6. Protocols for streaming media. Protocols are designed and standardized for communication between clients and streaming servers. Protocols for streaming media provide such services as network addressing, transport, and session control. According to their functionalities, the protocols can be classified into three categories: network-layer protocols such as the Internet Protocol (IP), transport protocols such as the User Datagram Protocol (UDP), and session control protocols such as the Real-Time Streaming Protocol (RTSP).

The remainder of this paper is devoted to the exposition of the above six areas. Section II discusses video compression techniques. In Section III, we present application-layer QoS control mechanisms for streaming video. Section IV describes continuous media distribution services. In Section V, we discuss key issues in the design of streaming servers. Section VI presents various media synchronization mechanisms. In Section VII, we overview key protocols for streaming video. Section VIII summarizes this paper and points out future research directions.

II. Video Compression

Since raw video consumes a large amount of bandwidth, compression is usually employed to achieve transmission efficiency. In this section, we discuss various compression approaches and the requirements imposed by streaming applications on the video encoder and decoder.

Basically, video compression schemes can be classified into two approaches: scalable and non-scalable video coding. For simplicity, we will only show the encoder and decoder in intra-mode¹ and only use the discrete cosine transform (DCT). For wavelet-based scalable video coding, please refer to Refs. [36], [62], [73], [74] and references therein.

Fig. 2. (a) A non-scalable video encoder; (b) a non-scalable video decoder. (DCT: discrete cosine transform; Q: quantization; VLC: variable-length coding; VLD: variable-length decoding; IQ: inverse quantization; IDCT: inverse DCT.)

Fig. 3. (a) An SNR-scalable encoder; (b) an SNR-scalable decoder.

A non-scalable video encoder (see Fig. 2(a)) generates one compressed bit-stream. In contrast, a scalable video encoder compresses a raw video sequence into multiple substreams (see Fig. 3(a)). One of the compressed substreams is the base substream, which can be independently decoded and provides coarse visual quality. The other compressed substreams are enhancement substreams, which can only be decoded together with the base substream and can provide better visual quality. The complete bit-stream (i.e., the combination of all the substreams) provides the highest quality. Specifically, compared with decoding the complete bit-stream (Fig. 4(a)), decoding the base substream or multiple substreams produces pictures with degraded quality (Fig. 4(b)), a smaller image size (Fig. 4(c)), or a lower frame rate (Fig. 4(d)). The scalabilities of quality, image size, and frame rate are called SNR, spatial, and temporal scalability, respectively. These three scalabilities are the basic scalable mechanisms. There can also be combinations of the basic mechanisms, such as spatiotemporal scalability [27].

Fig. 4. Scalable video: (a) video frames reconstructed from the complete bit-stream; (b) video frames with degraded quality; (c) video frames with a smaller image size; (d) video frames with a lower frame rate.

To provide more flexibility in meeting different demands of streaming (e.g., different access link bandwidths and different latency requirements), a new scalable coding mechanism, called fine granularity scalability (FGS), was proposed to MPEG-4 [37], [38], [39]. As shown in Fig. 5(a), an FGS encoder compresses a raw video sequence into two substreams, i.e., a base-layer bit-stream and an enhancement bit-stream. Different from an SNR-scalable encoder, an FGS encoder uses bitplane coding² to represent the enhancement stream (see Fig. 6). With bitplane coding, an FGS encoder is capable of achieving continuous rate control for the enhancement stream, because the enhancement bit-stream can be truncated anywhere to achieve the target bit-rate.

A variation of FGS is progressive fine granularity scalability (PFGS) [72]. PFGS shares the good features of FGS, such as fine-granularity bit-rate scalability and error resilience. Unlike FGS, which only has two layers, PFGS can have more than two layers. The essential difference between FGS and PFGS is that FGS only uses the base layer as a reference for motion prediction, while PFGS uses multiple layers as references to reduce the prediction error, resulting in higher coding efficiency.

¹ Intra-mode coding refers to coding a video unit (e.g., a macroblock) without any reference to previously coded data.

² Bitplane coding uses embedded representations [60]. For example, there are 64 DCT coefficients, and each DCT coefficient ranges from 0 to 127, i.e., its value can be represented by 7 bits. Each DCT coefficient has a most significant bit (MSB), and all the MSBs from the 64 DCT coefficients form Bitplane 0 (see Fig. 6). Similarly, all the second most significant bits form Bitplane 1.
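The bitplane representation of footnote 2, and the way truncating bitplanes yields continuous rate control, can be sketched as follows. This is a toy illustration (4 coefficients instead of 64; the function names are ours), assuming 7-bit coefficient magnitudes as in the footnote:

```python
# Sketch of the bitplane representation described in footnote 2
# (illustrative; assumes 7-bit coefficient magnitudes in 0..127).
def bitplanes(coeffs, nbits=7):
    """Bitplane k holds bit (nbits-1-k) of every coefficient, so
    Bitplane 0 is the MSBs and Bitplane nbits-1 the LSBs."""
    return [[(c >> (nbits - 1 - k)) & 1 for c in coeffs]
            for k in range(nbits)]

def reconstruct(planes, nbits=7):
    """Rebuild coefficient values from however many bitplanes survived
    truncation; missing low-order planes are treated as zero."""
    vals = [0] * len(planes[0])
    for k, plane in enumerate(planes):
        for i, bit in enumerate(plane):
            vals[i] |= bit << (nbits - 1 - k)
    return vals

coeffs = [93, 5, 0, 127]          # toy "enhancement" DCT coefficients
planes = bitplanes(coeffs)
# Truncating the enhancement stream after 3 bitplanes coarsens the
# coefficients but still yields a decodable approximation:
print(reconstruct(planes[:3]))    # [80, 0, 0, 112]
```

Every extra bitplane transmitted refines the approximation, which is why the enhancement bit-stream can be cut at any byte to hit a target rate.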
Fig. 5. (a) An FGS encoder; (b) an FGS decoder. (Relative to the SNR-scalable codec of Fig. 3, the enhancement path is coded by finding the maximum bitplane, bitplane shifting, and bitplane VLC.)

Fig. 6. Bitplanes of enhancement DCT coefficients. (Bitplane 0 collects the most significant bits of DCT coefficients 0 to 63; Bitplane k collects the least significant bits.)

Next, we describe various requirements imposed by streaming applications on the video encoder and decoder, and briefly discuss some techniques that address these requirements.

Bandwidth. To achieve acceptable perceptual quality, a streaming application typically has a minimum bandwidth requirement. However, the current Internet does not provide bandwidth reservation to support this requirement. In addition, it is desirable for video streaming applications to employ congestion control to avoid congestion, which happens when the network is heavily loaded. For video streaming, congestion control takes the form of rate control, that is, adapting the sending rate to the available bandwidth in the network. Compared with non-scalable video, scalable video is more adaptable to the varying available bandwidth in the network.

Delay. Streaming video requires bounded end-to-end delay so that packets can arrive at the receiver in time to be decoded and displayed. If a video packet does not arrive in time, the playout process will pause, which is annoying to human eyes. A video packet that arrives beyond its delay bound (say, after its playout time) is useless and can be regarded as lost. Since the Internet introduces time-varying delay, a buffer is usually introduced at the receiver before decoding to provide continuous playout [13].

Loss. Packet loss is inevitable in the Internet, and packet loss can damage pictures, which is displeasing to human eyes. Thus, it is desirable that a video stream be robust to packet loss. Multiple description coding is one such compression technique for dealing with packet loss [68].

Video-cassette-recorder (VCR)-like functions. Some streaming applications require VCR-like functions such as stop, pause/resume, fast forward, fast backward, and random access. Lin et al. [40] proposed a dual-bit-stream least-cost scheme to efficiently provide VCR-like functionality for MPEG video streaming.

Decoding complexity. Some devices, such as cellular phones and personal digital assistants (PDAs), require low power consumption. Therefore, streaming video applications running on these devices must be simple; in particular, low decoding complexity is desirable. To address this issue, Lin et al. [40] employed a least-cost scheme to reduce decoding complexity.

We have discussed various compression mechanisms and the requirements imposed by streaming applications on the video encoder and decoder. Next, we present the application-layer QoS control mechanisms, which adapt the video bit-streams according to the network status and QoS requirements.

III. Application-layer QoS Control

The objective of application-layer QoS control is to avoid congestion and maximize video quality in the presence of packet loss. The application-layer QoS control techniques include congestion control and error control. These techniques are employed by the end systems and do not require any QoS support from the network.

We organize the rest of this section as follows. In Section III-A, we survey the approaches for congestion control; Section III-B describes mechanisms for error control.

A. Congestion Control

Bursty loss and excessive delay have a devastating effect on video presentation quality, and they are usually caused by network congestion. Thus, congestion control mechanisms at end systems are necessary to help reduce packet loss and delay. For streaming video, congestion control takes the form of rate control [71]. Typically, rate control attempts to minimize the possibility of network congestion by matching the rate of the video stream to the available network bandwidth. Next, we present various approaches for rate control in Section III-A.1 and describe an associated technique called rate shaping in Section III-A.2.

A.1 Rate Control

Rate control is a technique used to determine the sending rate of video traffic based on the estimated available bandwidth in the network. Existing rate control schemes can
be classified into three categories: source-based, receiver-based, and hybrid rate control, which are presented as follows.

Fig. 7. (a) Unicast video distribution using multiple point-to-point connections; (b) multicast video distribution using point-to-multipoint transmission.

Source-based Rate Control. Under source-based rate control, the sender is responsible for adapting the video transmission rate. Typically, feedback is employed by source-based rate control mechanisms; based upon the feedback information about the network, the sender can regulate the rate of the video stream. Source-based rate control can be applied to both unicast (see Fig. 7(a)) [70] and multicast (see Fig. 7(b)) [4].

For unicast video, existing source-based rate control mechanisms follow two approaches: the probe-based approach and the model-based approach [71].

The probe-based approach is based on probing experiments. Specifically, the source probes for the available network bandwidth by adjusting the sending rate in a way that keeps the packet loss ratio p below a certain threshold P_th [70]. There are two ways to adjust the sending rate: (1) additive increase and multiplicative decrease [70], and (2) multiplicative increase and multiplicative decrease [64].

The model-based approach is based on a throughput model of a transmission control protocol (TCP) connection. Specifically, the throughput of a TCP connection can be characterized by the following formula [20]:

    λ = (1.22 × MTU) / (RTT × √p),        (1)

where λ is the throughput of the TCP connection, MTU (maximum transit unit) is the packet size used by the connection, RTT is the round-trip time for the connection, and p is the packet loss ratio experienced by the connection. For example, with MTU = 1500 bytes (12,000 bits), RTT = 100 ms, and p = 0.01, Eq. (1) gives λ = 1.22 × 12,000 / (0.1 × 0.1) ≈ 1.46 Mb/s. Under model-based rate control, Eq. (1) is used to determine the sending rate of the video stream. Thus, the video connection can avoid congestion in a way similar to TCP, and it can compete fairly with TCP flows. For this reason, model-based rate control is also called TCP-friendly rate control [20].

For multicast under source-based rate control, the sender uses a single channel to transport video to the receivers (see Fig. 7(b)). Such multicast is called single-channel multicast, and for it only probe-based rate control can be employed [4].

Fig. 8. Trade-off between efficiency and flexibility. (Single-channel multicast achieves high bandwidth efficiency but low service flexibility; unicast achieves high flexibility but low efficiency; receiver-based and hybrid rate control fall in between.)

Single-channel multicast is efficient since all the receivers share one channel. However, single-channel multicast is unable to provide flexible services to meet the different demands of receivers with various access link bandwidths. In contrast, if multicast video were delivered through individual unicast streams, the bandwidth efficiency would be low but the services could be differentiated, since each receiver can negotiate the service parameters with the source. Unicast and single-channel multicast are the two extreme cases shown in Fig. 8. To achieve a good trade-off between bandwidth efficiency and service flexibility for multicast video, receiver-based and hybrid rate control were proposed.

Receiver-based Rate Control. Under receiver-based rate control, the receivers regulate the receiving rate of video streams by adding/dropping channels, while the sender does not participate in rate control [71]. Typically, receiver-based rate control is used in multicasting scalable video, where there are several layers in the scalable video and each layer corresponds to one channel in the multicast tree.

Similar to source-based rate control, existing receiver-based rate control mechanisms follow two approaches: the probe-based approach and the model-based approach. The basic probe-based rate control consists of two parts [43]: (1) When no congestion is detected, a receiver probes for
the available bandwidth by joining a layer (channel), resulting in an increase of its receiving rate. If no congestion is detected after the joining, the join-experiment is successful. Otherwise, the receiver drops the newly added layer. (2) When congestion is detected, a receiver drops a layer (i.e., leaves a channel), resulting in a reduction of its receiving rate.
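The two-part join/leave probing just described can be sketched as follows. This is a minimal illustration under our own simplifications: the congestion test is abstracted into a boolean (in practice it would be driven by observed packet loss), and a single probing step is shown rather than the full multicast machinery:

```python
# Sketch of receiver-driven layer probing for multicast scalable video
# (illustrative; "congested" stands in for a real loss-based detector).
def probe_step(subscribed_layers, total_layers, congested):
    """One probing step: join a layer when no congestion is detected,
    drop the top layer when congestion is detected."""
    if congested:
        # (2) congestion detected: leave the top channel,
        # but always keep the independently decodable base layer
        return max(subscribed_layers - 1, 1)
    if subscribed_layers < total_layers:
        # (1) no congestion: join one more layer (a "join-experiment")
        return subscribed_layers + 1
    return subscribed_layers

layers = 1
for congested in [False, False, True, False]:
    layers = probe_step(layers, total_layers=4, congested=congested)
print(layers)   # prints 3 (subscription history: 1 -> 2 -> 3 -> 2 -> 3)
```

Because each layer maps to one multicast channel, the receiver adapts its rate purely through group membership, with no sender involvement.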
Unlike the probe-based approach, which implicitly estimates the available network bandwidth through probing experiments, the model-based approach uses explicit estimation of the available network bandwidth. The model-based approach is likewise based on Eq. (1).

Hybrid Rate Control. Under hybrid rate control, the receivers regulate the receiving rate of video streams by adding/dropping channels, while the sender also adjusts the transmission rate of each channel based on feedback from the receivers. Examples of hybrid rate control include the destination set grouping [10] and a layered multicast scheme [29].

A.2 Rate Shaping

The objective of rate shaping is to match the rate of a pre-compressed video bit-stream to the target rate constraint [17]. A rate shaper (or filter), which performs rate shaping, is required for source-based rate control (see Fig. 9), because the stored video may be pre-compressed at a certain rate that does not match the available bandwidth in the network.

Fig. 9. An architecture for source-based rate control. (A streaming server retrieves compressed video from a storage device, adapts it in a rate shaper driven by source-based rate control, and sends it through transport protocols over the Internet.)

There are many types of filters, such as the codec filter, frame-dropping filter, layer-dropping filter, frequency filter, and re-quantization filter [75], which are described as follows.

A codec filter decompresses and re-compresses a video stream. It is commonly used to perform transcoding between different compression schemes. Depending on the compression scheme used, transcoding can be simplified without full decompression and recompression.

A frame-dropping filter can distinguish the frame types (e.g., I-, P-, and B-frames in MPEG) and drop frames according to importance. For example, the dropping order would be first the B-frames, then the P-frames, and finally the I-frames. The frame-dropping filter is used to reduce the data rate of a video stream by discarding a number of frames and transmitting the remaining frames at a lower rate. It can be used at the source [80] or in the network (see Section IV-A).

A layer-dropping filter can distinguish the layers and drop layers according to importance. The dropping order is from the highest enhancement layer down to the base layer.

A frequency filter performs operations on the compression layer; specifically, it operates in the frequency domain (i.e., on DCT coefficients). Frequency filtering mechanisms include low-pass filtering, color reduction filtering, and color-to-monochrome filtering. Low-pass filtering discards the DCT coefficients of the higher frequencies. A color reduction filter performs the same operation as a low-pass filter except that it operates only on the chrominance information in the video stream. A color-to-monochrome filter removes all color information from the video stream; in MPEG, this is done by replacing each chrominance block with an empty block. Unlike the frame-dropping filter, the frequency filter reduces the bandwidth without affecting the frame rate; its cost is a reduction in the presentation quality of the resulting frames.

A re-quantization filter also operates on the compression layer (i.e., on DCT coefficients). The filter first extracts the DCT coefficients from the compressed video stream through techniques like dequantization, and then re-quantizes the DCT coefficients with a larger quantization step, resulting in rate reduction.

In sum, the purpose of congestion control is to avoid congestion. On the other hand, packet loss is inevitable in the Internet and may have a significant impact on perceptual quality. This prompts the need for mechanisms that maximize video presentation quality in the presence of packet loss. Error control is such a mechanism, and it is presented next.

B. Error Control

Error control mechanisms include FEC, retransmission, error-resilient encoding, and error concealment, which are described in Sections III-B.1 to III-B.4, respectively.

B.1 FEC

The principle of FEC is to add redundant information so that the original message can be reconstructed in the presence of packet loss. Based on the kind of redundant information added, we classify existing FEC schemes into three categories: channel coding, source coding-based FEC, and joint source/channel coding [71].

For Internet applications, channel coding is typically used in the form of block codes. Specifically, a video stream is first chopped into segments, each of which is packetized into k packets; then, for each segment, a block code (e.g., a Tornado code [1]) is applied to the k packets to generate an n-packet block, where n > k. To perfectly recover a segment, a user only needs to receive any k packets of the n-packet block.

Source coding-based FEC (SFEC) is a recently devised variant of FEC for Internet video [5]. Like channel coding, SFEC also adds redundant information to recover from loss. For example, the nth packet contains the nth group-of-blocks (GOB) and redundant information about the (n−1)th GOB, which is a compressed version of the (n−1)th GOB with a larger quantizer.
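The any-k-of-n recovery property of block-code FEC described above can be illustrated with a toy (n, k) = (3, 2) code that appends one XOR parity packet. This is a stand-in for the real Tornado or Reed-Solomon-style codes cited in the text, and the helper names are ours:

```python
# Toy (n, k) = (3, 2) block code: two data packets plus one XOR parity
# packet. Any k = 2 of the n = 3 packets recover the whole segment.
# (Illustrative stand-in for Tornado/Reed-Solomon-style codes.)
def encode(p0, p1):
    parity = bytes(a ^ b for a, b in zip(p0, p1))
    return {0: p0, 1: p1, 2: parity}     # index -> packet payload

def decode(received):
    """received: dict of surviving packets (at least k = 2 entries)."""
    if 0 in received and 1 in received:  # both data packets arrived
        return received[0], received[1]
    parity = received[2]
    if 0 in received:                    # packet 1 lost: XOR it back
        return received[0], bytes(a ^ b for a, b in zip(received[0], parity))
    return bytes(a ^ b for a, b in zip(received[1], parity)), received[1]

block = encode(b"AB", b"CD")
del block[1]                             # lose one packet of the block
print(decode(block))                     # (b'AB', b'CD')
```

One parity packet tolerates any single loss per block; production codes generalize this to arbitrary n − k losses at the price of extra bandwidth.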
the (n−1)th GOB, which is a compressed version of the (n−1)th GOB with a larger quantizer. Joint source/channel coding is an approach to optimal rate allocation between source coding and channel coding [71].

B.2 Delay-constrained Retransmission

Retransmission is usually dismissed as a method to recover lost packets in real-time video, since a retransmitted packet may miss its play-out time. However, if the one-way trip time is short with respect to the maximum allowable delay, a retransmission-based approach called delay-constrained retransmission is a viable option for error control.

For unicast, the receiver can perform the following delay-constrained retransmission scheme. When the receiver detects the loss of packet N:

    if Tc + RTT + Ds < Td(N), send the request for packet N to the sender;

where Tc is the current time, RTT is an estimated round-trip time, Ds is a slack term, and Td(N) is the time when packet N is scheduled for display. The slack term Ds may include tolerance of error in estimating RTT, the sender's response time, and the receiver's decoding delay. The timing diagram for receiver-based control is shown in Fig. 10, where Ds is only the receiver's decoding delay. It is clear that the objective of the delay-constrained retransmission is to suppress requests of retransmissions that will not arrive in time for display.

Fig. 10. Timing diagram for receiver-based control.

B.3 Error-resilient Encoding

The objective of error-resilient encoding is to enhance the robustness of compressed video to packet loss. The standardized error-resilient encoding schemes include re-synchronization marking, data partitioning, and data recovery [33]. However, these schemes are targeted at error-prone environments like wireless channels and may not be applicable to the Internet environment. For video transmission over the Internet, the boundary of a packet already provides a synchronization point in the variable-length coded bit-stream at the receiver side. On the other hand, since a packet loss may cause the loss of all the motion data and its associated shape/texture data, mechanisms such as re-synchronization marking, data partitioning, and data recovery may not be useful for Internet video applications. Therefore, we will not present the standardized error-resilient tools. Instead, we present multiple description coding (MDC) [47], [68], which is promising for robust Internet video transmission.

With MDC, a raw video sequence is compressed into multiple streams (referred to as descriptions) as follows: each description provides acceptable visual quality; more combined descriptions provide a better visual quality. The advantages of MDC are (1) robustness to loss: even if a receiver gets only one description (other descriptions being lost), it can still reconstruct video with acceptable quality; (2) enhanced quality: if a receiver gets multiple descriptions, it can combine them together to produce a better reconstruction than that produced from any one of them.

However, these advantages come with a cost. To make each description provide acceptable visual quality, each description must carry sufficient information about the original video. This reduces the compression efficiency compared to conventional single description coding (SDC). In addition, although more combined descriptions provide a better visual quality, a certain degree of correlation between the multiple descriptions has to be embedded in each description, resulting in further reduction of the compression efficiency. Further investigation is needed to find a good trade-off between the compression efficiency and the reconstruction quality from one description.

B.4 Error Concealment

Error-resilient encoding is executed by the source to enhance robustness of compressed video before packet loss actually happens (this is called a preventive approach). On the other hand, error concealment is performed by the receiver when packet loss has already occurred (this is called a reactive approach). Specifically, error concealment is employed by the receiver to conceal the lost data and make the presentation less displeasing to human eyes.

There are two basic approaches to error concealment, namely, spatial and temporal interpolation. In spatial interpolation, missing pixel values are reconstructed using neighboring spatial information. In temporal interpolation, the lost data is reconstructed from data in the previous frames. Typically, spatial interpolation is used to reconstruct the missing data in intra-coded frames, while temporal interpolation is used to reconstruct the missing data in inter-coded frames.

In recent years, numerous error-concealment schemes have been proposed in the literature (refer to Ref. [69] for a good survey). Examples include maximally smooth recovery [67], projection onto convex sets [58], and various motion vector and coding mode recovery methods such as motion compensated temporal prediction [25].

So far, we have reviewed various application-layer QoS control techniques. These techniques are employed by the end systems and do not require any QoS support from the network. If the network is able to support QoS for video streaming, the performance can be further enhanced. In the next section, we present network QoS support mechanisms, which are based on the best-effort Internet.
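As a concrete illustration of the delay-constrained retransmission test in Section III-B.2, the following sketch implements the receiver-side decision rule Tc + RTT + Ds < Td(N). The function name and the example timing values are illustrative assumptions, not part of the original scheme:

```python
def should_request_retransmission(t_now, rtt_estimate, slack, t_display):
    """Receiver-side test (Section III-B.2): request packet N only if the
    retransmitted copy can arrive before its scheduled display time Td(N).
    slack (Ds) covers RTT estimation error, sender response time, and
    the receiver's decoding delay."""
    return t_now + rtt_estimate + slack < t_display

# Example: packet lost at t = 1.00 s, RTT estimate 80 ms,
# slack 20 ms, display scheduled at t = 1.50 s -> worth requesting.
print(should_request_retransmission(1.00, 0.08, 0.02, 1.50))  # True
# Detected too late (t = 1.45 s) -> the request is suppressed.
print(should_request_retransmission(1.45, 0.08, 0.02, 1.50))  # False
```

Suppressing requests that cannot arrive in time avoids wasting bandwidth on packets that would be discarded at the receiver anyway.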
IV. Continuous Media Distribution Services

In order to provide quality multimedia presentations, adequate support from the network is critical. This is because network support can reduce transport delay and packet loss ratio. Streaming video and audio are classified as continuous media because they consist of a sequence of media quanta (such as audio samples or video frames), which convey meaningful information only when presented in time. Built on top of the Internet (IP protocol), continuous media distribution services are designed with the aim of providing QoS and achieving efficiency for streaming video/audio over the best-effort Internet. Continuous media distribution services include network filtering, application-level multicast, and content replication, which are presented in Sections IV-A to IV-C, respectively.

A. Network Filtering

As a congestion control technique, network filtering is aimed at maximizing video quality during network congestion. As described in Section III-A.2, the filter at the video server can adapt the rate of video streams according to the network congestion status. However, the video server may be too busy to handle the computation required to adapt each unicast video stream. Hence, the service providers may like to place filters in the network [32]. Figure 11 illustrates an example of placing filters in the network. The nodes labeled R denote routers, which have no knowledge of the format of the media streams and may randomly discard packets. The filter nodes receive the client's requests and adapt the stream sent by the server accordingly. This solution allows the service provider to place filters on the nodes that connect to network bottlenecks. Furthermore, multiple filters can be placed along the path from a server to a client.

Fig. 11. Filters placed inside the network.

To illustrate the operations of filters, a system model is depicted in Fig. 12 [32]. The model consists of the server, the client, at least one filter, and two virtual channels between them. Of the two virtual channels, one is for control and the other is for data. The same channels exist between any pair of filters. The control channel is bi-directional, which can be realized by TCP connections. The model shown in Fig. 12 allows the client to communicate with only one host (the last filter), which will either forward the requests or act upon them. The operations of a filter on the data plane include: (1) receiving the video stream from the server or the previous filter, and (2) sending video to the client or the next filter at the target rate. The operations of a filter on the control plane include: (1) receiving requests from the client or the next filter, (2) acting upon requests, and (3) forwarding the requests to its previous filter.

Fig. 12. A system model of network filtering.

Typically, frame-dropping filters (see Section III-A.2) are used as network filters. The receiver can change the bandwidth of the media stream by sending requests to the filter to increase or decrease the frame dropping rate. To facilitate decisions on whether the filter should increase or decrease the bandwidth, the receiver continuously measures the packet loss ratio p. Based on the packet loss ratio, a rate control mechanism can be designed as follows [32]. If the packet loss ratio is higher than a threshold α, the client will ask the filter to increase the frame dropping rate. If the packet loss ratio is less than another threshold β, the receiver will ask the filter to reduce the frame dropping rate.

The advantages of using frame-dropping filters inside the network include: (1) improved video quality. For example, when a video stream flows from an upstream link with larger available bandwidth to a downstream link with smaller available bandwidth, use of a frame-dropping filter at the connection point between the upstream link and the downstream link could help improve the video quality. This is because the filter understands the format of the media stream and can drop packets in a way that gracefully degrades the stream's quality instead of corrupting the flow outright. (2) bandwidth efficiency. This is because the filtering can help to save network resources by discarding those frames that are late.

B. Application-level Multicast

The Internet's original design, while well suited for point-to-point applications like e-mail, file transfer, and Web browsing, fails to effectively support large-scale content delivery like streaming-media multicast. In an attempt to address this shortcoming, a technology called IP multicast was proposed twelve years ago [14]. As an extension to the IP layer, IP multicast is capable of providing efficient multipoint packet delivery. To be specific, the efficiency is achieved by having one and only one copy of the original IP packet (sent by the multicast source) be transported along any physical path in the IP multicast tree. However, after a decade of research and development, there are still many barriers in deploying IP multicast. These problems include scalability, network management, deployment, and support
for higher-layer functionality (e.g., error, flow, and congestion control). To address these issues, an application-level multicast mechanism was proposed [19].

The application-level multicast is aimed at building a multicast service on top of the Internet. It enables independent content delivery service providers (CSPs), Internet service providers (ISPs), or enterprises to build their own Internet multicast networks and interconnect them into larger, world-wide media multicast networks. That is, the media multicast networks could support peering relationships at the application level or the streaming-media content layer, where content backbones interconnect service providers. Hence, much as the Internet is built from an interconnection of networks enabled through IP-level peering relationships among ISPs, the media multicast networks can be built from an interconnection of content-distribution networks enabled through application-level peering relationships among various sorts of service providers, e.g., traditional ISPs, CSPs, and application service providers (ASPs).

We briefly describe the operation of the media multicast networks as follows. In the media multicast networks, each multicast-capable node (called a MediaBridge [19]) performs routing at the application layer. In addition, each MediaBridge is interconnected with one or more neighboring MediaBridges through explicit configuration, which defines the application-level overlay topology. Collectively, the MediaBridges in a media multicast network employ a distributed application-level multicast routing algorithm to determine the optimal virtual paths for propagating content throughout the network. When the underlying network fails or becomes overly congested, the media multicast network automatically and dynamically re-routes content via alternate paths according to application-level routing policies. In addition, MediaBridges dynamically subscribe to multicast content when and only when a downstream client requests it. This capability ensures that one and only one copy of the multicast content flows across any physical or virtual path independent of the number of downstream clients, resulting in savings of network bandwidth.

The advantage of the application-level multicast is that it breaks the barriers such as scalability, network management, and support for congestion control, which have prevented ISPs from establishing IP multicast peering arrangements.

C. Content Replication

An important technique for improving scalability of the media delivery system is content/media replication. The content replication takes two forms, namely, caching and mirroring, which are deployed by publishers, CSPs, and ISPs. Both caching and mirroring seek to place content closer to the clients, and both share the following advantages: reduced bandwidth consumption on network links, reduced load on streaming servers, reduced latency for clients, and increased availability.

Mirroring is to place copies of the original multimedia files on other machines scattered around the Internet. That is, the original multimedia files are stored on the main server while copies of the original multimedia files are placed on the duplicate servers. In this way, clients can retrieve multimedia data from the nearest duplicate server, which gives the clients the best performance (e.g., lowest latency). Mirroring has some disadvantages. Currently, mechanisms for establishing dedicated mirrors are expensive, ad hoc, and slow. In addition, establishing a mirror on an existing server, while cheaper, is still an ad hoc and administratively complex process. Finally, there is no standard way to make scripts and server setup easily transferable from one server to another.

Caching, which is based on the belief that different clients will load many of the same contents, makes local copies of contents that the clients retrieve. Typically, clients in a single organization retrieve all contents from a single local machine, called a cache. The cache retrieves a video file from the origin server, storing a copy locally and then passing it on to the client who requests it. If a client asks for a video file which the cache has already stored, the cache will return the local copy rather than going all the way to the origin server where the video file resides. In addition, cache sharing and cache hierarchies allow each cache to access files stored at other caches, so that the load on the origin server can be reduced and network bottlenecks can be alleviated [8], [18].

Most of the techniques for caching are targeted at generic web objects. Some recent work demonstrated that caching strategies that are specific to particular types of objects can help improve the overall performance [44]. For this reason, a lot of efforts have been contributed along this direction [49], [54], [76], [81]. A trivial extension of caching techniques to video is to store complete video sequences in the cache. However, such an approach may not be applicable due to the large volume of video data and the possibly limited cache space on a proxy server. Instead, it was shown that even a few cached frames can contribute to significant improvement in performance [44]. Miao and Ortega [44] proposed two video caching strategies, initial caching and selective caching, which store part of the video stream on the cache. They demonstrated that selective caching can maximize the robustness of the video stream against network congestion while not violating the limited decoder buffer size.

To increase the cache hit rate and reduce the latency experienced by the clients, Kermode [35] proposed to use hints to assist the cache in scheduling data retrieval (e.g., prefetch) and replacement of the cache content. The hints can be classified into two categories: content hints and application hints. Content hints are provided by the content's sender while application hints are provided by the receiving application. Content hints describe the data and the way it is delivered. For example, the content hints can inform receiving caches about when an object's data can be deleted from the cache. Application hints describe the receiving application's needs for the object's data.
An instance of application hints is to describe the application's needs for data in the immediate future so that the cache can prefetch data for the application. In addition, for the multicast scenario, content and application hints can be used by the cache to determine which multicast channels should be joined, when the join should occur, and for how long the cache should listen.

V. Streaming Servers

Streaming servers play a key role in providing streaming services. To offer quality streaming services, streaming servers are required to process multimedia data under timing constraints in order to prevent artifacts (e.g., jerkiness in video motion and pops in audio) during playback at the clients. In addition, streaming servers also need to support VCR-like control operations such as stop, pause/resume, fast forward, and fast backward. Furthermore, streaming servers have to retrieve media components in a synchronous fashion. For example, retrieving a lecture presentation requires synchronizing video and audio with the lecture slides.

A streaming server typically consists of the following three subsystems.
1. Communicator. A communicator involves the application layer and transport protocols implemented on the server (shown in Fig. 1). Through a communicator, the clients can communicate with a server and retrieve multimedia contents in a continuous and synchronous manner. We have addressed the application layer in Section III and will address transport protocols in Section VII.
2. Operating system. Different from traditional operating systems, an operating system for streaming services needs to satisfy real-time requirements for streaming applications.
3. Storage system. A storage system for streaming services has to support continuous media storage and retrieval.

In this section, we are primarily concerned with operating system support and storage systems for streaming media, which will be presented in Sections V-A and V-B, respectively.

A. Real-time Operating System

The operating system shields the computer hardware from all other software. The operating system offers various services related to the essential resources, such as the CPU, main memory, storage, and all input and output devices. In the following sections, we discuss the unique issues of real-time operating systems and review the associated approaches to the problems introduced by streaming services. Specifically, Section V-A.1 shows how process management takes into account the timing requirements imposed by streaming media and applies appropriate scheduling methods; Section V-A.2 describes how to manage resources to accommodate timing requirements; Section V-A.3 discusses the issues on file management.

A.1 Process Management

Process management deals with the main processor resource [57]. The process manager maps single processes onto the CPU resource according to a specified scheduling policy such that all processes can meet their requirements. To fulfill the timing requirements of continuous media, the operating system must use real-time scheduling techniques. Most attempts to solve real-time scheduling problems are variations of two basic algorithms for multimedia systems: earliest deadline first (EDF) [42] and rate-monotonic scheduling [12].

In EDF scheduling, each task is assigned a deadline and the tasks are processed in the order of increasing deadlines. In rate-monotonic scheduling, each task is assumed to be periodic and is assigned a static priority according to its request rate. Specifically, the task with the shortest period (or the highest rate) gets the highest priority, and the task with the longest period (or the lowest rate) gets the lowest priority. Then the tasks are processed in the order of their priorities.

Both EDF and rate-monotonic scheduling are preemptive, that is, the schedulers can preempt the running task and schedule the new task for the processor based on its deadline/priority. The execution of the interrupted task will resume at a later time. The difference between EDF and rate-monotonic scheduling is as follows. An EDF scheduler is based on a single-priority task queue, and the processor runs the task with the earliest deadline. On the other hand, a rate-monotonic scheduler is a static-priority scheduler with multiple-priority task queues. That is, the tasks in a lower-priority queue cannot be executed until all the tasks in the higher-priority queues are served.

In the example of Fig. 13, there are two task sequences. The high-rate sequence is Task 1 to Task 8; the low-rate sequence is Task A to Task D. As shown in Fig. 13, in rate-monotonic scheduling, Task 2 preempts Task A since Task 2 has a higher priority; on the other hand, in EDF, Task 2 does not preempt Task A since Task A and Task 2 have the same deadline (dA = d2). It is clear that a rate-monotonic scheduler is more prone to task switching than EDF. In general, the rate-monotonic algorithm ensures that all deadlines will be met if the processor utilization is under 69 percent [12]; the EDF algorithm can achieve 100 percent utilization of the processor but may not guarantee the processing of some tasks during overload periods.

Fig. 13. EDF versus rate-monotonic scheduler.
A.2 Resource Management

Resources in a multimedia server include CPUs, memories, and storage devices. Since resources are limited, a multimedia server can only serve a limited number of clients with the requested QoS. Therefore, resource management is required to manage resources so as to accommodate timing requirements. Resource management involves admission control and resource allocation. Specifically, before admitting a new client, a multimedia server must perform an admission control test to decide whether a new connection can be admitted without violating performance guarantees already committed to existing connections. If a connection is accepted, the resource manager allocates the resources required to meet the QoS for the new connection.

Admission control algorithms can be classified into two categories: deterministic admission control [22] and statistical admission control [65]. Deterministic admission control algorithms provide hard guarantees to clients while statistical admission control algorithms provide statistical guarantees to clients (i.e., the continuity requirements of at least a fixed percentage of media units are ensured to be met). The advantages of deterministic admission control are simplicity and strict assurance of quality; its limitation is lower utilization of server resources. In contrast to this, statistical admission control improves the utilization of server resources by exploiting human perceptual tolerances as well as the differences between the average and the worst-case performance characteristics of a multimedia server [65].

Corresponding to the admission control algorithms, resource allocation schemes can be either deterministic or statistical. Deterministic resource allocation schemes make reservations for the worst case, e.g., reserving bandwidth for the longest processing time and the highest rate that a task might ever need. On the other hand, statistical resource allocation schemes achieve higher utilization by allowing temporary overload and a small percentage of QoS violations.

A.3 File Management

The file system provides access and control functions for file storage and retrieval [23]. There are two basic approaches to supporting continuous media in file systems. In the first approach, the organization of files on disks remains as it is for discrete data (i.e., a file is not scattered across several disks), with the necessary real-time support provided through special disk-scheduling algorithms and enough buffer capacity to avoid jitter. The second approach is to organize audio and video files on distributed storage like disk arrays. Under the second approach, the disk throughput can be improved by scattering (striping) each audio/video file across several disks (described later in Section V-B), and disk seek times can be reduced by disk-scheduling algorithms.

Traditional disk-scheduling algorithms such as first-come-first-serve and SCAN [15], [61] do not provide real-time guarantees. Hence, many disk-scheduling algorithms have been proposed to address this issue. These include SCAN-EDF [48], grouped sweeping scheduling (GSS) [77], and dynamic circular SCAN (DC-SCAN) [30], which are described as follows.

SCAN-EDF combines the seek optimization of the traditional disk-scheduling method SCAN [15] with the real-time guarantees of the EDF mechanism. Note that the EDF mechanism in disk scheduling is non-preemptive, which is different from the preemptive EDF scheme used in process management.

Grouped sweeping scheduling divides the set of n streams into g groups; groups can be formed in such a way that all streams belonging to the same group have similar deadlines. Individual streams within a group are served according to SCAN.

DC-SCAN employs a circular SCAN [56] service order so as to minimize disk seek overhead and variations in inter-service time, resulting in high throughput. It reduces start-up delay by dynamically adapting the circular SCAN service order.

As a result, the three algorithms, SCAN-EDF, GSS, and DC-SCAN, can improve continuous media data throughput and meet the real-time requirements imposed by continuous media.

Another function that needs to be supported by file management is interactive control, such as pause/resume, fast forward, and fast backward. The pause/resume operations pose a significant challenge to the design of efficient buffer management schemes because they interfere with the sharing of a multimedia stream among different viewers. This issue is still under study. The fast-forward and fast-backward operations can be implemented either by playing back media at a higher rate than normal or by continuing playback at the normal rate while skipping some data. Since the former approach can significantly increase the data rate, its direct implementation is impractical. The latter approach, on the other hand, needs to be carefully designed if inter-data dependencies are present (for example, P frames and B frames depend on I frames in MPEG [9]). As a result, for streaming MPEG video, entire groups of pictures (GOPs) have to be skipped during fast-forward operations, and the viewer sees normal-resolution video with gaps, which is acceptable.

B. Storage System

There are several challenging issues in designing storage systems for multimedia, such as high throughput, large capacity, and fault tolerance [23], which we discuss as follows.

Increase throughput with data striping. If an entire video file is stored on one disk, the number of concurrent accesses to that file is limited by the throughput of that disk. This dictates the number of clients that can be viewing the same video file. To overcome this limitation, data striping was proposed [55]. Under data striping schemes, a multimedia file is scattered across multiple disks and the disk array can be accessed in parallel. An example of data striping is shown in Fig. 14, where Blocks 1, 2, and 3 of File A can be read in parallel, resulting in increased throughput. An important issue in the design of a data-striping scheme is to balance the load of the most heavily loaded disks to avoid overload situations while keeping latency small.
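The block placement behind Fig. 14 can be sketched as a round-robin striping layout. The five-disk geometry and block names below mirror the figure but are illustrative assumptions:

```python
def stripe(file_blocks, num_disks):
    """Round-robin data striping (Section V-B): block i of a file is
    placed on disk i mod num_disks, so consecutive blocks reside on
    different disks and can be read in parallel."""
    layout = [[] for _ in range(num_disks)]
    for i, block in enumerate(file_blocks):
        layout[i % num_disks].append(block)
    return layout

# File A with six blocks striped across five disks, as in Fig. 14:
# a read of Blocks A1, A2, A3 touches three different disks at once.
layout = stripe([f"A{i}" for i in range(1, 7)], 5)
print(layout)  # [['A1', 'A6'], ['A2'], ['A3'], ['A4'], ['A5']]
```

Note how Block A6 wraps around onto Disk 1: with uniform access patterns the round-robin rule spreads load evenly, which is the load-balancing objective mentioned above.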
The designers have to trade off load balance with low latency, since load balance and low latency are two conflicting objectives [55]. Note that data striping is different from file replication (an expensive way to increase throughput) in that data striping allows only one copy of a video file stored on disks, while file replication allows multiple copies of a video file stored on disks.

Fig. 14. Data striped on multiple disks and accessed in parallel.

Increase capacity with tertiary and hierarchical storage. The introduction of multiple disks can increase the storage capacity, as shown in Fig. 15. However, the cost for large archives (e.g., with a 40 terabyte storage requirement) is prohibitively high if a large number of disks are used for storage. To keep the storage cost down, tertiary storage (e.g., an automated tape library or CD-ROM jukebox) must be added.

Fig. 15. Disk-based video storage.

To reduce the overall cost, a hierarchical storage architecture (shown in Fig. 16) is typically used. Under the hierarchical storage architecture, only a fraction of the total storage is kept on disks while the major remaining portion is kept on a tertiary tape system. Specifically, frequently requested video files are kept on disks for quick access; the remainder resides in the automated tape library.

Fig. 16. Hierarchical storage.

To deploy streaming services at a large scale, a storage area network (SAN) architecture was proposed, shown in Fig. 17 [16], [28]. An SAN can provide high-speed data pipes between storage devices and hosts at far greater distances than the conventional host-attached small computer systems interface (SCSI). The connections in an SAN can be direct links between specific storage devices and individual hosts, through fiber channel arbitrated loop (FC-AL) connections; or the connections in an SAN can form a matrix through a fiber channel switch. With these high-speed connections, an SAN is able to provide a many-to-many relationship between heterogeneous storage devices (e.g., disk arrays, tape libraries, and optical storage arrays) and multiple servers and storage clients.

Fig. 17. An SAN-based server and storage architecture for large-scale deployment.

Another approach to deploying large-scale storage is network attached storage (NAS), shown in Fig. 18 [26]. Different from SAN, an NAS equipment can attach to a local area network (LAN) or a wide area network (WAN) directly. This is because an NAS equipment includes a file system, such as the network file system (NFS), and can run on Ethernet, asynchronous transfer mode (ATM), and fiber distributed data interface (FDDI). The protocols that NAS uses include the hypertext transfer protocol (HTTP), NFS, TCP, UDP, and IP. The main differences between NAS and SAN are summarized in Table I. On the other hand, both NAS and SAN achieve data separation from the application server so that storage management can be simplified. Specifically, both NAS and SAN have the following advantages over traditional storage: (1) simplification of storage management by centralizing storage, (2) scalability, and (3) fault tolerance.

Fault tolerance. In order to ensure uninterrupted service even in the presence of disk failures, a server must be able to reconstruct lost information. This can be achieved by using redundant information. The redundant information could be either parity data generated by error-correcting codes like FEC, or duplicate data on separate disks. That is, there are two fault-tolerant techniques: error-correcting (i.e., parity-encoding) [2], [46], [63] and mirroring [45].
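The parity technique can be illustrated with a bytewise XOR sketch. This assumes a single-parity layout in the style of RAID arrays for illustration; it is not the specific coding used in the cited schemes:

```python
from functools import reduce

def parity_block(data_blocks):
    """Compute a parity block as the bytewise XOR of equal-sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column)
                 for column in zip(*data_blocks))

def reconstruct(surviving_blocks, parity):
    """Rebuild a single lost block: XOR of the parity and the survivors."""
    return parity_block(surviving_blocks + [parity])

# Three data blocks striped over three disks, parity on a fourth disk.
d1, d2, d3 = b"\x01\x02", b"\x04\x08", b"\x10\x20"
p = parity_block([d1, d2, d3])

# Disk 2 fails; its block is recovered from the remaining disks.
print(reconstruct([d1, d3], p) == d2)  # True
```

The reads of all surviving disks must be synchronized before the XOR can be computed, which is exactly the synchronization and decoding overhead of parity schemes noted below.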
Fig. 18. A network-attached storage architecture for large-scale deployment.

TABLE I
Differences between SAN and NAS

       Networking technologies    Protocols
SAN    Fiber channel              Encapsulated SCSI
NAS    Ethernet, ATM, FDDI        HTTP, NFS, TCP, UDP, IP

Parity data adds a small storage overhead, but it requires synchronization of reads and additional processing time to decode lost information. In contrast, mirroring does not require synchronization of reads or additional processing time to decode lost information, which significantly simplifies the design and implementation of video servers. However, mirroring incurs at least twice as much storage volume as in the non-fault-tolerant case. As a result, there is a trade-off between reliability and complexity/cost. A recent study [21] shows that, for the same degree of reliability, mirroring-based schemes always outperform parity-based schemes in terms of per-stream cost as well as restart latency after disk failure.

To summarize, we have addressed various issues in streaming server design and presented important techniques for efficient, scalable, and reliable storage and retrieval of multimedia files. In the next section, we discuss synchronization mechanisms for streaming media.

VI. Media Synchronization

A major feature that distinguishes multimedia applications from other traditional data applications is the integration of various media streams that must be presented in a synchronized fashion. For example, in distance learning, the presentation of slides should be synchronized with the commenting audio stream (see Fig. 19). Otherwise, the current slide being displayed on the screen may not correspond to the lecturer's explanation heard by the students, which is annoying. With media synchronization, the application at the receiver side can present the media in the same way as they were originally captured.

Fig. 19. Synchronization between the slides and the commenting audio stream.

Media synchronization refers to maintaining the temporal relationships within one data stream and between various media streams. There are three levels of synchronization, namely, intra-stream, inter-stream, and inter-object synchronization. The three levels of synchronization correspond to three semantic layers of multimedia data as follows [57].

1. Intra-stream synchronization. The lowest layer of continuous media (or time-dependent data) such as video and audio is the media layer. The unit of the media layer is a logical data unit such as a video/audio frame, which adheres to strict temporal constraints to ensure acceptable user perception at playback. Synchronization at this layer is referred to as intra-stream synchronization, which maintains the continuity of logical data units. Without intra-stream synchronization, the presentation of the stream may be interrupted by pauses or gaps.

2. Inter-stream synchronization. The second layer of time-dependent data is the stream layer. The unit of the stream layer is a whole stream. Synchronization at this layer is referred to as inter-stream synchronization, which maintains temporal relationships among different continuous media. Without inter-stream synchronization, skew between the streams may become intolerable. For example, users could be annoyed if they notice that the movements of the lips of a speaker do not correspond to the presented audio.

3. Inter-object synchronization. The highest layer of a multimedia document is the object layer, which integrates streams and time-independent data such as text and still images. Synchronization at this layer is referred to as inter-object synchronization. The objective of inter-object synchronization is to start and stop the presentation of the time-independent data within a tolerable time interval, if some previously defined points of the presentation of a time-dependent media object are reached. Without inter-object synchronization, for example, the audience of a slide show could be annoyed if the audio is commenting on one slide while another slide is being presented.

Media streams may lose synchronization after moving from the server to the client. As shown in Fig. 1, there are many components along the path which transports data from its storage site to the user. Specifically, the server retrieves data from the storage device and sends that data into the network; the network transports the data to the client; the client reads the data from its network interface and presents it to the user; operating systems and protocols allow these systems to run and do their work. Each of these components on the transport path performs a certain task and affects the data in a different way. They all inevitably introduce delays and delay variations, in either predictable or unpredictable manners. In particular, the delay introduced in the network is typically unpredictable due to the best-effort nature of the Internet. The incurred delays and delay variations could disrupt intra-stream, inter-stream, and inter-object synchronization. Therefore, media synchronization mechanisms are required to ensure proper