
A Review of Video Streaming over the Internet

Abstract

Ideally, video and audio are streamed across the Internet from the server to the client in response to a client request for a Web page containing embedded videos. The client plays the incoming multimedia stream in real time as the data is received. Quite a few video streamers are starting to appear, and many pseudo-streaming technologies and other potential solutions are in the pipeline. Generally, streaming video solutions may work on a closed-loop intranet, but for mass-market Internet use they are simply dysfunctional. However, current transport protocol, codec and scalability research will eventually make video on the Web a practical reality. Below we review the currently available commercial products which purport to provide video streaming capabilities over the Internet and outline their limitations. Then we describe the major research projects currently underway which are attempting to solve some of these limitations. Finally, we compare and evaluate the SuperNOVA project with respect to other research projects and the current commercial products.

Introduction

For a long time now, it has been very easy to download and play back high-quality audio and video files from the Internet. Current web browsers and servers support a full-file transfer mode of document retrieval. However, full-file transfer means very long, unacceptable transfer times and playback latency. Ideally, video and audio should be streamed across the Internet from the server to the client in response to a client request for a Web page containing embedded videos. The client plays the incoming multimedia stream in real time as the data is received.

Audio streaming is becoming widely accepted and deployed. In particular, Progressive Networks' RealAudio has a wide following. Although streaming audio programs are considerably further along than video, they are still nowhere near typical computer-sound quality. The idea of streaming video over the network has been gaining a lot of interest. The current Internet is a best-effort network and interconnects sites with widely varying bandwidth capabilities. In the future the Internet will see the rollout of ATM, RSVP with the ability to control Quality of Service (QoS), and mobile networks with widely varying QoS. Therefore it will remain a very heterogeneous network.

In this report we first present a brief review of the current video compression standards, evolving standards and techniques, and the Internet transport protocols being deployed. In addition, issues such as the need for servers, plugins and firewall penetration are discussed. There are many commercial streaming video products becoming available, as well as many research projects in this area. We then review the currently available commercial products which purport to provide video streaming capabilities over the Internet and outline their current limitations. Then we describe the major research projects currently underway which are attempting to solve some of these limitations. Finally, we compare and evaluate the SuperNOVA project with respect to other research projects and the current commercial products.

Video Compression Standards

The most important video codec standards for streaming video are H.261, H.263, MJPEG, MPEG-1, MPEG-2 and MPEG-4. A brief description of each is given below. Compared to video codecs for CD-ROM or TV broadcast, codecs designed for the Internet require greater scalability, lower computational complexity, greater resiliency to network losses, and lower encode/decode latency for video conferencing. In addition, the codecs must be tightly linked to network delivery software to achieve the highest possible frame rates and picture quality. As one looks at the existing codec standards, it becomes apparent that none are ideal for Internet video.
In fact, it is quite clear that over the next few years we will see a host of new algorithms that are specifically designed for the Internet and are thus more suitable for it. Research is currently underway looking at both new scalable, flexible codecs and ways of scaling existing codecs using transcoding and filters. Section 3 outlines current research in video scalability. As new algorithms specifically targeted at Internet video are developed, application framework standards such as H.323/H.324 for videoconferencing and MPEG-4 are being designed so that these new codec innovations can be incorporated into applications being developed today without significant rework.

H.261

H.261 is also known as P*64, where P is an integer meant to represent multiples of 64 kbit/s. H.261 was targeted at teleconferencing applications and is intended for carrying video over ISDN, in particular for face-to-face videophone applications and for videoconferencing. The actual encoding algorithm is similar to (but incompatible with) that of MPEG. H.261 needs substantially less CPU power for real-time encoding than MPEG. The algorithm includes a mechanism which optimises bandwidth usage by trading picture quality against motion, so that a quickly-changing picture will have a lower quality than a relatively static picture. H.261 used in this way is thus a constant-bit-rate encoding rather than a constant-quality, variable-bit-rate encoding.

H.263

H.263 is a draft ITU-T standard designed for low-bitrate communication. It is expected that the standard will be used for a wide range of bitrates, not just low-bitrate applications, and that H.263 will replace H.261 in many applications. The coding algorithm of H.263 is similar to that used by H.261, with some improvements and changes to improve performance and error recovery. The differences between the H.261 and H.263 coding algorithms are listed below.

Half-pixel precision is used for H.263 motion compensation, whereas H.261 used full-pixel precision and a loop filter. Some parts of the hierarchical structure of the datastream are now optional, so the codec can be configured for a lower datarate or better error recovery. There are four negotiable options included to improve performance: Unrestricted Motion Vectors, Syntax-based Arithmetic Coding, Advanced Prediction, and forward and backward frame prediction similar to MPEG, called P-B frames.

H.263 supports five resolutions. In addition to QCIF and CIF, which were supported by H.261, there are SQCIF, 4CIF and 16CIF. SQCIF is approximately half the resolution of QCIF, while 4CIF and 16CIF are 4 and 16 times the resolution of CIF respectively. The support of 4CIF and 16CIF means the codec can compete with other higher-bitrate video coding standards such as the MPEG standards.
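To see why the larger formats push H.263 into MPEG territory, consider the raw bit rates of the CIF family. A back-of-envelope sketch (standard CIF-family dimensions; 4:2:0 chroma subsampling at 30 fps is our assumption):

```python
# Back-of-envelope raw (uncompressed) bit rates for the H.263 picture formats,
# assuming 4:2:0 chroma subsampling (12 bits/pixel) at 30 frames per second.
FORMATS = {
    "SQCIF": (128, 96),
    "QCIF":  (176, 144),
    "CIF":   (352, 288),
    "4CIF":  (704, 576),
    "16CIF": (1408, 1152),
}

for name, (w, h) in FORMATS.items():
    raw_mbps = w * h * 12 * 30 / 1e6   # bits per frame times frames per second
    print(f"{name:6s} {w}x{h:5d}  raw ~{raw_mbps:6.1f} Mbit/s")
```

Even at an aggressive 100:1 compression, 16CIF still needs several Mbit/s, which is why support for 4CIF and 16CIF puts H.263 in competition with the MPEG standards.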
MJPEG

There is really no such standard as "motion JPEG" or "MJPEG" for video. Various vendors have applied JPEG to individual frames of a video sequence and have called the result "M-JPEG". JPEG is designed for compressing either full-color or gray-scale images of natural, real-world scenes. It works well on photographs, naturalistic artwork and similar material; not so well on lettering, simple cartoons or line drawings. JPEG is a lossy compression algorithm which uses DCT-based encoding. JPEG can typically achieve 10:1 to 20:1 compression without visible loss, 30:1 to 50:1 compression is possible with small to moderate defects, while for very-low-quality purposes such as previews or archive indexes, 100:1 compression is quite feasible.

Non-linear video editors are typically used in broadcast TV, commercial post production, and high-end corporate media departments. Low-bitrate MPEG-1 quality is unacceptable to these customers, and it is difficult to edit video sequences that use inter-frame compression. Consequently, non-linear editors (e.g., AVID, Matrox, FAST) will continue to use motion JPEG with low compression factors (e.g., 6:1 to 10:1).

MPEG-1

MPEG-1, MPEG-2 and MPEG-4 are currently accepted, draft and developing standards respectively for the bandwidth-efficient transmission of video and audio. The MPEG-1 codec targets a bandwidth of 1-1.5 Mbps, offering VHS-quality video at CIF (352x288) resolution and 30 frames per second. MPEG-1 requires expensive hardware for real-time encoding. While decoding can be done in software, most implementations consume a large fraction of a high-end processor. MPEG-1 does not offer resolution scalability, and the video quality is highly susceptible to packet losses due to the dependencies present in the P (predicted) and B (bi-directionally predicted) frames. The B-frames also introduce latency in the encode process, since encoding frame N needs access to frame N+k, making it less suitable for video conferencing.
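The B-frame latency follows directly from the frame dependencies: a B-frame cannot be coded until the later reference frame it predicts from has been captured. A minimal sketch of the resulting reordering, assuming a typical IBBP pattern (the pattern itself is illustrative):

```python
# Display order vs. coding order for an MPEG-1 style group of pictures.
# A B-frame depends on the *next* I- or P-frame, so the encoder must hold
# B-frames back until that reference frame has been captured and coded.
display_order = list("IBBPBBP")

coding_order = []
pending_b = []
for frame in display_order:
    if frame == "B":
        pending_b.append(frame)      # cannot code yet: reference not seen
    else:
        coding_order.append(frame)   # I/P frames coded as soon as captured
        coding_order.extend(pending_b)
        pending_b = []

print("display:", "".join(display_order))   # IBBPBBP
print("coding: ", "".join(coding_order))    # IPBBPBB
```

With k B-frames between references, frame N+k must arrive before frame N+1 can be coded, which is exactly the conferencing latency the standard's critics point to.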
MPEG-2

MPEG-2 extends MPEG-1 by including support for higher-resolution video and increased audio capabilities. The targeted bit rate for MPEG-2 is 4-15 Mbit/s, providing broadcast-quality full-screen video. The MPEG-2 draft standard does cater for scalability: three types of scalability (Signal-to-Noise Ratio (SNR), Spatial and Temporal) and one extension that can be used to implement scalability (Data Partitioning) have been defined. Compared with MPEG-1, it requires even more expensive hardware to encode and decode. It is also prone to poor video quality in the presence of losses, for the same reasons as MPEG-1.

Both MPEG-1 and MPEG-2 are well suited to the purposes for which they were developed. For example, MPEG-1 works very well for playback from CD-ROM, and MPEG-2 is great for high-quality archiving applications and for TV broadcast applications. In the case of satellite broadcasts, MPEG-2 allows more than five digital channels to be encoded using the same bandwidth as used by a single analog channel today, without sacrificing video quality. Given this major advantage, the large encoding costs are really not a factor. However, for existing computer and Internet infrastructures, MPEG-based solutions are too expensive and require too much bandwidth; they were not designed with the Internet in mind.

MPEG-4

The intention of MPEG-4 is to provide a compression scheme suitable for video conferencing, i.e. data rates below 64 kbit/s. MPEG-4 will be based on the segmentation of audiovisual scenes into AVOs, or "audio/visual objects", which can be multiplexed for transmission over heterogeneous networks. The MPEG-4 framework currently being developed focuses on a language called MSDL (MPEG-4 Syntactic Description Language). MSDL allows applications to construct new codecs by composing more primitive components, and provides the ability to dynamically download these components over the Internet. This philosophy is similar to that of the multimedia APIs being developed for Sun Microsystems' Java, where it will be possible to dynamically download codec components. The same trend is seen in products from major vendors such as Microsoft and Netscape, which allow multiple audio and video codecs to be plugged into their real-time streaming solutions.
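The plug-in philosophy behind MSDL, the Java media APIs and NetShow amounts to a codec registry that an application consults at run time. A toy sketch of the idea; every name here is illustrative, not any vendor's actual API:

```python
# Toy codec registry in the spirit of downloadable codec components.
# All names are hypothetical, invented for illustration only.
CODECS = {}

def register_codec(name, encoder, decoder):
    """A newly downloaded codec component announces itself here."""
    CODECS[name] = (encoder, decoder)

def decode(name, payload):
    if name not in CODECS:
        raise KeyError(f"codec {name!r} not installed - fetch and register it")
    _, decoder = CODECS[name]
    return decoder(payload)

# A trivial pass-through "codec" just to exercise the registry.
register_codec("identity", encoder=lambda raw: raw, decoder=lambda bits: bits)
print(decode("identity", b"frame-data"))
```

The point of the pattern is that the streaming framework never changes: new compression schemes arrive as data, register themselves, and are immediately usable.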
Scalable Video Compression Techniques

These can be sub-divided into DCT-based schemes (which include H.261, H.263, MPEG-1 and MPEG-2), wavelet and sub-band schemes, fractal-based schemes, and image segmentation/region-based compression schemes (MPEG-4).

Subband/Wavelet Coding

The majority of scalable video codecs are based on subband coding techniques, of which the most widely used is the wavelet transform. VDOnet and VXtreme use wavelet codecs. There is also a lot of work going on in research organisations looking at the application of wavelet and subband coding techniques to scalable video codecs; see sections 7.2, 7.3, 7.4 and 7.5.

Fractal Video Coding

Various research groups [13, 14] are investigating the application of fractal compression to scalable video. Iterated Systems have developed a commercial product which has been implemented within Progressive Networks' RealVideo product.

Image Segmentation and Object-based Video Coding

A number of research groups are investigating the application of image segmentation to video compression. The approaches involve extracting important subsets of the image content of each frame and delivering only the most important, e.g. object boundaries and moving objects. Object-based coding can achieve very high data compression rates while maintaining an acceptable visual quality in the decoded images. However, object-based coders are computationally intensive, and to be viable as a real-time process an object-based coder would need to have the image segmentation algorithm implemented as a VLSI array. See section 7.6, the UC Davis Image Sequence Processing Group [22]; section 7.7, the Video Communication Research Group (VCRG), University of Western Australia [19]; and the Bath Video Coding Group [21]. The MPEG-4 standard is directly related to this content-based scalable video codec approach.

Internet Transport Protocols

TCP (Transmission Control Protocol)

HTTP (Hypertext Transfer Protocol) uses TCP as the protocol for reliable document transfer. If packets are delayed or damaged, TCP will effectively stop traffic until either the original packets or backup packets arrive. Hence it is unsuitable for video and audio because:

• TCP imposes its own flow control and windowing schemes on the data stream, effectively destroying temporal relations between video frames and audio packets.
• Reliable message delivery is unnecessary for video and audio: losses are tolerable, and TCP retransmission causes further jitter and skew.

UDP (User Datagram Protocol)

UDP is the alternative to TCP. RealPlayer, StreamWorks and VDOLive use this approach. (RealPlayer gives you a choice of UDP or TCP, but the former is preferred.) UDP forsakes TCP's error correction and allows packets to drop out if they are late or damaged. When this happens, you will hear or see a dropout, but the stream will continue, as the receiver sketch below illustrates.
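The loss-tolerant behaviour is easy to see in a minimal UDP receiver. A sketch, assuming packets carry a sequence number in their first four bytes (our convention, not any product's wire format):

```python
import socket
import struct

# Minimal loss-tolerant UDP receiver: payloads are played in arrival order,
# and gaps in the sequence numbers show up as dropouts rather than stalls.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 5004))   # port chosen for illustration

expected = 0
while True:
    packet, _addr = sock.recvfrom(2048)
    seq = struct.unpack("!I", packet[:4])[0]
    if seq > expected:
        print(f"dropout: lost packets {expected}..{seq - 1}")  # keep playing
    expected = seq + 1
    # play(packet[4:])  # hand the payload to the decoder immediately
```

A TCP receiver at the same point would block until the missing bytes were retransmitted, which is precisely the stall behaviour that makes TCP unsuitable here.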
Despite the prospect of dropouts, this approach is arguably better for continuous media delivery. If broadcasting live events, everyone will get the same information simultaneously. One disadvantage of the UDP approach is that many network firewalls block UDP traffic. While Progressive Networks, Xing and VDOnet offer work-arounds for client sites (reverting to TCP), some users simply may not be able to access UDP files.

Server or Serverless

Two major approaches are emerging for streaming multimedia content to clients. The first is the server-less approach, which uses the standard web server and the associated HTTP protocol to get the multimedia data to the client. The second is the server-based approach, which uses a separate server specialized for the video/multimedia streaming task. The specialization takes many forms, including optimized routines for reading the huge multimedia files from disk, the flexibility to choose any of the UDP/TCP/HTTP/multicast protocols to deliver data, and the option to exploit continuous contact between client and server to dynamically optimize content delivery to the client.

The primary advantages of the server-less approach are: (i) there is one less software piece to learn and manage, and (ii) from an economic perspective, there is no video server to pay for. In contrast, the server-based approach has the advantages that it: (i) makes more efficient use of the network bandwidth, (ii) offers better video quality to the end user, (iii) supports advanced features like admission control and multi-stream multimedia content, (iv) scales to support large numbers of end users, and (v) protects content copyright. The tradeoffs clearly indicate that for serious providers of streaming multimedia content the server-based approach is the superior solution.

RealPlayer, StreamWorks and VDOnet's VDOLive require you to install their A/V server software on your web server machine. Among other things, this software can tailor the quality and number of streams and provide detailed reports of who requested which streams. Other programs, such as Shockwave and VivoActive, are serverless: they do not require any special A/V server software beyond your ordinary web server software. With these programs, you simply link a file on your server's hard drive from a web page; when someone hits the link, the file starts to download. Serverless programs are simple to incorporate into a web site but do not have the reporting capabilities of server-based programs. And because they lack both stream- and bandwidth-management features, they may be problematic if you need to support many simultaneous streams.

Java Replayers Replacing Plugins

New solutions are appearing which use Java to eliminate the need to download and install plugins or players. Such an approach will become standard once the Java Media Player APIs being developed by Sun, Silicon Graphics and Intel are available. This approach will also ensure client platform independence. Vosaic appears to be one of the few products with a Java replayer which supports H.263.

Firewalls

Nearly all streaming products require users behind a firewall to have a UDP port opened for the video streams to pass through (1558 for StreamWorks, 7000 for VDOLive, 7070 for RealAudio). Rather than punch security holes in the firewall, Xing (StreamWorks) has developed a proxy software package you can compile and use, while VDOnet (VDOLive) and Progressive Networks (RealPlayer) are approaching leading firewall developers to get support for their streams incorporated into upcoming products. Currently a number of products change from UDP to HTTP or TCP when UDP cannot get through firewall restrictions; this reduces the quality of the video. In all cases, it is still a security issue for network managers.
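The UDP-to-TCP fallback these products use can be sketched as a simple probe: try UDP first, and if nothing comes back (for example because the firewall silently drops it), reconnect over TCP. The ports, probe message and timeout below are illustrative only, not any vendor's actual handshake:

```python
import socket

def open_stream(host, udp_port=7000, tcp_port=7000, probe_timeout=2.0):
    """Try UDP first; fall back to TCP if the firewall blocks UDP.
    Ports and probe format are invented for illustration."""
    udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    udp.settimeout(probe_timeout)
    try:
        udp.sendto(b"HELLO", (host, udp_port))   # ask the server to reply
        udp.recvfrom(2048)                       # any reply means UDP works
        return "udp", udp
    except socket.timeout:
        udp.close()
        tcp = socket.create_connection((host, tcp_port))
        return "tcp", tcp    # lower quality, but it gets through the firewall
```

The quality penalty the report mentions comes from the TCP path: once on TCP, every lost packet is retransmitted and the stream inherits TCP's stalls.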
Commercial Real-Time Video Streamers

Macromedia's Streaming Shockwave

Shockwave for Director consists of two components. On the HTTP server side, the Afterburner tool compresses Director movies to make them available on the Internet. On the client side, the Shockwave plugin lets the user incorporate Director movies into the page layout of their HTML document. The current Shockwave plugin is not streaming: the entire Director movie must be downloaded before playback. The current release allows for a separate real-time audio stream which can be encoded at 8, 16, 32 or 64 kbps, depending on the most likely bandwidth available to users. Macromedia have just released Director 6 Multimedia Studio, which supposedly includes new Streaming Shockwave technology. Macromedia and Progressive Networks have also announced the integration of Shockwave Flash, a vector-based animation and graphics system, on top of RealMedia, to enable audio and video streaming of output from Flash. Shockwave is a serverless product which relies on the HTTP protocol only. It is not capable of live feeds and makes no use of IP multicast, so it cannot scale well to support thousands of enterprise customers while efficiently using bandwidth.

Progressive Networks' RealVideo

Progressive Networks has recently launched RealVideo, the streaming video version of their well-known RealAudio product. Both server and client versions have been released. In addition, Progressive Networks have released a range of video-oriented content development tools, some their own, others developed by third parties. Users need to install the RealServer 4.0 and the RealPlayer Plus 4.0. It uses the RTSP protocol on top of UDP. Users apparently have a choice of either fixed or optimized frame rate encoding in the new RealVideo encoder, and choose between a number of pre-defined encoding templates which correspond to the most appropriate audio and video formats for a given bandwidth. "Stream thinning" detects poor or congested Internet connections and dynamically adjusts the video frame rate in real time; this is presumably frame dropping. "Smart networking" automatically delivers audio and video streams via the most efficient network protocol; this is presumably a choice between TCP, UDP or UDP multicast, with TCP chosen to deal with firewall restrictions blocking UDP. Progressive Networks have recently licensed ClearVideo, a fractal-based video compression technology from Iterated Systems (see http://www.iterated.com), to complement their internally-developed compression methods. RealVideo 1.0 provides two codecs: RealVideo Standard (developed by Progressive Networks) and RealVideo Fractal (using ClearVideo technology from Iterated Systems, Inc.).

Xing Technology's StreamWorks

StreamWorks streams video and audio over the WWW using UDP/IP. Video streams can be MPEG-1, while audio can be MPEG-1 or MPEG-1 private data streams containing MPEG-2 LBR audio. Providers encode content at 8.5, 24, 56 or 112 kbps depending on the bandwidth capabilities of the potential users. StreamWorks supports a process called thinning, which reduces a high-bandwidth stream so it can be transmitted over a lower-bandwidth connection. At low bandwidths, the software maintains a continuous audio stream of 8 to 10 kbps, and the video stream uses whatever bandwidth is left. The MPEG-based compression allows the software to drop frames from the stream, creating a jerky video sequence with almost no motion while maintaining smooth audio playback. The quality of the frames that do get through is still pretty good, just not as fluid as one would expect from real video.
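Thinning as described, audio protected first and video taking whatever is left, is essentially a priority budget. A sketch with invented numbers (the 102 kbps/30 fps "full" stream is an assumption, not Xing's published figures):

```python
def thin_stream(link_kbps, audio_kbps=10, full_video_kbps=102, full_fps=30):
    """StreamWorks-style thinning (illustrative numbers): reserve the audio
    budget first, then drop video frames until the rest fits the link."""
    video_budget = max(link_kbps - audio_kbps, 0)
    fps = min(full_fps, full_fps * video_budget / full_video_kbps)
    return audio_kbps, fps

for link in (28.8, 56, 112):
    audio, fps = thin_stream(link)
    print(f"{link:5.1f} kbps link -> audio {audio} kbps, video ~{fps:4.1f} fps")
```

Because whole frames are dropped, the audio stays continuous while the video degrades to the jerky, near-still sequence described above.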
StreamWorks is able to broadcast streams to "relay servers". By using a star configuration, it is possible to provide a video feed from a single server to regional servers that then provide that stream to desktop clients. StreamWorks' technology includes three components: the client software, the server software, and a video capture/encoding box called the AVTrans encoder which compresses audio and video streams. These streams are transferred to a Unix server running the StreamWorks server software over a TCP/IP network and, from there, are broadcast over the network to client workstations. The AVTrans encoder is capable of creating a range of compressed streams, from an 8.5-kbps low-bit-rate format that produces 8-kilohertz mono audio on the client, up to a 112-kbps stream that provides 44-kHz stereo audio or 30-frames-per-second quarter-screen video for large-bandwidth connections such as Ethernet or a T1.

Like the other products examined in this review, StreamWorks requires you to register its MIME type in your web server's configuration file, and you need to open a UDP port (1558) for delivering video to client workstations. The server software can recode the compressed streams on the fly to compensate for large numbers of users and limited bandwidth. The server is configured from a text file, so you can limit the total bandwidth output, the maximum number of simultaneous streams, and the maximum bit rate per stream. The default maximum configuration for the server is 10 Mbps for an Ethernet connection, but that can be adjusted depending on how your client machines are connecting, whether via a 14.4-kbps modem pool, an ISDN hub or a 100-Mbps backbone. With a 28.8-kbps modem connection, the StreamWorks server drops to a much lower frame rate of 2 to 3 frames per second, producing a jerky, halting video image while maintaining continuous audio. Client performance is better than VDOLive.

VDOnet's VDOLive

VDOnet claim that content providers only need one video source, which can be scaled on the fly for both high- and low-bandwidth connections. They claim to be able to deliver 10-15 fps over a 28.8-kbps modem using a proprietary video compression scheme based in part on wavelet techniques (VDOWave). Under ideal conditions (minimal Internet traffic, no local network overhead, minimal load on the VDOLive On-Demand Server) they quote: up to 2 to 3 frames per second with a 14.4-kbps modem; 8 to 12 frames per second with a 28.8-kbps modem; and up to 20 frames per second with an ISDN line.

VDOnet's VDOLive boasts a slightly higher frame rate over a standard 28.8-kbps modem than StreamWorks because it uses a wavelet compression technology that lets it shave layers of quality off each frame that is transmitted, rather than dropping whole frames. This creates a stream that is smoother at low bit rates, but of lower visual clarity and quality. VDOLive appears to be the only commercial product which tries to estimate bandwidth and adapt dynamically. The image quality is very poor at times, but audio is good.

VDOLive includes two programs: VDO Capture, which lets you capture video streams, and VDO Clip, which compresses a previously captured video stream and encodes it for delivery from a VDO server. VDO Capture supports seven full-motion video cards that can capture 16- or 24-bit color images at 15 frames per second in a frame size ranging from 64-by-64 pixels to 250-by-176 pixels. Unfortunately, existing AVI files that do not meet these criteria cannot be used unless they are converted. The VDOLive client is blunt but effective: hitting the play button calls up a window for you to enter an address for the VDOLive meta file that points to the video stream you want to launch, and there are a few user-configurable parameters behind this window. VDOLive is supported by some firewall vendors. However, if UDP-based video is blocked by a firewall, VDOLive resorts to TCP-based video instead.
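Dynamic adaptation of the kind VDOLive attempts needs a running estimate of the delivery rate. A common approach, sketched here as an exponentially weighted moving average (this is a generic technique, not VDOnet's actual algorithm; the samples are fabricated):

```python
class BandwidthEstimator:
    """EWMA estimate of delivery rate; a generic sketch, not VDOnet's code."""
    def __init__(self, alpha=0.2):
        self.alpha = alpha   # smoothing factor: higher reacts faster
        self.kbps = 0.0

    def on_packet(self, nbytes, interval_s):
        sample = nbytes * 8 / 1000 / interval_s     # instantaneous kbit/s
        self.kbps += self.alpha * (sample - self.kbps)

est = BandwidthEstimator()
for nbytes, dt in [(1400, 0.4), (1400, 0.5), (700, 1.0)]:  # invented samples
    est.on_packet(nbytes, dt)
print(f"estimated ~{est.kbps:.0f} kbit/s -> choose frame rate accordingly")
```

The smoothed estimate is what drives decisions such as how many quality layers to shave off the next frame.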
VDOnet's VDOWave codec has been included in the codecs shipped with Microsoft's NetShow since 1996, and Microsoft holds an equity stake in VDOnet.

Vosaic

Based on research at the University of Illinois, Vosaic uses the Video Datagram Protocol (VDP), which is basically an augmented RTP. VDP improves reliability by creating two separate channels between the client and server: one is a control channel the two machines use to coordinate what information is being sent across the network, and the other carries the streaming data. A server first sends the client what amounts to an inventory of the stream that is about to be broadcast. The client then uses this list to tell the server which segments to deliver, and if a segment of the stream is lost or delayed, the client can simply ask for that segment to be resent. The stream itself is buffered on the client side, providing for smooth playback in most cases. VDP also uses adaptive flow control on the server side, which adapts the packet flow based on how well the client is doing. If the client is doing well and receiving all the frames, the server can increase the number of packets being sent out onto the network. If the client is having trouble keeping up, or the network is so loaded that packets are being delayed, the server can drop packets from the stream. VDP is designed to preserve network bandwidth in response to both network congestion and client CPU load.

Vosaic supports video and audio standards including MPEG-1, MPEG-2, GSM audio and H.263. To view Vosaic's streaming videos you need the Vosaic plug-in, and you must download both a Vosaic client and a server. There is also a new Java-based release: VOSAIC MediaStudio is a Java-based authoring application which can convert AVI/ASF formats and MPEG-1/2 formats into bandwidth-compatible MPEG or H.263 files. The quality settings (target frame rate, quantisation, MPEG frame sequence (IPBIPB)) need to be pre-set depending on the likely connection bandwidths of your clients.

Vosaic appears to be quite similar to SuperNOVA. It uses both feedback and a feedforward scheme to adapt to both network and end-system conditions. However, it does not include end-to-end QoS management with user interaction. Dynamic scaling is only frame dropping, within the boundaries pre-determined at capture time. It does not support transcoding on the fly: on a T1 link your source is MPEG, while on a 28.8-kbps link your source is H.263. On the plus side, they already have a 100% Java H.263 player. Vosaic had a lot of audio dropouts compared to VDOLive, which maintains audio at all costs. It delivered 8-bit video only and suffered from missing blocks due to packets being lost, a consequence of MPEG-1 encoded video.
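The VDP server-side adaptation described above is a feedback loop: the client reports how much it actually handled, and the server raises or lowers its send rate. A minimal sketch of the control logic (the thresholds and gains are assumptions, not Vosaic's published parameters):

```python
def adapt_send_rate(rate_pps, frames_sent, frames_handled):
    """VDP-style feedback control; thresholds and gains are illustrative.
    The client reports frames_handled; the server adjusts its packet rate."""
    delivered = frames_handled / max(frames_sent, 1)
    if delivered > 0.95:        # client keeping up: probe for more bandwidth
        return rate_pps * 1.1
    if delivered < 0.80:        # congestion or client CPU overload: back off
        return rate_pps * 0.7
    return rate_pps             # in the comfort zone: hold steady

rate = 100.0
for sent, handled in [(100, 100), (110, 108), (121, 90), (84, 84)]:
    rate = adapt_send_rate(rate, sent, handled)
    print(f"sent {sent}, handled {handled} -> new rate ~{rate:.0f} pkt/s")
```

Because the feedback reflects what the client processed rather than only what the network delivered, the same loop responds to client CPU overload as well as congestion, which is the dual goal VDP states.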
VXtreme

VXtreme consists of a number of WebTheater products: Web Theater Client, Server, Producer, LiveStation, and Personal Edition. VXtreme's software-only compression technology automatically adapts the bandwidth of the video to the network connection. VXtreme's Web Theater software uses RTP (Real-time Transport Protocol) as its network delivery mechanism, extended to include mechanisms for packet loss recovery. VXtreme's compression method is non-standard. They claim it offers bandwidth scaling and software-only capability. It is apparently not based on DCT or motion estimation (H.261, H.263, MPEG-1/2) or wavelets, which they claim are compute-intensive and require hardware support. For the multicast case, VXtreme uses a layered compression scheme to divide the compressed video into multiple streams with differing priorities (based on importance to visual quality). This layered approach reduces jitter caused by frame dropping and delivers smoother but lower-resolution video. They have a bizarre congestion control method which freezes both audio and video and then restarts. Their proprietary encoding method is just as blocky as DCT-based encoding. Microsoft has recently acquired VXtreme's codec to ship with NetShow.

VivoActive

The VivoActive player supports audio/video streaming of proprietary VIV files over the web with standard HTTP connections. VIV files are compressed (up to 250:1) files created by the VivoActive Producer. Presently, the Producer can be downloaded for free. The plug-in works well with VIV files, but not many sites have VIV files. The VIVO format uses H.263 video compression and G.723 audio compression. No separate video server is required, since it uses HTTP rather than UDP. While Vivo acknowledged that there is some inevitable loss in speed and quality using HTTP vs. UDP, they argued it is negligible, and that it is more than made up for by the fact that HTTP, which will continue to send streams even when packets are dropped, is more flexible and less of a bandwidth hog than UDP. It is not truly scalable: users control how a video file is compressed and delivered by specifying a bandwidth. You can choose from a variety of predefined settings to optimize your video depending on the type of content you are streaming and the network connection of your audience (modem, ISDN, T1, LAN), and you can customize the data rate, frame rate, output size, audio quality and buffering parameters for your streaming video.
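The serverless approach really is this simple on the client side: open an ordinary HTTP connection and start decoding as the bytes arrive, rather than waiting for the whole file. A sketch using Python's standard library (the URL and chunk size are placeholders):

```python
import urllib.request

# Serverless "streaming": plain HTTP from an ordinary web server.
# Each chunk is handed to the decoder as it arrives instead of
# waiting for the full download to complete.
URL = "http://example.com/clip.viv"   # placeholder URL

with urllib.request.urlopen(URL) as response:
    while True:
        chunk = response.read(16 * 1024)   # read as the data trickles in
        if not chunk:
            break
        # decoder.feed(chunk)  # hand bytes to the codec immediately
```

The trade-off is exactly the one Vivo concedes: TCP never drops out, because lost packets are retransmitted, but on a congested link the stream stalls instead.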
Microsoft's NetShow

Microsoft's NetShow expects the user to first create an ASF (Active Streaming Format) stream. The user has to choose from a range of audio and video codecs depending on their bandwidth availability. Codecs on offer include MPEG Layer-3 audio, Microsoft MPEG-4, Vivo G.723 (audio) and H.263 (video). Content can be produced using VivoActive. It does not appear to offer dynamic scalability, but relies on the user to choose from a table of codecs depending on whether they are on a 28.8-kbps modem, 56-kbps ISDN or 110-kbps intranet connection. NetShow will also support the Progressive Networks RealAudio and RealVideo formats. It requires both a client (NetShow Player) and a server (NetShow Server), and there is also a set of NetShow Content Creation Tools. It uses the UDP protocol and relies on port 1755 to get through firewalls. A Netscape plug-in is used to replay the video.

The major limitation of NetShow is that it does not support high-quality video formats which would be deliverable over high-bandwidth connections. But it does deliver very good quality video (using the latest compression standards, H.263 and MPEG-4) at low bandwidths. The advantage of NetShow is its flexibility: it supports a range of audio and video codecs which can simply be plugged into the NetShow architecture to provide a range of video/audio streaming solutions. Codecs on offer include Duck TrueMotion RT, MPEG Layer-3, Iterated Systems' ClearVideo, Microsoft MPEG-4, VDOnet's VDOWave, Vivo H.263 and Intel H.263. In addition, they have just acquired VXtreme. See http://www.microsoft.com/netshow/codecsship.htm

Comparison of Commercial Video Streaming Products

The previous section describes the eight major players in this field. The best products are those which deliver the highest quality video for a given bandwidth, i.e. lowest delay, no jitter (low frame loss), good audio/visual synchronisation, and high-quality audio and image resolution. In addition, the ability to provide the best possible video quality over a range of networks and bandwidths without content duplication is highly desirable. This characteristic is referred to as scalability. All the commercial products except Shockwave claim some form of video scalability. Investigation reveals that the claims of scalability are often not what they appear to be, or are simply misleading. The scalability is more often than not static rather than dynamic, and there is little user control over the visual manifestation of this scalability.
The currently available commercial products offer two types of scalability. First, there is scalability at the encoding stage. Users are given a range of encoding formats to choose from, which correspond to a range of bandwidths. The limitation of this scalability is that users need to know the bandwidth in advance. This is inflexible: any unpredicted load cannot be handled gracefully. Additionally, in a multi-receiver scenario the selected bandwidth must be that of the lowest channel's capacity, which is an unrealistic restriction and a waste of bandwidth for higher-capacity receivers. Forcing an individual to select a bandwidth also assumes a degree of technical awareness and does not easily illustrate the resulting visual quality of the selected video. Multiple formats are not supported from a single source, but rather require the existence of a clip in each desired format, which entails an overhead in administration and storage of audio and video material.

Secondly, some of the products also incorporate some kind of dynamic scalability based on the available bandwidth at the time. Where dynamic scalability is provided, it is usually simple frame dropping. This is not ideal because it can cause jerkiness and loss of synchronisation. Alternatively, a layered or hierarchical compression method can be used. Layered compression methods usually lose image quality or resolution but maintain frame rate as the bandwidth drops; the sketch below contrasts the two strategies. VXtreme claims to use a layered compression method, but it only supports the AVI and MOV file formats. VOSAIC supports a variety of codecs (H.263, MPEG-1 and MPEG-2) to suit the available bandwidth, which can range from 28.8 kbps to T1; the bandwidth must be specified at encoding time so that the most appropriate codec can be selected, and limited dynamic adaptation is possible through frame dropping. VDOLive is based on a proprietary wavelet encoding which enables 10-15 fps quarter-screen video replay over 28.8 kbps, and it scales dynamically from 14.4-kbps modems to ISDN and cable modems. VivoActive offers a very simple solution for low-bandwidth connections: it does not require a server, since it uses HTTP, and it simply uses the low-bandwidth H.263 and GSM codecs to enable embedded audio/video streaming over 28.8-kbps modems. But it does not support high-quality video (MPEG-1, MPEG-2) over higher bandwidths.
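The difference between the two dynamic-scaling strategies is easy to state in code. Frame dropping discards whole frames; layered coding sends a base layer plus enhancement layers and discards the least important layers first, keeping frame rate at the cost of fidelity. A schematic sketch (the layer names and sizes are invented):

```python
# Schematic layered scaling: every frame carries a base layer plus
# enhancement layers; under a byte budget we drop enhancement layers,
# never whole frames. Layer sizes are invented for illustration.
LAYERS = [("base", 400), ("enh1", 300), ("enh2", 300)]  # bytes per layer

def pack_frame(budget_bytes):
    sent, used = [], 0
    for name, size in LAYERS:                # most important layer first
        if used + size <= budget_bytes or name == "base":
            sent.append(name)                # the base layer is always sent
            used += size
        else:
            break
    return sent

for budget in (1000, 700, 400):
    print(f"budget {budget:4d} B -> layers {pack_frame(budget)}")
```

As the budget shrinks, every frame still arrives (so motion stays smooth), but each frame carries fewer layers, so resolution and clarity degrade, which matches the behaviour reported for VDOLive and VXtreme.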
Progressive Networks' RealVideo has recently incorporated Iterated Systems' fractal compression technology, which will improve its ability to scale dynamically to a range of bandwidths.

The philosophy being adopted by the major vendors such as Sun, Microsoft and Netscape is to provide the ability to dynamically download codec components over the Internet. In the multimedia APIs being developed for Sun Microsystems' Java, it will be possible to dynamically download codec components, and Microsoft's and Netscape's products likewise allow multiple audio and video codecs to be plugged into their real-time streaming solutions. Consequently Microsoft's NetShow, which has been designed to allow a variety of codecs suited to differing applications to be easily incorporated, offers flexibility and support for the latest scalable video compression techniques.

Commercial Video Servers

High-end database-driven video servers are also available from companies like IBM, Oracle, SGI, Sun and Tektronix. These products should be considered for large-scale applications or for serving large numbers of simultaneous streams. Examples include SGI WebForce, IBM VideoCharger and Digital Library MediaBase, and the Sun MediaCenter servers.

General Conclusions

Streaming video (and audio) across networks is an effort that is attracting many participants, as evidenced by the eight primary commercial players and thirteen research organisations involved with this technology in various ways. A key characteristic of both the commercial products and the research demonstrators is the diversity in technological infrastructure, e.g. the networks, protocols and compression standards supported.

All the commercial video products reviewed in this report are optimised for low-bandwidth modem or ISDN connections and are not designed to scale to higher-bandwidth networks. The video needs to be pre-encoded with the target audience in mind. The commercial products have either adopted or developed their own proprietary standards, embraced the currently accepted standards (e.g. MPEG), or implemented a combination of the two. Compatibility between the commercial products has been limited because of these proprietary standards. However, recent products such as Sun's Media Framework API and Microsoft's NetShow have been designed to enable new and varied codecs to be easily incorporated into their frameworks.

H.263 and MPEG-4 are going to become the de facto standards for video delivery over low bandwidths. But broadband standards such as MPEG-1 and MPEG-2, which are useful for many types of broadcast and CD-ROM applications, are unsuitable for the Internet. Although MPEG-2 has had scalability enhancements, these will not be exploitable until reasonably priced hardware encoders and decoders which support scalable MPEG-2 become available.

Codecs designed for the Internet require greater bandwidth scalability, lower computational complexity, greater resilience to network losses, and lower encode/decode latency for interactive applications. These requirements imply codecs designed specifically for the diversity and heterogeneity of Internet delivery. The research on Internet codecs has broadly taken two directions: DCT-based and non-DCT-based. DCT-based video, except for MPEG-2, possesses no inherent scalability. To achieve adaptivity, various operations can be applied to the (semi-)compressed data stream to reduce its bit rate.
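One of the simplest such bit-rate reduction operations is requantization: the transform coefficients are re-quantized with a coarser step, which zeroes small coefficients and coarsens large ones, shrinking the entropy-coded output at the cost of fidelity. A toy illustration on invented coefficient values:

```python
# Toy requantization of "DCT coefficients" with a coarser quantizer step.
# Values are invented; a real transcoder operates on entropy-decoded
# macroblocks inside the semi-compressed stream.
coeffs = [312, -148, 75, 40, -22, 9, 4, -3, 1, 0]

def requantize(cs, step):
    return [round(c / step) * step for c in cs]

for step in (4, 32):
    q = requantize(coeffs, step)
    nonzero = sum(1 for c in q if c)
    print(f"step {step:2d}: {q}  ({nonzero} nonzero values left to code)")
```

The coarser step leaves fewer and more regular values for the entropy coder, hence a lower bit rate, without ever leaving the DCT domain.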
Amongst these operations is transcoding, the conversion of one compression standard to another. The beauty of the DCT-based approach is that it is compatible with current and imminent draft compression standards. Furthermore, it allows re-use of existing audio and video archives without explicitly re-coding them to cater for all possible formats, and existing viewers also maintain their currency. Non-DCT-based compression techniques, e.g. layered, sub-band and wavelet schemes, are intrinsically scalable, and this is their great attraction. Unfortunately, although several such codecs exist, they are still experimental in nature and often suffer from performance problems. In addition, existing movie libraries would need to be re-coded, by no means a trivial task.

The research projects reviewed in this report broadly fall into two categories: one group is developing scalable video codecs, mainly using sub-band coding, while the other group is looking at scalable video in the context of QoS. There is consensus in the research community that the key to efficient delivery of continuous media over heterogeneous networks is dynamic bandwidth adaptation. Of these groups, the research carried out at Columbia, particularly in its video-on-demand testbed, seems the most significant work in this area; this research is similar to SuperNOVA in some areas and complementary in others.