A Review of Video Streaming over
Ideally, video and audio are streamed across the Internet from the server to
the client in response to a client request for a Web page containing embedded
videos. The client plays the incoming multimedia stream in real time as the
data is received. Quite a few video streamers are starting to appear and many
pseudo-streaming technologies and other potential solutions are also in the
pipeline. Generally streaming video solutions may work on a closed-loop
intranet, but for mass-market Internet use, they're simply dysfunctional.
However current transport protocol, codec and scalability research will
eventually make video on the Web a practical reality. Below we have
reviewed the currently available commercial products which purport to provide
video streaming capabilities over the Internet and outlined their limitations.
Then we describe the major research projects currently underway, which are
attempting to solve some of these limitations. Finally we compare and
evaluate the SuperNOVA project with respect to other research projects and
the current commercial products.
For a long time now, it has been very easy to download and play back
high-quality audio and video files from the Internet. Current web browsers and
servers support full-file transfer mode of document retrieval. However, full file
transfer means very long, unacceptable transfer times and playback latency.
Ideally, video and audio should be streamed across the Internet from the
server to the client in response to a client request for a Web page containing
embedded videos. The client plays the incoming multimedia stream in real
time as the data is received. Audio streaming is becoming widely accepted
and deployed. In particular, Progressive Networks' RealAudio has a wide
following. Although streaming audio programs are considerably further along
than video, they are still nowhere near typical computer-sound quality. The
idea of streaming video over the network has been gaining a lot of interest.
The current Internet is a best effort network and interconnects sites with
widely varying bandwidth capabilities. In the future the Internet will see the
rollout of ATM, RSVP with the ability to control Quality of Service (QoS) and
mobile networks with widely varying QoS. Therefore it will remain a very
heterogeneous network. In this report firstly we present a brief review of the
current video compression standards, evolving standards and techniques and
the internet transport protocols being deployed. In addition, issues such as the
need for servers, plugins and firewall penetration are discussed. There are
many commercial streaming video products becoming available as well as
many research projects in this area. We then review the currently available
commercial products which purport to provide video streaming capabilities
over the Internet and outline their current limitations. Then we describe the
major research projects currently underway, which are attempting to solve
some of these limitations. Finally we compare and evaluate the SuperNOVA
project with respect to other research projects and the current commercial
products.
Video Compression Standards
The most important video codec standards for streaming video are H.261, H.263,
MJPEG, MPEG1, MPEG2 and MPEG4. A brief description of these is given below.
Compared to video codecs for CD-ROM or TV broadcast, codecs designed for the
Internet require greater scalability, lower computational complexity, greater resiliency
to network losses, and lower encode/decode latency for video conferencing. In
addition, the codecs must be tightly linked to network delivery software to achieve the
highest possible frame rates and picture quality. As one looks at the existing codec
standards, it becomes apparent that none are ideal for Internet video. In fact, it is quite
clear that over the next few years, we will see a host of new algorithms that are
specifically designed for the Internet and are thus more suitable for it. Research is
currently underway looking at both new scalable, flexible codecs and ways of scaling
existing codecs using transcoding and filters. Section 3 outlines current research in
video scalability. New algorithms specifically targeted at Internet video are being
developed. Consequently application framework standards such as H.323/H.324 for
videoconferencing and MPEG4, are being designed that will easily incorporate these
new codec innovations into applications being developed today, without significant
H.261 is also known as P*64 where P is an integer number meant to represent
multiples of 64kbit/sec. H.261 was targeted at teleconferencing applications and is
intended for carrying video over ISDN - in particular for face-to-face videophone
applications and for videoconferencing. The actual encoding algorithm is similar to
(but incompatible with) that of MPEG. H.261 needs substantially less CPU power for
real-time encoding than MPEG. The algorithm includes a mechanism which optimises
bandwidth usage by trading picture quality against motion, so that a quickly-changing
picture will have a lower quality than a relatively static picture. H.261 used in this
way is thus a constant-bit-rate encoding rather than a constant-quality,
variable-bit-rate encoding.
H.263 is a draft ITU-T standard designed for low bitrate communication. It is
expected that the standard will be used for a wide range of bitrates, not just
low bitrate applications. It is expected that H.263 will replace H.261 in many
applications. The coding algorithm of H.263 is similar to that used by H.261,
however with some improvements and changes to improve performance and
error recovery. The differences between the H.261 and H.263 coding
algorithms are listed below. Half pixel precision is used for H.263 motion
compensation whereas H.261 used full pixel precision and a loop filter. Some
parts of the hierarchical structure of the datastream are now optional, so the
codec can be configured for a lower datarate or better error recovery. There
are now four negotiable options included to improve performance:
Unrestricted Motion Vectors, Syntax-based Arithmetic Coding, Advanced
Prediction, and forward and backward frame prediction similar to MPEG called
P-B frames. H.263 supports five resolutions. In addition to QCIF and CIF that
were supported by H.261 there is SQCIF, 4CIF, and 16CIF. SQCIF is
approximately half the resolution of QCIF. 4CIF and 16CIF are 4 and 16 times
the resolution of CIF respectively. The support of 4CIF and 16CIF means the
codec could then compete with other higher bitrate video coding standards
such as the MPEG standards.
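The relationship between the five picture formats can be checked with a short
Python sketch. The luminance dimensions used below are the standard sizes for
each format; they are not quoted in this review and are included here only as
an assumption for illustration. The function names are our own.

```python
# Luminance picture sizes (width, height) for the five H.263 formats.
# These are the standard dimensions, not figures taken from this review.
FORMATS = {
    "SQCIF": (128, 96),
    "QCIF": (176, 144),
    "CIF": (352, 288),
    "4CIF": (704, 576),
    "16CIF": (1408, 1152),
}


def pixels(name):
    """Return the number of luminance pixels in one picture."""
    width, height = FORMATS[name]
    return width * height


def ratio(a, b):
    """Return how many times more pixels format a has than format b."""
    return pixels(a) / pixels(b)


if __name__ == "__main__":
    for name, (width, height) in FORMATS.items():
        print(f"{name:6s} {width}x{height} = {pixels(name)} pixels")
    print("SQCIF relative to QCIF:", round(ratio("SQCIF", "QCIF"), 2))
    print("4CIF relative to CIF:", ratio("4CIF", "CIF"))
    print("16CIF relative to CIF:", ratio("16CIF", "CIF"))
```

With these sizes SQCIF carries a little under half the pixels of QCIF, which
matches the "approximately half" description above.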
There is really no such standard as "motion JPEG" or "MJPEG" for video.
Various vendors have applied JPEG to individual frames of a video sequence,
and have called the result "M-JPEG". JPEG is designed for compressing
either full-color or gray-scale images of natural, real-world scenes. It works
well on photographs, naturalistic artwork, and similar material; not so well on
lettering, simple cartoons, or line drawings. JPEG is a lossy compression
algorithm which uses DCT-based encoding. JPEG can typically achieve 10:1
to 20:1 compression without visible loss, 30:1 to 50:1 compression is possible
with small to moderate defects, while for very-low-quality purposes such as
previews or archive indexes, 100:1 compression is quite feasible. Non-linear
video editors are typically used in broadcast TV, commercial post production,
and high-end corporate media departments. Low bitrate MPEG-1 quality is
unacceptable to these customers, and it is difficult to edit video sequences
that use inter-frame compression. Consequently, non-linear editors (e.g.,
AVID, Matrox, FAST, etc.) will continue to use motion JPEG with low
compression factors (e.g., 6:1 to 10:1).
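To give a feel for what these compression ratios mean for streaming, the
following sketch estimates the bit rate of a motion JPEG stream. It assumes
uncompressed 24-bit colour frames as the baseline and an illustrative frame
size and frame rate; real encoders usually subsample chrominance, so the
numbers are rough upper bounds rather than measurements.

```python
def mjpeg_bitrate_mbps(width, height, fps, ratio, bits_per_pixel=24):
    """Approximate motion JPEG bit rate in Mbit/s.

    ratio is the compression ratio (10 means 10:1). The uncompressed
    baseline of 24 bits per pixel is an assumption of this sketch.
    """
    raw_bits_per_frame = width * height * bits_per_pixel
    return raw_bits_per_frame / ratio * fps / 1e6


if __name__ == "__main__":
    # Illustrative frame size and rate, chosen for the example only.
    for ratio in (6, 10, 20, 50, 100):
        rate = mjpeg_bitrate_mbps(352, 288, 25, ratio)
        print(f"{ratio}:1 -> {rate:.2f} Mbit/s")
```

Even at 100:1 the result is several hundred kbit/s for this frame size, which
is one reason intra-frame-only coding is a poor fit for modem connections.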
MPEG 1, 2 and 4 are currently accepted, draft and developing standards respectively,
for the bandwidth efficient transmission of video and audio. The MPEG-1 codec
targets a bandwidth of 1-1.5 Mbps offering VHS quality video at CIF (352x288)
resolution and 30 frames per second. MPEG-1 requires expensive hardware for real-
time encoding. While decoding can be done in software, most implementations
consume a large fraction of a high-end processor. MPEG-1 does not offer resolution
scalability and the video quality is highly susceptible to packet losses, due to the
dependencies present in the P (predicted) and B (bi-directionally predicted) frames.
The B-frames also introduce latency in the encode process, since encoding frame N
needs access to frame N+k, making it less suitable for video conferencing.
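The latency introduced by B-frames can be illustrated with a small sketch of
frame reordering. The frame labels and the seven-frame sequence are
hypothetical; the point is only that a B frame cannot be encoded until the
next I or P frame has been captured.

```python
def coding_order(display):
    """Reorder frames from display order to coding (transmission) order.

    Each B frame depends on the next I or P frame, so that reference frame
    is emitted first and the waiting B frames follow it.
    """
    out, pending_b = [], []
    for frame in display:
        if frame.startswith("B"):
            pending_b.append(frame)
        else:
            out.append(frame)
            out.extend(pending_b)
            pending_b = []
    out.extend(pending_b)  # trailing B frames with no later reference
    return out


def max_wait(display):
    """Largest k such that a B frame N must wait for frame N+k."""
    worst = 0
    for i, frame in enumerate(display):
        if not frame.startswith("B"):
            continue
        for j in range(i + 1, len(display)):
            if not display[j].startswith("B"):
                worst = max(worst, j - i)
                break
    return worst


if __name__ == "__main__":
    gop = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
    print(coding_order(gop))
    print("frames of added encode delay:", max_wait(gop))
```

For this sequence the encoder must hold each first B frame for two frame
periods, which is the k in the N+k dependency described above.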
MPEG 2 extends MPEG 1 by including support for higher resolution video and
increased audio capabilities. The targeted bit rate for MPEG 2 is 4-15Mbits/s,
providing broadcast quality full-screen video. The MPEG 2 draft standard does cater
for scalability. Three types of scalability (Signal-to-Noise Ratio (SNR), Spatial and
Temporal) and one extension that can be used to implement scalability (Data
Partitioning) have been defined. Compared with MPEG-1, it requires even more
expensive hardware to encode and decode. It is also prone to poor video quality in the
presence of losses, for the same reasons as MPEG-1. Both MPEG-1 and MPEG-2 are
well suited to the purposes for which they were developed. For example, MPEG-1
works very well for playback from CD-ROM, and MPEG-2 is great for high-quality
archiving applications and for TV broadcast applications. In the case of satellite
broadcasts, MPEG-2 allows more than 5 digital channels to be encoded using the same
bandwidth as used by a single analog channel today, without sacrificing video quality.
Given this major advantage, the large encoding costs are really not a factor. However,
for existing computer and Internet infrastructures, MPEG-based solutions are too
expensive and require too much bandwidth; they were not designed with the Internet
in mind.
The intention of MPEG 4 is to provide a compression scheme suitable for video
conferencing, i.e. data rates less than 64 Kbits/s. MPEG4 will be based on the segmentation
of audiovisual scenes into AVOs or "audio/visual objects" which can be multiplexed
for transmission over heterogeneous networks. The MPEG-4 framework currently
being developed focuses on a language called MSDL (MPEG-4 Syntactic Description
Language). MSDL allows applications to construct new codecs by composing more
primitive components and providing the ability to dynamically download these
components over the Internet. This philosophy is similar to that for the multimedia
APIs being developed for Sun Microsystems' Java, where it will be possible to
dynamically download codec components. This trend is also seen in products from
major vendors such as Microsoft and Netscape, where they allow for multiple audio
and video codecs to be plugged into their real-time streaming solutions.
Scalable Video Compression Techniques
These can be sub-divided into DCT-based schemes (which include H.261, H.263,
MPEG1 and MPEG2), wavelet and sub-band schemes, fractal-based schemes and
image segmentation/region based compression schemes (MPEG4).
The majority of scalable video codecs are based on subband coding techniques of
which the most widely used is the wavelet transform. VDONet and VXtreme use
wavelet codecs. There is also a lot of work going on in research organisations looking
at the application of wavelet and subband coding techniques to scalable video codecs
- see sections 7.2, 7.3, 7.4, 7.5.
Fractal Video Coding
Various research groups are investigating the application of fractal
compression to scalable video. Iterated Systems have developed a
commercial product which has been implemented within Progressive
Networks' RealVideo product.
Image Segmentation and Object-based Video Coding
A number of research groups are investigating the application of image segmentation
to video compression. The approaches involve extracting important subsets of the
image content of each frame and only delivering the most important e.g. object
boundaries, moving objects. Object-based coding can achieve very high data
compression rates while maintaining an acceptable visual quality in the decoded
images. However object-based coders are computationally intensive and to be viable
as a real time process, an object-based coder would need to have the image
segmentation algorithm implemented as a VLSI array. See section 7.6, the UC Davis
Image Sequence Processing Group, and 7.7, the Video Communication Research
Group (VCRG), Uni. of Western Australia, and the Bath Video Coding Group.
The MPEG4 standard is directly related to this content-based scalable video codec
Internet Transport Protocols
TCP Transmission Control Protocol
HTTP (Hypertext Transfer Protocol) uses TCP as the protocol for reliable
document transfer. If packets are delayed or damaged, TCP will effectively
stop traffic until either the original packets or backup packets arrive. Hence it's
unsuitable for video and audio because:
• TCP imposes its own flow control and windowing schemes on the data stream,
effectively destroying temporal relations between video frames and audio
• Reliable message delivery is unnecessary for video and audio - losses are
tolerable and TCP retransmission causes further jitter and skew.
UDP (User Datagram Protocol) is the alternative to TCP. RealPlayer,
StreamWorks and VDOLive use this approach. (RealPlayer gives you a
choice of UDP or TCP, but the former is preferred.) UDP forsakes TCP's error
correction and allows packets to drop out if they're late or damaged. When
this happens, you'll hear or see a dropout, but the stream will continue.
Despite the prospect of dropouts, this approach is arguably better for
continuous media delivery. If broadcasting live events, everyone will get the
same information simultaneously. One disadvantage to the UDP approach is
that many network firewalls block UDP information. While Progressive
Networks, Xing and VDOnet offer work-arounds for client sites (revert to
TCP), some users simply may not be able to access UDP files.
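The trade-off between the two transports can be illustrated with a toy model.
The sketch below is not a description of any product; it assumes a fixed
playout deadline per packet and models a TCP loss as a single retransmission
arriving one timeout later (rto_ms, a hypothetical parameter). All times are
in milliseconds.

```python
def udp_dropouts(arrivals_ms, playout_ms):
    """Count packets that are lost (None) or arrive after their playout time."""
    return sum(
        1 for arrival, due in zip(arrivals_ms, playout_ms)
        if arrival is None or arrival > due
    )


def tcp_worst_lateness(arrivals_ms, send_ms, playout_ms, rto_ms=1000):
    """Return the worst lateness under reliable, in-order delivery.

    A lost packet is modelled as arriving rto_ms after it was sent
    (a simplifying assumption). Later packets cannot be delivered
    before it, so its delay spreads to everything behind it.
    """
    delivered = 0
    worst = 0
    for arrival, sent, due in zip(arrivals_ms, send_ms, playout_ms):
        if arrival is None:
            arrival = sent + rto_ms
        delivered = max(delivered, arrival)  # in-order delivery
        worst = max(worst, delivered - due)
    return worst


if __name__ == "__main__":
    send = [0, 100, 200, 300, 400]
    arrivals = [50, 150, None, 350, 450]      # third packet is lost
    playout = [t + 200 for t in send]         # 200 ms playout buffer
    print("UDP dropouts:", udp_dropouts(arrivals, playout))
    print("TCP worst lateness (ms):", tcp_worst_lateness(arrivals, send, playout))
```

In this model a single loss costs UDP one dropout, while the TCP stream
stalls well past its playout deadline for that packet and those behind it.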
Server or Serverless
Two major approaches are emerging for streaming multimedia content to
clients. The first is the server-less approach which uses the standard web-
server and the associated HTTP protocol to get the multimedia data to the
client. The second is the server-based approach that uses a separate server
specialized to the video/multimedia streaming task. The specialization takes
many forms, including optimized routines for reading the huge multimedia files
from disk, the flexibility to choose any of UDP/TCP/HTTP/Multicast protocols
to deliver data, and the option to exploit continuous contact between client
and server to dynamically optimize content delivery to the client. The primary
advantages of the server-less approach are: (i) there is one less software
piece to learn and manage, and (ii) from an economic perspective, there is no
video-server to pay for. In contrast, the server-based approach has the
advantages that it: (i) makes more efficient use of the network bandwidth, (ii)
offers better video quality to the end user, (iii) supports advanced features like
admission control and multi-stream multimedia content, (iv) scales to support
large number of end users, and (v) protects content copyright. The tradeoffs
clearly indicate that for serious providers of streaming multimedia content the
server-based approach is the superior solution. RealPlayer, StreamWorks and
VDOnet's VDOLive require you to install their A/V server software on your
Web server computer. Among other things, this software can tailor the quality
and number of streams, and provide detailed reports of who requested which
streams. Other programs, such as Shockwave and VivoActive, are serverless.
They don't require any special A/V server software beyond your ordinary Web
server software. With these programs, you simply link a file on your server's
hard drive from a Web page. When someone hits the link, the file starts to
download. Serverless programs are simple to incorporate into a Web site but
don't have the reporting capabilities of server-based programs. And because
they lack both stream- and bandwidth-management features, they may be
problematic if you need to support many simultaneous streams.
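As an illustration of the stream and bandwidth management that a dedicated
server can provide, the following is a minimal admission-control sketch. The
class and its three limits are hypothetical and are not taken from any of the
products discussed; they simply mirror the kind of limits a server
administrator might configure.

```python
from dataclasses import dataclass, field


@dataclass
class StreamServer:
    """Toy admission controller with three configurable limits (all assumed)."""
    max_total_kbps: float
    max_streams: int
    max_stream_kbps: float
    active: list = field(default_factory=list)

    def admit(self, requested_kbps):
        """Return the granted rate in kbps, or None if the request is refused."""
        if len(self.active) >= self.max_streams:
            return None
        # Cap the stream at the per-stream limit rather than refusing it.
        granted = min(requested_kbps, self.max_stream_kbps)
        if sum(self.active) + granted > self.max_total_kbps:
            return None
        self.active.append(granted)
        return granted


if __name__ == "__main__":
    server = StreamServer(max_total_kbps=200, max_streams=3, max_stream_kbps=112)
    for request in (112, 150, 56, 24, 8.5):
        print(request, "->", server.admit(request))
```

A serverless product has no equivalent place to make this decision, because
the Web server simply starts another file download for every request.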
Java Replayers Replacing Plugins
New solutions are appearing which use Java to eliminate the need to
download and install plugins or players. Such an approach will become
standard once the Java Media Player APIs being developed by Sun, Silicon
Graphics and Intel are available. This approach will also ensure client platform
independence. Vosaic appears to be one of the few products with a Java
replayer which supports H.263.
Nearly all streaming products require users behind a firewall to have a UDP
port opened for the video streams to pass through (1558 for StreamWorks,
7000 for VDOLive, 7070 for RealAudio). Rather than punch security holes in
the firewall, Xing/StreamWorks has developed a proxy software package you
can compile and use, while VDONet/VDOLive and Progressive
Networks/RealPlayer are approaching leading firewall developers to get
support for their streams incorporated into upcoming products. Currently a
number of products change from UDP to HTTP or TCP when UDP can't get
through firewall restrictions. This reduces the quality of the video. In all cases,
it's still a security issue for network managers.
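The fallback behaviour described above amounts to trying transports in order
of preference. The sketch below shows that logic only; the can_connect
callable is a hypothetical stand-in for whatever probe a real player uses,
and the preference order is an assumption.

```python
def choose_transport(can_connect, preferences=("udp", "tcp", "http")):
    """Return the first transport that can_connect accepts, or None.

    can_connect is a caller-supplied function (hypothetical here) that
    returns True when a stream can be opened over the named transport.
    """
    for name in preferences:
        if can_connect(name):
            return name
    return None


if __name__ == "__main__":
    blocked = {"udp"}  # a firewall that blocks UDP, for illustration
    print(choose_transport(lambda name: name not in blocked))
```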
Commercial Real Time Video Streamers
MacroMedia's Streaming Shockwave
Shockwave for Director consists of two components. On the HTTP server
side, the Afterburner tool compresses Director movies to make them available
on the Internet. On the client side, the Shockwave plugin lets the user
incorporate Director movies into the page layout of their HTML document. The
current Shockwave plugin is not streaming. The entire Director movie must be
downloaded before playback. The current release allows for a separate
real-time audio stream which can be encoded at 8, 16, 32 or 64 kbps, depending on
the most likely bandwidth available to users. Macromedia have just released
Director 6 Multimedia Studio which supposedly includes new Streaming
Shockwave technology. Macromedia and Progressive Networks have also
announced the integration of Shockwave Flash, a vector-based animation and
graphics system, on top of RealMedia, to enable audio and video streaming of
output from Flash. Shockwave is a serverless product which relies on the HTTP protocol
only. It isn't capable of live feeds and makes no use of IP Multicast, so it can't
scale well to support thousands of enterprise customers while efficiently using
Progressive Networks' RealVideo
Progressive Networks has recently launched RealVideo, the streaming video
version of their well-known RealAudio product. Both server and client versions
have been released. In addition Progressive Networks have released a range
of video-oriented content development tools, some their own, others
developed by third parties. Users need to install the RealServer 4.0 and the
RealPlayer Plus 4.0. It uses the RTSP protocol on top of UDP. Users
apparently have a choice of either fixed or optimized frame rate encoding in
the new RealVideo encoder. Users choose between a number of pre-defined
encoding templates which correspond to the most appropriate audio and
video formats for a given bandwidth. "Stream thinning" detects poor or
congested Internet connections and will dynamically adjust the video frame
rate in real-time. This is presumably frame dropping. "Smart networking"
automatically delivers audio and video streams via the most efficient network
protocol. This is presumably choosing between TCP, UDP or UDP multicast.
The choice of TCP would be to deal with firewall restrictions blocking UDP.
Progressive Networks have recently licensed ClearVideo, a fractal-based
video compression technology from Iterated Systems (see
http://www.iterated.com) to complement their internally-developed
compression methods. RealVideo 1.0 provides two codecs: RealVideo
Standard (developed by Progressive Networks) and RealVideo Fractal (using
ClearVideo technology from Iterated Systems, Inc.).
Xing Technology's StreamWorks
StreamWorks streams video and audio over the WWW using UDP/IP. Video
streams can be MPEG1 while audio can be MPEG1 or MPEG1 private data
streams containing MPEG2 LBR audio. Providers encode content at 8.5, 24,
56 or 112 kbps depending on the bandwidth capabilities of the potential users.
StreamWorks supports a process called thinning which reduces a high-
bandwidth stream so it can be transmitted over a lower bandwidth connection.
At low bandwidths, the software maintains a continuous audio stream of 8 to
10 Kbps, and the video stream uses whatever bandwidth is left. The MPEG-
based compression allows the software to drop frames from the stream,
creating a jerky video sequence with almost no motion, while maintaining a
smooth audio playback. The quality of the frames that do get through is still
pretty good, just not as fluid as one would expect from real video.
StreamWorks is able to broadcast streams to "relay servers". By using a star
configuration, it's possible to provide a video feed from a single server to
regional servers that then provide that stream to desktop clients.
StreamWorks' technology includes three components: the client software, the
server software and a video capture/encoding box called the AVTrans
encoder to compress audio and video streams. These streams are transferred
to a Unix server running the StreamWorks server software over a TCP/IP
network and, from there, are broadcast over the network to client
workstations. The AVTrans encoder is capable of creating a range of
compressed streams, from an 8.5-Kbps low bit rate format that
produces 8-kHz mono audio on the client, to a 112-Kbps stream that
provides 44-kHz stereo audio or 30-frames-per-second, quarter-screen
video for large bandwidth connections such as Ethernet or a T1. Like the
other products examined in this review, StreamWorks requires you to register
its mime type in your Web server's configuration file, and you need to open a
UDP port (1558) for delivering video to client workstations. The server
software can recode the compressed streams on the fly to compensate for
large numbers of users and a limited bandwidth. The server is configured from
a text file, so you can limit the total bandwidth output, the maximum number of
simultaneous streams and the maximum bit rate per stream. The maximum
default configuration for the server is 10 Mbps for an Ethernet connection, but
that can be adjusted depending on how your client machines are connecting--
via 14.4-Kbps modem pool, ISDN hub or 100-Mbps backbone. With a 28.8-
Kbps modem connection, the StreamWorks server drops to a much lower
frame rate of 2 to 3 frames per second, producing a jerky, halting video image
while maintaining continuous audio. Client performance is better
VDONet claim that content providers only need one video source which can
be scaled on the fly for both high and low scale connections. They claim to be
able to deliver 10-15 fps over a 28.8 kbps modem using a proprietary video
compression scheme based in part on wavelet techniques (VDOWave).
Under ideal conditions (minimal Internet traffic, no local network overhead,
minimal overload on the VDOLive On-Demand Server), the expected frame rates are:
• with a 14.4 kbps modem: up to 2 to 3 frames per second
• with a 28.8 kbps modem: from 8 to 12 frames per second
• with an ISDN line: up to 20 frames per second
VDONet's
VDOLive boasts a slightly higher frame rate over a standard 28.8-Kbps
modem than StreamWorks because it uses a wavelet compression
technology that lets it shave layers of quality off each frame that's transmitted,
rather than dropping whole frames. This creates a stream that is smoother at
low bit rates, but of lower visual clarity and quality. VDOLive appears to be the
only commercial product which tries to estimate bandwidth and adapt
dynamically. The image quality is very poor at times but audio is good.
VDOLive includes two programs, VDO Capture which lets you capture video
streams and VDO Clip which compresses a previously captured video stream
and encodes it for delivery from a VDO server. VDO Capture supports seven
full-motion video cards that can capture 16- or 24-bit color images at 15
frames per second in a frame size ranging from 64-by-64 pixels to 250-by-176
pixels. Unfortunately, existing AVI files that don't meet these criteria can't be
used unless they're converted. The VDOLive client is blunt, but effective.
Hitting the play button calls up a window for you to enter an address for the
VDOLive meta file that points to the video stream you want to launch. There
are also a few user-configurable parameters behind this window. VDOLive is
supported by some firewall vendors. However if UDP-based video is blocked
by a firewall, VDOLive resorts to TCP-based video instead. VDONet's codec
VDOWave has been included in the codecs shipped with Microsoft's NetShow
since 1996. Microsoft hold an equity stake in VDONet.
Based on research at the University of Illinois, Vosaic uses the Video
Datagram Protocol (VDP). VDP is basically an augmented RTP. VDP
improves reliability by creating two separate channels between the client and
server; one is a control channel the two machines use to coordinate what
information is being sent across the network, and the other channel is for the
streaming data. A server would first send the client what amounts to an
inventory of the stream that is about to be broadcast. The client then uses this
list to tell the server which segments to deliver, and if a segment of the stream
is lost or delayed, the client can simply ask for that segment to be resent. The
stream itself is buffered on the client side, providing for smooth playback in
most cases. VDP also uses adaptive flow control on the server side that can
adapt the packet flow based on how well the client is doing. If the client is
doing well and receiving all the frames, the server can increase the number of
packets being sent out onto the network. If the client is having trouble keeping
up or the network is so loaded that packets are being delayed, the server can
drop packets from the stream. VDP is designed to preserve network
bandwidth in response to both network congestion and client CPU load.
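The feedback loop described above can be sketched as a simple rate
controller. This is not the VDP algorithm itself, whose details are not given
here; it is a generic additive-increase, multiplicative-decrease rule, and
every threshold and step size below is an assumption chosen for illustration.

```python
def adapt_rate(rate_fps, loss_fraction, min_fps=1.0, max_fps=30.0,
               loss_threshold=0.05, step_fps=1.0, backoff=0.5):
    """Return the next sending rate given the loss reported by the client.

    loss_fraction is the share of frames the client reported missing or
    late in the last interval. All tuning parameters are assumed values.
    """
    if loss_fraction > loss_threshold:
        rate_fps = rate_fps * backoff   # client or network is struggling
    else:
        rate_fps = rate_fps + step_fps  # client keeps up, so probe upward
    return max(min_fps, min(max_fps, rate_fps))


if __name__ == "__main__":
    rate = 10.0
    for loss in (0.0, 0.0, 0.2, 0.0, 0.5):
        rate = adapt_rate(rate, loss)
        print(f"reported loss {loss:.0%} -> send at {rate} fps")
```

Because the client reports a single loss figure, the same loop reacts to both
network congestion and an overloaded client CPU without telling them apart.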
Vosaic supports video and audio standards including MPEG1, MPEG2, GSM
audio, and H.263. To view Vosaic's streaming videos you need the Vosaic
plug-in. It also requires you to download both a VOSAIC client and a server.
There is a new version out based on Java. VOSAIC MediaStudio is a
Java-based authoring application which can convert AVI/ASF formats and
MPEG1/2 formats into bandwidth compatible MPEG or H.263 files. The
quality parameters (target frame rate, quantisation, MPEG frame sequence (IPBIPB)) need
to be pre-set depending on the likely connection bandwidths of your clients.
Vosaic appears to be quite similar to SuperNOVA. It uses both feedback and
a feedforward scheme to adapt to both network and end system conditions.
However it doesn't include end-to-end QoS management with user interaction.
Dynamic scaling is only frame-dropping, within the boundaries pre-determined
at capture time. It does not support transcoding on the fly. On a T1 link your
source is MPEG while on a 28K link your source is H.263. On the plus side,
they already have a 100% Java H.263 player. Vosaic had a lot of audio
dropouts compared to VDOLive, which maintains audio at all costs. It delivered
8-bit video only and suffered from missing blocks due to packets being lost, a
consequence of MPEG1 encoded video.
VXtreme consists of a number of WebTheater products: Web Theater Client,
Server, Producer, LiveStation, and Personal Edition. VXtreme's software-only
compression technology automatically adapts the bandwidth of the video to
the network connection. VXtreme's Web Theater software uses RTP (Real-time
Transport Protocol) as its network delivery mechanism, extended to include
mechanisms for packet loss recovery. VXtreme's compression method is non-
standard. They claim it offers bandwidth scaling and software-only capability.
It is apparently not based on DCT or motion estimation (H.261, H.263,
MPEG1,2) or wavelets which they claim are compute-intensive and require
hardware-support. For the multicast case, VXtreme uses a layered
compression scheme to divide the compressed video into multiple streams
with differing priorities (based on importance to visual quality). This layered
approach reduces jitter caused by frame dropping and delivers smoother but
lower resolution video. They have a bizarre congestion control method which
freezes both audio and video and then restarts. Their proprietary encoding
method is just as blocky as DCT-based encoding. Microsoft has recently
acquired VXtreme's codec to ship with NetShow.
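The layered approach can be illustrated from the receiver's side: each
receiver takes the base layer first and adds enhancement layers while they
still fit. The sketch below is a generic illustration, not VXtreme's
proprietary scheme, and the layer rates are hypothetical.

```python
def select_layers(layer_kbps, available_kbps):
    """Return how many layers, starting from the base layer, fit the link.

    layer_kbps lists the rate of each layer in priority order, with the
    base layer first. Layers are cumulative: layer n is useless without
    layers 0..n-1, so selection stops at the first layer that does not fit.
    """
    used = 0.0
    count = 0
    for rate in layer_kbps:
        if used + rate > available_kbps:
            break
        used += rate
        count += 1
    return count


if __name__ == "__main__":
    layers = [20, 30, 60]  # hypothetical base and enhancement layer rates
    for link in (10, 28.8, 56, 128):
        print(f"{link} kbps link -> {select_layers(layers, link)} layer(s)")
```

Every receiver still gets every frame from the base layer, which is why
layering trades resolution or quality for smoothness rather than frame rate.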
The VivoActive player supports audio/video streaming of proprietary VIV files
over the web with standard HTTP connections. VIV files are compressed (up
to 250:1) files created by the VivoActive producer. Presently, the Producer
can be downloaded for free. The plug-in works well with VIV files, but not
many sites have VIV files. The VIVO format uses H.263 video compression
and G.723 audio compression. No separate video server is required, and it uses
HTTP rather than UDP. While Vivo acknowledged that there is some
inevitable loss in speed and quality using HTTP vs. UDP, they argued it is
negligible, and that it is more than made up for by the fact that HTTP, which
will continue to send streams even when packets are dropped, is more flexible
and less of a bandwidth hog than UDP. It is not truly scalable: users can control
how a video file is compressed and delivered by specifying a bandwidth. You
can choose from a variety of predefined settings to optimize your video
depending on the type of content you're streaming and the network
connection of your audience (modem, ISDN, T1, LAN). You can also customize the
data rate, frame rate, output size, audio quality and buffering parameters for
your streaming video.
Microsoft's NetShow expects the user to first create an ASF (Active Movie
Streaming) stream. The user has to choose from a range of audio and video
codecs depending on their bandwidth availability. Codecs on offer include
MPEG-layer3, Microsoft MPEG-4, Vivo G.723 (audio) and H.263 (video).
Content can be produced using VivoActive. It doesn't appear to offer dynamic
scalability but relies on the user to choose from a table of codecs depending
on whether they are on a 28.8Kbps modem, 56Kbps ISDN or 110Kbps
Intranet connection. NetShow will also support the Progressive Networks
RealAudio and RealVideo formats. It requires both a client (NetShow Player)
and a server (NetShow Server). There is also a set of NetShow Content
Creation Tools. It uses the UDP protocol and relies on port 1755 to get
through firewalls. A Netscape plug-in is used to replay the video. The major
limitation of NetShow is that it doesn't support high quality video formats
which would be deliverable over high bandwidth connections. But it does
deliver very good quality video (using the latest compression standards,
H.263 and MPEG4) at low bandwidths. The advantage of NetShow is its
flexibility. It supports a range of audio and video codecs which can simply be
plugged into the NetShow architecture to provide a range of video/audio
streaming solutions. Codecs on offer include: Duck TrueMotion RT, MPEG Layer-3,
Iterated Systems' ClearVideo, Microsoft MPEG4, VDOnet's VDOWave, Vivo
H.263, Intel H.263. In addition, they have just acquired VXtreme. See
Comparison of Commercial Video Streaming Products
The previous section describes the 8 major players in this field. The best
ones are those which deliver the highest quality video for a given bandwidth
i.e. lowest delay, no jitter (low frame loss), good audio/visual synchronisation,
high quality audio and image resolution. In addition, the ability to provide the
best possible video quality over a range of networks/bandwidths without
content duplication is highly desirable. This characteristic is referred to as
scalability.
All commercial products, except ShockWave, claim some form of video
scalability. Investigation reveals that often the claims of scalability are not
what they appear to be or are simply misleading. More often than not the
scalability is static rather than dynamic, and the user has little control over
how it manifests visually.
The currently available commercial products offer two types of scalability.
Firstly, there is scalability at the encoding stage. Users are given a range of
encoding formats to choose from, which correspond to a range of bandwidths.
The limitation of this scalability is that users need to know the bandwidth in
advance. This is inflexible: any unpredicted load cannot be handled
gracefully. Additionally, in a multi-receiver scenario the selected bandwidth
must be that of the lowest-capacity channel. This is an unrealistic restriction
and a waste of bandwidth for higher capacity receivers. Also, forcing an
individual to select a bandwidth assumes some technical awareness,
and gives no easy indication of the visual quality of the selected video.
Multiple formats are not supported from a single source; each format requires
a separate clip encoded in that format. This entails an overhead in
administration and storage of audio and video material.
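The encoding-stage scheme described above can be sketched as follows. This is a minimal illustration in Python, not the selection logic of any of the reviewed products: the table reuses the 28.8Kbps, 56Kbps and 110Kbps connection classes mentioned for NetShow, while the clip labels and the function names pick_encoding, pick_for_group and unused_kbps are hypothetical.

```python
# Hypothetical encoding table: (bit rate in Kbps, label of a pre-encoded clip).
# The rates follow the connection classes named in the text; the labels are made up.
ENCODINGS = [
    (28.8, "modem clip"),
    (56.0, "ISDN clip"),
    (110.0, "intranet clip"),
]


def pick_encoding(link_kbps):
    """Return the highest-rate pre-encoded clip that fits the link, or None."""
    best = None
    for rate, label in ENCODINGS:
        if rate <= link_kbps:
            best = (rate, label)
    return best


def pick_for_group(receiver_kbps):
    """With one shared stream, the slowest receiver dictates the encoding."""
    return pick_encoding(min(receiver_kbps))


def unused_kbps(receiver_kbps):
    """Capacity each receiver leaves idle when served the group's encoding."""
    chosen = pick_for_group(receiver_kbps)
    rate = chosen[0] if chosen else 0.0
    return [r - rate for r in receiver_kbps]
```

For a group of receivers on 28.8Kbps, 110Kbps and T1 links, pick_for_group selects the 28.8Kbps clip, so the faster receivers gain nothing from their extra capacity. A link slower than every entry in the table gets no clip at all, which is the "unpredicted load" case the static scheme cannot handle.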
Secondly, some of the products also incorporate some kind of dynamic
scalability based on the bandwidth available at the time. Where dynamic
scalability is provided, it is usually simple frame dropping. This is not ideal
because it can cause jerkiness and loss of synchronisation. Alternatively, a layered or
hierarchical compression method can be used. Layered compression
methods usually lose image quality or resolution but maintain frame rate as
the bandwidth drops. VXtreme claims to use a layered compression method
but it only supports AVI and MOV file formats.
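The difference between the two dynamic approaches can be made concrete with a small sketch. It assumes, purely for illustration, that every frame costs the same number of bits and that a layered stream is described by a list of per-layer bit rates with the base layer first. Neither assumption comes from any of the reviewed products, and the function names are hypothetical.

```python
def frames_to_send(num_frames, source_fps, kbits_per_frame, link_kbps):
    """Frame dropping: keep evenly spaced frames so the bit rate fits the link."""
    needed_kbps = source_fps * kbits_per_frame
    if needed_kbps <= 0:
        raise ValueError("source bit rate must be positive")
    # Fraction of frames the link can carry (assumes equal-sized frames).
    keep_ratio = min(1.0, link_kbps / needed_kbps)
    kept, credit = [], 0.0
    for i in range(num_frames):
        credit += keep_ratio
        if credit >= 1.0:
            kept.append(i)
            credit -= 1.0
    return kept


def layers_to_send(layer_kbps, link_kbps):
    """Layered coding: count how many layers, base first, fit within the link."""
    total, count = 0.0, 0
    for rate in layer_kbps:
        if total + rate > link_kbps:
            break
        total += rate
        count += 1
    return count
```

Under frame dropping the delivered frame rate falls with the bandwidth, which is the source of the jerkiness noted above. In the layered case every frame is still delivered, but with fewer enhancement layers, i.e. at lower quality or resolution.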
VOSAIC supports a variety of codecs (H.263, MPEG-1 and MPEG-2) to suit
the available bandwidth, which can range from 28.8Kbps to T1. The bandwidth
must be specified at encoding so that the most appropriate codec can be
selected. Limited dynamic adaptation is possible through frame dropping.
VDOLive is based on a proprietary wavelet encoding which enables 10-15fps,
1/4 screen video replay over 28.8Kbps. It scales dynamically from 14.4Kbps
modem to ISDN and Cable modems.
VivoActive offers a very simple solution for low bandwidth connections. It
doesn't require a server since it uses HTTP and it simply uses the low
bandwidth H.263 and GSM codecs to enable embedded audio/video
streaming over 28.8Kbps modems. It doesn't, however, support high quality video
(MPEG-1, MPEG-2) over higher bandwidths.
Progressive Networks' RealVideo has recently incorporated Iterated Systems'
fractal compression technology, which will improve its ability to dynamically
scale to a range of bandwidths.
The philosophy being adopted by major vendors such as Sun, Microsoft
and Netscape is to provide the ability to dynamically download codec
components over the Internet. The multimedia APIs being developed for
Sun Microsystems' Java will make this possible, and Microsoft and Netscape
similarly allow multiple audio and video codecs
to be plugged into their real-time streaming solutions. Consequently,
Microsoft's NetShow, which has been designed to allow a variety of codecs
suited to differing applications to be easily incorporated, offers flexibility and
support for the latest scalable video compression techniques.
Commercial Video Servers
High-end database-driven video servers are also available from companies
like IBM, Oracle, SGI, Sun and Tektronix. These products should be
considered for large scale applications or for serving large numbers of
IBM VideoCharger and Digital Library
MediaBase
Sun MediaCenter Servers
Streaming video (and audio) across networks is an effort that is attracting many
participants. This is evidenced by the eight primary commercial and thirteen research
organisations involved with this technology in various ways. A key characteristic of
both the commercial products and research demonstrators is the diversity in
technological infrastructure, e.g. networks, protocols, compression standards.
All the commercial video products reviewed in this report are optimised for low
bandwidth modem or ISDN connections and are not designed to scale to
higher bandwidth networks. The video needs to be pre-encoded with the
target audience in mind.
The commercial products have either adopted/developed their own
proprietary standards, embraced the currently accepted standards (e.g.
MPEG) or implemented a combination of the two. Compatibility between the
commercial products has been limited because of these proprietary
standards. However, recent products such as Sun's MediaFramework API and
Microsoft's NetShow have been designed to enable new and various codecs
to be easily incorporated into their framework.
H.263 and MPEG-4 are going to become the de facto standards for video
delivery over low bandwidths. But broadband standards such as MPEG-1 and
MPEG-2, which are useful for many types of broadcast and CD-ROM
applications, are unsuitable for the Internet. Although MPEG-2 has had
scalability enhancements, these will not be exploitable until the availability of
reasonably priced hardware encoders and decoders which support scalable
Codecs designed for the Internet require greater bandwidth scalability, lower
computational complexity, greater resilience to network losses, and lower
encode/decode latency for interactive applications. These requirements imply
codecs designed specifically for the diversity and heterogeneity of Internet
delivery. The research on Internet codecs has broadly taken two directions:
DCT-based and non-DCT-based. DCT-based video delivery, except for MPEG-2,
possesses no inherent scalability. To achieve adaptivity, various operations
can be applied to the (semi) compressed data stream to reduce its bit rate.
Amongst these operations is transcoding, the conversion of one compression
standard to another. The beauty of the DCT-based approach is that it is
compatible with current and imminent draft compression standards.
Furthermore, it allows re-use of existing audio and video archives without
explicitly re-coding them to cater for all possible formats. Existing viewers also
remain usable.
Non-DCT-based compression techniques, e.g. layered, sub-band and wavelet
coding, are intrinsically scalable. This is their great attraction. Unfortunately,
although several codecs exist, they are still experimental in nature and often
suffer from performance problems. In addition, existing movie libraries would
need to be re-coded, which is by no means a trivial task.
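Why sub-band and wavelet coding scale naturally can be seen from a one-level Haar decomposition of a row of pixel values. The sketch below is a textbook illustration only and is not taken from any codec discussed in this review; the function names haar_split and haar_merge are hypothetical. The pairwise averages form a base layer that can be decoded on its own, and the pairwise differences form an enhancement layer that restores the original samples when bandwidth allows.

```python
def haar_split(samples):
    """One level of Haar analysis: pairwise averages (base) and differences (detail)."""
    if len(samples) % 2 != 0:
        raise ValueError("expected an even number of samples")
    pairs = list(zip(samples[0::2], samples[1::2]))
    base = [(a + b) / 2 for a, b in pairs]
    detail = [(a - b) / 2 for a, b in pairs]
    return base, detail


def haar_merge(base, detail=None):
    """Rebuild the signal; with no detail layer each average is simply repeated."""
    if detail is None:
        detail = [0.0] * len(base)
    out = []
    for avg, diff in zip(base, detail):
        out.extend([avg + diff, avg - diff])
    return out
```

A receiver on a slow link would call haar_merge(base) and get a coarser version of the row, while a receiver with spare bandwidth would also fetch the detail layer and call haar_merge(base, detail) to recover the original samples. The same stream serves both, which is the property the DCT-based formats above lack.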
The research projects reviewed in this chapter broadly fall into two categories.
One group is developing scalable video codecs, mainly using sub-band
coding. The other group is looking at scalable video in the context of QoS.
There is consensus in the research community that the key to efficient
delivery of continuous media over heterogeneous networks is dynamic
bandwidth adaptation. Of these groups, the research carried out at Columbia
in the video-on-demand testbed seems the most significant work in this
area. This research is similar to SuperNOVA in some areas and
complementary in others.