The term multimedia usually implies that at least one discrete medium, such as text (structured or unstructured, hypertext, etc.), graphics (drawings), or images, is associated with continuous media, that is, audio or motion video information.
Multimedia streaming overlaps the playout of the data at the receiver with its transmission by the sender.
A video stream consists of a sequence of images or frames.
A frame consists of a grid of pixels.
( Table 1 )
An audio stream consists of a sequence of audio samples.
Basic Terms
Table 1. Hierarchy of multimedia content.
Term          Definition
Pixel         Picture element
Frame         Two-dimensional grid of pixels
Stream        Sequence of frames over time
Session       Synchronized set of streams
Presentation  Set of multimedia sessions
The advantage of streaming is that it can enable easier access to multimedia resources.
Another possibility is the integration of video and audio with other web-based applications, such as chat and other real-time collaboration tools.
Streaming vs. downloading
What Is The Difference Between Downloading and Streaming?
When you download a video, you have to copy the entire file to your hard disk before you can play it.
When the video is streamed, there is a small wait as the stream 'buffers', but there is no need to save the file.
Streaming is the act of sending media files (audio and/or video) over the Internet from one computer to another computer so that the media plays as it is being delivered.
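The difference in waiting time can be illustrated with simple arithmetic. The sketch below is only an illustration under stated assumptions: the function names, the clip size, the link speed, and the start-up buffer size are all hypothetical, and it assumes the link is at least as fast as the media bit rate so playback does not stall once it has started.

```python
def download_wait_seconds(file_size_bits: float, bit_rate_bps: float) -> float:
    """Wait before playback when the whole file must arrive first."""
    return file_size_bits / bit_rate_bps


def streaming_wait_seconds(buffer_bits: float, bit_rate_bps: float) -> float:
    """Wait before playback when only an initial buffer must fill."""
    return buffer_bits / bit_rate_bps


if __name__ == "__main__":
    # Hypothetical numbers: 40 MB clip, 1 MB start-up buffer, 2 Mbit/s link.
    file_bits = 40 * 8 * 1_000_000
    buffer_bits = 1 * 8 * 1_000_000
    link_bps = 2_000_000
    print(download_wait_seconds(file_bits, link_bps))    # 160.0 seconds
    print(streaming_wait_seconds(buffer_bits, link_bps)) # 4.0 seconds
```

With these assumed numbers, the viewer waits 160 seconds for a download but only 4 seconds for the stream to buffer.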
Figure 2. To hear or view a media file without downloading it
(Diagram labels: Media Encoding (Audio, Video, Animation); Clients Send Request To Servers; Web Server Send Request to Media Server; Media Server: Proprietary Format, Access to Storage, Relieves Web Server, Send Stream To Clients; Java based player; Browser plug-in player)
A media stream proceeds through the following stages before it is displayed to a recipient:
The audio or video stream must be captured from an analog device, such as a microphone or a video camera, and converted to digital form.
25 fps (frames per second) for video and 16-bit samples for audio are suitable.
An encoder converts the raw digital data into a particular audio or video format.
A server may store the encoded stream for future transmission.
The stream is transmitted to one or more recipients. A live stream may be transmitted as it is captured and encoded, whereas a prerecorded stream is transmitted by a server.
The receiver decodes and displays the data as they arrive. Alternatively, the receiver may store the entire stream before initiating playback.
( Figure 3 )
Figure 3. Capturing video
There are two different types of streaming:
Progressive streaming
The client begins playback of the multimedia file as it is delivered. The file is ultimately stored on the client computer.
Uses a standard web server
Quality is better than real-time streaming
Real-time streaming
The multimedia file is delivered to the client computer, but the file is not stored on the client computer.
Requires a special streaming server
There are two different types of real-time streaming:
Live streaming is used to deliver a live event while it is occurring.
Examples: live soccer game, live concerts, live radio, and videoconferences.
On-demand streaming is used to deliver archived media streams.
Examples: video clips, movies, and
Why Streaming Media?
No waiting for complete downloads.
Streamed files are not written to disk.
Presentation of live events is possible.
Major streaming formats
Microsoft Windows Media
Figure 4. Streaming media development process
How does streaming work?
Figure 5. Streaming media playback
Figure 6. Streaming media from a conventional Web server
Figure 7. Real-time Streaming Protocol (RTSP)
SETUP - the server allocates resources for a client session.
PLAY - the server delivers a stream to a client session.
PAUSE - the server suspends delivery of a stream.
TEARDOWN - the server breaks down the connection and releases the resources allocated for the session.
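The four methods above can be read as transitions of a small per-session state machine. The following is a minimal sketch, not a protocol implementation: the state names (`INIT`, `READY`, `PLAYING`), the class `RtspSession`, and the restriction to exactly these transitions are simplifying assumptions made for illustration.

```python
# Simplified session states and allowed transitions (assumed for illustration).
TRANSITIONS = {
    ("INIT", "SETUP"): "READY",        # server allocates resources
    ("READY", "PLAY"): "PLAYING",      # server starts delivering the stream
    ("PLAYING", "PAUSE"): "READY",     # delivery suspended, resources kept
    ("READY", "TEARDOWN"): "INIT",     # resources released
    ("PLAYING", "TEARDOWN"): "INIT",   # resources released
}


class RtspSession:
    """Tracks the state of one hypothetical client session."""

    def __init__(self) -> None:
        self.state = "INIT"

    def handle(self, method: str) -> str:
        """Apply a request method and return the new state."""
        key = (self.state, method)
        if key not in TRANSITIONS:
            raise ValueError(f"{method} is not allowed in state {self.state}")
        self.state = TRANSITIONS[key]
        return self.state


if __name__ == "__main__":
    session = RtspSession()
    for method in ["SETUP", "PLAY", "PAUSE", "PLAY", "TEARDOWN"]:
        print(method, "->", session.handle(method))
```

A request that does not fit the current state, such as PLAY before SETUP, is rejected in this sketch by raising `ValueError`.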
Figure 8. RTSP state machine
RTSP State Machine
RTSP operation
A clip is a media file that contains audio, video, or both.
A webcast uses streaming media technology to take a single content source and distribute it to many simultaneous listeners/viewers by broadcasting over the Internet.
Three general methods for delivering content from a server to a client across a network:
Unicast: The server delivers the content to a single client.
Broadcast: The server delivers the content to all clients, regardless of whether they want the content or not.
Multicast: The server delivers the content to a group of receivers who indicate they wish to receive the content.
Broadcast means a piece of information is sent or transmitted from one point to all other points.
There is just one sender, but the information is simultaneously sent to all connected receivers.
In telecommunications, broadcasting means propagation of a flow of information from one source to all potential recipients.
In networking, a distinction is made between broadcasting and multicasting.
Broadcasting sends a message to everyone on the network.
Multicasting sends a message to a select list of recipients.
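The delivery methods differ only in which receivers end up with the content. The sketch below models that choice as a plain function; it does not touch real sockets. The function name `select_recipients`, the mode strings (including `unicast` for delivery to a single client), and the client names are hypothetical labels chosen for this example.

```python
from typing import Iterable, List, Optional


def select_recipients(
    mode: str,
    clients: List[str],
    target: Optional[str] = None,
    group: Optional[Iterable[str]] = None,
) -> List[str]:
    """Return the clients that would receive the content for a delivery mode."""
    if mode == "unicast":
        # One sender, one chosen receiver.
        return [c for c in clients if c == target]
    if mode == "broadcast":
        # Everyone on the network, whether they want the content or not.
        return list(clients)
    if mode == "multicast":
        # Only the receivers that indicated they want the content.
        members = set(group or [])
        return [c for c in clients if c in members]
    raise ValueError(f"unknown delivery mode: {mode}")


if __name__ == "__main__":
    network = ["alice", "bob", "carol", "dave"]
    print(select_recipients("unicast", network, target="bob"))
    print(select_recipients("broadcast", network))
    print(select_recipients("multicast", network, group=["alice", "dave"]))
```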
Bit rate is the amount of data that can be carried from one point to another in a given time period (usually a second).
Bit rate is sometimes called data rate, transfer rate, or bandwidth.
Multiple Bit Rate Encoding
Combines several streams with different bit rates into a single file
The stream with the appropriate bit rate is selected automatically
( Figure 13 )
Figure 13. Multiple bit rate encoding
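One plausible selection rule for multiple bit rate encoding is to pick the highest encoded bit rate that does not exceed the bandwidth available to the client. The sketch below assumes that rule; the function name `select_stream`, the fallback to the lowest rate, and the example bit rates are illustrative assumptions, not the behaviour of any particular media server.

```python
from typing import List


def select_stream(available_bps: List[int], client_bps: int) -> int:
    """Pick the highest encoded bit rate the client's bandwidth can carry.

    Falls back to the lowest encoded rate when even that exceeds the
    client's bandwidth (an assumed policy for this sketch).
    """
    candidates = [rate for rate in available_bps if rate <= client_bps]
    return max(candidates) if candidates else min(available_bps)


if __name__ == "__main__":
    encoded = [56_000, 300_000, 1_000_000]  # hypothetical streams in one file
    print(select_stream(encoded, 500_000))  # 300000
    print(select_stream(encoded, 20_000))   # 56000
```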
Aspect ratio is the ratio of width to height of the encoded video.
This information is present in the output video stream and used by the decoder to display the video at the correct aspect ratio.
The computer display is designed for an aspect ratio of 1.33:1 (4:3), which means that the width of the display area is only 1.33 times the height, almost square.
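The 1.33:1 figure can be derived from a frame size in pixels. The helper names below are hypothetical, and the 640x480 and 1920x1080 frame sizes are just example inputs.

```python
from math import gcd
from typing import Tuple


def aspect_ratio(width: int, height: int) -> Tuple[int, int]:
    """Reduce width:height to its smallest whole-number form."""
    divisor = gcd(width, height)
    return width // divisor, height // divisor


def aspect_ratio_decimal(width: int, height: int) -> float:
    """Express the ratio as width divided by height, rounded to 2 places."""
    return round(width / height, 2)


if __name__ == "__main__":
    print(aspect_ratio(640, 480), aspect_ratio_decimal(640, 480))      # (4, 3) 1.33
    print(aspect_ratio(1920, 1080), aspect_ratio_decimal(1920, 1080))  # (16, 9) 1.78
```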
A frame is one still picture.
When still pictures (frames) are changed quickly, the human eye cannot separate the individual pictures and instead perceives smooth video.
Frame rate is the number of video frames (complete pictures) that will be presented to the viewer each second.
The human eye perceives smooth video at frame rates above about 24 fps (frames per second).
In the American TV system, NTSC, the frame rate is approximately 29.97 fps.
In the European PAL system, the frame rate is 25 fps.
A frame buffer is a special memory that holds the complete digital representation of the frame to be displayed on a computer screen.
The frame buffer is scanned line by line by the digital-to-analog converter system of the display.
Color depth or bit depth is the number of bits used to represent the color of a single pixel in a bitmapped image or video frame buffer.
It is known as bits per pixel (bpp).
Higher color depth gives a broader range of distinct colors.
( Table 2 )
Table 2. Color depth chart.
Bit-Depth  Number of Colors
1          2 (monochrome)
2          4 (CGA)
4          16 (EGA)
8          256 (VGA)
16         65,536 (High Color, XGA)
24         16,777,216 (True Color, SVGA)
32         16,777,216 (True Color + Alpha Channel)
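Color depth, frame size, and frame rate together determine how much memory a frame buffer needs and how large an uncompressed video stream is. The sketch below works through that arithmetic; the function names and the 640x480, 24 bpp, 25 fps example are assumptions chosen for illustration.

```python
def number_of_colors(bits_per_pixel: int) -> int:
    """Distinct values representable with the given number of colour bits.

    Note: at 32 bpp, 8 of the bits usually carry alpha, so the colour count
    stays at the 24-bit figure, as the table shows.
    """
    return 2 ** bits_per_pixel


def frame_buffer_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Memory needed to hold one complete frame."""
    return width * height * bits_per_pixel // 8


def raw_video_bit_rate(width: int, height: int, bits_per_pixel: int, fps: int) -> int:
    """Bits per second of uncompressed video."""
    return width * height * bits_per_pixel * fps


if __name__ == "__main__":
    print(number_of_colors(24))                  # 16777216
    print(frame_buffer_bytes(640, 480, 24))      # 921600 bytes
    print(raw_video_bit_rate(640, 480, 24, 25))  # 184320000 bit/s
```

Even this modest frame size produces about 184 Mbit/s of raw data, which is why compression matters for streaming.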
In transmission technology, jitter refers to the variation of the delay generated by the transmission equipment.
In data communications, jitter refers to the variation over time of the network transit delay.
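One simple way to put a number on jitter is to look at how much the transit delay changes from one packet to the next. The sketch below uses the mean absolute difference between consecutive delays; this particular measure, the function names, and the timestamps (in milliseconds) are assumptions for illustration, and other definitions of jitter exist.

```python
from typing import List


def transit_delays(send_times: List[int], receive_times: List[int]) -> List[int]:
    """Per-packet network transit delay (receive time minus send time)."""
    return [r - s for s, r in zip(send_times, receive_times)]


def delay_variation(delays: List[int]) -> float:
    """Mean absolute difference between consecutive transit delays."""
    diffs = [abs(b - a) for a, b in zip(delays, delays[1:])]
    return sum(diffs) / len(diffs) if diffs else 0.0


if __name__ == "__main__":
    sent = [0, 20, 40, 60]        # hypothetical send times in ms
    received = [50, 75, 88, 115]  # hypothetical arrival times in ms
    delays = transit_delays(sent, received)
    print(delays)                 # [50, 55, 48, 55]
    print(delay_variation(delays))
```

A constant delay, however large, gives a variation of zero; it is the change in delay that the receiver's buffer has to absorb.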
Compression is the process of eliminating redundant information to decrease file size.
Compression converts frames and pixels into a compact mathematical representation that the computer can store and transmit efficiently.
Decompression converts that representation back to frames and pixels for playback.
Two compression methods are:
Lossless compression retains all of the data of the original file as it's converted to a smaller file size.
In lossless compression the information is recovered without any alteration after the decompression stage.
When a lossless file is opened, algorithms restore all compressed information, creating a duplicate of the source file.
It is generally preferred for creating high-quality or professional applications.
Lossless compression is applied where the accuracy of the information is essential, such as in medical imaging where it's important to retain fine detail.
Lossless compression is also called bit-preserving compression.
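The bit-preserving property can be checked directly with a round trip. The sketch below uses Python's standard `zlib` module as one example of a lossless compressor; the function name and the repetitive sample data are assumptions chosen so that the size reduction is easy to see.

```python
import zlib
from typing import Tuple


def lossless_round_trip(data: bytes) -> Tuple[bytes, bytes]:
    """Compress the data, then decompress it again."""
    compressed = zlib.compress(data)
    restored = zlib.decompress(compressed)
    return compressed, restored


if __name__ == "__main__":
    original = b"ABABABAB" * 100  # highly redundant sample data
    compressed, restored = lossless_round_trip(original)
    print(len(original), len(compressed))
    print(restored == original)   # True: every bit is recovered
```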
Lossy compression refers to the case where the decompressed information is different from the original uncompressed information.
With this kind of compression, some of the source file's information is discarded to conserve space.
When the file is decompressed, the discarded information is approximated by the decoding algorithm rather than restored exactly.
This method results in some loss of sound quality or image detail when compared to the original.
This mode is suitable for most continuous media such as sound and motion video as well as for many images.
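A minimal way to see lossy behaviour is quantization: store each sample only as the index of the nearest multiple of a step size. This is a sketch of the general idea under assumed names and values, not the method used by any particular audio or video codec.

```python
from typing import List


def quantize(samples: List[int], step: int) -> List[int]:
    """Keep only the index of the nearest multiple of `step` (non-negative samples)."""
    return [(s + step // 2) // step for s in samples]


def dequantize(indices: List[int], step: int) -> List[int]:
    """Rebuild approximate sample values from the stored indices."""
    return [i * step for i in indices]


if __name__ == "__main__":
    samples = [100, 103, 98, 250]  # hypothetical sample values
    step = 16
    stored = quantize(samples, step)
    rebuilt = dequantize(stored, step)
    print(stored)   # [6, 6, 6, 16]
    print(rebuilt)  # [96, 96, 96, 256]
```

The rebuilt values are close to the originals but not identical, and the three nearby samples have collapsed to the same value. A larger step saves more space and discards more detail.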
The MPEG Standards
MPEG standards are developed and managed by the Moving Picture Experts Group (MPEG)
MPEG-2: DVD, HDTV
MPEG-4: Content-based video coding
MPEG-7: Multimedia indexing and retrieval
MPEG-21: Multimedia delivery and consumption
The MPEG-1 Standard
Released in 1992
A standard for coded representation of
Combination of above
Typical application – video CD (VCD)
The MPEG-2 Standard
Released in 1994
A standard to provide video quality not lower than NTSC/PAL, with target bit rates between 2 and 10 Mbit/s
Digital cable TV distribution
Networked database service via ATM
Digital video tape recorder (VTR)
Satellite and terrestrial digital broadcasting distribution
It also supports HDTV applications, and so pre-empted the MPEG-3 standard
The MPEG-4 Standard
First released in 1998, and targeted for content-based multimedia applications and low bit-rate video coding.
Algorithms and tools for coding and flexible representation of audio/video to meet the challenges of multimedia applications.
The objective of low bit-rate video coding was later accomplished by H.264 (MPEG-4 Part 10, AVC), developed jointly by ITU-T and ISO/IEC MPEG.
The MPEG-7 Standard
First released in 2001
Official name: Multimedia Content Description Interface
To allow efficient search for multimedia content using standardized descriptors
The main research issues:
Optimum search engine
Feature analysis & query design
The MPEG-21 Standard
Aims to define a normative open framework for multimedia delivery and consumption for use by all the players in the delivery and consumption chain.
Decompression is the process by which compressed information is expanded by addition of the redundant information eliminated at the compression stage.
After decompression, the resulting information may be identical to the original (lossless compression) or different from it (lossy compression).
Codec stands for Coder/Decoder or Compression/Decompression.
A codec is a piece of software or a driver that mainly compresses data to reduce file size but may also do some formatting.
Compression is the primary function of the codec.
With a codec installed, your system recognizes the encoded video/audio format and can play (decode) the audio/video file in that format.
Video codecs: MPEG-1, MPEG-2, DivX, WMV (Windows Media Video), MPEG-4/H.264, RealVideo
Audio codecs: MP3, ATRAC, AAC, WMA (Windows Media Audio), DTS, RealAudio
Image formats: JPEG, JPEG 2000, PNG, GIF
TCP (Transmission Control Protocol)
UDP (User Datagram Protocol)
RTP (Real-time Transport Protocol)
RSVP (Resource ReSerVation Protocol)
( Table 3 )
Table 3. Multimedia protocols.

TCP
Advantages:
· Dominant protocol for data transfer over the Internet
· Streaming through firewall
· Reliable
Disadvantages:
· Typically needs a large buffer to handle data rate variation
· Loss recovery needs retransmission, causing further jitter or skew
· No support for multicast

UDP
Advantages:
· Suitable for streaming
· Allows packet drops; if packets arrive late or damaged, streaming will continue
· No retransmission needed
Disadvantages:
· Many network firewalls block UDP data
· Needs error concealment for video packet loss
· No support for congestion control
· Cannot be played using popular stream players such as QuickTime

RTP/RTCP
Advantages:
· Supports real-time transmission
· Provides timing reconstruction, loss detection, security and content identification
· Allows retrieval of very interesting network statistics
Disadvantages:
· No guarantee for QoS
· Header is larger than UDP
· More complicated than UDP
· No support for congestion control

RSVP
Advantages:
· Reliable connection
· Receiver can obtain different levels of service
Disadvantages:
· Complicated request mechanism
· Receivers may experience random packet loss for small reservation
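The timing reconstruction and loss detection credited to RTP in the table rest on two header fields: a timestamp and a sequence number. The sketch below packs and unpacks the 12-byte fixed RTP header and finds gaps in received sequence numbers. It is a simplified illustration: the function names are hypothetical, the header carries no padding, extension, or CSRC entries, and the gap check ignores sequence-number wraparound.

```python
import struct
from typing import Dict, List

# Fixed RTP header: flags byte, marker/payload-type byte,
# 16-bit sequence number, 32-bit timestamp, 32-bit SSRC (network byte order).
RTP_HEADER = struct.Struct("!BBHII")


def pack_rtp_header(payload_type: int, sequence: int, timestamp: int,
                    ssrc: int, marker: bool = False) -> bytes:
    """Build a 12-byte header with version 2 and no optional parts."""
    byte0 = 2 << 6
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return RTP_HEADER.pack(byte0, byte1, sequence & 0xFFFF,
                           timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def unpack_rtp_header(data: bytes) -> Dict[str, int]:
    """Read the fixed header fields back out of a packet."""
    byte0, byte1, sequence, timestamp, ssrc = RTP_HEADER.unpack(data[:12])
    return {"version": byte0 >> 6, "marker": byte1 >> 7,
            "payload_type": byte1 & 0x7F, "sequence": sequence,
            "timestamp": timestamp, "ssrc": ssrc}


def missing_sequences(received: List[int]) -> List[int]:
    """Sequence numbers absent between the lowest and highest seen (no wraparound)."""
    if not received:
        return []
    seen = set(received)
    return [n for n in range(min(received), max(received) + 1) if n not in seen]


if __name__ == "__main__":
    header = pack_rtp_header(payload_type=96, sequence=7, timestamp=90000, ssrc=0x1234)
    print(len(header), unpack_rtp_header(header))
    print(missing_sequences([1, 2, 4, 7]))  # [3, 5, 6]
```

Because RTP usually runs over UDP, the receiver gets no retransmission; it can only notice the gaps and conceal them.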