SYLLABUS FOR AFFILIATED INSTITUTIONS
ANNA UNIVERSITY, CHENNAI
REGULATIONS – 2009

CU9255 INTERNETWORKING MULTIMEDIA    L T P C: 3 0 0 3

UNIT I INTRODUCTION (9)
Digital sound, video and graphics; basic multimedia networking; multimedia characteristics; evolution of the Internet services model; network requirements for audio/video transfer; multimedia coding and compression for text, image, audio and video; multimedia communication in wireless networks.

UNIT II SUBNETWORK TECHNOLOGY (9)
Broadband services; ATM and IP; IPv6; high-speed switching; resource reservation; buffer management; traffic shaping; caching; scheduling and policing; throughput, delay and jitter performance.

UNIT III MULTICAST AND TRANSPORT PROTOCOL (9)
Multicast over shared-media networks; multicast routing and addressing; scoping; multicast and NBMA networks; reliable transport protocols; TCP adaptation algorithms; RTP; RTCP.

UNIT IV MEDIA-ON-DEMAND (9)
Storage and media servers; voice and video over IP; MPEG-2 over ATM/IP; indexing; synchronization of requests; recording and remote control.

UNIT V APPLICATIONS (9)
MIME; peer-to-peer computing; shared applications; video conferencing; centralized and distributed conference control; distributed virtual reality; lightweight session philosophy.

TOTAL: 45 PERIODS

REFERENCES:
1. Jon Crowcroft, Mark Handley, Ian Wakeman, "Internetworking Multimedia", Harcourt Asia Pvt. Ltd., Singapore, 1998.
2. B.O. Szuprowicz, "Multimedia Networking", McGraw-Hill, New York, 1995.
3. Tay Vaughan, "Multimedia: Making It Work", 4th ed., Tata McGraw-Hill, New Delhi, 2000.
4. Ellen Kayata Wesel, "Wireless Multimedia Communication: Networking Video, Voice and Data", Addison Wesley Longman, USA, 1998.
UNIT I INTERNETWORKING MULTIMEDIA
Adri Jovin J.J.
Page | 1

UNIT I
INTRODUCTION

Multimedia is, as described previously, a woven combination of digitally manipulated text, photographs, graphic art, sound, animation, and video elements. When you allow an end user—also known as the viewer of a multimedia project—to control what and when the elements are delivered, it is called interactive multimedia. When you provide a structure of linked elements through which the user can navigate, interactive multimedia becomes hypermedia.

Although the definition of multimedia is a simple one, making it work can be complicated. Not only do you need to understand how to make each multimedia element stand up and dance, but you also need to know how to use multimedia computer tools and technologies to weave them together. The people who weave multimedia into meaningful tapestries are called multimedia developers.

The software vehicle, the messages, and the content presented on a computer, television screen, PDA (personal digital assistant), or mobile phone together constitute a multimedia project. If the project is to be shipped or sold to consumers or end users, typically delivered as a download on the Internet but also on a CD-ROM or DVD in a box or sleeve, with or without instructions, it is a multimedia title. Your project may also be a page or site on the World Wide Web, where you can weave the elements of multimedia into documents with HTML (Hypertext Markup Language) or DHTML (Dynamic Hypertext Markup Language) or XML (eXtensible Markup Language) and play rich media files created in such programs as Adobe's Flash, LiveMotion, or Apple's QuickTime by installing plug-ins into a browser application such as Internet Explorer, Safari, Google Chrome, or Firefox. Browsers are software programs or tools for viewing content on the Web. See Chapter 12 for more about plug-ins, multimedia, and the Web.
A multimedia project need not be interactive to be called multimedia: users can sit back and watch it just as they do a movie or the television. In such cases a project is linear, starting at the beginning and running through to the end. When users are given navigational control and can wander through the content at will, multimedia becomes nonlinear and user interactive, and is a powerful personal gateway to information.

Determining how a user will interact with and navigate through the content of a project requires great attention to the message, the scripting or storyboarding, the artwork, and the programming. You can break an entire project with a badly designed interface. You can also lose the message in a project with inadequate or inaccurate content. Multimedia elements are typically sewn together into a project using authoring tools. These software tools are designed to manage individual multimedia elements and provide user interaction. Integrated multimedia is the "weaving" part of the multimedia definition, where source documents such as montages, graphics, video cuts, and sounds merge into a final presentation. In addition to providing a method for users to interact with the project, most authoring tools also offer facilities for creating and editing text and images, and controls for playing back separate audio and video files that have been created with editing tools designed for these media. The sum of what gets played back and
how it is presented to the viewer on a monitor is the graphical user interface, or GUI (pronounced "gooey"). The GUI is more than just the actual graphics on the screen—it also often provides the rules or structure for the user's input. The hardware and software that govern the limits of what can happen here are the multimedia platform or environment.

History

Throughout the 1960s, 1970s, 1980s and 1990s, computers were restricted to dealing with two main types of data—words and numbers—through text and arithmetic processing, in word processors, spreadsheets and so on. Codes for numbers (binary, BCD, fixed point, IEEE floating point) are fairly well standardized. Codes for text (ASCII, EBCDIC, but also fonts, Kanji, and so on) are also reasonably well understood. Higher-level "codes"—links, indexes, references, and so on—are the subject of such standards as the ubiquitous Hypertext Markup Language, HTML.

Now computers, disks and networks are fast enough to process, store and transmit audio, video and computer-generated visualization material as well as text, graphics and data: hence the multimedia revolution.

One thing about multimedia cannot be overstated: it is big. Like space in The Hitchhiker's Guide to the Galaxy, it is much bigger than you can imagine. Of course, we are not talking about the hype here; we are talking about the storage, transmission and processing requirements!

To paraphrase Maurice Zapp, from David Lodge's novel Small World: "Every encoding is a decoding." The idea behind this glib quote is that each time we invent a new way of representing and transmitting information, we also have to teach ourselves to receive and comprehend that new type of representation.
In the rest of this section, we take a look at some aspects of representation that need to be understood in regard to multimedia.

Numbers and letters have standard encodings: ASCII and IEEE floating point are the most widespread now (at least for common English-language text processing and for numeric programming); in the past there was a plethora of other encodings, even for simple Roman-alphabet text. As multilingual support has become common, we have seen a brief increase in the number of encodings, and then, as the problems became better understood, a standard set of character sets has emerged. Digital multimedia encodings in the form of audio and video are still at a very early stage in terms of standards, and there are many, partly because of the range of possible processing, storage and transmission performance capacities available on computers and networks, where some systems are right at the limits of their abilities to do any useful work at all!

Each new medium needs to be coded, and we need to have common representations for objects in the medium; there are many choices. For example, speech can be coded as a sequence of samples, a sequence of phonemes, a string of text with a voice-synthesizer setting, and so on, requiring more or less intelligence or processing at the sender and receiver, and providing more
or less structural information (and as a result, typically allowing more compression). Similarly, video can be coded as a sequence of bitmaps, or else can be broken down into some description of scenes, objects within scenes, motion of objects and so on.

The codings now involve possible relationships with time and between different media. When we read a block of text, it is usually up to the reader to choose how quickly to read it. Hypertext to some extent breaks this rule, at least by relating text non-linearly with other text. When we listen to speech, or have a conversation with another autonomous being, we do not control the rate of arrival of information so obviously. When we combine media—sound and vision, for example—we typically expect the combined media on a recording (or seen remotely) to maintain the temporal relationship that they had at source. This is what really defines data as being multimedia. Hypermedia is multimedia that is arranged with non-linear relations between sub-sequences.

Compression and hierarchical encoding are also needed. Multimedia data is typically much more bulky than text or numeric data. A typical simple-minded sampled audio sequence might take 8 Kbytes per second. This compares badly with 8 Kbytes of text: assuming ten characters per word, that would constitute 800 words, which might take a quick speaker something like a minute to read aloud. In other words, the speech requires at least two orders of magnitude more bytes than the text. Video is far worse still, although clearly comparisons are more difficult, since the value of typical information content is quite different.

All of this means that we need to consider compression techniques to save storage and transmission capacity. Luckily, much audio and video is redundant (contains effectively repeated or less useful data) and is often far more amenable to compression than text.

Meta-languages (codes for codings) are required.
Typically, while we are still evolving a wide range of codings and compression techniques, we need protocols for exchange of media between different systems. We also need protocols to relate the different media (for synchronisation and for hypermedia).

Next, let's look at some audio and video input forms and digital encodings.

Audio and video all start life in the "analog domain" ("domain" is used in this context just to mean before or after some particular conversion). It is important to understand the basic requirements of the media in time and space. The analog domain is usually best understood in terms of the range of frequencies in use for a particular quality. For sound, this means how low and high a note/sound is allowed. For video, this translates into the number of distinguishable colours. For video, we also have to consider the frame rate. Video is similar to film in that it consists of a number of discrete frames. You may recall seeing old films which were shot at a lower frame rate than is used nowadays, in which flicker is visible. To refine this point further, we should distinguish between the rate at which a scene is sampled and the rate at which a frame on a screen is displayed. For many moving-image systems, these may be different. For example, films may show the same frame more than once to reduce flicker. Although cathode ray tubes have significant persistence, video systems may refresh different parts of the screen at different rates—interlacing is used in many systems, where alternate lines of the screen are refreshed in
alternate cycles. This is motivated by the possible reduction in bandwidth, in both the analog and the digital domain.

Both sound and image can be broken down at any instant into a set of basic frequencies. This is the so-called "waveform". We can record all of the frequencies present at any one time, or we can choose to record only the "important" ones. If we choose to record less than all the frequencies, we get less "fidelity" in our recording, so that the playback is less like the original. However, the less we record, the less tape/recording media we need.

Audio and video start as waves: a sequence of compressions and rarefactions of the air, or the fluctuation of an electric and magnetic field, with time. Waves need to be captured by some device and then sampled digitally. Typically, a "sample-and-hold" technique is used: an electro-mechanical or light-sensitive device responds to the sound or light and produces an analog electrical signal. This can be averaged over a sample period by providing some discrete clock signal and an averaging circuit. The value during this sample period can then be converted to a digital value of a given accuracy ("quantized").

We can do this sampling "perfectly" by sampling twice as often digitally as the highest analog frequency, or we can take advantage of human frailty and reduce the quality by decreasing the sample frequency (the clock rate above) and/or the quantization (the number of bits used per sample).

The term "bandwidth" is used by electrical engineers to refer to the frequency range of an analog signal. Often, especially in the Internet community, the term is used loosely to refer to channel capacity, or the bit rate of a link.
Note that because of sampling, quantization and compression, the bit rate needed for a given-bandwidth analog signal is potentially many times (orders of magnitude, even) less than the perfect-sample signal requirement would imply.

Analog audio for humans is roughly in the range 50 Hz to 20 kHz. Human speech is typically intelligible even when restricted to the range 1–3 kHz, and the telephone networks have taken advantage of this since very early days by providing only limited-quality lines. This has meant that they can use low-quality speakers and microphones in the handset—the quality is similar to AM radio. It is not entirely a coincidence, therefore, that the copper wires used for transmission in the telephone system were principally chosen for their ability to carry a baseband signal that could convey speech ("toll"-quality) audio.

Luckily, in most systems phone wires are over-engineered. They are capable of carrying a signal at up to 16 times the "bandwidth" used by pure analog phones from the home to the exchange over a kilometre, and 300 times this bandwidth up to 100 meters. For the moment, though, the "last mile" or customer subscriber-loop circuits have boxes at the ends that limit this to what is guaranteed for ordinary audio telephony, while the rest of the frequencies are used for engineering work.
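The relationship between analog bandwidth and digital bit rate can be made concrete with a little arithmetic. The sketch below (not from the original text) applies the "sample twice as often as the highest frequency" rule to the two cases discussed here: telephone speech and, looking ahead, CD-quality audio.

```python
# Sketch: uncompressed bit rate implied by Nyquist-rate sampling.
# The band limits, bit depths and channel counts are the figures
# quoted in this chapter (telephone speech and CD audio).

def bit_rate(highest_freq_hz, bits_per_sample, channels=1):
    """Sample at least twice the highest analog frequency (Nyquist)."""
    sample_rate = 2 * highest_freq_hz
    return sample_rate * bits_per_sample * channels  # bits per second

# Telephone speech: ~4 kHz band, 8-bit samples -> the classic 64 kbps channel.
phone = bit_rate(4_000, 8)
print(phone)                          # 64000 bps

# CD audio: ~22.05 kHz band, 16-bit samples, stereo -> ~1.4 Mbps.
cd = bit_rate(22_050, 16, channels=2)
print(cd)                             # 1411200 bps
```

Compression then works downward from these uncompressed rates, which is why the bit rate on the wire can be far smaller than the perfect-sample figure.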
Video signals, on the other hand, occupy a much wider frequency range. Analog TV, which defined for about 60 years the input, output and transmission standards, has several different standards, but is typically amplitude modulated on a 3.58 MHz carrier. The signal that is conveyed on this is a sequence of "scanlines", each set of which makes up a screen. The scanline is essentially a sample of the brightness and colors across a horizontal line, as detected in the camera and as used to control the electron gun in the TV monitor.

In the CCIR 601 standard, a digital version of this is defined, which takes two samples of 8 bits each of colour (chrominance) and one of brightness (luminance) at 13.5 MHz. The resulting data rate is around 166 Mbps.

It is not entirely a coincidence that old cable TV networks are capable of transmitting these data rates; however, modern hybrid fiber-coax networks are targeted at carrying a much larger number of compressed digital channels.

The purpose of talking about media encodings and channel capacity requirements is to show the relationship between particular media and the transmission technology associated with them. The two largest networks in the world in terms of terminals are the phone network and the TV network. Each addresses a particular capacity and pattern of communication. If a data network such as the Internet is to carry these media, and if the "terminals", or workstations and PCs, of the Internet are to be able to capture, store, transmit, receive and display such media, then the network and end systems have to deal with these types of data, one way or another.
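The ~166 Mbps figure can be reproduced from the active-picture geometry. The sketch below assumes the 625-line/25 fps variant of CCIR 601 (720 luminance samples per active line, two chrominance components each at half that rate, 576 active lines); these particular numbers are standard for CCIR 601 but are not spelled out in the text above.

```python
# Sketch: reconstructing the ~166 Mbps CCIR 601 active-picture data rate.
LUMA_PER_LINE   = 720        # luminance samples per active line
CHROMA_PER_LINE = 2 * 360    # Cb + Cr, each at half the luminance rate
ACTIVE_LINES    = 576
FRAMES_PER_SEC  = 25
BITS_PER_SAMPLE = 8

rate = (LUMA_PER_LINE + CHROMA_PER_LINE) * ACTIVE_LINES \
       * FRAMES_PER_SEC * BITS_PER_SAMPLE
print(rate / 1e6, "Mbps")    # ~165.9 Mbps
```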
If we compress the data and decompress at a receiver, then the full rate/capacity is still required outside of the compressed domain, and the compression/decompression engines need to be able to cope with this, even though they may spare the network (or storage device).

The word "encoding" is often used as a noun as well as a verb when talking about multimedia. Nowadays, there is a vast range of encodings in use or in development, for a variety of reasons. Codes for audio and video depend on the quality of audio or video required; a very simple example of this is the difference between digital audio for ISDN telephones (64 Kbps PCM; see later) and for CD (1.4 Mbps, 16-bit, etc.). Another reason for the range of encodings is that some encodings include linkages to other media for reasons of synchronization (e.g. between voice and lips); yet another reason is to provide future-proofing against any new media (holograms?); finally, because of the range of performance of different computers, it may be necessary to have a "meta-protocol" to negotiate what is used between encoder and decoder. This permits programs to encode a stream of media according to whatever is convenient to them, while a decoder can then decode it according to its capabilities. For example, some HDTV (High Definition Television) standards are actually a superset of current standard TV encoding, so that a "rougher" picture can be extracted by existing TV receivers from new HDTV transmissions (or from playing back new HDTV videotapes). This principle is quite general.

Digital Sound

Digital audio is created when you represent the characteristics of a sound wave using numbers—a process referred to as digitizing. You can digitize sound from a microphone, a synthesizer,
existing recordings, live radio and television broadcasts, and popular CDs and DVDs. In fact, you can digitize sounds from any natural or prerecorded source.

Digitized sound is sampled sound. Every nth fraction of a second, a sample of sound is taken and stored as digital information in bits and bytes. The quality of this digital recording depends upon how often the samples are taken (sampling rate or frequency, measured in kilohertz, or thousands of samples per second) and how many numbers are used to represent the value of each sample (bit depth, sample size, resolution, or dynamic range). The more often you take a sample and the more data you store about that sample, the finer the resolution and quality of the captured sound when it is played back. Since the quality of your audio is based on the quality of your recording and not the device on which your end user will play the audio, digital audio is said to be device independent.

The three sampling rates most often used in multimedia are 44.1 kHz (CD-quality), 22.05 kHz, and 11.025 kHz. Sample sizes are either 8 bits or 16 bits. The larger the sample size, the more accurately the data will describe the recorded sound. An 8-bit sample size provides 256 equal measurement units to describe the level and frequency of the sound in that slice of time. A 16-bit sample size, on the other hand, provides a staggering 65,536 equal units to describe the sound in that same slice of time. As you can see in Figure 4-1, slices of analog waveforms are sampled at various frequencies, and each discrete sample is then stored either as 8 bits or 16 bits (or more) of data.

Figure 4-1 It is impossible to reconstruct the original waveform if the sampling frequency is too low.

The value of each sample is rounded off to the nearest integer (quantization), and if the amplitude is greater than the intervals available, clipping of the top and bottom of the wave occurs (see Figure 4-2).
Quantization can produce an unwanted background hissing noise, and clipping may severely distort the sound.
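Both effects are easy to see in code. The minimal sketch below (an illustration, not taken from the text) rounds each sample to the nearest integer level and clamps anything outside the 8-bit range, which is exactly the quantization and clipping just described.

```python
# Sketch of quantization and clipping: each analog sample value is
# rounded to the nearest integer level (quantization), and amplitudes
# beyond the available range are clamped (clipping).

def quantize(sample, bits=8):
    levels = 2 ** bits                  # 256 levels for 8-bit audio
    lo, hi = -levels // 2, levels // 2 - 1   # -128 .. 127 for 8 bits
    q = round(sample)                   # rounding error -> quantization noise
    return max(lo, min(hi, q))          # out-of-range amplitude -> clipping

print(quantize(17.4))    # 17   (rounded)
print(quantize(300.0))   # 127  (clipped at the 8-bit ceiling)
print(quantize(-300.0))  # -128 (clipped at the 8-bit floor)
```

The rounding error is what is heard as the background hiss; the clamping is what flattens the tops and bottoms of loud waves.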
Figure 4-2 Examples of quantizing and clipping

Making Digital Audio Files

Making digital audio files is fairly straightforward on most computers. Plug a microphone into the microphone jack of your computer. If you want to digitize archived analog source materials—music or sound effects that you have saved on videotape, for example—simply plug the "Line-Out" or "Headphone" jack of the device into the "Line-In" jack on your computer. Then use audio digitizing software such as Audacity (see Figure 4-3) to do the work.

You should focus on two crucial aspects of preparing digital audio files:
• Balancing the need for sound quality against file size. Higher quality usually means larger files, requiring longer download times on the Internet and more storage space on a CD or DVD.
• Setting proper recording levels to get a good, clean recording.

Digital Video

In digital systems, the output of the CCD is digitized by the camera into a sequence of single frames, and the video and audio data are compressed before being written to a tape (see Figure 6-2) or digitally stored to disc or flash memory in one of several proprietary and competing formats. Digital video data formats, especially the codec used for compressing and decompressing video (and audio) data, are important.

Figure 6-2 Diagram of tape path across the video head for digital recording
In 1995, Apple's FireWire technology was standardized as IEEE 1394, and Sony quickly adopted it for much of its digital camera line under the name i.Link. FireWire and i.Link (and USB 2) cable connections allow a completely digital process, from the camera's CCD to the hard disk of a computer; and camcorders store the video and sound data on an onboard digital tape, writable mini-DVD, mini hard disk, or flash memory.

HDTV

What started as the High Definition Television (HDTV) initiative of the Federal Communications Commission in the 1980s changed first to the Advanced Television (ATV) initiative and then finished as the Digital Television (DTV) initiative by the time the FCC announced the change in 1996. This standard, which was slightly modified from both the Digital Television Standard (ATSC Doc. A/53) and the Digital Audio Compression Standard (ATSC Doc. A/52), moved U.S. television from an analog to a digital standard. It also provided TV stations with sufficient bandwidth to present four or five Standard Television (STV, providing the NTSC's resolution of 525 lines with a 4:3 aspect ratio, but in a digital signal) signals or one HDTV signal (providing 1,080 lines of resolution with a movie screen's 16:9 aspect ratio).

HDTV provides high resolution in a 16:9 aspect ratio (see Figure 6-3). This aspect ratio allows the viewing of Cinemascope and Panavision movies. There was contention between the broadcast and computer industries about whether to use interlacing or progressive-scan technologies. The broadcast industry promulgated an ultra-high-resolution, 1920 × 1080 interlaced format (1080i) to become the cornerstone of the new generation of high-end entertainment centers, but the computer industry wanted a 1280 × 720 progressive-scan system (720p) for HDTV. While the 1920 × 1080 format provides more pixels than the 1280 × 720 standard, the refresh rates are quite different.
The higher-resolution interlaced format delivers only half the picture every 1/60 of a second, and because of the interlacing, on highly detailed images there is a great deal of screen flicker at 30 Hz. The computer people argue that the picture quality at 1280 × 720 is superior and steady. Both formats have been included in the HDTV standard by the Advanced Television Systems Committee (ATSC), found at www.atsc.org.

Figure 6-3 Here you can see the difference between VGA and HDTV aspect ratios.
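The trade-off between the two formats can be put in numbers. The sketch below (illustrative arithmetic, not from the text) compares pixels delivered per second when 1080i supplies half a frame per 1/60 s refresh and 720p supplies a whole frame each time.

```python
# Sketch: pixel throughput of 1080i (interlaced) vs. 720p (progressive),
# both refreshing 60 times per second.

def pixels_per_second(width, height, refreshes_per_sec, interlaced):
    per_refresh = width * height // (2 if interlaced else 1)
    return per_refresh * refreshes_per_sec

hd_1080i = pixels_per_second(1920, 1080, 60, interlaced=True)
hd_720p  = pixels_per_second(1280, 720, 60, interlaced=False)
print(hd_1080i)  # 62208000 pixels/s, but each full frame takes two fields
print(hd_720p)   # 55296000 pixels/s, every refresh a complete frame
```

So 1080i pushes somewhat more raw pixels, while 720p repaints the entire picture on every refresh, which is the steadiness the computer industry argued for.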
Graphics

Still images may be small or large, or even full screen. They may be colored, placed at random on the screen, evenly geometric, or oddly shaped. Still images may be a single tree on a wintry hillside; stacked boxes of text against a gray, tartan, or Italian marble background; an engineering drawing; a snapshot of your department manager's new BMW. Whatever their form, still images are generated by the computer in two ways: as bitmaps (or paint graphics) and as vector-drawn (or just plain "drawn") graphics. Bitmaps may also be called "raster" images. Likewise, bitmap editors are sometimes called "painting" programs, and vector editors are sometimes called "drawing" programs.

Bitmaps are used for photo-realistic images and for complex drawings requiring fine detail. Vector-drawn objects are used for lines, boxes, circles, polygons, and other graphic shapes that can be mathematically expressed in angles, coordinates, and distances. A drawn object can be filled with color and patterns, and you can select it as a single object. The appearance of both types of images depends on the display resolution and capabilities of your computer's graphics hardware and monitor. Both types of images are stored in various file formats and can be translated from one application to another or from one computer platform to another. Typically, image files are compressed to save memory and disk space; many bitmap image file formats already use compression within the file itself—for example, GIF, JPEG, and PNG.

Still images may be the most important element of your multimedia project or web site. If you are designing multimedia by yourself, put yourself in the role of graphic artist and layout designer. Take the time necessary to discover all the tricks you can learn about your drawing software. Competent, computer-literate skills in graphic art and design are vital to the success of your project.
Remember—more than anything else, the user's judgment of your work will be heavily influenced by the work's visual impact.

Bitmaps

A bit is the simplest element in the digital world, an electronic digit that is either on or off, black or white, or true (1) or false (0). This is referred to as binary, since only two states (on or off) are available. A map is a two-dimensional matrix of these bits. A bitmap, then, is a simple matrix of the tiny dots that form an image and are displayed on a computer screen or printed.

A one-dimensional matrix (1-bit depth) is used to display monochrome images—a bitmap where each bit is most commonly set to black or white. Depending upon your software, any two colors that represent the on and off (1 or 0) states may be used. More information is required to describe shades of gray or the more than 16 million colors that each picture element might have in a color image, as illustrated in Figure 3-1. These picture elements (known as pels or, more commonly, pixels) can be either on or off, as in the 1-bit bitmap, or, by using more bits to describe them, can represent varying shades of color (4 bits for 16 colors; 8 bits for 256 colors; 15 bits for 32,768 colors; 16 bits for 65,536 colors; 24 bits for 16,777,216 colors). Thus, with 2 bits, for example, the available zeros and ones can be combined in only four possible ways and can, then, describe only four possible colors:
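The four 2-bit combinations, and the power-of-two rule behind all the color counts above, can be checked with a few lines of Python (an illustration added here, not part of the original text):

```python
# Sketch: n bits per pixel give 2**n distinct colors.
from itertools import product

# The 2-bit case: enumerate every combination of zeros and ones.
two_bit = ["".join(bits) for bits in product("01", repeat=2)]
print(two_bit)  # ['00', '01', '10', '11'] -> four possible colors

# The color depths quoted in the text all follow the same rule.
for bits in (1, 2, 4, 8, 15, 16, 24):
    print(bits, "bits ->", 2 ** bits, "colors")
```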
Figure 3-1 A bitmap is a data matrix that describes the characteristics of all the pixels making up an image. Here, each cube represents the data required to display a 4 × 4–pixel image (the face of the cube) at various color depths (with each cube extending behind the face indicating the number of bits—zeros or ones—used to represent the color for that pixel).

Together, the state of all the pixels on a computer screen makes up the image seen by the viewer, whether in combinations of black-and-white or colored pixels in a line of text, a photograph-like picture, or a simple background pattern.

Vector Drawing

Most multimedia authoring systems provide for use of vector-drawn objects such as lines, rectangles, ovals, polygons, complex drawings created from those objects, and text.
• Computer-aided design (CAD) programs have traditionally used vector-drawn object systems for creating the highly complex and geometric renderings needed by architects and engineers.
• Graphic artists designing for print media use vector-drawn objects because the same mathematics that put a rectangle on your screen can also place that rectangle (or the fancy curves of a good line-art illustration) on paper without jaggies. This requires the higher resolution of the printer, using a page description format such as Portable Document Format (PDF).
• Programs for 3-D animation also use vector-drawn graphics. For example, the various changes of position, rotation, and shading of light required to spin an extruded corporate logo must be calculated mathematically.

A vector is a line that is described by the location of its two endpoints.
Vector drawing uses Cartesian coordinates, where a pair of numbers describes a point in two-dimensional space as the intersection of horizontal and vertical lines (the x and y axes). The numbers are always listed in the order x,y. In three-dimensional space, a third dimension—depth—is described by a z axis (x,y,z). This coordinate system is named for the French philosopher and mathematician René Descartes. So a line might be simply

<line x1="0" y1="0" x2="200" y2="100"/>

where x1 and y1 define the starting point (in the upper-left corner of the viewing box) and x2 and y2 define the end point. A simple rectangle is computed from a starting point and a size: your software will draw a rectangle (rect) starting at the upper-left corner of your viewing area (0,0) and going 200 pixels horizontally to the right and 100 pixels downward to mark the opposite corner. Add color information like

<rect x="0" y="0" width="200" height="100" fill="#FFFFFF" stroke="#FF0000"/>

and your software will draw the rectangle with a red boundary line and fill it with the color white. You can, of course, add other parameters to describe a fill pattern or the width of the boundary line. Circles are defined by a location and a radius:

<circle cx="50" cy="50" r="10" fill="none" stroke="#000000"/>

Type the following code into a text editor and save it as plain text with a .svg extension. This is a Scalable Vector Graphics file.
Open it in an HTML5-capable browser (File: Open File…) and you will see:

<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:xlink="http://www.w3.org/1999/xlink"
     width="200" height="200" viewBox="-100 -100 300 300">
  <rect x="0" y="0" fill="yellow" stroke="red" width="200" height="100"/>
  <text transform="matrix(1 0 0 1 60 60)" font-family="TimesNewRomanPS-BoldMT" font-size="36">SVG</text>
</svg>

Because these SVG files can be saved in a small amount of memory and because they are scalable without distortion (try changing the width and height of the view box in the preceding code), SVG (Tiny) is supported by browsers on most mobile phones and PDAs. The SVG specification also includes time-based changes or animations that can be embedded within the image code (see www.w3.org/TR/SVG11/animate.html#AnimationElements). Figure 3-8 shows Adobe Illustrator saving a file in SVG format.

Vector drawing tools use Bézier curves or paths to mathematically represent a curve. In practical terms, editing software shows you points on the path, each point having a "handle". Changing the location of the handle changes the shape of the curve. Mastering Bézier curves is an important skill: these curves not only create graphic shapes but represent motion paths when creating animations.
Figure 3-8 Drawing software such as Adobe Illustrator can save vector graphics in SVG format.

Vector-Drawn Objects vs. Bitmaps

Vector-drawn objects are described and drawn to the computer screen using a fraction of the memory space required to describe and store the same object in bitmap form. The file containing the vector-drawn colored rectangle described in the preceding section is less than 698 bytes of alphanumeric data (even less—468 bytes—when the description is tokenized or compressed as .svgz). On the other hand, the same rectangle saved as a .gif image with a 64-color palette takes 1,100 bytes. Because of this file size advantage, web pages that use vector graphics as SVG files or in plug-ins such as Flash download faster and, when used for animation, draw faster than pages displaying bitmaps. It is only when you draw many hundreds of objects on your screen that you may experience a slowdown while you wait for the screen to be refreshed—the size, location, and other properties of each of the objects must be computed. Thus, a single image made up of 500 individual line and rectangle objects, for example, may take longer for the computer to process and place on the screen than an image consisting of just a few drawn circle objects. A vector-drawn object is created "on the fly"; that is, the computer draws the image from the instructions it has been given, rather than displaying a precreated image. This means that vector objects are easily scalable without loss of resolution or image quality. A large drawn image can be shrunk to the size of a postage stamp, and while it may not look good on a computer monitor at 72 dpi, it may look great when printed at 300 dpi to a color printer. Resizing a bitmapped image requires either duplicating pixels (creating a blocky, jagged look called pixelation) or throwing pixels away (eliminating details).
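The file-size advantage quoted above is easy to verify with arithmetic. The sketch below (an added illustration; the SVG and GIF byte counts are the ones stated in the text) compares the raw, uncompressed bitmap storage for the 200 × 100 rectangle against its vector description.

```python
# Sketch: raw bitmap storage for a region grows with area and color depth,
# while a vector description stays a few hundred bytes at any size.

def bitmap_bytes(width, height, bits_per_pixel):
    return width * height * bits_per_pixel // 8

# The 200 x 100 rectangle stored as uncompressed pixels:
print(bitmap_bytes(200, 100, 8))   # 20000 bytes at 256 colors
print(bitmap_bytes(200, 100, 24))  # 60000 bytes at 24-bit color
# versus under 698 bytes of SVG text (and 1,100 bytes as a 64-color GIF,
# whose compression explains why it beats the raw 8-bit figure).
```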
Because vector images are drawn from instructions on the fly, a rescaled image retains the quality of the original.

Basic Multimedia Networking

The Internet began as a research network funded by the Advanced Research Projects Agency (ARPA) of the U.S. Defense Department, when the first node of the ARPANET was installed at the University of California at Los Angeles in September 1969. By the mid-1970s, the ARPANET "inter-network" embraced more than 30 universities, military sites, and government contractors, and its user base expanded to include the larger computer science research
community. By 1983, the network still consisted of merely several hundred computers on only a few local area networks.

In 1985, the National Science Foundation (NSF) aligned with ARPA to support a collaboration of supercomputing centers and computer science researchers across the ARPANET. The NSF also funded a program for improving the backbone of the ARPANET, by increasing its bandwidth from 56 Kbps to T1 and then T3 (see "Connections" a little later in the chapter for more information) and branching out with links to international sites in Europe and the Far East. In 1989, responsibility and management for the ARPANET was officially passed from military interests to the academically oriented NSF, and research organizations and universities (professors and students alike) became increasingly heavy users of this ever-growing "Internet."

Much of the Internet's etiquette and rules for behavior (such as for sending e-mail and posting to newsgroups) was established during this time. More and more private companies and organizations linked up to the Internet, and by the mid-1990s, the Internet included connections to more than 60 countries and more than 2 million host computers with more than 15 million users worldwide. Commercial and business use of the Internet was not permitted until 1992, but businesses have since become its driving force. By 2001 there were 109,574,429 domain hosts and 407.1 million users of the Internet, representing 6.71 percent of the world's population. By the beginning of 2010 (see Table 12-1), about one out of every four people around the world (26.6 percent) had access to the Internet, and more than 51 million domain names had been registered as "dot coms."

Protocols

Protocols are the rules needed for communication.
In human and computer communication, standard protocols eliminate confusion and the time wasted in misunderstanding one another.

In computer communication, protocols comprise three main components: interfaces, which define the rules for using a protocol and provide a service to the software that uses the interface; packet formats, which define the syntax for the exchange of messages between local and remote systems; and procedures, which define the operational rules concerning which packets can be exchanged when.

Communication systems are built up out of multiple layered protocols. The concept of layering is twofold: firstly, common services can be built into all devices or subsystems, and specialised services built out of these for those devices or subsystems that need them; secondly, the details of operation of local, or technology-specific, features of a protocol can be hidden by one layer from the layer above it.

In this book, we illustrate these different aspects of protocols in several different ways: when we want to show the layering of protocols, we make use of a stack diagram; when we want to show the operational rules of a protocol, we sometimes use a time-sequence diagram such as figure 1.3, which portrays a particular instance of an exchange of packets; in some places, we document the layout of the packets themselves, to show what control functions are conveyed in each transmission, reception or exchange.
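To make the idea of a packet format concrete, here is a minimal sketch (the field layout is invented for illustration and does not belong to any real protocol) showing how a header can be defined, serialised, and parsed with Python's struct module:

```python
import struct

# Hypothetical 8-byte header: version (1 byte), type (1 byte),
# payload length (2 bytes), sequence number (4 bytes), all big-endian
# ("network byte order", the "!" prefix), followed by the payload.
HEADER = struct.Struct("!BBHI")

def pack_packet(version, ptype, seq, payload):
    """Serialise header fields and payload into the wire format."""
    return HEADER.pack(version, ptype, len(payload), seq) + payload

def unpack_packet(data):
    """Parse the wire format back into fields; a protocol's procedures
    would then decide what to do with each packet type."""
    version, ptype, length, seq = HEADER.unpack(data[:HEADER.size])
    return version, ptype, seq, data[HEADER.size:HEADER.size + length]

pkt = pack_packet(1, 2, seq=42, payload=b"hello")
print(unpack_packet(pkt))   # (1, 2, 42, b'hello')
```

The interface here is the pair of functions, the packet format is the struct layout, and the (omitted) rules about when each packet type may be sent would be the procedures.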
In the rest of this chapter, we introduce these aspects of the Internet multimedia protocols, as well as the more subtle question of performance. Later chapters in the book cover these sections in more detail. First of all, we look at names, addresses and routes.

Names, Addresses and Routes

The user is usually very aware of the names of objects in the Internet. We frequently see references to World Wide Web sites and pages (e.g. http://www.cs.ucl.ac.uk/staff/jon), which incorporate Domain Name System names for a system or service in the network. A name effectively tells the user what something is.

Names are useful to human users for this very reason, but are unwieldy and inefficient for computer systems. Typically, they are mapped into addresses, using a directory or name service. Addresses then tell a piece of software (or the network operator or manager) where something is. Each host in the network has a unique address of (currently) 32 bits (128 bits in IPv6). More accurately, each interface on each host in the Internet has a unique address, which is made up of two components:

Host Part ; Network Part

Table 1.1: The Structured Internet Address

All IP packets carry both a source and a destination address. Typically, the network part is indicated by applying a mask of some number of bits to the address, leaving the remaining bits for the host part. Common divisions are 24 bits for the network part and 8 bits for the host part, or 16 bits for the network part and 16 for the host part. The network part of the destination address is used by routers effectively to index routing tables to figure out where a packet must be delivered. In fact, a "longest match" lookup of the prefix or network part must be done, which can be difficult to optimise.
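The masking and longest-match lookup just described can be sketched with Python's ipaddress module. The routing table below is made up for illustration:

```python
import ipaddress

# A made-up routing table: prefix -> next hop.
routes = {
    ipaddress.ip_network("10.0.0.0/8"): "router-A",
    ipaddress.ip_network("10.1.0.0/16"): "router-B",
    ipaddress.ip_network("10.1.2.0/24"): "router-C",
}

def lookup(dest):
    """Longest-prefix match: of all prefixes containing the destination
    address, choose the one with the most network bits."""
    addr = ipaddress.ip_address(dest)
    matches = [net for net in routes if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return routes[best]

print(lookup("10.1.2.3"))    # router-C (the /24 wins over /16 and /8)
print(lookup("10.1.9.9"))    # router-B (the /16 is the longest match)
print(lookup("10.200.0.1"))  # router-A (only the /8 matches)
```

Production routers use specialised data structures (tries and their variants) for this lookup; the linear scan here only shows the rule being applied.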
Recent work has achieved very good lookup times and table sizes. All packets in the Internet Protocol layer carry these addresses, so that they can be delivered to the right destination, and so that we can determine where they came from. As we look at other layers in the Internet multimedia protocol stack, we will see what other control information is carried in packets to carry out additional functions.

Internet Multimedia Protocols

The overall protocol architecture that makes up the capability to deliver multimedia over what was originally a pure data network is, surprisingly, not so very different from the original Internet architecture.
Figure 1.1: The Internet Multimedia Protocol Stack

The protocol stacks for Internet multimedia are shown in figure 1.1 above. Most of the protocols are not deeply layered, unlike many other protocol stacks, but rather are used alongside each other to produce a complete session. It is possible to use multimedia applications over the Internet without some (or all) of the additional protocols (e.g. omission of RSVP, or omission of SDP, and so on) depending on the performance or functionality required. Later chapters will show how each new protocol adds value to the basic IP packet delivery model. In the next section, we discuss the underlying unicast and multicast delivery model, which must be added within the IP layer to give maximum benefit to the network.

Packet-switched data networking adds value to circuit-switched networks by sharing the capacity amongst multiple users in time. The data stream from each user is broken into chunks to form packets, which only require capacity when each packet is being sent. The capacity required is thus the sum of the average bandwidths, rather than the total peak bandwidth. This "statistical multiplexing" gain comes at the expense of having sometimes to queue packets for access to the transmission channel.

The statistical multiplexing gain is quite large for traditional bursty data applications such as WWW access. For multimedia traffic, this gain is harder to achieve (depending on the compression algorithms used, as discussed in later chapters, it can become even harder), and yet we can get a spatial gain in use of the network by using group communication carefully. A simple way to send data to multiple recipients is to send it multiple times from the source. Such "multiple unicasts" make very poor use of the links near the sender, and potentially incur a lot of delay before a single transmission is completed.
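The difference between provisioning for peak rates and for average rates can be illustrated with some simple, made-up numbers:

```python
# Sketch: statistical multiplexing gain for bursty sources.
# 100 users, each with a peak rate of 1 Mbit/s but an average rate
# of 50 kbit/s (bursty web traffic). All numbers are illustrative.

n_users = 100
peak_bps = 1_000_000
avg_bps = 50_000

circuit_capacity = n_users * peak_bps   # circuit switching: reserve every peak
packet_capacity = n_users * avg_bps     # packet switching: sum of averages

print(circuit_capacity)                    # 100000000 (100 Mbit/s)
print(packet_capacity)                     # 5000000 (5 Mbit/s)
print(circuit_capacity / packet_capacity)  # a gain of 20x

# For constant-rate multimedia traffic, peak and average rates are close,
# so this gain shrinks, as the text above notes.
```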
The Internet offers a mechanism to avoid these overheads, called IP multicast. Multicast is essential in the context of Internet technologies that may replace television (streaming services, as discussed in chapter nine), but it is also highly relevant in telephony services, especially in the business community: it is extremely common for one phone call to spawn another, and the ability to teleconference is not widespread in the traditional plain old
telephone system except at some commercial cost, or in a restricted subset of telephones. The Internet's ability to provide this service is quite powerful.

We discuss some of the limitations of multicast as a service in chapter three. Some of the problems that vendors have had in supporting large-scale multicast, as well as service guarantees, in router products have prompted the design of a new generation of routers coming on to the market as we write. Similarly, the understanding of service guarantees and multicast has been slow to diffuse into the commercial Internet Service Provider community until relatively recently, but this is changing rapidly.

Multimedia Characteristics

A multimedia system has four basic characteristics:

• The computer is an intrinsic part of the multimedia system. As a result, multimedia has become interactive. In multimedia, the computer allows the user to interact with the media and thus manipulate it by controlling what is to be communicated and when. Multimedia has resulted in the creation of many new possibilities: (1) the computational power of the computer is utilized for multimedia applications, (2) the telecommunication network (Internet, WWW) along with the computer enables transmission and distribution of information, and (3) the use of the computer facilitates the design and creation of a variety of new applications.

• The different elements of multimedia are combined and integrated into a single multimedia system. Special software is required for the integration of the different media element files.

• The use of the computer in multimedia requires all elements of multimedia to be in digital format. In a digital multimedia system, the media streams are digital and are produced, processed, stored, represented and transmitted using computers. The digital nature of multimedia requires special treatment of the multimedia elements.
Hardware and software are needed to convert multimedia elements from analog to digital format and vice versa. There is a need to decide about the resolution versus the quality of output required when storing data in the computer. Storing multimedia files on a computer hard disk takes a large amount of disk space, so compression technologies and file formats for storing the different media elements are required. Moreover, special programs are required to play the compressed files. Similarly, special software is required to edit the different media element files, and to combine and integrate the different elements of the multimedia into a single multimedia system.

• A multimedia system is interactive. The user is active and can manipulate whatever is being communicated. Multimedia allows two-way communication. The user can use devices like a keyboard, trackball or joystick to interact with the multimedia system. Interactive multimedia is non-linear. The user is able to follow links and jump from one part of the document to another. Hypermedia enables a user to gain or provide access to text, audio and video, and computer graphics using links in a non-linear way, using computers.
The World Wide Web (WWW) is an example of a hypermedia application. The user is able to respond and control what to see or hear and when to do it.

Evolution of Internet Service Models

Traditionally the Internet has provided best-effort delivery of datagram traffic from senders to receivers. No guarantees are made regarding when, or if, a datagram will be delivered to a receiver. However, datagrams are normally only dropped when a router exceeds a queue size limit due to congestion. The best-effort Internet service model does not assume first-in-first-out (FIFO, also known as first-come-first-served) queueing, although many routers have implemented this. The effect is to provide a rather unfair distribution of resources.

With best-effort service, if a link is not congested, queues will not build at routers, datagrams will not be discarded in routers, and delays will consist of serialisation delays at each hop plus propagation delays. With sufficiently fast link speeds, serialisation delays are insignificant compared to propagation delays. However, if a link is congested, with best-effort service queueing delays will start to influence end-to-end delays, and packets will start to be lost as queue size limits are exceeded.

Non-best effort service

Real-time Internet traffic is defined as datagrams that are delay sensitive. "Real-time" is an oft-misused term, and we are guilty here too. In process control systems, telemetry monitoring and so on, real-time really refers to systems with drop-dead deadlines, after which information is irretrievably lost, or catastrophic consequences ensue if deadlines are missed. In multimedia systems, while we might have data with real-time delivery requirements, we are at liberty to lose it without necessarily losing much information. We are also at liberty to relax schedules for delivery, since humans are tolerant creatures compared with machines.
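The delay components mentioned above, serialisation plus propagation on an uncongested path, can be put against an interactive delay budget with some illustrative numbers:

```python
# Sketch: end-to-end delay of a packet on an uncongested path.
# All numbers are illustrative, not measurements.

packet_bits = 1500 * 8          # a 1500-byte packet
link_bps = 10_000_000           # 10 Mbit/s links
hops = 5
distance_m = 4_000_000          # 4000 km end to end
signal_speed = 2e8              # roughly 2/3 the speed of light, in m/s

serialisation = hops * packet_bits / link_bps   # time to clock bits onto links
propagation = distance_m / signal_speed         # time for the signal to travel

total_ms = (serialisation + propagation) * 1000
print(round(total_ms, 1))   # ~26 ms, well inside a few-hundred-ms budget
```

Queueing delay is what congestion adds on top of this baseline, and it is the component the service models discussed below try to control.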
It could be argued that all datagrams are delay sensitive to some extent, but for these purposes we refer only to datagrams where exceeding an end-to-end delay bound of a few hundred milliseconds renders the datagrams useless for the purpose they were intended. For the purposes of this definition, TCP traffic is normally not considered to be real-time traffic, although there may be exceptions to this rule.

On congested links, best-effort service queueing delays will adversely affect real-time traffic. This does not mean that best-effort service cannot support real-time traffic, merely that congested best-effort links seriously degrade the service provided. For such congested links, a better-than-best-effort service is desirable.

To achieve this, the service model of the routers can be modified. At a minimum, FIFO queueing can be replaced by packet forwarding strategies that discriminate between different "flows" of traffic. The idea of a flow is very general. A flow might consist of "all marketing site web traffic", or "all file server traffic to and from teller machines", or "all traffic from the CEO's laptop wherever it is". On the other hand, a flow might consist of a particular sequence of packets from
an application in a particular machine to a peer application in another particular machine between specific times of a specific day. Flows are typically identifiable in the Internet by the tuple: source machine, destination machine, source port, destination port, protocol; any of which could be "ANY" (wild-carded). In the multicast case, the destination is the group, and can be used to provide efficient aggregation.

Flow identification is called classification, and a class (which can contain one or more flows) has an associated service model applied. This can default to best effort. Through network management, we can imagine establishing classes of long-lived flows; enterprise networks ("Intranets") often enforce traffic policies that distinguish priorities, which can be used to discriminate in favour of more important traffic in the event of overload (though in an under-loaded network, the effect of such policies will be invisible, and may incur no load/work in routers).

The router service model to provide such classes with different treatment can be as simple as a priority queueing system, or it can be more elaborate. Although best-effort services can support real-time traffic, classifying real-time traffic separately from non-real-time traffic and giving real-time traffic priority treatment ensures that real-time traffic sees minimum delays. Non-real-time TCP traffic tends to be elastic in its bandwidth requirements, and will then tend to fill any remaining bandwidth.

We could imagine a future Internet with sufficient capacity to carry all of the world's telephony traffic. Since this is a relatively modest capacity requirement, it might be simpler to establish "POTS" (Plain Old Telephone System) as a static class which is given some fraction of the capacity overall, and then no individual call need be given an allocation (i.e.
we would no longer need the call setup/tear-down that was needed in the legacy POTS, which was only present due to under-provisioning of trunks, and to allow the trunk exchanges the option of call blocking). The vision is of a network that is engineered with capacity for all of the average-load sources to send all the time.

Reservations

For flows that may take a significant fraction of the network (i.e. are "special"), we need a more dynamic way of establishing these classifications. In the short term, this applies to any multimedia calls, since the Internet is largely under-provisioned at the time of writing. The Resource Reservation Protocol, RSVP, is being standardised for just this purpose. It provides flow identification and classification. Hosts and applications are modified to speak the RSVP client language, and routers speak RSVP.
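Classification of the kind RSVP relies on matches each packet against the flow tuple described above, with any field optionally wild-carded. A minimal sketch, with rules and addresses invented for illustration:

```python
ANY = None  # wildcard marker

# A made-up classifier table: (src, dst, sport, dport, proto) -> class.
# Rules are checked in order; the last rule is the default class.
rules = [
    (("10.0.0.7", ANY, ANY, ANY, "udp"), "real-time"),
    ((ANY, "10.9.9.9", ANY, 80, "tcp"), "web"),
    ((ANY, ANY, ANY, ANY, ANY), "best-effort"),
]

def classify(packet):
    """Return the class of the first rule whose non-wildcard fields
    all match the packet's flow tuple."""
    for pattern, cls in rules:
        if all(p is ANY or p == f for p, f in zip(pattern, packet)):
            return cls
    return "best-effort"

print(classify(("10.0.0.7", "10.1.1.1", 5004, 5004, "udp")))  # real-time
print(classify(("10.2.2.2", "10.9.9.9", 40000, 80, "tcp")))   # web
print(classify(("10.2.2.2", "10.3.3.3", 1234, 25, "tcp")))    # best-effort
```

A real router performs this match in the forwarding fast path with hardware or hash-based lookups; the sequential scan here only illustrates the wild-card semantics.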
Since most traffic requiring reservations is delivered to groups (e.g. TV), it is natural for the receiver to make the request for a reservation for a flow. This has the added advantage that different receivers can make heterogeneous requests for capacity from the same source. Again the routers conspire to deliver the right flows to the right locations. RSVP accommodates the wild-carding noted above.

Admission Control

If a network is provisioned such that it has excess capacity for all the real-time flows using it, a simple priority classification ensures that real-time traffic is minimally delayed. However, if a network is insufficiently provisioned for the traffic in a real-time traffic class, then real-time traffic will be queued, and delays and packet loss will result. Thus in an under-provisioned network, either all real-time flows will suffer, or some of them must be given priority.

RSVP provides a mechanism by which an admission control request can be made, and if sufficient capacity remains in the requested traffic class, then a reservation for that capacity can be put in place. If insufficient capacity remains, the admission request will be refused, but the traffic will still be forwarded with the default service for that traffic's class. In many cases even an admission request that failed at one or more routers can still supply acceptable quality, as it may have succeeded in installing a reservation in all the routers that were suffering congestion. This is because other reservations may not be fully utilising their reserved capacity.

Accounting

If a reservation involves setting aside resources for a flow, this will tie up resources so that other reservations may not succeed, and, depending on whether the flow fills the reservation, other traffic is prevented from using the network. Clearly some negative feedback is required in order to prevent pointless reservations from denying service to other users.
This feedback is typically in the form of billing. For real-time non-best-effort traffic that is not reserved, this negative feedback is provided in the form of loss due to congestion of a traffic class, and it is not clear that usage-based billing is required.

Billing requires that the user making the reservation is properly authenticated so that the correct user can be charged. Billing for reservations introduces a level of complexity to the Internet that has not typically been experienced with non-reserved traffic, and requires network providers to have reciprocal usage-based billing arrangements for traffic carried between them. It also requires mechanisms whereby some fraction of the bill for a link reservation can be charged to each of the downstream multicast receivers.

Recent work on charging has proposed quite simple models of billing associated with multimedia traffic. A generalised model for pricing bursty connections (or flows in our context) has been proposed:

a * V + b * T + c
where V is the traffic volume at the minimum requested rate (which can be zero) and T is the time at the average (measured) rate. The parameters a, b and c depend on the tariffing scheme, e.g. peak rate, or the IP subscriber's line rate, plus equipment rental. A minimum rate (e.g. MCR or controlled load) gives a volume-related charge (the constant also factors in the provider's dimensioning) and a mean rate (e.g. for VBR, or guaranteed) gives a time-related charge; mixes are allowed. A delay bound can be simply a boolean in the QoS API, which is easily implemented as the ToS delay preference bit in the IP header; in most cases, this just selects a priority queue, although sometimes it selects a terrestrial over a satellite route.

For multimedia applications, which will probably initially use a service that approximates to a lightly loaded network, we get a similar charge model as for TCP, which may be surprising to people, but it has lots of nice practical properties: for example, a typical network that permits "premium service" TCP and Internet telephony might implement the interface for the user as a single bit to select between normal TCP service (MCR == 0), and controlled-load TCP as well as telephony. The option for how to actually bill could be based on measured volume over time (the interval of an RTT is probably reasonable for both telephony and TCP-based traffic), or might simply relate to the fact that most domestic users access the net over a dial-up link, so the time-related charge could be folded in there trivially, and the volume-related charge based on the measured mean rate; conveniently, the access line polices the peak rate trivially, since it is typically a modem+phone line or digital ISDN line, with a fixed line speed anyhow.

Network Requirements for Audio/Video Transform

If your computer is connected to an existing network at an office or school, it is likely you are already connected to the Internet.
If you are an individual working from home, you will need a telephone dial-up account or broadband cable, Digital Subscriber Line (DSL), or wireless equipment to connect to the backbone (the ultra-high-bandwidth underlying network operated by MCI, AT&T, Sprint, and other telecommunications companies) of the Internet through an Internet service provider (ISP).

Bandwidth is how much data, expressed in bits per second (bps), you can send from one computer to another in a given amount of time. The faster your transmissions (or the greater the bandwidth of your connection), the less time you will spend waiting for text, images, sounds, and animated illustrations to upload or download from computer to computer, and the more satisfaction you will have with your Internet experience. To think in bytes per second, divide the rate by eight. Table 12-2 lists the bandwidth of some common data transfer methods.

Table 12-2 Bandwidth of Typical Internet and Computer Connections
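The bits-to-bytes rule and the transfer times quoted below can be checked with a little arithmetic (file sizes follow the examples in the text; the connection speeds are typical values, not from the table):

```python
# Sketch: how long a file takes to send at a given bandwidth.

def transfer_seconds(size_bytes, bandwidth_bps):
    """Transfer time, ignoring protocol overhead and congestion."""
    return size_bytes * 8 / bandwidth_bps

text_page = 3_000          # one page of text
image_8bit = 300_000       # uncompressed 640 x 480, 8-bit image
image_24bit = 900_000      # uncompressed 640 x 480, 24-bit image

modem = 56_000             # 56 kbps dial-up modem
dsl = 8_000_000            # 8 Mbps DSL line

print(round(transfer_seconds(text_page, modem), 2))    # ~0.43 s
print(round(transfer_seconds(image_8bit, modem)))      # ~43 s
print(round(transfer_seconds(image_24bit, modem)))     # ~129 s, over 2 minutes
print(round(transfer_seconds(image_24bit, dsl), 2))    # ~0.9 s on broadband
```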
The bottleneck at a typical user's low-bandwidth modem connection is the most serious impediment to sending multimedia across the Internet. At low bandwidth, a page of text (3,000 bytes) can take less than a second to send, but an uncompressed 640 × 480, 8-bit/256-color image (about 300,000 bytes) can take a few minutes; an uncompressed 640 × 480, 24-bit/16 million-color image (about 900,000 bytes) can take many minutes to send. Occasionally also, even though you may have a high-speed connection, the server delivering your requested file or content may be "throttled down" while it manages many requests at once, and yours must wait its turn.

To work within the constraints of bandwidth bottlenecks, multimedia developers on the Internet have but a few options:

• Compress data as tightly as possible (into ZIP or SIT or TAR files) before transmitting.
• Require users to download data only once; then store the data in a local hard disk cache (this is automatically managed by most browsers).
• Design each multimedia element to be efficiently compact; don't use a greater color depth than is absolutely necessary or leave extra space around the edges.
• Design alternate low-bandwidth and high-bandwidth navigation paths to accommodate all users.
• Implement streaming methods that allow data to be transferred and displayed incrementally as it comes in (without waiting for the entire data file to arrive).

Multimedia Coding and Compression for Text, Image, Audio and Video

A key problem with multimedia is the sheer quantity of data that results from naive digitisation of audio, image or video sources. Other problems involve quality, the representation of metadata such as timing and the relationships between different media, and so on. There are a variety of compression techniques commonly used in the Internet and other systems to alleviate the storage, processing and transmission (and reception) costs for such data.

We start by building a framework for understanding the systems requirements and components in dealing with multimedia flows. To start with, we look at the nature of the information and its use, leading to a discussion of general principles of loss-free and lossy compression. We look at simple lossless schemes such as Run Length Encoding, and at systems based on the statistics of the frequency of occurrence of codewords, such as Huffman codes. We look at substitutional or dictionary-based schemes such as the Lempel-Ziv family of algorithms. Then we look at transform-based schemes, and the way in which controlled loss of quality can be achieved using these.

We contrast data, audio, still image and moving image, covering the ideas of redundancy in images, sound and motion. We look at the cycles within the data that lead to the signal processing models used by engineers, including those in computer-generated and naturally occurring data, leading to model-based coding and compression, including future schemes such as wavelet, vector quantization, fractal and hierarchical use of lossy schemes.

We look at audio compression.
Audiophiles often use the term compression in another way, to refer to the reduction in dynamic range of an audio signal. For example, some noise reduction systems use compression and expansion devices so that the noise relative to the signal at low power levels (quiet bits) of a piece of music is less noticeable. This is quite different from compression of the amount of data needed to represent a signal at all.

We look at the effect of the network on the design of coding and compression schemes: loss of synchronisation, data loss, re-ordering, and duplication all lead to the need for recovery points in the data stream, and place limits on the time-frames that compression (and decompression) can operate over at the sender and receiver ends for interactive applications.

We then discuss the main different current approaches and standards for the multiplexing of audio and video between sender and recipient. Finally we cover the performance of some example systems.

Roadmap

Figure 4.1 illustrates the components of a system to capture, code, compress, transmit, decompress, decode and display multimedia data. The rest of this chapter describes each of the
components in this diagram, moving approximately from left to right across the picture. We cover some example cases of coding and compression schemes along the way.

Figure 4.1: Road Map of Chapter Four

System Components

All data is eventually turned into some (typically repetitive) digital code, and the statistics of the digital code are of great interest when we want to think about compressing a set of such data. The statistics are important at several levels of granularity. Even text input from a given language has several levels of interest, such as characters, words, and grammatical structure (sentences, paragraphs etc.). In a similar way, speech or music signals have repetitive structure which shows correlation at several levels of detail. Images may have some structure, although in natural images this tends to be very subtle (fractal). Moving images clearly have at least two timescales of interest, partly due to the nature of the input and display devices (and the human eye): the scanline and the frame.

Thus coding and compression go hand in hand. We choose some levels of granularity at which to code an input signal or message; this determines the initial input buffer size over which we run our code, which, for real-time applications, determines the CODEC delays. This also determines the number of quanta, or different values, for each "letter of the alphabet" in the input buffer.

The selection of these two (or more) timescales may be determined long in advance for some data types. For example, for text in a given language and alphabet there are large sets of samples that are amenable to analysis, so we can find nearly optimal digital representations of the data in terms of storage. However, there may be other factors that affect our design of a code. For example, a variable-length code for the English alphabet could be devised that used fewer bits for the average block of text than a fixed-length codeword such as 7-bit ASCII.
On the other hand, it may be more efficient in computing terms to trade off a small overhead (even as much as 50%) in storage for speed of processing and choose a fixed-length code, which is what has been done in practice for text.
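To illustrate the variable-length versus fixed-length trade-off, here is a minimal sketch of Huffman coding, the frequency-based scheme mentioned earlier (the four-symbol alphabet and its frequencies are invented for illustration):

```python
import heapq

def huffman_code(freqs):
    """Build a prefix code in which frequent symbols get short codewords.
    A sketch of Huffman's algorithm using a min-heap of partial trees;
    the counter is only a tie-breaker so tuples compare cleanly."""
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)   # two least frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]

# Made-up frequencies: 'e' is common, 'z' is rare.
freqs = {"e": 0.5, "t": 0.25, "a": 0.15, "z": 0.10}
code = huffman_code(freqs)
avg_bits = sum(freqs[s] * len(code[s]) for s in freqs)
print(code)       # e.g. {'e': '0', 't': '10', ...}: a prefix-free code
print(avg_bits)   # 1.75 bits/symbol, versus 2 bits for a fixed-length code
```

With these frequencies the variable-length code averages 1.75 bits per symbol where any fixed-length code for four symbols needs 2, at the cost of the extra work of bit-by-bit decoding, which is the trade-off described above.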
For audio, while a speech input signal is composed of streams of phonemes, words and sentences, it is relatively hard to build an intermediate representation of this, so typically we start with a set of fixed-length samples of short time spans of the signal and work from there. Similarly with a still image (or a single image from a moving sequence), we choose to sample the input scene at typically fixed horizontal and vertical intervals, giving a 2D image of a given resolution. We make a design decision when we choose the number of levels (quantisation) of the samples (the familiar 8-bit versus 24-bit color display is such a decision: 1 or 3 bytes of storage per pixel of the image).

Nature of the Signal

We must distinguish between raw data and information, but such a distinction is quite a subtle business. "Information" refers to that part of a signal that constitutes useful information for some user. Thus, depending on the user, some part of a signal may be regarded as less useful. This means that there may be redundancy in the data. In some cases, the redundancy is unambiguous, e.g. in the easy case of simple repetition, where data is coded in some grossly inefficient manner. Depending on the source, and on the form of the signal, we may know something about the statistics of the contents in advance, or we may have to do some sort of online analysis if we are to remove redundancy.
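The "easy case of simple repetition" is exactly what Run Length Encoding, mentioned earlier, exploits. A minimal sketch:

```python
# Sketch: run-length encoding of simple repetition in a data stream.

def rle_encode(data):
    """Replace each run of identical symbols with a (symbol, count) pair."""
    runs = []
    for sym in data:
        if runs and runs[-1][0] == sym:
            runs[-1][1] += 1
        else:
            runs.append([sym, 1])
    return [(s, n) for s, n in runs]

def rle_decode(runs):
    """Reverse the encoding: expand each pair back into a run."""
    return "".join(s * n for s, n in runs)

# A scanline of a mostly-white image compresses well:
line = "W" * 12 + "BBB" + "W" * 9
encoded = rle_encode(line)
print(encoded)                        # [('W', 12), ('B', 3), ('W', 9)]
assert rle_decode(encoded) == line    # lossless: the original is recovered
```

This is a lossless scheme: no statistics are needed in advance, but it only wins when the data actually contains long runs.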
The performance of online analysis will depend on the range and accuracy over which the signal repeats itself, in other words the block size. How much data we can store in a compression algorithm that does online analysis will be affected by how much delay we are allowed to incur (over and above the delay "budget" for transmission and reception), and by the CPU load incurred in processing larger chunks of the signal.

Finally, redundancy is in the eye of the beholder. We are rarely obliged to keep the original signal with 100% integrity, since human frailty means that even without an Internet path between the light or sound source and a person, it is likely that the receiver would miss some parts of a signal in any case. This latter point is extremely task dependent.

Analog to Digital Conversion: Sampling

An input signal is converted from some continuously varying physical value (e.g. pressure in air, or frequency or wavelength of light) by some electro-mechanical device into a continuously varying electrical signal. This signal has a range of amplitude, and a range of frequencies that can be present. This continuously varying electrical signal can then be converted to a sequence of digital values, called samples, by some analog-to-digital conversion circuit. Figure 4.2 illustrates this process.
Figure 4.2: Sampling a Continuous Signal

There are two factors which determine the accuracy with which the digital sequence of values captures the original continuous signal: the maximum rate at which we sample, and the number of bits used in each sample. This latter value is known as the quantisation level, and is illustrated in figure 4.3.

Figure 4.3: Quantisation of Samples

The raw (uncompressed) digital data rate associated with a signal is then simply the sample rate times the number of bits per sample. To capture all possible frequencies in the original signal, Nyquist's theorem shows that the sampling rate must be twice the highest frequency component in the continuous signal. However, it is often not necessary to capture all frequencies in the original signal - for example, voice is comprehensible with a much smaller range of frequencies than we can actually hear. When the sample rate is much lower than the highest frequency in the continuous signal, a band-pass filter, which only lets through frequencies in the range actually needed, is usually put before the sampling circuit. This avoids possibly ambiguous samples ("aliases").
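As a rough sketch of the two calculations above (the parameter values are illustrative examples, not taken from any particular standard), the raw data rate and the Nyquist minimum sampling rate can be computed directly:

```python
# Sketch: raw digital data rate and the Nyquist sampling limit.
# Parameter values below are illustrative, not mandated by any standard.

def raw_data_rate(sample_rate_hz, bits_per_sample, channels=1):
    """Uncompressed rate in bits per second: samples/s x bits/sample."""
    return sample_rate_hz * bits_per_sample * channels

def nyquist_rate(highest_freq_hz):
    """Minimum sample rate needed to capture all frequencies up to f."""
    return 2 * highest_freq_hz

# Telephone-quality speech: 8000 samples/s, 8 bits per sample.
print(raw_data_rate(8000, 8))        # 64000 bit/s (64 kbit/s)

# CD-quality stereo audio: 44100 samples/s, 16 bits, 2 channels.
print(raw_data_rate(44100, 16, 2))   # 1411200 bit/s (~1.4 Mbit/s)

# To capture all frequencies up to 4 kHz we must sample at least at:
print(nyquist_rate(4000))            # 8000 samples/s
```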
Constructing a Signal out of Components

One view of the input signal illustrated above in figure 4.2 is that it is made up of a number of contributing signals - mathematically, we can consider any reasonable set of orthogonal signals as components, but the easiest ones to use are sine functions.

One extreme case often used to illustrate this is the square wave signal. In figure 4.4, we show a square wave. If this were made out of a number of sine waves with different frequencies, the contribution of each frequency would be as illustrated in figure 4.5.

Figure 4.4: Square Wave
Figure 4.5: Spectrum of a Square Wave

The way that we build up the square wave constructively out of a set of sine waves of different frequencies can be seen in the progression of figures:

Figure 4.6: Square from One Sine Wave
Figure 4.7: Square from Two Sine Waves
Figure 4.8: Square from Three Sine Waves
Figure 4.9: Square from Four Sine Waves

It may seem odd that a simple "on-off" signal takes a lot of contributions, but the point is that this method of representing the continuous signal is general, and can represent any input signal.

Input data can be transformed in a number of ways to make it easier to apply certain compression techniques. The most common transform in current techniques is the Discrete Cosine Transform. This is a variant of the Discrete Fourier Transform, which is in turn the digital (discrete) version of the Continuous Fourier Transform.

As described earlier, any signal (whether a video or audio signal) can be considered a periodic wave. If we think of a sequence of sounds, they are a modulation of an audio wave; similarly, the scan over an image or scene carried out by a camera conveys a wave which has periodic features in time (in the time frame of the scene, as well as over multiple video frames in the moving picture). It is possible to convert from the original signal as a function of time to the Fourier series, which is the sum of a set of terms, each being a particular frequency (or wavelength). You can think of these terms or coefficients as being the contributions of a set of base pure "sine-wave" frequencies (also known as the spectral density) that together make up the actual signal. You can imagine these sweeping through a typical audio signal as shown in figure 4.10, and "pulling out" a spectrum (see figure 4.11), or set of coefficients, that represents the contribution of each frequency to that part of the signal strength.
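The square-wave progression in figures 4.6-4.9 can be sketched numerically: a unit square wave has the well-known Fourier series (4/pi) * sum over odd k of sin(2*pi*k*f*t)/k, and taking more terms of the partial sum moves the value closer to +1 or -1.

```python
# Sketch: building a square wave from its odd sine harmonics,
# as in the progression of figures 4.6 to 4.9.
import math

def square_partial(t, f, n_terms):
    """Partial Fourier sum of a unit square wave using n_terms odd harmonics."""
    total = 0.0
    for i in range(n_terms):
        k = 2 * i + 1                       # odd harmonics 1, 3, 5, ...
        total += math.sin(2 * math.pi * k * f * t) / k
    return 4.0 / math.pi * total

# With more terms, the sum approaches +1 in the middle of the
# positive half cycle (t = 0.25 for a 1 Hz square wave):
for n in (1, 2, 4, 50):
    print(n, round(square_partial(0.25, 1.0, n), 3))
```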
Lossless Data Compression

There is a huge range of data compression techniques - these are of some interest to the multimedia systems designer, but there are many good books on them already. Suffice it to say that three common techniques used are run-length encoding (removing repetitions of values and replacing them with a counter and a single value), Huffman coding, and dictionary techniques such as the Lempel-Ziv family of substitutional compression algorithms.

Run Length Compression

Run-length coding is fairly simple to implement, and as with all lossless schemes, its performance depends heavily on the input data statistics. Computer-generated binary files are often very amenable to this type of compression: with a codeword size of 1 bit, a byte or a word, it often eliminates many successive all-1s or all-0s fields. The nice feature of this scheme is that it incurs very little delay at sender or receiver. Note that this and other schemes do incur a variable output bit/symbol rate.

Huffman Compression

Huffman coding is the most widespread way of replacing a set of values of fixed-size code words with an optimal set of different-sized code words, based on the statistics of the input data. The way a Huffman code is constructed involves building a frequency distribution of the symbols. This is then used to decide the new compressed representation for each symbol. The easiest way to see this is to consider the case of compressing alphabetic text, with symbols drawn from characters in an alphabet with 256 letters. If these are all equally likely to occur, then it is hard to compress the data. However, if there is a severe skew in the frequency distribution in typical data (texts in this alphabet), then we can use fewer bits to represent the most frequently occurring characters/codeword values, and more bits for the less commonly occurring ones, with some compression gain.
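The run-length scheme described above can be sketched in a few lines. This is a toy version: a real scheme would also need an escape convention so that non-repeating data does not expand.

```python
# Toy run-length coder: each run of a repeated value is replaced by
# a (count, value) pair, as described in the text.

def rle_encode(data):
    out = []
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1                          # extend the current run
        out.append((j - i, data[i]))        # (run length, value)
        i = j
    return out

def rle_decode(pairs):
    out = []
    for count, value in pairs:
        out.extend([value] * count)
    return out

data = [0, 0, 0, 0, 7, 7, 1, 1, 1, 1, 1, 1]
encoded = rle_encode(data)
print(encoded)                              # [(4, 0), (2, 7), (6, 1)]
assert rle_decode(encoded) == data
```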
So, how do we build this new coding scheme for the alphabet? The classic scheme is to construct a tree from the frequency distribution, taking pairs of characters/codeword values at a time from the least frequently occurring values in the frequency chart, and adding a bit position to the string representation for them, with value 1 for one and 0 for the other, then moving to the next least commonly occurring; eventually, the most commonly occurring two values take two bits: one to say which is which, and one to say that it is one of these two values rather than one of the 254 other values. And so on...

This scheme is optimal in only one case, which is when the probabilities of occurrence of codeword values are distributed as a set of inverse powers of 2, i.e. 1/2, 1/4, 1/8, 1/16, etc. Otherwise the scheme is less and less good.

A generalization of Huffman coding that avoids this latter problem is arithmetic coding, which uses binary fraction representations in building the coding tree. In practice, this can be computationally expensive.

If one is transmitting Huffman (or arithmetic) compressed data, one must also share the same codebook at sender and receiver: the list of codes and their compressed representation must be
the same at both ends. This can be done on a case-by-case basis, but is usually based on long-term statistics of the data (e.g. the frequency of occurrence of the letter `e' in written English is a well-known example of this sort of statistic).

Dictionary Approaches to Compression

A completely different approach is to look at the data as it arrives and form a dictionary on the fly. As the dictionary is formed, it can be used to look up new input, dynamically, and if the new input existed earlier in the stream, the dictionary position can be transmitted instead of the new input codes. These schemes are known as "substitutional" compression algorithms, and there are two patented families of schemes invented by J. Ziv and A. Lempel in the 1970s that cover a broad class of the ideas here.

Essentially, the dictionary is constructed as a data structure that holds strings of symbols that are found in the input data, together with short bitstring entry numbers. Whenever an entry is not found, it is added to the dictionary in a new position, and the new position and string are sent. This means that the dictionary is constructed at the receiver dynamically, so that there is no need to gather statistics or share a table separately.

A second family of Lempel-Ziv dictionary-based compression schemes is based on the idea of a sliding window over the input text. The compression algorithm consists of searching for substrings ahead in the text, in the current window. This approach constrains the size of the dictionary, which could otherwise grow in an unbounded way.

Continuous Data: Sample Rates and Quantisation

In the next two sections, we look at audio and then video coding and compression.
Here, one is concerned with the initial fidelity of the signal, which is tied up with the sampling mechanism - the number of samples per second, and the number of bits per sample (quantisation levels). In other words, one has to choose an accuracy right from the beginning, and this represents an opportunity for compression even before we have got to the digital domain! After this, there are a number of other techniques, including the lossless ones just described, which are applicable.

Audio

Devices that encode and decode audio and video, as well as compress and decompress it, are called CODECs, or CODer/DECoders. Sometimes these terms are used for audio devices, but mainly they are used for video devices.

Figure 4.10: The Author Saying "smith"
Figure 4.11: Typical Voice Spectrum

Voice coding techniques take advantage of features of the voice signal. In the time domain, we can see that there is a lot of similarity between adjacent speech samples - this means that a system that only sends differences between sample values will achieve some compression. We can also see that there are a lot more samples with low intensity values than with high. This means that we could use more bits to represent the low values than the high ones. This could be done in a fixed way, and A-law and mu-law encodings do just this by choosing a logarithmic encoding; or we could adapt to the signal, and APCM does this. These techniques can be combined, and ADPCM (Adaptive Differential Pulse Code Modulation) achieves 50% savings over basic PCM with no apparent loss of quality, and a relatively cheap implementation.

More ingenious compression relies on two things: an appreciation of the actual model of speech production, and a model of the listener. Such techniques usually involve recognizing the actual speech production process and synthesising a set of filters, which are transmitted to the receiver and used to reconstruct sound by applying them to raw "sound" from a single-frequency source and a white noise generator - examples of CODECs based on this idea are Linear Predictive Coding (LPC) and CELP (Code Excited Linear Predictor). Including a model of how humans perceive sound (so-called "psychoacoustics") leads to more expensive, but highly effective, compression such as is used in MPEG audio CODECs.

Audio Input and Output

Audio signals to and from the real (analog) world have a less immediately obvious mapping to the digital world. Audio signals vary depending on the application. Human speech has a well
understood spectrum and set of characteristics, whereas musical input is much more varied, and the human ear and the perception and cognition systems behave rather differently in each case. For example, when a speech signal degrades badly, humans make use of comprehension to interpolate. This may be harder or easier to do with music, depending on levels of expertise and familiarity with the style/idiom.

Basically, for speech, the analog signal from a microphone is passed through several stages. Firstly, a band-pass filter is applied, eliminating frequencies in the signal that we are not interested in (e.g., for telephone-quality speech, those above 3.6 kHz). Then the signal is sampled, converting the analog signal into a sequence of values, each of which represents the amplitude of the analogue signal over a small discrete time interval. This is then quantised, or mapped into one of a set of fixed values - e.g., for telephone-quality speech, one of 2^8, or 256, possible values. These values are then coded (represented in some standard form) for transmission or storage. The process at the receiver is simply the reverse.

Audio Output

Audio output is generally made by some physical process interacting with the air (in space, of course, there is no sound!). The air conveys the sound to your ear (or to a microphone or other input device). To produce output from a computer, we need to take a digital signal, convert it to analogue, and use that to drive a loudspeaker.
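The quantisation stage described above for telephone-quality speech can be sketched as follows. This assumes the band-pass filtering has already happened and amplitudes are normalised to the range [-1, 1]; the uniform 256-level mapping here is a simplification (real telephony uses logarithmic A-law/mu-law levels).

```python
# Sketch of the quantisation step for telephone-quality speech: map an
# (already band-pass filtered) analogue amplitude in [-1.0, 1.0] onto
# one of 2**8 = 256 uniform levels.  Real telephony uses logarithmic
# rather than uniform levels; this is just the basic idea.
import math

BITS = 8
LEVELS = 2 ** BITS                          # 256 possible values

def quantise(amplitude):
    """Map an amplitude in [-1, 1] to an integer code 0..255."""
    clipped = max(-1.0, min(1.0, amplitude))
    return min(LEVELS - 1, int((clipped + 1.0) / 2.0 * LEVELS))

def dequantise(code):
    """Reconstruct the (approximate) amplitude from a code."""
    return (code + 0.5) / LEVELS * 2.0 - 1.0

# 8000 samples/s of a 1 kHz sine tone, quantised and reconstructed:
samples = [math.sin(2 * math.pi * 1000 * n / 8000) for n in range(8)]
codes = [quantise(s) for s in samples]
error = max(abs(dequantise(c) - s) for c, s in zip(codes, samples))
print(codes)
print(error < 2.0 / LEVELS)                 # error is below one step size
```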
Many PCs now feature fairly standard hi-fi amplifiers and stereo speaker systems.

Sounds can be created purely digitally (from synthesisers), or partly digitally (from samples), or naturally from the surrounding world (wind in the trees, rain on the roof), or from analog musical instruments, or from human speech.

Audio Output by People

People generate sounds by breathing air through the vocal cords, which resonate, and then controlling the production of sound by varying the shape of the vocal tract, mouth, tongue and so on. For the purposes of communication, speech is generally more useful than music, and happens to use a constrained part of the frequency and power range that humans are capable of generating - and a much more constrained part of the range than they are capable of hearing. Typically, we can generate sounds over a dynamic range of 40 decibels. For recognisable speech, the vast majority of important sounds are in the frequency range 60 Hz to 8000 Hz (compared with music, which is typically audible up to nearly 20 kHz).

Speech is made up of sound units called phonemes (the smallest units of distinguishable sound). These are specific to a language, so, for example, we find that English and Japanese each have phonemes that the other language does not (e.g. "l" and "r") - hence the difficulty in learning to pronounce another distant language. We illustrate some of the International Phonetic Alphabet for British English with example words in table 4.1 below.
Vowels      Diphthongs    Semi-vowels   Nasals     Fricatives   Affricatives   Plosives
/i/  heed   /gi/ buy      /w/ was       /m/ am     /s/ sail     /dg/ jaw       /b/ bat
/I/  hid    /ai/ by       /r/ ran       /n/ an     /S/ ship     /tj/ chore     /d/ disc
/e/  head   /au/ bow      /l/ lot       /n/ sang
/ae/ had    /ao/ bough    /j/ yacht

Table 4.1: International Phonetic Alphabet for British English

Phonemes are vowels or consonants, where vowels are either pure or diphthongs (made of two sounds), and consonants may be semi-vowels, fricatives (made using the teeth), plosives (made using the lips) or nasals (made using the nose). Other factors influencing the sounds we make are stress (change of strength), rhythm and pace, and intonation (pitch).

Audio Input by People

We hear mainly through our ears, which respond over a frequency range of around 20 kHz. Stereo is important to many human actions, and even the phase difference between signals arriving from the same source at each ear (as well as simple timing, since sound moves so slowly compared with light) gives us good directional hearing, although mainly at high frequencies. As people get older, their high-frequency accuracy decreases quite markedly, although this doesn't usually affect speech recognition until old age.

Summary of Audio and Video Input and Output

Data (files etc.) is typically compressed using simple schemes such as run-length encoding, statistically based Huffman codes, or dictionary-based substitutional schemes such as the Lempel-Ziv algorithms. Audio and video are loss tolerant, so they can use cleverer compression that discards some information. Compression of 400 times is possible on video - useful given that the base uncompressed data rate of a 25 fps CCIR 601 image is 140 Mbps. There are a lot of standards for this now, including schemes based on PCM, such as ADPCM, or on models, such as LPC and MPEG Audio. Note that lossy compression of audio and video is not acceptable to some classes of user (e.g.
radiologists, or air traffic controllers).

It is sometimes said that "the eye integrates, while the ear differentiates". What is meant by this is that the eye responds to stronger signals or higher frequencies with a cumulative reaction, while the ear responds less and less (i.e. to double the pitch, you have to double the frequency - so we hear a logarithmic scale as linear - and to double the loudness, you have to increase the power exponentially too).

A video CODEC can be anything from the simplest A-to-D device, through to something that does picture pre-processing and even has network adapters built into it (i.e. a videophone!). A
CODEC usually does most of its work in hardware, but there is no reason not to implement everything (except the A-to-D capture) in software on a reasonably fast processor. The most expensive and complex component of a CODEC is the compression/decompression part. There are a number of international standards, as well as any number of proprietary compression techniques, for video.

The ITU (formerly CCITT) Audio Family

The fundamental standard upon which all videoconferencing applications are based is G.711, which defines Pulse Code Modulation (PCM). In PCM, a sample representing the instantaneous amplitude of the input waveform is taken regularly, the recommended rate being 8000 samples/s (±50 ppm). At this sampling rate, frequencies up to 3400-4000 Hz are encodable. Empirically, this has been demonstrated to be adequate for voice communication, and, indeed, even seems to provide a music quality acceptable in the noisy environment around computers (or perhaps my hearing is failing). The samples taken are assigned one of 2^12 values, the range being necessary in order to minimize the signal-to-noise ratio (SNR) at low volumes. These samples are then stored in 8 bits using a logarithmic encoding according to either of two laws (A-law and mu-law). In telecommunications, A-law encoding tends to be more widely used in Europe, whilst mu-law predominates in the US. However, since most workstations originate outside Europe, the sound chips within them tend to obey mu-law. In either case, the reason that a logarithmic compression technique is preferred to a linear one is that it more readily represents the way humans perceive audio. We are more sensitive to small changes at low volume than to the same changes at high volume; consequently, lower volumes are represented with greater accuracy than high volumes.

ADPCM

Adaptive Differential Pulse Code Modulation, ADPCM (G.721), allows for the compression of PCM-encoded input whose power varies with time.
Feedback of a reconstructed version of the input signal is subtracted from the actual input signal, which is then quantised to give a 4-bit output value. This compression gives a 32 kbit/s output rate. This standard was recently extended in G.726, which replaces both G.721 and G.723, to allow conversion between 64 kbit/s PCM and 40, 32, 24, or 16 kbit/s channels. G.727 is an extension of G.726 and is used for embedded ADPCM on 40, 32, 24, or 16 kbit/s channels, with the specific intention of being used in packetised speech systems utilizing the Packetized Voice Protocol (PVP), defined in G.764.

The encoding of higher-quality speech (50 Hz - 7 kHz) is covered in G.722 and G.725, and is achieved by utilizing sub-band ADPCM coding on two frequency sub-bands; the output rate is 64 kbit/s.

LPC

LPC (Linear Predictive Coding) is used to compress audio at 16 kbit/s and below. In this method the encoder fits speech to a simple, analytic model of the vocal tract. Only the
parameters describing the best-fit model are transmitted to the decoder. An LPC decoder uses those parameters to generate synthetic speech that is usually very similar to the original. The result is intelligible but machine-like talking.

CELP

CELP (Code Excited Linear Predictor) is quite similar to LPC. A CELP encoder does the same LPC modelling, but then computes the errors between the original speech and the synthetic model, and transmits both the model parameters and a very compressed representation of the errors. The compressed representation is an index into an excitation vector (which can be considered as a "code book" shared between encoders and decoders). The result of CELP is much higher quality speech at a low data rate.

MPEG AUDIO

High-quality audio compression is supported by MPEG. MPEG I defines sample rates of 48 kHz, 44.1 kHz and 32 kHz. MPEG II adds three other frequencies: 16 kHz, 22.05 kHz and 24 kHz. MPEG I allows for two audio channels, whereas MPEG II allows five audio channels plus an additional low-frequency enhancement channel.

MPEG defines three compression levels, namely Audio Layer I, II and III. Layer I is the simplest, a sub-band coder with a psycho-acoustic model. Layer II adds more advanced bit allocation techniques and greater accuracy. Layer III adds a hybrid filterbank and non-uniform quantization. Layers I, II and III give increasing quality/compression ratios with increasing complexity and demands on processing power.

Still Image

"A picture is worth a thousand words." But an image, uncompressed, is worth many megabytes.

How Big Is a Single Frame of Video?

First we consider the spatial size of analogue video when compared to the common formats for digital video standards. A PAL television displays video as 625 lines and an NTSC television displays 525 lines.
Current televisions have an aspect ratio of 4:3, giving PAL a spatial resolution of 833 x 625, and NTSC a resolution of 700 x 525, not all of which is visible. Most common formats for digital video are related to the visible area for each of the television standards. The sizes of video when using the international standard H.261 are 352 x 288 for the Common Intermediate Format (CIF), 176 x 144 for the Quarter CIF (QCIF) format, and 704 x 576 for the Super CIF (SCIF) format, where a CIF image is a quarter the size of the visible area of a PAL image. For NTSC-derived formats, 640 x 480, 320 x 240, and 160 x 120 are common. Figure 4.12 shows the spatial size of these common resolutions with respect to a PAL TV image.
Figure 4.12: The spatial size of digital video compared with a PAL TV image

It can be seen that digital images are all smaller than current television sizes. Moreover, television images are significantly smaller than current workstation screen sizes, which are commonly of the order of 1200 x 1000 pixels. Digital video utilizes even less of a workstation screen. Due to this significant size difference, some observers have commented that digital video often looks like "moving postage stamps" on modern workstations.

For digital video, as with analogue video, a new frame is required every 1/25th of a second for PAL and every 1/30th of a second for NTSC. If we assume that there are 24 bits per pixel in the digital video and 30 frames per second, the amount of disc space required for such a stream of full-motion video is shown in table 4.2. The table is presented for the amount of time the digital video is shown and for a given spatial size in pixels.

Time        640x480   320x240   160x120
1 sec       27 Mb     6.75 Mb   1.68 Mb
1 min       1.6 Gb    400 Mb    100 Mb
1 hour      97 Gb     24 Gb     6 Gb
1000 hours  97 Tb     24 Tb     6 Tb

Table 4.2: The amount of data for full-motion digital video

We can see that 1 hour of video with a resolution of 640 x 480 would consume 97 Gb of disc space, which is significantly larger than most storage devices. An equivalent amount of analogue video (i.e. a 1-hour video), which has a higher resolution and also contains audio, would only take between a half and a quarter of a video cassette, for a 120-minute or a 240-minute cassette respectively. However, although there are devices that can store this amount of data, there are currently no digital storage devices which could store 97 Gb on half a device the size of a video cassette.
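The figures in table 4.2 follow from a one-line calculation (decimal byte counts below; the table's 97 Gb for an hour comes from rounding to 27 Mb per second before multiplying):

```python
# Sketch reproducing the storage figures in table 4.2: uncompressed
# video at 24 bits per pixel and 30 frames per second.

def video_bytes(width, height, seconds, fps=30, bits_per_pixel=24):
    """Total bytes for uncompressed video of the given size and duration."""
    return width * height * (bits_per_pixel // 8) * fps * seconds

# One second of 640 x 480 video:
print(video_bytes(640, 480, 1))              # 27648000 bytes, ~27 MB

# One hour, in decimal gigabytes:
print(video_bytes(640, 480, 3600) / 1e9)     # ~99.5; the table rounds 27 MB/s x 3600 to 97 Gb

# The same hour compressed 100:1 (roughly MPEG, as in table 4.3), in MB:
print(video_bytes(640, 480, 3600) / 100 / 1e6)
```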
The data shown in the tables was collated by Larry Rowe of the Computer Science Division - EECS, University of California at Berkeley, for his work on the Continuous Media Player.

In order to reduce the amount of data used for digital video, it is common to use compression techniques, such as the international standards H.261 and MPEG, or proprietary techniques such as nv encoding or CellB. Rowe has also estimated the amount of space used when compression techniques are applied. Table 4.3 shows the
space needed when compressing video of size 640 x 480 pixels, and table 4.4 shows the space used when compressing video of size 320 x 240 pixels. Both tables present data for a given scale factor of compression and for the time the video is shown. The 97 Gb used for 1 hour of 640 x 480 video can be reduced to approximately 1 Gb when compression is done at a scale factor of 100:1.

Time    None    3:1     25:1 (JPEG)  100:1 (MPEG)
1 sec   27 Mb   9 Mb    1.1 Mb       270 Kb
1 min   1.6 Gb  540 Mb  65 Mb        16 Mb
1 hour  97 Gb   32 Gb   3.9 Gb       970 Mb

Table 4.3: The amount of data for compressed video of size 640x480

Time    None     3:1      25:1 (JPEG)  100:1 (MPEG)
1 sec   6.75 Mb  2.25 Mb  270 Kb       68 Kb
1 min   400 Mb   133 Mb   16 Mb        4 Mb
1 hour  24 Gb    8 Gb     1 Gb         240 Mb

Table 4.4: The amount of data for compressed video of size 320x240

Although the tables show compression factors for MPEG, the H.261 standard uses a Discrete Cosine Transform encoding function similar to that used in MPEG; therefore we can expect the compression ratios to be of a similar order of magnitude. In reality, when encoding real video the compression factor is not constant but variable, because the amount of data produced by the encoder is a function of motion. However, these figures do give a reasonable estimate of what can be achieved.

Significantly, with digital video it is possible to reduce the amount of data generated even further, by reducing the perceived frame rate of the video from 30 frames a second down to 15 or even 2 frames a second. This can be achieved by explicitly limiting the number of frames or through a bandwidth limitation mechanism. In many multicast conferences the bandwidth used is between 15 and 64 Kbps.
Although reduced-frame-rate video loses the quality of full-motion video, it is perfectly adequate for many situations, particularly in multimedia conferencing.

There are a large number of still image formats and compression schemes in use in the network today. Common schemes include:

TIFF and GIF
These both use compression schemes based on the Lempel-Ziv type of algorithms described earlier.

JPEG
This is from the Joint Photographic Experts Group in the International Organisation for Standardization (ISO).
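The Lempel-Ziv-style substitutional coding that TIFF and GIF rely on can be sketched with a toy LZW-flavoured coder. The dictionary is built on the fly, identically at sender and receiver, so no table is shared in advance; this is an illustration of the idea, not the exact algorithm those formats use.

```python
# Toy LZW-style dictionary coder, in the spirit of the substitutional
# (Lempel-Ziv) schemes described earlier.  Not the exact GIF/TIFF codec.

def lzw_encode(text):
    table = {chr(i): i for i in range(256)}   # initial single-byte entries
    out, current = [], ""
    for ch in text:
        if current + ch in table:
            current += ch                     # extend the current match
        else:
            out.append(table[current])        # emit code for longest match
            table[current + ch] = len(table)  # add a new dictionary entry
            current = ch
    if current:
        out.append(table[current])
    return out

def lzw_decode(codes):
    table = {i: chr(i) for i in range(256)}
    prev = table[codes[0]]
    out = [prev]
    for code in codes[1:]:
        # The code may name an entry the decoder is just about to create.
        entry = table[code] if code in table else prev + prev[0]
        out.append(entry)
        table[len(table)] = prev + entry[0]   # mirror the encoder's entry
        prev = entry
    return "".join(out)

data = "TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_encode(data)
print(len(data), len(codes))                  # 24 input symbols, fewer codes
assert lzw_decode(codes) == data
```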
The first two of these still image schemes are discussed elsewhere in great detail. JPEG is interesting, as it is also the same baseline technology as is partly used in several popular moving image compression schemes. The JPEG standard's goal has been to develop a method of continuous-tone image compression for both color and greyscale images. The standard defines four modes:

• Sequential: Each image is encoded in a single left-to-right, top-to-bottom scan. This mode is the simplest and the most widely implemented, in both hardware and software.

• Progressive: The image is encoded in multiple scans. This is helpful for applications in which transmission time is too long and the viewer prefers to watch the image build up in multiple coarse-to-clear passes.

• Lossless: The image is encoded so as to guarantee exact recovery of every source image sample value. This is important for applications where any small loss of image data is significant; some medical applications need this mode.

• Hierarchical: The image is encoded at multiple resolutions, so that low-resolution versions may be decoded without having to decode the higher-resolution versions. This mode is beneficial for transmission over packet-switched networks: only the data significant for a certain resolution, determined by the application, need be transmitted, thus allowing more applications to share the same network resources. In real-time transmission cases (e.g.
an image pulled out of an information server and synchronized with a real-time video clip), a congested network can start dropping packets containing the highest-resolution data, resulting in a degraded-quality image instead of delay.

JPEG uses the Discrete Cosine Transform to compress spatial redundancy within an image in all of its modes apart from the lossless one, where a predictive method is used instead. As JPEG was essentially designed for the compression of still images, it makes no use of temporal redundancy, which is a very important element in most video compression schemes. Thus, despite the availability of real-time JPEG video compression hardware, its use is likely to be quite limited due to its poor video quality.

Moving Image

Video Input and Output

Before you can digitize a moving image, you need to know what the analog form is, in terms of resolution and frame rate. Unfortunately, there are three main standards in use. PAL, used in the UK and the most widely used overall, has 625 lines and a field rate of 50 Hz. NTSC, used in the USA and Japan, has 525 lines and a field rate of 59.94 Hz. SECAM is used mainly in France and Russia. These all differ in number of lines, frame rate, interlace order and so on. There are also a number of variants of PAL and NTSC (e.g. I/PAL is used in the UK and Ireland, while M/NTSC is the ITU designation for the scheme used in the US and Japan).