

  1. 1. 13PIT101 Multimedia Communication & Networks UNIT - IV Dr.A.Kathirvel Professor & Head/IT - VCEW
  2. 2. Unit - IV Stream characteristics for Continuous media – Temporal Relationship – Object Stream Interactions, Media Levity, Media Synchronization – Models for Temporal Specifications – Streaming of Audio and Video – Jitter – Fixed playout and Adaptive playout – Recovering from packet loss – RTSP – Multimedia Communication Standards – RTP/RTCP – SIP and H.263
  3. 3. Multimedia- Definitions – Multi - many; much; multiple – Medium - a substance regarded as the means of transmission of a force or effect; a channel or system of communication, information, or entertainment (Merriam-Webster Dictionary) • So, Multimedia??? – The terms Multi & Medium don't seem to fit well – The term Medium needs more clarification !!!
  4. 4. Multimedia- Definitions • Medium – Means for distribution and presentation of information • Criteria for the classification of a medium: Perception, Representation, Presentation, Storage, Transmission, Information Exchange • Classification based on perception (text, audio, video) is appropriate for defining multimedia
  5. 5. Multimedia- Definitions • Time always takes a separate dimension in the media representation • Based on the time dimension in the representation space, media can be – Time-independent (Discrete) • Text, Graphics – Time-dependent (Continuous) • Audio, Video • Video is a sequence of frames (images) presented to the user periodically. • Time-dependent aperiodic media is not continuous!! – Discrete & Continuous here have no connection with internal representation !! (they relate to the viewer's impression…)
  6. 6. Multimedia- Definitions • Multimedia is any combination of digitally manipulated text, art, sound, animation and video. • A stricter version of the definition of multimedia does not allow just any combination of media. • It requires – Both continuous & discrete media to be utilized – A significant level of independence between the media being used • The less strict version of the definition is used in practice.
  7. 7. Multimedia- Definitions • Multimedia elements are composed into a project using authoring tools. • Multimedia authoring tools are programs that provide the capability for creating a complete multimedia presentation by linking together objects such as a paragraph of text, a song, an illustration, and an audio clip, with appropriate interactive user control.
  8. 8. Multimedia- Definitions • By defining the objects' relationships to each other, and by sequencing them in an appropriate order, authors (those who use authoring tools) can produce attractive and useful graphical applications. • To name a few authoring tools – Macromedia Flash – Macromedia Director – Authorware • The hardware and the software that govern the limits of what can happen are multimedia platform or environment 8
  9. 9. Multimedia- Definitions • Multimedia is interactive when the end-user is allowed to control what and when the elements are delivered. • Interactive Multimedia is Hypermedia, when the end-user is provided with the structure of linked elements through which he/she can navigate. 9
  10. 10. Multimedia- Definitions • Multimedia is linear, when it is not interactive and the users just sit and watch as if it is a movie. • Multimedia is nonlinear, when the users are given the navigational control and can browse the contents at will. 10
  11. 11. Multimedia System • Following the dictionary definitions, a multimedia system is any system that supports more than a single kind of media – This implies any system processing text and image would be a multimedia system!!! – Note, the definition is quantitative. A qualitative definition would be more appropriate. – The kind of media supported should be considered, rather than the number of media
  12. 12. Multimedia System • A multimedia system is characterized by computer-controlled, integrated production, manipulation, storage and communication of independent information, which is encoded at least through a continuous (time-dependent) and a discrete (time-independent) medium. 12
  13. 13. Data streams • Data Stream is any sequence of individual packets transmitted in a time-dependent fashion – Packets can carry information of either continuous or discrete media • Transmission modes – Asynchronous • Packets can reach receiver as fast as possible. • Suited for discrete media • Additional time constraints must be imposed for continuous media 13
  14. 14. Data streams – Synchronous • Defines maximum end-to-end delay • Packets can be received at an arbitrarily earlier time • For retrieving uncompressed video at data rate 140Mbits/s & maximal end-to-end delay 1 second the receiver should have temporary storage 17.5 Mbytes – Isochronous • Defines maximum & minimum end-to-end delay • Storage requirements at the receiver reduces 14
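The 17.5 Mbyte figure on the slide is just data rate times the delay window; a small sketch (the function name is illustrative, not from the source):

```python
def receiver_buffer_bytes(rate_bps: float, max_delay_s: float, min_delay_s: float = 0.0) -> float:
    """Worst-case receiver storage for a stream: data may arrive up to
    (max_delay - min_delay) seconds before it has to be played out."""
    return rate_bps * (max_delay_s - min_delay_s) / 8

# Synchronous mode: only a maximum end-to-end delay of 1 s is defined,
# so packets may arrive arbitrarily early -> 140 Mbit/s * 1 s = 17.5 MB
print(receiver_buffer_bytes(140e6, 1.0))
# Isochronous mode: a minimum delay of e.g. 0.9 s shrinks the buffer tenfold
print(receiver_buffer_bytes(140e6, 1.0, 0.9))
```

This is exactly why isochronous transmission reduces storage requirements at the receiver.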
  15. 15. Characterizing continuous media streams • Periodicity – Strongly periodic • PCM (pulse-code modulation) coded speech in telephony behaves this way. – Weakly periodic – Aperiodic • Cooperative applications with shared windows • Variation of consecutive packet size – Strongly regular • Uncompressed digital data transmission – Weakly regular • MPEG: the frame size ratio I:P:B is 10:1:2 – Irregular
  16. 16. Characterizing continuous media streams • Contiguous packets – Are consecutive packets transmitted directly one after another, or is there a delay between them? – Can be seen as the utilization of a resource – Connected / unconnected data streams
  17. 17. Streaming Media • Popular approach to continuous media over the Internet • Playback at the user's computer is done while the media is being transferred (no waiting till complete download!!!) • You can find streaming in – Internet radio stations – Distance learning – Cricket Live!!! – Movie Trailers
  18. 18. Streaming Media 18
  19. 19. Multimedia- Applications Multimedia plays major role in following areas – Instruction – Business – Advertisements – Training materials – Presentations – Customer support services – Entertainment – Interactive Games 19
  20. 20. Multimedia- Applications – Enabling Technology – Accessibility to web based materials – Teaching-learning disabled children & adults – Fine Arts & Humanities – Museum tours – Art exhibitions – Presentations of literature 20
  21. 21. Multimedia- Applications In Medicine Source: Cardiac Imaging, YALE centre for advanced cardiac imaging 21
  22. 22. Multimedia- Applications In training 22
  23. 23. Multimedia- Applications Public awareness campaign Source: Interactive Multimedia Project, Department of Food Science & Nutrition, Colorado State Univ
  24. 24. Notion of Synchronization • Multimedia synchronization is understood in terms of content relations, spatial relations and temporal relations • Content Relation: defines the dependency of media objects on some data – Example: dependency between a filled spreadsheet and graphics that represent the data listed in the spreadsheet
  25. 25. Spatial Relation • Spatial relation is represented through layout relations and defines the space used for the presentation of a media object on an output device at a certain point of time in a multimedia document – Example: desktop publishing – Layout frames are placed on an output device and content is assigned to each frame • Positioning of layout frames: – Fixed to a position of a document – Fixed to a position on a page – Relative to the position of another frame – The concept of frames is used for positioning of time-dependent objects • Example: in window-based systems, layout frames correspond to windows and video can be positioned in a window
  26. 26. Temporal Relation • Temporal relation defines temporal dependencies between media objects – Example: lip synchronization • This relation is the focus of this unit; we will not talk about content or spatial relations • Time-dependent objects represent a media stream because there exist temporal relations between consecutive units of the stream • Time-independent objects are traditional media such as images or text.
  27. 27. Temporal Relations (2) • Temporal synchronization is supported by many system components: – OS (CPU scheduling, semaphores during IPC) – Communication systems (traffic shaping, network scheduling) – Databases – Document handling • Synchronization needed at several levels of a Multimedia System
  28. 28. Temporal Relations (3) • 1. level: OS and lower communication layers bundle single streams – Objective is to avoid jitter at the presentation time of one stream • 2. level: on top of this is the RUN-TIME support for synchronization of multimedia streams (schedulers) – Objective is bounded skews between various streams • 3. level: next level holds the run-time support for synchronization between time-dependent and time-independent media together with handling of user interaction – Objective is bounded skews between time-dependent and time-independent media
  29. 29. Specification of Synchronization • Implicit Specification – Temporal relations may be specified implicitly during capturing of the media objects; the goal of a presentation is to present media in the same way as they were originally captured • Audio/video recording and playback / VOD applications • Explicit Specification – Temporal relations may be specified explicitly in the case of presentations that are composed of independently captured or otherwise created objects • Slide show, where the presentation designer – Selects appropriate slides – Creates the audio objects – Defines the units of the audio presentation stream at which slides have to be presented
  30. 30. Inter-object and Intra-Object Synchronization • Intra-object synchronization refers to the time relation between various presentation units of one time-dependent media object • Inter-object synchronization refers to the synchronization between media objects (Figure: 40 ms frame spacing within a video stream; parallel Audio 1, Audio 2, Slides, Video and Animation streams)
  31. 31. Classification of Synchronization Units • Logical Data Units (LDU) – Samples or Pixels, – Notes or frames – Movements or scenes – Symphony or movie • Fixed LDU vs Variable LDU • LDU specification during recording vs LDU defined by user • Open LDU vs Closed LDU
  32. 32. Live Synchronization • Goal is to exactly reproduce at a presentation the temporal relations as they existed during the capturing process • Need to capture temporal relation information during the capturing • Live sync is needed in conversational services – Video Conferencing, Video Phone – Recording followed by playback is considered a retrieval service, i.e., a presentation with delay
  33. 33. Synthetic Synchronization • Temporal relations are artificially specified • Often used in presentation and retrieval-based systems with stored data objects that are arranged to provide new combined multimedia objects – Authoring and tutoring systems • Need synchronization editors to support flexible synchronization relations between media • Two phases: (1) specification phase defines temporal relations, (2) presentation phase presents data in a synchronized mode – 4 audio messages recorded relate to parts of an engine in an animation; the animation sequence shows a slow 360 degrees rotation of the engine.
  34. 34. Synchronization Requirements • For intra-object synchronization: – Accuracy concerning jitter and end-to-end delays in the presentation of LDUs • For inter-object synchronization: – Accuracy in the parallel presentation of media objects • Implication of the blocking method: – Fine for time-independent media – Gap problem for time-dependent media • What does the blocking of a stream mean for the output device? • Should previous parts be repeated in the case of speech or music? • Should the last picture of a stream be shown? • How long can such a gap exist?
  35. 35. Synchronization Requirements (2) • Solutions to Gap problem – Restricted blocking method • Switching to alternative presentation if gap between late video and audio exceeds a predefined threshold • Show last picture as a still image – Re-sampling of a stream • Speed up or slow down streams for the purpose of synchronization – Off-line-re-sampling – used after capturing of media streams with independent devices » Concert which is captured with two independent audio and video devices – Online re-sampling – used during a presentation in the case that at run-time a gap between media streams occurs
  36. 36. Synchronization Requirements (3) • Lip synchronization requirements refer to the temporal relation between audio and video streams for human speech • The time difference between related audio and video LDUs is called the synchronization skew • Streams are in sync if skew = 0 or |skew| ≤ bound; they are out of sync if |skew| exceeds the bound • Bounds: – Audio/Video in sync means -80ms ≤ skew ≤ 80ms – Audio/Video out of sync means skew < -160ms or skew > 160ms – Transient means -160ms ≤ skew < -80ms, or 80ms < skew ≤ 160ms
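The bounds above translate into a three-way classifier; a sketch using the slide's numbers (the sign convention, audio ahead of vs. behind video, is assumed):

```python
def lip_sync_state(skew_ms: float) -> str:
    """Classify audio/video skew against the lip-sync bounds:
    in sync within +/-80 ms, transient up to +/-160 ms, else out of sync."""
    if -80 <= skew_ms <= 80:
        return "in sync"
    if -160 <= skew_ms <= 160:
        return "transient"
    return "out of sync"

print(lip_sync_state(40))    # in sync
print(lip_sync_state(-120))  # transient
print(lip_sync_state(200))   # out of sync
```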
  37. 37. Synchronization Requirements (4) • Pointer synchronization requirements are very important in computer-supported cooperative work (CSCW) • We need synchronization between graphics, pointers and audio • Comparison – Lip sync error is a skew between 40 and 60ms – Pointer sync error is a skew between 250 and 1500ms • Bound: – Pointer/Audio/Graphics in sync means -500ms ≤ skew ≤ 750ms – Out of sync means skew < -1000ms or skew > 1250ms – Transient means -1000ms ≤ skew < -500ms or 750ms < skew ≤ 1250ms
  38. 38. Synchronization Requirements (5) • Digital audio on CD-ROM: – The maximum allowable jitter delay in perception experiments is 5-10ns; other experiments suggest 2ms • The combination of audio and animation is not as stringent as lip synchronization – Maximum allowable skew is +/- 80ms • Stereo audio is tightly coupled – Maximum allowable skew is 20ms; because of listening errors, the suggestion for skew is +/- 11ms • Loosely coupled audio channels: speaker and background music – Maximum allowable skew is 500ms.
  39. 39. Synchronization Requirements (6) • Production-level synchronization should be guaranteed prior to the presentation of data at the user interface – For example, in case of recording of synchronized data for subsequent playback • Stored data should be captured and recorded with no skew • For playback, the defined lip sync boundaries are 80 ms • For playback at local and remote workstation simultaneously, sync skews should be between 160ms and 0ms (video should be ahead of audio for remote station due to pre-fetching)
  40. 40. Synchronization Requirements (7) • Presentation-level synchronization should be defined at the user interface • This synchronization focuses on human perception – Examples • Video and image overlay +/- 240ms • Video and image non-overlay +/- 500ms • Audio and image (music with notes) +/- 5ms • Audio and slide show (loosely coupled image) +/- 500ms • Audio and text (text annotation) +/- 240ms
  41. 41. Reference Model for Synchronization • Synchronization of multimedia objects is classified with respect to a four-level system – SPECIFICATION LEVEL: an open layer; includes applications and tools which allow the creation of synchronization specifications (e.g. sync editors); editing and formatting; mapping of user QoS to abstractions at the object level – OBJECT/SERVICE LEVEL: operates on all types of media and hides differences between discrete and continuous media; plans, coordinates and initiates presentations – STREAM LEVEL: operates on multiple media streams and provides inter-stream synchronization; resource reservation and scheduling – MEDIA LEVEL: operates on a single stream, treated as a sequence of LDUs, and provides intra-stream synchronization; file and device access
  42. 42. Synchronization in Distributed Environments • Synchronization information must be transmitted with audio and video streams, so that the receiver side can synchronize the streams • Delivery of the complete sync information can be done before the start of the presentation – This is used in synthetic synchronization • Advantage: simple implementation • Disadvantage: presentation delay • Delivery of the complete sync information can be done using out-of-band communication via a separate sync channel – This is used in live synchronization • Advantage: no additional presentation delays • Disadvantage: an additional channel is needed; additional errors can occur
  43. 43. Synchronization in Distributed Environments (2) • Delivery of complete synchronization information can be done using in-band communication via multiplexed data streams, i.e., synchronization information is in headers of the multimedia PDU – Advantage: related sync information is delivered together with media units – Disadvantage: difficult to use for multiple sources
  44. 44. Synchronization in Distributed Environments (3) Location of Synchronization Operations • It is possible to synchronize media objects by recording objects together and leave them together as one object, i.e., combine objects into a new media object during creation; Synchronization operation happens then at the recording site • Synchronization operation can be placed at the sink. In this case the demand on bandwidth is larger because additional sync information must be transported • Synchronization operation can be placed at the source. In this case the demand on bandwidth is smaller because the streams are multiplexed according to synchronization requirements
  45. 45. Synchronization in Distributed Environments (4) Clock Synchronization • Consider synchronization accuracy between clocks at source and destination • Global time-based synchronization needs clock synchronization • In order to re-synchronize, we can allocate buffers at the sink and start transmission of audio and video in advance, or use NTP (Network Time Protocol) to bound the maximum clock offset
  46. 46. Synchronization in Distributed Environments (5) Other Synchronization Issues • Synchronization in distributed environment is a multi-step process – Sync must be considered during object acquisition (during video digitization) – Sync must be considered during retrieval (synchronize access to frames of a stored video) – Sync must be considered during delivery of LDUs to the network (traffic shaping) – Sync must be considered during transport (use isochronous protocols if possible) – Sync must be considered at the sink (sync delivery to the output devices)
  47. 47. Synchronization Specification Methods • Interval-based Specification – the presentation duration of an object is considered as an interval • Examples of operations: A before(0) B, A overlap B, A starts B, A equals B, A during B, A while(0,0) B – Advantage: easy to handle open LDUs and therefore user interactions – Disadvantage: the model does not include skew specifications (Figure: interval-based arrangement of Audio 1, Audio 2, Slides, Video 1 and Animation)
  48. 48. Synchronization Specification (2) • Control Flow-based Specification – Hierarchical Approach – Flow of concurrent presentation threads is synchronized in predefined points of the presentation – Basic hierarchical specification: 1. serial synchronization, 2. parallel synchronization of actions – Action can be atomic or compound – Atomic action handles presentation of a single media object, user input or delay – Compound actions are combinations of synchronization operators and atomic actions – Delay as an atomic action allows modeling of further synchronization (e.g. delay in serial presentation)
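The serial/parallel hierarchy described above can be modeled with nested tuples, where a serial composition lasts the sum of its children's durations and a parallel composition lasts the maximum; a minimal sketch (the representation is illustrative, not from the source):

```python
def duration(action):
    """Duration of a hierarchical sync spec: a number is an atomic action
    (a media presentation or a delay), ("seq", ...) plays its children one
    after another, ("par", ...) plays them concurrently."""
    if isinstance(action, (int, float)):
        return action
    op, *children = action
    parts = [duration(c) for c in children]
    return sum(parts) if op == "seq" else max(parts)

# Three slides shown in sequence, in parallel with a 25 s audio object:
spec = ("par", ("seq", 10, 10, 10), 25)
print(duration(spec))  # 30
```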
  49. 49. Synchronization Specification (3) • Control Flow-based Specification – Hierarchical Approach • Advantage: easy to understand, natural support of hierarchy, and integration of interactive objects is easy • Disadvantage: additional description of skews and QoS is necessary; we must add presentation durations (Figure: hierarchical composition of Slides, Animation, Audio 1, Video and Audio 2)
  50. 50. Synchronization Specification (4) • Control flow-based Synchronization Specification – Timed Petri Nets (Figure: a transition with an input place holding a token and an output place) • Advantage: Timed Petri nets allow all kinds of synchronization specification • Disadvantage: difficulty with complex specifications; offers insufficient abstraction of media object content because the media objects must be split into sub-objects
  51. 51. Multimedia
  52. 52. OBJECTIVES:  To show how audio/video files can be downloaded for future use or broadcast to clients over the Internet. The Internet can also be used for live audio/video interaction. Audio and video need to be digitized before being sent over the Internet.  To discuss how audio and video files are compressed for transmission through the Internet.  To discuss the phenomenon called jitter that can be created on a packet-switched network when transmitting real-time data.  To introduce the Real-Time Transport Protocol (RTP) and Real-Time Transport Control Protocol (RTCP) used in multimedia applications.  To discuss voice over IP as a real-time interactive audio/video application. TCP/IP Protocol Suite 52
  53. 53. OBJECTIVES (continued):  To introduce the Session Initiation Protocol (SIP) as an application layer protocol that establishes, manages, and terminates multimedia sessions.  To introduce quality of service (QoS) and how it can be improved using scheduling techniques and traffic shaping techniques.  To discuss Integrated Services and Differentiated Services and how they can be implemented.  To introduce Resource Reservation Protocol (RSVP) as a signaling protocol that helps IP create a flow and makes a resource reservation.
  54. 54. INTRODUCTION We can divide audio and video services into three broad categories: streaming stored audio/video, streaming live audio/video, and interactive audio/video, as shown in Figure 1. Streaming means a user can listen to (or watch) the file after the downloading has started.
  55. 55. Figure 1: Internet audio/video
  56. 56. Note Streaming stored audio/video refers to on-demand requests for compressed audio/video files.
  57. 57. Note Streaming live audio/video refers to the broadcasting of radio and TV programs through the Internet.
  58. 58. Note Interactive audio/video refers to the use of the Internet for interactive audio/video applications.
  59. 59. DIGITIZING AUDIO AND VIDEO Before audio or video signals can be sent on the Internet, they need to be digitized. We discuss audio and video separately.
  60. 60. Topics Discussed in the Section  Digitizing Audio  Digitizing Video
  61. 61. Note Compression is needed to send video over the Internet.
  62. 62. AUDIO AND VIDEO COMPRESSION Sending audio or video over the Internet requires compression. In this section, we first discuss audio compression and then video compression.
  63. 63. Topics Discussed in the Section  Audio Compression  Video Compression
  64. 64. Figure 2: JPEG gray scale
  65. 65. Figure 3: JPEG process
  66. 66. Figure 4: Case 1: uniform gray scale
  67. 67. Figure 5: Case 2: two sections
  68. 68. Figure 6: Case 3: gradient gray scale
  69. 69. Figure 7: Reading the table
  70. 70. Figure 8: MPEG frames
  71. 71. Figure 9: MPEG frame construction
  72. 72. STREAMING STORED AUDIO/VIDEO Now that we have discussed digitizing and compressing audio/video, we turn our attention to specific applications. The first is streaming stored audio and video. Downloading these types of files from a Web server can be different from downloading other types of files. To understand the concept, let us discuss four approaches, each with a different complexity.
  73. 73. Topics Discussed in the Section  First Approach: Using a Web Server  Second Approach: Using a Web Server with Metafile  Third Approach: Using a Media Server  Fourth Approach: Using a Media Server and RTSP
  74. 74. Figure 10: Using a Web server. (1) GET: audio/video file; (2) RESPONSE; (3) audio/video file
  75. 75. Figure 11: Using a Web server with a metafile. (1) GET: metafile; (2) RESPONSE; (3) metafile; (4) GET: audio/video file; (5) RESPONSE
  76. 76. Figure 12: Using a media server. (1) GET: metafile; (2) RESPONSE; (3) metafile; (4) GET: audio/video file; (5) RESPONSE
  77. 77. Figure 13: Using a media server and RTSP. (1) GET: metafile; (2) RESPONSE; (3) metafile; (4) SETUP; (5) RESPONSE; (6) PLAY; (7) RESPONSE; (8) audio/video stream; (9) TEARDOWN; RESPONSE
  78. 78. STREAMING LIVE AUDIO/VIDEO Streaming live audio/video is similar to the broadcasting of audio and video by radio and TV stations. Instead of broadcasting to the air, the stations broadcast through the Internet. There are several similarities between streaming stored audio/video and streaming live audio/video. They are both sensitive to delay; neither can accept retransmission. However, there is a difference. In the first application, the communication is unicast and on-demand. In the second, the communication is multicast and live.
  79. 79. REAL-TIME INTERACTIVE AUDIO/VIDEO In real-time interactive audio/video, people communicate with one another in real time. The Internet phone or voice over IP is an example of this type of application. Video conferencing is another example that allows people to communicate visually and orally.
  80. 80. Topics Discussed in the Section  Characteristics
  81. 81. Figure 14: Time relationship
  82. 82. Note Jitter is introduced in real-time data by the delay between packets.
  83. 83. Figure 15: Jitter
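For RTP streams, jitter is usually quantified with the smoothed interarrival estimator standardized in RFC 3550; a sketch (the transit times below are made-up example values):

```python
def update_jitter(jitter: float, transit_prev: float, transit_now: float) -> float:
    """One step of the RFC 3550 estimator J = J + (|D| - J)/16, where D is
    the change in transit time (arrival time minus RTP timestamp)
    between consecutive packets."""
    d = abs(transit_now - transit_prev)
    return jitter + (d - jitter) / 16.0

transits_ms = [100, 108, 96, 104]  # per-packet transit times (illustrative)
j = 0.0
for prev, now in zip(transits_ms, transits_ms[1:]):
    j = update_jitter(j, prev, now)
print(j)  # 1.642578125
```

The divide-by-16 gives a noise-tolerant running estimate rather than reacting to every single packet.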
  84. 84. Figure 16: Timestamp
  85. 85. Note To prevent jitter, we can timestamp the packets and separate the arrival time from the playback time.
  86. 86. Figure 17: Playback buffer
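Fixed playout, as in the figures above: each packet is scheduled at its timestamp plus a constant delay, and a packet arriving after its scheduled playout time is discarded. A sketch (the values are illustrative):

```python
def playout_times(timestamps_ms, arrivals_ms, fixed_delay_ms):
    """Fixed playout: a packet stamped t is played at t + fixed_delay.
    Returns (playout_time, is_late) per packet; late packets are discarded."""
    result = []
    for t, a in zip(timestamps_ms, arrivals_ms):
        play = t + fixed_delay_ms
        result.append((play, a > play))
    return result

# Packets generated every 20 ms; network delay varies between 60 and 110 ms
stamps   = [0, 20, 40, 60]
arrivals = [60, 95, 150, 130]
print(playout_times(stamps, arrivals, 100))
# With a 100 ms playout delay, the third packet (arrives 150 > plays 140) is late
```

Choosing the fixed delay trades added latency against the fraction of packets lost to lateness; adaptive playout adjusts this delay between talk spurts.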
  87. 87. Note A playback buffer is required for real-time traffic.
  88. 88. Note A sequence number on each packet is required for real-time traffic.
  89. 89. Note Real-time traffic needs the support of multicasting.
  90. 90. Note Translation means changing the encoding of a payload to a lower quality to match the bandwidth of the receiving network.
  91. 91. Note Mixing means combining several streams of traffic into one stream.
  92. 92. Note TCP, with all its sophistication, is not suitable for interactive multimedia traffic because we cannot allow retransmission of packets.
  93. 93. Note UDP is more suitable than TCP for interactive traffic. However, we need the services of RTP, another transport layer protocol, to make up for the deficiencies of UDP.
  94. 94. RTP Real-time Transport Protocol (RTP) is the protocol designed to handle real-time traffic on the Internet. RTP does not have a delivery mechanism (multicasting, port numbers, and so on); it must be used with UDP. RTP stands between UDP and the application program. The main contributions of RTP are timestamping, sequencing, and mixing facilities.
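RTP's fixed 12-byte header (version, payload type, sequence number, timestamp, SSRC) can be packed directly; a minimal sketch assuming no padding, no extension, an empty CSRC list and a clear marker bit:

```python
import struct

def rtp_header(payload_type: int, seq: int, timestamp: int, ssrc: int) -> bytes:
    """Pack the fixed 12-byte RTP header (RFC 3550): version 2,
    no padding, no extension, zero CSRCs, marker bit clear."""
    vpxcc = 2 << 6              # V=2 in the top two bits; P=X=0, CC=0
    m_pt = payload_type & 0x7F  # M=0, 7-bit payload type
    return struct.pack("!BBHII", vpxcc, m_pt, seq, timestamp, ssrc)

hdr = rtp_header(payload_type=0, seq=1, timestamp=160, ssrc=0x12345678)
print(len(hdr), hdr.hex())  # 12 80000001000000a012345678
```

The sequence number supports loss detection and reordering; the timestamp supports playout scheduling and jitter measurement, exactly the facilities the paragraph lists.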
  95. 95. Topics Discussed in the Section  RTP Packet Format  UDP Port
  96. 96. Figure 18: RTP packet header format
  97. 97. Figure 19: RTP packet header format
  99. 99. Note RTP uses a temporary even-numbered UDP port.
  100. 100. RTCP RTP allows only one type of message, one that carries data from the source to the destination. In many cases, there is a need for other messages in a session. These messages control the flow and quality of data and allow the recipient to send feedback to the source or sources. Real-Time Transport Control Protocol (RTCP) is a protocol designed for this purpose.
  101. 101. Topics Discussed in the Section  Sender Report  Receiver Report  Source Description Message  Bye Message  Application-Specific Message  UDP Port
  102. 102. Figure 20: RTCP message types
  103. 103. Note RTCP uses an odd-numbered UDP port number that follows the port number selected for RTP.
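The even/odd pairing of the two notes above can be sketched as a tiny helper (the function name is illustrative):

```python
def rtp_rtcp_ports(base: int) -> tuple:
    """Choose a port pair per the convention above: RTP on an even
    UDP port, RTCP on the next (odd) port."""
    rtp = base if base % 2 == 0 else base + 1
    return rtp, rtp + 1

print(rtp_rtcp_ports(5004))  # (5004, 5005)
print(rtp_rtcp_ports(5005))  # (5006, 5007)
```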
  104. 104. VOICE OVER IP Let us concentrate on one real-time interactive audio/video application: voice over IP, or Internet telephony. The idea is to use the Internet as a telephone network with some additional capabilities. Instead of communicating over a circuit-switched network, this application allows communication between two parties over the packet-switched Internet. Two protocols have been designed to handle this type of communication: SIP and H.323. We briefly discuss both.
  105. 105. Topics Discussed in the Section  SIP  H.323
  106. 106. Figure 21: SIP messages
  107. 107. Figure 22: SIP formats
  108. 108. Figure 23: SIP simple session. Establishing: INVITE (address, options), OK (address), ACK. Communicating: exchanging audio. Terminating: BYE.
  109. 109. Figure 24: Tracking the callee. INVITE to the proxy; lookup; reply; INVITE forwarded to the callee; OK relayed back; ACK; exchanging audio; BYE.
  110. 110. Figure 25: H.323 architecture
  111. 111. Figure 26: H.323 protocols
  112. 112. Figure 27: H.323 example. Find the IP address of the gatekeeper; Q.931 message for setup; RTP for audio exchange; RTCP for management; Q.931 message for termination.
  113. 113. H.263
  114. 114. Chronology of Video Standards (Timeline figure: ITU-T line with H.261, H.263, H.263+, H.263++, H.263L; ISO line with MPEG 1, MPEG 2, MPEG 4, MPEG 7; spanning 1990-2002)
  115. 115. Chronology of Video Standards • (1990) H.261, ITU-T – Designed to work at multiples of 64 kb/s (px64). – Operates on standard frame sizes CIF, QCIF. • (1992) MPEG-1, ISO “Storage & Retrieval of Audio & Video” – Evolution of H.261. – Main application is CD-ROM based video (~1.5 Mb/s).
  116. 116. Chronology continued • (1994-5) MPEG-2, ISO “Digital Television” – Evolution of MPEG-1. – Main application is video broadcast (DirecTV, DVD, HDTV). – Typically operates at data rates of 2-3 Mb/s and above.
  117. 117. Chronology continued • (1996) H.263, ITU-T – Evolution of all of the above. – Supports more standard frame sizes (SQCIF, QCIF, CIF, 4CIF, 16CIF). – Targeted low bit rate video <64 kb/s. Works well at high rates, too. • (1/98) H.263 Ver. 2 (H.263+), ITU-T – Additional negotiable options for H.263. – New features include: deblocking filter, scalability, slicing for network packetization and local decode, square pixel support, arbitrary frame size, chromakey transparency, etc…
  118. 118. Chronology continued • (1/99) MPEG-4, ISO “Multimedia Applications” – MPEG-4 video is based on H.263, similar to H.263+ – Adds more sophisticated binary and multi-bit transparency support. – Support for multi-layered, non-rectangular video display. • (2H/’00) H.263++ (H.263V3), ITU-T – Tentative work item. – Addition of features to H.263. – Maintain backward compatibility with H.263 V.1.
  119. 119. Chronology continued • (2001) MPEG7, ISO “Content Representation for Info Search” – Specify a standardized description of various types of multimedia information. This description shall be associated with the content itself, to allow fast and efficient searching for material that is of a user’s interest. • (2002) H.263L, ITU-T – Call for Proposals, early ‘98. – Proposals reviewed through 11/98, decision to proceed. – Determined in 2001
  120. 120. Section 1: Conferencing Video  Video Compression Review  Chronology of Video Standards  The Input Video Format  H.263 Overview  H.263+ Overview
  121. 121. Video Format for Conferencing • Input color format is YCbCr (a.k.a. YUV). Y is the luminance component, U & V are chrominance (color difference) components. • Chrominance is subsampled by two in each direction. • Input frame size is based on the Common Intermediate Format (CIF), which is 352x288 pixels for luminance and 176x144 for each of the chrominance components. [Figure: frame decomposed into Y, Cb and Cr planes]
  122. 122. YCbCr (YUV) Color Space • Defined as input color space to H.263, H.263+, H.261, MPEG, etc. • It is a 3x3 transformation from RGB: Y = 0.299 R + 0.587 G + 0.114 B; Cb = -0.169 R - 0.331 G + 0.500 B; Cr = 0.500 R - 0.419 G - 0.081 B. Y represents the luminance of a pixel; Cb, Cr represent the color difference or chrominance of a pixel.
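The 3x3 transform above can be sketched per pixel as follows (the function name and scalar interface are illustrative, not part of any standard API):

```python
# Sketch of the RGB -> YCbCr transform using the slide's matrix coefficients.
def rgb_to_ycbcr(r, g, b):
    y  =  0.299 * r + 0.587 * g + 0.114 * b   # luminance
    cb = -0.169 * r - 0.331 * g + 0.500 * b   # blue color difference
    cr =  0.500 * r - 0.419 * g - 0.081 * b   # red color difference
    return y, cb, cr

# Pure white: full luminance, zero color difference.
y, cb, cr = rgb_to_ycbcr(255, 255, 255)
print(round(y), round(cb), round(cr))  # 255 0 0
```

Note that the coefficient rows for Cb and Cr sum to zero, which is why a gray pixel (R = G = B) always yields zero chrominance.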
  123. 123. Subsampled Chrominance • The human eye is more sensitive to spatial detail in luminance than in chrominance. • Hence, it doesn’t make sense to have as many pixels in the chrominance planes.
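A minimal sketch of subsampling a chroma plane by two in each direction. Averaging each 2x2 neighborhood is one common strategy; the standards do not mandate a particular downsampling filter:

```python
# Subsample a chroma plane by 2 horizontally and vertically by
# averaging each 2x2 neighborhood (with rounding). Plane dimensions
# are assumed even, as for CIF/QCIF chroma.
def subsample_420(plane):
    h, w = len(plane), len(plane[0])
    return [[(plane[y][x] + plane[y][x + 1] +
              plane[y + 1][x] + plane[y + 1][x + 1] + 2) // 4
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]

chroma = [[100, 100, 200, 200],
          [100, 100, 200, 200]]
print(subsample_420(chroma))  # [[100, 200]]
```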
  124. 124. Spatial Relation Between Luma and Chroma Pels for CIF 4:2:0 [Figure: siting of chrominance samples relative to luminance samples and block edges; note this siting differs from MPEG-2 4:2:0]
  125. 125. Common Intermediate Format •The input video format is based on Common Intermediate Format or CIF. •It is called Common Intermediate Format because it is derivable from both 525 line/60 Hz (NTSC) and 625 line/50 Hz (PAL) video signals. •CIF is defined as 352 pels per line and 288 lines per frame. •The picture area for CIF is defined to have an aspect ratio of about 4:3. However, 352 / 4 × 3 = 264 ≠ 288, so the 352x288 frame is only 4:3 because CIF pixels are not square.
  126. 126. Picture & Pixel Aspect Ratios [Figure: 352x288 CIF frame with picture aspect ratio 4:3 and pixel aspect ratio 12:11] Pixels are not square in CIF.
  127. 127. Picture & Pixel Aspect Ratios Hence on a square pixel display such as a computer screen, the video will look slightly compressed horizontally. The solution is to spatially resample the video frames to be 384 x 288 or 352 x 264 This corresponds to a 4:3 aspect ratio for the picture area on a square pixel display.
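The resampled sizes above follow directly from the 12:11 pixel aspect ratio; a quick arithmetic check (exact rational arithmetic via the standard library):

```python
from fractions import Fraction

# CIF pixels have a 12:11 aspect ratio; on a square-pixel display the
# frame must be resampled so the picture area is 4:3.
w, h = 352, 288
pixel_ar = Fraction(12, 11)

# Either widen the frame...
print(w * pixel_ar, h)   # 384 288  ->  384/288 == 4/3
# ...or shrink its height.
print(w, h / pixel_ar)   # 352 264  ->  352/264 == 4/3
```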
  128. 128. Blocks and Macroblocks The luma and chroma planes are divided into 8x8 pixel blocks. Every four luma blocks are associated with a corresponding Cb and Cr block to create a macroblock. [Figure: a macroblock = four 8x8 Y blocks plus one 8x8 Cb block and one 8x8 Cr block]
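From the 16x16-luma macroblock structure described above, the block counts for a CIF frame can be sketched as (helper name is illustrative):

```python
# Count 16x16 macroblocks in a frame whose dimensions are multiples of 16.
# Each macroblock carries four 8x8 luma blocks plus one Cb and one Cr block.
def macroblock_count(width, height):
    return (width // 16) * (height // 16)

mbs = macroblock_count(352, 288)
print(mbs)                # 396 macroblocks (22 columns x 18 rows)
print(mbs * (4 + 1 + 1))  # 2376 8x8 blocks per CIF frame
```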
  129. 129. Section 1: Conferencing Video Video Compression Review Chronology of Video Standards The Input Video Format  H.263+ Overview
  130. 130. ITU-T Recommendation H.263
  131. 131. ITU-T Recommendation H.263 • H.263 targets low data rates (< 28 kb/s). For example it can compress QCIF video to 10-15 fps at 20 kb/s. • For the first time there is a standard video codec that can be used for video conferencing over normal phone lines (H.324). • H.263 is also used in ISDN-based VC (H.320) and network/Internet VC (H.323).
  132. 132. ITU-T Recommendation H.263 Composed of a baseline plus four negotiable options Baseline Codec Unrestricted/Extended Motion Vector Mode Advanced Prediction Mode PB Frames Mode Syntax-based Arithmetic Coding Mode
  133. 133. Frame Formats [Table: SQCIF 128x96, QCIF 176x144, CIF 352x288, 4CIF 704x576, 16CIF 1408x1152 luma pixels] Always 12:11 pixel aspect ratio.
  134. 134. Picture & Macroblock Types • Two picture types: – INTRA (I-frame) implies no temporal prediction is performed. – INTER (P-frame) may employ temporal prediction. • Macroblock (MB) types: – INTRA & INTER MB types (even in P-frames). • INTER MBs have shorter symbols in P frames • INTRA MBs have shorter symbols in I frames – Not coded - MB data is copied from previous decoded frame.
  135. 135. Motion Vectors • Motion vectors have 1/2 pixel granularity. Reference frames must be interpolated by two. • MVs are not coded directly; rather, a median predictor is used: MV_X = median(MV_A, MV_B, MV_C), where A is the block to the left of the current block X and B, C are neighbouring blocks above. • The predictor residual is then coded using a VLC table.
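The median prediction above is taken independently per component; a minimal sketch (function names are illustrative):

```python
# Median MV predictor: for each component, take the median of the
# three neighbouring blocks' motion vectors A, B, C.
def median3(a, b, c):
    return sorted([a, b, c])[1]

def predict_mv(mv_a, mv_b, mv_c):
    return (median3(mv_a[0], mv_b[0], mv_c[0]),   # horizontal component
            median3(mv_a[1], mv_b[1], mv_c[1]))   # vertical component

pred = predict_mv((1.0, -2.0), (3.5, 0.0), (2.0, 0.5))
print(pred)  # (2.0, 0.0)
# Only the residual MV - pred is entropy coded with the VLC table.
```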
  136. 136. Motion Vector Delta (MVD) Symbol Lengths [Chart: VLC code length in bits (0-14) versus MVD absolute value (0 to 15.5); code length grows with the magnitude of the MVD]
  137. 137. Transform Coefficient Coding Assign a variable length code according to three parameters (3-D VLC): 1 - Length of the run of zeros preceding the current nonzero coefficient. 2 - Amplitude of the current coefficient. 3 - Indication of whether the current coefficient is the last one in the block. The most common combinations are variable length coded (3-13 bits); the rest are coded with escape sequences (22 bits).
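The (run, level, last) events that the 3-D VLC encodes can be sketched for a scanned coefficient list as follows (function name is illustrative; the actual VLC table lookup is omitted):

```python
# Turn a zigzag-scanned list of quantized coefficients into the
# (run-of-zeros, level, is-last) triplets the 3-D VLC encodes.
def run_level_last(coeffs):
    events, run = [], 0
    nonzero = [i for i, c in enumerate(coeffs) if c != 0]
    for i, c in enumerate(coeffs):
        if c == 0:
            run += 1                 # extend the run of zeros
        else:
            events.append((run, c, i == nonzero[-1]))
            run = 0
    return events

print(run_level_last([5, 0, 0, -1, 0, 2, 0, 0]))
# [(0, 5, False), (2, -1, False), (1, 2, True)]
```

Signalling "last" in the event itself means no end-of-block symbol is needed, which is one of H.263's gains over H.261's 2-D (run, level) coding.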
  138. 138. Quantization • H.263 uses a scalar quantizer with center clipping. • Quantizer varies from 2 to 62, by 2's. • Can be varied ±1, ±2 at macroblock boundaries (2 bits), or 2-62 at row and picture boundaries (5 bits). [Figure: quantizer transfer characteristic, input vs. output, with a flat dead zone around zero and output levels at multiples of Q]
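The center-clipping behaviour can be illustrated with a generic dead-zone scalar quantizer. This is a hedged sketch of the concept, not the exact H.263 formula (which differs slightly for INTRA and INTER coefficients):

```python
# Illustrative dead-zone ("center clipping") scalar quantizer:
# coefficients with magnitude below the step q are forced to zero,
# which is the flat region around the origin in the transfer curve.
def quantize(coef, q):                 # q: quantizer step (2..62, even)
    sign = -1 if coef < 0 else 1
    level = abs(coef) // q             # dead zone: |coef| < q -> level 0
    return sign * level

def dequantize(level, q):
    if level == 0:
        return 0
    sign = -1 if level < 0 else 1
    return sign * (abs(level) * q + q // 2)   # reconstruct at bin midpoint

print(quantize(5, 8), quantize(-20, 8))  # 0 -2
print(dequantize(-2, 8))                 # -20
```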
  139. 139. Bit Stream Syntax Hierarchy of three layers: Picture Layer, GOB* Layer, MB Layer. *A GOB is usually a row of macroblocks, except for frame sizes greater than CIF. [Figure: Picture Hdr, then repeating GOB Hdr followed by its MBs]
  140. 140. Picture Layer Concepts Picture Start Code Temporal Reference Picture Type Picture Quant • PSC - sequence of bits that can not be emulated anywhere else in the bit stream. • TR - 29.97 Hz counter indicating time reference for a picture. • PType - Denotes INTRA, INTER-coded, etc. • P-Quant - Indicates which quantizer (2…62) is used initially for the picture.
  141. 141. GOB Layer Concepts GOB Headers are Optional GOB Start Code GOB Number GOB Quant • GSC - Another unique start code (17 bits). • GOB Number - Indicates which GOB, counting vertically from the top (5 bits). • GOB Quant - Indicates which quantizer (2…62) is used for this GOB (5 bits). • GOB can be decoded independently from the rest of the frame.
  142. 142. Macroblock Layer Concepts Coded Flag MB Type Code Block Pattern DQuant MV Deltas Transform Coefficients • COD - if set, indicates empty INTER MB. • MB Type - indicates INTER, INTRA, whether MV is present, etc. • CBP - indicates which blocks, if any, are empty. • DQuant - indicates a quantizer change by +/- 2, 4. • MV Deltas - are the MV prediction residuals. • Transform coefficients - are the 3-D VLC’s for the coefficients.
  143. 143. Unrestricted/Extended Motion Vector Mode • Motion vectors are permitted to point outside the picture boundaries. – non-existent pixels are created by replicating the edge pixels. – improves compression when there is movement across the edge of a picture boundary or when there is camera panning. • Also possible to extend the range of the motion vectors from [-16,15.5] to [-31.5,31.5] with some restrictions. This better addresses high motion scenes.
  144. 144. Motion Vectors Over Picture Boundaries [Figure: a motion vector in Target Frame N pointing partly outside Reference Frame N-1; the missing pixels are filled by repeating the edge pixels]
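The edge-pixel replication described above amounts to clamping reference coordinates to the picture boundary; a minimal sketch (function name is illustrative):

```python
# Fetch a reference pixel for an unrestricted motion vector.
# Coordinates outside the picture are clamped to the nearest edge,
# which replicates the edge pixels.
def ref_pixel(frame, x, y):
    h, w = len(frame), len(frame[0])
    x = min(max(x, 0), w - 1)
    y = min(max(y, 0), h - 1)
    return frame[y][x]

frame = [[10, 20],
         [30, 40]]
print(ref_pixel(frame, -3, 0))  # 10 (left edge replicated)
print(ref_pixel(frame, 5, 5))   # 40 (bottom-right corner replicated)
```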
  145. 145. Extended MV Range [Figure: base motion vector range is [-16, 15.5] in each component; in extended mode the range becomes [-16, 15.5] around the MV predictor, reaching up to (±31.5, ±31.5) overall]
  146. 146. Advanced Prediction Mode • Includes motion vectors across picture boundaries from the previous mode. • Option of using four motion vectors for 8x8 blocks instead of one motion vector for 16x16 blocks as in baseline. • Overlapped motion compensation to reduce blocking artifacts.
  147. 147. Overlapped Motion Compensation • In normal motion compensation, the current block is composed of – the predicted block from the previous frame (referenced by the motion vectors), plus – the residual data transmitted in the bit stream for the current block. • In overlapped motion compensation, the prediction is a weighted sum of three predictions.
  148. 148. Overlapped Motion Comp. • Let (m, n) be the column & row indices of an 8x8 pixel block in a frame. • Let (i, j) be the column & row indices of a pixel within an 8x8 block. • Let (x, y) be the column & row indices of a pixel within the entire frame, so that: (x, y) = (m*8 + i, n*8 + j)
  149. 149. Overlapped Motion Comp. • Let (MV0x,MV0y) denote the motion vectors for the current block. • Let (MV1x,MV1y) denote the motion vectors for the block above (below) if the current pixel is in the top (bottom) half of the current block. • Let (MV2x,MV2y) denote the motion vectors for the block to the left (right) if the current pixel is in the left (right) half of the current block. [Figure: current block with MV0, vertical neighbours contributing MV1, horizontal neighbours contributing MV2]
  150. 150. Overlapped Motion Comp. Then the summed, weighted prediction is: P(x,y) = (q(x,y) H0(i,j) + r(x,y) H1(i,j) + s(x,y) H2(i,j) + 4)/8 where q(x,y), r(x,y), s(x,y) are the previous-frame pixel values at the displaced positions (x + MV0x, y + MV0y), (x + MV1x, y + MV1y) and (x + MV2x, y + MV2y) respectively.
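The weighted prediction for a single pixel can be sketched as below. The real H0, H1, H2 are 8x8 weight tables defined in the standard; the default weights (6, 1, 1) used here are hypothetical stand-ins for H0(i,j), H1(i,j), H2(i,j) at one interior position:

```python
# One-pixel sketch of overlapped motion compensation. q, r, s are the
# three predictions (from the current, vertical-neighbour and
# horizontal-neighbour motion vectors); h0+h1+h2 is assumed to be 8,
# so (.. + 4) // 8 is a unity-gain weighted average with rounding.
def overlapped_pred(q, r, s, h0=6, h1=1, h2=1):
    return (q * h0 + r * h1 + s * h2 + 4) // 8

print(overlapped_pred(100, 120, 80))  # (600 + 120 + 80 + 4) // 8 = 100
```

Blending the three predictions this way is what smooths the discontinuities at block edges, reducing blocking artifacts.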
  151. 151. Overlapped Motion Comp. H0(i, j) = [8x8 weighting matrix defined in the H.263 standard, not reproduced here]
  152. 152. Overlapped Motion Comp. H1(i, j) = [8x8 weighting matrix from the standard]; H2(i, j) = ( H1(i, j) )T. At every pixel position the three weights sum to 8, matching the /8 normalization in the prediction formula.
  153. 153. PB Frames Mode • Permits two pictures to be coded as one unit: a P frame as in baseline, and a bi-directionally predicted frame or B frame. • B frames provide more efficient compression at times. • Can increase frame rate 2X with only about 30% increase in bit rate. • Restriction: the backward predictor cannot extend outside the current MB position of the future frame. See diagram.
  154. 154. PB Frames [Figure: Picture 1 (P or I frame), Picture 2 (B frame), Picture 3 (P or I frame); the B frame is predicted bi-directionally, with forward and backward vectors of roughly 1/2 V and -1/2 V derived from the P-frame vector V] 2X frame rate for only 30% more bits.
  155. 155. Syntax based Arithmetic Coding Mode • In this mode, all the variable length coding and decoding of baseline H.263 is replaced with arithmetic coding/decoding. This removes the restriction that each symbol must be represented by an integer number of bits, thus improving compression efficiency. • Experiments indicate that compression can be improved by up to 10% over variable length coding/decoding. • Complexity of arithmetic coding is higher than variable length coding, however.
  156. 156. H.263 Improvements over H.261 • H.261 only accepts the QCIF and CIF formats. • No 1/2 pel motion estimation in H.261; instead it uses a spatial loop filter. • H.261 does not use median predictors for motion vectors but simply uses the motion vector of the MB to the left as predictor. • H.261 does not use a 3-D VLC for transform coefficient coding. • GOB headers are mandatory in H.261. • Quantizer changes at MB granularity require 5 bits in H.261 and only 2 bits in H.263.
  157. 157. Demo: QCIF, 8 fps @ 28 Kb/s H.261 H.263
  158. 158. Queries