Video Compression, Part 3-Section 2, Some Standard Video Codecs

  1. 1. Dr. Mohieddin Moradi mohieddinmoradi@gmail.com 1 Dream Idea Plan Implementation
  2. 2. Section I – ISO/IEC JTC 1/SC 29 Structure and MPEG – ITU-T structure and VCEG (Video Coding Experts Group or Visual Coding Experts Group) – A Generic Interframe Video Encoder – H.261 Video Coding Standard – MPEG-1 Video Coding Standard – MPEG-2 Video Coding Standard Section II – MPEG-2 Transport and Program Streams – H.263 Video Coding Standard – H.263+ Video Coding Standard – H.263++ Video Coding Standard – Bit-rate (R) and Distortion (D) in Video Coding 2 Outline
  3. 3. 3 Video Source Decompress (Decode) Compress (Encode) Video Display Coded video ENCODER + DECODER = CODEC
  4. 4. − Created in 1992 • 300 members, >35 countries - www.dvb.org • Promotion of open standards for digital TV broadcasting − Principal Recommendations • Physical Layer − Satellite: DVB-S, DVB-S2 − Cable: DVB-C − Terrestrial: DVB-T, DVB-T2 − Mobile: DVB-H, DVB-SH • Signalling − Service information: DVB-SI − Service synchronisation: DVB-SAD • Protection − DVB-CAS, DVB-CSA − Smartcard interface: DVB-CI, DVB-CI+ 4 DVB
  5. 5. DVB-C (QAM) DVB-T (COFDM) DVB-T (COFDM) DVB-S (QPSK) DVB-S (QPSK) Multiplexing Scrambling MPEG-2 Coding Descrambling MPEG-2 Decoding Satellite Terrestrial Cable DVB-C (QAM) Scrambling Key DVB Demultiplexing Descrambl. Key MPEG-2 Video Audio Data Data Audio Video DVB Systems 5
  6. 6. …bits bits bits ... Video or Audio Elementary Stream (ES) PES Packet Header Payload Packetized Elementary Stream (PES) Time stamps TS Packet (188 bytes) Header Payload MPEG 2 Transport Stream (TS) PID TS Header, contains PID and clock PES Header Rule: Every elementary stream gets its own (Packet ID) PID The MPEG Transport Stream 6
  7. 7. Processing of The Streams in The STB Tuner/ Demod MPEG2 Demux Video Decomp. Audio Decomp. System Memory Processor • 6 TV • 20 Radio • Service Information QAM OFDM A/D A/D MPEG2-TS : 40 Mbit/s, e.g..: 188 188 MPEG2-TS PID Header Payload DEMUX queues PID 1 PID 2 section section QPSK 7
  8. 8. 8 Digital Terrestrial TV - Layers . . . provide clean interface points. . . . Picture Layer Multiple Picture Formats and Frame Rates 1920 x 1080 1280 x 720 50, 25, 24 Hz Transmission Layer 7 MHz COFDM / 8-VSB, VHF/UHF TV Channel Video Compression Layer MPEG-2 compression syntax MP@ML or MP@HL Data Headers Motion Vectors Chroma and Luma DCT Coefficients Variable Length Codes Transport Layer MPEG-2 packets Video packet Video packet Audio packet Aux data Packet Headers Flexible delivery of data
  9. 9. 9 Digital Television Encode Layers Delivery System Bouquet Multiplexer Program 2 Program 3 Service Mux Other Data Control Data Program Association Table (PAT) Picture Coding Audio Coding Data Coding MPEG-2 or AC-3 MPEG-2 Control Data Video Data Sound Modulator & Transmitter Error Protection Control Data 188-byte packets MPEG Transport Data Stream Program 1 Multiplexer MPEG Transport Stream Mux Control Data Program Map Table (PMT) PES PES PES
  10. 10. 10 Digital Television Decode Layers Audio Decoder Data Decoder Picture Decoder MPEG or AC-3 MPEG-2 Demodulator & Receiver Error Control Delivery System Data Monitor Speakers MPEG Transport Stream De-Multiplexer MPEG DeMux Transport Stream
  11. 11. 11 − MPEG-2 Container formats (a file format that can contain data compressed by standard codecs) • TS: Transport Stream (Multiplexed A/V PES and User Data) • PS: Program Stream − PES: Packetized Elementary Stream, Audio or Video − ES: Elementary Streams-Compressed Data Video Data Audio Data Elementary Streams Video Encoder Audio Encoder Packetizer Packetizer ES ES Video PES Program Stream MUX Transport Stream MUX Audio PES PS: Program Stream TS: Transport Stream MPEG-2 Video System Standard For noisier environments such as terrestrial broadcast channels For an error-free environment such as Digital Storage Media (DSM)
  12. 12. 12 MPEG-2 Packetized Elementary Stream (PES) MPEG-2 Video Video ES (Elementary Stream) I0 P3 B1 B2 P6 B3 B4 I9 B7 B8 P12 B B P B B I I0 B1 B2 P3 B4 B5 P6 B7 B8 Video Frames Frame Frame Frame Frame Frame Frame Frame Frame Frame MPEG-2 System Subband Samples Side Information Sync, System Info. and CRC Ancillary Data Field Audio ES (Elementary Stream) MPEG-2 System Audio Tracks frame frame frame frame frame frame frame frame frame frame frame frame MPEG-2 Audio Video PES (Packetized Elementary Stream) Audio PES (Packetized Elementary Stream)
  13. 13. Output from the MPEG-2 System Encoder: 13 MPEG-2 Packetized Elementary Stream (PES) MPEG-2 System Processor Elementary Stream (ES): - Digital Control Stream - Digital Audio (compressed) - Digital Video (compressed) - Digital Data Each PES packet has a 6-byte protocol header • 3-byte start code • 1-byte stream ID – 110x xxxx: audio stream number xxxxx – 1110 yyyy: video stream number yyyy – 1111 0010: DSM-CC (Digital Storage Media Command and Control) control packet • 2-byte length field [Diagram: Packet Start Code Prefix (24 bits) | Stream ID (8 bits) | PES Packet Length (16 bits) | optional PES Header | PES packet data; the length field counts the bytes that follow it, up to 65,535]
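The fixed 6-byte header above can be decoded mechanically. A minimal Python sketch, with the function name and return shape chosen for illustration (not taken from the standard):

```python
def parse_pes_header(data: bytes):
    """Parse the fixed 6-byte PES protocol header: 3-byte start-code
    prefix 0x000001, 1-byte stream ID, 2-byte packet length."""
    if len(data) < 6 or data[:3] != b"\x00\x00\x01":
        raise ValueError("not a PES packet start")
    stream_id = data[3]
    pes_packet_length = (data[4] << 8) | data[5]  # counts bytes after this field
    if 0xC0 <= stream_id <= 0xDF:          # 110x xxxx -> audio stream
        kind = ("audio", stream_id & 0x1F)
    elif 0xE0 <= stream_id <= 0xEF:        # 1110 yyyy -> video stream
        kind = ("video", stream_id & 0x0F)
    elif stream_id == 0xF2:                # 1111 0010 -> DSM-CC control packet
        kind = ("dsm-cc", 0)
    else:
        kind = ("other", stream_id)
    return kind, pes_packet_length

# Video stream number 0, 100 bytes following the length field:
kind, length = parse_pes_header(b"\x00\x00\x01\xe0\x00\x64")
```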
  14. 14. 14 PES Packet Syntax Diagram Packet Start Code Prefix 24 Stream ID 8 16 PES Header (optional) PES Packet Data Bytes PES Packet Length ’10’ PES Scrambling Control PES Priority Optional Fields 7 Flags Copyright PES Header Length Data Alignment Indicator Stuffing Bytes (0xFF) Original or Copy 2 2 1 81 11 DSM Trick Mode PTS DTS PES Extension Additional Copy Info ES Rate ESCR Previous PES CRC1 33 42 22 8 7 16 5 Flags Optional Fields Pack Header Field PES Private Data Program Packet Seq Counter P-STD Buffer PES Extension Length PES Extension Data 128 168 16 7 8 m * 8
  15. 15. Packetized Elementary Stream • The basic stream format for video, audio, data, .. • PES offers a mechanism to carry conditional access information • PES can be scrambled and also assigned priority • PES can carry time references: PTS and DTS • The largest data size within a PES packet is 64k Bytes. PES Indicators • PES_Priority - Indicates priority of the current PES packet. • PES_Scrambling_Control - Defines whether scrambling is used, and the chosen scrambling method. • Data_alignment_indicator - Indicates if the payload starts with a video or audio start code. • Copyright information - Indicates if the payload is copyright protected. • Original_or_copy - Indicates if this is the original ES 15 MPEG-2 Packetized Elementary Stream (PES)
  16. 16. PES Optional Field − Presentation Time Stamp (PTS) and possibly a Decode Time Stamp (DTS) • For audio/video streams, these time stamps may be used to synchronize a set of elementary streams and control the rate at which they are replayed by the receiver. − Elementary Stream Clock Reference (ESCR) − Elementary Stream Rate - Rate at which the ES was encoded. − Trick Mode - indicates the video/audio is not the normal ES, e.g. after DSM-CC has signaled a replay. − Copyright Information - set to 1 to indicate a copyrighted ES. − CRC - may be used to monitor errors in the previous PES packet. − PES Extension Information - may be used to support MPEG-1 streams. 16 MPEG-2 Packetized Elementary Stream (PES)
  17. 17. It is the central structure used in both PS and TS streams; it results from packetizing continuous streams of compressed audio or video − PES packets contain 2 timestamps 1. Decoding Time Stamp (DTS) – tells the decoder when the packet should be decoded. The data is then decoded into the bit stream. 2. Presentation Time Stamp (PTS) – tells the decoder when the data should be displayed. − The systems part specifies that the decoder must contain a Systems Time Clock (STC). • When the decoder’s STC is equal to a packet’s DTS, the data in the packet is decoded • When the STC is equal to a packet’s PTS, the decoded data is sent to the display device (e.g. graphics card or sound card) • The state of the encoder’s clock is placed in the stream at regular intervals. This synchronises the decoder with the encoder. 17 MPEG-2 Packetized Elementary Stream (PES)
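The STC/DTS/PTS rule on this slide can be sketched as a tiny scheduler. Everything here (the tuple layout, the function name) is an illustrative assumption, not MPEG-2 syntax:

```python
def decoder_actions(stc: int, packets):
    """Return which packets to decode and which decoded frames to present
    at System Time Clock value `stc`. `packets` is a list of
    (name, dts, pts) tuples -- a hypothetical in-memory structure."""
    decode = [name for name, dts, pts in packets if dts == stc]   # DTS == STC -> decode
    present = [name for name, dts, pts in packets if pts == stc]  # PTS == STC -> display
    return decode, present

# An I-frame is decoded early (DTS < PTS) so it can serve as a reference;
# the B-frame that follows is decoded and displayed at the same tick.
pkts = [("I0", 1000, 2000), ("B1", 2000, 2000)]
```

For example, at STC 1000 only I0 is decoded; at STC 2000, B1 is decoded while both I0 and B1 reach their presentation time.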
  18. 18. − Packetising the continuous streams of compressed video and audio bitstreams (elementary streams or ES) generates PES packets. − A typical method of transmitting elementary stream data from a video or audio encoder is to first create PES packets from the elementary stream data and then to encapsulate these PES packets inside Transport Stream (TS) packets or Program Stream (PS) packets. − The TS packets can then be multiplexed and transmitted using broadcasting techniques, such as those used in ATSC and DVB. − Stringing together PES packets from the various encoders, along with other packets containing necessary data, generates a single-bitstream programme stream. − A transport stream consists of packets of fixed length containing 4 bytes of header followed by 184 bytes of data, where the data are obtained by segmenting the PES packets. 18 MPEG-2 Packetized Elementary Stream (PES)
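The PES-to-TS segmentation described above (4-byte header + 184-byte payload) can be sketched as follows. This is a simplified model: real TS headers carry more flags, and the last packet is padded with an adaptation field rather than the 0xFF stuffing used here:

```python
TS_PACKET_SIZE = 188
TS_HEADER_SIZE = 4
TS_PAYLOAD_SIZE = TS_PACKET_SIZE - TS_HEADER_SIZE  # 184 bytes

def pes_to_ts(pes: bytes, pid: int, cc_start: int = 0):
    """Segment one PES packet into fixed-length 188-byte TS packets."""
    packets = []
    cc = cc_start
    for i in range(0, len(pes), TS_PAYLOAD_SIZE):
        chunk = pes[i:i + TS_PAYLOAD_SIZE]
        pusi = 0x40 if i == 0 else 0x00          # payload_unit_start_indicator
        header = bytes([
            0x47,                                # sync byte
            pusi | ((pid >> 8) & 0x1F),          # PUSI + top 5 bits of 13-bit PID
            pid & 0xFF,                          # low 8 bits of PID
            0x10 | (cc & 0x0F),                  # payload present + continuity counter
        ])
        packets.append(header + chunk.ljust(TS_PAYLOAD_SIZE, b"\xff"))
        cc = (cc + 1) & 0x0F
    return packets

# A 400-byte PES packet yields three TS packets (184 + 184 + 32 payload bytes).
ts = pes_to_ts(b"\x00" * 400, pid=0x501)
```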
  19. 19. 19 MPEG-2 Transport Stream (TS) Multiplexing Subsystem Multiplexer Transport Audio Compression Digital Modulation Error Correction Encoder Video Compression Video Ancillary data Audio Transmission Subsystem Control data Mixer Video Subsystem Audio Subsystem ES ES ES ES TS ES Packetizer ES Packetizer ES Packetizer PES PES PES PES
  20. 20. …bits bits bits ... Video or Audio Elementary Stream (ES) PES Packet Header Payload Packetized Elementary Stream (PES) Time stamps TS Packet (188 bytes) Header Payload MPEG 2 Transport Stream (TS) PID TS Header, contains PID and clock PES Header Rule: Every elementary stream gets its own (Packet ID) PID The MPEG Transport Stream 20
  21. 21. Program 1 Video 1 PES Program 2 video 2 PES Audio 1 PES Transport Stream 188 Bytes MPEG-2 Transport Stream (TS) Formation 21
  22. 22. MPEG-2 Transport Stream Packetizer Packetizer Packetizer Packetizer Packetizer Packetizer Packetizer Video Encoder Audio Encoder Video Encoder Audio Encoder Video Encoder Audio Encoder Packetizer Packetizer Program 1 Video_1 Audio_1 Data_1 Program 2 Program 3 Video_2 Audio_2 Data_2 Video_3 Audio_3 Data_3 TRANSPORT MUX TP1_1 TP2_1 TP1_2 TP2_2 TP3_1 TP1_3 TP2_3 TP3_2 Transport Stream TP3_3 TP1_1 TP1_2 TP1_3 TP2_1 TP2_2 TP2_3 TP3_1 TP3_2 TP3_3 Transport MuxTransport Mux Transport Mux MPEG-2 Transport Stream (TS) Formation 22
  23. 23. 23 MPEG-2 Transport Stream (TS) Packet Video Audio Teletext (DVB) SI Cond. Access IP Packets Private Data Applications App. Info Time Division Multiplexing (TDM) MPEG-2 packets can contain − Video, Audio, Teletext, Data streaming (13818-1) − DSM-CC (Digital Storage Media Command and Control): data carousel, object carousel, SI tables, etc. (13818-6) − DVB Data Piping 1 TS Packet (188 bytes) = Sync (1 byte) + Header with PID (3 bytes) + Adaptation Field (n bytes) + Payload: PES / Section / Piped Data ((184-n) bytes) TS Packets
  24. 24. It significantly differs from MPEG-1. • It offers robustness for noisy channels • It offers the ability to assemble multiple programmes into a single stream. • It uses fixed-length packets of size 188 bytes with a new header syntax. • Each packet can be segmented into four 47-byte units to be accommodated in the payload of four ATM cells, with the AAL1 adaptation scheme. • It is therefore more suitable for hardware processing and for error correction schemes, such as those required in television broadcasting, satellite/cable TV and ATM networks. 24 The MPEG Transport Stream
  25. 25. The transport stream uses a fixed packet length (188 bytes) • This allows easy decoder/encoder synchronisation. • It also allows error correction codes to be inserted. Transport Streams can contain packets from a number of Programs • These can be different TV channels or maybe an EPG. • Each program has a unique Packet ID placed in the packet header. • Decoder can discard packets of other programs by checking the PID. 25 The MPEG Transport Stream
  26. 26. − The multiple programmes with independent time bases can be multiplexed in one transport stream. − The transport stream also allows • Synchronous multiplexing of programmes • Fast access to the desired programme for channel hopping • Multiplexing of programmes with clocks unrelated to transport clock • Correct synchronization of elementary streams for playback. • Control of the decoder buffers during start-up and playback for both constant and variable bit rate (VBR) programmes. 26 The MPEG Transport Stream
  27. 27. [Diagram: MPEG-2 Transport Stream (TS) packet, 188 bytes = Sync (1 byte) + Header (3 bytes) + TS Payload (184 bytes). Header fields with bit widths: Sync Byte (8), Transport Error Indicator (1), Payload Unit Start Indicator (1), Transport Priority (1), PID (13), Scrambling Control (2), Adaptation Field Control (2), Continuity Counter (4). Optional Adaptation Field: Adaptation Field Length (8), flags (Discontinuity Indicator, Random Access Indicator, PES Priority Indicator, 5 more flag bits), PCR and Original PCR (OPCR) fields (42 bits each), Splice Countdown (8), Private Data Length (8) followed by private data, Adaptation Field Extension Length (8) and extension with its own flags and optional fields. The payload carries PES packets (PES 1, PES 2, … PES N).] 27
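The fixed 4-byte TS header fields can be unpacked with straightforward bit masking. A minimal Python sketch (the dictionary keys are illustrative names, not standard syntax element names):

```python
def parse_ts_header(pkt: bytes):
    """Unpack the 4-byte MPEG-2 TS packet header fields."""
    assert len(pkt) == 188 and pkt[0] == 0x47, "lost sync"
    return {
        "transport_error":    bool(pkt[1] & 0x80),
        "payload_unit_start": bool(pkt[1] & 0x40),
        "transport_priority": bool(pkt[1] & 0x20),
        "pid":                ((pkt[1] & 0x1F) << 8) | pkt[2],  # 13 bits
        "scrambling":         (pkt[3] >> 6) & 0x03,
        "adaptation_ctrl":    (pkt[3] >> 4) & 0x03,
        "continuity_counter": pkt[3] & 0x0F,
    }

# A packet on PID 0x0000 (the PAT) with PUSI set and CC = 5:
hdr = parse_ts_header(b"\x47\x40\x00\x15" + b"\xff" * 184)
```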
  28. 28. PID numbers for Program Specific Information (PSI) used for Service Information (SI) 0x0000 PAT Program Association Table 0x0001 CAT Conditional Access Table 0x0002 TSDT Transport Stream Description Table 0x0003-0x000F reserved 0x0010 NIT, ST Network Information Table, Stuffing Table 0x0011 SDT, BAT, ST Service Description Table, Bouquet Association Table, Stuffing Table 0x0012 EIT, ST Event Information Table, Stuffing Table 0x0013 RST, ST Running Status Table, Stuffing Table 0x0014 TDT, TOT, ST Time and Date Table, Time Offset Table, Stuffing Table 0x0015 Network synchronization 0x0016-0x001D reserved for future use 0x001E DIT Discontinuity Information Table 0x001F SIT Selection Information Table 188 bytes Sync 1 byte Header 3 bytes Optional Adaptation Field X bits Payload 184 bytes PID Packet Identifier 13 bits MPEG-2 Transport Stream (TS) Packet 28
  29. 29. PID − Indicates where the data goes • Allows filtering of packets for non-viewed programs − Does not indicate PES/section or coding type − Reserved PIDs • Some PSI data • Program Association Table (PAT) • Conditional Access Table (CAT) • Transport Stream Description Table (TSDT) • User-reserved: other standards bodies (DVB, ATSC, …) PSI − Multiplex description − Program description − Stream description 29 Program ID (PID) and Program Service Information (PSI)
  30. 30. 30 Program Association Table (PAT) Program # 100 – PMT PID 1025 Program # 200 – PMT PID 1026 Program Map Table (PMT) Program # 100 Video PID – 501 – MPEG-2 Video Audio PID (English) – 502 – MPEG-2 Audio Audio PID (Spanish) – 503 – MPEG-2 Audio Program Map Table (PMT) Program # 200 Video PID – 601 – AVC Video Audio PID (English) – 602 – AAC Audio MPEG-2 Signaling Tables
  31. 31. 31 MPEG-2 Signaling Tables Network Information Bouquet Association Service Description Event Information Running Status Time & Date Stuffing
  32. 32. 32 MPEG-2 Signaling Tables
  33. 33. Program Association Table (PAT) • Identifies a multiplex (ID 16 bits) (The PAT is sent with the well-known PID value of 0x0000) • Lists all programs (Lists the PIDs of tables describing each program) ─ Program Number (16 bit) ─ PID carrying PMT • If program number = 0, the PID points to the NIT Program Map Table (PMT) • Defines the set of PIDs associated with a program, e.g. audio, video, ... • PID carrying the PCR ─ Not always a media stream! • Program Descriptors ─ Protection systems, interactive apps … • Lists all streams ─ PID: where stream data is carried in the multiplex ─ streamType: type of media compression ─ Stream descriptors • Language, coding parameters, demux parameters, … 33 MPEG-2 Signaling Tables [Diagram: PAT (PID 0) points to a PMT (PID 150), which lists the program's video (PID 51), audio (PIDs 64, 66) and other packets (PID 101)]
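The two-step PAT-then-PMT lookup a demultiplexer performs can be sketched with in-memory tables, here populated with the example PIDs from the earlier "MPEG-2 Signaling Tables" slide (the dictionary layout is a hypothetical simplification):

```python
# PAT: program number -> PMT PID (values from the slide's example).
PAT = {100: 1025, 200: 1026}
# PMT per PMT PID: stream role -> elementary-stream PID.
PMT = {
    1025: {"video": 501, "audio_eng": 502, "audio_spa": 503},
    1026: {"video": 601, "audio_eng": 602},
}

def pids_for_program(program_number: int):
    """PAT (carried on PID 0x0000) gives the PMT PID; the PMT then
    gives the elementary-stream PIDs to filter for this program."""
    pmt_pid = PAT[program_number]
    return PMT[pmt_pid]

streams = pids_for_program(200)  # program 200 -> PMT PID 1026
```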
  34. 34. CAT - Conditional Access Table − Defines type of scrambling used and PID values of transport streams which contain the conditional access management and entitlement information (EMM). TSDT- Transport Stream Description Table − Contains descriptors relating to the overall transport stream 34 MPEG-2 Signaling Tables
  35. 35. NIT - Network Information Table − It contains details of the bearer network (network topology) used to transmit the MPEG multiplex, including the carrier frequency Service Description Table (SDT) − Multiplex Description (channel names, …) − Editorial description of the services in a TS − Service names and ancillary services Event Information Table (EIT) − Electronic Program Guide for present and following shows Time and Date Table (TDT) − Current date and time, UTC (used to synchronize STB system time) 35 MPEG-2 Signaling Table (DVB, Mandatory)
  36. 36. Bouquet Association Table (BAT) − Commercial operator description and services − Several commercial operators may sell the same services Running Status Table (RST) Stuffing Table (ST) Time Offset Table (TOT) − Local offset by region (used to synchronize STB system time) Application Information Table (AIT) − Interactive app signaling (MHP, HbbTV, …) − Application type IP/MAC Notification Table (INT) − IP transport 36 MPEG-2 Signaling Tables (DVB, Optional)
  37. 37. − Scrambling may happen: • At PES payload level • At some sections’ payload level • At TS packet level − Most common use case − PES headers are scrambled − Exceptions • PAT: required to get list of programs • PMT: required to get protection system used • NIT/TSDT (Network Information Table / Transport Stream Description Table): infrastructure management 37 Scrambling in MPEG-2 TS
  38. 38. AV Synchronization − Want audio and video streams to be played back in sync with each other − Video stream contains “Presentation Time Stamps (PTS)” − MPEG-2 clock runs at 90 kHz • Good for both 25 and 30 fps − Each program carries a clock • Program Clock Reference (PCR) – PCR timestamps are sent with data by sender • PES Timestamps relate to this clock − Receiver uses PLL to synchronize clocks 38 MPEG-2 TS Timing
  39. 39. [Table: PCR field layout, byte by byte (bits 7…0) — Program Clock Reference (PCR) base (33 bits): the intended time, in 90 kHz clock ticks, of the arrival at the input of the decoder of the fourth byte of this structure; reserved (6 bits); PCR extension (9 bits): additional resolution in 27 MHz clock ticks; PCR = 300*base + ext. Original PCR (OPCR): same base/reserved/extension layout; it should not be modified by any multiplexer or decoder, and is used for recovery of a single-program PCR from a multi-program Transport Stream.] MPEG-2 Transport Stream (TS) Packet PCR Fields (42 bits each for PCR and OPCR) PCR (Program Clock Reference) 39
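The PCR arithmetic on this slide is simple enough to verify directly. A small sketch combining the 90 kHz base and 27 MHz extension (function names are illustrative):

```python
def pcr_value(base: int, extension: int) -> int:
    """PCR = 300 * base + extension: the 33-bit base counts 90 kHz ticks,
    the 9-bit extension adds 27 MHz resolution (300 = 27 MHz / 90 kHz)."""
    assert 0 <= base < 2**33 and 0 <= extension < 300
    return 300 * base + extension

def pcr_seconds(base: int, extension: int) -> float:
    """Convert a PCR to seconds of the 27 MHz system clock."""
    return pcr_value(base, extension) / 27_000_000.0

# One second of wall-clock time corresponds to 90,000 base ticks:
one_second = pcr_seconds(90_000, 0)
```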
  40. 40. Program Association Table (PAT) Program Map Table (PMT) Other Packets Audio Packet Video Packet Packet header includes a unique Packet ID (PID) for each stream PAT lists PIDs for program map tables Network Info = 10 Prog 1 = 150 Prog 2 = 301 Prog 3 = 511 etc. Program guides Subtitles Multimedia data Internet Packets etc. PMT lists the PIDs associated with a particular program Video = 51 Audio (English) = 64 Audio (French) = 66 Subtitle = 101 etc. 51 51 51 66 64 0 150 101 MPEG-2 Signaling Tables 40
  41. 41. MPEG-2 Example Transport Stream Packet 41 Example Transport Stream Packet 188 Bytes Header Flags • Transport Error Indicator • Payload Unit Start Indicator • Transport Priority • Transport Scrambling Control Important PIDs • 0x0000 – PAT PID • 0x1FFF – “Null PID” gives space for VBR Continuity Counter (CC) • 4-bit per-PID sequence # • Helps detect packet loss Adaptation Field (optional) • Can carry range of other info • PCR, splice point flags • Transport of private data Example Transport Stream 0x47 (sync) Flags PID (Payload ID) More Flags CC Adaptation Field Data Payload PID 0 CC 3 PAT Data PID 601 CC 11 PID 602 CC 7 PID 0x1FFF NULL PID 601 CC 12 PID 602 CC 8
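The continuity counter's role in loss detection, described above, can be sketched as follows. This is a simplification: it ignores duplicate packets and adaptation-field-only packets, which legitimately do not increment the CC:

```python
def find_cc_gaps(cc_sequence, start_cc):
    """Flag discontinuities in the 4-bit per-PID continuity counter.
    Each payload-carrying packet should increment CC mod 16; a jump
    suggests lost TS packets. Returns (index, expected, got) tuples."""
    gaps = []
    expected = start_cc
    for index, cc in enumerate(cc_sequence):
        if cc != expected:
            gaps.append((index, expected, cc))
        expected = (cc + 1) & 0x0F          # CC wraps at 16
    return gaps

# CCs 11, 12, 14 on one PID: the packet with CC 13 was lost.
gaps = find_cc_gaps([11, 12, 14], start_cc=11)
```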
  42. 42. MPEG-2/DVB PID Allocation − Program Association Table (PAT) • always has PID = 0 (zero) − Conditional Access Table (CAT) • always has PID = 1 − Event Information Table (EIT) • always has PID = 18 (0x0012) − Program Map Tables (PMTs) • have the PIDs specified in the PAT − The audio, video, PCR, subtitle, teletext etc PIDs for all programs are specified in their respective PMTs MPEG-2/DVB PID Allocation Table PID value PAT 0x0000 CAT 0x0001 TSDT 0x0002 Reserved 0x0003 – 0x000F NIT, ST 0x0010 SDT, BAT, ST 0x0011 EIT, ST 0x0012 RST, ST 0x0013 TDT, TOT, ST 0x0014 Network Synchronization 0x0015 Reserved 0x0016 – 0x001B Inband signaling 0x001C measurement 0x001D DIT 0x001E SIT 0x001F 42
  43. 43. Increase resilience to transmission errors − Redundancy − Reed-Solomon RS(255,191), ~25% redundancy − Each RS column is sent in a section − FEC aggregation is in another table • Can be ignored • Does not interfere with MPE Without modifying existing implementations − No modification of MPE (Multiprotocol Encapsulation) sections • Each MPE+IP on a section • Aggregation of IP datagrams in memory 43 DVB MPE-FEC
  44. 44. Data over DVB − Data piping • raw transport on a PID − Data streaming • sent in PES packets − DSM-CC Data carousel • Transport on sections − Object Carousel • Data Carousel + file system − Multi Protocol Encapsulation (MPE) • IP datagrams over TS Application
  45. 45. Program 0 PID=16 Program 1 PID=22 Program 2 PID=33 … … Program M PID=55 PMT (Program Map Table) for Program 1 CAT (Conditional Access Table) (PID=1) NIT (Network Information Table) (always Program 0, PID=16) NIT is considered private data by ISO Table section ID assigned by system Table section ID always set to 0x01 Table section ID always set to 0x02 Table section ID always set to 0x00 Stream 1 PCR 31 Stream 2 Video 1 54 Stream 3 Audio 1 48 Stream 4 Audio 2 49 … … … Stream k Data K 66 PAT (Program Association Table) (PID=0) CA Section 1 (Program 1) EMM PID(99) CA Section 2 (Program 2) EMM PID(109) CA Section 3 (Program 3) EMM PID(119) … … CA Section k (Program k) EMM PID(x) Private Section 1 NIT Info. Private Section 2 NIT Info. Private Section 3 NIT Info. … … Private Section k NIT Info. 0 PAT 22 Prog 1. PMT 33 Prog 2. PMT 99 Prog 1 EMM 31 Prog 1 PCR 48 Prog 1 Audio 1 54 Prog 1 Video 1 109 Prog 2 EMM Multiple-Program MPEG-2 Transport Stream: PMT (Program Map Table) for Program 2 Stream 1 PCR 41 Stream 2 Video 1 19 Stream 3 Audio 1 81 Stream 4 Audio 2 82 … … … Stream k Data K 88 MPEG-2 / DVB PSI (Program Specific Information) Structure
  46. 46. 46 Transport Multiplexing & Decoding Transport Stream Demultiplex and Decoder Clock Control Video Decoder Channel Specific Decoder Audio Decoder Decoded Video Decoded Audio Transport stream containing one or multiple programs Transport Stream Demultiplex and Decoder Channel Specific Decoder Transport Stream with single program Program Stream ≠ Transport Stream Channel Channel
  47. 47. 47 Transport Stream Decoder Multiplex Buffer Video Decoder Transport Buffer Re-order Buffer Decoded Video Decoded Audio ES Stream Buffer Multiplex Buffer Transport Buffer ES Stream Buffer Multiplex Buffer Transport Buffer ES Stream Buffer Audio Decoder System Info. Decoder System Control Transport Stream Decoder
  48. 48. − At the receiver, the transport streams are decoded by a transport demultiplexer (which includes a clock extraction mechanism), unpacketised by a depacketiser and sent to audio and video decoders for decoding. − The decoded signals are sent to the receiver buffer and presentation unit, which outputs them to a display device and a speaker at the appropriate time. − Similarly, if the programme streams are used, they are decoded by the programme stream demultiplexer and depacketiser and sent to the audio and video decoders. − The decoded signals are sent to the respective buffer to await presentation. − Also similar to MPEG-1 systems, the information about systems timing is carried by the clock reference field in the bitstream that is used to synchronise the decoder Systems Time Clock (STC). − Presentation Time Stamps (PTS), which are also carried by the bitstream, control the presentation of the decoded output. 48 Transport Stream Decoder
  49. 49. − For a payload of around 19 Mb/s • 1 HDTV service - sport & high action • 2 HDTV services - both film material • 1 HDTV + 1 or 2 SDTV non action/sport • 3 SDTV for high action & sport video • 6 SDTV for film, news & soap operas • However you do not get more for nothing. − More services means less quality 49 Examples of DVB Data Containers Single HDTV program HDTV 1 SDTV 1 SDTV 2 SDTV 3 SDTV 4 SDTV 5 Multiple SDTV programs SDTV 1 HDTV 1 Simulcast HDTV & SDTV Channel bandwidth can be used in different ways
  50. 50. 50 − MPEG-2 Container formats (a file format that can contain data compressed by standard codecs) • TS: Transport Stream (Multiplexed A/V PES and User Data) • PS: Program Stream − PES: Packetized Elementary Stream, Audio or Video − ES: Elementary Streams-Compressed Data Video Data Audio Data Elementary Streams Video Encoder Audio Encoder Packetizer Packetizer ES ES Video PES Program Stream MUX Transport Stream MUX Audio PES PS: Program Stream TS: Transport Stream MPEG-2 Video System Standard For noisier environments such as terrestrial broadcast channels For an error-free environment such as Digital Storage Media (DSM)
  51. 51. 51 Program Stream Structure (Simplified)
  52. 52. Program Stream (PS) − It is similar to the MPEG-1 systems stream but uses a modified syntax and new functions to support advanced functionalities (e.g. scalability). − It provides compatibility with MPEG-1 systems (an MPEG-2 decoder should be capable of decoding an MPEG-1 bitstream). − Like the MPEG-1 decoder, programme stream decoders typically employ long, variable-length packets. Such packets are well suited for software-based processing and error-free transmission environments (such as storage on disc). − The packet sizes are usually 1–2 kbytes long, chosen to match the disc sector sizes (typically 2 kbytes). − However, packet sizes as long as 64 kbytes are also supported. 52 MPEG-2 Systems
  53. 53. 53 MPEG-2 Systems Program Stream (PS) − It includes features not supported by MPEG-1 systems. • Scrambling of data • Assignment of different priorities to packets • Information to assist alignment of elementary stream packets • Indication of copyright • Indication of fast forward, fast reverse and other trick modes for storage devices. • An optional field in the packets is provided for testing the network performance • Optional numbering of a sequence of packets is used to detect lost packets.
  54. 54. 54 Video Source Decompress (Decode) Compress (Encode) Video Display Coded video ENCODER + DECODER = CODEC
  55. 55. − The H.263 standardization effort started in Nov 1993 (finalized in 1995) − The primary goal in the H.263 standard codec was coding of video at low or very low bit rates (less than 64 kbps) for applications such as mobile networks, the public switched telephone network (PSTN) and the narrowband Integrated Services Digital Network (ISDN). − Later on, the codec was found so attractive that higher-resolution pictures could also be coded at relatively low bit rates. − The standard recommends operation on five standard pictures of the CIF family, known as sub-QCIF, QCIF, CIF, 4CIF and 16CIF. − H.263+ (H.263 Ver. 2) was the first set of extensions to this family, intended for near-term standardisation of enhancements of H.263 video coding algorithms for real-time telecommunications. − Work on improving the encoding performance was an ongoing process under H.263++ (H.263 Ver. 3), and every now and then a new extension called an annex was added to the family. 55 H.263, H.263+ and H.263++ Standard
  56. 56. − The codec for long-term standardisation was called H.26L. − The H.26L project had the mandate from ITU-T to develop a very low bit rate (less than 64 kbit/s with emphasis on less than 24 kbit/s) video coding recommendation achieving • Better Video Quality • Lower Delay • Lower Complexity • Better Error Resilience − In 2001, MPEG-4 committee joined the project in investigating new video coding techniques and technologies as candidates for recommendation. − The joint team eventually recommended the Joint Video Team (JVT) Codec which is informally known as Advanced Video Coding (AVC). − The standard is formally known as H.264 by the ITU-T and MPEG-4 part 10 by ISO/IEC. 56 H.26L Standard
  57. 57. − H.263 is a combination of H.261 and MPEG − H.261 only accepts QCIF and CIF formats → Various picture formats such as sub-QCIF, 4CIF, etc. − No 1/2-pel motion estimation in H.261; instead it uses a spatial loop filter → Half-pel motion compensation − H.261 does not use median predictors for motion vectors but simply uses the motion vector in the MB to the left as predictor. − In H.263 there are four negotiable options − H.261 does not use a 3-D VLC for transform coefficient coding → 3-D VLC for transform coefficients − GOB headers are mandatory in H.261. − Quantizer changes at MB granularity require 5 bits in H.261 and only 2 bits in H.263. − No loop filter in H.263 − No macroblock addressing in H.263 (included in the MB header) 57 H.263 Improvements over H.261
  58. 58. Unrestricted Motion Vector Mode (Annex D) – MVs are allowed to point outside the picture (outside pixels obtained by boundary repetition extension) – Larger ranges: [-31.5, 31.5] instead of [-16, 15.5] Syntax-Based Arithmetic Coding Mode (Annex E) – Provides about 5% bit rate reduction and is rarely used Advanced Prediction Mode (Annex F) – Allows 4 motion vectors per MB, one for each 8x8 block – Overlapped block motion compensation for luminance – Allows MVs to point outside the picture – Reduces blocking artifacts and increases subjective picture quality. PB-Frames Mode (Annex G) – Doubles the frame rate without a significant increase in bit rate Usage: – The decoder signals the encoder which of the options it has the capability to decode. – If the encoder supports some of these options, it may enable them. 58 Negotiable Options in H.263
  59. 59. H.261 H.263 Demo: QCIF, 8 fps @ 28 Kb/s 59
  60. 60. Composed of a baseline plus four negotiable options 60 ITU-T Recommendation H.263 Baseline Codec Unrestricted/Extended Motion Vector Mode Advanced Prediction Mode PB Frames Mode Syntax-based Arithmetic Coding Mode
Always 12:11 pixel aspect ratio. 61 Frame Formats Format Y U,V: SQCIF 128x96 64x48; QCIF 176x144 88x72; CIF 352x288 176x144; 4CIF 704x576 352x288; 16CIF 1408x1152 704x576 [Diagram: 352x288 CIF picture with 12:11 pixels giving a 4:3 picture aspect ratio]
  62. 62. Picture & Macroblock Types − Two picture types: • Intra (I-frame) implies no temporal prediction is performed. • Inter (P-frame) may employ temporal prediction. − Macroblock (MB) types:  Intra & Inter MB types (even in P-frames). • Inter MBs have shorter symbols in P frames • Intra MBs have shorter symbols in I frames  Not coded MB types- MB data is copied from previous decoded frame. 62 H.263 Baseline
  63. 63. Motion Vectors − Motion vectors have 1/2 pixel granularity. − Reference frames must be interpolated by two. − MVs are not coded directly, but rather a median predictor is used: MV_X = median(MV_A, MV_B, MV_C), where A is the block to the left of the current block X, B is the block above, and C the block above-right. − The predictor residual is then coded using a VLC table. 63 H.263 Baseline
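The median prediction rule above can be sketched in a few lines. This is an illustrative sketch, not the normative pseudocode: `predict_mv` and `mv_delta` are hypothetical helper names, and vectors are (x, y) pairs in half-pel units.

```python
# Hedged sketch of H.263 baseline MV prediction: the predictor is the
# component-wise median of three neighbouring vectors, and only the
# residual (MVD) is entropy-coded.

def median(a, b, c):
    """Middle value of three numbers."""
    return sorted([a, b, c])[1]

def predict_mv(mv_a, mv_b, mv_c):
    """Component-wise median of candidates A (left), B (above), C (above-right)."""
    return (median(mv_a[0], mv_b[0], mv_c[0]),
            median(mv_a[1], mv_b[1], mv_c[1]))

def mv_delta(mv, pred):
    """Residual that would be sent in the bitstream (half-pel units)."""
    return (mv[0] - pred[0], mv[1] - pred[1])
```

The decoder inverts the process: it adds the decoded delta back to the same median predictor.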
  64. 64. Motion Vector Delta (MVD) Symbol Lengths 64 H.263 Baseline [Chart: VLC code length in bits (0 to 14) versus MVD absolute value, in bins 0, 0.5, 1, 1.5, 2, 2.5-3.5, 4.0-5.0, 5.5-12.0 and 12.5-15.5; smaller deltas get shorter codes]
  65. 65. Transform Coefficient Coding − Assign a variable length code according to three parameters (3-D VLC): 1) Length of the run of zeros preceding the current nonzero coefficient. 2) Amplitude of the current coefficient. 3) Indication of whether current coefficient is the last one in the block. − The most common are variable length coded (3-13 bits), the rest are coded with escape sequences (22 bits) 65 H.263 Baseline
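A sketch of how the three VLC parameters arise from a zigzag-ordered coefficient list. `run_level_last` is an illustrative helper name; the normative codeword tables themselves are not reproduced here, only the (LAST, RUN, LEVEL) events they index.

```python
# Turn a zigzag-ordered list of quantized coefficients into the
# (LAST, RUN, LEVEL) triplets that a 3-D VLC assigns codewords to:
#   RUN   = number of zeros preceding a nonzero coefficient,
#   LEVEL = the coefficient's value,
#   LAST  = 1 only on the final nonzero coefficient of the block.

def run_level_last(coeffs):
    events = []
    run = 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            events.append([0, run, c])   # LAST=0 provisionally
            run = 0
    if events:
        events[-1][0] = 1                # mark the final nonzero coefficient
    return [tuple(e) for e in events]
```

Because LAST is part of the symbol, no separate end-of-block codeword is needed.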
  66. 66. Quantization − H.263 uses a scalar quantizer with center clipping. − The quantizer varies from 2 to 62, by 2's. − Can be varied ±1, ±2 at macroblock boundaries (2 bits). − Can be varied 2-62 at row and picture boundaries (5 bits). 66 H.263 Baseline [Figure: quantizer transfer characteristic, input IN vs. output OUT, with a dead zone around zero and breakpoints at ±Q and ±2Q]
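The center-clipping (dead-zone) behaviour can be sketched as below. This is a hedged illustration, not the normative H.263 formula: `quantize`/`dequantize` are hypothetical names and `step` stands in for the quantizer step size.

```python
# Uniform scalar quantizer with a central dead zone ("center clipping"):
# inputs with magnitude below one step are clipped to zero, matching the
# flat region around the origin in the transfer curve.

def quantize(coeff, step):
    """Truncation toward zero creates the dead zone (-step, step)."""
    return int(coeff / step)

def dequantize(level, step):
    """Reconstruct at the centre of the chosen quantizer bin."""
    if level == 0:
        return 0
    sign = 1 if level > 0 else -1
    return sign * (abs(level) * step + step // 2)
```

Small coefficients (mostly noise at low bit rates) cost zero bits, at the price of a reconstruction offset for the rest.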
  67. 67. Bit Stream Syntax 67 H.263 Baseline Hierarchy of three layers. Picture Layer GOB* Layer MB Layer *A GOB is usually a row of macroblocks, except for frame sizes greater than CIF. Picture Hdr GOB Hdr MB MB ... GOB Hdr ...
  68. 68. Picture Layer Concepts − PSC - sequence of bits that can not be emulated anywhere else in the bit stream. − TR - 29.97 Hz counter indicating time reference for a picture. − PType - Denotes Intra, Inter-coded, etc. − P-Quant - Indicates which quantizer (2…62) is used initially for the picture. 68 H.263 Baseline Picture Start Code Temporal Reference Picture Type Picture Quant
  69. 69. GOB Layer Concepts, GOB Headers are Optional − GSC - Another unique start code (17 bits). − GOB Number - Indicates which GOB, counting vertically from the top (5 bits). − GOB Quant - Indicates which quantizer (2…62) is used for this GOB (5 bits). GOB can be decoded independently from the rest of the frame 69 H.263 Baseline GOB Start Code GOB Number GOB Quant
  70. 70. Macroblock Layer Concepts − COD - if set, indicates an empty Inter MB. − MB Type - indicates Inter, Intra, whether an MV is present, etc. − CBP - indicates which blocks, if any, are empty. − DQuant - indicates a quantizer parameter change of ±1 or ±2 (a step-size change of ±2 or ±4). − MV Deltas - are the MV prediction residuals. − Transform coefficients - are the 3-D VLCs for the coefficients. 70 H.263 Baseline Coded Flag MB Type Code Block Pattern DQuant MV Deltas Transform Coefficients [Figure: a macroblock comprises four 8x8 Y blocks plus one Cb and one Cr block]
  71. 71. Deblocking Filter 71 H.263 Options No Filter Deblocking Loop Filter
  72. 72. Unrestricted/Extended Motion Vector Mode (UMV Mode) 1. Motion Vectors Over Picture Boundaries − UMV dramatically improves motion estimation when moving objects are entering/exiting the frame or moving around the frame border. − Motion vectors are permitted to point outside the picture boundaries. – Non-existent pixels are created by replicating the edge pixels (when a pixel referred to by a motion vector lies outside the coded area, the last full pixel inside the coded picture area is used). – Motion vectors are restricted such that no pixel of a 16x16 (or 8x8) block has a horizontal or vertical distance of more than 15 pixels outside the picture. – Improves compression when there is movement across the edge of a picture boundary or when there is camera panning. 72 H.263 Options Target Frame N, Reference Frame N-1: edge pixels are repeated.
  73. 73. Unrestricted/Extended Motion Vector Mode 2. Extended MV Range − To extend the range of the motion vectors from [-16,15.5] to [-31.5,31.5] with some restrictions. − This better addresses high motion scenes. 73 H.263 Options 15.5 15.5 -16 -16 -16 -16 15.5 15.5 (31.5,31.5) Base motion vector range. Extended motion vector range, [-16,15.5] around MV predictor.
  74. 74. Advanced Prediction Mode − The motion compensation in the core H.263 is based on one motion vector per macroblock of 16×16 pixels, with half-pixel precision. − The macroblock motion vector is then differentially coded with predictions taken from three surrounding macroblocks, as indicated in Figure. 74 H.263 Options MV: Current Motion Vector MV1: Previous Motion Vector MV2: Above Motion Vector MV3: Above Right Motion Vector MV2 MV3 MV1 MV
  75. 75. Advanced Prediction Mode − The predictors are calculated separately for the horizontal and vertical components of the motion vectors MV1, MV2 and MV3. − For each component, the predictor is the median value of the three candidate predictors for that component: Px = median(MV1x, MV2x, MV3x), Py = median(MV1y, MV2y, MV3y). − The difference between the components of the current motion vector and their predictions is variable length coded. The vector differences are defined by MVDx = MVx − Px and MVDy = MVy − Py. 75 H.263 Options
  76. 76. Advanced Prediction Mode − In the special cases, at the borders of the current group of blocks (GOB) or picture, the following decision rules are applied in order: • The candidate predictor MV1 is set to zero if the corresponding macroblock is outside the picture at the left side . • The candidate predictors MV2 and MV3 are set to MV1 if the corresponding macroblocks are outside the picture at the top, or if the GOB header of the current GOB is nonempty. • The candidate predictor MV3 is set to zero if the corresponding macroblock is outside the picture at the right side. • When the corresponding macroblock is intra coded or was not coded, the candidate predictor is set to zero. − Like unrestricted motion vector mode, motion vectors can refer to the area outside the picture 76 H.263 Options MV: Current Motion Vector MV1: Previous Motion Vector MV2: Above Motion Vector MV3: Above Right Motion Vector MV2 MV3 MV1 MV Picture or GOB border MV2 MV3 (0,0) MV MV1 MV1 MV1 MV MV2 (0,0) MV1 MV
  77. 77. Advanced Prediction Mode − Includes motion vectors across picture boundaries from the previous mode. − Option of using four motion vectors for 8x8 blocks instead of one motion vector for 16x16 blocks as in baseline. • In H.263, one motion vector per macroblock is used except in the advanced prediction mode, where either one (four vectors with the same value) or four motion vectors per macroblock are employed. • When there are four motion vectors, the information for the first motion vector is transmitted as the code word motion vector data (MVD), and the information for the three additional vectors in the macroblock is transmitted as the code word MVD2–4. − Overlapped motion compensation to reduce blocking artifacts. 77 H.263 Options
  78. 78. Four motion vectors for 8x8 blocks instead of one motion vector for 16x16 blocks. − The vectors are obtained by adding predictors to the vector differences indicated by MVD and MVD2–4, as was the case when only one motion vector per macroblock was present. − The predictors are calculated separately for the horizontal and vertical components. − However, the candidate predictors MV1, MV2 and MV3 are redefined as indicated in Figure. − The neighbouring 8×8 blocks that form the candidates for the prediction of the motion vector MV take different forms depending on the position of the block in the macroblock. 78 H.263 Options • Redefinition of the candidate predictors MV1, MV2 and MV3 for each luminance block in a macroblock. • Motion vector prediction for 8x8 blocks used three surrounding block motion vectors MV2 MV1 MV3 MV MV2 MV1 MV3 MV MV2 MV1 MV3 MV MV2 MV1 MV3 MV
  79. 79. Overlapped Motion Compensation (OBMC) − In normal motion compensation, the current block is composed of • The predicted block from the previous frame (referenced by the motion vectors) • The residual data transmitted in the bit stream for the current block. − Overlapped motion compensation is only used for the 8×8 luminance blocks. − Each pixel in an 8×8 luminance prediction block is the weighted sum of three prediction values, divided by 8 (with rounding). 79 H.263 Options Reference frame Current MB
  80. 80. Overlapped Motion Compensation (OBMC) − To obtain the prediction values, three motion vectors are used. They are the motion vector of the current luminance block and two out of four remote vectors, as follows: • the motion vector of the block at the left or right side of the current luminance block; • the motion vector of the block above or below the current luminance block. 80 H.263 Options
  81. 81. Overlapped Motion Compensation (OBMC) − Let (m, n) be the column & row indices of an 8x8 pixel block in a frame. − Let (i, j) be the column & row indices of a pixel within an 8x8 block. − Let (x, y) be the column & row indices of a pixel within the entire frame: (x, y) = (8m + i, 8n + j). 81 H.263 Options [Figure: an 8x8 block B located by block indices (m, n), with pixel indices (i, j) inside the block and (x, y) in the frame]
  82. 82. Overlapped Motion Compensation (OBMC) • Let (MV0 x,MV0 y) denote the motion vectors for the current block. • Let (MV1 x,MV1 y) denote the motion vectors for the block above (below) if the current pixel is in the top (bottom) half of the current block. • Let (MV2 x,MV2 y) denote the motion vectors for the block to the left (right) if the current pixel is in the left (right) half of the current block. 82 H.263 Options MV0 MV1 MV1 MV2 MV2Current Block Right Block Below Block
  83. 83. Overlapped Motion Compensation (OBMC) • The creation of each interpolated (overlapped) pixel, P(x, y), in an 8x8 reference luminance block is governed by

P(x, y) = (q(x, y)·H0(i, j) + r(x, y)·H1(i, j) + s(x, y)·H2(i, j) + 4) / 8

where, with p(·, ·) the previous decoded picture,

q(x, y) = p(x + MV0x, y + MV0y)
r(x, y) = p(x + MV1x, y + MV1y)
s(x, y) = p(x + MV2x, y + MV2y)

83 H.263 Options

H0(i, j) =
4 5 5 5 5 5 5 4
5 5 5 5 5 5 5 5
5 5 6 6 6 6 5 5
5 5 6 6 6 6 5 5
5 5 6 6 6 6 5 5
5 5 6 6 6 6 5 5
5 5 5 5 5 5 5 5
4 5 5 5 5 5 5 4

H1(i, j) =
1 2 2 2 2 2 2 1
1 1 2 2 2 2 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 2 2 2 2 1 1
1 2 2 2 2 2 2 1

H2(i, j) = (H1(i, j))^T
  84. 84. Overlapped Motion Compensation (OBMC) 84 H.263 Options − H0(i, j): weighting values for prediction with the motion vector of the current block. − H1(i, j): weighting values for prediction with the motion vectors of the luminance blocks on top of (for the top half) or below (for the bottom half of) the current luminance block. − H2(i, j): weighting values for prediction with the motion vectors of the luminance blocks to the left of (for the left half) or right of (for the right half of) the current luminance block. − The neighbouring pixels closer to the pixels in the current block take greater weights.
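The per-pixel OBMC blend can be sketched as a single expression. This is an illustrative sketch, assuming q, r and s are the three motion-compensated prediction values for the same pixel (from the current, top/bottom-neighbour and left/right-neighbour vectors) and h0, h1, h2 the matching entries of the weighting matrices; `obmc_pixel` is a hypothetical name.

```python
# Weighted sum of the three predictions, divided by 8 with rounding
# (the "+ 4" implements round-half-up for the integer division by 8).

def obmc_pixel(q, r, s, h0, h1, h2):
    return (q * h0 + r * h1 + s * h2 + 4) // 8
```

Wherever the three weights sum to 8, a pixel on which all three predictions agree passes through unchanged, so OBMC only smooths where neighbouring vectors disagree.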
  85. 85. PB Frames Mode − A PB frame consists of two P- and B-pictures coded as one unit (coded together) (a P frame as in baseline, and a B frame) − The P-picture is predicted from the last decoded P-picture, and the B-picture is predicted from both the last decoded P-picture and the P-picture currently being decoded (The prediction process is illustrated in Figure). − Can increase frame rate 2X with only about 30% increase in bit rate (because of B-frame). − Since in the PB frames mode a unit of coding is a combined macroblock from P- and B-pictures, the composite macroblock comprises 12 blocks. − First the data for the six P-blocks are transmitted as the default H.263 mode, and then the data for the six B-blocks. − The composite macroblock may have various combinations of coding status for the P- and B-blocks, which are dictated by the MCBPC. 85 H.263 Options
  86. 86. Best match Forward Motion Vector Macroblock to be coded Previous reference picture Current B-picture Future reference picture Best match Backward Motion Vector 86 Forward Motion Vector and Backward Motion Vector, Recall Forward Prediction Backward Prediction
  87. 87. PB Frames Mode 87 H.263 Options Restriction: the backward predictor cannot extend outside the current MB position of the future frame. Picture 1 P Frame (decoded P-picture) Picture 2 B Frame Picture 3 P Frame (current P-picture) V 1/2 -V 1/2 PB Forward Motion Vector Backward Motion Vector Forward Prediction Backward Prediction Forward Prediction
  88. 88. PB Frames Mode − The P-picture is predicted from the previous decoded P-picture. − The B-picture is predicted both from the previous decoded P-picture and the P-picture currently being decoded. − Assume MVD is the delta vector component given by the motion vector data of a B-picture (MVDB) and corresponds to the P-vector component MV; TRD is the temporal distance between the two P-pictures and TRB the distance from the previous P-picture to the B-picture. Then the forward vector MVF and backward vector MVB are derived as:

MVF = (TRB × MV) / TRD + MVD
MVB = ((TRB − TRD) × MV) / TRD    if MVD is equal to 0
MVB = MVF − MV                    if MVD is not equal to 0

− Forward and bidirectional prediction in a B-block: part of the block is predicted bidirectionally (BID) and the remaining part uses only forward prediction (FWD). 88 H.263 Options [Figure: P-B-P picture triple with TRB and TRD; B-block split into BID and FWD regions within the P-macroblock]
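The vector derivation above can be sketched directly. This is a simplified illustration (`pb_vectors` is a hypothetical name, and the spec's exact integer-division rules are not reproduced).

```python
# Derive the B-picture's forward/backward vectors from the P-macroblock
# vector MV by temporal scaling, optionally corrected by a small delta MVD.

def pb_vectors(mv, tr_b, tr_d, mvd=0):
    mv_f = tr_b * mv / tr_d + mvd
    if mvd == 0:
        mv_b = (tr_b - tr_d) * mv / tr_d   # pure scaled (backward) vector
    else:
        mv_b = mv_f - mv                   # keep forward/backward consistent
    return mv_f, mv_b
```

With the B-picture halfway between the two P-pictures (TRB = 1, TRD = 2) and MVD = 0, the forward and backward vectors are simply +MV/2 and -MV/2, as in the figure two slides earlier.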
  89. 89. Improved PB frames (BPB) − This mode is an improved version of the optional PB frames mode of H.263 [22-M]. − Most parts of this mode are similar to the PB frames mode, the main difference being that in the improved PB frames mode, the B part of the composite PB-macroblock, known as BPB-macroblock, may have a separate motion vector for forward and backward prediction. − This is in addition to the bidirectional prediction mode that is also used in the normal PB frames mode. − Hence, there are three different ways of coding a BPB-macroblock, and the coding type is signalled by the MVDB parameter. • Bidirectional prediction • Forward prediction • Backward prediction 89 H.263 Options Picture 1 P Frame (decoded P-picture) Picture 2 B Frame Picture 3 P Frame (current P-picture) V 1/2 -V 1/2 PB Forward Motion Vector Backward Motion Vector Forward Prediction Backward Prediction Forward Prediction
  90. 90. Syntax-based Arithmetic Coding Mode − In encoding, a symbol is encoded by a specific array of integers (model), selected based on the syntax, by calling encode_a_symbol(index, cumul_freq). − A FIFO buffers the bits from the arithmetic encoder. − In decoding, a symbol is decoded by a specific model, selected based on the syntax, by calling decode_a_symbol(cumul_freq). − The syntax of the top three layers (Picture, Group of Blocks and Macroblock) remains the same, but that of the block layer is modified. 90 H.263 Options
  91. 91. Syntax-based Arithmetic Coding Mode − In this mode, all the variable length coding and decoding of baseline H.263 is replaced with arithmetic coding/decoding. − This removes the restriction that each symbol must be represented by an integer number of bits, thus improving compression efficiency. − Experiments indicate that compression can be improved by up to 10% over variable length coding/decoding. − The complexity of arithmetic coding is higher than that of variable length coding, however. 91 H.263 Options
  92. 92. 92 Video Source Decompress (Decode) Compress (Encode) Video Display Coded video ENCODER + DECODER = CODEC
  93. 93. Enhance H.263 with additional options (Draft 20, Sept. ’97) Coding efficiency • Advanced intra coding mode • Deblocking filter mode • Improved PB-frames mode • Reference picture resampling mode • Alternative inter VLC mode • Modified quantization mode Error robustness • Slice structured mode • Reference picture selection mode • Independently segmented decoding mode Enhanced Communication • Temporal, SNR, and spatial scalability mode • Reduced-resolution update mode 93 H.263 Ver. 2 (H.263+)
  94. 94. − H.263+ was standardized in January, 1998. − The expected enhancements of H.263+ over H.263 fall into two basic categories: • enhancing quality within existing applications; • broadening the current range of applications. − Adds negotiable options and features while still retaining a backwards compatibility mode. − A few examples of the enhancements are as follows: • improving perceptual compression efficiency; • reducing video coding delay; • providing greater resilience to bit errors and data losses. 94 H.263 Ver. 2 (H.263+)
  95. 95. 95 H.263 Ver. 2 (H.263+)
  96. 96. Annex I: Advanced Intra Coding mode Annex J: Deblocking Filter mode Annex K: Slice Structured mode Annex L: Supplemental Enhancement Information Specification Annex M: Improved PB Frames mode Annex N: Reference Picture Selection mode Annex O: Temporal, SNR, and Spatial Scalability mode Annex P: Reference Picture Resampling Annex Q: Reduced-Resolution Update mode Annex R: Independent Segment Decoding mode Annex S: Alternative Inter VLC mode Annex T: Modified Quantization mode 96 H.263+ (v2) Optional Tools
  97. 97. − In addition to the multiples of CIF, H.263+ permits • any frame size from 4x4 to 2048x1152 pixels in increments of 4. − Besides the 12:11 pixel aspect ratio (PAR), H.263+ supports • Square (1:1) • 525-line 4:3 picture (10:11) • CIF for 16:9 picture (16:11) • 525-line for 16:9 picture (40:33) • and other arbitrary ratios − In addition to picture clock frequencies of 29.97 Hz (NTSC), H.263+ supports • 25 Hz (PAL) • 30 Hz • and other arbitrary frequencies 97 Arbitrary Frame Size, Pixel Aspect Ratio, Clock Frequency
  98. 98. H.263v2 specified a set of recommended modes in an informative appendix (Appendix II, since deprecated); this appendix was later obsoleted by the creation of the normative Annex X. 98 H.263 Ver. 2 (H.263+)

Mode                                                            Level 1   Level 2   Level 3
Advanced INTRA Coding                                           Yes       Yes       Yes
Deblocking Filter                                               Yes       Yes       Yes
Supplemental Enhancement Information (Full-Frame Freeze Only)   Yes       Yes       Yes
Modified Quantization                                           Yes       Yes       Yes
Unrestricted Motion Vectors                                     No        Yes       Yes
Slice Structured Mode                                           No        Yes       Yes
Reference Picture Resampling (Implicit Factor-of-4 Mode Only)   No        Yes       Yes
Advanced Prediction                                             No        No        Yes
Improved PB-frames                                              No        No        Yes
Independent Segment Decoding                                    No        No        Yes
Alternate INTER VLC                                             No        No        Yes
  99. 99. − In this mode, either the DC coefficient only, the first row of coefficients, or the first column of coefficients is predicted from neighbouring blocks (DC only, vertical DC & AC, horizontal DC & AC). − The prediction mode is determined on a MB-by-MB basis. − Essentially DPCM of intra DCT coefficients. − Can save up to 40% of the bits on intra frames. − A separate VLC table for intra DCT. − Modified quantization for intra coefficients. − Spatial prediction of DCT coefficients. 99 Advanced Intra Coding Mode [Figure: three neighbouring 8x8 blocks A, B and C of reconstructed DCT coefficients Rec A(u, v), Rec B(u, v), Rec C(u, v); u, v = 0..7]

Index   Prediction mode            Code
0       0 (DC only)                0
1       1 (Vertical DC & AC)       10
2       2 (Horizontal DC & AC)     11
  100. 100. − At very low bit rates, the block of pixels is mainly made of low-frequency DCT coefficients. − In these areas, when there is a significant difference between the DC levels of the adjacent blocks, they appear as block borders. − The overlapped block matching motion compensation to some extent reduces these blocking artefacts. − For further reduction in the blockiness, the H.263 specification recommends deblocking of the picture through the block edge filter. − The Deblocking Filter mode improves subjective quality by removing blocking and mosquito artifacts common to block-based video coding at low bit rates. 100 Deblocking Filter Mode
  101. 101. − Deblocking Filter Mode introduces a deblocking filter inside the coding loop. − Unlike in post-filtering, predicted pictures are computed based on filtered versions of the previous ones. − Like the Advanced Prediction mode of H.263, the Deblocking Filter mode involves using four motion vectors per macroblock. − The filtering is performed on 8×8 block edges and assumes that 8×8 DCT is used and the motion vectors may have either 8×8 or 16×16 resolution. − Filtering is equally applied to both luminance and chrominance data. − No filtering is permitted on the frame and slice edges. 101 Deblocking Filter Mode
  102. 102. − Consider four pixels A, B, C and D on a line (horizontal or vertical) of the reconstructed picture, where A and B belong to block 1 and C and D belong to a neighbouring block 2, which is either to the right of or below block 1. − It filters pixels along block boundaries while preserving edges in the image content. − The filter is in the coding loop, which means it filters the decoded reference frame used for motion compensation. − It can be used in conjunction with a post-filter to further reduce coding artifacts. 102 Deblocking Filter Mode [Figures: filtered pixels A, B, C, D across a horizontal block edge and across a vertical block edge; A and B in block 1, C and D in block 2, separated by the block boundary]
  103. 103. Deblocking Filter − To turn the filter on for a particular edge, either block 1 or block 2 should be an intra or a coded macroblock with the code COD = 0. − A, B, C and D are replaced by new values A1, B1, C1 and D1, based on a set of non-linear equations. − The strength of the filter is proportional to the quantization strength. − The sign of d1 is the same as the sign of d. 103 H.263 Options
  104. 104. Deblocking Filter − Figure shows how the value of d1 changes with d and the quantiser parameter QP, to make sure that only block edges which may suffer from blocking artefacts are filtered and not the natural edges. − As a result of this modification, only the pixels on the edge are filtered so that their luminance changes are less than the quantisation parameter, QP. 104 H.263 Options d1 as a function of d
  105. 105. − To turn the filter on for a particular edge, either block 1 or block 2 should be an intra or a coded macroblock with the code COD = 0. − A, B, C and D are replaced by new values A1, B1, C1 and D1, based on a set of non-linear equations. − The strength of the filter is proportional to the quantization strength:

d1 = Filter[(A − 4B + 4C − D) / 8, Strength(QUANT)]
d2 = clipd1((A − D) / 4, d1 / 2)
B1 = clip(B + d1)
C1 = clip(C − d1)
A1 = A − d2
D1 = D + d2

Filter(x, Strength) = SIGN(x) × (MAX(0, abs(x) − MAX(0, 2 × (abs(x) − Strength))))

105 Deblocking Filter Mode
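A direct sketch of this edge filter, under the usual assumptions: `strength` stands for the Strength(QUANT) lookup, clip() is the 8-bit pixel clamp, and clipd1() limits a value to ±|d1|. Function names are illustrative.

```python
# Deblocking of four pixels A, B, C, D straddling a block edge:
# the up-down ramp passes small (blocking) differences and suppresses
# large ones, so true image edges survive.

def sign(x):
    return (x > 0) - (x < 0)

def ramp(x, strength):
    """Filter(x, Strength): up-down ramp nonlinearity."""
    return sign(x) * max(0, abs(x) - max(0, 2 * (abs(x) - strength)))

def clip(v, lo=0, hi=255):
    return max(lo, min(hi, v))

def clipd1(v, lim):
    lim = abs(lim)
    return max(-lim, min(lim, v))

def deblock(a, b, c, d, strength):
    d1 = ramp((a - 4 * b + 4 * c - d) / 8.0, strength)
    d2 = clipd1((a - d) / 4.0, d1 / 2.0)
    return a - d2, clip(b + d1), clip(c - d1), d + d2
```

Note how a difference well beyond the strength threshold is left untouched (the ramp returns 0), which is exactly the "don't filter natural edges" behaviour described above.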
  106. 106. − The Deblocking Filter mode improves subjective quality by removing blocking and mosquito artifacts common to block-based video coding at low bit rates. − Many applications make use of a post filter to reduce these artifacts. − The post-filtering is useful in error-free and error-prone environments. − This post filter is usually present at the decoder and is outside the coding loop. Therefore, prediction is not based on the post filtered version of the picture. − The one-dimensional version of the filter will be described. − To obtain a two-dimensional effect, the filter is first used in the horizontal direction and then in the vertical direction. − The post filter is applied to all pixels within the picture. − Edge pixels should be repeated when the filter is applied at picture boundaries. 106 Post-Filter
  107. 107. − The pixels A, B, C, D, E, F, G, (H) are aligned horizontally or vertically. − The post-filter strength is proportional to the quantization: Strength(QUANT) − The Strength1 and Strength2 may be different to better adapt the total filter strength to QUANT. − The Strength1, 2 may be related to QUANT for the macroblock where D belongs or to some average value of QUANT over parts of the picture or over the whole picture. 107 Post-Filter 𝑫𝟏 = 𝑫 + 𝑭𝒊𝒍𝒕𝒆𝒓 𝑨 + 𝑩 + 𝑪 + 𝑬 + 𝑭 + 𝑮 − 𝟔𝑫 𝟖 , 𝑺𝒕𝒓𝒆𝒏𝒈𝒕𝒉𝟏 when filtering in the first direction 𝑫𝟏 = 𝑫 + 𝑭𝒊𝒍𝒕𝒆𝒓 𝑨 + 𝑩 + 𝑪 + 𝑬 + 𝑭 + 𝑮 − 𝟔𝑫 𝟖 , 𝑺𝒕𝒓𝒆𝒏𝒈𝒕𝒉𝟐 when filtering in the second direction The relation between Strength1, 2 and QUANT
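The 1-D post filter can be sketched in the same style as the loop filter. This is an illustration, not the normative code: `postfilter_pixel` is a hypothetical name, the seven arguments are the aligned pixels A..G with D the pixel being filtered, and the same up-down-ramp Filter() is reused.

```python
# One pass of the post filter for a single pixel D, given its six
# aligned neighbours; run once horizontally (Strength1), then once
# vertically (Strength2) for the 2-D effect.

def sign(x):
    return (x > 0) - (x < 0)

def ramp(x, strength):
    """Same up-down ramp as the loop filter's Filter(x, Strength)."""
    return sign(x) * max(0, abs(x) - max(0, 2 * (abs(x) - strength)))

def postfilter_pixel(a, b, c, d, e, f, g, strength):
    return d + ramp((a + b + c + e + f + g - 6 * d) / 8.0, strength)
```

A pixel protruding slightly from a flat neighbourhood is pulled toward the local mean, while a pixel on a strong edge is left alone.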
  108. 108. 108 Deblocking Loop Filter Demo No Filter Deblocking Loop Filter
  109. 109. 109 Deblocking Loop Filter and Post Filter Demo Deblocking Loop Filter and Post FilterNo Filter
  110. 110. 110 Deblocking Loop Filter and Post Filter Demo No Filter Loop Filter Only Deblocking Loop Filter and Post Filter
  111. 111. Deblocking Loop Filter and Post Filter Demo (sequence: Foreman, 24 kbps, 10 fps; TMN-8: Video Codec Test Model, Near-Term, Version 8). Panels: No Filter | Deblocking Loop Filter Only | TMN-8 Post Filter Only | Deblocking Loop Filter and TMN-8 Post Filter. − The deblocking filter alone reduces blocking artifacts significantly, mainly due to the use of four motion vectors per macroblock. − The filtering process provides smoothing, further improving subjective quality. − The effects of the post filter are less noticeable, and adding the post filter may actually result in blurriness. − Therefore, the use of the deblocking filter alone is usually sufficient.
  112. 112. − Allows insertion of resynchronization markers at macroblock boundaries to improve network packetization and reduce overhead. More on this later • Allows more flexible tiling of video frames into independently decodable areas to support “view ports”, a.k.a. “local decode.” • Improves error resiliency by reducing intra-frame dependence. • Permits out-of-order transmission to reduce latency. 112 Slice Structured Mode
  113. 113. 113 Slice Structured Mode Slice Boundaries No INTRA or MV Prediction Across Slice Boundaries. Slices Start And End on Macroblock Boundaries. Slice Boundaries No INTRA or MV Prediction Across Slice Boundaries. Slice Sizes Remain Fixed Between INTRA Frames.
  114. 114. Backwards compatible with H.263 but permits indication of supplemental information for features such as: • Partial and full picture freeze requests • Partial and full picture snapshot tags • Video segment start and end tags for off-line storage • Progressive refinement segment start and end tags • Chroma keying info for transparency • The Chroma Keying Information Function (CKIF) indicates that the "chroma keying" technique is used to represent "transparent" and "semi-transparent" pixels in the decoded video pictures. • When being presented on the display, "transparent" pixels are not displayed. • Instead, a background picture which is either a prior reference picture or is an externally controlled picture is revealed. • Semitransparent pixels are displayed by blending the pixel value in the current picture with the corresponding value in the background picture. 114 Supplemental Enhancement Information
  115. 115. − Resampling of a temporally previous reference picture prior to its use as a reference for encoding, enabling global motion compensation, predictive dynamic resolution conversion, predictive picture area alteration and registration, and special-effect warping; − Allows frame size changes of a compressed video sequence without inserting an Intra frame (No Intra frame required when changing video frame sizes). − Permits the warping of the reference frame via affine transformations to address special effects such as zoom, rotation, translation. − Can be used for emergency rate control by dropping frame sizes adaptively when bit rate get too high. 115 Reference Picture Resampling
  116. 116. − Specifies generalized method applied to previous reference picture to generate warped picture for use in predicting current picture − Special case of factor of 4 resampling, which converts horizontal and vertical size by factor of 2 (upsampling) or ½(downsampling) in each direction. 116 Reference Picture Resampling Pixel positions of the reference picture Pixel positions of the downsamped predicted picture a=(A+B+C+D+1+RCRPR)/4 . . Downsampling a A B C D Pixel positions of the reference picture Pixel positions of the upsamped predicted picture a=(9A+3B+3C+D+7+RCRPR)/16 b=(3A+9B+C+3D+7+RCRPR)/16 c=(3A+B+9C+3D+7+RCRPR)/16 d=(A+3B+3C+9D+7+RCRPR)/16 a c b d A B C D Upsampling
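The factor-of-4 interpolation formulas on this slide translate directly to code. `down2` and `up2` are illustrative names; A..D are the four surrounding reference pixels and `rcrpr` the RCRPR rounding-control bit, exactly as in the formulas above.

```python
# Factor-of-4 reference picture resampling: each dimension is halved
# (downsampling) or doubled (upsampling) with simple weighted averages.

def down2(a, b, c, d, rcrpr=0):
    """One output pixel of the 2:1 downsampled prediction."""
    return (a + b + c + d + 1 + rcrpr) // 4

def up2(a, b, c, d, rcrpr=0):
    """The four 1:2-upsampled output pixels between reference pixels A..D."""
    pa = (9 * a + 3 * b + 3 * c + d + 7 + rcrpr) // 16
    pb = (3 * a + 9 * b + c + 3 * d + 7 + rcrpr) // 16
    pc = (3 * a + b + 9 * c + 3 * d + 7 + rcrpr) // 16
    pd = (a + 3 * b + 3 * c + 9 * d + 7 + rcrpr) // 16
    return pa, pb, pc, pd
```

Both directions preserve flat areas exactly, and the 9/3/3/1 weights are the usual quarter-pel bilinear positions.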
  117. 117. − Specify arbitrary warping parameters via displacement vectors from corners. − For source format changes − Global motion compensation − Special-effect warping 117 Reference Picture Resampling with Warping MV00 MV10 MV11 MV01
  118. 118. No Intra frame required when changing video frame sizes 118 Reference Picture Resampling Factor of 4 Size Change P P P P P
  119. 119. − Allows more flexibility in adapting quantizers on a macroblock-by-macroblock basis, by enabling large quantizer changes through the use of escape codes. − A mode which improves the control of the bit rate by changing the method for controlling the quantizer step size on a macroblock basis. − Reduces the quantizer step size for chrominance blocks, compared to luminance blocks, to reduce the prevalence of chrominance artifacts. − Modifies the allowable DCT coefficient range to avoid clipping, yet disallows illegal coefficient/quantizer combinations. − Increases the range of representable DCT coefficient values for use with small quantizer step sizes, and increases error detection performance and reduces decoding complexity by prohibiting certain unreasonable coefficient representations. 119 Modified Quantization (MQ)
  120. 120. − Allows modification of the quantizer at the macroblock layer to any value, not limited to +1, −1, +2 and −2. • DQUANT uses 2 bits (starting with "1") to specify small changes. − It uses 6 bits (starting with "0") to specify other changes. − Codeword: 0xxxxx, where the last 5 bits specify the new QUANT value. 120 Modified Quantization (MQ)

Change of QUANT:
Prior QUANT   DQUANT = 10   DQUANT = 11
1             +2            +1
2-10          -1            +1
11-20         -2            +2
21-28         -3            +3
29            -3            +2
30            -3            +1
31            -3            -5
  121. 121. − Enhance chrominance quality by a finer quantizer. − Improve picture quality by extending the range of representable quantized DCT coefficients, not limited to [-127, +127]. 121 Modified Quantization (MQ)

Range of QUANT   Value of QUANT_C
1-6              QUANT_C = QUANT
7-9              QUANT_C = QUANT - 1
10-11            9
12-13            10
14-15            11
16-18            12
19-21            13
22-26            14
27-31            15
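The chrominance quantizer table above is naturally expressed as a lookup function. `quant_c` is an illustrative name; the values are taken verbatim from the table.

```python
# Map the luminance quantizer QUANT (1..31) to the finer chrominance
# quantizer QUANT_C, following the table: identity for small QUANT,
# then progressively flatter so chroma is quantized less coarsely.

def quant_c(quant):
    if 1 <= quant <= 6:
        return quant
    if 7 <= quant <= 9:
        return quant - 1
    for hi, val in ((11, 9), (13, 10), (15, 11), (18, 12),
                    (21, 13), (26, 14), (31, 15)):
        if quant <= hi:
            return val
    raise ValueError("QUANT out of range 1..31")
```

At large QUANT the chroma quantizer saturates at 15, roughly half the luma value, which is what suppresses the chrominance artifacts mentioned on the previous slide.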
  122. 122. − Used for bit rate control by reducing the size of the residual frame adaptively when the bit rate gets too high. − A mode which allows an encoder to maintain a high frame rate during heavy motion by encoding a low-resolution update to a higher-resolution picture, while maintaining high resolution in stationary areas. 122 Reduced-Resolution Update (RRU) [Figure: RRU decoder flow - bitstream → macroblock-layer decoding → block-layer decoding → coefficient decoding → inverse transform of an 8x8 coefficient block → upsampling to a 16x16 reconstructed prediction-error block; the pseudo-vector is scaled up to the reconstructed vector, motion compensation yields a 16x16 prediction block, and the sum gives the 16x16 reconstructed block]
  123. 123. − A scalable bit stream consists of layers representing different levels of video quality. − Everything can be discarded except for the base layer and still have reasonable video. − If bandwidth permits, one or more enhancement layers can also be decoded which refines the base layer in one of three ways: temporal, SNR, or spatial 123 Scalability Mode Enh. Layer 1 Enhancement Layer 3 Enhancement Layer 4 Base Layer Enhancement Layer 2 H.263+Encoder 40kb/s 20kb/s 90kb/s 200kb/s 320kb/s Layered Video Bitstreams
  124. 124. − Scalability is typically used when one bit stream must support several different transmission bandwidths simultaneously, or some process downstream needs to change the data rate unbeknownst to the encoder. 124 Scalability Mode Example: Conferencing Multipoint Control Unit
  125. 125. 125 384 kb/s 384 kb/s 128 kb/s 28.8 kb/s Scalability Mode Layered Video Bit Streams in Multipoint Conferencing
  126. 126. 126 Scalability Mode Higher Frame Rate! Base Layer + B Frames Better Spatial Quality! Base Layer + SNR Layer SNR Enhancement More Spatial Resolution!! Base Layer + Spatial Layer Spatial Enhancement Temporal Enhancement
  127. 127. SNR Scalability EI EP EP Enhancement Layer I P P Base Layer Spatial Scalability Base Layer I P P EI EP EPEnhancement Layer Temporal Scalability B2 B4I1 P3 P5 Scalability Mode Low Temporal Resolution High Temporal Resolution 127
  128. 128. Two or more frame rates can be supported by the same bit stream. − This is achieved using bidirectionally predicted pictures, or B-pictures. − The B-frames can be discarded (to lower the frame rate) and the bit stream remains usable. − These B-pictures differ from the B-picture part of PB-frames in that they are separate entities in the bitstream. − These B-pictures are not syntactically intermixed with a subsequent P-picture or its enhancement part EP. − B-pictures and the B part of PB-frames are not used as reference pictures for the prediction of any other pictures. This property allows B-pictures to be discarded if necessary without adversely affecting any subsequent pictures, thus providing temporal scalability. − Since H.263 is normally used for low-frame-rate, low-bit-rate applications (e.g. mobile), the separation between the base-layer I- and P-pictures is relatively large, and there is normally only one B-picture between them. 128 Temporal Scalability I or P B B P ...... • I and P frames form the base layer • B-frames form the temporal enhancement layer • B-frames can be discarded Temporal Scalability Demonstration • layer 0, 3.25 fps, P-frames • layer 1, 15 fps, B-frames
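The disposability of B-pictures can be illustrated with a short sketch (a hypothetical frame representation, not H.263 syntax): a bitstream handler drops the B-pictures and the remaining base layer still decodes, at a lower frame rate, because no other picture references them.

```python
# Sketch: temporal scalability by discarding B-pictures. Because B-pictures
# are never used as references, dropping them cannot break any later picture.

def drop_b_pictures(frames):
    """Keep only base-layer (I/P) pictures; discard disposable B-pictures."""
    return [f for f in frames if f["type"] != "B"]

# Display order with one B-picture between base-layer pictures, as in H.263.
sequence = [
    {"type": "I", "t": 0}, {"type": "B", "t": 1}, {"type": "P", "t": 2},
    {"type": "B", "t": 3}, {"type": "P", "t": 4},
]

base_layer = drop_b_pictures(sequence)
# Only timestamps 0, 2, 4 remain: the frame rate is halved, but every
# remaining picture is still decodable.
```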
  129. 129. The difference between the input picture and the lower-quality base-layer picture is coded. − The picture in the base layer which is used for the prediction of the enhancement-layer pictures may be an I-picture, a P-picture, or the P part of a PB-frame, but not a B-picture or the B part of a PB-frame. − In the enhancement layer two types of picture are identified: EI (enhancement I-picture) and EP (enhancement P-picture). − If prediction is only formed from the base layer, the enhancement-layer picture is referred to as an EI-picture. − In this case, the base-layer picture can be an I- or a P-picture (or the P part of a PB-frame). − For both EI- and EP-pictures, prediction from the reference layer uses no motion vectors (no inter prediction from the base layer). − However, an EP-picture may also be predictively coded with respect to its previous reconstructed picture in the same layer, called forward prediction. 129 SNR Scalability Base Layer (15 kbit/s) Enhancement Layer (40 kbit/s) EI EP EP PPI EI EP P P I - Intracoded or Key Frame P - Predicted Frame EI - Enhancement layer key frame (enhancement I-picture) EP - Enhancement layer predicted frame (enhancement P-picture) SNR Scalability Demonstration • layer 0, 10 fps, 40 kbps • layer 1, 10 fps, 400 kbps
  130. 130. − The arrangement of the enhancement-layer pictures in spatial scalability is similar to that of SNR scalability. − The only difference is that before the picture in the reference layer is used to predict the picture in the spatial enhancement layer, it is upsampled by a factor of 2 either horizontally or vertically (one-dimensional spatial scalability), or both horizontally and vertically (two-dimensional spatial scalability). − If the enhancement layer is 2X the size of the base layer in each dimension, the base layer is interpolated (by 2X) before predicting the spatial enhancement layer. 130 Spatial Scalability Base Layer Enhancement Layer EI EP EP PPI EI EP P P I - Intracoded or Key Frame P - Predicted Frame EI - Enhancement layer key frame (enhancement I-picture) EP - Enhancement layer predicted frame (enhancement P-picture) Spatial Scalability Demonstration • layer 0, QCIF, 10 fps, 60 kbps • layer 1, CIF, 10 fps, 300 kbps
  131. 131. − It will increase the robustness of H.263 against channel errors. − It is possible for B-pictures to be temporally inserted not only between the base-layer pictures of type I, P and PB, but also between the enhancement picture types EI and EP, whether these are SNR or spatial enhancement pictures. − It is also possible to have more than one SNR or spatial enhancement layer in conjunction with the base layer. Thus, a multilayer scalable bitstream can be a combination of SNR layers, spatial layers and B-pictures. − As with the two-layer case, B-pictures may occur in any layer. − However, any picture in an enhancement layer which is temporally simultaneous with a B-picture in its reference layer must be a B-picture or the B-picture part of a PB-frame. This is to preserve the disposable nature of B-pictures. − Note, however, that B-pictures may occur in any layers that have no corresponding picture in the lower layers. This allows an encoder to send enhancement video with a higher picture rate than the lower layers. 131 Hybrid or Multilayer Scalability EP E I P EI E P P B E P P EI E I I Base Layer Enhancement Layer1 Enhancement Layer2 E P P B I - Intracoded or Key Frame P - Predicted Frame EI - Enhancement layer key frame (enhancement I-picture) EP - Enhancement layer predicted frame (enhancement P-picture) Scalability Demonstration • SNR/Spatial Scalability, 10 fps • layer 0, 88x72, ~5 kbit/s, layer 1, 176x144, ~15 • layer 2, 176x144, ~40, layer 3, 352x288, ~80 • layer 4, 352x288, ~200
  132. 132. Pictures, which are dependent on other pictures, are located in the bitstream after the pictures on which they depend. − The bitstream syntax order is specified such that for reference pictures (i.e. pictures having types I, P, EI, EP or the P part of PB) the following two rules shall be obeyed: 1. All reference pictures with the same temporal reference appear in the bitstream in increasing enhancement layer order. This is because each lower layer reference picture is needed to decode the next higher layer reference picture. 2. All temporally simultaneous reference pictures as discussed in item 1 appear in the bitstream prior to any B-pictures for which any of these reference pictures is the first temporally subsequent reference picture in the reference layer of the B-picture. This is done to reduce the delay of decoding all reference pictures, which may be needed as references for B-pictures. 132 Transmission Order of Pictures Enhancement Layer 2 Base Layer Enhancement Layer 1 4 3 2 1 8 7 6 5 EI EP P I B B B B Enhancement Layer 2 Base Layer Enhancement Layer 1 4 3 2 1 5 8 7 6 EI EP P I B B B B Two Allowable Picture Transmission Orders
  133. 133. − Then, the B-pictures with earlier temporal references follow (temporally ordered within each enhancement layer). − The bitstream location of each B-picture complies with the following rules: • Be after that of its first temporally subsequent reference picture in the reference layer. This is because the decoding of the B-picture generally depends on the prior decoding of that reference picture. • Be after that of all reference pictures that are temporally simultaneous with the first temporally subsequent reference picture in the reference layer. This is to reduce the delay of decoding all reference pictures, which may be needed as references for B-pictures. • Precede the location of any additional temporally subsequent pictures other than B-pictures in its reference layer. Otherwise, it would increase the picture storage memory requirement for the reference-layer pictures. • Be after that of all EI- and EP-pictures that are temporally simultaneous with the first temporally subsequent reference picture. • Precede the location of all temporally subsequent pictures within the same enhancement layer. Otherwise, it would introduce needless delay and increase picture storage memory requirements for the enhancement layer. 133 Transmission Order of Pictures Enhancement Layer 2 Base Layer Enhancement Layer 1 4 3 2 1 8 7 6 5 EI EP P I B B B B Enhancement Layer 2 Base Layer Enhancement Layer 1 4 3 2 1 5 8 7 6 EI EP P I B B B B Two Allowable Picture Transmission Orders
  134. 134. I B Base Layer P EI EP Enhancement Layer 1 EP SNR Scalability Spatial Scalability Enhancement Layer 2 EI EP EI Temporal Scalability B B Temporal Scalability Enhancement Layer 3 Hybrid or Multilayer Scalability Example 134
  135. 135. I PBBase Layer EI EPEnhancement Layer 1 SNR Scalability Enhancement Layer 3 B B Temporal Scalability Enhancement Layer 2 EI EI EPSpatial Scalability Multilayer Transmission Order Example I B P EI EP EP EI EP EI Temporal Scalability B B Temporal Scalability EP 135
  136. 136. Method for interpolating pixels for 2-D scalability 136 Interpolation for Spatial Scalability a b c d A B C D Original pixel positions Interpolated pixel positions a =(9A+3B+3C+D+8)/16 b =(3A+9B+C+3D+8)/16 c =(3A+B+9C+3D+8)/16 d =(A+3B+3C+9D+8)/16 Interpolation Formulation (Filtering)
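The 2-D interpolation above can be checked with a few lines of integer arithmetic (the "+8" term rounds before the division by 16; A, B, C, D are the four surrounding original pixels, a, b, c, d the interpolated positions):

```python
# The slide's 2-D interpolation filter as integer arithmetic.
def interpolate_2d(A, B, C, D):
    a = (9 * A + 3 * B + 3 * C + D + 8) // 16
    b = (3 * A + 9 * B + C + 3 * D + 8) // 16
    c = (3 * A + B + 9 * C + 3 * D + 8) // 16
    d = (A + 3 * B + 3 * C + 9 * D + 8) // 16
    return a, b, c, d

# A flat area interpolates to the same value:
assert interpolate_2d(10, 10, 10, 10) == (10, 10, 10, 10)
# Each interpolated pixel is pulled toward its nearest original pixel:
assert interpolate_2d(16, 0, 0, 0) == (9, 3, 3, 1)
```

Note that the weights 9/3/3/1 sum to 16, so the filter preserves the mean level of the base-layer picture.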
  137. 137. Method for 2-D interpolation at boundaries 137 Interpolation for Spatial Scalability Original Pixel Positions Interpolated Pixel Positions a = A b = (3*A +B + 2) / 4 c = (A + 3*B + 2) / 4 d = (3*A + C + 2) / 4 e = (A +3*C + 2) / 4 Picture Boundary a b c d e A B C D Picture Boundary
  138. 138. − Improved PB-frames • Improves upon the previous PB-frame mode by permitting forward prediction of the “B” frame with a new vector. − Reference Picture Selection (RPS) • A lower-latency method for dealing with error-prone environments: some type of back-channel indicates to the encoder when a frame has been received and can be used for motion estimation. • In RPS mode, a frame is not used for prediction in the encoder until it has been acknowledged to be error free. 138 Other Miscellaneous Features
  139. 139. − Independently Decodable Segments • When signaled, it restricts the use of data outside of the current Group-of-Blocks segment or slice segment. Useful for error resiliency. − Alternative INTER VLC (AIV): • Permits use of an alternative VLC table that is better suited for Intra coded blocks, or blocks with low quantization. • A mode which reduces the number of bits needed for encoding predictively-coded blocks when there are many large coefficients in the block. 139 Other Miscellaneous Features
  140. 140. 140 Video Source Decompress (Decode) Compress (Encode) Video Display Coded video ENCODER + DECODER = CODEC
  141. 141. − Phone lines are “circuit-switched”. − A (virtual) circuit is established at call initiation and remains for the duration of the call. 141 Internet Basics Source Dest.switch switch switch
  142. 142. − Computer networks are “packet-switched”. − Data is fragmented into packets, and each packet finds its way to the destination using different routes. − Lots of implications... 142 Internet Basics Source Dest.switch switch switchX
  143. 143. 143 The Internet Is Heterogeneous (Figure: a global public Internet of routers interconnecting corporate LANs, dial-up IP hosts, mail gateways and commercial networks such as AOL, MCI Mail and TYMNET.) FR: Frame Relay; SMDS: Switched Multimegabit Data Service; ATM: Asynchronous Transfer Mode; SLIP: Serial Line Internet Protocol; PPP: Point-to-Point Protocol; SMTP: Simple Mail Transfer Protocol
  144. 144. − MCI Mail was one of the first commercial email services in the United States and one of the largest telecommunication services in the world. − AOL Mail is a free web-based email service provided by AOL, a division of Verizon Communications. − X.25 is an ITU-T standard protocol suite for packet-switched data communication in wide area networks (WANs). − Frame Relay (FR) is a standardized wide-area-network technology that specifies the physical and data-link layers of digital telecommunications channels using a packet-switching methodology. − Asynchronous Transfer Mode (ATM) is a telecommunications standard defined by ANSI and ITU for carriage of user traffic, including telephony, data, and video signals. − Switched Multimegabit Data Service (SMDS) is a wide-area-network (WAN) connection service designed for LAN interconnection through the public telephone network. SMDS is designed for moderate-bandwidth connections, from 1 to 34 Mbps, although it has been extended to support both lower and higher bandwidths. − Tymnet was an international data communications network headquartered in Cupertino, California that used virtual-call packet-switched technology and X.25, SNA/SDLC, ASCII and BSC interfaces to connect host computers at thousands of large companies, educational institutions, and government agencies. 144 The Internet Is Heterogeneous
  145. 145. OSI (Open System Interconnection) Model 145
  146. 146. Comparison Between OSI and TCP/IP Model 146
  147. 147. 147 Layers in the Internet Protocol Architecture Network Access Layer consists of routines for accessing physical networks 1 Internet Layer defines the datagram and handles the routing of data. 2 Host-to-Host Transport Layer provides end-to-end data delivery services. 3 Application Layer consists of applications and processes that use the network. 4 Header Header Header Data Data Header Data Header Header Data
  148. 148. 148 Internet Protocol Architecture I P FDDI Ethernet Token Ring HDLC SMDS X.25 ATM FR TCP UDP SNMP DNS TELNET FTP SMTP MIME . . . . . . Network Access Layer Internet Host-Host Transport Utility/Application RTP MBone VIC/VAT
  149. 149. 149 Specific Protocols for Multimedia IP TCP UDP RTP Physical Network Data IP UDP RTP payload RTP payload UDP RTP payload Payload header
  150. 150. − IP implements two basic functions • Addressing • Fragmentation − IP treats each packet as an independent entity. − Internet routers choose the best path to send each packet based on its address. Each packet may take a different route. − Routers may fragment and reassemble packets when necessary for transmission on smaller packet networks. − No guarantee a packet will reach its destination, and no guarantee of when it will get there. • IP packets have a Time-to-Live, after which they are deleted by a router. • IP does not ensure secure transmission. • IP only error-checks headers, not payload. 150 The Internet Protocol (IP) IP TCP UDP RTP Physical Network Data IP UDP RTP payload RTP payload UDP RTP payload Payload header
  151. 151. − TCP is connection-oriented, end-to-end reliable, in-order protocol. − TCP does not make any reliability assumptions of the underlying networks. − Acknowledgment is sent for each packet. − A transmitter places a copy of each packet sent in a timed buffer. − If no “ack” is received before the time is out, the packet is re-transmitted. − TCP has inherently large latency → not well suited for streaming multimedia. 151 Transmission Control Protocol (TCP) IP TCP UDP RTP Physical Network Data IP UDP RTP payload RTP payload UDP RTP payload Payload header
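The latency cost of TCP's timed buffer can be illustrated with a toy stop-and-wait sketch (not real TCP; the channel model and names are hypothetical). Each lost transmission costs a full timeout before the buffered copy is re-sent, which is exactly the delay streaming media cannot afford:

```python
# Illustrative sketch: a sender keeps a copy of each packet in a timed
# buffer and re-sends it until an acknowledgment arrives.

def send_reliably(packets, channel, max_tries=5):
    """channel(pkt) returns True if an ACK came back, False if it was lost."""
    attempts = []  # transmissions needed per packet (each retry = 1 timeout)
    for pkt in packets:
        tries = 0
        while tries < max_tries:
            tries += 1
            if channel(pkt):      # delivered and acknowledged
                break             # free the buffered copy
            # otherwise: timeout expired, retransmit the buffered copy
        attempts.append(tries)
    return attempts

# A hypothetical channel that loses every other transmission:
flip = {"v": False}
def lossy(pkt):
    flip["v"] = not flip["v"]
    return flip["v"]

result = send_reliably(["p1", "p2", "p3"], lossy)
# Packets 2 and 3 each need a retransmission, i.e. one extra round trip.
```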
  152. 152. − UDP is a simple protocol for transmitting packets over IP. − Smaller header than TCP, hence lower overhead. − Does not re-transmit packets. − This is acceptable for multimedia, since a late packet usually must be discarded anyway. − Performs a checksum of the data. 152 User Datagram Protocol (UDP) IP TCP UDP RTP Physical Network Data IP UDP RTP payload RTP payload UDP RTP payload Payload header
  153. 153. 153 Transmission Control Protocol (TCP) and User Datagram Protocol (UDP)
  154. 154. − RTP carries data that has real-time properties. − Typically runs on UDP/IP. − Does not ensure timely delivery or QoS. − Does not prevent out-of-order delivery. − Profiles and payload formats must be defined. • Profiles define extensions to the RTP header for a particular class of applications such as audio/video conferencing (IETF RFC 1890). • Payload formats define how a particular kind of payload, such as H.261 video, should be carried in RTP. − Used by Netscape LiveMedia, Microsoft NetMeeting®, Intel VideoPhone, ProShare® Video Conferencing applications and public-domain conferencing tools such as VIC and VAT. 154 Real-time Transport Protocol (RTP) IP TCP UDP RTP Physical Network Data IP UDP RTP payload RTP payload UDP RTP payload Payload header
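As a concrete illustration, the 12-byte fixed RTP header (RFC 3550) can be packed as below. The field values are illustrative; payload type 34 is the static H.263 assignment in RFC 3551, and the timestamp uses the 90 kHz video clock of that profile:

```python
import struct

# Minimal fixed RTP header: version 2, no padding, no extension, no CSRCs.
def rtp_header(payload_type, seq, timestamp, ssrc, marker=0):
    byte0 = (2 << 6)                            # V=2, P=0, X=0, CC=0
    byte1 = (marker << 7) | (payload_type & 0x7F)
    # Network byte order: two bytes, 16-bit seq, 32-bit timestamp, 32-bit SSRC.
    return struct.pack("!BBHII", byte0, byte1, seq, timestamp, ssrc)

hdr = rtp_header(payload_type=34, seq=1, timestamp=90000, ssrc=0x12345678)
assert len(hdr) == 12       # the fixed RTP header is always 12 bytes
```

A payload handler would append the payload-specific header and the media bit stream after these 12 bytes.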
  155. 155. − RTCP is a companion protocol to RTP which • monitors the quality of service • conveys information about the participants in an on-going session − It allows participants to send transmission and reception statistics to other participants. − It also sends information that allows participants to associate media types, such as audio and video, for lip-sync. − Sender reports allow senders to derive round-trip propagation times. − Receiver reports include a count of lost packets and the inter-arrival jitter. − Scales to a large number of users, since the rate of reports is reduced as the number of participants increases. 155 Real-time Transport Control Protocol (RTCP)
  156. 156. − Most IP-based communication is unicast. A packet is intended for a single destination. − In unicasting, the router forwards the received packet through only one of its interfaces. − The relationship between the source and the destination is one-to-one. 156 Unicasting
  157. 157. − For multi-participant applications, streaming multimedia to each destination individually can waste network resources. − A multicast address is designed to enable the delivery of packets to a set of hosts that have been configured as members of a multicast group across various subnetworks. − In multicasting, the router may forward the received packet through several of its interfaces. − The source address is a unicast address, but destination address is a group address. 157 Multicast Packets are duplicated in routers One source and a group of destination
  158. 158. 158 Unicast Example, Streaming Media to Multi-participants S1 D1 S2 D1 D2 R R R 1 1 2 S1 sends duplicate packets because there are two participants: D1 and D2. D2 sees excess traffic on this subnet.
  159. 159. 159 Multicast Example, Streaming Media to Multi-participants S1 D1 S2 D1 D2 R R R 1 2 S1 sends single set of packets to a multicast group. D2 doesn’t see any excess traffic on this subnet. Both D1 receivers subscribe to the same multicast group.
  160. 160. − A multicast router may not find another multicast router in the neighborhood to forward the multicast packet. − A multicast backbone (Mbone) is made out of these isolated routers using the concept of tunneling. − The multicast backbone (Mbone) was an experimental backbone and virtual network built on top of the Internet for carrying IP multicast traffic. It required specialized hardware and software (early 1990s). 160 Multicast Backbone (Mbone) (Figure: isolated islands of multicast routers connected by virtual point-to-point links (tunnels) across nonmulticast routers.)
  161. 161. − Easy to deploy (no explicit router support). − Manual tunnel creation/maintenance. − No routing policy – single tree. 161 Multicast Backbone (Mbone) MBONE
  162. 162. 162 Multicast Backbone (Mbone) IP header G=224.x.x.x Data Nonmulticast routers IP header G=224.x.x.x Data Encapsulator (router entry point of the tunnel) Decapsulator (router exit point of the tunnel) Mbone IP in IP Tunneling
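The IP-in-IP tunneling above can be sketched as follows (dict-based packets, purely illustrative; the addresses are made up). The encapsulator wraps the multicast datagram, whose group destination is 224.x.x.x, inside an ordinary unicast packet addressed to the decapsulator at the tunnel exit:

```python
# Sketch of Mbone tunneling: wrap a multicast datagram in a unicast IP
# header so that nonmulticast routers can forward it normally.

def encapsulate(multicast_packet, tunnel_exit):
    """Encapsulator (router at the tunnel entry point)."""
    return {"dst": tunnel_exit, "proto": "ipip", "inner": multicast_packet}

def decapsulate(packet):
    """Decapsulator (router at the tunnel exit point)."""
    assert packet["proto"] == "ipip"
    return packet["inner"]          # original multicast datagram re-emerges

inner = {"dst": "224.2.0.1", "data": b"audio"}
tunneled = encapsulate(inner, "192.0.2.7")   # crosses nonmulticast routers
assert decapsulate(tunneled) == inner
```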
  163. 163. Real-time applications • Interactive applications are sensitive to packet delays (telephone) • Non-interactive applications can adapt to a wider range of packet delays (audio, video broadcasts) • Guarantee of maximum delay is useful 163 Quality of Service Requirements (1) Arrival Offset Graph Playout Point Sampled Audio Playout Buffer must be small for interactive applications
  164. 164. Elastic applications − Interactive data transfer (e.g. HTTP, FTP) • Sensitive to the average delay, not to the distribution tail − Bulk data transfer (e.g. mail and news delivery) • Delay insensitive − Best effort works well 164 Quality of Service Requirements (2) Document Document is only useful when it is completely received. This means average packet delay is important, not maximum packet delay. Document
  165. 165. Used by hosts to obtain a certain QoS from the underlying networks for a multimedia stream (it operates over IPv4 or IPv6). − It provides receiver-initiated setup of resource reservations for multicast or unicast data flows. − At each node, the RSVP daemon attempts to make a resource reservation for the stream. − It communicates with two local modules: • Admission Control: determines whether the node has sufficient resources available. “The Internet Busy Signal” • Policy Control: determines whether the user has administrative permission to make the reservation. 165 ReSerVation Protocol (RSVP) Application RSVPD Admissions Control Packet Classifier Packet Scheduler Policy Control DATA DATA RSVPD Policy Control Admissions Control Packet Classifier Packet Scheduler DATA Routing Process Host Router RSVP Functional Diagram
  166. 166. 166 ReSerVation Protocol (RSVP) R4 R5 R3R2 R1 Host A 24.1.70.210 Host B 128.32.32.69 PATH PATH 2 2. The Host A RSVP daemon generates a PATH message that is sent to the next hop RSVP router, R1, in the direction of the session address, 128.32.32.69. 3 3. The PATH message follows the next hop path through R5 and R4 until it gets to Host B. Each router on the path creates soft session state with the reservation parameters. 1. An application on Host A creates a session, 128.32.32.69/4078, by communicating with the RSVP daemon on Host A. 1
  167. 167. 167 ReSerVation Protocol (RSVP) R4 R5 R3R2 R1 PATH PATH RESV RESV 5 5. The Host B RSVP daemon generates a RESV message that is sent to the next hop RSVP router, R4, in the direction of the source address, 24.1.70.210. 6 6. The RESV message continues to follow the next hop path through R5 and R1 until it gets to Host A. Each router on the path makes a resource reservation. 4. An application on Host B communicates with the local RSVP daemon and asks for a reservation in session 128.32.32.69/4078. The daemon checks for and finds existing session state. 4 Host A 24.1.70.210 Host B 128.32.32.69
  168. 168. − HTTP generally runs on TCP/IP and is the protocol upon which World-Wide-Web data is transmitted. − Defines a “stateless” connection between receiver and sender. − Sends and receives MIME-like messages and handles caching, etc. − No provisions for latency or QoS guarantees. 168 Hyper-Text Transport Protocol (HTTP)
  169. 169. 169 Real-time Streaming Protocol (RTSP) RTSPMeta FilesMedia file download A “network remote control” for multimedia servers. − Establishes and controls either a single or several time-synchronized streams of continuous media such as audio and video. − Supports the following operations: • Requests a presentation from a media server. • Invite a media server to join a conference and playback or record. • Notify clients that additional media is available for an existing presentation.
  170. 170. 170 RTSP Media file download Meta Files Real-time Streaming Protocol (RTSP)
  171. 171. 171 Real-time Streaming Protocol (RTSP) RTSP - Example
  172. 172. − How do we handle the special cases of • unicasting? • Multicasting? − What about • packet-loss? • Quality of service? • Congestion? We’ll look at some solutions... 172 How Do We Stream Video Over the Internet?
  173. 173. − HTTP was not designed for streaming multimedia; nevertheless, because of its widespread deployment via Web browsers, many applications stream via HTTP. − A custom browser plug-in can start decoding video as it arrives, rather than waiting for the whole file to download. − Operates on TCP, so it doesn't have to deal with errors, but the side effect is high latency and large inter-arrival jitter. − Usually a receive buffer is employed which can buffer enough data (usually several seconds) to compensate for latency and jitter. − Not applicable to two-way communication! − Firewalls are not a problem with HTTP. 173 HTTP Streaming
  174. 174. − RTP was designed for streaming multimedia. − Does not resend lost packets since this would add latency and a late packet might as well be lost in streaming video. − Used by Intel Videophone, Microsoft NetMeeting, Netscape LiveMedia, RealNetworks, etc. − Forms the basis for network video conferencing systems (ITU-T H.323) − Subject to packet loss, and has no quality of service guarantees. − Can deal with network congestion via RTCP reports under some conditions: • Should be encoding real time so video rate can be changed dynamically. • Needs a payload defined for each media it carries. 174 RTP Streaming
  175. 175. − Payloads must be defined in the IETF (Internet Engineering Task Force) for all media carried by RTP. − A payload has been defined for H.263 and H.263+. − An RTP packet typically consists of... − The H.263 payload header contains redundant information about the H.263 bit stream which can assist a payload handler and decoder in the event that related packets are lost. − The slice mode of H.263+ aids RTP packetization by allowing fragmentation on MB boundaries (instead of MB rows) and restricting data dependencies between slices. − But what do we do when packets are lost or arrive too late to use? 175 H.263 Payload for RTP RTP Header H.263 Payload Header H.263 Payload (bit stream)
  176. 176. 176 Video Source Decompress (Decode) Compress (Encode) Video Display Coded video ENCODER + DECODER = CODEC
  177. 177. − Depends on network topology. − On the Mbone • 2-5% packet loss • single packet loss most common − For end-to-end transmission, loss rates of 10% not uncommon. − For ISPs, loss rates may be even higher during high periods of congestion. 177 Internet Packet Loss
  178. 178. 178 Internet Packet Loss: Packet Loss Burst Lengths (Figures: probability of loss bursts of length b as observed at a receiver, for b up to 50; and the conditional probability of losing packet n+1 given that n consecutive packets were already lost.)
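Bursty loss of the kind shown in these figures is commonly reproduced with a two-state (Gilbert) model, in which loss probability is higher immediately after a loss. A sketch with hypothetical transition probabilities:

```python
import random

# Two-state (Gilbert) packet-loss model: p = P(good -> lost),
# q = P(lost -> good). The parameter values are hypothetical.
def simulate_losses(n, p=0.02, q=0.5, seed=1):
    rng = random.Random(seed)
    lost, state = [], "good"
    for _ in range(n):
        if state == "good":
            state = "lost" if rng.random() < p else "good"
        else:
            state = "good" if rng.random() < q else "lost"
        lost.append(state == "lost")
    return lost

losses = simulate_losses(10000)
rate = sum(losses) / len(losses)
# The stationary loss rate is p / (p + q), about 3.8% for these parameters,
# and the mean burst length is 1 / q = 2 packets.
```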
  179. 179. Error resiliency and compression have conflicting requirements. − Video compression attempts to remove as much redundancy from a video sequence as possible. − Error resiliency techniques at some point must reconstruct data that has been lost and must rely on extrapolations from redundant data. 179 Error Resiliency + - REDUNDANCY Compression Resiliency
  180. 180. − Errors tend to propagate in video compression because of its predictive nature. − 180 Error Resiliency I or P frame P frame One block is lost. Error propagates to two blocks in the next frame.
  181. 181. There are essentially two approaches to dealing with errors from packet loss: • Error Redundancy Methods • These are preventative measures that add extra information at the encoder to make it easier to recover when data is lost. • The extra overhead decreases compression efficiency but should improve overall quality in the presence of packet loss. • Error Concealment Techniques • These are methods used to hide the errors that occur once packets are lost. − Usually both methods are employed. 181 Error Resiliency
  182. 182. 182 Intra Coding Resiliency (Figure: average PSNR vs. data rate in kbps, for resiliency settings (resil) of 0, 5 and 10, at 0% and at 10-20% packet loss.)
  183. 183. − Increasing the number of Intra coded blocks that the encoder produces will reduce error propagation since Intra blocks are not predicted. − Blocks that are lost at the decoder are simply treated as empty Inter coded blocks (Skipped Blocks). − The block is simply copied from the previous frame. − Very simple to implement. 183 Simple Intra Coding & Skipped Blocks
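A minimal sketch of this concealment strategy (frames represented as dicts of block data, purely illustrative): a block lost in transit is treated as a skipped inter block and copied from the co-located block of the previous frame.

```python
# Sketch: conceal lost macroblocks by treating them as skipped inter blocks,
# i.e. copying the co-located block from the previous decoded frame.

def conceal_lost_blocks(prev_frame, cur_frame, lost):
    """Frames are dicts of block_index -> pixel data; `lost` lists indices."""
    out = dict(cur_frame)
    for idx in lost:
        out[idx] = prev_frame[idx]   # copy co-located block from prev frame
    return out

prev = {0: "A0", 1: "A1", 2: "A2"}
cur = {0: "B0", 1: None, 2: "B2"}    # block 1 arrived in a lost packet
assert conceal_lost_blocks(prev, cur, lost=[1]) == {0: "B0", 1: "A1", 2: "B2"}
```

Intra refresh then guarantees that any residual mismatch between encoder and decoder eventually disappears.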
  184. 184. 184 Reference Picture Selection (RPS) Mode of H.263+ I or P frame P frame P frame Last acknowledged error-free frame. In RPS Mode, a frame is not used for prediction in the encoder until it’s been acknowledged to be error free. No acknowledgment received yet - not used for prediction. − Select one of several picture memories/prediction structures to reduce error propagation. Bad picture
  185. 185. • Back-channel message types – Neither: no back-channel message is returned from the decoder to the encoder – ACK: the decoder returns only acknowledgement messages – NACK: the decoder returns only non-acknowledgement messages – ACK+NACK: the decoder returns both types of messages • Channel for back-channel messages – Separate Logical Channel: uses a separate logical channel in the multiplex layer of the system – VideoMux: sends back-channel data within the forward coded video data of a video stream ACK-based: a picture is assumed to contain errors, and thus is not used for prediction unless an ACK is received. NACK-based: a picture will be used for prediction unless a NACK is received, in which case the previous picture that didn't receive a NACK will be used. 185 Reference Picture Selection (RPS) Mode of H.263+
  186. 186. 186 Reference Picture Selection (RPS) Mode of H.263+ (Figure: encoder block diagram with coding control (CC), transform T, quantizer Q and their inverses, and multiple additional picture memories AP1, AP2, ..., APn feeding the predictor P so that the reference picture can be selected.)
  187. 187. Reference pictures are interleaved to create two or more independently decodable threads. − If a frame is lost, the frame rate drops to 1/2 rate until a sync frame is reached. − Same syntax as Reference Picture Selection, but without ACK/NACK. − Adds some overhead since prediction is not based on most recent frame. 187 Multi-threaded Video 1 3 2 5 7 9 4 6 8 10 I P P P P P P P P I
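The two-thread structure above can be sketched as a reference-index rule (0-based frame indices, illustrative only): each frame is predicted from the previous frame of its own thread, i.e. from frame i-2, so losing one frame stalls only that thread until the next sync (I) frame.

```python
# Sketch: multi-threaded video with interleaved, independently decodable
# threads. With two threads, even and odd frames never reference each other.

def reference_for(i, num_threads=2):
    """Index of the reference frame for frame i (None for the first frames,
    which are intra coded)."""
    return i - num_threads if i >= num_threads else None

assert reference_for(5) == 3      # odd thread:  1 -> 3 -> 5 -> ...
assert reference_for(4) == 2      # even thread: 0 -> 2 -> 4 -> ...
assert reference_for(1) is None   # start of a thread: intra coded
```

The overhead the slide mentions comes from this rule: prediction from frame i-2 is generally less accurate than prediction from frame i-1.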
  188. 188. − A video encoder contains a decoder (called the loop decoder) to create decoded previous frames which are then used for motion estimation and compensation. − The loop decoder must stay in sync with the real decoder, otherwise errors propagate. 188 Conditional Replenishment ME/MC DCT, etc. Decoder Decoder Encoder
  189. 189. − One solution is to discard the loop decoder. − Can do this if we restrict ourselves to just two macroblock types: • Intra coded • Empty (just copy the same block from the previous frame) − The technique is to check if the current block has changed substantially since the previous frame and then code it as Intra if it has changed. Otherwise mark it as empty. − A periodic refresh of Intra coded blocks ensures all errors eventually disappear. 189 Conditional Replenishment ME/MC DCT, etc. Decoder Decoder Encoder
  190. 190. − Lost macroblocks are reported back to the encoder using a reliable back-channel. − The encoder catalogs spatial propagation of each macroblock over the last M frames. − When a macroblock is reported missing, the encoder calculates the accumulated error in each MB of the current frame. − If an error threshold is exceeded, the block is coded as Intra. − Additionally, the erroneous macroblocks are not used as prediction for future frames in order to contain the error. 190 Error Tracking Appendix II, H.263
  191. 191. − Some parts of a bit stream contribute more to image artifacts than others if lost. − The bit stream can be prioritized, and more protection can be added for higher-priority portions. 191 Prioritized Encoding AC Coefficients DC Coefficients MB Information Motion Vectors Picture Header Increasing Error Protection Unprotected Encoding Prioritized Encoding (23% Overhead) Prioritized Encoding Demo (Videos used with permission of ICSI, UC Berkeley)
  192. 192. − To hide the image degradation from the viewer. − The main idea behind error concealment is to replace the damaged pixels with pixels from some parts of the video that have maximum resemblance. − In general, pixel substitution may come from the same frame or from the previous frame. − These are called intraframe and interframe error concealment, respectively 192 Error Concealment by Interpolation d1 d2 Lost block Take the weighted average of 4 neighboring pixels.
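The weighted average in the figure (weights based on the distances d1, d2, ... to the neighboring pixels) can be sketched as follows. This is an illustration of the idea, not a concealment algorithm mandated by any standard:

```python
# Sketch: conceal a lost pixel as the average of its nearest boundary
# pixels, weighted by the inverse of their distances to the pixel.

def conceal_pixel(neighbors):
    """neighbors: list of (pixel_value, distance) pairs, one per boundary."""
    weights = [1.0 / d for _, d in neighbors]
    total = sum(weights)
    return sum(w * v for w, (v, _) in zip(weights, neighbors)) / total

# A pixel equally distant from four equal boundary pixels keeps their value:
assert conceal_pixel([(100, 4), (100, 4), (100, 4), (100, 4)]) == 100.0
# Closer boundary pixels dominate the estimate:
assert abs(conceal_pixel([(0, 1), (200, 3)]) - 50.0) < 1e-9
```

Applying this to every pixel of a lost block is the intraframe case; taking the neighbors from the previous frame instead gives the interframe case.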
  193. 193. Error Concealment with • Least Square Constraints • Bayesian Estimators • Polynomial Interpolation • Edge-Based Interpolation • Multi-directional Recursive Nonlinear Filter (MRNF) 193 Other Error Concealment Techniques MPQT@0.5 bpp, block loss:10% MRNF-GMLOS, PSNR=34.94dB Example: MRNF Filtering
  194. 194. − Most multimedia applications place the burden of rate adaptivity on the source. − For multicasting over heterogeneous networks and receivers, it is impossible to meet the conflicting requirements, which forces the source to encode at a least-common-denominator level. − The smallest network pipe dictates the quality for all the other participants of the multicast session. − If congestion occurs, the quality of service degrades as more packets are lost. 194 Network Congestion
