
Multimedia chapter 1_2_3

Multimedia Technology

Overview
- Introduction
- Chapter 1: Background of compression techniques
- Chapter 2: Multimedia technologies
  - JPEG
  - MPEG-1/MPEG-2 Audio & Video
  - MPEG-4
  - MPEG-7 (brief introduction)
  - HDTV (brief introduction)
  - H.261/H.263 (brief introduction)
  - Model-based coding (MBC) (brief introduction)
- Chapter 3: Some real-world systems
  - CATV systems
  - DVB systems
- Chapter 4: Multimedia networks



Introduction
- The importance of multimedia technologies: multimedia is everywhere!
- On PCs:
  - Real Player, QuickTime, Windows Media.
  - Music and video are freely available on the Internet (mp2, mp3, mp4, asf, mpeg, mov, ra, ram, mid, DivX, etc.)
  - Video/audio conferencing.
  - Webcast / streaming applications.
  - Distance learning (tele-education).
  - Tele-medicine.
  - Tele-xxx (let's imagine!)
- On TVs and other home electronic devices:
  - DVB-T/DVB-C/DVB-S (Digital Video Broadcasting – Terrestrial/Cable/Satellite) shows MPEG-2's superior quality over traditional analog TV.
  - Interactive TV: Internet applications (mail, web, e-commerce) on a TV, with no need to wait for a PC to start up and shut down.
  - CD/VCD/DVD/MP3 players.
- Also appearing in handheld devices (3G mobile phones, wireless PDAs).

Multimedia network
- The Internet was designed in the 60s for low-speed inter-networks with boring textual applications: high delay, high jitter.
- Multimedia applications therefore require drastic modifications of the Internet infrastructure.
- Many frameworks have been and are being investigated and deployed to support the next-generation multimedia Internet (e.g. IntServ, DiffServ).
- In the future, all TVs (and PCs) will be connected to the Internet and freely tuned to any of millions of broadcast stations all over the world.
- At present, multimedia networks run over ATM (almost obsolete) and IPv4, and in the future IPv6; they should guarantee QoS (Quality of Service)!

Chapter 1: Background of compression techniques
- Why compression?
  - For communication: reduce bandwidth in multimedia network applications such as streaming media, Video-on-Demand (VOD), and Internet phone.
  - For digital storage (VCD, DVD, tape, etc.): reduce size and cost, increase media capacity and quality.
- Compression factor (compression ratio)
  - The ratio between the source data size and the compressed data size (e.g. 10:1).
- Two types of compression:
  - Lossless compression
  - Lossy compression

4/2/2003 Nguyen Chan Hung – Hanoi University of Technology
Information content and redundancy
- Information rate
  - Entropy is the measure of information content, expressed in bits per source output unit (such as bits/pixel).
  - The more information in the signal, the higher the entropy.
  - Lossy compression reduces entropy; lossless compression does not.
- Redundancy
  - The difference between the information rate and the bit rate.
  - Usually the information rate is much less than the bit rate.
  - Compression works by eliminating this redundancy.

Lossless compression
- The data from the decoder is identical to the source data.
  - Example: archives produced by utilities such as PKZIP or gzip.
  - The compression factor is around 2:1.
- Cannot guarantee a fixed compression ratio, so the output data rate is variable; this causes problems for recording mechanisms and communication channels.

Lossy compression
- The data from the expander is not identical to the source data, but the difference cannot be distinguished auditorily or visually.
  - Suitable for audio and video compression.
  - The compression factor is much higher than that of lossless compression (up to 100:1).
- Based on an understanding of psychoacoustic and psychovisual perception.
- Can be forced to operate at a fixed compression factor.

Process of compression
- Communication (reduces the cost of the data link):
  - Data -> Compressor (coder) -> transmission channel -> Expander (decoder) -> Data
- Recording (extends playing time in proportion to the compression factor):
  - Data -> Compressor (coder) -> storage device (tape, disk, RAM, etc.) -> Expander (decoder) -> Data
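The relationship between entropy and bit rate can be checked directly. A minimal sketch (the function name is mine, not from the slides) that measures the Shannon entropy of a byte stream: a highly redundant source scores far below the 8 bits/byte actually spent to store it, and that gap is the redundancy a compressor can remove.

```python
import math
from collections import Counter

def entropy_bits_per_symbol(data: bytes) -> float:
    """Shannon entropy of a byte stream, in bits per byte."""
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A redundant source: 7 'a's for every 'b'. Stored as bytes it costs
# 8 bits/byte, but its information rate is only about 0.54 bits/byte.
redundant = b"aaaaaaab" * 1000
print(entropy_bits_per_symbol(redundant))
```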
Sampling and quantization
- Why sampling?
  - A computer cannot process an analog signal directly.
- PCM
  - Sample the analog signal at a constant rate and use a fixed number of bits (usually 8 or 16) to represent each sample.
  - bit rate = sampling rate x number of bits per sample
- Quantization
  - Map the sampled analog signal (generally of infinite precision) to discrete levels (finite precision).
  - Represent each discrete level with a number.

Predictive coding
- Prediction
  - Use previous sample(s) to estimate the current sample.
  - For most signals, the difference between the predicted and actual values is small, so a smaller number of bits can code the difference while maintaining the same accuracy!
  - Noise is completely unpredictable: most codecs require the data to be preprocessed, or they may perform badly when the data contains noise.

Statistical coding: the Huffman code
- Assign short codes to the most probable data patterns and long codes to the less frequent ones.
- Bit assignment is based on the statistics of the source data.
- The statistics of the data must be known prior to the bit assignment.

Drawbacks of compression
- Sensitivity to data errors
  - Compression eliminates the redundancy that is essential to making data resistant to errors.
  - Concealment is required for real-time applications; error correction codes are required, which add redundancy back to the compressed data.
- Artifacts
  - Artifacts appear when the coder eliminates part of the entropy.
  - The higher the compression factor, the more the artifacts.
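The Huffman assignment of short codes to probable symbols can be sketched with the classic merge-the-two-rarest construction. This is an illustrative implementation of my own (not from the slides); it returns a code table from symbol statistics, as the slide requires.

```python
import heapq
from collections import Counter

def huffman_code(data: str) -> dict:
    """Build a Huffman code table from the symbol statistics of `data`."""
    # Heap entries: (frequency, tiebreaker, {symbol: code-so-far})
    heap = [(freq, i, {sym: ""})
            for i, (sym, freq) in enumerate(Counter(data).items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)   # two least probable subtrees
        f2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t1.items()}
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

table = huffman_code("aaaabbc")
# The most frequent symbol gets the shortest code.
assert len(table["a"]) < len(table["c"])
print(table)
```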
A coding example: clustering color pixels
- In an image, pixel values are clustered in several peaks; each cluster represents the color range of one object in the image (e.g. blue sky).
- Coding process:
  1. Separate the pixel values into a limited number of data clusters (e.g. clustered pixels of sky blue or grass green).
  2. Send the average color of each cluster, and an identifying number for each cluster, as side information.
  3. Transmit, for each pixel:
     - The number of the average cluster color that it is closest to.
     - Its difference from that average cluster color (which can itself be coded to reduce redundancy, since the differences are often similar!)

Frame-differential coding
- Frame-differential coding = prediction from a previous video frame.
- A video frame is stored in the encoder for comparison with the present frame; this causes an encoding latency of one frame time.
- For still images:
  - Data need be sent only for the first instance of a frame; all subsequent prediction error values are zero.
  - The frame is retransmitted occasionally to give receivers that have just been turned on a starting point.
- FDC thus reduces the information for still images, but leaves significant data for moving images (e.g. a movement of the camera).

Motion-compensated prediction
- More data can be eliminated from frame-differential coding by comparing the present pixel to the location of the same object in the previous frame (not to the same spatial location in the previous frame).
- The encoder estimates the motion in the image to find the corresponding area in a previous frame: it searches for a portion of a previous frame that is similar to the part of the new frame to be transmitted.
- It then sends (as side information) a motion vector telling the decoder what portion of the previous frame to use to predict the new frame.
- It also sends the prediction error so that the exact new frame can be reconstituted.
- (Top figure: without motion compensation; bottom figure: with motion compensation.)

Unpredictable information
- Unpredictable information from the previous frame:
  1. Scene change (e.g. a background landscape change).
  2. Newly uncovered information due to object motion across a background, or at the edges of a panned scene (e.g. a soccer player's face uncovered by a flying ball).
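The motion search described above can be sketched as an exhaustive block-matching loop: for a block of the new frame, try every displacement in a small window of the previous frame and keep the one with the smallest sum of absolute differences. The function names and the tiny 8x8 frames are mine, for illustration only; real encoders use much faster search strategies.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equal-sized blocks."""
    return sum(abs(a - b) for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def best_motion_vector(prev, cur, bx, by, n=4, search=2):
    """Exhaustively search a (2*search+1)^2 window of `prev` for the
    displacement that best predicts the n x n block of `cur` at (bx, by)."""
    target = [row[bx:bx + n] for row in cur[by:by + n]]
    best, best_cost = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or y + n > len(prev) or x + n > len(prev[0]):
                continue  # candidate block falls outside the previous frame
            cand = [row[x:x + n] for row in prev[y:y + n]]
            cost = sad(target, cand)
            if cost < best_cost:
                best_cost, best = cost, (dx, dy)
    return best, best_cost

# A bright 4x4 object that moves one pixel to the right between frames:
prev = [[0] * 8 for _ in range(8)]
cur = [[0] * 8 for _ in range(8)]
for y in range(2, 6):
    for x in range(2, 6):
        prev[y][x] = 9
        cur[y][x + 1] = 9
vec, err = best_motion_vector(prev, cur, bx=3, by=2)
print(vec, err)   # motion vector (-1, 0): the match lies one pixel left in prev
```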
Dealing with unpredictable information
- Scene change
  - An intra-coded picture (MPEG I picture) must be sent as a starting point; it requires more data than a predicted picture (P picture).
  - I pictures are sent about twice per second; their timing and sending frequency may be adjusted to accommodate scene changes.
- Uncovered information
  - Handled by the bi-directionally coded type of picture, the B picture.
  - There must be enough frame storage in the system to wait for the later picture that has the desired information.
  - To limit the amount of decoder memory, the encoder stores pictures and sends the required reference pictures before sending the B picture.

Transform coding
- Converts spatial image pixel values to transform coefficient values; the number of coefficients produced is equal to the number of pixels transformed.
- Few coefficients contain most of the energy in a picture, so the coefficients may be further coded by lossless entropy coding.
- The transform process concentrates the energy into particular coefficients.

Types of picture transform coding
- Types of picture coding:
  - Discrete Fourier transform (DFT)
  - Karhunen-Loeve
  - Walsh-Hadamard
  - Lapped orthogonal
  - Discrete cosine transform (DCT): used in MPEG-2!
  - Wavelets: new!
- The differences between transform coding methods:
  - The degree of concentration of energy in a few coefficients.
  - The region of influence of each coefficient in the reconstructed picture.
  - The appearance and visibility of coding noise due to coarse quantization of the coefficients.

DCT lossy coding
- Lossless coding cannot obtain a high compression ratio (4:1 or less).
- Lossy coding = discard selected information so that the reproduction is visually or aurally indistinguishable from the source, or has the least artifacts.
- Lossy coding can be achieved by:
  - Eliminating some DCT coefficients (generally the high-frequency coefficients), or
  - Adjusting the quantizing coarseness of the coefficients (the better approach!)
Masking
- Masking makes certain types of coding noise invisible or inaudible due to psycho-visual/psychoacoustic effects.
  - In audio, a pure tone will mask energy at higher frequencies, and also at lower frequencies (with a weaker effect).
  - In video, high-contrast edges mask random noise.
- Noise introduced at low bit rates is placed in the frequency, spatial, or temporal regions where masking hides it.

Variable quantization
- Variable quantization is the main technique of lossy coding; it greatly reduces the bit rate.
- Coarsely quantize the less significant coefficients in a transform (those that are less noticeable: low energy, less visible or audible).
- Can be applied to a complete signal or to individual frequency components of a transformed signal.
- VQ also controls the instantaneous bit rate in order to:
  - Match the average bit rate to a constant channel bit rate.
  - Prevent buffer overflow or underflow.

Run-Level coding
- "Run-Level" coding = coding a run length of zeros followed by a non-zero level.
  - Instead of sending all the zero values individually, the length of the run is sent.
  - Useful for any data with long runs of zeros.
  - Run lengths are easily encoded by a Huffman code.

Key points
- The compression process
- Quantization & sampling
- Coding:
  - Lossless & lossy coding
  - Frame-differential coding
  - Motion-compensated prediction
  - Variable quantization
  - Run-Level coding
- Masking
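The Run-Level idea above is simple enough to sketch in a few lines. This is my own illustrative encoder (not the exact MPEG syntax): it emits (zero-run, level) pairs and replaces the trailing run of zeros with an EOB marker, exactly the behavior the slide describes.

```python
def run_level_encode(coeffs):
    """Encode a coefficient sequence as (zero-run-length, level) pairs,
    ending with an 'EOB' marker once only zeros remain."""
    last_nonzero = max((i for i, c in enumerate(coeffs) if c != 0), default=-1)
    pairs, run = [], 0
    for c in coeffs[:last_nonzero + 1]:
        if c == 0:
            run += 1               # extend the current run of zeros
        else:
            pairs.append((run, c)) # emit (run length, non-zero level)
            run = 0
    pairs.append("EOB")            # everything after last_nonzero is zero
    return pairs

print(run_level_encode([4, 0, 0, 2, 0, 0, 0, 0, 0, 1, 0, 0]))
```

The 12 input values collapse to three pairs plus EOB, which a Huffman table can then code with short words for the common pairs.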
Chapter 2: Multimedia technologies
- Roadmap:
  - JPEG
  - MPEG-1/MPEG-2 Video
  - MPEG-1 Layer 3 Audio (mp3)
  - MPEG-4
  - MPEG-7 (brief introduction)
  - HDTV (brief introduction)
  - H.261/H.263 (brief introduction)
  - Model-based coding (MBC) (brief introduction)

JPEG (Joint Photographic Experts Group)
- JPEG encoder:
  - Partitions the image into blocks of 8 x 8 pixels.
  - Calculates the Discrete Cosine Transform (DCT) of each block.
  - A quantizer rounds off the DCT coefficients according to the quantization matrix; this is lossy, but allows for large compression ratios.
  - Produces a series of DCT coefficients using zig-zag scanning.
  - Uses a variable-length code (VLC) on these DCT coefficients.
  - Writes the compressed data stream to an output file (*.jpg or *.jpeg).
- JPEG decoder:
  - File -> input data stream -> variable-length decoder -> IDCT (inverse DCT) -> image.

JPEG: zig-zag scanning (figure)

JPEG: DCT
- The DCT is similar to the Discrete Fourier Transform: it transforms a signal or image from the spatial domain to the frequency domain.
- The DCT requires fewer multiplications than the DFT.
- Input image A:
  - The input image A is N2 pixels wide by N1 pixels high.
  - A(i,j) is the intensity of the pixel in row i and column j.
- Output image B:
  - B(k1,k2) is the DCT coefficient in row k1 and column k2 of the DCT matrix.
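The zig-zag scan referenced above (the figure does not survive in this transcript) visits an 8 x 8 block along anti-diagonals of increasing index, alternating direction so that low-frequency coefficients come first. A small sketch of my own that generates that visiting order:

```python
def zigzag_order(n=8):
    """Visiting order of a JPEG-style zig-zag scan over an n x n block:
    anti-diagonals d = row+col in increasing order, direction alternating."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    return sorted(cells, key=lambda rc: (rc[0] + rc[1],
                                         rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))

def zigzag_scan(block):
    """Serialize a square block along the zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]

print(zigzag_order(4)[:6])
# starts (0,0), (0,1), (1,0), (2,0), (1,1), (0,2): DC first, then by frequency
```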
JPEG: quantization matrix
- The quantization matrix is the 8 x 8 matrix of step sizes (sometimes called quantums): one element for each DCT coefficient.
- Usually symmetric.
- Step sizes are:
  - Small in the upper left (low frequencies),
  - Large in the lower right (high frequencies).
  - A step size of 1 is the most precise.
- The quantizer divides each DCT coefficient by its corresponding quantum, then rounds to the nearest integer.
- Large quantums drive small coefficients down to zero.
- The result:
  - Many high-frequency coefficients become zero, so they are easily removed.
  - The low-frequency coefficients undergo only minor adjustment.

JPEG coding process illustrated
- DCT coefficients:

    1255  -15   43   58  -12    1   -4   -6
      11  -65   80  -73  -27   -1   -5    1
     -49   37  -87    8   12    6   10    8
      27  -50   29   13    3   13   -6    5
     -16   21  -11  -10   10  -21    9   -6
       3  -14    0   14  -14   16   -8    4
      -4   -1    8  -13   12   -9    5   -1
      -4    2   -2    6   -7    6   -1    3

- Quantization result (Q):

      78   -1    4    4   -1    0    0    0
       1   -5    6   -4   -1    0    0    0
      -4    3   -5    0    0    0    0    0
       2   -3    1    0    0    0    0    0
      -1    1    0    0    0    0    0    0
       0    0    0    0    0    0    0    0
       0    0    0    0    0    0    0    0
       0    0    0    0    0    0    0    0

- Zig-zag scan result:

    78 -1 1 -4 -5 4 4 6 3 2 -1 -3 -5 -4 -1 0 -1 0 1 1 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 EOB

  This is easily coded by run-length Huffman coding.

MPEG (Moving Picture Experts Group)
- MPEG is the heart of:
  - Digital television set-top boxes
  - HDTV decoders
  - DVD players
  - Video conferencing
  - Internet video, etc.
- MPEG standards:
  - MPEG-1, MPEG-2, MPEG-4, MPEG-7.
  - (The MPEG-3 standard was abandoned and became an extension of MPEG-2.)

MPEG standards
- MPEG-1 (obsolete)
  - A standard for storage and retrieval of moving pictures and audio on storage media.
  - Application: VCD (video compact disc).
- MPEG-2 (widely implemented)
  - A standard for digital television.
  - Applications: DVD (digital versatile disc), HDTV (high-definition TV), DVB (European Digital Video Broadcasting group), etc.
- MPEG-4 (newly implemented; still being researched)
  - A standard for multimedia applications.
  - Applications: Internet, cable TV, virtual studio, etc.
- MPEG-7 (future work; ongoing research)
  - A content representation standard for information search ("Multimedia Content Description Interface").
  - Applications: Internet, video search engines, digital libraries.
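The quantizer step (divide by the quantum, round to the nearest integer) is a one-liner to sketch. The step sizes below are my assumption (the first entries of the widely used JPEG example luminance table: 16, 11, 10, 16); applied to the first DCT row of the illustration above, they happen to be consistent with its quantization result.

```python
def quantize(coeffs, qmatrix):
    """Divide each DCT coefficient by its step size and round to the
    nearest integer: the lossy step of JPEG coding."""
    return [[round(c / q) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(coeffs, qmatrix)]

def dequantize(levels, qmatrix):
    """Approximate reconstruction: multiply back by the step sizes."""
    return [[lv * q for lv, q in zip(lrow, qrow)]
            for lrow, qrow in zip(levels, qmatrix)]

first_row = [1255, -15, 43, 58]
steps = [16, 11, 10, 16]   # assumed step sizes, see lead-in
print(quantize([first_row], [steps]))   # matches 78 -1 4 4 above
```

Note that large quantums drive small coefficients to zero, while dequantization recovers only a rounded approximation: the rounding error is the coding loss.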
MPEG-2 formal standards
- The international standard ISO/IEC 13818-2, "Generic Coding of Moving Pictures and Associated Audio Information".
- ATSC (Advanced Television Systems Committee) document A/54, "Guide to the Use of the ATSC Digital Television Standard".

MPEG video data structure
- The MPEG-2 video data stream is constructed in layers, from lowest to highest:
  - PIXEL: the fundamental unit.
  - BLOCK: an 8 x 8 array of pixels.
  - MACROBLOCK: 4 luma blocks and 2 chroma blocks (field DCT coding and frame DCT coding).
  - SLICE: a variable number of macroblocks.
  - PICTURE: a frame (or field) of slices.
  - GROUP OF PICTURES (GOP): a variable number of pictures.
  - SEQUENCE: a variable number of GOPs.
  - PACKETIZED ELEMENTARY STREAM (optional).

Pixel & block
- Pixel = "picture element".
  - A discrete spatial point sample of an image.
  - A color pixel may be represented digitally as a number of bits for each of three primary color values.
- Block = an 8 x 8 array of pixels.
  - The block is the fundamental unit for DCT coding (discrete cosine transform).

Macroblock
- A macroblock = a 16 x 16 array of luma (Y) pixels (= 4 blocks, a 2 x 2 block array).
- The number of chroma pixels (Cr, Cb) varies depending on the chroma pixel structure indicated in the sequence header (e.g. 4:2:0).
- The macroblock is the fundamental unit for motion compensation, and has motion vector(s) associated with it if it is predictively coded.
- A macroblock is classified as:
  - field coded (an interlaced frame consists of 2 fields), or
  - frame coded,
  depending on how the four blocks are extracted from the macroblock.
Slice
- Pictures are divided into slices.
- A slice consists of an arbitrary number of successive macroblocks (going left to right), but is typically an entire row of macroblocks. A slice does not extend beyond one row.
- The slice header carries address information that allows the Huffman decoder to resynchronize at slice boundaries.

Picture
- A source picture is a contiguous rectangular array of pixels.
- A picture may be a complete frame of video ("frame picture") or one of the interlaced fields from an interlaced source ("field picture").
- A field picture does not have any blank lines between its active lines of pixels.
- A coded picture (also called a video access unit) begins with a start code and a header. The header consists of:
  - picture type (I, B, P)
  - temporal reference information
  - motion vector search range
  - optional user data
- A frame picture consists of:
  - a frame of a progressive source, or
  - a frame (2 spatially interlaced fields) of an interlaced source.

I, P, B pictures
- Encoded pictures are classified into 3 types: I, P, and B.
- I pictures = intra-coded pictures
  - All macroblocks are coded without prediction.
  - Needed to give the receiver a "starting point" for prediction after a channel change, and to recover from errors.
- P pictures = predicted pictures
  - Macroblocks may be coded with forward prediction from references made from previous I and P pictures, or may be intra coded.
- B pictures = bi-directionally predicted pictures. Macroblocks may be coded with:
  - forward prediction from a previous I or P reference,
  - backward prediction from the next I or P reference,
  - interpolated prediction from past and future I or P references, or
  - intra coding (no prediction).

Group of pictures (GOP)
- The group-of-pictures layer is optional in MPEG-2.
- A GOP begins with a start code and a header. The header carries:
  - time code information
  - editing information
  - optional user data
- The first encoded picture in a GOP is always an I picture.
- Typical length is 15 pictures, with the following structure (in display order):
  - I B B P B B P B B P B B P B B
  - This provides an I picture with sufficient frequency to allow a decoder to decode correctly.
- (Figure: forward motion compensation from I/P anchors and bidirectional motion compensation for B pictures, over the sequence I B B P B B P B B P B in time.)
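The transmission order for such a GOP can be derived mechanically: each anchor (I or P) must be sent before the B pictures that predict from it. A small sketch of my own (picture labels are illustrative strings):

```python
def coded_order(display_order):
    """Reorder a GOP from display order into coded/transmission order:
    each anchor (I or P picture) is sent before the B pictures that
    depend on it for backward prediction."""
    out, pending_b = [], []
    for pic in display_order:
        if pic.startswith("B"):
            pending_b.append(pic)   # B pictures wait for their next anchor
        else:
            out.append(pic)         # send the anchor first...
            out += pending_b        # ...then the buffered B pictures
            pending_b = []
    return out + pending_b

gop = ["I1", "B2", "B3", "P4", "B5", "B6", "P7"]
print(coded_order(gop))   # I1, P4, B2, B3, P7, B5, B6
```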
Sequence
- A sequence begins with a unique 32-bit start code followed by a header.
- The header carries:
  - picture size
  - aspect ratio
  - frame rate and bit rate
  - optional quantizer matrices
  - required decoder buffer size
  - chroma pixel structure
  - optional user data
- The sequence information is needed for channel changing.
- The sequence length depends on the acceptable channel-change delay.

Packetized Elementary Stream (PES)
- A video elementary stream (video ES) consists of all the video data for a sequence, including the sequence header and all the subparts of a sequence.
- An ES carries only one type of data (video or audio) from a single video or audio encoder.
- A PES consists of a single ES which has been split into packets, each starting with an added packet header.
- A PES stream contains only one type of data from one source, e.g. from one video or audio encoder.
- PES packets have variable length, not corresponding to the fixed packet length of transport packets, and may be much longer than a transport packet.

Transport stream
- Transport packets (fixed length) are formed from a PES stream:
  - The PES header follows the transport packet header.
  - Successive transport packets' payloads are filled with the remaining PES packet content until the PES packet is all used.
  - The final transport packet is filled to a fixed length by stuffing with 0xFF bytes (all ones).
- Each PES packet header includes:
  - An 8-bit stream ID identifying the source of the payload.
  - Timing references:
    - PTS (presentation time stamp): the time at which a decoded audio or video access unit is to be presented by the decoder.
    - DTS (decoding time stamp): the time at which an access unit is decoded by the decoder.
    - ESCR (elementary stream clock reference).

Intra-frame coding
- Intra coding is concerned only with information within the current frame, not relative to any other frame in the video sequence.
- The MPEG intra-frame coding block diagram (see bottom figure) is similar to JPEG (review the JPEG coding mechanism!).
- Basic blocks of the intra-frame coder:
  - Video filter
  - Discrete cosine transform (DCT)
  - DCT coefficient quantizer
  - Run-length amplitude / variable-length coder (VLC)
Video filter
- The human visual system (HVS) is:
  - most sensitive to changes in luminance,
  - less sensitive to variations in chrominance.
- MPEG uses the YCbCr color space to represent the data values instead of RGB, where:
  - Y is the luminance signal,
  - Cb is the blue color-difference signal,
  - Cr is the red color-difference signal.
- What are the "4:4:4", "4:2:0", etc. video formats?
  - 4:4:4 is full-bandwidth YCbCr video: each macroblock consists of 4 Y blocks, 4 Cb blocks, and 4 Cr blocks. A waste of bandwidth!
  - 4:2:0 is most commonly used in MPEG-2.

Applications of chroma formats

  chroma_format | Multiplex order within macroblock        | Application
  4:2:0         | Y Y Y Y Cb Cr (6 blocks)                 | Main-stream television, consumer entertainment
  4:2:2         | Y Y Y Y Cb Cr Cb Cr (8 blocks)           | Studio production environments, professional editing equipment
  4:4:4         | Y Y Y Y Cb Cr Cb Cr Cb Cr Cb Cr (12 blocks) | Computer graphics

MPEG profiles & levels
- MPEG-2 is classified into several profiles.
- Main Profile features:
  - 4:2:0 chroma sampling format
  - I, P, and B pictures
  - Non-scalable
- Main Profile is subdivided into levels:
  - MP@ML (Main Profile at Main Level):
    - Designed with the CCIR 601 standard for interlaced standard digital video.
    - 720 x 576 (PAL) or 720 x 483 (NTSC).
    - 30 Hz progressive, 60 Hz interlaced.
    - Maximum bit rate is 15 Mbit/s.
  - MP@HL (Main Profile at High Level), upper bounds:
    - 1152 x 1920, 60 Hz progressive.
    - 80 Mbit/s.

MPEG encoder/decoder (figure)
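The RGB-to-YCbCr conversion performed by the video filter can be sketched with the standard BT.601 luma weights. This is a simplified full-range form of my own (real MPEG pipelines use studio-range offsets and then subsample Cb/Cr); the function name is mine.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to YCbCr (BT.601 luma weights,
    full-range sketch: Y in 0..255, Cb/Cr centred on 128)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luminance
    cb = 128 + 0.564 * (b - y)              # blue colour difference
    cr = 128 + 0.713 * (r - y)              # red colour difference
    return round(y), round(cb), round(cr)

print(rgb_to_ycbcr(255, 255, 255))   # pure white: full luma, neutral chroma
```

Because the HVS is less sensitive to chrominance, Cb and Cr can then be subsampled (4:2:0) with little visible loss, which is exactly the bandwidth saving the table above illustrates.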
Prediction
- Backward prediction is done by storing pictures until the desired anchor picture is available before encoding the current stored frames.
- The encoder can decide to use:
  - forward prediction from a previous picture,
  - backward prediction from a following picture, or
  - interpolated prediction,
  so as to minimize the prediction error.
- The encoder must transmit pictures in an order different from that of the source pictures, so that the decoder has the anchor pictures before decoding predicted pictures (see next slide).
- The decoder must have two frame stores.

I, P, B picture reordering
- Pictures are coded and decoded in a different order than they are displayed, due to the bidirectional prediction of B pictures.
- For example, with a 12-picture GOP:
- Source order and encoder input order:
  I(1) B(2) B(3) P(4) B(5) B(6) P(7) B(8) B(9) P(10) B(11) B(12) I(13)
- Encoding order and order in the coded bitstream:
  I(1) P(4) B(2) B(3) P(7) B(5) B(6) P(10) B(8) B(9) I(13) B(11) B(12)
- Decoder output order and display order (same as input):
  I(1) B(2) B(3) P(4) B(5) B(6) P(7) B(8) B(9) P(10) B(11) B(12) I(13)

DCT and IDCT formulas
- DCT (normal form; a matrix form also exists):
  F(u,v) = (2/N) C(u) C(v) sum over x, y of f(x,y) cos[(2x+1)u pi / 2N] cos[(2y+1)v pi / 2N]
- IDCT (normal form; a matrix form also exists):
  f(x,y) = (2/N) sum over u, v of C(u) C(v) F(u,v) cos[(2x+1)u pi / 2N] cos[(2y+1)v pi / 2N]
- Where:
  - F(u,v) is the two-dimensional N x N DCT.
  - u, v, x, y = 0, 1, 2, ..., N-1.
  - x, y are spatial coordinates in the sample domain.
  - u, v are frequency coordinates in the transform domain.
  - C(u), C(v) = 1/sqrt(2) for u, v = 0; C(u), C(v) = 1 otherwise.

DCT versus DFT
- The DCT is conceptually similar to the DFT, except:
  - The DCT concentrates energy into lower-order coefficients better than the DFT.
  - The DCT is purely real; the DFT is complex (magnitude and phase).
  - A DCT operation on a block of pixels produces coefficients similar to the frequency-domain coefficients produced by a DFT operation.
  - An N-point DCT has the same frequency resolution as a 2N-point DFT. The N frequencies of a 2N-point DFT correspond to N points on the upper half of the unit circle in the complex frequency plane.
  - Assuming a periodic input, the magnitude of the DFT coefficients is spatially invariant (the phase of the input does not matter). This is not true for the DCT.
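The normal-form DCT above can be transcribed directly into code. This is a deliberately naive O(N^4) sketch for checking the formula, not a practical implementation (real codecs use fast factorizations):

```python
import math

def dct2(block):
    """Two-dimensional N x N DCT, straight from the normal-form formula:
    F(u,v) = (2/N) C(u) C(v) sum f(x,y) cos((2x+1)u*pi/2N) cos((2y+1)v*pi/2N)."""
    n = len(block)
    c = lambda k: 1 / math.sqrt(2) if k == 0 else 1.0
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = sum(block[x][y]
                    * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                    * math.cos((2 * y + 1) * v * math.pi / (2 * n))
                    for x in range(n) for y in range(n))
            out[u][v] = (2 / n) * c(u) * c(v) * s
    return out

# Energy concentration at its most extreme: a flat mid-grey block puts
# all of its energy into the single DC coefficient F(0,0).
flat = [[128] * 8 for _ in range(8)]
coeffs = dct2(flat)
print(round(coeffs[0][0]))   # (2/8)*(1/2)*64*128 = 1024; all others are ~0
```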
Quantization matrix
- Note: as in JPEG, the quantization step sizes are:
  - small in the upper left (low frequencies),
  - large in the lower right (high frequencies).
  (Recall the JPEG mechanism!)
- Why?
  - The HVS is less sensitive to errors in high-frequency coefficients than in lower frequencies.
  - So the higher frequencies should be more coarsely quantized!

MPEG scanning
- Left figure: zig-zag scanning (like JPEG).
- Right figure: alternate scanning, which is better for interlaced frames!

Resulting DCT matrix (example)
- After adaptive quantization, the result is a matrix containing many zeros.

Huffman / Run-Level coding
- Huffman coding, in combination with Run-Level coding and zig-zag scanning, is applied to the quantized DCT coefficients.
- "Run-Level" = a run length of zeros followed by a non-zero level.
- Huffman coding is also applied to various types of side information.
- A Huffman code is an entropy code that optimally achieves the shortest possible average code-word length for a source.
  - This average code-word length is greater than or equal to the entropy of the source.
Huffman / Run-Level coding illustrated
- Using the DCT output matrix from the previous slide, after zig-zag scanning the output is the sequence: 8 (DC value), 4, 4, 2, 2, 2, 1, 1, 1, 1, 0 (12 zeros), 1, 0 (41 zeros).
- These values are looked up in a fixed table of variable-length codes:
  - The most probable occurrence is given a relatively short code.
  - The least probable occurrence is given a relatively long code.

  Zero run-length | Amplitude/value | Code
  N/A             | 8 (DC value)    | 110 1000
  0               | 4               | 00001100
  0               | 4               | 00001100
  0               | 2               | 01000
  0               | 2               | 01000
  0               | 2               | 01000
  0               | 1               | 110
  0               | 1               | 110
  0               | 1               | 110
  0               | 1               | 110
  12              | 1               | 0010 0010 0
  EOB             | EOB             | 10

Huffman / Run-Level coding illustrated (2)
- The first run of 12 zeros has been efficiently coded in only 9 bits.
- The last run of 41 zeros has been entirely eliminated, represented only by a 2-bit End Of Block (EOB) indicator.
- The quantized DCT coefficients are now represented by a sequence of 61 binary bits (see the table).
- Considering that the original 8 x 8 block of 8-bit pixels required 512 bits for a full representation, the compression ratio is approximately 8.4:1.

MPEG data transport
- MPEG packages all data into fixed-size 188-byte packets for transport.
- Video or audio payload data is placed in PES packets before being broken up into fixed-length transport packet payloads.
- A PES packet may be much longer than a transport packet, so segmentation is required:
  - The PES header is placed immediately following a transport header.
  - Successive portions of the PES packet are then placed in the payloads of transport packets.
  - Remaining space in the final transport packet payload is filled with stuffing bytes = 0xFF (all ones).
- Each transport packet starts with a sync byte = 0x47.
  - In the ATSC US terrestrial DTV VSB transmission system, the sync byte is not processed, but is replaced by a different sync symbol especially suited to RF transmission.
- The transport packet header contains a 13-bit PID (packet ID), which corresponds to a particular elementary stream of video, audio, or other program element.
  - PID 0x0000 is reserved for transport packets carrying a program association table (PAT).
  - The PAT points to a Program Map Table (PMT), which points to the particular elements of a program.

MPEG transport packet: adaptation field
- 8 bits specify the length of the adaptation field.
- The first group of flags consists of eight 1-bit flags:
  - discontinuity_indicator
  - random_access_indicator
  - elementary_stream_priority_indicator
  - PCR_flag
  - OPCR_flag
  - splicing_point_flag
  - transport_private_data_flag
  - adaptation_field_extension_flag
- The optional fields are present if indicated by one of the preceding flags.
- The remainder of the adaptation field is filled with stuffing bytes (0xFF, all ones).
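The PES-to-transport segmentation above can be sketched as a loop that slices the PES bytes into fixed payloads and stuffs the tail with 0xFF. The header here is a simplified 4-byte stand-in of my own that carries only the sync byte and the 13-bit PID; a real transport header also carries continuity counters, flags, and optional adaptation fields.

```python
SYNC = 0x47
PACKET_SIZE = 188
HEADER_SIZE = 4   # simplified fixed-size header for this sketch

def packetize(pes: bytes, pid: int):
    """Split a PES packet into 188-byte transport packets: sync byte,
    13-bit PID, payload, and 0xFF stuffing in the final packet."""
    payload_size = PACKET_SIZE - HEADER_SIZE
    packets = []
    for i in range(0, len(pes), payload_size):
        chunk = pes[i:i + payload_size]
        header = bytes([SYNC, (pid >> 8) & 0x1F, pid & 0xFF, 0x10])
        stuffing = b"\xff" * (payload_size - len(chunk))
        packets.append(header + chunk + stuffing)
    return packets

pkts = packetize(b"\x00" * 400, pid=0x1011)
print(len(pkts))   # 400 payload bytes at 184 bytes/packet -> 3 packets
```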
Demultiplexing a Transport Stream (TS)

- Demultiplexing a transport stream involves:
  1. Finding the PAT by selecting packets with PID = 0x0000.
  2. Reading the PIDs for the PMTs.
  3. Reading the PIDs for the elements of a desired program from its PMT (for example, a basic program will have a PID for audio and a PID for video).
  4. Detecting packets with the desired PIDs and routing them to the decoders.
- An MPEG-2 transport stream can carry:
  - Video streams
  - Audio streams
  - Any type of data
- → MPEG-2 TS is the packet format for CATV downstream data communication.

Timing & buffer control

- Point A: Encoder input → constant/specified rate
- Point B: Encoder output → variable rate
- Point C: Encoder buffer output → constant rate
- Point D: Communication channel + decoder buffer → constant rate
- Point E: Decoder input → variable rate
- Point F: Decoder output → constant/specified rate

Timing - Synchronization

- The decoder is synchronized with the encoder by time stamps.
- The encoder contains a master oscillator and counter, called the System Time Clock (STC). (See the previous block diagram.)
  - → The STC belongs to a particular program and is the master clock of the video and audio encoders for that program.
  - → Multiple programs, each with its own STC, can also be multiplexed into a single stream.
- A program component can even have no time stamps, but it then cannot be synchronized with other components.
- At the encoder input (Point A), the time of occurrence of an input video picture or audio block is noted by sampling the STC.
- A total delay of encoder and decoder buffers (constant) is added to the STC sample, creating a Presentation Time Stamp (PTS).
  - → The PTS is then inserted in the first of the packet(s) representing that picture or audio block, at Point B.

Timing - Synchronization (2)

- A Decode Time Stamp (DTS) can optionally be combined into the bit stream → it represents the time at which the data should be taken instantaneously from the decoder buffer and decoded.
  - DTS and PTS are identical except in the case of picture reordering for B pictures.
  - The DTS is only used where it is needed because of reordering. Whenever a DTS is used, a PTS is also coded.
  - PTS (or DTS) insertion interval = 700 ms.
  - In ATSC, a PTS (or DTS) must be inserted at the beginning of each coded picture (access unit).
- In addition, the output of the encoder buffer (Point C) is time stamped with System Time Clock (STC) values, called:
  - System Clock Reference (SCR) in a Program Stream.
  - Program Clock Reference (PCR) in a Transport Stream.
- PCR time stamp interval = 100 ms.
- SCR time stamp interval = 700 ms.
- The PCR and/or the SCR are used to synchronize the decoder STC with the encoder STC.
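The 188-byte transport packet format and PID-based routing described in the preceding slides can be sketched in a few lines of Python. This is a minimal sketch: it extracts only the sync byte and the 13-bit PID (which spans bytes 1-2 of the header, after the error/start/priority bits) and routes packets to per-PID lists the way a decoder front end would.

```python
# Minimal sketch of MPEG-2 transport packet header parsing and PID demux.

SYNC_BYTE = 0x47   # every transport packet starts with this sync byte
PAT_PID = 0x0000   # reserved PID carrying the Program Association Table

def parse_ts_pid(packet: bytes) -> int:
    """Return the 13-bit PID of a 188-byte transport packet."""
    if len(packet) != 188 or packet[0] != SYNC_BYTE:
        raise ValueError("not a valid transport packet")
    # PID = low 5 bits of byte 1, followed by all 8 bits of byte 2
    return ((packet[1] & 0x1F) << 8) | packet[2]

def demux(packets, wanted_pids):
    """Route packets into per-PID lists (e.g. one audio PID, one video PID)."""
    streams = {pid: [] for pid in wanted_pids}
    for pkt in packets:
        pid = parse_ts_pid(pkt)
        if pid in streams:
            streams[pid].append(pkt)
    return streams

# A dummy PAT packet (PID 0x0000), payload stuffed with 0xFF bytes
pat_packet = bytes([SYNC_BYTE, 0x00, 0x00, 0x10]) + b"\xff" * 184
print(parse_ts_pid(pat_packet) == PAT_PID)   # True
```

Demultiplexing then follows the four steps listed above: first collect PID 0x0000 packets (PAT), learn the PMT PIDs from them, learn the element PIDs from the PMTs, and finally `demux` on those element PIDs.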
Timing - Synchronization (3)

- All video and audio streams included in a program must get their time stamps from a common STC, so that the video and audio decoders can be synchronized with each other.
- The data rate and packet rate on the channel (at the multiplexer output) can be completely asynchronous with the System Time Clock (STC).
- PCR time stamps allow synchronization of different multiplexed programs having different STCs, while allowing STC recovery for each program.
- If there is no buffer underflow or overflow → the delays in the buffers and transmission channel for both video and audio are constant.
- The encoder input and decoder output run at equal and constant rates.
- → Fixed end-to-end delay from encoder input to decoder output.
- If exact synchronization is not required, the decoder clock can be free running → video frames can be repeated or skipped as necessary to prevent buffer underflow or overflow, respectively.

HDTV (High definition television)

- High definition television (HDTV) first came to public attention in 1981, when NHK, the Japanese broadcasting authority, first demonstrated it in the United States.
- HDTV is defined by the ITU-R as:
  - "A system designed to allow viewing at about three times the picture height, such that the system is virtually, or nearly, transparent to the quality of portrayal that would have been perceived in the original scene ... by a discerning viewer with normal visual acuity."

HDTV (2)

- HDTV proposals are for a screen which is wider than the conventional TV image by about 33%. It is generally agreed that the HDTV aspect ratio will be 16:9, as opposed to the 4:3 ratio of conventional TV systems. This ratio has been chosen because psychological tests have shown that it best matches the human visual field.
- It also enables use of existing cinema film formats as additional source material, since this is the same aspect ratio used in normal 35 mm film. Figure 16.6(a) shows how the aspect ratio of HDTV compares with that of conventional television, using the same resolution, or the same surface area, as the comparison metric.
- To achieve the improved resolution, the video image used in HDTV must contain over 1000 lines, as opposed to the 525 and 625 provided by the existing NTSC and PAL systems. This gives a much improved vertical resolution. The exact value is chosen to be a simple multiple of one or both of the vertical resolutions used in conventional TV.
- However, due to the higher scan rates, the bandwidth requirement for analogue HDTV is approximately 12 MHz, compared to the nominal 6 MHz of conventional TV.

HDTV (3)

- The introduction of a non-compatible TV transmission format for HDTV would require the viewer either to buy a new receiver, or to buy a converter to receive the picture on their old set.
- The initial thrust in Japan was towards an HDTV format which is compatible with conventional TV standards, and which can be received by conventional receivers with conventional quality. However, to get the full benefit of HDTV, a new wide-screen, high-resolution receiver has to be purchased.
- One of the principal reasons that HDTV is not already common is that a general standard has not yet been agreed. The 26th CCIR plenary assembly recommended the adoption of a single, worldwide standard for high definition television.
- Unfortunately, Japan, Europe and North America are all investing significant time and money in their own systems, based on their own current conventional TV standards and other national considerations.
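Looping back to the Timing - Synchronization slides: the PTS construction described there (STC sample at encoder input plus the constant total buffer delay) can be sketched as a small worked example. The 90 kHz tick rate is the unit MPEG-2 uses for PTS/DTS values; the 0.4 s combined buffer delay is a made-up illustrative value, not a figure from the slides.

```python
# Sketch of PTS generation and use: the encoder samples its STC when a
# picture arrives, adds the constant encoder+decoder buffer delay, and the
# decoder presents the picture when its PCR-synchronized STC reaches the PTS.

PTS_CLOCK_HZ = 90_000  # MPEG-2 expresses PTS/DTS in 90 kHz ticks

def make_pts(stc_at_input: int, total_buffer_delay_s: float) -> int:
    """PTS = STC sample at encoder input + constant total delay, in ticks."""
    return stc_at_input + round(total_buffer_delay_s * PTS_CLOCK_HZ)

def present_now(pts: int, decoder_stc: int) -> bool:
    """True once the decoder's recovered STC has reached the PTS."""
    return decoder_stc >= pts

# Picture enters the encoder at STC = 900000 ticks (10 s); assume a
# hypothetical 0.4 s combined buffer delay.
pts = make_pts(900_000, 0.4)
print(pts)                         # 936000 -> presented 0.4 s after input
print(present_now(pts, 935_999))   # False: not yet time to present
```

The fixed end-to-end delay noted above is exactly what makes this arithmetic work: because the added delay is constant, every picture is presented the same interval after it entered the encoder.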
H261 - H263

- The H.261 algorithm was developed for the purpose of image transmission rather than image storage.
- It is designed to produce a constant output of p x 64 kbit/s, where p is an integer in the range 1 to 30.
- Low output rates (p = 1 or 2) are only suitable for face-to-face (videophone) communication. H.261 is thus the standard used in many commercial videophone systems, such as the UK BT/Marconi Relate 2000 and the US AT&T 2500 products.
- Video-conferencing would require a greater output data rate (p > 6) and might go as high as 2 Mbit/s for high-quality transmission with larger image sizes.
- The basic coding algorithm is similar to that of MPEG in that it is a hybrid of motion compensation, DCT and straightforward DPCM (intra-frame coding mode), without the MPEG I, P, B frames.
- The DCT operation is performed at a low level on 8 x 8 blocks of error samples from the predicted luminance pixel values, with sub-sampled blocks of chrominance data.

H261-H263 (2)

H261-H263 (3)

- H.261 is widely used on 176 x 144 pixel images.
- The ability to select a range of output rates allows the algorithm to be used in different applications.
  - It allows transmission over a digital network or data link of varying capacity.
  - It also allows transmission over a single 64 kbit/s digital telephone channel for low-quality video-telephony, or at higher bit rates for improved picture quality.
- A further development of H.261 is H.263, for lower fixed transmission rates.
- H.263 deploys arithmetic coding in place of the variable length coding (see the H261 diagram); with other modifications, the data rate is reduced to only 20 kbit/s.

Model Based Coding (MBC)

- At the very low bit rates (20 kbit/s or less) associated with video telephony, the requirements for image transmission stretch the compression techniques described earlier to their limits.
- In order to achieve the necessary degree of compression, they often require a reduction in spatial resolution or even the elimination of frames from the sequence.
- Model based coding (MBC) attempts to exploit a greater degree of redundancy in images than current techniques, in order to achieve significant image compression without adversely degrading the image content information.
- It relies upon the fact that image quality is largely subjective. Provided that the appearance of scenes within an observed image is kept at a visually acceptable level, it may not matter that the observed image is not a precise reproduction of reality.
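Before moving on: the p x 64 kbit/s rate structure of H.261 described in the slides above can be captured in a couple of helper functions. This is a sketch with names of our own choosing, not part of any H.261 API.

```python
# Sketch of the H.261 "p x 64 kbit/s" rate ladder: p is an integer in 1..30,
# chosen to fit the available channel capacity.

def h261_rate_kbps(p: int) -> int:
    """Output rate in kbit/s for a given multiplier p."""
    if not 1 <= p <= 30:
        raise ValueError("H.261 requires p in 1..30")
    return p * 64

def best_p_for_channel(capacity_kbps: float) -> int:
    """Largest p in 1..30 whose p*64 kbit/s output fits the channel."""
    p = min(30, int(capacity_kbps // 64))
    if p < 1:
        raise ValueError("channel too slow for H.261")
    return p

print(h261_rate_kbps(1))         # 64: one ISDN B-channel, videophone quality
print(best_p_for_channel(2048))  # 30 -> 1920 kbit/s, near the 2 Mbit/s ceiling
```

This is why the slides describe p = 1 or 2 as videophone territory and p > 6 as the entry point for video-conferencing: the usable rate is simply a multiple of the 64 kbit/s telephone channel.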
Model Based Coding (2)

- A synthetic image is created by texture mapping detail from an initial full-face source image over the wire-frame. Facial movement can be achieved by manipulation of the vertices of the wire-frame.
- Head rotation requires the use of simple matrix operations upon the coordinate array. Facial expression requires the manipulation of the features controlling the vertices.
- This model based feature codebook approach suffers from the drawback of codebook formation.
- Codebook formation has to be done off-line and, consequently, the image is required to be prerecorded, with a consequent delay.
- However, the actual image sequence can be sent at a very low data rate. For a codebook with 128 entries, where 7 bits are required to code each mouth, a 25 frame/s sequence requires less than 200 bit/s to code the mouth movements.
- When it is finally implemented, rates as low as 1 kbit/s are confidently expected from MBC systems, but they can only transmit image sequences which match the stored model, e.g. head-and-shoulders displays.

Model Based Coding (3)

- Another common way of representing objects in three-dimensional computer graphics is by a net of interconnecting polygons.
- A model is stored as a set of linked arrays which specify the coordinates of each polygon vertex, with the lines connecting the vertices together forming each side of a polygon.
- To make realistic models, the polygon net can be shaded to reflect the presence of light sources.
- The wire-frame model [Welch 1991] can be modified to fit the shape of a person's head and shoulders. The wire-frame, composed of over 100 interconnecting triangles, can produce subjectively acceptable synthetic images, provided that the frame is not rotated by more than 30 degrees from the full-face position.
- The model (see the Figure) uses smaller triangles in areas associated with high degrees of curvature, where significant movement is required.
- Large flat areas, such as the forehead, contain fewer triangles.
- A second wire-frame is used to model the mouth interior.

Model Based Coding (4)

- One MBC method for producing an artificial image of a head sequence utilizes a feature codebook, where a range of facial expressions, sufficient to create an animation, is generated from sub-images or templates which are joined together to form a complete face.
- The most important areas of a face for conveying an expression are the eyes and mouth, hence the objective is to create an image in which the movement of the eyes and mouth is a convincing approximation to the movements of the original subject.
- When forming the synthetic image, the feature template vectors which form the closest match to those of the original moving sequence are selected from the codebook and then transmitted as low bit rate coded addresses.
- By using only 10 eye and 10 mouth templates, for instance, a total of 100 combinations exists, implying that only a 7-bit codebook address need be transmitted.
- It has been found that there are only 13 visually distinct mouth shapes for vowel and consonant formation during speech.
- However, the number of mouth sub-images is usually increased, to include intermediate expressions and hence avoid step changes in the image.

Key points:

- JPEG coding mechanism → DCT / Zigzag Scanning / Adaptive Quantization / VLC
- MPEG layered structure:
  - Pixel, Block, Macroblock, Field DCT Coding / Frame DCT Coding, Slice, Picture, Group of Pictures (GOP), Sequence, Packetized Elementary Stream (PES)
- MPEG compression mechanism:
  - Prediction
  - Motion compensation
  - Scanning
  - YCbCr formats (4:4:4, 4:2:0, etc.)
  - Profile @ Level
  - I, P, B pictures & reordering
  - Encoder/decoder process & block diagram
- MPEG Data transport
- MPEG Timing & buffer control
  - STC/SCR/DTS
  - PCR/PTS
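As a numeric sanity check of the codebook figures quoted in the Model Based Coding slides: the address width is ceil(log2(entries)), and the transmitted rate is one address per frame. The helper names below are ours, for illustration only.

```python
# Back-of-envelope check of the MBC codebook arithmetic from the slides.
import math

def address_bits(codebook_entries: int) -> int:
    """Bits needed to address one entry of the codebook."""
    return math.ceil(math.log2(codebook_entries))

def stream_rate_bps(codebook_entries: int, frames_per_s: int) -> int:
    """Bit rate when one codebook address is sent per frame."""
    return address_bits(codebook_entries) * frames_per_s

print(address_bits(128))         # 7 bits per mouth-template address
print(stream_rate_bps(128, 25))  # 175 bit/s < 200 bit/s, as the slides claim
print(address_bits(10 * 10))     # 7 bits for 100 eye/mouth combinations
```

Note that 100 combinations already needs 7 address bits (2^6 = 64 < 100), the same width as the 128-entry mouth codebook; either way, a 25 frame/s sequence stays well under 200 bit/s per coded feature.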
Technical terms

- Macroblocks
- HVS = Human Visual System
- GOP = Group of Pictures
- VLC = Variable Length Coding/Coder
- DCT/IDCT = (Inverse) Discrete Cosine Transform
- PES = Packetized Elementary Stream
- MP@ML = Main Profile @ Main Level
- PCR = Program Clock Reference
- SCR = System Clock Reference
- STC = System Time Clock
- PTS = Presentation Time Stamp
- DTS = Decode Time Stamp
- PAT = Program Association Table
- PMT = Program Map Table

Chapter 3. CATV systems

- Overview:
  - A brief history
  - Modern CATV networks
  - CATV systems and equipment

A Brief History:

- CATV appeared in the 1960s in the US, where tall buildings are great obstacles to the propagation of TV signals.
- Old CATV networks →
  - Coaxial only
  - Tree-and-branch only
  - TV only
  - No return path (→ high-pass filters are installed in customers' houses to block return-path low-frequency noise)

Modern CATV networks

- Key elements:
  - CO or Master Headend
  - Headends / Hubs
  - Server complex
  - CMTS
  - TV content provider
  - Optical nodes
  - Taps
  - Amplifiers (GNA/TNA/LE)

Modern CATV networks (2)

- Based on a Hybrid Fiber-Coaxial architecture → also referred to as "HFC networks".
- The optical section is based on modern optical communication technologies →
  - Star/ring/mesh, etc. topologies
  - SDH/SONET for digital fibers
  - Various architectures → digital, analog or mixed fiber cabling systems.
- Part of the forward path spectrum is used for high-speed Internet access.
- The return path is exploited for digital data communication → the root of new problems!
  - 5-60 MHz band for upstream
  - 88-860 MHz band for downstream
    - 88-450 MHz for analog/digital TV channels
    - 450-860 MHz for Internet access
  - FDM

Spectrum allocation of CATV networks

CATV systems and equipment

Vocabulary

- Perception = Su nhan thuc (Vietnamese: perception, awareness)
- Lap = Phu len (Vietnamese: to overlap, cover over)
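The FDM band plan from the Modern CATV slides can be expressed as a small lookup table. The band edges below are the ones quoted in the slides; real deployments vary by region (e.g. the upstream split differs between US and European plants), so treat this as an illustrative sketch.

```python
# Sketch of the CATV FDM band plan quoted in the slides: given a frequency,
# report which service occupies it.

BAND_PLAN = [
    (5, 60, "upstream (return path)"),
    (88, 450, "downstream analog/digital TV"),
    (450, 860, "downstream Internet access"),
]

def service_at(freq_mhz: float) -> str:
    """Name of the service occupying the given frequency, per the slides."""
    for low, high, name in BAND_PLAN:
        if low <= freq_mhz < high:
            return name
    return "guard band / unused"

print(service_at(30))    # upstream (return path)
print(service_at(200))   # downstream analog/digital TV
print(service_at(600))   # downstream Internet access
print(service_at(70))    # guard band / unused
```

The gap between 60 and 88 MHz falls through to "guard band / unused", which is the separation between the return path and the forward path in this plan.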