- 1. Source Coding
  - Wireless Ad Hoc Networks, University of Tehran, Dept. of E&CE, Fall 2007, Farshad Lahouti
  - Media Basics. Contents: a brief introduction to digital media (audio/video): digitization, compression, representation, standards
- 2. Signal Digitization
  - Pulse Code Modulation (PCM): sampling
  - Sampling theory (Nyquist theorem): the discrete-time sequence { V(t_n) } of a sampled continuous function contains enough information to reproduce the function V = V(t) exactly, provided that the sampling rate is at least twice the highest frequency contained in the original signal V(t).
  - The analog signal is sampled at a constant rate:
    - telephone: 4 kHz signal bandwidth, 8,000 samples/sec
    - CD music: 22 kHz signal bandwidth, 44,100 samples/sec
- 3. Quantization
  - Discretization along the energy axis: at every time interval the signal is converted to a digital equivalent. Using 2 bits (four levels), the signal shown on the slide can be digitized.
  - Digitization examples:
    - Each sample is quantized, i.e., rounded, e.g., to one of 2^8 possible quantized values.
    - Each quantized value is represented by bits: 8 bits for 256 values.
    - Example: 8,000 samples/sec with 256 quantized values gives 64,000 bps.
    - The receiver converts it back to an analog signal, with some quality reduction.
  - Example rates:
    - CD: 1.411 Mbps (16 bits/sample, stereo)
    - MP3: 96, 128, 160 kbps
    - Internet telephony: 5.3 - 13 kbps
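The sampling and quantization figures on the last two slides follow from simple arithmetic. The sketch below is illustrative only; the function names `nyquist_rate` and `pcm_bit_rate` are made up for this example and do not come from the slides.

```python
def nyquist_rate(max_frequency_hz: float) -> float:
    """Minimum sampling rate (samples/sec) for a signal band-limited to max_frequency_hz."""
    return 2 * max_frequency_hz


def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Uncompressed PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


# Telephone speech: 4 kHz bandwidth, 8 bits per sample (256 levels), mono.
telephone_fs = nyquist_rate(4_000)
telephone_bps = pcm_bit_rate(8_000, 8)

# CD audio: 44,100 samples/sec, 16 bits per sample, stereo.
cd_bps = pcm_bit_rate(44_100, 16, channels=2)

print(telephone_fs, telephone_bps, cd_bps)
```

The CD case works out to 1,411,200 bps, which is the 1.411 Mbps quoted on the slide.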
- 4. Approximate size for 1 second of audio

  | Channels | Resolution | Fs | File size (kilobits) |
  |---|---|---|---|
  | Mono | 8 bit | 8 kHz | 64 |
  | Stereo | 8 bit | 8 kHz | 128 |
  | Mono | 16 bit | 8 kHz | 128 |
  | Stereo | 16 bit | 16 kHz | 512 |
  | Stereo | 16 bit | 44.1 kHz | 1411* |
  | Stereo | 24 bit | 44.1 kHz | 2116 |

  \* 1 CD: 700 MB, 70-80 minutes.

  - Lossy and lossless compression
    - Lossless compression (more later): data compression; APE (Monkey's Audio); image compression for biomedical applications; …
    - Lossy compression: hide errors where humans will not see or hear them; study the hearing and vision systems to understand how we see and hear; perceptual coding.
- 5. Requirements for Compression Algorithms
  - Lossless compression: the decoded signal is mathematically equivalent to the original one. Drawback: achieves only a small or modest level of compression.
  - Lossy compression: the decoded signal is of lower quality than the original one. Advantage: achieves a very high degree of compression. Objective: maximize the degree of compression for a given quality.
  - General compression requirements:
    - Ensure a good quality of the decoded signal
    - Achieve high compression ratios
    - Minimize the complexity of the encoding and decoding process
    - Support multiple channels
    - Support various data rates
    - Give a small processing delay
  - Compression tools: transform coding, variable rate coding, entropy coding, Huffman coding, run-length coding, predictive coding (DPCM, ADPCM)
- 6. Variable Length Coding
  - Ignores the semantics of the input data and compresses media streams by regarding them as sequences of digits or symbols. Examples: run-length encoding, Huffman encoding, ...
  - Run-length encoding: a compression technique that replaces consecutive occurrences of a symbol with the symbol and the number of times it is repeated.
    - a a a a a => 5a
    - 000000000000000000001111111 => 0x20 1x7
    - Most useful where symbols appear in long runs, e.g., images that have areas where the pixels all have the same value, such as fax and cartoons.
  - Entropy coding: a few words about entropy
    - Entropy: a measure of information content.
    - Entropy of the English language: how much information does each character in "typical" English text contain?
    - From a probability view: if the probability of a binary event is 0.5 (like a coin), then, on average, you need one bit to represent the result of this event.
    - As the probability of a binary event increases or decreases, the number of bits you need, on average, to represent the result decreases.
    - Unless an event is totally random, you can convey the information of the event in fewer bits, on average, than it might first appear.
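A minimal sketch of run-length encoding as described on this slide, using the slide's own two examples. The function names and the (symbol, count) pair representation are assumptions made for this illustration; practical formats pack runs differently.

```python
from itertools import groupby


def rle_encode(symbols: str) -> list[tuple[str, int]]:
    """Replace each run of identical symbols with a (symbol, run length) pair."""
    return [(sym, len(list(run))) for sym, run in groupby(symbols)]


def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Expand (symbol, run length) pairs back into the original string."""
    return "".join(sym * n for sym, n in pairs)


bits = "0" * 20 + "1" * 7      # the slide's second example: 20 zeros, then 7 ones
print(rle_encode("aaaaa"))     # one run of five 'a'
print(rle_encode(bits))        # one run of zeros, one run of ones
```

On input with no repeated symbols every run has length 1, so the encoded form is larger than the input. That is why the slide restricts the technique to data with long runs.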
- 7. Entropy (Shannon 1948)
  - For a set of messages $S$ with probability $p(s)$, $s \in S$, the self-information of $s$ is

    $$i(s) = \log \frac{1}{p(s)} = -\log p(s),$$

    measured in bits if the log is base 2. The lower the probability, the higher the self-information.
  - Entropy is the weighted average of self-information:

    $$H(S) = \sum_{s \in S} p(s) \log \frac{1}{p(s)}$$

  - Entropy examples:
    - $p(S) = \{0.25, 0.25, 0.25, 0.125, 0.125\}$: $H(S) = 3 \times 0.25 \log_2 4 + 2 \times 0.125 \log_2 8 = 2.25$
    - $p(S) = \{0.5, 0.125, 0.125, 0.125, 0.125\}$: $H(S) = 0.5 \log_2 2 + 4 \times 0.125 \log_2 8 = 2$
    - $p(S) = \{0.75, 0.0625, 0.0625, 0.0625, 0.0625\}$: $H(S) = 0.75 \log_2 (4/3) + 4 \times 0.0625 \log_2 16 \approx 1.3$
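The three worked examples on this slide can be checked numerically. The helper below is a small sketch; the name `entropy` is chosen for this illustration.

```python
import math


def entropy(probs: list[float]) -> float:
    """Shannon entropy in bits: the probability-weighted average of log2(1/p)."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)


examples = [
    [0.25, 0.25, 0.25, 0.125, 0.125],
    [0.5, 0.125, 0.125, 0.125, 0.125],
    [0.75, 0.0625, 0.0625, 0.0625, 0.0625],
]
for probs in examples:
    print(probs, round(entropy(probs), 2))
```

The more skewed the distribution, the lower the entropy, which is the point the three examples make.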
- 8. Statistical (Entropy) Coding
  - Entropy coding: lossless coding that takes advantage of the probabilistic nature of information. Examples: Huffman coding, arithmetic coding.
  - Theorem (Shannon, lower bound): for any probability distribution $p(S)$ with an associated uniquely decodable code $C$, $H(S) \le l_a(C)$, where $l_a(C)$ is the average codeword length.
  - Recall Huffman coding: a popular compression technique that assigns variable-length codes to symbols, so that the most frequently occurring symbols have the shortest codes. It is particularly effective where the data are dominated by a small number of symbols.
  - Suppose we encode a source of N = 8 symbols {a, b, c, d, e, f, g, h} with probabilities P(a) = 0.01, P(b) = 0.02, P(c) = 0.05, P(d) = 0.09, P(e) = 0.18, P(f) = 0.2, P(g) = 0.2, P(h) = 0.25.
    - If we assign 3 bits per symbol (N = 2^3 = 8), the average length is 3 bits/symbol.
    - The theoretical lowest average length is the entropy: $H(P) = -\sum_{i=1}^{N} P(i) \log_2 P(i) \approx 2.58$ bits/symbol.
    - If we use Huffman encoding, the average length is 2.63 bits/symbol.
- 9. Huffman Coding (Cont'd)
  - The Huffman code assignment procedure is based on a binary tree structure. The tree is built by a sequence of pairing operations in which the two least probable symbols are joined at a node to form two branches of the tree. More precisely:
    1. The probabilities of the source symbols are associated with the leaves of a binary tree.
    2. Take the two smallest probabilities in the list and generate an intermediate node as their parent. Label the branch from the parent to one child 1 and the branch to the other child 0.
    3. Replace the two probabilities and their nodes in the list by the single new intermediate node carrying the sum of the two probabilities. If the list contains only one element, quit. Otherwise, go to step 2.
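The pairing procedure above can be sketched with a priority queue. This is one possible implementation, not the one used to produce the slides; the function name `huffman_codes` is an assumption, and which child gets 0 or 1 is an arbitrary choice, so the codewords may differ from the slide's tree while the code lengths stay the same. The probabilities are those of the eight-symbol source on the previous slide.

```python
import heapq
from itertools import count


def huffman_codes(probs: dict[str, float]) -> dict[str, str]:
    """Build a Huffman code by repeatedly merging the two least probable nodes."""
    tie = count()  # unique tie-breaker so the heap never has to compare dicts
    heap = [(p, next(tie), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p_a, _, node_a = heapq.heappop(heap)   # smallest probability
        p_b, _, node_b = heapq.heappop(heap)   # second smallest
        # Label one branch 1 and the other 0 by prefixing every code below it.
        merged = {sym: "1" + code for sym, code in node_a.items()}
        merged.update({sym: "0" + code for sym, code in node_b.items()})
        heapq.heappush(heap, (p_a + p_b, next(tie), merged))
    return heap[0][2]


P = {"a": 0.01, "b": 0.02, "c": 0.05, "d": 0.09,
     "e": 0.18, "f": 0.2, "g": 0.2, "h": 0.25}
codes = huffman_codes(P)
average_length = sum(P[s] * len(codes[s]) for s in P)
print(codes)
print(round(average_length, 2))
```

For this source the code lengths come out as 6, 6, 5, 4, 3, 2, 2, 2 bits for a through h, giving the 2.63 bits/symbol average quoted on the previous slide.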
- 10. Huffman Coding (Cont'd)
  - The new average length of the source is 2.63 bits/symbol, and the efficiency of this code (entropy divided by average length) is about 98%.
  - How do we estimate the P(i)? By the relative frequency of the symbols.
  - How do we decode the bit stream? Encoder and decoder share the same Huffman table.
  - How do we decode the variable-length codes? Prefix codes have the property that no codeword can be the prefix (i.e., an initial segment) of any other codeword. Huffman codes are prefix codes! 11010000000010001 => ?
  - Do the best possible codes guarantee to always reduce the size of a source? No, worst cases exist; Huffman coding is better on average. It is particularly effective where the data are dominated by a small number of symbols.
  - Transform Coding
    - Frequency analysis in the time domain? Not easy! Time domain -> transform domain.
    - The sequence to be coded is converted into a new sequence using a transformation rule. The new sequence consists of the transform coefficients.
    - The process is reversible: the original sequence is recovered using the inverse transformation.
    - Example: Fourier transform (FT). The coefficients represent the proportion of energy contributed by different frequencies.
- 11. Transform Coding (Cont'd)
  - In transform coding, choose the transformation such that only a subset of the coefficients have significant values. The energy is confined to a subset of "important" coefficients; this is known as "energy compaction". Example: FT of a bandlimited signal.
  - Differential Coding: DPCM & ADPCM
    - Based on the fact that neighboring samples … x(n-1), x(n), x(n+1), … in a discrete-time sequence change slowly in many applications, e.g., voice, audio, …
    - A differential PCM coder (DPCM) quantizes and encodes the difference d(n) = x(n) - x(n-1).
    - Advantage of using the difference d(n) instead of the actual value x(n): fewer bits are needed to represent a sample.
    - General DPCM: d(n) = x(n) - a_1 x(n-1) - a_2 x(n-2) - … - a_k x(n-k), where a_1, a_2, …, a_k are fixed.
    - Adaptive DPCM: a_1, a_2, …, a_k change dynamically with the signal.
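A minimal sketch of the first-order case d(n) = x(n) - x(n-1). It deliberately leaves out the quantizer, so it is lossless, and it assumes the sample before the first one is 0; both simplifications are choices made for this illustration, as are the function names.

```python
def dpcm_encode(samples: list[int]) -> list[int]:
    """Return d(n) = x(n) - x(n-1), taking x(-1) = 0."""
    previous = 0
    diffs = []
    for x in samples:
        diffs.append(x - previous)
        previous = x
    return diffs


def dpcm_decode(diffs: list[int]) -> list[int]:
    """Rebuild x(n) by accumulating the differences."""
    previous = 0
    samples = []
    for d in diffs:
        previous += d
        samples.append(previous)
    return samples


x = [100, 102, 105, 104, 104, 101]   # a slowly changing signal
d = dpcm_encode(x)
print(d)                             # after the first value, the differences stay small
print(dpcm_decode(d) == x)
```

After the first sample the differences span a much smaller range than the samples themselves, which is where the bit saving comes from.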
- 12. Psychoacoustics
  - Human aural response and the psychoacoustic model. Basically: if you can't hear the sound, don't encode it.
  - Natural bandlimiting: audio perception spans 20 Hz to 20 kHz, but most sounds are in the low frequencies (e.g., 2 kHz to 4 kHz). See the human frequency response.
  - Frequency masking: if a stronger sound and a weaker sound compete, you can't hear the weaker sound. Don't encode it.
  - Temporal masking: after a loud sound, there is a while before we can hear a soft sound.
  - Stereo redundancy: at low frequencies, we can't detect where the sound is coming from. Encode it mono.
- 13. Perceptual Coding: Examples
  - MP3 = MPEG 1/2 layer 3 audio; achieves CD quality at about 192 kbps (a 3.7:1 compression ratio); higher compression is possible.
  - Sony MiniDisc uses Adaptive Transform Coding (ATRAC) to achieve a 5:1 compression ratio (about 141 kbps).
  - http://www.mpeg.org http://www.minidisc.org/aes_atrac.html
  - Artefacts of compression:
    - Some areas of the spectrum are lost in the encoding process.
    - MP3-encoded recordings rarely sound identical to the original uncompressed audio files.
    - On small or PC speakers, however, MP3-compressed audio can be acceptable.
- 14. Examples
  - Audio samples by file size: (1.12MB); 128kbps (105KB); 96Kbps (78.9KB); 64kbps (52.6KB)
  - WAV file (34Mb); MP3 file (3Mb)
- 15. LPC and Parametric Coding
  - LPC (Linear Predictive Coding)
    - Based on a model of the human speech-production organs: s(n) = a_1 s(n-1) + a_2 s(n-2) + … + a_k s(n-k) + e(n)
    - Estimate a_1, a_2, …, a_k and e(n) for each piece (frame) of speech.
    - Encode and transmit/store a_1, a_2, …, a_k and the type of e(n).
    - The decoder reproduces speech using a_1, a_2, …, a_k and e(n): very low bit rate but relatively low speech quality.
  - Parametric coding: only the parameters of a sound generation model are coded.
    - LPC is an example, where the parameters are a_1, a_2, …, a_k, e(n).
    - Musical instrument parameters: pitch, loudness, timbre, …
- 16. Speech Compression
  - Handling speech together with other media such as text, images, video, and data is an essential part of multimedia applications.
  - The ideal speech coder has a low bit rate, high perceived quality, low signal delay, and low complexity.
    - Delay: less than 150 ms one-way end-to-end delay for a conversation; processing (coding) delay plus network delay; over Internet, ISDN, PSTN, ATM, …
    - Complexity: the computational complexity of speech coders depends on the algorithms; it contributes to the achievable bit rate and processing delay.
    - Quality: from "intelligible" to "natural" or "subjective" quality, depending on bit rate.
    - Bit rate
  - G.72x Speech Coding Standards
- 17. G.72x Audio Coding Standards
  - Silence compression: detect the "silence", similar to run-length coding.
  - Adaptive Differential Pulse Code Modulation (ADPCM), e.g., in CCITT G.721: 16 or 32 Kb/s.
    - (a) Encodes the difference between two or more consecutive signals; the difference is then quantized, hence the loss.
    - (b) Adapts the quantization so that fewer bits are used when the value is smaller. It is necessary to predict where the waveform is headed, which is difficult.
  - Linear Predictive Coding (LPC) fits the signal to a speech model and then transmits the parameters of the model; it sounds like a computer talking, 2.4 Kb/s.
  - Video Digitization and Compression
    - Video is a sequence of images (frames) displayed at a constant frame rate, e.g., 24 images/sec.
    - A digital image is a 2-D array of pixels; each pixel is represented by bits, as R:G:B or Y:U:V.
      - Y = 0.299R + 0.587G + 0.114B (luminance or brightness)
      - U = B - Y (chrominance 1, color difference)
      - V = R - Y (chrominance 2, color difference)
    - Redundancy: spatial and temporal.
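The colour conversion on this slide translates directly into code. The sketch below uses exactly the three equations as written there, with no scaling or offset on U and V (practical standards usually add both); the function name `rgb_to_yuv` is an assumption for this example.

```python
def rgb_to_yuv(r: float, g: float, b: float) -> tuple[float, float, float]:
    """Luminance Y and the two colour differences U = B - Y, V = R - Y."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    return y, b - y, r - y


print(rgb_to_yuv(255, 255, 255))   # white: full luminance, colour differences near zero
print(rgb_to_yuv(255, 0, 0))       # pure red: large positive V, negative U
```

Any grey pixel (R = G = B) has essentially zero chrominance, so all of its information sits in the Y plane.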
- 18. Intra-frame coding
  - Pipeline: Transform -> Quantize -> Encode
  - JPEG (Joint Photographic Experts Group)
    - Original size: 640 x 480 x 3 = 922 KB
    - JPEG compression ratios: 30:1 to 50:1 compression is possible with small to moderate defects; 100:1 compression is quite feasible for low-quality purposes.
  - JPEG steps:
    1. Block preparation: from RGB to YUV (YIQ) planes, 8x8 blocks
    2. Transform: 2-D Discrete Cosine Transform (DCT) on blocks (lossy?)
    3. Quantization: quantize the DCT coefficients (lossy)
    4. Encoding of the quantized coefficients (lossless)
       - Zigzag scan
       - Differential Pulse Code Modulation (DPCM) on the DC component
       - Run Length Encoding (RLE) on the AC components
       - Entropy coding: Huffman or arithmetic
- 19. JPEG
  - Compression: Block Preparation -> Transform -> Quantize -> Encode. Decompression: reverse the order.
  - (1) Block Preparation: RGB input data, then the data after block preparation.
    - Input image: 640 x 480 RGB (24 bits/pixel), transformed to three planes:
    - Y: (640 x 480, 8 bits/pixel) luminance (brightness) plane
    - U, V: (320 x 240, 8 bits/pixel) chrominance (color) planes
- 20. (2) Discrete Cosine Transform (DCT)
  - A transformation from the spatial domain to the frequency domain (similar to the FFT).
  - Definition of the 8-point DCT: F[0,0] is the DC component and the other F[u,v] define the AC components of the DCT.
  - The 64 (8 x 8) DCT basis functions are indexed by u and v, with the DC component at u = v = 0.
  - Block-based 2-D DCT. Karhunen-Loeve (KL) transform?
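The slide's formula did not survive in the transcript. The sketch below assumes the definition commonly given for the JPEG 8x8 forward DCT, F[u,v] = 1/4 C(u) C(v) Σ_x Σ_y f(x,y) cos((2x+1)uπ/16) cos((2y+1)vπ/16) with C(0) = 1/√2 and C(k) = 1 otherwise. It is a direct, unoptimized evaluation of that sum, and the function name `dct_8x8` is made up for this example.

```python
import math


def dct_8x8(block: list[list[float]]) -> list[list[float]]:
    """Direct 2-D DCT of an 8x8 block (assumed JPEG-style normalization)."""
    def c(k: int) -> float:
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            total = 0.0
            for x in range(8):
                for y in range(8):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * total
    return out


flat = [[8.0] * 8 for _ in range(8)]   # a block with no spatial variation
F = dct_8x8(flat)
print(round(F[0][0], 6))               # all of the energy lands in the DC term
```

A constant block is the extreme case of energy compaction: only F[0,0] is nonzero and the 63 AC coefficients vanish up to rounding error.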
- 21. 8x8 DCT Example
  - Original values of an 8x8 block (in the spatial domain) and the corresponding DCT coefficients (in the frequency domain, indexed by u and v, with the DC component at the origin).
  - (3) Quantized DCT Coefficients
    - Uniform quantization: divide by a constant N and round the result.
    - In JPEG, each DCT coefficient F[u,v] is divided by a constant q(u,v) taken from the quantization table (filter?), and the result F[u,v] / q(u,v) is rounded.
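A small sketch of the divide-and-round step. The coefficient values and table entries below are invented for illustration and are not taken from the slide or from any standard table. Note that Python's built-in `round` rounds exact halves to the nearest even integer, which may differ from a particular codec's rounding rule.

```python
def quantize(coeffs: list[list[float]], qtable: list[list[int]]) -> list[list[int]]:
    """Divide each coefficient F[u][v] by q[u][v] and round to an integer level."""
    return [[round(f / q) for f, q in zip(f_row, q_row)]
            for f_row, q_row in zip(coeffs, qtable)]


def dequantize(levels: list[list[int]], qtable: list[list[int]]) -> list[list[int]]:
    """Decoder side: multiply the levels back. The rounding error is not recoverable."""
    return [[level * q for level, q in zip(l_row, q_row)]
            for l_row, q_row in zip(levels, qtable)]


coeffs = [[192.0, -13.2], [4.9, 0.7]]   # hypothetical DCT coefficients
qtable = [[16, 11], [10, 14]]           # hypothetical quantization steps
levels = quantize(coeffs, qtable)
print(levels)
print(dequantize(levels, qtable))
```

Small coefficients collapse to zero and cannot be restored by the decoder. This is the lossy step in the pipeline, and it creates the runs of zeros that the later encoding stages exploit.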
- 22. (4) Zigzag Scan
  - Maps an 8x8 block into a 1 x 64 vector. The zigzag pattern groups the low-frequency coefficients at the top of the vector.
  - (5) Encoding of Quantized DCT Coefficients
    - DC components: the DC component of a block is large and varied, but often close to the DC value of the previous block. Encode the difference from the DC component of the previous 8x8 block using Differential Pulse Code Modulation (DPCM).
    - AC components: the 1x64 vector has lots of zeros in it. Using RLE, encode it as (skip, value) pairs, where skip is the number of zeros and value is the next non-zero component. Send (0,0) as the end-of-block value.
- 23. (6) Runlength Coding
  - A typical 8x8 block of quantized DCT coefficients. Most of the higher-order coefficients have been quantized to 0.
    - Row 1: 12 34 0 54 0 0 0 0
    - Row 2: 87 0 0 12 3 0 0 0
    - Row 3: 16 0 0 0 0 0 0 0
    - Rows 4 to 8: all zeros
  - Zig-zag scan, the sequence of DCT coefficients to be transmitted: 12 34 87 16 0 0 54 0 0 0 0 0 0 12 0 0 3 0 0 0 .....
  - The DC coefficient (12) is sent via a separate Huffman table.
  - Runlength coding of the remaining coefficients: 34 | 87 | 16 | 0 0 54 | 0 0 0 0 0 0 12 | 0 0 3 | 0 0 0 .....
  - Further compression: statistical (entropy) coding.
  - JPEG Example: the original image, the quantization table used, and compressed images at compression ratios 7.7, 12.3, 33.9, and 60.1. Blocking artifact (JPEG 2000?).
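The zigzag scan and the (skip, value) run-length step can be reproduced on the block from this slide. The sketch is simplified: it ignores limits that real JPEG places on run lengths and does not perform the entropy-coding stage. The function names are assumptions for this example.

```python
def zigzag_order(n: int = 8) -> list[tuple[int, int]]:
    """Visit an n x n block by anti-diagonals, alternating direction on each one."""
    return sorted(
        ((i, j) for i in range(n) for j in range(n)),
        # Odd diagonals run top-to-bottom (row increasing), even ones bottom-to-top.
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else -p[0]),
    )


def zigzag(block: list[list[int]]) -> list[int]:
    """Flatten a square block into a vector in zigzag order."""
    return [block[i][j] for i, j in zigzag_order(len(block))]


def rle_ac(ac: list[int]) -> list[tuple[int, int]]:
    """Encode AC coefficients as (skip, value) pairs, ending with (0, 0)."""
    pairs, skip = [], 0
    for value in ac:
        if value == 0:
            skip += 1
        else:
            pairs.append((skip, value))
            skip = 0
    pairs.append((0, 0))   # end-of-block marker; trailing zeros are implied
    return pairs


block = [
    [12, 34, 0, 54, 0, 0, 0, 0],
    [87, 0, 0, 12, 3, 0, 0, 0],
    [16, 0, 0, 0, 0, 0, 0, 0],
] + [[0] * 8 for _ in range(5)]

vector = zigzag(block)
print(vector[:20])          # matches the sequence listed on the slide
print(rle_ac(vector[1:]))   # the DC term vector[0] is coded separately
```

For this block the 63 AC coefficients reduce to seven pairs, including the end-of-block marker.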
- 24. MPEG: Inter-Frame Coding
  - Intra-coded I-frame; predicted P-frame
  - Motion estimation + compensation
- 25. Video compression: A big picture
  - Bi-directional prediction: intra-coded I-frame, predicted P-frame, bi-directional B-frame
  - Group of frames (GOF): I B B P B B P B B P B B I
  - Q: 3D transform coding?
- 26. VBR vs CBR: Rate Control
  - Variable bit rate (VBR): fixed quantizer Qp; "constant" quality. E.g. RMVB.
  - Constant bit rate (CBR): adaptive quantizer; "constant" rate, easier control. The difference compared to the target rate can be 0.5% or less. E.g. RM, MPEG-1.
  - Diagram: raw video -> encoder -> VBR -> smoothing buffer -> CBR, with a rate controller adjusting Qp.
  - Rate-distortion optimization. Recall that the transport layer also has rate control …
  - Standardization Organizations
    - ITU-T VCEG (Video Coding Experts Group): standards for advanced moving image coding methods appropriate for conversational and non-conversational audio/visual applications.
    - ISO/IEC MPEG (Moving Picture Experts Group): standards for compression and coding, decompression, processing, and coded representation of moving pictures, audio, and their combination.
    - WG: work group. SG: study group.
    - ISO/IEC JTC 1/SC 29/WG 1: Coding of Still Pictures. ISO/IEC JTC 1/SC 29/WG 11: Generic Coding of Moving Pictures and Associated Audio.
    - Relation: ITU-T H.262 ~ ISO/IEC 13818-2 (MPEG-2); ITU-T H.263 ~ ISO/IEC 14496-2 (MPEG-4).
- 27. Coding Rate and Standards
  - Figure: applications and standards along a bit-rate axis (8, 16, 64, 384 kbit/s; 1.5, 5, 20 Mbit/s). Applications: mobile videophone, videophone over PSTN, ISDN videophone, Video CD, digital TV, HDTV. Ranges: very low, low, medium, and high bitrate. Standards: MPEG-4, H.263, H.261, MPEG-1, MPEG-2.
  - ISO MPEG-1 (Moving Pictures Experts Group)
    - Progressively scanned video for multimedia applications, at a bit rate of 1.5 Mb/s; access time for CD-ROM players.
    - Video format: near VHS quality.
- 28. ISO MPEG-2
  - Standard for digital television and DVD.
  - 4 to 8 Mb/s / 10 to 15 Mb/s, much higher than MPEG-1.
  - Supports various modes of scalability (spatial, temporal, SNR).
  - There are differences in quantization, and better variable-length code tables for progressive video sequences.
  - ISO MPEG-4
    - A much broader standard. MPEG-4 was aimed primarily at low bit rate video communication, but is not limited to it.
    - Applications: 1. Digital television 2. Interactive graphics applications 3. Interactive multimedia (World Wide Web)
    - Two versions: DivX 3 and DivX 4 (Internet world)
    - Important concept: the video object
- 29. MPEG-4 Object Video
  - Instead of "frames": Video Object Planes (VOPs)
  - Shape Adaptive DCT (SA DCT)
  - Figure: a video frame, its alpha map, a VOP, and the background VOP.
  - MPEG-4 Structure
    - Figure: bitstream -> MUX -> one A/V decoder per object -> compositor -> audio/video scene.
- 30. Example
  - A scene composed of Object 1, Object 2, Object 3, and Object 4. Problems, comments?
  - Another example
- 31. Status
  - Microsoft, RealVideo, QuickTime, ... but only rectangular frame based.
  - H.264 = MPEG-4 Part 10 (2003)
  - Shape coding
  - Synthetic scene
  - H.264
    - H.26x (x = 1, 2, 3): ITU-T Recommendations, for real-time video communication applications.
    - MPEG standards: video storage, broadcast video, video streaming applications.
    - H.26L = ITU-T + MPEG = JVT coding: the latest project of the Joint Video Team formed by ITU-T SG16 Q6 (VCEG) and ISO/IEC JTC 1/SC 29 WG 11 (MPEG).
    - Basic configuration similar to H.263 and MPEG-4 Part 2.
- 32. H.264 Design Goals
  - Enhanced compression performance.
  - Provision of a network-friendly, packet-based video representation addressing conversational and non-conversational applications.
  - Conceptual separation between the Video Coding Layer (VCL) and the Network Adaptation Layer (NAL).
  - H.264 Design (Cont'd)
    - Figure: Video Coding Layer (control data, macro-block), data partitioning, slice/partition, Network Adaptation Layer.
- 33. H.264 Design (Cont'd)
  - Video Coding Layer
    - Core high-compression representation.
    - Block-based motion-compensated transform video coder.
    - New features enabled to achieve a significant improvement in coding efficiency.
  - Network Adaptation Layer
    - Provides the ability to customize the format of the VCL data over a variety of networks.
    - Unique packet-based interface.
    - Packetisation and appropriate signaling are part of the NAL specification.
  - Video Coding Evolution: H.264. Source: Y. Wang, J. Ostermann, Y.-Q. Zhang, Digital Video Processing and Communication. Prentice Hall, 2001.
