Lec5 Compression


  1. Compression: for sending and storing information (text, audio, images, videos)
  2. Common Applications <ul><li>Text compression </li></ul><ul><ul><li>lossless; gzip uses Lempel-Ziv (LZ77) coding, about 3:1 compression </li></ul></ul><ul><ul><li>better than Huffman alone </li></ul></ul><ul><li>Audio compression </li></ul><ul><ul><li>lossy; MPEG audio, 3:1 to 24:1 compression </li></ul></ul><ul><ul><li>MPEG = Moving Picture Experts Group </li></ul></ul><ul><li>Image compression </li></ul><ul><ul><li>lossy; JPEG, about 3:1 compression </li></ul></ul><ul><ul><li>JPEG = Joint Photographic Experts Group </li></ul></ul><ul><li>Video compression </li></ul><ul><ul><li>lossy; MPEG video, about 27:1 compression </li></ul></ul>
  3. Text Compression <ul><li>Prefix code: one, of many, approaches </li></ul><ul><ul><li>no code is a prefix of any other code </li></ul></ul><ul><ul><li>constraint: lossless </li></ul></ul><ul><ul><li>tasks </li></ul></ul><ul><ul><ul><li>encode: text (string) -> code </li></ul></ul></ul><ul><ul><ul><li>decode: code -> text </li></ul></ul></ul><ul><ul><li>main goal: maximally reduce storage, measured by the compression ratio </li></ul></ul><ul><ul><li>minor goals: </li></ul></ul><ul><ul><ul><li>simplicity </li></ul></ul></ul><ul><ul><ul><li>efficiency: time and space </li></ul></ul></ul><ul><ul><ul><ul><li>some methods require a code dictionary or 2 passes over the data </li></ul></ul></ul></ul>
  4. Simplest Text Encoding <ul><li>Run-length encoding </li></ul><ul><li>Requires a special character, say @ </li></ul><ul><li>Example source: </li></ul><ul><ul><li>ACCCTGGGGGAAAACCCCCC </li></ul></ul><ul><li>Encoding: </li></ul><ul><ul><li>A@C3T@G5@A4@C6 </li></ul></ul><ul><li>Method </li></ul><ul><ul><li>any run of 3 or more identical characters is replaced by @char# </li></ul></ul><ul><li>+: simple </li></ul><ul><li>-: needs a special character, non-optimal </li></ul>
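The run-length scheme above can be sketched in a few lines of Python (a minimal illustration; the @ marker and the 3-character threshold are taken from the slide):

```python
def rle_encode(s, marker="@"):
    """Run-length encode s: any run of 3 or more identical
    characters becomes marker + char + run length."""
    out = []
    i = 0
    while i < len(s):
        j = i
        while j < len(s) and s[j] == s[i]:
            j += 1                      # extend the current run
        run = j - i
        if run >= 3:
            out.append(marker + s[i] + str(run))
        else:
            out.append(s[i] * run)      # short runs are copied verbatim
        i = j
    return "".join(out)
```

On the slide's example, rle_encode("ACCCTGGGGGAAAACCCCCC") yields A@C3T@G5@A4@C6.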
  5. Shannon’s Information Theory (1948): How well can we encode? <ul><li>Shannon’s goal: reduce the size of messages for improved communication </li></ul><ul><li>What messages would be easiest/hardest to send? </li></ul><ul><ul><li>Random bits are hardest - no redundancy or pattern </li></ul></ul><ul><li>Formal definition: S, a set of symbols si with probabilities pi </li></ul><ul><li>Information content of S = -sum pi*log2(pi) </li></ul><ul><ul><li>a measure of randomness </li></ul></ul><ul><ul><li>more random means less predictable, hence higher information content! </li></ul></ul><ul><li>Theorem: this is the only measure with several natural properties </li></ul><ul><li>Information is not knowledge </li></ul><ul><li>Compression relies on finding regularities or redundancies. </li></ul>
  6. Example <ul><li>Send A, C, T, G, each occurring 1/4 of the time </li></ul><ul><li>Code: A-00, C-01, T-10, G-11 </li></ul><ul><li>2 bits per letter: no surprise </li></ul><ul><li>Average message length: </li></ul><ul><ul><li>prob(A)*codelength(A)+prob(C)*codelength(C)+… </li></ul></ul><ul><ul><li>1/4*2+…. = 2 bits. </li></ul></ul><ul><li>Now suppose: </li></ul><ul><ul><li>prob(A) = 13/16 and the others 1/16 each </li></ul></ul><ul><ul><li>Codes: A-1, C-00, G-010, T-011 (prefix) </li></ul></ul><ul><ul><li>13/16*1 + 1/16*2 + 1/16*3 + 1/16*3 = 21/16 ≈ 1.31 </li></ul></ul><ul><li>What is the best possible result? Part of the answer: </li></ul><ul><li>The information content! But how to achieve it? </li></ul>
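The quantities on this slide can be checked directly (a small Python sketch; the probabilities and code lengths are the slide's own):

```python
from math import log2

def entropy(probs):
    """Shannon information content: -sum p * log2(p), in bits."""
    return -sum(p * log2(p) for p in probs)

def avg_code_length(probs, lengths):
    """Expected bits per symbol for the given code lengths."""
    return sum(p * l for p, l in zip(probs, lengths))

# the slide's skewed distribution: A = 13/16, C = G = T = 1/16
probs = [13 / 16, 1 / 16, 1 / 16, 1 / 16]
lengths = [1, 2, 3, 3]      # codes 1, 00, 010, 011
```

Here avg_code_length gives 21/16 ≈ 1.31 bits, while the entropy is about 0.99 bits, so a small gap remains between this code and the theoretical optimum.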
  7. 7. Understanding Entropy/Information
  8. The Shannon-Fano Algorithm <ul><li>Earliest algorithm: heuristic divide and conquer </li></ul><ul><li>Illustration: source text with only the letters ABCDE </li></ul><ul><li>Symbol A B C D E </li></ul><ul><li>---------------------------------- </li></ul><ul><li>Count 15 7 6 6 5 </li></ul><ul><li>Intuition: frequent letters get short codes </li></ul><ul><li>1. Sort symbols by decreasing frequency/probability, i.e. ABCDE. </li></ul><ul><li>2. Recursively divide into two parts, each with approximately the same total count. </li></ul><ul><li>This is an instance of “balancing”, which is NP-complete. </li></ul><ul><li>Note: variable-length codes. </li></ul>
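The two steps can be implemented as a recursive split (an illustrative sketch; choosing the split point that minimizes the count imbalance is one reasonable reading of step 2):

```python
def shannon_fano(symbols):
    """symbols: list of (symbol, count) pairs, sorted by decreasing count.
    Returns a dict mapping symbol -> binary code string."""
    codes = {}

    def split(group, prefix):
        if len(group) == 1:             # a single symbol becomes a leaf
            codes[group[0][0]] = prefix or "0"
            return
        total = sum(c for _, c in group)
        best = None
        for i in range(1, len(group)):  # pick the most balanced cut
            left = sum(c for _, c in group[:i])
            diff = abs(2 * left - total)
            if best is None or diff < best[0]:
                best = (diff, i)
        cut = best[1]
        split(group[:cut], prefix + "0")
        split(group[cut:], prefix + "1")

    split(symbols, "")
    return codes
```

On the ABCDE counts above this reproduces the codes in the result table: 00, 01, 10, 110, 111.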
  9. Shannon-Fano Tree (figure: binary tree with 0/1 edge labels)
  10. Result for this distribution <ul><li>Symbol Count log2(1/p) Code (# of bits) </li></ul><ul><li>------ ----- --------- ---- ------------ </li></ul><ul><li>A 15 1.38 00 30 </li></ul><ul><li>B 7 2.48 01 14 </li></ul><ul><li>C 6 2.70 10 12 </li></ul><ul><li>D 6 2.70 110 18 </li></ul><ul><li>E 5 2.96 111 15 </li></ul><ul><li>TOTAL (# of bits): 89 </li></ul><ul><li>average message length = 89/39 ≈ 2.3 </li></ul><ul><li>Note: the prefix property enables decoding </li></ul><ul><li>Can you do better? </li></ul><ul><li>Theoretical optimum = -sum pi*log2(pi) = entropy </li></ul>
  11. Code Tree Method/Analysis <ul><li>Binary tree method </li></ul><ul><li>Internal nodes have left/right references: </li></ul><ul><ul><li>0 means go to the left </li></ul></ul><ul><ul><li>1 means go to the right </li></ul></ul><ul><li>Leaf nodes store the value </li></ul><ul><li>Decode time-cost per character is the code length: O(log N) for balanced trees </li></ul><ul><li>Decode space-cost is O(N) </li></ul><ul><ul><li>quick argument: number of leaves = number of internal nodes + 1. </li></ul></ul><ul><ul><li>Proof: induction on the </li></ul></ul><ul><ul><ul><li>number of internal nodes. </li></ul></ul></ul><ul><li>Prefix property: each code uniquely determines its character. </li></ul>
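A decode walk over such a code tree might look like this (a sketch; tuples stand in for internal nodes and strings for leaf characters):

```python
def decode(bits, root):
    """Walk the code tree: 0 goes left, 1 goes right; emit at each leaf."""
    out, node = [], root
    for b in bits:
        node = node[0] if b == "0" else node[1]
        if isinstance(node, str):       # reached a leaf: emit and restart
            out.append(node)
            node = root
    return "".join(out)

# tree for the codes A=00, B=01, C=10, D=110, E=111
tree = (("A", "B"), ("C", ("D", "E")))
```

The prefix property is what lets the walk restart at the root immediately after each leaf, with no separators needed in the bit stream.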
  12. Code Encode(character) <ul><li>Again can use a binary prefix tree </li></ul><ul><li>For encode and decode could use hashing </li></ul><ul><ul><li>yields O(1) expected encode/decode time </li></ul></ul><ul><ul><li>O(N) space cost (N is the size of the alphabet) </li></ul></ul><ul><li>For compression, the main goal is reducing storage size </li></ul><ul><ul><li>in the example it’s the total number of bits </li></ul></ul><ul><ul><li>code size for a single character = its depth in the tree </li></ul></ul><ul><ul><li>code size for a document = sum of (frequency of char * depth of char) </li></ul></ul><ul><ul><li>different trees yield different storage efficiency </li></ul></ul><ul><ul><li>What’s the best tree? </li></ul></ul>
  13. Huffman Code <ul><li>Provably optimal: i.e. yields minimum storage cost </li></ul><ul><li>Algorithm: CodeTree huff(document) </li></ul><ul><ul><li>1. Compute the frequency of each char and make a leaf node for it </li></ul></ul><ul><ul><ul><li>a leaf node has a count field and a character </li></ul></ul></ul><ul><ul><li>2. Remove the 2 nodes with the least counts and create a new node whose count is the sum of their counts and whose sons are the removed nodes. </li></ul></ul><ul><ul><ul><li>an internal node has 2 node ptrs and a count field </li></ul></ul></ul><ul><ul><li>3. Repeat 2 until only 1 node is left. </li></ul></ul><ul><ul><li>4. That’s it! </li></ul></ul>
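The four steps map directly onto a priority queue (a Python sketch using heapq; the unique-id tie-breaker is an implementation detail, not part of the slide's algorithm):

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build Huffman codes by repeatedly merging the two
    least-frequent nodes, as in the algorithm above."""
    # heap entries are [count, unique_id, payload]; payload is a
    # character (leaf) or a pair of child entries (internal node)
    heap = [[count, i, ch] for i, (ch, count) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    n = len(heap)
    while len(heap) > 1:
        a = heapq.heappop(heap)         # two least counts
        b = heapq.heappop(heap)
        n += 1
        heapq.heappush(heap, [a[0] + b[0], n, (a, b)])
    codes = {}

    def walk(node, prefix):
        payload = node[2]
        if isinstance(payload, str):    # leaf: record the code
            codes[payload] = prefix or "0"
        else:                           # internal: 0 left, 1 right
            walk(payload[0], prefix + "0")
            walk(payload[1], prefix + "1")

    walk(heap[0], "")
    return codes
```

On the ABCDE counts used earlier, this yields a total of 87 bits, slightly better than Shannon-Fano's 89.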
  14. Bad code example
  15. Tree, a la Huffman
  16. Tree with codes: note the prefix property
  17. Tree Cost
  18. Analysis <ul><li>Intuition: the least frequent chars get the longest codes; the most frequent chars get the shortest codes. </li></ul><ul><li>Let T be a minimal code tree. (Induction) </li></ul><ul><ul><li>All internal nodes have 2 sons. (by construction) </li></ul></ul><ul><ul><li>Lemma: if c1 and c2 are the least frequently used chars, then they are at the deepest depth </li></ul></ul><ul><ul><ul><li>Proof: </li></ul></ul></ul><ul><ul><ul><ul><li>if they are not deepest, exchanging them with deeper chars reduces the total cost (number of bits), contradicting minimality </li></ul></ul></ul></ul>
  19. Analysis (continued) <ul><li>Sk: the Huffman algorithm on k chars produces an optimal code. </li></ul><ul><ul><li>S2: obvious </li></ul></ul><ul><ul><li>Sk => Sk+1 </li></ul></ul><ul><ul><ul><li>Let T be an optimal code tree on k+1 chars </li></ul></ul></ul><ul><ul><ul><li>By the lemma, the two least frequent chars are deepest </li></ul></ul></ul><ul><ul><ul><li>Replace the two least frequent chars by a new char with frequency equal to the sum of theirs </li></ul></ul></ul><ul><ul><ul><li>Now we have a tree on k chars </li></ul></ul></ul><ul><ul><ul><li>By induction, Huffman yields an optimal tree on the k chars; expanding the merged char back gives an optimal tree on k+1 chars. </li></ul></ul></ul>
  20. Lempel-Ziv <ul><li>Input: string of characters </li></ul><ul><li>Internal: dictionary of (codeword, word) pairs </li></ul><ul><li>Output: string of codewords and characters. </li></ul><ul><li>Codewords are distinct from characters. </li></ul><ul><li>In the algorithm, w is a string, c is a character, and w+c means concatenation. </li></ul><ul><li>When adding a new word to the dictionary, a new codeword needs to be assigned. </li></ul>
  21. Lempel-Ziv Algorithm <ul><li>w = NIL; </li></ul><ul><li>while ( read a character c ) </li></ul><ul><li>{ </li></ul><ul><li>if w+c exists in the dictionary </li></ul><ul><li>w = w+c; </li></ul><ul><li>else </li></ul><ul><li>add w+c to the dictionary; </li></ul><ul><li>output the code for w; </li></ul><ul><li>w = c; </li></ul><ul><li>} </li></ul>
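The loop above, in its LZW form, can be written in Python as follows (a sketch; preassigning codes 0-255 to single bytes is the classic LZW convention, not something stated on the slide):

```python
def lzw_encode(text):
    """LZW sketch: the dictionary starts with all single characters;
    each new word w+c gets the next free code."""
    dictionary = {chr(i): i for i in range(256)}
    w, out = "", []
    for c in text:
        if w + c in dictionary:
            w = w + c                   # grow the current match
        else:
            out.append(dictionary[w])   # emit the code for w
            dictionary[w + c] = len(dictionary)
            w = c
    if w:                               # flush the final match
        out.append(dictionary[w])
    return out
```

For example, lzw_encode("ABABABA") emits codes for A, B, AB, ABA, reusing dictionary entries as repeats appear.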
  22. Adaptive Encoding <ul><li>Webster has 157,000 entries: could encode each word in about 18 bits (2^18 > 157,000) </li></ul><ul><ul><li>but that only works for words in the dictionary </li></ul></ul><ul><ul><li>Don’t want to do two passes </li></ul></ul><ul><li>Adaptive Huffman </li></ul><ul><ul><li>modify the model on the fly </li></ul></ul><ul><li>Lempel-Ziv 1977 </li></ul><ul><li>LZW = Lempel-Ziv-Welch </li></ul><ul><ul><li>1984, used in compress (UNIX) </li></ul></ul><ul><ul><li>uses a dictionary method </li></ul></ul><ul><ul><li>maps a variable number of symbols to a fixed-length code </li></ul></ul><ul><ul><li>better with large documents - finds repetitive patterns </li></ul></ul>
  23. Audio Compression <ul><li>Sounds can be represented as a vector-valued function </li></ul><ul><li>At any point in time, a sound is a combination of different frequencies at different strengths </li></ul><ul><li>For example, each note on a piano yields a specific frequency. </li></ul><ul><li>Also, our ears, like pianos, have cilia that respond to specific frequencies. </li></ul><ul><li>Just as sin(x) can be approximated by a small number of terms, e.g. x - x^3/6 + x^5/120 - …, so can sound. </li></ul><ul><li>Transforming a sound into its “spectrum” is done mathematically by a Fourier transform. </li></ul><ul><li>The spectrum can be played back, as on a computer with a sound card. </li></ul>
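The Fourier transform mentioned above can be illustrated with a naive discrete Fourier transform (a pure-Python sketch; real systems use the much faster FFT):

```python
import cmath
import math

def dft(samples):
    """Naive discrete Fourier transform: time samples -> frequency bins."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
            for k in range(n)]

# a pure tone: 5 cycles across 64 samples
wave = [math.sin(2 * math.pi * 5 * t / 64) for t in range(64)]
spectrum = [abs(x) for x in dft(wave)]
# all of the tone's energy lands in bin 5 (and its mirror, bin 64 - 5)
```

This is why spectra compress well: a sound dominated by a few frequencies needs only a few coefficients to reconstruct.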
  24. Audio <ul><li>Using many frequencies, as in CDs, yields a good approximation; using few frequencies, as in telephones, a poor one </li></ul><ul><li>Sampling frequencies yields compression ratios between 6:1 and 24:1, depending on the sound and the quality desired </li></ul><ul><li>High-priced electronic pianos store and reuse “samples” of concert pianos </li></ul><ul><li>Low-pass filter: removes/reduces high frequencies (losing high frequencies is a common problem with aging ears) </li></ul><ul><li>High-pass filter: removes/reduces low frequencies </li></ul><ul><li>Can use differential methods: </li></ul><ul><ul><li>only report changes in the sound </li></ul></ul>
  25. Image Compression <ul><li>with or without loss, mostly with </li></ul><ul><ul><li>who cares about what the eye can’t see </li></ul></ul><ul><li>Black-and-white images can be regarded as functions from the plane (R^2) into the reals (R), as in old TVs </li></ul><ul><ul><li>positions vary continuously, but our eyes can’t see the discreteness at around 100 pixels per inch. </li></ul></ul><ul><li>Color images can be regarded as functions from the plane into R^3, the RGB space. </li></ul><ul><ul><li>Colors vary continuously, but our eyes sample colors with only 3 different receptor types (RGB) </li></ul></ul><ul><li>Mathematical theories yield close approximations </li></ul><ul><ul><li>there are spatial analogues of Fourier transforms </li></ul></ul>
  26. Image Compression <ul><li>faces can be done with eigenfaces </li></ul><ul><ul><li>images can be regarded as points in R^(big) </li></ul></ul><ul><ul><li>choose good bases and use the most important vectors </li></ul></ul><ul><ul><li>i.e. approximate with fewer dimensions </li></ul></ul><ul><ul><li>JPEG, MPEG, GIF are compressed image formats </li></ul></ul>
  27. Video Compression <ul><li>Uses the DCT (discrete cosine transform) </li></ul><ul><ul><li>Note: nice functions can be approximated by </li></ul></ul><ul><ul><ul><li>a sum of x, x^2, … with appropriate coefficients </li></ul></ul></ul><ul><ul><ul><li>a sum of sin(x), sin(2x), … with the right coefficients </li></ul></ul></ul><ul><ul><ul><li>almost any infinite sum of functions </li></ul></ul></ul><ul><ul><li>The DCT is good because few terms give good results on images. </li></ul></ul><ul><ul><li>Differential methods are used: </li></ul></ul><ul><ul><ul><li>only report changes between frames </li></ul></ul></ul>
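A one-dimensional DCT-II sketch shows why a few terms suffice for smooth data (illustrative only; JPEG and MPEG apply a two-dimensional DCT to 8x8 pixel blocks):

```python
import math

def dct2(x):
    """Unnormalized DCT-II of a sequence x."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi * k * (2 * t + 1) / (2 * n))
                for t in range(n))
            for k in range(n)]

ramp = [t / 15 for t in range(16)]   # a smooth gradient, like a sky region
coeffs = dct2(ramp)
# nearly all the energy sits in the first few coefficients,
# so the rest can be quantized away with little visible loss
```

Smooth image regions behave like this ramp, which is exactly why "few terms give good results" in the DCT basis.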
  28. Summary <ul><li>Issues: </li></ul><ul><ul><li>Context: what problem are you solving, and what is an acceptable solution? </li></ul></ul><ul><ul><li>evaluation: compression ratios </li></ul></ul><ul><ul><li>fidelity, if lossy </li></ul></ul><ul><ul><ul><li>approximation, quantization, transforms, differential methods </li></ul></ul></ul><ul><ul><li>adaptive, if on-the-fly, e.g. movies, TV </li></ul></ul><ul><ul><li>Different sources call for different approaches </li></ul></ul><ul><ul><ul><li>cartoons versus cities versus outdoor scenes </li></ul></ul></ul><ul><ul><li>code book separate or not </li></ul></ul><ul><ul><li>fixed- or variable-length codes </li></ul></ul>