Chapter 6: Entropy and Shannon's First Theorem
Information Axioms: I(p) = the amount of information in the occurrence of an event of probability p; a quantitative measure of the amount of information any probabilistic event represents.
A. I(p) ≥ 0 for any event of probability p
B. I(p_1 ∙ p_2) = I(p_1) + I(p_2) when p_1 and p_2 are independent events (the Cauchy functional equation)
C. I(p) is a continuous function of p
Existence: I(p) = log(1/p). Units of information: base 2 = a bit, base e = a nat, base 10 = a Hartley. 6.2
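As a quick illustration of the existence claim (my addition, not part of the slides; the helper name `information` is illustrative), a minimal Python sketch of I(p) = log(1/p) in the three units:

```python
import math

def information(p: float, base: float = 2.0) -> float:
    """Amount of information I(p) = log_base(1/p) for an event of probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return math.log(1.0 / p, base)

# An event of probability 1/8 carries about 3 bits of information.
print(information(1/8, 2))          # ~3.0 bits
print(information(1/8, math.e))     # ~2.079 nats
print(information(1/8, 10))         # ~0.903 Hartleys

# Axiom B: independent events multiply probabilities and add information.
print(math.isclose(information(1/4) + information(1/2), information(1/8)))  # True
```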
Uniqueness: Suppose I′(p) satisfies the axioms. Since I′(p) ≥ 0, take any 0 < p_0 < 1 and the base k = (1/p_0)^(1/I′(p_0)). So k^I′(p_0) = 1/p_0, and hence log_k(1/p_0) = I′(p_0). Now, any z ∈ (0,1) can be written as p_0^r for some real r ∈ R⁺ (namely r = log_{p_0} z). The Cauchy functional equation implies that I′(p_0^n) = n·I′(p_0) and, for m ∈ Z⁺, I′(p_0^(1/m)) = (1/m)·I′(p_0), which gives I′(p_0^(n/m)) = (n/m)·I′(p_0), and hence by continuity I′(p_0^r) = r·I′(p_0). Hence I′(z) = r·log_k(1/p_0) = log_k(1/p_0^r) = log_k(1/z).
Note: In this proof, we introduce an arbitrary p_0, show how any z relates to it, and then eliminate the dependency on that particular p_0. 6.2
Entropy: the average amount of information received on a per-symbol basis from a source S = {s_1, …, s_q} of symbols, where s_i has probability p_i. It measures the rate.
For radix r (when all the probabilities are independent): H_r(S) = Σ_i p_i log_r(1/p_i).
Entropy is the amount of information in the probability distribution.
Alternative approach: consider a long message of N symbols from S = {s_1, …, s_q} with probabilities p_1, …, p_q. You expect s_i to appear N·p_i times, and the probability of this typical message is P = p_1^(N·p_1) ∙∙∙ p_q^(N·p_q), so I(P) = log(1/P) = N·Σ_i p_i log(1/p_i) = N·H(S). 6.3
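A minimal sketch (mine, not from the slides) of the entropy formula above; the function name `entropy` and the example distributions are illustrative:

```python
import math

def entropy(probs, radix=2):
    """H_r(S) = sum_i p_i * log_r(1/p_i), skipping zero-probability symbols."""
    assert abs(sum(probs) - 1.0) < 1e-9, "probabilities must sum to 1"
    return sum(p * math.log(1.0 / p, radix) for p in probs if p > 0)

# The source used later in the extension example: p1 = 2/3, p2 = 1/3.
print(entropy([2/3, 1/3]))                                 # ~0.9183 bits/symbol
print(entropy([0.25, 0.25, 0.125, 0.125, 0.125, 0.125]))   # ~2.5 bits/symbol
```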
Consider the function f(p) = p log(1/p). Using natural logarithms:
f′(p) = (−p ln p)′ = −p·(1/p) − ln p = −1 + ln(1/p)
f″(p) = −1/p < 0 for p ∈ (0,1), so f is concave down.
[Plot of f on (0,1): f(1) = 0, the maximum is f(1/e) = 1/e, with f′(0⁺) = ∞, f′(1/e) = 0, f′(1) = −1.] 6.3
Gibbs Inequality. Basic information about the log function: the tangent line to y = ln x at x = 1 is (y − ln 1) = (ln)′|_{x=1}·(x − 1), i.e. y = x − 1. Also (ln x)″ = (1/x)′ = −1/x² < 0, so ln x is concave down. Therefore ln x ≤ x − 1, with equality only at x = 1.
[Plot: y = ln x lies below the line y = x − 1, touching it at x = 1.] 6.4
Minimum entropy occurs when one p_i = 1 and all others are 0.
Maximum entropy occurs when? Consider the fundamental Gibbs inequality: for probability distributions (p_i) and (q_i),
Σ_i p_i ln(q_i/p_i) ≤ Σ_i p_i (q_i/p_i − 1) = Σ_i q_i − Σ_i p_i = 0 (the same sign holds in any base),
with equality iff p_i = q_i for all i. Taking q_i = 1/q for a source of q symbols gives H(S) ≤ log q, so maximum entropy occurs when all symbols are equally likely. 6.4
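A small numerical check (an addition, not from the slides) of the Gibbs inequality and the log q bound on random distributions:

```python
import math, random

def gibbs_lhs(p, q):
    """sum_i p_i * log2(q_i / p_i); Gibbs' inequality says this is <= 0."""
    return sum(pi * math.log2(qi / pi) for pi, qi in zip(p, q) if pi > 0)

random.seed(0)
for _ in range(5):
    raw_p = [random.random() for _ in range(6)]
    raw_q = [random.random() for _ in range(6)]
    p = [x / sum(raw_p) for x in raw_p]
    q = [x / sum(raw_q) for x in raw_q]
    h = sum(pi * math.log2(1 / pi) for pi in p if pi > 0)   # H_2(S)
    assert gibbs_lhs(p, q) <= 1e-12                          # always <= 0
    assert h <= math.log2(len(p)) + 1e-12                    # H_2(S) <= log2 q
print("Gibbs inequality and the max-entropy bound hold on random distributions")
```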
Entropy Examples:
S = {s_1}, p_1 = 1: H(S) = 0 (no information).
S = {s_1, s_2}, p_1 = p_2 = ½: H_2(S) = 1 (1 bit per symbol).
S = {s_1, …, s_r}, p_1 = … = p_r = 1/r: H_r(S) = 1, but H_2(S) = log_2 r.
Run-length coding (for instance, in predictive coding), binary case: p = 1 − q is the probability of a 0, and H_2(S) = p log_2(1/p) + q log_2(1/q). As q → 0 the term q log_2(1/q) dominates (compare slopes): 1/q = average run length; log_2(1/q) = number of bits needed (on average); q log_2(1/q) = average number of bits of information per bit of original code.
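To make the q → 0 behaviour concrete, a small table of the two entropy terms (an illustrative addition, not from the slides):

```python
import math

# Binary source with P(1) = q, P(0) = p = 1 - q: as q -> 0,
# the term q*log2(1/q) dominates p*log2(1/p).
for q in (0.1, 0.01, 0.001, 0.0001):
    p = 1 - q
    term_p = p * math.log2(1 / p)
    term_q = q * math.log2(1 / q)
    print(f"q={q:<7} p*log2(1/p)={term_p:.6f}  q*log2(1/q)={term_q:.6f}  "
          f"avg run length={1/q:.0f}")
```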
Entropy as a Lower Bound for Average Code Length. Given an instantaneous code with lengths l_i in radix r, let K = Σ_i r^(−l_i) ≤ 1 (Kraft inequality). Applying the Gibbs inequality with q_i = r^(−l_i)/K gives H_r(S) ≤ Σ_i p_i l_i + log_r K ≤ L, so the entropy is a lower bound on the average code length L = Σ_i p_i l_i. By the McMillan inequality, this holds for all uniquely decodable codes. Equality occurs when K = 1 (the decoding tree is complete) and p_i = r^(−l_i). 6.5
Shannon-Fano Coding. The simplest variable-length method. Less efficient than Huffman, but it allows one to code symbol s_i with length l_i directly from p_i. Given source symbols s_1, …, s_q with probabilities p_1, …, p_q, pick l_i = ⌈log_r(1/p_i)⌉. Hence log_r(1/p_i) ≤ l_i < log_r(1/p_i) + 1, i.e. p_i ≥ r^(−l_i) > p_i/r. Summing this inequality over i: 1 ≥ Σ_i r^(−l_i) > 1/r, so the Kraft inequality is satisfied and therefore there is an instantaneous code with these lengths. Multiplying log_r(1/p_i) ≤ l_i < log_r(1/p_i) + 1 by p_i and summing gives H_r(S) ≤ L < H_r(S) + 1. 6.6
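A sketch (mine, not the slides') of the Shannon-Fano length rule, checked against the Kraft inequality and the entropy bounds; the function name is illustrative:

```python
import math

def shannon_fano_lengths(probs, r=2):
    """Smallest l_i with r**(-l_i) <= p_i, i.e. l_i = ceil(log_r(1/p_i))."""
    lengths = []
    for p in probs:
        l = 1
        while r ** -l > p:          # grow l until r^(-l) <= p_i
            l += 1
        lengths.append(l)
    return lengths

probs = [1/4, 1/4, 1/8, 1/8, 1/8, 1/8]              # the example on the next slide
lengths = shannon_fano_lengths(probs)                # [2, 2, 3, 3, 3, 3]
K = sum(2 ** -l for l in lengths)                    # Kraft sum = 1.0 (complete tree)
L = sum(p * l for p, l in zip(probs, lengths))       # average length = 2.5
H = sum(p * math.log2(1 / p) for p in probs)         # H_2(S) = 2.5
print(lengths, K, L, H)                              # H <= L < H + 1, with equality here
```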
Example: p's: ¼, ¼, ⅛, ⅛, ⅛, ⅛; l's: 2, 2, 3, 3, 3, 3; K = 1; H_2(S) = 2.5; L = 5/2. [Complete binary decoding tree; e.g. the codewords 00, 01, 100, 101, 110, 111.] 6.6
The Entropy of Code Extensions. Recall: the nth extension of a source S = {s_1, …, s_q} with probabilities p_1, …, p_q is the set of symbols T = S^n = { s_i1 ∙∙∙ s_in | s_ij ∈ S, 1 ≤ j ≤ n }, where t_i = s_i1 ∙∙∙ s_in (concatenation) has probability Q_i = p_i1 ∙∙∙ p_in (multiplication), assuming independent probabilities. [Letting i = (i_1, …, i_n)_q, an n-digit number base q.] The entropy is H(T) = Σ_i Q_i log(1/Q_i), and since log(1/Q_i) = Σ_j log(1/p_ij), summing over the extension gives H(S^n) = n·H(S) (next slide). 6.8
6.8 ⇒ H(S^n) = n·H(S). Hence the average S-F code length L_n for T satisfies:
H(T) ≤ L_n < H(T) + 1 ⇒ n·H(S) ≤ L_n < n·H(S) + 1 ⇒ H(S) ≤ L_n/n < H(S) + 1/n.
This is Shannon's first (noiseless coding) theorem: by encoding ever longer extensions of the source, the average code length per source symbol can be brought arbitrarily close to the entropy.
Extension Example:
S = {s_1, s_2}, p_1 = 2/3, p_2 = 1/3; H_2(S) = (2/3)log_2(3/2) + (1/3)log_2 3 ≈ 0.918296.
Huffman coding: s_1 = 0, s_2 = 1; average coded length = (2/3)·1 + (1/3)·1 = 1.
Shannon-Fano: l_1 = 1, l_2 = 2; average coded length = (2/3)·1 + (1/3)·2 = 4/3.
2nd extension: p_11 = 4/9, p_12 = p_21 = 2/9, p_22 = 1/9. S-F: l_11 = ⌈log_2(9/4)⌉ = 2, l_12 = l_21 = ⌈log_2(9/2)⌉ = 3, l_22 = ⌈log_2 9⌉ = 4.
L_SF(2) = average coded length = (4/9)·2 + (2/9)·3·2 + (1/9)·4 = 24/9 ≈ 2.67 (= 4/3 per source symbol).
S^n = (s_1 + s_2)^n, whose probabilities are the corresponding terms in (p_1 + p_2)^n. 6.9
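To see the per-symbol S-F length approach H_2(S), a short script over the nth extension of this source (an illustrative addition; it reuses the length rule sketched earlier):

```python
import math
from itertools import product

p = {"s1": 2/3, "s2": 1/3}
H = sum(pi * math.log2(1 / pi) for pi in p.values())       # ~0.9183 bits/symbol

def sf_length(prob, r=2):
    """Smallest l with r**(-l) <= prob (the Shannon-Fano length)."""
    l = 1
    while r ** -l > prob:
        l += 1
    return l

for n in range(1, 9):
    # Probabilities of the n-th extension are products over blocks of n symbols.
    block_probs = [math.prod(p[s] for s in block) for block in product(p, repeat=n)]
    L_n = sum(q * sf_length(q) for q in block_probs)
    print(f"n={n}  L_n/n = {L_n / n:.4f}  (H = {H:.4f})")
# L_n/n is squeezed between H and H + 1/n, so it tends to H as n grows.
```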
Extension cont.: the probabilities of S^n are the terms in the expansion of (p_1 + p_2)^n, and (2 + 1)^n = 3^n gives the denominators 3^n. [The rest of this slide was equation images, including a term 2n·3^(n−1), not captured in the transcript.] 6.9
Markov Process Entropy. For an mth-order Markov source, weight the conditional information of each emitted symbol by the equilibrium probability of the state (the preceding m symbols) it is emitted from:
H(S) = Σ p(s_i1, …, s_im) Σ p(s_i | s_i1, …, s_im) log(1/p(s_i | s_i1, …, s_im)) = Σ p(s_i1, …, s_im, s_i) log(1/p(s_i | s_i1, …, s_im)). 6.10
Example: a second-order binary Markov source. [State diagram over the previous-two-bit states (0,0), (0,1), (1,0), (1,1) with the transition probabilities 0.8, 0.5, 0.2 tabulated below.]
Equilibrium probabilities: p(0,0) = p(1,1) = 5/14, p(0,1) = p(1,0) = 2/14.

previous state (s_i1, s_i2) | next symbol s_i | p(s_i | s_i1, s_i2) | p(s_i1, s_i2) | p(s_i1, s_i2, s_i)
0 0 | 0 | 0.8 | 5/14 | 4/14
0 0 | 1 | 0.2 | 5/14 | 1/14
0 1 | 0 | 0.5 | 2/14 | 1/14
0 1 | 1 | 0.5 | 2/14 | 1/14
1 0 | 0 | 0.5 | 2/14 | 1/14
1 0 | 1 | 0.5 | 2/14 | 1/14
1 1 | 0 | 0.2 | 5/14 | 1/14
1 1 | 1 | 0.8 | 5/14 | 4/14
6.11
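A quick computation (mine, not on the slide) of the Markov entropy for this example using the formula and table above:

```python
import math

# (previous state, next symbol) -> conditional probability p(s_i | state)
cond = {
    (("0", "0"), "0"): 0.8, (("0", "0"), "1"): 0.2,
    (("0", "1"), "0"): 0.5, (("0", "1"), "1"): 0.5,
    (("1", "0"), "0"): 0.5, (("1", "0"), "1"): 0.5,
    (("1", "1"), "0"): 0.2, (("1", "1"), "1"): 0.8,
}
equilibrium = {("0", "0"): 5/14, ("0", "1"): 2/14, ("1", "0"): 2/14, ("1", "1"): 5/14}

# H = sum over (state, symbol) of p(state, symbol) * log2(1 / p(symbol | state))
H = sum(equilibrium[state] * p * math.log2(1 / p) for (state, sym), p in cond.items())
print(H)   # ~0.801 bits per symbol
```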
Base Fibonacci. The golden ratio φ = (1+√5)/2 is a solution to x² − x − 1 = 0 and is equal to the limit of the ratio of adjacent Fibonacci numbers. [Diagram: a memoryless source over symbols 0, …, r−1, each with probability 1/r, has H_2 = log_2 r.]
Model the base-Fibonacci constraint (no two adjacent 1s) as a 1st-order Markov process, or equivalently think of the source as emitting the variable-length symbols 0 and 10 with probabilities 1/φ and 1/φ², where 1/φ + 1/φ² = 1. Taking the variable-length symbols into account,
Entropy = (1/φ)·log φ + ½·(1/φ²)·log φ² = (1/φ + 1/φ²)·log φ = log φ,
which is maximal.
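As a numerical check (an addition, under the assumption that the 1st-order chain is: after a 0 emit 0 with probability 1/φ and 1 with probability 1/φ², and after a 1 always emit 0):

```python
import math

phi = (1 + math.sqrt(5)) / 2
P = {("0", "0"): 1 / phi, ("0", "1"): 1 / phi**2,   # transitions out of state 0
     ("1", "0"): 1.0,     ("1", "1"): 0.0}          # a 1 must be followed by a 0

# Equilibrium probabilities: pi0 / phi^2 = pi1 and pi0 + pi1 = 1.
pi0 = phi**2 / (phi**2 + 1)
pi1 = 1 - pi0

H = 0.0
for state, pi in (("0", pi0), ("1", pi1)):
    for nxt in ("0", "1"):
        p = P[(state, nxt)]
        if p > 0:
            H += pi * p * math.log2(1 / p)
print(H, math.log2(phi))    # both ~0.6942 bits per binary digit
```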
The Adjoint System. For simplicity, consider a first-order Markov system S. Goal: bound the entropy of S by that of a source with zero memory whose probabilities are the equilibrium probabilities. Let p(s_i) and p(s_j) be the equilibrium probabilities of s_i and s_j, and p(s_j, s_i) the equilibrium probability of getting s_j s_i; note p(s_j, s_i) = p(s_i | s_j)·p(s_j). Applying the Gibbs inequality gives H(S) ≤ H(S̄), where S̄ is the adjoint zero-memory source, with equality only if p(s_j, s_i) = p(s_i)·p(s_j). [The intermediate equations on this slide were images not captured in the transcript.] 6.12 (skip)
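To illustrate the bound (an addition; it reuses the base-Fibonacci chain from the previous sketch), compare the Markov entropy with the entropy of its zero-memory adjoint built from the equilibrium probabilities:

```python
import math

phi = (1 + math.sqrt(5)) / 2
# First-order base-Fibonacci chain over the symbols 0 and 1.
P = {("0", "0"): 1 / phi, ("0", "1"): 1 / phi**2, ("1", "0"): 1.0, ("1", "1"): 0.0}
pi = {"0": phi**2 / (phi**2 + 1), "1": 1 / (phi**2 + 1)}    # equilibrium probabilities

H_markov = sum(pi[s] * p * math.log2(1 / p)
               for (s, t), p in P.items() if p > 0)
H_adjoint = sum(q * math.log2(1 / q) for q in pi.values())   # zero-memory source

print(H_markov, H_adjoint)       # ~0.694 <= ~0.851: H(S) <= H(adjoint of S)
```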
