Noise, Information Theory, and Entropy


Published in: Technology, Education

  1. Noise, Information Theory, and Entropy. CS414, Spring 2007. By Roger Cheng (Huffman coding slides courtesy of Brian Bailey).
  2. Why study noise? It is present in all systems of interest, and we have to deal with it. By knowing its characteristics, we can fight it better, and we can create models to evaluate it analytically.
  3. Communication system abstraction. Sender side: information source → encoder → modulator → channel. Receiver side: channel → demodulator → decoder → output signal.
  4. The additive noise channel. The transmitted signal s(t) is corrupted by a noise source n(t), and the resulting received signal is r(t) = s(t) + n(t). Noise could result from many sources, including electronic components and transmission interference.
  5. Random processes. A random variable is the result of a single measurement. A random process is an indexed collection of random variables, or equivalently a non-deterministic signal that can be described by a probability distribution. Noise can be modeled as a random process.
  6. WGN (White Gaussian Noise) properties:
      - At each time instant t = t0, the value of n(t) is normally distributed with mean 0 and variance σ² (i.e., E[n(t0)] = 0, E[n(t0)²] = σ²).
      - At any two different time instants, the values of n(t) are uncorrelated (i.e., E[n(t0)n(tk)] = 0).
      - The power spectral density of n(t) has equal power in all frequency bands.
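The first two WGN properties can be checked numerically. A minimal sketch, assuming NumPy is available; σ = 2 is an arbitrary illustrative choice, since the slides leave it symbolic:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 2.0
n = rng.normal(loc=0.0, scale=sigma, size=1_000_000)

mean = n.mean()                    # property 1: E[n(t0)] = 0
power = np.mean(n ** 2)            # property 1: E[n(t0)^2] = sigma^2 = 4
corr = np.mean(n[:-1] * n[1:])     # property 2: values at two different
                                   # time instants are uncorrelated

print(mean, power, corr)           # close to 0, 4, and 0 respectively
```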
  7. WGN continued. When an additive noise channel has a white Gaussian noise source, we call it an AWGN channel; it is the most frequently used model in communications. Reasons why we use this model: it is easy to understand and compute, and it applies to a broad class of physical channels.
  8. Signal energy and power. Energy is defined as ε_x = ∫_{−∞}^{∞} |x(t)|² dt. Power is defined as P_x = lim_{T→∞} (1/T) ∫_{−T/2}^{T/2} |x(t)|² dt. Most signals have either finite energy and zero power, or infinite energy and finite power. Noise power is hard to compute in the time domain; the power of WGN is its variance σ².
  9. Signal-to-Noise Ratio (SNR). Defined as the ratio of signal power to the noise power corrupting the signal. It is usually more practical to measure SNR on a dB scale: SNR_dB = 10 log₁₀(P_signal / P_noise). Obviously, we want as high an SNR as possible.
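The dB-scale SNR is a one-line computation; a sketch with illustrative power values (not from the slides):

```python
import math

def snr_db(signal_power, noise_power):
    """SNR on a decibel scale: 10 * log10(P_signal / P_noise)."""
    return 10 * math.log10(signal_power / noise_power)

# A signal 100x stronger than the noise corrupting it sits at 20 dB;
# equal signal and noise power is 0 dB.
print(snr_db(100.0, 1.0))   # 20.0
print(snr_db(1.0, 1.0))     # 0.0
```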
  10. SNR example: audio. The original sound file is CD quality (16-bit, 44.1 kHz sampling rate). [Audio clips: original, 40 dB, 20 dB, 10 dB SNR.]
  11. Analog vs. digital. In an analog system, any amount of noise will create distortion at the output. In a digital system, a relatively small amount of noise will cause no harm at all, while too much noise will make decoding of the received signal impossible. In both, the goal is to limit the effects of noise to a manageable/satisfactory amount.
  12. Information theory and entropy. Information theory tries to solve the problem of communicating as much data as possible over a noisy channel. The measure of data is entropy. Claude Shannon first demonstrated that reliable communication over a noisy channel is possible, jump-starting the digital age.
  13. Entropy definitions.
      - Shannon entropy: H(X) = −Σ_x p(x) log₂ p(x)
      - Binary entropy formula: H(p) = −p log₂ p − (1−p) log₂(1−p)
      - Differential entropy: h(X) = −∫ f(x) log f(x) dx
  14. Properties of entropy. Entropy can be defined as the expectation of −log p(x) (i.e., H(X) = E[−log p(x)]). It is not a function of a variable's values; it is a function of the variable's probabilities. It is usually measured in "bits" (using logs of base 2) or "nats" (using logs of base e). It is maximized when all values are equally likely (a uniform distribution), equal to 0 when only one value is possible, and cannot be negative.
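These properties can be illustrated with a small Shannon-entropy helper, a sketch using the usual convention that zero-probability terms contribute 0:

```python
import math

def entropy(probs):
    """Shannon entropy in bits: sum of -p * log2(p) over nonzero p."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # 1.0: maximized by the uniform distribution
print(entropy([1.0]))        # 0.0: only one value is possible
print(entropy([0.25] * 4))   # 2.0: uniform over four values
```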
  15. Joint and conditional entropy. Joint entropy H(X,Y) is the entropy of the pairing (X,Y). Conditional entropy H(X|Y) is the entropy of X if the value of Y is known. Relationship between the two (the chain rule): H(X,Y) = H(Y) + H(X|Y).
  16. Mutual information. Mutual information I(X;Y) is how much information about X can be obtained by observing Y: I(X;Y) = H(X) − H(X|Y).
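As a sketch, mutual information can be computed directly from a joint distribution; the form below, Σ p(x,y) log₂[p(x,y)/(p(x)p(y))], is equivalent to H(X) − H(X|Y):

```python
import math

def mutual_information(joint):
    """I(X;Y) from a joint distribution given as {(x, y): probability}."""
    px, py = {}, {}
    for (x, y), p in joint.items():    # marginals p(x) and p(y)
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# Independent X and Y: observing Y tells us nothing about X.
print(mutual_information({(0, 0): 0.25, (0, 1): 0.25,
                          (1, 0): 0.25, (1, 1): 0.25}))   # 0.0
# Y is an exact copy of X: observing Y gives all of H(X) = 1 bit.
print(mutual_information({(0, 0): 0.5, (1, 1): 0.5}))     # 1.0
```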
  17. Mathematical model of a channel. Assume that our input to the channel is X and the output is Y. Then the characteristics of the channel can be defined by its conditional probability distribution p(y|x).
  18. Channel capacity and rate. Channel capacity is defined as the maximum possible value of the mutual information; we choose the best input distribution f(x) to maximize C. For any rate R < C, we can transmit information with an arbitrarily small probability of error.
  19. Binary symmetric channel. The correct bit is transmitted with probability 1−p; the wrong bit is transmitted with probability p, sometimes called the "crossover probability". Capacity: C = 1 − H(p, 1−p).
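The binary symmetric channel's capacity can be evaluated directly, writing H(p, 1−p) as the binary entropy function:

```python
import math

def h2(p):
    """Binary entropy H(p, 1-p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(p):
    """Capacity of a binary symmetric channel with crossover probability p."""
    return 1 - h2(p)

print(bsc_capacity(0.0))   # 1.0: a noiseless channel carries 1 bit per use
print(bsc_capacity(0.5))   # 0.0: the output is independent of the input
print(bsc_capacity(0.1))   # about 0.53 bits per channel use
```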
  20. Binary erasure channel. The correct bit is transmitted with probability 1−p; an "erasure" is transmitted with probability p. Capacity: C = 1 − p.
  21. Coding theory. Information theory only gives us an upper bound on the communication rate; we need coding theory to find a practical method to achieve a high rate. There are two types: source coding, which compresses source data to a smaller size, and channel coding, which adds redundancy bits to make transmission across a noisy channel more robust.
  22. Source-channel separation theorem. Shannon showed that when dealing with one transmitter and one receiver, we can break up source coding and channel coding into separate steps without loss of optimality. This does not apply when there are multiple transmitters and/or receivers; network information theory principles are needed in those cases.
  23. Huffman encoding. Use the probability distribution to determine how many bits to use for each symbol: higher-frequency symbols are assigned shorter codes. It is an entropy-based, block-variable coding scheme.
  24. Huffman encoding produces a code which uses a minimum number of bits to represent each symbol: no code-word scheme can represent the same sequence using fewer bits per symbol. It is optimal among codes that use code words, though this may differ slightly from the theoretical lower limit. Build a Huffman tree to assign the codes.
  25. Informal problem description. Given a set of symbols from an alphabet and their probability distribution (assumed known and stable), find a prefix-free binary code with minimum weighted path length. Prefix-free means that no codeword is a prefix of any other codeword.
  26. Huffman algorithm. Construct a binary tree of codes: leaf nodes represent the symbols to encode, interior nodes represent cumulative probability, and edges are assigned a 0 or 1 output code. Construct the tree bottom-up: repeatedly connect the two nodes with the lowest probability until no more nodes remain to connect.
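The bottom-up construction can be sketched in a few lines with a priority queue. A minimal implementation, assuming probabilities sum to 1; the tie-break counter only keeps the heap from ever comparing the dicts:

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Build a Huffman code {symbol: bit string} from {symbol: probability},
    repeatedly merging the two lowest-probability nodes bottom-up."""
    tick = count()
    heap = [(p, next(tick), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, lo = heapq.heappop(heap)   # lowest probability
        p1, _, hi = heapq.heappop(heap)   # second lowest
        merged = {s: "0" + c for s, c in lo.items()}      # 0 edge
        merged.update({s: "1" + c for s, c in hi.items()})  # 1 edge
        heapq.heappush(heap, (p0 + p1, next(tick), merged))
    return heap[0][2]

# The slide-27 alphabet: code lengths come out as A:2, B:2, C:3, D:3, E:2.
code = huffman_code({"A": 0.25, "B": 0.30, "C": 0.12, "D": 0.15, "E": 0.18})
print({s: len(c) for s, c in sorted(code.items())})
```

The exact bit patterns depend on the arbitrary 0/1 edge assignment, but the code lengths match the slide's solution and the result is prefix-free.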
  27. Huffman example. Construct the Huffman coding tree (in class).
      Symbol (S)   P(S)
      A            0.25
      B            0.30
      C            0.12
      D            0.15
      E            0.18
  28. Characteristics of the solution. The lowest-probability symbol is always furthest from the root. The assignment of 0/1 to children edges is arbitrary; other solutions are possible, but the code lengths remain the same. If two nodes have equal probability, either can be selected. Notes: the code is prefix-free, and construction is O(n lg n).
      Symbol (S)   Code
      A            11
      B            00
      C            010
      D            011
      E            10
  29. Example encoding/decoding. Encode "BEAD" ⇒ 001011011. Decode "0101100" (exercise), using the same code:
      Symbol (S)   Code
      A            11
      B            00
      C            010
      D            011
      E            10
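Using the code table above, encoding and decoding can be sketched as follows; the prefix-free property is exactly what lets the decoder emit a symbol as soon as its bit buffer matches a codeword:

```python
# Code table from slide 28.
CODE = {"A": "11", "B": "00", "C": "010", "D": "011", "E": "10"}

def encode(text):
    return "".join(CODE[s] for s in text)

def decode(bits):
    inverse = {c: s for s, c in CODE.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:    # prefix-free: first match is a full symbol
            out.append(inverse[buf])
            buf = ""
    assert buf == "", "bit string did not end on a codeword boundary"
    return "".join(out)

print(encode("BEAD"))     # 001011011, matching the slide
print(decode("0101100"))  # CAB
```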
  30. Entropy (theoretical limit). H = −Σ_{i=1}^{N} p(sᵢ) log₂ p(sᵢ)
      = −(0.25 log₂ 0.25 + 0.30 log₂ 0.30 + 0.12 log₂ 0.12 + 0.15 log₂ 0.15 + 0.18 log₂ 0.18)
      ≈ 2.24 bits
      (Same symbol table and code as slide 28.)
  31. Average codeword length. L = Σ_{i=1}^{N} p(sᵢ) · length(code(sᵢ))
      = 0.25(2) + 0.30(2) + 0.12(3) + 0.15(3) + 0.18(2)
      = 2.27 bits
      (Same symbol table and code as slide 28.)
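The two numbers on slides 30 and 31 can be reproduced directly, using the probabilities from slide 27 and the code lengths from the slide-28 table:

```python
import math

probs = {"A": 0.25, "B": 0.30, "C": 0.12, "D": 0.15, "E": 0.18}
lengths = {"A": 2, "B": 2, "C": 3, "D": 3, "E": 2}

H = -sum(p * math.log2(p) for p in probs.values())  # theoretical limit
L = sum(probs[s] * lengths[s] for s in probs)       # average codeword length

print(round(H, 2))   # 2.24 bits
print(round(L, 2))   # 2.27 bits
```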
  32. Code length relative to entropy. Huffman coding reaches the entropy limit when all probabilities are negative powers of 2 (i.e., 1/2, 1/4, 1/8, 1/16, etc.). In general, H ≤ L ≤ H + 1, where H = −Σ_{i=1}^{N} p(sᵢ) log₂ p(sᵢ) and L = Σ_{i=1}^{N} p(sᵢ) · length(code(sᵢ)).
  33. Example. With symbols A (p = 0.01, code 1) and B (p = 0.99, code 0):
      H = −0.01 log₂ 0.01 − 0.99 log₂ 0.99 ≈ 0.08 bits
      L = 0.01(1) + 0.99(1) = 1 bit
  34. Limitations. Huffman coding diverges from the lower limit when the probability of a particular symbol becomes high, because it always uses an integral number of bits. The code book must be sent with the data, which lowers overall efficiency. The frequency distribution must be determined in advance and must remain stable over the data set.