# Medium Information Quantity



1. Medium Information Quantity, by Vitalie Scurtu
2. History
   - Shannon, "A Mathematical Theory of Communication" (1948)
   - Lossless compression (Shannon-Fano, adaptive Huffman)
   - Today it is used in cryptography, in the analysis of DNA sequences (http://pnylab.com/pny/papers/cdna/cdna/index.html), and in Natural Language Processing applications
3. What is information quantity?
   The information quantity of a phenomenon depends on its frequency.
   - Low information quantity: I am telling you things you have already heard many times, nothing new
     - It rains in London
     - The economy is in crisis
     - Berlusconi went out with escorts
   - High information quantity: I am telling you things you have never or rarely heard, much new information
     - Today it snows in Rome
     - Dentists are on strike
4. Entropy and the medium information quantity
   - Entropy (information quantity): H(m) = -Σ p(m) · log p(m)
   - Medium information quantity: V(m) = (1/n) · Σ (1 - p(m))
   - Here p(m) is the probability that m occurs, H(m) is the entropy or information quantity (IQ), and V(m) is the medium information quantity (MIQ)
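The two formulas above can be sketched in Python. This is a minimal illustration, assuming that p(w) is the relative frequency of a word within the text, that the MIQ sum runs over the n word tokens, and that the logarithm is base 2 (the slide does not fix a base); the function names are my own.

```python
import math
from collections import Counter

def entropy(text):
    """Shannon entropy H(m) = -sum p(w) * log2 p(w), where p(w) is the
    relative frequency of word w among the text's tokens."""
    words = text.lower().split()
    n = len(words)
    counts = Counter(words)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def mean_information_quantity(text):
    """Medium information quantity V(m) = (1/n) * sum (1 - p(w)),
    averaged here over the n word tokens of the text."""
    words = text.lower().split()
    n = len(words)
    counts = Counter(words)
    return sum(1 - counts[w] / n for w in words) / n
```

A text consisting of one repeated word gets H = 0 and V = 0; a text of n distinct words gets H = log2(n) and V = 1 - 1/n, which already hints at the later slides: H is unbounded while V stays below 1.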
5. Probability coefficients: plot of y = p(x) for x = 1..n; lim p(x) → 1
6. Logarithm coefficients: plot of y = log(x) for x = 0..1; lim log(x) → -∞ as x → 0
7. Coefficients of Shannon entropy: plot of y = -p(x) · log(p(x))
   - Very likely words: in, the, is, has, of
   - Very unlikely words: APT, …
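The behaviour of the plotted term -p · log(p) can be checked numerically. A small sketch (base-2 logarithm assumed, as in the earlier example): the contribution vanishes at both extremes, which is why both very frequent and very rare words add little to the entropy sum.

```python
import math

def entropy_term(p):
    """Contribution of a word with probability p to the entropy sum:
    -p * log2(p). Small for both very likely and very unlikely words."""
    return -p * math.log2(p)

# The contribution peaks at p = 1/e and falls toward 0 at both ends,
# so extremely common words ("in", "the") and extremely rare words
# both contribute little to the total entropy.
for p in (0.001, 0.01, 1 / math.e, 0.9, 0.999):
    print(f"p = {p:.3f} -> -p*log2(p) = {entropy_term(p):.4f}")
```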
8. Document distribution by entropy: a Zipf distribution (long tail), with few documents at the high end and many at the low end; MIN = 0, MAX ≈ 1700 (no upper bound); plot of y = H(doc) for doc = 1..n
9. Document distribution by medium information quantity: a Gaussian distribution, with few documents at the extremes and most around the middle values; MIN = 0, MAX = 1.0; plot of y = V(doc) for doc = 1..n
10. Document distribution by medium information quantity: plot of y = V(doc) for doc = 1..n
11. Entropy depends on text length: correlation 0.99 with length, the highest correlation, almost identical
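The length dependence can be illustrated with a toy construction (my own, not from the slides): as a text with fresh vocabulary grows, its entropy grows without bound (log2 n for n distinct words), while the medium information quantity stays bounded below 1 (it equals 1 - 1/n), matching the correlations reported in the conclusions.

```python
import math
from collections import Counter

def entropy(words):
    """Shannon entropy over word relative frequencies, base 2."""
    n = len(words)
    counts = Counter(words)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def miq(words):
    """Medium information quantity: mean of (1 - p(w)) over tokens."""
    n = len(words)
    counts = Counter(words)
    return sum(1 - counts[w] / n for w in words) / n

# Texts of n distinct synthetic words: entropy grows with length,
# MIQ converges toward (but never reaches) 1.
for n in (10, 100, 1000):
    words = [f"w{i}" for i in range(n)]
    print(f"n = {n:5d}  H = {entropy(words):7.3f}  V = {miq(words):.4f}")
```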
12. Conclusions
    - Very low correlation of MIQ with text length: 0.05, vs. 0.985 for entropy
    - Correlation of MIQ with IQ: 0.57
    - Entropy depends on text length; MIQ does not, so it can find anomalies
    - MIQ gives information about text style
    - MIQ complements IQ
13. The End. Questions? Email scurtu19@gmail.com