Medium Information Quantity

Published in: Technology, Business
1. Medium Information Quantity, by Vitalie Scurtu
2. History
   ● Shannon, "A Mathematical Theory of Communication" (1948)
   ● Lossless compression (Shannon-Fano, adaptive Huffman coding)
   ● Used today in cryptography, in the analysis of DNA sequences (http://pnylab.com/pny/papers/cdna/cdna/index.html), and in Natural Language Processing applications
3. What is information quantity? The information quantity of a phenomenon depends on its frequency.
   ● Low information quantity
     ○ It rains in London
     ○ The economy is in crisis
     ○ Berlusconi went with escorts
     Low information quantity: I am telling you things you have already heard many times; nothing new.
   ● High information quantity
     ○ Today it snows in Rome
     ○ Dentists on strike
     High information quantity: I am telling you things you have never or rarely heard; much new information.
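The intuition above can be made numeric with Shannon's surprisal, -log2(p): the rarer an event, the more information it carries. A minimal sketch; the probabilities are made-up illustrative values, not from the slides:

```python
import math

def surprisal(p):
    # Information carried by an event of probability p, in bits.
    return -math.log2(p)

# A frequent event carries little information, a rare one carries a lot.
print(surprisal(0.5))       # common event ("it rains in London"): 1 bit
print(surprisal(1 / 1024))  # rare event ("it snows in Rome"): 10 bits
```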
4. Entropy and the medium information quantity
   ● The formulas of information quantity:
     ○ H(m) = -Σ p(m) · log p(m)
     ○ V(m) = (1/n) Σ (1 - p(m))
   ● where:
     ■ p(m) is the probability that m will happen
     ■ H(m) is the entropy, or information quantity (IQ)
     ■ V(m) is the medium information quantity (MIQ)
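The two formulas can be sketched in Python, estimating p(m) from relative word frequencies. The whitespace tokenizer, the log base (2), and the reading of V(m)'s sum as running over the n tokens of the text (which makes V(m) = 1 - Σ p(w)²) are assumptions; the slides do not pin these down:

```python
import math
from collections import Counter

def word_probs(text):
    # Estimate p(w) as relative frequency; whitespace tokenization
    # is an assumption, the slides do not specify a tokenizer.
    words = text.lower().split()
    n = len(words)
    counts = Counter(words)
    return {w: c / n for w, c in counts.items()}, words

def entropy(text):
    # H(m) = -sum p(m) * log p(m)   (log base 2 assumed)
    probs, _ = word_probs(text)
    return -sum(p * math.log2(p) for p in probs.values())

def mean_information_quantity(text):
    # V(m) = (1/n) * sum (1 - p(m)), read here as a sum over the
    # n tokens; algebraically this equals 1 - sum_w p(w)^2.
    probs, words = word_probs(text)
    return sum(1 - probs[w] for w in words) / len(words)
```

On a text made of one repeated word both H(m) and V(m) are 0; the more evenly spread the vocabulary, the closer V(m) gets to 1.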
5. Probability coefficients. [Plot of y = p(x) for x = 1..n; p(x) tends to 1.]
6. Logarithm coefficients. [Plot of y = log(x) for x in (0, 1]; log(x) tends to -∞ as x approaches 0.]
7. Coefficients of Shannon entropy: -p(x) · log(p(x))
   ● Very likely words: in, the, is, has, of
   ● Very unlikely words: APT,
   [Plot of y = -x · log(x) for x = p(1..n).]
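A small sketch of the per-word contribution -p · log2(p) shows why both very likely and very unlikely words add little to the total entropy; the example probabilities below are assumptions for illustration:

```python
import math

def contribution(p):
    # A word with probability p contributes -p * log2(p) to H(m).
    return -p * math.log2(p)

# Very likely words (p near 1) and very unlikely words (p near 0)
# both contribute little; the contribution peaks at p = 1/e.
print(contribution(0.9))          # frequent word: small contribution
print(contribution(0.0001))       # rare word: small contribution
print(contribution(1 / math.e))   # maximal contribution
```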
8. Distribution of documents by their entropy: Zipf distribution (long tail), with few documents at the high end and many at the low end.
   ● MIN = 0
   ● MAX = 1700 (no upper limit)
   [Plot: x = doc(1..n), y = H(doc(1..n)).]
9. Distribution of documents by their medium information quantity: Gaussian distribution, with few documents at the extremes and the majority around the middle values.
   ● MIN = 0
   ● MAX = 1.0
   [Plot: x = doc(1..n), y = V(doc(1..n)).]
10. Distribution of documents by their medium information quantity (continued). [Plot: x = doc(1..n), y = V(doc(1..n)).]
11. Entropy depends on the text length. Correlation: 0.99, an extremely high correlation; the two are almost identical.
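This dependence can be illustrated on synthetic data: sampling words from a fixed Zipf-like distribution, the empirical entropy keeps growing with sample length, while the medium information quantity levels off. A sketch under those assumptions; the vocabulary size, weights, and sample sizes are all made up:

```python
import math
import random
from collections import Counter

def entropy(words):
    # H = -sum p(w) * log2 p(w) over the empirical distribution.
    n = len(words)
    return -sum((c / n) * math.log2(c / n) for c in Counter(words).values())

def miq(words):
    # Token-wise V(m) = (1/n) * sum (1 - p); equals 1 - sum_w p(w)^2.
    n = len(words)
    return 1 - sum((c / n) ** 2 for c in Counter(words).values())

rng = random.Random(0)
vocab = [f"w{i}" for i in range(1000)]
weights = [1 / (i + 1) for i in range(len(vocab))]  # Zipf-like frequencies

short = rng.choices(vocab, weights=weights, k=200)
long_ = rng.choices(vocab, weights=weights, k=20000)

print(entropy(short), entropy(long_))  # entropy grows with text length
print(miq(short), miq(long_))          # MIQ stays roughly constant
```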
12. Conclusions
    ● Very low correlation of MIQ with text length: 0.05, vs. 0.985 for entropy
    ● Correlation of MIQ vs. IQ is 0.57
    ● Entropy depends on text length; MIQ does not, and can therefore find anomalies
    ● MIQ gives information about text style
    ● MIQ complements IQ
13. The End. Questions? Email scurtu19@gmail.com