Medium Information Quantity

  1. Medium Information Quantity (Vitalie Scurtu)
  2. History
     ● Shannon, "A Mathematical Theory of Communication", 1948
     ● Lossless compression (Shannon-Fano, Adaptive Huffman)
     ● Today it is used in cryptography, in the analysis of DNA sequences (http://pnylab.com/pny/papers/cdna/cdna/index.html), and in Natural Language Processing applications
  3. What is information quantity?
     The information quantity of a phenomenon depends on its frequency.
     ● Low information quantity
       ○ It rains in London
       ○ The economy is in crisis
       ○ Berlusconi went out with escorts
       Low information quantity: I am telling you things you have already heard many times, so there is nothing new.
     ● High information quantity
       ○ Today it snows in Rome
       ○ Dentists on strike
       High information quantity: I am telling you things you have never or rarely heard, so there is much new information.
  4. Entropy and the medium information quantity
     ● The formulas of information quantity
       ○ H(m) = -Σ p(m) * log p(m)
       ○ V(m) = (1/n) Σ (1 - p(m))
       ■ p(m) - probability that m will happen
       ■ H(m) - entropy, or information quantity (IQ)
       ■ V(m) - medium information quantity (MIQ)
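     A minimal sketch of how these two quantities could be computed for a text, assuming (the slide does not say this) that p(m) is estimated as the relative frequency of word m in the text itself and that n is the number of distinct words:

     ```python
     from collections import Counter
     from math import log

     def entropy_and_miq(text):
         """H(m) = -sum p(m)*log p(m) and V(m) = (1/n) * sum (1 - p(m)),
         with p(m) taken as the relative frequency of word m in this text
         and n as the number of distinct words (both are assumptions; the
         slide does not fix the probability model)."""
         words = text.lower().split()
         counts = Counter(words)
         total = sum(counts.values())
         probs = [c / total for c in counts.values()]
         h = -sum(p * log(p) for p in probs)          # entropy / IQ
         v = sum(1 - p for p in probs) / len(probs)   # medium information quantity / MIQ
         return h, v

     print(entropy_and_miq("it rains in London and it rains in London again"))
     ```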
  5. Probability coefficients
     [Plot: y = p(x) for x = 1..n; lim p(x) -> 1]
  6. Logarithm coefficients
     [Plot: y = log(x) for x = 0..1; lim log(x) -> -∞ as x -> 0]
  7. Coefficients of Shannon Entropy
     ● Very likely words: in, the, is, has, of
     ● Very unlikely words: APT, ...
     [Plot: y = -x * log(x) for x = p(1..n)]
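     A small sketch of the curve on this slide: it prints the per-word entropy contribution -p * log(p) for a few words. The probabilities are invented illustrative values, not measurements from a corpus; the contribution is small for p near 0 and near 1, and peaks at p = 1/e.

     ```python
     from math import log

     # Per-word contribution to Shannon entropy: -p * log(p).
     # The probabilities below are invented for illustration only.
     for word, p in [("the", 0.07), ("of", 0.03), ("APT", 0.00001)]:
         print(f"{word:>4}  p={p:<8g}  -p*log(p) = {-p * log(p):.6f}")
     ```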
  8. Documents distribution based on their entropy
     Zipf distribution (long tail): few documents at the upper end and many at the lower end.
     ● MIN = 0
     ● MAX = 1700 (no limit)
     [Plot: x = doc(1..n), y = H(doc(1..n))]
  9. Documents distribution based on their medium information quantity
     Gaussian distribution: few documents at the extremes and the majority around the middle values.
     ● MIN = 0
     ● MAX = 1.0
     [Plot: x = doc(1..n), y = V(doc(1..n))]
  10. Documents distribution based on their medium information quantity
     [Plot: x = doc(1..n), y = V(doc(1..n))]
  11. Entropy depends on the text length
     Correlation: 0.99, the highest correlation; entropy and text length are almost identical.
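     A toy illustration of how such a length-entropy correlation could be measured; the vocabulary, text lengths, and random texts below are all made up, so the result will not reproduce the 0.99 reported on the slide.

     ```python
     import random
     from collections import Counter
     from math import log
     from statistics import correlation  # requires Python 3.10+

     def entropy(words):
         # H = -sum p(m) * log p(m), with p(m) the relative frequency of word m
         counts = Counter(words)
         n = len(words)
         return -sum((c / n) * log(c / n) for c in counts.values())

     # Toy texts: random draws from an invented vocabulary, only to show
     # how the correlation between text length and entropy can be computed.
     vocab = [f"w{i}" for i in range(500)]
     lengths = [100, 500, 1000, 5000, 10000]
     entropies = [entropy(random.choices(vocab, k=n)) for n in lengths]
     print("corr(length, H) =", round(correlation(lengths, entropies), 3))
     ```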
  12. Conclusions
     ● Very low correlation of MIQ with text length: 0.05 vs. 0.985
     ● Correlation of MIQ vs. IQ is 0.57
     ● Entropy depends on text length; MIQ does not, and can therefore find anomalies
     ● MIQ gives information about text style
     ● MIQ compensates for IQ
  13. The End
     Questions?
     Email to scurtu19@gmail.com
