# Introduction to power laws

### Transcript

• 1. An introduction to power laws. Colin Gillespie, November 15, 2012
• 2. Talk outline: (1) Introduction to power laws; (2) Distributional properties; (3) Parameter inference; (4) Power law generating mechanisms. (Image: http://xkcd.com/)
• 3. Classic example: distribution of US cities. Some data sets vary over an enormous range. US towns and cities range from Duffield (pop. 52) to New York City (pop. 8 million). The data is highly right-skewed. When the data is plotted on a logarithmic scale, it appears to follow a straight line. This observation is attributed to Zipf. [Figures: number of cities against city population, and cumulative number of cities against city population on log-log axes.]
• 6. What does it mean? Let p(x)dx be the fraction of cities with a population between x and x + dx. If this histogram is a straight line on log-log scales, then ln p(x) = −α ln x + c, where α and c are constants. Hence p(x) = C x^(−α), where C = e^c. Distributions of this form are said to follow a power law. The constant α is called the exponent of the power law. We typically don't care about c.
• 7. The power law distribution

| Name | f(x) | Notes |
| --- | --- | --- |
| Pareto distribution | x^(−α) | power law |
| Exponential | e^(−λx) | |
| Log-normal | (1/x) exp(−(ln x − µ)² / (2σ²)) | |
| Zeta distribution | x^(−α) | power law |
| Zipf's distribution | x^(−α), x = 1, ..., n | power law |
| Yule | Γ(x) / Γ(x + α) | |
| Poisson | λ^x / x! | |
• 9. Alleged power-law phenomena: the frequency of occurrence of unique words in the novel Moby Dick by Herman Melville; the numbers of customers affected in electrical blackouts in the United States between 1984 and 2002; the number of links to web sites found in a 1997 web crawl of about 200 million web pages; the number of hits on web pages; the number of papers scientists write; the number of citations received by papers; annual incomes; sales of books and music; in fact, anything that can be sold.
• 10. Zipf plots. [Figure: 1 − P(x) against x on log-log axes for six data sets: blackouts, fires, solar flares, Moby Dick word frequencies, terrorism, and web links.]
• 11. Distributional properties
• 12. The power law distribution. The power-law distribution is p(x) ∝ x^(−α), where α, the scaling parameter, is a constant. The scaling parameter typically lies in the range 2 < α < 3, although there are occasional exceptions. Typically, the entire process doesn't obey a power law; instead, the power law applies only for values greater than some minimum xmin.
• 13. Power law: PDF & CDF. For the continuous power law, the pdf is p(x) = ((α − 1)/xmin) (x/xmin)^(−α), where α > 1 and xmin > 0. The CDF is P(x) = 1 − (x/xmin)^(−α+1). [Figure: PDF and CDF for α = 1.50, 1.75, 2.00, 2.25, 2.50.]
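The continuous pdf and CDF are easy to sanity-check numerically. A minimal sketch (the function names and the α = 2.5, xmin = 1 defaults are illustrative choices, not from the slides):

```python
def pl_pdf(x, alpha=2.5, xmin=1.0):
    # Continuous power-law density: (alpha - 1)/xmin * (x/xmin)^(-alpha), x >= xmin
    return (alpha - 1) / xmin * (x / xmin) ** (-alpha)

def pl_cdf(x, alpha=2.5, xmin=1.0):
    # P(X <= x) = 1 - (x/xmin)^(-(alpha - 1))
    return 1.0 - (x / xmin) ** (-(alpha - 1))

def midpoint_integral(f, a, b, n=10_000):
    # Crude midpoint rule, just to confirm the pdf integrates to the CDF
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))
```

Integrating the pdf from xmin up to x should recover P(x); for example, midpoint_integral(pl_pdf, 1.0, 5.0) agrees with pl_cdf(5.0) to roughly four decimal places.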
• 14. Power law: PDF & CDF. For the discrete power law, the pmf is p(x) = x^(−α) / ζ(α, xmin), where ζ(α, xmin) = ∑_{n=0}^{∞} (n + xmin)^(−α) is the generalised zeta function. When xmin = 1, ζ(α, 1) is the standard zeta function. [Figure: pmf and CDF for α = 1.50, 1.75, 2.00, 2.25, 2.50.]
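The generalised zeta normalisation can be approximated by simple truncation. A sketch (the truncation length and the α = 2.5, xmin = 1 defaults are arbitrary choices of mine, not from the slides):

```python
def zeta_gen(alpha, xmin, terms=200_000):
    # Generalised (Hurwitz) zeta: sum over n >= 0 of (n + xmin)^(-alpha), truncated
    return sum((n + xmin) ** -alpha for n in range(terms))

def pl_pmf(x, alpha=2.5, xmin=1):
    # Discrete power-law pmf: x^(-alpha) / zeta(alpha, xmin)
    return x ** -alpha / zeta_gen(alpha, xmin)
```

With xmin = 1 this recovers the standard zeta function, so zeta_gen(2.5, 1) ≈ ζ(2.5) ≈ 1.34149, and the mass at x = 1 is pl_pmf(1) ≈ 0.745.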
• 16. Moments. The moments are ⟨x^m⟩ = E[X^m] = ∫_{xmin}^{∞} x^m p(x) dx = ((α − 1)/(α − 1 − m)) xmin^m. Hence, when m ≥ α − 1, we have diverging moments. So when α < 2, all moments are infinite; when α < 3, all second- and higher-order moments are infinite; when α < 4, all third- and higher-order moments are infinite; and so on.
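The moment formula is easy to check numerically. A sketch (the α = 3.5 test value and the truncated midpoint-rule integration are my own illustrative choices):

```python
def pl_moment(m, alpha, xmin=1.0):
    # <x^m> = xmin^m * (alpha - 1)/(alpha - 1 - m); diverges once m >= alpha - 1
    if m >= alpha - 1:
        return float("inf")
    return (alpha - 1) / (alpha - 1 - m) * xmin ** m

def numeric_moment(m, alpha, xmin=1.0, upper=1e4, steps=200_000):
    # Midpoint-rule integral of x^m p(x) over [xmin, upper] (tail truncated)
    h = (upper - xmin) / steps
    total = 0.0
    for i in range(steps):
        x = xmin + (i + 0.5) * h
        total += x ** m * (alpha - 1) / xmin * (x / xmin) ** (-alpha)
    return h * total
```

For α = 3.5 the mean is (α − 1)/(α − 2) = 2.5/1.5 ≈ 1.667 and the numerical integral agrees; for α = 2.5 the second moment is already infinite.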
• 19. Distributional properties. For any power law with exponent α > 1, the median is defined: x_{1/2} = 2^{1/(α−1)} xmin. If we use a power law to model wealth distribution, then we might be interested in the fraction of wealth in the richer half: ∫_{x_{1/2}}^{∞} x p(x) dx / ∫_{xmin}^{∞} x p(x) dx = (x_{1/2}/xmin)^{−α+2} = 2^{−(α−2)/(α−1)}, provided α > 2 so that the integrals converge. When the wealth distribution was modelled using a power law, α was estimated to be 2.1, so 2^{−0.091} ≈ 94% of the wealth is in the hands of the richer 50% of the population.
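The closing arithmetic can be reproduced directly; the function names are mine, but the formulas are the ones above:

```python
def median(alpha, xmin=1.0):
    # x_1/2 = 2^(1/(alpha - 1)) * xmin, valid for alpha > 1
    return 2 ** (1 / (alpha - 1)) * xmin

def rich_half_fraction(alpha):
    # Fraction of total wealth above the median: 2^(-(alpha - 2)/(alpha - 1))
    assert alpha > 2, "the integrals only converge for alpha > 2"
    return 2 ** (-(alpha - 2) / (alpha - 1))
```

rich_half_fraction(2.1) returns about 0.939, the "≈ 94% of the wealth" figure quoted on the slide.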
• 21. Top-heavy distribution & the 80/20 rule. Pareto principle, aka the 80/20 rule: the law of the vital few, and the principle of factor sparsity, states that, for many events, roughly 80% of the effects come from 20% of the causes. For example, the distribution of world GDP:

| Population quantile | Income |
| --- | --- |
| Richest 20% | 82.70% |
| Second 20% | 11.75% |
| Third 20% | 2.30% |
| Fourth 20% | 1.85% |
| Poorest 20% | 1.40% |

Other examples are: 80% of your profits come from 20% of your customers; 80% of your complaints come from 20% of your customers; 80% of your profits come from 20% of the time you spend.
• 23. Scale-free distributions. The power law distribution is often referred to as a scale-free distribution: a power law is the only distribution that looks the same regardless of the scale. For any b, we have p(bx) = g(b) p(x). That is, if we increase the scale by which we measure x by a factor of b, the shape of the distribution p(x) is unchanged, except for a multiplicative constant. The power-law distribution is the only distribution with this property.
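The scale-free property p(bx) = g(b) p(x) is easy to see numerically: for p(x) = x^(−α), the ratio p(bx)/p(x) = b^(−α) does not depend on x. A tiny sketch (α, b and the test points are arbitrary choices):

```python
alpha, b = 2.5, 3.0
p = lambda x: x ** -alpha  # unnormalised power law

# The same multiplicative constant g(b) = b^(-alpha) appears at every scale x
ratios = [p(b * x) / p(x) for x in (1.0, 10.0, 250.0)]
```

All three ratios equal b^(−α) ≈ 0.064; contrast an exponential, where the ratio would change with x.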
• 24. Random numbers. For the continuous case, we can generate random numbers using the standard inversion method: x = xmin (1 − u)^{−1/(α−1)}, where u ∼ U(0, 1).
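The inversion step translates to a one-line sampler. A sketch (the optional u argument, used for a deterministic check, is my addition):

```python
import random

def rplcon(alpha, xmin, u=None):
    # Inversion method: x = xmin * (1 - u)^(-1/(alpha - 1)), u ~ U(0, 1)
    if u is None:
        u = random.random()
    return xmin * (1 - u) ** (-1 / (alpha - 1))

# Empirical check: for alpha = 2.5, xmin = 1, P(X > 2) = 2^(-1.5) ~= 0.354
random.seed(0)
frac_above_2 = sum(rplcon(2.5, 1.0) > 2 for _ in range(100_000)) / 100_000
```

For example, with α = 3 and xmin = 1, u = 0.75 maps to x = 0.25^(−1/2) = 2.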
• 26. Random numbers. The discrete case is a bit more tricky. Instead, we have to solve the CDF numerically by "doubling up" and a binary search. So for a given u, we first bound the solution to the equation via:
1: x2 := xmin
2: repeat
3: x1 := x2
4: x2 := 2·x1
5: until P(x2) < 1 − u
Basically, the algorithm tests whether the solution lies in [x, 2x), starting with x = xmin. Once we have bounded the region, we use a binary search.
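A sketch of the doubling-plus-binary-search idea in code, written with the ordinary CDF P(X ≤ x), which is equivalent to the survival-function form on the slide (the truncated zeta helper and the default parameters are my own choices; a production version would evaluate the zeta function more cleverly):

```python
def zeta_tail(alpha, q, terms=100_000):
    # sum over k >= q of k^(-alpha), truncated
    return sum((n + q) ** -alpha for n in range(terms))

def discrete_pl_inverse(u, alpha=2.5, xmin=1):
    # Smallest integer x >= xmin with P(X <= x) >= u
    z = zeta_tail(alpha, xmin)
    cdf = lambda x: 1.0 - zeta_tail(alpha, x + 1) / z
    # Doubling phase: grow [x1, x2] until the solution is bracketed
    x1 = x2 = xmin
    while cdf(x2) < u:
        x1, x2 = x2, 2 * x2
    if x2 == xmin:
        return xmin
    # Binary search with invariant cdf(lo) < u <= cdf(hi)
    lo, hi = x1, x2
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if cdf(mid) < u:
            lo = mid
        else:
            hi = mid
    return hi
```

With α = 2.5 and xmin = 1, about 74.5% of the mass sits on x = 1, so u = 0.5 returns 1 while u = 0.9 returns 3.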
• 27. Fitting power law distributions
• 28. Fitting power law distributionsSuppose we know xmin and wish to estimate the exponent α.
• 29. Method 1.
1. Bin your data: [xmin, xmin + Δx), [xmin + Δx, xmin + 2Δx), ...
2. Plot your data on a log-log plot.
3. Use least squares to estimate α.
[Figure: fitted CDFs for bin sizes 0.01, 0.1 and 1.0.]
You could also use logarithmic binning, which is better (or should I say, not as bad?).
• 32. Method 2. Similar to method 1, but: don't bin, just plot the data CDF; then use least squares to estimate α. Using linear regression is a bad idea: the error estimates are completely off, and it doesn't even provide a good point estimate of α. On the bright side, you do get a good R² value.
• 33. Method 3: Log-likelihood. The log-likelihood isn't that hard to derive. Continuous: ℓ(α | x, xmin) = n log(α − 1) − n log(xmin) − α ∑_{i=1}^{n} log(x_i / xmin). Discrete: ℓ(α | x, xmin) = −n log[ζ(α, xmin)] − α ∑_{i=1}^{n} log(x_i) = −n log[ζ(α) − ∑_{i=1}^{xmin−1} i^(−α)] − α ∑_{i=1}^{n} log(x_i).
• 35. MLEs. Maximising the log-likelihood gives α̂ = 1 + n [∑_{i=1}^{n} ln(x_i / xmin)]^{−1}. An estimate of the associated error is σ = (α̂ − 1)/√n. The discrete case is a bit more tricky and involves ignoring higher-order terms, to get: α̂ ≈ 1 + n [∑_{i=1}^{n} ln(x_i / (xmin − 0.5))]^{−1}.
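The continuous MLE is a one-liner; here it is checked against synthetic data (the simulated sample, the seed, and the true α = 2.5 are illustrative choices):

```python
import math
import random

def alpha_mle(xs, xmin):
    # alpha_hat = 1 + n / sum_i ln(x_i / xmin)
    return 1 + len(xs) / sum(math.log(x / xmin) for x in xs)

def alpha_stderr(alpha_hat, n):
    # sigma = (alpha_hat - 1) / sqrt(n)
    return (alpha_hat - 1) / math.sqrt(n)

# Synthetic data from a known power law (alpha = 2.5, xmin = 1) via inversion
random.seed(1)
data = [(1 - random.random()) ** (-1 / 1.5) for _ in range(10_000)]
alpha_hat = alpha_mle(data, 1.0)
```

alpha_hat should land within a few standard errors (about 0.015 here) of the true value 2.5.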
• 36. Estimating xmin. Recall that the power-law pdf is p(x) = ((α − 1)/xmin) (x/xmin)^(−α), where α > 1 and xmin > 0. xmin isn't a parameter in the usual sense - it's a cut-off in the state space. Typically power laws are only present in the distributional tails. So how much of the data should we discard so that our distribution fits a power law?
• 37. Estimating xmin: method 1. The most common way is to just look at the log-log plot. What could be easier! [Figure: the Zipf plots from slide 10: blackouts, fires, flares, Moby Dick, terrorism, web links.]
• 38. Estimating xmin: method 2. Use a "Bayesian approach", the BIC: −2ℓ + k ln n = −2ℓ + xmin ln n. Increasing xmin increases the number of parameters. Only suitable for discrete distributions.
• 39. Estimating xmin: method 3. Minimise the distance between the data and the fitted model CDFs: D = max_{x ≥ xmin} |S(x) − P(x)|, where S(x) is the CDF of the data and P(x) is the theoretical CDF (the Kolmogorov-Smirnov statistic). Our estimate of xmin is then the value of xmin that minimises D. Use some form of bootstrapping to get a handle on the uncertainty of xmin.
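Method 3 is straightforward to sketch: scan candidate xmin values, refit α by MLE on the tail each time, and keep the cut-off with the smallest KS distance (the helper names and the synthetic data are my own; a real implementation would also bootstrap the result):

```python
import math
import random

def ks_distance(data, alpha, xmin):
    # D = max over x >= xmin of |S(x) - P(x)|, S empirical, P fitted CDF
    tail = sorted(x for x in data if x >= xmin)
    n, d = len(tail), 0.0
    for i, x in enumerate(tail):
        model = 1 - (x / xmin) ** (-(alpha - 1))
        d = max(d, abs((i + 1) / n - model), abs(i / n - model))
    return d

def estimate_xmin(data):
    # Try each observed value as the cut-off; smallest KS distance wins
    best = None
    for xm in sorted(set(data))[:-2]:  # keep at least a few tail points
        tail = [x for x in data if x >= xm]
        alpha = 1 + len(tail) / sum(math.log(x / xm) for x in tail)
        d = ks_distance(data, alpha, xm)
        if best is None or d < best[0]:
            best = (d, xm, alpha)
    return best  # (D, xmin_hat, alpha_hat)

random.seed(2)
sample = [(1 - random.random()) ** (-1 / 1.5) for _ in range(500)]
d_hat, xmin_hat, alpha_hat = estimate_xmin(sample)
```

On a pure power-law sample like this one, the KS scan has little reason to discard data, so the selected xmin tends to sit near the bottom of the sample.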
• 40. Mechanisms for generating PL distributions
• 42. Word distributions. Suppose we type randomly on a typewriter. We hit the space bar with probability qs and a letter with probability ql. If there are m letters in the alphabet, then ql = (1 − qs)/m. The distribution of word frequency then has the form p(x) ∼ x^(−α). (Image: http://activerain.com/)
• 43. Relationship between the α value and Zipf's principle of least effort.

| α value | Examples in literature | Least effort for |
| --- | --- | --- |
| α < 1.6 | Advanced schizophrenia | |
| 1.6 ≤ α < 2 | Military combat texts, Wikipedia, web pages listed on the Open Directory Project | Annotator |
| α = 2 | Single-author texts | Equal effort levels |
| 2 < α ≤ 2.4 | Multi-author texts | Audience |
| α > 2.4 | Fragmented discourse schizophrenia | |
• 45. Random walks. Suppose we have a 1-d random walk: at each unit of time, we move ±1. [Figure: a sample path, position against time.] If we start at n = 0, what is the probability that the first return time is at time t?
• 47. Random walks. With a bit of algebra, we get: f_{2n} = (2n choose n) / ((2n − 1) 2^{2n}). For large n, we get f_{2n} ≈ 1/((2n − 1)√(πn)). So as n → ∞, we get f_{2n} ∼ n^{−3/2}. So the distribution of return times follows a power law with exponent α = 3/2! Tenuous link to phylogenetics.
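The exact first-return probabilities are simple to compute, and they illustrate both the formula and the heavy n^(−3/2) tail (the asymptotic constant 1/(2√π) ≈ 0.2821 is standard, though not stated on the slide):

```python
import math

def first_return_prob(n):
    # f_{2n} = C(2n, n) / ((2n - 1) * 2^(2n)): first return to 0 at step 2n
    return math.comb(2 * n, n) / ((2 * n - 1) * 4 ** n)

# Partial sums creep towards 1 (the walk is recurrent), but very slowly,
# because the n^(-3/2) tail is so heavy
partial_sum = sum(first_return_prob(n) for n in range(1, 2001))
scaled_tail = first_return_prob(1000) * 1000 ** 1.5
```

f_2 = 1/2 and f_4 = 1/8, which match direct enumeration of the length-2 and length-4 first-return paths.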
• 49. Phase transitions and critical phenomena. Suppose we have a simple lattice, where each square is coloured with probability p = 0.5. We can look at the clusters of coloured squares; for example, the mean cluster area, ⟨s⟩, of a randomly chosen square: if a square is white, then zero; if a square is coloured but surrounded by white, then one; etc. When p is small, ⟨s⟩ is independent of the lattice size; when p is large, ⟨s⟩ depends on the lattice size.
• 50. Phase transitions and critical phenomena. As we increase p, the value of ⟨s⟩ also increases. For some p, ⟨s⟩ starts to increase with the lattice size. This is known as the critical value, p = pc = 0.5927462... If we calculate the distribution p(s), then when p = pc, p(s) follows a power-law distribution. [Figure: lattices at p = 0.3, p = 0.5927... and p = 0.9.]
• 52. Forest fire. This simple model has been used as a primitive model of forest fires. We start with an empty lattice and trees grow at random. Every so often, a forest fire strikes at random. If the forest is too connected, i.e. large p, then the forest burns down. So (it is argued) the forest size oscillates around p = pc. This is an example of self-organised criticality.
• 53. Future work. There isn't even an R package for power law estimation; writing this talk I have (more or less) written one. Use a Bayesian change point model to estimate xmin in a vaguely sensible way. RJMCMC to change between the power law and other heavy-tailed distributions.

References:
- A. Clauset, C.R. Shalizi, and M.E.J. Newman. Power-law distributions in empirical data. http://arxiv.org/abs/0706.1062
- M.E.J. Newman. Power laws, Pareto distributions and Zipf's law. http://arxiv.org/abs/cond-mat/0412004