1.
An introduction to power laws Colin Gillespie November 15, 2012
2.
Talk outline
1 Introduction to power laws
2 Distributional properties
3 Parameter inference
4 Power law generating mechanisms
Image: http://xkcd.com/
3.
Classic example: distribution of US cities
Some data sets vary over an enormous range.
US towns & cities: Duffield (pop 52) to New York City (pop 8 mil).
The data is highly right-skewed.
When the data is plotted on a logarithmic scale, it seems to follow a straight line.
This observation is attributed to Zipf.
[Figure: No. of Cities against City population (histogram), and Cumulative No. of Cities against City population on log-log axes]
4.
Distribution of world cities
[Figure: World city populations for 8 countries, log size vs log rank (Log Population against Log Rank), with individual cities labelled]
Source: http://brenocon.com/blog/2009/05/zipfs-law-and-world-city-populations/
6.
What does it mean?
Let $p(x)\,dx$ be the fraction of cities with a population between $x$ and $x + dx$.
If this histogram is a straight line on log-log scales, then
$\ln p(x) = -\alpha \ln x + c$
where $\alpha$ and $c$ are constants. Hence
$p(x) = C x^{-\alpha}$, where $C = e^{c}$.
Distributions of this form are said to follow a power law.
The constant $\alpha$ is called the exponent of the power law.
We typically don't care about $c$.
7.
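The power law distribution
Name | f(x) | Notes
Power law | $x^{-\alpha}$ | Pareto distribution
Exponential | $e^{-\lambda x}$ |
Log-normal | $\frac{1}{x} \exp\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right)$ |
Power law | $x^{-\alpha}$ | Zeta distribution
Power law | $x^{-\alpha}$ | $x = 1, \ldots, n$; Zipf's distribution
Yule | $\Gamma(x) / \Gamma(x + \alpha)$ |
Poisson | $\lambda^{x} / x!$ |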
9.
Alleged power-law phenomena
The frequency of occurrence of unique words in the novel Moby Dick by Herman Melville
The numbers of customers affected in electrical blackouts in the United States between 1984 and 2002
The number of links to web sites found in a 1997 web crawl of about 200 million web pages
The number of hits on web pages
The number of papers scientists write
The number of citations received by papers
Annual incomes
Sales of books, music; in fact anything that can be sold
12.
The power law distribution
The power-law distribution is $p(x) \propto x^{-\alpha}$, where $\alpha$, the scaling parameter, is a constant.
The scaling parameter typically lies in the range $2 < \alpha < 3$, although there are occasional exceptions.
Typically, the entire process doesn't obey a power law.
Instead, the power law applies only for values greater than some minimum $x_{\min}$.
13.
Power law: PDF & CDF
For the continuous PL, the pdf is
$p(x) = \frac{\alpha - 1}{x_{\min}} \left(\frac{x}{x_{\min}}\right)^{-\alpha}$
where $\alpha > 1$ and $x_{\min} > 0$.
The CDF is:
$P(x) = 1 - \left(\frac{x}{x_{\min}}\right)^{-\alpha + 1}$
[Figure: PDF and CDF against x for α = 1.50, 1.75, 2.00, 2.25, 2.50]
14.
Power law: PDF & CDF
For the discrete power law, the pmf is
$p(x) = \frac{x^{-\alpha}}{\zeta(\alpha, x_{\min})}$
where
$\zeta(\alpha, x_{\min}) = \sum_{n=0}^{\infty} (n + x_{\min})^{-\alpha}$
is the generalised zeta function.
When $x_{\min} = 1$, $\zeta(\alpha, 1)$ is the standard zeta function.
[Figure: PDF and CDF against x for α = 1.50, 1.75, 2.00, 2.25, 2.50]
16.
Moments
Moments:
$\langle x^m \rangle = E[X^m] = \int_{x_{\min}}^{\infty} x^m p(x)\,dx = \frac{\alpha - 1}{\alpha - 1 - m}\, x_{\min}^{m}$
Hence, when $m \ge \alpha - 1$, we have diverging moments.
So when
$\alpha \le 2$, all moments are infinite
$\alpha \le 3$, all second and higher-order moments are infinite
$\alpha \le 4$, all third and higher-order moments are infinite
....
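The closed form for the moments can be checked by direct integration. The following is a sketch that uses the continuous pdf $p(x) = \frac{\alpha - 1}{x_{\min}} (x / x_{\min})^{-\alpha}$ and assumes $m < \alpha - 1$, so that the integral converges:

```latex
\begin{aligned}
\langle x^m \rangle
  &= \int_{x_{\min}}^{\infty} x^m \, \frac{\alpha - 1}{x_{\min}}
     \left(\frac{x}{x_{\min}}\right)^{-\alpha} dx
   = (\alpha - 1)\, x_{\min}^{\alpha - 1} \int_{x_{\min}}^{\infty} x^{m - \alpha} \, dx \\
  &= (\alpha - 1)\, x_{\min}^{\alpha - 1}
     \left[ \frac{x^{m - \alpha + 1}}{m - \alpha + 1} \right]_{x_{\min}}^{\infty}
   = \frac{\alpha - 1}{\alpha - 1 - m}\, x_{\min}^{m}
\end{aligned}
```

For $m \ge \alpha - 1$ the contribution from the upper limit no longer vanishes, so the integral diverges (logarithmically when $m = \alpha - 1$).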
19.
Distributional properties
For any power law with exponent $\alpha > 1$, the median is defined:
$x_{1/2} = 2^{1/(\alpha - 1)} x_{\min}$
If we use a power law to model wealth distribution, then we might be interested in the fraction of wealth in the richer half:
$\frac{\int_{x_{1/2}}^{\infty} x\,p(x)\,dx}{\int_{x_{\min}}^{\infty} x\,p(x)\,dx} = \left(\frac{x_{1/2}}{x_{\min}}\right)^{-\alpha + 2} = 2^{-(\alpha - 2)/(\alpha - 1)}$
provided $\alpha > 2$, so that the integrals converge.
When the wealth distribution was modelled using a power law, $\alpha$ was estimated to be 2.1, so $2^{-0.091} \approx 94\%$ of the wealth is in the hands of the richer 50% of the population.
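The arithmetic in the wealth example can be reproduced in a few lines. This is a minimal sketch of the two formulas on this slide; the function names `pl_median` and `top_half_share` are made up for illustration.

```python
def pl_median(xmin, alpha):
    """Median of a continuous power law: 2^(1/(alpha-1)) * xmin."""
    if alpha <= 1:
        raise ValueError("the median formula needs alpha > 1")
    return 2 ** (1.0 / (alpha - 1.0)) * xmin


def top_half_share(alpha):
    """Fraction of the total held by the richer half: 2^(-(alpha-2)/(alpha-1))."""
    if alpha <= 2:
        raise ValueError("the mean is infinite unless alpha > 2")
    return 2 ** (-(alpha - 2.0) / (alpha - 1.0))


# Wealth example from the slide, alpha = 2.1
print(round(top_half_share(2.1), 3))  # roughly 0.94
```

As $\alpha$ approaches 2 from above the share tends to 1, which is one way of seeing how top-heavy these distributions are.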
21.
Top-heavy distribution & the 80/20 rule
Pareto principle: aka the 80/20 rule
Also known as the law of the vital few and the principle of factor sparsity, it states that, for many events, roughly 80% of the effects come from 20% of the causes.
For example, the distribution of world GDP:
Population quantile | Income
Richest 20% | 82.70%
Second 20% | 11.75%
Third 20% | 2.30%
Fourth 20% | 1.85%
Poorest 20% | 1.40%
Other examples are:
80% of your profits come from 20% of your customers
80% of your complaints come from 20% of your customers
80% of your profits come from 20% of the time you spend
23.
Scale-free distributions
The power law distribution is often referred to as a scale-free distribution.
A power law is the only distribution that is the same regardless of the scale on which we look at it.
For any $b$, we have $p(bx) = g(b)\,p(x)$.
That is, if we increase the scale by which we measure $x$ by a factor of $b$, the shape of the distribution $p(x)$ is unchanged, except for a multiplicative constant.
The PL distribution is the only distribution with this property.
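To see that a power law does satisfy this relation, substitute $p(x) = C x^{-\alpha}$ into it. This short check also identifies the function $g$:

```latex
p(bx) = C (bx)^{-\alpha} = b^{-\alpha} \, C x^{-\alpha} = b^{-\alpha} \, p(x),
\qquad \text{so } g(b) = b^{-\alpha}.
```

The factor $g(b)$ depends on $b$ only and not on $x$, which is why rescaling leaves the shape unchanged.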
24.
Random numbers
For the continuous case, we can generate random numbers using the standard inversion method:
$x = x_{\min} (1 - u)^{-1/(\alpha - 1)}$
where $u$ is drawn from $U(0, 1)$.
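The inversion formula translates directly into code. Below is a minimal sketch using only the Python standard library; the function name `rpl_continuous` and its argument order are illustrative choices, not part of the talk.

```python
import random


def rpl_continuous(n, xmin, alpha, rng=None):
    """Draw n values from a continuous power law by inverting its CDF."""
    if alpha <= 1 or xmin <= 0:
        raise ValueError("need alpha > 1 and xmin > 0")
    rng = rng or random.Random()
    # rng.random() lies in [0, 1), so 1 - u lies in (0, 1] and is never zero
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n)]


sample = rpl_continuous(5, xmin=1.0, alpha=2.5, rng=random.Random(1))
print(sample)
```

Passing a seeded `random.Random` instance makes the draws reproducible; every value returned is at least `xmin`.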
26.
Random numbers
The discrete case is a bit more tricky.
The CMF cannot be inverted in closed form, so instead we solve it numerically by "doubling up" and a binary search.
So for a given $u$, we first bound the solution to the equation $P(x) = 1 - u$ via:
  x2 := xmin
  repeat
    x1 := x2
    x2 := 2 x1
  until P(x2) < 1 − u
Here $P(x)$ denotes the complementary CMF, $\Pr(X \ge x)$, which decreases as $x$ grows.
Basically, the algorithm tests whether the solution lies in $[x, 2x)$, starting with $x = x_{\min}$.
Once we have the region we use a binary search.
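The doubling step and the binary search can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: `P` is taken to be the complementary CMF $\Pr(X \ge x) = \zeta(\alpha, x) / \zeta(\alpha, x_{\min})$, and `hurwitz_zeta` is a hand-rolled approximation (direct sum plus an Euler-Maclaurin tail) written here to keep the example free of dependencies. All function names are hypothetical.

```python
import random


def hurwitz_zeta(alpha, q, terms=100):
    """Approximate sum_{n>=0} (n + q)^(-alpha) for alpha > 1 and q > 0."""
    head = sum((n + q) ** -alpha for n in range(terms))
    a = terms + q
    # Euler-Maclaurin estimate of the remaining tail, starting at n = terms
    tail = (a ** (1 - alpha) / (alpha - 1) + 0.5 * a ** -alpha
            + alpha * a ** (-alpha - 1) / 12)
    return head + tail


def rpl_discrete(xmin, alpha, rng=None):
    """Draw one value from a discrete power law on xmin, xmin + 1, ..."""
    rng = rng or random.Random()
    target = 1.0 - rng.random()          # lies in (0, 1]
    norm = hurwitz_zeta(alpha, xmin)

    def ccdf(x):
        return hurwitz_zeta(alpha, x) / norm   # Pr(X >= x)

    # Doubling up: find x1 < x2 with ccdf(x1) >= target > ccdf(x2)
    x2 = xmin
    while True:
        x1 = x2
        x2 = 2 * x1
        if ccdf(x2) < target:
            break
    # Binary search for the largest integer x with ccdf(x) >= target
    while x2 - x1 > 1:
        mid = (x1 + x2) // 2
        if ccdf(mid) < target:
            x2 = mid
        else:
            x1 = mid
    return x1


rng = random.Random(0)
print([rpl_discrete(1, 2.5, rng) for _ in range(10)])
```

The value returned is $x$ exactly when $\Pr(X \ge x + 1) < 1 - u \le \Pr(X \ge x)$, which happens with probability $p(x)$.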
28.
Fitting power law distributions
Suppose we know $x_{\min}$ and wish to estimate the exponent $\alpha$.
29.
Method 1
1 Bin your data: $[x_{\min}, x_{\min} + \Delta x), [x_{\min} + \Delta x, x_{\min} + 2\Delta x), \ldots$
2 Plot your data on a log-log plot
3 Use least squares to estimate $\alpha$
[Figure: CDF against x on log-log axes, for bin sizes 0.01, 0.1 and 1.0]
You could also use logarithmic binning (which is better), or should I say not as bad?
32.
Method 2
Similar to method 1, but
Don't bin, just plot the data CDF
Then use least squares to estimate $\alpha$
Using linear regression is a bad idea:
Error estimates are completely off
It doesn't even provide a good point estimate of $\alpha$
On the bright side you do get a good $R^2$ value
33.
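Method 3: Log-Likelihood
The log-likelihood isn't that hard to derive.
Continuous:
$\ell(\alpha \mid x, x_{\min}) = n \log(\alpha - 1) - n \log(x_{\min}) - \alpha \sum_{i=1}^{n} \log \frac{x_i}{x_{\min}}$
Discrete:
$\ell(\alpha \mid x, x_{\min}) = -n \log[\zeta(\alpha, x_{\min})] - \alpha \sum_{i=1}^{n} \log(x_i)$
$= -n \log\left[\zeta(\alpha) - \sum_{i=1}^{x_{\min} - 1} i^{-\alpha}\right] - \alpha \sum_{i=1}^{n} \log(x_i)$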
35.
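MLEs
Maximising the log-likelihood gives
$\hat{\alpha} = 1 + n \left[\sum_{i=1}^{n} \ln \frac{x_i}{x_{\min}}\right]^{-1}$
An estimate of the associated error is
$\sigma = \frac{\hat{\alpha} - 1}{\sqrt{n}}$
The discrete case is a bit more tricky and involves ignoring higher order terms, to get:
$\hat{\alpha} \simeq 1 + n \left[\sum_{i=1}^{n} \ln \frac{x_i}{x_{\min} - 0.5}\right]^{-1}$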
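The estimators above are easy to code up. The sketch below covers the continuous log-likelihood, the continuous MLE with its error estimate, and the approximate discrete MLE; the function names are illustrative, and values below `xmin` are simply dropped before fitting, which is an assumption about how the data would be prepared.

```python
import math


def loglik_continuous(alpha, xs, xmin):
    """Continuous power-law log-likelihood for data xs, all >= xmin."""
    n = len(xs)
    return (n * math.log(alpha - 1) - n * math.log(xmin)
            - alpha * sum(math.log(x / xmin) for x in xs))


def _mle(xs, xmin, shift):
    tail = [x for x in xs if x >= xmin]
    s = sum(math.log(x / (xmin - shift)) for x in tail)
    if s <= 0:
        raise ValueError("need at least one observation above xmin")
    alpha_hat = 1.0 + len(tail) / s
    return alpha_hat, (alpha_hat - 1.0) / math.sqrt(len(tail))


def mle_continuous(xs, xmin):
    """Return (alpha_hat, sigma) for the continuous case."""
    return _mle(xs, xmin, shift=0.0)


def mle_discrete_approx(xs, xmin):
    """Return (alpha_hat, sigma) using the xmin - 0.5 approximation."""
    return _mle(xs, xmin, shift=0.5)


data = [1.2, 1.5, 2.0, 3.1, 4.7, 9.3, 25.0]   # made-up values for illustration
alpha_hat, sigma = mle_continuous(data, xmin=1.0)
print(alpha_hat, sigma)
```

A quick sanity check is that `loglik_continuous` evaluated at `alpha_hat` is no smaller than at nearby values of `alpha`.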
36.
Estimating xmin
Recall that the power-law pdf is
$p(x) = \frac{\alpha - 1}{x_{\min}} \left(\frac{x}{x_{\min}}\right)^{-\alpha}$
where $\alpha > 1$ and $x_{\min} > 0$.
$x_{\min}$ isn't a parameter in the usual sense: it's a cut-off in the state space.
Typically power laws are only present in the distributional tails.
So how much of the data should we discard so our distribution fits a power law?
37.
Estimating xmin: method 1
The most common way is just to look at the log-log plot.
What could be easier!
[Figure: 1 − P(x) against x on log-log axes for six data sets: Blackouts, Fires, Flares, Moby Dick, Terrorism, Web links]
38.
Estimating xmin: method 2
Use a "Bayesian approach", the BIC:
$-2\ell + k \ln n = -2\ell + x_{\min} \ln n$
Increasing $x_{\min}$ increases the number of parameters.
Only suitable for discrete distributions.
39.
Estimating xmin: method 3
Minimise the distance between the data and the fitted model CDFs:
$D = \max_{x \ge x_{\min}} |S(x) - P(x)|$
where $S(x)$ is the CDF of the data and $P(x)$ is the theoretical CDF (the Kolmogorov-Smirnov statistic).
Our estimate $\hat{x}_{\min}$ is then the value of $x_{\min}$ that minimises $D$.
Use some form of bootstrapping to get a handle on the uncertainty of $x_{\min}$.
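A small sketch of this procedure for continuous data is given below. It makes several assumptions that are not in the talk: every distinct data value is tried as a candidate $x_{\min}$, $\alpha$ is refitted by the continuous MLE for each candidate, and candidates that leave fewer than `min_tail` points are skipped. The function names and the `min_tail` parameter are hypothetical.

```python
import math


def ks_distance(tail, xmin, alpha):
    """max |S(x) - P(x)| over the data in `tail` (all values >= xmin)."""
    tail = sorted(tail)
    n = len(tail)
    d = 0.0
    for i, x in enumerate(tail):
        model = 1.0 - (x / xmin) ** (1.0 - alpha)
        # The empirical CDF steps from i/n to (i + 1)/n at x: check both sides
        d = max(d, abs((i + 1) / n - model), abs(i / n - model))
    return d


def estimate_xmin(xs, min_tail=10):
    """Return (xmin_hat, alpha_hat, D) minimising the KS distance."""
    xs = sorted(xs)
    best = None
    for xmin in sorted(set(xs)):
        tail = [x for x in xs if x >= xmin]
        if len(tail) < min_tail:
            break
        s = sum(math.log(x / xmin) for x in tail)
        if s <= 0:
            continue
        alpha = 1.0 + len(tail) / s       # continuous MLE for this cut-off
        d = ks_distance(tail, xmin, alpha)
        if best is None or d < best[2]:
            best = (xmin, alpha, d)
    return best
```

This scan costs quadratic time in the number of observations, which is fine for a sketch but would want tightening for large data sets. Bootstrapping, as suggested on the slide, would repeat `estimate_xmin` on resampled data.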
42.
Word distributions
Suppose we type randomly on a typewriter.
We hit the space bar with probability $q_s$ and a letter with probability $q_l$.
If there are $m$ letters in the alphabet, then $q_l = (1 - q_s)/m$.
The distribution of word frequency has the form $p(x) \sim x^{-\alpha}$.
Image: http://activerain.com/
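The random typing experiment is simple to simulate. The sketch below generates key presses with the probabilities described above and tabulates how often each distinct word occurs; the function names and the default values of `m` and `q_s` are arbitrary choices for illustration.

```python
import random
from collections import Counter


def random_typing(n_keys, m=5, q_s=0.2, rng=None):
    """Press n_keys keys at random and return the list of words typed."""
    rng = rng or random.Random()
    letters = "abcdefghijklmnopqrstuvwxyz"[:m]
    # Space with probability q_s, otherwise one of m equally likely letters,
    # so each letter has probability (1 - q_s) / m
    keys = [" " if rng.random() < q_s else rng.choice(letters)
            for _ in range(n_keys)]
    return "".join(keys).split()      # split() drops empty words


def rank_frequency(words):
    """Word counts sorted from most to least frequent."""
    return sorted(Counter(words).values(), reverse=True)


words = random_typing(100000, rng=random.Random(42))
print(rank_frequency(words)[:10])
```

Plotting these counts against their rank on log-log axes is one way to inspect the heavy tail by eye.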
43.
Relationship between α value and Zipf's principle of least effort.
α value | Examples in literature | Least effort for
α < 1.6 | Advanced schizophrenia |
1.6 ≤ α < 2 | Military combat texts, Wikipedia, Web pages listed on the open directory project | Annotator
α = 2 | Single author texts | Equal effort levels
2 < α ≤ 2.4 | Multi author texts | Audience
α > 2.4 | Fragmented discourse schizophrenia |
45.
Random walks
Suppose we have a 1d random walk.
At each unit of time, we move ±1.
[Figure: Position against Time for a sample walk]
If we start at position 0, what is the probability that the walk first returns to 0 at time $t$?
47.
Random walks
With a bit of algebra, we get:
$f_{2n} = \frac{\binom{2n}{n}}{(2n - 1)\, 2^{2n}}$
For large $n$, Stirling's approximation gives
$f_{2n} \simeq \frac{1}{\sqrt{\pi n (2n - 1)^2}}$
So as $n \to \infty$, we get $f_{2n} \sim n^{-3/2}$.
So the distribution of return times follows a power law with exponent $\alpha = 3/2$!
Tenuous link to phylogenetics
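The exponent can be checked numerically from the exact first-return probability. This sketch evaluates $f_{2n}$ with exact integer arithmetic and estimates the log-log slope between two large values of $n$; the function names are illustrative.

```python
import math


def first_return_prob(n):
    """f_{2n}: probability that the walk first returns to 0 at step 2n."""
    return math.comb(2 * n, n) / ((2 * n - 1) * 2 ** (2 * n))


def loglog_slope(n1, n2):
    """Slope of log f_{2n} against log n between n1 and n2."""
    return (math.log(first_return_prob(n2) / first_return_prob(n1))
            / math.log(n2 / n1))


print(first_return_prob(1), first_return_prob(2))
print(loglog_slope(1000, 2000))   # close to -3/2
```

The first two values correspond to returning after 2 and 4 steps respectively.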
49.
Phase transitions and critical phenomena
Suppose we have a simple lattice. Each square is coloured with probability $p = 0.5$.
We can look at the clusters of coloured squares. For example, the mean cluster area, $\langle s \rangle$, of a randomly chosen square:
If a square is white, then zero
If a square is coloured, but surrounded by white, then one
etc.
When $p$ is small, $\langle s \rangle$ is independent of the lattice size.
When $p$ is large, $\langle s \rangle$ depends on the lattice size.
50.
Phase transitions and critical phenomena
As we increase $p$, the value of $\langle s \rangle$ also increases.
For some $p$, $\langle s \rangle$ starts to increase with the lattice size.
This is known as the critical value, and is $p = p_c = 0.5927462\ldots$
If we calculate the distribution $p(s)$ of cluster areas, then when $p = p_c$, $p(s)$ follows a power-law distribution.
[Figure: example lattices for p = 0.3, p = 0.5927... and p = 0.9]
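The mean cluster area can be computed with a flood fill over the lattice. The sketch below assumes a square lattice in which coloured squares belong to the same cluster when they share an edge (4-neighbour connectivity); the function names are made up for illustration.

```python
import random


def mean_cluster_area(grid):
    """Mean cluster area seen from a uniformly chosen square.

    `grid` is a list of rows of booleans (True = coloured). A white square
    contributes 0; a coloured square contributes the size of its cluster.
    """
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    total = 0
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c] or seen[r][c]:
                continue
            # Flood fill one cluster using an explicit stack
            stack, size = [(r, c)], 0
            seen[r][c] = True
            while stack:
                i, j = stack.pop()
                size += 1
                for a, b in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                    if (0 <= a < rows and 0 <= b < cols
                            and grid[a][b] and not seen[a][b]):
                        seen[a][b] = True
                        stack.append((a, b))
            total += size * size   # each of `size` squares sees area `size`
    return total / (rows * cols)


def random_lattice(n, p, rng=None):
    """n x n lattice with each square coloured independently with prob. p."""
    rng = rng or random.Random()
    return [[rng.random() < p for _ in range(n)] for _ in range(n)]


for p in (0.3, 0.5927, 0.9):
    print(p, mean_cluster_area(random_lattice(50, p, random.Random(0))))
```

Repeating the loop for several lattice sizes is a simple way to explore how the mean cluster area depends on size on either side of $p_c$.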
52.
Forest fire
This simple model has been used as a primitive model of forest fires.
We start with an empty lattice and trees grow at random.
Every so often, a forest fire strikes at random.
If the forest is too connected, i.e. large $p$, then the forest burns down.
So, it is argued, the forest size oscillates around $p = p_c$.
This is an example of self-organised criticality.
53.
Future work
There isn't even an R package for power law estimation.
Writing this talk I have (more or less) written one.
Use a Bayesian change point model to estimate $x_{\min}$ in a vaguely sensible way.
RJMCMC to change between the power law and other heavy tailed distributions.
References
A. Clauset, C.R. Shalizi, and M.E.J. Newman. Power-law distributions in empirical data. http://arxiv.org/abs/0706.1062
M.E.J. Newman. Power laws, Pareto distributions and Zipf's law. http://arxiv.org/abs/cond-mat/0412004