Are Power Laws Useful?

Michael P.H. Stumpf

Theoretical Systems Biology Group

07/03/2013

Are Power Laws Useful? Michael P.H. Stumpf 1 of 28

Outline

Power Laws: Ubiquitous, Universal, Useful?

Critical Phenomena and Scaling Behaviour

Empirical Power Laws

Simple Models of Network Evolution

More Normal Than Normal

Summary


What Do We Mean By A Power Law?

Power Law Relationships

$Y \propto X^{\lambda}$

[Figure: log-log plots of a power law. Left panel: log(Y) against log(X) for $Y \propto X^{\lambda}$. Right panel: log(p(X)) against log(X).]
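
On log-log axes a power law $Y \propto X^{\lambda}$ becomes a straight line with slope $\lambda$. The following is a minimal sketch of reading off that slope by least squares; the data, the exponent of -1.5 and the prefactor of 3 are made up for illustration. A straight line on a log-log plot is necessary for a power law but not sufficient evidence for one.

```python
import math

def loglog_slope(xs, ys):
    """Least-squares slope and intercept of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical noise-free data: Y = 3 * X^(-1.5).
xs = [1, 2, 5, 10, 20, 50, 100]
ys = [3.0 * x ** -1.5 for x in xs]

slope, intercept = loglog_slope(xs, ys)
print(f"estimated exponent: {slope:.3f}, estimated prefactor: {math.exp(intercept):.3f}")
```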


Power Laws as Physical Scaling Relationships

Laminar Flow

[Figure: pipe of radius R, with radial coordinate r and axial coordinate x.]

Universality in Flow
For sufficiently low Reynolds number the velocity profile is universal
for all pipes and fluids:

$$\frac{v(r)}{v(0)} = 1 - \frac{r^2}{R^2}$$



[Figure: 1D Josephson-junction array in the insulating and superconducting phases.]
Sondhi et al., Rev.Mod.Phys. 69:315 (1997).

As the critical temperature is approached from above, the correlation
length increases as some power of the reduced temperature,

$$\xi \propto \left( \frac{T - T_c}{T_c} \right)^{-\nu}$$

Making Sense of Power Laws in Physics

The Ising Model

↑ ↓ ↑ ↓ ↓ ↓ ↑ ↓ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑
↑ ↓ ↑ ↓ ↓ ↑ ↓ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑
↓ ↓ ↓ ↑ ↓ ↑ ↑ ↑ ↑ ↑ ↑ ↓ ↑ ↑ ↑ ↑ ↑ ↑
↑ ↓ ↓ ↓ ↑ ↓ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑
↓ ↑ ↑ ↓ ↓ ↓ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑
↓ ↑ ↑ ↓ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑ ↑
Left to right: $T > T_C$, $T \approx T_C$, $T < T_C$.

Critical Exponents
For $\theta_C = (T - T_C)/T_C$ we have for any macroscopic variable $F(\theta_C)$ in the
vicinity of the critical point $\theta_C \approx 0$,

$$F(\theta_C) = \text{const.} \times \theta_C^{-\lambda}.$$



Renormalization Group Theory

↑ ↓ ↑ ↑ ↑ ↑
↓ ↑ ↑ ↑ ↑ ↑
↑ ↑ ↑ ↑ ↑ ↓
↑ ↑ ↑ ↑ ↑ ↑
↑ ↑ ↑ ↑ ↑ ↑
↑ ↑ ↑ ↑ ↑ ↑

Renormalization Group Transformation (RGT)
The partition function, $Z$, of a physical system must be invariant under
the RGT:

$$Z = \sum_{[S_i]} \exp\left(-\beta H[S_i]\right) = \sum_{[S_\alpha]} \exp\left(-\beta H'[S_\alpha]\right) = Z'$$

[Figure: the lattice of spins $S_i$ is coarse-grained step by step into block spins $S_\alpha$.]


Behaviour Around Critical Points

The fixed points of the RGT define the possible macroscopic states of
the system, e.g. ferromagnet vs. paramagnet.
Of particular interest are the non-trivial fixed points, 0 < Tc , J < ∞,
which mark the occurrence of a phase transition.



In this case changing the scale does not change the physics. For our
spin-system, for example, there is long-range correlation which
extends beyond the lattice spacing.

Critical Points
At the critical point the correlation length diverges as a power law

$$\xi \propto \theta_C^{-\nu}$$

Note that knowledge of the critical point does not necessarily tell us
what the state of the system is on either side.


Heureka!

Mason Porter, http://www.quickmeme.com/meme/3sqh80/

Simple Scaling Laws

West et al., PNAS, 99:2473 (2002).
May & Stumpf, Science, 290:2084 (2000).


Plausible Theories and Simple Models
Even when simple, plausible physical arguments can be put forward, e.g.

    In an area that is twice as large, we can accommodate four
    times as many species, $N \propto A^2$,

the empirical results are better described by other phenomenological
distributions.

Illusions of Invariance
We consider offspring weaning weight, $w$, and maternal weight, $m$,
and seek to understand $w/m$. If this ratio is invariant then $\log(w)$
plotted against $\log(m)$ should have a regression slope of 1.0.

The converse is not true! A slope of 1.0 does not imply invariance.

For example, consider the model $\log(w) = \log(m) + \log(c)$,
where $\log(c)$ is a non-normally distributed error with $c \sim U[0, 1]$, independent of $m$.

We then have

$$R^2 = \frac{\mathrm{Var}[\log(w)] - \mathrm{Var}[\log(c)]}{\mathrm{Var}[\log(w)]} = \frac{\mathrm{Var}[\log(m)]}{\mathrm{Var}[\log(m)] + \mathrm{Var}[\log(c)]}$$

Nee et al., Science, 309:1236 (2005).
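
The argument can be checked with a small simulation. This is a sketch under stated assumptions, not the analysis of Nee et al.: log(m) is taken to be uniform on [0, 6] (an arbitrary choice giving Var[log(m)] = 3), c is drawn uniformly and independently of m, and the sample size and seed are arbitrary. The regression slope comes out close to 1.0 and R^2 close to the value predicted by the variance formula, even though the ratio w/m = c spans almost the whole interval from 0 to 1.

```python
import math
import random

random.seed(1)
n = 20000

# Assumed for illustration: log(m) uniform on [0, 6], so Var[log(m)] = 6**2 / 12 = 3.
log_m = [random.uniform(0.0, 6.0) for _ in range(n)]
# c uniform on (0, 1]; -log(c) is then exponential, so Var[log(c)] = 1.
log_c = [math.log(1.0 - random.random()) for _ in range(n)]
log_w = [lm + lc for lm, lc in zip(log_m, log_c)]

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

# Ordinary least-squares slope and R^2 of log(w) regressed on log(m).
slope = cov(log_m, log_w) / cov(log_m, log_m)
r2 = cov(log_m, log_w) ** 2 / (cov(log_m, log_m) * cov(log_w, log_w))

# R^2 predicted by the variance decomposition on the slide.
predicted_r2 = cov(log_m, log_m) / (cov(log_m, log_m) + cov(log_c, log_c))

ratios = [math.exp(lc) for lc in log_c]  # w / m = c, far from invariant

print(f"slope = {slope:.3f}, R^2 = {r2:.3f}, predicted R^2 = {predicted_r2:.3f}")
print(f"w/m ranges from {min(ratios):.4f} to {max(ratios):.4f}")
```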

Scale-Free Networks
The Beginnings

[Figure: A: Actor collaboration; B: WWW; C: Power grid.]

Barabasi & Albert, Science, 286:510 (1999).

Ever since, many real-world networks have been “discovered” to be
scale-free.

[Figure: Partial alignment of Human and Fly protein-interaction network (PIN).]

What Are “Scale-Free” Networks?
Typically this means that the degree distribution is scale-free, i.e.

$$\frac{\Pr(\alpha k)}{\Pr(k)} = \text{const.} \quad \forall k.$$
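
As a short worked check of this definition, a power law satisfies the condition while an exponential degree distribution does not. The normalising constant $C$ and the exponential scale $\kappa$ are symbols introduced here for illustration only.

```latex
% Power law: the ratio depends on the rescaling factor alpha, but not on k.
\Pr(k) = C\,k^{-\lambda}
\quad\Longrightarrow\quad
\frac{\Pr(\alpha k)}{\Pr(k)} = \frac{C\,(\alpha k)^{-\lambda}}{C\,k^{-\lambda}} = \alpha^{-\lambda}.

% Exponential: the ratio still depends on k, so there is a characteristic scale kappa.
\Pr(k) = C\,e^{-k/\kappa}
\quad\Longrightarrow\quad
\frac{\Pr(\alpha k)}{\Pr(k)} = e^{-(\alpha - 1)\,k/\kappa}.
```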

Are Networks Scale-Free?

[Figure: Saccharomyces cerevisiae PIN.]

If $D = \{d_1, \ldots, d_n\}$ is the empirical degree
distribution and $\Pr_m(k \mid \theta_m)$ the probability of
observing degree $k$ under model $m \in \mathcal{M} = \{M_1, \ldots, M_q\}$
with parameter $\theta_m$, then the likelihood is given by

$$L_m(\theta_m) = \prod_{i=1}^{n} \Pr\nolimits_m(d_i \mid \theta_m).$$

This allows us to compare different models in
light of the data (using e.g. the AIC or BIC to
enforce parsimony).
So far, no network has been found to be
scale-free when proper statistical analysis was
applied.
Stumpf & Ingram, Europhys. Lett. 71:152 (2005); Tanaka et al., FEBS Lett. 579:5140 (2005); Khanin & Wit, J.Comp.Biol.
13:810 (2006).
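
The likelihood comparison can be sketched as follows. This is an illustration under assumptions, not the procedure of the cited papers: both candidate models are restricted to a common support 1..K, each has a single parameter fitted by a crude grid search, the candidate set contains only a power law and an exponential, and the degree data are simulated from a geometric distribution. The names `log_lik`, `fit`, `power_law` and `exponential` are hypothetical helpers.

```python
import math
import random

K = 100  # assumed maximum degree, so both models share the support 1..K

def log_lik(degrees, log_weight):
    """Sum over i of log Pr(d_i), for Pr(k) proportional to exp(log_weight(k)) on 1..K."""
    log_norm = math.log(sum(math.exp(log_weight(k)) for k in range(1, K + 1)))
    return sum(log_weight(d) - log_norm for d in degrees)

def fit(degrees, family, grid):
    """Grid-search maximum likelihood for a one-parameter family; returns (estimate, AIC)."""
    best = max(grid, key=lambda theta: log_lik(degrees, family(theta)))
    return best, 2 * 1 - 2 * log_lik(degrees, family(best))

def power_law(lam):
    return lambda k: -lam * math.log(k)   # Pr(k) proportional to k^(-lam)

def exponential(kappa):
    return lambda k: -k / kappa           # Pr(k) proportional to exp(-k / kappa)

# Hypothetical data: 500 degrees drawn from a geometric distribution (not a power law).
random.seed(0)
degrees = [min(K, 1 + int(math.log(1.0 - random.random()) / math.log(0.75)))
           for _ in range(500)]

grid = [0.1 * i for i in range(1, 100)]
lam_hat, aic_power = fit(degrees, power_law, grid)
kappa_hat, aic_exp = fit(degrees, exponential, grid)

print(f"power law:   lambda = {lam_hat:.1f}, AIC = {aic_power:.1f}")
print(f"exponential: kappa  = {kappa_hat:.1f}, AIC = {aic_exp:.1f}")
```

The model with the lower AIC is preferred. In practice the candidate set would also include alternatives such as the log-normal, stretched exponential and negative binomial.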

Real Networks Are Scale Rich

Tanaka, Phys.Rev.Lett. 94:168101 (2005).


Other Statistical Challenges
Incomplete and Noisy Data

Sub-nets of scale-free networks are not scale-free.
Stumpf et al., PNAS 102:4221 (2005);
Wiuf & Stumpf, Proc.Roy.Soc. A 462:1181 (2006).

Truncated Power Laws
$\Pr(k) \propto k^{-\lambda}$ for $k_{\text{low}} < k < k_{\text{high}}$
It is hard to see what is gained by this: the statistical power is lower
than that of other mixture models, and the elegance of the
interpretation is lost.


Evolving Networks

[Figure: an evolving network, with labels α, δ and γ.]

Model-Based Evolutionary Analysis
• For sequence data we use models of nucleotide substitution in
order to infer phylogenies in a likelihood or Bayesian framework.
• None of these models — even the general time-reversible model
— are particularly realistic; but by allowing for complicating factors
e.g. rate variation we capture much of the variability observed
across a phylogenetic panel.
• Modes of network evolution will be even more complicated and
exhibit high levels of contingency; moreover the structure and
function of different parts of the network will be intricately linked.
• Nevertheless we believe that modelling the processes underlying
the evolution of networks can provide useful insights; in particular
we can study how functionality is distributed across groups of
genes.


Network Evolution Models

[Figure, four panels: (a) Duplication attachment; (b) Duplication attachment with complementarity; (c) Linear preferential attachment; (d) General scale-free. Node labels $w_i$ and $w_j$ appear in the figure.]

ABC on Networks

Summarizing Networks
• Data are noisy and incomplete.
• We can simulate models of network evolution, but we cannot
calculate likelihoods for any but very trivial models.
• There is also no sufficient statistic that would allow us to
summarize networks, so ABC approaches require some thought.
• Many possible summary statistics of networks are expensive to calculate.

Full likelihood: Wiuf et al., PNAS (2006).
ABC: Ratmann et al., PLoS Comp.Biol. (2008).
Stumpf & Wiuf, J. Roy. Soc. Interface (2010).


Graph Spectrum

[Figure: example graph on the nodes a, b, c, d, e.]

With rows and columns ordered a, b, c, d, e:

$$A = \begin{pmatrix}
0 & 1 & 1 & 1 & 0 \\
1 & 0 & 1 & 1 & 0 \\
1 & 1 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 & 0
\end{pmatrix}$$

Graph Spectra
Given a graph $G$ comprised of a set of nodes $N$ and edges $(i, j) \in E$
with $i, j \in N$, the adjacency matrix, $A$, of the graph is defined by

$$a_{i,j} = \begin{cases} 1 & \text{if } (i, j) \in E, \\ 0 & \text{otherwise.} \end{cases}$$

The eigenvalues, $\lambda$, of this matrix provide one way of defining the
graph spectrum.
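
As a small illustration, the spectrum of the five-node example graph can be computed numerically. This sketch assumes NumPy is available and uses the adjacency matrix exactly as printed on the slide. The two printed sums are standard sanity checks: the eigenvalues sum to the trace of A (zero, as there are no self-loops) and their squares sum to twice the number of edges.

```python
import numpy as np

# Adjacency matrix from the slide, rows and columns ordered a, b, c, d, e.
A = np.array([
    [0, 1, 1, 1, 0],
    [1, 0, 1, 1, 0],
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)

# The graph is undirected, so A must be symmetric.
assert (A == A.T).all()

# eigvalsh is meant for symmetric matrices; it returns real eigenvalues in ascending order.
spectrum = np.linalg.eigvalsh(A)
print("spectrum:", np.round(spectrum, 3))

# Sum of eigenvalues equals trace(A); sum of squares equals trace(A @ A) = 2 * (number of edges).
print("sum:", round(float(spectrum.sum()), 6))
print("sum of squares:", round(float((spectrum ** 2).sum()), 6))
```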

Spectral Distances
A simple distance measure between graphs having adjacency
matrices $A$ and $B$, known as the edit distance, is to count the number
of edges that are not shared by both graphs,

$$D(A, B) = \sum_{i,j} (a_{i,j} - b_{i,j})^2.$$

However for unlabelled graphs we require some mapping $h$ from
$i \in N_A$ to $i' \in N_B$ that minimizes the distance

$$D(A, B) \geq D_h(A, B) = \sum_{i,j} (a_{i,j} - b_{h(i),h(j)})^2,$$

Given a spectrum (which is relatively cheap to compute) we have

$$D(A, B) = \sum_{l} \left( \lambda_l^{(\alpha)} - \lambda_l^{(\beta)} \right)^2$$


ABC using Graph Spectra

For an observed network, $N$, and a simulated network, $S_\theta$, we use the
distance between the spectra

$$D(N, S_\theta) = \sum_{l} \left( \lambda_l^{(N)} - \lambda_l^{(S)} \right)^2,$$

in our ABC SMC procedure. Note that this distance is a close lower
bound on the distance between the raw data; we therefore do not
have to bother with summary statistics.
Also, calculating graph spectra costs as much as calculating other
$O(N^3)$ statistics (such as all shortest paths, the network diameter or
the within-reach distribution).
Thorne & Stumpf, J.Roy.Soc. Interface, 9:2653 (2012).
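
The relationship between the distances can be illustrated on tiny graphs. This is a sketch assuming NumPy; the two four-node graphs and the helper names are made up for illustration, and the brute-force search over all relabellings is only feasible for very small graphs. For symmetric matrices the spectral distance between sorted eigenvalues never exceeds the edit distance under any relabelling (the Hoffman-Wielandt inequality), which is what makes it a lower bound.

```python
import itertools
import numpy as np

def edit_distance(A, B):
    """Sum of squared entry differences for a fixed node labelling."""
    return float(((A - B) ** 2).sum())

def min_edit_distance(A, B):
    """Edit distance minimised over all relabellings h of B (brute force, tiny graphs only)."""
    n = len(A)
    return min(
        edit_distance(A, B[np.ix_(h, h)])   # entries b_{h(i), h(j)}
        for h in itertools.permutations(range(n))
    )

def spectral_distance(A, B):
    """Sum of squared differences between the sorted adjacency eigenvalues."""
    return float(((np.linalg.eigvalsh(A) - np.linalg.eigvalsh(B)) ** 2).sum())

# Hypothetical examples: a 4-node path, the same path relabelled, and a 4-node star.
path = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
star = np.array([[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]], dtype=float)
perm = [2, 0, 3, 1]
relabelled = path[np.ix_(perm, perm)]

for name, other in (("relabelled path", relabelled), ("star", star)):
    print(name,
          "| labelled edit:", edit_distance(path, other),
          "| minimised edit:", min_edit_distance(path, other),
          "| spectral:", round(spectral_distance(path, other), 4))
```

The labelled edit distance is sensitive to how the nodes happen to be numbered, whereas the spectral distance is not and avoids the search over mappings altogether.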


Protein Interaction Network Data

Species            Proteins   Interactions   Genome size   Sampling fraction
S. cerevisiae      5035       22118          6532          0.77
D. melanogaster    7506       22871          14076         0.53
H. pylori          715        1423           1589          0.45
E. coli            1888       7008           5416          0.35

[Figure: model probability (axis from 0.0 to 0.5) for the models DA, DAC, LPA, SF, DACL and DACR, shown per organism: S. cerevisiae, D. melanogaster, H. pylori, E. coli.]

Model Selection
• Inference here was based on all the data, not summary statistics.
• Duplication models receive the strongest support from the data.
• Several models receive support and no model is chosen unambiguously.

Thorne & Stumpf, J.Roy.Soc. Interface, 9:2653 (2012).


PIN Model Evolution


Power Laws As Phenomenological Models
We have seen above that power laws emerge naturally in the context
of continuous phase transitions; but we have also seen that they offer
at best limited insights for finite systems.
Why do they nevertheless appear so often?

Revisiting the Renormalization Group
Let $f(X)$ be a probability distribution. Then the RGT acting on it is
defined as

$$T_a f(X = x) := |a| \int_{-\infty}^{\infty} f(ax - s)\, f(s)\, ds.$$

$T_a f(X)$ is the pdf of the random variable

$$Y = \frac{X_1 + X_2}{a} \quad \text{with independent } X_1, X_2 \sim f(X).$$

Here $a$ controls the qualitative properties of the transformation.
Calvo et al., J.Stat.Phys. 141:409 (2010).


Renormalization Group Transformation of PDFs
Here we are after the fixed points, $f_0$, of the RGT,

$$T_a f_0(x) = f_0(x)$$

When all moments of $f(X)$ exist and are finite then we can determine
the moments of the transformed distribution, $g = T_a f$,

$$E_g[Y^n] = E_{T_a f}[Y^n] = \frac{1}{a^n} \sum_{i=0}^{n} \binom{n}{i} E_f\left[X^i\right] E_f\left[X^{n-i}\right]$$

Some Fixed Points of the RGT

Case              Moment behaviour                                                         Fixed point
$a < \sqrt{2}$    $\lim_{m \to \infty} E_{T_a^m f_0}[x^2] = \infty$                        none
$a = \sqrt{2}$    $E_{f_0}[x^{2n}] = \frac{(2n)!}{n!\,2^n} \left(E_{f_0}[x^2]\right)^n$    $N(0, 1)$
$a > \sqrt{2}$    $\lim_{m \to \infty} E_{T_a^m f_0}[x^2] = 0$                             $\delta(x)$
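
The moment formula can be applied directly to see the three regimes. This is a minimal sketch using only the Python standard library; the function names, the starting distribution N(0, 1), the values 1.2 and 1.6 for a, and the number of iterations are choices made for illustration.

```python
import math

def gaussian_moments(n_max):
    """Moments E[X^n] of N(0, 1): zero for odd n, (2k)! / (k! 2^k) for n = 2k."""
    return [
        0.0 if n % 2 else math.factorial(n) / (math.factorial(n // 2) * 2 ** (n // 2))
        for n in range(n_max + 1)
    ]

def rgt_moments(moments, a):
    """Moments of Y = (X1 + X2) / a, given the moments of X (X1, X2 independent copies)."""
    return [
        sum(math.comb(n, i) * moments[i] * moments[n - i] for i in range(n + 1)) / a ** n
        for n in range(len(moments))
    ]

# For a = sqrt(2) the moments of N(0, 1) are reproduced: the Gaussian is a fixed point.
fixed = rgt_moments(gaussian_moments(8), math.sqrt(2))
print([round(m, 6) for m in fixed])

# Iterating the transformation: the second moment grows, stays put, or shrinks.
for a in (1.2, math.sqrt(2), 1.6):
    m = gaussian_moments(6)
    for _ in range(20):
        m = rgt_moments(m, a)
    print(f"a = {a:.3f}: E[x^2] after 20 steps = {m[2]:.3g}")
```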


Central Limit Theorems
The conventional central limit theorem emerges as the fixed point of
the RGT for distributions where all moments are finite.

Now we look at the characteristic function, $\varphi(t) = E_f[e^{itX}]$, and obtain
for the fixed points, $\varphi_0(t)$,

$$\varphi_0(t/a)^2 = \varphi_0(t)$$

General Fixed Points
The general fixed points of the RGT are given by the Lévy-stable laws,

$$\varphi_0(t) = S_{\alpha,A}(t) := \exp\left( -A|t|^{\alpha}\,\theta(t) - \bar{A}|t|^{\alpha}\,\theta(-t) \right)$$

with $|a| = 2^{1/\alpha}$, $\bar{A}$ the complex conjugate of $A$ and $\theta(x)$ the Heaviside
step function.
For $\alpha = 2$ we recover the Gaussian, and for all $\alpha < 2$ we obtain the
heavy-tailed stable laws. For $\alpha < 1$ the mean is infinite.
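
The condition on $a$ can be verified in one line for the symmetric case. The restriction to real $A > 0$ is an assumption made here to keep the check short.

```latex
% Symmetric stable law with real A > 0:
\varphi_0(t) = \exp\left(-A\,|t|^{\alpha}\right)
\quad\Longrightarrow\quad
\varphi_0(t/a)^2 = \exp\left(-\frac{2A\,|t|^{\alpha}}{|a|^{\alpha}}\right),

% which equals \varphi_0(t) for all t exactly when
|a|^{\alpha} = 2
\quad\Longleftrightarrow\quad
|a| = 2^{1/\alpha}.

% For alpha = 2 this gives |a| = \sqrt{2}, the Gaussian case.
```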


Power Law vs. Gaussian Distribution
From the above we see that the conventional CLT is in fact a very
special case of a much more general form of the CLT.
Thus we would expect fat-tailed distributions (by whichever sensible
definition) to occur frequently.

Gaussian Distributions are stable under aggregation and
marginalization.
Scaling Distributions are stable under aggregation, mixture,
maximisation and marginalization.

More Normal Than Normal
In this sense scaling or fat-tailed distributions should be expected to
occur very frequently. For low-variability data we obtain the Gaussian
as a fixed point.
Willinger et al., Proceedings of the 2004 Winter Simulation Conference, 130 (2004).

It is thus also easy to come up with simple mechanisms that give rise
to dispersed data.

Power Laws, Evidence, Usefulness

[Figure: examples placed on axes of statistical support and mechanistic sophistication: allometric scaling, Internet, Zipf’s law, C. elegans nervous system, S. cerevisiae PIN.]

Stumpf & Porter, Science, 335:665 (2012).

How to Check if You Have a Power Law in Your Data?

$Y \propto X^{-\lambda}$: YES or NO?

1. Would a power law relationship offer profound new insights?
   ◦ Simply reporting a power law or scaling relationship is not exciting.
   ◦ Could you just have a very dispersed random variable, Y?
2. Does it extend over at least three orders of magnitude in both
   variables (sanity check)?
3. Does the power law relationship hold up in comparison to other
   distributions (log-normal, stretched exponential, negative
   binomial)?
4. Have you got a non-trivial and meaningful theoretical model that
   gives rise to the power law and which yields mechanistic insights?

A Useful Scientific Theory
Failing at any of these hurdles does not mean that the scientific
problem is boring or trivial. Power laws add a lot to the theory of
critical phenomena, fluid dynamics etc., but very little elsewhere.

Acknowledgements
Theoretical Systems Biology Group

• Imperial College London
  ◦ Thomas Thorne
  ◦ William Kelly
• Oxford University
  ◦ Robert May
  ◦ Mason Porter
• Copenhagen University
  ◦ Carsten Wiuf
www.theosysbio.bio.ic.ac.uk

