Power Laws:
Rich-Get-Richer Phenomena
Chapter 18: “Networks, Crowds and Markets”
By Amir Shavitt
Popularity
• How do we measure it?
• How does it affect the network?
• Basic network models with popularity
Popularity Examples
• Books
• Movies
• Music
• Websites
Our Model
• The web as a directed graph
– Nodes are webpages
– Edges are hyperlinks
– #in-links = popularity
– 1 out-link per page
[Figure: example graph with nodes www.cs.tau.ac.il, www.tau.ac.il, www.eng.tau.ac.il and a course homepage]
Our Main Question
• As a function of k, what fraction of pages on
the web have k in-links?
Expected Distribution
• Normal distribution
– Each link addition is an experiment
Power Laws
• f(k) ∝ 1/k^c
• Usually c > 2
• The tail decreases much more slowly than that of
the normal distribution
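To make the difference in tails concrete, here is a minimal sketch that evaluates an unnormalized power law 1/k^c next to a standard normal density. The choice c = 2 and the mean 0, standard deviation 1 are illustrative assumptions, not values from the slides.

```python
import math


def power_law(k, c):
    """Unnormalized power-law value 1 / k**c."""
    return k ** -c


def normal_density(k, mean=0.0, std=1.0):
    """Density of a normal distribution at k (defaults are illustrative)."""
    z = (k - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))


# Print both values side by side; the normal density collapses far faster.
for k in (1, 2, 5, 10):
    print(k, power_law(k, 2.0), normal_density(k))
```

At k = 10 the power law is still 0.01, while the normal density is smaller by many orders of magnitude.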
Examples
• Telephone numbers that receive k calls per
day
• Books bought by k people
• Scientific papers that receive k citations
Power Laws Graphs Examples
Power Laws vs Normal Distribution
• Normal distribution – many independent
experiments
• Power laws – if the data measured can be
viewed as a type of popularity
What causes power laws?
• Correlated decisions across a population
• Human tendency to copy decisions
Building a Simple Model
• Webpages are created in order 1,2,3,…,N
– Dynamic network growth
• When page j is created, with probability:
– p: it chooses a page i uniformly at random among all
earlier pages and links to i
– 1-p: it chooses a page i uniformly at random among
all earlier pages and links to the page that i links to
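The growth rule above can be sketched in a few lines of Python. This is a minimal simulation under stated assumptions: page 1 has no out-link, and if the chosen page has no link to copy, the new page links to the chosen page directly. The function names `simulate_copying_model` and `in_degree_distribution` are hypothetical.

```python
import random
from collections import Counter


def simulate_copying_model(n, p, seed=0):
    """Return target, where target[j] is the page that page j links to.

    Index 0 is unused and target[1] == 0 means page 1 has no out-link
    (an assumption; the slides do not say how the first page is handled).
    """
    rng = random.Random(seed)
    target = [0] * (n + 1)
    for j in range(2, n + 1):
        i = rng.randint(1, j - 1)  # uniform among all earlier pages
        if rng.random() < p or target[i] == 0:
            target[j] = i          # link to the chosen page itself
        else:
            target[j] = target[i]  # copy the chosen page's link
    return target


def in_degree_distribution(target):
    """Map k to the number of pages that have exactly k in-links."""
    n = len(target) - 1
    in_links = Counter(target[2:])     # page -> number of in-links
    dist = Counter(in_links.values())  # k -> number of pages with k in-links
    dist[0] = n - len(in_links)        # pages that nobody links to
    return dist


dist = in_degree_distribution(simulate_copying_model(10_000, 0.3))
print(sorted(dist.items())[:5])
```

Plotting `dist` on log-log axes for several values of p gives a picture of the kind shown on the Results slide.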
Example
[Figure: example of the two link choices, with branches labeled p and 1-p]
Results
[Figure: log-log plot of count against #in-degree (k), both axes from 1 to 1,000,000, with series for p=0.1, p=0.3 and p=0.5 and fitted trend lines y = 78380k^-2.025 and y = 835043x^-2.652]
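Trend lines of the form y = a·k^-c are straight lines on log-log axes, so one common way to estimate them is a least-squares line through (log k, log count). The sketch below shows that approach; whether the trend lines on the slide were produced this way is not stated, and the name `fit_power_law` is hypothetical.

```python
import math


def fit_power_law(ks, counts):
    """Fit count ≈ a * k**(-c) by least squares on log-log values.

    Returns (a, c). All ks and counts must be positive.
    """
    xs = [math.log(k) for k in ks]
    ys = [math.log(y) for y in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic check: data generated from an exact power law with c = 2.5.
ks = [1, 2, 4, 8, 16]
a, c = fit_power_law(ks, [1000 * k ** -2.5 for k in ks])
print(a, c)
```

On noisy simulation output this fit is only a rough estimate, because the sparse high-degree points carry as much weight as the well-populated low-degree ones.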
Conclusions
• p gets smaller → more copying → the
exponent c gets smaller
• More likely to see extremely popular pages
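For reference, the heuristic analysis in the textbook chapter gives the exponent of this model in closed form. It is stated here from the textbook, not from the slides:

```latex
f(k) \propto k^{-c}, \qquad c = 1 + \frac{1}{1-p}
```

So p = 0.5 gives c = 3, p = 0.3 gives c ≈ 2.43, and p = 0.1 gives c ≈ 2.11. As p approaches 0 the exponent approaches 2, and as p approaches 1 it grows without bound.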
Rich-Get-Richer
• With probability (1-p), page j links to an earlier page chosen with
probability proportional to that page's #in-links
• A page that gets a small lead over others will
tend to extend this lead
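Copying the link of a uniformly random page is the same as choosing a target in proportion to its in-links, because a page with more in-links is pointed to by more of the pages that might be picked. The sketch below checks this empirically on a hypothetical four-link toy graph; the graph and the function name `sample_by_copying` are illustrative assumptions.

```python
import random
from collections import Counter


def sample_by_copying(target, pages, rng):
    """Pick a uniformly random linking page and return the page it links to."""
    return target[rng.choice(pages)]


# Hypothetical toy graph: page -> the page it links to (one out-link each).
# Page 1 has 3 in-links and page 2 has 1 in-link.
target = {2: 1, 3: 1, 4: 1, 5: 2}
pages = list(target)

rng = random.Random(0)
hits = Counter(sample_by_copying(target, pages, rng) for _ in range(40_000))

# Page 1 should be selected roughly three times as often as page 2.
print(hits[1] / hits[2])
```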
Initial Phase
• Rich-get-richer dynamics amplifies differences
• Sensitive to disturbance
• Similar to information cascades
How Sensitive?
• Music download site with 48 songs
• 8 “parallel” copies of the site
• Download count for each song
• Same starting point
[Figure: subjects are assigned either to the social influence condition (World 1 to World 8) or to the independent condition (a single World)]
How many people have chosen to download this song?
Source: Music Lab, http://www.musiclab.columbia.edu/
Results
• Exp. 1: songs shown in
random order
• Exp. 2: songs shown in
order of download
popularity
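Gini = A / (A + B), where A is the area between the line of equality and the Lorenz curve and B is the area under the Lorenz curve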
The Lorenz curve is a graphical
representation of the cumulative
distribution function
Gini Coefficient
[Figure: two examples, each with a bar chart of downloads per song (song 1 to song 4) and a cumulative chart over rank ≤ 1 to rank ≤ 4; cumulative axis: sum of downloads of all songs with rank ≤ i]
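The formula Gini = A / (A + B) can be computed directly from download counts. The sketch below assumes the usual convention that B is the area under the Lorenz curve and A is the area between that curve and the line of equality, so that A + B = 1/2. The function name `gini` and the sample counts are illustrative.

```python
def gini(downloads):
    """Gini coefficient of non-negative counts via the Lorenz curve."""
    xs = sorted(downloads)  # ascending, as the Lorenz curve requires
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    cumulative = 0
    area_b = 0.0  # area under the Lorenz curve, by the trapezoid rule
    for x in xs:
        previous = cumulative
        cumulative += x
        area_b += (previous + cumulative) / (2 * total * n)
    # A + B = 1/2, so A / (A + B) = 1 - 2B.
    return 1 - 2 * area_b


print(gini([5, 5, 5, 5]))   # equal downloads
print(gini([0, 0, 0, 10]))  # one song takes everything
```

With four songs, equal downloads give 0 and a single winner gives 0.75, the maximum possible for n = 4.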
Findings
• Best songs rarely did poorly
• Worst songs rarely did well
• Anything else was possible
• The greater the social influence, the more
unequal and unpredictable the collective
outcomes become
Zipf Plot
Zipf plot - plotting the data on a log-log graph, with the axes being log (rank order) and log (popularity)
[Figure: node degree for AS20000102.m; axes rank order and popularity, both from 10^0 to 10^4]
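The points of a Zipf plot can be prepared as follows. This is a minimal sketch: items are ranked from most to least popular, and items with zero popularity are dropped because they have no logarithm. The function name `zipf_points` and the sample values are assumptions.

```python
import math


def zipf_points(popularity):
    """Return (log10 rank, log10 popularity) pairs, most popular item first."""
    ranked = sorted(popularity, reverse=True)
    return [
        (math.log10(rank), math.log10(value))
        for rank, value in enumerate(ranked, start=1)
        if value > 0
    ]


# Hypothetical popularity values (for example, in-degrees of four nodes).
for point in zipf_points([100, 1, 10, 0]):
    print(point)
```

A power law shows up as an approximately straight line through these points.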
The Long Tail
What number of items have popularity at least k?
[Figure: log-log plot with axes popularity (#in-degree) and rank, for p=0.1 and n=1,000,000; annotation: tail contains 990,000 nodes]
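The question "what number of items have popularity at least k?" can be answered with one binary search over sorted data. This is a small sketch; the function name `count_at_least` and the sample values are hypothetical.

```python
from bisect import bisect_left


def count_at_least(sorted_popularity, k):
    """Number of items with popularity >= k; input must be sorted ascending."""
    return len(sorted_popularity) - bisect_left(sorted_popularity, k)


# Hypothetical in-degrees of seven pages.
popularity = sorted([0, 0, 1, 1, 2, 5, 9])
for k in (1, 5, 10):
    print(k, count_at_least(popularity, k))
```

The count for a given k is also the rank of the least popular item that still has popularity k, which is why this view carries the same information as the rank-ordered plot.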
Why is it Important?
Search Tools and Recommendation Systems
• How do they affect rich-get-richer dynamics?
• How do we find niche products?
