Reference classes: a case study with the poweRlaw package
Colin Gillespie
Newcastle University, UK
http://aperiodical.com/2013/01/log-log-whos-there-not-a-power-law/
The power law distribution
Name         f(x)                                                                    Notes
Power law    x^{-\alpha}                                                             Pareto distribution
Log-normal   \frac{1}{x} \exp\left(-\frac{(\ln(x) - \mu)^2}{2\sigma^2}\right)
Exponential  e^{-\lambda x}
Power law    x^{-\alpha}                                                             Zeta distribution
Power law    x^{-\alpha}                                                             x = 1, ..., n; Zipf's distribution
Yule         \Gamma(x) / \Gamma(x + \alpha)
Poisson      \lambda^x / x!
Alleged power-law phenomena
The frequency of occurrence of unique words in the novel Moby Dick by
Herman Melville
The numbers of customers affected in electrical blackouts in the United
States between 1984 and 2002
The number of links to web sites found in a 1997 web crawl of about 200
million web pages
The number of hits on web pages
The number of papers scientists write
The number of citations received by papers
Annual incomes
Sales of books, music; in fact anything that can be sold
Zipf plots
[Figure: six Zipf plots on log-log axes, one per data set (Blackouts, Fires, Flares, Moby Dick, Terrorism, Web links), each showing 1 − P(x) against x]
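A plot of this kind can be built from raw data in a few lines of base R. The sketch below is illustrative only: the function name `empirical_ccdf` and the simulated data are assumptions, not part of the talk, and 1 − P(x) is taken here to mean the proportion of observations greater than or equal to x.

```r
# Empirical complementary CDF: proportion of observations >= each distinct value
empirical_ccdf = function(x) {
  values = sort(unique(x))
  prob = sapply(values, function(v) mean(x >= v))
  data.frame(x = values, ccdf = prob)
}

set.seed(1)
# Assumed toy data: inverse-transform draws from a continuous power law
# (alpha = 2.5, xmin = 1), rounded to integers purely for illustration
x = round((1 - runif(5000))^(-1 / 1.5))
cc = empirical_ccdf(x)

# Both axes on log scales, as in a Zipf plot
plot(cc$x, cc$ccdf, log = "xy", xlab = "x", ylab = "1 - P(x)")
```

On log-log axes a power-law tail appears as a roughly straight line, which is why this view is the usual first check.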
The power law distribution
The power-law distribution is
p(x) \propto x^{-\alpha}
where α, the scaling parameter, is constant
The scaling parameter typically lies in the range 2 < α < 3, although
there are some occasional exceptions
When α < 2, the mean and all higher moments are infinite
Typically, the entire process doesn’t obey a power law
Instead, the power law applies only for values greater than some
minimum xmin
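The statement about infinite moments can be checked with a short calculation. The worked equation below assumes the continuous power law normalised on [x_min, ∞); the slides do not write out this normalisation, so treat it as a standard textbook form rather than the speaker's own derivation.

```latex
p(x) = \frac{\alpha - 1}{x_{\min}} \left(\frac{x}{x_{\min}}\right)^{-\alpha},
  \qquad x \ge x_{\min},

\langle x^m \rangle = \int_{x_{\min}}^{\infty} x^m \, p(x) \, dx
  = \frac{\alpha - 1}{\alpha - 1 - m} \, x_{\min}^{m},
  \qquad m < \alpha - 1 .
```

The integral diverges whenever m ≥ α − 1. With α ≤ 2 the mean (m = 1) is already infinite, and in the typical range 2 < α < 3 the mean is finite but the variance is not.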
Power law: PMF & CDF
For the discrete power law, the PMF is
p(x) = \frac{x^{-\alpha}}{\zeta(\alpha, x_{\min})}
where α > 1, x_min ≥ 1 and
\zeta(\alpha, x_{\min}) = \sum_{n=0}^{\infty} (n + x_{\min})^{-\alpha}
is the generalised zeta function
When x_min = 1, ζ(α, 1) is the standard zeta function
[Figure: two panels, PDF and CDF, plotted against x from 0 to 50 for values of α between 1.50 and 2.50]
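The PMF above can be evaluated directly in base R. This is a minimal sketch, not the package's implementation: `hurwitz_zeta`, `dpl` and `ppl` are hypothetical names, and the zeta function is approximated by a truncated sum with an Euler-Maclaurin tail correction, because base R has no built-in zeta function.

```r
# Approximate the generalised (Hurwitz) zeta function: sum the first n_terms
# exactly, then estimate the remaining tail with an Euler-Maclaurin correction
hurwitz_zeta = function(alpha, xmin, n_terms = 1e5) {
  k = xmin + 0:(n_terms - 1)
  a = xmin + n_terms  # first term left out of the explicit sum
  sum(k^(-alpha)) + a^(1 - alpha) / (alpha - 1) + 0.5 * a^(-alpha)
}

# PMF of the discrete power law; zero below the cut-off
dpl = function(x, xmin, alpha) {
  ifelse(x >= xmin, x^(-alpha) / hurwitz_zeta(alpha, xmin), 0)
}

# CDF, P(X <= x), obtained by summing the PMF from xmin up to x
ppl = function(x, xmin, alpha) {
  sapply(x, function(q) if (q < xmin) 0 else sum(dpl(xmin:q, xmin, alpha)))
}

dpl(1:5, xmin = 1, alpha = 2.5)
ppl(c(1, 10, 50), xmin = 1, alpha = 2.5)
```

Repeated calls recompute the zeta sum each time, which is acceptable for a sketch but wasteful in a fitting loop.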
Fitting power laws
The main technique for fitting power laws comes from Clauset et al. (2009)
This paper gets around ten new citations a week
Estimating α given xmin is straightforward: just use the maximum likelihood estimator (MLE)
The lower cut-off, xmin, is estimated using a Kolmogorov-Smirnov
approach
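The two steps can be sketched in base R for the continuous power law, where the MLE has the closed form α̂ = 1 + n / Σ ln(x_i / xmin). This is a simplified illustration under stated assumptions: the function names are hypothetical, the data are simulated, and the discrete case used for Moby Dick needs a numerical optimiser instead of the closed form.

```r
# Closed-form MLE of alpha for a continuous power law, given xmin
alpha_mle = function(x, xmin) {
  x_tail = x[x >= xmin]
  1 + length(x_tail) / sum(log(x_tail / xmin))
}

# Kolmogorov-Smirnov distance between the empirical and fitted tail CDFs
ks_distance = function(x, xmin) {
  x_tail = sort(x[x >= xmin])
  alpha = alpha_mle(x, xmin)
  fitted = 1 - (x_tail / xmin)^(1 - alpha)
  empirical = seq_along(x_tail) / length(x_tail)
  max(abs(empirical - fitted))
}

# Scan candidate cut-offs and keep the one with the smallest KS distance
estimate_cutoff = function(x, candidates) {
  ks = sapply(candidates, function(xm) ks_distance(x, xm))
  best = which.min(ks)
  list(KS = ks[best], xmin = candidates[best],
       pars = alpha_mle(x, candidates[best]))
}

set.seed(1)
# Inverse-transform sample from a continuous power law (xmin = 1, alpha = 2.5)
x = (1 - runif(10000))^(-1 / 1.5)
est = estimate_cutoff(x, candidates = c(1, 2, 5))
```

Every candidate xmin triggers a fresh fit over the tail, which is the cost that caching is meant to reduce.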
The poweRlaw package
The package is available on CRAN and at
https://github.com/csgillespie/poweRlaw
Makes fitting power laws easy
Crucially, it makes fitting (to the tails of) the log-normal, exponential
and Poisson distributions equally easy
Consistent interface between distributions
Estimate parameter uncertainty
Compare distributions (statistically and visually)
Case study: Moby Dick
R> library("poweRlaw")
R> data("moby")
R> m_pl = displ$new(moby)
R> plot(m_pl)
[Figure: the Moby Dick word-frequency data on log-log axes; x-axis "Words" (10^0 to 10^4), y-axis "CDF" (10^-4 to 10^0)]
R> (est = estimate_xmin(m_pl))
$KS
[1] 0.009229
$xmin
[1] 7
$pars
[1] 1.95
attr(,"class")
[1] "estimate_xmin"
R> m_pl$setXmin(est)
R> lines(m_pl)
R> m_ln = dislnorm$new(moby)
R> est = estimate_xmin(m_ln)
R> m_ln$setXmin(est)
R> lines(m_ln)
Why use objects?
Each distribution is represented by an object:
Parent class: distribution
Power-law: displ, log-normal: dislnorm, ...
Method dispatch on object class:
dist_pdf(m) returns the probability density function based on the class of m
Consistent interface:
Bootstrapping:
R> bootstrap(m)
Model selection:
R> compare_distributions(m1, m2)
Simple interface that enables easy addition of new distributions (currently
there are seven available distributions to fit)
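The dispatch idea can be shown with a toy hierarchy. Everything named `Toy...` or `toy_pdf` below is hypothetical and is not the package's code. The sketch only shows how one generic function can select a density based on the class of a reference-class object.

```r
library(methods)

# Hypothetical parent class holding the state shared by all distributions
ToyDistribution = setRefClass("ToyDistribution",
  fields = list(xmin = "numeric", pars = "numeric"))
ToyPowerLaw = setRefClass("ToyPowerLaw", contains = "ToyDistribution")
ToyExponential = setRefClass("ToyExponential", contains = "ToyDistribution")

# One generic; the class of m selects the method that runs
setGeneric("toy_pdf", function(m, q) standardGeneric("toy_pdf"))

# Continuous power law on [xmin, Inf)
setMethod("toy_pdf", "ToyPowerLaw", function(m, q) {
  (m$pars - 1) / m$xmin * (q / m$xmin)^(-m$pars)
})

# Exponential shifted to start at xmin
setMethod("toy_pdf", "ToyExponential", function(m, q) {
  m$pars * exp(-m$pars * (q - m$xmin))
})

m1 = ToyPowerLaw$new(xmin = 1, pars = 2.5)
m2 = ToyExponential$new(xmin = 1, pars = 0.5)
toy_pdf(m1, 2)  # power-law density at 2
toy_pdf(m2, 2)  # exponential density at 2
```

Adding a new distribution then means adding one class and one method per generic. Code that calls `toy_pdf()` does not change.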
Reference classes
Reference classes behave like classes in C++, Python and many other
languages - not like standard R classes
You can use these classes with ordinary R expressions and functions
An extension to core R (October, 2010)
Big difference - mutable state
Mutable states
R> displ = setRefClass("displ", fields = "xmin")
R> d1 = displ$new(xmin = 1)
R> d1$xmin
[1] 1
R> d2 = d1
R> d2$xmin = 100
R> d2$xmin
[1] 100
R> d1$xmin
[1] 100
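The contrast with R's usual value semantics is easier to see side by side. In this sketch the class name `Cutoff` is made up for illustration; `$copy()` is the standard reference-class method for making an independent object.

```r
library(methods)

# Hypothetical class with a single field, mirroring the example above
Cutoff = setRefClass("Cutoff", fields = list(xmin = "numeric"))

a = Cutoff$new(xmin = 1)
b = a           # b is another name for the same object
d = a$copy()    # d is an independent copy
b$xmin = 100
a$xmin          # 100: the change made through b is visible through a
d$xmin          # 1: the copy is unaffected

# An ordinary R list has value semantics, so l1 is left unchanged
l1 = list(xmin = 1)
l2 = l1
l2$xmin = 100
l1$xmin         # 1
```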
Mutable states
When estimating xmin, a naive implementation makes this calculation slow
Efficient caching speeds up calculations 100-fold
For example, the call
R> m_pl$setXmin(10)
updates internal variables that make future calculations quicker
On creation of a distribution object, we make "multiple copies" of the data
R> x
R> cumsum(log(x))
Using reference classes avoids constant copying and speeds up
calculations
R> pl_ref$xmin = 10   # reference class: modified in place
R> pl_s4@xmin = 10    # S4 object: value (copy-on-modify) semantics
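A minimal sketch of this kind of caching, assuming the continuous power-law MLE, is given below. The class `CachedPL` and its fields are hypothetical and do not reproduce the package internals. The data are sorted once and the cumulative sum of logs is stored, so changing xmin only needs a lookup.

```r
library(methods)

CachedPL = setRefClass("CachedPL",
  fields = list(dat = "numeric", log_cumsum = "numeric", xmin = "numeric",
                tail_n = "numeric", tail_sum_log = "numeric"),
  methods = list(
    initialize = function(x = numeric(0), ...) {
      # Sort in decreasing order so that every tail is a prefix of dat
      dat <<- sort(x, decreasing = TRUE)
      log_cumsum <<- cumsum(log(dat))
      callSuper(...)
    },
    setXmin = function(value) {
      # Update the cached tail summaries in place; the data are not copied
      xmin <<- value
      tail_n <<- as.numeric(sum(dat >= value))
      tail_sum_log <<- if (tail_n > 0) log_cumsum[tail_n] else 0
      invisible(.self)
    },
    alpha = function() {
      # Continuous MLE: 1 + n / sum(log(x_i / xmin)), using cached sums only
      1 + tail_n / (tail_sum_log - tail_n * log(xmin))
    }
  ))

set.seed(1)
m = CachedPL$new(x = (1 - runif(1000))^(-1 / 1.5))
for (xm in c(1, 2, 5)) {
  m$setXmin(xm)
  print(m$alpha())
}
```

Because `setXmin()` mutates the object, the loop reuses one sorted copy of the data and one cumulative sum for every candidate cut-off.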
Comments
Reference classes are still new
Code has now broken twice with R upgrades
roxygen2 and reference classes didn’t play well together
Very few questions on Stack Overflow about reference classes
Structuring code and files
Care has to be taken when using them with parallel computing
References
Clauset, A., Shalizi, C. R. and Newman, M. E. J. (2009). Power-law
distributions in empirical data. SIAM Review 51(4), 661–703.
poweRlaw package
https://github.com/csgillespie/poweRlaw

Reference classes: a case study with the poweRlaw package

  • 1. Reference classes: a case study with the poweRlaw package Colin Gillespie Newcastle University, UK http://aperiodical.com/2013/01/log-log-whos-there-not-a-power-law/
  • 2. The power law distribution Name f (x) Notes Power law x−α Pareto distribution Log-normal 1 x exp(−(ln(x)−µ)2 2σ2 ) Exponential e−λx Power law x−α Zeta distribution Power law x−α x = 1, . . . , n, Zipf’s dist’ Yule Γ(x) Γ(x+α) Poisson λx /x!
  • 3. Alleged power-law phenomena The frequency of occurrence of unique words in the novel Moby Dick by Herman Melville The numbers of customers affected in electrical blackouts in the United States between 1984 and 2002 The number of links to web sites found in a 1997 web crawl of about 200 million web pages
  • 4. Alleged power-law phenomena The frequency of occurrence of unique words in the novel Moby Dick by Herman Melville The numbers of customers affected in electrical blackouts in the United States between 1984 and 2002 The number of links to web sites found in a 1997 web crawl of about 200 million web pages The number of hits on web pages The number of papers scientist write The number of citations received by papers Annual incomes Sales of books, music; in fact anything that can be sold
  • 5. Zipf plots Blackouts Fires Flares Moby Dick Terrorism Web links 10−8 10−6 10−4 10−2 100 10−8 10−6 10−4 10−2 100 100 102 104 106 100 102 104 106 100 102 104 106 x 1−P(x)
  • 6. The power law distribution The power-law distribution is p(x) ∝ x−α where α, the scaling parameter, is constant The scaling parameter typically lies in the range 2 < α < 3, although there are some occasional exceptions When α < 2, all moments are infinite
  • 7. The power law distribution The power-law distribution is p(x) ∝ x−α where α, the scaling parameter, is constant The scaling parameter typically lies in the range 2 < α < 3, although there are some occasional exceptions When α < 2, all moments are infinite Typically, the entire process doesn’t obey a power law Instead, the power law applies only for values greater than some minimum xmin
  • 8. Power law: PMF & CMF Discrete power law, the PMF is p(x) = x−α ζ(α, xmin) where α > 1, xmin ≥ 1 and ζ(α, xmin) = ∞ ∑ n=0 (n + xmin)−α is the generalised zeta function When xmin = 1, ζ(α, 1) is the standard zeta function PDF CDF 0.00 0.25 0.50 0.75 1.00 0.00 0.25 0.50 0.75 1.00 0 10 20 30 40 50 x 1.50 1.75 2.00 2.25 2.50 α
  • 9. Fitting power laws The main technique for fitting power laws comes from Clausett et al, 2009 This paper gets around ten new citations a week Estimating α given xmin is straightforward - just use the mle The lower cut-off, xmin, is estimated using a Kolmogorov-Smirnoff approach
  • 10. The poweRlaw package The package is available on CRAN and at https://github.com/csgillespie/poweRlaw Makes fitting power laws easy to fit Crucially, it makes fitting (to the tails) of the log normal, exponential, Poisson equally easy Consistent interface between distributions Estimate parameter uncertainty Compare distributions (statistically and visually)
  • 11. Case study: Moby Dick R> m_pl = displ$new(moby)
  • 12. Case study: Moby Dick R> m_pl = displ$new(moby) R> plot(m_pl) q q q q q q q q qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq q q q q q q q q q Words CDF 100 101 102 103 104 10−4 10−3 10−2 10−1 100
  • 13. Case study: Moby Dick R> m_pl = displ$new(moby) R> (est = estimate_xmin(m_pl)) $KS [1] 0.009229 $xmin [1] 7 $pars [1] 1.95 attr(,"class") [1] "estimate_xmin" q q q q q q q q qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq q q q q q q q q q Words CDF 100 101 102 103 104 10−4 10−3 10−2 10−1 100
  • 14. Case study: Moby Dick
      R> m_pl = displ$new(moby)
      R> est = estimate_xmin(m_pl)
      R> m_pl$setXmin(est)
      [Figure: CDF against Words on log-log axes; x from 10^0 to 10^4, y from 10^-4 to 10^0]
  • 15. Case study: Moby Dick
      R> m_pl = displ$new(moby)
      R> est = estimate_xmin(m_pl)
      R> m_pl$setXmin(est)
      R> lines(m_pl)
      [Figure: CDF against Words on log-log axes; x from 10^0 to 10^4, y from 10^-4 to 10^0]
  • 16. Case study: Moby Dick
      R> m_pl = displ$new(moby)
      R> est = estimate_xmin(m_pl)
      R> m_pl$setXmin(est)
      R> lines(m_pl)
      R> m_ln = dislnorm$new(moby)
      R> est = estimate_xmin(m_ln)
      R> m_ln$setXmin(est)
      R> lines(m_ln)
      [Figure: CDF against Words on log-log axes; x from 10^0 to 10^4, y from 10^-4 to 10^0]
  • 17. Why use objects?
      Each distribution is represented by an object:
        Parent class: distribution
        Power-law: displ, log-normal: dislnorm, . . .
      Method dispatch on object class:
        dist_pdf(m) returns the probability density function based on the class of m
      Consistent interface:
        Bootstrapping: R> bootstrap(m)
        Model selection: R> compare_distributions(m1, m2)
      Simple interface that enables easy addition of new distributions (currently there are seven available distributions to fit)
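The dispatch idea can be sketched with base R's methods package. The class names `dist_sketch`, `exp_sketch` and `par_sketch` and the generic `pdf_sketch` are hypothetical stand-ins invented for this example. They only illustrate how one generic can select a density from the class of its argument and are not how poweRlaw itself is written.

```r
library(methods)

# Hypothetical parent class with two child distributions
dist_sketch <- setRefClass("dist_sketch",
                           fields = list(xmin = "numeric", pars = "numeric"))
exp_sketch <- setRefClass("exp_sketch", contains = "dist_sketch")
par_sketch <- setRefClass("par_sketch", contains = "dist_sketch")

# One generic; the method that runs depends on the class of m
setGeneric("pdf_sketch", function(m, q) standardGeneric("pdf_sketch"))

setMethod("pdf_sketch", "exp_sketch", function(m, q) {
  # Exponential density shifted to start at xmin
  ifelse(q >= m$xmin, m$pars * exp(-m$pars * (q - m$xmin)), 0)
})

setMethod("pdf_sketch", "par_sketch", function(m, q) {
  # Continuous power-law (Pareto) density on [xmin, Inf)
  ifelse(q >= m$xmin, (m$pars - 1) / m$xmin * (q / m$xmin)^(-m$pars), 0)
})

# The same call works for every model in the list
models <- list(exp_sketch$new(xmin = 1, pars = 0.5),
               par_sketch$new(xmin = 1, pars = 2.5))
densities <- sapply(models, function(m) pdf_sketch(m, 2))
```

Adding another distribution under this design means defining one more child class and one more method; the calling code in the last two lines stays unchanged.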
  • 18. Reference classes
      Reference classes behave like classes in C++, Python and many other languages, not like standard R classes
      You can use these classes with ordinary R expressions and functions
      An extension to core R, introduced in R 2.12.0 (October 2010)
      Big difference: mutable state
  • 19. Mutable states
      R> displ = setRefClass("displ", fields = "xmin")
      R> d1 = displ$new(xmin = 1)
      R> d1$xmin
      [1] 1
  • 20. Mutable states
      R> displ = setRefClass("displ", fields = "xmin")
      R> d1 = displ$new(xmin = 1)
      R> d1$xmin
      [1] 1
      R> d2 = d1
      R> d2$xmin = 100
      R> d2$xmin
      [1] 100
  • 21. Mutable states
      R> displ = setRefClass("displ", fields = "xmin")
      R> d1 = displ$new(xmin = 1)
      R> d1$xmin
      [1] 1
      R> d2 = d1
      R> d2$xmin = 100
      R> d2$xmin
      [1] 100
      R> d1$xmin
      [1] 100
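To make the contrast with ordinary R semantics concrete, the sketch below repeats the experiment with a reference class and with an S4 class side by side. The class names `ref_sketch` and `s4_sketch` are made up for the example; `copy()` is the standard reference-class method for obtaining an independent object.

```r
library(methods)

# Reference class: assignment shares one underlying object
ref_gen <- setRefClass("ref_sketch", fields = list(xmin = "numeric"))
r1 <- ref_gen$new(xmin = 1)
r2 <- r1            # r2 and r1 refer to the same object
r2$xmin <- 100      # so this change is visible through r1 as well
r3 <- r1$copy()     # copy() creates an independent object
r3$xmin <- 5        # r1 is unaffected by this change

# S4 class: the usual copy-on-modify behaviour
setClass("s4_sketch", representation(xmin = "numeric"))
s1 <- new("s4_sketch", xmin = 1)
s2 <- s1
s2@xmin <- 100      # only s2 changes; s1 keeps its original value
```

The practical rule is that `=` on a reference object never duplicates it, so an explicit `copy()` is needed whenever two independent models are wanted.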
  • 22. Mutable states
      When estimating xmin, a naive implementation makes this calculation slow
      Efficient caching speeds up calculations 100-fold
      For example, the call
        R> m_pl$setXmin(10)
      updates internal variables that make future calculations quicker
  • 23. Mutable states
      When estimating xmin, a naive implementation makes this calculation slow
      Efficient caching speeds up calculations 100-fold
      For example, the call
        R> m_pl$setXmin(10)
      updates internal variables that make future calculations quicker
      On creation of a distribution object, we make "multiple copies" of the data
        R> x
        R> cumsum(log(x))
      Using reference classes avoids constant copying and speeds up calculations
        R> pl_ref$xmin = 10
        R> pl_s4@xmin = 10
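A minimal sketch of the caching idea follows, assuming the quantity worth caching is the sum of log(x) over the tail, which the MLE needs for every candidate xmin. The class `cache_sketch`, its field names, and the use of the continuous-case MLE are assumptions for illustration, not poweRlaw internals.

```r
library(methods)

cache_sketch <- setRefClass("cache_sketch",
  fields = list(x = "numeric", xmin = "numeric", tail_log_sums = "numeric",
                n_tail = "numeric", sum_log_tail = "numeric"),
  methods = list(
    initialize = function(dat = numeric(0), ...) {
      # Sort once and cache sum(log(x[i:n])) for every position i
      x <<- sort(dat)
      tail_log_sums <<- rev(cumsum(log(rev(x))))
      callSuper(...)
    },
    setXmin = function(value) {
      # Look up the cached tail sum instead of recomputing it
      stopifnot(value <= max(x))
      i <- which(x >= value)[1]
      xmin <<- value
      n_tail <<- length(x) - i + 1
      sum_log_tail <<- tail_log_sums[i]
      invisible(.self)
    }
  )
)

set.seed(1)
dat <- rexp(1000) + 1
m <- cache_sketch$new(dat = dat)
m$setXmin(2)   # modifies m in place; no copy of the data is made

# Continuous-case MLE computed from the cached quantities
alpha_hat <- 1 + m$n_tail / (m$sum_log_tail - m$n_tail * log(m$xmin))
```

Each further call to `setXmin` costs one lookup into `tail_log_sums` rather than a fresh pass over the data, which is the kind of saving that matters when many candidate values of xmin are scanned.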
  • 24. Comments
      Reference classes are still new
        Code has now broken twice with R upgrades
        roxygen2 and reference classes didn't play well together
        Very few questions on Stack Overflow about reference classes
      Structuring code and files
      Care has to be taken when using them with parallel computing
  • 25. References
      Clauset, A., Shalizi, C. R., and Newman, M. E. J. (2009). Power-law distributions in empirical data. SIAM Review 51(4): 661–703.
      poweRlaw package: https://github.com/csgillespie/poweRlaw