SlideShare a Scribd company logo
The problem with Sturges’ rule for constructing
histograms
Rob J Hyndman1
5 July 1995
Abstract Most statistical packages use Sturges’ rule (or an extension of it) for selecting
the number of classes when constructing a histogram. Sturges’ rule is also widely
recommended in introductory statistics textbooks. It is known that Sturges’ rule leads to
oversmoothed histograms, but Sturges’ derivation of his rule has never been questioned.
In this note, I point out that the argument leading to Sturges’ rule is wrong.
Key words: Histogram, bin-width selection, statistical computer packages.
Herbert Sturges (1926) considered an idealised frequency histogram with k bins where the
ith bin count is the binomial coefficient k−1
i , i = 0, 1, . . . , k −1. As k increases, this ideal
frequency histogram approaches the shape of a normal density. The total sample size is
n =
k−1
i=0
k − 1
i
= (1 + 1)k−1
= 2k−1
by the binomial expansion. So the number of classes to choose when constructing a
histogram from normal data is
k = 1 + log2 n.
This is Sturges’ rule. If the data are not normal, additional classes may be required;
Doane (1976) proposed a modification to Sturges’ rule to allow for skewness. It is known
that Sturges’ rule and Doane’s rule lead to oversmoothed histograms, especially for large
samples (see, for example, Scott, 1992). But the derivation of Sturges’ rule does not
seem to have been questioned, and has been reproduced by several authors including both
Doane (1976) and Scott (1992).
The problem with the above argument is that any multiple of the binomial coefficients
could have been used and the frequency histogram would still approach the normal density.
So, any number of classes could be obtained, depending on the multiple used.
In the derivation above, Sturges is implicitly using a binomial distribution to approxi-
mate an underlying normal distribution. That is, if the underlying density is normal and
is appropriately scaled to have mean (k − 1)/2 and variance (k − 1)/4, then it can be
approximated by a binomial distribution B(k − 1, 0.5) for large k. Then the histogram
classes correspond to the discrete values of the binomial distribution. The expected class
frequencies are n k−1
i (0.5)k−1. So if n = 2k−1 we obtain Sturges’ idealised histogram.
But any other value of n is equally valid and does not lead to Sturges’ rule.
1
Rob Hyndman is Senior Lecturer, Department of Econometrics and Business Statistics, Monash Uni-
versity, Clayton, Victoria, Australia, 3168.
1
The problem with Sturges’ rule for constructing histograms 2
Alternative rules for constructing histograms include Scott’s (1979) rule for the class width:
h = 3.5sn−1/3
and Freedman and Diaconis’s (1981) rule for the class width:
h = 2(IQ)n−1/3
where s is the sample standard deviation and IQ is the sample interquartile range. Either
of these are just as simple to use as Sturges’ rule, but are well-founded in statistical
theory. Wand (1995) has recently extended Scott’s rule to give consistent estimators of
the underlying density.
Sturges’ rule has probably survived as long as it has because, for moderate n (less than
200), it gives similar results to the alternative rules above (see Scott, 1992, p.56), and so
produces reasonable histograms. However, it does not work for large n.
The problem with Sturges’ rule is that its derivation is wrong. It is a rule which no longer
deserves a place in statistics textbooks or as a default in statistical computer packages.
References
Doane, D.P. (1976) Aesthetic frequency classification. American Statistician, 30, 181–
183.
Freedman, D. and Diaconis, P. (1981) On this histogram as a density estimator: L2
theory. Zeit. Wahr. ver. Geb., 57, 453–476.
Scott, D.W. (1979) On optimal and data-based histograms. Biometrika, 66, 605–610.
Scott, D.W. (1992) Multivariate density estimation: theory, practice, and visualization,
John Wiley & Sons: New York.
Sturges, H. (1926) The choice of a class-interval. J. Amer. Statist. Assoc., 21, 65–66.
Wand, M.P. (1995) Data-based choice of histogram bin-width. Technical report, Aus-
tralian Graduate School of Management, University of NSW.

More Related Content

What's hot

Stat 3203 -cluster and multi-stage sampling
Stat 3203 -cluster and multi-stage samplingStat 3203 -cluster and multi-stage sampling
Stat 3203 -cluster and multi-stage sampling
Khulna University
 
Chapter 4 estimation of parameters
Chapter 4   estimation of parametersChapter 4   estimation of parameters
Chapter 4 estimation of parameters
Antonio F. Balatar Jr.
 
Mohr circle mohr circle anaysisand application
Mohr circle     mohr circle anaysisand applicationMohr circle     mohr circle anaysisand application
Mohr circle mohr circle anaysisand application
Shivam Jain
 
Systematic sampling
Systematic samplingSystematic sampling
Systematic sampling
CikBeyla Nabiella
 
z-test
z-testz-test
Binomial distribution
Binomial distributionBinomial distribution
Binomial distribution
numanmunir01
 
Spearman rank correlation coefficient
Spearman rank correlation coefficientSpearman rank correlation coefficient
Spearman rank correlation coefficient
Karishma Chaudhary
 
Inferential statictis ready go
Inferential statictis ready goInferential statictis ready go
Inferential statictis ready go
Mmedsc Hahm
 
Statistical analysis and interpretation
Statistical analysis and interpretationStatistical analysis and interpretation
Statistical analysis and interpretation
Dave Marcial
 
Stat 1163 -statistics in environmental science
Stat 1163 -statistics in environmental scienceStat 1163 -statistics in environmental science
Stat 1163 -statistics in environmental science
Khulna University
 
Stratified sampling
Stratified samplingStratified sampling
Stratified sampling
suncil0071
 
Basic Descriptive Statistics
Basic Descriptive StatisticsBasic Descriptive Statistics
Basic Descriptive Statistics
sikojp
 
Chapter 2 part3-Least-Squares Regression
Chapter 2 part3-Least-Squares RegressionChapter 2 part3-Least-Squares Regression
Chapter 2 part3-Least-Squares Regression
nszakir
 
Sampling distribution
Sampling distributionSampling distribution
Sampling distribution
swarna dey
 
Introduction to basic concepts of statistics
Introduction to basic concepts of statisticsIntroduction to basic concepts of statistics
Introduction to basic concepts of statistics
Afsheen Affan
 
Skewness and Kurtosis
Skewness and KurtosisSkewness and Kurtosis
Skewness and Kurtosis
Rohan Nagpal
 
Environmental statistics
Environmental statisticsEnvironmental statistics
Environmental statistics
Georgios Ath. Kounis
 
Statistics "Descriptive & Inferential"
Statistics "Descriptive & Inferential"Statistics "Descriptive & Inferential"
Statistics "Descriptive & Inferential"
Dalia El-Shafei
 
Correlations
CorrelationsCorrelations
Introduction to Generalized Linear Models
Introduction to Generalized Linear ModelsIntroduction to Generalized Linear Models
Introduction to Generalized Linear Models
richardchandler
 

What's hot (20)

Stat 3203 -cluster and multi-stage sampling
Stat 3203 -cluster and multi-stage samplingStat 3203 -cluster and multi-stage sampling
Stat 3203 -cluster and multi-stage sampling
 
Chapter 4 estimation of parameters
Chapter 4   estimation of parametersChapter 4   estimation of parameters
Chapter 4 estimation of parameters
 
Mohr circle mohr circle anaysisand application
Mohr circle     mohr circle anaysisand applicationMohr circle     mohr circle anaysisand application
Mohr circle mohr circle anaysisand application
 
Systematic sampling
Systematic samplingSystematic sampling
Systematic sampling
 
z-test
z-testz-test
z-test
 
Binomial distribution
Binomial distributionBinomial distribution
Binomial distribution
 
Spearman rank correlation coefficient
Spearman rank correlation coefficientSpearman rank correlation coefficient
Spearman rank correlation coefficient
 
Inferential statictis ready go
Inferential statictis ready goInferential statictis ready go
Inferential statictis ready go
 
Statistical analysis and interpretation
Statistical analysis and interpretationStatistical analysis and interpretation
Statistical analysis and interpretation
 
Stat 1163 -statistics in environmental science
Stat 1163 -statistics in environmental scienceStat 1163 -statistics in environmental science
Stat 1163 -statistics in environmental science
 
Stratified sampling
Stratified samplingStratified sampling
Stratified sampling
 
Basic Descriptive Statistics
Basic Descriptive StatisticsBasic Descriptive Statistics
Basic Descriptive Statistics
 
Chapter 2 part3-Least-Squares Regression
Chapter 2 part3-Least-Squares RegressionChapter 2 part3-Least-Squares Regression
Chapter 2 part3-Least-Squares Regression
 
Sampling distribution
Sampling distributionSampling distribution
Sampling distribution
 
Introduction to basic concepts of statistics
Introduction to basic concepts of statisticsIntroduction to basic concepts of statistics
Introduction to basic concepts of statistics
 
Skewness and Kurtosis
Skewness and KurtosisSkewness and Kurtosis
Skewness and Kurtosis
 
Environmental statistics
Environmental statisticsEnvironmental statistics
Environmental statistics
 
Statistics "Descriptive & Inferential"
Statistics "Descriptive & Inferential"Statistics "Descriptive & Inferential"
Statistics "Descriptive & Inferential"
 
Correlations
CorrelationsCorrelations
Correlations
 
Introduction to Generalized Linear Models
Introduction to Generalized Linear ModelsIntroduction to Generalized Linear Models
Introduction to Generalized Linear Models
 

Similar to Sturges

Selection of bin width and bin number in histograms
Selection of bin width and bin number in histogramsSelection of bin width and bin number in histograms
Selection of bin width and bin number in histograms
Kushal Kumar Dey
 
Ch 4 Cluster Analysis.pdf
Ch 4 Cluster Analysis.pdfCh 4 Cluster Analysis.pdf
Ch 4 Cluster Analysis.pdf
YaseenRashid4
 
ATT00001ATT00002ATT00003ATT00004ATT00005CARD.docx
ATT00001ATT00002ATT00003ATT00004ATT00005CARD.docxATT00001ATT00002ATT00003ATT00004ATT00005CARD.docx
ATT00001ATT00002ATT00003ATT00004ATT00005CARD.docx
ikirkton
 
Dong Zhang's project
Dong Zhang's projectDong Zhang's project
Dong Zhang's project
Dong Zhang
 
kcde
kcdekcde
F0742328
F0742328F0742328
F0742328
IOSR Journals
 
A brief history of generative models for power law and lognormal ...
A brief history of generative models for power law and lognormal ...A brief history of generative models for power law and lognormal ...
A brief history of generative models for power law and lognormal ...
sugeladi
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
Avijit Famous
 
BA 3 Statistics.ppt
BA 3 Statistics.pptBA 3 Statistics.ppt
BA 3 Statistics.ppt
NimeshGandhey
 
1979 Optimal diffusions in a random environment
1979 Optimal diffusions in a random environment1979 Optimal diffusions in a random environment
1979 Optimal diffusions in a random environment
Bob Marcus
 
Normal Distribution slides(1).pptx
Normal Distribution slides(1).pptxNormal Distribution slides(1).pptx
Normal Distribution slides(1).pptx
KinzaSuhail2
 
Categorical data stata cox 2004
Categorical data stata    cox 2004Categorical data stata    cox 2004
Categorical data stata cox 2004
Lilian Carvalho
 
A Regularization Approach to the Reconciliation of Constrained Data Sets
A Regularization Approach to the Reconciliation of Constrained Data SetsA Regularization Approach to the Reconciliation of Constrained Data Sets
A Regularization Approach to the Reconciliation of Constrained Data Sets
Alkis Vazacopoulos
 
Statistics and Probability- NORMAL DISTRIBUTION.pptx
Statistics and Probability- NORMAL DISTRIBUTION.pptxStatistics and Probability- NORMAL DISTRIBUTION.pptx
Statistics and Probability- NORMAL DISTRIBUTION.pptx
AshellyTugdang
 
WeightedWEIGHTED METRIC MULTIDIMENSIONAL SCALING
WeightedWEIGHTED METRIC MULTIDIMENSIONAL SCALINGWeightedWEIGHTED METRIC MULTIDIMENSIONAL SCALING
WeightedWEIGHTED METRIC MULTIDIMENSIONAL SCALING
dessybudiyanti
 
Slide-TIF311-DM-10-11.ppt
Slide-TIF311-DM-10-11.pptSlide-TIF311-DM-10-11.ppt
Slide-TIF311-DM-10-11.ppt
SandinoBerutu1
 
Slide-TIF311-DM-10-11.ppt
Slide-TIF311-DM-10-11.pptSlide-TIF311-DM-10-11.ppt
Slide-TIF311-DM-10-11.ppt
ImXaib
 
Inverse reliability copulas
Inverse reliability copulasInverse reliability copulas
Inverse reliability copulas
Anshul Goyal
 
Cluster Analysis
Cluster Analysis Cluster Analysis
Cluster Analysis
Baivab Nag
 
Dissertation Paper
Dissertation PaperDissertation Paper
Dissertation Paper
David (Sudip) Nath
 

Similar to Sturges (20)

Selection of bin width and bin number in histograms
Selection of bin width and bin number in histogramsSelection of bin width and bin number in histograms
Selection of bin width and bin number in histograms
 
Ch 4 Cluster Analysis.pdf
Ch 4 Cluster Analysis.pdfCh 4 Cluster Analysis.pdf
Ch 4 Cluster Analysis.pdf
 
ATT00001ATT00002ATT00003ATT00004ATT00005CARD.docx
ATT00001ATT00002ATT00003ATT00004ATT00005CARD.docxATT00001ATT00002ATT00003ATT00004ATT00005CARD.docx
ATT00001ATT00002ATT00003ATT00004ATT00005CARD.docx
 
Dong Zhang's project
Dong Zhang's projectDong Zhang's project
Dong Zhang's project
 
kcde
kcdekcde
kcde
 
F0742328
F0742328F0742328
F0742328
 
A brief history of generative models for power law and lognormal ...
A brief history of generative models for power law and lognormal ...A brief history of generative models for power law and lognormal ...
A brief history of generative models for power law and lognormal ...
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
 
BA 3 Statistics.ppt
BA 3 Statistics.pptBA 3 Statistics.ppt
BA 3 Statistics.ppt
 
1979 Optimal diffusions in a random environment
1979 Optimal diffusions in a random environment1979 Optimal diffusions in a random environment
1979 Optimal diffusions in a random environment
 
Normal Distribution slides(1).pptx
Normal Distribution slides(1).pptxNormal Distribution slides(1).pptx
Normal Distribution slides(1).pptx
 
Categorical data stata cox 2004
Categorical data stata    cox 2004Categorical data stata    cox 2004
Categorical data stata cox 2004
 
A Regularization Approach to the Reconciliation of Constrained Data Sets
A Regularization Approach to the Reconciliation of Constrained Data SetsA Regularization Approach to the Reconciliation of Constrained Data Sets
A Regularization Approach to the Reconciliation of Constrained Data Sets
 
Statistics and Probability- NORMAL DISTRIBUTION.pptx
Statistics and Probability- NORMAL DISTRIBUTION.pptxStatistics and Probability- NORMAL DISTRIBUTION.pptx
Statistics and Probability- NORMAL DISTRIBUTION.pptx
 
WeightedWEIGHTED METRIC MULTIDIMENSIONAL SCALING
WeightedWEIGHTED METRIC MULTIDIMENSIONAL SCALINGWeightedWEIGHTED METRIC MULTIDIMENSIONAL SCALING
WeightedWEIGHTED METRIC MULTIDIMENSIONAL SCALING
 
Slide-TIF311-DM-10-11.ppt
Slide-TIF311-DM-10-11.pptSlide-TIF311-DM-10-11.ppt
Slide-TIF311-DM-10-11.ppt
 
Slide-TIF311-DM-10-11.ppt
Slide-TIF311-DM-10-11.pptSlide-TIF311-DM-10-11.ppt
Slide-TIF311-DM-10-11.ppt
 
Inverse reliability copulas
Inverse reliability copulasInverse reliability copulas
Inverse reliability copulas
 
Cluster Analysis
Cluster Analysis Cluster Analysis
Cluster Analysis
 
Dissertation Paper
Dissertation PaperDissertation Paper
Dissertation Paper
 

Recently uploaded

Satta Matka Dpboss Kalyan Matka Results Kalyan Chart
Satta Matka Dpboss Kalyan Matka Results Kalyan ChartSatta Matka Dpboss Kalyan Matka Results Kalyan Chart
Satta Matka Dpboss Kalyan Matka Results Kalyan Chart
Satta Matka Dpboss Kalyan Matka Results
 
Truck Loading Conveyor Manufacturers Chennai
Truck Loading Conveyor Manufacturers ChennaiTruck Loading Conveyor Manufacturers Chennai
Truck Loading Conveyor Manufacturers Chennai
ConveyorSystem
 
japanese language course in delhi near me
japanese language course in delhi near mejapanese language course in delhi near me
japanese language course in delhi near me
heyfairies7
 
Cover Story - China's Investment Leader - Dr. Alyce SU
Cover Story - China's Investment Leader - Dr. Alyce SUCover Story - China's Investment Leader - Dr. Alyce SU
Cover Story - China's Investment Leader - Dr. Alyce SU
msthrill
 
Satta Matka Dpboss Kalyan Matka Results Kalyan Chart
Satta Matka Dpboss Kalyan Matka Results Kalyan ChartSatta Matka Dpboss Kalyan Matka Results Kalyan Chart
Satta Matka Dpboss Kalyan Matka Results Kalyan Chart
Satta Matka Dpboss Kalyan Matka Results
 
MECE (Mutually Exclusive, Collectively Exhaustive) Principle
MECE (Mutually Exclusive, Collectively Exhaustive) PrincipleMECE (Mutually Exclusive, Collectively Exhaustive) Principle
MECE (Mutually Exclusive, Collectively Exhaustive) Principle
Operational Excellence Consulting
 
Satta Matka Dpboss Kalyan Matka Results Kalyan Chart
Satta Matka Dpboss Kalyan Matka Results Kalyan ChartSatta Matka Dpboss Kalyan Matka Results Kalyan Chart
Satta Matka Dpboss Kalyan Matka Results Kalyan Chart
Satta Matka Dpboss Kalyan Matka Results
 
Revolutionizing Surface Protection Xlcoatings Nano Based Solutions
Revolutionizing Surface Protection Xlcoatings Nano Based SolutionsRevolutionizing Surface Protection Xlcoatings Nano Based Solutions
Revolutionizing Surface Protection Xlcoatings Nano Based Solutions
Excel coatings
 
High-Quality IPTV Monthly Subscription for $15
High-Quality IPTV Monthly Subscription for $15High-Quality IPTV Monthly Subscription for $15
High-Quality IPTV Monthly Subscription for $15
advik4387
 
Kirill Klip GEM Royalty TNR Gold Copper Presentation
Kirill Klip GEM Royalty TNR Gold Copper PresentationKirill Klip GEM Royalty TNR Gold Copper Presentation
Kirill Klip GEM Royalty TNR Gold Copper Presentation
Kirill Klip
 
Easy Earnings Through Refer and Earn Apps Without KYC.pptx
Easy Earnings Through Refer and Earn Apps Without KYC.pptxEasy Earnings Through Refer and Earn Apps Without KYC.pptx
Easy Earnings Through Refer and Earn Apps Without KYC.pptx
Fx Lotus
 
Kanban Coaching Exchange with Dave White - Example SDR Report
Kanban Coaching Exchange with Dave White - Example SDR ReportKanban Coaching Exchange with Dave White - Example SDR Report
Kanban Coaching Exchange with Dave White - Example SDR Report
Helen Meek
 
Adani Group's Active Interest In Increasing Its Presence in the Cement Manufa...
Adani Group's Active Interest In Increasing Its Presence in the Cement Manufa...Adani Group's Active Interest In Increasing Its Presence in the Cement Manufa...
Adani Group's Active Interest In Increasing Its Presence in the Cement Manufa...
Adani case
 
Lukas Rycek - GreenChemForCE - project structure.pptx
Lukas Rycek - GreenChemForCE - project structure.pptxLukas Rycek - GreenChemForCE - project structure.pptx
Lukas Rycek - GreenChemForCE - project structure.pptx
pavelborek
 
Enhancing Adoption of AI in Agri-food: Introduction
Enhancing Adoption of AI in Agri-food: IntroductionEnhancing Adoption of AI in Agri-food: Introduction
Enhancing Adoption of AI in Agri-food: Introduction
Cor Verdouw
 
欧洲杯投注-欧洲杯投注外围盘口-欧洲杯投注盘口app|【​网址​🎉ac22.net🎉​】
欧洲杯投注-欧洲杯投注外围盘口-欧洲杯投注盘口app|【​网址​🎉ac22.net🎉​】欧洲杯投注-欧洲杯投注外围盘口-欧洲杯投注盘口app|【​网址​🎉ac22.net🎉​】
欧洲杯投注-欧洲杯投注外围盘口-欧洲杯投注盘口app|【​网址​🎉ac22.net🎉​】
concepsionchomo153
 
Sustainable Logistics for Cost Reduction_ IPLTech Electric's Eco-Friendly Tra...
Sustainable Logistics for Cost Reduction_ IPLTech Electric's Eco-Friendly Tra...Sustainable Logistics for Cost Reduction_ IPLTech Electric's Eco-Friendly Tra...
Sustainable Logistics for Cost Reduction_ IPLTech Electric's Eco-Friendly Tra...
IPLTech Electric
 
Pro Tips for Effortless Contract Management
Pro Tips for Effortless Contract ManagementPro Tips for Effortless Contract Management
Pro Tips for Effortless Contract Management
Eternity Paralegal Services
 
Dpboss Satta Matta Matka Kalyan Chart Indian Matka
Dpboss Satta Matta Matka Kalyan Chart Indian MatkaDpboss Satta Matta Matka Kalyan Chart Indian Matka
Dpboss Satta Matta Matka Kalyan Chart Indian Matka
Dpboss Matka
 
TriStar Gold Corporate Presentation - June 2024
TriStar Gold Corporate Presentation - June 2024TriStar Gold Corporate Presentation - June 2024
TriStar Gold Corporate Presentation - June 2024
Adnet Communications
 

Recently uploaded (20)

Satta Matka Dpboss Kalyan Matka Results Kalyan Chart
Satta Matka Dpboss Kalyan Matka Results Kalyan ChartSatta Matka Dpboss Kalyan Matka Results Kalyan Chart
Satta Matka Dpboss Kalyan Matka Results Kalyan Chart
 
Truck Loading Conveyor Manufacturers Chennai
Truck Loading Conveyor Manufacturers ChennaiTruck Loading Conveyor Manufacturers Chennai
Truck Loading Conveyor Manufacturers Chennai
 
japanese language course in delhi near me
japanese language course in delhi near mejapanese language course in delhi near me
japanese language course in delhi near me
 
Cover Story - China's Investment Leader - Dr. Alyce SU
Cover Story - China's Investment Leader - Dr. Alyce SUCover Story - China's Investment Leader - Dr. Alyce SU
Cover Story - China's Investment Leader - Dr. Alyce SU
 
Satta Matka Dpboss Kalyan Matka Results Kalyan Chart
Satta Matka Dpboss Kalyan Matka Results Kalyan ChartSatta Matka Dpboss Kalyan Matka Results Kalyan Chart
Satta Matka Dpboss Kalyan Matka Results Kalyan Chart
 
MECE (Mutually Exclusive, Collectively Exhaustive) Principle
MECE (Mutually Exclusive, Collectively Exhaustive) PrincipleMECE (Mutually Exclusive, Collectively Exhaustive) Principle
MECE (Mutually Exclusive, Collectively Exhaustive) Principle
 
Satta Matka Dpboss Kalyan Matka Results Kalyan Chart
Satta Matka Dpboss Kalyan Matka Results Kalyan ChartSatta Matka Dpboss Kalyan Matka Results Kalyan Chart
Satta Matka Dpboss Kalyan Matka Results Kalyan Chart
 
Revolutionizing Surface Protection Xlcoatings Nano Based Solutions
Revolutionizing Surface Protection Xlcoatings Nano Based SolutionsRevolutionizing Surface Protection Xlcoatings Nano Based Solutions
Revolutionizing Surface Protection Xlcoatings Nano Based Solutions
 
High-Quality IPTV Monthly Subscription for $15
High-Quality IPTV Monthly Subscription for $15High-Quality IPTV Monthly Subscription for $15
High-Quality IPTV Monthly Subscription for $15
 
Kirill Klip GEM Royalty TNR Gold Copper Presentation
Kirill Klip GEM Royalty TNR Gold Copper PresentationKirill Klip GEM Royalty TNR Gold Copper Presentation
Kirill Klip GEM Royalty TNR Gold Copper Presentation
 
Easy Earnings Through Refer and Earn Apps Without KYC.pptx
Easy Earnings Through Refer and Earn Apps Without KYC.pptxEasy Earnings Through Refer and Earn Apps Without KYC.pptx
Easy Earnings Through Refer and Earn Apps Without KYC.pptx
 
Kanban Coaching Exchange with Dave White - Example SDR Report
Kanban Coaching Exchange with Dave White - Example SDR ReportKanban Coaching Exchange with Dave White - Example SDR Report
Kanban Coaching Exchange with Dave White - Example SDR Report
 
Adani Group's Active Interest In Increasing Its Presence in the Cement Manufa...
Adani Group's Active Interest In Increasing Its Presence in the Cement Manufa...Adani Group's Active Interest In Increasing Its Presence in the Cement Manufa...
Adani Group's Active Interest In Increasing Its Presence in the Cement Manufa...
 
Lukas Rycek - GreenChemForCE - project structure.pptx
Lukas Rycek - GreenChemForCE - project structure.pptxLukas Rycek - GreenChemForCE - project structure.pptx
Lukas Rycek - GreenChemForCE - project structure.pptx
 
Enhancing Adoption of AI in Agri-food: Introduction
Enhancing Adoption of AI in Agri-food: IntroductionEnhancing Adoption of AI in Agri-food: Introduction
Enhancing Adoption of AI in Agri-food: Introduction
 
欧洲杯投注-欧洲杯投注外围盘口-欧洲杯投注盘口app|【​网址​🎉ac22.net🎉​】
欧洲杯投注-欧洲杯投注外围盘口-欧洲杯投注盘口app|【​网址​🎉ac22.net🎉​】欧洲杯投注-欧洲杯投注外围盘口-欧洲杯投注盘口app|【​网址​🎉ac22.net🎉​】
欧洲杯投注-欧洲杯投注外围盘口-欧洲杯投注盘口app|【​网址​🎉ac22.net🎉​】
 
Sustainable Logistics for Cost Reduction_ IPLTech Electric's Eco-Friendly Tra...
Sustainable Logistics for Cost Reduction_ IPLTech Electric's Eco-Friendly Tra...Sustainable Logistics for Cost Reduction_ IPLTech Electric's Eco-Friendly Tra...
Sustainable Logistics for Cost Reduction_ IPLTech Electric's Eco-Friendly Tra...
 
Pro Tips for Effortless Contract Management
Pro Tips for Effortless Contract ManagementPro Tips for Effortless Contract Management
Pro Tips for Effortless Contract Management
 
Dpboss Satta Matta Matka Kalyan Chart Indian Matka
Dpboss Satta Matta Matka Kalyan Chart Indian MatkaDpboss Satta Matta Matka Kalyan Chart Indian Matka
Dpboss Satta Matta Matka Kalyan Chart Indian Matka
 
TriStar Gold Corporate Presentation - June 2024
TriStar Gold Corporate Presentation - June 2024TriStar Gold Corporate Presentation - June 2024
TriStar Gold Corporate Presentation - June 2024
 

Sturges

  • 1. The problem with Sturges’ rule for constructing histograms Rob J Hyndman1 5 July 1995 Abstract Most statistical packages use Sturges’ rule (or an extension of it) for selecting the number of classes when constructing a histogram. Sturges’ rule is also widely recommended in introductory statistics textbooks. It is known that Sturges’ rule leads to oversmoothed histograms, but Sturges’ derivation of his rule has never been questioned. In this note, I point out that the argument leading to Sturges’ rule is wrong. Key words: Histogram, bin-width selection, statistical computer packages. Herbert Sturges (1926) considered an idealised frequency histogram with k bins where the ith bin count is the binomial coefficient k−1 i , i = 0, 1, . . . , k −1. As k increases, this ideal frequency histogram approaches the shape of a normal density. The total sample size is n = k−1 i=0 k − 1 i = (1 + 1)k−1 = 2k−1 by the binomial expansion. So the number of classes to choose when constructing a histogram from normal data is k = 1 + log2 n. This is Sturges’ rule. If the data are not normal, additional classes may be required; Doane (1976) proposed a modification to Sturges’ rule to allow for skewness. It is known that Sturges’ rule and Doane’s rule lead to oversmoothed histograms, especially for large samples (see, for example, Scott, 1992). But the derivation of Sturges’ rule does not seem to have been questioned, and has been reproduced by several authors including both Doane (1976) and Scott (1992). The problem with the above argument is that any multiple of the binomial coefficients could have been used and the frequency histogram would still approach the normal density. So, any number of classes could be obtained, depending on the multiple used. In the derivation above, Sturges is implicitly using a binomial distribution to approxi- mate an underlying normal distribution. That is, if the underlying density is normal and is appropriately scaled to have mean (k − 1)/2 and variance (k − 1)/4, then it can be approximated by a binomial distribution B(k − 1, 0.5) for large k. Then the histogram classes correspond to the discrete values of the binomial distribution. The expected class frequencies are n k−1 i (0.5)k−1. So if n = 2k−1 we obtain Sturges’ idealised histogram. But any other value of n is equally valid and does not lead to Sturges’ rule. 1 Rob Hyndman is Senior Lecturer, Department of Econometrics and Business Statistics, Monash Uni- versity, Clayton, Victoria, Australia, 3168. 1
  • 2. The problem with Sturges’ rule for constructing histograms 2 Alternative rules for constructing histograms include Scott’s (1979) rule for the class width: h = 3.5sn−1/3 and Freedman and Diaconis’s (1981) rule for the class width: h = 2(IQ)n−1/3 where s is the sample standard deviation and IQ is the sample interquartile range. Either of these are just as simple to use as Sturges’ rule, but are well-founded in statistical theory. Wand (1995) has recently extended Scott’s rule to give consistent estimators of the underlying density. Sturges’ rule has probably survived as long as it has because, for moderate n (less than 200), it gives similar results to the alternative rules above (see Scott, 1992, p.56), and so produces reasonable histograms. However, it does not work for large n. The problem with Sturges’ rule is that its derivation is wrong. It is a rule which no longer deserves a place in statistics textbooks or as a default in statistical computer packages. References Doane, D.P. (1976) Aesthetic frequency classification. American Statistician, 30, 181– 183. Freedman, D. and Diaconis, P. (1981) On this histogram as a density estimator: L2 theory. Zeit. Wahr. ver. Geb., 57, 453–476. Scott, D.W. (1979) On optimal and data-based histograms. Biometrika, 66, 605–610. Scott, D.W. (1992) Multivariate density estimation: theory, practice, and visualization, John Wiley & Sons: New York. Sturges, H. (1926) The choice of a class-interval. J. Amer. Statist. Assoc., 21, 65–66. Wand, M.P. (1995) Data-based choice of histogram bin-width. Technical report, Aus- tralian Graduate School of Management, University of NSW.