SlideShare a Scribd company logo
1 of 2
Download to read offline
The problem with Sturges’ rule for constructing
histograms
Rob J Hyndman1
5 July 1995
Abstract Most statistical packages use Sturges’ rule (or an extension of it) for selecting
the number of classes when constructing a histogram. Sturges’ rule is also widely
recommended in introductory statistics textbooks. It is known that Sturges’ rule leads to
oversmoothed histograms, but Sturges’ derivation of his rule has never been questioned.
In this note, I point out that the argument leading to Sturges’ rule is wrong.
Key words: Histogram, bin-width selection, statistical computer packages.
Herbert Sturges (1926) considered an idealised frequency histogram with k bins where the
ith bin count is the binomial coefficient k−1
i , i = 0, 1, . . . , k −1. As k increases, this ideal
frequency histogram approaches the shape of a normal density. The total sample size is
n =
k−1
i=0
k − 1
i
= (1 + 1)k−1
= 2k−1
by the binomial expansion. So the number of classes to choose when constructing a
histogram from normal data is
k = 1 + log2 n.
This is Sturges’ rule. If the data are not normal, additional classes may be required;
Doane (1976) proposed a modification to Sturges’ rule to allow for skewness. It is known
that Sturges’ rule and Doane’s rule lead to oversmoothed histograms, especially for large
samples (see, for example, Scott, 1992). But the derivation of Sturges’ rule does not
seem to have been questioned, and has been reproduced by several authors including both
Doane (1976) and Scott (1992).
The problem with the above argument is that any multiple of the binomial coefficients
could have been used and the frequency histogram would still approach the normal density.
So, any number of classes could be obtained, depending on the multiple used.
In the derivation above, Sturges is implicitly using a binomial distribution to approxi-
mate an underlying normal distribution. That is, if the underlying density is normal and
is appropriately scaled to have mean (k − 1)/2 and variance (k − 1)/4, then it can be
approximated by a binomial distribution B(k − 1, 0.5) for large k. Then the histogram
classes correspond to the discrete values of the binomial distribution. The expected class
frequencies are n k−1
i (0.5)k−1. So if n = 2k−1 we obtain Sturges’ idealised histogram.
But any other value of n is equally valid and does not lead to Sturges’ rule.
1
Rob Hyndman is Senior Lecturer, Department of Econometrics and Business Statistics, Monash Uni-
versity, Clayton, Victoria, Australia, 3168.
1
The problem with Sturges’ rule for constructing histograms 2
Alternative rules for constructing histograms include Scott’s (1979) rule for the class width:
h = 3.5sn−1/3
and Freedman and Diaconis’s (1981) rule for the class width:
h = 2(IQ)n−1/3
where s is the sample standard deviation and IQ is the sample interquartile range. Either
of these are just as simple to use as Sturges’ rule, but are well-founded in statistical
theory. Wand (1995) has recently extended Scott’s rule to give consistent estimators of
the underlying density.
Sturges’ rule has probably survived as long as it has because, for moderate n (less than
200), it gives similar results to the alternative rules above (see Scott, 1992, p.56), and so
produces reasonable histograms. However, it does not work for large n.
The problem with Sturges’ rule is that its derivation is wrong. It is a rule which no longer
deserves a place in statistics textbooks or as a default in statistical computer packages.
References
Doane, D.P. (1976) Aesthetic frequency classification. American Statistician, 30, 181–
183.
Freedman, D. and Diaconis, P. (1981) On this histogram as a density estimator: L2
theory. Zeit. Wahr. ver. Geb., 57, 453–476.
Scott, D.W. (1979) On optimal and data-based histograms. Biometrika, 66, 605–610.
Scott, D.W. (1992) Multivariate density estimation: theory, practice, and visualization,
John Wiley & Sons: New York.
Sturges, H. (1926) The choice of a class-interval. J. Amer. Statist. Assoc., 21, 65–66.
Wand, M.P. (1995) Data-based choice of histogram bin-width. Technical report, Aus-
tralian Graduate School of Management, University of NSW.

More Related Content

What's hot

What's hot (20)

Light in the oceans powerpoint
Light in the oceans powerpointLight in the oceans powerpoint
Light in the oceans powerpoint
 
El nino - The arrival of Warm water
El nino - The arrival of Warm waterEl nino - The arrival of Warm water
El nino - The arrival of Warm water
 
Elements of sea water (aem 215)
Elements of sea water (aem 215)Elements of sea water (aem 215)
Elements of sea water (aem 215)
 
Coastal resource management presentation
Coastal resource management presentationCoastal resource management presentation
Coastal resource management presentation
 
Layers of earth
Layers of earthLayers of earth
Layers of earth
 
Ocean Waves.ppt
Ocean Waves.pptOcean Waves.ppt
Ocean Waves.ppt
 
Types of lake
Types of lakeTypes of lake
Types of lake
 
Ocean Chemistry
Ocean ChemistryOcean Chemistry
Ocean Chemistry
 
Presentation El Nino
Presentation  El  NinoPresentation  El  Nino
Presentation El Nino
 
food chain and ecological energetics
food chain and ecological energeticsfood chain and ecological energetics
food chain and ecological energetics
 
The Lithosphere
The LithosphereThe Lithosphere
The Lithosphere
 
Ocean waves lecture
Ocean waves lectureOcean waves lecture
Ocean waves lecture
 
Improved Marine Meteorological Services
Improved Marine Meteorological Services Improved Marine Meteorological Services
Improved Marine Meteorological Services
 
ECOSYSTEM FINAL.pptx
ECOSYSTEM FINAL.pptxECOSYSTEM FINAL.pptx
ECOSYSTEM FINAL.pptx
 
Lanina
LaninaLanina
Lanina
 
Marine ecosystem
Marine ecosystemMarine ecosystem
Marine ecosystem
 
coastal wetland and its thereats
coastal wetland and its thereatscoastal wetland and its thereats
coastal wetland and its thereats
 
Ocean currents
Ocean currentsOcean currents
Ocean currents
 
Niche .pptx
Niche .pptxNiche .pptx
Niche .pptx
 
Wetland protection
Wetland protectionWetland protection
Wetland protection
 

Similar to Sturges

Selection of bin width and bin number in histograms
Selection of bin width and bin number in histogramsSelection of bin width and bin number in histograms
Selection of bin width and bin number in histogramsKushal Kumar Dey
 
Ch 4 Cluster Analysis.pdf
Ch 4 Cluster Analysis.pdfCh 4 Cluster Analysis.pdf
Ch 4 Cluster Analysis.pdfYaseenRashid4
 
ATT00001ATT00002ATT00003ATT00004ATT00005CARD.docx
ATT00001ATT00002ATT00003ATT00004ATT00005CARD.docxATT00001ATT00002ATT00003ATT00004ATT00005CARD.docx
ATT00001ATT00002ATT00003ATT00004ATT00005CARD.docxikirkton
 
Dong Zhang's project
Dong Zhang's projectDong Zhang's project
Dong Zhang's projectDong Zhang
 
A brief history of generative models for power law and lognormal ...
A brief history of generative models for power law and lognormal ...A brief history of generative models for power law and lognormal ...
A brief history of generative models for power law and lognormal ...sugeladi
 
1979 Optimal diffusions in a random environment
1979 Optimal diffusions in a random environment1979 Optimal diffusions in a random environment
1979 Optimal diffusions in a random environmentBob Marcus
 
Normal Distribution slides(1).pptx
Normal Distribution slides(1).pptxNormal Distribution slides(1).pptx
Normal Distribution slides(1).pptxKinzaSuhail2
 
Categorical data stata cox 2004
Categorical data stata    cox 2004Categorical data stata    cox 2004
Categorical data stata cox 2004Lilian Carvalho
 
A Regularization Approach to the Reconciliation of Constrained Data Sets
A Regularization Approach to the Reconciliation of Constrained Data SetsA Regularization Approach to the Reconciliation of Constrained Data Sets
A Regularization Approach to the Reconciliation of Constrained Data SetsAlkis Vazacopoulos
 
Statistics and Probability- NORMAL DISTRIBUTION.pptx
Statistics and Probability- NORMAL DISTRIBUTION.pptxStatistics and Probability- NORMAL DISTRIBUTION.pptx
Statistics and Probability- NORMAL DISTRIBUTION.pptxAshellyTugdang
 
WeightedWEIGHTED METRIC MULTIDIMENSIONAL SCALING
WeightedWEIGHTED METRIC MULTIDIMENSIONAL SCALINGWeightedWEIGHTED METRIC MULTIDIMENSIONAL SCALING
WeightedWEIGHTED METRIC MULTIDIMENSIONAL SCALINGdessybudiyanti
 
Slide-TIF311-DM-10-11.ppt
Slide-TIF311-DM-10-11.pptSlide-TIF311-DM-10-11.ppt
Slide-TIF311-DM-10-11.pptSandinoBerutu1
 
Slide-TIF311-DM-10-11.ppt
Slide-TIF311-DM-10-11.pptSlide-TIF311-DM-10-11.ppt
Slide-TIF311-DM-10-11.pptImXaib
 
Inverse reliability copulas
Inverse reliability copulasInverse reliability copulas
Inverse reliability copulasAnshul Goyal
 
Cluster Analysis
Cluster Analysis Cluster Analysis
Cluster Analysis Baivab Nag
 

Similar to Sturges (20)

Selection of bin width and bin number in histograms
Selection of bin width and bin number in histogramsSelection of bin width and bin number in histograms
Selection of bin width and bin number in histograms
 
Ch 4 Cluster Analysis.pdf
Ch 4 Cluster Analysis.pdfCh 4 Cluster Analysis.pdf
Ch 4 Cluster Analysis.pdf
 
ATT00001ATT00002ATT00003ATT00004ATT00005CARD.docx
ATT00001ATT00002ATT00003ATT00004ATT00005CARD.docxATT00001ATT00002ATT00003ATT00004ATT00005CARD.docx
ATT00001ATT00002ATT00003ATT00004ATT00005CARD.docx
 
Dong Zhang's project
Dong Zhang's projectDong Zhang's project
Dong Zhang's project
 
kcde
kcdekcde
kcde
 
F0742328
F0742328F0742328
F0742328
 
A brief history of generative models for power law and lognormal ...
A brief history of generative models for power law and lognormal ...A brief history of generative models for power law and lognormal ...
A brief history of generative models for power law and lognormal ...
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
 
BA 3 Statistics.ppt
BA 3 Statistics.pptBA 3 Statistics.ppt
BA 3 Statistics.ppt
 
1979 Optimal diffusions in a random environment
1979 Optimal diffusions in a random environment1979 Optimal diffusions in a random environment
1979 Optimal diffusions in a random environment
 
Normal Distribution slides(1).pptx
Normal Distribution slides(1).pptxNormal Distribution slides(1).pptx
Normal Distribution slides(1).pptx
 
Categorical data stata cox 2004
Categorical data stata    cox 2004Categorical data stata    cox 2004
Categorical data stata cox 2004
 
A Regularization Approach to the Reconciliation of Constrained Data Sets
A Regularization Approach to the Reconciliation of Constrained Data SetsA Regularization Approach to the Reconciliation of Constrained Data Sets
A Regularization Approach to the Reconciliation of Constrained Data Sets
 
Statistics and Probability- NORMAL DISTRIBUTION.pptx
Statistics and Probability- NORMAL DISTRIBUTION.pptxStatistics and Probability- NORMAL DISTRIBUTION.pptx
Statistics and Probability- NORMAL DISTRIBUTION.pptx
 
WeightedWEIGHTED METRIC MULTIDIMENSIONAL SCALING
WeightedWEIGHTED METRIC MULTIDIMENSIONAL SCALINGWeightedWEIGHTED METRIC MULTIDIMENSIONAL SCALING
WeightedWEIGHTED METRIC MULTIDIMENSIONAL SCALING
 
Slide-TIF311-DM-10-11.ppt
Slide-TIF311-DM-10-11.pptSlide-TIF311-DM-10-11.ppt
Slide-TIF311-DM-10-11.ppt
 
Slide-TIF311-DM-10-11.ppt
Slide-TIF311-DM-10-11.pptSlide-TIF311-DM-10-11.ppt
Slide-TIF311-DM-10-11.ppt
 
Inverse reliability copulas
Inverse reliability copulasInverse reliability copulas
Inverse reliability copulas
 
Cluster Analysis
Cluster Analysis Cluster Analysis
Cluster Analysis
 
Dissertation Paper
Dissertation PaperDissertation Paper
Dissertation Paper
 

Recently uploaded

Mysore Call Girls 8617370543 WhatsApp Number 24x7 Best Services
Mysore Call Girls 8617370543 WhatsApp Number 24x7 Best ServicesMysore Call Girls 8617370543 WhatsApp Number 24x7 Best Services
Mysore Call Girls 8617370543 WhatsApp Number 24x7 Best ServicesDipal Arora
 
9599632723 Top Call Girls in Delhi at your Door Step Available 24x7 Delhi
9599632723 Top Call Girls in Delhi at your Door Step Available 24x7 Delhi9599632723 Top Call Girls in Delhi at your Door Step Available 24x7 Delhi
9599632723 Top Call Girls in Delhi at your Door Step Available 24x7 DelhiCall Girls in Delhi
 
Value Proposition canvas- Customer needs and pains
Value Proposition canvas- Customer needs and painsValue Proposition canvas- Customer needs and pains
Value Proposition canvas- Customer needs and painsP&CO
 
Russian Faridabad Call Girls(Badarpur) : ☎ 8168257667, @4999
Russian Faridabad Call Girls(Badarpur) : ☎ 8168257667, @4999Russian Faridabad Call Girls(Badarpur) : ☎ 8168257667, @4999
Russian Faridabad Call Girls(Badarpur) : ☎ 8168257667, @4999Tina Ji
 
Sales & Marketing Alignment: How to Synergize for Success
Sales & Marketing Alignment: How to Synergize for SuccessSales & Marketing Alignment: How to Synergize for Success
Sales & Marketing Alignment: How to Synergize for SuccessAggregage
 
Call Girls Navi Mumbai Just Call 9907093804 Top Class Call Girl Service Avail...
Call Girls Navi Mumbai Just Call 9907093804 Top Class Call Girl Service Avail...Call Girls Navi Mumbai Just Call 9907093804 Top Class Call Girl Service Avail...
Call Girls Navi Mumbai Just Call 9907093804 Top Class Call Girl Service Avail...Dipal Arora
 
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...Dave Litwiller
 
Monte Carlo simulation : Simulation using MCSM
Monte Carlo simulation : Simulation using MCSMMonte Carlo simulation : Simulation using MCSM
Monte Carlo simulation : Simulation using MCSMRavindra Nath Shukla
 
Monthly Social Media Update April 2024 pptx.pptx
Monthly Social Media Update April 2024 pptx.pptxMonthly Social Media Update April 2024 pptx.pptx
Monthly Social Media Update April 2024 pptx.pptxAndy Lambert
 
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...anilsa9823
 
Cash Payment 9602870969 Escort Service in Udaipur Call Girls
Cash Payment 9602870969 Escort Service in Udaipur Call GirlsCash Payment 9602870969 Escort Service in Udaipur Call Girls
Cash Payment 9602870969 Escort Service in Udaipur Call GirlsApsara Of India
 
Ensure the security of your HCL environment by applying the Zero Trust princi...
Ensure the security of your HCL environment by applying the Zero Trust princi...Ensure the security of your HCL environment by applying the Zero Trust princi...
Ensure the security of your HCL environment by applying the Zero Trust princi...Roland Driesen
 
It will be International Nurses' Day on 12 May
It will be International Nurses' Day on 12 MayIt will be International Nurses' Day on 12 May
It will be International Nurses' Day on 12 MayNZSG
 
VIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service Jamshedpur
VIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service JamshedpurVIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service Jamshedpur
VIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service JamshedpurSuhani Kapoor
 
Tech Startup Growth Hacking 101 - Basics on Growth Marketing
Tech Startup Growth Hacking 101  - Basics on Growth MarketingTech Startup Growth Hacking 101  - Basics on Growth Marketing
Tech Startup Growth Hacking 101 - Basics on Growth MarketingShawn Pang
 
Call Girls In Panjim North Goa 9971646499 Genuine Service
Call Girls In Panjim North Goa 9971646499 Genuine ServiceCall Girls In Panjim North Goa 9971646499 Genuine Service
Call Girls In Panjim North Goa 9971646499 Genuine Serviceritikaroy0888
 
Boost the utilization of your HCL environment by reevaluating use cases and f...
Boost the utilization of your HCL environment by reevaluating use cases and f...Boost the utilization of your HCL environment by reevaluating use cases and f...
Boost the utilization of your HCL environment by reevaluating use cases and f...Roland Driesen
 
The Coffee Bean & Tea Leaf(CBTL), Business strategy case study
The Coffee Bean & Tea Leaf(CBTL), Business strategy case studyThe Coffee Bean & Tea Leaf(CBTL), Business strategy case study
The Coffee Bean & Tea Leaf(CBTL), Business strategy case studyEthan lee
 
BEST ✨ Call Girls In Indirapuram Ghaziabad ✔️ 9871031762 ✔️ Escorts Service...
BEST ✨ Call Girls In  Indirapuram Ghaziabad  ✔️ 9871031762 ✔️ Escorts Service...BEST ✨ Call Girls In  Indirapuram Ghaziabad  ✔️ 9871031762 ✔️ Escorts Service...
BEST ✨ Call Girls In Indirapuram Ghaziabad ✔️ 9871031762 ✔️ Escorts Service...noida100girls
 
Call Girls in Gomti Nagar - 7388211116 - With room Service
Call Girls in Gomti Nagar - 7388211116  - With room ServiceCall Girls in Gomti Nagar - 7388211116  - With room Service
Call Girls in Gomti Nagar - 7388211116 - With room Servicediscovermytutordmt
 

Recently uploaded (20)

Mysore Call Girls 8617370543 WhatsApp Number 24x7 Best Services
Mysore Call Girls 8617370543 WhatsApp Number 24x7 Best ServicesMysore Call Girls 8617370543 WhatsApp Number 24x7 Best Services
Mysore Call Girls 8617370543 WhatsApp Number 24x7 Best Services
 
9599632723 Top Call Girls in Delhi at your Door Step Available 24x7 Delhi
9599632723 Top Call Girls in Delhi at your Door Step Available 24x7 Delhi9599632723 Top Call Girls in Delhi at your Door Step Available 24x7 Delhi
9599632723 Top Call Girls in Delhi at your Door Step Available 24x7 Delhi
 
Value Proposition canvas- Customer needs and pains
Value Proposition canvas- Customer needs and painsValue Proposition canvas- Customer needs and pains
Value Proposition canvas- Customer needs and pains
 
Russian Faridabad Call Girls(Badarpur) : ☎ 8168257667, @4999
Russian Faridabad Call Girls(Badarpur) : ☎ 8168257667, @4999Russian Faridabad Call Girls(Badarpur) : ☎ 8168257667, @4999
Russian Faridabad Call Girls(Badarpur) : ☎ 8168257667, @4999
 
Sales & Marketing Alignment: How to Synergize for Success
Sales & Marketing Alignment: How to Synergize for SuccessSales & Marketing Alignment: How to Synergize for Success
Sales & Marketing Alignment: How to Synergize for Success
 
Call Girls Navi Mumbai Just Call 9907093804 Top Class Call Girl Service Avail...
Call Girls Navi Mumbai Just Call 9907093804 Top Class Call Girl Service Avail...Call Girls Navi Mumbai Just Call 9907093804 Top Class Call Girl Service Avail...
Call Girls Navi Mumbai Just Call 9907093804 Top Class Call Girl Service Avail...
 
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...
 
Monte Carlo simulation : Simulation using MCSM
Monte Carlo simulation : Simulation using MCSMMonte Carlo simulation : Simulation using MCSM
Monte Carlo simulation : Simulation using MCSM
 
Monthly Social Media Update April 2024 pptx.pptx
Monthly Social Media Update April 2024 pptx.pptxMonthly Social Media Update April 2024 pptx.pptx
Monthly Social Media Update April 2024 pptx.pptx
 
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
 
Cash Payment 9602870969 Escort Service in Udaipur Call Girls
Cash Payment 9602870969 Escort Service in Udaipur Call GirlsCash Payment 9602870969 Escort Service in Udaipur Call Girls
Cash Payment 9602870969 Escort Service in Udaipur Call Girls
 
Ensure the security of your HCL environment by applying the Zero Trust princi...
Ensure the security of your HCL environment by applying the Zero Trust princi...Ensure the security of your HCL environment by applying the Zero Trust princi...
Ensure the security of your HCL environment by applying the Zero Trust princi...
 
It will be International Nurses' Day on 12 May
It will be International Nurses' Day on 12 MayIt will be International Nurses' Day on 12 May
It will be International Nurses' Day on 12 May
 
VIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service Jamshedpur
VIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service JamshedpurVIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service Jamshedpur
VIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service Jamshedpur
 
Tech Startup Growth Hacking 101 - Basics on Growth Marketing
Tech Startup Growth Hacking 101  - Basics on Growth MarketingTech Startup Growth Hacking 101  - Basics on Growth Marketing
Tech Startup Growth Hacking 101 - Basics on Growth Marketing
 
Call Girls In Panjim North Goa 9971646499 Genuine Service
Call Girls In Panjim North Goa 9971646499 Genuine ServiceCall Girls In Panjim North Goa 9971646499 Genuine Service
Call Girls In Panjim North Goa 9971646499 Genuine Service
 
Boost the utilization of your HCL environment by reevaluating use cases and f...
Boost the utilization of your HCL environment by reevaluating use cases and f...Boost the utilization of your HCL environment by reevaluating use cases and f...
Boost the utilization of your HCL environment by reevaluating use cases and f...
 
The Coffee Bean & Tea Leaf(CBTL), Business strategy case study
The Coffee Bean & Tea Leaf(CBTL), Business strategy case studyThe Coffee Bean & Tea Leaf(CBTL), Business strategy case study
The Coffee Bean & Tea Leaf(CBTL), Business strategy case study
 
BEST ✨ Call Girls In Indirapuram Ghaziabad ✔️ 9871031762 ✔️ Escorts Service...
BEST ✨ Call Girls In  Indirapuram Ghaziabad  ✔️ 9871031762 ✔️ Escorts Service...BEST ✨ Call Girls In  Indirapuram Ghaziabad  ✔️ 9871031762 ✔️ Escorts Service...
BEST ✨ Call Girls In Indirapuram Ghaziabad ✔️ 9871031762 ✔️ Escorts Service...
 
Call Girls in Gomti Nagar - 7388211116 - With room Service
Call Girls in Gomti Nagar - 7388211116  - With room ServiceCall Girls in Gomti Nagar - 7388211116  - With room Service
Call Girls in Gomti Nagar - 7388211116 - With room Service
 

Sturges

  • 1. The problem with Sturges’ rule for constructing histograms Rob J Hyndman1 5 July 1995 Abstract Most statistical packages use Sturges’ rule (or an extension of it) for selecting the number of classes when constructing a histogram. Sturges’ rule is also widely recommended in introductory statistics textbooks. It is known that Sturges’ rule leads to oversmoothed histograms, but Sturges’ derivation of his rule has never been questioned. In this note, I point out that the argument leading to Sturges’ rule is wrong. Key words: Histogram, bin-width selection, statistical computer packages. Herbert Sturges (1926) considered an idealised frequency histogram with k bins where the ith bin count is the binomial coefficient k−1 i , i = 0, 1, . . . , k −1. As k increases, this ideal frequency histogram approaches the shape of a normal density. The total sample size is n = k−1 i=0 k − 1 i = (1 + 1)k−1 = 2k−1 by the binomial expansion. So the number of classes to choose when constructing a histogram from normal data is k = 1 + log2 n. This is Sturges’ rule. If the data are not normal, additional classes may be required; Doane (1976) proposed a modification to Sturges’ rule to allow for skewness. It is known that Sturges’ rule and Doane’s rule lead to oversmoothed histograms, especially for large samples (see, for example, Scott, 1992). But the derivation of Sturges’ rule does not seem to have been questioned, and has been reproduced by several authors including both Doane (1976) and Scott (1992). The problem with the above argument is that any multiple of the binomial coefficients could have been used and the frequency histogram would still approach the normal density. So, any number of classes could be obtained, depending on the multiple used. In the derivation above, Sturges is implicitly using a binomial distribution to approxi- mate an underlying normal distribution. That is, if the underlying density is normal and is appropriately scaled to have mean (k − 1)/2 and variance (k − 1)/4, then it can be approximated by a binomial distribution B(k − 1, 0.5) for large k. Then the histogram classes correspond to the discrete values of the binomial distribution. The expected class frequencies are n k−1 i (0.5)k−1. So if n = 2k−1 we obtain Sturges’ idealised histogram. But any other value of n is equally valid and does not lead to Sturges’ rule. 1 Rob Hyndman is Senior Lecturer, Department of Econometrics and Business Statistics, Monash Uni- versity, Clayton, Victoria, Australia, 3168. 1
  • 2. The problem with Sturges’ rule for constructing histograms 2 Alternative rules for constructing histograms include Scott’s (1979) rule for the class width: h = 3.5sn−1/3 and Freedman and Diaconis’s (1981) rule for the class width: h = 2(IQ)n−1/3 where s is the sample standard deviation and IQ is the sample interquartile range. Either of these are just as simple to use as Sturges’ rule, but are well-founded in statistical theory. Wand (1995) has recently extended Scott’s rule to give consistent estimators of the underlying density. Sturges’ rule has probably survived as long as it has because, for moderate n (less than 200), it gives similar results to the alternative rules above (see Scott, 1992, p.56), and so produces reasonable histograms. However, it does not work for large n. The problem with Sturges’ rule is that its derivation is wrong. It is a rule which no longer deserves a place in statistics textbooks or as a default in statistical computer packages. References Doane, D.P. (1976) Aesthetic frequency classification. American Statistician, 30, 181– 183. Freedman, D. and Diaconis, P. (1981) On this histogram as a density estimator: L2 theory. Zeit. Wahr. ver. Geb., 57, 453–476. Scott, D.W. (1979) On optimal and data-based histograms. Biometrika, 66, 605–610. Scott, D.W. (1992) Multivariate density estimation: theory, practice, and visualization, John Wiley & Sons: New York. Sturges, H. (1926) The choice of a class-interval. J. Amer. Statist. Assoc., 21, 65–66. Wand, M.P. (1995) Data-based choice of histogram bin-width. Technical report, Aus- tralian Graduate School of Management, University of NSW.