2. TABLE OF CONTENTS
1. Aggregation
2. Information
3. Likelihood
4. Intercomparison
5. Regression
6. Design
7. Residual
3. AGGREGATION
■ Aggregation allows one to gain information by discarding information,
namely, the individuality of the observations.
■ It is an act of “creative destruction”, to borrow a phrase coined to
describe a form of economic reorganization.
■ It must be done on principle, discarding information that does not aid the
ultimate scientific goal.
■ In some statistical problems, a notion of a sufficient statistic – a data
summary that loses no relevant information – can be employed; yet, in the
era of big data, that is frequently not feasible, or the assumptions behind
it are untenable.
■ It has taken many forms, from simple addition to modern algorithms.
■ However, the principle of forming summaries by selectively discarding
information has remained the same.
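The sufficient-statistic idea can be made concrete with a small sketch (an illustrative Python example, not part of the original slides; the data values are invented): for a normal model with known variance, inference about the mean depends on the data only through their sum, so two different datasets with the same size and sum support identical conclusions about μ.

```python
import math

def normal_loglik(data, mu, sigma=1.0):
    """Log-likelihood of data under a Normal(mu, sigma) model with known sigma."""
    n = len(data)
    return (-n / 2 * math.log(2 * math.pi * sigma**2)
            - sum((x - mu) ** 2 for x in data) / (2 * sigma**2))

# Two different datasets with the same size and the same sum (invented numbers).
a = [1.0, 2.0, 3.0]
b = [0.5, 2.0, 3.5]

# The log-likelihood *difference* between two candidate values of mu depends
# on the data only through the sum: it is identical for both datasets, so
# aggregating to the sum (or mean) loses nothing relevant about mu.
diff_a = normal_loglik(a, 2.0) - normal_loglik(a, 0.0)
diff_b = normal_loglik(b, 2.0) - normal_loglik(b, 0.0)
print(round(diff_a, 6), round(diff_b, 6))  # → 6.0 6.0
```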
4. INFORMATION
■ Information challenges the importance of “big data” by noting that
observations are not all equally important: the amount of information in a
data set is often proportional only to the square root of the number of
observations, not to the absolute number.
■ It takes on a different meaning in statistics from that found in signal
processing.
■ It works with aggregation to help recognize how the diminishing rate of
gain in information relates to the anticipated use.
■ In signal processing, information can be passed at a constant rate
indefinitely; in statistics, the rate of accumulation of information from
the signal must decline.
■ The measurement of information in data – the comparative information in
different data sets and the rate of increase in information with an increase
in data – has become a pillar of statistics.
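The square-root rule can be seen in a quick simulation (an illustrative Python sketch with invented sample sizes, not from the original slides): the standard error of a sample mean shrinks like 1/√n, so quadrupling the data only halves the uncertainty.

```python
import random
import statistics

random.seed(0)

def se_of_mean(n, trials=2000):
    """Empirical standard error of the mean of n standard-normal draws."""
    means = [statistics.fmean(random.gauss(0, 1) for _ in range(n))
             for _ in range(trials)]
    return statistics.stdev(means)

# Quadrupling the data (25 -> 100 observations) roughly halves the
# standard error: 1/sqrt(n) scaling, not 1/n.
se25 = se_of_mean(25)    # roughly 0.2
se100 = se_of_mean(100)  # roughly 0.1
print(round(se25, 3), round(se100, 3))
```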
5. LIKELIHOOD
■ Likelihood, the use of probability to calibrate inferences and to give a
scale to the measurement of uncertainty, is both particularly dangerous
and valuable.
■ It requires great care and understanding to be employed positively, but
the rewards are great as well.
■ When used poorly, the summary can mislead; but that should not blind
us to the much greater propensity of verbal summaries to mislead when
they lack even an attempt at calibration against a generally accepted
standard.
■ Likelihood not only can provide a measure of our conclusions; it can be
a guide to the analysis, to the method of aggregation, and to the rate at
which information accrues.
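As a toy illustration of likelihood as a calibrated scale (an illustrative Python sketch, not from the original slides; the coin-flip numbers are invented), a log-likelihood ratio puts the support for one hypothesis over another on a common, comparable footing:

```python
import math

def binom_loglik(p, heads, flips):
    """Log-likelihood of a coin with heads probability p
    (the binomial coefficient, constant in p, is dropped)."""
    return heads * math.log(p) + (flips - heads) * math.log(1 - p)

# 60 heads in 100 flips: how strongly do the data support p = 0.6 over p = 0.5?
heads, flips = 60, 100
llr = binom_loglik(0.6, heads, flips) - binom_loglik(0.5, heads, flips)
print(round(llr, 3))  # → 2.014, a calibrated measure of relative support
```

The same scale applies to any pair of hypotheses and any data set, which is exactly what a verbal summary of “strong evidence” cannot offer.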
6. INTERCOMPARISON
■ Intercomparison is the principle that statistical comparisons do not
need to be made with respect to an external standard.
■ It gives us internal standards and a way to judge effects and their
significance purely within the data at hand.
■ It is a two-edged sword, for the lack of appeal to an outside standard
can remove our conclusion from all relevance.
■ When employed with care and intelligence, it, together with the
designs of the sixth pillar, can yield an almost magical route to
understanding in some high-dimensional settings.
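A permutation test is a simple instance of intercomparison: the observed difference is judged against reshufflings of the data themselves, with no external reference distribution. The sketch below (illustrative Python with made-up measurements, not from the original slides) shows the idea:

```python
import random

random.seed(1)

def perm_test(x, y, trials=5000):
    """Permutation test: compare the observed mean difference against
    reshuffles of the pooled data -- an internal standard, no external one."""
    observed = sum(x) / len(x) - sum(y) / len(y)
    pooled = x + y
    count = 0
    for _ in range(trials):
        random.shuffle(pooled)
        xs, ys = pooled[:len(x)], pooled[len(x):]
        if abs(sum(xs) / len(xs) - sum(ys) / len(ys)) >= abs(observed):
            count += 1
    return count / trials

# Invented measurements for two groups.
x = [5.1, 4.9, 6.2, 5.8, 5.5]
y = [4.2, 4.0, 4.4, 4.1, 4.6]
p = perm_test(x, y)
print(p)  # a small p-value: the difference stands out within the data themselves
```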
7. REGRESSION
■ Regression is both a paradox (tall parents on average produce shorter
children; tall children on average have shorter parents) and the basis of
inference, including Bayesian inference and causal reasoning.
■ It is a principle of relativity for statistical analysis: the idea that
asking a question from different standpoints leads not only to unexpected
insight but also to a new way of framing analyses.
■ The idea is not simply the construction of multivariate objects; it is
the way they are used, taken apart, and reassembled in a genuine
multivariate analysis.
■ Fully developed in the twentieth century, the methods that flowed
from this understanding empowered explorations to higher altitudes and
even to higher dimensions.
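The parent–child paradox dissolves once both regressions are computed, as this simulation sketches (illustrative Python with an invented correlation of 0.5, not Galton's actual data): the least-squares slope is below 1 in both directions.

```python
import math
import random
import statistics

random.seed(2)

# Simulated standardized "parent" and "child" heights with correlation 0.5
# (illustrative numbers only).
n, rho = 10_000, 0.5
parent = [random.gauss(0, 1) for _ in range(n)]
child = [rho * p + math.sqrt(1 - rho**2) * random.gauss(0, 1) for p in parent]

def slope(x, y):
    """Least-squares slope of y regressed on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / statistics.pvariance(x)

# Both slopes are about 0.5 -- less than 1 in *both* directions:
# tall parents have children nearer the mean, and tall children
# have parents nearer the mean.
child_on_parent = slope(parent, child)
parent_on_child = slope(child, parent)
print(round(child_on_parent, 2), round(parent_on_child, 2))
```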
8. DESIGN
■ Design involves great subtleties: the ability to structure models for the
exploration of high-dimensional data with the simultaneous
consideration of multiple factors, and the creation, through
randomization, of a basis for inference that relies only minimally upon
modeling.
■ A pillar of statistics is the design of experiments, and—by extension—all
data collection and planning that leads to good data.
■ Its power comes, for example, from recognizing the gains to be had
from a combinatorial approach combined with rigorous randomization.
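As a minimal sketch of these two ideas together (illustrative Python; the factor names and plot counts are hypothetical), a 2×2 factorial layout crossed with random assignment lets one experiment address several questions at once, while randomization underwrites the inference:

```python
import itertools
import random

random.seed(3)

# A 2x2 factorial layout: every combination of two factors appears, so both
# main effects and their interaction are estimable in a single experiment.
treatments = list(itertools.product(["low N", "high N"], ["dry", "irrigated"]))

# Randomly assign 8 field plots, two replicates per treatment; the
# randomization itself, not a model, justifies the resulting comparison.
plots = list(range(8))
random.shuffle(plots)
assignment = {plot: treatments[i % 4] for i, plot in enumerate(plots)}
for plot, treatment in sorted(assignment.items()):
    print(plot, treatment)
```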
9. RESIDUAL
■ The seventh pillar, Residual, is the notion that a complicated phenomenon
can be simplified by subtracting the effect of known causes, leaving a
residual phenomenon that can be explained more easily.
■ It is the logic of comparison of complex models as a route to the exploration
of high-dimensional data, and the use of the same scientific logic in
graphical analysis.
■ It is here that we face the greatest need today, confronting the
questions for which, after all these centuries, we remain least able to
provide broad answers.
■ This pillar enables you to examine shortcomings of a model by examining
the difference between the observed data and the model. If the residuals
have a systematic pattern, you can revise your model to explain the data
better.
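The diagnostic use of residuals can be sketched in a few lines (illustrative Python with invented data, not from the original slides): fitting a straight line to data generated by a curved law leaves a tell-tale systematic pattern in the residuals.

```python
import statistics

# Data generated by a curved law; fit a straight line and inspect what is left.
xs = list(range(10))
ys = [0.5 * x * x for x in xs]  # the "unknown" phenomenon (invented)

# Ordinary least-squares fit of a line to (xs, ys).
mx, my = statistics.fmean(xs), statistics.fmean(ys)
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
intercept = my - slope * mx

residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
print([round(r, 2) for r in residuals])
# → [6.0, 2.0, -1.0, -3.0, -4.0, -4.0, -3.0, -1.0, 2.0, 6.0]
# The U-shape (positive, then negative, then positive) is a systematic
# pattern: the linear model misses the curvature and should be revised.
```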