I will explore the extent to which concerns about ‘scientism’ – an unwarranted obeisance to scientific over other methods of inquiry – are intertwined with issues in the foundations of the statistical data analyses on which (social, behavioral, medical and physical) science increasingly depends. The rise of big data, machine learning, and high-powered computer programs has extended statistical methods and modeling across the landscape of science, law and evidence-based policy, but this extension has been accompanied by enormous hand-wringing over the reliability, replicability, and valid use of statistics. Legitimate criticisms of scientism often stem from insufficiently self-critical uses of statistical methodology, broadly construed – i.e., from what might be called “statisticism” – particularly when those methods are applied to matters of controversy.
Severe Testing: The Key to Error Correction
D. G. Mayo's slides for her presentation given March 17, 2017 at the Boston Colloquium for Philosophy of Science, Alfred I. Taub forum: "Understanding Reproducibility & Error Correction in Science"
Controversy Over the Significance Test Controversy
Deborah Mayo (Professor of Philosophy, Virginia Tech, Blacksburg, Virginia) in PSA 2016 Symposium: Philosophy of Statistics in the Age of Big Data and Replication Crises
Abstract: Mounting failures of replication in the social and biological sciences give a practical spin to statistical foundations in the form of the question: How can we attain reliability when methods make illicit cherry-picking and significance seeking so easy? Researchers, professional societies, and journals are increasingly getting serious about methodological reforms to restore scientific integrity – some are quite welcome (e.g., pre-registration), while others are quite radical. The American Statistical Association convened members from differing tribes of frequentists, Bayesians, and likelihoodists to codify misuses of P-values. Largely overlooked are the philosophical presuppositions of both criticisms and proposed reforms. Paradoxically, alternative replacement methods may enable rather than reveal illicit inferences due to cherry-picking, multiple testing, and other biasing selection effects. Crowd-sourced reproducibility research in psychology is helping to change the reward structure but has its own shortcomings. Focusing on purely statistical considerations, it tends to overlook problems with artificial experiments. Without a better understanding of the philosophical issues, we can expect the latest reforms to fail.
Surrogate Science: How Fisher, Neyman-Pearson, and Bayes Were Transformed int...
Gerd Gigerenzer (Director of Max Planck Institute for Human Development, Berlin, Germany) in the PSA 2016 Symposium: Philosophy of Statistics in the Age of Big Data and Replication Crises
Replication Crises and the Statistics Wars: Hidden Controversies
D. Mayo presentation at the X-Phil conference on "Reproducibility and Replicability in Psychology and Experimental Philosophy", University College London (June 14, 2018)
Mayo: Evidence as Passing a Severe Test (How it Gets You Beyond the Statistic...
D. G. Mayo April 28, 2021 presentation to the CUNY Graduate Center Philosophy Colloquium "Evidence as Passing a Severe Test (How it Gets You Beyond the Statistics Wars)"
D. G. Mayo: The Replication Crises and its Constructive Role in the Philosoph...
The constructive role of replication crises teaches a lot about: (1) non-fallacious uses of statistical tests, (2) the rationale for the role of probability in tests, and (3) how to reformulate tests.
D. G. Mayo (Virginia Tech) "Error Statistical Control: Forfeit at your Peril" presented May 23 at the session on "The Philosophy of Statistics: Bayesianism, Frequentism and the Nature of Inference," 2015 APS Annual Convention in NYC.
Deborah G. Mayo: Is the Philosophy of Probabilism an Obstacle to Statistical Fraud Busting?
Presentation slides for: Revisiting the Foundations of Statistics in the Era of Big Data: Scaling Up to Meet the Challenge[*] at the Boston Colloquium for Philosophy of Science (Feb 21, 2014).
D. G. Mayo: Your data-driven claims must still be probed severely
In the session "Philosophy of Science and the New Paradigm of Data-Driven Science" at the American Statistical Association Conference on Statistical Learning and Data Science/Nonparametric Statistics
Today we’ll try to cover a number of things:
1. Learning philosophy/philosophy of statistics
2. Situating the broad issues within philosophy of science
3. Little bit of logic
4. Probability and random variables
Probing with Severity: Beyond Bayesian Probabilism and Frequentist Performance
Slides from Rutgers Seminar talk by Deborah G Mayo
December 3, 2014
Rutgers, Department of Statistics and Biostatistics
Abstract: Getting beyond today’s most pressing controversies revolving around statistical methods, I argue, requires scrutinizing their underlying statistical philosophies. Two main philosophies about the roles of probability in statistical inference are probabilism and performance (in the long-run). The first assumes that we need a method of assigning probabilities to hypotheses; the second assumes that the main function of statistical method is to control long-run performance. I offer a third goal: controlling and evaluating the probativeness of methods. An inductive inference, in this conception, takes the form of inferring hypotheses to the extent that they have been well or severely tested. A report of poorly tested claims must also be part of an adequate inference. I develop a statistical philosophy in which error probabilities of methods may be used to evaluate and control the stringency or severity of tests. I then show how the “severe testing” philosophy clarifies and avoids familiar criticisms and abuses of significance tests and cognate methods (e.g., confidence intervals). Severity may be threatened in three main ways: fallacies of statistical tests, unwarranted links between statistical and substantive claims, and violations of model assumptions.
D. Mayo: Philosophy of Statistics & the Replication Crisis in Science
D. Mayo discusses various disputes – notably the replication crisis in science – in the context of her just-released book: Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars.
D. Mayo: Replication Research Under an Error Statistical Philosophy
D. Mayo (Virginia Tech) slides from her talk June 3 at the "Preconference Workshop on Replication in the Sciences" at the 2015 Society for Philosophy and Psychology meeting.
D. Mayo: Philosophical Interventions in the Statistics Wars
ABSTRACT: While statistics has a long history of passionate philosophical controversy, the last decade especially cries out for philosophical illumination. Misuses of statistics, Big Data dredging, and P-hacking make it easy to find statistically significant, but spurious, effects. This obstructs a test's ability to control the probability of erroneously inferring effects–i.e., to control error probabilities. Disagreements about statistical reforms reflect philosophical disagreements about the nature of statistical inference–including whether error probability control even matters! I describe my interventions in statistics in relation to three events. (1) In 2016 the American Statistical Association (ASA) met to craft principles for avoiding misinterpreting P-values. (2) In 2017, a "megateam" (including philosophers of science) proposed "redefining statistical significance," replacing the common threshold of P ≤ .05 with P ≤ .005. (3) In 2019, an editorial in the main ASA journal called for abandoning all P-value thresholds, and even the words "significant/significance".
A word on each. (1) Invited to be a "philosophical observer" at their meeting, I found the major issues were conceptual. P-values measure how incompatible data are with what is expected under a hypothesis that there is no genuine effect: the smaller the P-value, the greater the indication of incompatibility. The ASA list of familiar misinterpretations–P-values are not posterior probabilities, statistical significance is not substantive importance, a lack of evidence against a hypothesis need not be evidence for it–should not, I argue, be the basis for replacing tests with methods less able to assess and control erroneous interpretations of data (Mayo 2016, 2019). (2) The "redefine statistical significance" movement appraises P-values from the perspective of a very different quantity: a comparative Bayes Factor. Failing to recognize how contrasting approaches measure different things, disputants often talk past each other (Mayo 2018). (3) To ban P-value thresholds, even to distinguish terrible from warranted evidence, I say, is a mistake (2019). It will not eradicate P-hacking, but it will make it harder to hold P-hackers accountable. A 2020 ASA Task Force on significance testing has just been announced. (I would like to think my blog errorstatistics.com helped.)
To enter the fray between rival statistical approaches, it helps to have a principle applicable to all accounts. There is poor evidence for a claim if little if anything has been done to find it flawed, even if the claim is flawed. This forms a basic requirement for evidence I call the severity requirement. A claim passes with severity only if it is subjected to and passes a test that probably would have found it flawed, if it were. It stems from Popper, though he never adequately cashed it out. A variant is the frequentist principle of evidence developed with Sir David Cox (Mayo and Cox 2006).
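To make the "incompatibility" reading of P-values above concrete, here is a minimal Python sketch; the Normal model, known σ, sample size, and effect size are illustrative assumptions of mine, not material from the talk.

```python
import numpy as np
from scipy import stats

# Toy data: n measurements, H0: mu = 0 ("no genuine effect"), known sigma = 1.
# All numbers here are illustrative assumptions, not from the talk.
rng = np.random.default_rng(0)
x = rng.normal(loc=0.3, scale=1.0, size=25)

n, sigma, mu0 = len(x), 1.0, 0.0
z = (x.mean() - mu0) / (sigma / np.sqrt(n))   # standardized distance from what H0 expects
p_value = 1 - stats.norm.cdf(z)               # one-sided P-value: P(Z >= z_observed | H0)

print(f"mean = {x.mean():.3f}, z = {z:.2f}, one-sided P = {p_value:.4f}")
# A small P-value reports an improbably large incompatibility with H0;
# it is not the posterior probability that H0 is true.
```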
"The Statistical Replication Crisis: Paradoxes and Scapegoats”
D. G. Mayo LSE Popper talk, May 10, 2016.
Abstract: Mounting failures of replication in the social and biological sciences give a practical spin to statistical foundations in the form of the question: How can we attain reliability when Big Data methods make illicit cherry-picking and significance seeking so easy? Researchers, professional societies, and journals are increasingly getting serious about methodological reforms to restore scientific integrity – some are quite welcome (e.g., preregistration), while others are quite radical. Recently, the American Statistical Association convened members from differing tribes of frequentists, Bayesians, and likelihoodists to codify misuses of P-values. Largely overlooked are the philosophical presuppositions of both criticisms and proposed reforms. Paradoxically, alternative replacement methods may enable rather than reveal illicit inferences due to cherry-picking, multiple testing, and other biasing selection effects. Popular appeals to “diagnostic testing” that aim to improve replication rates may (unintentionally) permit the howlers and cookbook statistics we are at pains to root out. Without a better understanding of the philosophical issues, we can expect the latest reforms to fail.
Exploratory Research is More Reliable Than Confirmatory Research
PSA 2016 Symposium:
Philosophy of Statistics in the Age of Big Data and Replication Crises
Presenter: Clark Glymour (Alumni University Professor in Philosophy, Carnegie Mellon University, Pittsburgh, Pennsylvania)
ABSTRACT: Ioannidis (2005) argued that most published research is false, and that “exploratory” research in which many hypotheses are assessed automatically is especially likely to produce false positive relations. Using simulations, Colquhoun (2014) estimates that 30 to 40% of positive results obtained with the conventional .05 cutoff for rejection of a null hypothesis are false. Their explanation is that true relationships in a domain are rare and the selection of hypotheses to test is roughly independent of their truth, so most relationships tested will in fact be false. Conventional use of hypothesis tests, in other words, suffers from a base rate fallacy. I will show that the reverse is true for modern search methods for causal relations because: a. each hypothesis is tested or assessed multiple times; b. the methods are biased against positive results; c. the rarity of true relationships in a system is an advantage for these methods. I will substantiate the claim with both empirical data and with simulations of data from systems with a thousand to a million variables that result in fewer than 5% false positive relationships and in which 90% or more of the true relationships are recovered.
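The base-rate reasoning attributed to Ioannidis and Colquhoun can be reproduced with a few lines of arithmetic. The prevalence, power, and α below are illustrative assumptions of mine, not figures from either paper:

```python
# Illustrative false-discovery arithmetic (assumed numbers, not from Ioannidis or Colquhoun):
prior_true = 0.10   # fraction of tested relationships that are actually real
alpha      = 0.05   # conventional significance cutoff
power      = 0.80   # probability a real relationship is detected

true_pos   = prior_true * power          # 0.08 of all tests
false_pos  = (1 - prior_true) * alpha    # 0.045 of all tests
share_false = false_pos / (true_pos + false_pos)
print(f"share of 'positive' results that are false: {share_false:.0%}")  # about 36%
```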
Stephen Senn slides: "‘Repligate’: reproducibility in statistical studies. What does it mean and in what sense does it matter?" presented May 23 at the session on "The Philosophy of Statistics: Bayesianism, Frequentism and the Nature of Inference," at the 2015 APS Annual Convention in NYC
A. Gelman "50 shades of gray: A research story," presented May 23 at the session on "The Philosophy of Statistics: Bayesianism, Frequentism and the Nature of Inference," 2015 APS Annual Convention in NYC.
Statistical skepticism: How to use significance tests effectively
Prof. D. Mayo, presentation Oct. 12, 2017 at the ASA Symposium on Statistical Inference: “A World Beyond p < .05” in the session: “What are the best uses for P-values?”
D. Mayo: Putting the brakes on the breakthrough: An informal look at the argu...
“Putting the Brakes on the Breakthrough, or ‘How I used simple logic to uncover a flaw in a controversial 50-year-old ‘theorem’ in statistical foundations taken as a ‘breakthrough’ in favor of Bayesian vs frequentist error statistics’”
Fusion Confusion? Comments on Nancy Reid: "BFF Four-Are we Converging?"
D. Mayo's comments on Nancy Reid's "BFF Four-Are we Converging?" given May 2, 2017 at The Fourth Bayesian, Fiducial and Frequentists Workshop held at Harvard University.
Ethically Litigating Forensic Science Cases: Daubert, DNA and Beyond (Adam Tebrugge)
What are the shared responsibilities of the analyst, prosecutor, defense attorney and judge when dealing with forensic science cases? This lecture also covers DNA evidence and focuses on discovery and litigation issues.
Science v Pseudoscience: What’s the Difference? - Kevin Korb
Science has a certain common core, especially a reliance on empirical methods of assessing hypotheses. Pseudosciences have little in common but their negation: they are not science.
They reject meaningful empirical assessment in some way or another. Popper proposed a clear demarcation criterion for Science v Rubbish: Falsifiability. However, his criterion has not stood the test of time. There are no definitive arguments against any pseudoscience, any more than against extreme skepticism in general, but there are clear indicators of phoniness.
Post: http://www.scifuture.org/science-vs-pseudoscience
“The importance of philosophy of science for statistical science and vice versa”
My paper “The importance of philosophy of science for statistical science and vice versa” presented (zoom) at the conference: IS PHILOSOPHY USEFUL FOR SCIENCE, AND/OR VICE VERSA?, January 30 - February 2, 2024, at Chapman University, Schmid College of Science and Technology.
Presentation to CRC Mental Health Early Career Researcher Workshop, Melbourne 29.11.17 for @andsdata.
Workshop title: A by-product of scientific training: We're all a little bit biased.
1. TEN MYTHS OF SCIENCE: REEXAMINING WHAT WE THINK WE KNOW...
W. McComas 1996
This article addresses and attempts to refute several of the most widespread and enduring misconceptions held by students regarding the enterprise of science. The ten myths discussed include the common notions that theories become laws, that hypotheses are best characterized as educated guesses, and that there is a commonly-applied scientific method. In addition, the article includes discussion of other incorrect ideas such as the view that evidence leads to sure knowledge, that science and its methods provide absolute proof, and that science is not a creative endeavor. Finally, the myths that scientists are objective, that experiments are the sole route to scientific knowledge and that scientific conclusions are continually reviewed conclude this presentation. The paper ends with a plea that instruction in and opportunities to experience the nature of science are vital in preservice and inservice teacher education programs to help unseat the myths of science.
Myths are typically defined as traditional views, fables, legends or stories. As such, myths can be entertaining and even educational since they help people make sense of the world. In fact, the explanatory role of myths most likely accounts for their development, spread and persistence. However, when fact and fiction blur, myths lose their entertainment value and serve only to block full understanding. Such is the case with the myths of science.
Scholar Joseph Campbell (1968) has proposed that the similarity among many folk myths worldwide is due to a subconscious link between all peoples, but no such link can explain the myths of science. Misconceptions about science are most likely due to the lack of philosophy of science content in teacher education programs, the failure of such programs to provide and require authentic science experiences for preservice teachers and the generally shallow treatment of the nature of science in the precollege textbooks to which teachers might turn for guidance.
As Stephen Jay Gould points out in The Case of the Creeping Fox Terrier Clone (1988), science textbook writers are among the most egregious purveyors of myth and inaccuracy. The fox terrier mentioned in the title refers to the classic comparison used to express the size of the dawn horse, the tiny precursor to the modern horse. This comparison is unfortunate for two reasons. Not only was this horse ancestor much bigger than a fox terrier, but the fox terrier breed of dog is virtually unknown to American students. The major criticism leveled by Gould is that once this comparison took hold, no one bothered to check its validity or utility. Through time, one author after another simply repeated the inept comparison and continued a tradition that has made many science texts virtual clones of each other on this and countless other points.
In an attempt to provide a more realistic view of science and point out issues o.
Statistical Inference as Severe Testing: Beyond Performance and Probabilism
A talk given by Deborah G Mayo (Dept of Philosophy, Virginia Tech) to the Seminar in Advanced Research Methods at the Dept of Psychology, Princeton University on November 14, 2023
TITLE: Statistical Inference as Severe Testing: Beyond Probabilism and Performance
ABSTRACT: I develop a statistical philosophy in which error probabilities of methods may be used to evaluate and control the stringency or severity of tests. A claim is severely tested to the extent it has been subjected to and passes a test that probably would have found flaws, were they present. The severe-testing requirement leads to reformulating statistical significance tests to avoid familiar criticisms and abuses. While high-profile failures of replication in the social and biological sciences stem from biasing selection effects—data dredging, multiple testing, optional stopping—some reforms and proposed alternatives to statistical significance tests conflict with the error control that is required to satisfy severity. I discuss recent arguments to redefine, abandon, or replace statistical significance.
D. Mayo (Dept of Philosophy, VT)
Sir David Cox’s Statistical Philosophy and Its Relevance to Today’s Statistical Controversies
ABSTRACT: This talk will explain Sir David Cox's views of the nature and importance of statistical foundations and their relevance to today's controversies about statistical inference, particularly in using statistical significance testing and confidence intervals. Two key themes of Cox's statistical philosophy are: first, the importance of calibrating methods by considering their behavior in (actual or hypothetical) repeated sampling, and second, ensuring the calibration is relevant to the specific data and inquiry. A question that arises is: How can the frequentist calibration provide a genuinely epistemic assessment of what is learned from data? Building on our jointly written papers, Mayo and Cox (2006) and Cox and Mayo (2010), I will argue that relevant error probabilities may serve to assess how well-corroborated or severely tested statistical claims are.
Nancy Reid, Dept. of Statistics, University of Toronto. Inaugural recipient of the "David R. Cox Foundations of Statistics Award".
Slides from Invited presentation at 2023 JSM: “The Importance of Foundations in Statistical Science“
Ronald Wasserstein, Chair (American Statistical Association)
ABSTRACT: David Cox wrote “A healthy interplay between theory and application is crucial for statistics… This is particularly the case when by theory we mean foundations of statistical analysis, rather than the theoretical analysis of specific statistical methods.” These foundations distinguish statistical science from the many fields of research in which statistical thinking is a key intellectual component. In this talk I will emphasize the ongoing importance and relevance of theoretical advances and theoretical thinking through some illustrative examples.
Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022
ABSTRACT: Statistical significance tests serve in gatekeeping against being fooled by randomness, but recent attempts to gatekeep these tools have themselves malfunctioned. Warranted gatekeepers formulate statistical tests so as to avoid fallacies and misuses of P-values. They highlight how multiplicity, optional stopping, and data-dredging can readily invalidate error probabilities. It is unwarranted, however, to argue that statistical significance and P-value thresholds be abandoned because they can be misused. Nor is it warranted to argue for abandoning statistical significance based on presuppositions about evidence and probability that are at odds with those underlying statistical significance tests. When statistical gatekeeping malfunctions, I argue, it undermines a central role that scientists look to statistics to play. In order to combat the dangers of unthinking bandwagon effects, statistical practitioners and consumers need to be in a position to critically evaluate the ramifications of proposed "reforms” (“stat activism”). I analyze what may be learned from three recent episodes of gatekeeping (and meta-gatekeeping) at the American Statistical Association (ASA).
Causal inference is not statistical inference
Jon Williamson (University of Kent)
ABSTRACT: Many methods for testing causal claims are couched as statistical methods: e.g., randomised controlled trials, various kinds of observational study, meta-analysis, and model-based approaches such as structural equation modelling and graphical causal modelling. I argue that this is a mistake: causal inference is not a purely statistical problem. When we look at causal inference from a general point of view, we see that methods for causal inference fit into the framework of Evidential Pluralism: causal inference is properly understood as requiring mechanistic inference in addition to statistical inference.
Evidential Pluralism also offers a new perspective on the replication crisis. That observed associations are not replicated by subsequent studies is a part of normal science. A problem only arises when those associations are taken to establish causal claims: a science whose established causal claims are constantly overturned is indeed in crisis. However, if we understand causal inference as involving mechanistic inference alongside statistical inference, as Evidential Pluralism suggests, we avoid fallacious inferences from association to causation. Thus, Evidential Pluralism offers the means to prevent the drama of science from turning into a crisis.
Stephan Guttinger (Lecturer in Philosophy of Data/Data Ethics, University of Exeter, UK)
ABSTRACT: The idea of “questionable research practices” (QRPs) is central to the narrative of a replication crisis in the experimental sciences. According to this narrative the low replicability of scientific findings is not simply due to fraud or incompetence, but in large part to the widespread use of QRPs, such as “p-hacking” or the lack of adequate experimental controls. The claim is that such flawed practices generate flawed output. The reduction – or even elimination – of QRPs is therefore one of the main strategies proposed by policymakers and scientists to tackle the replication crisis.
What counts as a QRP, however, is not clear. As I will discuss in the first part of this paper, there is no consensus on how to define the term, and ascriptions of the qualifier “questionable” often vary across disciplines, time, and even within single laboratories. This lack of clarity matters as it creates the risk of introducing methodological constraints that might create more harm than good. Practices labelled as ‘QRPs’ can be both beneficial and problematic for research practice and targeting them without a sound understanding of their dynamic and context-dependent nature risks creating unnecessary casualties in the fight for a more reliable scientific practice.
To start developing a more situated and dynamic picture of QRPs I will then turn my attention to a specific example of a dynamic QRP in the experimental life sciences, namely, the so-called “Far Western Blot” (FWB). The FWB is an experimental system that can be used to study protein-protein interactions but which for most of its existence has not seen a wide uptake in the community because it was seen as a QRP. This was mainly due to its (alleged) propensity to generate high levels of false positives and negatives. Interestingly, however, it seems that over the last few years the FWB slowly moved into the space of acceptable research practices. Analysing this shift and the reasons underlying it, I will argue a) that suppressing this practice deprived the research community of a powerful experimental tool and b) that the original judgment of the FWB was based on a simplistic and non-empirical assessment of its error-generating potential. Ultimately, it seems like the key QRP at work in the FWB case was the way in which the label “questionable” was assigned in the first place. I will argue that findings from this case can be extended to other QRPs in the experimental life sciences and that they point to a larger issue with how researchers judge the error-potential of new research practices.
David Hand (Professor Emeritus and Senior Research Investigator, Department of Mathematics, Faculty of Natural Sciences, Imperial College London)
ABSTRACT: Science progresses through an iterative process of formulating theories and comparing them with empirical real-world data. Different camps of scientists will favour different theories, until accumulating evidence renders one or more untenable. Not unnaturally, people become attached to theories. Perhaps they invented a theory, and kudos arises from being the originator of a generally accepted theory. A theory might represent a life's work, so that being found wanting might be interpreted as failure. Perhaps researchers were trained in a particular school, and acknowledging its shortcomings is difficult. Because of this, tensions can arise between proponents of different theories.
The discipline of statistics is susceptible to precisely the same tensions. Here, however, the tensions are not between different theories of "what is", but between different strategies for shedding light on the real world from limited empirical data. This can be in the form of how one measures discrepancy between the theory's predictions and observations. It can be in the form of different ways of looking at empirical results. It can be, at a higher level, because of differences between what is regarded as important in a particular context. Or it can be for other reasons.
Perhaps the most familiar example of this tension within statistics is between different approaches to inference. However, there are many other examples of such tensions. This paper illustrates with several examples. We argue that the tension generally arises as a consequence of inadequate care being taken in question formulation. That is, insufficient thought is given to deciding exactly what one wants to know - to determining "What is the question?".
The ideas and disagreements are illustrated with several examples.
The neglected importance of complexity in statistics and Metascience
Daniele Fanelli
London School of Economics Fellow in Quantitative Methodology, Department of Methodology, London School of Economics and Political Science.
ABSTRACT: Statistics is at war, and Metascience is ailing. This is partially due, the talk will argue, to a paradigmatic blind-spot: the assumption that one can draw general conclusions about empirical findings without considering the role played by context, conditions, assumptions, and the complexity of methods and theories. Whilst ideally these particularities should be unimportant in science, in practice they cannot be neglected in most research fields, let alone in research-on-research.
This neglected importance of complexity is supported by theoretical arguments and empirical findings (or the lack thereof) in the recent meta-analytical and metascientific literature. The talk will overview this background and suggest how the complexity of theories and methodologies may be explicitly factored into particular methodologies of statistics and Metaresearch. The talk will then give examples of how this approach may usefully complement existing paradigms, by translating results, methods and theories into quantities of information that are evaluated using an information-compression logic.
Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...
Uri Simonsohn (Professor, Department of Operations, Innovation and Data Sciences at Esade)
ABSTRACT: The statistical tools listed in the title share the feature that a mathematically elegant solution has become the consensus advice of statisticians, methodologists and some mathematically sophisticated researchers writing tutorials and textbooks, and yet they lead research workers to meaningless answers that are often also statistically invalid. Part of the problem is that advice givers take literally the mathematical abstractions of the tools they advocate for, instead of taking seriously the actual behavior of researchers.
On Severity, the Weight of Evidence, and the Relationship Between the Two
Margherita Harris
Visiting fellow in the Department of Philosophy, Logic and Scientific Method at the London School of Economics and Political Science.
ABSTRACT: According to the severe tester, one is justified in declaring to have evidence in support of a hypothesis just in case the hypothesis in question has passed a severe test, one that it would be very unlikely to pass so well if the hypothesis were false. Deborah Mayo (2018) calls this the strong severity principle. The Bayesian, however, can declare to have evidence for a hypothesis despite not having done anything to test it severely. The core reason for this has to do with the (infamous) likelihood principle, whose violation is not an option for anyone who subscribes to the Bayesian paradigm. Although the Bayesian is largely unmoved by the incompatibility between the strong severity principle and the likelihood principle, I will argue that the Bayesian’s never-ending quest to account for yet another notion, one that is often attributed to Keynes (1921) and that is usually referred to as the weight of evidence, betrays the Bayesian’s confidence in the likelihood principle after all. Indeed, I will argue that the weight of evidence and severity may be thought of as two (very different) sides of the same coin: they are two unrelated notions, but what brings them together is the fact that they both make trouble for the likelihood principle, a principle at the core of Bayesian inference. I will relate this conclusion to current debates on how to best conceptualise uncertainty by the IPCC in particular. I will argue that failure to fully grasp the limitations of an epistemology that envisions the role of probability to be that of quantifying the degree of belief to assign to a hypothesis given the available evidence can be (and has been) detrimental to an adequate communication of uncertainty.
Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...
Aris Spanos (Wilson Schmidt Professor of Economics, Virginia Tech)
ABSTRACT: The discussion places the two cultures, the model-driven statistical modeling and the algorithm-driven modeling associated with Machine Learning (ML) and Statistical Learning Theory (SLT), in a broader context of paradigm shifts in 20th-century statistics, which includes Fisher’s model-based induction of the 1920s and variations/extensions thereof, Data Science (ML, SLT, etc.) and Graphical Causal modeling in the 1990s. The primary objective is to compare and contrast the effectiveness of different approaches to statistics in learning from data about phenomena of interest and relate that to the current discussions pertaining to the statistics wars and their potential casualties.
Comparing Frequentist and Bayesian Control of Multiple Testing
James Berger
ABSTRACT: A problem that is common to many sciences is that of having to deal with a multiplicity of statistical inferences. For instance, in GWAS (Genome Wide Association Studies), an experiment might consider 20 diseases and 100,000 genes, and conduct statistical tests of the 20 × 100,000 = 2,000,000 null hypotheses that a specific disease is associated with a specific gene. The issue is that selective reporting of only the ‘highly significant’ results could lead to many claimed disease/gene associations that turn out to be false, simply because of statistical randomness. In 2007, the seriousness of this problem was recognized in GWAS and extremely stringent standards were employed to resolve it. Indeed, it was recommended that tests for association should be conducted at an error probability of 5 × 10⁻⁷. Particle physicists similarly learned that a discovery would be reliably replicated only if the p-value of the relevant test was less than 5.7 × 10⁻⁷. This was because they had to account for a huge number of multiplicities in their analyses. Other sciences have continuing issues with multiplicity. In the Social Sciences, p-hacking and data dredging are common, which involve multiple analyses of data. Stopping rules in social sciences are often ignored, even though it has been known since 1933 that, if one keeps collecting data and computing the p-value, one is guaranteed to obtain a p-value less than 0.05 (or, indeed, any specified value), even if the null hypothesis is true. In medical studies that occur with strong oversight (e.g., by the FDA), control for multiplicity is mandated. There is also typically a large amount of replication, resulting in meta-analysis. But there are many situations where multiplicity is not handled well, such as subgroup analysis: one first tests for an overall treatment effect in the population; failing to find that, one tests for an effect among men or among women; failing to find that, one tests for an effect among old men or young men, or among old women or young women; …. I will argue that there is a single method that can address any such problems of multiplicity: Bayesian analysis, with the multiplicity being addressed through choice of prior probabilities of hypotheses. ... There are, of course, also frequentist error approaches (such as Bonferroni and FDR) for handling multiplicity of statistical inferences; indeed, these are much more familiar than the Bayesian approach. These are, however, targeted solutions for specific classes of problems and are not easily generalizable to new problems.
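Berger's remark that continual peeking guarantees an eventual p < .05 under a true null is easy to check by simulation. The sketch below (my own illustration, with arbitrary settings for the maximum sample size and number of replications) recomputes a z-test after each new observation and records whether any interim look crosses the threshold:

```python
import numpy as np
from scipy import stats

# Optional stopping under a true null: peek after every new observation and
# count experiments that ever reach p < .05. Settings are illustrative only.
rng = np.random.default_rng(1)
n_max, n_sims, alpha = 500, 2000, 0.05
ever_significant = 0

for _ in range(n_sims):
    x = rng.normal(0.0, 1.0, n_max)             # data generated with H0 true (mu = 0)
    ns = np.arange(1, n_max + 1)
    z = np.cumsum(x) / np.sqrt(ns)               # running z-statistic after each observation
    p = 2 * (1 - stats.norm.cdf(np.abs(z)))      # two-sided p-value at every interim look
    if np.any(p[9:] < alpha):                    # start peeking from the 10th observation
        ever_significant += 1

print(f"fraction of null experiments reaching p < .05 at some point: "
      f"{ever_significant / n_sims:.2f}")
# Well above the nominal 0.05, and it keeps climbing as the maximum sample size grows.
```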
Clark Glymour
ABSTRACT: "Data dredging"--searching non-experimental data for causal and other relationships and taking that same data to be evidence for those relationships--was historically common in the natural sciences--the works of Kepler, Cannizzaro and Mendeleev are examples. Nowadays, "data dredging"--using data to bring hypotheses into consideration and regarding that same data as evidence bearing on their truth or falsity--is widely denounced by both philosophical and statistical methodologists. Notwithstanding, "data dredging" is routinely practiced in the human sciences using "traditional" methods--various forms of regression for example. The main thesis of my talk is that, in the spirit and letter of Mayo's and Spanos’ notion of severe testing, modern computational algorithms that search data for causal relations severely test their resulting models in the process of "constructing" them. My claim is that in many investigations, principled computerized search is invaluable for reliable, generalizable, informative, scientific inquiry. The possible failures of traditional search methods for causal relations, multiple regression for example, are easily demonstrated by simulation in cases where even the earliest consistent graphical model search algorithms succeed. ... These and other examples raise a number of issues about using multiple hypothesis tests in strategies for severe testing, notably, the interpretation of standard errors and confidence levels as error probabilities when the structures assumed in parameter estimation are uncertain. Commonly used regression methods, I will argue, are bad data dredging methods that do not severely, or appropriately, test their results. I argue that various traditional and proposed methodological norms, including pre-specification of experimental outcomes and error probabilities for regression estimates of causal effects, are unnecessary or illusory in application. Statistics wants a number, or at least an interval, to express a normative virtue, the value of data as evidence for a hypothesis, how well the data pushes us toward the true or away from the false. Good when you can get it, but there are many circumstances where you have evidence but there is no number or interval to express it other than phony numbers with no logical connection with truth guidance. Kepler, Darwin, Cannizzaro, Mendeleev had no such numbers, but they severely tested their claims by combining data dredging with severe testing.
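As a toy version of the regression-style data dredging criticized here (my own simulation, not an example from the talk), one can screen many irrelevant candidate predictors one at a time and report whichever happen to cross the .05 line:

```python
import numpy as np
from scipy import stats

# Dredging simulation: the outcome is unrelated to every candidate predictor,
# yet screening 50 of them at alpha = .05 typically yields a few 'findings'.
rng = np.random.default_rng(2)
n_obs, n_candidates = 100, 50
y = rng.normal(size=n_obs)
X = rng.normal(size=(n_obs, n_candidates))

spurious = [j for j in range(n_candidates)
            if stats.linregress(X[:, j], y).pvalue < 0.05]
print(f"'significant' slopes among {n_candidates} irrelevant predictors: {len(spurious)}")
# Expected count is alpha * 50 = 2.5; reporting only these, as if each had been
# tested in isolation, is exactly the practice that fails the severity requirement.
```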
The Duality of Parameters and the Duality of Probability
Suzanne Thornton
ABSTRACT: Under any inferential paradigm, statistical inference is connected to the logic of probability. Well-known debates among these various paradigms emerge from conflicting views on the notion of probability. One dominant view understands the logic of probability as a representation of variability (frequentism), and another prominent view understands probability as a measurement of belief (Bayesianism). The first camp generally describes model parameters as fixed values, whereas the second camp views parameters as random. Just as calibration (Reid and Cox 2015, “On Some Principles of Statistical Inference,” International Statistical Review 83(2), 293-308)--the behavior of a procedure under hypothetical repetition--bypasses the need for different versions of probability, I propose that an inferential approach based on confidence distributions (CD), which I will explain, bypasses the analogous conflicting perspectives on parameters. Frequentist inference is connected to the logic of probability through the notion of empirical randomness. Sample estimates are useful only insofar as one has a sense of the extent to which the estimator may vary from one random sample to another. The bounds of a confidence interval are thus particular observations of a random variable, where the randomness is inherited by the random sampling of the data. For example, 95% confidence intervals for parameter θ can be calculated for any random sample from a Normal N(θ, 1) distribution. With repeated sampling, approximately 95% of these intervals are guaranteed to yield an interval covering the fixed value of θ. Bayesian inference produces a probability distribution for the different values of a particular parameter. However, the quality of this distribution is difficult to assess without invoking an appeal to the notion of repeated performance. ... In contrast to a posterior distribution, a CD is not a probabilistic statement about the parameter, rather it is a data-dependent estimate for a fixed parameter for which a particular behavioral property holds. The Normal distribution itself, centered around the observed average of the data (e.g. average recovery times), can be a CD for θ. It can give any level of confidence. Such estimators can be derived through Bayesian or frequentist inductive procedures, and any CD, regardless of how it is obtained, guarantees performance of the estimator under replication for a fixed target, while simultaneously producing a random estimate for the possible values of θ.
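The repeated-sampling claim in the N(θ, 1) example is easy to verify numerically. This sketch fixes an arbitrary θ (an illustrative choice of mine) and counts how often the standard 95% interval covers it:

```python
import numpy as np
from scipy import stats

# Coverage of the textbook 95% interval for the mean of N(theta, 1), known sigma = 1.
# theta, n, and the number of repetitions are illustrative choices.
rng = np.random.default_rng(3)
theta, n, reps = 2.0, 30, 10_000
z975 = stats.norm.ppf(0.975)          # about 1.96
half_width = z975 / np.sqrt(n)

covered = 0
for _ in range(reps):
    m = rng.normal(theta, 1.0, n).mean()
    if m - half_width <= theta <= m + half_width:
        covered += 1

print(f"empirical coverage over {reps} samples: {covered / reps:.3f}")  # near 0.95
# The 95% describes the procedure's performance across repetitions; any single
# realized interval either covers the fixed theta or it does not.
```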
Paper given at PSA 22 Symposium: Multiplicity, Data-Dredging and Error Control
MAYO ABSTRACT: I put forward a general principle for evidence: an error-prone claim C is warranted to the extent it has been subjected to, and passes, an analysis that very probably would have found evidence of flaws in C just if they are present. This probability is the severity with which C has passed the test. When a test’s error probabilities quantify the capacity of tests to probe errors in C, I argue, they can be used to assess what has been learned from the data about C. A claim can be probable or even known to be true, yet poorly probed by the data and model at hand. The severe testing account leads to a reformulation of statistical significance tests: Moving away from a binary interpretation, we test several discrepancies from any reference hypothesis and report those well or poorly warranted. A probative test will generally involve combining several subsidiary tests, deliberately designed to unearth different flaws. The approach relates to confidence interval estimation, but, like confidence distributions (CD) (Thornton), a series of different confidence levels is considered. A 95% confidence interval method, say using the mean M of a random sample to estimate the population mean μ of a Normal distribution, will cover the true, but unknown, value of μ 95% of the time in a hypothetical series of applications. However, we cannot take .95 as the probability that a particular interval estimate (a ≤ μ ≤ b) is correct—at least not without a prior probability to μ. In the severity interpretation I propose, we can nevertheless give an inferential construal post-data, while still regarding μ as fixed. For example, there is good evidence μ ≥ a (the lower estimation limit) because if μ < a, then with high probability .95 (or .975 if viewed as one-sided) we would have observed a smaller value of M than we did. Likewise for inferring μ ≤ b. To understand a method’s capability to probe flaws in the case at hand, we cannot just consider the observed data, unlike in strict Bayesian accounts. We need to consider what the method would have inferred if other data had been observed. For each point μ’ in the interval, we assess how severely the claim μ > μ’ has been probed. I apply the severity account to the problems discussed by earlier speakers in our session. The problem with multiple testing (and selective reporting) when attempting to distinguish genuine effects from noise, is not merely that it would, if regularly applied, lead to inferences that were often wrong. Rather, it renders the method incapable, or practically so, of probing the relevant mistaken inference in the case at hand. In other cases, by contrast, (e.g., DNA matching) the searching can increase the test’s probative capacity. In this way the severe testing account can explain competing intuitions about multiplicity and data-dredging, while blocking inferences based on problematic data-dredging
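The post-data severity assessment described in this abstract can be computed directly for the Normal-mean example. The observed mean, σ, and n below are hypothetical numbers used only to display the calculation:

```python
import numpy as np
from scipy import stats

def severity_mu_greater(mu_prime, m_obs, sigma, n):
    """Severity for the claim mu > mu_prime given observed mean m_obs, assuming
    M ~ Normal(mu, sigma^2/n) with known sigma: SEV = P(M < m_obs; mu = mu_prime),
    i.e., how probably a smaller mean would have occurred were the claim false."""
    return stats.norm.cdf((m_obs - mu_prime) / (sigma / np.sqrt(n)))

# Hypothetical numbers for illustration: observed mean 0.4, sigma = 1, n = 100.
m_obs, sigma, n = 0.4, 1.0, 100
for mu_prime in (0.0, 0.2, 0.3, 0.4):
    sev = severity_mu_greater(mu_prime, m_obs, sigma, n)
    print(f"SEV(mu > {mu_prime:.1f}) = {sev:.3f}")
# The claim mu > 0.0 passes with severity ~1.000, while mu > 0.4 reaches only 0.5:
# the same data warrant the smaller discrepancy well and the larger one poorly.
```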
The Statistics Wars and Their Casualties (w/refs)jemille6
High-profile failures of replication in the social and biological sciences underwrite a minimal requirement of evidence: If little or nothing has been done to rule out flaws in inferring a claim, then it has not passed a severe test. A claim is severely tested to the extent it has been subjected to and passes a test that probably would have found flaws, were they present. This probability is the severity with which a claim has passed. The goal of highly well-tested claims differs from that of highly probable ones, explaining why experts so often disagree about statistical reforms. Even where today’s statistical test critics see themselves as merely objecting to misuses and misinterpretations, the reforms they recommend often grow out of presuppositions about the role of probability in inductive-statistical inference. Paradoxically, I will argue, some of the reforms intended to replace or improve on statistical significance tests enable rather than reveal illicit inferences due to cherry-picking, multiple testing, and data-dredging. Some preclude testing and falsifying claims altogether. These are the “casualties” on which I will focus. I will consider Fisherian vs Neyman-Pearson tests, Bayes factors, Bayesian posteriors, likelihoodist assessments, and the “screening model” of tests (a quasi-Bayesian-frequentist assessment). Whether or not one accepts this philosophy of evidence, I argue that it provides a standpoint for avoiding both the fallacies of statistical testing and the casualties of today’s statistics wars.
On the interpretation of the mathematical characteristics of statistical test...jemille6
Statistical hypothesis tests are often misused and misinterpreted. Here I focus on one
source of such misinterpretation, namely an inappropriate notion regarding what the
mathematical theory of tests implies, and does not imply, when it comes to the
application of tests in practice. The view taken here is that it is helpful and instructive to be consciously aware of the essential difference between mathematical model and
reality, and to appreciate the mathematical model and its implications as a tool for
thinking rather than something that has a truth value regarding reality. Insights are presented regarding the role of model assumptions, unbiasedness and the alternative hypothesis, Neyman-Pearson optimality, multiple and data dependent testing.
The role of background assumptions in severity appraisaljemille6
In the past decade discussions around the reproducibility of scientific findings have led to a re-appreciation of the importance of guaranteeing claims are severely tested. The inflation of Type 1 error rates due to flexibility in the data analysis is widely considered
one of the underlying causes of low replicability rates. Solutions, such as study preregistration, are becoming increasingly popular to combat this problem. Preregistration allows researchers to evaluate the severity of a test, but not all
preregistered studies provide a severe test of a claim. The appraisal of the severity of a
test depends on background information, such as assumptions about the data generating process, and auxiliary hypotheses that influence the final choice for the
design of the test. In this article, I will discuss the difference between subjective and
inter-subjectively testable assumptions underlying scientific claims, and the importance
of separating the two. I will stress the role of justifications in statistical inferences, the
conditional nature of scientific conclusions following these justifications, and highlight
how severe tests could lead to inter-subjective agreement, based on a philosophical approach grounded in methodological falsificationism. Appreciating the role of background assumptions in the appraisal of severity should shed light on current discussions about the role of preregistration, interpreting the results of replication studies, and proposals to reform statistical inferences.
The two statistical cornerstones of replicability: addressing selective infer...jemille6
Tukey’s last published work in 2020 was an obscure entry on multiple comparisons in the
Encyclopedia of Behavioral Sciences, addressing the two topics in the title. Replicability
was not mentioned at all, nor was any other connection made between the two topics. I shall demonstrate how these two topics critically affect replicability using recently completed studies. I shall review how these have been addressed in the past. I shall
review in more detail the available ways to address selective inference. My conclusion is that conducting many small replicability studies without strict standardization is the way to assure replicability of results in science, and we should introduce policies to make this happen.
D. Mayo: The Science Wars and the Statistics Wars: scientism, popular statistics, and the philosophers
Deborah Mayo
• In thinking about scientism for this conference—a topic on
which I’ve never written—a puzzle arises: How can we
worry about science being held in too high a regard when
we are daily confronted with articles shouting that “most
scientific findings are false?”
• Too deferential to scientific methodology? In the fields I’m most closely involved with, scarcely a day goes by when we’re not reading articles on “bad science”, “trouble in the lab”, and “science fails to self-correct.”
• Not puzzling: I suggest that legitimate criticisms of scientism often stem from abuses of statistical methodology—i.e., from what might be called “statisticism”—“lies, damned lies, and statistics.”
• The rise of big data and high-powered computer programs extends statistical methods across the sciences, law and evidence-based policy—and beyond (culturomics, philosophometrics)—but often with methodological-philosophical loopholes.
• It’s the false veneer of science, statistics as window
dressing, that bothers us.
Are philosophies about science relevant here?
• I say yes: “Getting philosophical” here would be to provide tools to avoid obfuscating, philosophically tinged notions about inference and testing, while offering a critical illumination of flaws and foibles linking technical statistical concepts to substantive claims.
That is the goal of the different examples I will consider.
• Provocative articles give useful exposés of classic fallacies:
o p-values are not posterior probabilities,
o statistical significance is not substantive significance,
o association is not causation.
They often lack a depth of understanding of underlying
philosophical, statistical, and historical issues.
Demarcation: Bad Methodology/Bad Statistics
• Investigators of Diederik Stapel, the social psychologist
who fabricated his data, walked into a culture of
“verification bias” (2012 Tilburg Report, “Flawed
Science”).
• They were shocked when people they interviewed
“defended the serious and less serious violations of proper
scientific method saying: that is what I have learned in
practice; everyone in my research environment does the
same, and so does everyone we talk to…” (48).
• Philosophers tend to have cold feet when it comes to saying
anything general about science versus pseudoscience.
• Debunkers need to have a position on bad, very bad, not so
bad methodology.
• The Tilburg Report does a pretty good job:
“One of the most fundamental rules of scientific research is
that an investigation must be designed in such a way that
facts that might refute the research hypotheses are given at
least an equal chance of emerging as do facts that confirm
the research hypotheses. Violations of this rule, continuing
an experiment until it works as desired, or excluding
unwelcome experimental subjects or results, inevitably
tends to confirm the researcher’s research hypotheses, and
essentially render the hypotheses immune to the facts”.
Items in their list of “dirty laundry” include:
“An experiment fails to yield the expected statistically
significant results. The experimenters try and try again
until they find something (multiple testing, multiple
modeling, post-data search of endpoint or subgroups), and the only experiment subsequently reported is the one that did yield the expected results.” (Report, 48)
In fields like medicine, these gambits are deemed bad statistics
if not criminal behavior.
(A recent case, the Scott Harkonen case, went all the way to the Supreme Court: post-data searching for statistically significant endpoints does not qualify as free speech.)
Popper had the right idea:
“Observations or experiments can be accepted as
supporting a theory (or a hypothesis, or a scientific
assertion) only if these observations or experiments are
severe tests of the theory” (Popper 1994, p. 89).
Unfortunately Popper never arrived at an adequate notion of a
severe test.
(In a letter, Popper said he regretted not having sufficiently
learned statistics.)
Philosophers have their own “statisticisms”—logicism, mathematicism: the search for logics of evidential relationship.
Assumes: for any data x and hypothesis H, there is a (context-free) evidential relationship (x assumed given).
Hacking (1965): the “Law of Likelihood”: x supports hypothesis H1 more than H2 if P(x;H1) > P(x;H2).
Such a maximally likely alternative H2 can always be constructed: H1 may always be found less well supported, even if H1 is true—no error control.
Hacking rejected the likelihood approach (1977) on such grounds.
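To illustrate the “no error control” worry numerically, here is a minimal sketch; the Bernoulli data, sample size, and use of scipy are my own illustrative assumptions, not Hacking’s example. An alternative rigged to match the observed data is always at least as well “supported” on the likelihood criterion, even when H1 is true.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n = 20
    x = rng.binomial(1, 0.5, size=n)   # suppose H1 (p = 0.5) is in fact true
    k = int(x.sum())

    lik_H1 = stats.binom.pmf(k, n, 0.5)      # P(x; H1)
    p_hat = k / n                             # H2 constructed after seeing the data
    lik_H2 = stats.binom.pmf(k, n, p_hat)     # P(x; H2) is maximal by construction

    print(lik_H2 >= lik_H1)   # always True: the rigged H2 is never less "supported"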
Lakatos was correct that there’s a tension between logics of
evidence and the intuition against ad hoc hypotheses; he
described it as an appeal to history, to how the hypothesis was
formulated, selected for testing, modified, etc.
Now we’d call them “selection effects” and “cherry picking”.
The problems with selective reporting, or stopping when the data look good, are not problems about long runs….
It’s that we cannot say about the case at hand that it has done a good job of avoiding the sources of misinterpretation.
That makes it a questionable inference.
Role for philosophers? One of the final recommendations in
the Report is this:
In the training program for PhD students, the relevant
basic principles of philosophy of science, methodology,
ethics and statistics that enable the responsible practice
of science must be covered.
A philosophy department could well create an entire core
specialization that revolved around these themes.
Statistics Wars: Was the Discovery of the Higgs Particle
“Bad Science”?
One of the biggest science events of 2012-13 was undoubtedly
the announcement on July 4, 2012 of evidence for the discovery
of a Higgs-like particle based on a “5 sigma observed effect”.
Because the 5 sigma report refers to frequentist statistical tests,
the discovery is imbued with some controversial themes from
philosophy of statistics
Subjective Bayesian Dennis Lindley (of the Jeffreys-Lindley
paradox) sent around a letter to the ISBA (through O’Hagan):
1. Why such an extreme evidence requirement? We
know from a Bayesian perspective that this only makes
sense if (a) the existence of the Higgs boson has
extremely small prior probability and/or (b) the
consequences of erroneously announcing its discovery
are dire in the extreme. …
2. Are the particle physics community completely
wedded to frequentist analysis? If so, has anyone tried
to explain what bad science that is?
Not bad science at all.
Practitioners of HEP are very sophisticated with their
statistical methodology and modeling: they’d seen too many
bumps disappear.
They want to ensure that, before announcing the hypothesis H*: “a SM Higgs boson has been discovered”, H* has been given a severe run for its money.
Within a general model for the detector, H0: μ = 0 is the background-only hypothesis, where μ is the “global signal strength” parameter; μ = 1 measures the SM Higgs boson signal in addition to the background (SM: Standard Model).
They want to ensure that, with extremely high probability, H0 would have survived a cluster of tests T, fortified with much cross-checking, were μ = 0.
Note what’s being given a high probability:
Pr(test T would produce less than 5 sigma; H0) > .9999997.
With probability .9999997, the bumps would disappear (in either ATLAS or CMS) under the assumption data are due to background H0: this is an error probability.
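As a rough numerical check on the figure above (a minimal sketch using the standard Normal tail, with no look-elsewhere adjustment, which is my own simplifying assumption):

    from scipy import stats

    p_5sigma = stats.norm.sf(5)   # one-sided tail area beyond 5 sigma under H0
    print(p_5sigma)               # about 2.9e-07
    print(1 - p_5sigma)           # about 0.9999997 = Pr(test yields < 5 sigma; H0)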
P-value police
Science writers rushed in to examine if the .99999 was fallaciously being assigned to H* itself—a posterior probability in H*.
P-value police graded sentences from each news article.
Physicists did not assign a high probability to H*: A Standard Model (SM) Higgs exists (…whatever it might mean).
Most believed in a Higgs particle before the collider, but most also believe in beyond the standard model physics (BSM).
Once H* passes with severity, they quantify various properties
of the particle discovered (inferring ranges of magnitudes).
Statistics Wars: Bayesian vs Frequentist
The traditional frequentist-Bayesian wars are still alive.
In an oversimple nutshell:
• A Bayesian account uses probability for updating beliefs in
claims using Bayes’ theorem.
• Frequentist accounts use probability to control long-run error
rates of procedures (e.g., 95% coverage probability)
(Note: anyone who uses conditional probability employs Bayes’ theorem, be it Bayes’ nets or ordinary probability—that doesn’t make it Bayesian.)
Probabilism vs Performance
I advocate a third “p”: probativeness
Current state of play? (save for discussion)
• Bayesian methods useful but the traditional subjective
Bayesian philosophy (largely) rejected.
• Since the 1990s: “Insisting we should be doing a subjective
analysis falls on deaf ears; they come to statistics to avoid
subjectivity.” (Berger); elicitation given up on.
• Reconciliations and unifications: non-subjective (default or
conventional) Bayesianism: the prior is automatically chosen
so as to maximize the contribution of the data (rather than the
prior). Many different rival systems.
• Priors aren’t considered degrees of belief, and may not even be probabilities (improper).
• Reject Dutch Book, Likelihood Principle; rarely is the final
form a posterior probability, or even a Bayes ratio.
• Gelman and Shalizi (2013)–a Bayesian at Columbia and a CMU error statistician: “There have been technical advances, now we need an advance in philosophy…”
“Implicit in the best Bayesian practice is a stance
that has much in common with [my] error-statistical
approach…Indeed crucial parts of Bayesian data
analysis, such as model checking, can be understood
as ‘error probes’ in Mayo’s sense” (p. 10).
Big Data: Statistics vs. Data Science (Informatics, Machine
learning, data analytics, CS): “data revolution”
2013 was the “International Year of Celebrating Statistics.”
The label was to help prevent Statistical Science being eclipsed
by the fashionable “Big Data” crowd.
Larry Wasserman: Talk of “Data Science” and “Big Data” fills
me with:
Optimism––it means statistics is finally a sexy field.
Dread––statistics is being left on the sidelines.
Data Science: The End of Statistics?
Vapnik, of the Vapnik/Chervonenkis (VC) theory, is known for
his seminal work in machine learning.
They distinguish classical and modern work in philosophy as
well as statistics.
In philosophy:
The classical conception is objective, rational, a naïve realism.
The modern “data driven” empirical view, illustrated by
machine learning, is enlightened.
In statistics:
Classical view seeks statistical regularities modeled with
parametric distributions, seeks to estimate and test parameters in
a model intended to describe a real data generating process.
Modern “data driven” view: aims for good predictions with
wholly uninterpretable “black boxes”; views models as mental
constructs and exhorts scientists to restrict themselves to
problems deemed “well posed” by machine-learning criteria.
Black Box science
How would the Higgs Boson fit? (It wouldn’t.)
“So the Instrumentalist view follows directly from a sound
scientific theory, and not from the philosophical argument.
So realism is not possible, and instrumentalism is an
appropriate (technically sound) philosophical position”.
Down with models: They claim to avoid assumptions about
parametric distributions—but iid is a big assumption.
“Machine-learning inductions, based on training samples
work only so long as stationarity is sufficient to ensure that
the new data are adequately similar to the training data”.
You don’t have to be a naïve realist to think that science is more than the binary classification problem (predicting whether you will buy X’s book, teaching a machine to disambiguate a handwritten 5 from an 8 in postal addresses, improving Google searches, …).
All very impressive, but limited to that realm.
The success of other outgrowths, such as “culturomics” (statistics on the frequency of word use), is unclear.
If making something more scientific means treating it as data-mining “associations”, then it may be less scientific (a less good methodology for given aims).
Not everyone who works in these areas agrees with this philosophy, but these are the founders.
Broadly analogous moves occur in philosophy: all science and
inquiry should be restricted to problems deemed “well posed”
by their favorite science,
(neuroscience, physics, evolutionary psychology….)
• The problem, of course, is that they are question begging.
• Uncritical about the methodological rigor underlying research purporting to show it’s a good way to solve problems outside their particular subset of inquiry.
“Aren’t We Data Science?” Marie Davidian, president of the
ASA, asks.
She argues that data scientists have “little appreciation for the
power of design of experiments”.
Reports are now trickling in about the consequences of ignoring principles of DOE.
Microarray Big Data Analytics: Screening for genetic associations
Stanley Young (Nat. Inst. of Stat.): There is a relatively unknown problem with microarray experiments, in addition to the multiple testing problems. Until relatively recently, the microarray samples were not sent through assay equipment in random order. Essentially all the microarray data pre-2010 is unreliable.
“Stop Ignoring Experimental Design (or my head will explode)” (Lambert, of a bioinformatics software Co.)
Statisticians “tell me how they are never asked to help with design before the experiment begins, only asked to clean up the mess after millions have been spent.”
• Fisher: “To consult the statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination…[to] say what the experiment died of.”
• Different research programs now appeal to gene and other
theories to get more reliable results than black box
bioinformatics.
• Maybe black boxes aren’t enough after all….
• Let’s go back to the International Year of Celebrating
Statistics
The Analytics Rock Star: Nate Silver
The Presidential Address at the ASA (usually by a famous
statistician) was given by pollster Nate Silver.
He’s not in statistics, but he did combine numerous polling
results to predict the Obama win in 2012.
Nate Silver “hit a home run with the crowd” in his reply to the question “What do you think of data science vs. statistics?” (Questions were twittered.)
Nate’s reply: “data scientist” was just a “sexed up” term for statistician.
Audience members cried out with joy.
In the talk itself, Silver listed his advice to data journalists:
The reason he favors the Bayesian philosophy is that people
should be explicit about disclosing their biases and
preconceptions.
• If people are so inclined to see the world through their
tunnel vision, why suppose they are able/willing to be
explicit about their biases?
• If priors are to represent biases, shouldn’t they be kept
separate from the data rather than combined with them?
At odds with the idea of data driven journalism.
Data-driven journalism
Silver’s 538 blog is one of the new attempts at “Big Data” journalism: “to use statistical analysis — hard numbers — to tell compelling stories.”
• They don’t announce priors (so far as I can tell).
• My antennae go up for other reasons: reports on observable
statistical associations, running this or that regression may
allow shaky claims under the guise of hard-nosed, “just the
facts” journalism.
(One of the biggest sources of “sciency” approaches.)
• Maybe announcing the biases would be better.
• I’d want an entirely distinct account of warranted inference
from data.
Plausibility differs from Well-Testedness
When we hear there’s statistical evidence of some unbelievable claim (distinguishing shades of grey and being politically moderate, ovulation and voting preferences), some argue: you see, if our beliefs were mixed into the interpretation of the evidence, we wouldn’t be fooled.
We know these things are unbelievable.
That could work in some cases (though it still wouldn’t show
what they’d done wrong).
It wouldn’t help with our most important problem:
How to distinguish tests of one and the same hypothesis
with different methods used (e.g., one with searching, post
data subgroups, etc., another without)?
Moreover, committees investigating questionable research
practices (QRPs) find:
“People are not deliberately cheating: they honestly
believe in their theories and believe the data is
supporting them and are just doing the best to make this
as clear as possible to everyone”. Richard Gill (forensic
statistician).
We are back to the Tilburg report (and now Jens Forster).
Diederik Stapel says he always read the research literature
extensively to generate his hypotheses.
“So that it was believable and could be argued that this
was the only logical thing you would find.” (E.g., eating
meat causes aggression.)
(In “The Mind of a Con Man,” NY Times, April 26,
2013)
(He really doesn’t think he did anything that bad.)
Demarcating Methodologies for Finding Things Out
§ Rather than report on believability, researchers need to report the properties of the methods they used: What was their capacity to have identified, avoided, or admitted bias? Probability enters to quantify well-testedness, and discrepancies well or poorly detected.
§ A methodology (for finding things out) is questionable if it
cannot or will not distinguish the correctness or plausibility
of inferences from problems stemming from a poorly run
study.
An inference to H* is questionable if it stems from a method with little ability to have found flaws if they existed.
Area of pseudoinquiry: a research area that regularly fails to be able to vouchsafe the capability of discerning/reporting mistakes at the levels of data, statistical model, substantive inference.
Need to be able to say: H is plausible, but this is a bad test.
Here’s a believable hypothesis: men react more negatively to the success of their partners than to their failures.
Studies have shown:
H: partner’s success lowers self-esteem in men
It’s believable, but the statistical experiments are a sham:
[Subjects are randomly assigned either to think about a time their partner succeeded, or a time they failed. They purport to find a statistically significant difference in self-esteem, as measured on an Official Psychological Self-Esteem measure (based on positive word associations with “me” versus “other”).]
Randomly assigning “treatments” does not protect against data-
mining, flexibilities in interpreting results (problems with the
statistics, the self-esteem measure).
The New Science of Replication:
• They do not question the methodology of the original study.
• It’s another statistical analysis to mimic everything and see
if it is found in an appropriately powered test.
The problem with failing to replicate one of these social scientific studies is that we cannot say we’ve refuted the original study, because there is too much latitude for finding and not finding the effect (aside from the formal capacities).
(I’m on one such committee; they need more philosophers of
methodology.)
Distinguish this from fraud busting: statistical fraud busting is essential (a few days ago, the Jens Forster case, using R.A. Fisher’s “too good to be true” F-test).
Need a “philosophical-methodological” assessment
(I’m calling it this because philosophers do not always question the methodology; e.g., “experimental philosophers” use results from this type of study to inform philosophical questions.)
I began with a puzzle: How can we worry about science being held in too high a regard when we are daily confronted with articles shouting that “most scientific findings are false”, or that “there is a crisis of replication”?
There is a connection: methodological and philosophical problems with the use and interpretation of statistical methods.
Statistics as holy water; hiding selection effects; misinterpreting methods (based on assumed philosophies of statistics); ignoring DOEs (“we have so much data we don’t need them”), ….
One more (underlying the claim that “most scientific findings are false”): based on using measures of exploratory screening to assess “science-wise error rates.” (I’ll save this for discussion.)
“Science-wise error rates” (FDRs):
A: finding a statistically significant result at the .05 level
If we:
• imagine two point hypotheses H0 and H1 – H1 identified with some “meaningful” effect, all else ignored,
• assume P(H1) is very small (.1),
• permit a dichotomous “thumbs up-down” pronouncement, from a single (just) .05 significant result (ignoring magnitudes),
• allow the ratio of type 1 error probability to the power
against H1 to supply a “likelihood ratio”.
The unsurprising result is that most “positive results” are false.
Not based on data, but an analytic exercise (Ioannidis 2005):
Their computations might at best hold for crude screening
exercises (e.g., for associations between genes and disease).
It risks entrenching just about every fallacy in the books.
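A minimal sketch of the arithmetic behind such screening computations (the power value is my own illustrative choice; only the .05 level and the .1 prevalence come from the slides): with a low prevalence of real effects and modest power, most just-significant results come out “false” on this dichotomous accounting.

    alpha, power = 0.05, 0.2      # .05 level from the slides; power is illustrative
    prior_H1 = 0.1                # assumed prevalence of real effects (the slides' .1)

    true_pos = power * prior_H1           # significant results arising under H1
    false_pos = alpha * (1 - prior_H1)    # significant results arising under H0
    ppv = true_pos / (true_pos + false_pos)
    print(round(ppv, 2), round(1 - ppv, 2))   # ~0.31 "true", ~0.69 "false" positives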
Conclusion
• Legitimate criticisms of scientism often stem from insufficiently self-critical methodology, often statistical; i.e., from what might be called “statisticism.”
• Understanding and resolving these issues calls for philosophical scrutiny of the methodological sort (jointly with statistical practitioners and science journalists).
• Not only would this help to make progress in the debates—
the science wars and the statistics wars—it would promote
philosophies of science genuinely relevant for practice.