Statistics is a powerful tool for researchers and decision makers alike, yet misuse, misinterpretation, and misrepresentation of statistics remain common. This seminar aims to raise awareness of common statistical misconceptions in social science and beyond (e.g., in the media and among general readers). I do not own the copyright to the materials in this presentation; wherever figures were borrowed from other sources, the source is cited at the bottom of the slide.
This document summarizes four key assumptions that should be tested in multiple regression analysis: normality, linearity, reliability of measurement, and homoscedasticity. It discusses how violating these assumptions can lead to inefficient or biased results. Researchers are encouraged to check for normality of variables, linear relationships between variables, reliability of measurement tools, and equal variance of errors. Techniques like residual plots and transformations are mentioned as ways to test the assumptions. The document emphasizes that while methods exist to address issues like non-normality, they may inadvertently change the data or relationships in problematic ways.
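The assumption checks described above (residuals, normality, transformations) can be sketched in a few lines. This is a minimal illustration on synthetic data; the simulated dataset and variable names are not from the document:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic data: a linear relationship with homoscedastic normal errors
x = rng.uniform(0, 10, 200)
y = 2.0 * x + 1.0 + rng.normal(0, 1.0, 200)

# Fit a simple regression and inspect the residuals
slope, intercept, r, p, se = stats.linregress(x, y)
residuals = y - (slope * x + intercept)

# Normality of residuals (Shapiro-Wilk): a large p-value gives no
# evidence against the normality assumption
w_stat, w_p = stats.shapiro(residuals)
print(f"slope={slope:.2f}, Shapiro p={w_p:.3f}")
```

In practice one would also plot the residuals against the fitted values to look for funnel shapes (heteroscedasticity) or curvature (non-linearity), which is exactly the residual-plot technique the summary mentions.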
Deborah G. Mayo: Is the Philosophy of Probabilism an Obstacle to Statistical Fraud Busting?
Presentation slides for: Revisiting the Foundations of Statistics in the Era of Big Data: Scaling Up to Meet the Challenge[*] at the Boston Colloquium for Philosophy of Science (Feb 21, 2014).
Stephen Senn slides: "'Repligate': reproducibility in statistical studies. What does it mean and in what sense does it matter?" presented May 23 at the session on "The Philosophy of Statistics: Bayesianism, Frequentism and the Nature of Inference" at the 2015 APS Annual Convention in NYC.
"The Statistical Replication Crisis: Paradoxes and Scapegoats" (jemille6)
D. G. Mayo LSE Popper talk, May 10, 2016.
Abstract: Mounting failures of replication in the social and biological sciences give a practical spin to statistical foundations in the form of the question: How can we attain reliability when Big Data methods make illicit cherry-picking and significance seeking so easy? Researchers, professional societies, and journals are increasingly getting serious about methodological reforms to restore scientific integrity – some are quite welcome (e.g., preregistration), while others are quite radical. Recently, the American Statistical Association convened members from differing tribes of frequentists, Bayesians, and likelihoodists to codify misuses of P-values. Largely overlooked are the philosophical presuppositions of both criticisms and proposed reforms. Paradoxically, alternative replacement methods may enable rather than reveal illicit inferences due to cherry-picking, multiple testing, and other biasing selection effects. Popular appeals to “diagnostic testing” that aim to improve replication rates may (unintentionally) permit the howlers and cookbook statistics we are at pains to root out. Without a better understanding of the philosophical issues, we can expect the latest reforms to fail.
D. G. Mayo (Virginia Tech) "Error Statistical Control: Forfeit at your Peril" presented May 23 at the session on "The Philosophy of Statistics: Bayesianism, Frequentism and the Nature of Inference," 2015 APS Annual Convention in NYC.
What is the significance of the p-value when reporting a statistical analysis? Is there an alternative to Fisher's approach, and if so, what is it? These are some of the issues addressed here.
This document discusses statistical inference and crosstabulation. It defines key terms like p-value and explains that a p-value less than 0.05 leads to rejecting the null hypothesis. Crosstabulation tests the association between variables and examines if values in each cell differ significantly. A chi-square test determines if differences between expected and actual values are due to chance or reflect differences in the population. Examples are given of reporting chi-square test results including the test statistic, degrees of freedom, sample size, and significance level.
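The chi-square test of association described above can be sketched with scipy; the 2×2 crosstab counts below are hypothetical, not taken from the document:

```python
from scipy.stats import chi2_contingency

# Hypothetical 2x2 crosstabulation: two groups vs. outcome (yes/no)
observed = [[30, 10],   # group A: outcome yes, outcome no
            [18, 22]]   # group B: outcome yes, outcome no

# chi2_contingency compares observed cell counts with the counts
# expected under independence, returning the test statistic,
# p-value, degrees of freedom, and the expected-count table
chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2({dof}, N=80) = {chi2:.2f}, p = {p:.4f}")
```

The printed values follow the reporting convention the summary mentions: test statistic, degrees of freedom, sample size, and significance level.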
Lecture on causal inference to the pediatric hematology/oncology fellows at Texas Children's Hospital as part of their Biostatistics for Busy Clinicians lecture series.
This document provides an overview of key statistical analysis techniques used in research methods, including descriptive statistics, validity testing, reliability testing, hypothesis testing, and techniques for comparing means such as t-tests and ANOVA. Descriptive statistics like mean and standard deviation are used to summarize variables measured on interval/ratio scales, while frequency and percentage summarize nominal/ordinal scales. Validity is assessed through exploratory factor analysis (EFA) to establish underlying dimensions. Reliability is measured using Cronbach's alpha. Hypothesis testing involves stating null and alternative hypotheses and making decisions based on statistical tests and p-values. T-tests compare two means and ANOVA compares three or more means, both assuming equal variances based on Levene's test.
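The mean-comparison tools named above (Levene's test for equal variances, the t-test for two groups, ANOVA for three or more) can be sketched together; the three simulated groups below are illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
g1 = rng.normal(50, 5, 30)
g2 = rng.normal(55, 5, 30)
g3 = rng.normal(60, 5, 30)

# Levene's test for equal variances: the assumption behind the
# pooled t-test and classical one-way ANOVA
lev_stat, lev_p = stats.levene(g1, g2, g3)

# Two means -> independent-samples t-test; three or more -> one-way ANOVA
t_stat, t_p = stats.ttest_ind(g1, g2)
f_stat, f_p = stats.f_oneway(g1, g2, g3)
print(f"Levene p={lev_p:.3f}, t-test p={t_p:.4f}, ANOVA p={f_p:.2e}")
```

If Levene's test rejected equal variances, one would typically switch to Welch's t-test (`ttest_ind(..., equal_var=False)`) rather than the pooled version.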
How to combine results from randomised clinical trials on the additive scale with real world data to provide predictions on the clinically relevant scale for individual patients
The document defines key concepts in hypothesis testing such as critical value, significance level, p-value, type I and type II errors, and power. It states that the critical value divides the normal distribution into regions for rejecting or failing to reject the null hypothesis. The significance level corresponds to the critical region. A p-value less than 0.05 indicates the result is statistically significant. Type I error occurs when the null hypothesis is rejected when it is true, while type II error is failing to reject a false null hypothesis. Power is defined as 1 - β, where β is the probability of a type II error.
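The quantities defined above (critical value, significance level, β, and power = 1 − β) can be made concrete with a small numerical sketch for a one-tailed z-test; the effect size, sigma, and sample size are illustrative:

```python
from scipy.stats import norm

alpha = 0.05
z_crit = norm.ppf(1 - alpha)          # critical value of a one-tailed z-test

# Power to detect a true mean shift of delta, with known sigma and sample size n
delta, sigma, n = 0.5, 1.0, 25
beta = norm.cdf(z_crit - delta * (n ** 0.5) / sigma)  # P(type II error)
power = 1 - beta                                       # power = 1 - beta
print(f"critical z = {z_crit:.3f}, beta = {beta:.3f}, power = {power:.3f}")
```

Raising n or delta pushes the term `delta * sqrt(n) / sigma` up, shrinking β and raising power, which is exactly the relationship between sample size, effect size, and the probability of a Type II error.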
Slides given for Deborah G. Mayo talk at Minnesota Center for Philosophy of Science at University of Minnesota on the ASA 2016 statement on P-values and Error Statistics
1) The document discusses statistical significance and hypothesis testing. It explains that statistical significance is used to determine the probability that an observed relationship is due to chance rather than a true relationship between variables.
2) It outlines the steps in testing for statistical significance which include stating the research and null hypotheses, selecting an alpha level, selecting and computing a statistical test, and interpreting the results.
3) An example is provided of using the Chi Square test to analyze the relationship between type of training program and job placement success, and interpreting the results of the Chi Square test based on the alpha level and degrees of freedom.
LEARNING OBJECTIVES
· Explain how researchers use inferential statistics to evaluate sample data.
· Distinguish between the null hypothesis and the research hypothesis.
· Discuss probability in statistical inference, including the meaning of statistical significance.
· Describe the t test and explain the difference between one-tailed and two-tailed tests.
· Describe the F test, including systematic variance and error variance.
· Describe what a confidence interval tells you about your data.
· Distinguish between Type I and Type II errors.
· Discuss the factors that influence the probability of a Type II error.
· Discuss the reasons a researcher may obtain nonsignificant results.
· Define power of a statistical test.
· Describe the criteria for selecting an appropriate statistical test.
IN THE PREVIOUS CHAPTER, WE EXAMINED WAYS OF DESCRIBING THE RESULTS OF A STUDY USING DESCRIPTIVE STATISTICS AND A VARIETY OF GRAPHING TECHNIQUES. In addition to descriptive statistics, researchers use inferential statistics to draw more general conclusions about their data. In short, inferential statistics allow researchers to (a) assess just how confident they are that their results reflect what is true in the larger population and (b) assess the likelihood that their findings would still occur if their study was repeated over and over. In this chapter, we examine methods for doing so.
SAMPLES AND POPULATIONS
Inferential statistics are necessary because the results of a given study are based only on data obtained from a single sample of research participants. Researchers rarely, if ever, study entire populations; their findings are based on sample data. In addition to describing the sample data, we want to make statements about populations. Would the results hold up if the experiment were conducted repeatedly, each time with a new sample?
In the hypothetical experiment described in Chapter 12 (see Table 12.1), mean aggression scores were obtained in model and no-model conditions. These means are different: Children who observe an aggressive model subsequently behave more aggressively than children who do not see the model. Inferential statistics are used to determine whether the results match what would happen if we were to conduct the experiment again and again with multiple samples. In essence, we are asking whether we can infer that the difference in the sample means shown in Table 12.1 reflects a true difference in the population means.
Recall our discussion of this issue in Chapter 7 on the topic of survey data. A sample of people in your state might tell you that 57% prefer the Democratic candidate for an office and that 43% favor the Republican candidate. The report then says that these results are accurate to within 3 percentage points, with a 95% confidence level. This means that the researchers are very (95%) confident that, if they were able to study the entire population rather than a sample, the actual percentage who preferred th ...
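The "accurate to within 3 percentage points, with a 95% confidence level" figure can be reproduced with the standard margin-of-error formula for a proportion. The sketch below assumes simple random sampling and uses the 57% sample proportion from the example:

```python
import math

p_hat = 0.57          # sample proportion preferring the Democratic candidate
z = 1.96              # z value for two-sided 95% confidence

# Sample size needed for a +/- 3 percentage-point margin of error
target_moe = 0.03
n_needed = math.ceil(z**2 * p_hat * (1 - p_hat) / target_moe**2)

# Margin of error actually achieved with that sample size
moe = z * math.sqrt(p_hat * (1 - p_hat) / n_needed)
print(n_needed, round(moe, 4))
```

So a poll of roughly a thousand respondents is enough for a ±3-point margin, which is why national polls of that size are so common.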
Topic: Learning Team
Number of Pages: 2 (Double Spaced)
Number of sources: 1
Writing Style: APA
Type of document: Essay
Academic Level: Master
Category: Psychology
VIP Support: N/A
Language Style: English (U.S.)
Order Instructions:
I will attach the instruction. On this paper please follow the instructions carefully. Thank you
Correlation
PSYCH/610 Version 2
1
University of Phoenix Material
Correlation
A researcher is interested in investigating the relationship between viewing time (in seconds) and ratings of aesthetic appreciation. Participants are asked to view a painting for as long as they like. Time (in seconds) is measured. After the viewing time, the researcher asks the participants to provide a ‘preference rating’ for the painting on a scale ranging from 1-10. Create a scatter plot depicting the following data:
Viewing Time (seconds)    Preference Rating
10                        3
12                        4
24                        7
5                         3
16                        6
3                         4
11                        4
5                         2
21                        8
23                        9
9                         5
3                         3
17                        5
14                        6
What does the scatter plot suggest about the relationship between viewing time and aesthetic preference? Is it accurate to state that longer viewing times are the result of greater preference for paintings? Explain. Submit your scatter plot and your answers to the questions to your instructor.
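A quick numerical check of the data above supports what the scatter plot suggests. This sketch uses numpy (not part of the assignment) to compute Pearson's r for the fourteen pairs:

```python
import numpy as np

# Viewing time (s) and preference rating pairs from the assignment table
time = np.array([10, 12, 24, 5, 16, 3, 11, 5, 21, 23, 9, 3, 17, 14])
rating = np.array([3, 4, 7, 3, 6, 4, 4, 2, 8, 9, 5, 3, 5, 6])

# Pearson correlation: strength of the linear association in the scatter plot
r = np.corrcoef(time, rating)[0, 1]
print(f"r = {r:.2f}")
```

The association is strongly positive, but note that the correlation alone cannot establish that longer viewing times are *the result of* greater preference; the causal direction could just as well run the other way, which is the point of the question.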
Hypothesis Testing. Inferential Statistics pt. 2 (John Labrador)
A hypothesis test is a statistical test that is used to determine whether there is enough evidence in a sample of data to infer that a certain condition is true for the entire population. A hypothesis test examines two opposing hypotheses about a population: the null hypothesis and the alternative hypothesis.
This document discusses hypothesis testing and the scientific method. It provides details on:
- The key steps of the scientific method including observation, formulation of a question, data collection, hypothesis testing, analysis and conclusion.
- The different types of hypotheses such as simple vs complex, directional vs non-directional, null vs alternative.
- The steps of hypothesis testing including stating the null and alternative hypotheses, using a test statistic, determining the p-value and significance level, and deciding whether to reject or fail to reject the null hypothesis.
- Examples are given to illustrate hypothesis testing and how the p-value is compared to the significance level to determine if the null hypothesis can be rejected.
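The steps listed above can be sketched end-to-end with a one-sample t-test; the sample values, null-hypothesis mean, and alpha below are hypothetical:

```python
import numpy as np
from scipy import stats

# Step 1: state the hypotheses.
#   H0: population mean = 100;  H1: mean != 100 (non-directional)
sample = np.array([104, 98, 110, 105, 99, 107, 102, 100, 111, 103])
alpha = 0.05

# Steps 2-3: compute the test statistic and its p-value
t_stat, p_value = stats.ttest_1samp(sample, popmean=100)

# Step 4: compare the p-value to the significance level and decide
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"t = {t_stat:.2f}, p = {p_value:.4f} -> {decision}")
```

Note the phrasing: a large p-value leads to "fail to reject H0", never "accept H0", matching the decision rule described in the document.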
This document provides an overview of key concepts in epidemiology and statistics as they relate to nutritional epidemiology. It discusses random error and how statistics are used to estimate effects and account for biases in epidemiologic studies. Specific topics covered include point estimates, confidence intervals, p-values, statistical hypothesis testing, selection bias, information bias, and confounding. Examples are provided to illustrate concepts like how selection bias can influence estimates of vaccine efficacy. The roles of statistics in estimating effects, accounting for biases, and assessing the role of chance in epidemiologic studies are also summarized.
D. Mayo: Replication Research Under an Error Statistical Philosophy (jemille6)
D. Mayo (Virginia Tech) slides from her talk June 3 at the "Preconference Workshop on Replication in the Sciences" at the 2015 Society for Philosophy and Psychology meeting.
This document discusses several key aspects of research and statistics:
1. It emphasizes that reliable evidence generally comes from multiple studies and research teams, and the totality of evidence matters most.
2. Statistics are useful for determining whether differences or associations are likely due to chance or represent real effects, but numbers can be easily manipulated.
3. Several types of biases and errors can influence research, decision making, memory, and social judgments. Rigorous methodology is important to produce high quality, accurate research and publications.
Causal inference is not statistical inference (jemille6)
Jon Williamson (University of Kent)
ABSTRACT: Many methods for testing causal claims are couched as statistical methods: e.g., randomised controlled trials, various kinds of observational study, meta-analysis, and model-based approaches such as structural equation modelling and graphical causal modelling. I argue that this is a mistake: causal inference is not a purely statistical problem. When we look at causal inference from a general point of view, we see that methods for causal inference fit into the framework of Evidential Pluralism: causal inference is properly understood as requiring mechanistic inference in addition to statistical inference.

Evidential Pluralism also offers a new perspective on the replication crisis. That observed associations are not replicated by subsequent studies is a part of normal science. A problem only arises when those associations are taken to establish causal claims: a science whose established causal claims are constantly overturned is indeed in crisis. However, if we understand causal inference as involving mechanistic inference alongside statistical inference, as Evidential Pluralism suggests, we avoid fallacious inferences from association to causation. Thus, Evidential Pluralism offers the means to prevent the drama of science from turning into a crisis.
Title: Homework statistics chapter 7
Name: Date:
Introduction: The most common applications of statistics are describing a set of data (descriptive statistics) and regression and hypothesis testing (inferential statistics). The two main branches of the field are descriptive and inferential statistics. People who do not have any formal training in statistics are more familiar with inferential statistics than with descriptive statistics.

Descriptive Statistics Definition
Descriptive statistics is the type of statistical analysis that describes the data in a meaningful way. It is used to characterize the important features of the data quantitatively. Descriptive statistics provides summaries of the given sample and of the observations made; these summaries or descriptions can be either graphical or quantitative.

Inferential Statistics Definition
Inferential statistics is the branch of statistics that deals with drawing conclusions: it makes inferences and predictions about a population from an analysis of a sample. Essentially, inferential statistics is the procedure of drawing conclusions and predictions from data that are subject to random variation, and it includes the detection and estimation of observational and sampling errors. It is used to make estimates and test hypotheses with the given data. There are two major divisions of inferential statistics: 1) Confidence interval: a confidence interval is reported as a range of plausible values for a parameter of the given population. 2) Hypothesis test: hypothesis tests, also known as tests of significance, evaluate a claim about the population by analyzing a sample. In this pape ...
This document provides an overview of key concepts in statistics, including hypothesis testing, null and alternative hypotheses, regression analysis, correlation, the exponential distribution, types of errors in hypothesis testing, central tendency, Bayes' theorem, Chebyshev's theorem, and simple random sampling. It defines these terms and provides examples to illustrate statistical concepts.
Answering More Questions with Provenance and Query Patterns (Bertram Ludäscher)
This document discusses using provenance information to improve transparency and reproducibility in research. It begins by asking questions about the input data, methods, and parameter settings used in a study in order to assess its reliability. It then provides examples of how workflow systems can capture provenance at both the design level (prospective provenance) and runtime level (retrospective provenance). These include a Kepler workflow that simulates X-ray data collection and provenance traces captured by DataONE. The document argues that provenance is a critical link between workflow modeling and runtime traces that can increase trust in research findings.
The document discusses key concepts in psychological science research methods. It covers the limits of intuition and common sense, the need for the scientific method in psychology, and various research techniques used including case studies, surveys, naturalistic observation, experiments, and statistical analysis. Experimental research involves manipulating independent variables, measuring dependent variables, and controlling for other factors. Statistical analysis allows researchers to describe patterns in data and make inferences about populations.
This document discusses hypothesis formulation in research. It defines a hypothesis as a statement about the relationship between two or more variables that is tested in a research study. A complete hypothesis includes the variables, population, and relationship between variables. There are different types of variables, populations, and relationships that can be included in a hypothesis. The document also outlines different types of hypotheses like simple, complex, directional, and non-directional and discusses how to properly formulate a hypothesis. Formulating a good hypothesis is important as it provides focus, direction, and guides the research process.
The document discusses hypothesis testing and the scientific research process. It begins by defining a hypothesis as a tentative statement about the relationship between two or more variables that can be tested. It then outlines the typical steps in the scientific research process, which includes forming a question, background research, creating a hypothesis, experiment design, data collection, analysis, conclusions, and communicating results. Finally, it provides details on characteristics of a strong hypothesis, the process of hypothesis testing through statistical analysis, and setting up an experiment for hypothesis testing, including defining hypotheses, significance levels, sample size determination, and calculating standard deviation.
Are most positive findings in psychology false or exaggerated? An activist's ... (James Coyne)
This document summarizes a presentation given by James Coyne on issues with reliability and bias in positive psychology findings. Some key points:
- John Ioannidis and others have shown that many positive findings in biomedical research do not replicate and are exaggerated or false due to biases.
- Similar issues exist in psychology due to confirmatory bias, flexible data analysis and chasing statistical significance.
- Reforms are needed like pre-registering studies, transparent reporting standards, and making data available for independent analysis.
- However, challenges remain as journals prefer positive results and organizations have conflicts of interest that uphold certain findings. Overall, skepticism is needed regarding many claimed research findings.
Lecture on causal inference to the pediatric hematology/oncology fellows at Texas Children's hospital as part of their Biostatistics for Busy Clinicians lecture seriers.
This document provides an overview of key statistical analysis techniques used in research methods, including descriptive statistics, validity testing, reliability testing, hypothesis testing, and techniques for comparing means such as t-tests and ANOVA. Descriptive statistics like mean and standard deviation are used to summarize variables measured on interval/ratio scales, while frequency and percentage summarize nominal/ordinal scales. Validity is assessed through exploratory factor analysis (EFA) to establish underlying dimensions. Reliability is measured using Cronbach's alpha. Hypothesis testing involves stating null and alternative hypotheses and making decisions based on statistical tests and p-values. T-tests compare two means and ANOVA compares three or more means, both assuming equal variances based on Levene
How to combine results from randomised clinical trials on the additive scale with real world data to provide predictions on the clinically relevant scale for individual patients
The document defines key concepts in hypothesis testing such as critical value, significance level, p-value, type I and type II errors, and power. It states that the critical value divides the normal distribution into regions for rejecting or failing to reject the null hypothesis. The significance level corresponds to the critical region. A p-value less than 0.05 indicates the result is statistically significant. Type I error occurs when the null hypothesis is rejected when it is true, while type II error is failing to reject a false null hypothesis. Power is defined as 1 - β, where β is the probability of a type II error.
Slides given for Deborah G. Mayo talk at Minnesota Center for Philosophy of Science at University of Minnesota on the ASA 2016 statement on P-values and Error Statistics
1) The document discusses statistical significance and hypothesis testing. It explains that statistical significance is used to determine the probability that a observed relationship is due to chance rather than a true relationship between variables.
2) It outlines the steps in testing for statistical significance which include stating the research and null hypotheses, selecting an alpha level, selecting and computing a statistical test, and interpreting the results.
3) An example is provided of using the Chi Square test to analyze the relationship between type of training program and job placement success, and interpreting the results of the Chi Square test based on the alpha level and degrees of freedom.
Page 266LEARNING OBJECTIVES· Explain how researchers use inf.docxkarlhennesey
Page 266
LEARNING OBJECTIVES
· Explain how researchers use inferential statistics to evaluate sample data.
· Distinguish between the null hypothesis and the research hypothesis.
· Discuss probability in statistical inference, including the meaning of statistical significance.
· Describe the t test and explain the difference between one-tailed and two-tailed tests.
· Describe the F test, including systematic variance and error variance.
· Describe what a confidence interval tells you about your data.
· Distinguish between Type I and Type II errors.
· Discuss the factors that influence the probability of a Type II error.
· Discuss the reasons a researcher may obtain nonsignificant results.
· Define power of a statistical test.
· Describe the criteria for selecting an appropriate statistical test.
Page 267IN THE PREVIOUS CHAPTER, WE EXAMINED WAYS OF DESCRIBING THE RESULTS OF A STUDY USING DESCRIPTIVE STATISTICS AND A VARIETY OF GRAPHING TECHNIQUES. In addition to descriptive statistics, researchers use inferential statistics to draw more general conclusions about their data. In short, inferential statistics allow researchers to (a) assess just how confident they are that their results reflect what is true in the larger population and (b) assess the likelihood that their findings would still occur if their study was repeated over and over. In this chapter, we examine methods for doing so.
SAMPLES AND POPULATIONS
Inferential statistics are necessary because the results of a given study are based only on data obtained from a single sample of research participants. Researchers rarely, if ever, study entire populations; their findings are based on sample data. In addition to describing the sample data, we want to make statements about populations. Would the results hold up if the experiment were conducted repeatedly, each time with a new sample?
In the hypothetical experiment described in Chapter 12 (see Table 12.1), mean aggression scores were obtained in model and no-model conditions. These means are different: Children who observe an aggressive model subsequently behave more aggressively than children who do not see the model. Inferential statistics are used to determine whether the results match what would happen if we were to conduct the experiment again and again with multiple samples. In essence, we are asking whether we can infer that the difference in the sample means shown in Table 12.1 reflects a true difference in the population means.
Recall our discussion of this issue in Chapter 7 on the topic of survey data. A sample of people in your state might tell you that 57% prefer the Democratic candidate for an office and that 43% favor the Republican candidate. The report then says that these results are accurate to within 3 percentage points, with a 95% confidence level. This means that the researchers are very (95%) confident that, if they were able to study the entire population rather than a sample, the actual percentage who preferred th ...
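The 3-point margin of error quoted above can be reproduced with the standard formula for a proportion's confidence interval. The sketch below assumes a hypothetical sample size of n = 1067 (a common size for a +/-3% poll; the excerpt does not state the actual n):

```python
import math

# Reproducing the quoted "accurate to within 3 percentage points, 95% confidence".
# n = 1067 is an assumption for illustration, not a figure from the text.
p_hat = 0.57   # sample proportion preferring the Democratic candidate
n = 1067       # assumed number of respondents
z = 1.96       # critical value for a 95% confidence level

# margin of error = z * standard error of the sample proportion
margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
print(f"95% CI: {p_hat - margin:.3f} to {p_hat + margin:.3f}")
print(f"margin of error: {margin * 100:.1f} percentage points")
```

With these assumed numbers the margin comes out at roughly 3 percentage points, matching the reported precision.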
University of Phoenix Material
Correlation (PSYCH/610 Version 2)
A researcher is interested in investigating the relationship between viewing time (in seconds) and ratings of aesthetic appreciation. Participants are asked to view a painting for as long as they like, and time (in seconds) is measured. After the viewing time, the researcher asks the participants to provide a 'preference rating' for the painting on a scale ranging from 1 to 10. Create a scatter plot depicting the following data:
Viewing Time in Seconds    Preference Rating
10                         3
12                         4
24                         7
 5                         3
16                         6
 3                         4
11                         4
 5                         2
21                         8
23                         9
 9                         5
 3                         3
17                         5
14                         6
What does the scatter plot suggest about the relationship between viewing time and aesthetic preference? Is it accurate to state that longer viewing times are the result of greater preference for paintings? Explain. Submit your scatter plot and your answers to the questions to your instructor.
Hypothesis Testing. Inferential Statistics pt. 2 (John Labrador)
A hypothesis test is a statistical test that is used to determine whether there is enough evidence in a sample of data to infer that a certain condition is true for the entire population. A hypothesis test examines two opposing hypotheses about a population: the null hypothesis and the alternative hypothesis.
This document discusses hypothesis testing and the scientific method. It provides details on:
- The key steps of the scientific method including observation, formulation of a question, data collection, hypothesis testing, analysis and conclusion.
- The different types of hypotheses such as simple vs complex, directional vs non-directional, null vs alternative.
- The steps of hypothesis testing including stating the null and alternative hypotheses, using a test statistic, determining the p-value and significance level, and deciding whether to reject or fail to reject the null hypothesis.
- Examples are given to illustrate hypothesis testing and how the p-value is compared to the significance level to determine if the null hypothesis can be rejected.
This document provides an overview of key concepts in epidemiology and statistics as they relate to nutritional epidemiology. It discusses random error and how statistics are used to estimate effects and account for biases in epidemiologic studies. Specific topics covered include point estimates, confidence intervals, p-values, statistical hypothesis testing, selection bias, information bias, and confounding. Examples are provided to illustrate concepts like how selection bias can influence estimates of vaccine efficacy. The roles of statistics in estimating effects, accounting for biases, and assessing the role of chance in epidemiologic studies are also summarized.
D. Mayo: Replication Research Under an Error Statistical Philosophy jemille6
D. Mayo (Virginia Tech) slides from her talk June 3 at the "Preconference Workshop on Replication in the Sciences" at the 2015 Society for Philosophy and Psychology meeting.
This document discusses several key aspects of research and statistics:
1. It emphasizes that reliable evidence generally comes from multiple studies and research teams, and the totality of evidence matters most.
2. Statistics are useful for determining whether differences or associations are likely due to chance or represent real effects, but numbers can be easily manipulated.
3. Several types of biases and errors can influence research, decision making, memory, and social judgments. Rigorous methodology is important to produce high quality, accurate research and publications.
Causal inference is not statistical inference (jemille6)
Jon Williamson (University of Kent)
ABSTRACT: Many methods for testing causal claims are couched as statistical methods: e.g.,
randomised controlled trials, various kinds of observational study, meta-analysis, and
model-based approaches such as structural equation modelling and graphical causal
modelling. I argue that this is a mistake: causal inference is not a purely statistical
problem. When we look at causal inference from a general point of view, we see that
methods for causal inference fit into the framework of Evidential Pluralism: causal
inference is properly understood as requiring mechanistic inference in addition to
statistical inference.
Evidential Pluralism also offers a new perspective on the replication crisis. That
observed associations are not replicated by subsequent studies is a part of normal
science. A problem only arises when those associations are taken to establish causal
claims: a science whose established causal claims are constantly overturned is indeed
in crisis. However, if we understand causal inference as involving mechanistic inference
alongside statistical inference, as Evidential Pluralism suggests, we avoid fallacious
inferences from association to causation. Thus, Evidential Pluralism offers the means to
prevent the drama of science from turning into a crisis.
Homework statistics
Title: Homework statistics chapter 7
Introduction: The most common applications of statistics are describing a set of data (descriptive statistics) and regression and hypothesis testing (inferential statistics). The two main branches are descriptive and inferential statistics. People who have no formal training in statistics are more familiar with descriptive statistics than with inferential statistics.
Descriptive Statistics Definition
Descriptive statistics is the type of statistical analysis that describes the data in a meaningful way. It is used to describe the important features of the data quantitatively, and it provides summaries of the given sample and of the observations made. These summaries or descriptions can be either graphical or quantitative.
Inferential Statistics Definition
Inferential statistics is the type of statistics that deals with drawing conclusions: it makes inferences and predictions about a population by analyzing a sample. Basically, inferential statistics is the procedure of drawing predictions and conclusions from data that are subject to random variation, including observational and sampling errors. This type of statistics is used to make estimates and to test hypotheses with the given data. There are two major divisions of inferential statistics:
1) Confidence interval: an interval that provides a range of plausible values for a parameter of the given population.
2) Hypothesis test: hypothesis tests, also known as tests of significance, test a claim about the population by analyzing a sample. In this pape ...
This document provides an overview of key concepts in statistics, including hypothesis testing, null and alternative hypotheses, regression analysis, correlation, the exponential distribution, types of errors in hypothesis testing, central tendency, Bayes' theorem, Chebyshev's theorem, and simple random sampling. It defines these terms and provides examples to illustrate statistical concepts.
Answering More Questions with Provenance and Query Patterns (Bertram Ludäscher)
This document discusses using provenance information to improve transparency and reproducibility in research. It begins by asking questions about the input data, methods, and parameter settings used in a study in order to assess its reliability. It then provides examples of how workflow systems can capture provenance at both the design level (prospective provenance) and runtime level (retrospective provenance). These include a Kepler workflow that simulates X-ray data collection and provenance traces captured by DataONE. The document argues that provenance is a critical link between workflow modeling and runtime traces that can increase trust in research findings.
The document discusses key concepts in psychological science research methods. It covers the limits of intuition and common sense, the need for the scientific method in psychology, and various research techniques used including case studies, surveys, naturalistic observation, experiments, and statistical analysis. Experimental research involves manipulating independent variables, measuring dependent variables, and controlling for other factors. Statistical analysis allows researchers to describe patterns in data and make inferences about populations.
This document discusses hypothesis formulation in research. It defines a hypothesis as a statement about the relationship between two or more variables that is tested in a research study. A complete hypothesis includes the variables, population, and relationship between variables. There are different types of variables, populations, and relationships that can be included in a hypothesis. The document also outlines different types of hypotheses like simple, complex, directional, and non-directional and discusses how to properly formulate a hypothesis. Formulating a good hypothesis is important as it provides focus, direction, and guides the research process.
The document discusses hypothesis testing and the scientific research process. It begins by defining a hypothesis as a tentative statement about the relationship between two or more variables that can be tested. It then outlines the typical steps in the scientific research process, which includes forming a question, background research, creating a hypothesis, experiment design, data collection, analysis, conclusions, and communicating results. Finally, it provides details on characteristics of a strong hypothesis, the process of hypothesis testing through statistical analysis, and setting up an experiment for hypothesis testing, including defining hypotheses, significance levels, sample size determination, and calculating standard deviation.
Are most positive findings in psychology false or exaggerated? An activist's ... (James Coyne)
This document summarizes a presentation given by James Coyne on issues with reliability and bias in positive psychology findings. Some key points:
- John Ioannidis and others have shown that many positive findings in biomedical research do not replicate and are exaggerated or false due to biases.
- Similar issues exist in psychology due to confirmatory bias, flexible data analysis and chasing statistical significance.
- Reforms are needed like pre-registering studies, transparent reporting standards, and making data available for independent analysis.
- However, challenges remain as journals prefer positive results and organizations have conflicts of interest that uphold certain findings. Overall, skepticism is needed regarding many claimed research findings.
Common Statistical Concerns in Clinical Trials (Clin Plus)
Statistics are a major part of clinical trials. This article breaks down how they are used, and things that people think about when recording statistical data.
Hypothesis testing involves making tentative assumptions about population parameters or distributions, called null hypotheses (H0). Alternative hypotheses (Ha) are also defined. Sample data is used to determine if H0 can be rejected. If rejected, the conclusion is that Ha is true. There are two types of errors that can occur - type I errors when a true H0 is rejected, and type II errors when a false H0 is not rejected. The significance level and power aim to control these errors. One-tailed and two-tailed tests look at relationships between variables in different ways.
Dichotomania and other challenges for the collaborating biostatistician (Laure Wynants)
Conference presentation at ISCB 41 in the session
"Biostatistical inference in practice: moving beyond false
dichotomies"
A comment in Nature, signed by over 800 researchers, called for the scientific community to "retire statistical significance". The responses included a call to halt the use of the term "statistically significant", and changes in journals' author guidelines. The leading discourse among statisticians is that inadequate statistical training of clinical researchers and publishing practices are to blame for the misuse of statistical testing. In this presentation, we search our collective conscience by reviewing ethical guidelines for statisticians in light of the p-value crisis, examine what this implies for us when conducting analyses in collaborative work and teaching, and ask whether the ATOM principles (accept uncertainty; be thoughtful, open and modest) can guide us.
Temporal learning analytics in learning design (Quan Nguyen)
Learning analytics has the potential to make the temporal dimensions of learning processes more visible using fine-grained proxies of how and when students engage with online learning activities. In this talk, Quan Nguyen will demonstrate the extent to which students actually follow the course timeline and the subsequent effect on their academic performance
Linking students' timing of engagement to learning design and academic perfor... (Quan Nguyen)
[Best Full Research Paper Award]
Linking students' timing of engagement to learning design and academic performance
Quan Nguyen, Michal Huptych, Bart Rienties
Presented at the 8th International Conference on Learning Analytics & Knowledge, Sydney, Australia
Are we driving blind-folded? A longitudinal study of learning design, engagem... (Quan Nguyen)
In this research, we advocate a shift from 'static' to 'dynamic' learning design by aligning with data generated from students during their learning process in virtual learning environment. We reveal the inconsistency in how teachers design their course, yet a consistent dropout pattern in the same module over different semesters.
Unravelling the dynamics of instructional practice: A longitudinal study on l... (Quan Nguyen)
Substantial progress has been made in understanding how teachers design for learning. However, there remains a paucity of evidence on how students actually respond to learning designs. Learning analytics has the power to provide just-in-time support, especially when predictive analytics is married with the way teachers have designed their course, the so-called learning design. This study investigates how learning designs are configured over time and their impact on student activities by analyzing longitudinal data from 38 modules with a total of 43,099 registered students over 30 weeks at the Open University UK, using social network analysis and panel data analysis. Our analysis unpacked dynamic configurations of learning designs between modules over time, which allows teachers to reflect on their practice in order to anticipate problems and make informed interventions. Furthermore, by controlling for the heterogeneity between modules, our results indicated that learning designs were able to explain up to 60% of the variability in student online activities, which reinforces the importance of pedagogical context in learning analytics.
Debunk bullshit in statistics QN
1. Debunk bullshit in statistics
@QuanNguyen3010
Quan Nguyen
Institute of Educational Technology
Open University UK
Misuse, misrepresentations, and misinterpretations
of statistics in social science and beyond
2. What is bullshit (in academia)?
In 5 minutes, discuss with your peers:
1. Bullshit that you produced yourself
2. Bullshit that you are exposed to
3. Bullshit that you debunked or try to debunk
Source: http://callingbullshit.org/exercises_inventory.html
3. Bullshit vs lying
The liar knows and cares about the truth, but deliberately sets out to mislead instead of telling the truth.
The "bullshitter", on the other hand, does not care about the truth and is only seeking to impress.
Source: Frankfurt, H. G. (2009). On bullshit. Princeton University Press.
4. Academic writing bullshit
• Can you translate this into plain English?
"Methodological observation of the sociometrical behavior tendencies of prematurated isolates indicates that a causal relationship exists between groundward tropism and lachrymatory, or 'crying,' behavior forms."
= Children cry when they fall down
Source: Eubanks, P., & Schaeffer, J. D. (2008). A kind word for bullshit: The problem of academic writing. College Composition and Communication, 372-388.
6. Misuse 1: p-value
True or False?
A) p<0.05 so the effect is significant
B) p>0.05 so the effect is nonsignificant
C) p-value measures the probability that the
studied hypothesis is true
D) p-value measures the probability that the data
were produced by random chance alone
E) p-value measures the probability that the null
hypothesis is true
8. Misuse 1: p-value
What is a p-value?
The probability of obtaining a result equal to, or more extreme than, the one actually observed, given that the null hypothesis is true.
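The definition can be made concrete with a small permutation test: under the null hypothesis the group labels are exchangeable, so the p-value is the share of relabelings that produce a difference at least as extreme as the observed one. A minimal sketch with fabricated scores (all numbers are illustrative, not from any study):

```python
import random
from statistics import mean

random.seed(1)

# Two made-up groups of six scores each (fabricated for illustration).
group_a = [5, 7, 6, 8, 9, 7]
group_b = [4, 5, 3, 6, 5, 4]

observed = mean(group_a) - mean(group_b)
pooled = group_a + group_b

# Under H0 the labels are exchangeable: reshuffle them many times and
# count how often the shuffled difference is as extreme as the observed one.
extreme = 0
trials = 10_000
for _ in range(trials):
    random.shuffle(pooled)
    diff = mean(pooled[:6]) - mean(pooled[6:])
    if abs(diff) >= abs(observed):
        extreme += 1

p_value = extreme / trials
print(f"observed difference: {observed:.2f}, p ≈ {p_value:.3f}")
```

Note that the p-value is computed entirely under the assumption that the null is true; it says nothing about the probability of the null itself.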
9. Misuse 1: p-value
1. P-values can indicate how incompatible the data are with a
specified statistical model.
2. P-values do not measure the probability that the studied hypothesis
is true, or the probability that the data were produced by random
chance alone.
3. A p-value, or statistical significance, does not measure the size of an
effect or the importance of a result.
4. …
5. …
6. …
Wasserstein, R. L., & Lazar, N. A. (2016). The ASA's Statement on p-Values: Context, Process, and Purpose.
The American Statistician, 70(2), 129-133.
10. Misuse 2: p-hacking
Prof. Charles Goodhart
"When a measure becomes a target, it
ceases to be a good measure."
11. Misuse 2: p-hacking
An ultimate guide to p-hacking
1. Stop collecting data once p<.05
2. Analyze many measures, but report only those with p<.05.
3. Collect and analyze many conditions, but only report those with p<.05.
4. Use covariates to get p<.05.
5. Exclude participants to get p<.05.
6. Transform the data to get p<.05.
• Motulsky, H. J. (2014). Common misconceptions about data analysis and statistics. Naunyn-Schmiedeberg's Archives of Pharmacology, 387(11), 1017–1023.
• Head ML, Holman L, Lanfear R, Kahn AT, Jennions MD (2015) The Extent and Consequences of P-Hacking in Science. PLOS Biology 13(3): e1002106.
• https://bitssblog.files.wordpress.com/2014/02/nelson-presentation.pdf
• http://freakonometrics.hypotheses.org/19817
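The first trick above (optional stopping) can be demonstrated by simulation: draw from a null distribution with zero true effect, test after every batch, and stop at the first p < .05. The long-run false-positive rate climbs well above the nominal 5%. A minimal sketch using a z-test with known variance (an illustrative simplification; batch size and maximum n are arbitrary choices):

```python
import math
import random

random.seed(0)

def two_sided_p(sample):
    """Two-sided z-test against mean 0 with known sigma = 1 (a shortcut for illustration)."""
    n = len(sample)
    z = (sum(sample) / n) * math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))

def peeking_experiment(max_n=100, batch=10):
    """Collect in batches, peek at p after each batch, stop as soon as p < .05."""
    data = []
    while len(data) < max_n:
        data.extend(random.gauss(0, 1) for _ in range(batch))
        if two_sided_p(data) < 0.05:
            return True   # "significant" despite a true effect of zero
    return False

runs = 2000
false_positives = sum(peeking_experiment() for _ in range(runs)) / runs
print(f"false-positive rate with optional stopping: {false_positives:.3f}")
```

With ten peeks per experiment the observed rate lands well above 0.05, even though every dataset was generated under the null.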
12. Misuse 2: p-hacking
Simonsohn, U., Nelson, L. D., & Simmons, J. P. (2014). P-curve: A key to the file-drawer. Journal of Experimental Psychology: General, 143(2), 534-547.
Publication bias => File drawer effect & P-hacking
13. Misuse 3: Linear regression
True or False?
A) Independent/Dependent variables must be normally distributed
B) The higher the R2, the better model fit
C) Standard error measures variability
• Ernst, A. F., & Albers, C. J. (2017). Regression assumptions in clinical psychology research practice—a systematic review of common
misconceptions. PeerJ, 5, e3323.
• Williams, Matt N., Grajales, Carlos Alberto Gómez, & Kurkiewicz, Dason (2013). Assumptions of Multiple Regression: Correcting Two
Misconceptions. Practical Assessment, Research & Evaluation, 18(11).
• Altman, D. G., & Bland, J. M. (2005). Standard deviations and standard errors. BMJ : British Medical Journal, 331(7521), 903.
14. Misuse 3: Linear regression
Independent/Dependent variables must be
normally distributed?
Nope, it’s the residuals (difference between
predicted and observed values) that should
be normally distributed
• Ernst, A. F., & Albers, C. J. (2017). Regression assumptions in clinical psychology research practice—a systematic review of common
misconceptions. PeerJ, 5, e3323.
• Williams, Matt N., Grajales, Carlos Alberto Gómez, & Kurkiewicz, Dason (2013). Assumptions of Multiple Regression: Correcting Two
Misconceptions. Practical Assessment, Research & Evaluation, 18(11).
• Altman, D. G., & Bland, J. M. (2005). Standard deviations and standard errors. BMJ : British Medical Journal, 331(7521), 903.
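A quick simulation illustrates the point: the sketch below (toy data, least squares computed by hand) uses a heavily skewed predictor, yet the regression is unproblematic because the errors, and hence the residuals, are normally distributed:

```python
import random
from statistics import mean

random.seed(42)

# Skewed (exponential) predictor, but normal errors: y = 2x + 1 + N(0, 0.5).
x = [random.expovariate(1.0) for _ in range(500)]
y = [2 * xi + 1 + random.gauss(0, 0.5) for xi in x]

# Ordinary least squares by hand: slope = cov(x, y) / var(x).
mx, my = mean(x), mean(y)
sxx = sum((xi - mx) ** 2 for xi in x)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = my - slope * mx
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print(f"slope ≈ {slope:.2f}, intercept ≈ {intercept:.2f}")  # near the true 2 and 1
print(f"residual mean ≈ {mean(residuals):.6f}")             # exactly zero up to float error
```

Despite the strongly non-normal predictor, the estimates recover the true coefficients; it is the residuals whose distribution matters for the normality assumption.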
15. Misuse 3: Linear regression
• http://blog.minitab.com/blog/adventures-in-statistics-2/regression-analysis-how-do-i-interpret-r-squared-and-assess-the-goodness-of-fit
You can have a low R-squared value for a good model, or a high R-squared value for a model that does not fit the data!
16. Misuse 3: Linear regression
• https://onlinecourses.science.psu.edu/stat501/node/258
• http://www.stat.cmu.edu/~cshalizi/mreg/15/lectures/10/lecture-10.pdf
The coefficient of determination R2 and the correlation coefficient r quantify the strength of a linear relationship. It is possible that R2 = 0% and r = 0, suggesting there is no linear relation between x and y, and yet a perfect curved (or "curvilinear") relationship exists.
17. Misuse 3: Linear regression
• http://www.stat.cmu.edu/~cshalizi/mreg/15/lectures/10/lecture-10.pdf
R2 is also pretty useless as a measure of predictability.
• R2 says nothing about prediction error
• R2 says nothing about interval forecasts
18. Misuse 3: Linear regression
• http://www.stat.cmu.edu/~cshalizi/mreg/15/lectures/10/lecture-10.pdf
R2 cannot be compared across data sets
R2 cannot be compared between a model with untransformed Y and
one with transformed Y , or between different transformations of Y
The one situation where R2 can be compared is when different models
are fit to the same data set with the same, untransformed response
variable
19. Misuse 4: Parametric & non-parametric test
True or False?
A) You should use nonparametric tests when your data don’t meet the
assumptions of the parametric test (e.g. normality)
• http://blog.minitab.com/blog/adventures-in-statistics-2/choosing-between-a-nonparametric-test-and-a-parametric-test
20. Misuse 4: Parametric & non-parametric test
• Parametric tests can provide trustworthy results with distributions
that are skewed and nonnormal
• Parametric tests can provide trustworthy results when the groups
have different amounts of variability
• Parametric tests have greater statistical power
• http://blog.minitab.com/blog/adventures-in-statistics-2/choosing-between-a-nonparametric-test-and-a-parametric-test
22. Misinterpret 1: Correlation & causation
[Scatterplot: Wage (y-axis) vs Years of Education (x-axis)]
True or False?
A) Higher years of education lead to higher wage
B) The increase in years of edu is associated with higher wage
What could be the alternative explanations?
Check out: http://tylervigen.com/spurious-correlations
23. Misinterpret 1: Correlation & causation
[Diagram: Edu and Wage, with Crime as a third variable]
A causes B (direct causation)
B causes A (reverse causation)
A and B are consequences of a common cause, but do not cause each other
A and B both cause C, which is (explicitly or implicitly) conditioned on
A causes B and B causes A (bidirectional or cyclic causation)
A causes C which causes B (indirect causation)
The correlation is a coincidence
24. Misinterpret 1: Correlation & causation
• So what implies causation?
1. Strength
2. Consistency
3. Specificity
4. Temporality
5. Gradient
6. Plausibility
7. Coherence
8. Experimental evidence
9. Analogy
Source: Hill, Austin Bradford (1965). "The Environment and Disease: Association or
Causation?". Proceedings of the Royal Society of Medicine. 58 (5): 295–300.
32. Misrepresent 6: Odd choice of binning
Source: https://flowingdata.com/2017/02/09/how-to-spot-visualization-lies/
33. Misrepresent 7: Area dimension
Source: https://flowingdata.com/2017/02/09/how-to-spot-visualization-lies/
34. Misrepresent 7: Area dimension
Source: https://flowingdata.com/2017/02/09/how-to-spot-visualization-lies/
35. Being critical vs. an asshole
Do you like the conclusion implied by the research?
• YES → "This is a major contribution of unparalleled rigor"
• NO → Is the research based on correlation analysis?
    • YES → "Correlation does not imply causation, duh"
    • NO → Is the research based on regression analysis?
        • YES → Did the research control for confounding factors?
            • NO → "The results could be explained by other unobservable factors"
            • YES → "The phenomenon is too complex to be represented in numbers; further qualitative research is needed"
Adapted from: https://www.washingtonpost.com/news/wonk/wp/2013/09/12/how-to-argue-with-research-you-dont-like/
36. Moving forward
Stats producers
• Take time to understand your data and the assumptions and limitations of your statistical test
• Don't try to shortcut stats
• Describe your method (make it replicable)
• Report results AND limitations
• Use simple yet precise language
• Visualize responsibly
• Consult statisticians if not sure
Stats receivers
• Take time to understand the data source, context, design, and the assumptions and limitations of the statistical test
• Too good to be true => be more sceptical
• Interpret results AND limitations WITHIN the method (a.k.a. don't be an asshole)
• Don't oversimplify your use of language
• Be aware that visualizations are simplified versions of the data
• Consult statisticians if not sure, and pay them…
40. References organized by topics
• Bullshit in academia
1. Cohen, G. A. (2012). Chapter 5. Complete Bullshit. In: Finding Oneself in the Other. Princeton University Press. pp. 94-114.
2. Eubanks, P., & Schaeffer, J. D. (2008). A kind word for bullshit: The problem of academic writing. College Composition and Communication, 372-388.
3. Frankfurt, H. G. (2009). On bullshit. Princeton University Press.
4. http://callingbullshit.org/exercises_inventory.html
• P-hacking & Misconceptions of p-value
1. Wasserstein, R. L., & Lazar, N. A. (2016). The ASA's Statement on p-Values: Context, Process, and Purpose. The American Statistician, 70(2), 129-133.
2. Motulsky, H. J. (2014). Common misconceptions about data analysis and statistics. Naunyn-Schmiedeberg’s Archives of Pharmacology, 387(11), 1017–1023.
3. Head, M. L., Holman, L., Lanfear, R., Kahn, A. T., & Jennions, M. D. (2015). The extent and consequences of p-hacking in science. PLOS Biology, 13(3), e1002106.
4. https://bitssblog.files.wordpress.com/2014/02/nelson-presentation.pdf
5. http://freakonometrics.hypotheses.org/19817
6. Simonsohn, U., Nelson, L. D., & Simmons, J. P. (2014). P-curve: A key to the file-drawer. Journal of Experimental Psychology: General, 143(2), 534-547.
• Misconceptions of normality assumption, R-squared, and non-parametric test
1. Ernst, A. F., & Albers, C. J. (2017). Regression assumptions in clinical psychology research practice—a systematic review of common misconceptions. PeerJ, 5, e3323.
2. Williams, M. N., Grajales, C. A. G., & Kurkiewicz, D. (2013). Assumptions of multiple regression: Correcting two misconceptions. Practical Assessment, Research & Evaluation, 18(11).
3. Altman, D. G., & Bland, J. M. (2005). Standard deviations and standard errors. BMJ : British Medical Journal, 331(7521), 903.
4. http://blog.minitab.com/blog/adventures-in-statistics-2/regression-analysis-how-do-i-interpret-r-squared-and-assess-the-goodness-of-fit
5. https://onlinecourses.science.psu.edu/stat501/node/258
6. http://www.stat.cmu.edu/~cshalizi/mreg/15/lectures/10/lecture-10.pdf
7. http://blog.minitab.com/blog/adventures-in-statistics-2/choosing-between-a-nonparametric-test-and-a-parametric-test
• Criteria for causation inference
1. Hill, Austin Bradford (1965). "The Environment and Disease: Association or Causation?". Proceedings of the Royal Society of Medicine. 58 (5): 295–300.
• Misleading visualizations
1. https://flowingdata.com/2017/02/09/how-to-spot-visualization-lies/
2. https://proteinpower.com/drmike/2013/12/30/absolute-risk-versus-relative-risk-need-know-difference/
3. https://www.washingtonpost.com/news/wonk/wp/2013/09/12/how-to-argue-with-research-you-dont-like/