INTRO
PROBABILITY
BILL HOWE
UNIVERSITY OF WASHINGTON
1
HOW DIFFERENT IS DIFFERENT?
How do we know the difference in two treatments is
not just due to chance?
We don’t. But we can calculate the odds that it is.
This is the p-value
In repeated experiments at this sample size, how
often would you see a result at least this extreme
assuming the null hypothesis?
2
P-VALUE
If the test is two-sided:
• P-value = 2 * P( X > |observed value| )
If the test is one-sided:
• HA is µ > µ0
• P-value = P( X > observed value )
• HA is µ < µ0
• P-value = P( X < observed value )
3
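To make the definitions concrete, here is a minimal sketch (not from the slides) of computing one- and two-sided p-values for a standardized test statistic, assuming a standard normal null distribution:

```python
from scipy import stats

z = 2.1  # hypothetical observed test statistic (standardized)

# One-sided tests against the null distribution N(0, 1)
p_greater = stats.norm.sf(z)   # HA: mu > mu0
p_less = stats.norm.cdf(z)     # HA: mu < mu0

# Two-sided test: probability of a result at least this extreme in either direction
p_two_sided = 2 * stats.norm.sf(abs(z))

print(p_greater, p_less, p_two_sided)
```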
4
5
JONAH LEHRER, 2010, THE NEW YORKER
THE TRUTH WEARS OFF
6
http://www.newyorker.com/reporting/
2010/12/13/101213fa_fact_lehrer
Anders Pape Møller, 1991
“female barn swallows were far more likely to mate with male birds that had long,
symmetrical feathers”
“Between 1992 and 1997, the average effect size shrank by eighty per cent.”
Joseph Rhine, 1930s, coiner of the term extrasensory perception
Tested individuals with card-guessing experiments. A few students achieved
multiple low-probability streaks. But there was a “decline effect” – their
performance became worse over time.
John Davis, University of Illinois
“Davis has a forthcoming analysis demonstrating that the efficacy of
antidepressants has gone down as much as threefold in recent decades.”
Jonathan Schooler, 1990
“subjects shown a face and asked to describe it were much less likely to
recognize the face when shown it later than those who had simply looked at
it.”
The effect became increasingly difficult to measure.
REASON 1: PUBLICATION BIAS
7
“In the last few years, several meta-analyses have reappraised the
efficacy and safety of antidepressants and concluded that the therapeutic
value of these drugs may have been significantly overestimated.”
Publication bias: What are the challenges and can they be overcome?
Ridha Joober, Norbert Schmitz, Lawrence Annable, and Patricia Boksa
J Psychiatry Neurosci. 2012 May; 37(3): 149–152. doi: 10.1503/jpn.120065
“Although publication bias has been documented in the literature
for decades and its origins and consequences debated extensively,
there is evidence suggesting that this bias is increasing.”
“A case in point is the field of biomedical research in autism spectrum
disorder (ASD), which suggests that in some areas negative results
are completely absent”
“… a highly significant correlation (R2= 0.13, p < 0.001) between impact
factor and overestimation of effect sizes has been reported.”
PUBLICATION BIAS (2)
8
“decline effect”
9
“decline effect” = publication bias!
BACKGROUND: EFFECT SIZE
Expressed in relevant units
Not just “significant” – how significant?
Used prolifically in meta-analysis to combine results from
multiple studies
•  But be careful – averaging results from different experiments can produce
nonsense
10
Robert Coe, 2002, Annual Conference of the British Educational Research Association
It's the Effect Size, Stupid: What effect size is and why it is important.
Effect size = ([Mean of experimental group] – [Mean of control group]) / standard deviation
Caveat: Other definitions of effect size exist: odds ratio, correlation coefficient
EFFECT SIZE
Standardized Mean Difference
11
ES = (X̄1 − X̄2) / σ_pooled
There are lots of ways to estimate the pooled standard deviation σ_pooled
(e.g., Hartung et al., 2008; Glass, 1976).
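A minimal sketch (not from the slides) of computing the standardized mean difference from two samples. The pooled standard deviation below is one common variant (weighted by degrees of freedom); as noted above, many other estimators exist, and the sample data here are hypothetical:

```python
import numpy as np

def standardized_mean_difference(x1, x2):
    """Effect size ES = (mean(x1) - mean(x2)) / pooled standard deviation."""
    x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
    n1, n2 = len(x1), len(x2)
    s1, s2 = x1.var(ddof=1), x2.var(ddof=1)
    # One common pooled-SD estimator (degrees-of-freedom weighted)
    pooled_sd = np.sqrt(((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2))
    return (x1.mean() - x2.mean()) / pooled_sd

# Hypothetical treatment/control samples
rng = np.random.default_rng(0)
treatment = rng.normal(0.5, 1.0, size=40)
control = rng.normal(0.0, 1.0, size=40)
print(standardized_mean_difference(treatment, control))
```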
META-ANALYSIS
1978: Gene V. Glass statistically aggregated the findings of 375
psychotherapy outcome studies. Glass (and his colleague Smith) set out
to disprove the claim that psychotherapy was useless.
Glass coined the term “meta-analysis”
Earlier ideas from Fisher (1944)
•  “When a number of quite independent tests of significance have been
made, it sometimes happens that although few or none can be claimed
individually as significant, yet the aggregate gives an impression that the
probabilities are on the whole lower than would often have been obtained
by chance”
12
adapted from slide by David B. Wilson, 1999
META-ANALYSIS
Even more important in data science
• You will often be working with data you didn’t collect
• “Big Data” may have become big by combining data
from different sources
• When is this ok? Test for homogeneity
13
META-ANALYSIS: WEIGHTED AVERAGE
Idea: Average across multiple studies, but give more weight to
more precise studies
Simple method: Weight by sample size
Inverse-variance weight:
14
Caveat: This is a fixed-effect model: it assumes that each
individual study is measuring the same true effect.
We won't talk about the random-effects model.
w_i = n_i / Σ_j n_j          (weight by sample size)
w_i = 1 / se_i²              (inverse-variance weight; lots of variants)
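A minimal sketch (not from the slides) of the fixed-effect, inverse-variance weighted average; the per-study effect sizes and standard errors below are hypothetical:

```python
import numpy as np

def fixed_effect_meta(effects, std_errors):
    """Combine per-study effect sizes with inverse-variance weights
    (fixed-effect model: assumes every study estimates the same true effect)."""
    effects = np.asarray(effects, float)
    se = np.asarray(std_errors, float)
    w = 1.0 / se**2                       # inverse-variance weights
    pooled = np.sum(w * effects) / np.sum(w)
    pooled_se = np.sqrt(1.0 / np.sum(w))  # standard error of the pooled estimate
    return pooled, pooled_se

# Hypothetical effect sizes and standard errors from three studies
print(fixed_effect_meta([0.42, 0.31, 0.55], [0.10, 0.20, 0.15]))
```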
EFFECT SIZE: COHEN'S HEURISTIC
Standardized mean difference effect
size
• small = 0.20
• medium = 0.50
• large = 0.80
15
CONFIDENCE INTERVAL (OF EFFECT SIZE)
What does a 95% confidence interval of
the effect size mean?
• If we repeated the experiment 100 times, we
expect that the interval would include this effect
size 95/100 times
• If this interval includes 0.0, that’s equivalent to
saying the result is not statistically significant.
16
ASIDE: HETEROSKEDASTICITY
17
non-uniform variance
ASIDE: HETEROSKEDASTICITY
18
100 repetitions, same x-values, y-values drawn from the same model
•  Not necessarily a
problem
•  Still provides an
unbiased estimate
•  Can increase error
estimates, leading
to Type 2 errors:
overlooking a real
effect
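A small simulation sketch (not from the slides) of the point above: with noise that grows with x, repeated least-squares fits still recover the true slope on average (the estimate is unbiased), but the slope estimates are more variable, which is what inflates error estimates:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(1, 10, 200)

slopes = []
for _ in range(100):
    # Noise grows with x: non-uniform variance (heteroskedasticity)
    y = 2.0 * x + rng.normal(scale=0.5 * x)
    slope, intercept = np.polyfit(x, y, 1)
    slopes.append(slope)

# Mean slope stays near the true value of 2, but the spread of the
# estimates is wider than it would be with constant noise.
print(np.mean(slopes), np.std(slopes))
```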
REASON 2: MISTAKES AND FRAUD
19
http://www.nature.com/news/2011/111005/pdf/478026a.pdf
Richard Van Noorden, 2011, Nature 478
The Rise of the Retractions
2001 – 2011:
•  10X increase in retractions
•  only 1.44X increase in papers
BENFORD'S LAW: POTENTIAL TOOL FOR FRAUD DETECTION
20
New York 8,336,697
Los Angeles 3,857,799
Chicago 2,714,856
Houston 2,160,821
Philadelphia 1,547,607
Phoenix 1,488,750
San Antonio 1,382,951
San Diego 1,338,348
Dallas 1,241,162
21–24
[Figures: example first-digit distributions compared against Benford's law; src: http://testingbenfordslaw.com/x]
BENFORD’S LAW TO DETECT FRAUD
Diekmann, 2007
•  Found that first and second digits of published statistical estimates
were approximately Benford distributed
•  Asked subjects to manufacture regression coefficients, and found
that the first digits were hard to detect as anomalous, but the second
and third digits deviated from expected distributions significantly.
25
Andreas Diekmann, 2007, Journal of Applied Statistics, 34(3)
Not the First Digit! Using Benford's Law to Detect Fraudulent Scientific
Data
BENFORD’S LAW INTUITION
Given a sequence of cards labeled 1, 2, 3, …
999999
Put them in a hat, one by one, in order
After each card, ask “What is the probability
of drawing the number 1?”
26
27
BENFORD’S LAW EXPLANATION
28
Limitations
•  Data must
span several
orders of
magnitude
•  No min/max
cutoffs
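A minimal sketch (not from the slides) of checking a dataset against Benford's law by comparing its first-digit frequencies to log10(1 + 1/d); the data here are synthetic values spanning several orders of magnitude:

```python
import numpy as np

def first_digit_counts(values):
    """Fraction of values whose leading digit is 1..9."""
    digits = [int(str(abs(v)).lstrip("0.")[0]) for v in values if v != 0]
    counts = np.bincount(digits, minlength=10)[1:]
    return counts / counts.sum()

# Benford's law: P(first digit = d) = log10(1 + 1/d)
benford = np.log10(1 + 1 / np.arange(1, 10))

# Hypothetical data spanning several orders of magnitude (e.g., city populations)
rng = np.random.default_rng(2)
data = np.exp(rng.uniform(np.log(100), np.log(10_000_000), size=5000))

print(np.round(first_digit_counts(data), 3))
print(np.round(benford, 3))
```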
REASON 3: MULTIPLE HYPOTHESIS TESTING
•  If you perform experiments over and over,
you’re bound to find something
•  This is a bit different than the publication bias
problem: Same sample, different hypotheses
•  Significance level must be adjusted down when
performing multiple hypothesis tests
29
30
P(detecting an effect when there is none) = α = 0.05
P(not detecting an effect when there is none) = 1 – α
P(not detecting a spurious effect in any of k experiments) = (1 – α)^k
P(detecting an effect when there is none on at least one
experiment) = 1 – (1 – α)^k
α = 0.05
“Familywise Error Rate”
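A quick sketch (not from the slides) of how fast the familywise error rate grows with the number of tests k:

```python
alpha = 0.05
for k in (1, 5, 10, 20, 50):
    fwer = 1 - (1 - alpha) ** k   # chance of at least one false positive among k tests
    print(k, round(fwer, 3))
```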
FAMILYWISE ERROR RATE CORRECTIONS
Bonferroni Correction
•  Just divide by the number of hypotheses
Šidák Correction
•  Assumes the tests are independent
31
α_c = α / k                      (Bonferroni)
α_c = 1 − (1 − α)^(1/k)          (Šidák)
equivalently, α = 1 − (1 − α_c)^k
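A minimal sketch (not from the slides) of the corrected per-test significance levels; k = 20 is an arbitrary example:

```python
alpha, k = 0.05, 20

bonferroni = alpha / k               # conservative, no independence needed
sidak = 1 - (1 - alpha) ** (1 / k)   # exact if the k tests are independent

print(bonferroni, sidak)
# Check: with the Šidák threshold the familywise error rate is exactly alpha
print(1 - (1 - sidak) ** k)
```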
FALSE DISCOVERY RATE
32
                 Reject H0    Do Not Reject H0    Total
H0 is true          FD              TN              T
H0 is false         TD              FN              F
Total               D               N
T/F = True/False
D/N = Discovery/Nondiscovery
Q = FDR = FD / D
FDR (2)
Bonferroni correction and other FWER corrections tend to wipe
out evidence of the most interesting effects; they suffer from low
power.
FDR control offers a way to increase power while maintaining a
bound on the ratio of wrong conclusions
Intuition:
•  4 false discoveries out of 10 rejected null hypotheses
is a more serious error than
•  20 false discoveries out of 100 rejected null hypotheses.
33
adapted from a slide by Christopher Genovese
BENJAMINI-HOCHBERG PROCEDURE
Compute the p-value of m hypotheses
Order them in increasing order of p-value
•  That is, most likely hypotheses are first
34
P_i ≤ (i / m) · α

Find the largest i for which the ordered p-value P_i satisfies this bound, and
reject the corresponding null hypotheses 1, …, i.

Example ordered p-values (m = 50, α = 0.05):
i     P_i
1     0.001
2     0.002
3     0.003
…
20    0.020
…

This procedure controls the false discovery rate: FDR ≤ (T / m) · α
BENJAMINI-HOCHBERG PROCEDURE
35
[Plot: ordered p-values vs. the Benjamini-Hochberg rejection line (i/m)·α, with m = 50 and α = 0.05]
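A minimal sketch (not from the slides) of the Benjamini-Hochberg procedure; the p-values below are synthetic:

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Return a boolean mask of hypotheses rejected by the BH procedure."""
    p = np.asarray(p_values, float)
    m = len(p)
    order = np.argsort(p)
    thresholds = alpha * (np.arange(1, m + 1) / m)   # (i/m) * alpha
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        cutoff = np.nonzero(below)[0].max()          # largest i with P_(i) <= (i/m)*alpha
        reject[order[: cutoff + 1]] = True
    return reject

# Hypothetical p-values from m = 50 tests: a few real effects plus noise
rng = np.random.default_rng(3)
pvals = np.concatenate([rng.uniform(0, 0.002, 5), rng.uniform(0, 1, 45)])
print(benjamini_hochberg(pvals, alpha=0.05).sum(), "hypotheses rejected")
```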
WHERE WE ARE
MOTIVATION
Publication Bias
Fraud Detection
Multiple Hypothesis
Testing
TOPIC OR TECHNIQUE
Basic Statistical Inference
Hypothesis Testing
Effect Size
Heteroskedasticity
Benford’s Law
Familywise Error Rate
•  Bonferroni Correction
•  Šidák Correction
False Discovery Rate
•  Benjamini-Hochberg Procedure
36
WHAT ABOUT BIG DATA?
37
http://www-stat.stanford.edu/~ckirby/brad/papers/
2005BayesFreqSci.pdf
“Classical statistics was fashioned for small problems, a few
hundred data points at most, a few parameters.”
“The bottom line is that we have entered an era of massive
scientific data collection, with a demand for answers to large-
scale inference problems that lie beyond the scope of classical
statistics.”
Bradley Efron,
Bayesians, Frequentists, and Scientists
38
POSITIVE CORRELATIONS
Number of police officers and number of
crimes (Glass & Hopkins, 1996)
Amount of ice cream sold and deaths by
drownings (Moore, 1993)
Stork sightings and population increase
(Box, Hunter, Hunter, 1978)
39
THE “CURSE” OF BIG DATA?
40
http://www.analyticbridge.com/profiles/blogs/the-
curse-of-big-data
“…the curse of big data is the fact that when you
search for patterns in very, very large data sets with
billions or trillions of data points and thousands of
metrics, you are bound to identify coincidences that
have no predictive power.”
Vincent Granville
VINCENT GRANVILLE’S EXAMPLE
Consider stock prices for 500
companies over a 1-month period
Check for correlations in all pairs
41
ASIDE: VERY BASIC TIME SERIES ANALYSIS (1)
42
cov(x, y) = Σ_i (x_i − u_x)(y_i − u_y)
corr(x, y) = cov(x, y) / sqrt( Σ_i (x_i − u_x)² · Σ_i (y_i − u_y)² )
(the square roots in the denominator play the role of the standard deviations)
RANDOM WALK
43
each step is normally distributed with standard deviation equal to 1% of the current price
44
[Plot: pairs of simulated random-walk price series flagged as correlated; correlation threshold = 0.9]
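A simulation sketch (not from the slides) of Granville's example: independent random walks, with each step's standard deviation set to 1% of the current price, still produce many pairs whose correlation exceeds 0.9:

```python
import numpy as np

rng = np.random.default_rng(4)
n_series, n_days = 500, 21   # hypothetical: 500 "stocks" over ~1 month of trading days

# Each step is normally distributed with sd = 1% of the current price
prices = np.empty((n_series, n_days))
prices[:, 0] = 100.0
for t in range(1, n_days):
    prices[:, t] = prices[:, t - 1] * (1 + rng.normal(0, 0.01, n_series))

# Count pairs of independent random walks whose correlation exceeds 0.9
corr = np.corrcoef(prices)
upper = np.triu_indices(n_series, k=1)
print((corr[upper] > 0.9).sum(), "spuriously correlated pairs out of", len(upper[0]))
```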
IS BIG DATA DIFFERENT?
Big P vs. Big N
•  P = number of variables (columns)
•  N = number of records
Marginal cost of increasing N is essentially zero!
But while increasing N decreases variance, it amplifies bias
•  Ex: You log all clicks to your website to model user behavior, but
this only samples current users, not the users you want to attract.
•  Ex: Using mobile data to infer buying behavior
Beware multiple hypothesis tests
•  “Green jelly beans cause acne”
Taleb’s “Black Swan” events
•  The turkey’s model of human behavior
45
Frequentist Approach to Statistics
Bayesian Approach to Statistics
46
P(D|H): probability of seeing this data, given the (null) hypothesis
P(H|D): probability of a given outcome, given this data
Jeremy Fox
DIFFERENCES BETWEEN BAYESIANS AND NON-BAYESIANS
WHAT IS FIXED?
FREQUENTIST
Data are a repeatable
random sample
•  there is a frequency
Underlying parameters
remain constant during
this repeatable process
Parameters are fixed
BAYESIAN
Data are observed from the
realized sample.
Parameters are unknown
and described
probabilistically
Data are fixed
47
http://www.stat.ufl.edu/~casella/Talks/BayesRefresher.pdf
BAYES' THEOREM
48
A key benefit: The ability to incorporate prior knowledge
A key weakness: The need to incorporate prior knowledge
P(A | B) = P(B | A) P(A) / P(B)
MATTHEWS 1998
49
“Faced with the same experimental evidence for, say,
ESP, true believers could use Bayes's Theorem to claim
that the new results implied that telepathy is almost
certainly real.
Skeptics, in contrast, could use Bayes's Theorem to insist
they were still not convinced.”
“…different people could use Bayes's Theorem and get
different results”
“Both views are possible because Bayes's Theorem
shows only how to alter one's prior level of belief -- and
different people can start out with different opinions.”
http://junksciencearchive.com/sep98/matthews.html
MATTHEWS 1998
50
http://junksciencearchive.com/sep98/matthews.html
“Fisher had achieved what Bayes claimed was impossible: he had found a
way of judging the "significance" of experimental data entirely objectively.
That is, he had found a way that anyone could use to show that a result was
too impressive to be dismissed as a fluke.”
“All scientists had to do, said Fisher, was to convert their raw data into … a
P-value”
“So just what were the brilliant insights that led Fisher to choose that
talismanic figure of 0.05, on which so much scientific research has since
stood or fallen ? Incredibly, as Fisher himself admitted, there weren't any. He
simply decided on 0.05 because it was mathematically convenient.”
“Professor James Berger of Purdue University - a world authority on Bayes's
Theorem - published a entire series of papers again warning of the
"astonishing" tendency of Fisher's P-values to exaggerate significance.
Findings that met the 0.05 standard, said Berger, "Can actually arise when
the data provide very little or no evidence in favour of an effect".”
51
[Diagram: data x0, x1, x2, x3, … lead to a decision; the frequentist path uses the data alone, while the Bayesian path also incorporates a prior belief.]
BAYES THEOREM
52
P(A | B) = P(B | A) P(A) / P(B)

Posterior, P(A | B): the probability of our hypothesis being true given the data collected
Likelihood, P(B | A): the probability of collecting this data when our hypothesis is true
Prior, P(A): the probability of the hypothesis being true before collecting data
Marginal, P(B): the probability of collecting this data under all possible hypotheses
53
[Diagram: outcomes partitioned by A / ~A and B / ~B, highlighting the overlap P(A∩B)]
P(A∩B) = P(A | B) P(B) = P(B | A) P(A)
so P(A | B) = P(B | A) P(A) / P(B)
QUESTION
54
1% of women at age forty who participate in routine screening have
breast cancer.
80% of women with breast cancer will get positive mammographies.
9.6% of women without breast cancer will also get positive
mammographies.
A woman in this age group had a positive mammography in a
routine screening.
What is the probability that she actually has breast cancer?
http://yudkowsky.net/rational/bayes
55
                 Have Cancer (1%)           Do Not Have Cancer (99%)
Positive Test    True Positive: 1% * 80%    False Positive: 99% * 9.6%
Negative Test    False Negative: 1% * 20%   True Negative: 99% * 90.4%
56
P(cancer) = 0.01
P(positive test | cancer) = 0.8
P(positive test) = P(positive test | cancer)P(cancer) + P(positive test | no cancer)P(no cancer)
                 = 0.8(0.01) + 0.096(0.99) = 0.103
P(cancer | positive test) = P(positive test | cancer)P(cancer) / P(positive test)
                          = 0.8(0.01) / 0.103 = 0.078
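The same calculation as a short sketch (not from the slides):

```python
p_cancer = 0.01
p_pos_given_cancer = 0.80
p_pos_given_no_cancer = 0.096

# Marginal probability of a positive test, summed over both hypotheses
p_pos = p_pos_given_cancer * p_cancer + p_pos_given_no_cancer * (1 - p_cancer)

# Bayes' theorem
p_cancer_given_pos = p_pos_given_cancer * p_cancer / p_pos
print(round(p_pos, 3), round(p_cancer_given_pos, 3))   # ~0.103 and ~0.078
```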
SPAM FILTERING WITH NAÏVE BAYES
57
P(spam | viagra, rich, ..., friend) = P(spam) P(viagra, rich, ..., friend | spam) / P(viagra, rich, ..., friend)

In general: P(spam | words) = P(spam) P(words | spam) / P(words)

Expanding the numerator with the chain rule:
P(spam) P(viagra, rich, ..., friend | spam)
  = P(spam) P(viagra | spam) P(rich, ..., friend | spam, viagra)
  = P(spam) P(viagra | spam) P(rich | spam, viagra) P(..., friend | spam, viagra, rich)
  ...
The naïve conditional-independence assumption reduces this to
P(spam) P(viagra | spam) P(rich | spam) ... P(friend | spam)
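A toy sketch (not from the slides) of a naïve Bayes spam filter; the training documents, vocabulary, and Laplace smoothing are all made up for illustration:

```python
from collections import Counter

# Tiny hypothetical training corpus
spam_docs = ["viagra rich friend", "rich viagra offer", "viagra offer"]
ham_docs = ["meeting tomorrow friend", "project notes", "lunch with friend"]

def word_probs(docs, vocab, smoothing=1.0):
    """P(word | class) with Laplace smoothing."""
    counts = Counter(w for d in docs for w in d.split())
    total = sum(counts.values()) + smoothing * len(vocab)
    return {w: (counts[w] + smoothing) / total for w in vocab}

vocab = {w for d in spam_docs + ham_docs for w in d.split()}
p_w_spam = word_probs(spam_docs, vocab)
p_w_ham = word_probs(ham_docs, vocab)
p_spam = len(spam_docs) / (len(spam_docs) + len(ham_docs))

def spam_score(message):
    """P(spam) * prod P(word|spam), normalized against the ham alternative."""
    words = [w for w in message.split() if w in vocab]
    s, h = p_spam, 1 - p_spam
    for w in words:
        s *= p_w_spam[w]
        h *= p_w_ham[w]
    return s / (s + h)

print(spam_score("viagra rich friend"))
```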
SLIDES CAN BE FOUND AT:
TEACHINGDATASCIENCE.ORG