3. who is Nate Silver?
◮ developed PECOTA, Player Empirical Comparison and
Optimization Test Algorithm
◮ predict performance of pitchers, expanded to hitters
◮ purchased by Baseball Prospectus
◮ 2003-2008 matched or bettered commercial baseball forecasting
systems, Vegas over-under lines
◮ 538 blog–political predictions for NYT
◮ ex-poker player
◮ Out magazine’s person of the year, 2012
4. motivation
◮ two contexts: “big data” and failure of prediction
◮ “big data”: IBM–2.5 quintillion bytes of data per day
◮ failures of prediction
banking crisis/housing bubble
tv pundits
baseball
weather ...
◮ test of theory–falsifiability (Popper)
◮ complex theories difficult to falsify–Popper: they have little value
◮ failed predictions about simple things make us suspicious of complex models
◮ Silver: nonetheless, these theories have value–new attitude toward
prediction: Bayes’ Rule
5. today
◮ go through some of Silver’s examples–failures, successes
◮ outline solution: Bayes’ Rule
◮ limitations: example using “fairly big” data
◮ conclusion
6. example: housing bubble, bank crash, great recession
actor: rating agencies
◮ S&P, Moody’s, other rating agencies got it very wrong
◮ predicted default rate on AAA-rated CDOs: .12%; actual: 28%
◮ why did they do such a poor modeling job?
1. rating agencies part of a legal oligopoly
2. many pension funds require rating by Moody’s/S&P to buy
any piece of debt
3. Moody’s would rate debt “structured by cows”
4. no financial incentive for predictions to have good frequentist
properties
5. all incentives were toward giving “bubble ratings”
7. example: housing bubble, bank crash, great recession
◮ what were problems with models?
1. assumed that defaults on the mortgages bundled in CDOs were
uncorrelated
2. confuse risk and uncertainty
◮ risk – quantifiable likelihood of upside or downside
event–something you can put a price on; the point of
decision-making is to reduce the risk of being a sure loser
◮ uncertainty – requires assigning distribution to
unknowns/unobservables
3. rating agencies “spun uncertainty into what looked and felt
like risk” (29)
4. result is to produce precise but wholly inaccurate predictions
(courtesy “big data”)
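Why the independence assumption in point 1 matters can be seen with a small pool of mortgages. The sketch below is illustrative only: the pool size (5), the 5% default probability, and the boom/bust numbers in the hypothetical common-shock model are assumptions chosen for the example, not figures from the rating agencies. In every version each mortgage still defaults 5% of the time; only the correlation between defaults changes.

```python
def p_all_default_independent(p, n):
    # independent defaults: multiply the n individual probabilities
    return p ** n


def p_all_default_perfectly_correlated(p, n):
    # perfectly correlated defaults: the whole pool defaults together or not at all
    return p


def p_all_default_common_shock(p_boom, p_bust, q_bust, n):
    # hypothetical one-factor model: with probability q_bust the housing market
    # busts and every mortgage's default probability rises at once; defaults
    # are independent *given* the state of the market
    return q_bust * p_bust ** n + (1 - q_bust) * p_boom ** n


n, p = 5, 0.05
# boom/bust numbers chosen so each mortgage still defaults 5% of the time:
# 0.1 * 0.32 + 0.9 * 0.02 = 0.05
print(p_all_default_independent(p, n))                # about 3.1e-07 (1 in 3.2 million)
print(p_all_default_common_shock(0.02, 0.32, 0.1, n))  # about 3.4e-04
print(p_all_default_perfectly_correlated(p, n))       # 0.05 (1 in 20)
```

Under the hypothetical common-shock model, the chance that the whole pool defaults is roughly a thousand times larger than under independence, even though no single mortgage became riskier.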
8. example: housing bubble, bank crash, cont’d
other actors: homeowners, consumers, banks, economists/policymakers
◮ homeowners: in a 2003 survey, expected housing prices to rise 13% per
year (over the 100 years ending in 1996: < 1% per year)
◮ banks: leverage
◮ Bear Stearns bailout: overnight repo market frozen due to
change in value of collateral
◮ Lehman Brothers: leverage ratio 33:1, so a decline of ~3% in the value of
its financial positions → negative equity, bankruptcy
◮ consumers: leverage; with wages stagnant, treated homes as ATMs
◮ economists (Romer, Summers, etc.): overestimate the soundness of
financial system, underestimate ripple effects in economy
(unemployment)
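The Lehman leverage arithmetic can be checked directly. A minimal sketch, assuming the 33:1 ratio means assets/equity and that debt is owed in full regardless of what happens to asset values; the function name is hypothetical.

```python
def equity_after_decline(leverage, decline):
    # normalize equity to 1, so assets = leverage and debt = leverage - 1
    assets = float(leverage)
    debt = assets - 1.0
    # assets lose value; the debt is still owed in full
    return assets * (1.0 - decline) - debt


leverage = 33
print(1.0 / leverage)                        # break-even decline, about 0.0303
print(equity_after_decline(leverage, 0.03))  # about 0.01: 99% of equity gone
print(equity_after_decline(leverage, 0.04))  # about -0.32: negative equity
```

At 33:1, equity is only about 3% of assets, so a fall of just over 3% in asset values wipes it out entirely.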
9. all of these things are just like the others
◮ what do forecasting failures have in common?
◮ they all failed to take into account a critical piece of context
◮ example: confidence in rising housing prices stemmed from trends
in recent prices, but housing prices had never risen this
rapidly before
◮ in short, the events forecasters were considering were out of
sample
◮ so, courtesy “big data”, the forecasts (especially of banks and
rating agencies) were very precise (big N), but wildly
inaccurate
10. punditry: or the political entertainment media complex
◮ political pundits basically get it right about 50% of the time
◮ this is not awesome: you could flip a coin and do about as
well as any political pundit on tv
◮ two kinds of predictors: hedgehogs and foxes (Philip Tetlock:
psychology and political science)
◮ foxes: “scrappy creatures who believe in a plethora of little
ideas and in taking a multitude of approaches toward a
problem” (53)
◮ hedgehogs: type A personalities that believe in Big Ideas–in
governing principles that act as physical laws
11. foxes and hedgehogs
Table : foxes vs. hedgehogs
foxes                    | hedgehogs
multidisciplinary        | specialized
adaptable                | stalwart
self-critical            | stubborn
tolerant of complexity   | order-seeking
cautious                 | confident
empirical                | ideological
better forecasters       | better tv guests
obviously (?), it’s better to be foxy
12. other examples
◮ earthquakes: prediction is impossible (though some forecasts do better
than others)
◮ weather: real success story in the last 30 years
◮ two challenges
◮ dynamic system (good understanding of laws of motion)
◮ non-linear (exponential changes)
◮ ⇒ small changes in decimal places have big effects on outputs
◮ in weather, forecasters don’t always have incentives for accurate prediction
◮ calibration differs considerably across TWC (The Weather Channel),
AccuWeather, and local forecasts; local forecasts do a lot worse
◮ a much stronger “wet bias” in local forecasts: the perception of accuracy
matters more than accuracy
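A calibration check of the kind described above groups forecasts by their stated probability and compares each group with how often the event actually happened. A minimal sketch with made-up data for a hypothetical forecaster with a wet bias; a real comparison would use many more days and forecasts binned to, say, 10% steps.

```python
from collections import defaultdict


def calibration_table(forecasts, outcomes):
    # group days by the stated probability of rain and count how often it rained
    counts = defaultdict(lambda: [0, 0])  # forecast -> [days, rainy days]
    for f, rained in zip(forecasts, outcomes):
        counts[f][0] += 1
        counts[f][1] += int(rained)
    return {f: rainy / days for f, (days, rainy) in sorted(counts.items())}


# hypothetical forecaster: says "20%" on days when it rains only 5% of the time
forecasts = [0.2] * 20 + [0.7] * 10
outcomes = [True] * 1 + [False] * 19 + [True] * 7 + [False] * 3

table = calibration_table(forecasts, outcomes)
for f, freq in table.items():
    print(f"forecast {f:.0%}: rained {freq:.0%} of the time")
```

A well-calibrated forecaster's table lies close to the diagonal (20% forecasts verify about 20% of the time); a wet bias shows up as observed frequencies below the stated probabilities at the low end.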
13. how to drown in 3 feet of water: listen to an economist
◮ economists bad at communicating uncertainty of forecasts
◮ a study of the Survey of Professional Forecasters: actual GDP growth fell
outside the prediction interval about 50% of the time
◮ biased toward overconfidence
◮ why are they so not awesome?
◮ cause and effect are hard to separate (and often reversed): few indicators stay predictive over time
◮ dynamic system: things that matter change
◮ economic data is very noisy
◮ like weather: dynamic, uncertain initial conditions
◮ biased forecasts are rational: no skin in the game–no market for
accurate forecasts
◮ difference between a meaningful and a “dumb data” forecast: Jan
Hatzius vs. ECRI (196)
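The point that the economy is “like weather” (a dynamic system with uncertain initial conditions) can be illustrated with a toy chaotic system. This is a hedged sketch that uses the logistic map x ← 4x(1 − x) as a stand-in; it is not a weather or economic model. The two starting values, 0.506127 and its three-decimal rounding 0.506, echo Edward Lorenz's well-known rounding anecdote, but the map, the parameter r = 4, and the number of steps are illustrative choices.

```python
def logistic_trajectory(x0, r=4.0, steps=50):
    # iterate x_{t+1} = r * x_t * (1 - x_t); r = 4 puts the map in its chaotic regime
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


full = logistic_trajectory(0.506127)
rounded = logistic_trajectory(0.506)  # same starting value, rounded to three decimals
for t in (0, 5, 10, 15, 20):
    print(t, round(full[t], 4), round(rounded[t], 4), round(abs(full[t] - rounded[t]), 4))
```

The gap starts in the fourth decimal place; within a few dozen steps the two paths no longer resemble each other, which is what “small changes in decimal places have big effects on outputs” means in practice.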
14. bayesian reasoning: how to be less wrong
◮ studied judgement and a meaningful model
◮ model of the data-generating process (dgp), i.e. theory (this means you, economists)
◮ statistical model that accounts for uncertainty (not just
measurement error) in parameters
◮ Bayes’ Rule: for any events A and B
$$p(A \mid B) = \frac{p(B \mid A)\, p(A)}{p(B)} \qquad (1)$$
where p denotes probability
◮ posterior = conditional ∗ prior / marginal
◮ does not require metaphysical uncertainty, only epistemological
uncertainty
15. bayes’ rule: a simple example
◮ example given in Silver (245):
◮ prevalence of breast cancer for women in their 40s: .014
◮ p(test = 1|cancer = 0) = .10
◮ p(test = 1|cancer = 1) = .75
◮ 42-year-old woman, positive test: what is the probability of cancer?
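◮ $$p(\text{cancer}=1 \mid \text{test}=1) = \frac{p(\text{test}=1 \mid \text{cancer}=1)\, p(\text{cancer}=1)}{p(\text{test}=1 \mid \text{cancer}=1)\, p(\text{cancer}=1) + p(\text{test}=1 \mid \text{cancer}=0)\, p(\text{cancer}=0)}$$
◮ $$p(\text{cancer}=1 \mid \text{test}=1) = \frac{.75 \times .014}{.75 \times .014 + .10 \times .986} = .096$$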
◮ with this example: low prior (for this population); relatively low
sensitivity (true-positive rate); relatively high false-positive rate
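The calculation above is short enough to write as a function. A minimal sketch: the function name is hypothetical, and the second, repeated test assumes the two test results are independent given cancer status, which the original example does not state.

```python
def posterior_given_positive(prior, sensitivity, false_positive_rate):
    # numerator of Bayes' Rule: p(test=1 | cancer=1) * p(cancer=1)
    true_pos = sensitivity * prior
    # the other route to a positive test: p(test=1 | cancer=0) * p(cancer=0)
    false_pos = false_positive_rate * (1.0 - prior)
    # divide by the marginal p(test=1)
    return true_pos / (true_pos + false_pos)


p1 = posterior_given_positive(0.014, 0.75, 0.10)
print(round(p1, 3))  # 0.096

# a second positive test, using the first posterior as the new prior
# (assumes the two tests are independent given cancer status)
p2 = posterior_given_positive(p1, 0.75, 0.10)
print(round(p2, 2))  # 0.44
```

Yesterday's posterior becomes today's prior: one positive test moves the probability from 1.4% to about 10%, and a second (assumed independent) positive moves it to about 44%.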
16. bayes and fisher: statistics smackdown!
◮ Bayesian vs. Fisherian (frequentist) methods–fear of the
“subjectivity” of the prior
◮ really interesting history: McGrayne, The Theory That Would Not
Die
◮ Air France flight 447, June 2009, not found in 18 months
◮ Metron, a US consulting firm, using a Bayesian search method: found in 7
days
◮ Silver: frequentist methods keep the researcher hermetically sealed off
from the world: they “discourage the researcher from considering the
underlying context or plausibility of his hypothesis, something that
Bayesian method demands in the form of the prior probability” (253)
◮ late in his life, Fisher argued against research showing that smoking
caused lung cancer, arguing instead that lung cancer might cause smoking
◮ many reasons for this, among them his insistence on the “objective purity”
of the experiment
◮ prediction is inherently subjective, but can be made rigorous
◮ can be less irrational, less subjective, less wrong
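A Bayesian search of the kind mentioned for flight 447 starts with a prior over where the wreck might be and updates it after every unsuccessful search. The sketch below uses the textbook search-theory update on a hypothetical four-cell grid; the cell probabilities, detection probabilities, and function names are assumptions, and this is not Metron's actual model.

```python
def update_after_miss(probs, detect, k):
    # Bayes' Rule after searching cell k and finding nothing.
    # p(miss) = 1 - p(wreck in k) * p(detect | wreck in k)
    p_miss = 1.0 - probs[k] * detect[k]
    # unsearched cells: p(miss | wreck there) = 1, so divide by p(miss)
    new = [p / p_miss for p in probs]
    # searched cell: p(miss | wreck there) = 1 - detect[k]
    new[k] = probs[k] * (1.0 - detect[k]) / p_miss
    return new


def next_cell(probs, detect):
    # greedy rule: search where p(wreck here) * p(detect | here) is largest
    return max(range(len(probs)), key=lambda i: probs[i] * detect[i])


probs = [0.4, 0.3, 0.2, 0.1]   # hypothetical prior over four cells
detect = [0.9, 0.9, 0.5, 0.5]  # hypothetical detection probabilities
for _ in range(3):
    k = next_cell(probs, detect)
    probs = update_after_miss(probs, detect, k)
    print(k, [round(p, 3) for p in probs])
```

Probability drains away from a searched cell in proportion to how thorough the search was, so a cell that was searched, but not very effectively, can still remain a good bet.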
17. examples
◮ chess: Deep Blue vs. Kasparov (chess is a “Bayesian process”)
◮ poker: getting a few things right can go a long way (the Pareto
principle)
◮ this is true of other disciplines
◮ right data
◮ right technology
◮ right incentives–i.e. you need to care about accuracy
◮ when a field is competitive, a lot of extra effort is needed
to reach the margin of profitability
◮ Silver was lucky in baseball, political forecasting, poker
◮ others have begun to copy the 538 blog and PECOTA; Silver never really
made it in poker
18. a trip to bayesland
◮ in bayesland, you walk around with your predictions about events on
a signboard: Obama re-elected, Lindsay Lohan re-arrested, Nadal wins
Wimbledon, ...
◮ if you meet someone with different estimates, either: 1. come to
consensus, or 2. place a bet
◮ this kind of thinking is crucial to decision theory, which represents
the rigorous application of probability to all decisions
◮ the primary question: how to avoid being a sure loser (Kadane,
Principles of Uncertainty)
◮ pay close attention to consensus forecasts–no free lunch in efficient
markets–trying to bet against them will bring trouble
◮ usually only works when people are forecasting independently, not
based on one another’s forecasts.
◮ also may not work when you are using other people’s money–which
is almost always true today–Abacus (Goldman Sachs)–herd behavior
◮ bayesian reasoning will not get you out of a bubble if all of your
prior information is unreliable (GIGO)
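The “sure loser” idea is usually made precise with a Dutch book: if someone's prices for bets on an event and on its complement do not add up to 1, a bookie can trade with them at their own prices and guarantee them a loss whatever happens. A minimal sketch for a single event; the function name and the $1 tickets are illustrative assumptions.

```python
def guaranteed_net(price_yes, price_no):
    # price_yes: what the person will pay (or accept) for a ticket paying $1 if A happens
    # price_no:  the same for a ticket paying $1 if A does not happen
    # Exactly one ticket pays off, so the outcome of A does not matter.
    total = price_yes + price_no
    if total > 1:
        # bookie sells the person both tickets: they pay `total`, collect $1
        return 1.0 - total
    if total < 1:
        # bookie buys both tickets from the person: they receive `total`, pay out $1
        return total - 1.0
    return 0.0  # coherent prices: no sure loss from this pair of bets


# hypothetical signboard: 70% Obama re-elected, but also 40% he is not
print(guaranteed_net(0.7, 0.4))    # about -0.10, whichever way the election goes
print(guaranteed_net(0.25, 0.75))  # 0.0
```

Incoherent prices are exploitable in either direction (too high or too low); only probabilities that obey the usual rules avoid a guaranteed loss.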
19. bayes’ rule: single/multi-parameter models
◮ this can be applied to estimating parameters for any statistical
model
◮ from a Bayesian perspective, parameters are random variables, not
fixed quantities we try to estimate
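◮ for a set of parameters θ
$$p(\theta \mid \text{data}) = \frac{p(\text{data} \mid \theta)\, p(\theta)}{p(\text{data})} \qquad (2)$$
$$p(\theta \mid \text{data}) \propto p(\text{data} \mid \theta)\, p(\theta)$$
posterior ∝ likelihood ∗ prior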
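Because the posterior is proportional to likelihood ∗ prior, the normalizing constant p(data) can be recovered numerically. A minimal grid-approximation sketch, assuming a single parameter θ on (0, 1), a hypothetical binomial data set (7 successes in 20 trials), and a flat prior; with a flat prior the exact posterior is Be(8, 14), with mean 8/22, which gives a check on the grid.

```python
import math


def grid_posterior(log_likelihood, prior, grid):
    # posterior ∝ likelihood × prior, evaluated pointwise on a grid of θ values
    unnorm = [math.exp(log_likelihood(t)) * prior(t) for t in grid]
    total = sum(unnorm)  # stands in for p(data), up to the grid spacing
    return [u / total for u in unnorm]


y, n = 7, 20                                   # hypothetical data
grid = [(i + 0.5) / 1000 for i in range(1000)]  # midpoints of (0, 1)
post = grid_posterior(lambda t: y * math.log(t) + (n - y) * math.log(1 - t),
                      lambda t: 1.0,            # flat prior
                      grid)
post_mean = sum(t * w for t, w in zip(grid, post))
print(round(post_mean, 3))  # 0.364, matching the exact Be(8, 14) mean 8/22
```

Grid approximation only works for one or two parameters; beyond that, conjugate forms or simulation are needed.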
20. bayes’ rule: a canonical example
◮ canonical example: estimation of a population proportion, p
◮ let $X_1, X_2, \dots, X_n$ be independent binary variables
◮ let $y = \sum_i X_i \sim \mathrm{Bin}(n, p)$
◮ goal: estimate posterior of p, $f(p \mid y) = \frac{f(y \mid p)\, f(p)}{f(y)}$
◮ prior for p: Be(A, B) (to make this simple, this is a conjugate form)
◮ $f(p \mid y) = \frac{\Gamma(n+A+B)}{\Gamma(y+A)\,\Gamma(n-y+B)}\, p^{(y+A)-1} (1-p)^{n-y+B-1}$
◮ Bayesian: $\hat{p} = \frac{y+A}{A+B+n}$; ML: $\hat{p} = \frac{y}{n}$
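The conjugate update and the two point estimates translate directly into code. A minimal sketch; the data (12 successes in 40 trials) and the Be(2, 2) prior are hypothetical, and the Bayesian estimate shown is the posterior mean.

```python
def beta_binomial_update(y, n, a, b):
    # Be(a, b) prior + y successes in n trials -> Be(y + a, n - y + b) posterior
    return y + a, n - y + b


def beta_mean(a, b):
    return a / (a + b)


y, n = 12, 40  # hypothetical data
A, B = 2, 2    # hypothetical prior
a_post, b_post = beta_binomial_update(y, n, A, B)
print("ML:   ", y / n)                      # 0.3
print("Bayes:", beta_mean(a_post, b_post))  # 14 / 44, about 0.318

# with 10 times as much data at the same rate, the prior matters much less
a_big, b_big = beta_binomial_update(10 * y, 10 * n, A, B)
print("Bayes, n = 400:", beta_mean(a_big, b_big))  # about 0.302
```

The prior acts like A + B extra observations (here 4), so its pull toward 0.5 shrinks as n grows.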
21. bayes’ rule: a canonical example
Figure : Bayes and MLE Estimates of Proportion (density over p from 0.15 to 0.50; curves labeled MLE and Bayes)
22. bayes’ rule: an empirical example
◮ a chorus of people say we should restrict SNAP benefits: no
sugar-sweetened beverages (SSBs)
◮ take simple counterexamples: cigarettes and alcohol
◮ goal: estimate posterior of µ for SNAP participants and
non-participants
◮ likelihood: Poisson; prior for µ: G(α, β) (to make this simple, this is
a conjugate form)
◮ $f(\mu \mid y) = \frac{1}{\Gamma(\alpha')}\, {\beta'}^{\alpha'} \mu^{\alpha'-1} \exp(-\beta' \mu)$
◮ $\alpha' = \alpha + \sum_i y_i$; $\beta' = \beta + n$
◮ data: NHIS 2011
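The Gamma-Poisson update is a two-line calculation. A minimal sketch, assuming G(α, β) is parameterized by shape and rate (so the prior mean is α/β); the counts and the G(2, 1) prior are hypothetical stand-ins, not the NHIS 2011 data. The Monte Carlo step only checks the closed form and gives an interval.

```python
import random


def gamma_poisson_update(ys, alpha, beta):
    # Poisson counts + G(alpha, beta) prior (shape, rate) -> G(alpha + sum(y), beta + n)
    return alpha + sum(ys), beta + len(ys)


ys = [0, 0, 3, 10, 0, 20, 5, 0, 0, 15]  # hypothetical counts, NOT the NHIS data
alpha_post, beta_post = gamma_poisson_update(ys, alpha=2.0, beta=1.0)
ml_mean = sum(ys) / len(ys)          # 5.3
bayes_mean = alpha_post / beta_post  # (2 + 53) / (1 + 10) = 5.0

# Monte Carlo check: random.gammavariate takes a SCALE, so pass 1 / rate
rng = random.Random(42)
draws = sorted(rng.gammavariate(alpha_post, 1.0 / beta_post) for _ in range(20000))
print(ml_mean, bayes_mean)
print("95% interval:", round(draws[500], 2), round(draws[19499], 2))
```

The posterior mean is a weighted average of the prior mean (α/β) and the sample mean, with weights β and n, which is why the ML and Bayes columns in the table below are so close for large N.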
23. bayes’ rule: an empirical example
Figure : Cigarette Consumption and SNAP Participation (density of cigarettes per day; panels: SNAP Participants, Non-Participants)
24. bayes’ rule: an empirical example
Figure : Alcohol Consumption and SNAP Participation (density of drinks per day; panels: SNAP Participants, Non-Participants)
25. bayes’ rule: an empirical example
Figure : Posterior Mean of # of Cigarettes/Day: Gamma-Poisson (posterior density of cigarettes per day; panels: non-participants, SNAP participants)
26. bayes’ rule: an empirical example
Figure : Posterior Mean of # of Drinks/Day: Gamma-Poisson (posterior density of drinks per day; panels: non-participants, SNAP participants)
27. bayes’ rule: an empirical example
Table : ML and Bayes’ Estimates of Mean
             | No SNAP       | SNAP          |
             | ML     Bayes  | ML     Bayes  | N
cigarettes   | 6.31   6.34   | 5.89   5.86   | 2,737
drinks       | 1.85   1.85   | 1.05   1.07   | 5,613
◮ a Poisson GLM ($\mu_i = x_i \beta$) requires MCMC: no conjugate
analysis is possible
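Once µ depends on covariates, the posterior for β has no closed form and has to be simulated. A minimal random-walk Metropolis sketch (one MCMC algorithm among many; the slide does not say which was used), assuming a log link, log µ_i = β0 + β1 x_i, independent Normal(0, 10²) priors, and simulated data with hypothetical true values β0 = 0.5, β1 = 0.3. The step size and burn-in are untuned, illustrative choices.

```python
import math
import random


def poisson_draw(lam, rng):
    # Knuth's multiplication method: fine for the small means used here
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def log_posterior(b0, b1, xs, ys, prior_sd=10.0):
    # Poisson log-likelihood with a log link; log(y!) is dropped (constant in beta)
    ll = sum(y * (b0 + b1 * x) - math.exp(b0 + b1 * x) for x, y in zip(xs, ys))
    # independent Normal(0, prior_sd^2) priors on both coefficients (an assumption)
    return ll - (b0 ** 2 + b1 ** 2) / (2.0 * prior_sd ** 2)


def metropolis(xs, ys, n_iter=10000, step=0.03, seed=1):
    # random-walk Metropolis: Gaussian proposal, accept with prob min(1, ratio)
    rng = random.Random(seed)
    b0, b1 = 0.0, 0.0
    cur = log_posterior(b0, b1, xs, ys)
    draws = []
    for _ in range(n_iter):
        p0, p1 = b0 + rng.gauss(0.0, step), b1 + rng.gauss(0.0, step)
        new = log_posterior(p0, p1, xs, ys)
        if new >= cur or rng.random() < math.exp(new - cur):
            b0, b1, cur = p0, p1, new
        draws.append((b0, b1))
    return draws


# simulated data with hypothetical true values b0 = 0.5, b1 = 0.3
data_rng = random.Random(0)
xs = [i / 50 for i in range(200)]
ys = [poisson_draw(math.exp(0.5 + 0.3 * x), data_rng) for x in xs]

draws = metropolis(xs, ys)[2000:]  # discard burn-in
b0_hat = sum(d[0] for d in draws) / len(draws)
b1_hat = sum(d[1] for d in draws) / len(draws)
print(round(b0_hat, 2), round(b1_hat, 2))  # posterior means; should land near the true values
```

In practice a library sampler (with convergence diagnostics and tuned proposals) would replace this hand-rolled loop; the sketch only shows why conjugacy is lost and what the simulation does.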
28. what’s the frequency, Nate? (i.e., so?)
◮ Silver: prior is a statement of assumptions–clearing of the air
◮ bayesian reasoning is about being careful about understanding and
declaring one’s model
◮ would Countrywide have been able to declare their prior on home
price trends to exclude < 0? if they had, would it have made a
difference?
◮ even with relatively small N, the “answers” are the same
◮ for econometricians (a very different point of view from statisticians),
the question is whether the interpretations differ enough to
warrant the extra work
◮ probably don’t need bayes’ rule to say: be careful of your priors,
check your model for
◮ how well it works–calibration
◮ how robust it is to alternative assumptions
◮ where it matters: likelihood doesn’t converge, missing data
(includes latent variables)
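The robustness check in the last bullets can be as simple as re-running the conjugate Gamma-Poisson calculation under several priors and comparing the answers at a small and a larger N. A minimal sketch: the priors and the data (a sample mean of exactly 6) are hypothetical; N = 2,737 is borrowed from the cigarette row of the table only to give a realistic sample size.

```python
def gamma_poisson_posterior_mean(total, n, alpha, beta):
    # G(alpha, beta) prior (shape, rate) + Poisson counts -> mean (alpha + sum y) / (beta + n)
    return (alpha + total) / (beta + n)


# hypothetical priors: nearly flat, and two informative ones with prior means 2 and 10
priors = {"vague": (0.01, 0.001), "mean 2": (20.0, 10.0), "mean 10": (100.0, 10.0)}

for n in (10, 2737):
    total = 6 * n  # hypothetical data with a sample mean of exactly 6
    means = {name: round(gamma_poisson_posterior_mean(total, n, a, b), 2)
             for name, (a, b) in priors.items()}
    print(n, means)
# n = 10:   the three priors give about 6.0, 4.0 and 8.0
# n = 2737: all three are within about 0.02 of 6
```

With a few observations the prior drives the answer; at survey-sized N the three priors agree to two decimal places, in line with the ML and Bayes columns of the table.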