Signal and noise

On The Signal and the Noise: Why So Many
Predictions Fail, But Some Don’t

Christian Gregory

February 12, 2013

declaring my priors

Am I an unbiased estimator?

who is Nate Silver?

◮ developed PECOTA, Pitcher Empirical Comparison and
Optimization Test Algorithm
◮ predict performance of pitchers, expanded to hitters
◮ purchased by Baseball Prospectus
◮ 2003-2008 matched or bettered commercial baseball forecasting
systems, Vegas over-under lines
◮ 538 blog–political predictions for NYT
◮ ex-poker player
◮ Out magazine’s person of the year, 2012

motivation

◮ two contexts: “big data” and failure of prediction
◮ “big data”: IBM–2.5 quintillion bytes of data per day
◮ failures of prediction
banking crisis/housing bubble
tv pundits
baseball
weather ...
◮ test of theory–falsifiability (Popper)
◮ complex theories difficult to falsify–Popper: they have little value
◮ predictions about simple things make us suspect complex models
◮ Silver: nonetheless, these theories have value–new attitude toward
prediction: Bayes’ Rule

today

◮ go through some of Silver’s examples–failures, successes
◮ outline solution: Bayes’ Rule
◮ limitations: example using “fairly big” data
◮ conclusion

example: housing bubble, bank crash, great recession

actor: rating agencies
◮ S&P, Moody’s, other rating agencies got it very wrong
◮ predicted default on AAA-rated CDO’s: .12%; actual: 28%
◮ why did they do such poor modeling job?
1. rating agencies part of a legal oligopoly
2. many pension funds require rating by Moody’s/S&P to buy
any piece of debt
3. Moody’s would rate debt “structured by cows”
4. no financial incentive for predictions to have good frequentist
properties
5. all incentives were toward giving “bubble ratings”

example: housing bubble, bank crash, great recession

◮ what were problems with models?
1. assume that P(default) for mortgages bundled in CDO’s are
uncorrelated
2. confuse risk and uncertainty
◮ risk – quantifiable likelihood of upside or downside
event–something you can put a price on; the point of
decision-making is to reduce the risk of being a sure loser
◮ uncertainty – requires assigning distribution to
unknowns/unobservables
3. rating agencies “spun uncertainty into what looked and felt
like risk” (29)
4. result is to produce precise but wholly inaccurate predictions
(courtesy “big data”)
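Why the "uncorrelated defaults" assumption matters so much can be seen with a toy pool. A minimal sketch, assuming a hypothetical pool of 5 mortgages that each default with probability 5% (these numbers and the function name are illustrative, not the rating agencies' models):

```python
def prob_all_default(p_single: float, n_loans: int, perfectly_correlated: bool) -> float:
    """Probability that every loan in a pool defaults."""
    if perfectly_correlated:
        # All loans move together, so the whole pool fails whenever one loan fails.
        return p_single
    # Independent loans: multiply the individual default probabilities.
    return p_single ** n_loans


p_single = 0.05  # hypothetical per-mortgage default probability
n_loans = 5      # hypothetical pool size

independent = prob_all_default(p_single, n_loans, perfectly_correlated=False)
correlated = prob_all_default(p_single, n_loans, perfectly_correlated=True)
ratio = correlated / independent

print(f"independent: {independent:.2e}")
print(f"perfectly correlated: {correlated:.2f}")
print(f"risk understated by a factor of about {ratio:,.0f}")
```

Real pools sit somewhere between the two extremes, but the gap between them shows how far a forecast can be off when the independence assumption fails.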

example: housing bubble, bank crash, con’t

other actors: homeowners, consumers, banks, economists/policymakers
◮ homeowners: 2003 survey believe housing prices ↑ 13% per year
(over 100 years ending in 1996: < 1%)
◮ banks: leverage
◮ Bear Stearns bail out: overnight repo market frozen due to
change in value of collateral
◮ Lehman Brothers leverage ratio 33:1, decline of 3% in financial
positions → negative equity, bankruptcy
◮ consumers: leverage. wages stagnant, treat homes as ATMs
◮ economists (Romer, Summers, etc.): overestimate the soundness of
financial system, underestimate ripple effects in economy
(unemployment)
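The leverage arithmetic can be checked directly. A minimal sketch, assuming the 33:1 ratio means assets to equity and that debt stays fixed while asset values fall (both are assumptions, and the function name is illustrative):

```python
def equity_after_decline(leverage: float, decline: float) -> float:
    """Equity left per 1 unit of initial equity after assets fall by `decline`.

    Assumes `leverage` is assets divided by equity and that debt is unchanged.
    """
    assets = leverage      # assets held per 1 unit of equity
    debt = assets - 1.0    # the rest is borrowed
    return assets * (1.0 - decline) - debt


leverage = 33.0
break_even = 1.0 / leverage  # decline in asset value that exactly wipes out equity

print(f"break-even decline: {break_even:.2%}")
for decline in (0.03, 0.04):
    print(f"decline {decline:.0%}: equity = {equity_after_decline(leverage, decline):+.2f}")
```

Under these assumptions equity is gone once assets lose a little over 3% of their value.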

all of these things are just like the others

◮ what do forecasting failures have in common?
◮ they all failed to take into account a critical piece of context
◮ example: confidence in ⇑ in housing prices stems from trends
in recent prices, but housing prices had never risen this
rapidly before
◮ in short, events forecasters were considering were out of
sample
◮ so, courtesy “big data”, the forecasts (especially of banks and
rating agencies) were very precise (big N), but wildly
inaccurate
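The difference between precise and accurate can be simulated. A minimal sketch, assuming hypothetical numbers: the forecaster only sees "boom" years averaging 10% price growth, while the long-run average is 1%. A large N shrinks the standard error, but does nothing about sampling from the wrong period.

```python
import math
import random
import statistics

rng = random.Random(0)

# Hypothetical values, chosen only for illustration.
long_run_mean = 0.01
boom_sample = [rng.gauss(0.10, 0.05) for _ in range(10_000)]

estimate = statistics.fmean(boom_sample)
std_error = statistics.stdev(boom_sample) / math.sqrt(len(boom_sample))
error = abs(estimate - long_run_mean)

print(f"estimate: {estimate:.4f} +/- {1.96 * std_error:.4f}")
print(f"distance from long-run mean: {error:.4f}")
```

The interval is very tight, yet the long-run value lies far outside it.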

punditry: or the political entertainment media complex

◮ political pundits basically get it right about 50% of the time
◮ this is not awesome: you could flip a coin and do about as
well as any political pundit on tv
◮ two kinds of predictors: hedgehogs and foxes (Philip Tetlock:
psychology and political science)
◮ foxes: “scrappy creatures who believe in a plethora of little
ideas and in taking a multitude of approaches toward a
problem” (53)
◮ hedgehogs: type A personalities that believe in Big Ideas–in
governing principles that act as physical laws

foxes and hedgehogs

Table: foxes vs. hedgehogs

foxes                    hedgehogs
multidisciplinary        specialized
adaptable                stalwart
self-critical            stubborn
tolerant of complexity   order-seeking
cautious                 confident
empirical                ideological
better forecasters       better tv guests

obviously (?), it’s better to be foxy

other examples

◮ earthquakes: impossible (though not impossible to do better than
some)
◮ weather: real success story in the last 30 years
◮ two challenges
◮ dynamic system (good understanding of laws of motion)
◮ non-linear (exponential changes)
◮ ⇒ small changes in decimal places have big effects on outputs
◮ in weather, don’t always have incentives for accurate prediction
◮ calibration comparisons of TWC, AccuWeather, and local forecasts are quite different;
local forecasts do a lot worse
◮ a much stronger “wet bias” in local forecasts–perception of accuracy
more important than accuracy
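Calibration means that when a forecaster says "20% chance of rain", it rains about 20% of the time. A minimal sketch of a calibration check, using a made-up forecast record (the data and function name are hypothetical):

```python
from collections import defaultdict


def calibration_table(forecasts, outcomes):
    """Map each stated probability to the observed frequency of the event."""
    hits = defaultdict(int)
    counts = defaultdict(int)
    for prob, happened in zip(forecasts, outcomes):
        counts[prob] += 1
        hits[prob] += int(happened)
    return {prob: hits[prob] / counts[prob] for prob in sorted(counts)}


# Hypothetical record: 20 days with a stated 20% chance of rain, 10 days with 70%.
forecasts = [0.2] * 20 + [0.7] * 10
outcomes = [1] * 1 + [0] * 19 + [1] * 7 + [0] * 3  # 1 = it rained

table = calibration_table(forecasts, outcomes)
for prob, freq in table.items():
    print(f"forecast {prob:.0%}: rained {freq:.0%} of the time")
```

In this invented record the 70% forecasts are well calibrated, while the 20% forecasts overstate rain, which is the pattern a "wet bias" would produce.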

how to drown in 3 feet of water: listen to an economist

◮ economists bad at communicating uncertainty of forecasts
◮ a study of the Survey of Professional Forecasters: GDP growth fell outside the
prediction interval 50% of the time
◮ biased toward overconfidence
◮ why are they so not awesome?
◮ cause/effect are often reversed: nothing is predictive over time
◮ dynamic system: things that matter change
◮ economic data is very noisy
◮ like weather: dynamic, uncertain initial conditions
◮ biased forecasts are rational: no skin in the game–no market for
accurate forecast
◮ difference b/w meaningful and “dumb data” forecast: Jan
Hatzius/ECRI (196)

bayesian reasoning: how to be less wrong

◮ studied judgement and a meaningful model
◮ model of dgp–i.e. theory (this means you, economists)
◮ statistical model that accounts for uncertainty (not just
measurement error) in parameters
◮ Bayes’ Rule: for any event

\[ p(A \mid B) = \frac{p(B \mid A)\, p(A)}{p(B)} \tag{1} \]

where p denotes probability
◮ $\text{posterior} = \frac{\text{conditional} \times \text{prior}}{\text{marginal}}$
◮ does not require metaphysical uncertainty, only epistemological
uncertainty

bayes’ rule: a simple example

◮ example given in Silver (245):
◮ prevalence of breast cancer for women in their 40’s: .014
◮ p(test = 1|cancer = 0) = .10
◮ p(test = 1|cancer = 1) = .75
◮ 42 year old woman, positive test, what is probability of cancer?
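◮ $p(\text{cancer}=1 \mid \text{test}=1) = \frac{p(\text{test}=1 \mid \text{cancer}=1)\, p(\text{cancer}=1)}{p(\text{test}=1 \mid \text{cancer}=1)\, p(\text{cancer}=1) + p(\text{test}=1 \mid \text{cancer}=0)\, p(\text{cancer}=0)}$
◮ $p(\text{cancer}=1 \mid \text{test}=1) = \frac{.75 \times .014}{.75 \times .014 + .10 \times .986} = .096$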

◮ with this example: low prior (for this population); relatively low
sensitivity (true positive); relatively high false positive
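The arithmetic for this example is short enough to check in code. A minimal sketch using the three numbers from the slide; the function and argument names are illustrative:

```python
def posterior(prior: float, p_pos_given_true: float, p_pos_given_false: float) -> float:
    """Bayes' rule for a binary hypothesis and a positive test result."""
    numerator = p_pos_given_true * prior
    # Marginal probability of a positive test: true positives plus false positives.
    marginal = numerator + p_pos_given_false * (1.0 - prior)
    return numerator / marginal


p_cancer = posterior(prior=0.014, p_pos_given_true=0.75, p_pos_given_false=0.10)
print(f"p(cancer = 1 | test = 1) = {p_cancer:.3f}")
```

Most positive tests come from the large cancer-free group, which is why the posterior stays below 10%.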

bayes and fisher: statistics smackdown!
◮ Bayesian vs. Fisherian (frequentist) methods–fear of the
“subjectivity” of the prior
◮ really interesting history McGrayne, The Theory that Would Not
Die
◮ Air France flight 447, June 2009, not found in 18 months

◮ Metron, US consulting firm, using bayesian search method, 7 days
◮ Silver: frequentist methods keep the researcher hermetically sealed off
from the world: “discourage the researcher from considering the
underlying context or plausibility of his hypothesis, something that
Bayesian method demands in the form of the prior probability” (253)
◮ Fisher late in his life argued against research showing that smoking
caused lung cancer, arguing that lung cancer caused smoking
◮ many reasons for this, among them insistence of “objective purity”
of the experiment
◮ prediction is inherently subjective, but can be made rigorous
◮ can be less irrational, less subjective, less wrong
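The idea behind a Bayesian search can be sketched in a few lines: keep a probability for each search cell, and when a cell is searched without success, shift probability toward the others. This is a textbook-style illustration with made-up numbers, not a description of the method Metron actually used; all names are hypothetical.

```python
def update_after_failed_search(prior, searched, p_detect):
    """Posterior over cells after searching cell `searched` and finding nothing.

    `p_detect` is the chance of finding the object if it really is in that cell.
    """
    # Likelihood of "nothing found": (1 - p_detect) in the searched cell, 1 elsewhere.
    unnormalized = [
        p * (1.0 - p_detect) if i == searched else p
        for i, p in enumerate(prior)
    ]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


prior = [0.5, 0.3, 0.2]  # hypothetical prior over three search cells
after = update_after_failed_search(prior, searched=0, p_detect=0.8)
print([round(p, 3) for p in after])
```

A failed search does not rule a cell out. It lowers that cell's probability and raises the rest, which tells the searcher where to look next.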

examples

◮ chess: Deep Blue vs. Kasparov (chess is “Bayesian process”)
◮ poker: getting a few things right can go a long way (the Pareto
principle)
◮ this is true of other disciplines
◮ right data
◮ right technology
◮ right incentives–i.e. you need to care about accuracy
◮ when a field is competitive, then a lot of extra effort is needed
to get to the margins of profitability
◮ Silver was lucky in baseball, political forecasting, poker
◮ people have begun to copy 538 blog; copied PECOTA; never really
made it in poker

a trip to bayesland
◮ in bayesland, you walk around with your prediction about events on
a signboard: Obama re-elected, Lindsay Lohan re-arrested, Nadal wins
Wimbledon, ...
◮ if you meet someone with different estimates, either: 1. come to
consensus, or 2. place a bet
◮ this kind of thinking is crucial to decision theory, which represents
the rigorous application of probability to all decisions
◮ the primary question: how to avoid being a sure loser (Kadane,
Principles of Uncertainty )
◮ pay close attention to consensus forecasts–no free lunch in efficient
markets–trying to bet against this will bring trouble
◮ usually only works when people are forecasting independently, not
based on one another’s forecasts.
◮ also may not work when you are using other people’s money–which
is almost always true today–Abacus GS–herd behavior
◮ bayesian reasoning will not get you out of a bubble if all of your
prior information is unreliable (GIGO)
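The "sure loser" idea can be made concrete with a small betting example. A minimal sketch, assuming tickets that pay 1 unit if an event occurs and that you are willing to buy at prices equal to your stated probabilities (the function name and numbers are hypothetical):

```python
def sure_loss(price_event: float, price_complement: float, stake: float = 1.0) -> float:
    """Guaranteed loss from buying tickets on both an event and its complement.

    Exactly one of the two tickets pays `stake`, whichever way things turn out.
    """
    cost = stake * (price_event + price_complement)
    payout = stake
    return cost - payout


incoherent = sure_loss(0.6, 0.6)  # stated probabilities sum to 1.2
coherent = sure_loss(0.6, 0.4)    # stated probabilities sum to 1.0

print(f"incoherent beliefs lose {incoherent:.2f} for sure")
print(f"coherent beliefs lose {coherent:.2f} for sure")
```

Beliefs that obey the rules of probability cannot be turned into a guaranteed loss this way, though they can still lose any individual bet.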

bayes’ rule: single/multi-parameter models

◮ this can be applied to estimating parameters for any statistical
model
◮ from Bayesian perspective, parameters are random variables, not
fixed quantities we try to estimate
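◮ for a set of parameters θ

\[ p(\theta \mid \text{data}) = \frac{p(\text{data} \mid \theta)\, p(\theta)}{p(\text{data})} \tag{2} \]
\[ p(\theta \mid \text{data}) \propto p(\text{data} \mid \theta)\, p(\theta) \]
posterior ∝ likelihood × prior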
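The proportionality says that p(data) is only a normalizing constant, so a posterior can be approximated by multiplying likelihood and prior on a grid and rescaling. A minimal sketch for a single proportion, assuming hypothetical data (3 successes in 10 trials) and a flat prior:

```python
def grid_posterior(y, n, prior, grid_size=1001):
    """Normalize likelihood * prior over an evenly spaced grid of p values."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    # Binomial likelihood up to a constant; the constant cancels when normalizing.
    unnormalized = [(p ** y) * ((1.0 - p) ** (n - y)) * prior(p) for p in grid]
    total = sum(unnormalized)  # plays the role of p(data)
    weights = [u / total for u in unnormalized]
    return grid, weights


grid, weights = grid_posterior(y=3, n=10, prior=lambda p: 1.0)
posterior_mean = sum(p * w for p, w in zip(grid, weights))
print(f"posterior mean of p: {posterior_mean:.3f}")
```

With a flat prior the exact posterior mean is (3 + 1) / (10 + 2), so the grid result can be checked against that value.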

bayes’ rule: a canonical example

◮ canonical example: estimation of a population proportion, p
◮ let $(X_1, X_2, \ldots, X_n)$ be independent binary variables
◮ let $y = \sum_i X_i \sim \mathrm{BR}(n, p)$ (binomial)
◮ goal: estimate posterior of p, $f(p \mid y) = \frac{f(y \mid p)\, f(p)}{f(y)}$
◮ prior for p: Be(A, B) (to make this simple, this is a conjugate form)
◮ $f(p \mid y) = \frac{\Gamma(n+A+B)}{\Gamma(y+A)\,\Gamma(n-y+B)}\, p^{(y+A)-1} (1-p)^{n-y+B-1}$
◮ Bayesian (posterior mean): $\hat{p} = \frac{y+A}{A+B+n}$; ML: $\hat{p} = \frac{y}{n}$
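Because the Beta prior is conjugate, the update is just addition. A minimal sketch, assuming hypothetical data (30 successes in 100 trials) and a hypothetical Be(2, 2) prior:

```python
def beta_binomial_update(y, n, a, b):
    """Posterior Beta parameters for y successes in n trials under a Be(a, b) prior."""
    return a + y, b + n - y


y, n = 30, 100   # hypothetical data
a, b = 2.0, 2.0  # hypothetical prior

post_a, post_b = beta_binomial_update(y, n, a, b)
bayes_estimate = post_a / (post_a + post_b)  # equals (y + A) / (A + B + n)
ml_estimate = y / n

print(f"posterior: Be({post_a:g}, {post_b:g})")
print(f"Bayes: {bayes_estimate:.4f}, ML: {ml_estimate:.4f}")
```

The prior acts like A + B extra observations, so the Bayes estimate is pulled slightly toward the prior mean and the pull fades as n grows.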

bayes’ rule: a canonical example

[Figure: Bayes and MLE Estimates of Proportion. Curves: MLE, Bayes; x-axis: p; y-axis: density]

bayes’ rule: an empirical example

◮ chorus of people saying we should restrict SNAP benefits: no sugar-sweetened beverages (SSBs)
◮ take simple counterexamples: cigarettes and alcohol
◮ goal: estimate posterior of µ for SNAP participants and
non-participants
◮ likelihood: Poisson; prior for µ: G(α, β) (to make this simple, this is
a conjugate form)
◮ $f(\mu \mid y) = \frac{\beta'^{\alpha'}}{\Gamma(\alpha')}\, \mu^{\alpha'-1} \exp(-\beta' \mu)$
◮ $\alpha' = \alpha + \sum_i y_i$; $\beta' = \beta + n$
◮ data: NHIS 2011
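The Gamma-Poisson update follows the same pattern as the Beta-binomial one. A minimal sketch with made-up counts and a made-up weak prior (the NHIS data are not reproduced here, and the function name is illustrative):

```python
def gamma_poisson_update(counts, alpha, beta):
    """Posterior Gamma(alpha', beta') for a Poisson mean, with beta as a rate."""
    return alpha + sum(counts), beta + len(counts)


counts = [0, 5, 10, 20, 0, 3]  # hypothetical cigarettes per day
alpha, beta = 1.0, 0.1         # hypothetical weak prior

alpha_post, beta_post = gamma_poisson_update(counts, alpha, beta)
posterior_mean = alpha_post / beta_post
ml_estimate = sum(counts) / len(counts)

print(f"posterior: G({alpha_post:g}, {beta_post:g})")
print(f"Bayes: {posterior_mean:.3f}, ML: {ml_estimate:.3f}")
```

With a weak prior the posterior mean sits close to the sample mean, which is the ML estimate for a Poisson model.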


[Figure: Cigarette Consumption and SNAP Participation. Density of cigarettes per day; panels: SNAP Participants, Non-Participants]


[Figure: Alcohol Consumption and SNAP Participation. Density of drinks per day; panels: SNAP Participants, Non-Participants]


[Figure: Posterior Mean of # of Cigarettes/Day: Gamma-Poisson. Posterior density vs. cigarettes per day; panels: non-participants, snap participants]


[Figure: Posterior Mean of # of Drinks/Day: Gamma-Poisson. Posterior density vs. drinks per day; panels: non-participants, snap participants]


Table: ML and Bayes’ Estimates of Mean

             No SNAP          SNAP
             ML     Bayes     ML     Bayes     N
cigarettes   6.31   6.34      5.89   5.86      2,737
drinks       1.85   1.85      1.05   1.07      5,613

◮ GLM (µ = x_i β) with Poisson likelihood requires MCMC; no conjugate
analysis possible
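What "requires MCMC" looks like in practice can be sketched with a random-walk Metropolis sampler. This is an illustration under stated assumptions, not the analysis behind the slides: it assumes a log link, one binary covariate, wide normal priors on the coefficients, and made-up data. All names are hypothetical.

```python
import math
import random


def log_posterior(beta, xs, ys, prior_sd=10.0):
    """Log posterior (up to a constant) for Poisson regression with a log link."""
    b0, b1 = beta
    log_lik = 0.0
    for x, y in zip(xs, ys):
        eta = b0 + b1 * x                   # linear predictor, log of the mean
        log_lik += y * eta - math.exp(eta)  # Poisson log-likelihood without log(y!)
    log_prior = -(b0 ** 2 + b1 ** 2) / (2.0 * prior_sd ** 2)
    return log_lik + log_prior


def metropolis(xs, ys, n_iter=5000, step=0.1, seed=0):
    """Random-walk Metropolis draws of (intercept, slope)."""
    rng = random.Random(seed)
    current = (0.0, 0.0)
    current_lp = log_posterior(current, xs, ys)
    draws = []
    for _ in range(n_iter):
        proposal = (current[0] + rng.gauss(0.0, step),
                    current[1] + rng.gauss(0.0, step))
        proposal_lp = log_posterior(proposal, xs, ys)
        # Accept with probability min(1, posterior ratio); 1 - random() avoids log(0).
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        draws.append(current)
    return draws


# Hypothetical data: group 0 averages 1.5 per day, group 1 averages 4.0 per day.
xs = [0] * 40 + [1] * 40
ys = [1, 2, 1, 2] * 10 + [3, 5, 4, 4] * 10

draws = metropolis(xs, ys)
kept = draws[1000:]  # discard burn-in
b1_mean = sum(b1 for _, b1 in kept) / len(kept)
print(f"posterior mean of slope: {b1_mean:.2f}")
print(f"log ratio of group means: {math.log(4.0 / 1.5):.2f}")
```

With this much data and such wide priors, the posterior mean of the slope should land near the log ratio of the two group means, which is also the ML answer.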

what’s the frequency, Nate? (i.e., so?)
◮ Silver: prior is a statement of assumptions–clearing of the air
◮ bayesian reasoning is about being careful about understanding and
declaring one’s model
◮ would Countrywide have been able to declare a prior on home
price trends that excluded declines (< 0)? if they had, would it have made a
difference?
◮ even with relatively small N, the “answers” are the same
◮ for econometricians (very different point of view from statisticians),
question is whether the different interpretations are different enough to
warrant the extra work
◮ probably don’t need bayes’ rule to say: be careful of your priors,
check your model for
◮ how well it works–calibration
◮ how robust it is to alternative assumptions

◮ where it matters: likelihood doesn’t converge, missing data
(includes latent variables)
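A robustness check against alternative priors is easy to run in the conjugate case. A minimal sketch, assuming hypothetical data with 30% successes at every sample size and three hypothetical Beta priors:

```python
def posterior_mean(y, n, a, b):
    """Posterior mean of a proportion under a Be(a, b) prior."""
    return (y + a) / (a + b + n)


# Hypothetical priors with different centers and strengths.
priors = {"flat": (1.0, 1.0), "centered": (5.0, 5.0), "skeptical": (2.0, 8.0)}

spreads = {}
for n in (10, 100, 1000):
    y = round(0.3 * n)  # hypothetical data: 30% successes
    means = [posterior_mean(y, n, a, b) for a, b in priors.values()]
    spreads[n] = max(means) - min(means)
    print(n, [round(m, 3) for m in means], f"spread = {spreads[n]:.3f}")
```

The disagreement between priors shrinks quickly as N grows, which is one way to see why the ML and Bayes answers end up so close in the examples above.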
