1. On The Signal and the Noise: Why So Many Predictions Fail, But Some Don’t
   Christian Gregory
   February 12, 2013
2. declaring my priors
   Am I an unbiased estimator?
3. who is Nate Silver?
   ◮ developed PECOTA, the Player Empirical Comparison and Optimization Test Algorithm
   ◮ predicts performance of pitchers, expanded to hitters
   ◮ purchased by Baseball Prospectus
   ◮ 2003-2008: matched or bettered commercial baseball forecasting systems and Vegas over-under lines
   ◮ 538 blog–political predictions for the NYT
   ◮ ex-poker player
   ◮ Out magazine’s person of the year, 2012
4. motivation
   ◮ two contexts: “big data” and failure of prediction
   ◮ “big data”: IBM–2.5 quintillion bytes of data per day
   ◮ failures of prediction: banking crisis/housing bubble, tv pundits, baseball, weather, ...
   ◮ test of theory–falsifiability (Popper)
   ◮ complex theories difficult to falsify–Popper: they have little value
   ◮ predictions about simple things make us suspect complex models
   ◮ Silver: nonetheless, these theories have value–new attitude toward prediction: Bayes’ Rule
5. today
   ◮ go through some of Silver’s examples–failures, successes
   ◮ outline solution: Bayes’ Rule
   ◮ limitations: example using “fairly big” data
   ◮ conclusion
6. example: housing bubble, bank crash, great recession
   actor: rating agencies
   ◮ S&P, Moody’s, and other rating agencies got it very wrong
   ◮ predicted default rate on AAA-rated CDOs: .12%; actual: 28%
   ◮ why did they do such a poor modeling job?
     1. rating agencies are part of a legal oligopoly
     2. many pension funds require a rating from Moody’s/S&P to buy any piece of debt
     3. Moody’s would rate debt “structured by cows”
     4. no financial incentive for predictions to have good frequentist properties
     5. all incentives were toward giving “bubble ratings”
7. example: housing bubble, bank crash, great recession
   ◮ what were the problems with the models?
     1. assume that the P(default)s for mortgages bundled in CDOs are uncorrelated
     2. confuse risk and uncertainty
        ◮ risk – quantifiable likelihood of an upside or downside event–something you can put a price on; the point of decision-making is to reduce the risk of being a sure loser
        ◮ uncertainty – requires assigning distributions to unknowns/unobservables
     3. rating agencies “spun uncertainty into what looked and felt like risk” (29)
     4. the result is to produce precise but wholly inaccurate predictions (courtesy of “big data”)
8. example: housing bubble, bank crash, con’t
   other actors: homeowners, consumers, banks, economists/policymakers
   ◮ homeowners: 2003 survey–believe housing prices ↑ 13% per year (over the 100 years ending in 1996: < 1%)
   ◮ banks: leverage
     ◮ Bear Stearns bailout: overnight repo market frozen due to change in value of collateral
     ◮ Lehman Bros leverage ratio 33:1; a decline of 3% in financial positions → negative equity, bankruptcy
   ◮ consumers: leverage. wages stagnant, treat homes as ATMs
   ◮ economists (Romer, Summers, etc.): overestimate the soundness of the financial system, underestimate ripple effects in the economy (unemployment)
9. all of these things are just like the others
   ◮ what do forecasting failures have in common?
   ◮ they all failed to take into account a critical piece of context
   ◮ example: confidence in the ⇑ in housing prices stems from trends in recent prices, but housing prices had never risen this rapidly before
   ◮ in short, the events forecasters were considering were out of sample
   ◮ so, courtesy of “big data”, the forecasts (especially of banks and rating agencies) were very precise (big N), but wildly inaccurate
10. punditry: or the political entertainment media complex
   ◮ political pundits basically get it right about 50% of the time
   ◮ this is not awesome: you could flip a coin and do about as well as any political pundit on tv
   ◮ two kinds of predictors: hedgehogs and foxes (Philip Tetlock: psychology and political science)
   ◮ foxes: “scrappy creatures who believe in a plethora of little ideas and in taking a multitude of approaches toward a problem” (53)
   ◮ hedgehogs: type A personalities that believe in Big Ideas–in governing principles that act as physical laws
11. foxes and hedgehogs

   Table: foxes vs. hedgehogs

     foxes                     hedgehogs
     multidisciplinary         specialized
     adaptable                 stalwart
     self-critical             stubborn
     tolerant of complexity    order-seeking
     cautious                  confident
     empirical                 ideological
     better forecasters        better tv guests

   obviously (?), it’s better to be foxy
12. other examples
   ◮ earthquakes: impossible (though not impossible to do better than some)
   ◮ weather: a real success story in the last 30 years
   ◮ two challenges
     ◮ dynamic system (good understanding of laws of motion)
     ◮ non-linear (exponential changes)
   ◮ ⇒ small changes in decimal places have big effects on outputs
   ◮ in weather, forecasters don’t always have incentives for accurate prediction
   ◮ calibration comparisons: TWC, AccuWeather, and local forecasts are quite different; local forecasts do a lot worse
   ◮ a much stronger “wet bias” in local forecasts–perception of accuracy more important than accuracy
13. how to drown in 3 feet of water: listen to an economist
   ◮ economists are bad at communicating the uncertainty of forecasts
   ◮ a study of the Survey of Professional Forecasters: GDP growth fell outside the prediction interval 50% of the time
   ◮ biased toward overconfidence
   ◮ why are they so not awesome?
     ◮ cause/effect are often reversed: nothing is predictive over time
     ◮ dynamic system: the things that matter change
     ◮ economic data is very noisy
     ◮ like weather: dynamic, uncertain initial conditions
     ◮ biased forecasts are rational: no skin in the game–no market for accurate forecasts
   ◮ difference b/w meaningful and “dumb data” forecasts: Jan Hatzius/ECRI (196)
14. bayesian reasoning: how to be less wrong
   ◮ studied judgement and a meaningful model
     ◮ model of the dgp–i.e. theory (this means you, economists)
     ◮ statistical model that accounts for uncertainty (not just measurement error) in parameters
   ◮ Bayes’ Rule: for any events A and B,

       p(A|B) = p(B|A) ∗ p(A) / p(B)     (1)

     where p denotes probability
   ◮ posterior = conditional ∗ prior / marginal
   ◮ does not require metaphysical uncertainty, only epistemological uncertainty
15. bayes’ rule: a simple example
   ◮ example given in Silver (245):
     ◮ prevalence of breast cancer for women in their 40’s: .014
     ◮ p(test = 1 | cancer = 0) = .10
     ◮ p(test = 1 | cancer = 1) = .75
   ◮ a 42 year old woman has a positive test; what is the probability she has cancer?
   ◮ p(cancer = 1 | test = 1) = p(test = 1 | cancer = 1) ∗ p(cancer = 1) / [p(test = 1 | cancer = 1) ∗ p(cancer = 1) + p(test = 1 | cancer = 0) ∗ p(cancer = 0)]
   ◮ p(cancer = 1 | test = 1) = .75 ∗ .014 / (.75 ∗ .014 + .10 ∗ .986) ≈ .096
   ◮ with this example: low prior (for this population); relatively low sensitivity (true positive rate); relatively high false positive rate
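A minimal sketch of this calculation, not part of the original deck; it simply replays the numbers Silver quotes through Bayes’ Rule:

```python
# Bayes' rule with the breast-cancer screening numbers quoted above (Silver, 245).
prior = 0.014          # p(cancer = 1): prevalence for women in their 40s
sensitivity = 0.75     # p(test = 1 | cancer = 1): true positive rate
false_positive = 0.10  # p(test = 1 | cancer = 0): false positive rate

# marginal probability of a positive test
p_positive = sensitivity * prior + false_positive * (1 - prior)

# posterior probability of cancer given a positive test
posterior = sensitivity * prior / p_positive
print(round(posterior, 3))  # 0.096
```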
16. bayes and fisher: statistics smackdown!
   ◮ Bayesian vs. Fisherian (frequentist) methods–fear of the “subjectivity” of the prior
   ◮ really interesting history: McGrayne, The Theory That Would Not Die
   ◮ Air France flight 447, June 2009, not found in 18 months
   ◮ Metron, a US consulting firm, using bayesian search methods: 7 days
   ◮ Silver: frequentist methods keep the researcher hermetically sealed off from the world: they “discourage the researcher from considering the underlying context or plausibility of his hypothesis, something that Bayesian method demands in the form of the prior probability” (253)
   ◮ Fisher late in his life argued against research showing that smoking caused lung cancer, arguing that lung cancer caused smoking
     ◮ many reasons for this, among them insistence on the “objective purity” of the experiment
   ◮ prediction is inherently subjective, but can be made rigorous
   ◮ can be less irrational, less subjective, less wrong
17. examples
   ◮ chess: Deep Blue vs. Kasparov (chess is a “Bayesian process”)
   ◮ poker: getting a few things right can go a long way (the Pareto principle)
   ◮ this is true of other disciplines
     ◮ right data
     ◮ right technology
     ◮ right incentives–i.e. you need to care about accuracy
   ◮ when a field is competitive, a lot of extra effort is needed to get to the margins of profitability
   ◮ Silver was lucky in baseball, political forecasting, poker
     ◮ people have begun to copy the 538 blog; copied PECOTA; he never really made it in poker
18. a trip to bayesland
   ◮ in bayesland, you walk around with your predictions about events on a signboard: obama re-elected, lindsey lohan re-arrested, nadal wins wimbledon, ...
   ◮ if you meet someone with different estimates, you either:
     1. come to consensus, or
     2. place a bet
   ◮ this kind of thinking is crucial to decision theory, which represents the rigorous application of probability to all decisions
   ◮ the primary question: how to avoid being a sure loser (Kadane, Principles of Uncertainty)
   ◮ pay close attention to consensus forecasts–no free lunch in efficient markets–trying to bet against this will bring trouble
     ◮ this usually only works when people are forecasting independently, not based on one another’s forecasts
     ◮ it also may not work when you are using other people’s money–which is almost always true today–Abacus GS–herd behavior
   ◮ bayesian reasoning will not get you out of a bubble if all of your prior information is unreliable (GIGO)
19. bayes’ rule: single/multi-parameter models
   ◮ this can be applied to estimating the parameters of any statistical model
   ◮ from the Bayesian perspective, parameters are random variables, not fixed quantities we try to estimate
   ◮ for a set of parameters θ,

       p(θ|data) = p(data|θ) ∗ p(θ) / p(data)     (2)

       p(θ|data) ∝ p(data|θ) ∗ p(θ)

     posterior ∝ likelihood ∗ prior
20. bayes’ rule: a canonical example
   ◮ canonical example: estimation of a population proportion, p
   ◮ let (X1, X2, ..., Xn) be independent binary variables
   ◮ let y = Σi Xi ∼ Bin(n, p)
   ◮ goal: estimate the posterior of p, f(p|y) = f(y|p) ∗ f(p) / f(y)
   ◮ prior for p: Be(A, B) (to make this simple, this is a conjugate form)
   ◮ posterior: f(p|y) = [Γ(n+A+B) / (Γ(y+A) Γ(n−y+B))] p^(y+A−1) (1 − p)^(n−y+B−1), i.e. Be(y+A, n−y+B)
   ◮ Bayesian estimate: p̂ = (y+A)/(A+B+n); ML estimate: p̂ = y/n
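A brief sketch of this beta-binomial update, not from the deck; the prior parameters A, B and the data y, n are made-up illustrative values, not the ones behind the figure on the next slide:

```python
# Conjugate beta-binomial update for a population proportion p.
# A, B, n, and y are illustrative values, not taken from the presentation.
from scipy import stats

A, B = 2, 2        # Be(A, B) prior on p
n, y = 100, 30     # n binary observations, y of them equal to 1

posterior = stats.beta(y + A, n - y + B)   # posterior is Be(y + A, n - y + B)

p_ml = y / n                               # ML estimate
p_bayes = (y + A) / (A + B + n)            # posterior mean
lo, hi = posterior.ppf([0.025, 0.975])     # 95% credible interval

print(p_ml, round(p_bayes, 3), (round(lo, 3), round(hi, 3)))
```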
21. bayes’ rule: a canonical example
   [Figure: Bayes and MLE Estimates of Proportion – densities over p, labeled MLE and Bayes]
22. bayes’ rule: an empirical example
   ◮ chorus of people saying we should restrict SNAP benefits (no SSBs)
   ◮ take simple counterexamples: cigarettes and alcohol
   ◮ goal: estimate the posterior of µ for SNAP participants and non-participants
   ◮ likelihood: poisson; prior for µ: G(α, β) (to make this simple, this is a conjugate form)
   ◮ posterior: f(µ|y) = [β′^α′ / Γ(α′)] µ^(α′−1) exp(−β′µ)
   ◮ α′ = α + Σi yi; β′ = β + n
   ◮ data: NHIS 2011
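A short sketch of the gamma-Poisson update just described, not from the deck; the prior and the simulated counts are stand-ins, since the NHIS 2011 data are not reproduced here:

```python
# Conjugate gamma-Poisson update for a mean count mu (e.g., cigarettes per day).
# The G(alpha, beta) prior and the simulated counts are assumptions for this
# sketch; the slides use NHIS 2011 data, which is not included here.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y = rng.poisson(lam=6.0, size=500)     # hypothetical daily counts

alpha, beta = 1.0, 0.1                 # G(alpha, beta) prior on mu (rate parameterization)

alpha_post = alpha + y.sum()           # alpha' = alpha + sum_i y_i
beta_post = beta + len(y)              # beta'  = beta + n
posterior = stats.gamma(a=alpha_post, scale=1.0 / beta_post)  # scipy's scale = 1/rate

mu_ml = y.mean()                       # ML estimate of mu
mu_bayes = alpha_post / beta_post      # posterior mean
print(round(mu_ml, 2), round(mu_bayes, 2),
      np.round(posterior.ppf([0.025, 0.975]), 2))
```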
23. bayes’ rule: an empirical example
   [Figure: Cigarette Consumption and SNAP Participation – histograms of cigarettes per day for SNAP participants and non-participants]
24. bayes’ rule: an empirical example
   [Figure: Alcohol Consumption and SNAP Participation – histograms of drinks per day for SNAP participants and non-participants]
25. bayes’ rule: an empirical example
   [Figure: Posterior Mean of # of Cigarettes/Day: Gamma-Poisson – posterior densities for non-participants and SNAP participants]
26. bayes’ rule: an empirical example
   [Figure: Posterior Mean of # of Drinks/Day: Gamma-Poisson – posterior densities for non-participants and SNAP participants]
27. bayes’ rule: an empirical example

   Table: ML and Bayes Estimates of Mean

                  No SNAP           SNAP
                  ML      Bayes     ML      Bayes     N
    cigarettes    6.31    6.34      5.89    5.86      2,737
    drinks        1.85    1.85      1.05    1.07      5,613

   ◮ a GLM (µ = xi β) with a poisson likelihood requires MCMC; no conjugate analysis is possible (see the sketch below)
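A rough sketch of what “requires MCMC” means here: a random-walk Metropolis sampler for a Poisson regression with a log link. The simulated data, the normal priors, and the proposal scale are all assumptions made for the illustration, not choices from the presentation:

```python
# Random-walk Metropolis for a Poisson GLM with log link: log mu_i = x_i' beta.
# Simulated data, N(0, 10^2) priors, and the proposal scale are assumptions
# made for this sketch only.
import numpy as np

rng = np.random.default_rng(1)

n = 1000
x = rng.integers(0, 2, size=n)                 # e.g., a binary participation indicator
X = np.column_stack([np.ones(n), x])
beta_true = np.array([1.8, -0.5])
y = rng.poisson(np.exp(X @ beta_true))         # simulated counts

def log_post(beta):
    eta = X @ beta
    loglik = np.sum(y * eta - np.exp(eta))       # Poisson log-likelihood (up to a constant)
    logprior = -0.5 * np.sum(beta ** 2) / 100.0  # independent N(0, 10^2) priors
    return loglik + logprior

draws = np.zeros((5000, 2))
beta, lp = np.zeros(2), log_post(np.zeros(2))
for i in range(len(draws)):
    prop = beta + rng.normal(scale=0.05, size=2)   # random-walk proposal
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:       # Metropolis accept/reject
        beta, lp = prop, lp_prop
    draws[i] = beta

print(draws[1000:].mean(axis=0))                   # posterior means after burn-in
```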
28. what’s the frequency, Nate? (i.e., so?)
   ◮ Silver: the prior is a statement of assumptions–a clearing of the air
   ◮ bayesian reasoning is about being careful about understanding and declaring one’s model
   ◮ would Countrywide have been able to declare a prior on home price trends that excluded < 0? if they had, would it have made a difference?
   ◮ even with relatively small N, the “answers” are the same
   ◮ for econometricians (a very different point of view from statisticians), the question is whether the different interpretations are different enough to warrant the extra work
   ◮ you probably don’t need bayes’ rule to say: be careful of your priors, and check your model for
     ◮ how well it works–calibration
     ◮ how robust it is to alternative assumptions
   ◮ where it matters: likelihood doesn’t converge, missing data (including latent variables)