On The Signal and the Noise: Why So Many Predictions Fail, But Some Don’t

             Christian Gregory


             February 12, 2013
declaring my priors




   Am I an unbiased estimator?
who is Nate Silver?



     ◮   developed PECOTA, the Player Empirical Comparison and
         Optimization Test Algorithm
     ◮   predict performance of pitchers, expanded to hitters
     ◮   purchased by Baseball Prospectus
     ◮   2003-2008 matched or bettered commercial baseball forecasting
         systems, Vegas over-under lines
     ◮   538 blog–political predictions for NYT
     ◮   ex-poker player
     ◮   Out magazine’s person of the year, 2012
motivation


    ◮   two contexts: “big data” and failure of prediction
    ◮   “big data”: IBM–2.5 quintillion bytes of data per day
    ◮   failures of prediction
             banking crisis/housing bubble
             tv pundits
             baseball
             weather ...
    ◮   test of theory–falsifiability (Popper)
    ◮   complex theories difficult to falsify–Popper: they have little value
    ◮   predictions about simple things make us suspect complex models
    ◮   Silver: nonetheless, these theories have value–new attitude toward
        prediction: Bayes’ Rule
today




    ◮   go through some of Silver’s examples–failures, successes
    ◮   outline solution: Bayes’ Rule
    ◮   limitations: example using “fairly big” data
    ◮   conclusion
example: housing bubble, bank crash, great recession


   actor: rating agencies
     ◮   S&P, Moody’s, other rating agencies got it very wrong
     ◮   predicted default on AAA-rated CDO’s: .12%; actual: 28%
      ◮   why did they do such a poor modeling job?
           1. rating agencies part of a legal oligopoly
           2. many pension funds require rating by Moody’s/S&P to buy
              any piece of debt
           3. Moody’s would rate debt “structured by cows”
           4. no financial incentive for predictions to have good frequentist
              properties
           5. all incentives were toward giving “bubble ratings”
example: housing bubble, bank crash, great recession


     ◮   what were problems with models?
          1. assume that P(default) for mortgages bundled in CDO’s are
             uncorrelated
          2. confuse risk and uncertainty
                ◮   risk – quantifiable likelihood of upside or downside
                    event–something you can put a price on; the point of
                    decision-making is to reduce the risk of being a sure loser
                ◮   uncertainty – requires assigning distribution to
                    unknowns/unobservables
          3. rating agencies “spun uncertainty into what looked and felt
             like risk” (29)
          4. result is to produce precise but wholly inaccurate predictions
             (courtesy “big data”)
example: housing bubble, bank crash, con’t


   other actors: homeowners, consumers, banks, economists/policymakers
      ◮   homeowners: in a 2003 survey, believed housing prices would rise
          13% per year (over the 100 years ending in 1996: < 1% per year)
     ◮   banks: leverage
           ◮   Bear Stearns bail out: overnight repo market frozen due to
               change in value of collateral
            ◮   Lehman Bros leverage ratio 33:1; a decline of 3% in financial
                positions → negative equity, bankruptcy
     ◮   consumers: leverage. wages stagnant, treat homes as ATMs
     ◮   economists (Romer, Summers, etc.): overestimate the soundness of
         financial system, underestimate ripple effects in economy
         (unemployment)
all of these things are just like the others



     ◮   what do forecasting failures have in common?
     ◮   they all failed to take into account a critical piece of context
      ◮   example: confidence in rising housing prices stems from trends
          in recent prices, but housing prices had never risen this
          rapidly before
     ◮   in short, events forecasters were considering were out of
         sample
     ◮   so, courtesy “big data”, the forecasts (especially of banks and
         rating agencies) were very precise (big N), but wildly
         inaccurate
punditry: or the political entertainment media complex



     ◮   political pundits basically get it right about 50% of the time
     ◮   this is not awesome: you could flip a coin and do about as
         well as any political pundit on tv
     ◮   two kinds of predictors: hedgehogs and foxes (Philip Tetlock:
         psychology and political science)
           ◮   foxes: “scrappy creatures who believe in a plethora of little
               ideas and in taking a multitude of approaches toward a
               problem” (53)
           ◮   hedgehogs: type A personalities that believe in Big Ideas–in
               governing principles that act as physical laws
foxes and hedgehogs




                           Table : foxes vs. hedgehogs
                    foxes                    hedgehogs
                    multidisciplinary        specialized
                    adaptable                stalwart
                    self-critical            stubborn
                    tolerant of complexity   order-seeking
                    cautious                 confident
                    empirical                ideological
                    better forecasters       better tv guests

   obviously (?), it’s better to be foxy
other examples


     ◮   earthquakes: prediction essentially impossible (though it is possible
         to do better than some forecasters)
    ◮   weather: real success story in the last 30 years
     ◮   two challenges
           ◮   dynamic system (good understanding of the laws of motion)
           ◮   non-linear (exponential changes)
           ◮   ⇒ small changes in the decimal places have big effects on
               outputs (toy sketch after this list)
     ◮   in weather, forecasters don’t always have incentives for accurate
         prediction
     ◮   calibration comparisons of TWC, AccuWeather, and local forecasts are
         quite different; local forecasters do a lot worse
     ◮   a much stronger “wet bias” in local forecasts–perception of accuracy
         matters more than accuracy itself
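
A toy illustration (mine, not from the book) of the nonlinearity bullet above: in the chaotic logistic map, two starting values differing only in the fourth decimal place end up far apart within a few dozen steps.

    # Toy sketch of sensitivity to initial conditions (logistic map),
    # the kind of nonlinearity that makes weather hard to forecast.
    def logistic(x, r=3.9, steps=30):
        for _ in range(steps):
            x = r * x * (1 - x)   # chaotic for r near 4
        return x

    print(logistic(0.2000))   # two starts differing in the 4th decimal...
    print(logistic(0.2001))   # ...land far apart after 30 steps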
how to drown in 3 feet of water: listen to an economist

     ◮   economists bad at communicating uncertainty of forecasts
      ◮   a study of the Survey of Professional Forecasters: GDP growth fell
          outside the prediction interval 50% of the time
     ◮   biased toward overconfidence
     ◮   why are they so not awesome?
           ◮   cause/effect are often reversed: nothing is predictive over time
           ◮   dynamic system: things that matter change
           ◮   economic data is very noisy
           ◮   like weather: dynamic, uncertain initial conditions
     ◮   biased forecasts are rational: no skin in the game–no market for
         accurate forecast
     ◮   difference b/w meaningful and “dumb data” forecast: Jan
         Hatzius/ECRI (196)
bayesian reasoning: how to be less wrong


    ◮   studied judgement and a meaningful model
           ◮   model of the DGP (data-generating process)–i.e., theory (this
               means you, economists)
          ◮   statistical model that accounts for uncertainty (not just
              measurement error) in parameters
    ◮   Bayes’ Rule: for any events A and B,

                          p(A|B) = p(B|A) ∗ p(A) / p(B)                    (1)

        where p denotes probability
    ◮   posterior = (conditional ∗ prior) / marginal
    ◮   does not require metaphysical uncertainty, only epistemological
        uncertainty
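
As a minimal sketch (mine, not Silver’s), equation (1) is one line of code; the three inputs are the conditional, the prior, and the marginal.

    # Sketch of Bayes' Rule for a single event:
    # posterior = conditional * prior / marginal.
    def bayes_rule(p_b_given_a, p_a, p_b):
        """Return p(A|B) given p(B|A), p(A), and p(B)."""
        return p_b_given_a * p_a / p_b

    print(bayes_rule(0.75, 0.5, 0.6))   # hypothetical numbers: 0.625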
bayes’ rule: a simple example



     ◮   example given in Silver (245):
           ◮   prevalence of breast cancer for women in their 40’s: .014
           ◮   p(test = 1|cancer = 0) = .10
           ◮   p(test = 1|cancer = 1) = .75
           ◮   42 year old woman, positive test, what is probability of cancer?
            ◮   p(cancer = 1|test = 1) =
                [p(test = 1|cancer = 1) ∗ p(cancer = 1)] /
                [p(test = 1|cancer = 1) ∗ p(cancer = 1) + p(test = 1|cancer = 0) ∗ p(cancer = 0)]
            ◮   p(cancer = 1|test = 1) = (.75 ∗ .014)/(.75 ∗ .014 + .10 ∗ .986) ≈ .096

      ◮   with this example: a low prior (for this population); relatively low
          sensitivity (true-positive rate); a relatively high false-positive rate
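
The arithmetic can be checked with a short sketch; the inputs are the ones Silver reports (245), and the marginal p(test = 1) comes from the law of total probability.

    # Silver's breast-cancer example (245), as a sketch.
    prior = 0.014       # p(cancer = 1) for women in their 40s
    sens = 0.75         # p(test = 1 | cancer = 1), true-positive rate
    false_pos = 0.10    # p(test = 1 | cancer = 0), false-positive rate

    # marginal p(test = 1), by the law of total probability
    p_test = sens * prior + false_pos * (1 - prior)
    print(round(sens * prior / p_test, 3))   # posterior: 0.096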
bayes and fisher: statistics smackdown!
    ◮   Bayesian vs. Fisherian (frequentist) methods–fear of the
        “subjectivity” of the prior
    ◮   really interesting history McGrayne, The Theory that Would Not
        Die
           ◮   Air France flight 447, June 2009: wreckage not found in 18
               months of searching
           ◮   Metron, a US consulting firm, using a Bayesian search method:
               7 days
     ◮   Silver: frequentist methods keep the researcher hermetically sealed
         off from the world: they “discourage the researcher from considering
         the underlying context or plausibility of his hypothesis, something
         that the Bayesian method demands in the form of the prior
         probability” (253)
     ◮   Fisher late in his life argued against research showing that smoking
         caused lung cancer, arguing instead that lung cancer caused smoking
     ◮   many reasons for this, among them an insistence on the “objective
         purity” of the experiment
     ◮   prediction is inherently subjective, but can be made rigorous
    ◮   can be less irrational, less subjective, less wrong
examples


    ◮   chess: Deep Blue vs. Kasparov (chess is “Bayesian process”)
    ◮   poker: getting a few things right can go a long way (the Pareto
        principle)
    ◮   this is true of other disciplines
          ◮   right data
          ◮   right technology
          ◮   right incentives–i.e. you need to care about accuracy
          ◮   when a field is competitive, then a lot of extra effort is needed
              to get to the margins of profitability
    ◮   Silver was lucky in baseball, political forecasting, poker
     ◮   people have begun to copy the 538 blog; PECOTA was copied; he never
         really made it in poker
a trip to bayesland
     ◮   in bayesland, you walk around with your prediction about events on
         a signboard: obama re-elected, lindsey lohan re-arrested, nadal wins
         wimbledon, ...
     ◮   if you meet someone with different estimates, either: 1. come to
         consensus, or 2. place a bet
      ◮   this kind of thinking is crucial to decision theory, which represents
          the rigorous application of probability to all decisions
     ◮   the primary question: how to avoid being a sure loser (Kadane,
         Principles of Uncertainty )
      ◮   pay close attention to consensus forecasts–no free lunch in efficient
          markets–trying to bet against them will bring trouble
      ◮   consensus usually only works when people forecast independently, not
          based on one another’s forecasts
      ◮   it also may not work when you are using other people’s money–which
          is almost always true today (cf. Goldman Sachs’ Abacus)–herd behavior
     ◮   bayesian reasoning will not get you out of a bubble if all of your
         prior information is unreliable (GIGO)
bayes’ rule: single/multi-parameter models



     ◮   this can be applied to estimating parameters for any statistical
         model
     ◮   from Bayesian perspective, parameters are random variables, not
         fixed quantities we try to estimate
      ◮   for a set of parameters θ,

                         p(θ|data) = p(data|θ) ∗ p(θ) / p(data)             (2)
                         p(θ|data) ∝ p(data|θ) ∗ p(θ)
                          posterior ∝ likelihood ∗ prior
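
One hedged illustration of equation (2): evaluate likelihood × prior on a grid of candidate θ values and normalize. The flat prior and the data (6 successes in 20 binary trials) are made up for the sketch.

    import numpy as np

    # Sketch: p(theta|data) ∝ p(data|theta) * p(theta) on a grid.
    theta = np.linspace(0.001, 0.999, 999)   # candidate parameter values
    prior = np.ones_like(theta)              # flat prior (an assumption)
    y, n = 6, 20                             # hypothetical data
    likelihood = theta**y * (1 - theta)**(n - y)

    unnorm = likelihood * prior
    posterior = unnorm / unnorm.sum()        # normalizing = discretized p(data)
    print(theta[posterior.argmax()])         # posterior mode ≈ y/n = 0.3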
bayes’ rule: a canonical example



     ◮   canonical example: estimation of a population proportion, p
      ◮   let (X1 , X2 , . . . , Xn ) be independent binary variables
      ◮   let y = Σi Xi ∼ Bin(n, p)
      ◮   goal: estimate the posterior of p, f(p|y) = f(y|p) ∗ f(p) / f(y)
      ◮   prior for p: Be(A, B) (to make this simple, this is a conjugate form)
      ◮   f(p|y) = [Γ(n+A+B) / (Γ(y+A) Γ(n−y+B))] ∗ p^(y+A−1) ∗ (1 − p)^(n−y+B−1)
      ◮   Bayesian estimate: p̂ = (y+A)/(A+B+n); ML estimate: p̂ = y/n
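
A sketch of the conjugate update above, with hypothetical A, B, y, and n; the Bayesian estimate shrinks the ML estimate toward the prior mean A/(A+B).

    # Conjugate Beta-Binomial update (hypothetical numbers).
    A, B = 2, 2     # Be(A, B) prior
    y, n = 15, 60   # observed successes and trials

    post_A, post_B = y + A, n - y + B    # posterior is Be(y+A, n-y+B)
    bayes = post_A / (post_A + post_B)   # (y+A)/(A+B+n) ≈ 0.266
    mle = y / n                          # 0.25
    print(bayes, mle)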
bayes’ rule: a canonical example



   [plot omitted: MLE and Bayes posterior densities of p, roughly 0.15–0.50
   on the x-axis]

                 Figure : Bayes and MLE Estimates of Proportion
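
A sketch of how a figure like this one can be drawn, assuming the Be(y+A, n−y+B) posterior from the previous slide and a normal approximation to the sampling distribution of the MLE (all numbers hypothetical).

    import numpy as np
    from scipy.stats import beta, norm
    import matplotlib.pyplot as plt

    A, B, y, n = 2, 2, 15, 60
    p = np.linspace(0.05, 0.55, 400)
    mle = y / n
    se = np.sqrt(mle * (1 - mle) / n)   # standard error of the MLE

    plt.plot(p, norm.pdf(p, mle, se), label="MLE")
    plt.plot(p, beta.pdf(p, y + A, n - y + B), label="Bayes")
    plt.xlabel("p"); plt.ylabel("density"); plt.legend()
    plt.show()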
bayes’ rule: an empirical example



      ◮   chorus of people saying we should restrict SNAP benefits: no SSBs
          (sugar-sweetened beverages)
      ◮   take simple counterexamples: cigarettes and alcohol
      ◮   goal: estimate the posterior of µ for SNAP participants and
          non-participants
      ◮   likelihood: Poisson; prior for µ: G(α, β) (to make this simple, this
          is a conjugate form)
      ◮   f(µ|y) = [β′^α′ / Γ(α′)] ∗ µ^(α′−1) ∗ exp(−β′ µ)
      ◮   α′ = α + Σi yi ; β′ = β + n
     ◮   data: NHIS 2011
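
A sketch of the Gamma-Poisson update above; the counts are simulated stand-ins, not the NHIS data.

    import numpy as np

    # Conjugate Gamma-Poisson update: G(alpha, beta) prior on mu,
    # Poisson counts y_i, posterior G(alpha + sum(y), beta + n).
    rng = np.random.default_rng(0)
    y = rng.poisson(6.3, size=500)   # simulated cigarettes/day counts

    alpha, beta = 1.0, 0.1           # weak prior (an assumption)
    alpha_post = alpha + y.sum()
    beta_post = beta + len(y)
    print(alpha_post / beta_post)    # posterior mean of mu ≈ y.mean()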
bayes’ rule: an empirical example


   [histograms omitted: density of cigarettes per day (0–60), one panel for
   SNAP participants and one for non-participants]

          Figure : Cigarette Consumption and SNAP Participation
bayes’ rule: an empirical example


   [histograms omitted: density of drinks per day, panels for SNAP
   participants (0–30) and non-participants (0–80)]

          Figure : Alcohol Consumption and SNAP Participation
bayes’ rule: an empirical example

   [plots omitted: posterior densities of mean cigarettes per day;
   non-participants centered near 6.3, SNAP participants near 5.9]

       Figure : Posterior Mean of # of Cigarettes/Day: Gamma-Poisson
bayes’ rule: an empirical example

   [plots omitted: posterior densities of mean drinks per day;
   non-participants centered near 1.85, SNAP participants near 1.05]

        Figure : Posterior Mean of # of Drinks/Day: Gamma-Poisson
bayes’ rule: an empirical example




                  Table : ML and Bayes Estimates of the Mean

                                No SNAP          SNAP
                               ML    Bayes     ML    Bayes       N
                  cigarettes   6.31  6.34      5.89  5.86     2,737
                  drinks       1.85  1.85      1.05  1.07     5,613

      ◮   a GLM (µ = exp(xi′ β)) with a Poisson likelihood requires MCMC; no
          conjugate analysis is possible (sketch below)
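
A sketch of that non-conjugate case using PyMC (an assumed choice of library; the data, priors, and variable names are illustrative, not the analysis behind the table).

    import numpy as np
    import pymc as pm

    # Hypothetical data: one covariate x and Poisson counts y.
    rng = np.random.default_rng(0)
    x = rng.normal(size=200)
    y = rng.poisson(np.exp(1.0 + 0.3 * x))

    # Poisson GLM with log link, mu_i = exp(b0 + b1*x_i); fit by MCMC.
    with pm.Model():
        b0 = pm.Normal("b0", 0, 10)
        b1 = pm.Normal("b1", 0, 10)
        pm.Poisson("y_obs", mu=pm.math.exp(b0 + b1 * x), observed=y)
        trace = pm.sample(1000, tune=1000)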
what’s the frequency, Nate? (i.e., so?)
     ◮   Silver: prior is a statement of assumptions–clearing of the air
     ◮   bayesian reasoning is about being careful about understanding and
         declaring one’s model
      ◮   would Countrywide have been able to declare a prior on home-price
          trends that excluded declines (< 0)? if they had, would it have
          made a difference?
     ◮   even with relatively small N, the “answers” are the same
      ◮   for econometricians (a very different point of view from
          statisticians), the question is whether the differing interpretations
          are different enough to warrant the extra work
     ◮   probably don’t need bayes’ rule to say: be careful of your priors,
         check your model for
            ◮ how well it works–calibration
            ◮ how robust it is to alternative assumptions

      ◮   where it really matters: when the likelihood doesn’t converge, and
          with missing data (including latent variables)
