1. Forecasting the 2012 Presidential Election
from History and the Polls
Drew Linzer
Assistant Professor
Emory University
Department of Political Science
Visiting Assistant Professor, 2012-13
Stanford University
Center on Democracy, Development,
and the Rule of Law
votamatic.org
Bay Area R Users Group
February 12, 2013
4. The 2012 Presidential Election: Obama 332–Romney 206
But also: Nerds 1–Pundits 0
Analyst forecasts based on history and the polls
Drew Linzer, Emory University 332-206
Simon Jackman, Stanford University 332-206
Josh Putnam, Davidson College 332-206
Nate Silver, New York Times 332-206
Sam Wang, Princeton University 303-235
Pundit forecasts based on intuition and gut instinct
Karl Rove, Fox News 259-279
Newt Gingrich, Republican politician 223-315
Michael Barone, Washington Examiner 223-315
George Will, Washington Post 217-321
Steve Forbes, Forbes Magazine 217-321
7. What we want: Accurate forecasts as early as possible
The problem:
• The data that are available early aren’t accurate:
Fundamental variables (economy, approval, incumbency)
• The data that are accurate aren’t available early:
Late-campaign state-level public opinion polls
• The polls contain sampling error, house effects, and most
states aren’t even polled on most days
The solution:
• A statistical model that uses what we know about presidential
campaigns to update forecasts from the polls in real time
What do we know?
8. 1. The fundamentals predict national outcomes, noisily
Election year economic growth
Source: U.S. Bureau of Economic Analysis
9. 1. The fundamentals predict national outcomes, noisily
Presidential approval, June
Source: Gallup
10. 2. State vote outcomes swing (mostly) in tandem
Source: New York Times
11. 3. Polls are accurate on Election Day; maybe not before
Figure: Florida, Obama 2008. Obama vote share in the polls (axis 40 to 60), May to Nov, with the actual outcome marked.
Source: HuffPost-Pollster
12. 4. Voter preferences evolve in similar ways across states
Figure: four panels (Florida, Virginia, Ohio, Colorado), Obama 2008. Obama vote share in the polls (axis 40 to 60), May to Nov.
Source: HuffPost-Pollster
13. 5. Voters have short term reactions to big campaign events
Source: Tom Holbrook, UW-Milwaukee
14. All together: A forecasting model that learns from the polls
Publicly available state polls during the campaign
Figure: cumulative number of polls fielded (axis 0 to 2000) by months prior to Election Day (12 to 0), for 2008 and 2012.
Forecasts weight fundamentals ←→ Forecasts weight polls
Source: HuffPost-Pollster
18. First, create a baseline forecast of each state outcome
Abramowitz Time-for-Change regression makes a national forecast:
Incumbent vote share = 51.5 + 0.6 Q2 GDP growth
+ 0.1 June net approval
− 4.3 In office two+ terms
Predicted Obama 2012 vote = 51.5 + 0.6 (1.3)
+ 0.1 (-0.8)
− 4.3 (0)
Predicted Obama 2012 vote = 52.2%
Use uniform swing assumption to translate to the state level:
Subtract 1.5 percentage points from Obama's 2008 vote share in each state
Make this a Bayesian prior over the final state outcomes
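The arithmetic on this slide can be sketched in a few lines of Python. This is only an illustration: the coefficients and 2012 inputs come from the slide, while the 2008 national share of 53.7 is an assumption implied by the 52.2% forecast and the 1.5-point swing, and the state names and 2008 state shares are hypothetical placeholders rather than real results.

```python
# Time-for-Change coefficients as shown on the slide.
INTERCEPT = 51.5
B_GDP = 0.6          # per point of Q2 GDP growth
B_APPROVAL = 0.1     # per point of June net approval
B_TWO_TERMS = -4.3   # penalty if the party has held office two+ terms


def time_for_change(q2_gdp_growth, june_net_approval, two_plus_terms):
    """National incumbent-party vote share (percent) from the regression."""
    return (INTERCEPT
            + B_GDP * q2_gdp_growth
            + B_APPROVAL * june_net_approval
            + B_TWO_TERMS * (1 if two_plus_terms else 0))


# 2012 inputs from the slide: 1.3 growth, -0.8 net approval, first term.
national_2012 = time_for_change(1.3, -0.8, False)

# Assumption: Obama's 2008 national share, chosen to match the slide's
# 1.5-point swing (52.2 - 53.7 = -1.5).
NATIONAL_2008 = 53.7
swing = national_2012 - NATIONAL_2008

# Hypothetical 2008 state vote shares, for illustration only.
state_2008 = {"State A": 51.0, "State B": 57.5, "State C": 45.0}

# Uniform swing: every state moves by the same national amount.
baseline_2012 = {state: share + swing for state, share in state_2008.items()}

for state, share in baseline_2012.items():
    print(f"{state}: baseline forecast {share:.1f}%")
```

Each baseline value would then serve as the center of the prior for that state's Election Day outcome.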
19. Combine polls across days and states to estimate trends
Figure: poll results and estimated trends in Obama vote share (axis 44 to 56), May to Nov 2012. Left panel: a state with many polls (Florida). Right panel: a state with fewer polls (Oregon).
21. Combine with baseline forecasts to guide future projections
Figure: Florida, Obama 2012. Poll results and projected Obama vote share (axis 44 to 56), May to Nov. Left panel: random walk (no). Right panel: mean reversion.
Forecasts compromise between history and the polls
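One simple way to see how such a compromise behaves is a normal-normal update, in which the baseline forecast and the poll average are weighted by their precisions. This is a simplified stand-in, not the model used in the talk (which works on the logit scale with a binomial likelihood), and all numbers below are hypothetical.

```python
def combine(prior_mean, prior_sd, poll_mean, poll_sd):
    """Precision-weighted average of a baseline forecast and a poll average.

    Returns the posterior mean and standard deviation under a
    normal prior and a normal likelihood.
    """
    prior_prec = 1.0 / prior_sd ** 2
    poll_prec = 1.0 / poll_sd ** 2
    post_prec = prior_prec + poll_prec
    post_mean = (prior_prec * prior_mean + poll_prec * poll_mean) / post_prec
    return post_mean, post_prec ** -0.5


# Hypothetical state: baseline says 50, polls say 47.
# Early in the campaign there are few polls, so the poll average is noisy.
early = combine(prior_mean=50.0, prior_sd=3.0, poll_mean=47.0, poll_sd=4.0)

# Late in the campaign there are many polls, so the poll average is precise.
late = combine(prior_mean=50.0, prior_sd=3.0, poll_mean=47.0, poll_sd=1.0)

print(f"early: mean {early[0]:.2f}, sd {early[1]:.2f}")
print(f"late:  mean {late[0]:.2f}, sd {late[1]:.2f}")
```

With noisy polls the estimate stays nearer the baseline; as poll precision grows it moves toward the polls and its uncertainty shrinks.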
23. A dynamic Bayesian forecasting model
Model specification
$y_k \sim \mathrm{Binomial}(\pi_{i[k]j[k]}, n_k)$: number of people preferring the Democrat in survey $k$, in state $i$, on day $j$
$\pi_{ij} = \mathrm{logit}^{-1}(\beta_{ij} + \delta_j)$: proportion reporting support for the Democrat in state $i$ on day $j$
National effects: $\delta_j$
State components: $\beta_{ij}$
Estimated for all states simultaneously
Election forecasts: $\hat{\pi}_{iJ}$
Priors
$\beta_{iJ} \sim N(\mathrm{logit}(h_i), \tau_i)$: informative prior on Election Day, using historical predictions $h_i$, precisions $\tau_i$
$\delta_J \equiv 0$: polls assumed accurate, on average
$\beta_{ij} \sim N(\beta_{i(j+1)}, \sigma_\beta^2)$: reverse random walk, states
$\delta_j \sim N(\delta_{(j+1)}, \sigma_\delta^2)$: reverse random walk, national
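To make the specification concrete, the sketch below simulates data forward from it: it draws the Election Day state component from its prior, walks both components backward in time, and samples one poll. It does not perform the posterior inference the model requires, and the function names, state labels, and all parameter values are hypothetical choices for illustration.

```python
import math
import random


def logit(p):
    return math.log(p / (1.0 - p))


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def simulate_support(h, tau, sigma_beta, sigma_delta, n_days, rng):
    """Draw pi[state][j] for days j = 0..n_days-1 (last index = Election Day)."""
    last = n_days - 1
    # National effect: fixed at 0 on Election Day, reverse random walk before.
    delta = [0.0] * n_days
    for j in range(last - 1, -1, -1):
        delta[j] = rng.gauss(delta[j + 1], sigma_delta)
    pi = {}
    for state, h_i in h.items():
        beta = [0.0] * n_days
        # Election Day prior; tau is a precision, so sd = 1 / sqrt(tau).
        beta[last] = rng.gauss(logit(h_i), 1.0 / math.sqrt(tau[state]))
        # State component: reverse random walk from Election Day.
        for j in range(last - 1, -1, -1):
            beta[j] = rng.gauss(beta[j + 1], sigma_beta)
        pi[state] = [inv_logit(b + d) for b, d in zip(beta, delta)]
    return pi


def simulate_poll(p, n, rng):
    """Binomial draw: respondents (out of n) preferring the Democrat."""
    return sum(rng.random() < p for _ in range(n))


rng = random.Random(2012)
# Hypothetical baseline forecasts h_i and prior precisions tau_i.
h = {"A": 0.50, "B": 0.56}
tau = {"A": 400.0, "B": 400.0}
pi = simulate_support(h, tau, sigma_beta=0.02, sigma_delta=0.02,
                      n_days=180, rng=rng)
y = simulate_poll(pi["A"][90], n=800, rng=rng)
print(f"State A, day 90: support {pi['A'][90]:.3f}, poll result {y}/800")
```

Because the national effect is shared, every state's simulated trend moves together day to day, while the state components let each drift toward its own Election Day value.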
25. Results: Anchoring to the fundamentals stabilizes forecasts
Figure: forecast electoral votes (axis 150 to 400), Jul to Nov 2012, marked OBAMA 332 and ROMNEY 206.
26. There were almost no surprises in 2012
On Election Day, average error = 1.7%
Why didn’t the model improve forecasts by more?
29. The fundamentals and uniform swing were right on target
Figure: state election outcomes. 2012 Obama vote against 2008 Obama vote for each state (both axes 30 to 70), with the 2012 = 2008 line.
31. Could the model have done better? Yes
Figure: difference between actual and predicted vote outcomes. Election Day forecast error for each state (axis −6 to 6; positive values mean Obama performed better than expected, negative values mean Romney performed better than expected) against the number of polls after May 1, 2012 (axis 0 to 100).
32. Forecasting is only one of many applications for the model
1 Who’s going to win?
2 Which states are going to be competitive?
3 What are current voter preferences in each state?
4 How much does opinion fluctuate during a campaign?
5 What effect does campaign news/activity have on opinion?
6 Are changes in preferences primarily national or local?
7 How useful are historical factors vs. polls for forecasting?
8 How early can accurate forecasts be made?
9 Were some survey firms biased in one direction or the other?