This document describes a study analyzing polling data related to Brexit to determine if the outcome was predictable. It involved collecting 261 polling observations from 2013-2016 from 15 pollsters. Three approaches to analyzing the data were considered: only using the latest poll, combining all polls, and using a weighted non-parametric approach. The best model used a local polynomial regression that accounted for trends over time and uncertainty in undecided voters. This analysis found that while the outcome was close, the polling data predicted Brexit was more likely than not and was not a surprising result.
2. INTRODUCTION
OUTLINE
BACKGROUND
What is Brexit?
Why is the result surprising?
Goal of this project
Literature Review
Previous Study of Brexit
PRELIMINARY ANALYSIS
Data Description
LATEST POLL
COMBINED POLL
MAIN ANALYSIS: Non-Parametric Approach
Track Public Opinion Over Time (NP)
Comparing the result with previous study
Prediction
Maximum Uncertainty Analysis
CONCLUSION
Summary and Recommendation
Answering the research question of this project
4. On 23rd June 2016, the UK held a
referendum to decide whether
to STAY in or LEAVE the EU.
The result of the referendum favoured
the exit from the EU.
INTRODUCTION Background Main Analysis
Preliminary Analysis Conclusion
5. The result is considered surprising
Brexit has a great impact on policy, the economy, and business.
Many economists, sociologists, and statisticians around the world failed to predict Brexit.
Large pollsters in the UK failed to forecast the final vote.
6. Main Aim of This Project
Assess whether Brexit was evident from the polls.
Is Brexit really an unpredictable event?
Was Brexit surprising based on poll data?
Client's need: build a model whose estimated shares add up to one.
8. General Background
Which method should be used to work on polling data?
What is the common approach?
9. LITERATURE REVIEW
Because poll data ahead of voting day are not updated on a consistent basis, it is unclear how new and old data should be incorporated. Should we use only the latest poll? Should we use all the historical polls?
Campbell Weight 1999: using a regression model to forecast is a popular way of working on poll data.
Brown, LB & Chappel 2014: linear regression may not always be feasible on poll data.
Kernel Estimation
Prediction and Estimation
10. LITERATURE REVIEW
Fabio Ceili 2016
1. Opinion polls are now easily accessible via Internet and phone.
2. To understand the outcome of an election, consider:
- Opinion trend
- Historical voter behaviour
- Accessibility and sampling
11. LITERATURE REVIEW
Christensen 2008: need for a different assimilation method when working on poll data
Poll data are not updated on a consistent basis:
1. How should new and old data be incorporated?
2. Should we use only the latest poll?
3. Should we use all the historical polls?
12. LITERATURE REVIEW
Christensen 2008: need for a different assimilation method when working on poll data
Poll data are not updated on a consistent basis.
We consider 3 ways to compute and compare the outcome probabilities:
LATEST POLL, COMBINED POLLS, WEIGHTED POLLS
13. PREVIOUS STUDY
Many previous studies tried to forecast whether Brexit would happen or not. They:
- Use only the latest survey data
- Are based on descriptive statistics
- Consider only point estimates
14. Previous Studies:
Fry, John. A Statistical Reaction to Brexit. Stat Views, 2016.
Wright, G. and Rowe, G. A statistical prediction of the Scottish referendum. Statistics Views Website, Wiley, 2014.
$s_i = \alpha + \beta t_i$
$l_i = \gamma + \delta t_i$
Future response $L^*$ to be observed at a value of $t^*$:
$L^* = \gamma + \delta t^*$
With 95% CI:
$\gamma + \delta t^* \pm t_{0.025}\,\sigma \sqrt{1 + \frac{1}{n} + \frac{(t^* - \bar{t})^2}{SS_{tt}}}$
where $\bar{t}$ is the mean of the $t_i$ and $SS_{tt} = \sum_i (t_i - \bar{t})^2$.
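A minimal sketch of the linear-regression prediction interval used in the previous studies, in plain Python. The function name, the toy data, and the default critical value are assumptions for illustration: when `t_crit` is not supplied, a normal quantile stands in for $t_{0.025}$, which is only reasonable for large $n$.

```python
import math
from statistics import NormalDist

def linear_prediction_interval(t, y, t_star, level=0.95, t_crit=None):
    """Least-squares line y = intercept + slope * t, with a prediction
    interval for a new observation at t_star."""
    n = len(t)
    t_bar = sum(t) / n
    y_bar = sum(y) / n
    ss_tt = sum((ti - t_bar) ** 2 for ti in t)
    ss_ty = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    slope = ss_ty / ss_tt
    intercept = y_bar - slope * t_bar
    # Residual standard error with n - 2 degrees of freedom
    resid = [yi - (intercept + slope * ti) for ti, yi in zip(t, y)]
    sigma = math.sqrt(sum(r * r for r in resid) / (n - 2))
    if t_crit is None:
        # Assumption: normal quantile as a large-sample stand-in for t_{0.025}
        t_crit = NormalDist().inv_cdf(0.5 + level / 2)
    half = t_crit * sigma * math.sqrt(1 + 1 / n + (t_star - t_bar) ** 2 / ss_tt)
    centre = intercept + slope * t_star
    return centre, centre - half, centre + half

# Toy Leave shares at five time points (made-up numbers, not the study data)
point, lower, upper = linear_prediction_interval(
    [0, 1, 2, 3, 4], [0.40, 0.42, 0.41, 0.44, 0.45], t_star=5)
```

For a small sample, pass the exact Student-t critical value as `t_crit` instead of relying on the default.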
15. This gives an estimate of ONLY 48.7% voting for Brexit
16. John Fry proceeds to claim that the
Brexit referendum was not predictable,
as the (linear-regression) prediction
would ultimately turn out to be wrong.
18. DATA DESCRIPTION
We use only 261 observations,
ranging from 2013 to 2016.
The data come from 15 pollsters.
19. DATA DESCRIPTION
An adjustment is needed so that S + L + U = 1: projection of a point onto a plane.
23. DATA DESCRIPTION
Consider the plane (p) with equation
$ax + by + cz + d = 0$
and a point $M(s, l, u)$.
We want the adjusted point to satisfy
$s' + l' + u' = 1$
which is the plane with $a = b = c = 1$ and $d = -1$, i.e. $ax + by + cz - 1 = 0$.
The projection of $M$ onto (p) is
$s' = s + a t_0$
$l' = l + b t_0$
$u' = u + c t_0$
where
$t_0 = -\frac{as + bl + cu + d}{a^2 + b^2 + c^2}$
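A short sketch of this projection in Python, assuming the plane $s + l + u = 1$ (so the defaults $a = b = c = 1$, $d = -1$). The function name and the example shares are hypothetical.

```python
def project_onto_plane(s, l, u, a=1.0, b=1.0, c=1.0, d=-1.0):
    """Orthogonal projection of M(s, l, u) onto the plane ax + by + cz + d = 0."""
    t0 = -(a * s + b * l + c * u + d) / (a * a + b * b + c * c)
    return s + a * t0, l + b * t0, u + c * t0

# Made-up poll whose rounded shares sum to 1.01
adjusted = project_onto_plane(0.44, 0.45, 0.12)
```

With the default plane, the excess (here 0.01) is removed in equal parts from the three shares.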
24. LATEST POLL
Consider only the polls taken on 22nd June 2016.
There were 5 pollsters.
26. LATEST POLL
A point estimate alone is not enough.
27. LATEST POLL
We calculated 95% confidence intervals with 3 different methods:
- Wald
- Clopper-Pearson
- Agresti-Coull
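The three interval methods can be sketched with the Python standard library only. This is an illustration under assumptions: the function names and the example counts (520 Leave answers out of 1000) are made up, and the Clopper-Pearson bounds are found by bisection on the binomial tail probabilities rather than through a beta quantile function.

```python
import math
from statistics import NormalDist

def wald_ci(x, n, level=0.95):
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = x / n
    half = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)

def agresti_coull_ci(x, n, level=0.95):
    # Wald-type interval around an adjusted proportion
    z = NormalDist().inv_cdf(0.5 + level / 2)
    n_adj = n + z * z
    p_adj = (x + z * z / 2) / n_adj
    half = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - half), min(1.0, p_adj + half)

def _binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with each term computed in log space."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    p = min(max(p, 1e-15), 1 - 1e-15)
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(1.0, total)

def _bisect(move_right, iters=60):
    # move_right(p) is True while p is still below the bound we are looking for
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if move_right(mid):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson_ci(x, n, level=0.95):
    alpha = 1 - level
    # Lower bound solves P(X >= x) = alpha/2; upper bound solves P(X <= x) = alpha/2
    lower = 0.0 if x == 0 else _bisect(lambda p: 1 - _binom_cdf(x - 1, n, p) < alpha / 2)
    upper = 1.0 if x == n else _bisect(lambda p: _binom_cdf(x, n, p) > alpha / 2)
    return lower, upper

# Hypothetical poll: 520 Leave answers among 1000 decided respondents
wald = wald_ci(520, 1000)
ac = agresti_coull_ci(520, 1000)
cp = clopper_pearson_ci(520, 1000)
```

At this sample size the three intervals are close to each other. They differ more for small samples or shares near 0 or 1.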
30. COMBINED POLL
Combine all previous polls up to the present time and treat them as a single sample,
weighting only by sample size.
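As a rough sketch, pooling by sample size amounts to a weighted average of the poll shares. The function name and the three polls below are made up for illustration.

```python
def pooled_share(polls):
    """polls: list of (share, sample_size). Returns the pooled share and total n."""
    total_n = sum(n for _, n in polls)
    pooled = sum(share * n for share, n in polls) / total_n
    return pooled, total_n

# Three hypothetical polls: (Leave share, sample size)
pooled, total_n = pooled_share([(0.50, 1000), (0.44, 2000), (0.47, 1000)])
```

Treating the pooled data as one sample assumes that opinion did not change between polls.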
34. COMBINED POLLS
How accurate is the assumption that the opinion
of surveyed respondents does not change over time?
Is it true that time has no effect on public opinion
about the EU referendum?
35. COMBINED POLL
We need to know the pattern of the undecided.
We need to see the trend over time.
39. Early 2013: Leave
2014 to mid-2015: Stay
2016: Leave rising quickly
The data distribution is not parametric.
46. Nadaraya-Watson kernel estimator
We are looking for the kernel and the bandwidth that minimise the error, chosen by cross-validation.
The motivation starts by considering the sum of squared errors.
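A minimal sketch of a Nadaraya-Watson estimator with leave-one-out cross-validation over candidate bandwidths. It assumes a Gaussian kernel, whereas the study also compares kernels. The function names, the toy series, and the candidate grid are illustrative only.

```python
import math

def gaussian_kernel(z):
    # The normalising constant cancels in the Nadaraya-Watson ratio
    return math.exp(-0.5 * z * z)

def nw_estimate(t0, t, y, h, skip=None):
    """Kernel-weighted average of y at t0; `skip` leaves one observation out."""
    num = den = 0.0
    for i, (ti, yi) in enumerate(zip(t, y)):
        if i == skip:
            continue
        w = gaussian_kernel((t0 - ti) / h)
        num += w * yi
        den += w
    return num / den

def loocv_error(t, y, h):
    """Mean squared leave-one-out prediction error for bandwidth h."""
    errors = [(yi - nw_estimate(ti, t, y, h, skip=i)) ** 2
              for i, (ti, yi) in enumerate(zip(t, y))]
    return sum(errors) / len(errors)

def choose_bandwidth(t, y, candidates):
    return min(candidates, key=lambda h: loocv_error(t, y, h))

# Toy series of Leave shares over ten time points (made-up numbers)
t = list(range(10))
y = [0.46, 0.45, 0.43, 0.42, 0.41, 0.42, 0.43, 0.45, 0.47, 0.49]
candidates = [0.5, 1.0, 2.0, 4.0]
best_h = choose_bandwidth(t, y, candidates)
```

Leaving each point out before predicting it keeps the criterion from favouring a bandwidth so small that the curve simply interpolates the data.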
49. We have confirmed our initial guess:
public opinion shows trends over time.
NEXT
Use the pre-referendum polling data to predict the actual
referendum result.
50. Prediction
Nadaraya-Watson kernel estimators suffer from 2 biases:
1. boundary bias
(a bias near the endpoints of the $t_i$)
2. design bias
(a bias that depends on the distribution of
the $t_i$)
Alternative: Local Polynomial Regression
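To illustrate the boundary bias, the sketch below compares a Nadaraya-Watson estimate with a local linear fit (local polynomial of degree 1) at the last time point of an exactly linear toy series. The Gaussian kernel, the bandwidth, and the data are assumptions for illustration, not the fitted model of the study.

```python
import math

def _weights(t0, t, h):
    return [math.exp(-0.5 * ((ti - t0) / h) ** 2) for ti in t]

def nw_estimate(t0, t, y, h):
    """Local constant (Nadaraya-Watson) estimate at t0."""
    w = _weights(t0, t, h)
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

def local_linear(t0, t, y, h):
    """Weighted least squares fit of y ~ b0 + b1 * (t - t0); returns b0."""
    w = _weights(t0, t, h)
    s0 = sum(w)
    s1 = sum(wi * (ti - t0) for wi, ti in zip(w, t))
    s2 = sum(wi * (ti - t0) ** 2 for wi, ti in zip(w, t))
    r0 = sum(wi * yi for wi, yi in zip(w, y))
    r1 = sum(wi * (ti - t0) * yi for wi, ti, yi in zip(w, t, y))
    # Solve the 2x2 normal equations for the intercept
    return (s2 * r0 - s1 * r1) / (s0 * s2 - s1 * s1)

# Exactly linear upward trend: the true value at t = 9 is 0.49
t = list(range(10))
y = [0.40 + 0.01 * ti for ti in t]
nw_at_end = nw_estimate(9, t, y, h=2.0)
ll_at_end = local_linear(9, t, y, h=2.0)
```

At the right endpoint the kernel average can only draw on earlier, lower values, so it falls below 0.49, while the local linear fit recovers the trend.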
56. LOCAL POLYNOMIAL PREDICTION on 23rd June 2016
Point Estimate of Prediction for Brexit Result
95% Confidence Interval of Prediction for Brexit Result
57. PREDICTION
The model does predict the outcome of the Brexit referendum.
It suggests that Brexit is more likely to happen than not.
The prediction intervals for Leave and Stay overlap.
This finding suggests that Brexit is not a shocking result.
It shows that it would be a very close call.
58. Outcome Comparison with Previous Study
Compare our local polynomial regression model with the simple linear regression model.
61. Maximum Uncertainty Analysis
• The number of Undecided is still quite high.
• The gap between Leave and Stay is quite narrow, so a last-minute swing from the undecided
may change the result of the referendum.
• The Undecided in the polls clearly create big uncertainty in this case. For this reason,
we perform a maximum uncertainty analysis.
Example: 40% Stay, 50% Leave, and 10% Undecided
becomes 45% Stay and 55% Leave when the Undecided are split 50:50.
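The reallocation in this example can be sketched as follows. The function name and the `stay_fraction` parameter are hypothetical, and the 50:50 default mirrors the split used in the analysis.

```python
def reallocate_undecided(stay, leave, undecided, stay_fraction=0.5):
    """Split the undecided share between Stay and Leave."""
    new_stay = stay + undecided * stay_fraction
    new_leave = leave + undecided * (1 - stay_fraction)
    return new_stay, new_leave

# 40% Stay, 50% Leave, 10% Undecided, with the Undecided split 50:50
new_stay, new_leave = reallocate_undecided(0.40, 0.50, 0.10)
```

Other values of `stay_fraction` can be used to check how far the Undecided would have to lean towards Stay to change the outcome.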
64. MAXIMUM UNCERTAINTY
Undecided shared 50:50
The model predicts that Leave would win the referendum.
CI ranges have become narrower.
CI overlap between Leave and Stay has reduced.
69. Conclusion & Recommendation
Brexit is actually not a surprising result. The prediction based on poll data shows
that it would be a very close call. The historical poll data can still be trusted.
Why were others wrong?
- Relied only on point estimates
- Descriptive analysis
- Latest poll result only
- Ignored the Undecided
Here are some answers
- Consider the trend over time
- Consolidate with CIs
- Use statistical inference that fits our case
- Work on uncertainty
70. Caveats and Further Improvement
The pollsters used relatively similar sampling methodologies.
Some assumptions may not be valid.
A further study could review the robustness of the sampling methodology.
Consider studying what changed in early 2016: why did public opinion change significantly?