The purpose of this project is to analyze the 2016 US presidential primary election data to
predict the final nominee of both the Democratic and Republican parties
and to draw further insights from the data.
2. Table of Contents
Overview of Dataset
Objectives
Tools Used
Methodology
Analysis & Findings
Assumptions
Prediction
Conclusions
Bibliography
3. Overview of Dataset
The dataset was obtained from the Kaggle website
The dataset contains relevant data for the 2016 US Presidential Elections,
including results of primary elections
The dataset consisted of 4 files in CSV and ZIP format, including:
County_facts- demographic data on counties from US census
County_facts_dictionary- description of columns of County_facts
Primary_results- vote data, including the number of votes received
by each candidate in each county
4. Objectives
Understanding the primary elections and key terms
Number of candidates who took part in the primary elections from each party
Most popular candidate for each party by state and by demographic group
Differences in the number of votes by party and candidate in each state
Analysing the non-swing states (looking at trends from the previous 5 election years)
Understanding the general elections and key terms
Calculating the number of electoral votes for final presidential nominees
Prediction of the next President of the United States of America
Predictions and models
Popularity of each candidate on the basis of Twitter sentiment analysis
Performance comparison of the various tools utilized
6. Methodology
Obtain dataset from Kaggle.com
Explore the data to find what it is all about
Understand the US primary elections
Defining objectives
Modifying, cleaning and transformation of data in RStudio
Writing the modified dataset into a csv file
Carrying out different types of analysis on the modified data to draw insights using different tools and visualizations
Understand the US general elections
Make certain assumptions in order to predict the next president
Do qualitative & quantitative analysis keeping in mind the assumptions made to find out the next president
Supporting our answer with the help of certain mathematical models
Twitter sentiment analysis to find the popularity of final presidential nominees
Comparison of performance of tools used for analysis
Drawing conclusions
8. Understanding the US Primary Election and key terms
Key Terms
National Conventions
Primary
Closed primary
Open primary
New Hampshire Primary
Caucus
Iowa Caucuses
Delegates
Pledged Delegates
Super Delegates
9. Number of candidates who took part in the primary elections from each party
Based on the dataset, a total of 14 candidates from the two parties took part in the primary elections:
Democratic Party
Hillary Clinton
Bernie Sanders
Martin O’Malley
Republican Party
Ben Carson
Carly Fiorina
Chris Christie
Donald Trump
Jeb Bush
John Kasich
Marco Rubio
Mike Huckabee
Rand Paul
Ted Cruz
Rick Santorum
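The count of candidates per party could be reproduced with a short script. The following is a minimal sketch, not the project's actual code: it assumes Primary_results has `party` and `candidate` columns, and the sample rows (state names, county names, and vote counts) are made up purely for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpt in the ASSUMED layout of Primary_results.
# Column names, places, and vote counts are illustrative, not real data.
SAMPLE = """state,county,party,candidate,votes
StateA,County1,Democrat,Hillary Clinton,100
StateA,County1,Democrat,Bernie Sanders,80
StateA,County1,Republican,Donald Trump,120
StateA,County2,Republican,Ted Cruz,90
StateA,County2,Republican,Donald Trump,60
"""


def candidates_by_party(csv_file):
    """Return {party: set of candidate names} from a primary-results file."""
    parties = defaultdict(set)
    for row in csv.DictReader(csv_file):
        # A set keeps each candidate once, however many counties they appear in.
        parties[row["party"]].add(row["candidate"])
    return dict(parties)


by_party = candidates_by_party(io.StringIO(SAMPLE))
for party, names in sorted(by_party.items()):
    print(party, len(names), sorted(names))
```

To run this against the real file, `io.StringIO(SAMPLE)` would be replaced by an opened CSV file, with the column names adjusted to match it.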
15. 2012 ELECTIONS TO SEE THE NON-SWING STATES AND COMPARE THEM WITH THIS YEAR'S ELECTIONS
16. Understanding the general election and key terms
Key terms
Electoral College
Electors
Swing states
17. Calculating the number of electors
The number of electors differs for each state
The number of electors is calculated from the number of congressional districts in each state plus its senate members, of which there are two for every state
The more districts a state has, the more electors it has
Electors are the people who choose the president of the United States
In most states, the electors vote for the nominee who won the popular vote in that state
California: 53 districts + 2 senate members = 55 electors
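The rule for counting electors can be written as a one-line calculation. This is a minimal sketch; the function and constant names are illustrative and do not come from the project.

```python
# Every state has two senate members, regardless of size.
SENATORS_PER_STATE = 2


def electors(districts):
    """Electors for a state = number of districts + its two senate members."""
    return districts + SENATORS_PER_STATE


# California example from the slide: 53 districts + 2 senate members.
california = electors(53)
print(california)  # 55
```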
18. Prediction of the next president of the United States
As data for the general election was not available, certain assumptions were made:
The conditions and the number of votes cast in the upcoming general election would be similar to those in the primary elections
Therefore, the primary election data was analysed to draw predictive insights
Qualitative analysis and current affairs were used to make predictions
Two different predictions were made, one on the basis of party and the other on the basis of final presidential nominee
The predictions are supported by mathematical models developed by professors in the relevant fields
An assumption was made about how the votes of candidates who quit or suspended their campaigns would be divided
19. Predictions and models
On the basis of party, the Republican Party received the most electoral votes,
leading to a win for Donald Trump
If we consider only the candidates and ignore the parties, there are
two phases:
1st Phase- Winner Hillary Clinton
2nd Phase- Winner Donald Trump
Mathematical models that support our answer include econometric models
such as the DeSart Model (Jay DeSart), the Fair Model (Ray Fair), the Primary Model (Helmut
Norpoth), and the Electoral Cycle Model (Helmut Norpoth), among others.
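The party-based tally can be illustrated as follows. This is only a sketch under stated assumptions, not the project's actual method: it assumes each state's electors all go to the party with the most primary votes in that state (winner-take-all), and the state names, vote totals, and elector counts are hypothetical.

```python
from collections import Counter


def electoral_tally(state_votes, state_electors):
    """Award each state's electors to the party with the most votes there.

    ASSUMPTION: winner-take-all in every state. If two parties tie,
    max() picks whichever appears first in the state's dictionary.
    """
    tally = Counter()
    for state, votes in state_votes.items():
        winner = max(votes, key=votes.get)
        tally[winner] += state_electors[state]
    return tally


# Hypothetical inputs for illustration only, not real election data.
state_votes = {
    "StateA": {"Democrat": 500, "Republican": 450},
    "StateB": {"Democrat": 300, "Republican": 700},
    "StateC": {"Democrat": 200, "Republican": 260},
}
state_electors = {"StateA": 10, "StateB": 6, "StateC": 8}

tally = electoral_tally(state_votes, state_electors)
print(tally.most_common())
```

With these made-up inputs, one party wins two smaller states and overtakes the party that won the largest state, which is why the tally is done per state rather than on the national vote total.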
21. Performance of various tools utilized
We carried out a similar analysis in both R and Python. Based on
our data and skills, we reached the following conclusions:
Parameter                            R        Python
Number of lines of code (average)    145      85
RAM usage                            88%      66%
Average processing time (minutes)    8-10     4-7
Ease of coding                       Easy     Moderate
Number of packages used              22-25    4-6
22. Conclusion
According to our analysis, the prediction depends mainly on how votes are
cast in the swing states and on how the votes of Ted Cruz's
Republican supporters are divided, as he has declined to endorse his fellow Republican
Donald Trump.