Presented at the 4th International Conference on Information Technology InCIT 2019 organised by Thai-Nichi Institute of Technology and Council of IT Deans in Thailand (CITT)
Practical Opinion Mining for Social Media - Diana Maynard
This tutorial will introduce the concepts of sentiment analysis and opinion mining from unstructured text in social media, looking at why they are useful and what tools and techniques are available. It will cover both rule-based and machine learning techniques, provide some background information on the key underlying NLP processes required, and look in detail at some of the major problems and solutions, such as detection of sarcasm, use of informal language, spam opinion detection, trustworthiness of opinion holders, and so on. The techniques will be demonstrated with real applications developed in GATE, an open-source language processing toolkit. Links are provided to some hands-on material to try at home.
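To make the rule-based side of the tutorial concrete, here is a minimal lexicon-based polarity scorer with simple negation handling. The word lists are tiny illustrative assumptions; real GATE applications use much richer lexicons and linguistic preprocessing.

```python
# Toy rule-based sentiment scorer in the spirit of lexicon-based techniques.
# The word lists below are illustrative only, not a real sentiment lexicon.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "awful", "hate", "terrible", "sad"}
NEGATORS = {"not", "never", "no"}

def sentiment(text: str) -> int:
    """Return a crude polarity score: >0 positive, <0 negative, 0 neutral."""
    score = 0
    negate = False
    for token in text.lower().split():
        word = token.strip(".,!?")
        if word in NEGATORS:
            negate = True  # flip the polarity of the next sentiment word
            continue
        hit = 1 if word in POSITIVE else (-1 if word in NEGATIVE else 0)
        score += -hit if negate else hit
        if hit:
            negate = False
    return score
```

Informal social media language (slang, elongation, sarcasm) is exactly where such naive rules break down, which motivates the machine learning techniques the tutorial also covers.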
Accelerate AI w/ Synthetic Data using GANs - Renee Yao
Presentation at the Strata Data Conference, September 2018
Description:
Synthetic data will drive the next wave of deployment and application of deep learning in the real world across a variety of problems involving speech recognition, image classification, object recognition and language. All industries and companies stand to benefit: synthetic data can create conditions through simulation instead of authentic situations (virtual worlds let you avoid the cost of damage, spare human injury, and sidestep other real-world constraints), and it gives an unparalleled ability to test products, and interactions with them, in any environment.
Join us for this introductory session to learn more about how generative adversarial networks (GANs) are successfully used to improve data generation. We will cover specific real-world examples where customers have deployed GANs to solve challenges in the healthcare, space, transportation, and retail industries.
Renee Yao explains how generative adversarial networks (GANs) are successfully used to improve data generation and explores specific real-world examples where customers have deployed GANs to solve challenges in the healthcare, space, transportation, and retail industries.
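For readers new to GANs, the adversarial objective can be sketched numerically. The toy below uses a linear "generator" and logistic "discriminator" on 1-D data just to show the two standard losses; real synthetic-data pipelines use deep networks, and all parameters here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def discriminator(x, w, b):
    """Logistic 'real vs. fake' score for scalar inputs."""
    return sigmoid(w * x + b)

def generator(z, a, c):
    """Affine map from noise to fake samples."""
    return a * z + c

# "Real" data from N(4, 1); noise from N(0, 1); arbitrary toy parameters.
real = rng.normal(4.0, 1.0, size=256)
fake = generator(rng.normal(size=256), a=1.0, c=0.0)

w, b = 1.0, -2.0
d_real = discriminator(real, w, b)
d_fake = discriminator(fake, w, b)

# Standard GAN losses: D maximizes log D(x) + log(1 - D(G(z)));
# G (in the non-saturating form) maximizes log D(G(z)).
d_loss = -np.mean(np.log(d_real) + np.log(1.0 - d_fake))
g_loss = -np.mean(np.log(d_fake))
```

Training alternates gradient steps on these two losses until the generator's samples become statistically indistinguishable from the real data, which is what makes GAN-generated synthetic data useful.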
Talk from QCon SF on 2018-11-05
For many years, the main goal of the Netflix personalized recommendation system has been to get the right titles in front of each of our members at the right time. With a catalog spanning thousands of titles and a diverse member base spanning over a hundred million accounts, recommending the titles that are just right for each member is crucial. But the job of recommendation does not end there. Why should you care about any particular title we recommend? What can we say about a new and unfamiliar title that will pique your interest? How do we convince you that a title is worth watching?

Answering these questions is critical in helping our members discover great content, especially for unfamiliar titles. One way to do this is to consider the artwork or imagery we use to visually portray each title. If the artwork representing a title captures something compelling to you, then it acts as a gateway into that title and gives you some visual “evidence” for why the title might be good for you. Selecting good artwork is important because it may be the first time a member becomes aware of a title (and sometimes the only time), so it must speak to them in a meaningful way.

In this talk, we will present an approach for personalizing the artwork we show for each title on the Netflix homepage. We will look at how to frame this as a machine learning problem using contextual multi-armed bandits in a recommendation system setting. We will also describe the algorithmic and system challenges involved in getting this type of approach for artwork personalization to succeed at Netflix scale. Finally, we will discuss some of the future opportunities that we see to expand and improve upon this approach.
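The contextual-bandit framing can be sketched in a few lines. This is a hypothetical epsilon-greedy illustration, not Netflix's production algorithm: each (context, artwork) arm keeps a running reward estimate (e.g. click-through), and the policy occasionally explores.

```python
import random
from collections import defaultdict

class EpsilonGreedyArtworkBandit:
    """Toy contextual bandit: pick an artwork for a member context."""

    def __init__(self, artworks, epsilon=0.1, seed=42):
        self.artworks = list(artworks)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)    # (context, artwork) -> plays
        self.values = defaultdict(float)  # (context, artwork) -> reward estimate

    def choose(self, context):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.artworks)              # explore
        return max(self.artworks,
                   key=lambda a: self.values[(context, a)])    # exploit

    def update(self, context, artwork, reward):
        key = (context, artwork)
        self.counts[key] += 1
        # Incremental mean of observed rewards (e.g. 1 = click, 0 = no click).
        self.values[key] += (reward - self.values[key]) / self.counts[key]
```

The context here could encode viewing history or device; production systems typically replace the per-arm mean with a learned model of reward given context features.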
Authorization for workloads in a dynamically scaling heterogeneous system - Pushpalanka Jayawardhana
Defines a design based on SPIFFE and OPA standards to address authorization between workloads residing in multiple clouds and interacting with each other.
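A sketch of the kind of check such a design enables: workloads are identified by SPIFFE IDs, and an OPA-style policy decides which caller/callee pairs may communicate. The IDs and rules below are invented for illustration; a real deployment would issue SVIDs via SPIRE and evaluate Rego policies in OPA.

```python
# Hypothetical policy table: caller SPIFFE ID -> allowed callee ID prefixes.
POLICY = {
    "spiffe://prod.example.org/frontend": {"spiffe://prod.example.org/api"},
    "spiffe://prod.example.org/api":      {"spiffe://prod.example.org/db"},
}

def is_authorized(caller_id: str, callee_id: str) -> bool:
    """Allow the call only if the policy grants caller -> callee."""
    allowed = POLICY.get(caller_id, set())
    return any(callee_id.startswith(prefix) for prefix in allowed)
```

Because identity is carried by the workload itself rather than by network location, the same check works unchanged as instances scale up and down across clouds.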
Tutorial on Deep Learning in Recommender Systems, LARS Summer School 2019 - Anoop Deoras
I had a fun time giving a tutorial on the topic of deep learning in recommender systems at the Latin America School on Recommender Systems (LARS) in Fortaleza, Brazil.
The presentation slides for a blockchain event - All About TenX Cryptopayment Technology, Lightning Network & Bitcoin Mining.
Presenter: Sun Sagong
Venue: Tenx (Singapore)
Date: 7 February 2018
Crypto wallets are digital wallets used specifically for storing digital currencies. They are essentially applications that let you access blockchain platforms and retrieve and use your crypto assets.
There are two main types of crypto wallets - hot wallets and cold wallets. Hot wallets are crypto wallets that are always connected to the internet; this type of wallet is less secure than a cold wallet. Cold wallets are crypto wallets that stay offline; typically, they serve as vaults for your important assets.
You need to choose a type of wallet based on your needs. However, to fully understand how these wallets work, you have to learn about their underlying mechanisms. Here, 101 Blockchains can greatly help you out. We have courses that specifically target your needs and help you understand the mechanism behind blockchain-based wallets.
Learn more about cryptocurrency with these courses ->
Stablecoin Fundamentals Masterclass
https://academy.101blockchains.com/courses/stablecoin-masterclass
Getting Started with Bitcoin Technology Course
https://academy.101blockchains.com/courses/getting-started-with-bitcoin-technology
We also offer certification courses for professionals. Learn more about these courses here ->
Certified Enterprise Blockchain Professional (CEBP) course
https://academy.101blockchains.com/courses/blockchain-expert-certification
Certified Enterprise Blockchain Architect (CEBA) course
https://academy.101blockchains.com/courses/certified-enterprise-blockchain-architect
Certified Blockchain Security Expert (CBSE) course
https://academy.101blockchains.com/courses/certified-blockchain-security-expert
Read our full guides on this topic ->
https://101blockchains.com/crypto-wallet-list/
https://101blockchains.com/types-of-crypto-wallets/
https://101blockchains.com/crypto-wallets/
https://101blockchains.com/paper-wallets/
https://101blockchains.com/software-wallet/
https://101blockchains.com/hot-wallet-vs-cold-wallet/
https://101blockchains.com/best-hardware-wallets/
https://101blockchains.com/blockchain-wallet/
https://101blockchains.com/best-nft-wallets/
https://101blockchains.com/top-defi-wallets/
This presentation gives a brief introduction to blockchain and proposes a unified analytical framework for trustable machine learning and automation running with blockchain.
At Netflix, we've spent a lot of time thinking about how we can make our analytics group move quickly. Netflix's Data Engineering & Analytics organization embraces the company's culture of "Freedom & Responsibility".
How does a company with a $40 billion market cap and $6 billion in annual revenue keep their data teams moving with the agility of a tiny company?
How do hundreds of data engineers and scientists make the best decisions for their projects independently, without the analytics environment devolving into chaos?
We'll talk about how Netflix equips its business intelligence and data engineers with:
the freedom to leverage cloud-based data tools - Spark, Presto, Redshift, Tableau and others - in ways that solve our most difficult data problems
the freedom to find and introduce the right software for the job - even if it isn't used anywhere else in-house
the freedom to create and drop new tables in production without approval
the freedom to choose when a question is a one-off, and when a question is asked often enough to require a self-service tool
the freedom to retire analytics and data processes whose value doesn't justify their support costs
Speaker Bios
Monisha Kanoth is a Senior Data Architect at Netflix, and was one of the founding members of the current streaming Content Analytics team. She previously worked as a big data lead at Convertro (acquired by AOL) and as a data warehouse lead at MySpace.
Jason Flittner is a Senior Business Intelligence Engineer at Netflix, focusing on data transformation, analysis, and visualization as part of the Content Data Engineering & Analytics team. He previously led the EC2 Business Intelligence team at Amazon Web Services and was a business intelligence engineer with Cisco.
Chris Stephens is a Senior Data Engineer at Netflix. He previously served as the CTO at Deep 6 Analytics, a machine learning & content analytics company in Los Angeles, and on the data warehouse teams at the FOX Audience Network and Anheuser-Busch.
Personalization - 10 Lessons Learned from Netflix - Pancrazio Auteri
Deconstructing how Netflix found success thanks to a heavily personalized user experience. After the ten findings, there is a set of checklists and examples using ContentWise showing how to apply the lessons to add personalization to a video service. For marketers, UI designers, multiscreen developers, TV executives and systems integrators.
RecSys 2020: A Human Perspective on Algorithmic Similarity, September 2020 - Zachary Schendel
In the Netflix user interface (UI), when a row or UI element is named “Because you Watched...”, “More Like This”, or “Because you added to your list”, the overarching goal is to recommend a movie or TV show that a member might like based on the fact that they took a meaningful action on a source item. We have employed similar recommendations in many UI elements: on the homepage as a row of recommendations, after you click into a title, or as a piece of information about why a member should watch a title.
From an algorithmic perspective, there are many ways to define a “successful” similar recommendation. We sought to broaden that definition of success. To this end, the Consumer Insights team recently completed a suite of research projects to explore the intricacies of member perceptions of similar recommendations. The Netflix Consumer Insights team employs qualitative (e.g., in-depth interviews) and quantitative (e.g., surveys) research methods, interfacing directly with Netflix members to uncover pain points that can inspire new product innovation. The research concluded that, while the typical member believes movies are broadly similar when they share a common genre or theme, similarity is more complex, nuanced, and personal than we might have imagined. The vernacular we use in the UI implies that there should be at least some kind of relationship between the source item and the recommendations that follow. Many of our similar recommendations felt “out of place”, mostly because the relationship between the source item and the recommendation was unclear or absent. When similar recommendations tell a completely misleading, incorrect, or confusing story, member trust can be broken.
We will structure the presentation around three new insights that our research found to have an influence on the perception of similarity in the context of Netflix as well as the research methods used to uncover those insights. First, the reason a member loves a given movie will vary. For example, do you want to watch other baseball movies like Field of Dreams, or would you prefer other romances like Field of Dreams? Second, members are more or less flexible about how similar a recommendation actually needs to be depending on the properties of and their interactions with the canvas containing the recommendation. For example, a Because You Watched row on the homepage implies vaguer similarity while a More Like This gallery behind a click into the source item implies stricter similarity. Finally, even when we held the UI element constant, we found that similar recommendations are only valuable in some contexts. After finishing a movie, a member might prefer a similar recommendation one day and a change of pace the next. Research methods discussed will include Inverse Multi-Dimensional Scaling [1], survey experimentation, and ways to apply qualitative research to improve algorithmic recommendations.
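For context on the scaling methods mentioned, here is classical (forward) multidimensional scaling, which embeds items so that their pairwise distances are preserved; Inverse MDS, cited above, works in the opposite direction, letting participants place items and recovering the implied dimensions. The toy distance matrix is invented for illustration.

```python
import numpy as np

def classical_mds(D, k=2):
    """Embed an n x n distance matrix D into k dimensions."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ (D ** 2) @ J           # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:k]    # largest eigenvalues first
    L = np.sqrt(np.clip(vals[order], 0, None))
    return vecs[:, order] * L

# Three hypothetical titles: the first two "similar", the third far away.
D = np.array([[0.0, 1.0, 5.0],
              [1.0, 0.0, 5.0],
              [5.0, 5.0, 0.0]])
X = classical_mds(D, k=2)
```

Plotting such embeddings of member-provided similarity judgments is one way qualitative research can be made comparable with algorithmic similarity scores.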
Marcellus Buchheit (Wibu-Systems) and Terrence Barr (Electric Imp) talk about how to secure IIoT endpoints, why they are so vital to secure, and how the Industrial Internet Security Framework (IISF) can help. This talk was given during a webinar as part of the #IICSeries, a continuous series of webinars on the industrial internet hosted by the Industrial Internet Consortium.
PLATINUMBET Games is an indie game studio developing fun and addictive games across mobile and online platforms. Comprised of three core teams, the "Casino" team develops social mobile and online casino platforms and provides full casino development services. The "Mobile" team develops casual and multiplayer games, while our "Gamification" team develops serious games for enterprise clients.
Nowadays everyone uses their personal identification documents on a regular basis; these documents get shared with third parties without explicit consent and stored at unknown locations. Organizations such as government institutions, banks, credit agencies and other financial companies are considered the weakest point in the current identity management system, as they are vulnerable to data theft and hacking. Although the financial services sector has been seeking solutions to the identity problem for a long time, a viable solution has only now arrived in the form of blockchain. KYC (Know Your Customer) using blockchain eliminates the repeated KYC checks that banks currently perform by maintaining a common secure database in a blockchain. The nature of a blockchain ensures that unauthorized changes to the data are automatically invalidated. The proof-of-reputation concept makes the verification process more robust and secure. With its decentralized computing architecture, blockchain allows the accumulation of data from multiple authoritative service providers into a single immutable, cryptographically secured and validated database. A blockchain KYC solution takes advantage of a secure, public digital ledger to give almost instantaneous and truly secure verification of identity. Due to the immutable and unalterable nature of the records kept in the blockchain, fraud could become a thing of the past. Sreelakshmi V G | Meera P M | Senna Mariya Pius | Mathews Jose | Swapna B Sasi, "KYC using Blockchain", published in International Journal of Trend in Scientific Research and Development (IJTSRD), ISSN: 2456-6470, Volume-4, Issue-4, June 2020. URL: https://www.ijtsrd.com/papers/ijtsrd31542.pdf Paper URL: https://www.ijtsrd.com/computer-science/other/31542/kyc-using-blockchain/sreelakshmi-v-g
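The tamper-evidence property the abstract relies on can be shown with a minimal hash chain: each block commits to the hash of the previous block, so any unauthorized edit to a stored record invalidates every later block. The record fields are illustrative, not from the paper.

```python
import hashlib
import json

def block_hash(block):
    """Deterministic SHA-256 hash of a block's canonical JSON form."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def append_block(chain, record):
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"prev_hash": prev, "record": record})

def is_valid(chain):
    """Recompute each link; any edit to an earlier block breaks the chain."""
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

chain = []
append_block(chain, {"customer": "alice", "kyc_status": "verified"})
append_block(chain, {"customer": "bob", "kyc_status": "pending"})
```

A real blockchain adds consensus and signatures on top of this structure, but the chained hashes alone are what make silent modification of a shared KYC record detectable.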
Déjà Vu: The Importance of Time and Causality in Recommender Systems - Justin Basilico
Talk at RecSys 2017 in Como, Italy on 2017-08-29.
Abstract:
Time plays a key role in recommendation. Handling it properly is especially critical when using recommender systems in real-world applications, which may not be as clear when doing research with historical data. In this talk, we will discuss some of the important challenges of handling time in recommendation algorithms at Netflix. We will focus on challenges related to how our users, items, and systems all change over time. We will then discuss some strategies for tackling these challenges, which revolve around proper treatment of causality in our systems.
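One concrete pitfall in this area: evaluating a recommender with a random train/test split leaks future interactions into training. A time-based split respects causality by only letting the model see what it could actually have seen at serve time. The event tuples below are invented for illustration.

```python
def temporal_split(events, cutoff):
    """Split (user, item, timestamp) events so training strictly precedes
    testing, mimicking what a deployed model could have observed."""
    train = [e for e in events if e[2] < cutoff]
    test = [e for e in events if e[2] >= cutoff]
    return train, test

events = [("u1", "title_a", 100), ("u1", "title_b", 205),
          ("u2", "title_a", 150), ("u2", "title_c", 300)]
train, test = temporal_split(events, cutoff=200)
```

Offline metrics computed on a random split are often optimistic for exactly this reason, which is one of the gaps between historical-data research and production systems the talk highlights.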
Bitcoin is a decentralized electronic cash system using peer-to-peer networking to enable payments between parties without relying on mutual trust. It was first described in a paper by Satoshi Nakamoto (widely presumed to be a pseudonym) in 2008. Payments are made in bitcoins (BTC), which are digital coins issued and transferred by the Bitcoin network.
In the near future, privacy-preserving authentication methods will flood the market, and they will be based on Zero-Knowledge Proofs. IBM and Microsoft invested in these solutions many years ago.
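As a taste of how such proofs work, here is a toy Schnorr-style proof of knowledge of a discrete logarithm, a classic building block behind zero-knowledge authentication. The tiny parameters are for illustration only; real deployments use large groups and non-interactive variants.

```python
import secrets

p, g = 23, 5          # small prime modulus and generator (toy values)
x = 6                 # prover's secret
y = pow(g, x, p)      # public key: g^x mod p

def prove(challenge):
    """Prover: commit to a random r, then answer the challenge."""
    r = secrets.randbelow(p - 1)
    t = pow(g, r, p)              # commitment
    s = r + challenge * x         # response; alone it reveals nothing about x
    return t, s

def verify(t, s, challenge):
    """Verifier: accept iff g^s == t * y^c (mod p)."""
    return pow(g, s, p) == (t * pow(y, challenge, p)) % p

c = secrets.randbelow(p - 1)      # verifier's random challenge
t, s = prove(c)
```

The verifier learns that the prover knows x without learning x itself, which is exactly the property privacy-preserving authentication needs.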
At Netflix, we take the context of the member seriously.
In this keynote talk, we will see how modeling contextual factors such as time or device can help members find the right content at the right moment.
In the end, the goal is to maximize member satisfaction and retention.
These slides go through which contextual factors matter for a video service and why we chose to use them or not.
PREDICTING ELECTION OUTCOME FROM SOCIAL MEDIA DATA - kevig
In this era of technology, numerous Online Social Networking Sites (OSNs) have arisen as media for expressing opinions and thoughts on anything, including support for or opposition to any social or political matter. Nowadays, people connected to these networks increasingly use such online platforms to exhibit their standing on the political organizations contesting an election, throughout the whole election period. The aim of this paper is to predict the outcome of the election using tweets posted on Twitter pertaining to the Australian federal election of 2019, held on May 18, 2019. We aggregated two efficacious techniques in order to extract information from the tweet data and count a virtual vote for each corresponding political group. The official results of the election, published by the Australian Electoral Commission, closely match the findings of our investigation.
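A stripped-down version of the "virtual vote" idea: assign each tweet to a party by keyword mention and tally the counts. Party labels, keywords, and tweets below are illustrative; the paper combines two richer extraction techniques rather than bare keyword matching.

```python
# Hypothetical keyword sets per political group (not from the paper).
KEYWORDS = {
    "coalition": {"liberal", "coalition", "morrison"},
    "labor": {"labor", "alp", "shorten"},
}

def virtual_votes(tweets):
    """Tally one 'virtual vote' per tweet per party whose keywords appear."""
    tally = {party: 0 for party in KEYWORDS}
    for tweet in tweets:
        words = {w.strip(".,!?") for w in tweet.lower().split()}
        for party, keys in KEYWORDS.items():
            if words & keys:
                tally[party] += 1
    return tally

tweets = ["Morrison has my vote", "Voting Labor this time", "Go coalition!"]
tally = virtual_votes(tweets)
```

Even this naive tally illustrates the pipeline shape: extract a party signal from each tweet, aggregate over the election period, then compare the ranking against official results.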
This presentation gives a brief introduction to blockchain and proposes a unified analytical framework for trustable machine learning and automation running with blockchain.
At Netflix, we've spent a lot of time thinking about how we can make our analytics group move quickly. Netflix's Data Engineering & Analytics organization embraces the company's culture of "Freedom & Responsibility".
How does a company with a $40 billion market cap and $6 billion in annual revenue keep their data teams moving with the agility of a tiny company?
How do hundreds of data engineers and scientists make the best decisions for their projects independently, without the analytics environment devolving into chaos?
We'll talk about how Netflix equips its business intelligence and data engineers with:
the freedom to leverage cloud-based data tools - Spark, Presto, Redshift, Tableau and others - in ways that solve our most difficult data problems
the freedom to find and introduce right software for the job - even if it isn't used anywhere else in-house
the freedom to create and drop new tables in production without approval
the freedom to choose when a question is a one-off, and when a question is asked often enough to require a self-service tool
the freedom to retire analytics and data processes whose value doesn't justify their support costs
Speaker Bios
Monisha Kanoth is a Senior Data Architect at Netflix, and was one of the founding members of the current streaming Content Analytics team. She previously worked as a big data lead at Convertro (acquired by AOL) and as a data warehouse lead at MySpace.
Jason Flittner is a Senior Business Intelligence Engineer at Netflix, focusing on data transformation, analysis, and visualization as part of the Content Data Engineering & Analytics team. He previously led the EC2 Business Intelligence team at Amazon Web Services and was a business intelligence engineer with Cisco.
Chris Stephens is a Senior Data Engineer at Netflix. He previously served as the CTO at Deep 6 Analytics, a machine learning & content analytics company in Los Angeles, and on the data warehouse teams at the FOX Audience Network and Anheuser-Busch.
Personalization - 10 Lessons Learned from NetflixPancrazio Auteri
Deconstructing how Netflix got success thanks to a heavily personalized user experience. After the ten findings, there is a set of checklists and examples using ContentWise on how to apply the lessons to add personalization to a video service. For marketers, UI designers, multiscreen developers, TV executives and systems integrators.
RecSys 2020 A Human Perspective on Algorithmic Similarity Schendel 9-2020Zachary Schendel
In the Netflix user interface (UI), when a row or UI element is named “Because you Watched...”, “More Like This”, or “Because you added to your list”, the overarching goal is to recommend a movie or TV show that a member might like based on the fact that they took a meaningful action on a source item. We have employed similar recommendations in many UI elements: on the homepage as a row of recommendations, after you click into a title, or as a piece of information about why a member should watch a title.
From an algorithmic perspective, there are many ways to define a “successful” similar recommendation. We sought to broaden that definition of success. To this end, the Consumer Insights team recently completed a suite of research projects to explore the intricacies of member perceptions of similar recommendations. The Netflix Consumer Insights team employs qualitative (e.g., in-depth interviews) and quantitative (e.g., surveys) research methods, interfacing directly with Netflix members to uncover pain points that can inspire new product innovation. The research concluded that, while the typical member believes movies are broadly similar when they share a common genre or theme, similarity is more complex, nuanced, and personal than we might have imagined. The vernacular we use in the UI implies that there should be at least some kind of relationship between the source item and the recommendations that follow. Many of our similar recommendations felt “out of place”, mostly because the relationship between the source item and the recommendation was unclear or absent. When similar recommendations tell a completely misleading, incorrect, or confusing story, member trust can be broken.
We will structure the presentation around three new insights that our research found to have an influence on the perception of similarity in the context of Netflix as well as the research methods used to uncover those insights. First, the reason a member loves a given movie will vary. For example, do you want to watch other baseball movies like Field of Dreams, or would you prefer other romances like Field of Dreams? Second, members are more or less flexible about how similar a recommendation actually needs to be depending on the properties of and their interactions with the canvas containing the recommendation. For example, a Because You Watched row on the homepage implies vaguer similarity while a More Like This gallery behind a click into the source item implies stricter similarity. Finally, even when we held the UI element constant, we found that similar recommendations are only valuable in some contexts. After finishing a movie, a member might prefer a similar recommendation one day and a change of pace the next. Research methods discussed will include Inverse Multi-Dimensional Scaling [1], survey experimentation, and ways to apply qualitative research to improve algorithmic recommendations.
Marcellus Buchheit (Wibu-Systems) and Terrence Barr (Electric Imp) talk about how to secure IIoT endpoints, why they are so vital to secure, and how the Industrial Internet Security Framework (IISF) can help. This talk was given during a webinar as part of the #IICSeries, a continuous series of webinars on the industrial internet hosted by the Industrial Internet Consortium.
PLATINUMBET Games is an indie game studio developing fun and addicting games across mobile and online platforms. Comprised of 3 core teams, the "Casino" team develops social mobile and online casino platforms and provides full casinodevelopment services. The "Mobile" team develops casual and multiplayer games while our "Gamification" team develops Serious games for enterprise clients.
Nowadays everyone uses their personal identification documents on a regular basis, which gets shared with third parties without their explicit consent and stored at an unknown location. Companies such as government institutions, banks, credit agencies and other financial organizations are considered to be the weakest point in the current identity management system as they are unfortified to theft and hacking of data. Although the financial services sector have been seeking solutions for identity problem for a long time, it is only now that a viable solution has arrived in form of blockchain. KYC Know Your Customer using Blockchain eliminates the repeated KYC checks that banks currently perform by maintaining a common secure database in a blockchain. The nature of a blockchain ensures that unauthorized changes to the data are automatically invalidated. The proof of reputation concept makes the verification process more robust and secure. Decentralized computing architecture, blockchain will allow for the accumulation of data from multiple authoritative service provider into a single immutable, cryptographically secured and validated database. Blockchain KYC solution take advantages of a secure, public digital ledger to give almost instantaneous and truly secure verification of identity. Due to the immutable and unalterable nature of the record kept in the blockchain, fraud could become a thing of the past. Sreelakshmi V G | Meera P M | Senna Mariya Pius | Mathews Jose | Swapna B Sasi "KYC using Blockchain" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-4 | Issue-4 , June 2020, URL: https://www.ijtsrd.com/papers/ijtsrd31542.pdf Paper Url :https://www.ijtsrd.com/computer-science/other/31542/kyc-using-blockchain/sreelakshmi-v-g
Déjà Vu: The Importance of Time and Causality in Recommender SystemsJustin Basilico
Talk at RecSys 2017 in Como, Italy on 2017-08-29.
Abstract:
Time plays a key role in recommendation. Handling it properly is especially critical when using recommender systems in real-world applications, which may not be as clear when doing research with historical data. In this talk, we will discuss some of the important challenges of handling time in recommendation algorithms at Netflix. We will focus on challenges related to how our users, items, and systems all change over time. We will then discuss some strategies for tackling these challenges, which revolve around proper treatment of causality in our systems.
Bitcoin is a decentralized electronic cash system using peer-to-peer networking to enable payments between parties without relying on mutual trust. It was first described in a paper by Satoshi Nakamoto (widely presumed to be a pseudonym) in 2008. Payments are made in bitcoins (BTC), digital coins issued and transferred by the Bitcoin network.
In the near future, privacy-preserving authentication methods will flood the market, and they will be based on Zero-Knowledge Proofs. IBM and Microsoft invested in these solutions many years ago.
At Netflix we take the context of the member seriously.
In this keynote talk we will see how modeling contextual factors such as time or device can help members find the right content at the right moment.
In the end, the goal is to maximize member satisfaction and retention.
These slides will go through which contextual factors matter for the video service and why we chose to use them or not.
PREDICTING ELECTION OUTCOME FROM SOCIAL MEDIA DATAkevig
In this era of technology, numerous Online Social Networking Sites (OSNs) have arisen as a medium for expressing opinions and thoughts on anything, including support for or opposition to any social or political matter. Nowadays, people connected to those networks are increasingly likely to use these online platforms to exhibit their stance on the political organizations participating in an election throughout the whole election period. The aim of this paper is to predict the outcome of the election by analyzing the tweets posted on Twitter pertaining to the Australian federal election of 2019, held on May 18, 2019. We combined two effective techniques to extract information from the tweet data and count a virtual vote for each corresponding political group. The official results of the election, published by the Australian Electoral Commission, closely match the findings of our investigation.
Using Tweets for Understanding Public Opinion During U.S. Primaries and Predi...Monica Powell
Abstract
Using social media for political analysis, especially during elections, has become popular in the past few years, as many researchers and media outlets now use social media to understand public opinion and current trends. In this paper, we investigate methods for using Twitter to analyze public opinion and to predict U.S. Presidential Primary Election results. We analyzed over 13 million tweets from February 2016 to April 2016 during the primary elections, looking at tweets that mentioned either Hillary Clinton, Bernie Sanders, Donald Trump or Ted Cruz. First, we use sentiment analysis, geospatial analysis, network analysis, and visualization tools to examine public opinion on Twitter. We then use the Twitter data and analysis results to propose a model for predicting primary election results. Our results highlight the feasibility of using social media to gauge public opinion and predict election results.
Twitter Based Sentiment Analysis of Each Presidential Candidate Using Long Sh...CSCJournals
In the era of technology and the internet, people use online social media services like Twitter, Instagram, Facebook, Reddit, etc. to express their emotions. The idea behind this paper is to understand people’s emotions on Twitter and their opinions towards the 2020 Presidential Election. We collected 1.2 million tweets in total with keywords like “RealDonaldTrump”, “JoeBiden”, “Election2020” and other election-related keywords using the Twitter API, and then processed them with a natural language processing toolkit. A Bidirectional Long Short-Term Memory (BiLSTM) model was trained, achieving 93.45% accuracy on our test dataset. We then used our trained model to perform sentiment analysis on the rest of our dataset. With the sentiment analysis results and a comparison with the 2016 Presidential Election, we made predictions on who could win the US Presidential Election in 2020 using pre-election Twitter data. We also analyzed the impact of COVID-19 on people’s sentiment about the election.
Analyzing Political Sentiment of Indic Languages with TransformersIJCI JOURNAL
This paper presents an analysis of sentiment on Twitter towards the Karnataka elections held in 2023, utilizing transformer-based models specifically designed for sentiment analysis in Indic languages. Through an innovative data collection approach involving a combination of novel methods of data augmentation, online data preceding the election was analyzed. The study focuses on sentiment classification, effectively distinguishing between positive, negative, and neutral posts, while specifically targeting sentiment regarding the loss of the Bharatiya Janata Party (BJP) or the win of the Indian National Congress (INC). Leveraging high-performing transformer architectures, specifically IndicBERT, coupled with carefully fine-tuned hyperparameters, the AI models employed in this study achieved remarkable accuracy in predicting the INC’s victory in the election. The findings shed new light on the potential of cutting-edge transformer-based models in capturing and analyzing sentiment dynamics within the Indian political landscape. The implications of this research are far-reaching, providing invaluable insights to political parties for informed decision-making and strategic planning in preparation for the forthcoming 2024 Lok Sabha elections in the nation.
The Party is Over Here: Structure and Content in the 2010 ElectionAvishay Livne
In this work, we study the use of Twitter by House, Senate and gubernatorial candidates during the midterm (2010) elections in the U.S. Our data includes almost 700 candidates and over 690k documents that they produced and cited in the 3.5 years leading to the elections. We utilize graph and text mining techniques to analyze differences between Democrats, Republicans and Tea Party candidates, and suggest a novel use of language modeling for estimating content cohesiveness. Our findings show significant differences in the usage patterns of social media, and suggest conservative candidates used this medium more effectively, conveying a coherent message and maintaining a dense graph of connections. Despite the lack of party leadership, we find Tea Party members display both structural and language-based cohesiveness. Finally, we investigate the relation between network structure, content and election results by creating a proof-of-concept model that predicts candidate victory with an accuracy of 88.0%.
This presentation contains the project idea along with the project diagrams and an explanation of the methodology. The project can be applied in different sectors, such as industry, prediction analysis, trend analysis, and sales and profit calculations.
Techniques to optimize the PageRank algorithm usually fall into two categories. One tries to reduce the work per iteration, and the other tries to reduce the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, i.e. those with the same in-links, helps reduce duplicate computations and thus could also reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance, since the final ranks of chain nodes can be easily calculated; this could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order, which could reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in the PageRank computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
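As an illustration of the first of these optimizations, here is a minimal Python sketch of PageRank that stops recomputing vertices whose rank change has fallen below a tolerance. The graph, function name, and threshold are illustrative assumptions, not the STICD implementation; freezing a vertex early is also an approximation, since its in-neighbours may still be drifting.

```python
def pagerank_skip_converged(graph, damping=0.85, tol=1e-6, max_iter=100):
    """PageRank that skips recomputing vertices whose rank has converged.

    graph: dict mapping each vertex to the list of vertices it links to.
    Assumes no dangling vertices (every vertex has at least one out-link).
    """
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}
    # Build the reverse adjacency (in-links) once, so each vertex can
    # pull contributions from the vertices that point to it.
    in_links = {v: [] for v in graph}
    for u, outs in graph.items():
        for v in outs:
            in_links[v].append(u)
    converged = set()
    for _ in range(max_iter):
        new_rank = {}
        for v in graph:
            if v in converged:          # skip work for settled vertices
                new_rank[v] = rank[v]
                continue
            s = sum(rank[u] / len(graph[u]) for u in in_links[v])
            new_rank[v] = (1 - damping) / n + damping * s
            if abs(new_rank[v] - rank[v]) < tol:
                converged.add(v)
        rank = new_rank
        if len(converged) == n:         # every vertex has settled
            break
    return rank

# Tiny 3-vertex cycle: by symmetry every vertex should end up with rank 1/3.
g = {"a": ["b"], "b": ["c"], "c": ["a"]}
ranks = pagerank_skip_converged(g)
```

In a production setting the converged set would be tracked per component rather than per vertex, but the sketch shows where the per-iteration work is saved.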
Opendatabay - Open Data Marketplace.pptxOpendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
It is the first open hub for data enthusiasts to collaborate and innovate: a platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, Opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. It leverages cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex: Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools, letting you effortlessly explore, discover, and access the data you need so you can focus on extracting valuable insights. Opendatabay also breaks new ground with dedicated, AI-generated synthetic datasets.
These privacy-preserving datasets can be used for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits, Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay: the marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Adjusting primitives for graph : SHORT REPORT / NOTESSubhajit Sahu
Graph algorithms, like PageRank, commonly operate on Compressed Sparse Row (CSR), an adjacency-list based graph representation. The following experiments were performed:
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
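The sequential-vs-parallel comparisons above can be mimicked on a single machine. The sketch below is an assumption, not from the notes: it uses NumPy's vectorized sum as a stand-in for the OpenMP/CUDA mode and contrasts it with a plain Python loop over float32 storage.

```python
import time
import numpy as np

def sum_sequential(xs):
    # Plain element-by-element accumulation: the "sequential mode".
    total = 0.0
    for x in xs:
        total += x
    return total

n = 1_000_000
xs = np.random.rand(n).astype(np.float32)   # float32 storage type

t0 = time.perf_counter()
s_loop = sum_sequential(xs)
t_loop = time.perf_counter() - t0

t0 = time.perf_counter()
s_vec = float(np.sum(xs, dtype=np.float64))  # vectorized reduction
t_vec = time.perf_counter() - t0

# Both accumulate in 64-bit, so the results agree closely, but the
# vectorized mode is typically orders of magnitude faster than the loop.
```

The same pattern — identical reduction, different execution mode, compare wall-clock time — is what the OpenMP and CUDA experiments in the notes measure at larger scale.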
3. Problem Statement
Social Media: a huge corpus of data which mostly reflects the mood and mentality of the people connected to the network.
Twitter: a social media network for microbloggers across the globe. Texts from this microblogging site are relatively easy to access and carry more information to mine.
Problem statement: use machine learning to mine tweets related to elections and to predict the outcome.
Strategy: Agile
4. Election Sentiment
Decoding democracy…
● Various surveys are done by political parties, social scientists, and media houses collaborating with independent survey agencies.
● These involve questionnaires and interviews.
● People are aware of what the interviewer asks them.
5. Domain Knowledge: Indian General Elections
Prelude: the largest democracy in the world. Conducted once every 5 years since 1952. Paperless ballot: Electronic Voting Machines (EVMs) used since 2004.
Process: multiple phases; various states follow different schedules to vote.
● In 2019, India voted in 7 phases, i.e. 7 polling days over a span of 50 days under the election code of conduct.
Efficiency: quick results; most of the election results are known by noon of the counting day.
7. Pre-Poll Survey I: done around a year or 6 months before the actual polls, soon after the polls are announced.
Pre-Poll Survey II: done after poll alliances are formed by the various national and regional parties.
Pre-Poll Survey III: done weeks before the polling starts.
Exit Polls: surveys by interview on the day of polling, outside the polling booths.
Actual Poll: the actual result according to the Election Commission of India.
8. Social Media Based Analysis
Twitter, Facebook, and YouTube feeds are the ones most commonly analysed. Twitter is predominant in social media analysis of election data; many politicians are more active on Twitter than on other social media, e.g. Narendra Modi, Donald Trump, Barack Obama, Hillary Clinton, and Shinzo Abe.
9. Previous Election Predictions Using Twitter Data
● India (2014) and Pakistan (2013): a diffusion-centric model was used to map sentiments in a multi-party environment. Graphs reveal it was successful in India, but the correlation was not close in the Pakistani context even though it is a single-phase poll.
● 2016 US Presidential Election: a supervised learning methodology used 60,000 tweets each for Donald Trump and Hillary Clinton in a dual-party system.
● Queensland (Australia) State Election 2012: predicted a general trend on who had more popularity. Dual-party system.
● Germany 2011: dual-party system; based on Twitter mentions, not the sentiment of the text.
● Sweden 2012: Twitter user behaviour such as likes and retweet counts was used.
● UK 2017: dual-party system; SVM and CNN used, but a single-phase poll.
10. Problem Definition (taken before implementation)
A Twitter-sentiment based outcome prediction good enough to:
1. handle a multi-party, multi-alliance system;
2. measure the possible seats to be won, which is sufficient to track a majority;
3. track sentiment over the days of the election phases.
12. Architecture
● Data Collection: done for the 50 days before the last polling day.
● Data Cleaning: identifying the relevant tweets.
● Preprocessing: tweets tokenized, POS tagged, and SVO structure extracted.
● Classifier: decision tree.
● Analytics: scores aggregated over subsets of popularity.
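The Analytics stage aggregates per-party sentiment scores into a virtual vote tally. A minimal Python sketch of one such counting scheme follows; the party names, labels, and the +1/−1 weighting are illustrative assumptions, not the project's exact method.

```python
from collections import defaultdict

# Hypothetical classified tweets: (party, sentiment label) pairs, as they
# might come out of the decision-tree classifier stage.
classified = [
    ("PartyA", "positive"), ("PartyA", "positive"), ("PartyA", "negative"),
    ("PartyB", "positive"), ("PartyB", "negative"), ("PartyB", "negative"),
]

def virtual_votes(classified):
    # One simple counting scheme: a positive tweet counts for the party,
    # a negative tweet counts against it.
    votes = defaultdict(int)
    for party, label in classified:
        votes[party] += 1 if label == "positive" else -1
    return dict(votes)

votes = virtual_votes(classified)  # {'PartyA': 1, 'PartyB': -1}
```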
13. Architecture (continued)
● Data Collection: done for the 50 days before the last polling day; the Tweepy and PyMongo libraries were used, since the JSON format of tweets is supported by MongoDB.
● Data Cleaning: English tweets alone are selected; repeated content is omitted; image and video tweets are pruned; emojis are removed using regular expressions.
● Preprocessing, Classifier, Analytics: as above.
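The cleaning steps listed on this slide (English only, duplicates omitted, media tweets pruned, emojis stripped) can be sketched in Python. The field names and the emoji character ranges below are assumptions, since real Twitter API payloads differ and full emoji coverage needs a larger pattern.

```python
import re

# Regex covering common emoji code-point ranges (illustrative, not exhaustive).
EMOJI_RE = re.compile(
    "[\U0001F300-\U0001FAFF\U00002600-\U000027BF]", flags=re.UNICODE
)

def clean_tweets(tweets):
    """Keep English text-only tweets, drop duplicates, strip emoji characters.

    tweets: list of dicts like {"lang": "en", "text": "...", "has_media": False}
    (field names are assumptions; actual API payloads differ).
    """
    seen = set()
    cleaned = []
    for t in tweets:
        if t.get("lang") != "en" or t.get("has_media"):
            continue                      # non-English or image/video tweet
        text = EMOJI_RE.sub("", t["text"]).strip()
        if text in seen:
            continue                      # repeated content omitted
        seen.add(text)
        cleaned.append(text)
    return cleaned

sample = [
    {"lang": "en", "text": "Great rally today! \U0001F389", "has_media": False},
    {"lang": "en", "text": "Great rally today!", "has_media": False},  # dup after cleaning
    {"lang": "hi", "text": "...", "has_media": False},
    {"lang": "en", "text": "photo", "has_media": True},
]
result = clean_tweets(sample)  # ["Great rally today!"]
```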
18. Poll Predictions vs Actual Result
● 310 - 233: April 2019, Jan Ki Baat (Ruling Party Alliance vs Opposition Alliance)
● 275 - 268: April 2019, India TV & CNX (Ruling Party Alliance vs Opposition Alliance)
● 279 - 264: April 2019, Times Now & VMR (Ruling Party Alliance vs Opposition Alliance)
● 293 - 249: 20 May 2019, Proposed Twitter Mining (Ruling Party Alone vs Others)
● 303 - 239: 23 May 2019, Actual Result (Ruling Party Alone vs Others)
A party with the support of 272+ seats can form the government.
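The gap between the proposed Twitter-mining prediction and the actual result can be checked with a few lines of arithmetic, using only the seat tallies from this slide:

```python
# Seat tallies taken from this slide (ruling party alone vs others).
predicted = (293, 249)   # proposed Twitter mining, 20 May 2019
actual = (303, 239)      # actual result, 23 May 2019
MAJORITY = 272           # seats needed to form the government

# Absolute seat error of the prediction on each side.
errors = [abs(p - a) for p, a in zip(predicted, actual)]   # [10, 10]

# Both the prediction and the actual result put the ruling party
# above the majority mark.
pred_majority = predicted[0] >= MAJORITY     # True
actual_majority = actual[0] >= MAJORITY      # True
```

So the proposed method missed the tally by 10 seats on each side while still correctly calling the majority.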
23. Today and the Road to the Future
● Better accuracy in predicting the outcome on the basis of seats won by parties.
● Need to mine non-English tweets.
● Need to look into location-specific mining, which can help predict state assembly elections.
24. References
[1] J. Burgess and A. Bruns, “(Not) the Twitter election: the dynamics of the #ausvotes conversation in relation to the Australian media ecology,” Journal. Pract., vol. 6, no. 3, pp. 384–402, 2012.
[2] V. Kagan, A. Stevens, and V. S. Subrahmanian, “Using twitter sentiment to forecast the 2013 Pakistani election and the 2014 Indian election,” IEEE Intell. Syst., vol. 30, no. 1, pp. 2–5, 2015.
[3] J. Ramteke, S. Shah, D. Godhia, and A. Shaikh, “Election result prediction using Twitter sentiment analysis,” in 2016 International Conference on Inventive Computation Technologies (ICICT), 2016, vol. 1, pp. 1–5.
[4] A. O. Larsson and H. Moe, “Studying political microblogging: Twitter users in the 2010 Swedish election campaign,” New Media Soc., vol. 14, no. 5, pp. 729–747, 2012.
[5] X. Yang, C. Macdonald, and I. Ounis, “Using word embeddings in twitter election classification,” Inf. Retr. J., vol. 21, no. 2–3, pp. 183–207, 2018.
[6] A. Tumasjan, T. O. Sprenger, P. G. Sandner, and I. M. Welpe, “Election forecasts with Twitter: How 140 characters reflect the political landscape,” Soc. Sci. Comput. Rev., vol. 29, no. 4, pp. 402–418, 2011.
[7] J. Roesslein, “tweepy Documentation,” 2009. [Online]. Available: http://tweepy.readthedocs.io/en/v3.
[8] A. Nayak, MongoDB Cookbook. Packt Publishing Ltd, 2014.
[9] F. J. John Joseph, R. T, and J. J. C, “Classification of correlated subspaces using HoVer representation of Census Data,” in 2011 International Conference on Emerging Trends in Electrical and Computer Technology, 2011, pp. 906–911.
[10] E. Loper and S. Bird, “NLTK: the natural language toolkit,” arXiv Prepr. cs/0205028, 2002.
[11] S. Loria, “textblob Documentation,” 2018.
[12] T. N. Bureau, “Times Now-VMR Opinion Poll For Election 2019: PM Narendra Modi-led NDA likely to get 279 seats, UPA 149,” Times Now News, New Delhi, 2019.
[13] I. T. News Desk, “Lok Sabha Election 2019: NDA may get thin majority with 275 seats, BJD may retain Odisha, YSR Congress may win Andhra, says India TV-CNX pre-poll survey,” India TV, 2019.
[14] “2019 Indian General Election,” Wikipedia, 2019. [Online]. Available: https://en.wikipedia.org/wiki/2019_Indian_general_election.