Twitter Based Outcome
Predictions of 2019 Indian
General Elections Using Decision
Tree
Ferdin Joe John Joseph
Thai-Nichi Institute of Technology
Overview
Problem Statement
Election Sentiments
Indian General Elections
Existing Methodologies
Our Proposed Methodology
Validation
References
2
Problem Statement
Social Media
A huge corpus of data
which mostly reflects the
mood and mentality of
people connected to the
network.
Twitter
A social media
networking
microbloggers across the
globe
● Texts from this
microblog site is
relatively easier to
access and has more
information to mine.
Problem statement
Use Machine Learning to
mine tweets related to
elections and to predict
the outcome.
Strategy: Agile
3
Election
Sentiment
Decoding Democracy…..
● Various surveys are done by
political parties, social
scientists, media houses
collaborating independent
survey agencies.
● This involves questionnaires
and interviews.
● People will be aware of what
the interviewer ask them.
4
Domain Knowledge: Indian General Elections
Prelude
Largest Democracy in
the World
Conducted once in 5
years since 1952
Paperless Ballot. Uses
Electronic Voting
Machine (EVM) since
2004
Process
Multiple Phases
Various states follow
different schedule to vote
● In 2019, India voted
in 7 phases i.e. 7
polling days over a
span of 50 days of
election code of
conduct.
Efficiency
Quick results
Most of the election
results are known by
noon of the counting day
5
Existing Methodologies
6
Pre - Poll
Survey I
Done around an year
or 6 months before
actual polls. Soon
after polls announced.
Pre - Poll
Survey II
After the poll alliances
are formed by various
national and regional
parties.
Pre - Poll
Survey III
Weeks before the
polling starts.
Exit Polls
Survey by interview on
day of polling outside
of polling booths.
Actual Poll
The actual poll
according to the
Election Commision of
India.
7
Social Media
based analysis
Twitter, Facebook, Youtube
feeds are analysed mostly
Twitter is predominant among
social media analysis of election
data. Many politicians are more
active in twitter than other social
media.
Eg. Narendra Modi, Donald Trump,
Barack Obama, Hillary Clinton,
Shinzo Abe
8
Previous Election Predictions using Twitter Data
India (2014) and Pakistan (2013)
Diffusion Centric Model was used to map
sentiments. Multi party environment.
Graphs reveal it was successful in India but
not close correlation in Pakistani context
though it is a single phase poll.
2016 US Presidential Elections
Supervised Learning methodology used
60000 tweets each for Donald Trump and
Hillary Clinton in Dual Party System.
Quensland (Australia) State Election - 2012
Predicted a general trend on who has more popularity. Dual
Party System.
Germany - 2011
Dual Party System. Done on twitter mentions. Not
sentiment of text.
Sweden - 2012
Twitter user behaviour like likes and retweet counts taken.
UK 2017 : Dual Party System. SVM and CNN used but
single phase poll. 9
Problem
Definition
Taken before Implementation
A twitter sentiment based outcome
prediction good enough to
1. Multi - Party, Multi - Alliance
System
2. Measure possible seats to be
won which is good to track
majority
3. Tracks over days of election
phases
10
Implementation
11
Data
Collection
Done for 50 days
before the last polling
day
Data
Cleaning
Identifying the
Relevant tweets
Preprocessing
Tokenization of
tweets, POS tagged,
SVO structure
extracted
Classifier
Decision Tree
Analytics
Scores aggregated
over subsets of
popularity
12
Architecture
Data
Collection
Done for 50 days
before the last polling
day. Tweepy and
Pymongo libraries
used.
Data
Cleaning
English tweets alone are selected
Repeated content is omitted. Image and
video tweets pruned.
Regular Expressions of emojis are removed
Preprocessing Classifier Analytics
13
JSON Format of
tweets is
supported by
MongoDB
Architecture
Data
Collection
Data
Cleaning
Preprocessing
Tokenization of
tweets, POS tagged,
SVO structure
extracted
Classifier Analytics
14
Architecture
Data
Collection
Data
Cleaning
Preprocessing
Decision Tree is used.
Polarity scores of
tweets are aggregated
based on subjectivity.
Classifier Analytics
15
Architecture
Data
Collection
Data
Cleaning
Preprocessing
Daily insights
Phase-wise insights
Trend from beginning till every
phase
Classifier Analytics
16
Architecture
Results
17
310 - 233
April 2019 - Jan Ki
Bath (Ruling Party
Alliance Vs Opposition
Alliance)
275 - 268
April 2019 - India TV &
CNX (Ruling Party
Alliance Vs Opposition
Alliance)
279 - 264
April 2019 - Times
Now & VMR (Ruling
Party Alliance Vs
Opposition Alliance)
293 - 249
20 May 2019 -
Proposed Twitter
Mining (Ruling Party
Alone Vs Others)
303 - 239
23 May 2019 - Actual
Result (Ruling Party
Alone Vs Others)
18
Party with support of
272+ seats can form
government
Ruling party Trend over 7 phases of poll
19
Phase wise
Difference
Phase 1 - 7
max growth
20
Results
Phase - wise
projection
max growth
21
Conclusion
22
Today and Road
to Future….
Better accuracy in predicting
outcome on the basis of seats
won by parties
Need to mine non - English tweets
Need to look into location specific
which can help predict state
assembly elections.
23
References
[1] J. Burgess and A. Bruns, “(Not) the Twitter election: the dynamics of the#
ausvotes conversation in relation to the Australian media ecology,” Journal. Pract.,
vol. 6, no. 3, pp. 384–402, 2012.
[2] V. Kagan, A. Stevens, and V. S. Subrahmanian, “Using twitter sentiment to
forecast the 2013 pakistani election and the 2014 indian election,” IEEE Intell.
Syst., vol. 30, no. 1, pp. 2–5, 2015.
[3] J. Ramteke, S. Shah, D. Godhia, and A. Shaikh, “Election result prediction using
Twitter sentiment analysis,” in 2016 international conference on inventive
computation technologies (ICICT), 2016, vol. 1, pp. 1–5.
[4] A. O. Larsson and H. Moe, “Studying political microblogging: Twitter users in the
2010 Swedish election campaign,” New Media Soc., vol. 14, no. 5, pp. 729–747,
2012.
[5] X. Yang, C. Macdonald, and I. Ounis, “Using word embeddings in twitter election
classification,” Inf. Retr. J., vol. 21, no. 2–3, pp. 183–207, 2018.
[6] A. Tumasjan, T. O. Sprenger, P. G. Sandner, and I. M. Welpe, “Election forecasts
with Twitter: How 140 characters reflect the political landscape,” Soc. Sci. Comput.
Rev., vol. 29, no. 4, pp. 402–418, 2011.
[7] J. Roesslein, “tweepy Documentation,” Online] http://tweepy. readthedocs.
io/en/v3, vol. 5, 2009.
[8] A. Nayak, MongoDB Cookbook. Packt Publishing Ltd, 2014.
[9] F. J. John Joseph, R. T, and J. J. C, “Classification of correlated subspaces using
HoVer representation of Census Data,” in 2011 International Conference on
Emerging Trends in Electrical and Computer Technology, 2011, pp. 906–911.
[10] E. Loper and S. Bird, “NLTK: the natural language toolkit,” arXiv Prepr.
cs/0205028, 2002.
[11] S. Loria, “textblob Documentation,” 2018.
[12] T. N. Bureau, “Times Now-VMR Opinion Poll For Election 2019: PM Narendra
Modi-led NDA likely to get 279 seats, UPA 149,” Times Now News, New Delhi,
2019.
[13] I. T. News Desk, “Lok Sabha Election 2019: NDA may get thin majority with 275
seats, BJD may retain Odisha, YSR Congress may win Andhra, says India TV-CNX
pre-poll survey,” India TV, 2019.
[14] “2019 Indian General Election,” Wikipedia, 2019. [Online]. Available:
https://en.wikipedia.org/wiki/2019_Indian_general_election.
24
Thank You!
25

Twitter Based Outcome Predictions of 2019 Indian General Elections Using Decision Tree

  • 1.
    Twitter Based Outcome Predictionsof 2019 Indian General Elections Using Decision Tree Ferdin Joe John Joseph Thai-Nichi Institute of Technology
  • 2.
    Overview Problem Statement Election Sentiments IndianGeneral Elections Existing Methodologies Our Proposed Methodology Validation References 2
  • 3.
    Problem Statement Social Media Ahuge corpus of data which mostly reflects the mood and mentality of people connected to the network. Twitter A social media networking microbloggers across the globe ● Texts from this microblog site is relatively easier to access and has more information to mine. Problem statement Use Machine Learning to mine tweets related to elections and to predict the outcome. Strategy: Agile 3
  • 4.
    Election Sentiment Decoding Democracy….. ● Varioussurveys are done by political parties, social scientists, media houses collaborating independent survey agencies. ● This involves questionnaires and interviews. ● People will be aware of what the interviewer ask them. 4
  • 5.
    Domain Knowledge: IndianGeneral Elections Prelude Largest Democracy in the World Conducted once in 5 years since 1952 Paperless Ballot. Uses Electronic Voting Machine (EVM) since 2004 Process Multiple Phases Various states follow different schedule to vote ● In 2019, India voted in 7 phases i.e. 7 polling days over a span of 50 days of election code of conduct. Efficiency Quick results Most of the election results are known by noon of the counting day 5
  • 6.
  • 7.
    Pre - Poll SurveyI Done around an year or 6 months before actual polls. Soon after polls announced. Pre - Poll Survey II After the poll alliances are formed by various national and regional parties. Pre - Poll Survey III Weeks before the polling starts. Exit Polls Survey by interview on day of polling outside of polling booths. Actual Poll The actual poll according to the Election Commision of India. 7
  • 8.
    Social Media based analysis Twitter,Facebook, Youtube feeds are analysed mostly Twitter is predominant among social media analysis of election data. Many politicians are more active in twitter than other social media. Eg. Narendra Modi, Donald Trump, Barack Obama, Hillary Clinton, Shinzo Abe 8
  • 9.
    Previous Election Predictionsusing Twitter Data India (2014) and Pakistan (2013) Diffusion Centric Model was used to map sentiments. Multi party environment. Graphs reveal it was successful in India but not close correlation in Pakistani context though it is a single phase poll. 2016 US Presidential Elections Supervised Learning methodology used 60000 tweets each for Donald Trump and Hillary Clinton in Dual Party System. Quensland (Australia) State Election - 2012 Predicted a general trend on who has more popularity. Dual Party System. Germany - 2011 Dual Party System. Done on twitter mentions. Not sentiment of text. Sweden - 2012 Twitter user behaviour like likes and retweet counts taken. UK 2017 : Dual Party System. SVM and CNN used but single phase poll. 9
  • 10.
    Problem Definition Taken before Implementation Atwitter sentiment based outcome prediction good enough to 1. Multi - Party, Multi - Alliance System 2. Measure possible seats to be won which is good to track majority 3. Tracks over days of election phases 10
  • 11.
  • 12.
    Data Collection Done for 50days before the last polling day Data Cleaning Identifying the Relevant tweets Preprocessing Tokenization of tweets, POS tagged, SVO structure extracted Classifier Decision Tree Analytics Scores aggregated over subsets of popularity 12 Architecture
  • 13.
    Data Collection Done for 50days before the last polling day. Tweepy and Pymongo libraries used. Data Cleaning English tweets alone are selected Repeated content is omitted. Image and video tweets pruned. Regular Expressions of emojis are removed Preprocessing Classifier Analytics 13 JSON Format of tweets is supported by MongoDB Architecture
  • 14.
    Data Collection Data Cleaning Preprocessing Tokenization of tweets, POStagged, SVO structure extracted Classifier Analytics 14 Architecture
  • 15.
    Data Collection Data Cleaning Preprocessing Decision Tree isused. Polarity scores of tweets are aggregated based on subjectivity. Classifier Analytics 15 Architecture
  • 16.
    Data Collection Data Cleaning Preprocessing Daily insights Phase-wise insights Trendfrom beginning till every phase Classifier Analytics 16 Architecture
  • 17.
  • 18.
    310 - 233 April2019 - Jan Ki Bath (Ruling Party Alliance Vs Opposition Alliance) 275 - 268 April 2019 - India TV & CNX (Ruling Party Alliance Vs Opposition Alliance) 279 - 264 April 2019 - Times Now & VMR (Ruling Party Alliance Vs Opposition Alliance) 293 - 249 20 May 2019 - Proposed Twitter Mining (Ruling Party Alone Vs Others) 303 - 239 23 May 2019 - Actual Result (Ruling Party Alone Vs Others) 18 Party with support of 272+ seats can form government
  • 19.
    Ruling party Trendover 7 phases of poll 19
  • 20.
  • 21.
  • 22.
  • 23.
    Today and Road toFuture…. Better accuracy in predicting outcome on the basis of seats won by parties Need to mine non - English tweets Need to look into location specific which can help predict state assembly elections. 23
  • 24.
    References [1] J. Burgessand A. Bruns, “(Not) the Twitter election: the dynamics of the# ausvotes conversation in relation to the Australian media ecology,” Journal. Pract., vol. 6, no. 3, pp. 384–402, 2012. [2] V. Kagan, A. Stevens, and V. S. Subrahmanian, “Using twitter sentiment to forecast the 2013 pakistani election and the 2014 indian election,” IEEE Intell. Syst., vol. 30, no. 1, pp. 2–5, 2015. [3] J. Ramteke, S. Shah, D. Godhia, and A. Shaikh, “Election result prediction using Twitter sentiment analysis,” in 2016 international conference on inventive computation technologies (ICICT), 2016, vol. 1, pp. 1–5. [4] A. O. Larsson and H. Moe, “Studying political microblogging: Twitter users in the 2010 Swedish election campaign,” New Media Soc., vol. 14, no. 5, pp. 729–747, 2012. [5] X. Yang, C. Macdonald, and I. Ounis, “Using word embeddings in twitter election classification,” Inf. Retr. J., vol. 21, no. 2–3, pp. 183–207, 2018. [6] A. Tumasjan, T. O. Sprenger, P. G. Sandner, and I. M. Welpe, “Election forecasts with Twitter: How 140 characters reflect the political landscape,” Soc. Sci. Comput. Rev., vol. 29, no. 4, pp. 402–418, 2011. [7] J. Roesslein, “tweepy Documentation,” Online] http://tweepy. readthedocs. io/en/v3, vol. 5, 2009. [8] A. Nayak, MongoDB Cookbook. Packt Publishing Ltd, 2014. [9] F. J. John Joseph, R. T, and J. J. C, “Classification of correlated subspaces using HoVer representation of Census Data,” in 2011 International Conference on Emerging Trends in Electrical and Computer Technology, 2011, pp. 906–911. [10] E. Loper and S. Bird, “NLTK: the natural language toolkit,” arXiv Prepr. cs/0205028, 2002. [11] S. Loria, “textblob Documentation,” 2018. [12] T. N. Bureau, “Times Now-VMR Opinion Poll For Election 2019: PM Narendra Modi-led NDA likely to get 279 seats, UPA 149,” Times Now News, New Delhi, 2019. [13] I. T. News Desk, “Lok Sabha Election 2019: NDA may get thin majority with 275 seats, BJD may retain Odisha, YSR Congress may win Andhra, says India TV-CNX pre-poll survey,” India TV, 2019. [14] “2019 Indian General Election,” Wikipedia, 2019. [Online]. Available: https://en.wikipedia.org/wiki/2019_Indian_general_election. 24
  • 25.