This talk explores the application of machine learning algorithms to Facebook users' activity data to develop predictive models of election outcomes. The main goal is to compare their accuracy and reliability with those of models obtained from traditional public opinion polls. Four approaches to machine learning are used to develop the predictive models: error-based, information-based, similarity-based, and probability-based. The most effective approach is identified. Obtaining equally effective predictive models with faster and easier access to data makes a significant scientific and social contribution to research in this area.
[DSC Europe 23] Alen Kisic - How can Facebook data and machine learning algorithms predict political elections?
1. How can Facebook data and
machine learning algorithms
predict political elections?
2. Introduction
◉ Political elections are a routine part of every
democratic society, and most European
countries hold an average of one election
every year (local, parliamentary, presidential, EU
parliament).
◉ The eternal question: predicting the results.
○ everyone is interested: citizens and political actors, who
create strategies and activities.
3. Introduction
◉ Development of social networks: a powerful
tool for communication with voters!
○ Can we develop a predictive model based on
candidates' activity on the social network Facebook
that is comparable to traditional models based on
polls?
4. Related work
◉ A review of the literature identified gaps in
previous research:
◉ a model for measuring the effectiveness of political campaigns on
social networks has not been developed: variables that are suitable for
evaluation in one specific domain are not suitable for evaluation in
another domain,
◉ in most research, data was collected only on Twitter,
◉ most of the previous works do not have a component of predictive
analytics,
◉ no models have been developed that would determine to what extent
activity on the social network Facebook contributes to predicting the
outcome of the election.
5. Related work
◉ A review of the literature identified gaps in
previous research:
◉ Various models are based on the number of friends or followers on
social networks: the conclusions are generalized based on too few
variables that do not include the interaction between the candidate
and the potential voter.
◉ Most models are based on sentiment analysis - not sufficient to
determine the impact on the outcome of the election.
◉ Current methods are not sufficient, hence the need for machine learning.
◉ Many models are not empirically confirmed.
6. Research objectives
◉ To determine the predictive power of models based on the
social network Facebook and compare it with other types of
research.
◉ To determine which variables are the most significant
predictors of the outcome of the election.
◉ To determine the significance of the temporal data component
of the Facebook social network.
◉ To argue which of the four machine learning methods
provides the most accurate predictive models of local election
outcomes.
9. Research methodology –
CRISP DM
CRISP-DM phases: problem understanding, data understanding, data preparation, modelling, evaluation, deployment.
Research goals:
- determine the predictive power of
election outcome models based on
the social network Facebook and
compare it with other types of
research,
- determine which variables are the
most significant predictors of the
outcome of the election.
10. Research methodology –
CRISP DM
French local elections: all cities
with more than 100,000
inhabitants (41 cities),
225 candidates: data examined
and downloaded from their
Facebook pages.
Number of events, photos, links,
videos and statuses, with their
comments, shares and likes.
11. Research methodology –
CRISP DM
25 variables included:
- 24 input variables: the candidate's
activity, the reactions of the page
followers to that activity, and
two variables for the gender
of the candidate and the affiliation of
the candidate to a political party.
- one output variable: the result in
the elections, measured by the
percentage of votes won by the
candidate.
12. Research methodology –
CRISP DM
Why Facebook?
Chen and Chang (2017) claim that
political campaigns are increasingly
"fleeing" to Facebook, the social network
with the largest number of users.
Facebook is the most used source of
political news for Millennials and
Generation X (ages 18 to 51) (Pew
Research Center, 2015).
In France, 70% of social network
users are on Facebook (Chaffey, 2019).
Singh and colleagues (2020), who
conducted their research only on
Twitter, stated the need to include
other social networks.
13. Research methodology –
CRISP DM
Attributes taken into account:
city and turnout in the city, party, total
number of page likes, number of photos,
number of statuses (text), number of
links, number of videos, number of
events created, number of photo shares,
number of status shares, number of link
shares, number of video shares, number
of photo likes, number of status likes,
number of link likes, number of video
likes, number of photo comments,
number of status comments, number of
link comments, number of video
comments, publication time.
Values were transformed into relative sizes with
respect to the number of voters.
14. Research methodology –
CRISP DM
Data description through descriptive
statistics.
Most variables have an exponential
distribution of values:
- characterized by a high probability of
occurrence of smaller values and a low
probability of occurrence of large values,
- values of the arithmetic mean are higher
than the values of the median.
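The mean-above-median pattern can be illustrated with a small sketch. The counts below are made up for illustration and are not taken from the study's data; the variable name status_shares is hypothetical.

```python
import statistics

# Made-up, right-skewed counts: many small values and one very large one.
status_shares = [1, 2, 2, 3, 4, 5, 40]

mean = statistics.mean(status_shares)
median = statistics.median(status_shares)

# In a right-skewed sample the few large values pull the mean above the median.
skewed_right = mean > median
print(f"mean={mean:.2f} median={median} mean>median={skewed_right}")
```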
15. Research methodology –
CRISP DM
Correlation analysis
- the highest correlation is between the
number of event shares and the number of
event likes,
- the correlation is positive, which indicates
that as event likes increase,
the number of event shares also
increases, and vice versa,
- the variable Result has the strongest linear
relationship with the variable Total
number of page likes.
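A minimal sketch of such a correlation check, assuming the Pearson coefficient is the measure used (the slides do not name it). The event counts are invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Invented counts for five candidates (not the study's data).
event_likes = [10, 25, 40, 80, 120]
event_shares = [2, 6, 9, 20, 28]

r = pearson(event_likes, event_shares)
print(f"r = {r:.3f}")  # a value close to +1 means likes and shares rise together
```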
16. Research methodology –
CRISP DM
Outliers
- ML algorithms are sensitive to extreme
values.
- The interquartile range was used to identify
outliers.
Missing values
- Imputation was performed: missing values
were replaced by the mean values of the
variables.
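Both steps can be sketched as follows. The 1.5 x IQR fence is the common Tukey convention and is an assumption here, since the slides do not state the multiplier; the sample values are made up.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Lower and upper fences; k=1.5 is an assumed, conventional multiplier."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def impute_mean(values):
    """Replace missing entries (None) with the mean of the known entries."""
    known = [v for v in values if v is not None]
    mean = statistics.mean(known)
    return [mean if v is None else v for v in values]


# Made-up counts: 60 lies far from the rest.
counts = [3, 4, 5, 5, 6, 7, 8, 60]
low, high = iqr_bounds(counts)
outliers = [v for v in counts if v < low or v > high]

filled = impute_mean([2.0, None, 4.0])
print(outliers, filled)
```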
17. Research methodology –
CRISP DM
Data normalization
- Data were normalized with respect to
the number of voters in each city: the
original value of each variable is
divided by the number of voters in the
city to which it refers.
- Min-max normalization: all variables were
rescaled to a range from 0 to 1.
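The two normalization steps can be sketched as below. The function names per_voter and min_max and all numbers are illustrative assumptions, not the author's code or data.

```python
def per_voter(values, voters):
    """Divide each raw count by the number of voters in the candidate's city."""
    return [v / n for v, n in zip(values, voters)]


def min_max(values):
    """Rescale values linearly to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up page-like counts for three candidates and voters in their cities.
page_likes = [12000, 3000, 45000]
voters = [60000, 60000, 150000]

relative = per_voter(page_likes, voters)  # likes per voter
scaled = min_max(relative)                # smallest -> 0, largest -> 1
print(relative, scaled)
```

Dividing by the number of voters first keeps candidates from large and small cities comparable before the min-max step rescales every variable to the same range.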
18. Research methodology –
CRISP DM
- Predictive models were developed
using four machine learning
algorithms.
- Hyperparameters of each individual
algorithm were optimized in order to prevent
overfitting of the model and to
obtain high-quality, reliable and
accurate predictive models.
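As an illustration of hyperparameter optimization, the sketch below picks the number of neighbors k for a k-nearest-neighbors regressor by leave-one-out cross-validation. The slides do not say which search or validation scheme was used, so this procedure, the function names, and the data are all assumptions.

```python
import math


def knn_predict(train_x, train_y, query, k):
    """Average the targets of the k training points closest to query."""
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / k


def loo_mae(xs, ys, k):
    """Leave-one-out mean absolute error for a given k."""
    errors = []
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        errors.append(abs(knn_predict(rest_x, rest_y, xs[i], k) - ys[i]))
    return sum(errors) / len(errors)


def best_k(xs, ys, candidates):
    """Return the candidate k with the lowest held-out error."""
    return min(candidates, key=lambda k: loo_mae(xs, ys, k))


# Made-up normalized features and vote shares (%) for six candidates.
xs = [(0.1, 0.2), (0.2, 0.1), (0.4, 0.5), (0.5, 0.4), (0.8, 0.9), (0.9, 0.8)]
ys = [5.0, 7.0, 20.0, 22.0, 40.0, 44.0]

chosen = best_k(xs, ys, [1, 2, 3])
print(chosen, loo_mae(xs, ys, chosen))
```

Scoring each k on held-out points rather than on the training data is what guards against an overfitted choice.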
19. Research methodology –
CRISP DM
Four types of machine learning
algorithms
- Information-based machine learning
(classification and regression trees),
- Similarity-based machine learning
(k-nearest neighbors),
- Probability-based machine learning
(Bayesian networks),
- Error-based machine learning (artificial
neural networks).
22. Model comparison
◉ To compare the predictive models obtained by
machine learning algorithms with the results of pre-election
polls, data provided by the French Institute for Public Opinion
Research, IFOP, were used.
◉ The data refer to 6 cities: Paris, Lyon, Marseille, Rennes,
Nantes and Bordeaux, covering a total of 44
candidates in the elections.
◉ There is a difference in error between the predictive models obtained by
machine learning algorithms and polling!
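One way to make such a comparison concrete is to compute the same error measure for both sources against the official results. The sketch assumes mean absolute error as the measure, which the slides do not specify. All numbers are invented and say nothing about which approach performed better in the study.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute gap between predicted and actual vote shares."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Invented vote shares (%) for four candidates in one city.
actual = [30.0, 25.0, 18.0, 12.0]
model = [28.0, 27.0, 17.0, 13.0]   # hypothetical machine learning predictions
poll = [33.0, 22.0, 20.0, 10.0]    # hypothetical pre-election poll figures

model_error = mean_absolute_error(model, actual)
poll_error = mean_absolute_error(poll, actual)
print(model_error, poll_error)
```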
23. Research methodology –
CRISP DM
Determination of the most significant
predictors of election outcomes based on
the obtained predictive models.
Sensitivity analysis of each predictive
model was performed.
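A minimal sketch of one common form of sensitivity analysis: change one input at a time across its range, hold the others at a baseline, and rank inputs by how much the prediction moves. The slides do not describe the exact procedure, so this one-at-a-time scheme is an assumption, and toy_model with its arbitrary coefficients is a hypothetical stand-in for a trained model.

```python
def sensitivity(predict, baseline, low=0.0, high=1.0):
    """Swing in the prediction when each input alone moves from low to high."""
    swings = {}
    for name in baseline:
        low_input = dict(baseline, **{name: low})
        high_input = dict(baseline, **{name: high})
        swings[name] = abs(predict(high_input) - predict(low_input))
    return swings


def toy_model(x):
    # Hypothetical stand-in for a trained model; coefficients are arbitrary.
    return 40 * x["page_likes"] + 10 * x["link_likes"] + 2 * x["gender"]


# Inputs are assumed to be min-max normalized, so 0.5 is the mid-range baseline.
baseline = {"page_likes": 0.5, "link_likes": 0.5, "gender": 0.5}

swings = sensitivity(toy_model, baseline)
ranking = sorted(swings, key=swings.get, reverse=True)
print(ranking)
```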
24. Prediction of results
Variable                kNN rank  ANN rank  Bayes rank  Decision tree rank  Average rank
Overall page likes      1         1         1           1                   1
Number of events        6         7         3           4                   5
Number of photos        5         8         5           9                   6.75
Number of links         3         9         7           7                   6.5
Number of statuses      4         3         9           3                   4.75
25. Prediction of results
Variable                kNN rank  ANN rank  Bayes rank  Decision tree rank  Average rank
Number of photo likes   8         6         8           6                   7
Number of link likes    2         2         2           2                   2
Number of status likes  7         5         6           10                  7
Gender                  9         10        10          8                   9.25
Political party         10        4         4           5                   5.75
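The average rank is the mean of a variable's four per-model ranks. A minimal sketch that recomputes it from the rank columns of the two tables; the dictionary layout is an illustrative choice, not the author's code.

```python
# Per-model ranks in the order: kNN, ANN, Bayes, decision tree.
ranks = {
    "Overall page likes": (1, 1, 1, 1),
    "Number of events": (6, 7, 3, 4),
    "Number of photos": (5, 8, 5, 9),
    "Number of links": (3, 9, 7, 7),
    "Number of statuses": (4, 3, 9, 3),
    "Number of photo likes": (8, 6, 8, 6),
    "Number of link likes": (2, 2, 2, 2),
    "Number of status likes": (7, 5, 6, 10),
    "Gender": (9, 10, 10, 8),
    "Political party": (10, 4, 4, 5),
}

# Mean of the four ranks for every variable.
average = {name: sum(r) / len(r) for name, r in ranks.items()}

# Variables ordered from strongest (lowest average rank) to weakest.
ordered = sorted(average, key=average.get)
for name in ordered:
    print(f"{name}: {average[name]}")
```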
26. Results discussion
◉ Stability and consistency of the models.
◉ The total number of likes on the candidate's page is the most
significant predictor in all models.
◉ The number of link likes is the second most significant
predictor also in all four models.
◉ The number of statuses is the third strongest predictor of
election results in two of the four observed models.
27. Results discussion
◉ The best quality models are obtained by applying the artificial
neural network algorithm.
◉ The explanation lies in the data types: the variables are
mostly numeric continuous attributes.
◉ The models with the lowest reliability and accuracy values
were obtained by the Naive Bayesian classifier.
◉ The Naive Bayesian classifier works with categorical variables, so it
is necessary to transform the continuous output into a
categorical one.
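Turning the continuous output into a categorical one can be sketched as simple binning. The cut points (10% and 25%) and the labels are assumptions chosen for illustration; the slides do not state how the vote share was discretized.

```python
def to_category(vote_share, edges=(10.0, 25.0), labels=("low", "medium", "high")):
    """Map a vote share in percent to a class label using assumed cut points."""
    for edge, label in zip(edges, labels):
        if vote_share < edge:
            return label
    return labels[-1]


# Made-up vote shares (%) for four candidates.
shares = [5.0, 10.0, 24.9, 30.0]
classes = [to_category(s) for s in shares]
print(classes)
```

Binning discards the differences within each class, which is one plausible reason a classifier trained this way predicts a continuous result less precisely.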
28. Results discussion
◉ Social media as an indicator for predicting election outcomes
◉ the absolute number of Facebook followers is a very good
predictor of election outcomes
◉ the content that the candidates place via social media, as well
as the reactions to the specific content that they share, is to a
certain extent an indicator of the outcome of the election
◉ The results of the research lead to a better understanding of
the way in which social networks present the views of voters,
and indicate how the electorate can be influenced through
social networks.
29. Guidelines
◉ How to run a campaign on Facebook?
◉ Achieve visibility on Facebook: get a large number of
page likes or followers (various methods of increasing
visibility). This earns a high place in the news feed of average
Facebook users, so messages spread further and
voters are influenced indirectly.
◉ Present content from ordinary life, which gives greater
visibility than classic political messages that are transferred
through other communication channels.
30. Research limitations
◉ Only one country, France, and one election were included.
◉ Each country has its own specificities that should be taken into
account when conducting research.
◉ The opinion of the user base of one social network, Facebook,
is not representative of the entire population, but it contributes
to explaining the results.
◉ The data refer to a very specific time, the beginning of March
2020: the beginning of the spread of the COVID-19 virus and
the lockdown.
◉ Only four machine learning algorithms were used; a number of
algorithms were left out.
31. Research contributions
◉ knowledge about models based on data from public opinion
polls and social networks was systematized
◉ predictive models of election outcomes based on social
network data were developed, evaluated and compared
with data obtained from surveys
◉ the predictive power of the variables of the social network
Facebook was established
◉ guidelines for using machine learning algorithms on social
media data were developed.
32.
"The whole art of politics consists
in the rational management of
human irrationalities."
Karl Paul Reinhold Niebuhr