General Online Research Conference
GOR 13, 4-6 March 2013
Baden-Wuerttemberg Cooperative State University Mannheim, Germany
Wlodzimierz Gogolek, Institute of Journalism, University of Warsaw
Pawel Kuczma, Institute of Journalism, University of Warsaw
Forecasting General Election Results in Poland 2011 on the basis of Social Media content
Contact: p.kuczma@id.uw.edu.pl
Data Refining* Process
[Diagram: Data from Social Media & News Portals; Quantitative Analysis; Qualitative Analysis; Results]
* Włodzimierz Gogołek, Refining network information (Rafinacja informacji sieciowej), [in:] Aleksander Jastriebow, Maria Raczyńska, Informatyka w dobie XXI wieku. Nauka, Technika, Edukacja a nowoczesne technologie informatyczne, Radom 2011, Politechnika Radomska, pp. 229-238.
The purpose of this study:
The purpose of this study was to identify factors that allow the outcome of the
October 2011 general election in Poland to be predicted from big data resources on
Social Media websites in the pre-election period, using what we call the data refining process.
The research question was:
Is it possible to predict an action (casting a vote for a political party in a general election)
on the basis of quantitative content analysis (the amount of content related to the subject
of the research) and qualitative content analysis (the contexts in which that content appears
and its emotional value) of Social Media?
Study Details - Methodology
Type of content analysed:
• Social Media websites (such as social networking sites, forums, blogs and microblogs)
• News portals (websites with content written by professionals), for comparative analysis
Time of the study: the analysed content was published in the pre-election period in 2011
(between March 1st and October 31st)
The following indicators of the content were examined:
• Quantitative assessment:
- Amount of content about candidates,
- Trends / dynamics of change in the amount of content;
• Qualitative assessment:
- Context analysis (identifying the topic of the content),
- Sentiment analysis (distinguishing between positive and negative content).
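The two quantitative indicators listed above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record layout (party, month), the sample data and the function names are assumptions made only for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical records: (party, month number in 2011). Not data from the study.
mentions = [
    ("PO", 3), ("PO", 3), ("PiS", 3),
    ("PO", 4), ("PiS", 4), ("PiS", 4), ("PiS", 4),
]

def amount_per_party(records):
    """Count how many content items mention each party."""
    return Counter(party for party, _ in records)

def monthly_trend(records):
    """For each party, list (month, change versus the previous observed month)."""
    per_month = defaultdict(Counter)
    for party, month in records:
        per_month[party][month] += 1
    trend = {}
    for party, counts in per_month.items():
        months = sorted(counts)
        trend[party] = [
            (later, counts[later] - counts[earlier])
            for earlier, later in zip(months, months[1:])
        ]
    return trend

totals = amount_per_party(mentions)
trend = monthly_trend(mentions)
print(totals)
print(trend)
```

The first function corresponds to the amount of content about each party, the second to the dynamics of change in that amount from month to month.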
Quantitative assessment
Share of voice by different types of Social Media websites
The most important sources were Blogs and Forums: over 97% of content
Source      Share of content
Blogs       57.72%
Forums      39.92%
Facebook    0.31%
Twitter     2.05%
Election results
Source: State Electoral Commission
Party            Vote share
PO               39.18%
PiS              29.89%
Ruch Palikota    10.02%
SLD              8.24%
PSL              8.36%
PJN              2.19%
Parties’ visibility online, especially in Social Media
The two most popular parties in Social Media gained the most user attention in this channel
[Two bar charts, Blogs and Forums: counts per party (Ruch Palikota, PiS, PJN, PO, PSL, SLD); values not recoverable]
The two most popular parties in Social Media gained the most user attention in both
channels, along with SLD
Party            Blogs    Forums
Ruch Palikota    2%       4%
PiS              22%      23%
PJN              6%       9%
PO               37%      32%
PSL              11%      12%
SLD              22%      22%
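One way to set the share-of-content figures against the election results is a simple per-party comparison. The Python sketch below is only an illustration, not the authors' method: it assumes the first set of percentages belongs to Blogs and the second to Forums (the order of the panel labels), and the error measure and function names are chosen here for illustration.

```python
# Vote shares from the "Election results" slide (percent).
vote_share = {"PO": 39.18, "PiS": 29.89, "Ruch Palikota": 10.02,
              "SLD": 8.24, "PSL": 8.36, "PJN": 2.19}

# Shares of content per party (percent). Assumption: first pie chart = Blogs,
# second pie chart = Forums, following the order of the panel labels.
blogs = {"PO": 37, "PiS": 22, "Ruch Palikota": 2,
         "SLD": 22, "PSL": 11, "PJN": 6}
forums = {"PO": 32, "PiS": 23, "Ruch Palikota": 4,
          "SLD": 22, "PSL": 12, "PJN": 9}

def mean_absolute_error(predicted, actual):
    """Average absolute gap, in percentage points, between two share tables."""
    return sum(abs(predicted[p] - actual[p]) for p in actual) / len(actual)

def leader(shares):
    """Party with the largest share."""
    return max(shares, key=shares.get)

mae_blogs = mean_absolute_error(blogs, vote_share)
mae_forums = mean_absolute_error(forums, vote_share)

print(f"Blogs:  leader={leader(blogs)}, MAE={mae_blogs:.2f} pp")
print(f"Forums: leader={leader(forums)}, MAE={mae_forums:.2f} pp")
print(f"Election: leader={leader(vote_share)}")
```

The sketch only measures how far content shares lie from vote shares; it says nothing about why, and the vote shares do not sum to 100% because smaller committees are not listed on the slide.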
Qualitative assessment
Social Media context of content
Context                         Count
Education                       907
Finance                         1843
Economy                         1195
Infrastructure                  1710
Disaster                        2323
Church                          634
Culture                         764
Media                           4231
Science and Higher Education    662
Defence                         137
Law                             5664
Reforms                         368
Foreign Affairs                 1804
Environment                     0
EU                              797
Power                           0
Health                          810
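The context counts can be turned into shares and ranked with a few lines of Python. This is a sketch for illustration only: the counts are the ones on the slide, the English labels are translations of the Polish categories, and the variable names are assumptions.

```python
# Context counts from the slide; labels are English translations
# of the Polish category names.
context_counts = {
    "Education": 907, "Finance": 1843, "Economy": 1195,
    "Infrastructure": 1710, "Disaster": 2323, "Church": 634,
    "Culture": 764, "Media": 4231, "Science and Higher Education": 662,
    "Defence": 137, "Law": 5664, "Reforms": 368,
    "Foreign Affairs": 1804, "Environment": 0, "EU": 797,
    "Power": 0, "Health": 810,
}

total = sum(context_counts.values())

# Share of each context in all context-tagged content, in percent.
shares = {name: 100 * count / total for name, count in context_counts.items()}

# Contexts ordered from most to least frequent.
ranked = sorted(context_counts, key=context_counts.get, reverse=True)

for name in ranked[:5]:
    print(f"{name:<30}{context_counts[name]:>6}{shares[name]:>8.1f}%")
```

The slide does not say which categories count as media contexts and which as essential contexts, so the sketch deliberately does not attempt that split.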
Media contexts vs. Essential contexts
Media contexts: 35%
Essential contexts: 65%
Media contexts: contexts that appear strongly in the media during the campaign period
Essential contexts: contexts connected directly with the powers of the government
Sentiment analysis in Social Media (forums)
Data gathered: 1 March - 31 October 2011
[Two bar charts, Positive and Negative: counts per party (SLD, PSL, PO, PJN, PiS, Palikot); values not recoverable]
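The slides do not describe how content was labelled as positive or negative. Purely as an illustration of the idea, the Python sketch below uses a tiny word-list approach and tallies labels per party; the word lists, the sample posts and the function names are all hypothetical and are not taken from the study.

```python
import re
from collections import Counter

# Hypothetical English word lists, for illustration only.
POSITIVE = {"good", "great", "support", "trust"}
NEGATIVE = {"bad", "scandal", "fail", "corrupt"}

def classify(text):
    """Label a text by comparing counts of positive and negative words."""
    words = re.findall(r"[a-z]+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def tally(posts):
    """Count labels per party for (party, text) pairs."""
    counts = {}
    for party, text in posts:
        counts.setdefault(party, Counter())[classify(text)] += 1
    return counts

# Hypothetical posts, not data from the study.
posts = [
    ("PO", "Good plan, I trust them"),
    ("PO", "Another scandal"),
    ("PiS", "They will fail, bad idea"),
]

result = tally(posts)
for party, labels in result.items():
    print(party, labels["positive"], labels["negative"])
```

A per-party tally of this kind is the shape of data that positive and negative bar charts summarise; a real study would need a Polish-language lexicon or a trained classifier.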
Sentiment analysis in Social Media (blogs)
Data gathered: 1 March - 31 October 2011
[Two bar charts, Positive and Negative: counts per party (SLD, PSL, PO, PJN, PiS, Palikot); values not recoverable]
Negative content
Data gathered: 1 March - 31 October 2011
[Chart: PO and PiS by month, March to September; vertical axis from -140 to 60; values not recoverable]
Research Results
• This study confirms that Social Media content is a valuable source of
information reflecting the political preferences of internet users, which are
expressed in votes for candidates of a given political party during
the election.
• The analysis made it possible to predict which political parties would
dominate the parliament after the election, and that a new party would
enter the parliament for the first time. The results show that the research
hypothesis was generally confirmed.
• This research confirmed a method that supports diagnosing the condition
of parties and the dynamics of their change. It can therefore be used to
influence democratic processes through Social Media.
Next Steps
The next steps are:
• The importance of content generated by Social Media users calls for
even deeper analysis and research, including advances in sentiment
analysis
• Deeper social network analysis (who the users are, how they behave
online, and which factors influence them the most)
• Extension of the method to other fields (stock exchange, complex
social processes)
Thank you
Danke Schön
Paweł Kuczma, Włodzimierz Gogołek
p.kuczma@id.uw.edu.pl
