FRENCH ELECTION 2017
TWITTER DATA ANALYSIS
French Election 2017 – Twitter Data Analysis
MS Engineering
in Computer Science
Data Mining
2016/2017
THIS IS THE STORY!
• The 2017 French presidential election was held on 23 April and
7 May 2017.
• As no candidate won a majority in the first round, a run-off was
held between the top two candidates, Emmanuel Macron of
“En Marche!” and Marine Le Pen of the “National Front (FN)”,
which Macron won by a decisive margin on 7 May.
OUTLINE
1. Presentation of Data
2. Tableau
3. Clustering Analysis
4. GDELT PROJECT
1. PRESENTATION OF DATA
LET’S INTRODUCE DATA
SOME NUMBERS, ...
• 7 datasets
•Excel files:
•English Tweets
•French Tweets
•Events
•French demography
•1st round Election Results
•Foreign Votes
•2nd round Election Results
• #ENTweets = ~1.048.575
• #ENUsers = ~303.485
• #FRTweets = ~10.000.000
• #FRUsers = ~953.793
• #Period:
•03/04/2017 - 05/05/2017
•03/04/2017 – 07/05/2017
TWITTER DATASET
• Obtained from the Kaggle website in .sqlite format
• Standard fields of Twitter datasets extracted with the Streaming API
• Candidate mentions
• Load, preprocess and enrich the dataset
• Convert time from milliseconds to %D %H:%M format
• Map the Location parameter to City-Country using GeoText
• Sentiment analysis using the vaderSentiment Python package
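The timestamp step can be sketched as follows. This is a minimal illustration, not the project's code: the function name is hypothetical, the timestamp is assumed to be milliseconds since the Unix epoch, and the output is assumed to be in UTC. %D is spelled out as %m/%d/%y because Python's strftime does not accept %D on every platform.

```python
from datetime import datetime, timezone


def format_tweet_time(ms_since_epoch: int) -> str:
    """Convert a millisecond Unix timestamp to 'MM/DD/YY HH:MM' (UTC)."""
    # Assumption: the dataset stores time as milliseconds since the epoch.
    moment = datetime.fromtimestamp(ms_since_epoch / 1000, tz=timezone.utc)
    # %m/%d/%y is the portable spelling of the C shorthand %D.
    return moment.strftime("%m/%d/%y %H:%M")


# 1492905600000 ms is midnight UTC on 23 April 2017, the first-round date.
print(format_tweet_time(1492905600000))  # prints 04/23/17 00:00
```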
2.TABLEAU
VISUAL ANALYSIS
LET’S START WITH TABLEAU!
• Produces interactive data visualization products focused
on business intelligence
• Version: Tableau Desktop
• Data storage: locally, using Extract
• Connect with Excel files
WHO, WHEN AND WHERE
PEOPLE TWEETED?
TREND ENGLISH
TWEETS AND RETWEETS
TREND FRENCH
TWEETS AND RETWEETS
WHAT ARE
USER SENTIMENTS?
TREND ENGLISH SENTIMENTS
TREND FRENCH SENTIMENTS
HOW ARE
TWEETS AND SENTIMENTS
TIMELINES?
ENGLISH TWEETS VS SENTIMENTS
FRENCH TWEETS VS SENTIMENTS
POPULARITY
HOW WAS
1ST ROUND ELECTIONS?
FOREIGN VOTES VS ENGLISH
TWEETS POPULARITY
FRENCH VOTES VS ENGLISH
TWEETS POPULARITY
FOREIGN VOTES VS FRENCH
TWEETS POPULARITY
FRENCH VOTES VS FRENCH TWEETS
POPULARITY
POPULARITY
HOW WAS
2ND ROUND ELECTIONS?
FRENCH VOTES VS ENGLISH
TWEETS POPULARITY
FRENCH VOTES VS FRENCH TWEETS
POPULARITY
3.CLUSTERING
WORKING WITH DATA
CLUSTER ANALYSIS
• Main objective: group similar objects
• Document clustering
• Sklearn and nltk packages
• Filter out retweets (235021 tweets remain)
FIRST STEPS
K-Means
• Create a TF-IDF
• Term frequency
• Inverse document frequency
• Run multiple K-Means with different K ∈ [2,8]
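To make the TF-IDF step concrete, here is a small hand-rolled sketch using only the standard library. It is an illustration under assumptions, not the deck's code: the deck names the sklearn package, whose TfidfVectorizer applies a smoothed and normalised variant, so its numbers would differ. The sample tweets and the function name are made up. The K-Means runs for each K would then operate on vectors like these.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    tf  = term count / document length
    idf = log(N / number of documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Made-up sample tweets, for illustration only.
tweets = [
    "macron wins debate",
    "lepen loses debate",
    "macron lepen france",
]
vectors = tf_idf(tweets)
# "wins" occurs in one tweet, "macron" in two, so "wins" gets the larger weight.
print(vectors[0]["wins"] > vectors[0]["macron"])  # prints True
```

A term that appears in every document gets weight 0, which is why words shared by most tweets contribute little to separating clusters.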
K-Means
Num_cluster 3 Data Distribution
Cluster0 -> 168032
Cluster1 -> 30611
Cluster2 -> 36378
K-Means
Num_cluster 6 Data Distribution
Cluster0 -> 89636
Cluster1 -> 34058
Cluster2 -> 25070
Cluster3 -> 29517
Cluster4 -> 45524
Cluster5 -> 11216
K-Means
• Time execution
• 1:30-2 min per run
• There is one dominant cluster
• Repeated terms: macron, lepen, france
• Sparse data (According to cosine similarity)
• Little information about the clusters
LET’S TRY
ANOTHER APPROACH
TOPIC MODELLING
• Discover the abstract topics in a set of documents
• Two different approaches
• Latent Semantic Analysis
• Non-Negative Matrix factorization
Latent Semantic Analysis
• Analyzing relationships between documents
• SVD technique for reducing the space
• TruncatedSVD
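The SVD reduction can be written out as below. The symbols are assumptions introduced for illustration and do not appear in the deck: X is taken to be the TF-IDF matrix with m documents as rows and n terms as columns, and k is the number of components kept.

```latex
X \approx X_k = U_k \Sigma_k V_k^{\top}, \qquad
X \in \mathbb{R}^{m \times n},\;
U_k \in \mathbb{R}^{m \times k},\;
\Sigma_k \in \mathbb{R}^{k \times k},\;
V_k \in \mathbb{R}^{n \times k},\;
k \ll \min(m, n)
```

Under this convention each document is represented by a row of U_k Σ_k, a k-dimensional vector instead of an n-dimensional one, which is consistent with the shorter run times reported for LSA.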
LSA
Num_cluster 2 Data Distribution
Cluster0 -> 135344
Cluster1 -> 99677
LSA
Num_cluster 4 Data Distribution
Cluster0 -> 39063
Cluster1 -> 36906
Cluster2 -> 124231
Cluster3 -> 34821
LSA
Num_cluster 5 Data Distribution
Cluster0 -> 37545
Cluster1 -> 36851
Cluster2 -> 75845
Cluster3 -> 34046
Cluster4 -> 50734
LSA
• Reduced execution time
• Approx. 20 secs per run
• One big cluster remains for n_cluster < 5
• Data is more concentrated and distributed
• Still little information about the clusters
Non-Negative Matrix
Factorization
• Decompose V into two matrices, W and H, with k topics
• V -> term-document matrix
• W -> term-topic matrix
• H -> topic-document matrix
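Written out, the decomposition looks as follows. The dimensions m (terms) and n (documents) and the squared Frobenius objective are assumptions added for illustration; the deck does not state which loss was minimised.

```latex
\begin{aligned}
V &\approx W H, \qquad
V \in \mathbb{R}_{\ge 0}^{m \times n},\;
W \in \mathbb{R}_{\ge 0}^{m \times k},\;
H \in \mathbb{R}_{\ge 0}^{k \times n} \\
(W, H) &= \operatorname*{arg\,min}_{W \ge 0,\; H \ge 0} \; \lVert V - W H \rVert_F^2
\end{aligned}
```

Because every entry of W and H is non-negative, each column of W can be read as a topic (a weighted list of terms) and each column of H as the topic mix of one document.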
Non-Negative Matrix
Factorization
Num_cluster 3 Data Distribution
Cluster0 -> 87637
Cluster1 -> 74663
Cluster2 -> 72721
Non-Negative Matrix
Factorization
Num_cluster 4 Data Distribution
Cluster0 -> 67791
Cluster1 -> 66108
Cluster2 -> 59462
Cluster3 -> 41660
Non-Negative Matrix
Factorization
Num_cluster 6 Data Distribution
Cluster0 -> 59772
Cluster1 -> 57190
Cluster2 -> 50852
Cluster3 -> 37733
Cluster4 -> 16251
Cluster5 -> 13223
NMF TOPICS
• Information about the topics
• Top words
• Not all the terms are repeated
• Terms are more distributed
Non-Negative Matrix
Factorization - KMeans
• Reduced execution time
• Less than 10 secs
• Data is distributed among the clusters
• More information about the topics and clustering
• Some sparse data
4.GDELT PROJECT
WORKING WITH DATA
GDELT PROJECT
• GDELT monitors the world's news media from nearly
every corner of every country
in print, broadcast, and web formats, in over 100
languages, every moment of every day
• Uses natural language and data mining algorithms
GDELT DATASET
• Data monitored from 1979 to the present
• New information added every 15 minutes
• Total set divided by dates
• 57 fields (date, actors, actions, events, location, etc.)
APACHE FLINK
• Apache Flink® is an open-source stream processing
framework for distributed, high-performing,
always-available, and accurate data streaming applications
• Java
APPROACH
• Perform analysis of the French events during the elections
• Find some patterns between events and tweets
• Important fields: <date,actor,action,event,location>
ANALYSIS
• Evolution of the events each day
• Trending leaders
• Trending events per day
MapReduce
• Java API
• Multidimensional tuples
• MapReduce transformations
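The map/reduce pattern behind the events-per-day count can be sketched as follows. This is a plain Python illustration of the idea, not the Flink Java API the project used. The sample tuples are made up and only mimic the <date, actor, action, event, location> shape; the function names are hypothetical.

```python
from functools import reduce

# Made-up records shaped like (date, actor, action, event, location).
events = [
    ("2017-04-23", "ACTOR_A", "ACTION_X", "EVENT_1", "Paris"),
    ("2017-04-23", "ACTOR_B", "ACTION_Y", "EVENT_2", "Lyon"),
    ("2017-05-07", "ACTOR_A", "ACTION_X", "EVENT_3", "Paris"),
]


def map_to_pair(record):
    """Map step: emit a (date, 1) pair for every event record."""
    return (record[0], 1)


def reduce_by_key(totals, pair):
    """Reduce step: add each pair's count to the running total for its date."""
    day, count = pair
    totals[day] = totals.get(day, 0) + count
    return totals


events_per_day = reduce(reduce_by_key, map(map_to_pair, events), {})
print(events_per_day)  # prints {'2017-04-23': 2, '2017-05-07': 1}
```

Changing the key emitted in the map step from the date to the actor turns the same pipeline into a mention count per leader.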
EVOLUTION of EVENTS
• Number of events per day in France during the elections
EVOLUTION of EVENTS
• Evolution of the number of tweets per day
COMPARISON
TRENDING LEADERS
• Leaders with the highest number of mentions
TRENDING LEADERS
• 1000 entries -> notable results in the first 30 records
GDELT Conclusion
• Political themes occur on the same days as tweet peaks
• They are among the 30 most mentioned
• A strong presence of political themes in the media could
increase the use of Twitter
• François Fillon is the only candidate among the 1000 most
mentioned leaders
THANK YOU!
LINKS
• CODE:
•https://github.com/dieguer22/DataMiningProject
