  1. A Descriptive Analysis of Twitter Activity Around Boston Terror Attacks
     Álvaro Cuesta, David F. Barrero, María D. R-Moreno
     Computer Engineering Department, Universidad de Alcalá, Spain
     ICCCI 2013, Craiova, Romania, September 11, 2013
     ICCCI 2013, Craiova, Romania | A Descriptive Analysis of Twitter Activity | 1 / 25
  2. Summary
     1 Introduction: Motivation; Objectives; Case studies
     2 Framework: Framework overview; Framework messaging; Framework components
     3 Sentiment analysis: Overview; Classifier
     4 Case studies: Boston Terror Attack; Political analysis
     5 Conclusions and future work
  3. Introduction: Motivation
     Social networks have expanded greatly in recent years. One of the most successful is Twitter, a microblogging platform built on short messages known as tweets.
     Its open nature gives Twitter great research opportunities: it acts as a distributed human sensor network, where data extraction is easy but data processing is difficult.
     Twitter + sentiment analysis: there is a lack of tools for sentiment analysis in Spanish.
  4. Introduction: Objectives
     Twitter offers an excellent API, but some infrastructure is still needed, mainly for storage and reporting.
     Objectives:
       1 Develop a framework for Twitter data extraction and analysis
       2 Provide reporting tools
       3 Lay the foundation for sentiment analysis in Spanish
  5. Introduction: Case studies
     To assess the framework, we include two case studies:
       Event-driven activity: the Boston terror attack
       Regular usage: political activity on Twitter in Spanish
  6. Framework architecture: Overview
     Requirements: easy to use, extensible, capable of massive data processing.
     Design decisions: a modular design made of a collection of independent scripts that interchange data; a focus on open data formats; everything built around the database (MongoDB).
  7. Framework architecture: Framework messaging
  8. Framework components: Miner
     The Miner extracts and stores tweets. It uses the Twitter Stream API, supports several filters and is written in Python.
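As a rough illustration of what a Stream API string filter keeps (the Boston case study uses "Maratón de Boston"), the sketch below mimics the documented semantics of the Streaming API `track` parameter: comma-separated phrases are alternatives, and a phrase matches when all of its terms appear in the tweet, in any order and ignoring case. The helper names `parse_track` and `matches`, the `\w+` tokenization and the sample tweets are hypothetical; this is not the Miner's code and does not reproduce Twitter's exact server-side tokenization.

```python
import re

def parse_track(track: str) -> list[list[str]]:
    """Split a comma-separated filter into phrases, each a list of lowercase terms."""
    phrases = []
    for phrase in track.split(","):
        terms = re.findall(r"\w+", phrase.lower())
        if terms:  # skip empty phrases such as a trailing comma
            phrases.append(terms)
    return phrases

def matches(text: str, phrases: list[list[str]]) -> bool:
    """True if all terms of at least one phrase occur in the text (any order, any case)."""
    words = set(re.findall(r"\w+", text.lower()))
    return any(all(term in words for term in terms) for terms in phrases)

# Hypothetical sample tweets, for illustration only.
tweets = [
    "Explosiones en la Maratón de Boston",
    "Boston: de nuevo la maratón",
    "Resultados de la maratón de Madrid",
]
track = parse_track("Maratón de Boston")
kept = [t for t in tweets if matches(t, track)]
print(kept)  # ['Explosiones en la Maratón de Boston', 'Boston: de nuevo la maratón']
```

The second tweet is kept although its words appear in a different order, and the third is dropped because "Boston" is missing; an account filter such as "@PSOE" behaves the same way under this simplified tokenization.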
  9. Framework components: Database
     The database stores tweets for further processing. It uses MongoDB, a high-performance NoSQL database.
  10. Framework components: Reporting
      Tweets are exported to CSV for further processing, which is done in R because of its extensibility and powerful libraries.
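A minimal sketch of the CSV export step, using only the Python standard library. It assumes tweets are stored with the Twitter API v1.1 JSON field names (`id_str`, `created_at`, `user.screen_name`, `text`, and `retweeted_status`, which is present only on retweets); the column selection and the `flatten`/`export_csv` helpers are illustrative assumptions, not the framework's actual export format.

```python
import csv
import io

# Exported columns: an illustrative selection, not the framework's actual schema.
FIELDS = ["id", "created_at", "user", "is_retweet", "text"]

def flatten(tweet: dict) -> dict:
    """Map a stored tweet (Twitter API v1.1 field names assumed) to one flat CSV row."""
    return {
        "id": tweet["id_str"],
        "created_at": tweet["created_at"],
        "user": tweet["user"]["screen_name"],
        "is_retweet": "retweeted_status" in tweet,  # only retweets carry this field
        "text": tweet["text"],
    }

def export_csv(tweets, out) -> int:
    """Write a header plus one row per tweet to the text stream `out`; return the row count."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    count = 0
    for tweet in tweets:
        writer.writerow(flatten(tweet))
        count += 1
    return count

# Hypothetical stored documents, for illustration only.
docs = [
    {"id_str": "1", "created_at": "Tue Apr 16 00:43:00 +0000 2013",
     "user": {"screen_name": "user_a"}, "text": "Maratón de Boston, primera noticia"},
    {"id_str": "2", "created_at": "Tue Apr 16 00:44:00 +0000 2013",
     "user": {"screen_name": "user_b"}, "text": "RT primera noticia",
     "retweeted_status": {}},
]
buffer = io.StringIO()
print(export_csv(docs, buffer))  # 2
```

For a real file, open it with `open("tweets.csv", "w", newline="", encoding="utf-8")` as the `csv` module recommends; the `csv` module quotes texts containing commas, and the result can then be loaded in R, for example with `read.csv`.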
  11. Framework components: Sentiment analysis
      Sentiment analysis relies on supervised learning, which requires labeled data. The framework provides tools for labeling, classifier building and classifier testing.
  12. Sentiment analysis: Overview
      Supervised learning with the Natural Language Toolkit (NLTK), using three classes: "positive", "negative" and "neutral".
      A labeled corpus is needed. Several exist in English, but none in Spanish, so thousands of tweets had to be classified manually.
      Collaborative labeling: a web application to label tweets.
  13. Sentiment analysis: Classifier
      Naïve Bayes classifier, with stop words removed.
      Some parameters must be set, and the optimal setting depends on the dataset, so the classifier needs to be evaluated: a Tester performs cross-validation.
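The classifier pipeline can be illustrated with a small, self-contained sketch: tokens are lowercased, stop words are removed, n-grams of the chosen orders become binary features, and a binarized multinomial Naive Bayes with add-one smoothing picks the most probable class. It is written from scratch for illustration and is not NLTK's `NaiveBayesClassifier`, which the framework actually uses; the stop-word list, the training sentences and the names `features` and `NaiveBayes` are hypothetical, and the "minimum score" parameter of the case study is not modeled.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative Spanish stop-word list (a real list is much longer).
STOP_WORDS = {"a", "de", "el", "en", "es", "la", "las", "los", "que", "un", "una", "y"}

def features(text: str, ns=(1, 2)) -> set[str]:
    """Binary features: all n-grams (n in ns) of the lowercase tokens left after stop-word removal."""
    tokens = [t for t in re.findall(r"\w+", text.lower()) if t not in STOP_WORDS]
    return {" ".join(tokens[i:i + n]) for n in ns for i in range(len(tokens) - n + 1)}

class NaiveBayes:
    """Binarized multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: list[set[str]], labels: list[str]) -> "NaiveBayes":
        self.priors = Counter(labels)
        self.counts = defaultdict(Counter)  # class -> feature -> number of documents containing it
        for feats, label in zip(docs, labels):
            self.counts[label].update(feats)
        self.vocab = {f for label in self.priors for f in self.counts[label]}
        self.totals = {label: sum(self.counts[label].values()) for label in self.priors}
        return self

    def predict(self, feats: set[str]) -> str:
        n_docs = sum(self.priors.values())

        def log_posterior(label: str) -> float:
            score = math.log(self.priors[label] / n_docs)
            denom = self.totals[label] + len(self.vocab)
            for f in feats & self.vocab:  # features unseen in training are ignored
                score += math.log((self.counts[label][f] + 1) / denom)
            return score

        return max(self.priors, key=log_posterior)

# Hypothetical labeled tweets, for illustration only.
train = [
    ("Gran propuesta, me encanta", "positive"),
    ("Una vergüenza de gobierno", "negative"),
    ("Me encanta la propuesta", "positive"),
    ("Qué vergüenza, dimisión ya", "negative"),
]
clf = NaiveBayes().fit([features(t) for t, _ in train], [label for _, label in train])
print(clf.predict(features("me encanta")))     # positive
print(clf.predict(features("qué vergüenza")))  # negative
```

Working in log space avoids underflow when many features are multiplied, and the `ns` argument is where an n-gram setting such as {1, 2} would be plugged in.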
  14. Case study: Boston Terror Attack
      Main objective: evaluate the platform. Secondary objective: describe the activity around an event. The stream is filtered by string.
      The event: a terror attack in Boston on 15 Apr 2013, 14:49 (GMT-4); an Internet witch-hunt motivated by the release of some photos; a shooting and a manhunt.
      Data acquisition:
        Begin: Tue, 16 Apr 2013 00:43 (GMT)
        End: Tue, 23 Apr 2013 00:43 (GMT)
        Filter: "Maratón de Boston" (Boston Marathon in Spanish)
  15. Case study: Boston Terror Attack: Dataset description
      Metric          Value        Relative / average
      Tweets          28,892       1.16/user
      Non-retweets    16,029       55.48 %
      Retweets        12,863       44.52 %
      Geolocated      255          0.88 %
      Users           24,989
      Mentions        18,937       65.54 %
      Replies         849          2.94 %
      Non-replies     18,088       62.61 %
      Size            96.39 MB     3.38 KB/tweet
      Index size      0.91 MB
      Disk            132.99 MB
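The Relative / average column can be recomputed from the raw counts. The sketch below assumes percentages are taken over the total number of tweets and the average is tweets per user; under that reading every percentage and the per-user average in the table are reproduced. The counts also show that the Replies and Non-replies rows add up to the Mentions row, which the sketch checks. The function name `describe` is hypothetical.

```python
def describe(counts: dict) -> dict:
    """Recompute the Relative/Average column: percentages of all tweets, plus tweets per user."""
    total = counts["tweets"]
    # Consistency checks on the table: non-RTs + RTs = tweets, replies + non-replies = mentions.
    if counts["non_retweets"] + counts["retweets"] != total:
        raise ValueError("retweets and non-retweets do not add up to the tweet count")
    if counts["replies"] + counts["non_replies"] != counts["mentions"]:
        raise ValueError("replies and non-replies do not add up to the mention count")
    keys = ["non_retweets", "retweets", "geolocated", "mentions", "replies", "non_replies"]
    result = {k: round(100 * counts[k] / total, 2) for k in keys}
    result["tweets_per_user"] = round(total / counts["users"], 2)
    return result

# Raw counts from the Boston dataset table.
boston = {"tweets": 28_892, "non_retweets": 16_029, "retweets": 12_863,
          "geolocated": 255, "users": 24_989, "mentions": 18_937,
          "replies": 849, "non_replies": 18_088}
print(describe(boston))
```

Applied to the political dataset counts, the same function yields 43.32 %, 56.68 %, 0.81 %, 87.19 %, 14.96 % and 72.23 %, and 1.9 tweets per user, matching that table as well.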
  16. Case study: Boston Terror Attack: activity
      [Figure: three time-series panels, Tweets, Tweets (excluding RTs) and Retweets, plotted against Time from 17 to 23 April. Dashed line: bombing; dotted line: photo release; solid line: shooting; gray background: manhunt.]
  17. Case study: Boston Terror Attack: activity (Thursday 23:00 to Saturday 00:00)
      [Figure: the same three panels, Tweets, Tweets (excluding RTs) and Retweets, plotted against Time. Dotted line: photo release; solid line: shooting; gray background: manhunt.]
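The activity plots are per-interval tweet counts, split into retweets and non-retweets. A sketch of that aggregation, assuming hourly bins (the slides do not state the bin width) and tweets already reduced to `(created_at, is_retweet)` pairs; hours with no tweets are filled with zeros so the plotted series has no gaps. `parse_created_at` and `hourly_activity` are hypothetical helpers, and the format string assumes the `created_at` layout of Twitter API v1.1 tweets.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

def parse_created_at(value: str) -> datetime:
    """Parse a Twitter API v1.1 timestamp such as 'Tue Apr 16 00:43:00 +0000 2013'."""
    return datetime.strptime(value, "%a %b %d %H:%M:%S %z %Y")

def hourly_activity(tweets, start: datetime, end: datetime):
    """Return (hour, total, non_retweets, retweets) for every hour from start's hour up to end."""
    totals, retweets = Counter(), Counter()
    for created_at, is_retweet in tweets:
        hour = created_at.replace(minute=0, second=0, microsecond=0)
        totals[hour] += 1
        if is_retweet:
            retweets[hour] += 1
    series = []
    hour = start.replace(minute=0, second=0, microsecond=0)
    while hour < end:  # empty hours are emitted with zero counts
        series.append((hour, totals[hour], totals[hour] - retweets[hour], retweets[hour]))
        hour += timedelta(hours=1)
    return series

# Hypothetical tweets reduced to (timestamp, is_retweet) pairs.
sample = [
    (parse_created_at("Tue Apr 16 00:50:00 +0000 2013"), False),
    (parse_created_at("Tue Apr 16 00:55:00 +0000 2013"), True),
    (parse_created_at("Tue Apr 16 02:10:00 +0000 2013"), False),
]
start = datetime(2013, 4, 16, 0, 43, tzinfo=timezone.utc)
for row in hourly_activity(sample, start, start + timedelta(hours=3)):
    print(row[0].isoformat(), *row[1:])
```

The three columns after the timestamp correspond to the three panels (Tweets, Tweets excluding RTs, Retweets); a finer bin, as in the zoomed Thursday-to-Saturday view, only changes the flooring step.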
  18. Case study: Political analysis: Overview
      Main objective: evaluate sentiment analysis. Secondary objective: describe regular Twitter activity. The stream is filtered by user.
      A selection of Spanish political actors, chosen by activity and controversy:
        Political party: @PPopular, @PSOE, @iunida, @UPyD
        Politician: @agarzon, @EduMadina, @ToniCanto1, @RevillaMiguelA, @ccifuentes, @_Rubalcaba_
        Journalist: @jordievole, @iescolar
        Activist organization: @LA_PAH
      Data acquisition:
        Begin: Tue, 16 Apr 2013 00:00 (GMT)
        End: 18 Apr 2013 04:00 (GMT)
        Filter: account name ("@account")
  19. Case study: Political analysis: Dataset description
      Metric          Value        Relative / average
      Tweets          65,043       1.9/user
      Non-retweets    28,175       43.32 %
      Retweets        36,868       56.68 %
      Geolocated      528          0.81 %
      Users           34,195
      Mentions        56,713       87.19 %
      Non-replies     46,981       72.23 %
      Replies         9,732        14.96 %
      Size            227.51 MB    3.58 KB/tweet
      Index size      2.05 MB
      Disk            237.95 MB
  20. Case study: Political analysis: Activity
      [Figure: three time-series panels, Tweets, Tweets (excluding RTs) and Retweets, plotted against Time from Tuesday to Thursday.]
  21. Case study: Political analysis: Sentiment analysis
      9,884 tweets were manually classified in a collaborative way. Of these, 4,739 are non-neutral (1,062 positive, 3,677 negative), so the dataset is unbalanced.
      We tried several parameters for the Naïve Bayes classifier:
        N-grams: {1}, {2}, {3}, {1, 2}, {1, 3} and {2, 3}
        Minimum score: 0, 1, 2, 3, 4, 5, 6 and 10
      Evaluation: 10-fold cross-validation.
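The parameter sweep amounts to 6 n-gram sets × 8 minimum scores = 48 configurations, each evaluated with 10-fold cross-validation. The sketch below enumerates them, assuming the `NaiveBayes-<n-grams>-min<score>` names in the accuracy table encode these two parameters; builds shuffled 10-fold splits; and computes a majority-class baseline for the 4,739 non-neutral tweets: always answering "negative" would already score 3,677 / 4,739 ≈ 0.7759. The slides do not say whether the reported accuracies were measured on the non-neutral subset, so the baseline is only indicative. `configurations` and `k_fold_indices` are hypothetical names.

```python
import random
from itertools import product

NGRAM_SETS = [(1,), (2,), (3,), (1, 2), (1, 3), (2, 3)]
MIN_SCORES = [0, 1, 2, 3, 4, 5, 6, 10]

def configurations():
    """Names in the style of the accuracy table, e.g. NaiveBayes-1_2-min3 (assumed encoding)."""
    return [f"NaiveBayes-{'_'.join(map(str, ngrams))}-min{score}"
            for ngrams, score in product(NGRAM_SETS, MIN_SCORES)]

def k_fold_indices(n: int, k: int = 10, seed: int = 0):
    """Yield (train, test) index lists for k-fold cross-validation over n shuffled items."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    start = 0
    for fold in range(k):
        size = n // k + (1 if fold < n % k else 0)  # fold sizes differ by at most one
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

configs = configurations()
print(len(configs))  # 48

# Majority-class baseline on the non-neutral tweets (1,062 positive, 3,677 negative).
positive, negative = 1_062, 3_677
baseline = negative / (positive + negative)
print(round(baseline, 4))  # 0.7759
```

Each configuration's accuracy would be the mean over the 10 test folds; with such an unbalanced dataset, a stratified split that keeps the positive/negative ratio in every fold is a common alternative to the plain shuffle used here.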
  22. Case study: Political analysis: Sentiment analysis results
      Configuration           Accuracy
      NaiveBayes-1_2-min3     0.8543
      NaiveBayes-1-min3       0.8510
      NaiveBayes-1_3-min3     0.8507
      NaiveBayes-1-min4       0.8476
      NaiveBayes-1_3-min5     0.8474
      NaiveBayes-1_2-min4     0.8469
      NaiveBayes-1_3-min4     0.8467
      NaiveBayes-1_3-min1     0.8459
      NaiveBayes-1-min6       0.8452
      NaiveBayes-1-min1       0.8448
      NaiveBayes-1_2-min5     0.8446
      NaiveBayes-1_3-min6     0.8438
      NaiveBayes-1_2-min6     0.8436
      NaiveBayes-1-min5       0.8406
      NaiveBayes-1_2-min1     0.8389
      NaiveBayes-2_3-min6     0.8385
  23. Case study: Political analysis: Normalized sentiment
      [Figure: time series from Tuesday to Thursday; x axis: Time; y axis: Positive, from 0.0 to 1.0.]
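The slides do not define "normalized sentiment". One plausible reading, consistent with a y axis labeled Positive that runs from 0 to 1, is the share of positive tweets among the non-neutral ones in each time bin. The sketch below follows that assumption, with hourly bins and labels assumed to come from the trained classifier; `positive_share` is a hypothetical name.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

def positive_share(labeled, origin: datetime, width: timedelta = timedelta(hours=1)) -> dict:
    """Map each time bin to positive / (positive + negative); neutral tweets are skipped."""
    counts = defaultdict(lambda: {"positive": 0, "negative": 0})
    for created_at, label in labeled:
        if label not in ("positive", "negative"):
            continue
        bin_start = origin + ((created_at - origin) // width) * width  # floor to bin boundary
        counts[bin_start][label] += 1
    return {b: c["positive"] / (c["positive"] + c["negative"])
            for b, c in sorted(counts.items(), key=lambda item: item[0])}

origin = datetime(2013, 4, 16, tzinfo=timezone.utc)
labeled = [  # hypothetical (timestamp, predicted label) pairs
    (origin + timedelta(minutes=5), "positive"),
    (origin + timedelta(minutes=20), "negative"),
    (origin + timedelta(minutes=40), "negative"),
    (origin + timedelta(minutes=50), "neutral"),
    (origin + timedelta(hours=1, minutes=1), "positive"),
]
for bin_start, share in positive_share(labeled, origin).items():
    print(bin_start.isoformat(), round(share, 3))
```

Normalizing per bin removes the effect of the strong daily variation in volume seen in the activity plots; bins containing only neutral tweets are simply absent from the result.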
  24. Conclusions and future work
      We developed a framework that eases data extraction and analysis on Twitter. It is ready for production and will be released soon under a free licence.
      We briefly described two case studies: event-driven activity (the Boston terror attacks) and regular activity (political activity).
      Sentiment analysis is intrinsically difficult.
      Future work: lemmatization, natural language processing, time series analysis.
  25. Thanks for your attention!
      David F. Barrero, david@aut.uah.es, @dfbarrero