Graph Visualization Tool for Twittersphere users based on a high-scalable Extract, Transform and Load System
  1. Graph Visualization Tool for Twittersphere users based on a high-scalable Extract, Transform and Load System. Pablo Aragón, Íñigo García and Antonio García. May 27th, 2011.
  2. Index. Introduction: Cierzo Development and SMMART; Structure of Twitter; Volume of Twitter; Detection of influencers. Distributed computation: Hadoop; Amazon EC2. Pipeline design: Crawling Module; Metadata Extraction Module; Indexing Module; Graph Visualization Module. Results: Western Sahara Conflict; Patxi López; Conclusions; Future work.
  3. Introduction: Cierzo Development and SMMART. SMMART (Social Media Marketing Analysis and Reporting Tool) is the system developed by Cierzo Development for: corporate social reputation; measuring the effectiveness of marketing campaigns; detection of new trends.
  4. Introduction: Structure of Twitter. Structure of a profile.
  5. Introduction: Structure of Twitter. A user can set a relationship with another user by: Reply: an update that begins with @username. Mention: an update that contains @username in the body of the tweet. Retweet: an update that contains the body of another user's tweet, specifying the original author.
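The three relationship types on this slide can be told apart from the tweet text alone. The following is a minimal sketch, not the authors' code: the function name `classify` is hypothetical, and it assumes retweets follow the common "RT @username" prefix convention, which the slide does not state.

```python
import re

# Assumption: usernames are runs of letters, digits and underscores.
MENTION_RE = re.compile(r"@(\w+)")
# Assumption: a retweet is written as "RT @author ..." (not stated on the slide).
RETWEET_RE = re.compile(r"^RT\s+@(\w+)", re.IGNORECASE)


def classify(tweet: str):
    """Return (relationship, usernames) for a single tweet text."""
    text = tweet.strip()
    retweet = RETWEET_RE.match(text)
    if retweet:
        # The original author is the user named right after "RT".
        return ("retweet", [retweet.group(1)])
    users = MENTION_RE.findall(text)
    if users and text.startswith("@"):
        # A reply begins with @username.
        return ("reply", users)
    if users:
        # A mention contains @username somewhere in the body.
        return ("mention", users)
    return ("none", [])
```

Each detected relationship gives a directed link from the tweet's author to the named user, which is the raw material for an interaction graph.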
  6. Introduction: Volume of Twitter. More than 200M users publishing millions of tweets per day.
  7. Introduction: Detection of influencers. Old metrics are based on data such as: Absolute info: number of followers. Relative info: quotient of following users and followers.
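As a small illustration of the relative metric named above, the quotient can be computed as below. The function name and the handling of a zero follower count are assumptions made for this sketch.

```python
def follower_ratio(following: int, followers: int):
    """Quotient of following users and followers.

    Returns None when the user has no followers, since the quotient
    is undefined in that case (a choice made for this sketch).
    """
    if followers == 0:
        return None
    return following / followers
```

For example, a user who follows 50 accounts and has 200 followers gets a quotient of 0.25.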
  8. Introduction: Detection of influencers. Available search engines track Twitter and list results, but they do not assign a value to the users in the response.
  9. #spanishrevolution #yeswecamp #15m
  10. Distributed computation. Management of large volumes at the lowest cost. Automatic adjustment to the daily growth of users and the oscillations in the frequency of publication.
  11. Distributed computation: Hadoop. MapReduce. Distributed File System.
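To make the MapReduce model concrete, here is a single-process Python sketch of its three phases (map, shuffle, reduce) applied to counting mentions per user. It is only an illustration of the programming model: it does not use the Hadoop API, and the choice of mention counting as the job, along with all function names, is an assumption.

```python
import re
from collections import defaultdict

MENTION_RE = re.compile(r"@(\w+)")


def map_phase(tweet):
    """Map: emit a (username, 1) pair for every mention in one tweet."""
    for user in MENTION_RE.findall(tweet):
        yield (user.lower(), 1)


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one username."""
    return (key, sum(values))


def count_mentions(tweets):
    pairs = (pair for tweet in tweets for pair in map_phase(tweet))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())
```

In a real cluster the map and reduce calls run on different nodes and the framework performs the shuffle; here everything runs in one process to keep the sketch self-contained.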
  12. Distributed computation: Amazon EC2. Definition of a Hadoop node as a machine image in Amazon Elastic Compute Cloud. The system's balancing mechanism adds and removes Hadoop nodes on demand in real time.
  13. Pipeline design.
  14. Pipeline design: Crawling Module. Based on Nutch. 1. Crawl the Twitter profiles stored in a DB. 2. Extract outlinks to new profiles.
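The two crawling steps form a loop: every crawled profile may yield outlinks to profiles not yet seen, which are queued for a later round. The sketch below shows that loop in plain Python. It is not Nutch code; `fetch_outlinks` is a hypothetical callable standing in for the fetch-and-parse step, and `max_profiles` is an assumed safety limit.

```python
from collections import deque


def crawl(seed_profiles, fetch_outlinks, max_profiles=100):
    """Breadth-first crawl starting from the stored profiles.

    fetch_outlinks(profile) is a hypothetical function that returns
    the profiles linked from the given profile page.
    """
    seeds = list(dict.fromkeys(seed_profiles))  # drop duplicate seeds, keep order
    seen = set(seeds)
    queue = deque(seeds)
    visited = []
    while queue and len(visited) < max_profiles:
        profile = queue.popleft()
        visited.append(profile)                 # step 1: crawl the profile
        for link in fetch_outlinks(profile):    # step 2: extract outlinks
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

With a toy link table such as `{"a": ["b", "c"], "b": ["a", "d"]}` and the seed `"a"`, the profiles are visited in the order a, b, c, d.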
  15. Pipeline design: Metadata Extraction Module. The portion of HTML of a tweet contains a set of metadata: textual content; publication date; author; mentions of other users.
  16. Pipeline design: Indexing Module. Apache Solr (enterprise search server based on Lucene): sorting algorithms; stemming; stopword filters; faceted search. Multicore architecture, sharding by publication date.
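Sharding by publication date means that each tweet is routed to an index core chosen from its date, and a date-restricted search only needs the cores that overlap the requested range. The sketch below assumes one core per month and a hypothetical naming scheme (`tweets_YYYY_MM`); the slide does not give the actual granularity or core names.

```python
from datetime import date


def core_for(published: date, prefix: str = "tweets") -> str:
    """Name of the core that holds tweets published in that month (assumed scheme)."""
    return f"{prefix}_{published:%Y_%m}"


def cores_for_range(start: date, end: date, prefix: str = "tweets"):
    """All monthly cores that a search over [start, end] has to query."""
    cores = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        cores.append(core_for(date(year, month, 1), prefix))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return cores
```

Under this assumed scheme, a search limited to a few days of one month touches a single core, however large the full archive grows.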
  17. Pipeline design: Graph Visualization Module. The Graph Visualization module transforms the responses from the index into a graph using Yifan Hu's force-based multilevel layout algorithm, provided in the Gephi Toolkit.
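Before any layout can be applied, the search response has to be turned into nodes and edges. The sketch below covers only that preparatory step, under the assumption that each result is reduced to an (author, mentioned users) pair. It does not use the Gephi Toolkit (a Java library) and does not implement the Yifan Hu layout; the function names are hypothetical.

```python
from collections import Counter


def build_mention_graph(tweets):
    """Build weighted directed edges (author -> mentioned user).

    tweets is an iterable of (author, [mentioned users]) pairs, an
    assumed simplification of the index response.
    """
    edges = Counter()
    for author, mentioned in tweets:
        for target in mentioned:
            if target != author:  # ignore self-mentions
                edges[(author, target)] += 1
    return edges


def rank_by_mentions(edges):
    """Rank users by the total weight of their incoming edges."""
    received = Counter()
    for (_, target), weight in edges.items():
        received[target] += weight
    return received.most_common()
```

The resulting edge list is the kind of input a layout algorithm consumes, and the incoming-edge ranking gives a first rough signal of who is being addressed most in the result set.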
  18. Results: Western Sahara Conflict. In November 2010, Moroccan security forces intervened in a camp in Western Sahara. This action was criticized by part of Spanish society.
  19. Results: Western Sahara Conflict. Search: content:'sahara' language:'es' date:[2010-11-10 TO 2010-11-18]. Results: 1721 users, 3925 tweets, 707 mentions.
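A search like the one above could be sent to Solr as a main query plus filter queries. The sketch below builds such request parameters, using the field names shown on the slide (`content`, `language`, `date`) and Solr's standard `q` and `fq` parameters. Expanding the dates to full UTC timestamps, treating the end date as inclusive, and the helper name `build_query` are assumptions.

```python
from urllib.parse import urlencode


def build_query(field, term, language, start, end):
    """Build Solr request parameters for a term search within a date range."""
    # Assumption: the whole end day is included in the range.
    date_range = f"[{start}T00:00:00Z TO {end}T23:59:59Z]"
    return {
        "q": f"{field}:{term}",
        "fq": [f"language:{language}", f"date:{date_range}"],
    }


params = build_query("content", "sahara", "es", "2010-11-10", "2010-11-18")
# doseq=True emits one fq=... pair per filter query.
query_string = urlencode(params, doseq=True)
```

The same helper covers a mention search by changing the field and term, for example `build_query("mention", "patxi_lopez", "es", "2010-11-10", "2010-11-18")`.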
  20. Results: Western Sahara Conflict.
  21. Results: Patxi López. Patxi López holds the position of President of the Basque Country Government. His campaign included social network strategies.
  22. Results: Patxi López. Search: mention:'patxi_lopez' language:'es' date:[2010-11-10 TO 2010-11-18]. Results: 186 users, 196 tweets, 366 mentions.
  23. Results: Patxi López.
  24. Results: Conclusions. The implemented tool identifies the main influencers on a specific topic or around a specific user. The highly scalable design adapts to a large social network such as Twitter. Enterprises can deploy social media monitoring systems using exclusively open source technologies. The tool provides information for performing crisis management.
  25. Results: Future work. New versions for more social media sources. Real-time results. New data mining applications. Predictive models.
  26. Thanks for your attention
