Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Twitter Vigilance: a Multi-User platform for Cross-Domain Twitter Data Analytics, NLP and Sentiment Analysis

591 views

Published on

Citizens as sensors to
Assess sentiment on services, events, …
Response of consumers wrt…
Early detection of critical conditions
Information channel
Opinion leaders
Communities
Formation
Predicting volume of visitors for tuning the services
Collecting Tweets
on the basis of several criterial, searches
Multiple users may have multiple searches and multiple purposes (views on those searches)  minimization of searches
With high reliable model exploiting Twitter Search and/or Stream API
Performing NLP and Sentiment Analysis
Real time or daily
Multiple languages
Compute Metrics:
different kinds, etc.: volume, users, etc..
Low and High Level,
Work on daily, hourly and real time
Downloading data: raw data and derived metrics
Providing support for drilling down on data
to perform inspection and analysis, faceted search for instance
perform data analytics
exploiting metrics on Machine Learning and predictive tools
Providing API for querying, dashboard


The IEEE Smart World Congress originated from the 2005 Workshop on Ubiquitous Smart Worlds (USW, Taipei) and the 2005 Symposium on Ubiquitous Intelligence and Smart World (UISW, Nagasaki). SmartWorld 2017 in San Francisco is the next edition after the successful SmartWorld 2016 in Toulouse France and SmartWorld 2015 in Beijing China. SmartWorld 2017 is to provide a high-profile, leading-edge platform for researchers and engineers to exchange and explore state-of-art advances and innovations in graceful integrations of Cyber, Physical, Social, and Thinking Worlds for the theme
http://ieee-smartworld.org/2017/smartworld/

Published in: Data & Analytics
  • Be the first to comment

Twitter Vigilance: a Multi-User platform for Cross-Domain Twitter Data Analytics, NLP and Sentiment Analysis

  1. 1. DISIT Lab, Distributed Data Intelligence and Technologies Distributed Systems and Internet Technologies Department of Information Engineering (DINFO) http://www.disit.dinfo.unifi.it http://www.disit.org DISIT lab, IEEE SCI 2017, Freemont CA USA Daniele Cenni, Paolo Nesi, Gianni Pantaleo, Imad Zaza University of Florence, Department of Information Engineering, DISIT Lab, http://www.disit.org , http://www.sii-mobility.org , http://www.km4city.org paolo.nesi@unifi.it Twitter Vigilance: a Multi-User platform for Cross- Domain Twitter Data Analytics, NLP and Sentiment Analysis
  2. 2. DISIT Lab, Distributed Data Intelligence and Technologies Distributed Systems and Internet Technologies Department of Information Engineering (DINFO) http://www.disit.dinfo.unifi.it http://www.disit.org Exploiting Social Media Data • Mainly Natural Language (multiple), specific slang, – E.g., Twitter with its # Hashtags, @ citations, etc. – Most of the posts are scarcely geolocated • Main Domain Analysis – Social and market analysis – Predictive model – Early warning, anomaly detection • Derived Metrics may be of many kind and have to be validate to use them DISIT lab, IEEE SCI 2017, Freemont CA USA
  3. 3. DISIT Lab, Distributed Data Intelligence and Technologies Distributed Systems and Internet Technologies Department of Information Engineering (DINFO) http://www.disit.dinfo.unifi.it http://www.disit.org Prediction/Assessment • Football game results as related to the volume of Tweets • Number of votes on political elections, via sentiment analysis, SA • Size and inception of contagious diseases • marketability of consumer goods • public health seasonal flu • box-office revenues for movies • places to be visited, most visited • number of people in locations like airports • audience of TV programmes, political TV shows • weather forecast information • Appreciation of services DISIT lab, IEEE SCI 2017, Freemont CA USA
  4. 4. DISIT Lab, Distributed Data Intelligence and Technologies Distributed Systems and Internet Technologies Department of Information Engineering (DINFO) http://www.disit.dinfo.unifi.it http://www.disit.org Twitter Vigilance • http://www.disit.org/tv • http://www.disit.org/rttv • Citizens as sensors to – Assess sentiment on services, events, … – Response of consumers wrt… – Early detection of critical conditions – Information channel – Opinion leaders – Communities – Formation – Predicting volume of visitors for tuning the services DISIT lab, IEEE SCI 2017, Freemont CA USA
  5. 5. DISIT Lab, Distributed Data Intelligence and Technologies Distributed Systems and Internet Technologies Department of Information Engineering (DINFO) http://www.disit.dinfo.unifi.it http://www.disit.org Requirements • Collecting Tweets –on the basis of several criterial, searches • Multiple users may have multiple searches and multiple purposes (views on those searches)  minimization of searches –With high reliable model exploiting Twitter Search and/or Stream API • Performing NLP and Sentiment Analysis –Real time or daily –Multiple languages DISIT lab, IEEE SCI 2017, Freemont CA USA
  6. 6. DISIT Lab, Distributed Data Intelligence and Technologies Distributed Systems and Internet Technologies Department of Information Engineering (DINFO) http://www.disit.dinfo.unifi.it http://www.disit.org State of the art DISIT lab, IEEE SCI 2017, Freemont CA USA Service Twitter Metrics(e.g.# oftweets, retweetsover time) Sentiment analysis NLPAnalysis API availability Usernetwork analysis Dataanalysis basedon geolocation Realtime Analytics Fullfaceted Search Metricsfor assessingrecall efficiency Minimization ofsearchesto Twitter SAS N Y N Y N N Y N N na Keyhole Aggre- gate N N N Aggre -gate Y N N N na Tweetreach Aggre- gate N N N Aggre -gate Y N N N na Brandwatch N N N N Y Y Y N N na Followewonk N N N N Y Y Y N N na Twitris N Y N N N Y Y N N na OSoMe Y N N Y Y Y Y N N na Twitter Vigilance Y Y Y Y Y N Y Y Y Y
  7. 7. DISIT Lab, Distributed Data Intelligence and Technologies Distributed Systems and Internet Technologies Department of Information Engineering (DINFO) http://www.disit.dinfo.unifi.it http://www.disit.org Twitter Vigilance Public Views • TV: Twitter Vigilance main tool (http://disit.org/tv/), collecting and analyzing tweets daily; • RTTV: Real-time twitter Vigilance (http://disit.org/rttv/), collecting and analyzing tweets in real time; • TVSolr: Twitter Vigilance Advanced search • (http://tvsolr.disit.org/), indexing tweets and faceted search DISIT lab, IEEE SCI 2017, Freemont CA USA
  8. 8. DISIT Lab, Distributed Data Intelligence and Technologies Distributed Systems and Internet Technologies Department of Information Engineering (DINFO) http://www.disit.dinfo.unifi.it http://www.disit.org Architecture DISIT lab, IEEE SCI 2017, Freemont CA USA
  9. 9. DISIT Lab, Distributed Data Intelligence and Technologies Distributed Systems and Internet Technologies Department of Information Engineering (DINFO) http://www.disit.dinfo.unifi.it http://www.disit.org Twitter Vigilance Users can • create and edit customized channels as a collection of searches on API – Per channel and per search • crawls tweets, computes metrics, and shows results of Twitter Data, as: volume metrics about tweets, retweets and user statistics, NLP and Sentiment Analyses based metrics • provides public access to metric results computed on channels and search analysis • Allows the researchers to download resulting metrics values (through API service) over time for further analysis DISIT lab, IEEE SCI 2017, Freemont CA USA
  10. 10. DISIT Lab, Distributed Data Intelligence and Technologies Distributed Systems and Internet Technologies Department of Information Engineering (DINFO) http://www.disit.dinfo.unifi.it http://www.disit.org Several Channels DISIT lab, IEEE SCI 2017, Freemont CA USA
  11. 11. DISIT Lab, Distributed Data Intelligence and Technologies Distributed Systems and Internet Technologies Department of Information Engineering (DINFO) http://www.disit.dinfo.unifi.it http://www.disit.org A Channel DISIT lab, IEEE SCI 2017, Freemont CA USA Its searches
  12. 12. DISIT Lab, Distributed Data Intelligence and Technologies Distributed Systems and Internet Technologies Department of Information Engineering (DINFO) http://www.disit.dinfo.unifi.it http://www.disit.org Twitter Syntax for Searches • String substring: Caldo • Hashtag: #Caldo, • Citations: @CivilProtection, @paolonesi • From users: From:@paolonesi • Etc. • ….ANDed and ORed DISIT lab, IEEE SCI 2017, Freemont CA USA
  13. 13. DISIT Lab, Distributed Data Intelligence and Technologies Distributed Systems and Internet Technologies Department of Information Engineering (DINFO) http://www.disit.dinfo.unifi.it http://www.disit.org Metrics’ Kinds • Volume Metrics – Number of TW, number of RTW • User Metrics – Number of distinct users – Number of followers, following • NLP and SA metrics – Counting word, adjective, noun, verbs, …. – Estimating SA, weighting with SentiWordNet (extended to Italian) • High level metrics (compositing all the other metrics) – Addition of metrics.. – Ratio among metrics, e.g.: num of TW/num of RTW,… – Cumulated metrics over time, e.g.: number of TW in the last X days.. • All: (i) per day, per hour, etc. (ii) per channel, per search • Recently: we added the possibility of using metrics as firing conditions for alerts and bot on Twitter. DISIT lab, IEEE SCI 2017, Freemont CA USA
  14. 14. DISIT Lab, Distributed Data Intelligence and Technologies Distributed Systems and Internet Technologies Department of Information Engineering (DINFO) http://www.disit.dinfo.unifi.it http://www.disit.org Problem addressed Strong Limitations of the Search API of Twitter • minimizing the number of searches on the basis of the user requests: – different users with their queries request tweets already requested by others • Recovering of parent Tweets from Orphan reTweets taken in the searching process Analytics: • High performance solution based on HDFS, Hadoop for NLP and SA, exploiting MapReduce programming model • Estimating the network of influencer • Computing metrics and prediction in real time. DISIT lab, IEEE SCI 2017, Freemont CA USA
  15. 15. DISIT Lab, Distributed Data Intelligence and Technologies Distributed Systems and Internet Technologies Department of Information Engineering (DINFO) http://www.disit.dinfo.unifi.it http://www.disit.org Sentiment Analysis DISIT lab, IEEE SCI 2017, Freemont CA USA
  16. 16. DISIT Lab, Distributed Data Intelligence and Technologies Distributed Systems and Internet Technologies Department of Information Engineering (DINFO) http://www.disit.dinfo.unifi.it http://www.disit.org DISIT lab, IEEE SCI 2017, Freemont CA USA
  17. 17. DISIT Lab, Distributed Data Intelligence and Technologies Distributed Systems and Internet Technologies Department of Information Engineering (DINFO) http://www.disit.dinfo.unifi.it http://www.disit.org Influence Network DISIT lab, IEEE SCI 2017, Freemont CA USA
  18. 18. DISIT Lab, Distributed Data Intelligence and Technologies Distributed Systems and Internet Technologies Department of Information Engineering (DINFO) http://www.disit.dinfo.unifi.it http://www.disit.org DISIT lab, IEEE SCI 2017, Freemont CA USA Early Warning Predictive models Hot flows Attendance at long lasting events: EXPO2015 Attendance at recurrent events: TV, footbal
  19. 19. DISIT Lab, Distributed Data Intelligence and Technologies Distributed Systems and Internet Technologies Department of Information Engineering (DINFO) http://www.disit.dinfo.unifi.it http://www.disit.org Efficiency in retrieval DISIT lab, IEEE SCI 2017, Freemont CA USA Posts Volume (Tweets + Retweets) Range # Recovered Original Tweets # Missing Original Tweets % Original Tweets Coverage (CoTWO) # Twitter Search API requests # Saturations on Twitter Search API requests % Saturations on Twitter Search API requests (S%) % Not-Saturated Twitter Search API requests (1- S%) < 10k 18571 2033 89,05% 124299 1 0,00% 100,00% [ 10k, 50k ) 130051 13716 89,45% 399170 100 0,03% 99,97% [ 50k, 100k ) 96171 10278 89,31% 123804 165 0,13% 99,87% [ 100k, 500k ) 997833 86755 91,31% 849062 1589 0,19% 99,81% [ 500k, 1M ) 930646 61632 93,38% 439956 1998 0,45% 99,55% [ 1M, 5M ) 6454463 439628 93,19% 2787485 31585 1,13% 98,87% > 5M 14714124 899035 93,89% 4509184 64284 1,43% 98,57%
  20. 20. DISIT Lab, Distributed Data Intelligence and Technologies Distributed Systems and Internet Technologies Department of Information Engineering (DINFO) http://www.disit.dinfo.unifi.it http://www.disit.org Original Tweets coverage and Twitter Search API DISIT lab, IEEE SCI 2017, Freemont CA USA
  21. 21. DISIT Lab, Distributed Data Intelligence and Technologies Distributed Systems and Internet Technologies Department of Information Engineering (DINFO) http://www.disit.dinfo.unifi.it http://www.disit.org Dependance on RTW/TW ratio DISIT lab, IEEE SCI 2017, Freemont CA USA
  22. 22. DISIT Lab, Distributed Data Intelligence and Technologies Distributed Systems and Internet Technologies Department of Information Engineering (DINFO) http://www.disit.dinfo.unifi.it http://www.disit.org Conclusions • Twitter Vigilance is now operative since 2 years with many institutional users: ARPAT, LAMMA, UNIFI, CNR,.. • It presents an high efficiency in recovering twitter data despite to the complexity and provided API. • It has been used/validated with data coming from several scenarios and domains • for early warning and prediction in the domain of: – social communication, hot in Tuscany, rain measures, etc. – Disaster alerts: water bomb – TV audience (X factor, etc.), large events as Expo 2015 • New version is providing direct metrics estimation which can be composed by users, and resulting data can be downloaded DISIT lab, IEEE SCI 2017, Freemont CA USA

×