Twitris - Web Information System 2011 Course

Published in: Technology, News & Politics
  1. Twitris – System for Understanding Perceptions From Social Data
     http://twitris.knoesis.org/
     Ohio Center of Excellence in Knowledge Enabled Computing (Kno.e.sis), Wright State University, Dayton, OH
  2. Twitris - Motivation: 1. Information Overload
     • WHAT to be aware of
     • Multiple storylines about the same event
     Image: http://bit.ly/etFezl
  3. Twitris - Motivation: 2. Evolution of Citizen Observation
     • with location, time and occurrence of other events
  4. Twitris - Motivation: 3. Big picture of the event
     – How to find out
       • Location- and time-based interesting facts for an event from Twitter
       • Event-related information from other sources (images, videos, news and Wikipedia articles)
  5. Twitris: Twitter + Tetris
     • Twitris lets you browse citizen reports using social perceptions as the fulcrum
       – What is being said about an event (theme)
       – Where (spatial)
       – When (temporal)
     • Contextual information from web resources like news, Wikipedia articles, Flickr, TwitPic and YouTube
     • Study diversity and change in perceptions
  6. Twitris Architecture (diagram)
  7. Data Collection and Preprocessing: Semi-automated Tweet Crawler
     Extract topically relevant tweets using the Twitter search API and search keywords
     – Because tweets are not pre-categorized!
     Strategy: semi-automated, multithreaded, continuous tweet crawler
     • Start with manually selected keywords (seed)
     • Crawl using keywords and hashtags
     • Periodically update the keywords used for the crawl (to capture evolution of the topic)
     • Continue the crawl
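The crawl strategy on slide 7 can be sketched roughly as follows. This is a minimal, single-threaded illustration and not the Twitris crawler itself: `search` is a hypothetical stand-in for a call to the Twitter search API, and the rule for updating keywords (adopt the most frequent new hashtags from the last batch) is an assumption, since the slides do not say how keywords are refreshed.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#\w+")


def update_keywords(keywords, tweets, top_k=3):
    """Return keywords plus the top_k most frequent hashtags not yet tracked.

    Assumed update rule; the slides only say keywords are updated periodically.
    """
    counts = Counter(tag.lower() for text in tweets for tag in HASHTAG.findall(text))
    new_tags = [tag for tag, _ in counts.most_common() if tag not in keywords][:top_k]
    return set(keywords) | set(new_tags)


def crawl(seed_keywords, search, rounds):
    """Crawl for a fixed number of rounds, refreshing keywords after each one.

    `search` is a hypothetical callable: keyword -> list of tweet texts.
    """
    keywords = set(seed_keywords)
    collected = []
    for _ in range(rounds):
        batch = [text for kw in sorted(keywords) for text in search(kw)]
        collected.extend(batch)
        keywords = update_keywords(keywords, batch)
    return collected, keywords
```

A real deployment would run the loop continuously and fan the per-keyword searches out across threads, as the slide describes.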
  8. Data Collection and Preprocessing: Metadata Extraction
     • Tweet published date-time, author, location
     • Location from which the tweet originated
       – From the tweet
       – From the author's profile
     • Location: Dayton, OH (Google geocoder service)
     • Location: “best place in the world” (fail!)
     • Location geocode lookup
     • Cache (location, latitude, longitude) for speedup
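The cached geocode lookup on slide 8 might look like the sketch below. The `lookup` callable is a hypothetical wrapper around whatever geocoding service is used (the slide mentions Google's geocoder); it is assumed to return a `(latitude, longitude)` pair, or `None` when the free-text location cannot be resolved.

```python
class CachedGeocoder:
    """Memoize (location -> coordinates) so each distinct string is looked up once."""

    def __init__(self, lookup):
        self._lookup = lookup      # hypothetical: str -> (lat, lon) or None
        self._cache = {}
        self.remote_calls = 0      # how many times the service was actually called

    def geocode(self, location):
        # Normalize case and whitespace so "Dayton, OH" and " dayton,  oh" share an entry.
        key = " ".join(location.lower().split())
        if key not in self._cache:
            self.remote_calls += 1
            # Failures (None) are cached too, so junk locations are not retried.
            self._cache[key] = self._lookup(key)
        return self._cache[key]
```

Caching failures as well as successes is a design choice made here for illustration; profile locations such as “best place in the world” recur often enough that retrying them would waste lookups.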
  9. Key Phrase Extraction: 1. Spatio-Temporal Clustering
     • Objective: from a volume of tweets to event-descriptive key phrases, preserving spatio-temporal-thematic aspects of social perceptions!
     1. Spatio-temporal clustering
        – Group observations based on location and time
        – Global events (Iran Election Protest, Japan Earthquake)
          • clusters by country and day
        – Local events (Healthcare reform debate, Austin plane crash)
          • clusters by state and day
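Grouping observations by location and day, as slide 9 describes, amounts to bucketing tweets by a (place, date) key. The sketch below assumes each tweet is a dict with a `published` datetime and a `country` or `state` field; those field names are illustrative and not taken from the Twitris code.

```python
from collections import defaultdict
from datetime import datetime


def cluster_by_place_and_day(tweets, spatial_key="country"):
    """Group tweets into spatio-temporal sets keyed by (place, calendar day).

    Use spatial_key="country" for global events and "state" for local ones.
    """
    clusters = defaultdict(list)
    for tweet in tweets:
        day = tweet["published"].date()
        clusters[(tweet[spatial_key], day)].append(tweet)
    return dict(clusters)
```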
  10. Key Phrase Extraction: Spatio-Temporal Clustering (figure labels: Temporal navigation, Spatial Markers)
  11. Key Phrase Extraction: 2. N-gram generation
      “President Obama in trying to regain control of the health-care debate will likely shift his pitch in September”
      1-grams: President, Obama, in, trying, to, regain, ...
      2-grams: “President Obama”, “Obama in”, “in trying”, “trying to”, ...
      3-grams: “President Obama in”, “Obama in trying”, “in trying to”, ...
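The n-gram generation on slide 11 can be reproduced with a sliding window over the tokens. This sketch assumes simple whitespace tokenization, which matches the slide's example but is not necessarily what Twitris uses.

```python
def ngrams(text, n):
    """Return all contiguous n-token phrases in text, in order of appearance."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sentence = ("President Obama in trying to regain control of the health-care "
            "debate will likely shift his pitch in September")
unigrams = ngrams(sentence, 1)
bigrams = ngrams(sentence, 2)
trigrams = ngrams(sentence, 3)
```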
  12. Key Phrase Extraction: 3. N-gram Weight Calculation
      An n-gram's weight is calculated from
      1. Thematic Importance
         – redundancy: statistically discriminatory in nature
         – variability: contextually important
      2. Spatial Importance (local vs. global popularity)
      3. Temporal Importance (always popular vs. currently trending)
  13. Key Phrase Extraction: 3.1.A Thematic Importance of an n-gram
      A. Exploiting Redundancy
         1. TF-IDF of the n-gram (Lucene index)
         2. Amplify by the fraction of nouns in the n-gram (Stanford Natural Language Parser)
         3. Amplify by the fraction of non-stop words ('going to try')
         4. Pick the higher-order n-gram (for overlapping segments with the same TF-IDF)
         5. Select the top 5 n-grams for further analysis
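Steps 2, 3 and 5 of slide 13 could be sketched as below. The slides do not give the amplification formula, so multiplying by `(1 + fraction)` is purely an assumption for illustration. The TF-IDF value is taken as an input (Twitris obtains it from a Lucene index), and `nouns` is a hypothetical pre-computed set standing in for the output of a part-of-speech tagger.

```python
# Tiny illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "to", "in", "of", "is", "and"}


def amplified_score(ngram, tfidf, nouns, stop_words=STOP_WORDS):
    """Boost a TF-IDF score by the n-gram's noun and non-stop-word fractions.

    The (1 + fraction) multipliers are an assumed form, not the Twitris formula.
    """
    tokens = ngram.lower().split()
    noun_fraction = sum(t in nouns for t in tokens) / len(tokens)
    content_fraction = sum(t not in stop_words for t in tokens) / len(tokens)
    return tfidf * (1 + noun_fraction) * (1 + content_fraction)


def top_ngrams(scores, n=5):
    """Return the n highest-scoring n-grams from a {ngram: score} mapping."""
    return sorted(scores, key=scores.get, reverse=True)[:n]
```

Under these assumptions, “President Obama” (all nouns, no stop words) is boosted more than “going to try” at equal TF-IDF, which is the effect the slide describes.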
  14. Key Phrase Extraction: 3.1.B Thematic Importance of an n-gram
      B. Exploiting Variability
         – Contextually relevant words boost statistical importance
         • Focus word (fw): “n-gram”
         • Associated words (aw_i): top 5 co-occurring words in the spatio-temporal set of tweets
         • Association strength: pointwise mutual information (PMI)
  15. Key Phrase Extraction: 3.2 Thematic-Temporal Importance
      • Temporal importance of the n-gram
        – always popular vs. currently trending
      • Certain descriptors always dominate observations
        – Obama, President in the US presidential election
      • To allow less popular but interesting descriptors to surface, we discount the thematic score in proportion to recent popularity
      • Spatio-temporal-thematic score of a descriptor = thematic score - spatio-temporal discounts
  16. Key Phrase Extraction: 3.3 Thematic-Temporal-Spatial Importance
      • Descriptors that occur all over the world are not as interesting as those local to a region
        – (local vs. global popularity)
      • Discount the thematic-temporal score in proportion to the number of spatial sets (not local) that mention the descriptor
      • Final Spatio-Temporal-Thematic (STT) weight of an n-gram is (formula not preserved in this transcript)
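The transcript preserves only the general shape of the final weight: a thematic score minus temporal and spatial discounts. The sketch below follows that shape. The linear discounts, the use of the mean of recent scores as "recent popularity", and the constants `alpha` and `beta` are all assumptions made for illustration, not the published Twitris formula.

```python
def stt_score(thematic, recent_scores, other_regions_mentioning,
              alpha=0.5, beta=0.1):
    """Thematic score minus assumed temporal and spatial discounts.

    recent_scores: the descriptor's thematic scores on preceding days.
    other_regions_mentioning: number of non-local spatial sets that mention it.
    alpha, beta: hypothetical proportionality constants.
    """
    recent_popularity = sum(recent_scores) / len(recent_scores) if recent_scores else 0.0
    temporal_discount = alpha * recent_popularity
    spatial_discount = beta * other_regions_mentioning
    return thematic - temporal_discount - spatial_discount
```

With these choices, a descriptor that was already popular on previous days, or that appears in many other regions, ends up ranked below an equally frequent descriptor that is new and local.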
  17. Key Phrase Extraction: Results
      TF-IDF vs. Spatio-Temporal-Thematic (STT) scores of descriptors
  18. Key Phrase Extraction: Example
      • Objective: from a volume of tweets to event-descriptive key phrases, preserving spatio-temporal-thematic aspects of social perceptions
  19. Analysis of Embedded Links
      • Due to the 140-character tweet size limit, people are increasingly integrating hyperlinks into tweets (articles, blogs, images, videos)
      • Steps:
        – Extraction and resolution of links
        – Provide hyperlinks to articles and blogs
        – Check semantic relevance for images and videos
          • Based on title and description
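Two of the steps on slide 19 can be sketched offline: pulling links out of tweet text, and a crude relevance check on a media item's title and description. The keyword-overlap test is an assumed stand-in for the semantic relevance check, whose actual method the slides do not describe. Resolving shortened URLs would additionally require following HTTP redirects and is left out so the example runs without network access.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")


def extract_links(tweet):
    """Return the URLs found in a tweet, with trailing punctuation stripped."""
    return [url.rstrip(".,;:!?)") for url in URL_PATTERN.findall(tweet)]


def is_relevant(title, description, event_keywords, min_overlap=1):
    """Assumed relevance test: enough event keywords appear in the metadata."""
    words = set(re.findall(r"\w+", (title + " " + description).lower()))
    keywords = {k.lower() for k in event_keywords}
    return len(words & keywords) >= min_overlap
```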
  20. External Context for Understanding Event
      • Wikipedia articles
      • Related news
  21. Twitris: Widgets
  22. Sentiment Analysis
      • using statistical and machine learning techniques
  23. Entity-Relationship Graph
      • using semantically annotated DBpedia entities mentioned in the tweets
  24. Tweet Traffic Analysis
      • Event popularity over a period of time
  25. Twitris: Functional Overview
  26. Twitris: Demo, Quick Show
      • http://twitris.knoesis.org/
  27. Ongoing work
  28. Continuous Semantics
      Domain models to enhance understanding of the content
  29. Coordination
      • Coordinating needs and resources in disaster situations
        – Analyze SMS and Web reports from the disaster location
        – Use domain models for efficient and timely coordination
      Image: http://bit.ly/hcp4PG
  30. Twitris Team
      Meena Nagarajan, Amit Sheth, Hemant Purohit, Ashutosh Jadhav, Lu Chen, Pramod Anantharam, Pavan Kapanipathi
  31. References
      1. Twitris: Twitter through space, time and theme. http://twitris.knoesis.org
      2. Nagarajan, M., Gomadam, K., Sheth, A.P., Ranabahu, A., Jadhav, A., Mutharaju, R.: Spatio-temporal-thematic analysis of citizen-sensor data - challenges and experiences. In: Web Information Systems Engineering. (2009)
      3. Ashutosh Jadhav, Wenbo Wang, Raghava Mutharaju, Pramod Anantharam, Vinh Nyugen, Amit P. Sheth, Karthik Gomadam, Meenakshi Nagarajan, and Ajith Ranabahu, Twitris: Socially Influenced Browsing, Semantic Web Challenge 2009, 8th International Semantic Web Conference, Oct. 25-29 2009, Washington, DC, USA
      4. A. Sheth, Semantic Integration of Citizen Sensor Data and Multilevel Sensing: A comprehensive path towards event monitoring and situational awareness, February 17, 2009
      5. A. Sheth, Citizen Sensing, Social Signals, and Enriching Human Experience, IEEE Internet Computing, July/August 2009.
      6. Thomas, C., Mehra, P., Brooks, R., Sheth, A.P.: Growing fields of interest – using an expand and reduce strategy for domain model extraction. In: Web Intelligence. (2008) 496–502
      7. Mendes PN, Passant A, Kapanipathi P, Sheth AP, Linked Open Social Signals, WI2010 IEEE/WIC/ACM International Conference on Web Intelligence (WI-10), Toronto, Canada, Aug. 31 to Sep. 3, 2010
      8. Meenakshi Nagarajan, Hemant Purohit, Amit Sheth. A Qualitative Examination of Topical Tweet and Retweet Practices. 4th Intl AAAI Conference on Weblogs and Social Media, ICWSM 2010
      * All the trademarks belong to their respective owners
  32. Thanks! Questions?
