Twitter, Twinder, Twitcident: Filtering and Search in Social Web Streams

  1. Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams. Data Bridges Workshop, Inria, Paris, April 12th 2012. Fabian Abel, Claudia Hauff, Geert-Jan Houben, Richard Stronkman, Ke Tao. Web Information Systems, Delft University of Technology (TU Delft), the Netherlands.
  2. 200,000,000: the number of tweets published per day
  3. Pukkelpop 2011: people tweet about everything, everywhere :-)
  4. 200,000,000 tweets per day, and Pukkelpop 2011 became a tragedy: 81,000 tweets in four hours. Two needs: filtering, and search & browsing.
  5. Challenges: (1) (Automatic) filtering: given a topic (e.g. expressed via some keywords), how can one automatically identify the tweets that are relevant to the topic? (2) Search & browsing: how can one improve search and browsing capabilities so that users can explore the information in the streams of tweets that are relevant for a topic? [Diagram: the Twinder search and filtering framework, connecting Twitter streams, filtering, search & browsing, a topic, and an information need]
  6. Part 1: Filtering of Twitter streams [Overview diagram: Twitter streams, search & filtering, browsing, topic, information need]
  7. Filtering on Twitter. Query: www2012. Typical approach: keyword-based matching. Are there further features that can be used as indicators for estimating the relevance of a tweet for a topic?
  8. Syntactical feature: hashtags. Is a tweet more relevant if it contains a #hashtag? Hypothesis: tweets that contain hashtags are more likely to be relevant than tweets that do not contain hashtags.
  9. Syntactical feature: URLs. Is a tweet that contains a URL more relevant? Hypothesis: tweets that contain a URL are more likely to be relevant than tweets that do not contain a URL.
  10. Syntactical feature: “mentions”. Is a tweet that mentions @somebody more relevant? Hypothesis: tweets that are formulated as a reply to another tweet are less likely to be relevant than other tweets. [Figure labels: Reply, @mention]
  11. Syntactical feature: length. Does the length of a tweet influence its relevance for a topic? Example: 54 characters (9 words) vs. 140 characters (20 words). Hypothesis: the longer a tweet, the more likely it is to be relevant and interesting.
  12. Overview of features: topic-sensitive (keyword-based relevance) and topic-insensitive (syntactical) features. What about the semantics?
  13. Semantic features: number of entities. Find the semantics in a tweet to estimate its relevance. Example entities: dbp:Tim_Berners-Lee, dbp:World_Wide_Web, dbp:WWW_Conference, dbp:France, dbp:Lyon. Hypothesis: the more entities a tweet mentions, the more likely it is to be relevant and interesting.
  14. Semantic features: diversity. The types of entities that a tweet features matter. Example: a tweet mentioning entities of mixed types (Event, Person, Place, Thing) vs. "I plan to visit Paris, Bordeaux, Grenoble, Nice, Marseille and Lyon." (only Place entities). Hypothesis: the higher the diversity of entities that are mentioned in a tweet, the more likely it is to be relevant.
  15. Semantic features: sentiment. Opinions expressed in tweets are interesting. Examples: "Looking forward to the WWW conference :-) Yes!" (positive) vs. "I plan to visit Paris, Bordeaux, Grenoble, Nice, Marseille and Lyon." (neutral) vs. "Why are the big players not releasing query logs to the WWW community? :-( #fail" (negative). Hypothesis: the likelihood of a tweet’s relevance is influenced by its sentiment polarity.
  16. Semantic relatedness: exploit semantics to relate the query to tweets. Example entities: dbp:International_World_Wide_Web_Conference, dbp:Tim_Berners-Lee.
  17. Overview of features: by now, we have four types of features: topic-sensitive keyword-based and semantic-based features, and topic-insensitive syntactical and semantic features. Context? What kind of contextual features might be helpful?
  18. Contextual feature: authority of the publisher. It matters who published a tweet. Hypothesis: the higher the number of tweets that have been published by the creator of a tweet, the more likely it is that the tweet is relevant.
  19. Contextual feature: time w.r.t. the query. When was a tweet published? Hypothesis: the lower the temporal distance between the query time and the creation time of a tweet, the more likely the tweet is relevant to the topic. [Timeline figure: tweet, query; March 31, April 16]
  20. Summary of features: topic-sensitive features are keyword-based, semantic-based and context-based; topic-insensitive features are syntactical, semantic and contextual.
  21. Results achieved for the TREC Microblog Challenge
      Features                         Precision  Recall  F-measure
      keyword relevance                0.3040     0.2924  0.2981
      semantic relevance               0.3053     0.2931  0.2991
      all features without semantics   0.3363     0.4828  0.3965
      all features                     0.3674     0.4736  0.4138
      Overall, applying all the features achieves a precision of over 35% and a recall of over 45%.
  22. Importance of features [Bar charts of feature weights (y-axis from −1 to 2), one panel per feature group: keyword-based (keyword-based relevance); syntactical (hasHashtag, hasURL, isReply, length); semantic-based (relevance, relatedness); semantics (#entities, diversity, sentiment); context-based and context (temporal context, social context, each shown alongside keyword-based relevance)]. Semantic relatedness, URLs, !isReply, diversity and sentiment are good indicators for estimating the relevance of a tweet.
  23. Part 2: Search & browsing in Twitter streams [Overview diagram: Twitter streams, search & filtering, browsing, topic, information need]
  24. Idea: faceted search [Screenshot: current query "Eindhoven Music + Guilty Simpson + Area51"; "Expand Query" suggestions grouped by facet (Locations, Events, Music Artists, Intents; e.g. + Guilty Simpson, + Bryan Adams, + Elton John, + Golden Earring, more...); results: (1) Yskiddd: "Next saturday @thatsimpsonguy aka Guilty Simpson will be performing at Area51 in my homeytown Eindhoven. #realliveshit #iwillspinrecords2" (2) Usee123: "Cool #EV3door7980 !!! http://bit.ly/igyyRhL" (3) sanmiquelmusic: "This Saturday Im joining @KrusadersMusic to ..."]
  25. Adaptive faceted search [Architecture diagram: user; Adaptive Faceted Search (facet-value pair ranking, query suggestions); User and Context Modeling; facet extraction; Semantic Enrichment; Twitter posts]. Open questions: How to represent the content of a tweet? How to adapt the facet-value pair ranking to the current demands of the user?
  26. Facet extraction and semantic enrichment (powered by [logo not preserved]). Example tweet: "@bob: Julian Assange got arrested". Tweet-based enrichment yields: Julian Assange. Link-based enrichment, using the linked article ("Julian Assange, the founder of WikiLeaks, is under arrest in London…"), yields: Julian Assange, WikiLeaks, London.
  27. Faceted search vs. hashtag-based (keyword) search: faceted search based on semantic enrichment of tweets significantly outperforms hashtag-based search.
  28. Impact of link-based enrichment: the personalized strategy significantly outperforms the baseline, and link-based enrichment improves quality for both strategies.
  29. Twitcident application: applying filtering and search functionality to distill information from Twitter during incidents (e.g. fires, extreme weather situations) [Overview diagram: Twitter streams, search & filtering, browsing, topic, information need]
  30. 200,000,000 tweets per day, and Pukkelpop 2011 became a tragedy: 81,000 tweets in four hours. Two needs: filtering, and search & browsing.
  31. Twitcident pipeline [Diagram: automatic filtering, search & browsing]
  32. [Screenshot: faceted search on the filtered Twitter stream]
  33. Real-time visualizations
  34. Could we see it coming? [Chart of term usage over time, annotated "Popular artist made a joke about the weather" and "Impact storm"] Term usage 25 minutes before the incident: (1) heavy weather, hail balls, lightning, pitch black… (2) drama, panic, hell, serious, extreme…
  35. Spotting eyewitnesses
  36. Real-time information from eyewitnesses
  37. Summary
      Automatic filtering of tweets [#MSM@WWW ’12]:
      • Topic-sensitive and topic-insensitive features
      • Semantic features (semantic relatedness, diversity and sentiment are beneficial)
      Search and browsing [ISWC ’11]:
      • Faceted search
      • Personalization & contextualization help
      Application [Hypertext ’12, Demo@WWW ’12]:
      • Twitcident: fulfilling information needs during incidents
      Future work:
      • Weak signal detection based on tweets
      • Duplicate detection
  38. Thank you! @fabianabel http://wis.ewi.tudelft.nl/
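
The keyword-based baseline (slide 7) and the syntactical features from slides 8 to 11 can be extracted with a few regular expressions. The following Python sketch is illustrative only. The function names, the term-overlap score used for keyword relevance, and the rule that a reply is a tweet starting with an @mention are all assumptions, not the method behind the slides.

```python
import re
from dataclasses import dataclass

HASHTAG = re.compile(r"#\w+")
URL = re.compile(r"https?://\S+")
LEADING_MENTION = re.compile(r"@\w+")
WORD = re.compile(r"\w+")


@dataclass
class SyntacticFeatures:
    has_hashtag: bool  # slide 8
    has_url: bool      # slide 9
    is_reply: bool     # slide 10
    length: int        # slide 11, measured in characters


def keyword_relevance(query: str, tweet: str) -> float:
    """Hypothetical keyword score: share of query terms that also occur in the tweet."""
    query_terms = set(WORD.findall(query.lower()))
    if not query_terms:
        return 0.0
    tweet_terms = set(WORD.findall(tweet.lower()))
    return len(query_terms & tweet_terms) / len(query_terms)


def syntactic_features(tweet: str) -> SyntacticFeatures:
    return SyntacticFeatures(
        has_hashtag=HASHTAG.search(tweet) is not None,
        has_url=URL.search(tweet) is not None,
        # Assumption: a tweet counts as a reply if it starts with an @mention.
        is_reply=LEADING_MENTION.match(tweet) is not None,
        length=len(tweet),
    )
```

For the query www2012 from slide 7, `keyword_relevance("www2012", "Slides for #www2012 are online")` returns 1.0, because the word tokenizer ignores the # sign of the hashtag.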
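
The semantic features on slides 13 to 16 presuppose that entities have already been extracted from a tweet. This sketch assumes that an external annotator has produced (URI, type) pairs such as ("dbp:Lyon", "Place"). The emoticon lexicon used for sentiment and the Jaccard overlap used for semantic relatedness are hypothetical stand-ins chosen for illustration, not necessarily what was used for the slides.

```python
from typing import Iterable, List, Tuple

# Assumed annotator output: (URI, type), e.g. ("dbp:Lyon", "Place").
Entity = Tuple[str, str]

# Hypothetical sentiment lexicon based on the examples on slide 15.
POSITIVE = {":-)", ":)"}
NEGATIVE = {":-(", ":(", "#fail"}


def num_entities(entities: List[Entity]) -> int:
    """Slide 13: number of distinct entities mentioned in the tweet."""
    return len({uri for uri, _ in entities})


def diversity(entities: List[Entity]) -> int:
    """Slide 14 (assumed definition): number of distinct entity types."""
    return len({etype for _, etype in entities})


def sentiment(tweet: str) -> int:
    """Slide 15: polarity as +1 (positive), 0 (neutral) or -1 (negative)."""
    tokens = tweet.split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return (score > 0) - (score < 0)


def semantic_relatedness(query_uris: Iterable[str], tweet_uris: Iterable[str]) -> float:
    """Slide 16 (assumed measure): Jaccard overlap of query and tweet entity URIs."""
    q, t = set(query_uris), set(tweet_uris)
    if not q or not t:
        return 0.0
    return len(q & t) / len(q | t)
```

With these definitions, the six-city tweet from slide 14 has six entities but a diversity of 1, because every entity has the type Place.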
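
The two contextual features on slides 18 and 19 reduce to simple lookups. In this sketch, the per-author tweet counts are assumed to be available as a dictionary, and temporal distance is measured in days. Both choices, as well as the function names, are hypothetical.

```python
from datetime import datetime
from typing import Dict


def author_activity(author: str, tweet_counts: Dict[str, int]) -> int:
    """Slide 18: number of tweets published by the tweet's creator (0 if unknown)."""
    return tweet_counts.get(author, 0)


def temporal_distance_days(query_time: datetime, created_at: datetime) -> float:
    """Slide 19: absolute distance between query time and tweet creation time, in days."""
    return abs((query_time - created_at).total_seconds()) / 86400.0
```

With the two dates from slide 19 (the year is an assumption), `temporal_distance_days(datetime(2012, 4, 16), datetime(2012, 3, 31))` returns 16.0. Under the hypothesis on slide 19, a smaller value indicates a more likely relevant tweet.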
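
Assuming that the F-measure on slide 21 is the harmonic mean of precision and recall (F1), the reported values can be checked against the precision and recall columns. The sketch below recomputes F1 for each (precision, recall) pair reported on that slide. The recomputed values agree with the reported F-measures to within rounding; for example, 0.3674 and 0.4736 give 0.4138.

```python
def f_measure(precision: float, recall: float) -> float:
    """F1: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-measure) triples as listed on slide 21.
REPORTED = [
    (0.3040, 0.2924, 0.2981),
    (0.3053, 0.2931, 0.2991),
    (0.3363, 0.4828, 0.3965),
    (0.3674, 0.4736, 0.4138),
]

for p, r, reported in REPORTED:
    print(f"P={p:.4f} R={r:.4f} F1={f_measure(p, r):.4f} reported={reported:.4f}")
```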
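
Slide 22 reports how strongly each feature indicates relevance. One common way to combine such features into a single decision is a logistic model over a weighted sum, and the sketch below shows only that scoring step. It is not the model trained for the slides: the weight values are made up (only their signs follow slide 22), and the feature names are hypothetical.

```python
import math
from typing import Dict


def relevance_probability(
    features: Dict[str, float], weights: Dict[str, float], bias: float = 0.0
) -> float:
    """Logistic model: sigmoid(bias + sum of weight * feature value)."""
    z = bias + sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def is_relevant(
    features: Dict[str, float],
    weights: Dict[str, float],
    bias: float = 0.0,
    threshold: float = 0.5,
) -> bool:
    """Classify a tweet as relevant when its predicted probability reaches the threshold."""
    return relevance_probability(features, weights, bias) >= threshold


# Hypothetical weights: signs follow slide 22 (relatedness and URLs help,
# replies hurt); the magnitudes are invented for illustration.
EXAMPLE_WEIGHTS = {"semantic_relatedness": 2.0, "has_url": 1.0, "is_reply": -1.0}
```

With these example weights, a non-reply tweet that contains a URL and has a semantic relatedness of 0.5 gets a weighted sum of 2.0 and a probability of about 0.88, so it is classified as relevant. A reply with no other signals falls below the 0.5 threshold.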
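
Slides 24 to 28 describe faceted search over tweets whose content is represented as facet-value pairs, with optional link-based enrichment and a personalized ranking of query suggestions. The sketch below illustrates these ideas under simple assumptions. Facet-value pairs are assumed to be extracted already (the slides use a semantic-enrichment service for this), and a query is a set of pairs that every result must contain. Suggestions are ranked by how often a pair occurs in the current results. The optional per-user profile weights are a hypothetical stand-in for the user and context modeling on slide 25. All names are illustrative.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Set, Tuple

# A facet-value pair, e.g. ("Location", "Eindhoven"); facet names are illustrative.
FacetValue = Tuple[str, str]


@dataclass
class Tweet:
    text: str
    tweet_facets: Set[FacetValue] = field(default_factory=set)  # from the tweet text
    link_facets: Set[FacetValue] = field(default_factory=set)   # from linked web pages

    def facets(self, use_links: bool = True) -> Set[FacetValue]:
        # Link-based enrichment: merge pairs found in linked pages with the tweet's own.
        return self.tweet_facets | self.link_facets if use_links else set(self.tweet_facets)


def search(tweets: Iterable[Tweet], query: Set[FacetValue], use_links: bool = True) -> List[Tweet]:
    """Return the tweets whose facet-value pairs include every pair in the query."""
    return [t for t in tweets if query <= t.facets(use_links)]


def suggest(
    results: Iterable[Tweet],
    query: Set[FacetValue],
    k: int = 3,
    profile: Optional[Dict[FacetValue, float]] = None,
    use_links: bool = True,
) -> List[FacetValue]:
    """Rank facet-value pairs that are not yet in the query as query expansions.

    Baseline: frequency in the current results. If a (hypothetical) user profile
    is given, each count is multiplied by (1 + profile weight).
    """
    counts = Counter(fv for t in results for fv in t.facets(use_links) if fv not in query)
    weights = profile or {}
    scored = {fv: n * (1.0 + weights.get(fv, 0.0)) for fv, n in counts.items()}
    return sorted(scored, key=scored.get, reverse=True)[:k]
```

Using the example from slide 26, a tweet whose text mentions only Julian Assange matches the query {("Organization", "WikiLeaks")} only when `use_links=True`, because WikiLeaks appears only in the linked article. The multiplicative profile boost is one simple personalization choice among many.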
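
Slide 34 shows that warning terms such as "heavy weather", "hail balls", "lightning" and "pitch black" were already in use 25 minutes before the incident. A minimal way to surface such a signal is to count, per fixed time window, the tweets that mention any watched term. The sketch assumes that tweets arrive as (timestamp, text) pairs and uses case-insensitive substring matching. The function name and window size are hypothetical, and this is not the detection method behind the slide; weak signal detection is listed as future work on slide 37.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable, Set, Tuple


def term_usage(
    tweets: Iterable[Tuple[datetime, str]],
    terms: Set[str],
    start: datetime,
    window: timedelta,
) -> Counter:
    """Count, per time window after `start`, the tweets that mention any watched term.

    Keys are window indices (0 = [start, start + window), 1 = the next window, ...).
    Matching is case-insensitive substring matching; `terms` should be lowercase.
    """
    usage: Counter = Counter()
    for created_at, text in tweets:
        if created_at < start:
            continue
        lowered = text.lower()
        if any(term in lowered for term in terms):
            usage[int((created_at - start) / window)] += 1
    return usage


WARNING_TERMS = {"heavy weather", "hail balls", "lightning", "pitch black"}  # from slide 34
```

A sudden rise in the counts for recent windows, compared with earlier ones, is the kind of pattern that slide 34 highlights in hindsight.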
