Twitter, Twinder, Twitcident: Filtering and Search in Social Web Streams

  1. Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams. Data Bridges Workshop, Inria, Paris, April 12th 2012. Fabian Abel, Claudia Hauff, Geert-Jan Houben, Richard Stronkman, Ke Tao. Web Information Systems, Delft University of Technology (TU Delft), the Netherlands.
  2. 200,000,000: the number of tweets published per day.
  3. Pukkelpop 2011: people tweet about everything, everywhere :-)
  4. 200,000,000 tweets per day, and Pukkelpop 2011 became a tragedy: 81,000 tweets in four hours. What is needed: filtering, and search & browsing.
  5. Challenges:
     1. (Automatic) Filtering: Given a topic (e.g. expressed via some keywords), how can one automatically identify the tweets that are relevant to the topic?
     2. Search & Browsing: How can one improve search and browsing capabilities so that users can explore information in the streams of tweets that are relevant to a topic?
     (Diagram: Twinder, a filtering and search framework: Twitter streams → Filtering → Search & Browsing, driven by a topic/information need.)
  6. Part 1: Filtering of Twitter streams. (Diagram: Twitter streams → Filtering → Search & Browsing, driven by a topic/information need.)
  7. Filtering on Twitter. Query: www2012. Typical approach: keyword-based matching. Are there further features that can be used as indicators for estimating the relevance of a tweet for a topic?
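A minimal sketch of keyword-based matching, assuming relevance is measured as the fraction of query terms that appear in the tweet after lower-casing and splitting into alphanumeric tokens (so `#www2012` matches the query `www2012`). The function names are hypothetical; the slides do not say how the keyword-based relevance score is computed.

```python
import re

def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into ASCII alphanumeric tokens ('#www2012' -> 'www2012')."""
    return re.findall(r"[a-z0-9]+", text.lower())

def keyword_relevance(query: str, tweet: str) -> float:
    """Fraction of distinct query terms that also occur in the tweet (0.0 to 1.0)."""
    query_terms = set(tokenize(query))
    if not query_terms:
        return 0.0
    tweet_terms = set(tokenize(tweet))
    return len(query_terms & tweet_terms) / len(query_terms)

print(keyword_relevance("www2012", "Heading to Lyon for #www2012 tomorrow"))  # 1.0
```

Note that the tokenizer ignores non-ASCII letters, which a real system for multilingual tweets would need to handle.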
  8. Syntactical feature: hashtags. Is a tweet more relevant if it contains a #hashtag? Hypothesis: tweets that contain hashtags are more likely to be relevant than tweets that do not contain hashtags.
  9. Syntactical feature: URLs. Is a tweet that contains a URL more relevant? Hypothesis: tweets that contain a URL are more likely to be relevant than tweets that do not contain a URL.
  10. Syntactical feature: “mentions”. Is a tweet that mentions @somebody more relevant? Hypothesis: tweets that are formulated as a reply to another tweet are less likely to be relevant than other tweets.
  11. Syntactical feature: length. Does the length of a tweet influence its relevance for a topic? Example: 54 characters (9 words) vs. 140 characters (20 words). Hypothesis: the longer a tweet, the more likely it is to be relevant and interesting.
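The four syntactical features can be extracted with a few regular expressions. This is a sketch under our own assumptions: hashtags and URLs are detected by simple patterns, and a tweet is treated as a reply when it starts with `@user` (the Twitter API also exposes reply metadata, which would be more reliable). The names `syntactical_features` and `SyntacticalFeatures` are hypothetical.

```python
import re
from dataclasses import dataclass

URL = re.compile(r"https?://\S+")
HASHTAG = re.compile(r"(?<!\w)#\w+")

@dataclass
class SyntacticalFeatures:
    has_hashtag: bool
    has_url: bool
    is_reply: bool
    length: int  # number of characters
    words: int   # number of whitespace-separated tokens

def syntactical_features(text: str) -> SyntacticalFeatures:
    """Extract the hashtag, URL, reply and length features from a tweet's text."""
    without_urls = URL.sub(" ", text)  # avoid reading URL fragments ('#top') as hashtags
    return SyntacticalFeatures(
        has_hashtag=bool(HASHTAG.search(without_urls)),
        has_url=bool(URL.search(text)),
        # Assumption: a tweet that starts with @user is treated as a reply.
        is_reply=text.lstrip().startswith("@"),
        length=len(text),
        words=len(text.split()),
    )

print(syntactical_features("@bob see http://bit.ly/x #www2012"))
```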
  12. Overview of features: topic-sensitive and topic-insensitive features. Topic-sensitive: keyword-based relevance. Topic-insensitive: syntactical features. What about the semantics?
  13. Semantic features: number of entities. Find the semantics in a tweet to estimate its relevance (example entities: dbp:Tim_Berners-Lee, dbp:World_Wide_Web, dbp:WWW_Conference, dbp:France, dbp:Lyon). Hypothesis: the more entities a tweet mentions, the more likely it is to be relevant and interesting.
  14. Semantic features: diversity. The types of entities that are featured by a tweet matter: compare a tweet whose entities span several types (e.g. Person, Thing, Event, Place) with "I plan to visit Paris, Bordeaux, Grenoble, Nice, Marseille and Lyon.", whose entities are all of type Place. Hypothesis: the higher the diversity of entities that are mentioned in a tweet, the more likely it is to be relevant.
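Assuming the tweet has already been annotated with DBpedia entities and their types (for example by an entity-linking service; that step is not shown), the number-of-entities and diversity features reduce to counting. Here diversity is taken to be the number of distinct entity types, which is only one plausible reading of the slide; the type labels in the example are illustrative.

```python
# Each annotation is a (DBpedia resource, entity type) pair, assumed to come from
# an entity-linking step that is not shown here.
Annotation = tuple[str, str]

def num_entities(annotations: list[Annotation]) -> int:
    """Number of distinct entities mentioned in the tweet."""
    return len({uri for uri, _ in annotations})

def type_diversity(annotations: list[Annotation]) -> int:
    """Number of distinct entity types: one simple notion of diversity."""
    return len({etype for _, etype in annotations})

# Illustrative annotations (types chosen for the example).
mixed = [("dbp:Tim_Berners-Lee", "Person"), ("dbp:World_Wide_Web", "Thing"),
         ("dbp:WWW_Conference", "Event"), ("dbp:Lyon", "Place")]
places = [(f"dbp:{city}", "Place") for city in
          ["Paris", "Bordeaux", "Grenoble", "Nice", "Marseille", "Lyon"]]

print(num_entities(mixed), type_diversity(mixed))    # 4 4
print(num_entities(places), type_diversity(places))  # 6 1
```

The all-places tweet mentions more entities but scores lower on diversity, which is exactly the contrast the two hypotheses are meant to separate.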
  15. Semantic features: sentiment. Opinions expressed in tweets are interesting: "Looking forward to the WWW conference :-) Yes!" (positive) vs. "I plan to visit Paris, Bordeaux, Grenoble, Nice, Marseille and Lyon." (neutral) vs. "Why are the big players not releasing query logs to the WWW community? :-( #fail" (negative). Hypothesis: the likelihood of a tweet’s relevance is influenced by its sentiment polarity.
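A crude way to obtain a sentiment polarity is to count positive and negative cues, as in the three example tweets above. This is an illustrative heuristic only (the cue lists are our assumption); the slides do not state which sentiment analysis was used, and a lexicon- or classifier-based approach would be more robust.

```python
# Assumed cue lists; a real system would use a sentiment lexicon or classifier.
POSITIVE_CUES = (":-)", ":)")
NEGATIVE_CUES = (":-(", ":(", "#fail")

def sentiment_polarity(text: str) -> int:
    """Return +1 (positive), -1 (negative) or 0 (neutral) from emoticon/hashtag cues."""
    lowered = text.lower()
    score = (sum(lowered.count(cue) for cue in POSITIVE_CUES)
             - sum(lowered.count(cue) for cue in NEGATIVE_CUES))
    return (score > 0) - (score < 0)

examples = [
    "Looking forward to the WWW conference :-) Yes!",
    "I plan to visit Paris, Bordeaux, Grenoble, Nice, Marseille and Lyon.",
    "Why are the big players not releasing query logs to the WWW community? :-( #fail",
]
print([sentiment_polarity(t) for t in examples])  # [1, 0, -1]
```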
  16. Semantic relatedness: exploit semantics to relate the query with tweets (e.g. dbp:International_World_Wide_Web_Conference, dbp:Tim_Berners-Lee).
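One simple proxy for semantic relatedness between a query and a tweet is the overlap of the DBpedia entities extracted from each, measured here with the Jaccard coefficient. This is an assumption for illustration: plain set overlap only rewards identical entities, whereas a relatedness measure over the DBpedia graph could also connect, say, dbp:Tim_Berners-Lee with dbp:World_Wide_Web.

```python
def semantic_relatedness(query_entities: set[str], tweet_entities: set[str]) -> float:
    """Jaccard overlap of the DBpedia entities found in the query and in the tweet."""
    union = query_entities | tweet_entities
    if not union:
        return 0.0
    return len(query_entities & tweet_entities) / len(union)

query = {"dbp:International_World_Wide_Web_Conference", "dbp:Tim_Berners-Lee"}
tweet = {"dbp:Tim_Berners-Lee", "dbp:Lyon"}
print(semantic_relatedness(query, tweet))  # 1 shared entity out of 3 -> 0.333...
```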
  17. Overview of features: by now, we have 4 types of features. Topic-sensitive: keyword-based, semantic-based. Topic-insensitive: syntactical, semantics. Context? What kind of contextual features might be helpful?
  18. Contextual feature: authority of the publisher. It matters who published a tweet. Hypothesis: the higher the number of tweets that have been published by the creator of a tweet, the more likely it is that the tweet is relevant.
  19. Contextual feature: time w.r.t. the query. When was a tweet published? Hypothesis: the lower the temporal distance between the query time and the creation time of a tweet, the more likely the tweet is relevant to the topic. (Timeline example: tweet published on March 31, query issued on April 16.)
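The two contextual features can be sketched as follows: the publisher's authority approximated by the number of tweets the author has published in the collection, and the temporal distance between query time and tweet creation time. Function names, the user names and the choice of hours as the unit are assumptions; the dates reuse the slide's March 31 / April 16 timeline with an assumed year of 2012.

```python
from collections import Counter
from datetime import datetime

def author_tweet_counts(authors: list[str]) -> Counter:
    """Tweets per author in the collection, used as a simple authority proxy."""
    return Counter(authors)

def temporal_distance_hours(query_time: datetime, tweet_time: datetime) -> float:
    """Absolute distance between query time and tweet creation time, in hours."""
    return abs((query_time - tweet_time).total_seconds()) / 3600.0

counts = author_tweet_counts(["alice", "bob", "alice"])  # hypothetical authors
print(counts["alice"])  # 2
print(temporal_distance_hours(datetime(2012, 4, 16), datetime(2012, 3, 31)))  # 384.0
```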
  20. Summary of features. Topic-sensitive: keyword-based, semantic-based, context-based. Topic-insensitive: syntactical, semantics, context.
  21. Results achieved for the TREC Microblog Challenge
      Features            | Precision | Recall | F-measure
      keyword relevance   | 0.3040    | 0.2924 | 0.2981
      without semantics   | 0.3363    | 0.4828 | 0.3965
      semantic relevance  | 0.3053    | 0.2931 | 0.2991
      all features        | 0.3674    | 0.4736 | 0.4138
      Overall, applying all the features achieves a precision of over 35% and a recall of over 45%.
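The F-measure column is the harmonic mean of precision and recall, F = 2PR / (P + R). The short check below recomputes it from the table's precision and recall values. Because those inputs are themselves rounded to four decimals, a recomputed value can differ from the published one in the last digit (the "without semantics" row recomputes to about 0.3964 rather than 0.3965).

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall) per feature set, copied from the results table.
results = {
    "keyword relevance": (0.3040, 0.2924),
    "without semantics": (0.3363, 0.4828),
    "semantic relevance": (0.3053, 0.2931),
    "all features": (0.3674, 0.4736),
}
for name, (p, r) in results.items():
    print(f"{name}: F = {f_measure(p, r):.4f}")
```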
  22. Importance of features. (Bar charts of feature weights on a scale from -1 to 2, one panel per feature group: keyword-based: keyword-based relevance; syntactical: hasHashtag, hasURL, isReply, length; semantic-based: semantic relatedness; semantics: #entities, diversity, sentiment; context-based: temporal context, social context.) Semantic relatedness, URLs, !isReply (not being a reply), diversity and sentiment are good indicators for estimating the relevance of a tweet.
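To turn the individual features into a relevance decision, they can be combined in a linear model, for example a logistic regression whose learned weights then indicate how strongly each feature points toward (or away from) relevance. The sketch below only shows the scoring step; the feature names and weights are hypothetical placeholders, not the values plotted on the slide, and in practice the weights would be learned from labeled tweets such as the TREC Microblog relevance judgments.

```python
import math

# Hypothetical placeholder weights: NOT the learned values shown on the slide.
WEIGHTS = {
    "keyword_relevance": 1.0,
    "semantic_relatedness": 1.5,
    "has_url": 0.8,
    "is_reply": -0.7,
    "type_diversity": 0.3,
}
BIAS = -1.0

def relevance_probability(features: dict[str, float]) -> float:
    """Logistic model: sigmoid(BIAS + sum of weight * feature value)."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def is_relevant(features: dict[str, float], threshold: float = 0.5) -> bool:
    """Classify a tweet as relevant when its estimated probability reaches the threshold."""
    return relevance_probability(features) >= threshold

print(is_relevant({"keyword_relevance": 1.0, "has_url": 1.0}))   # True
print(is_relevant({"keyword_relevance": 0.5, "is_reply": 1.0}))  # False
```

Unknown feature names raise a `KeyError`, which keeps typos in feature extraction from silently contributing nothing.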
  23. Part 2: Search & Browsing in Twitter Streams. (Diagram: Twitter streams → Filtering → Search & Browsing, driven by a topic/information need.)
  24. Idea: Faceted Search (screenshot). Current query: "Eindhoven Music", expanded with + Guilty Simpson + Area51. Suggestions to expand the query, grouped by facet: Locations (more...), Events (more...), Music Artists (+ Guilty Simpson, + Bryan Adams, + Elton John, + Golden Earring, more...), Intents. Results: 1. Yskiddd: "Next saturday @thatsimpsonguy aka Guilty Simpson will be performing at Area51 in my homeytown Eindhoven. #realliveshit #iwillspinrecords2" 2. Usee123: "Cool #EV3door7980 !!! http://bit.ly/igyyRhL" 3. sanmiquelmusic: "This Saturday Im joining @KrusadersMusic to..."
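Faceted search can be sketched as two operations over tweets annotated with facet-value pairs: filtering the result list to tweets that carry every selected pair, and suggesting further pairs that occur frequently in the current results. The data layout (a dict with a `facets` set per tweet), the toy tweets and the frequency-based suggestion rule are assumptions for illustration.

```python
from collections import Counter

FacetValue = tuple[str, str]  # (facet, value), e.g. ("Music Artists", "Guilty Simpson")

def faceted_search(tweets: list[dict], selected: set[FacetValue]) -> list[dict]:
    """Keep only tweets annotated with every selected facet-value pair."""
    return [t for t in tweets if selected <= t["facets"]]

def facet_suggestions(results: list[dict], selected: set[FacetValue], k: int = 3) -> list[FacetValue]:
    """Suggest the k most frequent facet-value pairs in the results that are not yet selected."""
    counts = Counter(fv for t in results for fv in t["facets"] if fv not in selected)
    return [fv for fv, _ in counts.most_common(k)]

# Toy tweets for illustration only.
tweets = [
    {"text": "Guilty Simpson will be performing at Area51 in Eindhoven",
     "facets": {("Locations", "Eindhoven"), ("Locations", "Area51"),
                ("Music Artists", "Guilty Simpson")}},
    {"text": "Guilty Simpson tonight in Eindhoven!",
     "facets": {("Locations", "Eindhoven"), ("Music Artists", "Guilty Simpson")}},
    {"text": "Elton John live in Amsterdam",
     "facets": {("Locations", "Amsterdam"), ("Music Artists", "Elton John")}},
]
selected = {("Locations", "Eindhoven")}
hits = faceted_search(tweets, selected)
print(len(hits), facet_suggestions(hits, selected, k=1))
```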
  25. Adaptive Faceted Search (architecture diagram): Twitter posts → Semantic Enrichment → facet extraction → User and Context Modeling → Adaptive Faceted Search (facet-value pair ranking, query suggestions) → user. Open questions: How to represent the content of a tweet? How to adapt the facet-value pair ranking to the current demands of the user?
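One way to adapt the facet-value ranking to a user is to interpolate between how often a pair occurs in the current results and how much the user's profile favors it. The sketch below is hypothetical throughout: the profile weights, the interpolation parameter `alpha` and the normalization are our assumptions, not the ranking strategies evaluated in this work.

```python
from collections import Counter

FacetValue = tuple[str, str]

def rank_facet_values(result_facets: list[FacetValue],
                      user_profile: dict[FacetValue, float],
                      alpha: float = 0.5) -> list[FacetValue]:
    """Rank facet-value pairs by a mix of result frequency and user interest.

    result_facets: pairs occurring in the current result set (with repeats).
    user_profile: hypothetical interest weights in [0, 1] per pair.
    alpha: 0 = purely frequency-based, 1 = purely profile-based.
    """
    counts = Counter(result_facets)
    if not counts:
        return []
    max_count = max(counts.values())

    def score(fv: FacetValue) -> float:
        return (1 - alpha) * counts[fv] / max_count + alpha * user_profile.get(fv, 0.0)

    return sorted(counts, key=score, reverse=True)

guilty = ("Music Artists", "Guilty Simpson")
elton = ("Music Artists", "Elton John")
profile = {elton: 1.0}  # hypothetical user who likes Elton John
print(rank_facet_values([guilty, guilty, elton], profile, alpha=0.0))  # frequency only
print(rank_facet_values([guilty, guilty, elton], profile, alpha=0.5))  # personalized
```

With `alpha=0.0` the more frequent pair comes first; with `alpha=0.5` the profile lifts Elton John above it (score 0.75 vs. 0.5).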
  26. Facet Extraction and Semantic Enrichment. Example tweet: "@bob: Julian Assange got arrested". Tweet-based enrichment extracts Julian Assange from the tweet itself. Link-based enrichment also processes the linked article ("Julian Assange, the founder of WikiLeaks, is under arrest in London…") and adds WikiLeaks and London.
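The difference between tweet-based and link-based enrichment can be illustrated with a toy entity extractor: tweet-based enrichment only looks at the tweet text, while link-based enrichment also runs the extractor over the text of the linked web page. The gazetteer is a hypothetical stand-in for a real entity-recognition service, and fetching the linked page is not shown; its text is assumed to be available.

```python
# Hypothetical toy gazetteer standing in for a real entity-recognition service.
GAZETTEER = {"julian assange": "Person", "wikileaks": "Organization", "london": "Place"}

def extract_facets(text: str) -> set[tuple[str, str]]:
    """(type, entity) pairs whose surface form occurs in the text (case-insensitive).

    Plain substring matching is crude: 'london' would also match 'Londoner'.
    """
    lowered = text.lower()
    return {(etype, name) for name, etype in GAZETTEER.items() if name in lowered}

def enrich(tweet_text: str, linked_page_texts: list[str]) -> set[tuple[str, str]]:
    """Tweet-based enrichment, extended with entities found in linked pages (link-based)."""
    facets = extract_facets(tweet_text)
    for page in linked_page_texts:  # page texts are assumed to be fetched already
        facets |= extract_facets(page)
    return facets

tweet = "@bob: Julian Assange got arrested"
article = "Julian Assange, the founder of WikiLeaks, is under arrest in London"
print(sorted(extract_facets(tweet)))     # tweet-based only
print(sorted(enrich(tweet, [article])))  # tweet-based + link-based
```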
  27. Faceted search vs. hashtag-based (keyword) search: faceted search based on semantic enrichment of tweets significantly outperforms hashtag-based search.
  28. Impact of link-based enrichment: the personalized strategy significantly outperforms the baseline, and link-based enrichment improves quality for both strategies.
  29. Twitcident application. Twitcident applies filter & search functionality to distill information from Twitter during incidents (e.g. fires, extreme weather situations). (Diagram: Twitter streams → Filtering → Search & Browsing, driven by a topic/information need.)
  30. 200,000,000 tweets per day, and Pukkelpop 2011 became a tragedy: 81,000 tweets in four hours. What is needed: filtering, and search & browsing.
  31. Twitcident pipeline: Automatic Filtering → Search & Browsing.
  32. Faceted search on the filtered Twitter stream.
  33. Real-time visualizations.
  34. Could we see it coming? (Timeline annotations: a popular artist made a joke about the weather; impact of the storm.) Term usage 25 minutes before the incident: 1. heavy weather, hail balls, lightning, pitch black… 2. drama, panic, hell, serious, extreme…
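The term-usage analysis amounts to counting terms in the tweets posted during a fixed window before the incident. This sketch assumes tweets are given as (timestamp, text) pairs and uses a 25-minute window as on the slide; the timestamps and tweet texts in the example are hypothetical. It does no stop-word removal or phrase detection, so multi-word cues such as "pitch black" appear as separate terms.

```python
import re
from collections import Counter
from datetime import datetime, timedelta

def term_usage_before(tweets: list[tuple[datetime, str]], incident: datetime,
                      window: timedelta = timedelta(minutes=25),
                      k: int = 5) -> list[tuple[str, int]]:
    """The k most frequent terms in tweets posted within `window` before `incident`."""
    counts: Counter = Counter()
    for created, text in tweets:
        if incident - window <= created < incident:
            counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts.most_common(k)  # ties keep first-encountered order

incident = datetime(2011, 8, 18, 18, 0)  # hypothetical incident time
tweets = [
    (incident - timedelta(minutes=20), "heavy weather coming, pitch black sky"),
    (incident - timedelta(minutes=5), "hail balls and lightning, heavy weather"),
    (incident - timedelta(hours=2), "sunny festival day"),  # outside the window
]
print(term_usage_before(tweets, incident, k=2))  # [('heavy', 2), ('weather', 2)]
```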
  35. Spotting eyewitnesses.
  36. Real-time information from eyewitnesses.
  37. Summary
      Automatic filtering of tweets [#MSM@WWW ’12]:
      • Topic-sensitive and topic-insensitive features
      • Semantic features (semantic relatedness, diversity and sentiment are beneficial)
      Search and browsing [ISWC ’11]:
      • Faceted search
      • Personalization & contextualization help
      Application [Hypertext ’12, Demo@WWW ’12]:
      • Twitcident: fulfilling information needs during incidents
      Future work:
      • Weak signal detection based on tweets
      • Duplicate detection
  38. Thank you! @fabianabel http://wis.ewi.tudelft.nl/
