Twitter, Twinder, Twitcident: Filtering and Search in Social Web Streams

Slides presented at the EIT DataBridges Workshop in Paris: http://team.inria.fr/oak/share/workshop-databridges-april-12-2012/

  • Motivation: information overload; personalised “better” search.
  • Make food and drink vouchers worth 75 to 100 euros available.
  • Traditional Twitter search. Highlight what keyword matching means: the keywords in the search query and in the tweets.
  • Title -> syntactical features. Box in the tweet. Green boxes for the hypotheses. Flow from keyword-based relevance to … Slide 5-8, flow.
  • Subtitle -> question?
  • Introduction to the usage of @, including mentions and replies. Reply tweets frequently occur in private conversations; therefore, make a hypothesis about reply tweets in particular.
  • “The 21st International World Wide Web Conference #www2012 will take place in Lyon, France April 16-20 2012 @www2012Lyon www2012.wwwconference.org”. Subtitle: question. One short, one long; comparison.
  • Fade in the question later.
  • Fade in the entities one by one.
  • Do not highlight www, lyon, france.
  • Can we utilize the contextual features?
  • Titles,
  • Timeline
  • Number of features.
  • Comparison. Fade in the pairs. Highlight. Textbox -> conclusion (precision).
  • Very time consuming and overwhelming indeed!
  • Entity extraction, semantic enrichment, and relation discovery.
  • Case #1: early detection
  • Case 1: enforcement (picture of the situation around the incident)
  • Our framework extracts typed entities from enriched tweets/news and provides strategies for detecting semantic (trending) relationships between entities. We investigated the precision and recall of the relation detection strategies, analyzed how the strategies perform for each type of relationship, and evaluated the quality and speed of discovering trending relationships that possibly have a limited temporal validity. Research questions: Which strategy performs best in detecting relationships between entities? Does the accuracy depend on the type of entities which are involved in a relation? How do the strategies perform for discovering relationships which have temporal constraints, and how fast can the strategies detect (trending) relationships?
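To make the evaluation criteria in the last note concrete, here is a minimal sketch of how the precision and recall of a relation detection strategy could be computed against a hand-labelled ground truth. The function name `precision_recall_f1`, the use of the balanced F-measure, and the example entity pairs are illustrative assumptions; none of them is taken from the authors' framework.

```python
def precision_recall_f1(detected, relevant):
    """Compare a set of detected relations with a set of ground-truth relations."""
    true_positives = len(detected & relevant)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    total = precision + recall
    # Balanced F-measure: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical data: a relation is modelled as an unordered pair of entities.
relevant = {
    frozenset({"Julian Assange", "WikiLeaks"}),
    frozenset({"Julian Assange", "London"}),
}
detected = {
    frozenset({"Julian Assange", "WikiLeaks"}),
    frozenset({"WikiLeaks", "London"}),
}

precision, recall, f1 = precision_recall_f1(detected, relevant)
print(precision, recall, f1)
```

Modelling a relation as a `frozenset` makes the pair order-independent, which is one possible choice; a strategy that detects typed or directed relations would need a richer key.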

Presentation Transcript

  • Twitter, Twinder, Twitcident: Filtering and Search on Social Web Streams. Data Bridges Workshop, Inria, Paris, April 12th 2012. Fabian Abel, Claudia Hauff, Geert-Jan Houben, Richard Stronkman, Ke Tao. Web Information Systems, TU Delft, the Netherlands. Delft University of Technology.
  • 200,000,000: number of tweets published per day.
  • Pukkelpop 2011: People tweet about everything, everywhere :-)
  • 200,000,000. Pukkelpop 2011 became a tragedy. Filtering: 81,000 tweets in four hours. Search & Browsing.
  • Challenges. 1. (Automatic) Filtering: Given a topic (e.g. expressed via some keywords), how can one automatically identify those tweets that are relevant to the topic? 2. Search & Browsing: How can one improve search and browsing capabilities so that users can explore information in the streams of tweets (that are relevant for a topic)? [Diagram labels: Twitter streams, topic, Filtering, information need, Search & Browsing; Twinder: filtering and search framework]
  • 1. Filtering of Twitter streams. [Diagram labels: Twitter streams, topic, Filtering, information need, Search & Browsing]
  • Filtering on Twitter. Query: www2012. Typical approach: keyword-based matching. Are there further features that can be used as indicators for estimating the relevance of a tweet for a topic?
  • Syntactical feature: hashtags. Is a tweet more relevant if it contains a #hashtag? Hypothesis: tweets that contain hashtags are more likely to be relevant than tweets that do not contain hashtags.
  • Syntactical feature: URLs. Is a tweet that contains a URL more relevant? Hypothesis: tweets that contain a URL are more likely to be relevant than tweets that do not contain a URL.
  • Syntactical feature: “mentions”. Is a tweet that mentions @somebody more relevant? Hypothesis: tweets that are formulated as a reply to another tweet are less likely to be relevant than other tweets. [Labels: Reply, @mention]
  • Syntactical feature: length. Does the length of a tweet influence its relevance for a topic? 54 characters (9 words) vs. 140 characters (20 words). Hypothesis: the longer a tweet, the more likely it is to be relevant and interesting.
  • Overview of features: topic-sensitive and topic-insensitive features. Topic-sensitive: keyword-based relevance. Topic-insensitive: syntactical features. What about the semantics?
  • Semantic features: number of entities. Find semantics in a tweet to estimate the relevance. Entities: dbp:Tim_Berners-Lee, dbp:World_Wide_Web, dbp:WWW_Conference, dbp:France, dbp:Lyon. Hypothesis: the more entities a tweet mentions, the more likely it is to be relevant and interesting.
  • Semantic features: diversity. The types of entities that are featured by a tweet matter. [Figure: a tweet whose entities have diverse types (Person, Thing, Event, Place, Place) vs. a tweet whose entities are all of type Place: “I plan to visit Paris, Bordeaux, Grenoble, Nice, Marseille and Lyon.”] Hypothesis: the higher the diversity of entities that are mentioned in a tweet, the more likely it is to be relevant.
  • Semantic features: sentiment. Opinions expressed in tweets are interesting. “Looking forward to the WWW conference :-) Yes!” [:-)] vs. “I plan to visit Paris, Bordeaux, Grenoble, Nice, Marseille and Lyon.” [neutral] vs. “Why are the big players not releasing query logs to the WWW community? :-( #fail” [:-(] Hypothesis: the likelihood of a tweet’s relevance is influenced by its sentiment polarity.
  • Semantic relatedness. Exploit semantics to relate query with tweets. dbp:International_World_Wide_Web_Conference, dbp:Tim_Berners-Lee.
  • Overview of features: by now, we have 4 types of features. Topic-sensitive: keyword-based, semantic-based, context? Topic-insensitive: syntactical, semantics, context? What kind of contextual features might be helpful?
  • Contextual feature: authority of the publisher. It matters who published a tweet. Hypothesis: the higher the number of tweets that have been published by the creator of a tweet, the more likely it is that the tweet is relevant.
  • Contextual feature: time w.r.t. query. When was a tweet published? Hypothesis: the lower the temporal distance between the query time and the creation time of a tweet, the more likely the tweet is to be relevant to the topic. [Timeline labels: tweet, query; March 31, April 16]
  • Summary of features. Topic-sensitive: keyword-based, semantic-based, context-based. Topic-insensitive: syntactical, semantics, context.
  • Results achieved for the TREC Microblog Challenge:
      Features            Precision  Recall  F-measure
      keyword relevance   0.3040     0.2924  0.2981
      without semantics   0.3363     0.4828  0.3965
      semantic relevance  0.3053     0.2931  0.2991
      all features        0.3674     0.4736  0.4138
    Overall, we can achieve a precision and recall of over 35% and 45%, respectively, by applying all the features.
  • Importance of features. [Figure: importance of features, axis from -1 to 2. Panels: Keyword-based, Semantic-based and Context-based (topic-sensitive); Syntactical, Semantics and Context (topic-insensitive). Feature labels: keyword-based relevance, hasHashtag, hasURL, isReply, length, relevance, relatedness, #entities, diversity, sentiment, temporal context, social context.] Semantic relatedness, URLs, !isReply, diversity and sentiment are good indicators for estimating the relevance of a tweet.
  • 2. Search & Browsing in Twitter Streams. [Diagram labels: Twitter streams, topic, Filtering, information need, Search & Browsing]
  • Idea: Faceted Search. Current query: Eindhoven, Music. Expand query, suggestions: + Guilty Simpson, + Area51. Facets: Locations (more...); Events (more...); Music Artists: + Guilty Simpson, + Bryan Adams, + Elton John, + Golden Earring (more...). Results: 1. Yskiddd: “Next saturday @thatsimpsonguy aka Guilty Simpson will be performing at Area51 in my homeytown Eindhoven. #realliveshit #iwillspinrecords2” 2. Usee123: “Cool #EV3door7980 !!! http://bit.ly/igyyRhL” 3. sanmiquelmusic: “This Saturday Im joining @KrusadersMusic to Intents”
  • Adaptive Faceted Search. How to represent the content of a tweet? How to adapt the facet-value pair ranking to the current demands of the user? [Architecture labels: user, query suggestions, Adaptive Faceted Search, User and Context Modeling, facet extraction, Semantic Enrichment, Twitter posts]
  • Facet Extraction and Semantic Enrichment. Tweet-based enrichment: “@bob: Julian Assange got arrested” yields the entity Julian Assange. Link-based enrichment: the linked article “Julian Assange arrested. Julian Assange, the founder of WikiLeaks, is under arrest in London…” yields the entities Julian Assange, WikiLeaks and London.
  • Faceted search vs. hashtag-based (keyword) search. Faceted search based on semantic enrichment of tweets outperforms hashtag-based search significantly.
  • Impact of link-based enrichment. The personalized strategy outperforms the baseline significantly. Link-based enrichment improves quality for both strategies.
  • Twitcident application. Twitcident: applying filter & search functionality for distilling information from Twitter during incidents (e.g. fires, extreme weather situations). [Diagram labels: Twitter streams, topic, Filtering, information need, Search & Browsing]
  • 200,000,000. Pukkelpop 2011 became a tragedy. Filtering: 81,000 tweets in four hours. Search & Browsing.
  • Twitcident Pipeline: Automatic Filtering; Search & Browsing.
  • Faceted Search. Filtered Twitter stream.
  • Real-time visualizations.
  • Could we see it coming? Popular artist made a joke about the weather. [Chart labels: Impact, storm] Term usage 25 minutes before the incident: 1. heavy weather, hail balls, lightning, pitch black… 2. drama, panic, hell, serious, extreme…
  • Spotting eyewitnesses.
  • Real-time information from eyewitnesses.
  • Summary.
    Automatic filtering of tweets [#MSM@WWW ’12]:
    • Topic-sensitive and topic-insensitive features
    • Semantic features (semantic relatedness, diversity, sentiment) are beneficial
    Search and browsing [ISWC ’11]:
    • Faceted search
    • Personalization & contextualization help
    Application [Hypertext ’12, Demo@WWW ’12]:
    • Twitcident: fulfilling information needs during incidents
    Future work:
    • Weak signal detection based on tweets
    • Duplicate detection
  • Thank you! @fabianabel http://wis.ewi.tudelft.nl/
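As an illustration of the syntactical features from the filtering part of the talk (hashtags, URLs, replies, length), the following is a minimal sketch of how they could be read off the raw text of a tweet. The feature names `hasHashtag`, `hasURL`, `isReply` and `length` follow the labels on the feature-importance slide; the regular expressions, the function name `syntactical_features`, and the rule that a reply is a tweet starting with `@` are simplifying assumptions, not the authors' implementation.

```python
import re

# Assumed, deliberately simple patterns; real tweets would need more care.
HASHTAG_PATTERN = re.compile(r"#\w+")
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")


def syntactical_features(tweet):
    """Return the four topic-insensitive syntactical features of a tweet."""
    text = tweet.strip()
    return {
        "hasHashtag": bool(HASHTAG_PATTERN.search(text)),
        "hasURL": bool(URL_PATTERN.search(text)),
        # Assumption: a tweet that starts with a mention is treated as a reply.
        "isReply": text.startswith("@"),
        "length": len(text),
    }


# Hypothetical example tweets.
reply_tweet = "@bob see http://example.org #www2012"
plain_tweet = "Looking forward to the WWW conference"

reply_features = syntactical_features(reply_tweet)
plain_features = syntactical_features(plain_tweet)
print(reply_features)
print(plain_features)
```

Feature dictionaries like these could then be fed, together with keyword-based, semantic and contextual features, into whatever relevance classifier is used; the choice of classifier is outside the scope of this sketch.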