Semantics + Filtering + Search = Twitcident: Exploring Information in Social Web Streams. Hypertext 2012, Milwaukee, WI – June...

Semantics + Filtering + Search = Twitcident - Exploring Information in Social Web Streams

Talk by Ke Tao (Web Information Systems, TU Delft) at the 23rd ACM Conference on Hypertext and Social Media, June 28, 2012, Milwaukee, WI, USA

  • There are millions of tweets posted every day. Motivation: information overload; personalised “better” search.
  • People tweet about everything, e.g. when they are at a festival like Pukkelpop they may report about their experiences...
  • This festival actually became a disaster (5 people died). 80k tweets were published in the first 4 hours; during the incident, the emergency services had problems getting an overview of the situation. How can one (a) automatically filter information from Twitter and (b) provide search and analytics? (s4)
  • Research challenges here.
  • Twitcident pipeline = how we tackle these challenges. We get information from emergency broadcasters, or formulate in advance what we want to monitor during big events. Either way, we obtain the basic information about the incidents or events. We then do the automatic filtering in 4 steps. First, we construct the profile of the incident, including metadata such as its location and the names of the organizations and people involved. Next, we aggregate information such as texts, pictures, and videos from the social web, especially Twitter. Then we extract the semantics from these media to learn what the information is about and where it was posted. Finally, we filter the aggregated information to get the incident-relevant media, and refine further. On top of this, we use search and various analytics to satisfy the information needs of authorities and the general public.
  • Search, Filtering, Analytics
  • Search, Filtering, Analytics. Reference: Koren et al., Personalized Interactive Faceted Search, WWW 2008.
  • Semantics + Filtering + Search = Twitcident - Exploring Information in Social Web Streams
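
The profiling and filtering steps described in the notes above can be illustrated with a minimal sketch. It assumes a profile is a set of weighted facet-value pairs, that tweets already carry facet-value pairs from the semantic enrichment step, and that relevance is a plain sum of matching weights compared against a threshold. The class names, the scoring rule, and the threshold are hypothetical and not taken from the talk; the example weights are the ones shown on the Incident Profiling slide.

```python
from dataclasses import dataclass, field

# A facet-value pair such as ("Location", "Nijmegen").
FacetValue = tuple[str, str]


@dataclass
class IncidentProfile:
    """Weighted facet-value pairs describing one incident (hypothetical model)."""
    weights: dict[FacetValue, float] = field(default_factory=dict)


@dataclass
class Tweet:
    text: str
    # Facet-value pairs assumed to come from a semantic enrichment step.
    facets: set[FacetValue] = field(default_factory=set)


def relevance(tweet: Tweet, profile: IncidentProfile) -> float:
    """Sum the profile weights of the facet-value pairs the tweet mentions."""
    return sum(w for fv, w in profile.weights.items() if fv in tweet.facets)


def filter_tweets(tweets, profile, threshold=1.0):
    """Keep tweets whose score reaches the threshold (an assumed cut-off)."""
    return [t for t in tweets if relevance(t, profile) >= threshold]


profile = IncidentProfile({
    ("Location", "Netherlands"): 0.4,
    ("Incident", "Train accident"): 0.5,
    ("Location", "Nijmegen"): 0.8,
    ("Organization", "Veolia"): 0.6,
    ("Incident", "Crash"): 1.0,
})
tweets = [
    Tweet("My train rams the platform at Nijmegen!",
          {("Location", "Nijmegen"), ("Incident", "Crash")}),
    Tweet("Lovely weather in the Netherlands today",
          {("Location", "Netherlands")}),
]
relevant = filter_tweets(tweets, profile)
```

With these assumed inputs only the first tweet passes the filter, because it matches two highly weighted pairs while the second matches a single low-weight location.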

    1. Semantics + Filtering + Search = Twitcident: Exploring Information in Social Web Streams. Hypertext 2012, Milwaukee, WI – June 28. Fabian Abel, Claudia Hauff, Geert-Jan Houben, Richard Stronkman, Ke Tao. Web Information Systems, TU Delft, the Netherlands. Delft University of Technology
    2. 200,000,000: number of tweets published per day
    3. Pukkelpop 2011. People tweet about everything, everywhere :-)
    4. 200,000,000. Pukkelpop 2011 became a tragedy: 81,000 tweets in four hours. Filtering: useful tweets? Search & Analytics
    5. Case: Nijmegen train accident
    6. First tweet… "And then your train blasts off full of the anvils. #Nijmegen #veolia"
    7. First picture… "Astonishing! My train rams the platform at Nijmegen! http://pic.twitter.com/QVVfJHyd"
    8. Traditional news media: "A train rammed the anvils at Nijmegen."
    9. Research Challenges. 1. (Automatic) Filtering: Given an incident, how can one automatically identify those tweets that are relevant to the incident? 2. Search & Analytics: How can one improve search and analytical capabilities so that users can explore information in the streams of tweets? Diagram labels: Twitter streams, topic, Filtering, Search & Analytics, information need
    10. Twitcident Pipeline: Automatic Filtering, Search & Analytics
    11. Twitcident system. Figure 2: Screenshot of the Twitcident system: (a) search and filtering functionality to explore and retrieve particular Twitter messages, (b) messages that are related to the given incident (here: fires in Texas) and match the given query of the user, and (c) real-time analytics of the matching messages.
    12. Twitcident Pipeline: Automatic Filtering, Search & Analytics
    13. Incident detection • Twitcident relies on emergency broadcasting services for detecting incidents • In the Netherlands: P2000 communication network • (i) Broadcasted incident description from the P2000 broadcast: "Prio 1 fire :: Vlasweg : 4 4782PW Moerdijk :: Chemie Pack" • Initial query sent to Twitter: (Moerdijk OR Chemie-Pack) AND (fire OR smoke OR flame…) SINCE:2011-01-05 • Refined query based on incident profiling: (Moerdijk OR Dordrecht…) AND (#moerdijkFire OR toxic…) • (ii) Incident in Twitcident: Twitcident system
    14. Incident Profiling • For an incident i: the profile of the incident is described as a set of tuples • Each tuple includes a facet-value pair (f, v) and its weight to the incident i • Example: (Location, Netherlands) 0.4; (Incident, Train accident) 0.5; (Location, Nijmegen) 0.8; (Organization, Veolia) 0.6; (Incident, Crash) 1.0
    15. Twitcident Pipeline: Automatic Filtering, Search & Analytics
    16. Social Media Aggregation • Collecting Twitter messages, pictures, and videos from social media platforms, e.g. Twitter, PhotoBucket, Vimeo
    17. Twitcident Pipeline: Automatic Filtering, Search & Analytics
    18. Semantic Enrichment • Named Entity Recognition • Classification: Casualties, Damages, Risks… • Linkage: External Resources • Metadata extraction
    19. Twitcident Pipeline: Automatic Filtering, Search & Analytics
    20. Filtering • Which tweets are relevant to the incidents? • Preprocessing: Language detection • Semantic Filtering: Compare tweet with P(i) • Semantic Filtering with News Context • P’(i): P(i) complemented with facet-value (f-v) pairs from news
    21. Twitcident Pipeline: Automatic Filtering, Search & Analytics
    22. Faceted Search • Strategies (ranking) • Frequency-based • Time-sensitive • Personalized
    23. Real-time analytics • What type of things are mentioned in the tweets? • Impact area • What aspects are mentioned over time? • What do people report about over time?
    24. Evaluation - Dataset • Twitter corpus (TREC Microblog Track 2011) • 16 million tweets (Jan. 24th – Feb. 8th, 2011) • 4,766,901 tweets classified as English • 6.2 million entity-extractions • News (same time period) • 62 RSS news feeds • 13,959 news articles • 357,559 entity-extractions
    25. Evaluation for Tweet Filtering (1/2). [Bar chart; values not recoverable from the transcript.] Semantic strategies outperform the keyword-based filtering regarding all metrics.
    26. Evaluation for Tweet Filtering (2/2). The semantic strategy is more robust and achieves higher precision for complex topics.
    27. Evaluation for Faceted Search (1/2). [Bar chart comparing strategies with and without semantic enrichment; values not recoverable from the transcript.] The semantic faceted search strategy improves the search performance by 34.8% and 22.4%.
    28. Evaluation for Faceted Search (2/2). [Bar chart comparing strategies with and without semantic enrichment; values not recoverable from the transcript.] The strategies with semantic enrichment outperform the strategy without semantic enrichment in predicting the appropriate facet-values.
    29. Conclusions • What we have done: Twitcident, a framework for filtering, searching, and analyzing information about incidents that people publish in their Social Web streams • What we have achieved: better filtering of Twitter messages for a given incident; better search for relevant information about an incident within the filtered messages
    30. Thank you! @wisdelft • http://twitcident.org • Ke Tao, @taubau
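
The Faceted Search slide names frequency-based and time-sensitive ranking strategies without defining them. The sketch below is one possible reading, not the authors' implementation: frequency-based ranking counts how many tweets mention each facet-value, and the time-sensitive variant weights each mention by an exponential decay of its age. The function names, the decay form, and the one-hour half-life are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def rank_by_frequency(stream):
    """Order facet-values by the number of tweets that mention them."""
    counts = Counter(fv for _, facets in stream for fv in facets)
    return [fv for fv, _ in counts.most_common()]


def rank_time_sensitive(stream, now, half_life=3600.0):
    """Like frequency ranking, but older mentions count less.

    The exponential decay and the one-hour half-life are assumptions.
    """
    scores = defaultdict(float)
    for timestamp, facets in stream:
        decay = 0.5 ** ((now - timestamp) / half_life)
        for fv in facets:
            scores[fv] += decay
    return sorted(scores, key=scores.get, reverse=True)


# Each entry: (timestamp in seconds, facet-value pairs from enrichment).
stream = [
    (0, {("Incident", "Crash")}),
    (600, {("Incident", "Crash")}),
    (7000, {("Location", "Nijmegen")}),
]
by_frequency = rank_by_frequency(stream)
by_recency = rank_time_sensitive(stream, now=7200)
```

In this toy stream the two strategies disagree: the crash facet is mentioned more often, but the location facet was mentioned most recently, so it ranks first once mentions are decayed by age.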
