Learning Semantic Relationships between Entities in Twitter<br />ICWE, Cyprus, June 22, 2011<br />IlknurCelik, Fabian Abel...
What we do: Science and Engineering for the Personal Web<br />domains: news  social mediacultural heritage  public datae-l...
60,000,000<br />number of tweets published per day<br />
1<br />number of tweets per day that are interesting for me<br />
Searching on Twitter<br />
Issues with Multiple Keywords Search<br />
Let’s try to search with One Keyword<br />
Page 1<br />
Page 2<br />
Page 3<br />
Page 60!!<br />Music Artist<br />Next Saturday @thatsimpsonguyaka Guilty Simpson will be performing at<br />Area51 in my h...
Is there an easier way?Faceted Search can help<br />Current Query:<br />Expand Query:<br />Results:<br />Yskiddd: Next sa...
Location:<br />Eindhoven <br />Music Artist:<br />Guilty Simpson<br />Location:<br />Area51<br />Semantic relationships be...
Relation Discovery Framework<br />Relation Discovery Framework<br />temporal <br />constraints<br />relation <br />type<br...
 Query suggestions
 Schema enrichment</li></li></ul><li>Entity Extraction and Semantic Enrichment<br />powered by<br />Julian Assange<br />@b...
Relation Learning Strategies<br />entities<br />time period<br />Relation: 	<br />relation(e1, e2, type, tstart, tend, wei...
Research Questions<br />Which strategy performs best in detecting relationships between entities?<br />Does the accuracy d...
Dataset<br />more than:<br /> 20,000<br />Twitter users<br />2<br />months<br />10,000,000<br />WikiLeaks founder, Julian ...
Dataset Characteristics<br />
Tweets and news articles per day<br />50,000-400,000<br />tweets per day<br />100-1000 <br />news articles<br />per day<br />
Entities referenced per day<br />10,000-100,000<br />entity ref. <br />in tweets<br />per day<br />5,000-20,000 <br />enti...
Number of Distinct Entities per Entity Types<br />39 types of entities<br />
Performance of Relation Learning Strategies<br />
Our Ground Truth of true relations<br />Based on DBpedia:<br />We mapped entities to their corresponding DBpedia resources...
1. Which strategy performs best in detecting relationships between entities?<br />
Accuracy of relation discovery <br />Combining both tweet-based and news-based strategies allows for highest accuracy<br /...
F-Measure@k<br />Combined strategy (and news-based) increase in performance.<br />Tweet-based strategy saturates quickly<b...
2. Does the accuracy depend on the type of entities which are involved in a relation?<br />
Does the accuracy depend on the type of entities?<br />87% <br />precision<br />92% <br />Relationships which involve even...
Does the accuracy depend on the type of entities? (cont.)<br />Relationships between events can be detected with highest p...
3. How do the strategies perform for discovering relationships which have temporal constraints?<br />
Relationships with temporal constraints<br />Tweet-based strategy performs better in discovering relationships that are va...
Where do relationships emerge faster?<br />Speed of strategies is domain-dependent<br />time difference (in days) of first...
Conclusions and Future Work<br />What we did: relation discovery framework based on Twitter<br />Findings:<br />Strategy t...
Relation Discovery for Adaptive Faceted Search<br />Current Query:<br />2. Analyze (temporal)relationships of entities tha...
Thank you!<br />IlknurCelik, Fabian Abel, Geert-Jan Houben<br />Twitter: @persweb<br />http://wis.ewi.tudelft.nl/tweetum/<...
Upcoming SlideShare
Loading in …5
×

Learning Semantic Relationships between Entities in Twitter

2,201 views

Published on

Slides presented at ICWE 2011: Learning Semantic Relationships between Entities in Twitter

Supporting web site: http://wis.ewi.tudelft.nl/icwe2011/relation-learning/

Published in: Technology
0 Comments
2 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
2,201
On SlideShare
0
From Embeds
0
Number of Embeds
479
Actions
Shares
0
Downloads
36
Comments
0
Likes
2
Embeds 0
No embeds

No notes for slide
  • Motivation:Information overloadPersonalised “better” search
  • Why do people search on Twitter rather than Google?Real time info &amp; opinion about almost anything
  • Example: HT’11 @Eindhoven, looking for some entertainment events...http://search.twitter.com/http://search.twitter.com/advanced
  • Space limitation + selecting keywords (abbreviations –shorthand notations + colloquial expressions)
  • Highlight 60
  • Very time consuming and overwhelming indeed!
  • entity extraction and semantic enrichment and relation discovery.
  • large dataset of more than 10 million tweets and 70,000 news articles
  • 100-1000 news articles per day50,000 and 400,000 tweets per dayTwo of the minima were caused by temporary unavailability of the Twitter monitoring service.
  • approximately 10,000-100,000 entity references per day for tweetsapproximately 5,000-20,000 entity references per day for News~40% of the tweets had no entities99.3% of the news articles had at least one entityoverlap of entities: 72.6% of the top 1000 mentioned entities in Twitter are also mentioned in the news media.
  • 39 different entity typesPersons, locations and organizations were mentioned most often, followed by movies, music albums, sport events and political eventswe analyzed specific types of relations such as relationships between persons and locations or organizations and events in detail
  • Person/Group-Event relationships cover relations between persons and political events, persons and sport events, organizations and sport events, etcinteresting to see that the Tweet+News-based strategy discovers relationships, between persons/groups and events with higher precision 0.92 and 0.87 regarding P@10 and P@20 than people&apos;s relations to products (0.23 and 0.26) or locations (0.73 and 0.6).
  • relations between entities that are of the same typerelationships between two events can also be discovered with high precision, followed by relations between locations...
  • Twitter is more appropriate for inferring relationships, which have temporal constraints, than the news media.Tweet-based strategy improves precision (P@5) by 22.7%
  • relationships between persons and movies or music albums emerge much faster (14.7 and 5.1 days respectively) in Twitter than in the traditional news media.
  • Our framework extracts typed entities from enriched tweets/news and provides strategies for detecting semantic (trending) relationships between entities. We:investigated the precision and recall of the relation detection strategies,analyzed how the strategies perform for each type of relationships andWhich strategy performs best in detecting relationships between entities?Does the accuracy depend on the type of entities which are involved in a relation?How do the strategies perform for discovering relationships which have temporal constraints, and how fast can the strategies detect (trending) relationships?evaluated the quality and speed for discovering trending relationships that possibly have a limited temporal validity.
  • Very time consuming and overwhelming indeed!
  • Learning Semantic Relationships between Entities in Twitter

    1. 1. Learning Semantic Relationships between Entities in Twitter<br />ICWE, Cyprus, June 22, 2011<br />IlknurCelik, Fabian Abel, Geert-Jan Houben<br />Web Information Systems, TU Delft<br />
    2. 2. What we do: Science and Engineering for the Personal Web<br />domains: news social mediacultural heritage public datae-learning<br />Personalized<br />Recommendations<br />Personalized Search<br />Adaptive Systems <br />Analysis and <br />User Modeling<br />Semantic Enrichment, Linkage and Alignment<br />user/usage data<br />Social Web<br />
    3. 3. 60,000,000<br />number of tweets published per day<br />
    4. 4. 1<br />number of tweets per day that are interesting for me<br />
    5. 5. Searching on Twitter<br />
    6. 6. Issues with Multiple Keywords Search<br />
    7. 7. Let’s try to search with One Keyword<br />
    8. 8. Page 1<br />
    9. 9. Page 2<br />
    10. 10. Page 3<br />
    11. 11. Page 60!!<br />Music Artist<br />Next Saturday @thatsimpsonguyaka Guilty Simpson will be performing at<br />Area51 in my hometwonEindhoven. #realliveshit #iwillspinrecords<br />about 9 hours ago via Blackberry<br />tweet I was <br />looking for<br />Locations<br />
    12. 12. Is there an easier way?Faceted Search can help<br />Current Query:<br />Expand Query:<br />Results:<br />Yskiddd: Next saturday@thatsimpsonguy aka Guilty Simpson will be performing at Area51 in my homeytown Eindhoven. #realliveshit#iwillspinrecords2<br />Usee123: Cool #EV3door7980 !!! http://bit.ly/igyyRhL <br />sanmiquelmusic: This Saturday I'm joining @KrusadersMusic to Intents<br />Eindhoven<br />Music<br />Locations more...<br />Events more...<br />Music Artists:<br />+ Guilty Simpson<br />+ Bryan Adams<br />+ Elton John<br />+ Golden Earring<br />+ Rihanna<br />+ The eagles<br />+ 3 Doors Down<br />more...<br />
    13. 13. Location:<br />Eindhoven <br />Music Artist:<br />Guilty Simpson<br />Location:<br />Area51<br />Semantic relationships between entities are essential to realize such applications.<br />
    14. 14. Relation Discovery Framework<br />Relation Discovery Framework<br />temporal <br />constraints<br />relation <br />type<br />typed relations<br />microblog<br />posts<br />Entity extraction &<br />semantic enrichment<br />Location A<br />Person A<br />Location A<br />Person A<br />isLocatedIn<br />Relation <br />discovery<br />Location B<br />Group A<br />Person A<br />Group A<br />involvedIn<br />news articles<br />Event A<br />weighting <br />scheme<br />source<br />selection<br />Applications<br /><ul><li> Browsing support
    15. 15. Query suggestions
    16. 16. Schema enrichment</li></li></ul><li>Entity Extraction and Semantic Enrichment<br />powered by<br />Julian Assange<br />@bob: JulianAssange got arrested http://bit.ly/5d4r2t<br />Julian Assange<br />Tweet-based<br />enrichment<br />Julian Assange<br />Julian Assange<br /> JulianAssangearrested<br />JulianAssange, the founder of<br />WikiLeaks, is under arrest in<br />London…<br />News-based<br />enrichment<br />Julian Assange<br />London<br />WikiLeaks<br />London<br />WikiLeaks<br />
    17. 17. Relation Learning Strategies<br />entities<br />time period<br />Relation: <br />relation(e1, e2, type, tstart, tend, weight)<br />RelationLearningstrategy: <br />Input:entity e1 and e2, time period (tstart, tend)<br />Challenge:inferweightand type of the relationfor the given<br />Weightingaccording to co-occurrence frequency:<br />Tweet-based: count co-occurrence in tweets<br />News-based: count co-occurrence in news<br />Tweet-News-based: count co-occurrence in both tweets and news<br />type/label of relation<br />relatedness<br />
    18. 18. Research Questions<br />Which strategy performs best in detecting relationships between entities?<br />Does the accuracy depend on the type of entities which are involved in a relation?<br />How do the strategies perform for discovering relationships which have temporal constraints (trending relationships)?<br />
    19. 19. Dataset<br />more than:<br /> 20,000<br />Twitter users<br />2<br />months<br />10,000,000<br />WikiLeaks founder, Julian Assange, under arrest in London<br />tweets<br />75,000<br />news<br />time<br />Dec 15<br />Jan 15<br />Nov 15<br />
    20. 20. Dataset Characteristics<br />
    21. 21. Tweets and news articles per day<br />50,000-400,000<br />tweets per day<br />100-1000 <br />news articles<br />per day<br />
    22. 22. Entities referenced per day<br />10,000-100,000<br />entity ref. <br />in tweets<br />per day<br />5,000-20,000 <br />entity ref.<br />in news<br />per day<br />~40% tweets do not mention any (recognizable) entity<br />72.6% of the top 1000 mentioned entities in Twitter are also mentioned in the mainstream news media<br />99.3% of the news articles mention at least one (recognizable) entity<br />
    23. 23. Number of Distinct Entities per Entity Types<br />39 types of entities<br />
    24. 24. Performance of Relation Learning Strategies<br />
    25. 25. Our Ground Truth of true relations<br />Based on DBpedia:<br />We mapped entities to their corresponding DBpedia resources<br />No appropriate DBpedia URIs for more than 35% of the entities<br />We analyzed whether there is a direct relation between two entities<br />Based on user study:<br />Participants judged whether two entities are really:<br />related (62.6% were rated as related)<br />related in the given time period (57.3% were rated as related)<br />Overall: 2588 judgments<br />Thank you!<br />
    26. 26. 1. Which strategy performs best in detecting relationships between entities?<br />
    27. 27. Accuracy of relation discovery <br />Combining both tweet-based and news-based strategies allows for highest accuracy<br />Based on user study<br />Based on DBpedia<br />
    28. 28. F-Measure@k<br />Combined strategy (and news-based) increase in performance.<br />Tweet-based strategy saturates quickly<br />
    29. 29. 2. Does the accuracy depend on the type of entities which are involved in a relation?<br />
    30. 30. Does the accuracy depend on the type of entities?<br />87% <br />precision<br />92% <br />Relationships which involve events can be discovered with high precision<br />26% <br />precision<br />23% <br />
    31. 31. Does the accuracy depend on the type of entities? (cont.)<br />Relationships between events can be detected with highest precision.<br />Relationships between persons/groups are difficult to detect.<br />
    32. 32. 3. How do the strategies perform for discovering relationships which have temporal constraints?<br />
    33. 33. Relationships with temporal constraints<br />Tweet-based strategy performs better in discovering relationships that are valid only for a specific period in time<br />
    34. 34. Where do relationships emerge faster?<br />Speed of strategies is domain-dependent<br />time difference (in days) of first occurrence of relationship<br />News is faster<br />Twitter is faster<br />
    35. 35. Conclusions and Future Work<br />What we did: relation discovery framework based on Twitter<br />Findings:<br />Strategy that considers both tweets and (linked) news articles allows for highest accuracy<br />Performance varies for different domains (e.g. event-relationships can be detected with highest precision)<br />Tweet-based strategy allows for detecting relationships, which have a restricted temporal validity, with high precision (and fast)<br />Ongoing work: Adaptive Faceted Search on Twitter<br />http://wis.ewi.tudelft.nl/tweetum/<br />
    36. 36. Relation Discovery for Adaptive Faceted Search<br />Current Query:<br />2. Analyze (temporal)relationships of entities that appear in the user profile to adapt facet ranking.<br />Expand Query:<br />Results:<br />Yskiddd: Next saturday@thatsimpsonguy aka Guilty Simpson will be performing at Area51 in my homeytown Eindhoven. #realliveshit#iwillspinrecords2<br />Usee123: Cool #EV3door7980 !!! http://bit.ly/igyyRhL <br />sanmiquelmusic: This Saturday I'm joining @KrusadersMusic to Intents<br />Eindhoven<br />Music<br />Locations more...<br />Events more...<br />Music Artists:<br />+ Guilty Simpson<br />+ Bryan Adams<br />+ Elton John<br />+ Golden Earring<br />+ Rihanna<br />+ The eagles<br />+ 3 Doors Down<br />more...<br />1. Analyze (temporal)relationships of entities of the “current query” to adapt facet ranking.<br />user<br />
    37. 37. Thank you!<br />IlknurCelik, Fabian Abel, Geert-Jan Houben<br />Twitter: @persweb<br />http://wis.ewi.tudelft.nl/tweetum/<br />
    38. 38. The Social Web<br />Help me to tackle the information overload!<br /> Who is this? What are his <br />personal demands? How <br />can we make him happy?<br />Recommend me news articles that now interest me!<br />Help me to find interesting (social) media!<br />Give me personalized support when I do my online training!<br />Personalize my Web experience!<br />Do not bother me with advertisements that are not interesting for me!<br />

    ×