Learning Semantic Relationships between Entities in Twitter
Upcoming SlideShare
Loading in...5
×

Like this? Share it with your network

Share

Learning Semantic Relationships between Entities in Twitter

  • 2,040 views
Uploaded on

Slides presented at ICWE 2011: Learning Semantic Relationships between Entities in Twitter...

Slides presented at ICWE 2011: Learning Semantic Relationships between Entities in Twitter

Supporting web site: http://wis.ewi.tudelft.nl/icwe2011/relation-learning/

More in: Technology
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
No Downloads

Views

Total Views
2,040
On Slideshare
1,639
From Embeds
401
Number of Embeds
3

Actions

Shares
Downloads
33
Comments
0
Likes
1

Embeds 401

http://wis.ewi.tudelft.nl 319
http://www.wis.ewi.tudelft.nl 79
http://translate.googleusercontent.com 3

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide
  • Motivation:Information overloadPersonalised “better” search
  • Why do people search on Twitter rather than Google?Real time info & opinion about almost anything
  • Example: HT’11 @Eindhoven, looking for some entertainment events...http://search.twitter.com/http://search.twitter.com/advanced
  • Space limitation + selecting keywords (abbreviations –shorthand notations + colloquial expressions)
  • Highlight 60
  • Very time consuming and overwhelming indeed!
  • entity extraction and semantic enrichment and relation discovery.
  • large dataset of more than 10 million tweets and 70,000 news articles
  • 100-1000 news articles per day50,000 and 400,000 tweets per dayTwo of the minima were caused by temporary unavailability of the Twitter monitoring service.
  • approximately 10,000-100,000 entity references per day for tweetsapproximately 5,000-20,000 entity references per day for News~40% of the tweets had no entities99.3% of the news articles had at least one entityoverlap of entities: 72.6% of the top 1000 mentioned entities in Twitter are also mentioned in the news media.
  • 39 different entity typesPersons, locations and organizations were mentioned most often, followed by movies, music albums, sport events and political eventswe analyzed specific types of relations such as relationships between persons and locations or organizations and events in detail
  • Person/Group-Event relationships cover relations between persons and political events, persons and sport events, organizations and sport events, etcinteresting to see that the Tweet+News-based strategy discovers relationships, between persons/groups and events with higher precision 0.92 and 0.87 regarding P@10 and P@20 than people's relations to products (0.23 and 0.26) or locations (0.73 and 0.6).
  • relations between entities that are of the same typerelationships between two events can also be discovered with high precision, followed by relations between locations...
  • Twitter is more appropriate for inferring relationships, which have temporal constraints, than the news media.Tweet-based strategy improves precision (P@5) by 22.7%
  • relationships between persons and movies or music albums emerge much faster (14.7 and 5.1 days respectively) in Twitter than in the traditional news media.
  • Our framework extracts typed entities from enriched tweets/news and provides strategies for detecting semantic (trending) relationships between entities. We:investigated the precision and recall of the relation detection strategies,analyzed how the strategies perform for each type of relationships andWhich strategy performs best in detecting relationships between entities?Does the accuracy depend on the type of entities which are involved in a relation?How do the strategies perform for discovering relationships which have temporal constraints, and how fast can the strategies detect (trending) relationships?evaluated the quality and speed for discovering trending relationships that possibly have a limited temporal validity.
  • Very time consuming and overwhelming indeed!

Transcript

  • 1. Learning Semantic Relationships between Entities in Twitter
    ICWE, Cyprus, June 22, 2011
    IlknurCelik, Fabian Abel, Geert-Jan Houben
    Web Information Systems, TU Delft
  • 2. What we do: Science and Engineering for the Personal Web
    domains: news social mediacultural heritage public datae-learning
    Personalized
    Recommendations
    Personalized Search
    Adaptive Systems
    Analysis and
    User Modeling
    Semantic Enrichment, Linkage and Alignment
    user/usage data
    Social Web
  • 3. 60,000,000
    number of tweets published per day
  • 4. 1
    number of tweets per day that are interesting for me
  • 5. Searching on Twitter
  • 6. Issues with Multiple Keywords Search
  • 7. Let’s try to search with One Keyword
  • 8. Page 1
  • 9. Page 2
  • 10. Page 3
  • 11. Page 60!!
    Music Artist
    Next Saturday @thatsimpsonguyaka Guilty Simpson will be performing at
    Area51 in my hometwonEindhoven. #realliveshit #iwillspinrecords
    about 9 hours ago via Blackberry
    tweet I was
    looking for
    Locations
  • 12. Is there an easier way?Faceted Search can help
    Current Query:
    Expand Query:
    Results:
    Yskiddd: Next saturday@thatsimpsonguy aka Guilty Simpson will be performing at Area51 in my homeytown Eindhoven. #realliveshit#iwillspinrecords2
    Usee123: Cool #EV3door7980 !!! http://bit.ly/igyyRhL
    sanmiquelmusic: This Saturday I'm joining @KrusadersMusic to Intents
    Eindhoven
    Music
    Locations more...
    Events more...
    Music Artists:
    + Guilty Simpson
    + Bryan Adams
    + Elton John
    + Golden Earring
    + Rihanna
    + The eagles
    + 3 Doors Down
    more...
  • 13. Location:
    Eindhoven
    Music Artist:
    Guilty Simpson
    Location:
    Area51
    Semantic relationships between entities are essential to realize such applications.
  • 14. Relation Discovery Framework
    Relation Discovery Framework
    temporal
    constraints
    relation
    type
    typed relations
    microblog
    posts
    Entity extraction &
    semantic enrichment
    Location A
    Person A
    Location A
    Person A
    isLocatedIn
    Relation
    discovery
    Location B
    Group A
    Person A
    Group A
    involvedIn
    news articles
    Event A
    weighting
    scheme
    source
    selection
    Applications
    • Browsing support
    • 15. Query suggestions
    • 16. Schema enrichment
  • Entity Extraction and Semantic Enrichment
    powered by
    Julian Assange
    @bob: JulianAssange got arrested http://bit.ly/5d4r2t
    Julian Assange
    Tweet-based
    enrichment
    Julian Assange
    Julian Assange
    JulianAssangearrested
    JulianAssange, the founder of
    WikiLeaks, is under arrest in
    London…
    News-based
    enrichment
    Julian Assange
    London
    WikiLeaks
    London
    WikiLeaks
  • 17. Relation Learning Strategies
    entities
    time period
    Relation:
    relation(e1, e2, type, tstart, tend, weight)
    RelationLearningstrategy:
    Input:entity e1 and e2, time period (tstart, tend)
    Challenge:inferweightand type of the relationfor the given
    Weightingaccording to co-occurrence frequency:
    Tweet-based: count co-occurrence in tweets
    News-based: count co-occurrence in news
    Tweet-News-based: count co-occurrence in both tweets and news
    type/label of relation
    relatedness
  • 18. Research Questions
    Which strategy performs best in detecting relationships between entities?
    Does the accuracy depend on the type of entities which are involved in a relation?
    How do the strategies perform for discovering relationships which have temporal constraints (trending relationships)?
  • 19. Dataset
    more than:
    20,000
    Twitter users
    2
    months
    10,000,000
    WikiLeaks founder, Julian Assange, under arrest in London
    tweets
    75,000
    news
    time
    Dec 15
    Jan 15
    Nov 15
  • 20. Dataset Characteristics
  • 21. Tweets and news articles per day
    50,000-400,000
    tweets per day
    100-1000
    news articles
    per day
  • 22. Entities referenced per day
    10,000-100,000
    entity ref.
    in tweets
    per day
    5,000-20,000
    entity ref.
    in news
    per day
    ~40% tweets do not mention any (recognizable) entity
    72.6% of the top 1000 mentioned entities in Twitter are also mentioned in the mainstream news media
    99.3% of the news articles mention at least one (recognizable) entity
  • 23. Number of Distinct Entities per Entity Types
    39 types of entities
  • 24. Performance of Relation Learning Strategies
  • 25. Our Ground Truth of true relations
    Based on DBpedia:
    We mapped entities to their corresponding DBpedia resources
    No appropriate DBpedia URIs for more than 35% of the entities
    We analyzed whether there is a direct relation between two entities
    Based on user study:
    Participants judged whether two entities are really:
    related (62.6% were rated as related)
    related in the given time period (57.3% were rated as related)
    Overall: 2588 judgments
    Thank you!
  • 26. 1. Which strategy performs best in detecting relationships between entities?
  • 27. Accuracy of relation discovery
    Combining both tweet-based and news-based strategies allows for highest accuracy
    Based on user study
    Based on DBpedia
  • 28. F-Measure@k
    Combined strategy (and news-based) increase in performance.
    Tweet-based strategy saturates quickly
  • 29. 2. Does the accuracy depend on the type of entities which are involved in a relation?
  • 30. Does the accuracy depend on the type of entities?
    87%
    precision
    92%
    Relationships which involve events can be discovered with high precision
    26%
    precision
    23%
  • 31. Does the accuracy depend on the type of entities? (cont.)
    Relationships between events can be detected with highest precision.
    Relationships between persons/groups are difficult to detect.
  • 32. 3. How do the strategies perform for discovering relationships which have temporal constraints?
  • 33. Relationships with temporal constraints
    Tweet-based strategy performs better in discovering relationships that are valid only for a specific period in time
  • 34. Where do relationships emerge faster?
    Speed of strategies is domain-dependent
    time difference (in days) of first occurrence of relationship
    News is faster
    Twitter is faster
  • 35. Conclusions and Future Work
    What we did: relation discovery framework based on Twitter
    Findings:
    Strategy that considers both tweets and (linked) news articles allows for highest accuracy
    Performance varies for different domains (e.g. event-relationships can be detected with highest precision)
    Tweet-based strategy allows for detecting relationships, which have a restricted temporal validity, with high precision (and fast)
    Ongoing work: Adaptive Faceted Search on Twitter
    http://wis.ewi.tudelft.nl/tweetum/
  • 36. Relation Discovery for Adaptive Faceted Search
    Current Query:
    2. Analyze (temporal)relationships of entities that appear in the user profile to adapt facet ranking.
    Expand Query:
    Results:
    Yskiddd: Next saturday@thatsimpsonguy aka Guilty Simpson will be performing at Area51 in my homeytown Eindhoven. #realliveshit#iwillspinrecords2
    Usee123: Cool #EV3door7980 !!! http://bit.ly/igyyRhL
    sanmiquelmusic: This Saturday I'm joining @KrusadersMusic to Intents
    Eindhoven
    Music
    Locations more...
    Events more...
    Music Artists:
    + Guilty Simpson
    + Bryan Adams
    + Elton John
    + Golden Earring
    + Rihanna
    + The eagles
    + 3 Doors Down
    more...
    1. Analyze (temporal)relationships of entities of the “current query” to adapt facet ranking.
    user
  • 37. Thank you!
    IlknurCelik, Fabian Abel, Geert-Jan Houben
    Twitter: @persweb
    http://wis.ewi.tudelft.nl/tweetum/
  • 38. The Social Web
    Help me to tackle the information overload!
    Who is this? What are his
    personal demands? How
    can we make him happy?
    Recommend me news articles that now interest me!
    Help me to find interesting (social) media!
    Give me personalized support when I do my online training!
    Personalize my Web experience!
    Do not bother me with advertisements that are not interesting for me!