A Corpus for Entity Profiling in Microblog Posts

Microblogs have become an invaluable source of information for the purpose of online reputation management. Streams of microblogs
are of great value because of their direct and real-time nature. An emerging problem is to identify not only microblog posts (such as tweets) that are relevant for a given entity, but also the specific aspects that people discuss. Determining such aspects can be non-trivial
because of creative language usage, the highly contextualized and informal nature of microblog posts, and the limited length of this form
of communication. In this paper we present two manually annotated corpora to evaluate the task of identifying aspects on Twitter, both
of them based upon the WePS-3 ORM task dataset and made available online. The first is created using a pooling methodology, for
which we have implemented various methods for automatically extracting aspects from tweets that are relevant for an entity. Human
assessors have labeled each of the candidates as relevant or not. The second corpus is more fine-grained and contains opinion targets.
Here, annotators consider individual tweets related to an entity and manually identify whether the tweet is opinionated and, if so, which
part of the tweet is subjective and what the target of the sentiment is, if any.


Transcript

  • 1. A Corpus for Entity Profiling in Microblog Posts
      – Damiano Spina (UNED NLP & IR Group, Madrid, Spain)
      – Edgar Meij, Andrei Oghina, Minh T. Bui, Mathias Breuss, Maarten de Rijke (ISLA, University of Amsterdam, Amsterdam, The Netherlands)
      – LREC Workshop on Language Engineering for Online Reputation Management, May 26th, 2012, Istanbul, Turkey
  • 2. Introduction
      • Online Reputation Management
          – Public image of an entity in online media
          – Entity = { brand, organization, company, person, product }
      • Microblogging services (e.g. Twitter)
          – People sharing thoughts about an entity
          – Dynamic, real-time
      • Human Language Technologies
          – Aid to reputation managers
          – Retrieval and analysis of entity mentions
  • 3. Sentiment vs. Profiling
      • Sentiment analysis
      • Entity Profiling
          – “hot” topics that people talk about in the context of an entity
  • 4. Our task: Aspect identification
      • @xbox_news here we go again, microsoft being jealous of sony again.
      • I lov big Sony headphones .. I lov my #music 2 b more beautiful
      • not surprising that @graypowell was out and about - he used to be a ’Field Verification & Operator Acceptance Engineer’ at Sony
  • 6. Goal
      • Build manually annotated corpora
          – Evaluate the task of entity profiling in microblog streams
  • 7. A Corpus for Entity Profiling in Microblog Posts
      • WePS-3 ORM Corpus
          – Collection of tweets
          – Disambiguated company names (e.g. apple fruit vs. Apple Inc.)
  • 8. A Corpus for Entity Profiling in Microblog Posts
      • WePS-3 ORM Corpus → Pooling → Aspects
      • WePS-3 ORM Corpus → Tweet annotation → Opinion targets
  • 10. Approach I: Pooling aspects
      • Pooling methodology
          – 4 ranking methods:
              • TF.IDF [Salton and Buckley, 1988]
              • Log-Likelihood Ratio [Dunning, 1993]
              • Parsimonious Language Model [Hiemstra et al., 2004]
              • Opinion target extraction using topic-specific subjective lexicons [Jijkoun et al., 2010]
          – Top 10 terms
      • Manual annotation
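The pooling step on slide 10 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tokenizer, the function names, and the use of raw term frequency as a second ranker (standing in for the log-likelihood, parsimonious language model, and lexicon-based methods) are all assumptions made for illustration. Each ranker returns its top-k terms for an entity, and the pool is the duplicate-free union that annotators would then judge.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, keep purely alphabetic tokens.

    Assumption: this crude filter drops @mentions, #hashtags and tokens
    with punctuation; a real tweet tokenizer would be more careful.
    """
    return [tok for tok in text.lower().split() if tok.isalpha()]


def tfidf_ranking(entity_tweets, background_tweets, k=10):
    """Top-k terms of the entity's tweets by TF.IDF.

    TF is counted over the entity's tweets; IDF is computed over the
    entity and background tweets together, one tweet per document.
    """
    docs = [set(tokenize(t)) for t in entity_tweets + background_tweets]
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in doc)
    tf = Counter(term for t in entity_tweets for term in tokenize(t))
    scores = {term: count * math.log(n_docs / df[term])
              for term, count in tf.items()}
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda term: (-scores[term], term))[:k]


def frequency_ranking(entity_tweets, k=10):
    """Hypothetical stand-in for the other rankers: raw term frequency."""
    tf = Counter(term for t in entity_tweets for term in tokenize(t))
    return sorted(tf, key=lambda term: (-tf[term], term))[:k]


def pool(rankings):
    """Union of several top-k lists, in first-seen order, no duplicates."""
    pooled, seen = [], set()
    for ranking in rankings:
        for term in ranking:
            if term not in seen:
                seen.add(term)
                pooled.append(term)
    return pooled
```

With the slides' setting of k = 10 and four rankers, the pool for one entity holds at most 40 candidate terms, which keeps the manual annotation effort bounded.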
  • 11. Aspects dataset: annotation example
  • 12. Aspects dataset: outcome
      • Three annotators, substantial agreement (> 0.6 Cohen/Fleiss’ kappa)
      • 94 entities, 17,775 tweets, ≈177 tweets/entity
      • 2,455 terms, 1,304 aspects (54.11%)
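To make the agreement figure on slide 12 concrete, here is a small sketch of Cohen's kappa for a pair of annotators. The slides do not say how the three annotators' judgments were combined, so the pairwise setting, the function name, and the toy labels are assumptions for illustration only. Kappa compares the observed agreement with the agreement expected by chance from each annotator's label distribution.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Fraction of items on which both annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from the two annotators' marginal distributions.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[lab] * freq_b[lab] for lab in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)
```

For example, two annotators who agree on 8 of 10 candidate terms, each marking half of them as relevant, have an observed agreement of 0.8 against a chance agreement of 0.5, giving a kappa of 0.6.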
  • 13. Approach II: Tweet annotation
      • Opinion targets dataset
      • Tweet-level annotation
          – Is the tweet subjective?
      • Phrase-level annotation
          – Subjective phrase
          – Opinion target phrase p:
              • p is an aspect of the entity
              • p is included in a sentence that contains a direct subjective phrase
              • p is the target of the expressed opinion
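The two annotation levels on slide 13 can be pictured as one record per tweet. The sketch below is only an illustration: the class name, the field names, and the example labels are hypothetical and do not reflect the released corpus format. The helper computes the kind of share reported for the dataset, such as the fraction of tweets with an opinion target.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class TweetAnnotation:
    """Hypothetical record for one annotated tweet (not the corpus schema)."""
    entity: str
    text: str
    is_subjective: bool                      # tweet-level judgment
    subjective_phrase: Optional[str] = None  # phrase-level: opinion expression
    opinion_target: Optional[str] = None     # phrase-level: target phrase p


def share(annotations: List[TweetAnnotation],
          predicate: Callable[[TweetAnnotation], bool]) -> float:
    """Fraction of annotated tweets for which the predicate holds."""
    if not annotations:
        raise ValueError("no annotations given")
    return sum(1 for a in annotations if predicate(a)) / len(annotations)


# Labels invented for illustration; they are not taken from the corpus.
examples = [
    TweetAnnotation("Sony", "I lov big Sony headphones", True,
                    subjective_phrase="lov", opinion_target="headphones"),
    TweetAnnotation("Sony", "he used to be an engineer at Sony", False),
]
with_target = share(examples, lambda a: a.opinion_target is not None)
```

Keeping the phrase-level fields optional reflects the slide's structure: a tweet can be subjective without containing an opinion target that satisfies all three conditions.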
  • 14. Opinion Targets dataset: annotation example
  • 15. Opinion targets dataset: outcome
      • 59 entities, 9,396 tweets, ≈159 tweets/entity
      • 15.16% of tweets with subjective phrases
      • 13.82% of tweets with opinion targets
  • 16. Aspects vs. Opinion targets
      • [Venn diagram: terms in opinion targets vs. aspects; region counts 783, 270, 1650]
  • 17. Aspects vs. Opinion targets
      • [Venn diagram: terms in opinion targets vs. aspects; region counts 783, 270, 1650; percentages shown: 12.67%, 26.69%]
  • 18. A Corpus for Entity Profiling in Microblog Posts
      • Available at http://bitly.com/profilingTwitter
      • WePS-3 ORM Corpus → Tweet annotation → Opinion targets dataset
          – 59 entities, 9,396 tweets, ≈159 tweets/entity
          – 15.16% of tweets with subj. phrases
          – 13.82% of tweets with opinion targets
      • WePS-3 ORM Corpus → Pooling → Aspects dataset
          – 94 entities, 17,775 tweets, ≈177 tweets/entity
          – 2,455 terms, 1,304 aspects (54.11%)