Microblogs have become an invaluable source of information for online reputation management. Streams of microblogs
are of great value because of their direct and real-time nature. An emerging problem is to identify not only microblog posts (such as tweets) that are relevant for a given entity, but also the specific aspects that people discuss. Determining such aspects can be non-trivial
because of creative language usage, the highly contextualized and informal nature of microblog posts, and the limited length of this form
of communication. In this paper we present two manually annotated corpora to evaluate the task of identifying aspects on Twitter, both
of them based upon the WePS-3 ORM task dataset and made available online. The first is created using a pooling methodology, for
which we have implemented various methods for automatically extracting aspects from tweets that are relevant for an entity. Human
assessors have labeled each candidate aspect as relevant or not. The second corpus is more fine-grained and contains opinion targets.
Here, annotators consider individual tweets related to an entity and manually identify whether the tweet is opinionated and, if so, which
part of the tweet is subjective and what the target of the sentiment is, if any.
WePS-3 ORM Corpus for Entity Profiling in Microblog Posts
1. A Corpus for Entity Profiling in Microblog Posts
Damiano Spina
UNED NLP & IR Group, Madrid, Spain
Edgar Meij, Andrei Oghina, Minh T. Bui, Mathias Breuss, Maarten de Rijke
ISLA, University of Amsterdam, Amsterdam, The Netherlands
LREC Workshop on Language Engineering for Online Reputation Management
May 26th, 2012 - Istanbul, Turkey
2. Introduction
• Online Reputation Management
– Public image of an entity in Online Media
– Entity = { brand, organization, company, person, product }
• Microblogging services (e.g. Twitter)
– People sharing thoughts about an entity
– Dynamic, Real-Time
• Human Language Technologies
– Aid to reputation managers
– Retrieval and Analysis of entity mentions
3. Sentiment vs. Profiling
• Sentiment analysis
• Entity Profiling
– “hot” topics that people talk about in the context of an entity
4. Our task: Aspect identification
• @xbox_news here we go again,
microsoft being jealous of sony again.
• I lov big Sony headphones .. I lov my #music 2 b
more beautiful
• not surprising that @graypowell was out and about -
he used to be a ’Field Verification & Operator
Acceptance Engineer’ at Sony
6. Goal
• Build manually annotated corpora
– Evaluate the task of entity profiling in microblog
streams
7. WePS-3 ORM Corpus
• Collection of tweets
• Disambiguated company names (e.g. apple fruit vs. Apple Inc.)
8. WePS-3 ORM Corpus
• Pooling: Aspects
• Tweet annotation: Opinion targets
10. Approach I: Pooling aspects
• Pooling methodology
– 4 Ranking Methods:
• TF.IDF [Salton and Buckley, 1988]
• Log-Likelihood Ratio [Dunning, 1993]
• Parsimonious Language Model [Hiemstra et al. 2004]
• Opinion target extraction using topic-specific subjective
lexicons [Jijkoun et al. 2010]
– Top 10 terms
• Manual annotation
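The pooling step above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes whitespace tokenization and a hypothetical background collection for the IDF statistics, and the names `tfidf_ranking` and `pool` are made up for this example. Only a simple TF.IDF ranker is shown; the other three methods would feed their own top-10 lists into `pool` in the same way.

```python
import math
from collections import Counter


def tfidf_ranking(entity_tweets, background_tweets, k=10):
    """Rank terms from the entity's tweets by a simple TF.IDF score (illustrative)."""
    # Term frequency, counted over all tweets about the entity.
    tf = Counter(tok for tweet in entity_tweets for tok in tweet.lower().split())
    # Document frequency, counted over entity tweets plus the background tweets.
    docs = [set(t.lower().split()) for t in entity_tweets + background_tweets]
    n_docs = len(docs)
    df = Counter(tok for doc in docs for tok in doc)
    scores = {tok: freq * math.log(n_docs / df[tok]) for tok, freq in tf.items()}
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def pool(rankings, k=10):
    """Merge the top-k candidates of each ranking, keeping first-seen order."""
    pooled, seen = [], set()
    for ranking in rankings:
        for term in ranking[:k]:
            if term not in seen:
                seen.add(term)
                pooled.append(term)
    return pooled
```

The pooled list is what human assessors would then judge, so each candidate is shown once even if several methods propose it.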
13. Approach II: Tweet annotation
• Opinion targets dataset
• Tweet-level annotation
– Is the tweet subjective?
• Phrase-level annotation
– Subjective phrase
– Opinion target phrase p:
• p is an aspect of the entity
• p is included in a sentence that contains a direct subjective
phrase
• p is the target of the expressed opinion
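One possible way to represent a record in the opinion targets dataset is sketched below. The field names and the `is_consistent` helper are assumptions made for illustration; they do not describe the released file format. The three criteria for p listed above are human judgments, so the helper only checks that the phrase-level labels agree with the tweet-level label and occur in the tweet text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TweetAnnotation:
    text: str
    is_subjective: bool                      # tweet-level judgment
    subjective_phrase: Optional[str] = None  # phrase-level: the opinionated span
    opinion_target: Optional[str] = None     # phrase-level: target phrase p, if any


def is_consistent(a: TweetAnnotation) -> bool:
    """Check that phrase-level labels match the tweet-level label and the text."""
    if not a.is_subjective:
        # A non-subjective tweet should carry no phrase-level labels.
        return a.subjective_phrase is None and a.opinion_target is None
    # A subjective tweet needs a subjective phrase found in the text.
    if a.subjective_phrase is None or a.subjective_phrase not in a.text:
        return False
    # The opinion target is optional, but if present it must occur in the text.
    return a.opinion_target is None or a.opinion_target in a.text


# Illustrative labels only, not taken from the corpus.
example = TweetAnnotation(
    text="I lov big Sony headphones .. I lov my #music 2 b more beautiful",
    is_subjective=True,
    subjective_phrase="I lov",
    opinion_target="big Sony headphones",
)
```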