A Corpus for Entity Profiling in Microblog Posts

Damiano Spina
UNED NLP & IR Group, Madrid, Spain

Edgar Meij, Andrei Oghina, Minh T. Bui, Mathias Breuss, Maarten de Rijke
ISLA, University of Amsterdam, Amsterdam, The Netherlands

LREC Workshop on Language Engineering for Online Reputation Management
May 26th, 2012 - Istanbul, Turkey
Introduction
• Online Reputation Management
   – Public image of an entity in Online Media
   – Entity = { brand, organization, company, person, product }
• Microblogging services (e.g. Twitter)
   – People sharing thoughts about an entity
   – Dynamic, Real-Time
• Human Language Technologies
   – Aid to reputation managers
   – Retrieval and Analysis of entity mentions
Sentiment vs. Profiling
• Sentiment analysis




• Entity Profiling
   – “hot” topics that people talk about in the context of an entity
Our task: Aspect identification
• @xbox_news here we go again,
  microsoft being jealous of sony again.

• I lov big Sony headphones .. I lov my #music 2 b
   more beautiful

• not surprising that @graypowell was out and about -
  he used to be a ’Field Verification & Operator
  Acceptance Engineer’ at Sony
Goal

• Build manually annotated corpora

  – Evaluate the task of entity profiling in microblog
    streams
A Corpus for Entity Profiling in Microblog Posts
• WePS-3 ORM Corpus
  – Collection of tweets
  – Disambiguated company names (e.g. apple fruit vs. Apple Inc.)
• Two datasets built on top of it
  – Pooling: Aspects
  – Tweet annotation: Opinion targets
Approach I: Pooling aspects
• Pooling methodology
  – 4 Ranking Methods:
    •   TF.IDF [Salton and Buckley, 1988]
    •   Log-Likelihood Ratio [Dunning, 1993]
    •   Parsimonious Language Model [Hiemstra et al., 2004]
    •   Opinion target extraction using topic-specific subjective
        lexicons [Jijkoun et al., 2010]
  – Top 10 terms
• Manual annotation
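
The pooling step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it shows only a TF.IDF ranker with a smoothed IDF, and the names `tokenize`, `tfidf_ranking` and `pool`, the whitespace tokenizer, and the choice of background collection are all assumptions made for the example. In the setup described above, four rankers would each contribute their top 10 terms, and the pooled terms would then go to manual annotation.

```python
import math
from collections import Counter


def tokenize(tweet):
    # Naive whitespace tokenizer (assumption); real tweets need hashtag/mention handling.
    return tweet.lower().split()


def tfidf_ranking(entity_tweets, background_tweets, k=10):
    """Top-k terms for one entity, scored by term frequency times a smoothed IDF."""
    tf = Counter(t for tweet in entity_tweets for t in tokenize(tweet))
    df = Counter(t for tweet in background_tweets for t in set(tokenize(tweet)))
    n_docs = len(background_tweets)
    scores = {t: f * math.log((n_docs + 1) / (df[t] + 1)) for t, f in tf.items()}
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def pool(rankings):
    """Merge several top-k lists into one duplicate-free pool, keeping first-seen order."""
    pooled = []
    for ranking in rankings:
        for term in ranking:
            if term not in pooled:
                pooled.append(term)
    return pooled
```

Pooling the union of several rankers' top terms keeps the manual annotation effort bounded while reducing the bias of any single ranking method.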
Aspects dataset: annotation example
Aspects dataset: outcome
• Three annotators, substantial agreement
  (> 0.6 Cohen/Fleiss’ kappa)


• 94 entities, 17775 tweets, ≈177 tweets/entity

• 2455 terms, 1304 aspects (54.11%)
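
For reference, the two agreement statistics named above can be computed as in the sketch below. It uses the standard textbook definitions of Cohen's kappa (two annotators) and Fleiss' kappa (a fixed number of annotators per item); the function names and the input layout are assumptions, and the slides do not say how the reported values were obtained. Values above 0.6 are conventionally read as "substantial" agreement on the Landis and Koch scale.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability that both pick the same label independently.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / (n * n)
    if expected == 1:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


def fleiss_kappa(ratings):
    """Fleiss' kappa; `ratings` holds one list of labels per item, same length each."""
    n_items, n_raters = len(ratings), len(ratings[0])
    categories = sorted({label for item in ratings for label in item})
    counts = [[item.count(c) for c in categories] for item in ratings]
    # Share of all assignments that went to each category.
    p_j = [sum(row[j] for row in counts) / (n_items * n_raters)
           for j in range(len(categories))]
    # Per-item agreement among rater pairs.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    p_bar = sum(p_i) / n_items
    p_e = sum(p * p for p in p_j)
    if p_e == 1:
        return 1.0  # only one category was ever used
    return (p_bar - p_e) / (1 - p_e)
```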
Approach II: Tweet annotation
• Opinion targets dataset
• Tweet-level annotation
  – Is the tweet subjective?
• Phrase-level annotation
  – Subjective phrase
  – Opinion target phrase p:
     • p is an aspect of the entity
     • p is included in a sentence that contains a direct subjective
       phrase
     • p is the target of the expressed opinion
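
One way to represent this annotation scheme in code is sketched below. The class and field names are hypothetical, and treating the three conditions on p as a conjunction is an assumption about how the guidelines are meant to be read; the example labels in the usage note are illustrative and are not taken from the corpus.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PhraseCandidate:
    text: str
    is_aspect_of_entity: bool
    in_sentence_with_subjective_phrase: bool
    is_target_of_opinion: bool

    def is_opinion_target(self) -> bool:
        # Assumption: all three conditions must hold at once.
        return (self.is_aspect_of_entity
                and self.in_sentence_with_subjective_phrase
                and self.is_target_of_opinion)


@dataclass
class TweetAnnotation:
    tweet: str
    entity: str
    is_subjective: bool  # tweet-level judgment
    subjective_phrases: List[str] = field(default_factory=list)  # phrase-level
    candidates: List[PhraseCandidate] = field(default_factory=list)

    def opinion_targets(self) -> List[str]:
        return [c.text for c in self.candidates if c.is_opinion_target()]
```

For instance, an annotator might mark "lov" as the subjective phrase and "headphones" as the opinion target in the Sony headphones tweet shown earlier; `opinion_targets()` would then return `["headphones"]`.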
Opinion targets dataset: annotation example
Opinion targets dataset: outcome
• 59 entities, 9396 tweets, ≈159 tweets/entity

• 15.16% of tweets with subjective phrases

• 13.82% of tweets with opinion targets
Aspects vs. Opinion targets
[Venn diagram: "Aspects" vs. "Terms in Opinion Targets"; counts shown: 783, 270, 1650; percentages shown: 12.67%, 26.69%]
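
A comparison of this kind can be sketched with plain set operations. The function name and the toy term sets in the usage note are hypothetical; the slide does not state which ratio each percentage corresponds to, so the sketch reports the shared terms relative to both sets without claiming to reproduce the figures above.

```python
def overlap_report(aspects, target_terms):
    """Counts for the three Venn regions plus the shared share of each set."""
    shared = aspects & target_terms
    return {
        "aspects_only": len(aspects - target_terms),
        "shared": len(shared),
        "targets_only": len(target_terms - aspects),
        # Guard against empty inputs to avoid division by zero.
        "shared_over_aspects": len(shared) / len(aspects) if aspects else 0.0,
        "shared_over_targets": len(shared) / len(target_terms) if target_terms else 0.0,
    }
```

With the toy sets `{"headphones", "vaio", "playstation"}` and `{"headphones", "price"}`, one term is shared, two are aspects only and one appears only in opinion targets.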
A Corpus for Entity Profiling in Microblog Posts
• Available at http://bitly.com/profilingTwitter
• Aspects dataset (pooling)
  – 94 entities, 17,775 tweets, ≈177 tweets/entity
  – 2455 terms, 1304 aspects (54.11%)
• Opinion targets dataset (tweet annotation)
  – 59 entities, 9,396 tweets, ≈159 tweets/entity
  – 15.16% of tweets with subjective phrases
  – 13.82% of tweets with opinion targets
• Both built on the WePS-3 ORM Corpus
