A Corpus for Entity Profiling in Microblog Posts

Damiano Spina
UNED NLP & IR Group, Madrid, Spain

Edgar Meij, Andrei Oghina, Minh T. Bui, Mathias Breuss, Maarten de Rijke
ISLA, University of Amsterdam, Amsterdam, The Netherlands

LREC Workshop on Language Engineering for Online Reputation Management
May 26th, 2012 - Istanbul, Turkey
Introduction
• Online Reputation Management
   – Public image of an entity in Online Media
   – Entity = { brand, organization, company, person, product }
• Microblogging services (e.g. Twitter)
   – People sharing thoughts about an entity
   – Dynamic, Real-Time
• Human Language Technologies
   – Aid to reputation managers
   – Retrieval and Analysis of entity mentions
Sentiment vs. Profiling
• Sentiment analysis




• Entity Profiling
   – “hot” topics that people talk about in the context of an entity
Our task: Aspect identification
• @xbox_news here we go again,
  microsoft being jealous of sony again.

• I lov big Sony headphones .. I lov my #music 2 b
   more beautiful

• not surprising that @graypowell was out and about -
  he used to be a ’Field Verification & Operator
  Acceptance Engineer’ at Sony
Goal

• Build manually annotated corpora

  – To evaluate the task of entity profiling in microblog
    streams
A Corpus for Entity Profiling in Microblog Posts
• WePS-3 ORM Corpus
   – Collection of tweets
   – Disambiguated company names
     (e.g. apple fruit vs. Apple Inc.)
A Corpus for Entity Profiling in Microblog Posts
• WePS-3 ORM Corpus
   – Pooling: Aspects
   – Tweet annotation: Opinion targets
Approach I: Pooling aspects
• Pooling methodology
  – 4 Ranking Methods:
    •   TF.IDF [Salton and Buckley, 1988]
    •   Log-Likelihood Ratio [Dunning, 1993]
    •   Parsimonious Language Model [Hiemstra et al., 2004]
    •   Opinion target extraction using topic-specific subjective
        lexicons [Jijkoun et al., 2010]
  – Top 10 terms
• Manual annotation
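As an illustration of the pooling step, the sketch below scores candidate terms for one entity with two of the four listed rankers, TF.IDF and a log-likelihood ratio in the G² form commonly used for Dunning's test, and pools the union of each ranker's top k terms for manual annotation. It is a minimal sketch under stated assumptions, not the authors' implementation: tweets are assumed to be pre-tokenised lists of lowercase terms, the background corpus is assumed to be other entities' tweets, and all function names are hypothetical. The parsimonious language model and the opinion-target extractor are omitted.

```python
import math
from collections import Counter


def tfidf_scores(entity_docs, all_docs):
    """TF.IDF variant (assumed): term frequency over the entity's tweets times
    log(N / df), with df counted over the whole collection.
    Assumes every entity tweet is also in all_docs, so df >= 1."""
    tf = Counter(t for doc in entity_docs for t in doc)
    df = Counter(t for doc in all_docs for t in set(doc))
    n = len(all_docs)
    return {t: f * math.log(n / df[t]) for t, f in tf.items()}


def llr_scores(entity_docs, background_docs):
    """Log-likelihood ratio (G^2) of each term's frequency in the entity
    tweets versus a background corpus; only terms that are relatively
    more frequent in the entity tweets are scored."""
    fg = Counter(t for doc in entity_docs for t in doc)
    bg = Counter(t for doc in background_docs for t in doc)
    n1, n2 = sum(fg.values()), sum(bg.values())
    scores = {}
    for t, a in fg.items():
        b = bg.get(t, 0)
        if a * n2 <= b * n1:  # not over-represented for this entity
            continue
        e1 = n1 * (a + b) / (n1 + n2)  # expected count in entity tweets
        e2 = n2 * (a + b) / (n1 + n2)  # expected count in background
        g2 = a * math.log(a / e1)
        if b:
            g2 += b * math.log(b / e2)
        scores[t] = 2 * g2
    return scores


def pool_top_k(rankings, k=10):
    """Union of the top-k terms of each ranking (ties broken alphabetically)."""
    pool = set()
    for scores in rankings:
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        pool.update(ranked[:k])
    return pool


# Hypothetical toy data: two tweets about one entity, three background tweets.
entity = [["sony", "headphones", "music"], ["sony", "xbox", "microsoft"]]
background = [["apple", "fruit"], ["weather", "today"], ["music", "concert"]]
pool = pool_top_k([tfidf_scores(entity, entity + background),
                   llr_scores(entity, background)], k=2)
print(sorted(pool))  # ['headphones', 'sony']
```

Pooling the union rather than a merged ranking means every ranker contributes candidates, so the annotated pool is not biased towards a single scoring function.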
Aspects dataset: annotation example
Aspects dataset: outcome
• Three annotators, substantial agreement
  (> 0.6 Cohen/Fleiss’ kappa)


• 94 entities, 17775 tweets, ≈177 tweets/entity

• 2455 terms, 1304 aspects (54.11%)
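The agreement figure can be illustrated with Fleiss' kappa, which extends chance-corrected agreement to more than two annotators (Cohen's kappa covers a single pair). The sketch below implements the textbook formula; the toy labels (A = aspect, N = not an aspect) are hypothetical and not drawn from the corpus, and it assumes every item was labelled by the same number of annotators and that more than one category occurs.

```python
from collections import Counter


def fleiss_kappa(labels_per_item):
    """Fleiss' kappa. labels_per_item[i] holds the label each annotator gave
    item i; every item must carry the same number n >= 2 of labels, and at
    least two distinct categories must occur overall."""
    N = len(labels_per_item)
    n = len(labels_per_item[0])
    counts = [Counter(item) for item in labels_per_item]
    categories = set().union(*counts)
    # p_j: share of all label assignments that went to category j
    p = {c: sum(cnt[c] for cnt in counts) / (N * n) for c in categories}
    # P_i: share of agreeing annotator pairs on item i
    P = [(sum(v * v for v in cnt.values()) - n) / (n * (n - 1)) for cnt in counts]
    P_bar = sum(P) / N                        # observed agreement
    P_e = sum(pj * pj for pj in p.values())   # agreement expected by chance
    return (P_bar - P_e) / (1 - P_e)


# Hypothetical judgements by three annotators on four candidate terms.
judgements = [["A", "A", "A"], ["N", "N", "N"], ["A", "A", "N"], ["N", "N", "N"]]
print(round(fleiss_kappa(judgements), 3))  # 0.657
```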
Approach II: Tweet annotation
• Opinion targets dataset
• Tweet-level annotation
  – Is the tweet subjective?
• Phrase-level annotation
  – Subjective phrase
  – Opinion target phrase p:
     • p is an aspect of the entity
     • p is included in a sentence that contains a direct subjective
       phrase
     • p is the target of the expressed opinion
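The three conditions on an opinion target phrase can be read as a conjunction over per-phrase annotations. The record types and field names below are a hypothetical schema for illustration, not the corpus' actual file format; annotator judgements are stored as booleans, and only direct subjective phrases are assumed to be recorded. Labelling "lov" as subjective and "big Sony headphones" as a target in the example is likewise an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Sentence:
    text: str
    # Direct subjective phrases marked by the annotator in this sentence.
    subjective_phrases: list[str] = field(default_factory=list)


@dataclass
class Phrase:
    text: str
    sentence: Sentence        # sentence that contains the phrase
    is_entity_aspect: bool    # annotator: p is an aspect of the entity
    is_opinion_target: bool   # annotator: p is what the opinion is about


def qualifies_as_opinion_target(p: Phrase) -> bool:
    """True only if all three conditions listed on the slide hold."""
    return (p.is_entity_aspect
            and len(p.sentence.subjective_phrases) > 0
            and p.is_opinion_target)


s = Sentence("I lov big Sony headphones", subjective_phrases=["lov"])
print(qualifies_as_opinion_target(Phrase("big Sony headphones", s, True, True)))  # True

neutral = Sentence("he used to be an engineer at Sony")  # no subjective phrase
print(qualifies_as_opinion_target(Phrase("engineer", neutral, True, True)))  # False
```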
Opinion targets dataset: annotation example
Opinion targets dataset: outcome
• 59 entities, 9396 tweets, ≈159 tweets/entity

• 15.16% of tweets with subjective phrases

• 13.82% of tweets with opinion targets
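Aspects vs. Opinion targets
• Venn diagram: Aspects vs. Terms in Opinion Targets
   – Aspects only: 783
   – Both: 270
   – Terms in opinion targets only: 1650
   – Percentages shown on the diagram: 12.67%, 26.69%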
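The three regions of such a diagram follow from set operations once both datasets are reduced to sets of terms. The sketch assumes, hypothetically, that a term is identified by an (entity, lowercased term) pair; the slide does not state how terms were normalised, and the toy data will not reproduce the 783 / 270 / 1650 counts.

```python
def _normalise(pairs):
    """Lower-case terms so that 'Headphones' and 'headphones' match."""
    return {(entity, term.lower()) for entity, term in pairs}


def venn_regions(aspect_terms, opinion_target_terms):
    """Sizes of the three Venn regions for two collections of (entity, term) pairs."""
    aspects = _normalise(aspect_terms)
    targets = _normalise(opinion_target_terms)
    return {
        "aspects_only": len(aspects - targets),
        "both": len(aspects & targets),
        "opinion_targets_only": len(targets - aspects),
    }


# Hypothetical toy data.
aspects = [("sony", "headphones"), ("sony", "ps3"), ("apple", "ipad")]
targets = [("sony", "Headphones"), ("apple", "battery")]
print(venn_regions(aspects, targets))
# {'aspects_only': 2, 'both': 1, 'opinion_targets_only': 1}
```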
A Corpus for Entity Profiling in Microblog Posts
• Available at http://bitly.com/profilingTwitter
• Aspects dataset (pooling over the WePS-3 ORM Corpus)
   – 94 entities, 17,775 tweets, ≈177 tweets/entity
   – 2455 terms, 1304 aspects (54.11%)
• Opinion targets dataset (tweet annotation over the WePS-3 ORM Corpus)
   – 59 entities, 9,396 tweets, ≈159 tweets/entity
   – 15.16% of tweets with subjective phrases
   – 13.82% of tweets with opinion targets

Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power Systems
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 

WePS-3 ORM Corpus for Entity Profiling in Microblog Posts

  • 1. A Corpus for Entity Profiling in Microblog Posts
    Damiano Spina (UNED NLP & IR Group, Madrid, Spain)
    Edgar Meij, Andrei Oghina, Minh T. Bui, Mathias Breuss, Maarten de Rijke (ISLA, University of Amsterdam, Amsterdam, The Netherlands)
    LREC Workshop on Language Engineering for Online Reputation Management, May 26th, 2012, Istanbul, Turkey
  • 2. Introduction
    • Online Reputation Management
      – Public image of an entity in online media
      – Entity = { brand, organization, company, person, product }
    • Microblogging services (e.g. Twitter)
      – People sharing thoughts about an entity
      – Dynamic, real-time
    • Human Language Technologies
      – Aid to reputation managers
      – Retrieval and analysis of entity mentions
  • 3. Sentiment vs. Profiling
    • Sentiment analysis
    • Entity profiling
      – “Hot” topics that people talk about in the context of an entity
  • 4. Our task: Aspect identification
    • @xbox_news here we go again, microsoft being jealous of sony again.
    • I lov big Sony headphones .. I lov my #music 2 b more beautiful
    • not surprising that @graypowell was out and about - he used to be a ’Field Verification & Operator Acceptance Engineer’ at Sony
  • 6. Goal
    • Build manually annotated corpora
      – To evaluate the task of entity profiling in microblog streams
  • 7. A Corpus for Entity Profiling in Microblog Posts
    • WePS-3 ORM Corpus
      – Collection of tweets
      – Disambiguated company names (e.g. apple the fruit vs. Apple Inc.)
  • 8. A Corpus for Entity Profiling in Microblog Posts
    [Diagram: WePS-3 ORM Corpus → Pooling → Aspects; WePS-3 ORM Corpus → Tweet annotation → Opinion targets]
  • 10. Approach I: Pooling aspects
    • Pooling methodology
      – 4 ranking methods:
        • TF.IDF [Salton and Buckley, 1988]
        • Log-likelihood ratio [Dunning, 1993]
        • Parsimonious language model [Hiemstra et al., 2004]
        • Opinion target extraction using topic-specific subjective lexicons [Jijkoun et al., 2010]
      – Top 10 terms
    • Manual annotation
  • 12. Aspects dataset: outcome
    • Three annotators, substantial agreement (Cohen's/Fleiss' kappa > 0.6)
    • 94 entities, 17,775 tweets, ≈177 tweets/entity
    • 2455 terms, 1304 aspects (54.11%)
  • 13. Approach II: Tweet annotation
    • Opinion targets dataset
    • Tweet-level annotation
      – Is the tweet subjective?
    • Phrase-level annotation
      – Subjective phrase
      – Opinion target phrase p, where:
        • p is an aspect of the entity
        • p is included in a sentence that contains a direct subjective phrase
        • p is the target of the expressed opinion
  • 14. Opinion Targets dataset: annotation example
  • 15. Opinion targets dataset: outcome
    • 59 entities, 9,396 tweets, ≈159 tweets/entity
    • 15.16% of tweets with subjective phrases
    • 13.82% of tweets with opinion targets
  • 16-17. Aspects vs. Opinion targets
    [Figure: Venn diagram comparing terms in opinion targets with aspects; region counts 783, 270 and 1650; percentages 12.67% and 26.69%]
  • 18. A Corpus for Entity Profiling in Microblog Posts
    • Available at http://bitly.com/profilingTwitter
    • Both datasets are built from the WePS-3 ORM Corpus:
      – Opinion targets dataset (tweet annotation): 59 entities, 9,396 tweets, ≈159 tweets/entity; 15.16% of tweets with subjective phrases; 13.82% of tweets with opinion targets
      – Aspects dataset (pooling): 94 entities, 17,775 tweets, ≈177 tweets/entity; 2455 terms, 1304 aspects (54.11%)
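The pooling methodology on slide 10 ranks candidate terms with several methods, keeps the top 10 of each, and hands the union to annotators. Below is a minimal sketch of that pooling step, assuming whitespace tokenization and a background sample of tweets about other entities. Only two of the four rankers are implemented (TF.IDF and Dunning's log-likelihood ratio); the parsimonious language model and the lexicon-based opinion target extractor are omitted. The exact TF.IDF variant and preprocessing used for the corpus are not given on the slide, so `tokenize`, `tfidf_scores`, `llr_scores` and `pool_top_k` are hypothetical names for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    # Hypothetical, naive tokenizer: lowercase + whitespace split.
    return text.lower().split()

def tfidf_scores(entity_tweets, background_tweets):
    """TF.IDF of each term in the entity's tweets.

    TF = raw count in the entity's tweets; IDF = log(N / df), with every
    tweet (entity + background) treated as one document.
    """
    docs = [set(tokenize(t)) for t in entity_tweets + background_tweets]
    df = Counter(term for doc in docs for term in doc)
    tf = Counter(term for t in entity_tweets for term in tokenize(t))
    return {w: c * math.log(len(docs) / df[w]) for w, c in tf.items()}

def _o_log_o_over_e(o, e):
    # Contribution O * ln(O / E) of one table cell; 0 * ln(0) is taken as 0.
    return o * math.log(o / e) if o > 0 else 0.0

def llr_scores(entity_tweets, background_tweets):
    """Dunning's log-likelihood ratio (G^2) on the 2x2 table
    (term vs. all other terms) x (entity sample vs. background sample).
    Terms relatively rarer in the entity sample get a negative sign, so
    sorting by score favours over-represented terms."""
    fg = Counter(term for t in entity_tweets for term in tokenize(t))
    bg = Counter(term for t in background_tweets for term in tokenize(t))
    n_fg, n_bg = sum(fg.values()), sum(bg.values())
    n = n_fg + n_bg
    scores = {}
    for w, a in fg.items():
        b = bg.get(w, 0)
        c, d = n_fg - a, n_bg - b  # counts of all other terms
        g2 = 2 * (_o_log_o_over_e(a, n_fg * (a + b) / n)
                  + _o_log_o_over_e(b, n_bg * (a + b) / n)
                  + _o_log_o_over_e(c, n_fg * (c + d) / n)
                  + _o_log_o_over_e(d, n_bg * (c + d) / n))
        # Compare relative frequencies a/n_fg and b/n_bg without dividing.
        scores[w] = -g2 if a * n_bg < b * n_fg else g2
    return scores

def pool_top_k(rankers, entity_tweets, background_tweets, k=10):
    """Union of the top-k terms of every ranking method; the pooled
    terms are what the annotators would then judge by hand."""
    pool = set()
    for rank in rankers:
        scores = rank(entity_tweets, background_tweets)
        # Break score ties alphabetically so the result is deterministic.
        pool.update(sorted(scores, key=lambda w: (-scores[w], w))[:k])
    return pool

entity = ["sony headphones are great", "new sony headphones", "sony playstation price"]
background = ["great weather today", "new phone price", "are you coming today"]
print(sorted(pool_top_k([tfidf_scores, llr_scores], entity, background, k=2)))
```

Pooling takes the union rather than the intersection so that a term surfaced by any single method still reaches the annotators; on the toy data above both rankers agree on "sony" and "headphones".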
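Slide 12 reports substantial agreement (kappa > 0.6) among three annotators, using Cohen's kappa (pairwise) and Fleiss' kappa (all raters). A minimal sketch of both statistics follows, assuming each pooled term receives one categorical label per annotator (for example "aspect" or "not aspect"); the label encoding and function names are assumptions for illustration, not the corpus's actual format.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.
    Undefined (division by zero) when expected agreement is 1."""
    assert labels_a and len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    p_o = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    # Expected agreement from each annotator's own label distribution.
    ca, cb = Counter(labels_a), Counter(labels_b)
    p_e = sum(ca[k] * cb[k] for k in ca) / (n * n)
    return (p_o - p_e) / (1 - p_e)

def fleiss_kappa(ratings):
    """Fleiss' kappa. `ratings` holds one list per item with the labels
    given by each rater; every item must have the same number (>= 2) of raters."""
    n_items, n_raters = len(ratings), len(ratings[0])
    categories = sorted({c for item in ratings for c in item})
    counts = [Counter(item) for item in ratings]
    # Per-item agreement P_i: share of agreeing rater pairs.
    p_i = [(sum(cnt[c] ** 2 for c in categories) - n_raters)
           / (n_raters * (n_raters - 1)) for cnt in counts]
    p_bar = sum(p_i) / n_items
    # Marginal proportion p_j of each category over all assignments.
    p_j = [sum(cnt[c] for cnt in counts) / (n_items * n_raters) for c in categories]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)

# Hypothetical labels: "a" = aspect, "n" = not an aspect; three raters.
three_raters = [["a", "a", "a"], ["a", "a", "n"], ["n", "n", "n"], ["a", "n", "n"]]
print(cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0]), fleiss_kappa(three_raters))
```

Both statistics correct raw agreement for the agreement expected by chance, which is why they are preferred over plain percentage agreement when one label (such as "not an aspect") dominates.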
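Slide 13 defines an opinion target phrase p as the conjunction of three conditions. The sketch below encodes that conjunction over a hypothetical annotation record; the classes, field names and sentence indexing are assumptions, not the released corpus format. The third condition ("p is the target of the expressed opinion") is a human judgement, so it is only stored here as annotator-supplied links rather than computed.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Phrase:
    text: str
    sentence: int  # index of the sentence within the tweet (assumed)

@dataclass
class TweetAnnotation:
    subjective_phrases: list = field(default_factory=list)  # direct subjective phrases
    entity_aspects: set = field(default_factory=set)        # lowercased aspects of the entity
    # (candidate target, subjective phrase) pairs the annotator judged to be
    # "target of the expressed opinion"; a manual decision, not inferred here.
    opinion_links: set = field(default_factory=set)

def is_opinion_target(p: Phrase, ann: TweetAnnotation) -> bool:
    # (1) p is an aspect of the entity
    if p.text.lower() not in ann.entity_aspects:
        return False
    # (2) p shares a sentence with a direct subjective phrase, and
    # (3) p was linked by the annotator as the target of that phrase's opinion
    return any(s.sentence == p.sentence and (p, s) in ann.opinion_links
               for s in ann.subjective_phrases)

# Hypothetical annotation of "I lov big Sony headphones .. I lov my #music 2 b more beautiful"
lov = Phrase("lov", 0)
headphones = Phrase("headphones", 0)
ann = TweetAnnotation(subjective_phrases=[lov],
                      entity_aspects={"headphones"},
                      opinion_links={(headphones, lov)})
print(is_opinion_target(headphones, ann), is_opinion_target(Phrase("music", 1), ann))
```

Keeping the judgement-based condition as explicit links makes the automatic checks (aspect membership, same sentence) separable from what only an annotator can decide.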