Team No:48
Snigdha Agarwal
Arnav Sharma
Srimanikantha Tangudu
Anubhav Jaiswal
Entity Linking in Social Media
Introduction
Problem Statement: Given a tweet, find the entities it mentions and, using
contextual information about those entities, link each one to the corresponding
information resource.
● Microblogs capture an unprecedented amount of information
● Information extraction from microblog posts
● In our case, Twitter Entity Linking.
Related Works
● To Link or Not to Link? A Study on End-to-End Tweet
Entity Linking - Stephen Guo, Ming-Wei Chang, Emre
Kiciman
● TAGME: On-the-fly Annotation of Short Text Fragments
(by Wikipedia Entities)
System Components
The project has been broken down into 3 major parts
● Mention Detection
○ extracting surface-form candidates that can link to an entity
in the domain of interest
● Link Generation
○ finding the relevant Wikipedia pages for each entity obtained in
the tweet
● Entity Disambiguation
○ linking an extracted mention to a specific definition or instance
of an entity
Approach
Mention Detection
● Classification and Segmentation of named entities as
separate tasks
● Most words found in tweets are not part of an entity
● An annotated dataset is required to effectively learn a
model of named entities
Approach
Mention Detection
● Segmentation
○ @usernames are not considered entities
- unambiguous
- trivial to identify with 100% accuracy
- would only serve to inflate performance statistics
○ Brown clusters, together with the tagging, chunking and
capitalization systems, were used to generate features.
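The segmentation choices above can be sketched as follows. This is a minimal illustration, not the project's code: the function names, the regular expression and the particular feature set are assumptions, and the Brown-cluster, tagging and chunking features are left out.

```python
import re

# Assumption: a token is a @username if it is "@" followed by word characters.
USERNAME = re.compile(r"^@\w+$")

def candidate_tokens(tweet):
    """Split a tweet on whitespace and drop @usernames, which are not treated as entities."""
    return [tok for tok in tweet.split() if not USERNAME.match(tok)]

def token_features(tokens, i):
    """Return a few capitalization-style features for tokens[i] (hypothetical feature names)."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_capitalized": tok[:1].isupper(),
        "is_all_caps": tok.isupper(),
        "is_first": i == 0,
    }

tokens = candidate_tokens("@bob Apple opens store in Paris")
features = [token_features(tokens, i) for i in range(len(tokens))]
```

A sequence labeller would consume one such feature dictionary per token when deciding where an entity starts and ends.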
Approach
Mention Detection
● Classification
○ Tweets do not contain enough context
○ A large list of entities and their types
○ Use of LabeledLDA
- Models each entity string as a mixture of types
- Information about an entity's distribution over types can be shared,
thus handling ambiguous entity strings
- For example, Amazon could correspond to a distribution over two
types, COMPANY and LOCATION, whereas Apple might represent a
distribution over COMPANY and FOOD.
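To make "a distribution over types" concrete, the sketch below normalizes per-type counts for one entity string. It is not LabeledLDA inference; it only shows the kind of object the model produces, and the function name and counts are made up for illustration.

```python
def type_distribution(type_counts):
    """Turn raw per-type counts for one entity string into a probability distribution."""
    total = sum(type_counts.values())
    if total == 0:
        return {}
    return {entity_type: count / total for entity_type, count in type_counts.items()}

# Hypothetical counts, chosen only to illustrate an ambiguous entity string.
amazon = type_distribution({"COMPANY": 3, "LOCATION": 1})
```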
Approach
Link Generation
● For each entity obtained in the previous step, find the
relevant Wikipedia pages
● Previously done using the Wikipedia library
● Results were given an inverted rank based on a
combination of Jaccard similarity and the commonness of
the entity.
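A minimal sketch of such a ranking is given below. The slides do not say how the two signals are combined, so the equal weighting, the token-level Jaccard and the definition of commonness as the share of anchor-text links pointing to a page are all assumptions, as are the function names and counts.

```python
def jaccard(a, b):
    """Jaccard similarity between the lowercase token sets of two strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

def commonness(anchor_counts, page):
    """Share of the anchor text's links that point to `page` (assumed definition)."""
    total = sum(anchor_counts.values())
    return anchor_counts.get(page, 0) / total if total else 0.0

def rank_candidates(mention, anchor_counts, weight=0.5):
    """Sort candidate pages by a weighted sum of Jaccard similarity and commonness."""
    scored = [
        (weight * jaccard(mention, page)
         + (1 - weight) * commonness(anchor_counts, page), page)
        for page in anchor_counts
    ]
    return [page for _, page in sorted(scored, reverse=True)]

# Hypothetical link counts for the anchor text "apple".
ranking = rank_candidates("apple", {"Apple Inc.": 70, "Apple": 30})
```

With these made-up counts the exact title match outweighs the more common page; a different `weight` would change that trade-off.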
Approach
Entity Disambiguation
● List of Wikipedia pages obtained in previous step
● Rank them according to the context of the tweet
● Pick out the most relevant ones
Approach
Entity Disambiguation
● Semantic similarity measure known as Relatedness used for
disambiguation
● Quantification of the relation between two Wikipedia entities
based upon their inlinks and outlinks.
- For a tweet with one entity, the candidate page ranked highest in
the previous step is selected as the answer.
- For two entities, the pair of candidate pages giving the highest
relatedness measure is selected.
- For more than two entities, pair-wise relatedness is used.
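The two-entity case can be sketched as below. The slides do not give the exact formula, so this uses the Milne-Witten link-based measure over inlink sets as one common choice; the function names, page ids and link sets are hypothetical.

```python
import math
from itertools import product

def relatedness(in_a, in_b, total_pages):
    """Milne-Witten style relatedness from two pages' inlink sets (assumed formula)."""
    common = in_a & in_b
    if not common:
        return 0.0
    big = max(len(in_a), len(in_b))
    small = min(len(in_a), len(in_b))
    denom = math.log(total_pages) - math.log(small)
    if denom <= 0:
        return 0.0
    score = 1.0 - (math.log(big) - math.log(len(common))) / denom
    return max(0.0, score)

def best_pair(cands_a, cands_b, inlinks, total_pages):
    """Pick the candidate pair (one page per mention) with the highest relatedness."""
    return max(
        product(cands_a, cands_b),
        key=lambda pair: relatedness(inlinks[pair[0]], inlinks[pair[1]], total_pages),
    )

# Hypothetical inlink sets, keyed by page title; integers stand for linking pages.
inlinks = {"Apple Inc.": {1, 2, 3, 4}, "Apple": {5, 6}, "IPhone": {2, 3, 4, 7}}
pair = best_pair(["Apple Inc.", "Apple"], ["IPhone"], inlinks, total_pages=1000)
```

For more than two entities the same score would be computed for every pair of candidates and aggregated, for example by summing; that aggregation is not specified in the slides.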
Accuracy
● To calculate accuracy, we manually annotated around 100
tweets. While annotating, we identified the entities in each
tweet by hand and linked each one to its relevant
Wikipedia page.
● The misses and hits were considered to calculate the
accuracy.
● The test set of 100 tweets was very diverse and contained
tweets which had multiple entities as well.
● The overall accuracy of the system was approximately 52%
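The hit/miss calculation can be sketched as follows, assuming accuracy is hits divided by the number of annotated mentions. The data layout (a dictionary keyed by tweet id and mention) and the toy annotations are assumptions for illustration, not the project's test set.

```python
def accuracy(gold, predicted):
    """Fraction of annotated mentions whose predicted page matches the gold page."""
    if not gold:
        return 0.0
    hits = sum(1 for key, page in gold.items() if predicted.get(key) == page)
    return hits / len(gold)

# Toy annotations keyed by (tweet_id, mention); values are Wikipedia page titles.
gold = {
    (1, "apple"): "Apple Inc.",
    (1, "iphone"): "IPhone",
    (2, "amazon"): "Amazon River",
    (3, "paris"): "Paris",
}
predicted = {
    (1, "apple"): "Apple Inc.",
    (1, "iphone"): "IPhone",
    (2, "amazon"): "Amazon (company)",
}
score = accuracy(gold, predicted)
```

A missing prediction, as for the third tweet here, counts as a miss.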
Results
Thank You
