SlideShare a Scribd company logo
Team No:48
Snigdha Agarwal
Arnav Sharma
Srimanikantha Tangudu
Anubhav Jaiswal
Entity Linking in Social Media
Introduction
Problem Statement: For a given tweet, find the entities and then using
contextual information about these entities, try to link it with the corresponding
information resources
● Microblogs capture an unprecedented amount of information
● Information extraction from microblog posts
● In our case, Twitter Entity Linking.
Related Works
● To Link or Not to Link? A Study on End-to-End Tweet
Entity Linking - Stephen Guo, Ming Wei Chang, Emre
Kiciman
● TAGME: On-the-fly Annotation of Short Text Fragments
(by Wikipedia Entities)
System Components
The project has been broken down into 3 major parts
● Mention Detection,
○ task of extraction of surface form candidates that can link to an entity
in the domain of interest
● Link Generation,
○ task of finding the relevant Wikipedia pages for each entity obtained in
the tweet
● Entity Disambiguation.
○ task of linking an extracted mention to a specific definition or instance
of an entity
Approach
Mention Detection
● Classification and Segmentation of named entities as
separate tasks
● Most words found in tweets are not part of an entity
● Annotated dataset to effectively learn a model of named
entities required
Approach
Mention Detection
● Segmentation
○ @usernames not considered as entity
- unambiguous
- trivial to identify with 100% accuracy
- would only serve to inflate performance statistics
○ Brown clusters and tagging system, chunking system and
capitalization system, have been used to generate features.
Approach
Mention Detection
● Classification
○ Tweets do not contain enough context
○ A large lists of entities and their types
○ Use of LabeledLDA
- Models each entity string as a mixture of types
- Information about an entity's distribution over types can be shared,
thus handling ambiguous entity strings
- For example, Amazon could correspond to a distribution over two
types:COMPANY, and LOCATION, whereas Apple might represent a
distribution over COMPANY, and FOOD.
Approach
Link Generation
● For each entity obtained in the previous step, find the
relevant Wikipedia pages
● Previously done using the Wikipedia library
● Results were given an inverted rank on the basis of
combination of Jaccard Similarity and commonness of
the entity.
Approach
Entity Disambiguation
● List of Wikipedia pages obtained in previous step
● Rank them according to the context of the tweet
● Pick out the most relevant ones
Approach
Entity Disambiguation
● Semantic similarity measure known as Relatedness used for
disambiguation
● Quantification of the relation between two Wikipedia entities
based upon their inlinks and outlinks.
- For one entity in tweet, the one with the highest rank in the
previous step is selected as the answer.
- For two entities, the pair giving the highest relatedness
measure is selected.
- For more than two entities, pair-wise relatedness is used.
Accuracy
● To calculate accuracy, we manually annotated around 100
tweets. While annotating, using human intelligence, we found
out the entities inside the tweet and linked it to it’s relevant
wikipedia page.
● The misses and hits were considered to calculate the
accuracy.
● The test set of 100 tweets was very diverse and contained
tweets which had multiple entities as well.
● The overall accuracy of the system was around ~52%
Results
Thank You

More Related Content

Similar to Ire

Twitter as a personalizable information service ii
Twitter as a personalizable information service iiTwitter as a personalizable information service ii
Twitter as a personalizable information service ii
Kan-Han (John) Lu
 
Sentiment analysis using ml
Sentiment analysis using mlSentiment analysis using ml
Sentiment analysis using ml
Pravin Katiyar
 
Sentiment Analysis of Twitter tweets using supervised classification technique
Sentiment Analysis of Twitter tweets using supervised classification technique Sentiment Analysis of Twitter tweets using supervised classification technique
Sentiment Analysis of Twitter tweets using supervised classification technique
IJERA Editor
 
Tweet Segmentation and Its Application to Named Entity Recognition
Tweet Segmentation and Its Application to Named Entity RecognitionTweet Segmentation and Its Application to Named Entity Recognition
Tweet Segmentation and Its Application to Named Entity Recognition
1crore projects
 
Sentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSentiment Analysis of Twitter Data
Sentiment Analysis of Twitter Data
Sumit Raj
 
Prediction of Reaction towards Textual Posts in Social Networks
Prediction of Reaction towards Textual Posts in Social NetworksPrediction of Reaction towards Textual Posts in Social Networks
Prediction of Reaction towards Textual Posts in Social Networks
Mohamed El-Geish
 
SENTIMENT ANALYSIS OF TWITTER DATA
SENTIMENT ANALYSIS OF TWITTER DATASENTIMENT ANALYSIS OF TWITTER DATA
SENTIMENT ANALYSIS OF TWITTER DATA
Parvathy Devaraj
 
REAL TIME SENTIMENT ANALYSIS OF TWITTER DATA
REAL TIME SENTIMENT ANALYSIS OF TWITTER DATAREAL TIME SENTIMENT ANALYSIS OF TWITTER DATA
REAL TIME SENTIMENT ANALYSIS OF TWITTER DATA
Mary Lis Joseph
 
SENTIMENT ANALYSIS OF TWITTER DATA
SENTIMENT ANALYSIS OF TWITTER DATASENTIMENT ANALYSIS OF TWITTER DATA
SENTIMENT ANALYSIS OF TWITTER DATA
anargha gangadharan
 
Twitter Sentiment Analysis
Twitter Sentiment AnalysisTwitter Sentiment Analysis
Twitter Sentiment Analysis
ijtsrd
 
ICPSR - Complex Systems Models in the Social Sciences - Lecture 4 - Professor...
ICPSR - Complex Systems Models in the Social Sciences - Lecture 4 - Professor...ICPSR - Complex Systems Models in the Social Sciences - Lecture 4 - Professor...
ICPSR - Complex Systems Models in the Social Sciences - Lecture 4 - Professor...
Daniel Katz
 
SampleLiteratureReviewTemplate_IVBTechIISEM_MajorProject.pptx
SampleLiteratureReviewTemplate_IVBTechIISEM_MajorProject.pptxSampleLiteratureReviewTemplate_IVBTechIISEM_MajorProject.pptx
SampleLiteratureReviewTemplate_IVBTechIISEM_MajorProject.pptx
20211a05p7
 
ORMA: A Semi-Automatic Tool for Online Reputation Monitoring in Twitter
ORMA: A Semi-Automatic Tool for Online Reputation Monitoring in TwitterORMA: A Semi-Automatic Tool for Online Reputation Monitoring in Twitter
ORMA: A Semi-Automatic Tool for Online Reputation Monitoring in Twitter
Damiano Spina
 
Major presentation
Major presentationMajor presentation
Major presentation
PS241092
 
New sentiment analysis of tweets using python by Ravi kumar
New sentiment analysis of tweets using python by Ravi kumarNew sentiment analysis of tweets using python by Ravi kumar
New sentiment analysis of tweets using python by Ravi kumar
Ravi Kumar
 
A network based model for predicting a hashtag break out in twitter
A network based model for predicting a hashtag break out in twitter A network based model for predicting a hashtag break out in twitter
A network based model for predicting a hashtag break out in twitter
Sultan Alzahrani
 
sentiment analysis text extraction from social media
sentiment  analysis text extraction from social media sentiment  analysis text extraction from social media
sentiment analysis text extraction from social media
Ravindra Chaudhary
 
Compare & Contrast Using The Web To Discover Comparable Cases For News Stories
Compare & Contrast Using The Web To Discover Comparable Cases For News StoriesCompare & Contrast Using The Web To Discover Comparable Cases For News Stories
Compare & Contrast Using The Web To Discover Comparable Cases For News Stories
Jason Yang
 
Database design
Database designDatabase design
Database design
FLYMAN TECHNOLOGY LIMITED
 
Finding and accessing e-resources
Finding and accessing e-resourcesFinding and accessing e-resources
Finding and accessing e-resources
Louise Penn
 

Similar to Ire (20)

Twitter as a personalizable information service ii
Twitter as a personalizable information service iiTwitter as a personalizable information service ii
Twitter as a personalizable information service ii
 
Sentiment analysis using ml
Sentiment analysis using mlSentiment analysis using ml
Sentiment analysis using ml
 
Sentiment Analysis of Twitter tweets using supervised classification technique
Sentiment Analysis of Twitter tweets using supervised classification technique Sentiment Analysis of Twitter tweets using supervised classification technique
Sentiment Analysis of Twitter tweets using supervised classification technique
 
Tweet Segmentation and Its Application to Named Entity Recognition
Tweet Segmentation and Its Application to Named Entity RecognitionTweet Segmentation and Its Application to Named Entity Recognition
Tweet Segmentation and Its Application to Named Entity Recognition
 
Sentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSentiment Analysis of Twitter Data
Sentiment Analysis of Twitter Data
 
Prediction of Reaction towards Textual Posts in Social Networks
Prediction of Reaction towards Textual Posts in Social NetworksPrediction of Reaction towards Textual Posts in Social Networks
Prediction of Reaction towards Textual Posts in Social Networks
 
SENTIMENT ANALYSIS OF TWITTER DATA
SENTIMENT ANALYSIS OF TWITTER DATASENTIMENT ANALYSIS OF TWITTER DATA
SENTIMENT ANALYSIS OF TWITTER DATA
 
REAL TIME SENTIMENT ANALYSIS OF TWITTER DATA
REAL TIME SENTIMENT ANALYSIS OF TWITTER DATAREAL TIME SENTIMENT ANALYSIS OF TWITTER DATA
REAL TIME SENTIMENT ANALYSIS OF TWITTER DATA
 
SENTIMENT ANALYSIS OF TWITTER DATA
SENTIMENT ANALYSIS OF TWITTER DATASENTIMENT ANALYSIS OF TWITTER DATA
SENTIMENT ANALYSIS OF TWITTER DATA
 
Twitter Sentiment Analysis
Twitter Sentiment AnalysisTwitter Sentiment Analysis
Twitter Sentiment Analysis
 
ICPSR - Complex Systems Models in the Social Sciences - Lecture 4 - Professor...
ICPSR - Complex Systems Models in the Social Sciences - Lecture 4 - Professor...ICPSR - Complex Systems Models in the Social Sciences - Lecture 4 - Professor...
ICPSR - Complex Systems Models in the Social Sciences - Lecture 4 - Professor...
 
SampleLiteratureReviewTemplate_IVBTechIISEM_MajorProject.pptx
SampleLiteratureReviewTemplate_IVBTechIISEM_MajorProject.pptxSampleLiteratureReviewTemplate_IVBTechIISEM_MajorProject.pptx
SampleLiteratureReviewTemplate_IVBTechIISEM_MajorProject.pptx
 
ORMA: A Semi-Automatic Tool for Online Reputation Monitoring in Twitter
ORMA: A Semi-Automatic Tool for Online Reputation Monitoring in TwitterORMA: A Semi-Automatic Tool for Online Reputation Monitoring in Twitter
ORMA: A Semi-Automatic Tool for Online Reputation Monitoring in Twitter
 
Major presentation
Major presentationMajor presentation
Major presentation
 
New sentiment analysis of tweets using python by Ravi kumar
New sentiment analysis of tweets using python by Ravi kumarNew sentiment analysis of tweets using python by Ravi kumar
New sentiment analysis of tweets using python by Ravi kumar
 
A network based model for predicting a hashtag break out in twitter
A network based model for predicting a hashtag break out in twitter A network based model for predicting a hashtag break out in twitter
A network based model for predicting a hashtag break out in twitter
 
sentiment analysis text extraction from social media
sentiment  analysis text extraction from social media sentiment  analysis text extraction from social media
sentiment analysis text extraction from social media
 
Compare & Contrast Using The Web To Discover Comparable Cases For News Stories
Compare & Contrast Using The Web To Discover Comparable Cases For News StoriesCompare & Contrast Using The Web To Discover Comparable Cases For News Stories
Compare & Contrast Using The Web To Discover Comparable Cases For News Stories
 
Database design
Database designDatabase design
Database design
 
Finding and accessing e-resources
Finding and accessing e-resourcesFinding and accessing e-resources
Finding and accessing e-resources
 

Recently uploaded

ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
PECB
 
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
National Information Standards Organization (NISO)
 
MARY JANE WILSON, A “BOA MÃE” .
MARY JANE WILSON, A “BOA MÃE”           .MARY JANE WILSON, A “BOA MÃE”           .
MARY JANE WILSON, A “BOA MÃE” .
Colégio Santa Teresinha
 
How to Manage Your Lost Opportunities in Odoo 17 CRM
How to Manage Your Lost Opportunities in Odoo 17 CRMHow to Manage Your Lost Opportunities in Odoo 17 CRM
How to Manage Your Lost Opportunities in Odoo 17 CRM
Celine George
 
BBR 2024 Summer Sessions Interview Training
BBR  2024 Summer Sessions Interview TrainingBBR  2024 Summer Sessions Interview Training
BBR 2024 Summer Sessions Interview Training
Katrina Pritchard
 
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptxChapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Mohd Adib Abd Muin, Senior Lecturer at Universiti Utara Malaysia
 
PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.
Dr. Shivangi Singh Parihar
 
BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...
BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...
BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...
Nguyen Thanh Tu Collection
 
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptxNEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
iammrhaywood
 
How to Make a Field Mandatory in Odoo 17
How to Make a Field Mandatory in Odoo 17How to Make a Field Mandatory in Odoo 17
How to Make a Field Mandatory in Odoo 17
Celine George
 
คำศัพท์ คำพื้นฐานการอ่าน ภาษาอังกฤษ ระดับชั้น ม.1
คำศัพท์ คำพื้นฐานการอ่าน ภาษาอังกฤษ ระดับชั้น ม.1คำศัพท์ คำพื้นฐานการอ่าน ภาษาอังกฤษ ระดับชั้น ม.1
คำศัพท์ คำพื้นฐานการอ่าน ภาษาอังกฤษ ระดับชั้น ม.1
สมใจ จันสุกสี
 
Walmart Business+ and Spark Good for Nonprofits.pdf
Walmart Business+ and Spark Good for Nonprofits.pdfWalmart Business+ and Spark Good for Nonprofits.pdf
Walmart Business+ and Spark Good for Nonprofits.pdf
TechSoup
 
PIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf IslamabadPIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf Islamabad
AyyanKhan40
 
How to Fix the Import Error in the Odoo 17
How to Fix the Import Error in the Odoo 17How to Fix the Import Error in the Odoo 17
How to Fix the Import Error in the Odoo 17
Celine George
 
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
Nguyen Thanh Tu Collection
 
Présentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptx
Présentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptxPrésentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptx
Présentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptx
siemaillard
 
The basics of sentences session 6pptx.pptx
The basics of sentences session 6pptx.pptxThe basics of sentences session 6pptx.pptx
The basics of sentences session 6pptx.pptx
heathfieldcps1
 
Hindi varnamala | hindi alphabet PPT.pdf
Hindi varnamala | hindi alphabet PPT.pdfHindi varnamala | hindi alphabet PPT.pdf
Hindi varnamala | hindi alphabet PPT.pdf
Dr. Mulla Adam Ali
 
How to Create a More Engaging and Human Online Learning Experience
How to Create a More Engaging and Human Online Learning Experience How to Create a More Engaging and Human Online Learning Experience
How to Create a More Engaging and Human Online Learning Experience
Wahiba Chair Training & Consulting
 
clinical examination of hip joint (1).pdf
clinical examination of hip joint (1).pdfclinical examination of hip joint (1).pdf
clinical examination of hip joint (1).pdf
Priyankaranawat4
 

Recently uploaded (20)

ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
 
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
 
MARY JANE WILSON, A “BOA MÃE” .
MARY JANE WILSON, A “BOA MÃE”           .MARY JANE WILSON, A “BOA MÃE”           .
MARY JANE WILSON, A “BOA MÃE” .
 
How to Manage Your Lost Opportunities in Odoo 17 CRM
How to Manage Your Lost Opportunities in Odoo 17 CRMHow to Manage Your Lost Opportunities in Odoo 17 CRM
How to Manage Your Lost Opportunities in Odoo 17 CRM
 
BBR 2024 Summer Sessions Interview Training
BBR  2024 Summer Sessions Interview TrainingBBR  2024 Summer Sessions Interview Training
BBR 2024 Summer Sessions Interview Training
 
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptxChapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
 
PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.
 
BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...
BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...
BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...
 
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptxNEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
 
How to Make a Field Mandatory in Odoo 17
How to Make a Field Mandatory in Odoo 17How to Make a Field Mandatory in Odoo 17
How to Make a Field Mandatory in Odoo 17
 
คำศัพท์ คำพื้นฐานการอ่าน ภาษาอังกฤษ ระดับชั้น ม.1
คำศัพท์ คำพื้นฐานการอ่าน ภาษาอังกฤษ ระดับชั้น ม.1คำศัพท์ คำพื้นฐานการอ่าน ภาษาอังกฤษ ระดับชั้น ม.1
คำศัพท์ คำพื้นฐานการอ่าน ภาษาอังกฤษ ระดับชั้น ม.1
 
Walmart Business+ and Spark Good for Nonprofits.pdf
Walmart Business+ and Spark Good for Nonprofits.pdfWalmart Business+ and Spark Good for Nonprofits.pdf
Walmart Business+ and Spark Good for Nonprofits.pdf
 
PIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf IslamabadPIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf Islamabad
 
How to Fix the Import Error in the Odoo 17
How to Fix the Import Error in the Odoo 17How to Fix the Import Error in the Odoo 17
How to Fix the Import Error in the Odoo 17
 
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
 
Présentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptx
Présentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptxPrésentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptx
Présentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptx
 
The basics of sentences session 6pptx.pptx
The basics of sentences session 6pptx.pptxThe basics of sentences session 6pptx.pptx
The basics of sentences session 6pptx.pptx
 
Hindi varnamala | hindi alphabet PPT.pdf
Hindi varnamala | hindi alphabet PPT.pdfHindi varnamala | hindi alphabet PPT.pdf
Hindi varnamala | hindi alphabet PPT.pdf
 
How to Create a More Engaging and Human Online Learning Experience
How to Create a More Engaging and Human Online Learning Experience How to Create a More Engaging and Human Online Learning Experience
How to Create a More Engaging and Human Online Learning Experience
 
clinical examination of hip joint (1).pdf
clinical examination of hip joint (1).pdfclinical examination of hip joint (1).pdf
clinical examination of hip joint (1).pdf
 

Ire

  • 1. Team No:48 Snigdha Agarwal Arnav Sharma Srimanikantha Tangudu Anubhav Jaiswal Entity Linking in Social Media
  • 2. Introduction Problem Statement: For a given tweet, find the entities and then using contextual information about these entities, try to link it with the corresponding information resources ● Microblogs capture an unprecedented amount of information ● Information extraction from microblog posts ● In our case, Twitter Entity Linking.
  • 3. Related Works ● To Link or Not to Link? A Study on End-to-End Tweet Entity Linking - Stephen Guo, Ming Wei Chang, Emre Kiciman ● TAGME: On-the-fly Annotation of Short Text Fragments (by Wikipedia Entities)
  • 4. System Components The project has been broken down into 3 major parts ● Mention Detection, ○ task of extraction of surface form candidates that can link to an entity in the domain of interest ● Link Generation, ○ task of finding the relevant Wikipedia pages for each entity obtained in the tweet ● Entity Disambiguation. ○ task of linking an extracted mention to a specific definition or instance of an entity
  • 5. Approach Mention Detection ● Classification and Segmentation of named entities as separate tasks ● Most words found in tweets are not part of an entity ● Annotated dataset to effectively learn a model of named entities required
  • 6. Approach Mention Detection ● Segmentation ○ @usernames not considered as entity - unambiguous - trivial to identify with 100% accuracy - would only serve to inflate performance statistics ○ Brown clusters and tagging system, chunking system and capitalization system, have been used to generate features.
  • 7. Approach Mention Detection ● Classification ○ Tweets do not contain enough context ○ A large lists of entities and their types ○ Use of LabeledLDA - Models each entity string as a mixture of types - Information about an entity's distribution over types can be shared, thus handling ambiguous entity strings - For example, Amazon could correspond to a distribution over two types:COMPANY, and LOCATION, whereas Apple might represent a distribution over COMPANY, and FOOD.
  • 8. Approach Link Generation ● For each entity obtained in the previous step, find the relevant Wikipedia pages ● Previously done using the Wikipedia library ● Results were given an inverted rank on the basis of combination of Jaccard Similarity and commonness of the entity.
  • 9. Approach Entity Disambiguation ● List of Wikipedia pages obtained in previous step ● Rank them according to the context of the tweet ● Pick out the most relevant ones
  • 10. Approach Entity Disambiguation ● Semantic similarity measure known as Relatedness used for disambiguation ● Quantification of the relation between two Wikipedia entities based upon their inlinks and outlinks. - For one entity in tweet, the one with the highest rank in the previous step is selected as the answer. - For two entities, the pair giving the highest relatedness measure is selected. - For more than two entities, pair-wise relatedness is used.
  • 11. Accuracy ● To calculate accuracy, we manually annotated around 100 tweets. While annotating, using human intelligence, we found out the entities inside the tweet and linked it to it’s relevant wikipedia page. ● The misses and hits were considered to calculate the accuracy. ● The test set of 100 tweets was very diverse and contained tweets which had multiple entities as well. ● The overall accuracy of the system was around ~52% Results