2. Introduction
Problem Statement: For a given tweet, find the entities and then, using
contextual information about these entities, link them to the corresponding
information resources
● Microblogs capture an unprecedented amount of information
● Information extraction from microblog posts
● In our case, Twitter Entity Linking.
3. Related Works
● To Link or Not to Link? A Study on End-to-End Tweet
Entity Linking - Stephen Guo, Ming-Wei Chang, Emre
Kıcıman
● TAGME: On-the-fly Annotation of Short Text Fragments
(with Wikipedia Entities) - Paolo Ferragina, Ugo Scaiella
4. System Components
The project has been broken down into 3 major parts (see the pipeline sketch below)
● Mention Detection
○ the task of extracting surface-form candidates that can link to an entity
in the domain of interest
● Link Generation
○ the task of finding the relevant Wikipedia pages for each mention obtained
from the tweet
● Entity Disambiguation
○ the task of linking an extracted mention to a specific definition or instance
of an entity
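A minimal sketch of how these three stages could be chained is given below; the function names and the placeholder logic are illustrative assumptions, not the project's actual code.

# Minimal sketch of the three-stage pipeline; names and logic are placeholders.
from typing import Dict, List

def detect_mentions(tweet: str) -> List[str]:
    """Stage 1: extract surface-form candidates (placeholder heuristic)."""
    return [tok for tok in tweet.split() if tok.istitle()]

def generate_links(mention: str) -> List[str]:
    """Stage 2: propose candidate Wikipedia page titles (placeholder)."""
    return [mention, mention + " (disambiguation)"]

def disambiguate(mention: str, candidates: List[str], tweet: str) -> str:
    """Stage 3: pick the candidate that best fits the tweet context (placeholder)."""
    return candidates[0] if candidates else ""

def link_tweet(tweet: str) -> Dict[str, str]:
    """Run mention detection -> link generation -> disambiguation end to end."""
    return {m: disambiguate(m, generate_links(m), tweet)
            for m in detect_mentions(tweet)}

print(link_tweet("Amazon opens a new office in Seattle"))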
5. Approach
Mention Detection
● Classification and segmentation of named entities are treated as
separate tasks
● Most words found in tweets are not part of an entity
● An annotated dataset is required to effectively learn a model of
named entities
6. Approach
Mention Detection
● Segmentation
○ @usernames are not considered entities
- they are unambiguous
- trivial to identify with 100% accuracy
- they would only serve to inflate performance statistics
○ Brown clusters, together with the tagging, chunking and
capitalization systems, have been used to generate features (see the
feature sketch below).
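Below is a rough sketch of the kind of per-token features such a segmenter could use; the Brown-cluster strings, tag names and feature keys are illustrative assumptions rather than the project's exact feature set.

# Illustrative per-token segmentation features; cluster paths and keys are assumed.
from typing import Dict

# Hypothetical Brown-cluster bit strings learned from a tweet corpus.
BROWN_CLUSTERS = {"yess": "0110", "lol": "0111", "seattle": "10010"}

def token_features(tokens, pos_tags, chunk_tags, i) -> Dict[str, str]:
    """Features for token i: word shape, capitalization, POS, chunk, Brown prefixes."""
    word = tokens[i]
    cluster = BROWN_CLUSTERS.get(word.lower(), "")
    feats = {
        "word.lower": word.lower(),
        "word.istitle": str(word.istitle()),
        "word.isupper": str(word.isupper()),
        "pos": pos_tags[i],
        "chunk": chunk_tags[i],
        # Brown-cluster prefixes of several lengths act as coarse word classes.
        "brown.p2": cluster[:2],
        "brown.p4": cluster[:4],
    }
    if i > 0:
        feats["prev.pos"] = pos_tags[i - 1]
    return feats

print(token_features(["Visiting", "Seattle", "today"],
                     ["VBG", "NNP", "NN"], ["B-VP", "B-NP", "B-NP"], 1))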
7. Approach
Mention Detection
● Classification
○ Tweets do not contain enough context
○ A large list of entities and their types is used
○ Use of LabeledLDA
- Models each entity string as a mixture of types
- Information about an entity's distribution over types can be shared,
thus handling ambiguous entity strings
- For example, Amazon could correspond to a distribution over two
types: COMPANY and LOCATION, whereas Apple might represent a
distribution over COMPANY and FOOD (see the toy sketch below).
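The toy sketch below illustrates the intuition of sharing a type distribution per entity string; the probabilities and context words are made up, and this is not an implementation of LabeledLDA itself.

# Toy illustration of a per-string type distribution; all numbers are invented.
# Prior distribution over types for an ambiguous entity string.
TYPE_PRIOR = {"Amazon": {"COMPANY": 0.7, "LOCATION": 0.3},
              "Apple":  {"COMPANY": 0.8, "FOOD": 0.2}}

# Likelihood of seeing a context word given each type (toy values).
CONTEXT_LIKELIHOOD = {
    "COMPANY":  {"stock": 0.4, "ceo": 0.3, "rainforest": 0.01, "pie": 0.01},
    "LOCATION": {"stock": 0.01, "ceo": 0.01, "rainforest": 0.5, "pie": 0.01},
    "FOOD":     {"stock": 0.01, "ceo": 0.01, "rainforest": 0.01, "pie": 0.5},
}

def most_likely_type(entity: str, context_words) -> str:
    """Pick the type maximizing prior(type) * prod(P(word | type))."""
    scores = {}
    for t, prior in TYPE_PRIOR[entity].items():
        score = prior
        for w in context_words:
            score *= CONTEXT_LIKELIHOOD[t].get(w, 0.05)  # small smoothing value
        scores[t] = score
    return max(scores, key=scores.get)

print(most_likely_type("Amazon", ["rainforest"]))    # -> LOCATION
print(most_likely_type("Amazon", ["stock", "ceo"]))  # -> COMPANY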
8. Approach
Link Generation
● For each entity obtained in the previous step, find the
relevant Wikipedia pages
● Previously done using the Wikipedia library
● Results were given an inverted rank based on a
combination of the Jaccard similarity and the commonness of
the entity (see the scoring sketch below).
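A possible scoring sketch is given below; the equal weighting of the two signals and the toy commonness table are assumptions, not the values used in the project.

# Sketch of candidate scoring with Jaccard similarity and commonness (toy values).
from typing import List

def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the word sets of two strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

# Hypothetical commonness values: P(page | anchor text) estimated from Wikipedia links.
COMMONNESS = {("amazon", "Amazon (company)"): 0.85,
              ("amazon", "Amazon River"): 0.10}

def score_candidates(mention: str, pages: List[str]) -> List[tuple]:
    """Score each candidate page by Jaccard similarity plus commonness."""
    scored = []
    for page in pages:
        score = jaccard(mention, page) + COMMONNESS.get((mention.lower(), page), 0.0)
        scored.append((page, score))
    return sorted(scored, key=lambda p: p[1], reverse=True)

print(score_candidates("Amazon", ["Amazon (company)", "Amazon River"]))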
9. Approach
Entity Disambiguation
● List of Wikipedia pages obtained in previous step
● Rank them according to the context of the tweet
● Pick out the most relevant ones
10. Approach
Entity Disambiguation
● A semantic similarity measure known as relatedness is used for
disambiguation
● Relatedness quantifies the relation between two Wikipedia entities
based upon their inlinks and outlinks.
- For a tweet with one entity, the candidate with the highest rank in the
previous step is selected as the answer.
- For two entities, the candidate pair giving the highest relatedness
measure is selected.
- For more than two entities, pair-wise relatedness is used (see the
sketch below).
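The sketch below uses a common link-based formulation of relatedness (in the style of Milne and Witten) computed from inlink sets; the toy inlink data and the restriction to inlinks only are assumptions, not necessarily the exact measure used here.

# Sketch of link-based relatedness from shared inlinks, plus pairwise selection.
import math
from itertools import product

W = 6_000_000  # approximate number of Wikipedia articles

# Hypothetical inlink sets (article ids) for a few candidate pages.
INLINKS = {
    "Apple Inc.": {1, 2, 3, 4, 5},
    "Steve Jobs": {2, 3, 4, 6},
    "Apple (fruit)": {7, 8, 9},
}

def relatedness(a: str, b: str) -> float:
    """Milne & Witten-style relatedness between two pages from shared inlinks."""
    A, B = INLINKS[a], INLINKS[b]
    common = len(A & B)
    if common == 0:
        return 0.0
    num = math.log(max(len(A), len(B))) - math.log(common)
    den = math.log(W) - math.log(min(len(A), len(B)))
    return max(0.0, 1 - num / den)

def best_pair(cands_a, cands_b):
    """For two mentions, pick the candidate pair with the highest relatedness."""
    return max(product(cands_a, cands_b), key=lambda p: relatedness(*p))

print(best_pair(["Apple Inc.", "Apple (fruit)"], ["Steve Jobs"]))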
11. Accuracy
● To calculate accuracy, we manually annotated around 100
tweets: for each tweet we identified the entities it contains and
linked each one to its relevant Wikipedia page.
● Hits and misses against these annotations were counted to
calculate the accuracy (see the sketch below).
● The test set of 100 tweets was very diverse and contained
tweets which had multiple entities as well.
● The overall accuracy of the system was around 52%
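A minimal sketch of how such an accuracy figure can be computed from the annotations is shown below; the data structures are illustrative, not the project's actual evaluation format.

# Sketch of accuracy over manually annotated tweets; data layout is assumed.
from typing import Dict

def accuracy(gold: Dict[str, Dict[str, str]], pred: Dict[str, Dict[str, str]]) -> float:
    """Fraction of gold (tweet, mention) -> page links the system reproduced."""
    hits = misses = 0
    for tweet_id, mentions in gold.items():
        for mention, page in mentions.items():
            if pred.get(tweet_id, {}).get(mention) == page:
                hits += 1
            else:
                misses += 1
    return hits / (hits + misses) if hits + misses else 0.0

gold = {"t1": {"Amazon": "Amazon (company)", "Seattle": "Seattle"}}
pred = {"t1": {"Amazon": "Amazon (company)", "Seattle": "Seattle Seahawks"}}
print(accuracy(gold, pred))  # -> 0.5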
Results