Sub-Topic Detection Of Tweets
Related To An Entity
International Institute of Information Technology-Hyderabad
Mentor - Sandeep Pannem
By
P Yashaswi (201102111), Aayush Asawa (201305617)
Kumari Ankita (201101161), Diksha J. Yadav (201125130)
Introduction
➢ Tweets are classified according to the “Topic” and then the “Subtopic” they
refer to.
○ “Topic” refers to any major event in the real world.
○ “Subtopics” are fine-grained aspects of such events.
➢ Mining the subtopics of an entity or topic from tweets supports trend
analysis, social monitoring, topic tracking and reputation
mining.
➢ Tweets about a particular entity generally share similar keywords, so
telling subtopics apart requires richer features than keywords alone.
Work Flow
[Figure: workflow diagram]
Offline: Training Data → Store features in Lucene
Online: Input Tweet → Extract Tweet features → Classifier (Phase 1, 2, 3, using the Lucene index) → Detected Subtopic
Approach
Input: a training set of tweets labelled with subtopic names (the class labels),
and test tweets that are to be classified into those subtopics.
Output: a subtopic assigned to each test tweet.
The entire workflow can be broken into three phases:
1. Pre-processing
2. Feature Extraction and Representation
3. Classification
Feature Extraction
The following features are extracted from each tweet:
➢ TweetConcepts (Wikipedia titles found in the tweet by the TagMe API)
➢ Named entities and event phrases (using Twical)
➢ URLConcepts (the TagMe API applied to the content of the external links)
➢ Key phrases (noun phrases extracted after POS tagging)
➢ Hashtags
➢ Categories (the categories of the titles obtained through TagMe)
Similarity measures used:
➢ Wikipedia Miner (for comparing Wikipedia titles)
➢ WordNet similarity measures (for comparing key phrases)
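A minimal sketch of the per-tweet feature record described above. TagMe, Twical, the POS-based phrase extractor, the category lookup and the page downloader are external services, so they are modelled here as hypothetical plug-in callables (`concept_annotator`, `event_annotator`, `phrase_extractor`, `category_lookup`, `fetch_url_text`); none of these names come from the slides or from those tools' real APIs. Only hashtag and URL detection is done locally, with simple regular expressions.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Set

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#(\w+)")

# Hypothetical annotator signature: text in, list of labels out.
Annotator = Callable[[str], List[str]]


def no_labels(_text: str) -> List[str]:
    """Default annotator that returns nothing (used when a service is not wired in)."""
    return []


@dataclass
class TweetFeatures:
    """One set of values per feature type listed on the slide."""
    tweet_concepts: Set[str] = field(default_factory=set)
    entities_and_events: Set[str] = field(default_factory=set)
    url_concepts: Set[str] = field(default_factory=set)
    key_phrases: Set[str] = field(default_factory=set)
    hashtags: Set[str] = field(default_factory=set)
    categories: Set[str] = field(default_factory=set)


def extract_features(
    text: str,
    concept_annotator: Annotator = no_labels,  # stand-in for TagMe
    event_annotator: Annotator = no_labels,    # stand-in for Twical
    phrase_extractor: Annotator = no_labels,   # stand-in for POS tagging + noun-phrase chunking
    category_lookup: Annotator = no_labels,    # stand-in for title -> category lookup
    fetch_url_text: Callable[[str], str] = lambda url: "",  # stand-in for downloading a linked page
) -> TweetFeatures:
    concepts = set(concept_annotator(text))
    # URLConcepts: annotate the text of each linked page, not the tweet itself.
    url_concepts: Set[str] = set()
    for url in URL_RE.findall(text):
        url_concepts.update(concept_annotator(fetch_url_text(url)))
    # Assumption: categories are looked up for titles from both the tweet and its links.
    categories: Set[str] = set()
    for title in concepts | url_concepts:
        categories.update(category_lookup(title))
    return TweetFeatures(
        tweet_concepts=concepts,
        entities_and_events=set(event_annotator(text)),
        url_concepts=url_concepts,
        key_phrases=set(phrase_extractor(text)),
        hashtags={tag.lower() for tag in HASHTAG_RE.findall(text)},
        categories=categories,
    )
```

Keeping every feature type as a separate set makes it easy to compare each one with its own similarity measure later (Wikipedia Miner for titles, WordNet for phrases, plain intersection for hashtags).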
Classification
➢ Subtopic detection is treated as a classification problem: the tweets are
the data points and the subtopics are the class labels.
➢ The classifier learns which features are shared by the majority of the tweets
(data points) of a given subtopic (class label).
➢ Based on these features, an initial seed cluster is created for each subtopic,
and each cluster is stored as a compact feature representation in an index.
➢ The features of each test tweet are extracted and compared with the clusters,
and the tweet is assigned to the cluster it matches best.
➢ This is a supervised machine-learning approach, since the clusters are seeded
from labelled training tweets.
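The seed-cluster idea can be sketched as follows, assuming that "majority" means a feature occurs in strictly more than half of a subtopic's tweets, and using Jaccard set overlap as a simple stand-in for the similarity measures the slides name. The function names and the `threshold` parameter are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def build_profiles(
    labelled: Iterable[Tuple[Set[str], str]], threshold: float = 0.5
) -> Dict[str, Set[str]]:
    """Per subtopic, keep features found in more than `threshold` of its tweets."""
    counts: Dict[str, Counter] = {}
    sizes: Counter = Counter()
    for features, subtopic in labelled:
        counts.setdefault(subtopic, Counter()).update(set(features))
        sizes[subtopic] += 1
    return {
        subtopic: {f for f, c in counter.items() if c / sizes[subtopic] > threshold}
        for subtopic, counter in counts.items()
    }


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Set-overlap similarity in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def assign(features: Set[str], profiles: Dict[str, Set[str]]) -> str:
    """Return the subtopic whose seed-cluster profile best matches the tweet."""
    return max(profiles, key=lambda s: jaccard(features, profiles[s]))
```

For example, with two "safety" tweets that both mention "recall", the "safety" profile keeps "recall" and drops features seen in only one of them.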
Pre-Processing
Pre-processing involves the following steps:
➢ Removing stopwords from the tweets and stemming the remaining words.
➢ Extracting URLs from the tweets.
These steps are applied to both training and test tweets.
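A minimal sketch of these pre-processing steps. The stopword list is a tiny illustrative sample and `crude_stem` is a naive suffix stripper standing in for a real stemmer such as Porter's; both are assumptions, not what the authors used.

```python
import re
from typing import List, Tuple

URL_RE = re.compile(r"https?://\S+")
TOKEN_RE = re.compile(r"[#@]?\w+")

# Tiny illustrative stopword list; a real system would use a complete one.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "on", "and", "for", "with"}


def crude_stem(word: str) -> str:
    """Naive suffix stripping, a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(tweet: str) -> Tuple[List[str], List[str]]:
    """Return (stemmed non-stopword tokens, URLs found in the tweet)."""
    urls = URL_RE.findall(tweet)
    text = URL_RE.sub(" ", tweet).lower()
    tokens = [t for t in TOKEN_RE.findall(text) if t not in STOPWORDS]
    # Hashtags and mentions are kept verbatim; ordinary words are stemmed.
    stemmed = [t if t[0] in "#@" else crude_stem(t) for t in tokens]
    return stemmed, urls
```

URLs are pulled out before tokenising so that their fragments do not pollute the word features; the extracted links are what the URLConcepts feature is later computed from.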
Algorithm
Offline Process
1. All tweets in the training data are grouped according to their
subtopic.
2. For every tweet in a subtopic, the features are extracted and merged to
form the subtopic's features.
3. The subtopic features of all subtopics are stored in the Lucene index,
with each feature type in its own field.
4. Features common to two or more subtopics are removed, as are features
directly related to the entity name.
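The offline steps can be sketched with a plain dictionary standing in for the Lucene index (one "field" per feature type). Two details are assumptions: a feature counts as "common" when the same value appears in the same field of two or more subtopics, and a feature is "related to the entity name" when it contains one of the given entity terms as a substring. The function name and signature are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

FeatureMap = Dict[str, Set[str]]  # feature type (field) -> feature values


def build_subtopic_index(
    training: Iterable[Tuple[FeatureMap, str]], entity_terms: Set[str]
) -> Dict[str, FeatureMap]:
    """Offline steps 1-4, with a dict standing in for the Lucene index."""
    # Steps 1-3: group tweets by subtopic and merge their features per field.
    index: Dict[str, FeatureMap] = defaultdict(lambda: defaultdict(set))
    for features, subtopic in training:
        for field_name, values in features.items():
            index[subtopic][field_name].update(values)

    # Step 4a: count in how many subtopics each (field, value) pair occurs.
    seen_in: Dict[Tuple[str, str], int] = defaultdict(int)
    for fields in index.values():
        for field_name, values in fields.items():
            for value in values:
                seen_in[(field_name, value)] += 1

    # Step 4b: drop shared features and features mentioning the entity name.
    terms = {t.lower() for t in entity_terms}
    cleaned: Dict[str, FeatureMap] = {}
    for subtopic, fields in index.items():
        cleaned[subtopic] = {
            field_name: {
                value for value in values
                if seen_in[(field_name, value)] == 1
                and not any(t in value.lower() for t in terms)
            }
            for field_name, values in fields.items()
        }
    return cleaned
```

Removing shared features is what makes the remaining ones discriminative: a feature that appears under several subtopics cannot help decide between them.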
Algorithm
Online Procedure
1. Phase 1: The category features of the test tweet are searched in the Lucene
index, and the top 10 subtopics are retrieved.
2. Phase 2: The tweet concepts and URL concepts of the test tweet are compared
with those of the top 10 subtopics from Phase 1, and the top 5 subtopics are
kept based on the Wikipedia Miner similarity measure.
3. Phase 3: Named entities, key phrases and event phrases are compared with those
of the top 5 subtopics from Phase 2 using WordNet similarity measures; for hashtags,
a direct set intersection is used. The best of the 5 subtopics is then chosen.
Alternatively, the scores from all phases can be combined to choose the best subtopic.
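The three-phase cascade can be sketched as successive re-ranking of candidate subtopics. This is a sketch under stated assumptions: Jaccard overlap replaces the Lucene search in Phase 1 and is the default for both the Wikipedia Miner and WordNet measures (pass real similarity functions via `concept_sim` and `phrase_sim`); scores across fields are summed, and in Phase 3 the phrase similarity and the hashtag intersection size are simply added. The field names follow the feature list above but are hypothetical.

```python
from typing import Callable, Dict, List, Set

FeatureMap = Dict[str, Set[str]]
Similarity = Callable[[Set[str], Set[str]], float]


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def top_k(tweet: FeatureMap, index: Dict[str, FeatureMap], candidates: List[str],
          fields: List[str], sim: Similarity, k: int) -> List[str]:
    """Rank candidate subtopics by summed per-field similarity and keep the best k."""
    def score(s: str) -> float:
        return sum(sim(tweet.get(f, set()), index[s].get(f, set())) for f in fields)
    return sorted(candidates, key=score, reverse=True)[:k]


def classify(tweet: FeatureMap, index: Dict[str, FeatureMap],
             concept_sim: Similarity = jaccard,  # stand-in for Wikipedia Miner
             phrase_sim: Similarity = jaccard,   # stand-in for WordNet similarity
             ) -> str:
    # Phase 1: category features narrow the candidates to 10 subtopics.
    phase1 = top_k(tweet, index, list(index), ["categories"], jaccard, 10)
    # Phase 2: tweet concepts and URL concepts narrow them to 5.
    phase2 = top_k(tweet, index, phase1, ["tweet_concepts", "url_concepts"], concept_sim, 5)

    # Phase 3: entity/event and key-phrase similarity plus raw hashtag overlap.
    def final_score(s: str) -> float:
        phrases = sum(phrase_sim(tweet.get(f, set()), index[s].get(f, set()))
                      for f in ("entities_and_events", "key_phrases"))
        shared_tags = len(tweet.get("hashtags", set()) & index[s].get("hashtags", set()))
        return phrases + shared_tags

    return max(phase2, key=final_score)
```

The cascade keeps the expensive comparisons (Wikipedia Miner, WordNet) for a shrinking candidate list, which is presumably why the cheap index lookup on categories comes first.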
Experiments
➢ The RepLab 2013 dataset was used. It contains tweets for 61 entities;
each entity has about 700 tweets for training and 1,500 tweets for testing.
➢ Evaluation uses Reliability, Sensitivity and F-measure.
The results obtained for the entity “Volvo” are:
Sensitivity: 0.37, Reliability: 0.39, F-measure: 0.38
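As a consistency check on these numbers: the F-measure here is the harmonic mean of Reliability and Sensitivity, and 2 · 0.39 · 0.37 / (0.39 + 0.37) ≈ 0.3797, which rounds to the reported 0.38. The sketch below computes that, plus BCubed precision and recall, which, as we understand the RepLab evaluation, is what Reliability and Sensitivity reduce to for a flat, non-overlapping grouping of tweets. Treat the BCubed equivalence as an assumption rather than the official scorer.

```python
from typing import Hashable, Sequence, Tuple


def bcubed(gold: Sequence[Hashable], pred: Sequence[Hashable]) -> Tuple[float, float]:
    """BCubed precision and recall of a flat clustering (one label per item)."""
    n = len(gold)
    if n == 0:
        return 0.0, 0.0
    precision = recall = 0.0
    for i in range(n):
        same_pred = [j for j in range(n) if pred[j] == pred[i]]
        same_gold = [j for j in range(n) if gold[j] == gold[i]]
        # Items that share both the predicted and the gold subtopic with item i.
        correct = sum(1 for j in same_pred if gold[j] == gold[i])
        precision += correct / len(same_pred)
        recall += correct / len(same_gold)
    return precision / n, recall / n


def f_measure(reliability: float, sensitivity: float) -> float:
    """Harmonic mean of Reliability and Sensitivity."""
    if reliability + sensitivity == 0:
        return 0.0
    return 2 * reliability * sensitivity / (reliability + sensitivity)
```

Note that RepLab results over many entities are usually averaged per entity, so an averaged F-measure need not equal the harmonic mean of averaged Reliability and Sensitivity; for a single entity such as Volvo the two coincide.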
Future Work
➢ An SVM classifier could be trained to learn which features should be given
more weight when classifying the tweets.
➢ Its input vectors would have one dimension per feature type of each subtopic,
with the corresponding similarity measure as the value, and the labelled
subtopic as the class label.
➢ In the testing phase, similar vectors would be built for the test tweets to predict
their subtopics.
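The proposed input vectors can be sketched as follows: one dimension per (subtopic, feature type) pair, holding the tweet's similarity to that subtopic on that feature. Jaccard overlap again stands in for the per-feature similarity measures, and `FEATURE_TYPES` and `similarity_vector` are hypothetical names; only the vector layout comes from the slide.

```python
from typing import Callable, Dict, List, Sequence, Set

FeatureMap = Dict[str, Set[str]]
Similarity = Callable[[Set[str], Set[str]], float]

FEATURE_TYPES = ["tweet_concepts", "url_concepts", "entities_and_events",
                 "key_phrases", "hashtags", "categories"]


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_vector(tweet: FeatureMap, index: Dict[str, FeatureMap],
                      subtopics: Sequence[str], sim: Similarity = jaccard) -> List[float]:
    """Similarity of the tweet to each subtopic on each feature type, flattened."""
    return [
        sim(tweet.get(f, set()), index[s].get(f, set()))
        for s in subtopics
        for f in FEATURE_TYPES
    ]
```

Vectors for the labelled training tweets, paired with their subtopics, could then be passed to an off-the-shelf multi-class SVM such as scikit-learn's `sklearn.svm.LinearSVC`; its learned weights would indicate which (subtopic, feature type) dimensions matter most.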