Sub-Topic Detection Of Tweets
Related To An Entity
International Institute of Information Technology-Hyderabad
Mentor - Sandeep Pannem
By
P Yashaswi (201102111), Aayush Asawa (201305617)
Kumari Ankita (201101161), Diksha J. Yadav (201125130)
Introduction
➢ Tweets are classified according to the “Topic” and then the “Subtopic” they
refer to.
○ “Topic” refers to any major event in the real world.
○ “Subtopics” are fine-grained aspects of such events.
➢ Mining the subtopics of entities or topics from tweets helps in trend
analysis, social monitoring, topic tracking and reputation
mining.
➢ Tweets related to a particular entity generally share similar keywords, so
subtopic detection has to rely on a richer set of features.
Work Flow
Training Data → Extract Tweet features → Store features in Lucene
Input Tweet → Extract Tweet features → Classifier (Phases 1, 2, 3) → Detected Subtopic
Approach
Input: a training set of tweets with subtopic names as class labels,
and test tweets to be classified into subtopics.
Output: a subtopic assigned to each test tweet.
The entire workflow can be broken into three phases:
1. Pre-processing
2. Feature Extraction and Representation
3. Classification.
Feature Extraction
The following features are extracted from each tweet:
➢ TweetConcepts (using the TagMe API)
➢ Named entities and event phrases (using Twical)
➢ URLConcepts (using the TagMe API on the content of external links)
➢ Key phrases (noun phrases extracted after POS tagging)
➢ Hashtags
➢ Categories (categories of the Wikipedia titles obtained through TagMe)
Similarity measures used:
➢ Wikipedia Miner (for comparing Wikipedia titles)
➢ WordNet similarity measure (for comparing key phrases)
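The per-tweet features listed above can be held in a simple record. This is a minimal sketch, assuming every feature type is stored as a set of strings; the class name `TweetFeatures` and the helper `extract_hashtags` are hypothetical. The TagMe- and Twical-derived fields would be filled by calls to those services, which are not shown; only hashtag extraction is implemented, with a plain regular expression.

```python
import re
from dataclasses import dataclass, field

# Hypothetical container for the six feature types of one tweet.
# Each field is a set of strings; TagMe/Twical outputs would be filled in
# by external calls that are not shown here.
@dataclass
class TweetFeatures:
    tweet_concepts: set = field(default_factory=set)   # TagMe titles from the tweet text
    entities_events: set = field(default_factory=set)  # named entities / event phrases (Twical)
    url_concepts: set = field(default_factory=set)     # TagMe titles from linked pages
    key_phrases: set = field(default_factory=set)      # noun phrases after POS tagging
    hashtags: set = field(default_factory=set)
    categories: set = field(default_factory=set)       # categories of the TagMe titles

# Assumption: a hashtag is '#' followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")

def extract_hashtags(text: str) -> set:
    """Return the lower-cased hashtags in a tweet, without the '#'."""
    return {tag.lower() for tag in HASHTAG_RE.findall(text)}

features = TweetFeatures(hashtags=extract_hashtags("New #Volvo XC90 safety test #CarSafety"))
print(sorted(features.hashtags))  # ['carsafety', 'volvo']
```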
Classification
➢ Subtopic detection is treated as a classification problem in which
tweets are the data points and subtopics are the class labels.
➢ The classifier learns which features the majority of tweets (data points)
of a particular subtopic (class label) share.
➢ Based on these features, an initial seed cluster is created for each subtopic,
and each cluster is stored in the index as a compact feature representation.
➢ The features of each test tweet are extracted and compared with the clusters,
and the test tweet is assigned the cluster it matches best.
➢ This is done using a machine learning technique.
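The matching step can be illustrated with a deliberately simple stand-in: each subtopic cluster is a set of features, and a test tweet goes to the cluster with the highest Jaccard overlap. Jaccard similarity is an assumption made for illustration only (the deck itself uses Wikipedia Miner and WordNet similarity in its later phases), and `assign_subtopic` is a hypothetical name.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard overlap |a & b| / |a | b|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def assign_subtopic(tweet_features: set, clusters: dict) -> str:
    """Return the subtopic whose cluster features best match the tweet."""
    return max(clusters, key=lambda name: jaccard(tweet_features, clusters[name]))

clusters = {
    "recall": {"recall", "airbag", "defect"},
    "sponsorship": {"ocean race", "sailing", "sponsor"},
}
print(assign_subtopic({"airbag", "recall", "xc90"}, clusters))  # recall
```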
Pre-Processing
Pre-processing involves the following steps :
➢ Removing stopwords from the tweets and stemming the remaining terms.
➢ Extracting URLs from the tweets.
Both steps are applied to training and test tweets.
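A minimal pre-processing sketch, assuming whitespace-style tokenisation, a tiny illustrative stopword list and a crude suffix-stripping stemmer. These are stand-ins only; a real system would use a full stopword list and a proper stemmer such as Porter's. The names `preprocess` and `crude_stem` are hypothetical.

```python
import re

URL_RE = re.compile(r"https?://\S+")
# Illustrative subset only; a real stopword list is much longer.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "on", "for", "and", "its"}

def crude_stem(word: str) -> str:
    """Strip a few common English suffixes (stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(tweet: str):
    """Return (tokens, urls): URLs pulled out, stopwords removed, tokens stemmed."""
    urls = URL_RE.findall(tweet)
    text = URL_RE.sub(" ", tweet).lower()
    tokens = [t for t in re.findall(r"[a-z0-9#@]+", text) if t not in STOPWORDS]
    return [crude_stem(t) for t in tokens], urls

tokens, urls = preprocess("Volvo is recalling cars in the US http://example.com/recall")
print(tokens)  # ['volvo', 'recall', 'car', 'us']
print(urls)    # ['http://example.com/recall']
```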
Algorithm
Offline Process
1. All tweets in the training data are grouped by their subtopic.
2. Features are extracted from every tweet in a subtopic and aggregated to
form that subtopic's features.
3. The features of all subtopics are stored in a Lucene index, with each
feature type in a separate field.
4. Features common to two or more subtopics are removed, as are features
directly related to the entity name.
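The offline steps can be sketched with a plain Python inverted index standing in for Lucene (one dictionary per feature field). This is an illustration under assumptions: `build_index`, the field names and the substring test used to spot entity-name features are hypothetical choices, and the filtering of step 4 is applied while the index is built rather than afterwards.

```python
from collections import defaultdict

def build_index(training, entity_name):
    """training: list of (subtopic, {field: set_of_features}) pairs.

    Returns {field: {feature: subtopic}} after dropping features that occur in
    two or more subtopics or that mention the entity name.
    """
    # Steps 1-2: group tweets by subtopic and union their features per field.
    subtopic_features = defaultdict(lambda: defaultdict(set))
    for subtopic, fields in training:
        for field_name, feats in fields.items():
            subtopic_features[subtopic][field_name] |= feats

    # Record which subtopics own each feature, per field.
    owners = defaultdict(lambda: defaultdict(set))  # field -> feature -> subtopics
    for subtopic, fields in subtopic_features.items():
        for field_name, feats in fields.items():
            for feat in feats:
                owners[field_name][feat].add(subtopic)

    # Steps 3-4: keep only features unique to one subtopic and unrelated to the entity.
    entity = entity_name.lower()
    index = {}
    for field_name, postings in owners.items():
        index[field_name] = {
            feat: next(iter(subs))
            for feat, subs in postings.items()
            if len(subs) == 1 and entity not in feat.lower()
        }
    return index

training = [
    ("recall", {"categories": {"Vehicle recalls", "Volvo Cars"}}),
    ("recall", {"categories": {"Airbags"}}),
    ("racing", {"categories": {"Sailing", "Volvo Cars", "Airbags"}}),
]
index = build_index(training, "Volvo")
print(sorted(index["categories"].items()))
# [('Sailing', 'racing'), ('Vehicle recalls', 'recall')]
```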
Algorithm
Online Procedure
1. Phase 1: The category features of the test tweet are searched in the Lucene
index, and the top 10 subtopics are retrieved.
2. Phase 2: The tweet concepts and URL concepts of the test tweet are compared
with those of the top 10 subtopics from Phase 1 using the Wikipedia Miner
similarity measure, and the top 5 subtopics are kept.
3. Phase 3: Named entities, key phrases and event phrases are compared with those
of the top 5 subtopics from Phase 2 using WordNet similarity measures; hashtags
are compared by direct intersection. The best of the 5 subtopics is then chosen.
Alternatively, all of these comparisons can be combined to select the best subtopic.
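The three-phase cascade can be sketched as successive filtering steps. This is a minimal sketch under stated assumptions: subtopic profiles are dictionaries of feature sets; a category-overlap count stands in for the Lucene search in Phase 1; `concept_sim` and `phrase_sim` are hypothetical placeholders for the Wikipedia Miner and WordNet measures (both default to exact-match Jaccard overlap here); and adding phrase similarity to the hashtag-intersection size in Phase 3 is our own choice, not the authors'.

```python
def jaccard(a: set, b: set) -> float:
    """Exact-match overlap, used here as a placeholder similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def top(candidates, score, k):
    """Return the k highest-scoring candidates (ties keep their input order)."""
    return sorted(candidates, key=score, reverse=True)[:k]

def detect_subtopic(tweet, profiles, concept_sim=jaccard, phrase_sim=jaccard):
    """Three-phase cascade; assumes at least one subtopic profile.

    tweet and each profile map feature type -> set of strings.
    """
    # Phase 1: category overlap (stand-in for the Lucene search), keep top 10.
    phase1 = top(profiles, lambda s: len(tweet["categories"] & profiles[s]["categories"]), 10)

    # Phase 2: compare tweet + URL concepts, keep top 5.
    concepts = tweet["tweet_concepts"] | tweet["url_concepts"]
    phase2 = top(phase1, lambda s: concept_sim(
        concepts, profiles[s]["tweet_concepts"] | profiles[s]["url_concepts"]), 5)

    # Phase 3: phrase similarity plus direct hashtag intersection; keep the best.
    phrases = tweet["entities_events"] | tweet["key_phrases"]

    def phase3_score(s):
        p = profiles[s]
        return (phrase_sim(phrases, p["entities_events"] | p["key_phrases"])
                + len(tweet["hashtags"] & p["hashtags"]))

    return top(phase2, phase3_score, 1)[0]

FIELDS = ("categories", "tweet_concepts", "url_concepts",
          "entities_events", "key_phrases", "hashtags")

def make(**kw):
    """Build a feature dict with every field present (empty by default)."""
    return {f: set(kw.get(f, ())) for f in FIELDS}

profiles = {
    "recall": make(categories={"Vehicle recalls"}, tweet_concepts={"Airbag"},
                   key_phrases={"safety defect"}, hashtags={"recall"}),
    "racing": make(categories={"Sailing"}, tweet_concepts={"Volvo Ocean Race"},
                   hashtags={"vor"}),
}
tweet = make(categories={"Vehicle recalls"}, tweet_concepts={"Airbag"}, hashtags={"recall"})
print(detect_subtopic(tweet, profiles))  # recall
```

Passing real similarity functions through `concept_sim` and `phrase_sim` keeps the cascade itself independent of which measures are used.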
Experiments
➢ The RepLab 2013 dataset was used. It contains tweets for 61 entities; each
entity has about 700 tweets for training and 1,500 tweets for testing.
➢ For evaluation, we use Reliability, Sensitivity and F-measure.
The results we obtained for the entity “Volvo” are:
Sensitivity: 0.37, Reliability: 0.39, F-measure: 0.38
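As a quick consistency check, the F-measure used with Reliability (R) and Sensitivity (S) is their harmonic mean, F = 2RS / (R + S). The helper name `f_measure` is hypothetical; plugging in the Volvo figures reproduces the reported 0.38.

```python
def f_measure(reliability: float, sensitivity: float) -> float:
    """Harmonic mean of Reliability and Sensitivity (0.0 if both are zero)."""
    total = reliability + sensitivity
    return 2 * reliability * sensitivity / total if total else 0.0

print(round(f_measure(0.39, 0.37), 2))  # 0.38
```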
Future Work
➢ An SVM classifier could be built to learn which features should be given
more weight when classifying tweets.
➢ Its input vectors would have one dimension for each feature type of each
subtopic, with the corresponding similarity measures as the values; the
labelled subtopic is the class label.
➢ In the testing phase, similar vectors can be created for test tweets to
predict their subtopics.
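The proposed input vectors could be laid out as one dimension per (subtopic, feature type) pair, each holding the tweet's similarity to that subtopic for that feature type. This sketch only builds such vectors; training the SVM itself is not shown. `build_vector` is a hypothetical name, and Jaccard overlap stands in for the real similarity measures.

```python
FEATURE_TYPES = ("categories", "tweet_concepts", "url_concepts",
                 "entities_events", "key_phrases", "hashtags")

def jaccard(a: set, b: set) -> float:
    """Placeholder similarity: exact-match overlap, 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def build_vector(tweet, profiles, subtopic_order):
    """Flatten similarities into len(subtopic_order) * len(FEATURE_TYPES) values."""
    return [
        jaccard(tweet.get(ft, set()), profiles[s].get(ft, set()))
        for s in subtopic_order
        for ft in FEATURE_TYPES
    ]

profiles = {"recall": {"hashtags": {"recall"}}, "racing": {"hashtags": {"vor"}}}
vec = build_vector({"hashtags": {"recall", "volvo"}}, profiles, ["recall", "racing"])
print(len(vec), vec[5], vec[11])  # 12 0.5 0.0
```

Fixing `subtopic_order` keeps every dimension meaning the same thing across training and test vectors.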
