Sensing Topics in Tweets
Amar Budhiraja
Manas Tewari
Vikram Ahuja
Team - 39
Problem Statement
Identifying topics in Tweets is hard because Tweets pose the following
challenges:
A Tweet is very short
The data is noisy
Spelling is inconsistent (use of forms like “U”, “em”, etc.)
We plan on comparing three topic detection schemes for Twitter from the available
literature.
The goal of the project is to understand the trade-offs among the three
approaches when it comes to topic detection in Tweets.
Dataset
Tweets were extracted for three topics:
FA Cup 2015 (~2000)
US Elections (~2000)
SUPER TUESDAYS (~2000)
TOTAL - 6231
Classifiers Used
● LDA (unigrams only)
● KNN (computed for unigrams, bigrams and trigrams)
● K-Means (unigrams only)
● Naive Bayes (computed for unigrams, bigrams and trigrams)
● Hashtag mirroring and proper-noun boosting applied for every
classifier.
Latent Dirichlet Allocation
1. Each document is modelled as a probability distribution over topics, which in
turn are distributions over words.
2. Every document is treated as a bag of terms, which are the only observed variables
in the model.
3. LDA requires the expected number of topics k as input.
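The three points above describe LDA's generative view of a document. A minimal sketch of that view, assuming the per-document topic distribution and the per-topic word distributions are already known, might look like the following. The function name `generate_document` and the toy probabilities are illustrative assumptions; full LDA also draws these distributions from Dirichlet priors and infers them from the observed words, which this sketch does not attempt.

```python
import random

def generate_document(doc_topic_probs, topic_word_probs, length, seed=0):
    """Sample a bag of words the way LDA assumes a document is produced.

    doc_topic_probs:  {topic: P(topic | document)}
    topic_word_probs: {topic: {word: P(word | topic)}}
    Both are hypothetical inputs; the number of topics k is len(topic_word_probs).
    """
    rng = random.Random(seed)
    topics = list(doc_topic_probs)
    words = []
    for _ in range(length):
        # Step 1: pick a topic from the document's topic distribution.
        topic = rng.choices(topics, weights=[doc_topic_probs[t] for t in topics])[0]
        # Step 2: pick a word from that topic's word distribution.
        vocab = list(topic_word_probs[topic])
        word = rng.choices(vocab, weights=[topic_word_probs[topic][w] for w in vocab])[0]
        words.append(word)
    return words

# Toy example with k = 2 topics (made-up probabilities).
topic_word_probs = {
    "football": {"cup": 0.5, "final": 0.3, "goal": 0.2},
    "politics": {"vote": 0.6, "primary": 0.4},
}
doc = generate_document({"football": 1.0, "politics": 0.0}, topic_word_probs, length=5)
```

Only the sampled words are observed; the topic chosen for each word stays hidden, which is what inference has to recover.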
Naive Bayes Classifier
Assumes independence among features
Each Tweet is represented as a bag of words
No smoothing was applied to the Bayes classifier; words that were not seen
earlier were ignored.
For each Tweet the following Bayes equation was modelled based on its words
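The equation itself is not preserved in this text. As a hedged illustration, the standard naive Bayes rule picks the class c that maximises P(c) multiplied by the product of P(w | c) over the words w of the Tweet. The sketch below follows that rule with no smoothing and skips words absent from the training vocabulary, as described above. The names `train_nb` and `predict_nb` and the toy Tweets are assumptions for illustration, not the team's code. One further assumption: a word that is in the vocabulary but never occurred in a given class gives that class probability zero.

```python
import math
from collections import Counter, defaultdict

def train_nb(labelled_tweets):
    """labelled_tweets: list of (tokens, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in labelled_tweets:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict_nb(tokens, model):
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        # log P(c), estimated from class frequencies in the training data
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokens:
            if word not in vocab:
                continue  # word never seen in training: ignored
            count = word_counts[label][word]
            if count == 0:
                score = -math.inf  # no smoothing: P(w | c) = 0 rules the class out
                break
            score += math.log(count / total_words)  # log P(w | c)
        if best_label is None or score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training data (made up for illustration).
model = train_nb([
    (["fa", "cup", "final"], "football"),
    (["arsenal", "cup"], "football"),
    (["election", "vote"], "politics"),
    (["vote", "primary"], "politics"),
])
```

Working in log space avoids numerical underflow when many small probabilities are multiplied.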
Clustering using K-Means
● Clustering algorithms aim to find the natural grouping of a
given set of elements, given a notion of distance between each
pair of elements.
● We used cosine similarity between each pair of Tweets to
perform clustering
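A minimal sketch of K-means driven by cosine similarity, assuming each Tweet is already a sparse term-weight dictionary, might look like the following. The helper names (`cosine`, `mean_vector`, `kmeans_cosine`) are illustrative. Passing the initial centroids explicitly is an assumption made to keep the example deterministic; the slides do not say how centroids were initialised.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {term: weight}."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def mean_vector(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = {}
    for vec in vectors:
        for t, w in vec.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(vectors) for t, w in total.items()}

def kmeans_cosine(vectors, initial_centroids, iterations=10):
    centroids = [dict(c) for c in initial_centroids]
    assignment = []
    for _ in range(iterations):
        # Assign each vector to the centroid it is most similar to.
        assignment = [
            max(range(len(centroids)), key=lambda j: cosine(v, centroids[j]))
            for v in vectors
        ]
        # Recompute each centroid as the mean of its members.
        for j in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignment) if a == j]
            if members:
                centroids[j] = mean_vector(members)
    return assignment

# Toy vectors (made up); two clusters here, whereas the project used three.
vectors = [
    {"cup": 1.0, "final": 1.0},
    {"cup": 1.0, "arsenal": 1.0},
    {"vote": 1.0, "primary": 1.0},
    {"vote": 1.0, "election": 1.0},
]
labels = kmeans_cosine(vectors, [vectors[0], vectors[2]])
```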
K-Nearest Neighbor
For each incoming point, KNN computes its distance to every
data point in the training set and selects the K closest neighbours
Usually, a majority vote among these neighbours is used to assign a class to the incoming
point
An odd value of K is preferred
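The procedure above can be sketched in a few lines. This is a minimal illustration, assuming dense numeric feature tuples and Euclidean distance (`math.dist`, available from Python 3.8); the function name `knn_predict` and the toy points are hypothetical, and the project itself may well have used a different distance on TF*IDF vectors.

```python
import math
from collections import Counter

def knn_predict(query, training, k=3):
    """training: list of (feature_tuple, label) pairs."""
    # Sort all training points by distance to the query and keep the k closest.
    neighbours = sorted(training, key=lambda item: math.dist(query, item[0]))[:k]
    # Majority vote over the labels of those neighbours.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Toy data (made up for illustration).
training = [
    ((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
    ((5, 5), "b"), ((5, 6), "b"),
]
near_a = knn_predict((0.2, 0.1), training, k=3)
near_b = knn_predict((5, 5.5), training, k=3)
```

Because every query is compared against the whole training set, prediction cost grows linearly with the number of training Tweets.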
Our Approaches
● Stop word removal
● Boosting by duplicating hashtags and proper nouns
● Representing Tweets as numeric vectors using TF*IDF
● The TF*IDF matrix is extremely sparse: 14000*6000 for unigrams,
113000*6000 for bigrams and 189000*6000 for trigrams.
● Checking accuracy over different ratios of training to test data (80-20, 60-40,
40-60)
● Cosine similarity used for K-means, with the number of clusters set to 3
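The preprocessing steps listed above could be sketched as follows. Everything here is a hedged illustration, not the team's pipeline: the stop-word list is a tiny placeholder, "proper noun" is approximated as a capitalised word that is not the first token (a real system would use a part-of-speech tagger), and the TF*IDF variant shown (relative term frequency times log(N / document frequency)) is one common choice among several.

```python
import math
from collections import Counter

# Illustrative stop-word list; a real list would be much longer.
STOP_WORDS = {"the", "a", "an", "is", "in", "of", "to", "and", "for", "on"}

def boost_tokens(tweet):
    """Tokenise a Tweet, drop stop words, and duplicate hashtags and proper nouns."""
    tokens = []
    for position, raw in enumerate(tweet.split()):
        word = raw.strip(".,!?:;\"'")
        if not word:
            continue
        if word.startswith("#") and len(word) > 1:
            term = word[1:].lower()
            tokens += [term, term]                   # hashtag counted twice
        elif word.lower() in STOP_WORDS:
            continue
        elif position > 0 and word[0].isupper():
            tokens += [word.lower(), word.lower()]   # crude proper-noun guess
        else:
            tokens.append(word.lower())
    return tokens

def ngrams(tokens, n):
    """Contiguous n-grams, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tfidf(documents):
    """documents: list of token lists. Returns one sparse {term: weight} dict per document."""
    n_docs = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

tokens = boost_tokens("Arsenal win the #FACup")
bigrams = ngrams(tokens, 2)
vectors = tfidf([["cup", "final"], ["cup", "vote"]])
```

Duplicating a token raises its term frequency, which is how boosting gives hashtags and proper nouns more weight in the TF*IDF vectors. Storing each vector as a dictionary of non-zero entries is one way to cope with the sparsity noted above.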
Results
[Result plots not recovered. Slides covered: KNN (unigrams), KNN (bigrams), KNN (trigrams), K-Means (for 3 clusters), Naive Bayes, LDA]
Conclusion
● Boosting proved to be a valuable modification to the data on almost all occasions.
● In KNN, for unigrams, bigrams and trigrams alike, the accuracy
increases up to a specific value of K and then decreases for larger values of K.
● Trigrams gave better accuracy than bigrams, and bigrams gave better accuracy than
unigrams, in all cases.
● K-means gave the lowest accuracy, as it is an unsupervised method.
● LDA performance increased drastically with boosting.
● Trigrams provide the best results, as most of the Tweets are semantically similar, and
syntactically similar in the case of Retweets.
Thank you!
Questions?

  • 21. Conclusion ● Boosting proved too be a valuable modification to data on almost all occasions. ● In KNN we observe that for all the unigram, bigram and trigram the accuracy increases till a specific value of K and then decreases for large values of K. ● Trigram gave better results than bigrams and bigrams gives a better accuracy than unigram for all the cases. ● K-means gave least accuracy as it is an unsupervised method ● LDA performance increased drastically by boosting ● Trigrams provide the best result as most of the tweets are semantically similar and syntactically similar in case of Retweets.