SlideShare a Scribd company logo
1 of 9
Project : Sub-event
detection on Social Media
Codebase:
https://github.com/pallavshah/TwitterSubeventDetector
Pallav Shah Akshay Joshi
Rajat Bhardwaj Ravneet Singh Kathuria
The Project
• Make a timeline/summary of events from a corpus of tweets
commenting on the event.
• The corpus consists of tweets from a specific domain talking about a
single major event.
• The objective of the project is to extract sub-events within the event.
• Summary will be short description about the sub event.
Our Approach
We followed a two-step approach:
• Sub-event Detection: The first step is to identify if and when a sub-
event has occurred and if it has, what tweets comprise the sub-event
• Tweet Selection: The second step is to choose a representative tweet
that describes the sub-event appropriately.
The aggregation of these two processes will in turn provide a set of
tweets as a summary of the event.
Part1: Detecting the sub-
event
Sub-event detection is done by finding the distance measure between
different tweets of same event.
• Dictionary of words: The parsed data is used to create a dictionary
which stores relevant words and its count in the corpus.
• Vector for each tweet: The generated dictionary and a second parse
over the parsed data are used to get a single sparse vector
corresponding to each tweet. This vector contains the id and count of
each word present in the tweet.
Part 1: Detecting the sub-
event(continued)
• The sub-event detector module:
 The module uses LSHash Library of Python to find similarity distance
between various tweets. Each tweet is analyzed and compared with the
existing group of similar tweets.
If the tweet matches to any of the group with a high threshold, the tweet is
assumed to belong to that group and added to it.
Otherwise, a new group is created with that tweet as the representative
tweet of the group. In the end all the tweets as thus partitioned into groups
(or clusters) representing different sub-events.
Part 2: Summarization of Sub-
event
• Term Frequency Inverse Document Frequency: A statistical weighting
technique that assigns each term within a document a weight that
reflects the term’s saliency within the document. The TF-IDF value is
composed of two primary parts.
The term frequency component (TF) assigns more weight to words that occur
frequently within a document because important words are often repeated.
The inverse document frequency component (IDF) compensates for the fact
that some words such as common stop words are frequent.
Normalization of tweets: The tweets are normalized to prevent bias towards
larger tweets.
System Block Diagram
Technologies Used
We have used the following python libraries:
• LSHash: https://pypi.python.org/pypi/lshash/0.0.3dev
• Gensim: http://radimrehurek.com/gensim/
Dataset
We used Snow dataset containing tweets of 2012 US General Elections.
Experiments and Results
• Tested on the 2012 US General Elections tweets data set from SNOW
2014.
• Results bore around 60% accuracy as compared to manual evaluation
of the tweets data.

More Related Content

Similar to Twitter Sub-event Detection Project Presentation

Group-13 Project 15 Sub event detection on social media
Group-13 Project 15 Sub event detection on social mediaGroup-13 Project 15 Sub event detection on social media
Group-13 Project 15 Sub event detection on social mediaAhmedali Durga
 
Real time sentiment analysis of twitter feeds with the NASDAQ index
Real time sentiment analysis of twitter feeds with the NASDAQ indexReal time sentiment analysis of twitter feeds with the NASDAQ index
Real time sentiment analysis of twitter feeds with the NASDAQ indexEric Tham
 
Tweet Summarization and Segmentation: A Survey
Tweet Summarization and Segmentation: A SurveyTweet Summarization and Segmentation: A Survey
Tweet Summarization and Segmentation: A Surveyvivatechijri
 
Twitter as a personalizable information service ii
Twitter as a personalizable information service iiTwitter as a personalizable information service ii
Twitter as a personalizable information service iiKan-Han (John) Lu
 
Tweet Segmentation and Its Application to Named Entity Recognition
Tweet Segmentation and Its Application to Named Entity RecognitionTweet Segmentation and Its Application to Named Entity Recognition
Tweet Segmentation and Its Application to Named Entity Recognition1crore projects
 
Tweet analyzer web applicaion
Tweet analyzer web applicaionTweet analyzer web applicaion
Tweet analyzer web applicaionPrathameshSankpal
 
IRJET- A Survey on Trend Analysis on Twitter for Predicting Public Opinion on...
IRJET- A Survey on Trend Analysis on Twitter for Predicting Public Opinion on...IRJET- A Survey on Trend Analysis on Twitter for Predicting Public Opinion on...
IRJET- A Survey on Trend Analysis on Twitter for Predicting Public Opinion on...IRJET Journal
 
Event summarization using tweets
Event summarization using tweetsEvent summarization using tweets
Event summarization using tweetsmoresmile
 
Tweet segmentation and its application to named entity recognition
Tweet segmentation and its application to named entity recognitionTweet segmentation and its application to named entity recognition
Tweet segmentation and its application to named entity recognitionieeepondy
 
SubTopic Detection of Tweets Related to an Entity
SubTopic Detection of Tweets Related to an EntitySubTopic Detection of Tweets Related to an Entity
SubTopic Detection of Tweets Related to an EntityAnkita Kumari
 
final_nlp
final_nlpfinal_nlp
final_nlpaphex34
 
Topic Evolutionary Tweet Stream Clustering Algorithm and TCV Rank Summarization
Topic Evolutionary Tweet Stream Clustering Algorithm and TCV Rank SummarizationTopic Evolutionary Tweet Stream Clustering Algorithm and TCV Rank Summarization
Topic Evolutionary Tweet Stream Clustering Algorithm and TCV Rank SummarizationIJERA Editor
 
Groundhog Day: Near-Duplicate Detection on Twitter
Groundhog Day: Near-Duplicate Detection on Twitter Groundhog Day: Near-Duplicate Detection on Twitter
Groundhog Day: Near-Duplicate Detection on Twitter Ke Tao
 
Questions about questions
Questions about questionsQuestions about questions
Questions about questionsmoresmile
 
Python report on twitter sentiment analysis
Python report on twitter sentiment analysisPython report on twitter sentiment analysis
Python report on twitter sentiment analysisAntaraBhattacharya12
 
Sentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSumit Raj
 
Detecting Trends Through Twitter Stream v2
Detecting Trends Through Twitter Stream v2Detecting Trends Through Twitter Stream v2
Detecting Trends Through Twitter Stream v2The Night's Watch
 
Twitter Search Architecture
Twitter Search Architecture Twitter Search Architecture
Twitter Search Architecture Ramez Al-Fayez
 
Comparative analysis for_ddp_frameworks
Comparative analysis for_ddp_frameworksComparative analysis for_ddp_frameworks
Comparative analysis for_ddp_frameworksElenaEtchemendy1
 

Similar to Twitter Sub-event Detection Project Presentation (20)

Group-13 Project 15 Sub event detection on social media
Group-13 Project 15 Sub event detection on social mediaGroup-13 Project 15 Sub event detection on social media
Group-13 Project 15 Sub event detection on social media
 
Real time sentiment analysis of twitter feeds with the NASDAQ index
Real time sentiment analysis of twitter feeds with the NASDAQ indexReal time sentiment analysis of twitter feeds with the NASDAQ index
Real time sentiment analysis of twitter feeds with the NASDAQ index
 
Tweet Summarization and Segmentation: A Survey
Tweet Summarization and Segmentation: A SurveyTweet Summarization and Segmentation: A Survey
Tweet Summarization and Segmentation: A Survey
 
Twitter as a personalizable information service ii
Twitter as a personalizable information service iiTwitter as a personalizable information service ii
Twitter as a personalizable information service ii
 
Ire major project
Ire major projectIre major project
Ire major project
 
Tweet Segmentation and Its Application to Named Entity Recognition
Tweet Segmentation and Its Application to Named Entity RecognitionTweet Segmentation and Its Application to Named Entity Recognition
Tweet Segmentation and Its Application to Named Entity Recognition
 
Tweet analyzer web applicaion
Tweet analyzer web applicaionTweet analyzer web applicaion
Tweet analyzer web applicaion
 
IRJET- A Survey on Trend Analysis on Twitter for Predicting Public Opinion on...
IRJET- A Survey on Trend Analysis on Twitter for Predicting Public Opinion on...IRJET- A Survey on Trend Analysis on Twitter for Predicting Public Opinion on...
IRJET- A Survey on Trend Analysis on Twitter for Predicting Public Opinion on...
 
Event summarization using tweets
Event summarization using tweetsEvent summarization using tweets
Event summarization using tweets
 
Tweet segmentation and its application to named entity recognition
Tweet segmentation and its application to named entity recognitionTweet segmentation and its application to named entity recognition
Tweet segmentation and its application to named entity recognition
 
SubTopic Detection of Tweets Related to an Entity
SubTopic Detection of Tweets Related to an EntitySubTopic Detection of Tweets Related to an Entity
SubTopic Detection of Tweets Related to an Entity
 
final_nlp
final_nlpfinal_nlp
final_nlp
 
Topic Evolutionary Tweet Stream Clustering Algorithm and TCV Rank Summarization
Topic Evolutionary Tweet Stream Clustering Algorithm and TCV Rank SummarizationTopic Evolutionary Tweet Stream Clustering Algorithm and TCV Rank Summarization
Topic Evolutionary Tweet Stream Clustering Algorithm and TCV Rank Summarization
 
Groundhog Day: Near-Duplicate Detection on Twitter
Groundhog Day: Near-Duplicate Detection on Twitter Groundhog Day: Near-Duplicate Detection on Twitter
Groundhog Day: Near-Duplicate Detection on Twitter
 
Questions about questions
Questions about questionsQuestions about questions
Questions about questions
 
Python report on twitter sentiment analysis
Python report on twitter sentiment analysisPython report on twitter sentiment analysis
Python report on twitter sentiment analysis
 
Sentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSentiment Analysis of Twitter Data
Sentiment Analysis of Twitter Data
 
Detecting Trends Through Twitter Stream v2
Detecting Trends Through Twitter Stream v2Detecting Trends Through Twitter Stream v2
Detecting Trends Through Twitter Stream v2
 
Twitter Search Architecture
Twitter Search Architecture Twitter Search Architecture
Twitter Search Architecture
 
Comparative analysis for_ddp_frameworks
Comparative analysis for_ddp_frameworksComparative analysis for_ddp_frameworks
Comparative analysis for_ddp_frameworks
 

Recently uploaded

Gaps, Issues and Challenges in the Implementation of Mother Tongue Based-Mult...
Gaps, Issues and Challenges in the Implementation of Mother Tongue Based-Mult...Gaps, Issues and Challenges in the Implementation of Mother Tongue Based-Mult...
Gaps, Issues and Challenges in the Implementation of Mother Tongue Based-Mult...marjmae69
 
miladyskindiseases-200705210221 2.!!pptx
miladyskindiseases-200705210221 2.!!pptxmiladyskindiseases-200705210221 2.!!pptx
miladyskindiseases-200705210221 2.!!pptxCarrieButtitta
 
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...NETWAYS
 
The Ten Facts About People With Autism Presentation
The Ten Facts About People With Autism PresentationThe Ten Facts About People With Autism Presentation
The Ten Facts About People With Autism PresentationNathan Young
 
Event 4 Introduction to Open Source.pptx
Event 4 Introduction to Open Source.pptxEvent 4 Introduction to Open Source.pptx
Event 4 Introduction to Open Source.pptxaryanv1753
 
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...NETWAYS
 
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...NETWAYS
 
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...NETWAYS
 
James Joyce, Dubliners and Ulysses.ppt !
James Joyce, Dubliners and Ulysses.ppt !James Joyce, Dubliners and Ulysses.ppt !
James Joyce, Dubliners and Ulysses.ppt !risocarla2016
 
PHYSICS PROJECT BY MSC - NANOTECHNOLOGY
PHYSICS PROJECT BY MSC  - NANOTECHNOLOGYPHYSICS PROJECT BY MSC  - NANOTECHNOLOGY
PHYSICS PROJECT BY MSC - NANOTECHNOLOGYpruthirajnayak525
 
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...Salam Al-Karadaghi
 
Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...
Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...
Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...NETWAYS
 
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdfOpen Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdfhenrik385807
 
NATIONAL ANTHEMS OF AFRICA (National Anthems of Africa)
NATIONAL ANTHEMS OF AFRICA (National Anthems of Africa)NATIONAL ANTHEMS OF AFRICA (National Anthems of Africa)
NATIONAL ANTHEMS OF AFRICA (National Anthems of Africa)Basil Achie
 
Genshin Impact PPT Template by EaTemp.pptx
Genshin Impact PPT Template by EaTemp.pptxGenshin Impact PPT Template by EaTemp.pptx
Genshin Impact PPT Template by EaTemp.pptxJohnree4
 
Simulation-based Testing of Unmanned Aerial Vehicles with Aerialist
Simulation-based Testing of Unmanned Aerial Vehicles with AerialistSimulation-based Testing of Unmanned Aerial Vehicles with Aerialist
Simulation-based Testing of Unmanned Aerial Vehicles with AerialistSebastiano Panichella
 
Dutch Power - 26 maart 2024 - Henk Kras - Circular Plastics
Dutch Power - 26 maart 2024 - Henk Kras - Circular PlasticsDutch Power - 26 maart 2024 - Henk Kras - Circular Plastics
Dutch Power - 26 maart 2024 - Henk Kras - Circular PlasticsDutch Power
 
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...Krijn Poppe
 
Philippine History cavite Mutiny Report.ppt
Philippine History cavite Mutiny Report.pptPhilippine History cavite Mutiny Report.ppt
Philippine History cavite Mutiny Report.pptssuser319dad
 
SBFT Tool Competition 2024 -- Python Test Case Generation Track
SBFT Tool Competition 2024 -- Python Test Case Generation TrackSBFT Tool Competition 2024 -- Python Test Case Generation Track
SBFT Tool Competition 2024 -- Python Test Case Generation TrackSebastiano Panichella
 

Recently uploaded (20)

Gaps, Issues and Challenges in the Implementation of Mother Tongue Based-Mult...
Gaps, Issues and Challenges in the Implementation of Mother Tongue Based-Mult...Gaps, Issues and Challenges in the Implementation of Mother Tongue Based-Mult...
Gaps, Issues and Challenges in the Implementation of Mother Tongue Based-Mult...
 
miladyskindiseases-200705210221 2.!!pptx
miladyskindiseases-200705210221 2.!!pptxmiladyskindiseases-200705210221 2.!!pptx
miladyskindiseases-200705210221 2.!!pptx
 
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
Open Source Camp Kubernetes 2024 | Monitoring Kubernetes With Icinga by Eric ...
 
The Ten Facts About People With Autism Presentation
The Ten Facts About People With Autism PresentationThe Ten Facts About People With Autism Presentation
The Ten Facts About People With Autism Presentation
 
Event 4 Introduction to Open Source.pptx
Event 4 Introduction to Open Source.pptxEvent 4 Introduction to Open Source.pptx
Event 4 Introduction to Open Source.pptx
 
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...
OSCamp Kubernetes 2024 | Zero-Touch OS-Infrastruktur für Container und Kubern...
 
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
 
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
OSCamp Kubernetes 2024 | SRE Challenges in Monolith to Microservices Shift at...
 
James Joyce, Dubliners and Ulysses.ppt !
James Joyce, Dubliners and Ulysses.ppt !James Joyce, Dubliners and Ulysses.ppt !
James Joyce, Dubliners and Ulysses.ppt !
 
PHYSICS PROJECT BY MSC - NANOTECHNOLOGY
PHYSICS PROJECT BY MSC  - NANOTECHNOLOGYPHYSICS PROJECT BY MSC  - NANOTECHNOLOGY
PHYSICS PROJECT BY MSC - NANOTECHNOLOGY
 
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
 
Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...
Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...
Open Source Camp Kubernetes 2024 | Running WebAssembly on Kubernetes by Alex ...
 
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdfOpen Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
 
NATIONAL ANTHEMS OF AFRICA (National Anthems of Africa)
NATIONAL ANTHEMS OF AFRICA (National Anthems of Africa)NATIONAL ANTHEMS OF AFRICA (National Anthems of Africa)
NATIONAL ANTHEMS OF AFRICA (National Anthems of Africa)
 
Genshin Impact PPT Template by EaTemp.pptx
Genshin Impact PPT Template by EaTemp.pptxGenshin Impact PPT Template by EaTemp.pptx
Genshin Impact PPT Template by EaTemp.pptx
 
Simulation-based Testing of Unmanned Aerial Vehicles with Aerialist
Simulation-based Testing of Unmanned Aerial Vehicles with AerialistSimulation-based Testing of Unmanned Aerial Vehicles with Aerialist
Simulation-based Testing of Unmanned Aerial Vehicles with Aerialist
 
Dutch Power - 26 maart 2024 - Henk Kras - Circular Plastics
Dutch Power - 26 maart 2024 - Henk Kras - Circular PlasticsDutch Power - 26 maart 2024 - Henk Kras - Circular Plastics
Dutch Power - 26 maart 2024 - Henk Kras - Circular Plastics
 
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
Presentation for the Strategic Dialogue on the Future of Agriculture, Brussel...
 
Philippine History cavite Mutiny Report.ppt
Philippine History cavite Mutiny Report.pptPhilippine History cavite Mutiny Report.ppt
Philippine History cavite Mutiny Report.ppt
 
SBFT Tool Competition 2024 -- Python Test Case Generation Track
SBFT Tool Competition 2024 -- Python Test Case Generation TrackSBFT Tool Competition 2024 -- Python Test Case Generation Track
SBFT Tool Competition 2024 -- Python Test Case Generation Track
 

Twitter Sub-event Detection Project Presentation

  • 1. Project : Sub-event detection on Social Media Codebase: https://github.com/pallavshah/TwitterSubeventDetector Pallav Shah Akshay Joshi Rajat Bhardwaj Ravneet Singh Kathuria
  • 2. The Project • Make a timeline/summary of events from a corpus of tweets commenting on the event. • The corpus consists of tweets from a specific domain talking about a single major event. • The objective of the project is to extract sub-events within the event. • Summary will be short description about the sub event.
  • 3. Our Approach We followed a two-step approach: • Sub-event Detection: The first step is to identify if and when a sub- event has occurred and if it has, what tweets comprise the sub-event • Tweet Selection: The second step is to choose a representative tweet that describes the sub-event appropriately. The aggregation of these two processes will in turn provide a set of tweets as a summary of the event.
  • 4. Part1: Detecting the sub- event Sub-event detection is done by finding the distance measure between different tweets of same event. • Dictionary of words: The parsed data is used to create a dictionary which stores relevant words and its count in the corpus. • Vector for each tweet: The generated dictionary and a second parse over the parsed data are used to get a single sparse vector corresponding to each tweet. This vector contains the id and count of each word present in the tweet.
  • 5. Part 1: Detecting the sub- event(continued) • The sub-event detector module:  The module uses LSHash Library of Python to find similarity distance between various tweets. Each tweet is analyzed and compared with the existing group of similar tweets. If the tweet matches to any of the group with a high threshold, the tweet is assumed to belong to that group and added to it. Otherwise, a new group is created with that tweet as the representative tweet of the group. In the end all the tweets as thus partitioned into groups (or clusters) representing different sub-events.
  • 6. Part 2: Summarization of Sub- event • Term Frequency Inverse Document Frequency: A statistical weighting technique that assigns each term within a document a weight that reflects the term’s saliency within the document. The TF-IDF value is composed of two primary parts. The term frequency component (TF) assigns more weight to words that occur frequently within a document because important words are often repeated. The inverse document frequency component (IDF) compensates for the fact that some words such as common stop words are frequent. Normalization of tweets: The tweets are normalized to prevent bias towards larger tweets.
  • 8. Technologies Used We have used the following python libraries: • LSHash: https://pypi.python.org/pypi/lshash/0.0.3dev • Gensim: http://radimrehurek.com/gensim/ Dataset We used Snow dataset containing tweets of 2012 US General Elections.
  • 9. Experiments and Results • Tested on the 2012 US General Elections tweets data set from SNOW 2014. • Results bore around 60% accuracy as compared to manual evaluation of the tweets data.