Twitter Sub-event Detection Project Presentation

Pallav Shah

PowerPoint presentation for the IRE Major Project: Twitter Sub-event Detection

Project: Sub-event detection on Social Media
Codebase: https://github.com/pallavshah/TwitterSubeventDetector
Pallav Shah, Akshay Joshi, Rajat Bhardwaj, Ravneet Singh Kathuria

The Project
• Build a timeline or summary of an event from a corpus of tweets commenting on it.
• The corpus consists of tweets from a specific domain that discuss a single major event.
• The objective of the project is to extract the sub-events within that event.
• The summary is a short description of each sub-event.

Our Approach
We followed a two-step approach:
• Sub-event Detection: The first step is to identify whether and when a sub-event has occurred and, if it has, which tweets make up the sub-event.
• Tweet Selection: The second step is to choose a representative tweet that describes the sub-event appropriately.
Together, these two steps yield a set of tweets that summarizes the event.

Part 1: Detecting the sub-event
Sub-events are detected by measuring the distance between tweets about the same event.
• Dictionary of words: The parsed data is used to create a dictionary that stores each relevant word and its count in the corpus.
• Vector for each tweet: The generated dictionary and a second pass over the parsed data are used to build a single sparse vector for each tweet. This vector contains the id and count of each word present in the tweet.
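
The dictionary and per-tweet vectors described above can be sketched in plain Python. This is only an illustration under stated assumptions: the whitespace tokenizer, the `min_count` filter for "relevant" words, and all function names are hypothetical and are not taken from the project's codebase.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for the real parser)."""
    return text.lower().split()


def build_dictionary(tweets, min_count=1):
    """First pass: give each relevant word an integer id and count it."""
    counts = Counter(word for tweet in tweets for word in tokenize(tweet))
    # Assumption: a word is "relevant" if it occurs at least min_count times.
    kept = sorted(word for word, count in counts.items() if count >= min_count)
    word_to_id = {word: idx for idx, word in enumerate(kept)}
    return word_to_id, counts


def tweet_to_vector(tweet, word_to_id):
    """Second pass: sparse vector as sorted (word id, count) pairs."""
    local = Counter(word for word in tokenize(tweet) if word in word_to_id)
    return sorted((word_to_id[word], count) for word, count in local.items())


tweets = ["obama wins ohio", "romney concedes", "obama obama wins"]
word_to_id, counts = build_dictionary(tweets)
vectors = [tweet_to_vector(tweet, word_to_id) for tweet in tweets]
```

Only the words that actually occur in a tweet appear in its vector, which keeps the representation small even when the dictionary is large.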

Part 1: Detecting the sub-event (continued)
• The sub-event detector module:
The module uses the LSHash library for Python to measure the similarity distance between tweets. Each tweet is analyzed and compared with the existing groups of similar tweets.
If the tweet matches any group with similarity above a high threshold, it is assumed to belong to that group and is added to it.
Otherwise, a new group is created with that tweet as the group's representative tweet. In the end, all the tweets are thus partitioned into groups (or clusters) representing different sub-events.
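
The grouping rule above can be sketched as follows. Note the substitution: this sketch compares each tweet against every group representative using exact cosine similarity, rather than using the LSHash lookup the slides mention. The threshold value of 0.7, the `{id: count}` vector layout, and the function names are assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as {id: count}."""
    dot = sum(value * b.get(key, 0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_tweets(vectors, threshold=0.7):
    """Add each tweet to its best-matching group, or start a new group."""
    groups = []  # each group: {"rep": vector, "members": [tweet indices]}
    for idx, vec in enumerate(vectors):
        best, best_sim = None, threshold
        for group in groups:
            sim = cosine(vec, group["rep"])
            if sim >= best_sim:
                best, best_sim = group, sim
        if best is None:
            # No group is similar enough: this tweet becomes a representative.
            groups.append({"rep": vec, "members": [idx]})
        else:
            best["members"].append(idx)
    return groups


vectors = [
    {0: 1, 1: 1, 2: 1},  # e.g. "obama wins ohio"
    {0: 1, 1: 1},        # e.g. "obama wins"
    {3: 1, 4: 1},        # e.g. "romney concedes"
]
groups = group_tweets(vectors)
```

Here the first two tweets share a group and the third starts its own. Exhaustive comparison grows with the number of groups, which is the cost that a locality-sensitive hashing index is meant to reduce.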

Part 2: Summarization of Sub-event
• Term Frequency-Inverse Document Frequency (TF-IDF): A statistical weighting technique that assigns each term in a document a weight reflecting the term's salience within that document. The TF-IDF value is composed of two primary parts.
The term frequency (TF) component assigns more weight to words that occur frequently within a document, because important words are often repeated.
The inverse document frequency (IDF) component compensates for the fact that some words, such as common stop words, are frequent across many documents, by giving those words less weight.
• Normalization of tweets: The tweets are normalized to prevent bias towards longer tweets.
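
One way to combine TF-IDF weighting with length normalization when picking a representative tweet is sketched below. The slides do not give the exact scoring formula, so everything here is an assumption: IDF is computed as log(N / document frequency) over the whole corpus, a tweet's score is its summed TF-IDF weight divided by its length, and the highest-scoring tweet in a group is chosen. All names are hypothetical.

```python
import math
from collections import Counter


def idf_weights(docs):
    """IDF per term: log(N / document frequency) over tokenized documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) for term, count in df.items()}


def tweet_score(doc, idf):
    """Sum of TF * IDF over the tweet's terms, divided by tweet length."""
    if not doc:
        return 0.0
    tf = Counter(doc)
    total = sum(count * idf.get(term, 0.0) for term, count in tf.items())
    return total / len(doc)  # normalization: avoids favoring long tweets


def pick_representative(docs, idf):
    """Return the index of the highest-scoring tweet in a group."""
    return max(range(len(docs)), key=lambda i: tweet_score(docs[i], idf))


corpus = [
    ["obama", "wins", "ohio"],
    ["the", "obama", "wins", "the", "vote"],
    ["romney", "concedes", "the", "race"],
    ["the", "race", "is", "over"],
]
idf = idf_weights(corpus)
group = corpus[:2]  # assume the first two tweets form one sub-event group
best = pick_representative(group, idf)
```

In this toy corpus the shorter first tweet is selected. Without dividing by length, the second tweet would have the larger raw score simply because it has more words, which is the bias the normalization step is meant to prevent.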

Technologies Used
We used the following Python libraries:
• LSHash: https://pypi.python.org/pypi/lshash/0.0.3dev
• Gensim: http://radimrehurek.com/gensim/

Dataset
We used the SNOW dataset containing tweets about the 2012 US General Elections.

Experiments and Results
• Tested on the 2012 US General Elections tweet dataset from SNOW 2014.
• The results showed around 60% accuracy when compared against a manual evaluation of the tweet data.

System Block Diagram (slide 7; diagram not reproduced in this text)
