Sarcasm Detection on Twitter
May 2016
Hao Lyu, MSIS Student
Guided by Dr. Byron Wallace
7/7/2016
Content
1. Introduction
2. Data
3. Feature Models (machine learning)
4. Experimental settings
5. Results and discussion
Why social media?
Mining and analyzing data in blogs, postings, and tweets
can:
• Support marketing and customer service activities
• Help decision making
• Enhance the products and services
• Improve the competitive advantage of companies
Twitter is one of the most important social media
resources.
It supports different types of data: text, pictures, and videos.
Sarcasm poses problems for
algorithms in U.S. election 2016
In the race for the White House in 2016, election
campaigns rely on social media analysis to help
them tailor advertising and other outreach to
particular groups of voters.
Average follower growth, Jan 26 to Feb 26
1. @realDonaldTrump 20,900
2. @BernieSanders 10,400
3. @HillaryClinton 10,300
4. @MarcoRubio 5,320
5. @TedCruz 3,950
6. @RealBenCarson 1,870
7. @JohnKasich 1,440
Stay Classy
A predictive analysis firm
examined tweets
containing the expression
“classy” and found that 72
percent of them used it in a
positive way.
But when used near the
name of Republican
presidential candidate
Donald Trump, around three
quarters of tweets citing
"classy" were negative.
What is Sarcasm on Twitter
A sarcastic tweet. The speaker is clearly not
welcoming allergy season back.
Lexical clues could provide enough knowledge to
detect sarcasm.
What is Sarcasm on Twitter
Another sarcastic tweet. The speaker actually
supports the Democrats.
Detecting whether this one is sarcastic requires
contextual information surrounding the post.
Sarcasm Detection on Twitter
The state-of-the-art method combines lexical and contextual
information to achieve robust classification performance.
In this project, I re-implement a recent method for automatic
sarcasm detection due to Bamman and Smith (2015).
I use multiple approaches to extract large amounts of data and
apply machine learning models to distinguish sarcastic from
non-sarcastic tweets.
DATA
Bamman dataset: 19534 tweets, around half of them
sarcastic and the other half non-sarcastic.
Bamman shares the IDs of those tweets.
Tweets disappear over time, because
users may quit Twitter, protect their accounts from
public viewing, or delete tweets. After data
crawling, I collected 17926 tweets.
DATA
The labels of the tweets are inferred from
self-declaration of sarcasm: a tweet is marked as
sarcastic if it contains the hashtag #sarcasm or
#sarcastic, and as non-sarcastic otherwise.
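The labeling rule above can be sketched in a few lines. This is a minimal illustration, not the original project code: the function name `label_tweet` is hypothetical, and matching the hashtags case-insensitively is an assumption.

```python
import re

# Hashtags treated as a self-declaration of sarcasm (taken from the slide).
# Case-insensitive matching is an assumption of this sketch.
SARCASM_TAGS = re.compile(r"#sarcas(?:m|tic)\b", re.IGNORECASE)


def label_tweet(text: str) -> str:
    """Return 'sarcastic' if the tweet self-declares sarcasm, else 'non-sarcastic'."""
    return "sarcastic" if SARCASM_TAGS.search(text) else "non-sarcastic"
```

For example, `label_tweet("Allergy season is back, yay #sarcasm")` yields the sarcastic label, while a tweet without either hashtag is labeled non-sarcastic.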
DATA
Historical (past) tweets and profile of the user
DATA
Audience (the user who responded to the target
tweet, or was mentioned in the target tweet)
Original Tweet (the tweet to which the target tweet
responded)
DATA EXTRACTION
Static web crawling
Dynamic web crawling
Twitter Stream API
DATA EXTRACTION
Static web crawling: Scrapes static web pages
and extracts text from the HTML markup,
e.g., a user's profile
DATA EXTRACTION
Dynamic web crawling: Focuses on the data sent from the
Twitter server when I interact with the website, e.g., when I scroll
down the page to view more tweets from a user
DATA EXTRACTION
Twitter Stream API: Makes it efficient to collect
public tweets. Twitter provides this interface to
developers through its API.
Limit: 1% of public tweets
DATA PROCESSING
Remove tweets that are:
• Not in English
• Shorter than 3 words
• Retweets
Replace URLs and user mentions.
Remove the hashtags #sarcastic and #sarcasm from the sarcastic
tweets.
Normalize profile data, e.g.:
• Time zone data are mapped to geographic areas using the Google
geocoder package.
• Numbers that Twitter displays as strings, like ‘22K’ or ‘2
Million’, are converted to a numeric type.
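The cleaning steps above could look roughly like the sketch below. It is an illustration under stated assumptions rather than the original pipeline: the function names, the `<URL>` and `<USER>` placeholder tokens, the `RT @` retweet check, and the set of recognized count suffixes are all hypothetical, and the language filter and geocoding step are left out.

```python
import re
from typing import Optional

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
TAG_RE = re.compile(r"#sarcas(?:m|tic)\b", re.IGNORECASE)

# Suffixes assumed for displayed counts; only '22K' and '2 Million' appear on the slide.
MULTIPLIERS = {"k": 1_000, "m": 1_000_000, "million": 1_000_000}


def clean_tweet(text: str) -> Optional[str]:
    """Return the normalized tweet text, or None if the tweet should be dropped."""
    if text.startswith("RT @"):              # assumed marker for a retweet
        return None
    text = TAG_RE.sub("", text)              # remove the label hashtags
    text = URL_RE.sub("<URL>", text)         # placeholder token is an assumption
    text = MENTION_RE.sub("<USER>", text)    # placeholder token is an assumption
    text = " ".join(text.split())            # collapse leftover whitespace
    if len(text.split()) < 3:                # drop tweets shorter than 3 words
        return None
    return text


def parse_count(display: str) -> int:
    """Convert a displayed count such as '22K' or '2 Million' to an integer."""
    match = re.fullmatch(r"\s*([\d.,]+)\s*([A-Za-z]*)\s*", display)
    if match is None:
        raise ValueError(f"unrecognized count: {display!r}")
    number = float(match.group(1).replace(",", ""))
    unit = match.group(2).lower()
    if unit and unit not in MULTIPLIERS:
        raise ValueError(f"unrecognized unit in count: {display!r}")
    return int(round(number * MULTIPLIERS.get(unit, 1)))
```

Whether the three-word check runs before or after the replacements is a design choice the slides do not specify; here it runs last, so placeholder tokens count as words.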
FEATURE ENGINEERING
In machine learning and pattern recognition, a feature is an
individual measurable property of a phenomenon being observed.
Similar concept: the explanatory variable used in statistical
techniques such as linear regression
FEATURE ENGINEERING
Tweet Features: Represent the lexical and grammatical
information of the target tweet.
Use only the text of the target tweet.
Author Features: Capture information about the author of
the target tweet.
Use the historical tweets and profile
information of the author.
Audience Features: Encode information about the addressee
of the tweet.
Use the historical tweets and profile information
of the audience, and the communication
between the audience and the author.
Response Features: Consider the interaction between the
target tweet and the tweet that it is
responding to.
Use the text of the original tweet.
TWEET FEATURES
Bag of Words: In this model, a text (such as a sentence or a
document) is represented as the bag (multiset) of its words,
disregarding grammar and even word order but keeping
multiplicity.
Stop words are removed.
Vocabulary:                       get  am  work  not  love  new
“Get in am at work (not) #Work” →  1    1   1     1    0     0
“Love my new work #Work”        →  0    0   1     0    1     1
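A bag-of-words vector for the two example tweets can be computed as in the sketch below. The vocabulary is the six-word list from the slide; the tokenizer is an assumption of this sketch (it lowercases the text and keeps hashtags such as `#Work` as tokens distinct from `work`), and words outside the vocabulary are simply ignored, which stands in for stop-word removal.

```python
import re
from collections import Counter

# Vocabulary as listed on the slide (stop words already removed).
VOCAB = ["get", "am", "work", "not", "love", "new"]


def tokenize(text):
    # Lowercase the text; a hashtag keeps its '#' so "#Work" is not counted as "work".
    return re.findall(r"#?[a-z]+", text.lower())


def bow_vector(text, vocab=VOCAB):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocab]


print(bow_vector("Get in am at work (not) #Work"))
print(bow_vector("Love my new work #Work"))
```

Because `Counter` keeps multiplicity, a word repeated in a tweet would get a count above 1.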
Pronunciation features: Twitter users have specific writing styles,
e.g., RT (Retweet), CHK (Check) and IIRC (If I recall correctly).
I count the number of words that have only alphabetic characters
but no vowels, and the number of words with more than three syllables.
Example: “Wow! wtf man? RT @latimes: Gov. Brown signs bills to
raise smoking age to 21, restrict e-cigarettes”
→ 2 words without vowels, 0 words with more than three syllables
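The two counts can be approximated as in the sketch below. It is not the original feature extractor: the tokenization (split on whitespace, strip surrounding punctuation, keep purely alphabetic tokens) and the syllable heuristic (one syllable per run of vowel letters, with `y` treated as a vowel) are assumptions, and the heuristic will miscount some English words.

```python
import re
import string

VOWELS = set("aeiou")


def count_syllables(word):
    # Rough heuristic (assumption): one syllable per run of vowel letters.
    return len(re.findall(r"[aeiouy]+", word.lower()))


def pronunciation_features(text):
    """Return (words without vowels, words with more than three syllables)."""
    # Strip surrounding punctuation, then keep only purely alphabetic tokens.
    tokens = [token.strip(string.punctuation) for token in text.split()]
    words = [token for token in tokens if token.isalpha()]
    no_vowels = sum(1 for w in words if not (set(w.lower()) & VOWELS))
    long_words = sum(1 for w in words if count_syllables(w) > 3)
    return no_vowels, long_words


tweet = ("Wow! wtf man? RT @latimes: Gov. Brown signs bills to "
         "raise smoking age to 21, restrict e-cigarettes")
print(pronunciation_features(tweet))  # (2, 0): "wtf" and "RT" have no vowels
```

Hyphenated tokens such as `e-cigarettes` are not purely alphabetic, so this sketch skips them.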
AUTHOR FEATURES
Author historical topics: Historical topic features are inferred
under LDA with 100 topics over all historical tweets.
LDA, short for Latent Dirichlet Allocation, is a generative
statistical model that allows sets of observations to be explained
by unobserved groups that explain why some parts of the data are
similar (Blei, Ng, and Jordan 2003).
                                         Topic 1  Topic 2  …  Topic 100
Author 1 (tweet01, tweet11, …, tweetX1)  0.3232   0.932    …  0.1522
Author 2 (tweet02, tweet12, …, tweetX2)  0.4232   0.3322   …  0.5522
Each topic is defined by multiple words, e.g.,
Topic 1: basketball, StephCurry, Stadium, fans, awesome,
champion…
AUDIENCE FEATURES
Author/Audience interactional topics: This feature measures the
similarity between the historical topics of the audience and those of the author.
I take the element-wise product of the author's and the audience's
historical topic distributions. Topics that are prominent for both
will have higher values in the product.
                           Topic 1  Topic 2  …  Topic 100
Author historical topic    0.1      0.9      …  0.1
Audience historical topic  0.5      0.9      …  0.1
Element-wise product       0.05     0.81     …  0.01
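The element-wise product is straightforward to compute. The sketch below uses plain Python lists and a hypothetical function name, `interaction_features`; the three-value inputs are the visible columns from the slide, not full 100-topic distributions.

```python
def interaction_features(author_topics, audience_topics):
    """Multiply two topic distributions position by position."""
    if len(author_topics) != len(audience_topics):
        raise ValueError("topic distributions must have the same length")
    return [a * b for a, b in zip(author_topics, audience_topics)]


author = [0.1, 0.9, 0.1]     # visible columns of the author row on the slide
audience = [0.5, 0.9, 0.1]   # visible columns of the audience row on the slide
product = interaction_features(author, audience)
print([round(value, 2) for value in product])
```

A topic gets a large product only when both the author and the audience put weight on it, which is why the second topic dominates here.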
RESPONSE FEATURES
Bag of Words: Here we use the BoW of the original tweet (the
tweet to which the target tweet is responding).
EXPERIMENTAL SETTING
Data → meaningful features
Machine learning model: Logistic Regression
Tune set → LR model → optimized parameter
Train set → LR model → fit
Test set → evaluate
Results
[Chart values: 69.1%, 73.3%, 75.7%, 75.3%, 77.6%, 78.3%]
Discussion
• Combining the lexical information of the text with contextual
information gives the best accuracy in detecting
sarcasm.
• Collecting historical tweets is very expensive in both time
and computation. Not very practical!
• I suggest using less contextual information about the author,
favoring data that can be collected easily and quickly.
E.g., the profile information of the author and the response
features are relatively effective and cost less.
Discussion
• Extract only the historical tweets around the target tweet.
Intuitively, tweets posted close in time to the target tweet
are more likely to be about the same subject.
• Randomly sampling from the historical tweets can also
produce the topic distribution while reducing cost.
Questions?
lyuhao@utexas.edu
5127183100

Hao lyu slides_sarcasm

  • 1. Sarcasm Detection on Twitter May 2016 Hao Lyu, MSIS Student Guided by Dr. Byron Wallace 7/7/2016
  • 2. Content 1. Introduction 2. Data 3. Feature Models (machine learning) 4. Experimental settings 5. Results and discussion
  • 3. Why social media? Mining and analyzing data in blogs, postings, and tweets can: • Support marketing and customer service activities • Help decision making • Enhance products and services • Improve the competitive advantage of companies Twitter is one of the most important social media resources. It supports different types of data: text, pictures, videos
  • 4. Sarcasm poses problems for algorithms in U.S. election 2016 In the race for the White House in 2016, election campaigns rely on social media analysis to help them tailor advertising and other outreach to particular groups of voters. Average follower growth, Jan 26 to Feb 26: 1. @realDonaldTrump 20,900 2. @BernieSanders 10,400 3. @HillaryClinton 10,300 4. @MarcoRubio 5,320 5. @TedCruz 3,950 6. @RealBenCarson 1,870 7. @JohnKasich 1,440
  • 5. Stay Classy A predictive analysis firm examined tweets containing the expression “classy” and found 72 percent of them used it in a positive way. But when used near the name of Republican presidential candidate Donald Trump, around three quarters of tweets citing "classy" were negative.
  • 6. What is Sarcasm on Twitter A sarcastic tweet. The speaker is clearly not welcoming allergy season back. Lexical clues could provide enough knowledge to detect sarcasm.
  • 7. What is Sarcasm on Twitter Another sarcastic tweet. The speaker actually supports the Democrats. Detecting whether this one is sarcastic requires contextual information surrounding the posting.
  • 8. Sarcasm Detection on Twitter The state-of-the-art method combines lexical and contextual information to achieve robust classification performance. In this project, I re-implement a recent method for automatic sarcasm detection due to Bamman and Smith (2015). I use multiple approaches to extract a large amount of data and apply machine learning models to distinguish sarcastic from non-sarcastic tweets.
  • 9. DATA Bamman dataset: 19534 tweets, around half sarcastic and the other half non-sarcastic. Bamman shares the IDs of those tweets. Tweets disappear over time, because users may quit Twitter, make their accounts private, or delete tweets. After data crawling, I collected 17926 tweets.
  • 10. DATA The labels of tweets are inferred from self-declaration of sarcasm: a tweet is marked as sarcastic if it contains the hashtag #sarcasm or #sarcastic, and non-sarcastic otherwise.
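The labeling rule above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the project: the function name `infer_label` and the case-insensitive matching are assumptions.

```python
import re

# Hashtags treated as self-declared sarcasm markers (taken from the slide).
# Case-insensitive matching is an assumption made for this sketch.
SARCASM_TAGS = re.compile(r"#sarcas(?:m|tic)\b", re.IGNORECASE)

def infer_label(tweet_text: str) -> int:
    """Return 1 (sarcastic) if the tweet self-declares sarcasm, else 0."""
    return 1 if SARCASM_TAGS.search(tweet_text) else 0
```

For example, `infer_label("Great, Monday again #sarcasm")` is labeled sarcastic, while a tweet without either hashtag is labeled non-sarcastic.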
  • 12. DATA Audience: the user who responded to the target tweet, or was mentioned in the target tweet. Original tweet: the tweet to which the target tweet responded.
  • 13. DATA EXTRACTION • Static web crawling • Dynamic web crawling • Twitter Stream API
  • 14. DATA EXTRACTION Static web crawling: scrapes static web pages and extracts text from the HTML markup, e.g. profile pages.
  • 15. DATA EXTRACTION Dynamic web crawling: focuses on the data sent from the Twitter server when I interact with the website, e.g. scrolling down the page to view more tweets from a user.
  • 16. DATA EXTRACTION Twitter Stream API: makes it efficient to collect public tweets. Twitter provides an interface to developers through its API. Limit: 1% of public tweets.
  • 17. DATA PROCESSING Remove tweets that are: • Not English • Shorter than 3 words • Retweets. Replace URLs and user mentions. Remove the hashtags #sarcastic and #sarcasm from the sarcastic tweets. Normalize profile data, e.g., time zone data are mapped to geographic areas using the Google geocoder package. Numbers on Twitter are displayed as strings, like ’22K’ or ‘2 Million’, and are converted to numeric type.
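A rough Python sketch of some of these processing steps follows. It is illustrative only: the placeholder tokens `<URL>` and `<USER>`, the "RT @" retweet test, the regular expressions, and the helper names `clean_tweet` and `parse_count` are assumptions, and language filtering and geocoding are left out because they need external services.

```python
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
LABEL_TAG_RE = re.compile(r"#sarcas(?:m|tic)\b", re.IGNORECASE)
# Suffixes seen in displayed counts; this mapping is an assumption.
MULTIPLIERS = {"k": 1_000, "m": 1_000_000, "million": 1_000_000}

def clean_tweet(text):
    """Return the normalized tweet text, or None if the tweet is dropped."""
    if text.startswith("RT @"):            # assumed retweet convention
        return None
    text = LABEL_TAG_RE.sub("", text)      # remove the label hashtags
    text = URL_RE.sub("<URL>", text)       # replace URLs
    text = MENTION_RE.sub("<USER>", text)  # replace user mentions
    text = " ".join(text.split())          # collapse whitespace
    if len(text.split()) < 3:              # shorter than 3 words
        return None
    return text

def parse_count(raw):
    """Convert display strings such as '22K' or '2 Million' to an int."""
    match = re.fullmatch(r"\s*([\d.,]+)\s*([A-Za-z]*)\s*", raw)
    if match is None:
        raise ValueError(f"unrecognized count: {raw!r}")
    number = float(match.group(1).replace(",", ""))
    suffix = match.group(2).lower()
    if suffix and suffix not in MULTIPLIERS:
        raise ValueError(f"unknown suffix in count: {raw!r}")
    return int(round(number * MULTIPLIERS.get(suffix, 1)))
```

Returning `None` for dropped tweets keeps filtering and normalization in one pass; a caller can simply skip `None` results.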
  • 18. FEATURE ENGINEERING In machine learning and pattern recognition, a feature is an individual measurable property of a phenomenon being observed. Similar concept: the explanatory variable used in statistical techniques such as linear regression
  • 19. FEATURE ENGINEERING Tweet Features: represent the lexical and grammatical information of the target tweet, using only the text of the target tweet. Author Features: capture information about the author of the target tweet, using historical tweets and profile information of the author. Audience Features: encode information about the addressee of the tweet, using historical tweets, profile information of the audience, and the communication between audience and author. Response Features: consider the interaction between the target tweet and the tweet that it is responding to, using the text of the original tweet.
  • 20. TWEET FEATURES Bag of Words: In this model, a text (such as a sentence or a document) is represented as the bag (multiset) of its words, disregarding grammar and even word order but keeping multiplicity. Stop words are removed. Example with the vocabulary (get, am, work, not, love, new): “Get in am at work (not) #Work” → 1 1 1 1 0 0; “Love my new work #Work” → 0 0 1 0 1 1. Pronunciation features: Twitter users have specific writing styles, e.g., RT (Retweet), CHK (Check) and IIRC (If I recall correctly). I count the number of words that contain only alphabetic characters but no vowels, and the number of words with more than three syllables. Example: “Wow! wtf man? RT @latimes: Gov. Brown signs bills to raise smoking age to 21, restrict e-cigarettes” → 2 and 0, respectively.
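The two tweet features can be sketched as below. Several details are assumptions made to reproduce the slide's examples rather than confirmed choices: the tiny stop-word list, keeping hashtags as separate tokens (so "#work" does not count as "work"), treating "y" as a vowel, skipping mentions and non-alphabetic tokens, and approximating syllables by counting vowel groups.

```python
import re
import string
from collections import Counter

STOP_WORDS = {"in", "at", "my"}                        # illustrative list only
VOCAB = ["get", "am", "work", "not", "love", "new"]    # vocabulary from the slide

def tokenize(text):
    """Lowercase tokens; hashtags stay distinct from plain words (assumption)."""
    tokens = re.findall(r"#?\w+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def bag_of_words(text, vocab=VOCAB):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocab]

def pronunciation_features(text):
    """Return (words without vowels, words with more than three syllables)."""
    no_vowel, long_words = 0, 0
    for token in text.split():
        if token.startswith(("@", "#")):
            continue                                   # skip mentions and hashtags
        word = token.strip(string.punctuation)
        if not word.isalpha():
            continue                                   # alphabetic words only
        # Syllables approximated by runs of vowels (a rough heuristic).
        vowel_groups = re.findall(r"[aeiouy]+", word.lower())
        if not vowel_groups:
            no_vowel += 1
        elif len(vowel_groups) > 3:
            long_words += 1
    return no_vowel, long_words
```

With these assumptions, the slide's two sentences map to the vectors shown on the slide, and the "Wow! wtf man? RT @latimes …" tweet yields two vowel-less words ("wtf", "RT") and no word longer than three syllables.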
  • 21. AUTHOR FEATURES Author historical topics: Historical topic features are inferred under LDA with 100 topics over all historical tweets. LDA, short for Latent Dirichlet Allocation, is a generative statistical model that allows sets of observations to be explained by unobserved groups that explain why some parts of the data are similar (Blei, Ng, and Jordan 2003). Example (Topic 1, Topic 2, …, Topic 100): Author 1 (tweet01, tweet11, …, tweetX1) → 0.3232, 0.932, …, 0.1522; Author 2 (tweet02, tweet12, …, tweetX2) → 0.4232, 0.3322, …, 0.5522. Each topic is defined by multiple words, e.g., Topic 1: basketball, StephCurry, Stadium, fans, awesome, champion…
  • 22. AUDIENCE FEATURES Author/Audience interactional topics: This feature measures the similarity of the historical topics of the audience and the author. I take the element-wise product of the author's and the audience's historical topic distributions. Topics that are prominent for both will have higher values. Example (Topic 1, Topic 2, …, Topic 100): the two historical topic distributions 0.1, 0.9, …, 0.1 and 0.5, 0.9, …, 0.1 give the element-wise product 0.05, 0.81, …, 0.01.
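The interactional topic feature is a plain element-wise product, sketched here in Python. The function name `interaction_topics` is hypothetical, and the three-element vectors in the usage note stand in for the 100-topic distributions.

```python
def interaction_topics(author_topics, audience_topics):
    """Element-wise product of two topic distributions of equal length."""
    if len(author_topics) != len(audience_topics):
        raise ValueError("topic distributions must have the same length")
    return [a * b for a, b in zip(author_topics, audience_topics)]
```

Calling `interaction_topics([0.1, 0.9, 0.1], [0.5, 0.9, 0.1])` gives values close to 0.05, 0.81 and 0.01 (up to floating-point rounding), so only the topic that is prominent for both users keeps a large value.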
  • 23. RESPONSE FEATURES Bag of Words: Here we use the BoW of the original tweet (the tweet to which the target tweet is responding).
  • 24. EXPERIMENTAL SETTING Data → meaningful features. Machine learning model: Logistic Regression. Tune set → LR model → optimized parameters. Train set → LR model → fit. Test set → evaluate.
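One possible reading of this tune/train/test protocol is sketched below in plain Python, without any machine learning library. Everything beyond "logistic regression with a tune, train and test set" is an assumption: the 20/20/60 split, tuning only the L2 regularization strength, the grid of candidate values, batch gradient descent, and all function names.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def fit_lr(X, y, l2, epochs=200, lr=0.1):
    """Batch gradient descent for L2-regularized logistic regression."""
    n, d = len(X), len(X[0])
    model = ([0.0] * d, 0.0)
    for _ in range(epochs):
        w, b = model
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(model, xi) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        new_w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        model = (new_w, b - lr * grad_b / n)
    return model

def accuracy(model, X, y):
    hits = sum((predict_proba(model, xi) >= 0.5) == bool(yi) for xi, yi in zip(X, y))
    return hits / len(y)

def take(X, y, ids):
    return [X[i] for i in ids], [y[i] for i in ids]

def run_experiment(X, y, l2_grid=(0.0, 0.01, 0.1, 1.0), seed=0):
    """Pick l2 on the tune set, fit on the train set, evaluate on the test set."""
    ids = list(range(len(X)))
    random.Random(seed).shuffle(ids)
    fifth = len(ids) // 5                      # assumed 20/20/60 split
    X_tune, y_tune = take(X, y, ids[:fifth])
    X_test, y_test = take(X, y, ids[fifth:2 * fifth])
    X_train, y_train = take(X, y, ids[2 * fifth:])
    best_l2, best_acc = None, -1.0
    for l2 in l2_grid:                         # parameter search scored on the tune set
        acc = accuracy(fit_lr(X_train, y_train, l2), X_tune, y_tune)
        if acc > best_acc:
            best_l2, best_acc = l2, acc
    final = fit_lr(X_train, y_train, best_l2)  # fit with the chosen parameter
    return {"best_l2": best_l2, "test_accuracy": accuracy(final, X_test, y_test)}
```

Here `X` is a list of numeric feature vectors (for instance the concatenated tweet, author, audience and response features) and `y` the 0/1 sarcasm labels. Keeping the test set out of both tuning and fitting is what makes the final accuracy an honest estimate.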
  • 26. Discussion • Combining lexical information of the text with contextual information yields the best accuracy in detecting sarcasm. • Collecting historical tweets is very expensive in both time and computation. Not very practical! • I suggest using less contextual information about the author, favoring data that can be collected easily and quickly. E.g., the profile information of the author and the response features are relatively effective and cost less.
  • 27. Discussion • Extract the historical tweets around the target tweet. Intuitively, tweets posted close in time to the target tweet are more likely to concern the same subject. • Random sampling from the historical tweets can also both generate the topic distribution and reduce cost.