Lexicon Mining for Semiotic Squares:
Exploding Binary Classification
Jason S. Kessler*
Data Day Texas
January 27, 2018
@jasonkessler
http://bit.ly/LexiconMining
*No, not that Jason Kessler.
By the end of the talk
You’ll know how to
programmatically create
semiotic squares like
this one.
Source: http://www.squadrati.com/2014/03/31/quadrato-semiotico-dei-wine-lovers/
Lexicon Mining for Semiotic Squares:
Exploding Binary Classification
Lexicon speculation
Bo Pang, Lillian Lee and Shivakumar Vaithyanathan. Thumbs up? Sentiment classification
using machine learning techniques. EMNLP. 2002.
Lexicon mining ≠ lexicon speculation
Lexicon mining
• Let’s do a deep dive into lexicon mining
• We’ll walk through the notebook linked below:
• http://bit.ly/LexiconMining
Visualizing Category-Associated Lexicons using Scatterplots
R: ggplot2
• Most labels are legible
• Labels can overlap points
• Dense areas can be tough to read
• Not recommended.
• Use Scattertext instead, if possible.
Monroe et al. plot
• Identifies terms with z-scores > 1.96 (significant at the 5% level in a two-sided test)
• Labels still overlap, and only a few points are labeled
Burt Monroe, Michael Colaresi and Kevin Quinn. Fightin' words: Lexical feature selection and evaluation for identifying the content of political conflict. Political Analysis. 2008.
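As a rough illustration of where z-scores like these can come from, the sketch below computes log-odds ratios with a Dirichlet prior, in the spirit of Monroe et al., using only the Python standard library. The function name `log_odds_z_scores`, the uniform prior of 0.01 per term, and the toy counts are assumptions made for illustration; the plot in the paper may rely on an informative prior and different preprocessing.

```python
import math
from collections import Counter


def log_odds_z_scores(counts_a, counts_b, prior=0.01):
    """Z-scores of the log-odds ratio of each term in corpus A vs. corpus B.

    `prior` is a uniform pseudo-count added to every term (an assumption;
    an informative prior built from background counts is also possible).
    """
    vocab = set(counts_a) | set(counts_b)
    n_a = sum(counts_a.values())
    n_b = sum(counts_b.values())
    prior_total = prior * len(vocab)
    z_scores = {}
    for term in vocab:
        y_a = counts_a.get(term, 0) + prior
        y_b = counts_b.get(term, 0) + prior
        # Difference of the smoothed log-odds of the term in each corpus.
        delta = (math.log(y_a / (n_a + prior_total - y_a))
                 - math.log(y_b / (n_b + prior_total - y_b)))
        # Approximate variance of that difference.
        variance = 1.0 / y_a + 1.0 / y_b
        z_scores[term] = delta / math.sqrt(variance)
    return z_scores


# Made-up term counts for two categories.
counts_a = Counter({"great": 30, "film": 50, "bad": 2})
counts_b = Counter({"great": 3, "film": 50, "bad": 28})
z = log_odds_z_scores(counts_a, counts_b)
significant = sorted(t for t, score in z.items() if abs(score) > 1.96)
```

With these toy counts, "great" and "bad" clear the 1.96 threshold in opposite directions, while "film", which is equally frequent in both categories, does not.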
Michael Colaresi and Zuhaib Mahmood. Do the robot: Lessons from machine learning to improve conflict forecasting. Journal of Peace Research. 2017.
R package available: https://github.com/zsmahmood89/ModelCriticism
In defense of stop words
• In times of shared crisis, “we” use increases, while “I” use decreases.
• I/we: age, social integration
• I: lying, social position relative to the hearer, testosterone
Cindy K. Chung and James W. Pennebaker. Counting Little Words in Big Data: The Psychology of Communities, Culture, and History. EASP. 2012.
Stop words in context: gender
Newman, ML; Groom, CJ; Handelman, LD; Pennebaker, JW. Gender Differences in Language Use: An Analysis of 14,000 Text Samples. 2008.

LIWC dimension                        Effect size (Cohen’s d)
All pronouns                          0.36
Present tense verbs (walk, is, be)    0.18
Feeling (touch, hold, feel)           0.17
Certainty (always, never)             0.14
Word count                            NS
Numbers                               -0.15
Prepositions                          -0.17
Words >6 letters                      -0.24
Swear words                           -0.22
Articles                              -0.24

Effect size > 0: used more by women (F); < 0: used more by men (M). MANOVA p < .001. NS: not significant.
On the original slide, bold marks dimensions made up entirely of stop words.
• Performed on a variety of language categories, including speech.
• Other studies have found that function words are the best predictors of gender.
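For readers unfamiliar with the effect-size column, here is a minimal sketch of Cohen's d with a pooled standard deviation. The function name `cohens_d` and the sample numbers are invented for illustration and are not taken from Newman et al.; the sign convention follows the slide, with positive values meaning the first group uses the feature more.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample variance."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_variance)


# Hypothetical pronoun rates (per 100 words) for two groups of writers.
rates_f = [14.0, 15.0, 16.0]
rates_m = [13.0, 14.0, 15.0]
d = cohens_d(rates_f, rates_m)  # means differ by 1, pooled SD is 1
```

Each group needs at least two observations, since `statistics.variance` computes the sample variance.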
Lexicon Mining for Semiotic Squares:
Exploding Binary Classification
Original Semiotic Square paper:
Greimas, A.J. and Francis Rastier. 1968. “The Interaction of Semiotic Constraints,” Yale French Studies. 41: 86-105.
Pelkey, Jamin. 2017. Greimas embodied: How kinesthetic opposition grounds the semiotic square. Semiotica 214(1). 277–305.
Language about movies
The domain of discourse
From which we draw concepts to create the semiotic square
Positive: positive sentiment
Negative: negative sentiment
Opposition relation
These are treated as contrastive concepts
¬Negative: language that’s not negative
Entailment relation
¬Negative includes positive sentiment
Contradiction relation
Negative and ¬Negative are mutually exclusive*
*Exceptions exist, e.g., “damning with faint praise”
Complete the square
¬Positive: language that’s not positive
Neutral term
Objective: plot descriptions
Complex term
Evaluative: reviews
Positive deixis
Aspects of well-reviewed movies
Negative deixis
Aspects of poorly-reviewed movies
Semiotic square
Domain of discourse: language about movies
Positive: positive sentiment
Negative: negative sentiment
¬Negative: language that’s not negative
¬Positive: language that’s not positive
Neutral term: objective (plot descriptions)
Complex term: evaluative (reviews)
Positive deixis: aspects of well-reviewed movies
Negative deixis: aspects of poorly-reviewed movies
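One way to keep the completed square straight is to write it down as a small data structure. The sketch below is only an illustration: the class name `SemioticSquare` and its field names are hypothetical, and the second contradiction and second entailment are the symmetric counterparts of the relations named on the slides, filled in here by assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SemioticSquare:
    positive: str
    negative: str
    not_negative: str
    not_positive: str
    complex_term: str      # positive + negative
    neutral_term: str      # not-positive + not-negative
    positive_deixis: str   # positive + not-negative
    negative_deixis: str   # negative + not-positive

    def relations(self):
        """List the logical relations as (relation, from, to) triples."""
        return [
            ("opposition", self.positive, self.negative),
            ("contradiction", self.negative, self.not_negative),
            ("contradiction", self.positive, self.not_positive),
            ("entailment", self.positive, self.not_negative),
            ("entailment", self.negative, self.not_positive),
        ]


movie_square = SemioticSquare(
    positive="Positive",
    negative="Negative",
    not_negative="¬Negative",
    not_positive="¬Positive",
    complex_term="Evaluative (reviews)",
    neutral_term="Objective (plot descriptions)",
    positive_deixis="Aspects of well-reviewed movies",
    negative_deixis="Aspects of poorly-reviewed movies",
)
relations = movie_square.relations()
```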
Lexicalizing the Semiotic Square
• Task:
  • Find language associated with each element of the square
• Domain of discourse
  • Corpus of documents pertaining to domain (e.g., movie-related text)
  • Corpus is divided into three categories
[Diagram: the semiotic square within the domain of discourse. Corners: Pos, Neg, ¬Neg, ¬Pos. Sides: Complex Term, Neutral Term, Positive Deixis, Negative Deixis.]
Lexicalizing the Semiotic Square: Corpora
• Positive documents, e.g., positive reviews
• Negative documents, e.g., negative reviews
• Neutral documents: plot descriptions
[Diagram: the semiotic square (Pos, Neg, ¬Neg, ¬Pos; Neutral Term, Complex Term, Positive Deixis, Negative Deixis) shown alongside the three corpora.]
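The three corpora can be turned into two coordinates per term: one contrasting positive with negative documents, and one contrasting reviews (positive plus negative) with plot descriptions. The sketch below uses add-one smoothed log frequency ratios as a stand-in scoring function; that choice, the helper names, and the toy documents are assumptions, not necessarily what the linked notebook does. The x-axis is oriented so that positive terms get x < 0, which is inferred from the later reference to Quadrant II.

```python
import math
from collections import Counter


def term_counts(docs):
    """Whitespace-tokenize lowercased documents and count terms."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return counts


def log_ratio(counts_a, counts_b, term, vocab_size):
    """Log ratio of add-one smoothed relative frequencies (A over B)."""
    p_a = (counts_a[term] + 1) / (sum(counts_a.values()) + vocab_size)
    p_b = (counts_b[term] + 1) / (sum(counts_b.values()) + vocab_size)
    return math.log(p_a / p_b)


def square_coordinates(positive_docs, negative_docs, neutral_docs):
    """Map each term to (x, y): x > 0 leans negative, y > 0 leans review."""
    pos = term_counts(positive_docs)
    neg = term_counts(negative_docs)
    neu = term_counts(neutral_docs)
    review = pos + neg
    vocab = set(review) | set(neu)
    size = len(vocab)
    return {
        term: (log_ratio(neg, pos, term, size),
               log_ratio(review, neu, term, size))
        for term in vocab
    }


# Toy corpora, invented for illustration.
coords = square_coordinates(
    positive_docs=["a brilliant moving film", "brilliant acting"],
    negative_docs=["a dull boring film", "boring acting"],
    neutral_docs=["a detective hunts a killer", "a detective falls in love"],
)
```

In this toy example "brilliant" lands in the upper-left region (review-like and positive), "boring" in the upper right, and "detective" below the horizontal axis with the plot descriptions.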
[Scatterplot of terms: one axis runs between POSITIVE and NEGATIVE, the other between REVIEW and PLOT.]
Positive corner of semiotic square:
- Terms nearest (by Euclidean distance) to the blue point
- Limited to Quadrant II
- Captures positive terms used disproportionately in reviews
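The selection rule for the positive corner can be sketched in a few lines. Everything concrete here is an assumption made for illustration: the term coordinates are invented, the blue point is placed at the upper-left corner (-1, 1), and Quadrant II is taken to mean x < 0 and y > 0.

```python
import math


def nearest_terms(coords, anchor, keep, k=10):
    """Return the k terms closest to `anchor` among those passing `keep`."""
    candidates = [term for term, (x, y) in coords.items() if keep(x, y)]
    return sorted(candidates, key=lambda t: math.dist(coords[t], anchor))[:k]


# Hypothetical (x, y) positions: x < 0 leans positive, y > 0 leans review.
coords = {
    "brilliant": (-0.9, 0.8),
    "moving": (-0.7, 0.6),
    "film": (-0.05, 0.7),
    "boring": (0.9, 0.8),
    "detective": (-0.1, -0.9),
    "heroine": (-0.6, -0.7),
}
blue_point = (-1.0, 1.0)  # assumed location of the corner's reference point

positive_corner = nearest_terms(
    coords,
    blue_point,
    keep=lambda x, y: x < 0 and y > 0,  # Quadrant II only
    k=2,
)
print(positive_corner)  # ['brilliant', 'moving']
```

The quadrant filter matters: "heroine" leans positive but sits with the plot descriptions, so it is excluded before distances are compared.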
Positive + Negative complex term:
- Terms nearest (by Euclidean distance) to the blue point
- Limited to Quadrants I and II
- Captures terms that are associated with reviews but not highly polar
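The complex term follows the same nearest-neighbor idea with a different reference point and a wider filter. In this sketch the blue point is assumed to sit at the top center (0, 1), Quadrants I and II are taken to mean y > 0, and the coordinates are invented for illustration.

```python
import math


def complex_term_lexicon(coords, blue_point=(0.0, 1.0), k=10):
    """Terms in the upper half-plane closest to the top-center point."""
    review_terms = [term for term, (_, y) in coords.items() if y > 0]
    return sorted(
        review_terms, key=lambda t: math.dist(coords[t], blue_point)
    )[:k]


# Hypothetical (x, y) positions: |x| small means not polar, y > 0 leans review.
coords = {
    "film": (-0.05, 0.7),
    "acting": (0.1, 0.6),
    "brilliant": (-0.9, 0.8),
    "boring": (0.9, 0.8),
    "detective": (0.0, -0.9),
}

complex_lexicon = complex_term_lexicon(coords, k=2)
print(complex_lexicon)  # ['film', 'acting']
```

Highly polar review terms such as "brilliant" and "boring" pass the quadrant filter but sit far from the top-center point, so they rank below the weakly polar terms.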
"The square is a map of logical
possibilities. As such, it can be
used as a heuristic device, and in
fact, attempting to fill it in
stimulates the imagination… the
theory of the square allows us to
see all thinking as a game, with
the logical relations as the rules
and concepts current in a given
language and culture as the
pieces."
Lithuanian stamp issued honoring the 100th birthday of Greimas.
Thank you
In defense of stop words
Function words reveal psychological traits. Person A is tentative, B is stiff, C is easygoing.
Cindy K. Chung and James W. Pennebaker. The Psychological Functions of Function Words. Social Communication. 2007.
James Clifford. The Predicament of Culture: Twentieth-Century Ethnography, Literature, and Art. Harvard Univ. Press. 1988.
Language about movies
Positive Negative
¬Negative ¬Positive
Objective
Evaluative
Aspects of well-reviewed movies
Aspects of poorly-reviewed movies
Lexicalizing the Semiotic Square
Pos Neg
¬Neg ¬Pos
Neutral Term
Complex term
Positive
Deixis
Negative
Deixis
Negative
Documents
E.g., negative reviews
Positive
Documents
E.g., positive reviews
Lexicons: compare positive and negative corpora
Vitaliy Kaurov. Finding X in Espresso: Adventures in Computational Lexicology. Wolfram Blog. 2017.
Jason Kessler. Using Scattertext and the Python NLP Ecosystem for Text Visualization. PyData. July 2017.
Josh Katz, Claire Cain Miller and Kathleen A. Flynn. The Words Men and Women Use When They Write About Love. The Upshot. The New York Times. Nov 2017.
