
Lexicon Mining for Semiotic Squares: Exploding Binary Classification

A common task in natural language processing is category-specific lexicon mining, or identifying words and phrases that are associated with the presence or absence of a specific category. For example, lists of words associated with positive (vs. negative) product reviews may be automatically discovered from labeled corpora. In the 1960s, the semanticists A. J. Greimas and F. Rastier developed a framework for turning two opposing categories into a network of 10 semantic classes. This talk introduces an algorithm for discovering lexicons associated with those semantic classes given a corpus of categorized documents. This algorithm is implemented as part of Scattertext, and the output can be viewed in an interactive browser-based visualization.


Lexicon Mining for Semiotic Squares: Exploding Binary Classification

  • 1. Lexicon Mining for Semiotic Squares: Exploding Binary Classification. Jason S. Kessler*, Data Day Texas, January 27, 2018. @jasonkessler http://bit.ly/LexiconMining *No, not that Jason Kessler.
  • 2. By the end of the talk, you’ll know how to programmatically create semiotic squares like this one. Source: http://www.squadrati.com/2014/03/31/quadrato-semiotico-dei-wine-lovers/
  • 4. Lexicon Mining for Semiotic Squares: Exploding Binary Classification
  • 5. Lexicon speculation. Bo Pang, Lillian Lee and Shivakumar Vaithyanathan. Thumbs up? Sentiment classification using machine learning techniques. EMNLP. 2002.
  • 6. Lexicon mining ≠ lexicon speculation. Bo Pang, Lillian Lee and Shivakumar Vaithyanathan. Thumbs up? Sentiment classification using machine learning techniques. EMNLP. 2002.
  • 7. Lexicon mining • Let’s do a deep dive into lexicon mining • We’ll walk through the notebook linked below: • http://bit.ly/LexiconMining
  • 9. R: ggplot2 • Most labels are legible • Labels can overlap points • Dense areas can be tough to read • Not recommended. • Use Scattertext instead, if possible.
  • 10. Monroe et al. plot • Identifies terms with z-scores > 1.96 • Labels still overlap, and only a few points are labeled. Burt Monroe, Michael Colaresi and Kevin Quinn. Fightin' words: Lexical feature selection and evaluation for identifying the content of political conflict. Political Analysis. 2008.
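The z-score cutoff on this slide can be illustrated with a minimal sketch of the log-odds ratio with a Dirichlet prior described by Monroe et al. The counts, the function name `log_odds_z_scores`, and the small uniform prior are assumptions made for the example. The paper also discusses informative priors, which are not shown here.

```python
import math

def log_odds_z_scores(counts_a, counts_b, prior=0.01):
    """Return a z-scored log-odds ratio per term, using a uniform Dirichlet prior."""
    vocab = set(counts_a) | set(counts_b)
    n_a, n_b = sum(counts_a.values()), sum(counts_b.values())
    prior_total = prior * len(vocab)
    z_scores = {}
    for term in vocab:
        # Smoothed counts of the term in each corpus.
        y_a = counts_a.get(term, 0) + prior
        y_b = counts_b.get(term, 0) + prior
        # Difference of the smoothed log-odds of the term in A and in B.
        delta = (math.log(y_a / (n_a + prior_total - y_a))
                 - math.log(y_b / (n_b + prior_total - y_b)))
        # Approximate variance of the difference.
        variance = 1.0 / y_a + 1.0 / y_b
        z_scores[term] = delta / math.sqrt(variance)
    return z_scores

# Hypothetical term counts for two corpora (illustrative only).
corpus_a = {"women": 120, "tax": 40, "the": 1000}
corpus_b = {"women": 30, "tax": 130, "the": 1000}

z = log_odds_z_scores(corpus_a, corpus_b)
# Terms whose absolute z-score exceeds 1.96, the cutoff named on the slide.
significant = sorted(term for term, score in z.items() if abs(score) > 1.96)
```

In this toy data the function word "the" is equally frequent in both corpora, so its score is zero and it falls below the cutoff.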
  • 11. Michael Colaresi and Zuhaib Mahmood. Do the robot: Lessons from machine learning to improve conflict forecasting. Journal of Peace Research. 2017. R package available: https://github.com/zsmahmood89/ModelCriticism
  • 12. In defense of stop words. In times of shared crisis, “we” use increases, while “I” use decreases. I/we: age, social integration. I: lying, social position relative to hearer, testosterone. Cindy K. Chung and James W. Pennebaker. Counting Little Words in Big Data: The Psychology of Communities, Culture, and History. EASP. 2012.
  • 13. Stop words in context: gender. Newman, ML; Groom, CJ; Handelman, LD; Pennebaker, JW. Gender Differences in Language Use: An Analysis of 14,000 Text Samples. 2008. Effect sizes are Cohen’s d (>0: female, <0: male); MANOVA p<.001. On the slide, bold marked dimensions made up entirely of stop words.
      LIWC dimension: effect size (Cohen’s d)
      All pronouns: 0.36
      Present tense verbs (walk, is, be): 0.18
      Feeling (touch, hold, feel): 0.17
      Certainty [ns] (always, never): 0.14
      Word count: NS
      Numbers: -0.15
      Prepositions: -0.17
      Words >6 letters: -0.24
      Swear words: -0.22
      Articles: -0.24
      • Performed on a variety of language categories, including speech.
      • Other studies have found that function words are the best predictors of gender.
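For readers unfamiliar with the effect-size measure in the table, here is a minimal sketch of Cohen's d using the pooled standard deviation. The per-document pronoun rates are invented for illustration and are not taken from the study.

```python
import math
from statistics import mean, variance

def cohens_d(group_1, group_2):
    """Standardized mean difference using the pooled sample standard deviation."""
    n1, n2 = len(group_1), len(group_2)
    pooled_variance = (
        (n1 - 1) * variance(group_1) + (n2 - 1) * variance(group_2)
    ) / (n1 + n2 - 2)
    return (mean(group_1) - mean(group_2)) / math.sqrt(pooled_variance)

# Hypothetical pronoun rates (percent of words) per text sample.
female_rates = [14.0, 15.0, 16.0]
male_rates = [13.0, 14.0, 15.0]

# Positive values mean the first group uses the dimension more,
# matching the ">0: female" convention in the table above.
d = cohens_d(female_rates, male_rates)
```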
  • 14. Lexicon Mining for Semiotic Squares: Exploding Binary Classification
  • 15. Original semiotic square paper: Greimas, A.J. and François Rastier. 1968. “The Interaction of Semiotic Constraints,” Yale French Studies. 41: 86-105. Pelkey, Jamin. 2017. Greimas embodied: How kinesthetic opposition grounds the semiotic square. Semiotica 214(1). 277–305.
  • 16. Language about movies • The domain of discourse: from where we draw concepts to create the semiotic square
  • 17. Language about movies • Positive • Negative • Positive sentiment • Opposition relation: these are treated as contrastive concepts
  • 18. Language about movies • Positive • Negative • Positive sentiment • ¬Negative: language that’s not negative • Entailment relation: ¬Negative includes positive sentiment
  • 19. Language about movies • Positive • Negative • Positive sentiment • ¬Negative: language that’s not negative • Contradiction relation: Negative and ¬Negative are mutually exclusive* (*exceptions, e.g., “damning with faint praise”)
  • 20. Language about movies • Positive • Negative • Positive sentiment • ¬Negative: language that’s not negative • ¬Positive: language that’s not positive • Complete the square
  • 21. Language about movies • Positive • Negative • Positive sentiment • ¬Negative: language that’s not negative • ¬Positive: language that’s not positive • Neutral term: Objective (plot descriptions)
  • 22. Language about movies • Positive • Negative • Positive sentiment • ¬Negative: language that’s not negative • ¬Positive: language that’s not positive • Objective (plot descriptions) • Complex term: Evaluative (reviews)
  • 23. Language about movies • Positive • Negative • Positive sentiment • ¬Negative: language that’s not negative • ¬Positive: language that’s not positive • Objective (plot descriptions) • Evaluative (reviews) • Positive deixis: aspects of well-reviewed movies
  • 24. Language about movies • Positive • Negative • Positive sentiment • ¬Negative: language that’s not negative • ¬Positive: language that’s not positive • Objective (plot descriptions) • Evaluative (reviews) • Aspects of well-reviewed movies • Negative deixis: aspects of poorly-reviewed movies
  • 25. Semiotic square • Language about movies • Positive • Negative • Positive sentiment • ¬Negative: language that’s not negative • ¬Positive: language that’s not positive • Objective (plot descriptions) • Evaluative (reviews) • Aspects of well-reviewed movies • Aspects of poorly-reviewed movies
  • 26. Lexicalizing the Semiotic Square • Task: find language associated with each element of the square • Domain of discourse: corpus of documents pertaining to the domain (e.g., movie-related text) • Corpus is divided into three categories • Square elements: Pos, Neg, ¬Neg, ¬Pos, Neutral Term, Complex Term, Positive Deixis, Negative Deixis
  • 27. Lexicalizing the Semiotic Square: Corpora • Positive documents: e.g., positive reviews • Negative documents: e.g., negative reviews • Neutral documents: plot descriptions • Square elements: Pos, Neg, ¬Neg, ¬Pos, Neutral Term, Complex Term, Positive Deixis, Negative Deixis
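One way to turn the three corpora into per-term coordinates is sketched below. It is a minimal illustration, not the Scattertext implementation: the toy documents, the function names, and the scaled frequency-rank difference used as the score are all assumptions made for the example. The sign of the x axis is chosen so that positive-leaning terms get negative x values.

```python
from collections import Counter

def term_counts(docs):
    """Count whitespace-separated, lowercased tokens across documents."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return counts

def rank_scores(counts, vocab):
    """Scale each term's frequency rank to [0, 1]; tied counts share a rank."""
    distinct = sorted({counts[term] for term in vocab})
    denominator = max(len(distinct) - 1, 1)
    rank_of = {count: i / denominator for i, count in enumerate(distinct)}
    return {term: rank_of[counts[term]] for term in vocab}

def axis(counts_high, counts_low, vocab):
    """Rank difference: near +1 when a term is frequent only in counts_high."""
    high = rank_scores(counts_high, vocab)
    low = rank_scores(counts_low, vocab)
    return {term: high[term] - low[term] for term in vocab}

# Hypothetical toy corpora for the three document categories.
positive_docs = ["great moving film", "great acting"]
negative_docs = ["dull boring film", "dull plot"]
plot_docs = ["a detective hunts a killer", "a killer escapes"]

pos, neg, plot = map(term_counts, (positive_docs, negative_docs, plot_docs))
vocab = set(pos) | set(neg) | set(plot)

# x < 0: leans positive, x > 0: leans negative (assumed orientation).
x = axis(neg, pos, vocab)
# y > 0: leans toward reviews, y < 0: leans toward plot descriptions.
y = axis(pos + neg, plot, vocab)
```

Each term now has an (x, y) position, which is the input needed to pick out terms near a corner of the square.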
  • 31. Positive corner of semiotic square • Terms near (Euclidean distance) the blue point • Limit to Quadrant II • Captures positive terms used disproportionately in reviews • Plot axes: REVIEW vs. PLOT, NEGATIVE vs. POSITIVE
  • 32. Positive + Negative complex term • Terms near (Euclidean distance) the blue point • Limit to Quadrant II and Quadrant I • Captures terms that are associated with reviews but not highly polar • Plot axes: REVIEW vs. PLOT, NEGATIVE vs. POSITIVE
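The two selection rules above can be sketched as a nearest-neighbor search restricted to a quadrant. This is an illustration under stated assumptions, not the Scattertext code: the term coordinates are invented, the helper `nearest_terms` is hypothetical, and the location of each "blue point" is assumed to be (-1, 1) for the positive corner and (0, 1) for the complex term. The x axis is oriented so that positive-leaning terms have x < 0, which places the positive corner in Quadrant II as the slide states.

```python
import math

# Hypothetical (x, y) coordinates per term; y > 0 leans toward reviews.
coords = {
    "brilliant": (-0.9, 0.8),
    "moving": (-0.6, 0.9),
    "film": (0.0, 0.95),
    "awful": (0.9, 0.8),
    "detective": (-0.1, -0.9),
    "heartfelt": (-0.7, -0.2),
}

def nearest_terms(coords, target, keep, k=2):
    """Return the k terms closest to target among those passing the keep filter."""
    candidates = [
        (math.dist(point, target), term)
        for term, point in coords.items()
        if keep(*point)
    ]
    return [term for _, term in sorted(candidates)[:k]]

# Positive corner: restrict to Quadrant II (x < 0, y > 0).
positive_corner = nearest_terms(coords, (-1.0, 1.0), lambda x, y: x < 0 and y > 0)

# Complex term: allow Quadrants I and II (any x, y > 0).
complex_term = nearest_terms(coords, (0.0, 1.0), lambda x, y: y > 0)
```

With these toy coordinates, the complex-term search favors terms near the middle of the x axis, which matches the slide's description of review language that is not highly polar.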
  • 34. "The square is a map of logical possibilities. As such, it can be used as a heuristic device, and in fact, attempting to fill it in stimulates the imagination… the theory of the square allows us to see all thinking as a game, with the logical relations as the rules and concepts current in a given language and culture as the pieces." Image: Lithuanian stamp issued honoring the 100th birthday of Greimas.
  • 36. In defense of stop words. Function words reveal psychological traits. Person A is tentative, B is stiff, C is easygoing. Cindy K. Chung and James W. Pennebaker. The Psychological Functions of Function Words. Social Communication. 2007.
  • 37. James Clifford. The Predicament of Culture: Twentieth-Century Ethnography, Literature, and Art. Harvard Univ. Press. 1988.
  • 38. Lexicalizing the Semiotic Square • Language about movies • Positive • Negative • ¬Negative • ¬Positive • Objective • Evaluative • Aspects of well-reviewed movies • Aspects of poorly-reviewed movies
  • 40. Lexicalizing the Semiotic Square • Positive documents: e.g., positive reviews • Negative documents: e.g., negative reviews • Lexicons: compare positive and negative corpora • Square elements: Pos, Neg, ¬Neg, ¬Pos, Neutral Term, Complex Term, Positive Deixis, Negative Deixis
  • 41. Vitaliy Kaurov. Finding X in Espresso: Adventures in Computational Lexicology. Wolfram Blog. 2017.
  • 42. Jason Kessler. Using Scattertext and the Python NLP Ecosystem for Text Visualization. PyData. July 2017.
  • 43. Josh Katz, Claire Cain Miller and Kathleen A. Flynn. The Words Men and Women Use When They Write About Love. The Upshot. The New York Times. Nov 2017.