TOWARDS A DICTIONARY OF THE FUTURE
COUNTS, COMPARISONS, COLLOCATIONS, CONTESTATIONS
DICTIONARY OF THE FUTURE?
SOME OTHER PLACES TO CHECK OUT
• The Google Ngram Viewer helps you understand
trends across a bazillion books that Google has
digitized. It’s an amazing resource.
• So is the Corpus of Historical American English (COHA):
http://corpus.byu.edu/coha/
• And the Corpus of Contemporary American English (COCA):
http://corpus.byu.edu/coca/
TO COHA!
TO COCA!
TAKING CARE WITH COUNTS
• The counts in the last two slides are too small to be
anything more than interesting
• The next slide shows us tracking the collocates of
future
• Collocates are the words that appear near a given
word—one of the chief collocates of salt is pepper,
for example
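A rough sketch of what "appearing near a given word" can mean in practice: count every word that falls within a few tokens of the target. The function name `collocates`, the window size, and the toy sentences are all assumptions made up for illustration; corpus tools such as COCA typically rank collocates with an association measure rather than raw counts.

```python
from collections import Counter

def collocates(tokens, target, window=4):
    """Count words appearing within `window` tokens of each occurrence of `target`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the target word itself
                counts[tokens[j]] += 1
    return counts

# Toy corpus, invented for illustration only.
text = (
    "pass the salt and pepper please . "
    "she added salt and pepper to the soup . "
    "salt water is not fresh water"
)
tokens = text.split()
salt_collocates = collocates(tokens, "salt", window=2)
print(salt_collocates.most_common(3))
```

Note that raw counts reward very frequent words (here "and" ties with "pepper"), which is one more reason to take care with counts.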
COUNTS COUNT
DISCUSSIONS, DEMOCRACIES AND DICTIONARIES
What’s going
on in Urban
Dictionary?
• Identity
• Play
• Politics
KEYWORDS
• What are the words
that are most
contested?
• How do they
change?
• Who controls the
future?
• Liberty vs. Freedom
JACK GRIEVE FINDING WOTY’S
• See also http://idibon.com/quantifying-word-year/
• p.s.—in my
ideal
Dictionary of
the Future, we
understand
the geography
of how a word
is used
MEANING IS IN THE USE
• “For a large class of
cases of the
employment of the
word ‘meaning’—
though not for all—
this word can be
explained in this way:
the meaning of a
word is its use in the
language” —
Wittgenstein,
Philosophical
Investigations
MEANING IN THE USE
• Tumblr moms use
over 4 x’s as many
[emoji] and [emoji]
as Twitter peeps
• What are the
collocates?
• Blue: his he him
• Purple: she’s she
• No pink heart option!
• See also http://www.washingtonpost.com/sf/opinions/2015/02/12/why-moms-love-emoji/ and
http://idibon.com/emomji-emoji-new-moms-use/
CO-OCCURRENCES MATTER (MOVIE REVIEW RATINGS AND WORDS)
• The idea here is that if you’re writing a review and use the word wow, you’re being very positive
or very negative. You don’t say Wow, I have a balanced and neutral opinion on this very often.
• If you’re using however, however, you’re likely to be in the middle of your movie review rating or
travel summary—not at the very positive/negative extremes.
• See also http://web.stanford.edu/~cgpotts/manuscripts/potts-schwarz-exclamatives08.pdf and
http://web.stanford.edu/~cgpotts/papers/constant-davis-potts-schwarz-expressives.pdf
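One way to check a claim like this is to measure how often a word occurs within each rating category, relative to all the words written at that rating. The sketch below assumes reviews arrive as (rating, text) pairs; the function name `rating_profile` and the six toy reviews are invented for illustration and are not taken from the linked papers.

```python
from collections import Counter

def rating_profile(reviews, word):
    """Relative frequency of `word` among all tokens at each rating."""
    word_counts = Counter()
    total_counts = Counter()
    for rating, text in reviews:
        tokens = text.lower().split()
        total_counts[rating] += len(tokens)
        word_counts[rating] += tokens.count(word)
    return {r: word_counts[r] / total_counts[r] for r in sorted(total_counts)}

# Toy reviews on a 1-5 scale, invented for illustration only.
reviews = [
    (1, "wow this was awful"),
    (1, "wow just wow so bad"),
    (3, "good acting however the plot drags"),
    (3, "fine however not great"),
    (5, "wow what a film"),
    (5, "loved it wow"),
]
wow_profile = rating_profile(reviews, "wow")
however_profile = rating_profile(reviews, "however")
print(wow_profile)
print(however_profile)
```

In this toy data, wow shows up only at the extremes and however only in the middle, which is the shape the slide describes. Dividing by the total tokens per rating matters because real review sets usually have far more 5-star text than 1-star text.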
FOUR CASE STUDIES
• Wholesomeness: http://idibon.com/wholesome-branding-campaign-effectiveness/
• Entrepreneur: http://idibon.com/entrepreneurs-french-spanish-english/
• Because X: http://idibon.com/innovating-innovation/
• #BlackLivesMatter: http://idibon.com/blacklivesmatter-events-change-conversations/
WHOLESOMENESS
http://idibon.com/wholesome-branding-campaign-effectiveness/
BRANDS LOVE WORDS
DEEP HISTORY
• The first uses of wholesome tended to be about
‘virtuous teachings’.
• In Wycliffe’s Bible way back in 1382:
The..holsum wordis of oure Lord Jhesu Crist. (1 Timothy 6:3)
(Modern versions treat wordis as ‘words’, ‘teachings’, or
‘instructions’.)
“WHOLESOME” [NOUN] OVER TIME
HOW ABOUT IN SOCIAL MEDIA?
• You have to deal with spam (11% of data in this
case; another 36% of data is “Wholesome Radio”,
which is probably irrelevant)
• In 2014 tweets:
• Food: 23% (but mostly not about Honey Maid)
• Humans: 23% (and how they can/should live; church-
related mentions are prominent)
• Entertainment: 13% (movies, TV)
• Now let’s compare this to 2011 tweet uses:
• Humans: 32%
• Entertainment: 12%
• Food: 9%
WORDS ARE CONTESTED
MORE ON CONTESTED WORDS
• In the next slide, you’ll see an image from Monroe
et al. (2008)
• This work starts from something we already know:
Republicans and Democrats speak about the same
issue differently.
• The image shows methods that can pull out how
the parties speak about abortion when they take
the floor.
• The words at the top are the Democratic party
words; the ones at the bottom are the Republican
party words.
• http://languagelog.ldc.upenn.edu/myl/Monroe.pdf
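For a feel of how two groups' word use can be compared, here is a minimal sketch of a smoothed log-odds score with a z-score, in the spirit of that line of work. It assumes a simple uniform prior (`prior=0.5` per word) rather than anything tuned, and the function name `log_odds_z` and the two toy count tables are invented; this is not the authors' code or data.

```python
import math
from collections import Counter

def log_odds_z(counts_a, counts_b, prior=0.5):
    """Z-scored log-odds ratio per word; positive means more typical of group A."""
    vocab = set(counts_a) | set(counts_b)
    n_a = sum(counts_a.values())
    n_b = sum(counts_b.values())
    prior_total = prior * len(vocab)
    scores = {}
    for w in vocab:
        ya = counts_a.get(w, 0) + prior  # smoothed count in group A
        yb = counts_b.get(w, 0) + prior  # smoothed count in group B
        delta = (math.log(ya / (n_a + prior_total - ya))
                 - math.log(yb / (n_b + prior_total - yb)))
        # Approximate variance of the log-odds difference.
        scores[w] = delta / math.sqrt(1 / ya + 1 / yb)
    return scores

# Toy word counts for two hypothetical groups, invented for illustration only.
group_a = Counter({"hella": 30, "wicked": 2, "good": 50, "the": 200})
group_b = Counter({"hella": 1, "wicked": 25, "good": 48, "the": 210})
scores = log_odds_z(group_a, group_b)
for word, z in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{word:8s} {z:+.2f}")
```

The smoothing keeps rare words from producing infinite or wildly unstable ratios, and dividing by the standard error keeps very frequent words like "the" from dominating just because they are frequent.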
ENTREPRENEUR
http://idibon.com/entrepreneurs-french-spanish-english/
ENTREPRENEUR IN ENGLISH, FRENCH, SPANISH
• Tycoon, mogul, industrialist
• A flavor of ‘ill-gotten gains’
• Entrepreneur doesn’t seem to have this (in English, right now)
• Collocates have to do with:
• Advice
• Success
• Investors
• Marketing
• Social (media/services/topics/techniques)
• Failure (especially fear-of)
• Lots of named entities (SXSW, Dubai, #KSA, Twitter, Google, LinkedIn, Etsy)
• The people using entrepreneur identify themselves as:
• Authors, speakers, writers, bloggers, strategists, (life) coaches,
consultants, moms, wives, husbands, fathers, food-lovers, music-lovers
KEY: GET COMPARISON SETS
Group/Context A
Group/Context B
INTERCONNECTED AXES OF DIFFERENCE
• Genre (State of the Union addresses vs. Reddit comments)
• Time (1940s vs. the last ten years)
• Geography (hella vs. wicked)
• Traditional demographics (age, gender, education)
• Personal identity/style (nerd, goth, bro, mom)
BECAUSE X
http://idibon.com/innovating-innovation/
INNOVATIONS AND THEIR COMMUNITIES
• Because X’ers disproportionately like:
• YouTube
• Tumblr
• One Direction (especially Harry)
• Justin Bieber
• Ariana Grande
• “bands”
• pizza
• sex
• cats
• books
• They are decidedly less likely to talk about:
• software
• basketball
• NASCAR
• business
• words associated with African-American Vernacular English
THE X IN BECAUSE X
Part of speech               Word counts ≥ 50
Noun (people, spoilers)      32.02%
Compressed clause (ilysm)    21.78%
Adjective (ugly, tired)      16.04%
Interjection (sweg, omg)     14.71%
Agreement (yeah, no)         12.97%
Pronoun (you, me)            2.45%
PART OF SPEECH TAGGERS ARE GOOD
• There’s even a pretty good POS tagger for Twitter
INNOVATIONS CLUMP
#BLACKLIVESMATTER
http://idibon.com/blacklivesmatter-events-change-conversations/
TOPIC MODELING
• In the previous sections, I’ve been noting what you can
do when you have two or more comparison sets
• How is wholesome used in time x vs. time y vs. time z?
• What are the differences between English speakers talking
about entrepreneurship vs. French speakers and Spanish
speakers?
• How are people who use the innovative Because X
construction different than people who don’t use it?
• In this section, we talk about topic modeling, which is a
way to automatically identify clusters within a data set,
even if you don’t have a comparison set.
• We’ll use this to explore conversations around
#blacklivesmatter, but we’ll also see how these
conversations shift before/after a particular moment in
time
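To make "automatically identify clusters" a bit more concrete, here is a very small sketch of one common topic model, latent Dirichlet allocation (LDA), fit with collapsed Gibbs sampling. The slides do not say which topic model was used, so LDA is an assumption here, as are the function name `lda_gibbs`, the hyperparameters, and the toy documents. For real data you would reach for an established library instead.

```python
import random
from collections import Counter

def lda_gibbs(docs, n_topics=2, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Tiny collapsed Gibbs sampler for LDA over tokenized documents."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]         # tokens per topic, per doc
    topic_word = [Counter() for _ in range(n_topics)]  # word counts per topic
    topic_total = [0] * n_topics                       # tokens per topic overall
    assignments = []
    # Start by giving every token a random topic.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Take this token out of the counts, then resample its topic.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return topic_word, doc_topic

# Toy documents, invented for illustration only.
docs = [
    "court judge law trial judge".split(),
    "law court trial jury".split(),
    "team game score season game".split(),
    "season team coach score".split(),
]
topic_word, doc_topic = lda_gibbs(docs)
for k, words in enumerate(topic_word):
    print(k, [w for w, _ in words.most_common(3)])
```

No comparison set goes in: the model only sees which words occur together in the same documents. To look at a before/after shift, one option is to fit on everything and then compare the per-document topic counts in `doc_topic` across the two time periods.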
TIME MATTERS
TOPICS (EVEN WHEN YOU DON’T HAVE AN A PRIORI COMPARISON SET)
UNKNOWN UNKNOWNS
• In general, topic modeling is a way of addressing
the limits of our knowledge. If you’re asking a
question about data, you probably know
something about the data going in.
• But what we hear from people is that they are keenly aware
that they don’t know what they don’t know.
• Topic modeling is meant to help that.
• In the next slides, another use of topic modeling:
identifying the themes of Martin Luther King Jr.’s
major speeches and sermons
• Topic modeling Dr.
King’s major
speeches and
sermons gets
these topics
• Which change
over time
• See also http://idibon.com/topic-detection-mlk/
Counts, comparisons, collocations, contestations: Towards a dictionary of the future

  • 1. TOWARDS A DICTIONARY OF THE FUTURE: COUNTS, COMPARISONS, COLLOCATIONS, CONTESTATIONS
  • 3.
  • 4.
  • 5. SOME OTHER PLACES TO CHECK OUT • The Google Ngram Viewer helps you understand trends across a bazillion books that Google has digitized. It’s an amazing resource. • So is the Corpus of Historical American English (COHA): http://corpus.byu.edu/coha/ • And so is the Corpus of Contemporary American English (COCA): http://corpus.byu.edu/coca/
  • 8. TAKING CARE WITH COUNTS • The counts in the last two slides are too small to be anything more than interesting • The next slide shows us tracking the collocates of the word “future” • Collocates are the words that appear near a given word; one of the chief collocates of “salt” is “pepper”, for example
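
A minimal sketch of collocate counting, assuming "near" means within a fixed window of tokens on either side of the target word. The function name `collocates`, the window size, and the sample text are all illustrative assumptions, not taken from the slides or from any corpus tool.

```python
import re
from collections import Counter


def collocates(text, node, window=4):
    """Count words that occur within `window` tokens of each occurrence of `node`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the node word itself
                counts[tokens[j]] += 1
    return counts


# Invented sample text, for illustration only
sample = (
    "Pass the salt and pepper. She added salt and a little pepper to the soup. "
    "The future of work is uncertain."
)
salt_collocates = collocates(sample, "salt", window=3)
print(salt_collocates.most_common(3))
```

Raw window counts favor very frequent words such as "and" or "the", which is one more reason to take care with small counts; corpus tools usually rank collocates with an association measure rather than raw frequency.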
  • 11. What’s going on in Urban Dictionary? • Identity • Play • Politics
  • 12. KEYWORDS • What are the words that are most contested? • How do they change? • Who controls the future? • Liberty vs. Freedom
  • 13. JACK GRIEVE FINDING WOTY’S • See also http://idibon.com/quantifying-word-year/
  • 14. • p.s.—in my ideal Dictionary of the Future, we understand the geography of how a word is used
  • 15. MEANING IS IN THE USE • “For a large class of cases of the employment of the word ‘meaning’ – though not for all – this word can be explained in this way: the meaning of a word is its use in the language” (Wittgenstein, Philosophical Investigations)
  • 16. MEANING IN THE USE • Tumblr moms use over 4 x’s as many and as Twitter peeps • What are the collocates? • Blue: his he him • Purple: she’s she • No pink heart option! • See also http://www.washingtonpost.com/sf/opinions/2015/02/12/why-moms-love-emoji/ and http://idibon.com/emomji-emoji-new-moms-use/
  • 17. CO-OCCURRENCES MATTER (MOVIE REVIEW RATINGS AND WORDS) • The idea here is that if you’re writing a review and use the word wow, you’re being very positive or very negative. You don’t say Wow, I have a balanced and neutral opinion on this very often. • If you’re using however, however, you’re likely to be in the middle of your movie review rating or travel summary—not at the very positive/negative extremes. • See also http://web.stanford.edu/~cgpotts/manuscripts/potts-schwarz-exclamatives08.pdf and http://web.stanford.edu/~cgpotts/papers/constant-davis-potts-schwarz-expressives.pdf
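
The idea can be sketched by measuring how often a word appears at each rating level. This is a toy illustration under stated assumptions: the reviews below are invented, and `rate_by_rating` is a hypothetical helper, not the method used in the linked papers.

```python
import re
from collections import Counter


def rate_by_rating(reviews, word):
    """Return {rating: share of all tokens at that rating that equal `word`}."""
    hits = Counter()
    totals = Counter()
    for rating, text in reviews:
        tokens = re.findall(r"[a-z']+", text.lower())
        totals[rating] += len(tokens)
        hits[rating] += tokens.count(word)
    return {r: hits[r] / totals[r] for r in sorted(totals) if totals[r]}


# Invented toy reviews as (star rating, text) pairs
reviews = [
    (1, "wow this was awful"),
    (1, "wow just wow"),
    (3, "fine however it dragged"),
    (3, "good cast however weak plot"),
    (5, "wow what a film"),
    (5, "loved it wow"),
]
wow = rate_by_rating(reviews, "wow")
however = rate_by_rating(reviews, "however")
print(wow)
print(however)
```

In this toy data "wow" shows up at the extremes and "however" in the middle, which is the shape the slide describes. Dividing by the number of tokens at each rating matters because review sites rarely have equal amounts of text per rating.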
  • 18. FOUR CASE STUDIES • Wholesomeness: http://idibon.com/wholesome-branding-campaign-effectiveness/ • Entrepreneur: http://idibon.com/entrepreneurs-french-spanish-english/ • Because X: http://idibon.com/innovating-innovation/ • #BlackLivesMatter: http://idibon.com/blacklivesmatter-events-change-conversations/
  • 19. WHOLESOMENESS http://idibon.com/wholesome-branding-campaign-effectiveness/
  • 21. DEEP HISTORY • The first uses of wholesome tended to be about ‘virtuous teachings’. • In Wycliffe’s Bible way back in 1382: The..holsum wordis of oure Lord Jhesu Crist. (1 Timothy 6:3) (Modern versions treat wordis as ‘words’, ‘teachings’, or ‘instructions’.)
  • 23. HOW ABOUT IN SOCIAL MEDIA? • You have to deal with spam (11% of data in this case; another 36% of data is “Wholesome Radio”, which is probably irrelevant) • In 2014 tweets: • Food: 23% (but mostly not about Honey Maid) • Humans: 23% (and how they can/should live; church- related mentions are prominent) • Entertainment: 13% (movies, TV) • Now let’s compare this to 2011 tweet uses: • Humans: 32% • Entertainment: 12% • Food: 9%
  • 25. MORE ON CONTESTED WORDS • In the next slide, you’ll see an image from Monroe et al. (2008) • This work starts from something we already know: Republicans and Democrats speak about the same issue differently. • The image shows methods that can pull out how the parties speak about abortion when they take the floor. • The words at the top are the Democratic party words; the ones at the bottom are the Republican party words. • http://languagelog.ldc.upenn.edu/myl/Monroe.pdf
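
One common way to score which words lean towards one group is a log-odds ratio smoothed by a prior and scaled by its estimated variance. The sketch below assumes a uniform prior (`alpha` per word) and uses invented toy tokens; the slides do not say which settings produced the figure, so treat this as an illustration of the general approach rather than a reproduction of it.

```python
import math
from collections import Counter


def log_odds_z(counts_a, counts_b, alpha=0.01):
    """z-scored log-odds ratio per word with a uniform smoothing prior.

    Positive scores lean towards group A, negative towards group B.
    """
    vocab = set(counts_a) | set(counts_b)
    n_a = sum(counts_a.values())
    n_b = sum(counts_b.values())
    a0 = alpha * len(vocab)  # total prior mass
    scores = {}
    for w in vocab:
        ya = counts_a.get(w, 0) + alpha
        yb = counts_b.get(w, 0) + alpha
        delta = math.log(ya / (n_a + a0 - ya)) - math.log(yb / (n_b + a0 - yb))
        variance = 1.0 / ya + 1.0 / yb  # approximate variance of delta
        scores[w] = delta / math.sqrt(variance)
    return scores


# Invented toy tokens standing in for two groups of speakers
group_a = Counter("apple apple pear fig".split())
group_b = Counter("plum plum pear kiwi".split())
scores = log_odds_z(group_a, group_b)
for word in sorted(scores, key=scores.get, reverse=True):
    print(word, round(scores[word], 3))
```

Dividing by the standard deviation keeps rare words from dominating the ranking simply because their raw ratios are extreme.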
  • 26.
  • 27. ENTREPRENEUR http://idibon.com/entrepreneurs-french-spanish-english/
  • 28. ENTREPRENEUR IN ENGLISH, FRENCH, SPANISH • Tycoon, mogul, industrialist • A flavor of ‘ill-gotten gains’ • Entrepreneur doesn’t seem to have this, at least in English right now • Collocates have to do with: • Advice • Success • Investors • Marketing • Social (media/services/topics/techniques) • Failure (especially fear of it) • Lots of named entities (SXSW, Dubai, #KSA, Twitter, Google, LinkedIn, Etsy) • The people using entrepreneur identify themselves as • Authors, speakers, writers, bloggers, strategists, (life) coaches, consultants, moms, wives, husbands, fathers, food-lovers, music-lovers
  • 29. KEY: GET COMPARISON SETS Group/Context A Group/Context B
  • 30. INTERCONNECTED AXES OF DIFFERENCE • Genre (State of the Union addresses vs. Reddit comments) • Time (1940s vs. the last ten years) • Geography (hella vs. wicked) • Traditional demographics (age, gender, education) • Personal identity/style (nerd, goth, bro, mom)
  • 32. INNOVATIONS AND THEIR COMMUNITIES • Because X’ers disproportionately like: • YouTube • Tumblr • One Direction (especially Harry) • Justin Bieber • Ariana Grande • “bands” • pizza • sex • cats • books • They are decidedly less likely to talk about • software • basketball • NASCAR • business • words associated with African-American Vernacular English
  • 34. PART OF SPEECH TAGGERS ARE GOOD • There’s even a pretty good one for Twitter POS
      Part of speech | Word counts ≥ 50
      Noun (people, spoilers) | 32.02%
      Compressed clause (ilysm) | 21.78%
      Adjective (ugly, tired) | 16.04%
      Interjection (sweg, omg) | 14.71%
      Agreement (yeah, no) | 12.97%
      Pronoun (you, me) | 2.45%
  • 36. #BLACKLIVESMATTER http://idibon.com/blacklivesmatter-events-change-conversations/
  • 37. TOPIC MODELING • In the previous sections, I’ve been noting what you can do when you have two or more comparison sets • How is wholesome used in time x vs. time y vs. time z? • What are the differences between English speakers talking about entrepreneurship vs. French speakers and Spanish speakers? • How are people who use the innovative Because X construction different from people who don’t use it? • In this section, we talk about topic modeling, which is a way to automatically identify clusters within a data set, even if you don’t have a comparison set. • We’ll use this to explore conversations around #blacklivesmatter, but we’ll also see how these conversations shift before/after a particular moment in time
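
To make "automatically identify clusters" concrete, here is a very small sketch of one well-known topic model, latent Dirichlet allocation (LDA), fitted with collapsed Gibbs sampling. The slides do not say which algorithm or tool was used, so LDA itself, the hyperparameters (`alpha`, `beta`, `n_topics`, `iters`), and the toy documents are all assumptions made for illustration.

```python
import random


def lda_gibbs(docs, n_topics=2, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Tiny collapsed Gibbs sampler for LDA over tokenized documents."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    n_vocab = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]  # tokens per topic in each doc
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assign = []
    # Start by giving every token a random topic
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(z_doc)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic given all other assignments
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + n_vocab * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


def top_words(topic_word, n=3):
    """Most frequent words assigned to each topic."""
    return [sorted(tw, key=tw.get, reverse=True)[:n] for tw in topic_word]


# Invented toy documents, already tokenized
docs = [
    "march justice freedom justice".split(),
    "freedom march justice".split(),
    "pizza cats pizza books".split(),
    "cats books pizza".split(),
]
doc_topic, topic_word = lda_gibbs(docs, n_topics=2)
print(top_words(topic_word))
```

No comparison set is supplied here: the only inputs are the documents and the number of topics. With real data you would more likely reach for an established library than a hand-rolled sampler, and the number of topics becomes a judgment call to check against the texts themselves.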
  • 39. TOPICS (EVEN WHEN YOU DON’T HAVE AN A PRIORI COMPARISON SET)
  • 40. UNKNOWN UNKNOWNS • In general, topic modeling is a way of addressing the limits of our knowledge. If you’re asking a question about data, you probably know something about the data going in. • But what we hear from people is that they are keenly aware that they don’t know what they don’t know. • Topic modeling is meant to help that. • In the next slides, another use of topic modeling: identifying the themes of Martin Luther King Jr.’s major speeches and sermons
  • 41. • Topic modeling Dr. King’s major speeches and sermons gets these topics • Which change over time • See also http://idibon.com/topic-detection-mlk/