What are "lexical resources" that can go into defining words and phrases? Visualizations and resources for studying language. (Presentation given at Dictionary.com)
5. SOME OTHER PLACES TO CHECK OUT
• The Google Ngram Viewer helps you understand
trends across a bazillion books that Google has
digitized. It’s an amazing resource:
• So are the Corpus of Historical American English:
http://corpus.byu.edu/coha/ (COHA)
• And the Corpus of Contemporary American English:
http://corpus.byu.edu/coca/ (COCA)
8. TAKING CARE WITH COUNTS
• The counts in the last two slides are too small to be
anything more than interesting
• The next slide shows us tracking the collocates of
future
• Collocates are the words that appear near a given
word—one of the chief collocates of salt is pepper,
for example
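A minimal sketch of collocate counting, assuming a simple definition: any word within two tokens of the target counts as a collocate. The function name `collocates`, the window size, and the sample sentences are illustrative assumptions, not taken from COHA or COCA.

```python
from collections import Counter
import re

def collocates(text, target, window=2):
    """Count words appearing within `window` tokens of each occurrence of `target`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the target word itself
                counts[tokens[j]] += 1
    return counts

# Made-up sample text, for illustration only.
sample = (
    "Pass the salt and pepper please. "
    "She added salt and pepper to the soup. "
    "A pinch of salt goes a long way."
)
salt_collocates = collocates(sample, "salt", window=2)
print(salt_collocates.most_common(3))
```

Raw counts like these tend to favor very common words such as "and", which is one reason corpus tools usually rank collocates with an association measure rather than plain frequency.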
15. MEANING IS IN THE USE
• “For a large class of
cases of the
employment of the
word ‘meaning’—
though not for all—
this word can be
explained in this way:
the meaning of a
word is its use in the
language” —
Wittgenstein,
Philosophical
Investigations
16. MEANING IN THE USE
• Tumblr moms use over 4 times as many
blue-heart and purple-heart emoji
as Twitter peeps
• What are the
collocates?
• Blue: his he him
• Purple: she’s she
• No pink heart option!
• See also http://www.washingtonpost.com/sf/opinions/2015/02/12/why-moms-love-emoji/ and
http://idibon.com/emomji-emoji-new-moms-use/
17. CO-OCCURRENCES MATTER (MOVIE
REVIEW RATINGS AND WORDS)
• The idea here is that if you’re writing a review and use the word wow, you’re being very positive
or very negative. You don’t say Wow, I have a balanced and neutral opinion on this very often.
• If you’re using however, however, you’re likely to be in the middle of your movie review rating or
travel summary—not at the very positive/negative extremes.
• See also http://web.stanford.edu/~cgpotts/manuscripts/potts-schwarz-exclamatives08.pdf and
http://web.stanford.edu/~cgpotts/papers/constant-davis-potts-schwarz-expressives.pdf
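One simple way to check a pattern like this is to compute, for each rating, what share of that rating's tokens are the word in question. The sketch below assumes reviews arrive as (rating, text) pairs; the function name `rating_profile` and the tiny review set are made up for illustration and are not data from the linked papers.

```python
from collections import Counter

def rating_profile(reviews, word):
    """Return {rating: share of that rating's tokens that are `word`}."""
    word_counts = Counter()
    token_totals = Counter()
    for rating, text in reviews:
        tokens = text.lower().split()
        token_totals[rating] += len(tokens)
        word_counts[rating] += tokens.count(word)
    return {r: word_counts[r] / token_totals[r] for r in sorted(token_totals)}

# Hypothetical toy reviews (rating, text); punctuation omitted to keep
# whitespace tokenization adequate.
reviews = [
    (1, "wow this was awful"),
    (1, "wow just wow terrible"),
    (3, "fine however the pacing dragged"),
    (3, "good cast however weak plot"),
    (5, "wow what a film"),
    (5, "wow loved it"),
]

wow_profile = rating_profile(reviews, "wow")
however_profile = rating_profile(reviews, "however")
print(wow_profile)
print(however_profile)
```

On real data, a word like wow would be expected to show a U-shaped profile (high at both extremes) and however a hump in the middle; dividing by each rating's token total keeps the comparison fair when some ratings have far more reviews than others.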
18. FOUR CASE STUDIES
• Wholesomeness: http://idibon.com/wholesome-branding-campaign-effectiveness/
• Entrepreneur: http://idibon.com/entrepreneurs-french-spanish-english/
• Because X: http://idibon.com/innovating-innovation/
• #BlackLivesMatter: http://idibon.com/blacklivesmatter-events-change-conversations/
21. DEEP HISTORY
• The first uses of wholesome tended to be about
‘virtuous teachings’.
• In Wycliffe’s Bible way back in 1382:
The..holsum wordis of oure Lord Jhesu Crist. (1 Timothy 6:3)
(Modern versions treat wordis as ‘words’, ‘teachings’, or
‘instructions’.)
23. HOW ABOUT IN SOCIAL MEDIA?
• You have to deal with spam (11% of data in this
case; another 36% of data is “Wholesome Radio”,
which is probably irrelevant)
• In 2014 tweets:
• Food: 23% (but mostly not about Honey Maid)
• Humans: 23% (and how they can/should live; church-
related mentions are prominent)
• Entertainment: 13% (movies, TV)
• Now let’s compare this to 2011 tweet uses:
• Humans: 32%
• Entertainment: 12%
• Food: 9%
25. MORE ON CONTESTED WORDS
• In the next slide, you’ll see an image from Monroe
et al. (2008)
• This is work that takes the basic thing we know:
Republicans and Democrats speak about the same
issue differently.
• In the next slide, they are showing methods that
can pull out how the parties speak about
abortion when they take the floor.
• The words at the top are the Democratic party
words, the ones at the bottom are the Republican
party words.
• http://languagelog.ldc.upenn.edu/myl/Monroe.pdf
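One of the methods discussed in Monroe et al. (2008) is a log-odds ratio smoothed with a Dirichlet prior and scaled to a z-score. The sketch below is a simplified version that assumes a uniform prior (the same small pseudo-count for every word); the function name `log_odds_z`, the `prior_scale` value, and the word counts are illustrative assumptions, not figures from the paper.

```python
import math

def log_odds_z(counts_a, counts_b, prior_scale=0.01):
    """z-scored log-odds ratio for each word between two groups.

    Uses a uniform Dirichlet prior: every word gets `prior_scale`
    pseudo-counts in both groups (a simplifying assumption).
    Positive scores lean toward group A, negative toward group B.
    """
    vocab = set(counts_a) | set(counts_b)
    n_a = sum(counts_a.values())
    n_b = sum(counts_b.values())
    a0 = prior_scale * len(vocab)  # total pseudo-count mass
    scores = {}
    for w in vocab:
        ya = counts_a.get(w, 0) + prior_scale
        yb = counts_b.get(w, 0) + prior_scale
        # Difference of smoothed log-odds of the word in each group.
        delta = math.log(ya / (n_a + a0 - ya)) - math.log(yb / (n_b + a0 - yb))
        # Approximate variance of that difference.
        variance = 1.0 / ya + 1.0 / yb
        scores[w] = delta / math.sqrt(variance)
    return scores

# Made-up counts for two speaker groups, for illustration only.
group_a = {"the": 100, "apple": 30, "pear": 2}
group_b = {"the": 100, "apple": 3, "pear": 28}

scores = log_odds_z(group_a, group_b)
for word, z in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{word:6s} {z:+.2f}")
```

The z-scoring is what keeps very frequent shared words (here "the") near zero while the words each side actually favors move to the top and bottom of the list.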
28. ENTREPRENEUR IN ENGLISH, FRENCH,
SPANISH
• Tycoon, mogul, industrialist
• A flavor of ‘ill-gotten gains’
• Entrepreneur doesn’t seem to have this, at least in English right now
• Collocates have to do with:
• Advice
• Success
• Investors
• Marketing
• Social (media/services/topics/techniques)
• Failure (especially fear-of)
• Lots of named entities (SXSW, Dubai, #KSA, Twitter, Google, LinkedIn,
Etsy)
• The people using entrepreneur identify themselves as
• Authors, speakers, writers, bloggers, strategists, (life) coaches,
consultants, moms, wives, husbands, fathers, food-lovers, music-lovers
30. INTERCONNECTED AXES OF
DIFFERENCE
• Genre (State of the Union addresses vs. Reddit comments)
• Time (1940s vs. the last ten years)
• Geography (hella vs. wicked)
• Traditional demographics (age, gender, education)
• Personal identity/style (nerd, goth, bro, mom)
32. INNOVATIONS AND THEIR
COMMUNITIES
• Because X’ers
disproportionately like:
• YouTube
• Tumblr
• One Direction (especially
Harry)
• Justin Bieber
• Ariana Grande
• “bands”
• pizza
• sex
• cats
• books
• They are decidedly less likely
to talk about
• software
• basketball
• NASCAR
• business
• words associated with African-
American Vernacular English
34.
Part of speech              Word counts ≥ 50
Noun (people, spoilers)     32.02%
Compressed clause (ilysm)   21.78%
Adjective (ugly, tired)     16.04%
Interjection (sweg, omg)    14.71%
Agreement (yeah, no)        12.97%
Pronoun (you, me)           2.45%
PART OF SPEECH TAGGERS ARE GOOD
• There’s even a pretty good POS tagger for Twitter
37. TOPIC MODELING
• In the previous sections, I’ve been noting what you can
do when you have two or more comparison sets
• How is wholesome used in time x vs. time y vs. time z?
• What are the differences between English speakers talking
about entrepreneurship vs. French speakers and Spanish
speakers?
• How are people who use the innovative Because X
construction different than people who don’t use it?
• In this section, we talk about topic modeling, which is a
way to automatically identify clusters within a data set,
even if you don’t have a comparison set.
• We’ll use this to explore conversations around
#blacklivesmatter, but we’ll also see how these
conversations shift before/after a particular moment in
time
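The slides do not say which topic-modeling algorithm was used. As a rough sketch of the idea, here is a small latent Dirichlet allocation (LDA) model fit by collapsed Gibbs sampling, one common choice. The function name `lda_gibbs`, the hyperparameters, and the toy documents are all assumptions for illustration.

```python
import random
from collections import Counter

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA. `docs` is a list of token lists.

    Returns (topic_word, doc_topic): per-topic word counts and
    per-document topic counts after the final sweep.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    topic_word = [Counter() for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics                       # tokens per topic
    doc_topic = [[0] * num_topics for _ in docs]         # topic counts per doc
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            topic_word[k][w] += 1
            topic_total[k] += 1
            doc_topic[d][k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                doc_topic[d][k] -= 1
                # Resample in proportion to (document's affinity for topic)
                # times (topic's affinity for word).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                topic_word[k][w] += 1
                topic_total[k] += 1
                doc_topic[d][k] += 1
    return topic_word, doc_topic

# Hypothetical toy documents; no comparison set or labels are supplied.
docs = [
    "cat dog pet cat dog".split(),
    "dog pet cat pet".split(),
    "stock market trade stock".split(),
    "market trade stock market".split(),
]
topic_word, doc_topic = lda_gibbs(docs, num_topics=2)
for k, counts in enumerate(topic_word):
    print(k, [w for w, c in counts.most_common(3) if c > 0])
```

The model is never told which documents belong together; any grouping it finds comes only from which words co-occur in the same documents. Results on data this small depend on the random seed, so treat the output as a demonstration of the mechanics rather than a stable result.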
40. UNKNOWN UNKNOWNS
• In general, topic modeling is a way of addressing
the limits of our knowledge. If you’re asking a
question about data, you probably know
something about the data going in.
• But what we hear from people is that they are keenly aware
that they don’t know what they don’t know.
• Topic modeling is meant to help with that.
• In the next slides, another use of topic modeling:
identifying the themes of Martin Luther King Jr.’s
major speeches and sermons
41.
• Topic modeling Dr. King’s major speeches and sermons gets these topics
• Which change over time
• See also http://idibon.com/topic-detection-mlk/