Gender, language and Twitter:
Social theory and computational methods
Tyler Schnoebelen (including work with
David Bamman and Jacob Eisenstein)
Tweet this talk!
@Tschnoebelen
Welcome to the slide-u-ment
• Hi, you may want to check out the “Notes”
fields for additional context.
At its most basic
• Assumption 1: Men and women use different
vocabularies
– Hypothesis I: Computational methods can cut through
noise and predict speaker gender based on the words
they use
• Assumption 2: Social networks are typically
“homophilous” (birds of a feather flock together)
– Hypothesis II: Adding the gender make-up of a user’s
social network should get even better prediction
Let’s say we can predict gender
• So what?
• Does it license us to connect words/word
groups to the social category in question?
• This assumes that gender is
– Stable
– The primary driving force
Our actual goal
• Problematize gender prediction as a task
– Define a system where we could just “stop” and
call it good
– But NOT ACTUALLY STOP
• Demonstrate that simple gender binaries
aren’t actually descriptively accurate
• Show ways to combine social theory and
computational methods that expand the
questions on both sides
QUICK LITERATURE REVIEW
“Standard” is a keyword
Typical findings
• Women use standard variables
more often than men.
– In fact, early dialectologists
ignored women completely
because they wanted
“NORMS”—non-mobile, older,
rural male speakers, seen as
preserving the purest regional
(non-standard) forms
• See Chambers and Trudgill
(1980).
– Did they do it for prestige (to
acquire social capital)?
– To avoid losing status?
– Are women actually creating
norms, not following them?
Computational/corpus work
• People are fascinated by gender differences
• In order to get statistical significance, you
have to have enough data where you can
detect a signal
• In the past, this has led researchers to roll up
words into word classes
The most common distinctions
• Men use informative language
– Prepositions (to), attributive adjectives (fat),
higher word lengths (gargantuan)
• Women use involved language
– First and second person pronouns (you), present
tense verbs (goes), contractions (don’t)
• (Argamon, Koppel, Fine, & Shimoni, 2003; Herring & Paolillo, 2006b; Schler,
Koppel, Argamon, & Pennebaker, 2006…they are working off of dimensions in
Biber 1995 and Chafe 1982)
Or “contextuality”
• Men are formal and explicit
– Nouns (floor), adjectives (big), prepositions (to), articles
(the)
• Women are deictic and contextual
– Pronouns (you), verbs (run), adverbs (happily),
interjections (oh!)
• “Contextuality” decreases when an unambiguous
understanding is more important or difficult—when
people are physically or socially farther away
• (Mukherjee & Liu, 2010; Nowson, Oberlander, & Gill, 2005 building off of
Heylighen and Dewaele 2002)
Are all nouns really the same?
And what about…
Our approach also lumps
• It’s just at a lower level
– instead of “nouns” or “blog
words”
– we assume all usages of a
unigram are identical
• Lumping itself isn’t a
problem. In fact, you have
to.
– But ideologies are going to
structure your lumpings
and divisions, so watch
out!
OUR WORK
(WITH DAVID BAMMAN AND JACOB EISENSTEIN)
Data
• Public Twitter messages in same-gender and cross-gender
social networks
– Word frequencies (unigrams)
– Gender (induced from first names using the Social Security
Administration data)
• 14,464 Twitter users (56% male)
– Geolocated in the US
– Must use at least 50 of the 1,000 most frequent words
– Between 4 and 100 ties (at least 2 “mutual @’s” separated by 14
days)
• Women have 58% female friends
• Men have 67% male friends
• 9.2M tweets, Jan-Jun 2011
Twitter has a pretty good swath (Pew)
• Nearly identical usage among women and
men:
– 15% of female internet users are on Twitter
– 14% of male internet users
• High usage among non-Hispanic Blacks (28%)
• Even distribution across income and education
levels
• Higher usage among young adults (26% for
ages 18-29, 4% for ages 65+)
First names are highly gendered
Name | % male | % female
Matt | 100 | 0
Alex | 97 | 3
Chris | 86 | 14
Kelly | 15 | 85
Sarah | 0 | 100
95% of users have a name that is at least 85% associated with one gender
The median user’s name is 99.6% associated with its majority gender
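A minimal sketch of how a name's association with its majority gender could be computed from name counts. The `(name, gender, count)` row layout and the counts themselves are assumptions made for illustration; they are not the Social Security Administration figures used in the talk.

```python
from collections import defaultdict

def gender_association(rows):
    """Map each name to (majority_gender, share of that name's count)."""
    totals = defaultdict(lambda: {"F": 0, "M": 0})
    for name, gender, count in rows:
        totals[name.lower()][gender] += count
    result = {}
    for name, counts in totals.items():
        n = counts["F"] + counts["M"]
        if n == 0:
            continue  # no evidence for this name
        majority = "F" if counts["F"] >= counts["M"] else "M"
        result[name] = (majority, counts[majority] / n)
    return result

# Hypothetical counts, invented for illustration only.
rows = [
    ("Matt", "M", 1000),
    ("Alex", "M", 97), ("Alex", "F", 3),
    ("Kelly", "F", 85), ("Kelly", "M", 15),
]
association = gender_association(rows)
```

An author would then be assigned the majority gender of their first name, with the share serving as a measure of how ambiguous that name is.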
First step: gender prediction
• Logistic regression:
– Will you have a heart attack Y/N?
– Will you vote for X or Y?
– Will your Brazilian Portuguese nouns and modifiers
agree in number?
• Logistic regression is the statistical technique at
the core of variable rule analysis (Tagliamonte
2006)
• But we’re going to reverse the direction from what
sociolinguists typically do
First step: gender prediction
• The relevant linguistic variables aren’t known
beforehand
• So the dependent variable—the thing we are
trying to predict—is author gender
• The independent variables are the 10,000
most frequent lexical items in the tweets
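One way to picture the setup above: each author becomes a vector of counts over the most frequent words. The sketch below is illustrative only; the whitespace tokenizer, the function names and the toy tweets are assumptions, and the vocabulary is cut to 3 words instead of 10,000.

```python
from collections import Counter

def tokenize(tweet):
    # Whitespace tokenization keeps emoticons and hashtags intact (an assumption).
    return tweet.lower().split()

def build_vocabulary(tweets_by_author, size):
    """Return the `size` most frequent tokens across all authors."""
    counts = Counter()
    for tweets in tweets_by_author.values():
        for tweet in tweets:
            counts.update(tokenize(tweet))
    return [word for word, _ in counts.most_common(size)]

def author_vector(tweets, vocabulary):
    """Count how often one author uses each vocabulary item (bag of words)."""
    counts = Counter(tok for tweet in tweets for tok in tokenize(tweet))
    return [counts[word] for word in vocabulary]

# Toy data, invented for illustration.
tweets_by_author = {
    "author_a": ["omg lol lol", "lol xo"],
    "author_b": ["the api is down", "the api"],
}
vocabulary = build_vocabulary(tweets_by_author, size=3)
features = {a: author_vector(t, vocabulary) for a, t in tweets_by_author.items()}
```

These count vectors are the independent variables; the name-derived gender label is the dependent variable.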
Preventing overfitting
• This involves estimating a lot of parameters.
• Which raises the risk of overfitting: learning
parameter values that perfectly describe the
training data but won’t generalize to new data
Why regularize?
Regularization dampens the effect of an
individual variable (Hastie et al 2009).
A single regularization parameter controls the
tradeoff between perfectly describing the
training data and generalizing to unseen data.
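A small sketch of what the regularization parameter does in a logistic regression. It assumes an L2 penalty and plain gradient descent, neither of which the slides specify; the data, learning rate and function names are invented for illustration.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logreg(X, y, l2=0.1, lr=0.1, epochs=500):
    """Fit weights by gradient descent on log-loss plus an L2 penalty."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        for j in range(n_features):
            # The l2 * w[j] term pulls every weight back toward zero.
            w[j] -= lr * (grad_w[j] / n + l2 * w[j])
        b -= lr * grad_b / n
    return w, b

# Toy word counts for two words; label 1 and label 0 stand for the two classes.
X = [[3, 0], [2, 0], [0, 3], [0, 2]]
y = [1, 1, 0, 0]
w_plain, _ = train_logreg(X, y, l2=0.0)
w_reg, _ = train_logreg(X, y, l2=1.0)
```

With the larger penalty the weights keep the same signs but shrink, which is the dampening effect described above.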
Evaluating accuracy
• We use the typical method of cross-validation.
1. Randomly divide the full dataset into 10 parts.
2. Train on 80% of the data
3. Use 10% of the data to tune the regularization
parameter
4. Now, use the model to predict the other 10%
5. Compare the predictions to what really happened
• Do this 10 times and take the average.
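The procedure above can be sketched as a generator of train/tune/test index splits. Which of the remaining parts serves as the tuning set in each round is an assumption here (the next part, wrapping around); the slides only give the 80/10/10 proportions.

```python
import random

def cv_splits(n_items, n_folds=10, seed=0):
    """Yield (train, tune, test) index lists, one triple per fold."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        tune_k = (k + 1) % n_folds  # assumption: tune on the next fold
        train = [i for j, fold in enumerate(folds)
                 if j not in (k, tune_k) for i in fold]
        yield train, folds[tune_k], folds[k]

splits = list(cv_splits(100))
```

Each item lands in the test set exactly once, so averaging the 10 test accuracies uses every author for evaluation without ever training on the author being predicted.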
Gender prediction results
• State-of-the-art accuracy: 88.0%
– Lexical features strongly predict gender
– Ignoring syntax (treating tweets as “bags of
words”) works pretty well
Feature | Previous literature | In our data
Pronouns | F | F
Emotion terms | F | F
Family terms | F | Mixed results
"Blog words" (lol, omg) | F | F
Conjunctions | F | F (weakly)
Articles | M | No results
Numbers | M | M
Quantifiers | M | No results
Technology words | M | M
Prepositions | Mixed results | F (weakly)
Swear words | Mixed results | M
Assent | Mixed results | Mixed results
Negation | Mixed results | Mixed results
Emoticons | Mixed results | F
Hesitation markers | Mixed results | F
Top 500 markers for each gender
At a corpus level, women use more non-dictionary words and men use more
named entities. In a moment we’ll ask how universal this is.
Hand classification of most frequent
10k words (90.0% agreement)
Category | Female authors | Male authors
Common words in a standard dictionary | 74.2% | 74.9%
Punctuation | 14.6% | 14.2%
Non-standard, unpronounceable words (e.g., :), lmao) | 4.28% | 2.99%
Non-standard, pronounceable words (e.g., luv) | 3.55% | 3.35%
Named entities | 1.94% | 2.51%
Numbers | 0.83% | 0.99%
Taboo words | 0.47% | 0.69%
Hashtags | 0.16% | 0.18%
Involvement
• Using traditional definitions, it looks as if our
data confirms:
– men as more informational (all those named
entities)
– women as more interactive/involved (pronouns,
emoticons, etc.)
• Note that most of the named entities for the
men are sports figures and teams
Right. These guys are not “involved”.
Clustering without regard to gender
• We apply probabilistic clustering in order to
group authors who are linguistically similar
• Each author is represented as a list of word
counts across the 10,000 words used in the
classification experiment
Clustering! (Hastie et al 2009)
Easy example: 2 clusters
“Expectation Maximization”
1. Randomly assign all authors to
one of 20 clusters
2. Calculate the center of the cluster
from the average word counts of
all authors put in it
3. Assign each author to the nearest
cluster, based on the distance
between their word counts and
the average word counts of the
cluster center
4. Keep iterating steps 2-3, moving
from a random clustering to
meaningful clusters
5. Repeat steps 1-4 (25 times)
6. Pick the best
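The steps above can be sketched in a few lines. This is a simplified hard-assignment version (k-means style) and not the probabilistic clustering used in the talk; treating "best" as the lowest total squared distance to the cluster centers is also an assumption, as are the toy vectors.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def centers_for(vectors, labels, k, rng):
    # Step 2: each center is the average word-count vector of its members.
    centers = []
    for c in range(k):
        members = [v for v, lab in zip(vectors, labels) if lab == c]
        centers.append(mean(members) if members else rng.choice(vectors))
    return centers

def cluster_once(vectors, k, rng, max_iter=100):
    labels = [rng.randrange(k) for _ in vectors]  # step 1: random assignment
    for _ in range(max_iter):
        centers = centers_for(vectors, labels, k, rng)
        # Step 3: move each author to the nearest center.
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centers[c]))
                      for v in vectors]
        if new_labels == labels:  # step 4: stop once nothing changes
            break
        labels = new_labels
    cost = sum(sq_dist(v, centers[lab]) for v, lab in zip(vectors, labels))
    return labels, cost

def best_clustering(vectors, k, restarts=25, seed=0):
    # Steps 5-6: repeat from fresh random starts and keep the lowest-cost run.
    rng = random.Random(seed)
    runs = [cluster_once(vectors, k, rng) for _ in range(restarts)]
    return min(runs, key=lambda run: run[1])

# Toy word-count vectors for six authors and two words.
authors = [[10, 0], [11, 0], [10, 1], [0, 10], [0, 11], [1, 10]]
labels, cost = best_clustering(authors, k=2)
```

The restarts matter because a single run can settle into a poor local solution, depending on the random starting assignment.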
Some definitions
• Style: combinations of linguistic resources
• Cluster: a group of authors who use a
particular style
• Social network: each author has a social
network made up of the people they both send
messages to AND receive messages from
• An author’s social network does not have to
be a part of that author’s cluster
Majority female clusters
Cluster | Size | % fem | Top words
c14 | 1,345 | 89.60% | hubs blogged bloggers giveaway @klout recipe fabric recipes blogging tweetup
c7 | 884 | 80.40% | kidd hubs xo =] xoxoxo muah xoxo darren scotty ttyl
c6 | 661 | 80.00% | authors pokemon hubs xd author arc xxx ^_^ bloggers d:
c16 | 200 | 78.00% | xo blessings -) xoxoxo #music #love #socialmedia slash :)) xoxo
c8 | 318 | 72.30% | xxx :') xx tyga youu (: wbu thankyou heyy knoww
c5 | 539 | 71.10% | (: :') xd (; /: <333 d: <33 </3 -___-
c4 | 1,376 | 63.00% | && hipster #idol #photo #lessambitiousmovies hipsters #americanidol #oscars totes #goldenglobes
c9 | 458 | 60.00% | wyd #oomf lmbo shyt bruh cuzzo #nowfollowing lls niggas finna
Looks like “women are trying to
destroy the English language”
Category | Female authors | Male authors
Common words in a standard dictionary | 74.2% | 74.9%
Punctuation | 14.6% | 14.2%
Non-standard, unpronounceable words (e.g., :), lmao) | 4.28% | 2.99%
Non-standard, pronounceable words (e.g., luv) | 3.55% | 3.35%
Named entities | 1.94% | 2.51%
Numbers | 0.83% | 0.99%
Taboo words | 0.47% | 0.69%
Hashtags | 0.16% | 0.18%
Clusters that are majority female
• At the population level, women use many non-
dictionary words.
• But there are clusters of (mostly) women who
actually use fewer words like lol, nah, haha than men
do
Cluster | Size | % fem | Top words
c14 | 1,345 | 89.60% | hubs blogged bloggers giveaway @klout recipe fabric recipes blogging tweetup
c6 | 661 | 80.00% | authors pokemon hubs xd author arc xxx ^_^ bloggers d:
c4 | 1,376 | 63.00% | && hipster #idol #photo #lessambitiousmovies hipsters #americanidol #oscars totes #goldenglobes
Consider xo
• A lot more women use xo than
men
– 11% of all women
– 2.5% of all men
• But that means that 89% of
women aren’t using it at all.
• People who use xo are three
times more likely to use ttyl (‘talk
to you later’)
– The style is more commonly
adopted by women
– But there’s other stuff going on
here: age, job, etc.
– It’s not clear that gender is even
the most important factor; it’s just
that we’re starting with
gender-colored glasses
Shit Girls Say
http://www.youtube.com/watch?feature=player_embedded&v=u-yLGIH7W9Y
Meme-splosion!
Group Gender Activity/social role Interactions Geography
Shit Guys Don't Say Out Loud
Shit College Freshmen Say
Shit Girlfriends Say
Shit Asian Dads Say
Shit Redneck Guys Say
Shit Girls Say to Gay Guys
Shit Black Girls Say
Shit Black Guys Say
Shit People Say in LA
Shit White Girls Say…to Black Girls
Shit New Yorkers Say
Shit Frat Guys Say
Shit Whipped Guys Say
Shit Guys Don't Say
Shit Asian Girls Say
Shit Tumblr Girls Say
Shit Brides Say
Shit Spanish Girls Say
Shit Asian Moms Say
Shit Vegans Say
Shit Hipsters Say
Shit Cyclists Say
Shit Yogis Say
Shit Skiers Say
Notice
• That gender wasn’t really limited to the
“gender” column
– “Moms” and “dads” are gendered social roles
• And that the words “guys” and “girls” aren’t
really the same as “male” and “female”
– What are the plausible age ranges and social
styles for “guys” and “girls”?
Clusters that are majority male
Cluster | Size | % male | Top words
c13 | 761 | 89.40% | #nhl #bruins #mlb nhl #knicks qb @darrenrovell inning boozer jimmer
c10 | 1,865 | 85.40% | /cc api ios ui portal developer e3 apple's plugin developers
c18 | 623 | 81.10% | @macmiller niggas flyers cena bosh pacers @wale bruh melo @fucktyler
c11 | 432 | 73.80% | niggas wyd nigga finna shyt lls ctfu #oomf lmaoo lmaooo
c20 | 429 | 72.50% | gop dems senate unions conservative democrats liberal palin republican republicans
c15 | 963 | 65.30% | #photo /cc #fb (@ brewing #sxsw @getglue startup brewery @foursquare
Looks like “men are Twitter-headed
sailor-swearing accountants”
Category | Female authors | Male authors
Common words in a standard dictionary | 74.2% | 74.9%
Punctuation | 14.6% | 14.2%
Non-standard, unpronounceable words (e.g., :), lmao) | 4.28% | 2.99%
Non-standard, pronounceable words (e.g., luv) | 3.55% | 3.35%
Named entities | 1.94% | 2.51%
Numbers | 0.83% | 0.99%
Taboo words | 0.47% | 0.69%
Hashtags | 0.16% | 0.18%
Aggregates generally don’t hold
Cluster | Top words | Notes
c13 | #nhl #bruins #mlb nhl #knicks qb @darrenrovell inning boozer jimmer | Few Taboo/Hashes; Lots of Punc
c10 | /cc api ios ui portal developer e3 apple's plugin developers | Few Taboo/Hashes; Lots of Punc
c18 | @macmiller niggas flyers cena bosh pacers @wale bruh melo @fucktyler |
c11 | niggas wyd nigga finna shyt lls ctfu #oomf lmaoo lmaooo | Few Dict words; Lots of unPron and Pron
c20 | gop dems senate unions conservative democrats liberal palin republican republicans | Few Taboo/Hashes; Lots of Punc
c15 | #photo /cc #fb (@ brewing #sxsw @getglue startup brewery @foursquare | Few Taboo; Lots of Punc
Small exceptions
• At the population level, men use many named
entities and numbers
• Clusters use these at various rates, but:
– No female-skewed clusters use them *more* than the
male average
– No male-skewed clusters use them *less* than the
female average
• But again, the other 6 generalizations about
gender we might have made at the aggregate
level aren’t supported once we get to clusters
Erasure!
• Clusters are highly gendered
• For example, let’s consider
clusters in which 60% or more
of the members are the same gender
– That covers 82.95% of all the
authors
– But what about the 1,242 men
who are part of female-majority
clusters?
– The 1,052 women who are part of
male-majority clusters?
– Are they just noise? Odd-balls? Is
there no structure to what they’re
doing?
– These people are using language to
do identity work, even as they
construct identities at odds with
conventional notions of
masculinity and femininity.
Clusters vs. social networks
• The more skewed a cluster is, the more
skewed the social networks of its members
[Figure: scatter plot of the clusters, each labeled with its cluster number. x-axis: percent male; y-axis: percent male friends]
Women with female networks use the
most female markers
[Figure: female authors in bins 1-10. x-axis: percent female social network; y-axis: female marker proportion]
Men with male networks use the most
male markers
[Figure: male authors in bins 1-10. x-axis: percent male social network; y-axis: male marker proportion]
Women with male networks use more
male markers (and vice versa)
Women with highly female networks
are easier to classify (and vice versa)
In other words
• The classifier is picking up on the fact that if you
insist upon a gender binary then people with
same-gender networks use language in a more
“gender-coherent” way.
Does social network help prediction?
• 88% accuracy with text alone
– Logistic regression, 10-fold cross-validation
– State-of-the-art accuracy
• Add network information…
– Still 88% accuracy
Once we have 1000 words/author,
network info doesn’t help
[Figure: accuracy (y-axis, 0.5 to 1.0) by number of words per author (x-axis: 0, 10, 100, 1,000, 10,000, all) for “words plus social network” and “words only”; labeled value: 0.880]
Wait, why not?
• A new feature is only going to improve classification
accuracy if it adds new information.
• There is strong homophily: 63% of the connections are
between same-gender individuals.
• But language and social network can’t mutually
disambiguate because they aren’t independent views
on gender.
• Individuals who use linguistic resources from “the
other gender” consistently have denser social network
connections to the other gender.
– Performance, style, accommodation
• Gender is not an “A or B” kind of thing
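The homophily figure quoted above (the share of connections between same-gender individuals) is a simple count over ties. A minimal sketch, with a made-up network; the names, genders and function names are all hypothetical.

```python
def same_gender_share(edges, gender):
    """Share of ties that connect two people of the same gender."""
    same = sum(1 for a, b in edges if gender[a] == gender[b])
    return same / len(edges)

def friend_share(author, edges, gender):
    """Share of one author's ties that go to people of the author's own gender."""
    friends = [b if a == author else a for a, b in edges if author in (a, b)]
    same = sum(1 for f in friends if gender[f] == gender[author])
    return same / len(friends)

# Toy network, invented for illustration.
gender = {"ann": "F", "bea": "F", "cat": "F", "dan": "M", "eli": "M"}
edges = [
    ("ann", "bea"), ("ann", "cat"), ("bea", "cat"), ("dan", "eli"),
    ("ann", "dan"), ("bea", "eli"), ("cat", "dan"),
]
overall = same_gender_share(edges, gender)
ann = friend_share("ann", edges, gender)
```

The per-author share is the kind of network feature that was added to the classifier; because it tends to move together with an author's word choices, it carries little information the words do not already provide.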
If we seek only predictive accuracy…
We’re awesome!
Not so simple
• If we want to understand categories, we
should start with people in interactions.
– Counting is great but we have to watch our bins
and investigate them, too.
Look at words a different way
Not markers…makers
Positioning
Positioning and stance
• “Stance” is usually seen as an
expression of a speaker’s
relationship to their talk and
their interlocutors
– E.g., Kiesling (2009); Du Bois
(2007); Bednarek (2008)
• But “stance” (and “roles”)
seem static
• I’d like something with more
motion and dynamism
• I develop positioning to
connect linguistic forms to
social structures
• (Particularly affect, actually)
Positioning in a social grid
[Diagram labels: Sister, Daughter, Spinster, Subject, Object, Dentist, Farmer, Father]
Positioning in a social grid
• Social structures are
created, maintained,
and changed by
specific interactions
• People enter
interactions already
positioned
• Interactions change
these positions, and
people are attentive
to the changes
Conventions
• Different linguistic
resources come to be
associated with different
positionings
• Distributions of
experiences are usually
maintained
• The maintenance and
disruption of expectations
have (affective)
consequences
A LITTLE BIT OF LITTLE
CHILDES (MacWhinney, 2000)
• 4,676 transcripts of parent-child interactions
– American English
Direction | Observed little | Expected little | O/E
Mothers-to-boys | 4,313 | 4,158 | 1.037
Fathers-to-boys | 1,516 | 1,381 | 1.098
Mothers-to-girls | 6,312 | 5,441 | 1.160
Fathers-to-girls | 230 | 281 | 0.819
Girls-to-mothers | 1,221 | 1,533 | 0.796
Girls-to-fathers | 4 | 3 | 1.482
Boys-to-mothers | 875 | 1,526 | 0.573
Boys-to-fathers | 117 | 265 | 0.441
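A short sketch of the O/E (observed over expected) ratio. The slides do not say how the expected counts were derived; the function below assumes they come from each group's share of all words multiplied by the overall count of little, and its arguments are hypothetical. The printed expected counts are rounded, so ratios recomputed from the table can differ slightly from the O/E column.

```python
def observed_over_expected(observed, group_tokens, total_observed, total_tokens):
    """O/E under the assumption that usage is proportional to words spoken."""
    expected = group_tokens * total_observed / total_tokens
    return observed / expected

# Hypothetical group: 10,000 of 50,000 words and 30 of 100 uses of "little".
ratio = observed_over_expected(30, 10_000, 100, 50_000)

# Recomputing one row from the printed table (Mothers-to-boys).
mothers_to_boys = round(4313 / 4158, 3)
```

A ratio above 1 means the group uses little more often than its share of the talk would predict; a ratio below 1 means less often.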
Gender and little
• Women tend to use little more—multiple corpora show significant
differences
• But this misses the point
Speaker | Buckeye O/E | CALLHOME O/E
Female | 1.170 | 1.073
Male | 0.855 | 0.725
Add interlocutor gender
Direction | CHILDES Parent-Child O/E | CHILDES Child-Parent O/E | Buckeye O/E | Fisher Am. Eng. O/E | Fisher Ohioans O/E | CALLHOME O/E
Female to female | 1.160 | 0.796 | 0.936 | 1.051 | 1.160 | 1.088
Female to male | 1.037 | 1.482 | 1.290 | 0.887 | 0.771 | 1.064
Male to male | 1.098 | 0.441 | 0.879 | 1.071 | 0.830 | 0.685
Male to female | 0.819 | 0.573 | 0.908 | 0.842 | 0.836 | 0.727
Gender and topics
• Some topics are more face-threatening than others.
– Face-threatening topics get less little.
• When topic is held constant, men and women mostly have the
same little usage.
– Regardless of the gender of the person they’re talking to.
• But there are some exceptions, which are connected to issues of
masculinity, femininity, and emotional regulation.
– Some examples:
• Generally, people don’t use little to talk about terrorism. EXCEPT women
speaking to women use little to modify emotions (terrified, scared)
• Generally, people DO use little to talk about fitness. EXCEPT men talking to
men. The men talking to women use little to talk about their pudgy, flabby
bodies. The few men talking to men who use little use it to talk about working
out a little harder or putting on a little more muscle mass.
ICSI meeting corpus (Janin et al., 2003)
• 75 meetings from Berkeley’s International
Computer Science Institute (2000-2002)
– 3-10 participants (avg of 6)
– 17-103 minutes each (usually an hour)
– 72 hours of data
Role | # speakers (avg age) | Observed little | Expected little | O/E
Undergrad | 6 (30 yo) | 59 | 34 | 1.734
Grad | 14 (29 yo) | 234 | 223 | 1.049
Postdoc | 1 (not given) | 51 | 75 | 0.676
Ph.D. | 11 (37 yo) | 152 | 228 | 0.667
Professor | 4 (52 yo) | 278 | 213 | 1.302
Gender, genre, topic, style
• “Different ways of saying things are intended
to signal different ways of being, which
includes different potential things to say.”
(Eckert 2008)
Gender is not something people have
It’s something people *do*
And there are a lot of ways to “do” gender.
Computational Judith Butler!
Gender is binary only with blinders
• “My mom doesn’t
say that’s lovely or
omg!...”
– “Nevermind that!”
• Problem: Sliding
from predictive
accuracy to causal
stories
• Realistic finding:
There are lots of
ways to do gender
Big data, big opportunities
• Big data offers us
the opportunity
to let clusters
emerge (and test
them against our
big bins)
• We can show how
language reflects
and creates the
social worlds we
live in
THANKS!

Compare Contrast Essay Template
Ally Gonzales
 
Training and DevelopmentFinal ProjectNow its your turn! Below.docx
Training and DevelopmentFinal ProjectNow its your turn! Below.docxTraining and DevelopmentFinal ProjectNow its your turn! Below.docx
Training and DevelopmentFinal ProjectNow its your turn! Below.docx
TakishaPeck109
 






Gender and language (linguistics, social network theory, Twitter!)

  • 1. Gender, language and Twitter: Social theory and computational methods Tyler Schnoebelen (including work with David Bamman and Jacob Eisenstein) Tweet this talk! @Tschnoebelen
  • 2. Welcome to the slide-u-ment • Hi, you may want to check out the “Notes” fields for additional context.
  • 3. At its most basic
  • 4. At its most basic • Assumption 1: Men and women use different vocabularies – Hypothesis I: Computational methods can cut through noise and predict speaker gender based on the words they use • Assumption 2: Social networks are typically “homophilous” (birds of a feather flock together) – Hypothesis II: Adding the gender make-up of a user’s social network should get even better prediction
  • 5.
  • 6. Let’s say we can predict gender • So what? • Does it license us to connect words/word groups to the social category in question? • This assumes that gender is – Stable – The primary driving force
  • 7. Our actual goal • Problematize gender prediction as a task – Define a system where we could just “stop” and call it good – But NOT ACTUALLY STOP • Demonstrate that simple gender binaries aren’t actually descriptively accurate • Show ways to combine social theory and computational methods that expand the questions on both sides
  • 10. Typical findings • Women use standard variables more often than men. – In fact, early dialectologists ignored women completely because they wanted “NORMS”—non-mobile, older, rural male speakers, seen as preserving the purest regional (non-standard) forms • See Chambers and Trudgill (1980). – Did they do it for prestige (to acquire social capital)? – To avoid losing status? – Are women actually creating norms, not following them?
  • 11. Computational/corpus work • People are fascinated by gender differences • In order to get statistical significance, you have to have enough data where you can detect a signal • In the past, this has led researchers to roll up words into word classes
  • 12. The most common distinctions • Men use informative language – Prepositions (to), attributive adjectives (fat), higher word lengths (gargantuan) • Women use involved language – First and second person pronouns (you), present tense verbs (goes), contractions (don’t) • (Argamon, Koppel, Fine, & Shimoni, 2003; Herring & Paolillo, 2006b; Schler, Koppel, Argamon, & Pennebaker, 2006…they are working off of dimensions in Biber 1995 and Chafe 1982)
  • 13. Or “contextuality” • Men are formal and explicit – Nouns (floor), adjectives (big), prepositions (to), articles (the) • Women are deictic and contextual – Pronouns (you), verbs (run), adverbs (happily), interjections (oh!) • “Contextuality” decreases when an unambiguous understanding is more important or difficult—when people are physically or socially farther away • (Mukherjee & Liu, 2010; Nowson, Oberlander, & Gill, 2005 building off of Heylighen and Dewaele 2002)
  • 14. Are all nouns really the same?
  • 15. Are all nouns really the same?
  • 18. Our approach also lumps • It’s just at a lower level – instead of “nouns” or “blog words” – we assume all usages of a unigram are identical • Lumping itself isn’t a problem. In fact, you have to. – But ideologies are going to structure your lumpings and divisions, so watch out!
  • 19. OUR WORK (WITH DAVID BAMMAN AND JACOB EISENSTEIN)
  • 20. Data • Public Twitter messages in same-gender and cross-gender social networks – Word frequencies (unigrams) – Gender (induced from first names using the Social Security Administration data) • 14,464 Twitter users (56% male) – Geolocated in the US – Must use 50 of top 1,000 most frequent words – Between 4 and 100 ties (at least 2 “mutual @’s” separated by 14 days) • Women have 58% female friends • Men have 67% male friends • 9.2M tweets, Jan-Jun 2011
  • 21. Twitter has a pretty good swath (Pew) • Nearly identical usage among women and men: – 15% of female internet users are on Twitter – 14% of male internet users • High usage among non-Hispanic Blacks (28%) • Even distribution across income and education levels • Higher usage among young adults (26% for ages 18-29, 4% for ages 65+)
  • 22. First names are highly gendered [Bar chart of % male vs. % female by first name. Matt: 100% male; Alex: 97% male, 3% female; Chris: 86% male, 14% female; Kelly: 15% male, 85% female; Sarah: 100% female] • 95% of users have a name that is at least 85% associated with one gender • The median user’s name is 99.6% associated with its majority gender
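A minimal sketch of how gender could be induced from first names. The percentages are the five from the chart on this slide, treated as counts per 100 people; the function names and the 0.85 cutoff are assumptions for illustration, not the authors' actual procedure or the full Social Security Administration data.

```python
# Per-100 counts taken from the chart on this slide (illustrative only).
NAME_COUNTS = {
    "matt": {"M": 100, "F": 0},
    "alex": {"M": 97, "F": 3},
    "chris": {"M": 86, "F": 14},
    "kelly": {"M": 15, "F": 85},
    "sarah": {"M": 0, "F": 100},
}

def majority_gender(name, counts=NAME_COUNTS):
    """Return (gender, share) for the name's majority gender, or None if unknown."""
    row = counts.get(name.lower())
    if row is None or sum(row.values()) == 0:
        return None
    gender = max(row, key=row.get)
    return gender, row[gender] / sum(row.values())

def induce_gender(name, min_share=0.85, counts=NAME_COUNTS):
    """Label a name only when its majority share reaches min_share (an assumed cutoff)."""
    result = majority_gender(name, counts)
    if result is None:
        return None
    gender, share = result
    return gender if share >= min_share else None
```

Under these assumptions, `induce_gender("Kelly")` yields `"F"`, while a name missing from the table yields `None` and the user would be left out.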
  • 23. First step: gender prediction • Logistic regression: – Will you have a heart attack Y/N? – Will you vote for X or Y? – Will your Brazilian Portuguese nouns and modifiers agree in number? • Logistic regression is the statistical technique at the core of variable rule analysis (Tagliamonte 2006) • But we’re going to reverse the direction from what sociolinguists typically do
  • 24. First step: gender prediction • The relevant linguistic variables aren’t known beforehand • So the dependent variable—the thing we are trying to predict—is author gender • The independent variables are the 10,000 most frequent lexical items in the tweets
  • 25. Preventing overfitting • This involves estimating a lot of parameters. • Which raises the risk of overfitting: learning parameter values that perfectly describe the training data but won’t generalize to new data
  • 26. Why regularize? Regularization dampens the effect of an individual variable (Hastie et al 2009). A single regularization parameter controls the tradeoff between perfectly describing the training data and generalizing to unseen data.
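To make the regularization tradeoff concrete, here is a minimal pure-Python sketch of logistic regression with a single penalty parameter. The slides do not say which penalty was used, so the L2 form, the names `l2`, `lr`, and `epochs`, and the toy word counts are all assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, l2=0.1, lr=0.5, epochs=500):
    """Fit weights by batch gradient descent on the L2-penalized log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in range(d):
            # The l2 * w[j] term pulls every weight toward zero.
            w[j] -= lr * (grad_w[j] / n + l2 * w[j])
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

# Toy data: each row holds counts for three hypothetical words; labels are 1 or 0.
X = [[2, 0, 1], [3, 0, 0], [0, 2, 1], [0, 3, 0]]
y = [1, 1, 0, 0]

weak, _ = train_logreg(X, y, l2=0.01)    # light penalty: larger weights
strong, _ = train_logreg(X, y, l2=1.0)   # heavy penalty: weights shrink
```

A larger `l2` gives each individual word less influence, which is the dampening effect described on the slide.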
  • 27. Evaluating accuracy • We use the typical method of cross-validation. 1. Randomly divide the full dataset into 10 parts. 2. Train on 80% of the data 3. Use 10% of the data to tune the regularization parameter 4. Now, use the model to predict the other 10% 5. Compare the predictions to what really happened • Do this 10 times and take the average.
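The split described above can be sketched as follows. The function name and the choice of which part serves as the tuning set in each round are assumptions; the slide only gives the 80/10/10 proportions and the ten repetitions.

```python
import random

def rotating_folds(n_items, n_folds=10, seed=0):
    """Yield (train, tune, test) index lists, one triple per rotation."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        tune_k = (k + 1) % n_folds  # assumed: the next part is the tuning set
        train = [i for j, fold in enumerate(folds)
                 if j not in (k, tune_k) for i in fold]
        yield train, folds[tune_k], folds[k]
```

In each round, one model per candidate regularization value would be fit on `train`, the value that scores best on `tune` kept, and that model scored on `test`; the reported accuracy is the average of the ten test scores.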
  • 28. Gender prediction results • State-of-the-art accuracy: 88.0% – Lexical features strongly predict gender – Ignoring syntax (treating tweets as “bags of words”) does pretty well
  • 29. Top 500 markers for each gender
      Feature | Previous literature | In our data
      Pronouns | F | F
      Emotion terms | F | F
      Family terms | F | Mixed results
      "Blog words" (lol, omg) | F | F
      Conjunctions | F | F (weakly)
      Articles | M | No results
      Numbers | M | M
      Quantifiers | M | No results
      Technology words | M | M
      Prepositions | Mixed results | F (weakly)
      Swear words | Mixed results | M
      Assent | Mixed results | Mixed results
      Negation | Mixed results | Mixed results
      Emoticons | Mixed results | F
      Hesitation markers | Mixed results | F
  • 30. At a corpus level, women use more non-dictionary words and men use more named entities. In a moment we’ll ask how universal this is. Hand classification of most frequent 10k words (90.0% agreement):
      Word class | Female authors | Male authors
      Common words in a standard dictionary | 74.2% | 74.9%
      Punctuation | 14.6% | 14.2%
      Non-standard, unpronounceable words (e.g., :), lmao) | 4.28% | 2.99%
      Non-standard, pronounceable words (e.g., luv) | 3.55% | 3.35%
      Named entities | 1.94% | 2.51%
      Numbers | 0.83% | 0.99%
      Taboo words | 0.47% | 0.69%
      Hashtags | 0.16% | 0.18%
  • 31. Involvement • Using traditional definitions, it looks as if our data confirms: – men as more informational (all those named entities) – women as more interactive/involved (pronouns, emoticons, etc.) • Note that most of the named entities for the men are sports figures and teams
  • 32. Right. These guys are not “involved”.
  • 33.
  • 34. Clustering without regard to gender • We apply probabilistic clustering in order to group authors who are linguistically similar • Each author is represented as a list of word counts across the 10,000 words used in the classification experiment
  • 35. Clustering! (Hastie et al 2009) Easy example: 2 clusters “Expectation Maximization” 1. Randomly assign all authors to one of 20 clusters 2. Calculate the center of the cluster from the average word counts of all authors put in it 3. Assign each author to the nearest cluster, based on the distance between their word counts and the average word counts of the cluster center 4. Keep iterating through this moving from random clustering to meaningful clusters 5. Repeat steps 1-4 (25 times) 6. Pick the best
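The six steps above can be sketched in pure Python. This version uses hard assignments and squared Euclidean distance, and takes "best" to mean the lowest total distance to cluster centers; those choices, the function names, and the toy counts are assumptions, and the probabilistic model the authors actually fit may differ.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two count vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centers_for(data, labels, k, rng):
    """Mean word counts per cluster; an empty cluster is re-seeded with a random author."""
    centers = []
    for c in range(k):
        members = [row for row, lab in zip(data, labels) if lab == c]
        if members:
            centers.append([sum(col) / len(members) for col in zip(*members)])
        else:
            centers.append(list(rng.choice(data)))
    return centers

def cluster_once(data, k, rng, max_iters=100):
    labels = [rng.randrange(k) for _ in data]            # step 1: random assignment
    for _ in range(max_iters):                           # step 4: iterate
        centers = centers_for(data, labels, k, rng)      # step 2: cluster centers
        new_labels = [min(range(k), key=lambda c: sq_dist(row, centers[c]))
                      for row in data]                   # step 3: nearest center
        if new_labels == labels:
            break
        labels = new_labels
    centers = centers_for(data, labels, k, rng)
    cost = sum(sq_dist(row, centers[lab]) for row, lab in zip(data, labels))
    return cost, labels

def best_clustering(data, k, restarts=25, seed=0):
    rng = random.Random(seed)
    runs = [cluster_once(data, k, rng) for _ in range(restarts)]  # step 5: repeat
    return min(runs, key=lambda run: run[0])                      # step 6: pick the best

# Toy data: six hypothetical authors, counts for two words.
data = [[5, 0], [6, 1], [5, 1], [0, 5], [1, 6], [0, 6]]
cost, labels = best_clustering(data, k=2)
```

Restarting from several random assignments matters because any single run can settle into a poor grouping.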
  • 36. Some definitions • Style: combinations of linguistic resources • Cluster: a group of authors who use a particular style • Social network: each author has a social network made up of people who they send AND receive messages from • An author’s social network does not have to be a part of that author’s cluster
  • 37. Majority female clusters
      Cluster | Size | % fem | Top words
      c14 | 1,345 | 89.60% | hubs blogged bloggers giveaway @klout recipe fabric recipes blogging tweetup
      c7 | 884 | 80.40% | kidd hubs xo =] xoxoxo muah xoxo darren scotty ttyl
      c6 | 661 | 80.00% | authors pokemon hubs xd author arc xxx ^_^ bloggers d:
      c16 | 200 | 78.00% | xo blessings -) xoxoxo #music #love #socialmedia slash :)) xoxo
      c8 | 318 | 72.30% | xxx :') xx tyga youu (: wbu thankyou heyy knoww
      c5 | 539 | 71.10% | (: :') xd (; /: <333 d: <33 </3 -___-
      c4 | 1,376 | 63.00% | && hipster #idol #photo #lessambitiousmovies hipsters #americanidol #oscars totes #goldenglobes
      c9 | 458 | 60.00% | wyd #oomf lmbo shyt bruh cuzzo #nowfollowing lls niggas finna
  • 38. Looks like “women are trying to destroy the English language”
      Word class | Female authors | Male authors
      Common words in a standard dictionary | 74.2% | 74.9%
      Punctuation | 14.6% | 14.2%
      Non-standard, unpronounceable words (e.g., :), lmao) | 4.28% | 2.99%
      Non-standard, pronounceable words (e.g., luv) | 3.55% | 3.35%
      Named entities | 1.94% | 2.51%
      Numbers | 0.83% | 0.99%
      Taboo words | 0.47% | 0.69%
      Hashtags | 0.16% | 0.18%
  • 39. Clusters that are majority female • At the population level, women use many non-dictionary words. • But there are clusters of (mostly) women who actually use fewer words like lol, nah, haha than men do
      Cluster | Size | % fem | Top words
      c14 | 1,345 | 89.60% | hubs blogged bloggers giveaway @klout recipe fabric recipes blogging tweetup
      c6 | 661 | 80.00% | authors pokemon hubs xd author arc xxx ^_^ bloggers d:
      c4 | 1,376 | 63.00% | && hipster #idol #photo #lessambitiousmovies hipsters #americanidol #oscars totes #goldenglobes
  • 40. Consider xo • A lot more women use xo than men – 11% of all women – 2.5% of all men • But that means that 89% of women aren’t using it at all. • People who use xo are three times more likely to use ttyl (‘talk to you later’) – The style is more commonly adopted by women – But there’s other stuff going on here: age, job, etc. – It’s not clear that gender is even the most important, it’s just that we’re starting with gender-colored glasses
  • 43. Column headings: Group, Gender, Activity/social role, Interactions, Geography
      Video titles: Shit Guys Don't Say Out Loud; Shit College Freshmen Say; Shit Girlfriends Say; Shit Asian Dads Say; Shit Redneck Guys Say; Shit Girls Say to Gay Guys; Shit Black Girls Say; Shit Black Guys Say; Shit People Say in LA; Shit White Girls Say…to Black Girls; Shit New Yorkers Say; Shit Frat Guys Say; Shit Whipped Guys Say; Shit Guys Don't Say; Shit Asian Girls Say; Shit Tumblr Girls Say; Shit Brides Say; Shit Spanish Girls Say; Shit Asian Moms Say; Shit Vegans Say; Shit Hipsters Say; Shit Cyclists Say; Shit Yogis Say; Shit Skiers Say
  • 44. Notice • That gender wasn’t really limited to the “gender” column – “Moms” and “dads” are gendered social roles • And that the words “guys” and “girls” aren’t really the same as “male” and “female” – What are the plausible age ranges and social styles for “guys” and “girls”?
  • 45. Clusters that are majority male
      Cluster | Size | % male | Top words
      c13 | 761 | 89.40% | #nhl #bruins #mlb nhl #knicks qb @darrenrovell inning boozer jimmer
      c10 | 1,865 | 85.40% | /cc api ios ui portal developer e3 apple's plugin developers
      c18 | 623 | 81.10% | @macmiller niggas flyers cena bosh pacers @wale bruh melo @fucktyler
      c11 | 432 | 73.80% | niggas wyd nigga finna shyt lls ctfu #oomf lmaoo lmaooo
      c20 | 429 | 72.50% | gop dems senate unions conservative democrats liberal palin republican republicans
      c15 | 963 | 65.30% | #photo /cc #fb (@ brewing #sxsw @getglue startup brewery @foursquare
  • 46. Looks like “men are Twitter-headed sailor-swearing accountants”
      Word class | Female authors | Male authors
      Common words in a standard dictionary | 74.2% | 74.9%
      Punctuation | 14.6% | 14.2%
      Non-standard, unpronounceable words (e.g., :), lmao) | 4.28% | 2.99%
      Non-standard, pronounceable words (e.g., luv) | 3.55% | 3.35%
      Named entities | 1.94% | 2.51%
      Numbers | 0.83% | 0.99%
      Taboo words | 0.47% | 0.69%
      Hashtags | 0.16% | 0.18%
  • 47. Aggregates generally don’t hold
      Cluster | Top words | Notes
      c13 | #nhl #bruins #mlb nhl #knicks qb @darrenrovell inning boozer jimmer | Few Taboo/Hashes; Lots of Punc
      c10 | /cc api ios ui portal developer e3 apple's plugin developers | Few Taboo/Hashes; Lots of Punc
      c18 | @macmiller niggas flyers cena bosh pacers @wale bruh melo @fucktyler |
      c11 | niggas wyd nigga finna shyt lls ctfu #oomf lmaoo lmaooo | Few Dict words; Lots of unPron and Pron
      c20 | gop dems senate unions conservative democrats liberal palin republican republicans | Few Taboo/Hashes; Lots of Punc
      c15 | #photo /cc #fb (@ brewing #sxsw @getglue startup brewery @foursquare | Few Taboo; Lots of Punc
  • 48. Small exceptions • At the population level, men use many named entities and numbers • Clusters use these at various rates, but: – No female-skewed clusters use them *more* than the male average – No male-skewed clusters use them *less* than the female average • But again, the other 6 generalizations about gender we might have made at an aggregate aren’t supported once we get to clusters
  • 49. Erasure! • Clusters are highly gendered • For example, let’s consider clusters made up of 60% or more of people of the same gender – That covers 82.95% of all the authors – But what about the 1,242 men who are part of female-majority clusters? – The 1,052 women who are part of male-majority clusters? – Are they just noise? Odd-balls? Is there no structure to what they’re doing? – These people are using language to do identity work, even as they construct identities at odds with conventional notions of masculinity and femininity.
  • 50. Clusters vs. social networks • The more skewed a cluster is, the more skewed the social networks of its members [Scatter plot with one labeled point per cluster; axes: percent male, percent male friends]
  • 51. Women with female networks use the most female markers [Plot for female authors; axes: percent female social network (bins 1 to 10), female marker proportion]
  • 52. Men with male networks use the most male markers [Plot for male authors; axes: percent male social network (bins 1 to 10), male marker proportion]
  • 53. Women with male networks use more male markers (and vice versa)
  • 54. Women with highly female networks are easier to classify (and vice versa)
  • 55. In other words • The classifier is picking up on the fact that if you insist upon a gender binary then people with same-gender networks use language in a more “gender-coherent” way.
  • 56. Does social network help prediction? • 88% accuracy with text alone – Logistic regression, 10-fold cross-validation – State-of-the-art accuracy • Add network information… – Still 88% accuracy
  • 57. Once we have 1000 words/author, network info doesn’t help [Figure: x-axis: Words (0; 10; 100; 1,000; 10,000); y-axis: Accuracy (0.5 to 1.0); series: all words plus social network, words only; marked value: 0.880]
  • 58. Wait, why not? • A new feature is only going to improve classification accuracy if it adds new information. • There is strong homophily: 63% of the connections are between same-gender individuals. • But language and social network can’t mutually disambiguate because they aren’t independent views on gender. • Individuals who use linguistic resources from “the other gender” consistently have denser social network connections to the other gender. – Performance, style, accommodation • Gender is not an “A or B” kind of thing
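The homophily figure on this slide is a share of same-gender connections. A minimal sketch of that calculation, and of the per-author version used in the network plots, might look like this. The function names and the toy users are hypothetical; only the idea of counting same-gender links comes from the slide.

```python
def same_gender_share(edges, gender):
    """Fraction of undirected links that join two users of the same gender."""
    same = sum(1 for a, b in edges if gender[a] == gender[b])
    return same / len(edges)

def own_network_share(user, edges, gender):
    """Fraction of one user's friends who share that user's gender."""
    friends = [b if a == user else a for a, b in edges if user in (a, b)]
    return sum(1 for f in friends if gender[f] == gender[user]) / len(friends)

# Hypothetical toy network, for illustration only.
toy_gender = {"ann": "F", "bea": "F", "carl": "M", "dev": "M"}
toy_edges = [("ann", "bea"), ("carl", "dev"), ("ann", "carl")]
print(same_gender_share(toy_edges, toy_gender))         # 2 of 3 links are same-gender
print(own_network_share("ann", toy_edges, toy_gender))  # 1 of ann's 2 friends is female
```

Because the per-author share tracks the same signal as the author's word choices, adding it as a feature gives the classifier little it did not already have.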
  • 59. If we seek only predictive accuracy…
  • 61. Not so simple • If we want to understand categories, we should start with people in interactions. – Counting is great but we have to watch our bins and investigate them, too.
  • 62. Look at words a different way
  • 66. Positioning and stance • “Stance” is usually seen as an expression of a speaker’s relationship to their talk and their interlocutors – E.g., Kiesling (2009); Du Bois (2007); Bednarek (2008) • But “stance” (and “roles”) seem static • I’d like something with more motion and dynamism
  • 67. Positioning and stance • “Stance” is usually seen as an expression of a speaker’s relationship to their talk and their interlocutors – E.g., Kiesling (2009); Du Bois (2007); Bednarek (2008) • But “stance” (and “roles”) seem static • I’d like something with more motion and dynamism • I develop positioning to connect linguistic forms to social structures • (Particularly affect, actually)
  • 68. Positioning in a social grid
  • 70. Positioning in a social grid • Social structures are created, maintained, and changed by specific interactions • People enter interactions already positioned • Interactions change these positions, people are attentive to changes
  • 71. Conventions • Different linguistic resources come to be associated with different positionings • Distributions of experiences are usually maintained • The maintenance and disruption of expectations has (affective) consequences
  • 72. A LITTLE BIT OF LITTLE
  • 73. CHILDES (MacWhinney, 2000) • 4,676 transcripts of parent-child interactions – American English
                            Observed little   Expected little   O/E
      Mothers-to-boys       4,313             4,158             1.037
      Fathers-to-boys       1,516             1,381             1.098
      Mothers-to-girls      6,312             5,441             1.160
      Fathers-to-girls      230               281               0.819
      Girls-to-mothers      1,221             1,533             0.796
      Girls-to-fathers      4                 3                 1.482
      Boys-to-mothers       875               1,526             0.573
      Boys-to-fathers       117               265               0.441
  • 74. Gender and little • Women tend to use little more: multiple corpora show significant differences • But this misses the point
                  Buckeye O/E   CALLHOME O/E
      Female      1.170         1.073
      Male        0.855         0.725
  • 75. Add interlocutor gender
                          CHILDES Parent-Child O/E   CHILDES Child-Parent O/E   Buckeye O/E   Fisher Am. Eng. O/E   Fisher Ohioans O/E   CALLHOME O/E
      Female to female    1.160                      0.796                      0.936         1.051                 1.160                1.088
      Female to male      1.037                      1.482                      1.290         0.887                 0.771                1.064
      Male to male        1.098                      0.441                      0.879         1.071                 0.830                0.685
      Male to female      0.819                      0.573                      0.908         0.842                 0.836                0.727
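The O/E column is the observed count of little divided by the expected count. A minimal sketch of that ratio follows. The slide does not say how the expected counts were derived, so `expected_count` below (a group's total tokens times the corpus-wide rate) is an assumption, as are both function names.

```python
def observed_over_expected(observed, expected):
    """O/E above 1 means the word is used more than expected; below 1, less."""
    return observed / expected

def expected_count(group_tokens, corpus_hits, corpus_tokens):
    """Assumed definition: the group's token total times the corpus-wide rate."""
    return group_tokens * corpus_hits / corpus_tokens

# (observed, expected) pairs copied from the CHILDES table on this slide.
rows = {
    "Mothers-to-boys": (4313, 4158),
    "Mothers-to-girls": (6312, 5441),
    "Boys-to-mothers": (875, 1526),
}
for name, (obs, exp) in rows.items():
    print(name, round(observed_over_expected(obs, exp), 3))
```

Recomputing from the rounded expected counts will not always match the table to three decimals (Girls-to-fathers, with an expected count of about 3, is the clearest case), presumably because the slide's ratios were computed from unrounded expectations.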
  • 76. Gender and topics • Some topics are more face-threatening than others. – Face-threatening topics get less little. • When topic is held constant, men and women mostly have the same little usage. – Regardless of the gender of the person they’re talking to. • But there are some exceptions, which are connected to issues of masculinity, femininity, and emotional regulation. – Some examples: • Generally, people don’t use little to talk about terrorism. EXCEPT women speaking to women, who use little to modify emotions (terrified, scared). • Generally, people DO use little to talk about fitness. EXCEPT men talking to men. The men talking to women use little to talk about their pudgy, flabby bodies. The few men talking to men who use little use it to talk about working out a little harder or putting on a little more muscle mass.
  • 77. ICSI meeting corpus (Janin et al., 2003) • 75 meetings from Berkeley’s International Computer Science Institute (2000-2002) – 3-10 participants (avg of 6) – 17-103 minutes each (usually an hour) – 72 hours of data
                    # speakers (avg age)   Observed little   Expected little   O/E
      Undergrad     6 (30 yo)              59                34                1.734
      Grad          14 (29 yo)             234               223               1.049
      Postdoc       1 (not given)          51                75                0.676
      Ph.D.         11 (37 yo)             152               228               0.667
      Professor     4 (52 yo)              278               213               1.302
  • 78. Gender, genre, topic, style • “Different ways of saying things are intended to signal different ways of being, which includes different potential things to say.” (Eckert 2008)
  • 79. Majority female clusters
      Cluster   Size    % fem    Top words
      c14       1,345   89.60%   hubs blogged bloggers giveaway @klout recipe fabric recipes blogging tweetup
      c7        884     80.40%   kidd hubs xo =] xoxoxo muah xoxo darren scotty ttyl
      c6        661     80.00%   authors pokemon hubs xd author arc xxx ^_^ bloggers d:
      c16       200     78.00%   xo blessings -) xoxoxo #music #love #socialmedia slash :)) xoxo
      c8        318     72.30%   xxx :') xx tyga youu (: wbu thankyou heyy knoww
      c5        539     71.10%   (: :') xd (; /: <333 d: <33 </3 -___-
      c4        1,376   63.00%   && hipster #idol #photo #lessambitiousmovies hipsters #americanidol #oscars totes #goldenglobes
      c9        458     60.00%   wyd #oomf lmbo shyt bruh cuzzo #nowfollowing lls niggas finna
  • 80. Clusters that are majority male
      Cluster   Size    % male   Top words
      c13       761     89.40%   #nhl #bruins #mlb nhl #knicks qb @darrenrovell inning boozer jimmer
      c10       1,865   85.40%   /cc api ios ui portal developer e3 apple's plugin developers
      c18       623     81.10%   @macmiller niggas flyers cena bosh pacers @wale bruh melo @fucktyler
      c11       432     73.80%   niggas wyd nigga finna shyt lls ctfu #oomf lmaoo lmaooo
      c20       429     72.50%   gop dems senate unions conservative democrats liberal palin republican republicans
      c15       963     65.30%   #photo /cc #fb (@ brewing #sxsw @getglue startup brewery @foursquare
  • 81. Gender is not something people have
  • 82. It’s something people *do* And there are a lot of ways to “do” gender.
  • 84. Gender is binary only with blinders • “My mom doesn’t say that’s lovely or omg!...” – “Nevermind that!” • Problem: Sliding from predictive accuracy to causal stories • Realistic finding: There are lots of ways to do gender
  • 85. Big data, big opportunities • Big data offers us the opportunity to let clusters emerge (and test them against our big bins) • We can show how language reflects and creates the social worlds we live in

Editor's Notes

  1. E.g., Cheshire (2004), Cameron & Coates (1989), Eckert & McConnell-Ginet (1999), Holmes (1997), or Romaine (2003).
  2. (Argamon, Koppel, Fine, & Shimoni, 2003; Herring & Paolillo, 2006b; Schler, Koppel, Argamon, & Pennebaker, 2006…they are working off of dimensions in Biber 1995 and Chafe 1982)
  3. (Mukherjee & Liu, 2010; Nowson, Oberlander, & Gill, 2005 building off of Heylighen and Dewaele 2002)
  4. We also ran our work with part-of-speech tagged unigrams for one level less lumping—the results are basically the same but not reported here.
  5. We only select users with first names that occur over 1,000 times in the census data (approximately 9,000 names), the most infrequent of which include Cherylann, Kailin and Zeno. We further filtered our sample to only those individuals who are actively engaging with their social network. Twitter contains an explicit social network in the links between individuals who have chosen to receive each other’s messages. However, Kwak, Lee, Park, and Moon (2010) found that only 22 percent of such links are reciprocal, and that a small number of hubs account for a high proportion of the total number of links. Instead, we define a social network based on direct, mutual interactions. In Twitter, it is possible to address a public message towards another user by prepending the @ symbol to the recipient’s user name. We build an undirected network of these links. To ensure that the network is mutual and as close a proxy to a real social network as possible, we form a link between two users only if we observe at least two mentions (one in each direction) separated by at least two weeks. This filters spam accounts, unrequited mentions (e.g., users attempting to attract the attention of celebrities), and one-time conversations. We selected only those users with between four and 100 mutual-mention friends. The upper bound helps avoid ‘broadcast-oriented’ Twitter accounts such as news media, corporations, and celebrities. For example, the Social Security Administration data say: Tyler is a male name 97.36% of the time; Annette is a female name 100% of the time; Robin is a female name 87.69% of the time.
  6. Some names are ambiguous by gender, but in our dataset, such ambiguity is rare: the median user has a name that is 99.6% associated with its majority gender; 95% of all users have a name that is at least 85% associated with its majority gender. We assume that users tend to self-report their true name; while this may be largely true on aggregate, there are bound to be exceptions. Our analysis therefore focuses on aggregate trends and not individual case studies. A second potential concern is that social categories are not equally relevant in every utterance. But while this is certainly true in some cases, it is not true on aggregate; otherwise, accurate gender prediction from text would not be possible. Later, we address this issue by analyzing the social behavior of individuals whose language is not easily associated with their gender.
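The mutual-mention network described in this note can be sketched roughly as below. This is one reading of the rule, not the authors' implementation: it assumes a link requires a mention in each direction whose timestamps are at least two weeks apart, and the function names, the `(sender, recipient, timestamp)` record layout, and the toy accounts are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def mutual_mention_edges(mentions, min_gap=timedelta(weeks=2)):
    """Build undirected edges from (sender, recipient, timestamp) mention records."""
    times = defaultdict(list)
    for sender, recipient, when in mentions:
        if sender != recipient:
            times[(sender, recipient)].append(when)
    edges = set()
    for (a, b), forward in list(times.items()):
        backward = times.get((b, a))
        if a > b or not backward:
            continue  # need both directions; handle each pair once, as (a, b) with a < b
        # Widest time gap between any a->b mention and any b->a mention.
        widest = max(max(forward) - min(backward), max(backward) - min(forward))
        if widest >= min_gap:
            edges.add((a, b))
    return edges

def users_with_friend_count(edges, low=4, high=100):
    """Keep users whose number of mutual-mention friends lies in [low, high]."""
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return {user for user, d in degree.items() if low <= d <= high}

# Hypothetical toy data, for illustration only.
toy = [
    ("ann", "bob", datetime(2011, 1, 1)), ("bob", "ann", datetime(2011, 1, 20)),
    ("ann", "cat", datetime(2011, 1, 1)), ("cat", "ann", datetime(2011, 1, 3)),
    ("fan", "celeb", datetime(2011, 1, 1)), ("fan", "celeb", datetime(2011, 2, 1)),
]
print(mutual_mention_edges(toy))
```

In the toy data only ann and bob end up linked: the ann and cat exchange is too short-lived, and the fan's mentions are never returned.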
  7. All words were converted to lower-case but no other preprocessing or stopword filtering was performed.
  8. Basically you train on part of the data but hold out part of it to tune the regularization parameter.
  9. Example: If a single word, like indubitably, were used three times by men and never by women, an overfit model would have high confidence that anyone who uses indubitably is a man, regardless of the other words they use. That would be dumb. So we use regularization.
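The indubitably example can be reproduced in miniature. The sketch below fits a single logistic weight for a word seen three times with one label and never with the other, using plain gradient descent; the function name, step size, and step counts are assumptions chosen for illustration.

```python
import math

def fit_single_weight(n_positive, l2, steps, lr=0.1):
    """Minimize n_positive * log(1 + exp(-w)) + (l2 / 2) * w**2 by gradient descent."""
    w = 0.0
    for _ in range(steps):
        grad = -n_positive / (1.0 + math.exp(w)) + l2 * w
        w -= lr * grad
    return w

# Without a penalty the weight keeps climbing the longer we train ...
print(fit_single_weight(3, l2=0.0, steps=1_000))
print(fit_single_weight(3, l2=0.0, steps=10_000))
# ... while an L2 penalty makes it settle at a modest value.
print(fit_single_weight(3, l2=1.0, steps=1_000))
print(fit_single_weight(3, l2=1.0, steps=10_000))
```

With no penalty there is no finite best weight, so three tweets are enough to produce near-certainty; the penalty caps how much any one rare word can contribute.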
  10. The accuracy in gender prediction by this method is 88.0%, which is state of the art compared with gender prediction on similar datasets (Burger et al. 2011). While more expressive features might perform better still, the high accuracy of lexical features shows that they capture a great deal of language’s predictive power with regard to gender.
  11. More specifically, we apply the standard machine learning technique of logistic regression (Hastie, Tibshirani, & Friedman, 2009). The model learns a column vector of weights w to parametrize a conditional distribution over labels (gender) as P(y | x; w) = 1 / (1 + exp(-y w^T x)), where y ∈ {-1, 1} and x represents a column vector of term frequencies. The weights are chosen to maximize the conditional likelihood P(y | x; w) on a training set, using quasi-Newton optimization. To prevent overfitting of the training data, we use standard L2 regularization; this is equivalent to ridge regression in linear regression models. As features, we used a boolean indicator for each of the most frequent 10,000 words in the dataset. In short: train a statistical model (logistic regression) on part of the data, then test it on a different part of the data, hiding the gender labels. 10-fold cross-validation: 10 unique training/test splits (so the test set is a different 10% of the data each time).
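A toy version of this model may help make the note concrete. The sketch below uses the standard logistic form P(y | x; w) = 1 / (1 + exp(-y w·x)) with labels in {-1, +1}, boolean word indicators, and an L2 penalty. It swaps in plain gradient descent for the quasi-Newton optimizer mentioned in the note, and the function names, hyperparameters, six-word vocabulary, and four training texts are all hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def featurize(text, vocab):
    """Boolean indicator per vocabulary word (lower-cased, whitespace tokens)."""
    tokens = set(text.lower().split())
    return [1.0 if word in tokens else 0.0 for word in vocab]

def train(X, y, l2=0.1, lr=0.5, epochs=500):
    """Gradient descent on the L2-penalized negative log-likelihood."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [l2 * wj for wj in w]               # gradient of (l2 / 2) * ||w||^2
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            coeff = -yi * sigmoid(-margin)         # gradient of -log sigmoid(margin)
            for j, xj in enumerate(xi):
                grad[j] += coeff * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def predict(w, x):
    """Probability of the +1 label."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)))

# Hypothetical toy data; the +1 / -1 coding is arbitrary.
vocab = ["omg", "love", "nhl", "game", "api", "the"]
texts = ["omg love this recipe", "nhl game tonight", "love the giveaway omg", "api plugin for the game"]
labels = [1, -1, 1, -1]
weights = train([featurize(t, vocab) for t in texts], labels)
print(predict(weights, featurize("omg love", vocab)))
print(predict(weights, featurize("nhl game", vocab)))
```

The word "the" appears once with each label, so its weight stays near zero; that is the sense in which the learned weights pick out markers rather than frequent words.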
  12. Not shown: Clitics: previous lit “F”, our data: weakly “F”
  13. Because the counts are so high, all differences are statistically significant at p < 0.01. Hand-classified by two authors, disagreements decided by discussion between all three authors.
  14. But categories are never simply descriptive; they are normative statements that draw lines around who is included and excluded (Butler 1990).
  15. Expectation-maximization (EM) algorithm (Dempster et al., 1977); basically k-means with log-linear distributions (Eisenstein, Ahmed, and Xing, ICML 2011). 25 runs with randomly generated Q(z_n = k), selecting the run with the highest joint likelihood. Each author is assigned a distribution over clusters; each cluster has a probability distribution over word counts and a prior strength. In the EM algorithm, these parameters are iteratively updated until convergence. The probability distribution over words uses the Sparse Additive Generative Model (Eisenstein, Ahmed, and Xing 2011), which is especially well suited to high-dimensional data like text. For simplicity, we perform a hard clustering, sometimes known as hard EM. Since the EM algorithm can find only a local optimum, we make 25 runs with randomly-generated initial assignments, and select the run with the highest likelihood.
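The hard-EM loop with random restarts can be sketched as below. This is a simplified stand-in, not the method in the note: it replaces the Sparse Additive Generative Model with a smoothed multinomial per cluster, and the function names, the smoothing constant `alpha`, the seed, and the toy authors are assumptions.

```python
import math
import random
from collections import Counter

def fit_parameters(docs, assign, k, vocab, alpha):
    """M-step: smoothed cluster priors and per-cluster word distributions."""
    counts = [Counter() for _ in range(k)]
    sizes = [0] * k
    for doc, z in zip(docs, assign):
        counts[z].update(doc)
        sizes[z] += 1
    log_prior = [math.log((sizes[c] + alpha) / (len(docs) + alpha * k)) for c in range(k)]
    log_word = []
    for c in range(k):
        total = sum(counts[c].values()) + alpha * len(vocab)
        log_word.append({w: math.log((counts[c][w] + alpha) / total) for w in vocab})
    return log_prior, log_word

def score(doc, c, log_prior, log_word):
    """Joint log-probability of one author's word counts under cluster c."""
    return log_prior[c] + sum(n * log_word[c][w] for w, n in doc.items())

def hard_em(docs, k, runs=25, max_iters=50, alpha=0.1, seed=0):
    """Hard EM with random restarts; returns the best assignment and its likelihood."""
    vocab = sorted({w for doc in docs for w in doc})
    rng = random.Random(seed)
    best_ll, best_assign = -math.inf, None
    for _ in range(runs):
        assign = [rng.randrange(k) for _ in docs]    # random initial assignment
        for _ in range(max_iters):
            params = fit_parameters(docs, assign, k, vocab, alpha)
            # Hard E-step: each author goes to the single best-fitting cluster.
            new_assign = [max(range(k), key=lambda c: score(doc, c, *params)) for doc in docs]
            if new_assign == assign:
                break
            assign = new_assign
        params = fit_parameters(docs, assign, k, vocab, alpha)
        ll = sum(score(doc, z, *params) for doc, z in zip(docs, assign))
        if ll > best_ll:
            best_ll, best_assign = ll, assign
    return best_assign, best_ll

# Hypothetical word counts for six toy authors.
toy_docs = [
    {"nhl": 2, "qb": 1}, {"nhl": 1, "inning": 2}, {"qb": 2, "inning": 1},
    {"recipe": 2, "blogging": 1}, {"recipe": 1, "giveaway": 2}, {"blogging": 2, "giveaway": 1},
]
assignment, log_likelihood = hard_em(toy_docs, k=2)
print(assignment)
```

Gender never enters the procedure; the clusters are formed from words alone and only afterwards compared with the gender labels.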
  16. Each cluster is associated with a probability distribution over text and each author is placed in a cluster with the best probabilistic fit for their language. The maximum-likelihood solution is the clustering that assigns the greatest probability to all of the observed text.
  17. Because the counts are so high, all differences are statistically significant at p < 0.01. Hand-classified by two authors, disagreements decided by discussion between all three authors.
  18. That is, we’re comparing these clusters’ rates with the aggregated-men’s rates. We’re reporting the clusters that are significantly different. (In other words, women in these three clusters are using lol-like words significantly less than men-on-a-whole do. If women were really non-standard across the board, we wouldn’t expect any clusters to use less than the aggregated MALE number.)
  19. These are some of the most popular “Shit X Say” videos. Notice that “Gender” is not limited to the “gender” category—e.g., ”girls” does not include “elderly women” and “Moms” doesn’t really include teenagers (even if they are young mothers).
  20. Because the counts are so high, all differences are statistically significant at p < 0.01. Hand-classified by two authors, disagreements decided by discussion between all three authors.
  21. I developed my ideas about positioning out of the data, but the metaphor is powerful and after I was mostly done, I found that Rom Harré and colleagues had made their own explorations/elaborations of “positioning”. We took different paths to a fairly similar end point. I’m happy to have my work be considered an extension of his.
  22. “You” and “I” aren’t references to objects independent of time and space. They are momentary status updates (Harré, 1983). Even when they aren’t explicitly there, you and I are there: our talk relates us to each other.
  23. 1b: But the structure does impose constraints on interactions (Bourdieu, 1977; Butler, 1999; Giddens, 1984). 3: citation (Goffman, 1981). 3b: People make use of conversational forms and strategies that are available to them (Harré, 1986; Vygotsky, 1962).
  24. 2b: Expectations are maintained 2c: People are enabled and constrained by these expectations
  25. It’s a “female marker” in Twitter.
  26. But categories are never simply descriptive; they are normative statements that draw lines around who is included and excluded (Butler 1990).
  27. And we can’t trust the idea that we’ll just figure out each of the independent parts—if we figure out “woman” and “African American” then we’ll understand “African American women”.
  28. ----- Meeting Notes (4/21/14 10:03) ----- Sports teams and gender, Americanness, “doing sports teams” (not “doing gender”). Online utterances: we can tailor, we can delete, how feed in to this. Tried to take into account time of tweet, predictors of spelling out words. Apply techniques like LDA or hierarchical model, maybe person is drawing from clusters. Gender or cluster. People are distributions over topics. Fisher topics vs. Twitter topics. Statistician asks about using clusters to predict gender, then realizes I’m moving the goalposts.