1. Machine learning approaches for understanding social interactions on Twitter
May 6, 2014
Alice Oh
alice.oh@kaist.edu
aoh@seas.harvard.edu
http://uilab.kaist.ac.kr/members/aliceoh/
2. Our Research
• Topic Modeling
• ICML 2014: Hierarchical Dirichlet scaling process
• IJCAI 2013: Context-dependent conceptualization
• NIPS Big Learning Workshop 2012: Distributed online learning for latent Dirichlet allocation
• CIKM 2012: Recursive Chinese restaurant processes for modeling topic hierarchies
• ICML 2012: Dirichlet processes with mixed random measures
• Social Media Analysis
• ACL 2014 Workshop: Self-disclosure topic model
• WWW 2014: Computational analysis of agenda setting theory
• AAAI 2013: Hierarchical aspect-sentiment model
• ICWSM 2012: Social aspects of emotions in Twitter conversations
• ACL 2012: Self-disclosure and relationship strength in Twitter conversations
• WSDM 2011: Aspect sentiment unification model for online review analysis
3. Contact Information
• At Harvard until the end of July 2014 and open to
• Collaborations: writing papers, sharing data, etc.
• Discussions about topic modeling and computational social science
• Going back to KAIST in August
• http://uilab.kaist.ac.kr
• alice.oh@kaist.edu
• Can recommend students for intern, postdoc, and researcher positions
• Please consider attending
• ICWSM (program co-chair), Ann Arbor, MI
• ACL Workshop on Social Dynamics and Personal Attributes (co-organizer), Baltimore, MD
7. Motivation
• What are the topics discussed in the article?
• Is the article related to
• household finances?
• price of gasoline?
• price of Apple stock?
• How would you build an automatic system for answering these questions?
13. Graphical Representation of LDA
Topic Distributions
Topics: multinomial over words
• nascar, races, track, raceway, race, cars, fuel, auto, racing
• economic, slowdown, sales, recession, costs, spending, save
• fans, spectators, sports, leagues, teams, competition
[Figure: LDA graphical model with an example document: sales xxx slowdown recession cars races spending xxx save costs fuel]
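The generative story behind the slide can be sketched in a few lines. This is a minimal illustration of the standard LDA generative process, not code from the talk: the topic names, the `generate_document` and `sample_dirichlet` helpers, and the value of `alpha` are assumptions, and each topic is simplified to a uniform distribution over the words listed on the slide (in full LDA each topic is a distribution over the whole vocabulary).

```python
import random

# Word lists are from the slide; the topic names are hypothetical labels.
TOPICS = {
    "racing": ["nascar", "races", "track", "raceway", "race",
               "cars", "fuel", "auto", "racing"],
    "economy": ["economic", "slowdown", "sales", "recession",
                "costs", "spending", "save"],
    "sports": ["fans", "spectators", "sports", "leagues",
               "teams", "competition"],
}


def sample_dirichlet(alpha, k, rng):
    """Draw from a symmetric Dirichlet(alpha) by normalizing Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(n_words, alpha, rng):
    """Sample one document: topic proportions, then a topic and a word per token."""
    names = list(TOPICS)
    # Per-document topic proportions: theta ~ Dirichlet(alpha)
    theta = sample_dirichlet(alpha, len(names), rng)
    doc = []
    for _ in range(n_words):
        # Topic assignment for this token: z ~ Categorical(theta)
        topic = rng.choices(names, weights=theta)[0]
        # Word from the chosen topic (uniform over its list: a simplification)
        word = rng.choice(TOPICS[topic])
        doc.append((word, topic))
    return theta, doc


rng = random.Random(0)
theta, doc = generate_document(10, alpha=0.5, rng=rng)
print([round(p, 2) for p in theta])
print(doc)
```

Inference in LDA runs this story in reverse: given only the words, it estimates the topic proportions and the per-word topic assignments.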
14. Do you feel what I feel?
Social Aspects of Emotions in Twitter Conversations
Suin Kim, JinYeong Bak, Alice Oh
ICWSM 2012
15. Twitter conversation data
• Approx. 220k dyads who “reply” to each other and 1,670k conversational chains (we now have about 5x this amount)
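One way to picture how dyads and conversational chains could be recovered from reply links is sketched below. This is an assumption-laden illustration, not the authors' pipeline: the record fields (`id`, `user`, `reply_to`), the helper names, and the rule that a chain is a reply path of at least two tweets are all hypothetical.

```python
def build_chains(tweets):
    """Follow reply links from each final tweet back to the first tweet."""
    by_id = {t["id"]: t for t in tweets}
    replied_to = {t["reply_to"] for t in tweets if t["reply_to"] is not None}
    chains = []
    for tweet in tweets:
        if tweet["id"] in replied_to:
            continue  # something replies to this tweet, so it is not a chain end
        chain = []
        current = tweet
        while current is not None:
            chain.append(current)
            current = by_id.get(current["reply_to"])
        chain.reverse()  # oldest tweet first
        if len(chain) > 1:  # assumption: a single tweet is not a conversation
            chains.append(chain)
    return chains


def find_dyads(chains):
    """Collect user pairs from chains that involve exactly two users."""
    pairs = set()
    for chain in chains:
        users = {t["user"] for t in chain}
        if len(users) == 2:
            pairs.add(frozenset(users))
    return pairs


# Toy records, invented for illustration only.
tweets = [
    {"id": 1, "user": "A", "reply_to": None},
    {"id": 2, "user": "B", "reply_to": 1},
    {"id": 3, "user": "A", "reply_to": 2},
    {"id": 4, "user": "C", "reply_to": None},
]
chains = build_chains(tweets)
dyads = find_dyads(chains)
print([[t["id"] for t in chain] for chain in chains])
print(dyads)
```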
17. Emotion cycles
We propose that organizational dyads and groups inhabit
emotion cycles: Emotions of an individual influence the
emotions, thoughts and behaviors of others; others’ reactions
can then influence their future interactions with the individual
expressing the original emotion, as well as that individual’s
future emotions and behaviors. People can mimic the
emotions of others, thereby extending the social presence of a
specific emotion, but can also respond to others’ emotions,
extending the range of emotions present.
18. Topic model with a twist
• Dirichlet forest prior (Andrzejewski et al.)
• Mixture of Dirichlet tree distributions
• Dirichlet tree: generalization of the Dirichlet distribution
• Knowledge is expressed using Must-link and Cannot-link primitives
• Must-link(love, sweetheart)
• Cannot-link(exciting, bored)
[Figure: graphical model of DF-LDA]
19. Domain knowledge in Dirichlet forest prior
Seed Words
• anticipation: hope, wait, await, inspir, excit, bore, readi, expect, nervou, calm, motiv, prepar, certain, anxiou, optimist, forese
• joy: awesom, amaz, wonder, excit, glad, fine, beauti, high, lucki, super, perfect, complet, special, bless, safe, proud
• anger: shit, bitch, ass, mean, damn, mad, jealou, piss, annoi, angri, upset, moron, rage, screw, stuck, irrit
• surprise: amaz, wow, wonder, weird, lucki, differ, awkward, confus, holi, strang, shock, odd, embarrass, overwhelm, astound, astonish
• fear: scare, stress, horror, nervou, terror, alarm, behind, panic, fear, afraid, desper, threaten, tens, terrifi, fright, anxiou
• sadness: sorri, bad, aw, sad, wrong, hurt, blue, dead, lost, crush, weak, depress, wors, low, terribl, lone
• disgust: sick, wrong, evil, fat, ugli, horribl, gross, terribl, selfish, miser, pathet, disgust, worthless, aw, asham, fuck
• acceptance: okai, ok, same, alright, safe, lazi, relax, peac, content, normal, secur, complet, numb, fulfil, comfort, defeat
Must-link within a class; Cannot-link between classes
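The rule "Must-link within a class, Cannot-link between classes" can be turned into explicit word pairs as in the sketch below. This is an illustration only: the `build_constraints` helper is hypothetical, the seed lists are a small subset of the slide's words, and dropping words that appear in more than one class (such as "amaz" in both joy and surprise) is an assumption, since the slides do not say how overlapping seed words were handled.

```python
from collections import Counter
from itertools import combinations

# A small subset of the stemmed seed words from the slide.
seed_words = {
    "joy": ["awesom", "amaz", "glad"],
    "anger": ["mad", "angri", "upset"],
    "surprise": ["amaz", "wow", "shock"],
}


def build_constraints(seed_words):
    """Return (must_links, cannot_links) as sets of sorted word pairs."""
    # Assumption: a word listed under several classes is ambiguous and is skipped.
    counts = Counter(w for words in seed_words.values() for w in set(words))
    classes = {
        name: sorted(w for w in set(words) if counts[w] == 1)
        for name, words in seed_words.items()
    }
    must, cannot = set(), set()
    # Must-link: every pair of words inside the same emotion class.
    for words in classes.values():
        must.update(combinations(words, 2))
    # Cannot-link: every pair of words taken from two different classes.
    for (_, left), (_, right) in combinations(classes.items(), 2):
        for a in left:
            for b in right:
                cannot.add(tuple(sorted((a, b))))
    return must, cannot


must, cannot = build_constraints(seed_words)
print(sorted(must))
print(len(cannot))
```

These pairs would then be passed to a Dirichlet forest prior implementation as its domain knowledge; that step is not shown here.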
20. Emotion Topics: How do we express emotions?
Joy, Anticipation, Anger
• Topic 114: omg, love, haha, thank, really
• Topic 107: love, thank, follow, wow
• Topic 159: good, day, hope, morning, thank
• Topic 158: love, thank, miss, hug
• Topic 125: hope, better, feel, thank, soon
• Topic 26: good, thank, hope, miss
• Topic 146: come, wait, week, day, june
• Topic 146: good, day, time, work
• Topic 131: lmao, fuck, ass, bitch, shit
• Topic 4: ass, yo, lmao, nigga
• Topic 19: lmao, shit, damn, fuck, oh
• Topic 13: shit, nigga, smh, yea
Fear
• Topic 48: omg, oh, lmao, shit, scare
• Topic 78: happen, heart, attack, hospital
• Topic 27: don’t, come, night, sleep, outside
• Topic 140: time, got, work, day
Surprise
• Topic 172: yeag, know, think, true, funny
• Topic 89: know, don’t, think, look
• Topic 15: think, don’t, know, make, really
• Topic 94: haha, dont, think, really
29 70 21 14 5
Sadness, Disgust
• Topic 6: oh, sorry, haha, know, didnt
• Topic 59: hurt, got, good, bad
• Topic 106: tweet, reply, didn’t, read, sorry
• Topic 155: oh, really, make, feel
• Topic 116: oh, fuck, don’t, ye, ew
• Topic 116: look, haha, oh, know
• Topic 22: don’t, oh, think, yeah, lmao
• Topic 174: don’t, think, say, people
Acceptance
• Topic 43: ok, oh, thank, cool, okay
• Topic 102: know, try, let, ok
• Topic 199: xx, thank, good, okay, follow
• Topic 8: night, love, good, sleep
17 7 18
Neutral
• Topic 180: com, www, http, check, youtube
• Topic 156: twitter, facebook, people, account
• Topic 184: account, google, app, work, email
• Topic 67: food, chicken, cook, rt
19
21. Emotion Topics: How do we express emotions?
Joy, Anticipation
• Topic 114: omg, love, haha, thank, really
• Topic 107: love, thank, follow, wow
• Topic 125: hope, better, feel, thank, soon
• Topic 26: good, thank, hope, miss
Sadness
• Topic 6: oh, sorry, haha, know, didnt
• Topic 59: hurt, got, good, bad
Neutral
• Topic 180: com, www, http, check, youtube
• Topic 156: twitter, facebook, people, account
Topic labels: Greeting, Caring, Sympathy, IT/Tech
22. Emotion-tagged conversations
A (Love): @amithpr @dhempe @OperaIndia - Would you have any update on @mrunmaiy's health - hope she is recovering well?
B (neut): @labnol @dhempe she is recovering but slow. The injury is on the spine therefore worrisome. Still in icu.
A (Sadness): @amithpr thanks for the update.. extremely said to hear that news..
B (neut): @labnol #prayformrun She is a fighter and will come out of this
B (neut): @AyeItsMeiMei just tell ur followers to report her for spam. then she'll be kicked off twitter
A (Anger): @Jakeosaurous dude I didn't even do shit to her I'm just here tweeting & she calls me a ugly bitch? I was like oh wow thanks?
B (neut): @AyeItsMeiMei yeah clearly shes so ugly she cant even use her real pic:P so dont feel bad
A (Love): @Jakeosaurous haha. I don't care. She's getting spammed with hate. Hahaha. (": thanks though.
B (neut): @AyeItsMeiMei np
24. Defining “Influence”
[Figure: timeline of a conversation between User A and User B, marking the emotion influencing tweet]
Tweets, in order:
• Having a tough day today. RIP Harrison. I’ll miss you a ton :/
• Just pray about it. God will help you.
• Not really religious, but thanks man. :)
• If you need talk you know I’m here.
Emotion labels shown: (Sadness), (Acceptance), (Anticipation)
25. Emotion Influences: What can you say to make your partner feel better?
Joy → Sadness, Sadness → Joy
• Topic 117: tweet, people, don’t, read, post
• Topic 59: hurt, got, bad, pain, feel
• Topic 18: wear, look, think, love, black
• Topic 24: love, thank, great, new, look
Anticipation → Surprise
• Topic 96: music, listen, play, song, good
• Topic 178: follow, tweet, people, twitter, thank
Acceptance → Anger
• Topic 31: i’m, got, lmax, shit, da
• Topic 13: lmao, shit, nigga, smh, yea
Disgust → Joy
• Topic 61: watch, new, live, tv, tonight
• Topic 63: watch, good, think, know, look
Topic labels: Suggesting, Greeting, Sympathy, Swear words, Complaining
26. Emotion Influence: Sadness to Joy, Joy to Anger
[Figure: two bar charts, “Emotion Influence: Sadness to Joy” and “Emotion Influence: Joy to Anger”, with one bar per expressed emotion: Anticipation, Joy, Surprise, Fear, Anger, Sadness, Disgust, Acceptance, Neutral]
• Expressing Anger has a 26.5% chance of changing the partner’s emotion from Joy to Anger.
• Expressing Joy has a 35.8% chance of changing the partner’s emotion from Sadness to Joy.
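Percentages like these can be read as conditional frequencies, and the sketch below shows one plausible way to compute them. It is an assumption about the definition, not the authors' method: each record is taken to be a triple (partner's emotion before, emotion expressed in the influencing tweet, partner's emotion after), and the `influence` helper and the toy triples are invented for illustration.

```python
from collections import Counter


def influence(triples, before, after):
    """For each expressed emotion, estimate the share of cases in which the
    partner moves from `before` to `after`."""
    totals, hits = Counter(), Counter()
    for prev, expressed, nxt in triples:
        if prev != before:
            continue  # only cases where the partner started in `before`
        totals[expressed] += 1
        if nxt == after:
            hits[expressed] += 1
    return {emotion: hits[emotion] / totals[emotion] for emotion in totals}


# Toy triples, invented for illustration only.
triples = [
    ("Sadness", "Joy", "Joy"),
    ("Sadness", "Joy", "Sadness"),
    ("Sadness", "Joy", "Joy"),
    ("Sadness", "Anger", "Sadness"),
    ("Joy", "Anger", "Anger"),
]
sad_to_joy = influence(triples, "Sadness", "Joy")
joy_to_anger = influence(triples, "Joy", "Anger")
print(sad_to_joy)
print(joy_to_anger)
```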
28. Self-disclosure Research using Twitter
• People disclose personal and secretive information
• to build and maintain interpersonal relationships
• to get social support
• Twitter is a great source of naturally occurring, large-scale, longitudinal data on self-disclosure behavior
• We develop a topic model for classifying self-disclosure behavior into three categories: G (general, no disclosure), M (medium disclosure), H (high disclosure)
• We look at the correlation between self-disclosure behavior and the frequency of Twitter conversations in longitudinal data
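As a rough illustration of the last bullet, the sketch below computes a Pearson correlation between a per-dyad self-disclosure score and a conversation count. The slides do not say which correlation measure was used, so Pearson is an assumption here, and the `pearson` helper, the variable names, and the numbers are invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical per-dyad values: average disclosure score and number of
# conversations over the observation period.
disclosure = [0.1, 0.2, 0.4, 0.5, 0.7]
conversations = [3, 4, 9, 10, 16]
r = pearson(disclosure, conversations)
print(round(r, 3))
```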
29. Self-disclosure in Twitter conversations
Conversation 1:
• So my brother is going to Roskilde Festival and my mother and sister is going to England.. That leaves me, my dad and my dog.
• @cccc why aren't you going to england?
• @dddd because my sister is going with 3 of her friends and my mom's just there... to be there. And my sister didn't want me to come :(
Conversation 2:
• I'm moving out.
• @xxxx ??? What's going on bb?
• @yyyy Mother. Done with her. I am planning to get out now. There's nothing I can do, we dont get along
• @xxxx I'm.sorry hunn. That's rough. Where are you going to go though?
• @yyyy Probably stay at a friends place in the time being until I find a place to live!
• @xxxx :/ well I'm glad your getting out if she is being horrible to you
Conversation 3:
• Oh, prepregnancy pants, you are so uncomfortable.
• @eeee You can put them on? Jealous.
• @ffff they are cutting into my flesh and are giving me a ridiculous muffin top. It isn't pretty. But we have company coming over.
• @eeee Yea, I tried yesterday. I got one pair of shorts to button painfully and my jeans just laughed at me.
30. Data
• Full data
• 88k users, 51k dyads
• 1.3M conversations
• 10.5M tweets
• Longitudinal data from August 2007 to July 2013
• Labeled data (gold standard for self-disclosure level)
• 101 conversations
• 673 tweets
31. Graphical Representation of SDTM
• 3 sets of topics, one each for the G, M, and H levels
• By using a topic model, we can
- classify the levels of disclosure
- discover topics associated with each level
- generalize to other social media sites using the same set of seed words
32. Seed Words
• Medium level: frequent trigrams for personally identifiable information
• High level: automatically extracted from the sixbillionsecrets website
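Counting frequent trigrams, as mentioned for the medium level, might look like the sketch below. It is only an illustration: the `frequent_trigrams` helper, the whitespace tokenization, and the toy tweets are assumptions, and the actual trigrams used as seed words are not recoverable from the slide.

```python
from collections import Counter


def frequent_trigrams(tweets, top_k):
    """Return the top_k most common word trigrams across all tweets."""
    counts = Counter()
    for text in tweets:
        tokens = text.lower().split()  # assumption: plain whitespace tokens
        # zip over three shifted views yields consecutive word triples
        counts.update(zip(tokens, tokens[1:], tokens[2:]))
    return counts.most_common(top_k)


# Toy tweets, invented for illustration only.
tweets = [
    "my name is alice",
    "my name is bob",
    "i live in seoul",
]
top = frequent_trigrams(tweets, top_k=1)
print(top)
```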
45. Future directions
• Develop a model for predicting language choice in bilinguals
• Look at how English is used throughout the world
• Cognitive studies of first and second languages
• Self-disclosure and relationship building
• Email me for data sharing, collaborating, discussing, …
• alice.oh@kaist.edu