Topic Models & Computational Social Science
October 17, 2013
Alice Oh
alice.oh@kaist.edu
aoh@seas.harvard.edu
http://uilab...
Boston Dataswap Topic Modeling by Alice Oh


  1. Topic Models & Computational Social Science. October 17, 2013. Alice Oh, alice.oh@kaist.edu, aoh@seas.harvard.edu, http://uilab.kaist.ac.kr/members/aliceoh/
  2. What is topic modeling?
  3. Blei, Communications of the ACM, 2012
  4. Motivation
  5. Motivation • What are the topics discussed in the article? • Is the article related to: household finances? price of gasoline? price of Apple stock? • How would you build an automatic system for answering these questions?
  6. http://www.nytimes.com/2010/08/09/sports/autoracing/09nascar.html?hp • nascar, races, track, raceway, race, cars, fuel, auto, racing • economic, slowdown, sales, recession, costs, spending, save • fans, spectators, sports, leagues, teams, competition
  7. Topics: multinomial over words • nascar, races, track, raceway, race, cars, fuel, auto, racing • economic, slowdown, sales, recession, costs, spending, save • fans, spectators, sports, leagues, teams, competition
  8. Topic Distributions. Topics: multinomial over words • nascar, races, track, raceway, race, cars, fuel, auto, racing • economic, slowdown, sales, recession, costs, spending, save • fans, spectators, sports, leagues, teams, competition
  9. http://www.nytimes.com/2010/08/09/sports/autoracing/09nascar.html? Topic Distributions. Topics: multinomial over words • nascar, races, track, raceway, race, cars, fuel, auto, racing • economic, slowdown, sales, recession, costs, spending, save • fans, spectators, sports, leagues, teams, competition
  10. Same content as slide 9.
  11. Same content as slide 9.
  12. Input to LDA
  13. Input to LDA http://www.nytimes.com/2010/08/09/sports/autoracing/09nascar.html?
  14. Topics Discovered by LDA. Topics: multinomial over vocabulary (one topic per column; each cell is a word and its probability)
      nascar  0.12   | spending  0.09  | sports   0.12
      races   0.10   | economic  0.07  | team     0.11
      cars    0.10   | recession 0.06  | game     0.10
      racing  0.09   | save      0.05  | player   0.10
      track   0.08   | money     0.05  | athlete  0.09
      speed   0.06   | cut       0.04  | win      0.07
      ...            | ...             | ...
      money   0.002  | speed     0.003 | nascar   0.001
  15. Graphical View
  16. Graphical View. Observed: sales xxx slowdown recession cars races spending xxx save costs fuel
  17. Graphical View. Observed: sales xxx slowdown recession cars races spending xxx save costs fuel. Discovered: Topic Distributions. Discovered: Topics (multinomial over words) • nascar, races, track, raceway, race, cars, fuel, auto, racing • economic, slowdown, sales, recession, costs, spending, save • fans, spectators, sports, leagues, teams, competition
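The split between observed words and discovered topics can be illustrated with a small inference routine. The block below is a minimal sketch of collapsed Gibbs sampling for LDA in plain Python, not the implementation behind these slides; the function name `lda_gibbs`, the hyperparameter values `alpha` and `beta`, and the three toy documents (built from words on the slides) are all assumptions made for illustration.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Toy collapsed Gibbs sampler for LDA on tokenized documents.

    Returns (vocab, phi, theta): phi[k] is topic k's distribution over the
    vocabulary, theta[d] is document d's distribution over topics.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]        # counts n_dk
    topic_word = [[0] * V for _ in range(num_topics)]   # counts n_kw
    topic_total = [0] * num_topics                      # counts n_k
    assignments = []
    # Random initial topic assignment for every observed word.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic for this word.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta) / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    phi = [
        [(topic_word[k][v] + beta) / (topic_total[k] + V * beta) for v in range(V)]
        for k in range(num_topics)
    ]
    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + num_topics * alpha)
         for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    return vocab, phi, theta


# Hypothetical toy corpus using words that appear on the slides.
docs = [
    ["nascar", "races", "cars", "fuel", "track"],
    ["economic", "recession", "sales", "costs", "spending"],
    ["nascar", "fuel", "costs", "fans", "sports"],
]
vocab, phi, theta = lda_gibbs(docs, num_topics=2, iters=50)
for k, row in enumerate(phi):
    top = sorted(zip(row, vocab), reverse=True)[:3]
    print("topic", k, [word for _, word in top])
```

Each row of `phi` is a multinomial over the vocabulary (the "topics" on the slide) and each row of `theta` is a per-document topic distribution. On a corpus this small the discovered topics are not meaningful; the sketch only shows the mechanics.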
  18. Do you feel what I feel? Social Aspects of Emotions in Twitter Conversations. Suin Kim, JinYeong Bak, Alice Oh. ICWSM 2012
  19. Twitter conversation data • Twitter conversation data: approx. 220k dyads who “reply” to each other, 1,670k conversational chains (We now have about 5x this amount)
  20. Asking Research Questions
  21. Asking Research Questions
  22. Asking Research Questions. Human emotion is typically studied as a within-person, one-direction, non-repetitive phenomenon; focus has traditionally been on how one individual feels in reaction to various stimuli at a certain point of time. But people recognize and inevitably react emotionally and otherwise to expressions of emotion of other people. We propose that organizational dyads and groups inhabit emotion cycles: Emotions of an individual influence the emotions, thoughts and behaviors of others; others’ reactions can then influence their future interactions with the individual expressing the original emotion, as well as that individual’s future emotions and behaviors. People can mimic the emotions of others, thereby extending the social presence of a specific emotion, but can also respond to others’ emotions, extending the range of emotions present.
  23. Topic model with a twist (DF-LDA) • Dirichlet forest prior (Andrzejewski et al.) • Mixture of Dirichlet tree distributions • Dirichlet tree: generalization of the Dirichlet distribution • Knowledge is expressed using Must-link and Cannot-link primitives • Must-link(love, sweetheart) • Cannot-link(exciting, bored)
  24. Topic model with a twist (DF-LDA) • Dirichlet forest prior (Andrzejewski et al.) • Mixture of Dirichlet tree distributions • Dirichlet tree: generalization of the Dirichlet distribution • Knowledge is expressed using Must-link and Cannot-link primitives • Must-link(love, sweetheart) • Cannot-link(exciting, bored) • Graphical model symbols shown: β, q, η
  25. Domain knowledge in Dirichlet forest prior. Seed Words (stemmed). Must-link within a class; Cannot-link between classes.
      joy: awesom amaz wonder excit glad fine beauti high lucki super perfect complet special bless safe proud
      sadness: sorri bad aw sad wrong hurt blue dead lost crush weak depress wors low terribl lone
      anticipation: hope wait await inspir excit bore readi expect nervou calm motiv prepar certain anxiou optimist forese
      surprise: amaz wow wonder weird lucki differ awkward confus holi strang shock odd embarrass overwhelm astound astonish
      acceptance: okai ok same alright safe lazi relax peac content normal secur complet numb fulfil comfort defeat
      anger: shit bitch ass mean damn mad jealou piss annoi angri upset moron rage screw stuck irrit
      fear: scare stress horror nervou terror alarm behind panic fear afraid desper threaten tens terrifi fright anxiou
      disgust: sick wrong evil fat ugli horribl gross terribl selfish miser pathet disgust worthless aw asham fuck
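The rule "Must-link within a class, Cannot-link between classes" can be sketched as a small constraint generator. This is only an illustration of how seed classes might be turned into word-pair constraints, not the DF-LDA code itself; `build_constraints` and the shortened `seed_classes` dictionary are hypothetical, with the stems copied from the seed-word slide.

```python
from itertools import combinations

# A few seed stems per emotion class, copied from the slide (lists shortened).
seed_classes = {
    "joy": ["awesom", "glad", "proud"],
    "sadness": ["sorri", "sad", "hurt"],
    "anger": ["mad", "angri"],
}


def build_constraints(classes):
    """Return (must_links, cannot_links) as sets of sorted word pairs."""
    must, cannot = set(), set()
    # Must-link: every pair of seed words inside the same class.
    for words in classes.values():
        for a, b in combinations(sorted(set(words)), 2):
            must.add((a, b))
    # Cannot-link: every pair of seed words drawn from two different classes.
    # Words that appear in both classes are skipped; the full seed lists on the
    # slide do share a few stems, which a real encoding would need to resolve.
    for (_, words1), (_, words2) in combinations(sorted(classes.items()), 2):
        for a in set(words1):
            for b in set(words2):
                if a != b:
                    cannot.add(tuple(sorted((a, b))))
    return must, cannot


must_links, cannot_links = build_constraints(seed_classes)
print(len(must_links), "must-links,", len(cannot_links), "cannot-links")
```

With these shortened lists the sketch yields 7 must-links and 21 cannot-links; the pair counts grow quadratically with the size of the seed lists.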
  26. Emotion Topics: How do we express emotions? Anticipation; Topic 125: hope better feel thank soon; Topic 26: good thank hope miss; 29; Topic 146: come wait week day june; Topic 146: good day time work; Sadness; Topic 6: oh sorry haha know didnt; Topic 59: hurt got good bad; Joy; Topic 114: omg love haha thank really; Topic 107: love thank follow wow; 17; Topic 106: tweet reply didn’t read sorry; Topic 155: oh really make feel; 70; Topic 159: good day hope morning thank; Topic 158: love thank miss hug; Anger; Topic 131: lmao fuck ass bitch shit; Topic 4: ass yo lmao nigga; Disgust; Topic 116: oh fuck don’t ye ew; Topic 116: look haha oh know; 7; Topic 22: don’t oh think yeah lmao; Topic 174: don’t think say people; 21; Topic 19: lmao shit damn fuck oh; Topic 13: shit nigga smh yea; Surprise; Topic 172: yeag know think true funny; Topic 89: know don’t think look; Acceptance; Topic 43: ok oh thank cool okay; Topic 102: know try let ok; Topic 199: xx thank good okay follow; Topic 8: night love good sleep; 14; Topic 15: think don’t know make really; Topic 94: haha dont think really; 18; Fear; Topic 48: omg oh lmao shit scare; Topic 78: happen heart attack hospital; 5; Topic 27: don’t come night sleep outside; Topic 140: time got work day; Neutral; Topic 180: com www http check youtube; Topic 156: twitter facebook people account; 19; Topic 184: account google app work email; Topic 67: food chicken cook rt
  27. Emotion Topics: How do we express emotions? • Anticipation (Caring): Topic 125: hope better feel thank soon; Topic 26: good thank hope miss • Joy (Greeting): Topic 114: omg love haha thank really; Topic 107: love thank follow wow • Sadness (Sympathy): Topic 6: oh sorry haha know didnt; Topic 59: hurt got good bad • Neutral (IT/Tech): Topic 180: com www http check youtube; Topic 156: twitter facebook people account
  28. Emotion-tagged conversations
      First conversation:
      A (Love): @amithpr @dhempe @OperaIndia - Would you have any update on @mrunmaiy's health - hope she is recovering well?
      B (neut): @labnol @dhempe she is recovering but slow. The injury is on the spine therefore worrisome. Still in icu.
      A (Sadness): @amithpr thanks for the update.. extremely said to hear that news..
      B (neut): @labnol #prayformrun She is a fighter and will come out of this
      Second conversation:
      B (neut): @AyeItsMeiMei just tell ur followers to report her for spam. then she'll be kicked off twitter
      A (Anger): @Jakeosaurous dude I didn't even do shit to her I'm just here tweeting & she calls me a ugly bitch? I was like oh wow thanks?
      B (neut): @AyeItsMeiMei yeah clearly shes so ugly she cant even use her real pic:P so dont feel bad
      A (Love): @Jakeosaurous haha. I don't care. She's getting spammed with hate. Hahaha. (": thanks though.
      B (neut): @AyeItsMeiMei np
  29. Emotion Transitions (figure: Plutchik’s Wheel of Emotions, with transition probabilities on the arrows between emotions). Percentage shown for each emotion: Joy 39.7%, Anticipation 15.1%, Acceptance 10.4%, Fear 2.6%, Anger 12.8%, Disgust 2.9%, Sadness 9.1%, Surprise 7.4%
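A transition figure like this one can be derived from emotion-tagged chains by counting consecutive label pairs. The sketch below assumes that a "transition" means the emotion of one tweet followed by the emotion of the next tweet in the same chain; that reading, the function name `transition_probabilities`, and the toy chains are assumptions, not the definition or data used in the paper.

```python
from collections import Counter, defaultdict


def transition_probabilities(chains):
    """Estimate P(next emotion | current emotion) from labeled chains."""
    counts = defaultdict(Counter)
    for chain in chains:
        # Pair each label with the label of the following tweet.
        for prev, nxt in zip(chain, chain[1:]):
            counts[prev][nxt] += 1
    return {
        prev: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for prev, c in counts.items()
    }


# Hypothetical emotion sequences, one list per conversational chain.
chains = [
    ["sadness", "anticipation", "acceptance"],
    ["joy", "joy", "anticipation"],
    ["sadness", "joy"],
]
probs = transition_probabilities(chains)
for prev, row in sorted(probs.items()):
    print(prev, "->", row)
```

Each row of the result sums to 1, so the values are comparable to edge weights leaving one emotion in a transition diagram.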
  30. Defining “Influence” (tweets in time order)
      User A: Having a tough day today. RIP Harrison. I’ll miss you a ton :/ (Sadness)
      User B: Just pray about it. God will help you. (Anticipation)
      User A: Not really religious, but thanks man. :) (Acceptance)
      User B: If you need talk you know I’m here.
  31. Defining “Influence”. Same conversation as slide 30, with the label “emotion influencing tweet” added.
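One way to operationalize the example is to flag a partner's tweet that sits between two tweets by the same user whose emotion labels differ. That definition is an assumption made for this sketch and may not match the paper's; the helper `influencing_tweets` is hypothetical, and the assignment of tweets to User A and User B follows the order that reads naturally as a dialogue.

```python
def influencing_tweets(chain):
    """chain: list of (user, emotion, text) tuples in time order.

    Returns (text, emotion_before, emotion_after) for every tweet that lies
    between two labeled tweets by the other user whose emotions differ.
    """
    found = []
    for i in range(1, len(chain) - 1):
        before, middle, after = chain[i - 1], chain[i], chain[i + 1]
        same_author = before[0] == after[0] and middle[0] != before[0]
        labeled = before[1] is not None and after[1] is not None
        if same_author and labeled and before[1] != after[1]:
            found.append((middle[2], before[1], after[1]))
    return found


# The conversation from the slide; None marks a tweet with no emotion shown.
chain = [
    ("A", "sadness", "Having a tough day today. RIP Harrison. I'll miss you a ton :/"),
    ("B", "anticipation", "Just pray about it. God will help you."),
    ("A", "acceptance", "Not really religious, but thanks man. :)"),
    ("B", None, "If you need talk you know I'm here."),
]
influences = influencing_tweets(chain)
print(influences)
```

Under these assumptions the only flagged tweet is User B's first reply, which sits between User A's Sadness and Acceptance tweets.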
  32. Emotion Influences: What can you say to make your partner feel better? Transitions: Disgust → Joy; Sadness → Joy; Acceptance → Anger; Joy → Sadness; Anticipation → Surprise. Topics: Topic 61: watch new live tv tonight; Topic 63: watch good think know look; Topic 18: wear look think love black; Topic 24: love thank great new look; Topic 31: i’m got lmax shit da; Topic 13: lmao shit nigga smh yea; Topic 117: tweet people don’t read post; Topic 59: hurt got bad pain feel; Topic 96: music listen play song good; Topic 178: follow tweet people twitter thank. Topic labels: Suggesting, Greeting, Sympathy, Swear words, Complaining
  33. Self-disclosure and relationship strength in online conversations. JinYeong Bak, Suin Kim, and Alice Oh. ACL 2012
  34. Methodology • Twitter Data: 131K users; 2M conversations • Relationship Strength: Chain frequency (CF); Chain length (CL) • Self-Disclosure: Personal information; Open communication; Profanity • Analysis with Topic Models: Latent Dirichlet allocation (LDA, [Blei, JMLR 2003]); Aspect and sentiment unification model (ASUM, [Jo, WSDM 2011]) (slide footer date: 2012-07-11)
  35. Relationship Strength • Social psychology literature states relationship strength can be measured by communication frequency and length [Granovetter, 1973; Levin and Cross, 2004] • CF (chain frequency): the number of conversational chains between the dyad, averaged per month • CL (chain length): the length of conversational chains between the dyad, averaged per month • Relationship strength: a high CF or CL for a dyad means the relationship is strong; a low CF or CL for a dyad means the relationship is weak
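The two measures can be sketched as simple per-month averages. The slide does not say how months without any conversation are treated or how length is counted, so the choices below (only months with at least one chain count, length is the number of tweets in a chain) are assumptions, as is the helper name `relationship_strength`.

```python
from collections import defaultdict


def relationship_strength(chains):
    """chains: list of (month, chain_length) pairs for a single dyad.

    Returns (CF, CL): chains per month, and the per-month mean chain length
    averaged over months. Only months with at least one chain are counted.
    """
    by_month = defaultdict(list)
    for month, length in chains:
        by_month[month].append(length)
    if not by_month:
        return 0.0, 0.0
    months = len(by_month)
    cf = sum(len(lengths) for lengths in by_month.values()) / months
    cl = sum(sum(lengths) / len(lengths) for lengths in by_month.values()) / months
    return cf, cl


# Hypothetical dyad: two chains in one month, one chain in the next.
dyad_chains = [("2011-01", 4), ("2011-01", 2), ("2011-02", 6)]
cf, cl = relationship_strength(dyad_chains)
print("CF =", cf, "CL =", cl)
```

For this toy dyad CF is 1.5 chains per month and CL is 4.5 tweets per chain; a dyad would then be placed on the weak-to-strong scale by comparing these values across dyads.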
  36. Self-Disclosure • Open communication (Openness): Negative openness; Nonverbal openness; Emotional openness; Receptive openness (difficult to find in tweets); General-style openness (not clearly defined in the literature) • Personal Information: Personally Identifiable Information (PII); Personally Embarrassing Information (PEI) • Profanity: nigga, ass, wtf, lmao
  37. Self-Disclosure - Openness: Negative openness • Method: We use ASUM with emoticons as seed words [“Aspect and sentiment unification model for online review analysis”, Jo, WSDM’11] • ASUM is an LDA-based joint model of topic and sentiment • ASUM takes unannotated data and classifies each sentence (tweet) as positive/negative/neutral
  38. Self-Disclosure - Openness: Nonverbal openness • Method: We look for emoticons, ‘lol’, ‘xxx’ • Emoticons are like facial expressions -- :) :( :P • ‘lol’ (laughing out loud) and ‘xxx’ (kisses) are very frequently used in a similar manner to nonverbal openness
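The nonverbal-openness check described here amounts to pattern matching on tweet text. The following is a rough sketch only: the emoticon pattern covers a handful of common Western emoticons, URLs are stripped first so that "http://" is not mistaken for ":/", and the function name and regular expressions are assumptions rather than the authors' rules.

```python
import re

URL = re.compile(r"https?://\S+")
# Eyes (: or ;), optional nose (- or '), then a mouth character.
EMOTICON = re.compile(r"[:;][-']?[)(DPp/|]")
# 'lol' and 'xxx' as whole words, in any letter case.
TOKEN = re.compile(r"\b(?:lol|xxx)\b", re.IGNORECASE)


def has_nonverbal_openness(tweet):
    """True if the tweet contains an emoticon, 'lol', or 'xxx'."""
    text = URL.sub(" ", tweet)
    return bool(EMOTICON.search(text) or TOKEN.search(text))


examples = [
    "Not really religious, but thanks man. :)",
    "LOL ok",
    "goodnight xxx",
    "read http://www.npr.org/ today",
    "I bought a lollipop",
]
for tweet in examples:
    print(has_nonverbal_openness(tweet), "|", tweet)
```

The first three examples are flagged and the last two are not. A production version would need a much broader emoticon inventory.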
  39. Self-Disclosure - Openness: Emotional openness • Method: Look for tweets that contain common expressions of feeling words [We feel fine (Harris, J, 2009)]
  40. Self-Disclosure – Personal Information • Personally Identifiable Information (PII). Ex) name, location, email address, job, social security number • Personally Embarrassing Information (PEI). Ex) clinical history, sexual life, job loss, family problem
  41. Self-Disclosure – Personal Information
  42. Self-Disclosure – Personal Information. Example of PII, PEI and Profanity topics, shown by high probability words in each topic • PII 1: san, live, state, texas, south • PII 2: tonight, time, tomorrow, good, ill • PEI 1: pants, wear, boobs, naked, wearing • PEI 2: teeth, doctor, dr, dentist, tooth • PEI 3: family, brother, sister, uncle, cousin • Profanity: nigga, lmao, shit, ass, bitch
  43. Results
  44. Results (figure panels: sentiment, nonverbal, emotional, profanity, PII & PEI; axis: weak ↔ strong)
  45. Results (figure panels: emotional, PII & PEI; axis: weak ↔ strong)
  46. Results: Interpretation • Emotional openness: When they are not very close, they express frequent encouragements, or polite reactions to babies or pets
  47. Results: Interpretation • PII: When they meet new acquaintances, they use PII to introduce themselves
  48. Results. Analyzing outliers: a dyad that is weakly linked but shows high self-disclosure
  49. Computational Analysis of Agenda Setting Theory. Yeooul Kim and Alice Oh, alice.oh@kaist.edu
  50. Agenda Setting Theory. How does media affect the thoughts of the audience?
  51. Agenda Setting Theory (McCombs & Shaw, 1972) • Media affects audiences by having an influence on: what to think about; how to think about it • Examples of traditional media studies: Media affects the outcome of presidential elections (Perloff and Krauss, 1985); Media coverage influences the control of infectious diseases (Cui et al., 2008); Tone of news articles affects the number of visitors to museums (Zyglidopoulos et al., 2012)
  52. Limitation of Traditional Media Studies 1. Use of traditional off-line newspapers and TV as target media • Analysis is limited to a small volume over a short duration • Issues are arbitrarily chosen 2. Use of off-line MIP (Most Important Problems) surveys • Self-reports are not reliable • Only a small subset of the population can be surveyed 3. Use of manual coding for content analysis • You need experts • It is difficult to replicate and generalize to other domains
  53. Computational Analysis of Agenda Setting Theory 1. Use of traditional off-line newspapers and TV as target media • Crawl online news to get several years’ data • Use machine learning to automatically discover the important issues 2. Use of off-line MIP (Most Important Problems) surveys • Look at counts of social media shares • Look at counts of user comments 3. Use of manual coding for content analysis • Use unsupervised machine learning to analyze content for tone (polarity) of articles and comments • Try it for different issues to see whether ML approach can generalize over many domains
  54. AUDIENCE’S BEHAVIOR. Gay marriage: COMMENT, SHARE
  55. AUDIENCE’S BEHAVIOR. Gay marriage: COMMENT, SHARE
  56. DATA STATISTICS, 2011.01 – 2013.04. From http://www.npr.org/
      Section      #Articles  #Comments  #Commenters     #Shares
      Politics         1,863    174,680       14,106   2,080,889
      Business         2,043    130,921       17,791   3,657,544
      Opinion          4,820    149,618       30,556   6,620,489
      Sports             814     17,282        5,484     712,507
      Technology         456     13,571        4,993     570,732
      Science            945     50,113       11,114   4,709,041
      World            3,673    134,572       14,882   3,534,637
      Health           3,060     92,964       18,185   6,001,082
      Total           17,674    763,721      117,111  27,886,921
  57. Issue Detection using HDP. Detected issue list and the number of articles of each issue for three sections out of eight sections. Issues labeled by using Mturk; #Articles in parentheses.
      Politics: presidential election (575); infringement of human rights (195); race for Washington (167); government economics (274); presidential campaigns and money (163); candidate-marriage & immigration (261); political viewpoints (157)
      Business: economic decline under Obama (514); employment and paid slavery (218); agriculture (131); banks and loan (198); stock market and business (166); housing market (170); tax and business (180); energy and finance (222); new business and running (138)
      Health: health care reform laws (349); vaccination (189); HIV and treatment (496); medication (197); healthcare and costs (224); food and obesity (245); sleep study and children (210); food and safety (223); health tech and new treatment (125); mental health in families (117)
  58. ▶ Effects from media exposure. CORRELATION IN ISSUE
  59. Contentious Issues
  60. Contentious Issues
  61. Content Polarity & Audience Behavior • INFLUENTIAL FACTOR: Tone (Polarity) of article • GOAL: Identify the effects of article tone, positive and negative, on the commenting and sharing behaviors of the audience
  62. ARTICLE POLARITY
  63. DETECTED POS./NEG. WORDS. The sets of positive and negative words obtained from model analysis for news articles. Words depending on sections differentiate positive and negative traits of each section.
      BUSINESS Positive: joined viral smoothly better balance respect forward empower fair moderate
      BUSINESS Negative: cutthroat axed lawsuit beating lose opposite battle unjust fuming sequester
      SCIENCE Positive: fortunate cleanup essential credit safety comforting milestone learn gang dim
      SCIENCE Negative: spill crude busted upset concern problems dark smash prize creating
      HEALTH Positive: care respect admit clarify essential healthy repair benign hope repaired
      HEALTH Negative: tough severe emergency affected risk dying war spitting tricks abnormal
      SPORTS Positive: victory won grace fun champion passion ace belief luck balance
      SPORTS Negative: chase shock busted beating defeat thwart lost alleged assault cockeyed
      OPINION Positive: spectacular useful created prize confirm love sublime win confident mellow
      OPINION Negative: weird fog distressing slam doubted fail wrong fears slippery peril
      TECHNOLOGY Positive: best fancy easy help intelligence strong improve fit trust fame
      TECHNOLOGY Negative: blocks shabby shy wicked rash shaky mortal grave pity unfinished
      POLITICS Positive: expert forward proud consent carol rights great worth integrity truth
      POLITICS Negative: ironic heinous arguing dick undo grinding outlaw meaningless theft lost
      WORLD Positive: free respected support moderate consistent prompt afford gratitude joined affluent
      WORLD Negative: tension protest heavy raging slam war crime oppress poverty poor
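Once section-specific positive and negative word sets exist, an article's tone can be approximated by counting matches. The sketch below uses simple lexicon counting, which is a stand-in technique and not the unsupervised model the slides refer to; the word sets are copied from the BUSINESS row, while the function `polarity` and the sample sentences are hypothetical.

```python
import re

# Word lists copied from the BUSINESS row of the slide.
POSITIVE = {"joined", "viral", "smoothly", "better", "balance",
            "respect", "forward", "empower", "fair", "moderate"}
NEGATIVE = {"cutthroat", "axed", "lawsuit", "beating", "lose",
            "opposite", "battle", "unjust", "fuming", "sequester"}


def polarity(text):
    """Score in [-1, 1]: (positive - negative) / (positive + negative) hits."""
    tokens = re.findall(r"[a-z]+", text.lower())
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


# Hypothetical sentences, not taken from the NPR data.
print(polarity("The merger went smoothly and the terms were fair."))
print(polarity("A lawsuit followed after staff were axed, despite a fair offer."))
print(polarity("Nothing in this sentence is in the lexicon."))
```

A score near 1 marks a mostly positive article, a score near -1 a mostly negative one, and 0.0 is returned when no lexicon word occurs. Articles scored this way could then be compared on comment and share counts.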
  64. Positive and Negative Articles
  65. For more information • David Blei’s homepage: http://www.cs.princeton.edu/~blei/ • David Mimno’s bibliography: http://www.cs.princeton.edu/~mimno/topics.html • videolectures.net: David Blei, Yee-Whye Teh, Michael Jordan • Conferences: NIPS, ICML, UAI, ECML, KDD, EMNLP • Tools: MALLET, Gensim, various LDA libraries • Email me: alice.oh@kaist.edu