Boston Dataswap Topic Modeling by Alice Oh

    Presentation Transcript

    • Topic Models & Computational Social Science October 17, 2013 Alice Oh alice.oh@kaist.edu aoh@seas.harvard.edu http://uilab.kaist.ac.kr/members/aliceoh/ Thursday, October 17, 2013
    • What is topic modeling?
    • Blei, Communications of the ACM, 2012
    • Motivation
      • What are the topics discussed in the article?
      • Is the article related to
        • household finances?
        • price of gasoline?
        • price of Apple stock?
      • How would you build an automatic system for answering these questions?
    • http://www.nytimes.com/2010/08/09/sports/autoracing/09nascar.html?hp
      • Topics: multinomial over words
        • nascar, races, track, raceway, race, cars, fuel, auto, racing
        • economic, slowdown, sales, recession, costs, spending, save
        • fans, spectators, sports, leagues, teams, competition
      • Topic Distributions
    • Input to LDA: http://www.nytimes.com/2010/08/09/sports/autoracing/09nascar.html?
    • Topics Discovered by LDA (topics: multinomial over vocabulary; one topic per column)
        nascar   0.12      spending   0.09      sports    0.12
        races    0.10      economic   0.07      team      0.11
        cars     0.10      recession  0.06      game      0.10
        racing   0.09      save       0.05      player    0.10
        track    0.08      money      0.05      athlete   0.09
        speed    0.06      cut        0.04      win       0.07
        ...                ...                  ...
        money    0.002     speed      0.003     nascar    0.001
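To make "topics are multinomials over the vocabulary" concrete, here is a minimal sketch of how a document could be generated from the three topics on the slide. The topic names (`racing`, `economy`, `sports`) and the document-level topic proportions are assumptions made for illustration; they are not in the slides. In full LDA the per-document proportions are drawn from a Dirichlet prior, whereas this sketch fixes them by hand.

```python
import random

# Word weights copied from the slide. The lists are truncated, so the weights
# do not sum to 1; random.choices only needs relative weights.
# The topic names are hypothetical labels, not part of the slide.
topics = {
    "racing": {"nascar": 0.12, "races": 0.10, "cars": 0.10,
               "racing": 0.09, "track": 0.08, "speed": 0.06},
    "economy": {"spending": 0.09, "economic": 0.07, "recession": 0.06,
                "save": 0.05, "money": 0.05, "cut": 0.04},
    "sports": {"sports": 0.12, "team": 0.11, "game": 0.10,
               "player": 0.10, "athlete": 0.09, "win": 0.07},
}

# Hypothetical topic proportions for one document (assumed, not from the slides).
doc_topic_weights = {"racing": 0.5, "economy": 0.3, "sports": 0.2}


def generate_document(n_words, rng):
    """Draw a topic for each word position, then draw a word from that topic."""
    topic_names = list(doc_topic_weights)
    topic_weights = [doc_topic_weights[t] for t in topic_names]
    words = []
    for _ in range(n_words):
        topic = rng.choices(topic_names, weights=topic_weights)[0]
        vocab = list(topics[topic])
        word_weights = [topics[topic][w] for w in vocab]
        words.append(rng.choices(vocab, weights=word_weights)[0])
    return words


doc = generate_document(10, random.Random(0))
print(doc)
```

Inference in LDA runs this process in reverse: given only the observed words, it estimates the topics and the per-document topic proportions.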
    • Graphical View
      • Observed: sales xxx slowdown recession cars races spending xxx save costs fuel
      • Discovered: Topic Distributions
      • Discovered: Topics (multinomial over words)
        • nascar, races, track, raceway, race, cars, fuel, auto, racing
        • economic, slowdown, sales, recession, costs, spending, save
        • fans, spectators, sports, leagues, teams, competition
    • Do you feel what I feel? Social Aspects of Emotions in Twitter Conversations. Suin Kim, JinYeong Bak, Alice Oh. ICWSM 2012
    • Twitter conversation data: approx. 220k dyads who “reply” to each other, 1,670k conversational chains (we now have about 5x this amount)
    • Asking Research Questions: Human emotion is typically studied as a within-person, one-direction, non-repetitive phenomenon; focus has traditionally been on how one individual feels in reaction to various stimuli at a certain point of time. But people recognize and inevitably react emotionally and otherwise to expressions of emotion of other people. We propose that organizational dyads and groups inhabit emotion cycles: Emotions of an individual influence the emotions, thoughts and behaviors of others; others’ reactions can then influence their future interactions with the individual expressing the original emotion, as well as that individual’s future emotions and behaviors. People can mimic the emotions of others, thereby extending the social presence of a specific emotion, but can also respond to others’ emotions, extending the range of emotions present.
    • Topic model with a twist: DF-LDA
      • Dirichlet forest prior (Andrzejewski et al.)
        • Mixture of Dirichlet tree distributions
        • Dirichlet tree: generalization of the Dirichlet distribution
      • Knowledge is expressed using Must-link and Cannot-link primitives
        • Must-link(love, sweetheart)
        • Cannot-link(exciting, bored)
      • [Figure: DF-LDA graphical model, with variables β, q, η]
    • Domain knowledge in Dirichlet forest prior: seed words (stemmed)
      • Must-link within a class; Cannot-link between classes
      • joy: awesom amaz wonder excit glad fine beauti high lucki super perfect complet special bless safe proud
      • sadness: sorri bad aw sad wrong hurt blue dead lost crush weak depress wors low terribl lone
      • anticipation: hope wait await inspir excit bore readi expect nervou calm motiv prepar certain anxiou optimist forese
      • surprise: amaz wow wonder weird lucki differ awkward confus holi strang shock odd embarrass overwhelm astound astonish
      • acceptance: okai ok same alright safe lazi relax peac content normal secur complet numb fulfil comfort defeat
      • anger: shit bitch ass mean damn mad jealou piss annoi angri upset moron rage screw stuck irrit
      • fear: scare stress horror nervou terror alarm behind panic fear afraid desper threaten tens terrifi fright anxiou
      • disgust: sick wrong evil fat ugli horribl gross terribl selfish miser pathet disgust worthless aw asham fuck
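The rule on the slide (Must-link within a class, Cannot-link between classes) can be sketched as a small helper that turns seed-word lists into constraint pairs. This is only an illustration under assumptions: the function name `build_constraints` is hypothetical, only a small subset of the seed words is used, and the handling of words that appear in more than one class (for example `amaz` in both joy and surprise) is a guess, since the slides do not say how such overlaps are resolved.

```python
from itertools import combinations, product

# A small subset of the stemmed seed words from the slide.
seed_words = {
    "joy": ["awesom", "glad", "proud"],
    "sadness": ["sorri", "sad", "hurt"],
    "anger": ["mad", "angri", "rage"],
}


def build_constraints(seeds):
    """Return (must_links, cannot_links) as sets of alphabetically sorted pairs."""
    must, cannot = set(), set()
    # Must-link: every pair of words inside the same emotion class.
    for words in seeds.values():
        for a, b in combinations(sorted(set(words)), 2):
            must.add((a, b))
    # Cannot-link: every pair of words drawn from two different classes.
    for (_, words1), (_, words2) in combinations(sorted(seeds.items()), 2):
        for a, b in product(set(words1), set(words2)):
            pair = tuple(sorted((a, b)))
            # Assumption: skip a word shared by both classes, and do not
            # cannot-link a pair that is already must-linked elsewhere.
            if a != b and pair not in must:
                cannot.add(pair)
    return must, cannot


must_links, cannot_links = build_constraints(seed_words)
print(len(must_links), "must-links,", len(cannot_links), "cannot-links")
```

These pairs are the kind of input a Dirichlet forest prior consumes; encoding them into the prior itself is beyond this sketch.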
    • Emotion Topics: How do we express emotions?
      • Anticipation (29): Topic 125 (hope better feel thank soon); Topic 26 (good thank hope miss); Topic 146 (come wait week day june); Topic 146 (good day time work)
      • Sadness (17): Topic 6 (oh sorry haha know didnt); Topic 59 (hurt got good bad); Topic 106 (tweet reply didn’t read sorry); Topic 155 (oh really make feel)
      • Joy (70): Topic 114 (omg love haha thank really); Topic 107 (love thank follow wow); Topic 159 (good day hope morning thank); Topic 158 (love thank miss hug)
      • Anger (21): Topic 131 (lmao fuck ass bitch shit); Topic 4 (ass yo lmao nigga); Topic 19 (lmao shit damn fuck oh); Topic 13 (shit nigga smh yea)
      • Disgust (7): Topic 116 (oh fuck don’t ye ew); Topic 116 (look haha oh know); Topic 22 (don’t oh think yeah lmao); Topic 174 (don’t think say people)
      • Surprise (18): Topic 172 (yeag know think true funny); Topic 89 (know don’t think look); Topic 15 (think don’t know make really); Topic 94 (haha dont think really)
      • Acceptance (14): Topic 43 (ok oh thank cool okay); Topic 102 (know try let ok); Topic 199 (xx thank good okay follow); Topic 8 (night love good sleep)
      • Fear (5): Topic 48 (omg oh lmao shit scare); Topic 78 (happen heart attack hospital); Topic 27 (don’t come night sleep outside); Topic 140 (time got work day)
      • Neutral (19): Topic 180 (com www http check youtube); Topic 156 (twitter facebook people account); Topic 184 (account google app work email); Topic 67 (food chicken cook rt)
    • Emotion Topics: How do we express emotions?
      • Anticipation (Caring): Topic 125 (hope better feel thank soon); Topic 26 (good thank hope miss)
      • Joy (Greeting): Topic 114 (omg love haha thank really); Topic 107 (love thank follow wow)
      • Sadness (Sympathy): Topic 6 (oh sorry haha know didnt); Topic 59 (hurt got good bad)
      • Neutral (IT/Tech): Topic 180 (com www http check youtube); Topic 156 (twitter facebook people account)
    • Emotion-tagged conversations
      • Conversation 1
        • A (Love): @amithpr @dhempe @OperaIndia - Would you have any update on @mrunmaiy's health - hope she is recovering well?
        • B (neut): @labnol @dhempe she is recovering but slow. The injury is on the spine therefore worrisome. Still in icu.
        • A (Sadness): @amithpr thanks for the update.. extremely said to hear that news..
        • B (neut): @labnol #prayformrun She is a fighter and will come out of this
      • Conversation 2
        • B (neut): @AyeItsMeiMei just tell ur followers to report her for spam. then she'll be kicked off twitter
        • A (Anger): @Jakeosaurous dude I didn't even do shit to her I'm just here tweeting & she calls me a ugly bitch? I was like oh wow thanks?
        • B (neut): @AyeItsMeiMei yeah clearly shes so ugly she cant even use her real pic:P so dont feel bad
        • A (Love): @Jakeosaurous haha. I don't care. She's getting spammed with hate. Hahaha. (": thanks though.
        • B (neut): @AyeItsMeiMei np
    • Emotion Transitions (Plutchik’s Wheel of Emotions)
      • Emotion shares: Joy 39.7%, Anticipation 15.1%, Anger 12.8%, Acceptance 10.4%, Sadness 9.1%, Surprise 7.4%, Disgust 2.9%, Fear 2.6%
      • [Figure: transition weights between emotions, ranging from 0.11 to 0.51]
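One plausible way to obtain transition weights like those in the figure is to count how often a tweet tagged with one emotion is followed by a tweet tagged with another, then normalize per source emotion. The sketch below assumes first-order transitions between consecutive tweets in a chain; the slides do not state the exact definition, and the sample conversations are made up for illustration.

```python
from collections import Counter, defaultdict


def transition_probabilities(conversations):
    """Estimate P(next emotion | current emotion) from emotion-label sequences."""
    counts = defaultdict(Counter)
    for labels in conversations:
        # Pair each tweet's emotion with the emotion of the tweet that follows it.
        for current, following in zip(labels, labels[1:]):
            counts[current][following] += 1
    return {
        current: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for current, c in counts.items()
    }


# Made-up emotion sequences, one list per conversational chain.
conversations = [
    ["joy", "joy", "sadness", "joy"],
    ["anger", "acceptance", "joy"],
]

probs = transition_probabilities(conversations)
print(probs["joy"])
```

Each row of `probs` sums to 1, so a row reads as the distribution over the next emotion given the current one.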
    • Defining “Influence” (tweets ordered along a time axis)
      • User A: “Having a tough day today. RIP Harrison. I’ll miss you a ton :/” (Sadness)
      • User B: “Just pray about it. God will help you.” (Anticipation); “If you need talk you know I’m here.”
      • User A: “Not really religious, but thanks man. :)” (Acceptance)
      • Annotation on the slide: emotion influencing tweet
    • Emotion Influences: What can you say to make your partner feel better?
      • Disgust → Joy: Topic 61 (watch new live tv tonight); Topic 63 (watch good think know look)
      • Sadness → Joy: Topic 18 (wear look think love black); Topic 24 (love thank great new look)
      • Acceptance → Anger: Topic 31 (i’m got lmax shit da); Topic 13 (lmao shit nigga smh yea)
      • Joy → Sadness: Topic 117 (tweet people don’t read post); Topic 59 (hurt got bad pain feel)
      • Anticipation → Surprise: Topic 96 (music listen play song good); Topic 178 (follow tweet people twitter thank)
      • Labels shown on the slide: Suggesting, Greeting, Sympathy, Swear words, Complaining
    • Self-disclosure and relationship strength in online conversations. JinYeong Bak, Suin Kim, and Alice Oh. ACL 2012
    • Methodology (2012-07-11)
      • Twitter Data
        • 131K users
        • 2M conversations
      • Relationship Strength
        • Chain frequency (CF)
        • Chain length (CL)
      • Self-Disclosure
        • Personal information
        • Open communication
        • Profanity
      • Analysis with Topic Models
        • Latent Dirichlet allocation (LDA, [Blei, JMLR 2003])
        • Aspect and sentiment unification model (ASUM, [Jo, WSDM 2011])
    • Relationship Strength
      • Social psychology literature states relationship strength can be measured by communication frequency and length [Granovetter, 1973; Levin and Cross, 2004]
      • CF (chain frequency): the number of conversational chains between the dyad, averaged per month
      • CL (chain length): the length of conversational chains between the dyad, averaged per month
      • Relationship strength
        • A high CF or CL for a dyad means the relationship is strong
        • A low CF or CL for a dyad means the relationship is weak
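The CF and CL definitions above can be sketched as a short computation over one dyad's conversational chains. The input format (a list of `(month, chain_length)` pairs) and the function name are assumptions for illustration. The slides also leave two details open, so this sketch guesses: only months with at least one chain are counted, and CL is the mean of the per-month mean chain lengths.

```python
from collections import defaultdict
from statistics import mean


def relationship_strength(chains):
    """Return (CF, CL) for one dyad.

    chains: list of (month, chain_length) pairs, month given as 'YYYY-MM'.
    """
    lengths_by_month = defaultdict(list)
    for month, length in chains:
        lengths_by_month[month].append(length)
    # CF: number of chains per month, averaged over the observed months.
    cf = mean(len(lengths) for lengths in lengths_by_month.values())
    # CL: mean chain length within each month, averaged over the observed months.
    cl = mean(mean(lengths) for lengths in lengths_by_month.values())
    return cf, cl


# Made-up data: two chains in January 2012 and one in February 2012.
chains = [("2012-01", 2), ("2012-01", 4), ("2012-02", 6)]
cf, cl = relationship_strength(chains)
print(cf, cl)
```

Under these assumptions, a dyad with high `cf` or `cl` would be treated as a strong relationship and one with low values as weak.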
    • Self-Disclosure } Open communication - Openness } } } } } } Personal Information } } } Negative openness Nonverbal openness Emotional openness Receptive openness – difficult to find in tweets General-style openness – not clearly defined in the literature Personally Identifiable Information (PII) Personally Embarrassing Information (PEI) Profanity } nigga, ass, wtf, lmao 26 Thursday, October 17, 2013 2012-07-11
    • Self-Disclosure - Openness Negative openness } Method We use ASUM with emoticons as seed words [ “Aspect and sentiment unification model for online review analysis”, Jo, WSDM’11] } ASUM is LDA-based joint model of topic and sentiment } ASUM takes unannotated data and classifies each sentence (tweet) as positive/negative/neutral } 27 Thursday, October 17, 2013 2012-07-11
    • Self-Disclosure - Openness Nonverbal openness } Method We look for emoticons, ‘lol’, ‘xxx’ } Emoticons are like facial expressions -- :) :( :P } ‘lol’ (laughing out loud) and ‘xxx’ (kisses) are very frequently used in a similar manner to nonverbal openness } 28 Thursday, October 17, 2013 2012-07-11
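The nonverbal-openness check described above (emoticons, ‘lol’, ‘xxx’) can be approximated with two regular expressions. This is a rough sketch, not the authors' detector: the emoticon pattern covers only a few common forms such as :) :( :P ;) :D, and the function name is hypothetical.

```python
import re

# A deliberately small emoticon pattern: eyes, optional nose, mouth.
EMOTICON = re.compile(r"[:;=]-?[)(DPp]")
# 'lol' and 'xxx' as whole words, in any letter case.
WORD_CUE = re.compile(r"\b(?:lol|xxx)\b", re.IGNORECASE)


def has_nonverbal_openness(tweet):
    """Return True if the tweet contains an emoticon, 'lol', or 'xxx'."""
    return bool(EMOTICON.search(tweet) or WORD_CUE.search(tweet))


for tweet in ["see you soon :)", "LOL that was great", "meeting at 10:30 tomorrow"]:
    print(tweet, "->", has_nonverbal_openness(tweet))
```

A pattern this small will miss many emoticons and can misfire on strings such as a colon followed directly by a capital D or P, so a real pipeline would need a fuller emoticon list and tokenization.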
    • Self-Disclosure: Openness: Emotional openness
      • Method: look for tweets that contain common expressions of feeling words [We feel fine (Harris, J, 2009)]
    • Self-Disclosure: Personal Information
      • Personally Identifiable Information (PII). Ex) name, location, email address, job, social security number
      • Personally Embarrassing Information (PEI). Ex) clinical history, sexual life, job loss, family problem
    • Self-Disclosure: Personal Information: example of PII, PEI and Profanity topics, shown by high probability words in each topic
      • PII 1: san, live, state, texas, south
      • PII 2: tonight, time, tomorrow, good, ill
      • PEI 1: pants, wear, boobs, naked, wearing
      • PEI 2: teeth, doctor, dr, dentist, tooth
      • PEI 3: family, brother, sister, uncle, cousin
      • Profanity: nigga, lmao, shit, ass, bitch
    • Results
      • [Figure: sentiment, nonverbal, emotional, profanity, and PII & PEI, each plotted against relationship strength (weak ↔ strong)]
      • [Figure: detail of the emotional and PII & PEI panels (weak ↔ strong)]
    • Results: Interpretation
      • Emotional openness: when a dyad is not very close, they frequently express encouragement or polite reactions to babies or pets
      • PII: when people meet new acquaintances, they use PII to introduce themselves
    • Results: Analyzing outliers: a weakly linked dyad that shows high self-disclosure
    • Computational Analysis of Agenda Setting Theory. Yeooul Kim and Alice Oh. alice.oh@kaist.edu
    • Agenda Setting Theory: How does media affect the thoughts of the audience?
    • Agenda Setting Theory (McCombs & Shaw, 1972)
      • Media affects audiences by having an influence on
        • What to think about
        • How to think about it
      • Examples of traditional media studies
        • Media affects the outcome of presidential elections (Perloff and Krauss, 1985)
        • Media coverage influences the control of infectious diseases (Cui et al., 2008)
        • Tone of news articles affects the number of visitors to museums (Zyglidopoulos et al., 2012)
    • Limitation of Traditional Media Studies
      1. Use of traditional off-line newspapers and TV as target media
        • Analysis is limited to a small volume over a short duration
        • Issues are arbitrarily chosen
      2. Use of off-line MIP (Most Important Problems) surveys
        • Self-reports are not reliable
        • Only a small subset of the population can be surveyed
      3. Use of manual coding for content analysis
        • You need experts
        • It is difficult to replicate and generalize to other domains
    • Computational Analysis of Agenda Setting Theory
      1. Instead of traditional off-line newspapers and TV as target media
        • Crawl online news to get several years’ data
        • Use machine learning to automatically discover the important issues
      2. Instead of off-line MIP (Most Important Problems) surveys
        • Look at counts of social media shares
        • Look at counts of user comments
      3. Instead of manual coding for content analysis
        • Use unsupervised machine learning to analyze content for tone (polarity) of articles and comments
        • Try it for different issues to see whether the ML approach can generalize over many domains
    • AUDIENCE’S BEHAVIOR: COMMENT, SHARE (example issue: gay marriage)
    • DATA STATISTICS, 2011.01 – 2013.04 (from http://www.npr.org/)
        Section       #Articles   #Comments   #Commenters       #Shares
        Politics          1,863     174,680        14,106     2,080,889
        Business          2,043     130,921        17,791     3,657,544
        Opinion           4,820     149,618        30,556     6,620,489
        Sports              814      17,282         5,484       712,507
        Technology          456      13,571         4,993       570,732
        Science             945      50,113        11,114     4,709,041
        World             3,673     134,572        14,882     3,534,637
        Health            3,060      92,964        18,185     6,001,082
        Total            17,674     763,721       117,111    27,886,921
    • Issue Detection using HDP: detected issues (labeled using MTurk) and the number of articles for each issue, for three of the eight sections
      • Politics: presidential election (575); infringement of human rights (195); race for Washington (167); government economics (274); presidential campaigns and money (163); candidate-marriage & immigration (261); political viewpoints (157)
      • Business: economic decline under Obama (514); employment and paid slavery (218); agriculture (131); banks and loan (198); stock market and business (166); housing market (170); tax and business (180); energy and finance (222); new business and running (138)
      • Health: health care reform laws (349); vaccination (189); HIV and treatment (496); medication (197); healthcare and costs (224); food and obesity (245); sleep study and children (210); food and safety (223); health tech and new treatment (125); mental health in families (117)
    • CORRELATION IN ISSUE: effects from media exposure
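As a rough illustration of relating media exposure to audience behavior, the sketch below computes a Pearson correlation between article counts and comment counts. It is an assumption-laden stand-in: the slides report correlation per issue and do not name the statistic used, whereas this example uses the section-level counts from the data statistics slide because per-issue comment counts are not in the transcript.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Section-level counts from the data statistics slide, in the order:
# Politics, Business, Opinion, Sports, Technology, Science, World, Health.
articles = [1863, 2043, 4820, 814, 456, 945, 3673, 3060]
comments = [174680, 130921, 149618, 17282, 13571, 50113, 134572, 92964]

r = pearson(articles, comments)
print(f"correlation between #articles and #comments: {r:.2f}")
```

With only eight sections this number is descriptive at best; the per-issue analysis on the slide would apply the same idea to the issues detected by HDP.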
    • Contentious Issues
    • Content Polarity & Audience Behavior
      • INFLUENTIAL FACTOR: tone (polarity) of article
      • GOAL: identify the effects of article tone, positive and negative, on the commenting and sharing behaviors of the audience
    • ARTICLE POLARITY
    • DETECTED POS./NEG. WORDS
      • BUSINESS
        • Positive: joined viral smoothly better balance respect forward empower fair moderate
        • Negative: cutthroat axed lawsuit beating lose opposite battle unjust fuming sequester
      • SCIENCE
        • Positive: fortunate cleanup essential credit safety comforting milestone learn gang dim
        • Negative: spill crude busted upset concern problems dark smash prize creating
      • HEALTH
        • Positive: care respect admit clarify essential healthy repair benign hope repaired
        • Negative: tough severe emergency affected risk dying war spitting tricks abnormal
      • SPORTS
        • Positive: victory won grace fun champion passion ace belief luck balance
        • Negative: chase shock busted beating defeat thwart lost alleged assault cockeyed
      • OPINION
        • Positive: spectacular useful created prize confirm love sublime win confident mellow
        • Negative: weird fog distressing slam doubted fail wrong fears slippery peril
      • TECHNOLOGY
        • Positive: best fancy easy help intelligence strong improve fit trust fame
        • Negative: blocks shabby shy wicked rash shaky mortal grave pity unfinished
      • POLITICS
        • Positive: expert forward proud consent carol rights great worth integrity truth
        • Negative: ironic heinous arguing dick undo grinding outlaw meaningless theft lost
      • WORLD
        • Positive: free respected support moderate consistent prompt afford gratitude joined affluent
        • Negative: tension protest heavy raging slam war crime oppress poverty poor
      • The sets of positive and negative words obtained from model analysis for news articles. The words differ by section, reflecting the positive and negative traits of each section.
    • Positive and Negative Articles
    • For more information
      • David Blei’s homepage: http://www.cs.princeton.edu/~blei/
      • David Mimno’s bibliography: http://www.cs.princeton.edu/~mimno/topics.html
      • videolectures.net: David Blei, Yee-Whye Teh, Michael Jordan
      • Conferences: NIPS, ICML, UAI, ECML, KDD, EMNLP
      • Tools: Mallet, Gensim, various LDA libraries
      • Email me: alice.oh@kaist.edu