Topic and text analysis for sentiment, emotion, and computational social science

  1. Topic and Text Analysis for Sentiment, Emotion, and Computational Social Science
     Alice Oh (alice.oh@kaist.edu), Users & Information Lab, http://uilab.kaist.ac.kr
     November 2012
     Thursday, December 6, 2012
  2. Overview
     • Topic modeling research
       • CIKM 2011: Distance-dependent Chinese restaurant franchise (ddCRF)
       • ICML 2012: Dirichlet process with random mixed measures (DP-MRM)
       • CIKM 2012: Recursive Chinese restaurant process for modeling topic hierarchies (rCRP)
       • NIPS Big Learning Workshop 2012: Distributed Online Learning for Latent Dirichlet Allocation (DoLDA)
     • Computational social science research
       • WSDM 2011: Aspect sentiment unification model for online review analysis
       • ICWSM 2012: Social aspects of emotions in Twitter conversations
       • ACL 2012: Self-disclosure and relationship strength in Twitter conversations
  3. Do you feel what I feel? Social Aspects of Emotions in Twitter Conversations
     Suin Kim, JinYeong Bak, Alice Oh. ICWSM 2012
  4. Asking Research Questions
  6. Asking Research Questions
     Human emotion is typically studied as a within-person, one-direction, non-repetitive phenomenon; focus has traditionally been on how one individual feels in reaction to various stimuli at a certain point of time. But people recognize and inevitably react emotionally and otherwise to expressions of emotion of other people. We propose that organizational dyads and groups inhabit emotion cycles: Emotions of an individual influence the emotions, thoughts and behaviors of others; others’ reactions can then influence their future interactions with the individual expressing the original emotion, as well as that individual’s future emotions and behaviors. People can mimic the emotions of others, thereby extending the social presence of a specific emotion, but can also respond to others’ emotions, extending the range of emotions present.
  7. Social Aspects of Emotions: Motivating Question
     How are our emotions affected by others we talk to?
  8. Social Aspects of Emotions: Research Questions
     • How do we communicate our emotions?
       • Use a topic model on Twitter conversations to discover the “topics” that represent the eight emotions
       • Analyze the proportion of all tweets that express each emotion
     • How do we influence other people’s emotions?
       • Analyze the emotion transitions of the tweets
       • Look for topics that change the emotions of the conversation partners
       • Find interesting patterns of emotion pairs
  9. Social Aspects of Emotions: Data
     • Twitter conversation data: approx. 220K dyads who “reply” to each other, with 1,670K conversational chains
  10. Seed Words (We Feel Fine by Harris & Kamvar)
      Anticipation: hope wait await inspir excit bore readi expect nervou calm motiv prepar certain anxiou optimist forese
      Joy: awesom amaz wonder excit glad fine beauti high lucki super perfect complet special bless safe proud
      Anger: shit bitch ass mean damn mad jealou piss annoi angri upset moron rage screw stuck irrit
      Surprise: amaz wow wonder weird lucki differ awkward confus holi strang shock odd embarrass overwhelm astound astonish
      Fear: scare stress horror nervou terror alarm behind panic fear afraid desper threaten tens terrifi fright anxiou
      Sadness: sorri bad aw sad wrong hurt blue dead lost crush weak depress wors low terribl lone
      Disgust: sick wrong evil fat ugli horribl gross terribl selfish miser pathet disgust worthless aw asham fuck
      Acceptance: okai ok same alright safe lazi relax peac content normal secur complet numb fulfil comfort defeat
  11. Dirichlet Forest Prior (DF-LDA)
      • Dirichlet Forest Prior (Andrzejewski et al.)
        • A mixture of Dirichlet tree distributions
        • Dirichlet tree: a generalization of the Dirichlet distribution
      • Knowledge is expressed using Must-link and Cannot-link primitives
        • Must-link (love, sweetheart)
        • Cannot-link (exciting, bored)
  13. Domain Knowledge in Dirichlet Forest Prior
      • Seed words: the eight emotion word lists of slide 10
      • Must-link within a class
      • Cannot-link between classes
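The rule on this slide (must-link within a class, cannot-link between classes) can be expanded into explicit word pairs. The sketch below assumes the seed lists are held in a Python dict keyed by emotion; it does not build the Dirichlet forest itself. Because some stems appear under two emotions (for example "lucki" under both Joy and Surprise), it drops such shared words so no pair is both must-linked and cannot-linked; that filtering is our assumption, and `seed_constraints` is a hypothetical name.

```python
from collections import Counter
from itertools import combinations


def seed_constraints(seeds):
    """Build Must-link / Cannot-link word pairs from per-class seed lists.

    seeds: dict mapping a class name to its list of (stemmed) seed words.
    Returns (must_link, cannot_link), each a set of alphabetically sorted pairs.
    """
    # Assumption: words listed under more than one class are dropped,
    # so no pair ends up both must-linked and cannot-linked.
    counts = Counter(w for words in seeds.values() for w in set(words))
    clean = {c: sorted({w for w in ws if counts[w] == 1}) for c, ws in seeds.items()}

    # Must-link: every pair of words inside the same class.
    must_link = {pair for ws in clean.values() for pair in combinations(ws, 2)}

    # Cannot-link: every pair of words taken from two different classes.
    cannot_link = set()
    for ws_a, ws_b in combinations(clean.values(), 2):
        for a in ws_a:
            for b in ws_b:
                cannot_link.add(tuple(sorted((a, b))))
    return must_link, cannot_link


# Toy subset of the stemmed seed lists from the Seed Words slide.
seeds = {
    "joy": ["awesom", "glad", "lucki"],
    "surprise": ["wow", "weird", "lucki"],
    "sadness": ["sad", "hurt"],
}
must, cannot = seed_constraints(seeds)
```

With the full eight lists, the same function yields the constraint sets a DF-LDA implementation would consume.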
  14. Dirichlet Forest vs. Dirichlet (top words of each emotion topic)
      Fear, DF-LDA: don’t think but know why even wanna care worry understand
      Fear, LDA: good exam lol luck just school haha i’m xx worry tomorrow
      Surprise, DF-LDA: that very really cool wow wonder just some differ amazing
      Surprise, LDA: just rt holy got thank did shit new love lol awesome buy oh
      Sadness, DF-LDA: bad my real feel life aw sad kill lost dead hurt wrong sick
      Sadness, LDA: lol just know sorry isn’t oh tweet did haha don’t thought think
  15. Emotion Topics: How do we express emotions?
      Joy: Topic 114 (omg love haha thank really), Topic 107 (love thank follow wow), Topic 159 (good day hope morning thank), Topic 158 (love thank miss hug)
      Anticipation: Topic 125 (hope better feel thank soon), Topic 26 (good thank hope miss), Topic 146 (come wait week day june), Topic 146 (good day time work)
      Anger: Topic 131 (lmao fuck ass bitch shit), Topic 4 (ass yo lmao nigga), Topic 19 (lmao shit damn fuck oh), Topic 13 (shit nigga smh yea)
      Fear: Topic 48 (omg oh lmao shit scare), Topic 78 (happen heart attack hospital), Topic 27 (don’t come night sleep outside), Topic 140 (time got work day)
      Surprise: Topic 172 (yeag know think true funny), Topic 89 (know don’t think look), Topic 15 (think don’t know make really), Topic 94 (haha dont think really)
      Sadness: Topic 6 (oh sorry haha know didnt), Topic 59 (hurt got good bad pain), Topic 106 (tweet reply didn’t read sorry), Topic 155 (oh really make feel)
      Disgust: Topic 116 (oh fuck don’t ye ew), Topic 116 (look haha oh know), Topic 22 (don’t oh think yeah lmao), Topic 174 (don’t think say people)
      Acceptance: Topic 43 (ok oh thank cool okay), Topic 102 (know try let ok), Topic 199 (xx thank good okay follow), Topic 8 (night love good sleep)
      Neutral: Topic 180 (com www http check youtube), Topic 156 (twitter facebook people account), Topic 184 (account google app work email), Topic 67 (food chicken cook rt)
      Other values on the slide: 29, 70, 21, 14, 5, 17, 7, 18, 19
  16. Emotion Topics: How do we express emotions?
      Joy (Greeting): Topic 114 (omg love haha thank really), Topic 107 (love thank follow wow)
      Anticipation (Caring): Topic 125 (hope better feel thank soon), Topic 26 (good thank hope miss)
      Sadness (Sympathy): Topic 6 (oh sorry know didnt), Topic 59 (hurt got good bad pain)
      Neutral (IT/Tech): Topic 180 (com www http check youtube), Topic 156 (twitter facebook people account)
  17. Emotion Transitions (Plutchik’s Wheel of Emotions)
      Proportion of tweets per emotion: Joy 39.7%, Anticipation 15.1%, Anger 12.8%, Acceptance 10.4%, Sadness 9.1%, Surprise 7.4%, Disgust 2.9%, Fear 2.6%
      (Figure: the eight emotions arranged on Plutchik’s wheel, annotated with transition values)
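The transcript does not say how the transition values are computed. One common estimator, shown here as an assumption rather than the authors' procedure, is the maximum-likelihood probability of moving from one emotion to the next between consecutive tweets of a chain, with each tweet labeled by a single emotion (for example, the emotion of its dominant topic). The input format and the name `transition_probs` are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probs(chains):
    """Estimate P(next emotion | current emotion) from labeled chains.

    chains: list of conversation chains, each a list of emotion labels in
    tweet order, one label per tweet (hypothetical input format).
    """
    counts = defaultdict(Counter)
    for labels in chains:
        # Count each pair of consecutive tweets in the chain.
        for cur, nxt in zip(labels, labels[1:]):
            counts[cur][nxt] += 1
    # Normalize each row of counts into a probability distribution.
    return {
        cur: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for cur, row in counts.items()
    }


chains = [["joy", "joy", "anger"], ["sadness", "joy", "joy"]]
probs = transition_probs(chains)
```

Here Joy is followed by Joy twice and by Anger once, so the sketch estimates P(Joy → Joy) = 2/3 and P(Joy → Anger) = 1/3.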
  18. Defining “Influence”
      User A: Having a tough day today. RIP Harrison. I’ll miss you a ton :/ (Sadness)
      User B: Just pray about it. God will help you. (Acceptance; the emotion-influencing tweet)
      User A: Not really religious, but thanks man. :) (Anticipation)
      User B: If you need talk you know I’m here.
  20. Emotion Influences: What can you say to make your partner feel better?
      Joy → Sadness: Topic 117 (tweet people don’t read post), Topic 59 (hurt got bad pain feel)
      Sadness → Joy: Topic 18 (wear look think love black), Topic 24 (love thank great new look)
      Acceptance → Anger: Topic 31 (i’m got lmax shit da), Topic 13 (lmao shit nigga smh yea)
      Topic themes labeled on the slide: Greeting, Sympathizing, Swearing, Complaining
  21. Emotion Influence: Joy to Anger, and Sadness to Joy
      (Bar charts over the emotion expressed in the influencing tweet, namely Anticipation, Joy, Surprise, Fear, Anger, Sadness, Disgust, Acceptance, and Neutral: the probability of changing the partner’s emotion from Joy to Anger, and from Sadness to Joy)
      • Expressing Anger has a 26.5% chance of changing the partner’s emotion from Joy to Anger.
      • Expressing Joy has a 35.8% chance of changing the partner’s emotion from Sadness to Joy.
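The "Defining Influence" slide places User B's tweet between two tweets by User A. Read as a conditional probability, "Expressing Joy has a 35.8% chance of changing the partner's emotion from Sadness to Joy" becomes P(A's next emotion = Joy | A's previous emotion = Sadness, B's tweet expresses Joy). The sketch below computes that quantity from labeled (before, reply, after) triples; the triple format, this exact reading, and the name `influence` are assumptions, not the authors' code.

```python
from collections import Counter


def influence(triples, source, target):
    """P(A's next emotion == target | A's previous emotion == source, B's reply emotion).

    triples: (a_before, b_reply, a_after) emotion labels, where b_reply is the
    partner's tweet between user A's two tweets (hypothetical input format).
    Returns {emotion expressed in b_reply: estimated probability}.
    """
    seen = Counter()     # triples starting from `source`, per reply emotion
    changed = Counter()  # of those, triples that end in `target`
    for before, reply, after in triples:
        if before != source:
            continue
        seen[reply] += 1
        if after == target:
            changed[reply] += 1
    return {reply: changed[reply] / n for reply, n in seen.items()}


triples = [
    ("sadness", "joy", "joy"),
    ("sadness", "joy", "sadness"),
    ("sadness", "acceptance", "anticipation"),
    ("joy", "anger", "anger"),
]
result = influence(triples, source="sadness", target="joy")
```

Each bar in the charts above would correspond to one key of `result`, computed over the full corpus of triples.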
  22. Outliers
      A: Sorry to hear about your bags. If you would like us to get someone to contact you DM us your reference and contact number.
      B: it's on it's way to manch. If the woman on the check in desk in Miami hadn't been trying to be all smart! Been no problem.
      A: Sorry about that. Pleased to hear they located it quickly for you though.
      B: mistakes happen.
  23. Analyzing Self-Disclosure Behaviors in Twitter Conversations Using Text Mining Techniques (Presented at ACL 2012)
      JinYeong Bak, Suin Kim, Alice Oh; {jy.bak, suin.kim}@kaist.ac.kr, alice.oh@kaist.edu
      Department of Computer Science, KAIST
  24. Introduction (2012-07-11)
      In social psychology:
      • The degree of self-disclosure in a relationship depends on the strength of the relationship
      • Strategic self-disclosure can strengthen the relationship
      Illustration: “I like you too! You’re my best friend!”
  25. Hypothesis
      Twitter conversations also show a similar pattern:
      • Dyads with high relationship strength show more self-disclosure behavior
      • Dyads with low relationship strength show less self-disclosure behavior
      Illustration: “I like you too! You’re my best friend!” vs. “Hello~” / “Hi”
  26. Methodology
      • Twitter data
        • 131K users
        • 2M conversations
      • Relationship strength
        • Chain frequency (CF)
        • Chain length (CL)
      • Self-disclosure
        • Personal information
        • Open communication
        • Profanity
      • Analysis with topic models
        • Latent Dirichlet allocation (LDA, [Blei, JMLR 2003])
        • Aspect and sentiment unification model (ASUM, [Jo, WSDM 2011])
  27. Twitter Conversation
      • A Twitter conversation chain has
        • 3 or more tweets
        • at least one reply by each user
      • Our Twitter conversation data
        • Oct 2011 to Dec 2011
        • 131K users
        • 2M chains
        • 11M tweets
      (Figure: example of a conversation chain, https://twitter.com/#!/britneyspears)
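The chain definition above can be checked mechanically. A minimal sketch, assuming a chain is given as the list of tweet authors in reply order and that it involves exactly two users (a dyad); the function name `is_conversation_chain` is hypothetical.

```python
def is_conversation_chain(authors):
    """Check the chain definition: 3+ tweets and at least one reply by each user.

    authors: author IDs in reply order; authors[0] opens the chain and every
    later tweet is a reply (hypothetical input format). Assumes a dyad, i.e.
    exactly two distinct users.
    """
    if len(authors) < 3 or len(set(authors)) != 2:
        return False
    # Tweets after the first are replies, so both users must appear among them.
    return set(authors[1:]) == set(authors)


print(is_conversation_chain(["a", "b", "a"]))  # True
```

A chain such as `["a", "b", "b"]` fails the check, because user "a" only opened the chain and never replied.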
  28. Relationship Strength
      • Social psychology literature states that relationship strength can be measured by communication frequency and length [Granovetter, 1973; Levin and Cross, 2004]
      • CF: chain frequency
        • The number of conversational chains between the dyad, averaged per month
      • CL: chain length
        • The length of conversational chains between the dyad, averaged per month
      • Relationship strength
        • A high CF or CL for a dyad means the relationship is strong
        • A low CF or CL for a dyad means the relationship is weak
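CF and CL are described only as monthly averages, so the sketch below fixes one reading as an assumption: CF is the dyad's chain count divided by the number of months in the observation window, and CL is the mean, over the months in which the dyad talked, of that month's average chain length in tweets. The record format and the name `relationship_strength` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def relationship_strength(chains, n_months):
    """Return {dyad: (CF, CL)} from one record per conversation chain.

    chains: iterable of (dyad, month, n_tweets) tuples (hypothetical format).
    n_months: number of months in the observation window.
    """
    by_dyad = defaultdict(lambda: defaultdict(list))
    for dyad, month, n_tweets in chains:
        by_dyad[dyad][month].append(n_tweets)

    strength = {}
    for dyad, months in by_dyad.items():
        n_chains = sum(len(lengths) for lengths in months.values())
        cf = n_chains / n_months  # chains per month over the whole window
        cl = mean(mean(lengths) for lengths in months.values())  # mean of monthly average lengths
        strength[dyad] = (cf, cl)
    return strength


chains = [
    (("a", "b"), "2011-10", 3),
    (("a", "b"), "2011-10", 5),
    (("a", "b"), "2011-11", 4),
    (("c", "d"), "2011-12", 6),
]
strength = relationship_strength(chains, n_months=3)  # Oct to Dec 2011
```

Dividing CF by the whole window rather than by active months keeps a dyad that talked once from looking as strong as one that talked every month.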
  29. Self-Disclosure
      • Open communication (openness)
        • Negative openness
        • Nonverbal openness
        • Emotional openness
        • Receptive openness: difficult to find in tweets
        • General-style openness: not clearly defined in the literature
      • Personal information
        • Personally Identifiable Information (PII)
        • Personally Embarrassing Information (PEI)
      • Profanity
        • nigga, ass, wtf, lmao
  30. Self-Disclosure: Openness
      Negative openness
      • Method
        • We use ASUM with emoticons as seed words [“Aspect and sentiment unification model for online review analysis”, Jo, WSDM’11]
        • ASUM is an LDA-based joint model of topic and sentiment
        • ASUM takes unannotated data and classifies each sentence (tweet) as positive, negative, or neutral
  31. Self-Disclosure: Openness
      Nonverbal openness
      • Method
        • We look for emoticons, ‘lol’, and ‘xxx’
        • Emoticons are like facial expressions: :) :( :P
        • ‘lol’ (laughing out loud) and ‘xxx’ (kisses) are very frequently used the same way, as nonverbal cues
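A minimal detector for the nonverbal cues listed above. The emoticon pattern covers only a handful of forms such as :) :( :P ;-) =D and is an illustrative assumption, not the authors' emoticon list; 'lol' and 'xxx' are matched as whole words so that, for example, "lollipop" does not count. The function name is hypothetical.

```python
import re

# Illustrative emoticon pattern (an assumption): eyes [:;=], optional "-" nose, mouth )( D P p.
EMOTICON = re.compile(r"[:;=]-?[)(DPp]")
# 'lol' and 'xxx' as whole words, any case.
NONVERBAL_WORDS = re.compile(r"\b(?:lol|xxx)\b", re.IGNORECASE)


def has_nonverbal_openness(tweet):
    """True if the tweet contains an emoticon, 'lol', or 'xxx'."""
    return bool(EMOTICON.search(tweet) or NONVERBAL_WORDS.search(tweet))
```

For instance, "thanks man :)" and "miss you xxx" are flagged, while "see you at 5" is not.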
  32. Self-Disclosure: Openness
      Emotional openness
      • Method
        • Look for tweets that contain common expressions of feelings, using the feeling words of We Feel Fine (Harris, J., 2009)
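A sketch of the feeling-word lookup, assuming (as a simplification) that a feeling word counts only when it follows a first-person phrase such as "I feel", "I am feeling", or "I'm feeling". The six-word `FEELINGS` set is a hypothetical stand-in for the full We Feel Fine vocabulary, and `expresses_feeling` is a hypothetical name.

```python
import re

# Hypothetical stand-in for the We Feel Fine feeling-word list.
FEELINGS = {"happy", "sad", "tired", "lonely", "great", "sick"}

# Assumption: a feeling word counts only after "I feel", "I am feeling",
# "I'm feeling" (straight or curly apostrophe), optionally with "so"/"really".
FEEL_PHRASE = re.compile(
    r"\bi(?:\s+am|['’]m)?\s+feel(?:ing)?\s+(?:so\s+|really\s+)?([a-z]+)",
    re.IGNORECASE,
)


def expresses_feeling(tweet):
    """True if a first-person feeling phrase is followed by a known feeling word."""
    return any(m.group(1).lower() in FEELINGS for m in FEEL_PHRASE.finditer(tweet))
```

"I'm feeling so tired today" is flagged; "I feel like pizza" is not, because "like" is not a feeling word.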
  33. Self-Disclosure: Personal Information
      • Personally Identifiable Information (PII), e.g. name, location, email address, job, social security number
      • Personally Embarrassing Information (PEI), e.g. clinical history, sexual life, job loss, family problem
  34. Self-Disclosure: Personal Information
  35. Self-Disclosure: Personal Information
      Example PII, PEI, and profanity topics, shown by the high-probability words in each topic:
      PII 1: san live state texas south
      PII 2: tonight time tomorrow good ill
      PEI 1: pants wear boobs naked wearing
      PEI 2: teeth doctor dr dentist tooth
      PEI 3: family brother sister uncle cousin
      Profanity: nigga lmao shit ass bitch
  36. Results
  37. (Figure: sentiment, nonverbal, emotional, profanity, and PII & PEI self-disclosure, each plotted from weak ↔ strong relationship strength)
  38. (Figure: emotional openness and PII & PEI, each plotted from weak ↔ strong relationship strength)
  39. Results: Interpretation
      • Emotional openness
        • When users are not very close, they frequently express encouragement or polite reactions to babies or pets
  40. Results: Interpretation
      • PII
        • When users meet new acquaintances, they use PII to introduce themselves
  41. Results
      Analyzing outliers: a weakly linked dyad that shows high self-disclosure
  42. Distributed Online Learning for Latent Dirichlet Allocation
      JinYeong Bak, Dongwoo Kim, and Alice Oh. NIPS 2012 Workshop on Big Learning
  43. Motivation
      • Problem 1: Inference for LDA takes a long time
      • Problem 2: A continuously expanding corpus requires continuous updates of the model parameters
        • Plain LDA cannot update its parameters incrementally; it must be re-trained on the entire updated corpus
      • Solution to 1: Distributed inference shortens inference time (Newman JMLR 2009, Wang WWW 2012)
      • Solution to 2: Online (mini-batch) learning enables updates to the model parameters (Hoffman NIPS 2010)
      • Our approach: Combine distributed inference and online learning
  44. Distributed Online LDA
      • Based on variational inference
      • Mini-batch updates via stochastic learning (variational EM)
      • Distribute variational EM using MapReduce
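The mini-batch update named above follows online variational Bayes for LDA (Hoffman et al., NIPS 2010): after the E-step on a mini-batch S drawn from a corpus of D documents, the topic-word parameters λ move toward η + (D/|S|) × (expected topic-word counts), with step size ρ_t = (τ0 + t)^(−κ). The sketch below shows only that global step plus a reduce that sums the expected counts from each mapper. Splitting the E-step across mappers is our reading of "Distribute variational EM using MapReduce", the per-document E-step is omitted, and all function names are hypothetical. Plain lists keep it dependency-free.

```python
def step_size(t, tau0=1.0, kappa=0.7):
    """Learning rate rho_t = (tau0 + t)^(-kappa), with kappa in (0.5, 1]."""
    return (tau0 + t) ** (-kappa)


def reduce_sstats(shard_sstats):
    """Reduce step: sum the K x V expected topic-word counts from each mapper."""
    total = [row[:] for row in shard_sstats[0]]  # copy, so inputs are not mutated
    for sstats in shard_sstats[1:]:
        for k, row in enumerate(sstats):
            for v, x in enumerate(row):
                total[k][v] += x
    return total


def online_update(lam, sstats, eta, corpus_size, batch_size, t, tau0=1.0, kappa=0.7):
    """Blend the current topic-word parameters lam with the mini-batch estimate."""
    rho = step_size(t, tau0, kappa)
    scale = corpus_size / batch_size  # treat the mini-batch as if it were the whole corpus
    return [
        [(1 - rho) * l + rho * (eta + scale * s) for l, s in zip(lam_row, s_row)]
        for lam_row, s_row in zip(lam, sstats)
    ]


# Toy K=1 topic, V=2 words; two mappers each report expected counts.
shards = [[[1.0, 0.0]], [[1.0, 2.0]]]
lam = online_update([[1.0, 1.0]], reduce_sstats(shards), eta=0.5, corpus_size=4, batch_size=2, t=1)
```

Because the reduce only sums, the update is the same whether the mini-batch is processed on one node or many.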
  45. Experimental Setup
      • Data: 5.1M Twitter conversations and 4.8M English Wikipedia articles
      • 60-node Hadoop system
      • Each node with 8 × 2.30 GHz cores
  46. Wikipedia Results
      Topic 0: relativity physics einstein quantum gravity
      Topic 22: channel television tv cable news
      Topic 42: milk chocolate sugar food cream
      Topic 65: god bible moses chapter genesis
      Topic 94: party election president member elected
      Topic 170: season team league game football
      Topic 232: album song band music released
      Mini-batch size   oLDA        DoLDA      Speedup
      16,384            238666.25   47994.03   4.97
      32,768            188508.71   33470.03   5.63
      65,536            206290.27   26788.53   7.70
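The Speedup column appears to be the ratio of the oLDA value to the DoLDA value for each mini-batch size; recomputing it from the table reproduces the reported figures. The slide does not give units for the two time columns.

```python
# oLDA and DoLDA values copied from the Wikipedia results table, keyed by mini-batch size.
runs = {
    16384: (238666.25, 47994.03),
    32768: (188508.71, 33470.03),
    65536: (206290.27, 26788.53),
}
# Speedup = oLDA / DoLDA, rounded to two decimals as on the slide.
speedup = {size: round(olda / dolda, 2) for size, (olda, dolda) in runs.items()}
print(speedup)  # {16384: 4.97, 32768: 5.63, 65536: 7.7}
```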
  47. Twitter: Temporal Patterns of Topics
      Topic words: party vote people politics obama
      (Figure: document proportion of this topic by day, axis from 10-10 to 12-04)
      Conversation b1 on November 2, 2010
        A: I wish I could vote today, but I have to work for 14 hours
        B: is it legal for them not to give you time off to vote?
        A: probably
      Conversation b2 on March 31, 2012
        A: Mitt Romney: "Obama should release the notes and transcripts of all his meetings with world leaders"
        B: Why is he being held to higher standard than any other president.
        A: did you see my Santorum 'slip' tweet? Is the media afraid to comment on it?
        B: oh yes I did. I saw it mentioned yesterday also. disgusting and he should be raked over hot coals for it.
      Topic words: school mate class teacher grade
      (Figure: document proportion of this topic by day, axis from 11-07 to 12-01)
      Conversation c1 on September 5, 2011
        A: Oh god, miss Waite ran over to me up the school just now! :L on the plus subjects are now picked! :D
        B: what did you pick??
        A: english, RE, art and psychology! :) was unsure between history and psych but found out bubbles was teaching it so nooo! :L
      Conversation c2 on October 12, 2011
        A: :) My day's been okay! It feels long! But school' was okayish. I hope you have an awesome day! :D
        B: that's good then! Ahh hope it's not cause anything bad happened? Thanks! Have a great sleep :)
        A: no! Class was just boring lol and thanks! :) i will! Even though i have to wake up early tomorrow for a midterm! :S
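The temporal plots show a topic's document proportion per day. A minimal sketch, assuming that "document proportion" means the average inferred weight θ_k of topic k over the conversations posted that day; the (day, theta) input format and the name `daily_topic_proportion` are hypothetical.

```python
from collections import defaultdict


def daily_topic_proportion(docs, k):
    """Average weight of topic k over all documents posted on each day.

    docs: iterable of (day, theta) pairs, where theta is a document's
    inferred topic distribution (hypothetical input format).
    """
    total = defaultdict(float)
    count = defaultdict(int)
    for day, theta in docs:
        total[day] += theta[k]
        count[day] += 1
    # Sort by day so the result can be plotted as a time series.
    return {day: total[day] / count[day] for day in sorted(total)}


docs = [
    ("2011-11-01", [0.2, 0.8]),
    ("2011-11-01", [0.4, 0.6]),
    ("2011-11-02", [0.1, 0.9]),
]
series = daily_topic_proportion(docs, k=0)
```

Spikes in such a series, like the election-day conversation b1, would show up as days where the politics topic's average weight jumps.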
  48. CAVEAT
      Big data, including social media data, does not always give the right answers: it contains much noise and much bias. Sentiment analysis at the big-data level is also full of problems, because every small assumption can cause wide swings in the final interpretation of the data. These data are valuable because they have opened up possibilities for analyzing naturally occurring data in huge amounts. We need better methods and tools that are tailored for social media. We need to ask the right questions that can be answered well despite the biases of social media data.
  49. For details, visit our webpage at http://uilab.kaist.ac.kr or email me at alice.oh@kaist.edu
