2016 datascience emotion analysis - english version

  1. Emotion Analysis for Big Data. NTHU CS, Yi-Shin Chen
  2. Hello! I am Yi-Shin Chen, currently in NTHU CS, Intelligent Data Engineering and Application Lab (IDEA Lab). You can find me at: yishin@gmail.com
  3. We Promote Diversity: more than 50% of our students come from other countries (Belize, France, St Lucia, Honduras, India, China, Japan, Taiwan, Indonesia, São Tomé)
  4. Section 1: Why Emotion Analysis? There are a few personal reasons.
  5. “I don’t understand women!! Their words are very vague and ambiguous.” From Carlos Argueta, my first foreign Ph.D. graduate. He was the one who selected the topic of sentiment analysis, and the first to suffer from depression in our lab.
  6. Children are Bewildering: they don't say, and they cannot say.
  7. Section 2: Emotion Analysis. Let's see what others have done and are doing.
  8. Natural Language Processing (Difficult) ▷ Analyze part-of-speech (POS) tags ▷ Understand word meaning ▷ Analyze the relationships between words • Needs dictionaries and semantic relationships • Word positions affect statement meanings • Needs different data for different languages. Example sentence with POS tags (Det., NN, PN, Pre., Verb, Adj.): "This is the best thing happened in my life."
  9. Data Mining/Machine Learning (Easier?) ▷ Collect massive data ▷ Manually annotate training data ▷ Analyze the data with classifiers • Training data must be recollected for each language • Low recall rates (<<25%)
  10. Section 3: Learning from Experience. The difference between reality and practice.
  11. Emotion Embedded in Trivia ▷ Most trivia are ignored in previous works • Stop words are the first batch to be removed (e.g., often, above, again) • Determiners and pronouns are usually ignored • Most nouns are considered unimportant. Example: "My mom always said school is more important" 😒 Angry 😂 Sad 👶 Joy
  12. Emotional Mistakes ▷ Mistakes are everywhere • Some are careless (e.g., "Luve you") • Some are intentional (e.g., "I’m soooooooo happppppy") ▷ Mistakes are not recorded in dictionaries • How do we annotate mistakes? Annotation costs A LOT!
  13. Children Are Our Mentors: mumblings of a mom ▷ My one-year-old kid can detect my emotions • Without seeing my face • Even when I do not change my tone • How come she is always right? ▷ Guessing • She does not know grammar • She has not memorized any dictionary • My statements might contain a lot of mistakes. Goal: multilingual
  14. Section 4: Overcome Challenges. Insufficient research funding
  15. Free Resources ▷ Free data • As long as it can be legally accessed ▷ Open-source software
  16. Philosophy: Slow Life ▷ Our students are often delayed for various reasons ▷ We do not follow the trends • Usually against common sense in academia. Diagram contrasting two paths, labeled 😱, Failure, and Success: "No POS tagging, no dictionary, multilingual" vs. "POS tagging, multiple dictionaries, one language"
  17. Teamwork ▷ Implementation team • Coding • More coding ▷ Dreaming team • Reading papers • Design ▷ Boasting team • Writing papers • Preparing presentations ▷ Anonymous
  18. Crowdsourcing. Merriam-Webster: "Obtaining needed services, ideas, or content by soliciting contributions from a large group of people, especially an online community." Cost: $$$
  19. Subconscious Crowdsourcing ▷ Crowdsourcing from the subconscious • Free • Extract the subconscious from daily-life records → Ex1: "computers/companies/product-support/apple" in Delicious tags → Ex2: "Trump", "Nickname generator" in a search log → Ex3: "School day again #sad" on Twitter. Chun-Hao Chang, Elvis Saravia and Yi-Shin Chen, Subconscious Crowdsourcing: A Feasible Data Collection Mechanism for Mental Disorder Detection on Social Media, The 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2016), San Francisco, CA, USA, 18 - 21 August, 2016
  20. Section 5, Case 1: Analyze Emotions from Text. Utilize subconscious emotion patterns.
  21. Subconscious Emotion Big Data ▷ Twitter, a good public source. Sample tweets: "Throwing my phone always calms me down #anger" / "My sister always makes things look much more worse than they seem >:[ #anger" / "Why my brother always crabby !?!? #rude #youranadult #anger #issues" / "WHY DOES MY COMPUTER ALWAYS FREEZE??? NEVER FAILS. #anger" / "Im wanna crazy,if my life always sucks like this. #anger". Hashtags and emoticons represent emotions well; hence, they can be treated as annotated answers.
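The idea of treating an emotion hashtag as a free label (often called distant supervision) can be sketched in a few lines of Python. This is a minimal sketch, not the authors' code: the `EMOTION_HASHTAGS` mapping and the rule of skipping tweets that carry conflicting emotion hashtags are assumptions made for illustration.

```python
# Hypothetical hashtag-to-label mapping; the slides show #anger and #sad,
# the exact list used in the study is an assumption.
EMOTION_HASHTAGS = {"#anger": "anger", "#joy": "joy", "#sad": "sad"}

def label_from_hashtag(tweet):
    """Return (text_without_label_hashtag, label), or None when unusable.

    A tweet is skipped when it has no emotion hashtag, or when it carries
    hashtags for two different emotions (ambiguous answer).
    """
    tokens = tweet.split()
    labels = {EMOTION_HASHTAGS[t.lower()] for t in tokens if t.lower() in EMOTION_HASHTAGS}
    if len(labels) != 1:
        return None
    # Drop the label hashtag so that a classifier cannot peek at the answer.
    text = " ".join(t for t in tokens if t.lower() not in EMOTION_HASHTAGS)
    return text, labels.pop()
```

For example, `label_from_hashtag("Throwing my phone always calms me down #anger")` yields the text without the hashtag paired with the label `"anger"`.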
  22. Collect Emotion Data
  23. Collect Emotion Data
  24. Collect Emotion Data. Wait! We need a control group.
  25. Not-Emotion Data
  26. Not-Emotion Data
  27. Not-Emotion Data
  28. Preprocessing Steps ▷ Hint: remove troublesome tweets, i.e., those that • are too short → too short to yield important features • contain too many hashtags → too much information to process • are retweets → increase the complexity • have URLs → too troublesome to collect the page data ▷ Convert user mentions to <usermention> and hashtags to <hashtag> → removes the identification; we should not peek at the answers! It's big data anyway, so dropping tweets is affordable.
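A minimal Python sketch of these filters follows. The thresholds `MIN_TOKENS` and `MAX_HASHTAGS` are hypothetical values (the slide gives no numbers), and the regular expressions are simplified approximations of Twitter's mention, hashtag, and URL syntax.

```python
import re

MIN_TOKENS = 5    # hypothetical cut-off for "too short"
MAX_HASHTAGS = 3  # hypothetical cut-off for "too many hashtags"

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")

def keep_tweet(text, is_retweet=False):
    """Return False for retweets, tweets with URLs, hashtag-heavy or short tweets."""
    if is_retweet or text.startswith("RT @"):
        return False
    if URL_RE.search(text):
        return False
    if len(HASHTAG_RE.findall(text)) > MAX_HASHTAGS:
        return False
    return len(text.split()) >= MIN_TOKENS

def normalize(text):
    """Replace user mentions and hashtags with placeholder tokens."""
    text = MENTION_RE.sub("<usermention>", text)
    return HASHTAG_RE.sub("<hashtag>", text)
```

Filtering runs before normalization, so the hashtag count is taken on the raw tweet; the label hashtag itself should already have been removed so that `<hashtag>` never stands in for the answer.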
  29. Basic Guidelines ▷ Identify the commonalities and differences between the experimental and control groups • Analyze the frequency of words → TF-IDF (term frequency × inverse document frequency) • Analyze the co-occurrence between words/patterns → co-occurrence • Analyze the importance of words → centrality (graph)
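The frequency step can be illustrated with the common textbook TF-IDF variant (tf = count / document length, idf = log(N / df)); the slides do not say which weighting was actually used. One simple way to contrast the two groups, assumed here, is to concatenate each group into a single document: any term present in both groups then gets idf = 0.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: tf-idf} dict per tokenized document.

    tf  = raw count of the term / number of tokens in the document
    idf = log(N / number of documents containing the term)
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in counts.items()})
    return scores
```

With `docs = [emotion_tokens, control_tokens]`, shared words such as "the" score 0 and only group-specific words remain.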
  30. Graph Construction ▷ Construct two graphs, e.g., → emotion sentence: "I love the World of Warcraft new game" → not-emotion sentence: "3,000 killed in the world by ebola" [Figure: one word graph per sentence; nodes are the words, edges carry weights]
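The slide does not say how the edge weights are computed. The sketch below assumes one plausible scheme: link words that occur within a small window of each other and normalize co-occurrence counts by the largest count, so weights fall in (0, 1]. The `window` parameter and the normalization are hypothetical choices.

```python
from collections import Counter

def build_cooccurrence_graph(sentences, window=2):
    """Build an undirected weighted word graph {frozenset({u, v}): weight}.

    Each word is linked to each of the next `window` words in the sentence;
    weight = co-occurrence count / largest count (an assumed normalization).
    """
    counts = Counter()
    for tokens in sentences:
        words = [t.lower() for t in tokens]
        for i, w in enumerate(words):
            for v in words[i + 1 : i + 1 + window]:
                if v != w:  # skip self-loops
                    counts[frozenset((w, v))] += 1
    if not counts:
        return {}
    top = max(counts.values())
    return {edge: c / top for edge, c in counts.items()}
```

Calling it once on the emotion tweets and once on the not-emotion tweets gives the two graphs the slide compares.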
  31. Graph Processes ▷ Remove what the two graphs have in common • Keep only the significant parts that appear solely in the emotion graph ▷ Analyze the centrality of words • Betweenness, closeness, eigenvector, degree, Katz → Free/open software can be used, e.g., Gephi, GraphDB ▷ Analyze the cluster degrees • Clustering coefficient. Graph → key patterns
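Two of these steps, removing edges shared with the control graph and scoring the remaining nodes, can be sketched in plain Python for graphs stored as `{frozenset({u, v}): weight}` dicts. This sketch covers only normalized degree centrality and the local clustering coefficient; betweenness, closeness, eigenvector and Katz centrality are better left to a graph tool such as Gephi, as the slide suggests.

```python
from collections import defaultdict

def emotion_only_edges(emotion_graph, control_graph):
    """Keep edges that appear in the emotion graph but not in the control graph."""
    return {e: w for e, w in emotion_graph.items() if e not in control_graph}

def adjacency(edges):
    """Turn {frozenset({u, v}): weight} into an unweighted neighbour map."""
    adj = defaultdict(set)
    for edge in edges:
        u, v = tuple(edge)
        adj[u].add(v)
        adj[v].add(u)
    return adj

def degree_centrality(adj):
    """Standard normalized degree centrality: degree / (n - 1)."""
    n = len(adj)
    return {node: len(nbrs) / (n - 1) if n > 1 else 0.0 for node, nbrs in adj.items()}

def clustering_coefficient(adj, node):
    """Fraction of pairs of the node's neighbours that are linked to each other."""
    nbrs = list(adj.get(node, set()))
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k) if nbrs[j] in adj[nbrs[i]])
    return 2 * links / (k * (k - 1))
```

Nodes that keep high centrality after the shared edges are removed are candidates for the key patterns.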
  32. Essence Only: only key phrases → emotion patterns
  33. Ranking Emotion Patterns ▷ Rank the emotion patterns for each emotion • By frequency, exclusiveness, and diversity • One ranked list for each emotion (Sad, Joy, Anger)
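The slide names three ranking criteria but not how they are combined. The sketch below uses one hypothetical definition of each and multiplies them: frequency is how often a pattern occurs in tweets of the target emotion, exclusiveness is the share of the pattern's occurrences that belong to that emotion, and diversity is the number of distinct words seen in its wildcard slot.

```python
import math

def rank_patterns(counts, fillers, emotion):
    """Return the patterns of one emotion, best first.

    counts:  {pattern: {emotion: occurrences}}
    fillers: {pattern: set of distinct words seen in the wildcard slot}
    score = log(1 + frequency) * exclusiveness * log(1 + diversity)
    (a hypothetical combination of the slide's three criteria)
    """
    scored = []
    for pattern, per_emotion in counts.items():
        freq = per_emotion.get(emotion, 0)
        if freq == 0:
            continue
        exclusiveness = freq / sum(per_emotion.values())
        diversity = len(fillers.get(pattern, ()))
        score = math.log1p(freq) * exclusiveness * math.log1p(diversity)
        scored.append((score, pattern))
    # Sort by descending score; ties are broken alphabetically.
    return [p for _, p in sorted(scored, key=lambda x: (-x[0], x[1]))]
```

Calling `rank_patterns` once per emotion produces the one-list-per-emotion output described on the slide.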
  34. Emotion Pattern Samples (grouped under Sad, Joy, Anger): finally * my tomorrow !!! * <hashtag> birthday .+ * yay ! :) * ! princess * * hehe prom dress * memories * * without my sucks * <hashtag> * tonight :( * anymore .. felt so * . :( * * :(( my * always shut the * teachers * people say * -.- * understand why * why are * with these *
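Applying such patterns to a new tweet can be sketched as follows, assuming that each `*` stands for exactly one arbitrary token and every other token must match literally. The voting rule in `classify` (the emotion with the most matching patterns wins, ties give no answer) is a hypothetical choice, not the method from the slides.

```python
def matches(pattern, tokens):
    """True if the pattern occurs as a contiguous run of tokens.

    Assumption: '*' matches exactly one token; other tokens match
    literally (case-insensitive).
    """
    pat = pattern.lower().split()
    toks = [t.lower() for t in tokens]
    for start in range(len(toks) - len(pat) + 1):
        window = toks[start:start + len(pat)]
        if all(p == "*" or p == t for p, t in zip(pat, window)):
            return True
    return False

def classify(tokens, patterns_by_emotion):
    """Return the emotion with the most matching patterns, or None on a tie/no match."""
    hits = {emo: sum(matches(p, tokens) for p in pats) for emo, pats in patterns_by_emotion.items()}
    best = max(hits.values(), default=0)
    winners = [e for e, h in hits.items() if h == best]
    return winners[0] if best > 0 and len(winners) == 1 else None
```

Because the patterns contain no POS tags or dictionary entries, the same matcher works unchanged for any language whose text can be split into tokens.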
  35. Precision
       Language | Naïve Bayes | SVM    | NRCWE  | Our Approach
       English  | 81.90%      | 76.60% | 35.40% | 81.20%
       Spanish  | 70.00%      | 52.00% | 0.00%  | 80.00%
       French   | 72.00%      | 61.00% | 0.00%  | 84.00%
       [Chart: accuracy per language and method; legend: LIWC, No LIWC]
  36. Feedback for Products
  37. Product Preference Analysis
  38. Section 5, Case 2: Analyze Emotion Status for Individuals. Who has bipolar disorder? Who has borderline personality disorder?
  39. Collect Patient Data (Support Group)
  40. Collect Patient Data (Followers)
  41. Collect Patient Data
  42. Collect Patient Data
  43. Collect Patient Data. Wait! A control group is needed.
  44. Collect Data from Ordinary People
  45. Collect Data from Ordinary People
  46. Collect Data from Ordinary People
  47. Basic Guidelines ▷ Identify the commonalities and differences between the experimental and control groups. Features: • Word/pattern frequency • Emotion-related data (e.g., flipping rates, occurrence rates) • Social interaction (e.g., retweets, replies) • Lifestyle (e.g., online time, whether the user stays up late) • Age and gender
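The emotion-related features can be derived from a user's timeline once each tweet has an emotion label (for instance from the pattern matching of Case 1). The definitions below are hypothetical, since the slide names the features but not their formulas: the occurrence rate of an emotion is its share of the user's tweets, and the flipping rate is the share of consecutive tweet pairs whose emotion changes.

```python
from collections import Counter

def emotion_features(labels):
    """Per-user features from a chronological list of per-tweet emotion labels.

    rate_<emotion> = tweets labelled with that emotion / all tweets
    flip_rate      = consecutive pairs with different labels / all consecutive pairs
    (both definitions are assumptions for illustration)
    """
    if not labels:
        return {"flip_rate": 0.0}
    counts = Counter(labels)
    features = {f"rate_{emo}": c / len(labels) for emo, c in counts.items()}
    pairs = list(zip(labels, labels[1:]))
    flips = sum(a != b for a, b in pairs)
    features["flip_rate"] = flips / len(pairs) if pairs else 0.0
    return features
```

For a timeline `["joy", "sad", "sad", "anger"]`, two of the three consecutive pairs change emotion, so the flipping rate is 2/3.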
  48. Apply Classifiers ▷ Use the extracted features ▷ Various classifiers • Neural networks • Naïve Bayes and Bayesian belief networks • Support vector machines • Random forests
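As an illustration of one classifier from this list, here is a minimal multinomial Naïve Bayes with add-one (Laplace) smoothing, written in plain Python and using only word-frequency features. It is a teaching sketch, not the configuration behind the reported results; in practice a library implementation would be used and the numeric features from the previous slide would be added.

```python
import math
from collections import Counter, defaultdict

class MultinomialNB:
    """Multinomial Naive Bayes over token counts with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for tokens, y in zip(docs, labels):
            self.token_counts[y].update(tokens)
        self.vocab = {t for c in self.token_counts.values() for t in c}
        self.totals = {y: sum(c.values()) for y, c in self.token_counts.items()}
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for y, n_y in self.class_counts.items():
            # Log prior plus smoothed log likelihoods; unseen tokens are skipped.
            score = math.log(n_y / n_docs)
            for t in tokens:
                if t in self.vocab:
                    score += math.log((self.token_counts[y][t] + 1) / (self.totals[y] + v))
            if score > best_score:
                best, best_score = y, score
        return best
```

Working in log space avoids numeric underflow when many small probabilities are multiplied.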
  49. Precision
  50. Possible Applications
  51. Possible Applications
  52. Possible Applications
  53. Possible Applications
  54. Election Analysis?
  55. Election Analysis?
  56. Election Analysis?
  57. Election Analysis?
  58. Election Analysis?
  59. More in the future… Thank you. Contact me at: yishin@gmail.com
