Ohio Center of Excellence in Knowledge-Enabled Computing
Automatic Emotion Identification
from Text
Wenbo Wang
Kno.e.sis Center
Advisor:
Dr. Amit P. Sheth
Committee members:
Dr. Keke Chen
Kevin Haas
Dr. T.K. Prasad
Dr. Ramakanth Kavuluru
Ph.D. Dissertation Defense
Sadness
Anger
Fear
Joy
Your emotions are the slaves to your thoughts,
and you are the slave to your emotions.
--Elizabeth Gilbert
S&P 500 dropped 1% …
Jon C. Ogg, credit
Stock Market
Employee Productivity
Subjective Well-being
Happiness Index, ECG
Physical State, Emotional State
Emotion Identification
• Emotion
– “a strong feeling (such as love, anger, joy, hate, or fear)” --
Merriam-Webster Online Dictionary
• Emotion Identification
– the task of automatically identifying and extracting the
emotions expressed in a given text.
• Examples
“I hate when my mom compares me to my friends” -> Anger
“When I see a cop, no matter where I am or what I’m doing,
I always feel like every law I’ve broken is stamped all over
my body” -> Fear
Proposed Questions
• How to glean people’s emotions from their texts using machine
learning techniques?
• How to create large self-labeled emotion data from social media?
• How to improve emotion identification in target domains (e.g.,
blog, diary) by leveraging large self-labeled emotion data from
social media?
1. EMOTION CLASSIFICATION
Wenbo Wang, Lu Chen, Ming Tan, Shaojun Wang, Amit P. Sheth. Discovering Fine-grained Sentiment in Suicide Notes. Biomedical Informatics Insights, 2012
Wenbo Wang, Lu Chen, Krishnaprasad Thirunarayan, Amit P. Sheth. Harnessing Twitter ‘Big Data’ for Automatic Emotion Identification. 2012 ASE International Conference on Social Computing (SocialCom 2012)
Background - Classification
Credit: nltk
Dataset Description
• Suicide notes
– 15 fine-grained emotions
– Training: 4,633 sentences
– Testing: 2,086 sentences
• Twitter data
– 7 emotions
– Training: ~250 K tweets
– Testing: 250 K tweets
Suicide Notes Dataset
Sentence example:
“I loved you and was proud of
you.”
Unigrams: i, love, you, and, be, proud, of, you, .
Bigrams: i love, love you, you and, and be, be proud, proud of, of you, you .
The combination of unigrams and bigrams performs the best among n-gram features.
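As a minimal sketch of the n-gram features above, the snippet below builds unigram and bigram counts from the lemmatized tokens shown on the slide. The `ngrams` helper is a hypothetical name, and the tokens are typed in by hand rather than produced by a lemmatizer.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Lemmatized tokens of "I loved you and was proud of you." as listed on the slide.
tokens = ["i", "love", "you", "and", "be", "proud", "of", "you", "."]

unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)

# Combined unigram + bigram feature counts for this sentence.
features = Counter(unigrams + bigrams)
```

Each key of `features` would become one dimension of the sentence's feature vector.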
Suicide Notes Dataset
Sentence example:
“I loved you and was proud of
you.”
LIWC Knowledge:
Posemo: 2 (love, proud)
Negemo: 0
Anger: 0
Sad: 0
Adding knowledge-based features
further increases the performance.
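The knowledge-based counts above can be sketched as a lexicon lookup. The LIWC dictionary itself is not reproduced here: `TOY_LEXICON` is a tiny hypothetical stand-in with made-up entries, used only to show how per-category counts become features.

```python
# Hypothetical miniature lexicon; NOT the real LIWC dictionary.
TOY_LEXICON = {
    "posemo": {"love", "proud", "happy"},
    "negemo": {"hate", "sad", "angry"},
    "anger": {"hate", "angry"},
    "sad": {"sad", "lonely"},
}


def category_counts(tokens, lexicon=TOY_LEXICON):
    """Count how many tokens fall into each lexicon category."""
    return {
        category: sum(1 for token in tokens if token in words)
        for category, words in lexicon.items()
    }


tokens = ["i", "love", "you", "and", "be", "proud", "of", "you", "."]
counts = category_counts(tokens)
```

With this toy lexicon the example sentence gets two positive-emotion hits ("love", "proud") and zero in the other categories, matching the counts on the slide.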
Suicide Notes Dataset
Sentence example:
“I loved you and was proud of you.”
POS count:
Adjective: 1 (proud)
Noun: 0
Pronoun: 3 (i, you, you)
…
Sentence tense:
Simple past tense: 2 (I loved, was proud)
Adding sentence tenses and POS counts further increases the performance.
Twitter Dataset – Supervised Classifier
Using only adjectives as features performs poorly because emotions can be expressed implicitly in text.
Twitter Dataset – Supervised Classifier
The combination of unigrams and bigrams performs the best among n-gram features.
Twitter Dataset – Supervised Classifier
Knowledge features and syntactic features
become less important on Twitter data.
Challenge: The Lack of Training Data
• Emotion annotation is typically time-consuming,
expensive and error-prone.
– multiple emotion categories
– subtle and ambiguous emotion expressions
– subjective and varied human judgement of emotion
• Most existing datasets are small, e.g.,
– Blog: 1,890 sentences (Aman and Szpakowicz 2008)
– Experience: 1,000 sentences (Neviarouskaya et al. 2010)
– Diary: 700 sentences (Neviarouskaya et al. 2011)
Why do We Need More Training Data? (I)
[Figure 1 from (Banko and Brill 2001): learning curves for confusion set disambiguation. x-axis: millions of words (0.1 to 1000); y-axis: test accuracy (0.70 to 1.00); learners: Memory-Based, Winnow, Perceptron, Naïve Bayes]
“We may want to reconsider the trade-off between spending time and money on algorithm development versus spending it on corpus development”
-- (Banko and Brill 2001)
Why do We Need More Training Data? (II)
• Emotions arise in various situations, which leads to very
diverse expressions conveying the emotions.
“I hate when my mom compares me to my friends”
“When I see a cop, no matter where I am or what
I’m doing, I always feel like every law I’ve broken is
stamped all over my body”
“I hate when I get the hiccups in class”
“Omg I finally fit into one pair of my jeans from last
year!!”
“A dog barked at me!”
The Use of Hashtags on Twitter
“I hate when my mom compares me to my friends
#annoying”
“When I see a cop, no matter where I am or what
I’m doing, I always feel like every law I’ve broken is
stamped all over my body #nervous”
“I hate when I get the hiccups in class
#embarrassing”
“Omg I finally fit into one pair of my jeans from last
year!! #excited”
“A dog barked at me! #scared #weak”
2. SELF-LABELED DATA CREATION
Wenbo Wang, Lu Chen, Krishnaprasad Thirunarayan, Amit P. Sheth. Harnessing
Twitter ‘Big Data’ for Automatic Emotion Identification. 2012 ASE International
Conference on Social Computing (SocialCom 2012)
Emotion Hashtags
• From existing psychology literature (Shaver et al. 1987), we collected 7 sets of emotion words for 7 different emotions: joy, sadness, anger, love, fear, thankfulness, and surprise.

Emotion | Hashtag word examples (number of hashtag words) | Number of tweets
Joy | excited, happy, elated, proud (36) | 706,182
Sadness | sorrow, unhappy, depressing, lonely (36) | 616,471
Anger | irritating, annoyed, frustrate, fury (23) | 574,170
Love | affection, lovin, loving, fondness (7) | 301,759
Fear | fear, panic, fright, worry, scare (22) | 135,154
Thankfulness | thankfulness, thankful (2) | 131,340
Surprise | surprised, astonished, unexpected (5) | 23,906
Total | 131 | 2,488,982
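A minimal sketch of how a trailing emotion hashtag could turn a tweet into a labeled training instance. The mapping below uses only a few of the example words from the table; the function name `self_label` is hypothetical, and stripping the hashtag from the text (so the label does not leak into the features) is an assumption rather than something stated on the slide.

```python
# A few hashtag words from the table above, mapped to their emotion.
EMOTION_HASHTAGS = {
    "excited": "joy", "happy": "joy", "proud": "joy",
    "unhappy": "sadness", "lonely": "sadness",
    "annoyed": "anger", "irritating": "anger",
    "loving": "love", "affection": "love",
    "panic": "fear", "worry": "fear",
    "thankful": "thankfulness",
    "surprised": "surprise", "astonished": "surprise",
}


def self_label(tweet):
    """Return (text_without_hashtag, emotion), or None if there is no trailing emotion hashtag."""
    words = tweet.split()
    if not words or not words[-1].startswith("#"):
        return None
    emotion = EMOTION_HASHTAGS.get(words[-1][1:].lower())
    if emotion is None:
        return None
    return " ".join(words[:-1]), emotion


labeled = self_label("Omg I finally fit into one pair of my jeans from last year!! #excited")
```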
Removing Irrelevant Tweets
A tweet is removed if any of the following holds:
• Hashtag count > 2
• Emotion hashtag is not at the end
• Word count < 5
• Has URL or quotations
About 5 million tweets -> 2,488,982 tweets
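The removal rules above can be sketched as a single predicate. This is an illustration under stated assumptions: `keep_tweet` is a hypothetical name, words are whitespace-separated tokens (hashtags included in the word count), and "quotations" is read as the presence of a double-quote character.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)
QUOTE_CHARS = ('"', "\u201c", "\u201d")  # straight and curly double quotes


def keep_tweet(tweet, emotion_tags):
    """Return True if the tweet passes all four filters from the slide."""
    words = tweet.split()
    hashtags = [w for w in words if w.startswith("#")]
    if len(hashtags) > 2:                      # hashtag count > 2
        return False
    last = words[-1] if words else ""
    if not (last.startswith("#") and last[1:].lower() in emotion_tags):
        return False                           # emotion hashtag is not at the end
    if len(words) < 5:                         # word count < 5
        return False
    if URL_RE.search(tweet) or any(q in tweet for q in QUOTE_CHARS):
        return False                           # has URL or quotations
    return True


tags = {"embarrassing", "excited"}
kept = keep_tweet("I hate when I get the hiccups in class #embarrassing", tags)
```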
Results with Increasing Training Data
[Figure: accuracy (0.40 to 0.65) against number of tweets in training data (1,000; 10,000; 248,898; 497,796; 746,694; 995,592; 1,244,490; 1,493,388; 1,742,286; 1,991,184) for LIBLINEAR and MNB]
Logistic Regression (LR)
Training instances: 1K -> 2M
Accuracy: 0.4341 -> 0.6557 (other labeled points: 0.5292, 0.6156)
Percentage gain = 51.05%
Multinomial Naive Bayes (MNB)
Training instances: 1K -> 2M
Accuracy: 0.4580 -> 0.6350 (other labeled points: 0.5426, 0.6113)
Percentage gain = 38.65%
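The percentage gains quoted for the two classifiers follow from the first and last accuracy values on each curve. A short check, with `percentage_gain` as a hypothetical helper name:

```python
def percentage_gain(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100


# Accuracy at 1K training tweets vs. about 2M training tweets.
lr_gain = percentage_gain(0.4341, 0.6557)   # logistic regression
mnb_gain = percentage_gain(0.4580, 0.6350)  # multinomial naive Bayes
```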
Detailed Results
For the three most popular emotions (joy, sadness, and anger; 76.2% of the tweets), the classifier achieves F-measures of over 64%.
For the three less popular emotions (love, fear, and thankfulness; 22.8% of the tweets), precision is higher than recall, and the F-measures are over 43%.
What Have We Learned?
• We can automatically create training datasets for
emotion identification by leveraging emotion hashtags
on Twitter.
– A large amount of labeled data is collected with little effort and cost
– Covers a variety of situations that elicit emotions
– Performance gain with increasing size of training data
• However, there is still a lack of labeled data in many
other domains/data sources.
New Challenge
Lots of labeled tweets
Far less labeled data in
many other domains
Can we use emotion-labeled tweets to help emotion
identification in other domains?
3. DOMAIN ADAPTATION FOR EMOTION IDENTIFICATION
Wenbo Wang, Lu Chen, Keke Chen, Krishnaprasad Thirunarayan, Amit P. Sheth.
Domain Adaptation for Emotion Identification via Data Selection. Technical paper
(under review) 2015
Problem Definition
• Input
– Large amount of emotion-labeled tweets
– Small amount of labeled sentences from target
domains (e.g., blogs, fairy tales)
• Objective
– Select informative tweets and add them to target
domain training data, and train an adaptive classifier
for the target domain
The Bootstrapping Framework
Inputs: self-labeled tweets, target domain labeled data
• Train classifier c
• Apply c to tweets (splitting them into correctly classified and misclassified)
• Identify informative tweets from misclassified tweets
• Add them to target domain training data
Why select from misclassified tweets?
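The loop above can be sketched generically. Everything named here is an assumption for illustration: `bootstrap`, the toy `train_word_vote` classifier (a stand-in for the real model), the choice to train c on the current target training set, and the placeholder `informativeness` score passed in by the caller.

```python
from collections import Counter, defaultdict


def train_word_vote(examples):
    """Toy classifier: each known word votes for the labels it was seen with."""
    votes = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        for word in sorted(set(text.lower().split())):
            votes[word][label] += 1
    default = label_counts.most_common(1)[0][0]

    def predict(text):
        tally = Counter()
        for word in sorted(set(text.lower().split())):
            tally.update(votes.get(word, {}))
        return tally.most_common(1)[0][0] if tally else default

    return predict


def bootstrap(target_train, source_pool, train, informativeness, per_round=1, rounds=3):
    """Repeatedly add the most informative misclassified source items to the training set."""
    train_set = list(target_train)
    pool = list(source_pool)
    for _ in range(rounds):
        predict = train(train_set)                       # train classifier c
        missed = [ex for ex in pool if predict(ex[0]) != ex[1]]  # apply c to tweets
        if not missed:
            break
        missed.sort(key=informativeness, reverse=True)   # rank misclassified tweets
        chosen = missed[:per_round]
        train_set.extend(chosen)                         # add to target training data
        pool = [ex for ex in pool if ex not in chosen]
    return train(train_set), train_set


target = [("i am so happy today", "joy"), ("this is so sad", "sadness")]
tweets = [("a dog barked at me", "fear"), ("happy happy day", "joy")]
# Placeholder score (tweet length); the real score is the informativeness defined in later slides.
classifier, final_train = bootstrap(target, tweets, train_word_vote, lambda ex: len(ex[0].split()))
```

In this toy run only the fear tweet is misclassified, so it is the one added to the training data; the joy tweet is already handled by c and contributes nothing new.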
Informativeness Overview
Consistency, Diversity, Similarity
Consistency
• Fear: “Amazing night with my baby. Hope she liked our
anniversary present. Alil early but whatever. :) hopefully tmmrw
goes as planned.”
– Top supporting features for emotion fear
– Top supporting features for any emotion other than fear
– Use the margin to estimate consistency:
0.5094 – 0.5962 = -0.0868
Consistency measures how consistent a tweet’s label is with its content.
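The margin in the example above is a plain difference between two support scores. The slide does not say how those scores are computed from the top supporting features, so the sketch below takes them as given inputs; `consistency_margin` is a hypothetical name.

```python
def consistency_margin(support_for_label, support_for_other):
    """Margin between support for the tweet's own label and support for any other emotion."""
    return support_for_label - support_for_other


# Values from the slide: the tweet is labeled Fear, but its content reads as joyful.
margin = consistency_margin(0.5094, 0.5962)
label_is_consistent = margin > 0
```

A negative margin, as here, signals that the hashtag label is poorly supported by the tweet's content.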
Diversity
• Sadness: “Searching for vinyl proved to be quite disappointing”
– “disappoint” occurs 2 times
• Sadness: “I'm about to lose everything I've ever wanted, my
whole world, and it's all my fault..”
– “lose” occurs 15 times
[Figure: diversity (0.00 to 1.00) against term_freq (0 to 100), an exponential decay of the feature's term frequency in target domain training data; marked points: 0.9048 (disappoint), 0.4724 (lose)]
Diversity encourages the selection of source instances containing
discriminative features that are infrequent or underrepresented in
the target domain.
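A small sketch of the exponential decay. The slide does not state the decay constant; a scale of 20 is an inference from the two marked points (term frequency 2 gives 0.9048, term frequency 15 gives 0.4724), so treat `scale=20.0` and the function name `diversity` as assumptions.

```python
import math


def diversity(term_freq, scale=20.0):
    """Exponential decay of a feature's term frequency in target domain training data."""
    return math.exp(-term_freq / scale)


disappoint_score = diversity(2)   # "disappoint" occurs 2 times
lose_score = diversity(15)        # "lose" occurs 15 times
```

The rarer feature ("disappoint") scores higher, so tweets carrying it are favored during selection.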
Similarity Intuition
• Inspired by domain adaptation for machine translation
studies that select source instances similar to test
instances (Eck et al., 2004; Lu et al., 2007)
• Given a target test sentence
– Disgust: “im sick of look at a comput screen.”
• Retrieve most similar tweets
– Anger: “im sick and tire of look like a fool”
– Joy: “i have get usb fairi light around my comput screen .”
Content Similarity is not sufficient!
Similarity Overview
Content similarity, Label similarity, Uncertainty
Content Similarity
• Upweight important words
– Source instance:
– Target test instance: inverse document frequency (idf)
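One way to picture idf-based upweighting is an idf-weighted cosine similarity. This is a sketch under stated assumptions, not the formula from the slide: the weighting of source instances is not recoverable from the text, so idf is used on both sides here, cosine is my choice of similarity, and the three-sentence "corpus" is just the stemmed examples from the Similarity Intuition slide.

```python
import math
from collections import Counter


def idf_table(documents):
    """idf(w) = log(N / df(w)) over a list of tokenized documents."""
    n_docs = len(documents)
    doc_freq = Counter(word for doc in documents for word in set(doc))
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}


def weighted_cosine(tokens_a, tokens_b, idf):
    """Cosine similarity between binary bag-of-words vectors weighted by idf."""
    vec_a = {w: idf.get(w, 0.0) for w in set(tokens_a)}
    vec_b = {w: idf.get(w, 0.0) for w in set(tokens_b)}
    dot = sum(vec_a[w] * vec_b[w] for w in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


test_sentence = "im sick of look at a comput screen".split()       # Disgust
anger_tweet = "im sick and tire of look like a fool".split()        # Anger
joy_tweet = "i have get usb fairi light around my comput screen".split()  # Joy

idf = idf_table([test_sentence, anger_tweet, joy_tweet])
sim_anger = weighted_cosine(test_sentence, anger_tweet, idf)
sim_joy = weighted_cosine(test_sentence, joy_tweet, idf)
```

Both tweets get a positive score although neither shares the test sentence's label, which is the sense in which content similarity alone is not sufficient.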
Label Similarity
• Target test sentence
– Disgust: “im sick of look at a comput screen.”
• Source tweet
– Anger: “im sick and tire of look like a fool”
• How likely will the test sentence express anger?
– Apply the same formula used for the Consistency factor
– Top supporting features for emotion anger
– Top supporting features for any emotion other than anger
– Use the margin to estimate label similarity: 0.5838 – 0.625 = -0.0412
Uncertainty
Sentence | Label | Predicted Label | Classifier confidence | Uncertainty
the second day i go in and i be so paranoid . | Fear | Sadness | 0.2352 | 0.7648
we are total awesome! | Joy | Joy | 0.8683 | 0.1317
The more confident the classifier is, the more likely the prediction is correct, and the less focus we should give to this sentence.
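The uncertainty values in the table are one minus the classifier confidence. A short check, with `uncertainty` as a hypothetical helper name and the confidence taken as given:

```python
def uncertainty(confidence):
    """Uncertainty of a prediction, given the classifier's confidence in it."""
    return 1.0 - confidence


paranoid_uncertainty = uncertainty(0.2352)  # labeled Fear, predicted Sadness
awesome_uncertainty = uncertainty(0.8683)   # labeled Joy, predicted Joy
```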
Similarity Revisit
• Encourage the selection of source instances that share high
content and label similarities with target domain test instances
that classifier c is most uncertain about.
Content similarity, Label similarity, Uncertainty
Informativeness Revisit
• A tweet is informative when
– 1) its label is consistent with its content
– AND 2) it contains a discriminative feature that is infrequent in
target training data
– AND 3) it is similar to a target domain test instance whose label cannot be predicted by the classifier c with high confidence.
Our proposed approach: CDS (Consistency, Diversity, Similarity)
Baseline approaches
• Source Only (SO): train classifiers using only Twitter (source) data
• Target Only (TO): train classifiers using only target domain
training data
• Feature Injection (FI): first train a source classifier using only
source data (Daume III, 2007)
• Feature Augmentation (FA) (Daume III, 2007)
– Source instances: X -> <X, X, 0> (common, source, target)
– Target instances: X -> <X, 0, X> (common, source, target)
• Balance Weight (BW): assign larger weights to the target instances so that the weight sum of target instances equals that of source instances (Jiang and Zhai, 2007)
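Two of the baselines above are simple enough to sketch. The function names and the `"common:"`, `"source:"`, `"target:"` key prefixes are hypothetical; features are sparse dicts, so the all-zero copy is simply omitted. For Balance Weight, giving every source instance weight 1 is an assumption.

```python
def augment(features, domain):
    """Feature Augmentation: keep a shared copy and a domain-specific copy of each feature."""
    augmented = {}
    for name, value in features.items():
        augmented["common:" + name] = value
        augmented[domain + ":" + name] = value  # the other domain's copy stays zero (absent)
    return augmented


def balance_weights(n_source, n_target):
    """Balance Weight: per-instance weights so both domains have equal total weight."""
    source_weight = 1.0
    target_weight = n_source / n_target
    return source_weight, target_weight


source_vector = augment({"sick": 1, "fool": 1}, "source")
target_vector = augment({"sick": 1, "screen": 1}, "target")
w_source, w_target = balance_weights(n_source=1000, n_target=50)
```

After augmentation the two vectors overlap only in the `common:` block, which lets a linear classifier learn shared and domain-specific weights for the same word.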
Experimental settings
• Features
– Experimented with unigrams, bigrams, and unigrams+bigrams
– Used unigrams in the end
• Logistic regression
– Fast; supports probability output (needed for uncertainty)
• Five-fold cross validation
– Four folds: training; one fold: testing
• Add-0.5 smoothing
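A minimal sketch of the last two settings. `k_fold_splits` and `smoothed_prob` are hypothetical names; the slide does not say which estimates the add-0.5 smoothing is applied to, so the second function only shows the general additive-smoothing form.

```python
def k_fold_splits(items, k=5):
    """Yield (train, test) pairs: k-1 folds for training, one fold for testing."""
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


def smoothed_prob(count, total, vocab_size, alpha=0.5):
    """Additive smoothing: (count + alpha) / (total + alpha * vocab_size)."""
    return (count + alpha) / (total + alpha * vocab_size)


splits = list(k_fold_splits(list(range(10)), k=5))
unseen_prob = smoothed_prob(count=0, total=10, vocab_size=4)
```

With `alpha=0.5`, an unseen feature still gets a small non-zero probability, and the smoothed probabilities over the vocabulary sum to one.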
Results on four target datasets*
[Figure: results on four target datasets]
Percentage gain: 8.01%, 24.07%, 36.53%, 3.62%, 16.45%
*: The numbers differ from those in the dissertation defense video because I fixed a bug afterwards; the results improved slightly as a result.
Different Instance Selection Strategies
• CDS: select tweets from misclassified tweets
• CD: CDS with the similarity factor removed
• CDS-ALL: select tweets from all source tweets
• CDS-CORR: select tweets from source tweets that can be
correctly classified by c
Comparing instance selection strategies
• Among all the strategies, CDS improves F1 the fastest.
• CDS-ALL achieves performance similar to CDS but takes more iterations, because the input of CDS-ALL is a superset of the input of CDS.
• CDS-CORR performs the worst because it selects from correctly classified tweets, whose knowledge might already exist in the target domain.
Summary
• People’s emotions can be gleaned from their texts using machine learning
techniques.
– The combination of n-grams (n=1,2), knowledge-based and syntactic features
achieves the best performance.
– Knowledge features and syntactic features become less important on large training
data.
• We can automatically create a large training dataset for emotion identification
by leveraging emotion hashtags on Twitter.
– A large amount of labeled data is collected with little effort and cost
– Covers a variety of situations that elicit emotions
– Performance gain with increasing size of training data
• This self-labeled emotion dataset can be used to improve emotion
identification in text from other domains/data sources.
– Domain adaptation via selecting tweets that are informative to the target domain
– Selecting from source instances that the classifier cannot classify correctly works best.
– Informativeness of a source instance is measured by three factors: consistency,
diversity and similarity.
Publications
• Wenbo Wang, Lei Duan, Anirudh Koul, Amit P. Sheth. YouRank: Let User Engagement Rank
Microblog Search Results. In the Eighth International AAAI Conference on Weblogs and Social
Media (ICWSM'14) 2014
• Wenbo Wang, Lu Chen, Krishnaprasad Thirunarayan, Amit P. Sheth. Cursing in English on Twitter.
In ACM Conference on Computer Supported Cooperative Work and Social Computing (CSCW'14)
2014
• Amit Sheth, Ashutosh Jadhav, Pavan Kapanipathi, Lu Chen, Hemant Purohit, Gary Alan Smith, and
Wenbo Wang. "Twitris: A system for collective social intelligence." In Encyclopedia of Social
Network Analysis and Mining, pp. 2240-2253. Springer New York, 2014.
• Lu Chen, Wenbo Wang, Amit P. Sheth. Are Twitter Users Equal in Predicting Elections? A Study of
User Groups in Predicting 2012 U.S. Republican Presidential Primaries. In Proceedings of the
Fourth International Conference on Social Informatics (SocInfo'12) 2012
• Wenbo Wang, Lu Chen, Krishnaprasad Thirunarayan, Amit P. Sheth. Harnessing Twitter ‘Big Data’
for Automatic Emotion Identification. 2012 ASE International Conference on Social Computing
(SocialCom 2012), 2012
• Lu Chen, Wenbo Wang, Meenakshi Nagarajan, Shaojun Wang, Amit P. Sheth. Extracting Diverse
Sentiment Expressions with Target-dependent Polarity from Twitter. In Proceedings of the 6th
International AAAI Conference on Weblogs and Social Media (ICWSM), 2012
• Wenbo Wang, Lu Chen, Ming Tan, Shaojun Wang, Amit P. Sheth. Discovering Fine-grained
Sentiment in Suicide Notes. Biomedical Informatics Insights, 2012
• Ramakanth Kavuluru, Christopher Thomas, Amit Sheth, Victor Chan, Wenbo Wang, Alan Smith, An
Up-to-date Knowledge-Based Literature Search and Exploration Framework for Focused
Bioscience Domains, IHI 2012 - 2nd ACM SIGHIT Intl Health Informatics Symposium, January 28-
30, 2012.
• Wenbo Wang, Christopher Thomas, Amit Sheth, Victor Chan. Pattern-Based Synonym and
Antonym Extraction. 48th ACM Southeast Conference, ACMSE2010, Oxford Mississippi, April 15-
17, 2010
• Christopher J. Thomas, Wenbo Wang, Pankaj Mehra, Delroy Cameron, Pablo N. Mendes, and Amit P. Sheth. What Goes Around Comes Around – Improving Linked Open Data through On-Demand Model Creation. In: Proceedings of the WebSci10: Extending the Frontiers of Society On-Line, April 26-27th, 2010, Raleigh, NC: US.
• Ashutosh Jadhav, Wenbo Wang, Raghava Mutharaju, Pramod Anantharam, Vinh Nguyen, Amit P. Sheth, Karthik Gomadam, Meenakshi Nagarajan, and Ajith Ranabahu, Twitris: Socially Influenced Browsing, Semantic Web Challenge 2009, demo at 8th International Semantic Web Conference, Oct. 25-29 2009, Washington, DC, USA
Patents & Proposal
• Wenbo Wang, Lei Duan. "Temporal User Engagement Features", U.S. Patent
No. 20,150,120,753. 30 Apr. 2015.
• Lu Chen, Wenbo Wang, Amit Sheth. "Topic-specific Sentiment Extraction", U.S.
Patent No. 20,140,358,523. 4 Dec. 2014.
• Context-Aware Harassment Detection on Social Media. NSF proposal
Special thanks to AFRL and NSF
*Part of this material is based upon work supported by the National Science Foundation under Grant IIS-1111182, “SoCS: Collaborative Research: Social Media Enhanced Organizational Sensemaking in Emergency Response.”
Thank You! & Questions?

More Related Content

Viewers also liked

Examples of Applied Semantic Technologies: Application of Semantic Sensor Net...
Examples of Applied Semantic Technologies: Application of Semantic Sensor Net...Examples of Applied Semantic Technologies: Application of Semantic Sensor Net...
Examples of Applied Semantic Technologies: Application of Semantic Sensor Net...Artificial Intelligence Institute at UofSC
 
Evaluating a Potential Commercial Tool for Healthcare Application for People ...
Evaluating a Potential Commercial Tool for Healthcare Application for People ...Evaluating a Potential Commercial Tool for Healthcare Application for People ...
Evaluating a Potential Commercial Tool for Healthcare Application for People ...Artificial Intelligence Institute at UofSC
 
Knowledge-driven Personalized Contextual mHealth Service for Asthma Managemen...
Knowledge-driven Personalized Contextual mHealth Service for Asthma Managemen...Knowledge-driven Personalized Contextual mHealth Service for Asthma Managemen...
Knowledge-driven Personalized Contextual mHealth Service for Asthma Managemen...Artificial Intelligence Institute at UofSC
 
Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation...
Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation...Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation...
Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation...Artificial Intelligence Institute at UofSC
 

Viewers also liked (11)

Analyzing the Social Media Footprint of Street Gangs
Analyzing the Social Media Footprint of Street GangsAnalyzing the Social Media Footprint of Street Gangs
Analyzing the Social Media Footprint of Street Gangs
 
Mastering the Velocity Dimension of Big Data
Mastering the Velocity Dimension of Big DataMastering the Velocity Dimension of Big Data
Mastering the Velocity Dimension of Big Data
 
Examples of Applied Semantic Technologies: Application of Semantic Sensor Net...
Examples of Applied Semantic Technologies: Application of Semantic Sensor Net...Examples of Applied Semantic Technologies: Application of Semantic Sensor Net...
Examples of Applied Semantic Technologies: Application of Semantic Sensor Net...
 
Walk through Streaming Technologies: EPL
Walk through Streaming Technologies: EPLWalk through Streaming Technologies: EPL
Walk through Streaming Technologies: EPL
 
Entity Recommendations Using Hierarchical Knowledge Bases
Entity Recommendations Using Hierarchical Knowledge BasesEntity Recommendations Using Hierarchical Knowledge Bases
Entity Recommendations Using Hierarchical Knowledge Bases
 
Knowledge Enabled Location Prediction of Twitter Users
Knowledge Enabled Location Prediction of Twitter UsersKnowledge Enabled Location Prediction of Twitter Users
Knowledge Enabled Location Prediction of Twitter Users
 
Evaluating a Potential Commercial Tool for Healthcare Application for People ...
Evaluating a Potential Commercial Tool for Healthcare Application for People ...Evaluating a Potential Commercial Tool for Healthcare Application for People ...
Evaluating a Potential Commercial Tool for Healthcare Application for People ...
 
Knowledge-driven Personalized Contextual mHealth Service for Asthma Managemen...
Knowledge-driven Personalized Contextual mHealth Service for Asthma Managemen...Knowledge-driven Personalized Contextual mHealth Service for Asthma Managemen...
Knowledge-driven Personalized Contextual mHealth Service for Asthma Managemen...
 
Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation...
Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation...Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation...
Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation...
 
Web and Complex Systems Lab @ Kno.e.sis
Web and Complex Systems Lab @ Kno.e.sisWeb and Complex Systems Lab @ Kno.e.sis
Web and Complex Systems Lab @ Kno.e.sis
 
Trust Management: A Tutorial
Trust Management: A TutorialTrust Management: A Tutorial
Trust Management: A Tutorial
 

Similar to Wenbo Wang dissertation defense, Kno.e.sis, Wright State University

Sentiment analysis using ml
Sentiment analysis using mlSentiment analysis using ml
Sentiment analysis using mlPravin Katiyar
 
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...Association for Computational Linguistics
 
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...Association for Computational Linguistics
 
David papini escape emotional intelligence traps
David papini   escape emotional intelligence trapsDavid papini   escape emotional intelligence traps
David papini escape emotional intelligence trapsDavid Papini
 
Filippo Lanubile's talk @IASESE 2018
Filippo Lanubile's talk @IASESE 2018Filippo Lanubile's talk @IASESE 2018
Filippo Lanubile's talk @IASESE 2018Filippo Lanubile
 
Sentiment Analysis of Social Media Content: A multi-tool for listening to you...
Sentiment Analysis of Social Media Content: A multi-tool for listening to you...Sentiment Analysis of Social Media Content: A multi-tool for listening to you...
Sentiment Analysis of Social Media Content: A multi-tool for listening to you...Eirini Ntoutsi
 
Leap from Doing Agile to Being Agile_AAC2019
Leap from Doing Agile to Being Agile_AAC2019Leap from Doing Agile to Being Agile_AAC2019
Leap from Doing Agile to Being Agile_AAC2019Agile Austria Conference
 
Sentiment Analysis | Machine Learning Algorithms | Data Science Tutorial | Ed...
Sentiment Analysis | Machine Learning Algorithms | Data Science Tutorial | Ed...Sentiment Analysis | Machine Learning Algorithms | Data Science Tutorial | Ed...
Sentiment Analysis | Machine Learning Algorithms | Data Science Tutorial | Ed...Edureka!
 
EI/EQ_Why and How
EI/EQ_Why and HowEI/EQ_Why and How
EI/EQ_Why and HowSeda Maurer
 
University Of Texas At Dallas Application Essay
University Of Texas At Dallas Application EssayUniversity Of Texas At Dallas Application Essay
University Of Texas At Dallas Application EssayBrittany Koch
 
Slides chase 2019 connected health conference - thursday 26 september 2019 -...
Slides chase 2019  connected health conference - thursday 26 september 2019 -...Slides chase 2019  connected health conference - thursday 26 september 2019 -...
Slides chase 2019 connected health conference - thursday 26 september 2019 -...Amélie Gyrard
 
Go Reboot Yourself: Get a Grip on Your Tech
Go Reboot Yourself: Get a Grip on Your TechGo Reboot Yourself: Get a Grip on Your Tech
Go Reboot Yourself: Get a Grip on Your TechAliza Sherman
 
Emotionalinteligence
EmotionalinteligenceEmotionalinteligence
Emotionalinteligencemahimashukla
 

Similar to Wenbo Wang dissertation defense, Kno.e.sis, Wright State University (20)

Media 330057 smxx
Media 330057 smxxMedia 330057 smxx
Media 330057 smxx
 
Sentiment analysis using ml
Sentiment analysis using mlSentiment analysis using ml
Sentiment analysis using ml
 
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
 
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
 
David papini escape emotional intelligence traps
David papini   escape emotional intelligence trapsDavid papini   escape emotional intelligence traps
David papini escape emotional intelligence traps
 
Filippo Lanubile's talk @IASESE 2018
Wenbo Wang dissertation defense, Kno.e.sis, Wright State University

  • 1. Ohio Center of Excellence in Knowledge-Enabled Computing. Automatic Emotion Identification from Text. Wenbo Wang, Kno.e.sis Center. Advisor: Dr. Amit P. Sheth. Committee members: Dr. Keke Chen, Kevin Haas, Dr. T.K. Prasad, Dr. Ramakanth Kavuluru. Ph.D. Dissertation Defense
  • 2. Sadness, Anger, Fear, Joy. “Your emotions are the slaves to your thoughts, and you are the slave to your emotions.” --Elizabeth Gilbert
  • 3. Stock Market. S&P 500 dropped 1% … (credit: Jon C. Ogg)
  • 4. Employee Productivity
  • 5. Subjective Well-being. ECG: Physical State. Happiness Index: Emotional State.
  • 8. Emotion Identification • Emotion – “a strong feeling (such as love, anger, joy, hate, or fear)” -- Merriam-Webster Online Dictionary • Emotion Identification – the task of automatically identifying and extracting the emotions expressed in a given text. • Examples: “I hate when my mom compares me to my friends” -> Anger; “When I see a cop, no matter where I am or what I’m doing, I always feel like every law I’ve broken is stamped all over my body” -> Fear
  • 9. Proposed Questions • How to glean people’s emotions from their texts using machine learning techniques? • How to create large self-labeled emotion data from social media? • How to improve emotion identification in target domains (e.g., blog, diary) by leveraging large self-labeled emotion data from social media?
  • 10. 1. EMOTION CLASSIFICATION. Wenbo Wang, Lu Chen, Ming Tan, Shaojun Wang, Amit P. Sheth. Discovering Fine-grained Sentiment in Suicide Notes. Biomedical Informatics Insights, 2012. Wenbo Wang, Lu Chen, Krishnaprasad Thirunarayan, Amit P. Sheth. Harnessing Twitter ‘Big Data’ for Automatic Emotion Identification. 2012 ASE International Conference on Social Computing (SocialCom 2012).
  • 11. Background - Classification (Credit: nltk)
  • 12. Dataset Description • Suicide notes – 15 fine-grained emotions – Training: 4,633 sentences – Testing: 2,086 sentences • Twitter data – 7 emotions – Training: ~250 K tweets – Testing: 250 K tweets
  • 13. Suicide Notes Dataset. Sentence example: “I loved you and was proud of you.” Unigrams: i, love, you, and, be, proud, of, you, . Bigrams: i love, love you, you and, and be, be proud, proud of, of you, you . The combination of unigrams and bigrams performs the best among n-gram features.
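The unigram and bigram features on slide 13 can be reproduced in a few lines of Python. This is a minimal sketch, assuming the sentence has already been tokenized and lemmatized into the tokens listed on the slide; the `ngrams` helper is a hypothetical name used for illustration, not code from the dissertation.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams of `tokens` as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Lemmatized tokens for "I loved you and was proud of you." as listed on the slide.
tokens = ["i", "love", "you", "and", "be", "proud", "of", "you", "."]

unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)

# Combined unigram + bigram feature list (the best n-gram setting per the slide).
features = unigrams + bigrams
```

A sentence with k tokens yields k unigrams and k - 1 bigrams, so the combined feature list here has 9 + 8 entries.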
  • 14. Suicide Notes Dataset. Sentence example: “I loved you and was proud of you.” LIWC Knowledge: Posemo: 2 (love, proud); Negemo: 0; Anger: 0; Sad: 0. Adding knowledge-based features further increases the performance.
  • 15. Suicide Notes Dataset. Sentence example: “I loved you and was proud of you.” POS count: Adjective: 1 (proud); Noun: 0; Pronoun: 3 (i, you); … Sentence tense: Simple past tense: 2 (I loved, was proud). Adding sentence tenses and POS counts further increases the performance.
  • 16. Twitter Dataset – Supervised Classifier. Using only adjectives as features performs poorly because emotions can be expressed implicitly in text.
  • 17. Twitter Dataset – Supervised Classifier. The combination of unigrams and bigrams performs the best among n-gram features.
  • 18. Twitter Dataset – Supervised Classifier. Knowledge features and syntactic features become less important on Twitter data.
  • 19. Challenge: The Lack of Training Data • Emotion annotation is typically time-consuming, expensive and error-prone. – multiple emotion categories – subtle and ambiguous emotion expressions – Human judgement of emotion tends to be subjective and varied. • Most existing datasets are small, e.g., – Blog: 1,890 sentences (Aman and Szpakowicz 2008) – Experience: 1,000 sentences (Neviarouskaya et al. 2010) – Diary: 700 sentences (Neviarouskaya et al. 2011)
  • 20. Why do We Need More Training Data? (I) [Figure 1 from (Banko and Brill 2001), “Learning Curves for Confusion Set Disambiguation”: test accuracy (0.70 to 1.00) versus millions of words of training data (0.1 to 1000) for Memory-Based, Winnow, Perceptron and Naïve Bayes learners.] “We may want to reconsider the trade-off between spending time and money on algorithm development versus spending it on corpus development” -- (Banko and Brill 2001)
  • 21. Why do We Need More Training Data? (II) • Emotions arise in various situations, which leads to very diverse expressions conveying the emotions. “I hate when my mom compares me to my friends” “When I see a cop, no matter where I am or what I’m doing, I always feel like every law I’ve broken is stamped all over my body” “I hate when I get the hiccups in class” “Omg I finally fit into one pair of my jeans from last year!!” “A dog barked at me!”
  • 22. The Use of Hashtags on Twitter. “I hate when my mom compares me to my friends #annoying” “When I see a cop, no matter where I am or what I’m doing, I always feel like every law I’ve broken is stamped all over my body #nervous” “I hate when I get the hiccups in class #embarrassing” “Omg I finally fit into one pair of my jeans from last year!! #excited” “A dog barked at me! #scared #weak”
  • 23. 2. SELF-LABELED DATA CREATION. Wenbo Wang, Lu Chen, Krishnaprasad Thirunarayan, Amit P. Sheth. Harnessing Twitter ‘Big Data’ for Automatic Emotion Identification. 2012 ASE International Conference on Social Computing (SocialCom 2012)
  • 24. Emotion Hashtags • From existing psychology literature (Shaver et al. 1987), collected 7 sets of emotion words for 7 different emotions: joy, sadness, anger, love, fear, thankfulness, and surprise.
      Emotion        Hashtag word examples (number of words)        Number of tweets
      Joy            excited, happy, elated, proud (36)             706,182
      Sadness        sorrow, unhappy, depressing, lonely (36)       616,471
      Anger          irritating, annoyed, frustrate, fury (23)      574,170
      Love           affection, lovin, loving, fondness (7)         301,759
      Fear           fear, panic, fright, worry, scare (22)         135,154
      Thankfulness   thankfulness, thankful (2)                     131,340
      Surprise       surprised, astonished, unexpected (5)          23,906
      Total          131                                            2,488,982
  • 25. Removing Irrelevant Tweets. A tweet is removed when: hashtag count > 2; the emotion hashtag is not at the end; word count < 5; it has a URL or quotations. About 5 million tweets -> 2,488,982 tweets
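The four removal rules on slide 25 can be sketched as a single predicate. This is an illustration under stated assumptions, not the author's filter: whitespace tokenization, "word count" taken as the number of non-hashtag tokens, "at the end" taken as "within the trailing run of hashtags", and a simple pattern for URLs are all choices made here. The function names are hypothetical.

```python
import re

# Assumption: a URL is anything containing "http://", "https://" or "www.".
URL_RE = re.compile(r"https?://|www\.", re.IGNORECASE)
QUOTE_CHARS = '"\u201c\u201d'  # straight and curly double quotes


def trailing_hashtags(tokens):
    """Lowercased hashtags (without '#') in the run of hashtags ending the tweet."""
    tags = []
    for tok in reversed(tokens):
        if tok.startswith("#") and len(tok) > 1:
            tags.append(tok[1:].lower())
        else:
            break
    return tags


def keep_tweet(text, emotion_words):
    """Return True if the tweet survives all four removal rules."""
    tokens = text.split()
    hashtags = [t for t in tokens if t.startswith("#") and len(t) > 1]
    words = [t for t in tokens if not t.startswith("#")]
    if len(hashtags) > 2:                      # rule 1: hashtag count > 2
        return False
    if not any(tag in emotion_words for tag in trailing_hashtags(tokens)):
        return False                           # rule 2: emotion hashtag not at the end
    if len(words) < 5:                         # rule 3: word count < 5
        return False
    if URL_RE.search(text) or any(ch in text for ch in QUOTE_CHARS):
        return False                           # rule 4: has URL or quotations
    return True


fear_words = {"scared"}
kept = keep_tweet("A dog barked at me! #scared #weak", fear_words)
```

Under these assumptions the slide's example "A dog barked at me! #scared #weak" is kept: it has two hashtags, the emotion hashtag sits in the trailing hashtag run, and it has five non-hashtag tokens.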
  • 26. Results with Increasing Training Data. [Plot: accuracy (0.4 to 0.65) versus number of tweets in training data (1,000 to 1,991,184) for LIBLINEAR and MNB.] Logistic Regression (LR), training instances 1K -> 2M: accuracy rises from 0.4341 to 0.6557 (percentage gain = 51.05%); other labeled points: 0.5292, 0.6156.
  • 27. Results with Increasing Training Data. [Plot: accuracy (0.4 to 0.65) versus number of tweets in training data (1,000 to 1,991,184) for LIBLINEAR and MNB.] Multinomial Naive Bayes (MNB), training instances 1K -> 2M: accuracy rises from 0.4580 to 0.6350 (percentage gain = 38.65%); other labeled points: 0.5426, 0.6113.
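The percentage gains quoted on slides 26 and 27 are relative improvements in accuracy between the smallest and largest training sets. A short check, using only the accuracy values shown on the slides (the helper name is hypothetical):

```python
def percentage_gain(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


# Accuracies at 1K and ~2M training tweets, as labeled on the slides.
lr_gain = percentage_gain(0.4341, 0.6557)    # logistic regression
mnb_gain = percentage_gain(0.4580, 0.6350)   # multinomial naive Bayes

print(f"LR: {lr_gain:.2f}%  MNB: {mnb_gain:.2f}%")
```

Both results agree with the slides (51.05% and 38.65%) to two decimal places.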
  • 28. Detailed Results. For three popular emotions (76.2% of the tweets), the classifier achieves F-measures of over 64%.
  • 29. Detailed Results. For three less popular emotions (22.8% of the tweets), the precisions are relatively higher compared with the recalls, and the F-measures are over 43%.
  • 30. What Have We Learned? • We can automatically create training datasets for emotion identification by leveraging emotion hashtags on Twitter. – A large amount of labeled data are collected with little effort and cost – Covers a variety of situations that elicit emotions – Performance gain with increasing size of training data • However, there is still a lack of labeled data in many other domains/data sources.
  • 31. New Challenge. Lots of labeled tweets; far less labeled data in many other domains. Can we use emotion-labeled tweets to help emotion identification in other domains?
  • 32. 3. DOMAIN ADAPTATION FOR EMOTION IDENTIFICATION. Wenbo Wang, Lu Chen, Keke Chen, Krishnaprasad Thirunarayan, Amit P. Sheth. Domain Adaptation for Emotion Identification via Data Selection. Technical paper (under review), 2015
  • 33. Problem Definition • Input – Large amount of emotion-labeled tweets – Small amount of labeled sentences from target domains (e.g., blogs, fairy tales) • Objective – Select informative tweets and add them to target domain training data, and train an adaptive classifier for the target domain
  • 34. The Bootstrapping Framework. Self-labeled tweets; target domain labeled data. • Train classifier c • Apply c to tweets
  • 35. The Bootstrapping Framework. Target domain labeled data; tweets split into correctly classified and misclassified. • Train classifier c • Apply c to tweets • Identify informative tweets from misclassified tweets • Add them to target domain training data. Why select from misclassified tweets?
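The four steps of the bootstrapping framework on slide 35 can be sketched as a loop. This is a schematic outline under assumptions, not the dissertation's implementation: `train` and `score` are hypothetical callables supplied by the caller (a trainer returning a predict function, and an informativeness score), and the number of tweets added per round and the number of rounds are free parameters that the slides do not specify.

```python
def bootstrap(target_data, source_tweets, train, score, per_round, rounds):
    """Iteratively move informative misclassified tweets into the training set.

    target_data / source_tweets: lists of (text, label) pairs.
    train(examples) -> classifier, where classifier(text) -> predicted label.
    score(example, classifier, training) -> informativeness (higher is better).
    """
    training = list(target_data)
    pool = list(source_tweets)
    for _ in range(rounds):
        classifier = train(training)                       # train classifier c
        misclassified = [(x, y) for x, y in pool
                         if classifier(x) != y]            # apply c to tweets
        if not misclassified:
            break
        ranked = sorted(misclassified,
                        key=lambda ex: score(ex, classifier, training),
                        reverse=True)                      # rank by informativeness
        chosen = ranked[:per_round]
        training.extend(chosen)                            # add to target training data
        pool = [ex for ex in pool if ex not in chosen]
    return train(training), training
```

Any classifier with a train/predict interface can be plugged in; the later slides define the informativeness score through consistency, diversity and similarity.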
  • 36. Informativeness Overview: Consistency, Diversity, Similarity
  • 37. Consistency. Consistency measures how consistent a tweet’s label is with its content. • Fear: “Amazing night with my baby. Hope she liked our anniversary present. Alil early but whatever. :) hopefully tmmrw goes as planned.” – Top supporting features for emotion fear – Top supporting features for any emotion other than fear – Use the margin to estimate consistency: 0.5094 – 0.5962 = -0.0868
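The margin on slide 37 is the support for the tweet's own label minus the best support for any other emotion. The transcript does not preserve how the per-emotion support scores are computed from the top features, so the sketch below simply takes them as given; the `margin` helper and the `"other"` key are hypothetical.

```python
def margin(scores, label):
    """Support for `label` minus the best support among all other emotions."""
    best_other = max(value for emotion, value in scores.items() if emotion != label)
    return scores[label] - best_other


# Values from the slide: 0.5094 supports "fear"; 0.5962 is the best support for
# any other emotion (which emotion is not stated, so "other" is a placeholder).
scores = {"fear": 0.5094, "other": 0.5962}
consistency = margin(scores, "fear")
```

A negative margin, as in this example, indicates that the tweet's content supports a different emotion more strongly than its hashtag label.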
  • 38. Diversity. Diversity encourages the selection of source instances containing discriminative features that are infrequent or underrepresented in the target domain. It is an exponential decay of a feature's term frequency in target domain training data. • Sadness: “Searching for vinyl proved to be quite disappointing” – “disappoint” occurs 2 times: diversity 0.9048 • Sadness: “I'm about to lose everything I've ever wanted, my whole world, and it's all my fault..” – “lose” occurs 15 times: diversity 0.4724. [Plot: diversity (0 to 1) versus term_freq (0 to 100).]
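The two diversity values on slide 38 are consistent with a decay of the form exp(-rate × term frequency). The rate of 0.05 used below is an assumption inferred from the two data points on the slide, not a value stated in the transcript.

```python
import math

# Assumption: decay rate inferred from the slide's examples
# (term_freq 2 -> 0.9048, term_freq 15 -> 0.4724).
DECAY_RATE = 0.05


def diversity(term_freq, decay_rate=DECAY_RATE):
    """Exponential decay of a feature's frequency in target-domain training data."""
    return math.exp(-decay_rate * term_freq)


disappoint = diversity(2)   # "disappoint" occurs 2 times
lose = diversity(15)        # "lose" occurs 15 times
```

A feature that never occurs in the target training data gets the maximum diversity of 1, and the score approaches 0 as the feature becomes common.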
  • 39. Similarity Intuition • Inspired by domain adaptation for machine translation studies that select source instances similar to test instances (Eck et al., 2004; Lu et al., 2007) • Given a target test sentence – Disgust: “im sick of look at a comput screen.” • Retrieve most similar tweets – Anger: “im sick and tire of look like a fool” – Joy: “i have get usb fairi light around my comput screen .” Content similarity is not sufficient!
  • 40. Similarity Overview: Content similarity, Label similarity, Uncertainty
  • 41. Content Similarity • Upweight important words by inverse document frequency (idf) – Source instance: – Target test instance:
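The formulas on slide 41 are not preserved in the transcript, so the exact content-similarity measure is unknown. One common way to upweight important words with idf is cosine similarity over idf-weighted term counts; the sketch below swaps in that technique as a stand-in. The smoothed idf definition (log(N/df) + 1) and all function names are assumptions.

```python
import math
from collections import Counter


def idf_table(documents):
    """Smoothed idf per term: log(N / df) + 1, over tokenized documents."""
    n_docs = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    return {term: math.log(n_docs / df) + 1.0 for term, df in doc_freq.items()}


def weighted_vector(tokens, idf):
    """Term counts scaled by idf; terms missing from the idf table get weight 0."""
    return {term: count * idf.get(term, 0.0) for term, count in Counter(tokens).items()}


def content_similarity(source_tokens, target_tokens, idf):
    """Cosine similarity between idf-weighted bag-of-words vectors."""
    a = weighted_vector(source_tokens, idf)
    b = weighted_vector(target_tokens, idf)
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Stemmed sentences from slide 39, tokenized on whitespace.
target = "im sick of look at a comput screen .".split()
anger_tweet = "im sick and tire of look like a fool".split()
joy_tweet = "i have get usb fairi light around my comput screen .".split()

idf = idf_table([target, anger_tweet, joy_tweet])
sim_anger = content_similarity(anger_tweet, target, idf)
sim_joy = content_similarity(joy_tweet, target, idf)
```

In practice the idf table would be estimated from a much larger corpus than these three sentences; the tiny corpus here only keeps the example self-contained.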
  • 42. Label Similarity • Target test sentence – Disgust: “im sick of look at a comput screen.” • Source tweet – Anger: “im sick and tire of look like a fool” • How likely will the test sentence express anger? – Apply the same formula used for the Consistency factor – Top supporting features for emotion anger – Top supporting features for any emotion other than anger – Use the margin to estimate consistency: 0.5838 – 0.625 = -0.0412
  • 43. Uncertainty. The more confident the classifier is, the more likely the prediction is correct, and the less focus we should give to this sentence.
      – “the second day i go in and i be so paranoid .”: label Fear; predicted label Sadness; classifier confidence 0.2352; uncertainty 0.7648
      – “we are total awesome!”: label Joy; predicted label Joy; classifier confidence 0.8683; uncertainty 0.1317
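The uncertainty values on slide 43 are one minus the classifier's confidence in its prediction. A small sketch using the two rows from the slide (the `uncertainty` helper and the tuple layout are illustrative choices):

```python
def uncertainty(confidence):
    """Uncertainty of a prediction, given the classifier's confidence in it."""
    return 1.0 - confidence


# (sentence, label, predicted label, classifier confidence) from the slide.
examples = [
    ("the second day i go in and i be so paranoid .", "Fear", "Sadness", 0.2352),
    ("we are total awesome!", "Joy", "Joy", 0.8683),
]

# Test sentences the classifier is least sure about come first.
ranked = sorted(examples, key=lambda ex: uncertainty(ex[3]), reverse=True)
```

Sorting by uncertainty puts the misclassified "paranoid" sentence ahead of the confidently classified one, which matches the slide's point about where to focus.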
  • 44. Similarity Revisit • Encourage the selection of source instances that share high content and label similarities with target domain test instances that classifier c is most uncertain about. (Content similarity, Label similarity, Uncertainty)
  • 45. Informativeness Revisit • A tweet is informative when – 1) its label is consistent with its content – AND 2) it contains a discriminative feature that is infrequent in target training data – AND 3) it is similar to a target domain test instance whose label cannot be predicted by the classifier c with high confidence. Our proposed approach: CDS (Consistency, Diversity, Similarity)
  • 46. Baseline approaches • Source Only (SO): train classifiers using only Twitter data • Target Only (TO): train classifiers using only target domain training data • Feature Injection (FI): first train a source classifier using only source data (Daume III, 2007) • Feature Augmentation (FA) (Daume III, 2007) – Source instances: X -> <X, X, 0> (common, source, target) – Target instances: X -> <X, 0, X> (common, source, target) • Balance Weight (BW): assign larger weights to the target instances so that the weight sum of target instances equals that of source instances (Jiang and Zhai, 2007)
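Two of the baselines on slide 46 are easy to illustrate. The sketch below shows feature augmentation in the style of Daume III (2007), where each feature is copied into a common block and a domain-specific block, and the balance-weight idea of scaling target instances so that their total weight matches the source total. Sparse feature dictionaries, the key prefixes and the function names are assumptions made for this example.

```python
def augment(features, domain):
    """Copy each feature into a 'common' block and a domain-specific block.

    Features absent from the returned dict are implicitly zero, so a source
    instance has an all-zero target block and vice versa.
    """
    if domain not in ("source", "target"):
        raise ValueError("domain must be 'source' or 'target'")
    augmented = {}
    for name, value in features.items():
        augmented["common:" + name] = value
        augmented[domain + ":" + name] = value
    return augmented


def balanced_weights(n_source, n_target):
    """Per-instance (source, target) weights whose totals are equal."""
    return 1.0, n_source / n_target


source_vec = augment({"sick": 1, "fool": 1}, "source")
target_vec = augment({"sick": 1, "screen": 1}, "target")
w_source, w_target = balanced_weights(1000, 50)
```

With 1,000 source and 50 target instances, each target instance gets weight 20, so both domains contribute a total weight of 1,000.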
  • 48. Experimental settings • Features – Experimented with unigrams, bigrams, unigrams+bigrams – Used unigrams in the end • Logistic regression – Fast, supports probability output (used for uncertainty) • Five-fold cross validation – Four folds for training; one fold for testing • Add-0.5 smoothing
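The five-fold setup on slide 48 (four folds for training, one for testing) can be sketched with plain index arithmetic. This example assumes contiguous, unshuffled folds; whether the original experiments shuffled or stratified the data is not stated in the transcript, and `k_fold_indices` is a hypothetical helper.

```python
def k_fold_indices(n_items, k=5):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_items))
        yield train, test
        start += size


folds = list(k_fold_indices(10, k=5))
```

Each item appears in exactly one test fold, and every split trains on the remaining four folds.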
  • 49. Results on four target datasets* Percentage gain: 8.01%, 24.07%, 36.53%, 3.62%, 16.45%. *: The numbers are different from those in the dissertation defense video, because I fixed a bug after that. The results improved slightly as a result.
  • 50. Different Instance Selection Strategies • CDS: select tweets from misclassified tweets • CD: CDS with the similarity factor removed • CDS-ALL: select tweets from all source tweets • CDS-CORR: select tweets from source tweets that can be correctly classified by c
  • 51. Comparing instance selection strategies. Among all the strategies, CDS improves F1 the fastest.
  • 52. Comparing instance selection strategies. CDS-ALL achieves a performance similar to that of CDS but takes more iterations, because the input of CDS-ALL is a superset of that of CDS.
  • 53. Comparing instance selection strategies. CDS-CORR performs the worst because it selects tweets from correctly classified tweets, the knowledge of which might already exist in target domains.
  • 54. Summary • People’s emotions can be gleaned from their texts using machine learning techniques. – The combination of n-grams (n=1,2), knowledge-based and syntactic features achieves the best performance. – Knowledge features and syntactic features become less important on large training data. • We can automatically create a large training dataset for emotion identification by leveraging emotion hashtags on Twitter. – A large amount of labeled data are collected with little effort and cost – Covers a variety of situations that elicit emotions – Performance gain with increasing size of training data • This self-labeled emotion dataset can be used to improve emotion identification in text from other domains/data sources. – Domain adaptation via selecting tweets that are informative to the target domain – It is better to select source instances that cannot be correctly classified. – Informativeness of a source instance is measured by three factors: consistency, diversity and similarity.
  • 55. Publications • Wenbo Wang, Lei Duan, Anirudh Koul, Amit P. Sheth. YouRank: Let User Engagement Rank Microblog Search Results. In the Eighth International AAAI Conference on Weblogs and Social Media (ICWSM'14), 2014 • Wenbo Wang, Lu Chen, Krishnaprasad Thirunarayan, Amit P. Sheth. Cursing in English on Twitter. In ACM Conference on Computer Supported Cooperative Work and Social Computing (CSCW'14), 2014 • Amit Sheth, Ashutosh Jadhav, Pavan Kapanipathi, Lu Chen, Hemant Purohit, Gary Alan Smith, and Wenbo Wang. Twitris: A system for collective social intelligence. In Encyclopedia of Social Network Analysis and Mining, pp. 2240-2253. Springer New York, 2014 • Lu Chen, Wenbo Wang, Amit P. Sheth. Are Twitter Users Equal in Predicting Elections? A Study of User Groups in Predicting 2012 U.S. Republican Presidential Primaries. In Proceedings of the Fourth International Conference on Social Informatics (SocInfo'12), 2012 • Wenbo Wang, Lu Chen, Krishnaprasad Thirunarayan, Amit P. Sheth. Harnessing Twitter ‘Big Data’ for Automatic Emotion Identification. 2012 ASE International Conference on Social Computing (SocialCom 2012), 2012 • Lu Chen, Wenbo Wang, Meenakshi Nagarajan, Shaojun Wang, Amit P. Sheth. Extracting Diverse Sentiment Expressions with Target-dependent Polarity from Twitter. In Proceedings of the 6th International AAAI Conference on Weblogs and Social Media (ICWSM), 2012
  • 56. Publications • Wenbo Wang, Lu Chen, Ming Tan, Shaojun Wang, Amit P. Sheth. Discovering Fine-grained Sentiment in Suicide Notes. Biomedical Informatics Insights, 2012 • Ramakanth Kavuluru, Christopher Thomas, Amit Sheth, Victor Chan, Wenbo Wang, Alan Smith. An Up-to-date Knowledge-Based Literature Search and Exploration Framework for Focused Bioscience Domains. IHI 2012 - 2nd ACM SIGHIT Intl Health Informatics Symposium, January 28-30, 2012 • Wenbo Wang, Christopher Thomas, Amit Sheth, Victor Chan. Pattern-Based Synonym and Antonym Extraction. 48th ACM Southeast Conference, ACMSE2010, Oxford, Mississippi, April 15-17, 2010 • Christopher J. Thomas, Wenbo Wang, Pankaj Mehra, Delroy Cameron, Pablo N. Mendes, and Amit P. Sheth. What Goes Around Comes Around – Improving Linked Open Data through On-Demand Model Creation. In: Proceedings of the WebSci10: Extending the Frontiers of Society On-Line, April 26-27th, 2010, Raleigh, NC, US • Ashutosh Jadhav, Wenbo Wang, Raghava Mutharaju, Pramod Anantharam, Vinh Nguyen, Amit P. Sheth, Karthik Gomadam, Meenakshi Nagarajan, and Ajith Ranabahu. Twitris: Socially Influenced Browsing. Semantic Web Challenge 2009, demo at 8th International Semantic Web Conference, Oct. 25-29, 2009, Washington, DC, USA
  • 57. Patents & Proposal • Wenbo Wang, Lei Duan. "Temporal User Engagement Features", U.S. Patent No. 20,150,120,753. 30 Apr. 2015. • Lu Chen, Wenbo Wang, Amit Sheth. "Topic-specific Sentiment Extraction", U.S. Patent No. 20,140,358,523. 4 Dec. 2014. • Context-Aware Harassment Detection on Social Media. NSF proposal
  • 58. Special thanks to AFRL and NSF. *Part of this material is based upon work supported by the National Science Foundation under Grant IIS-1111182 “SoCS: Collaborative Research: Social Media Enhanced Organizational Sensemaking in Emergency Response.”
  • 59. Thank You! & Questions?