8. Thesis subject: Predicting customer affect after an email conversation
20-7-2017 8
[Diagram: email exchange between Customer and Customer Support]

9. Thesis subject: Predicting customer affect after an email conversation
[Diagram: email exchange between Customer and Customer Support]
10. Research questions
What is the impact of a Customer Service email response on customer affect?
1. What sentiment can be detected in customer emails?
2. Can a domain-specific sentiment detection machine learning model outperform a general model for sentiment?
3. Do CS response email features have predictive value for the sentiment of a customer?
19. Annotation step 5: Agreement results
• Significant differences between annotators
• Best agreement on Sentiment, Anger and Joy
                      Sentiment   Emotions
                                  Anger   Disgust   Fear   Joy    Sadness
Avg annotator kappa   0.45        0.49    0.21      0.24   0.61   0.31
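The slide's values are averaged over three annotators; a minimal pairwise sketch of Cohen's kappa (the agreement measure used above, with expected agreement taken from each coder's own label distribution) could look like:

```python
from collections import Counter

def cohen_kappa(a, b):
    """Cohen's kappa between two annotators' label sequences."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Expected chance agreement from per-coder label priors (Cohen 1960).
    ca, cb = Counter(a), Counter(b)
    expected = sum((ca[label] / n) * (cb[label] / n) for label in set(a) | set(b))
    return (observed - expected) / (1 - expected)
```

For three annotators, the slide's per-category figure is the mean kappa over the three annotator pairs.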
20. Annotation step 6: Combine annotations
Merge 3 annotator results into 1:
• Majority vote
• Full agreement
• Average scoring
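The three merge strategies above can be sketched with the standard library (function names are illustrative, not from the thesis):

```python
from collections import Counter
from statistics import mean

def majority_vote(labels):
    # The most frequent label among the three annotators wins.
    return Counter(labels).most_common(1)[0][0]

def full_agreement(labels):
    # Keep the label only if all annotators agree; otherwise discard the item.
    return labels[0] if len(set(labels)) == 1 else None

def average_score(scores):
    # For numeric intensity annotations, take the mean.
    return mean(scores)
```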
21. Sentiment analysis
Classification problem
1. Feature construction
2. Handling class imbalance
3. Single label versus multilabel
4. Model selection
5. Feature selection & importance
6. Model evaluation
22. Sentiment analysis step 1:
Feature construction
20 feature groups, 342 features
• Simple features:
  • Number of words
  • Average length of a word
  • Day of the week (Monday, Tuesday, …)
  • …
• Advanced features:
  • Ratio of correctly spelled words to total number of words
  • TF-IDF
  • Doc2Vec
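The simple feature group is straightforward to compute; a sketch with assumed names (TF-IDF and Doc2Vec would come from libraries such as scikit-learn and gensim and are not shown):

```python
def simple_features(text, sent_weekday):
    """Illustrative 'simple' feature group for one email (names hypothetical)."""
    words = text.split()
    return {
        "n_words": len(words),
        "avg_word_len": sum(len(w) for w in words) / max(len(words), 1),
        "day_of_week": sent_weekday,   # 0 = Monday ... 6 = Sunday
        "n_exclamations": text.count("!"),
    }
```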
23. Sentiment analysis step 2:
Handling class imbalance
Percentage of emails with emotion:

                      Anger   Disgust   Fear   Joy    Sadness
Annotator consensus   22.9    11.7      4.9    19.3   23.7

Percentage of emails with certain sentiment:

                      None    Neg     Pos     Mix
Annotator consensus   33.4    38.6    23.2    4.7

• Imbalance may not be an issue
• Oversampling versus no sampling
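Random oversampling, the strategy compared against no sampling above, can be sketched as duplicating minority-class items until every class matches the majority count (a minimal stdlib version, not the thesis implementation):

```python
import random

def oversample(items, labels, seed=0):
    """Randomly duplicate minority-class items up to the majority class size."""
    rng = random.Random(seed)
    by_label = {}
    for item, label in zip(items, labels):
        by_label.setdefault(label, []).append(item)
    target = max(len(group) for group in by_label.values())
    resampled = []
    for label, group in by_label.items():
        extras = [rng.choice(group) for _ in range(target - len(group))]
        resampled += [(item, label) for item in group + extras]
    return resampled
```

Oversampling must be applied only to the training split, never before cross-validation, or duplicates leak into the test folds.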
24. Sentiment analysis step 3:
Single label versus Multilabel
Single label, 5 models (one binary target per emotion):

  Feats.   Feats.   Anger
  …        …        1

Multilabel, 1 model (all five targets at once):

  Feats.   Feats.   Anger   Disgust   Fear   Joy   Sadness
  …        …        1       1         0      0     0
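The multilabel target of the second table is a binary indicator matrix; building it from per-email emotion sets can be sketched as:

```python
EMOTIONS = ["Anger", "Disgust", "Fear", "Joy", "Sadness"]

def to_indicator_rows(emails_emotions):
    # One row per email, one 0/1 column per emotion, as in the multilabel table.
    return [[int(e in present) for e in EMOTIONS] for present in emails_emotions]
```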
25. Sentiment analysis step 4:
Model selection
Models:
• Naive Bayes
• Support Vector Machine
• Neural Net
• Random Forest
• RAkEL
• Soft Voting ensemble
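The soft voting ensemble averages the member models' class probabilities and picks the highest; a minimal sketch (scikit-learn's VotingClassifier with voting='soft' does this in practice):

```python
def soft_vote(prob_dicts):
    """Average class probabilities from several models and return the top class."""
    classes = prob_dicts[0].keys()
    avg = {c: sum(p[c] for p in prob_dicts) / len(prob_dicts) for c in classes}
    return max(avg, key=avg.get)
```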
26. Sentiment analysis step 4:
Model selection
Oversampling of minority class(es)
Majority vote to combine annotations
Sentiment   Best Model
Sentiment   Voting: Neural Net + Random Forest + SVM
Anger       Voting: Neural Net + Random Forest
Disgust     Voting: Neural Net + Random Forest
Joy         Voting: Neural Net + Random Forest
27. Sentiment analysis step 5:
Feature selection & importance
Significant features:
Anger Disgust Joy Sentiment
char_Tfidf (100) X X X
countExclamation X
countNRC (7) X
Doc2Vec (100) X X X
word_Tfidf (100) X X
28. Sentiment analysis step 6:
Model evaluation
                           Sentiment   Emotions (kappa)
                           (kappa)     Anger   Disgust   Fear    Joy    Sadness
Domain specific model      0.43        0.51    0.43      0.13    0.61   0.36
Benchmark 1: NRC lexicon   0.09        0.33    0.31      0.02    0.06   0.27
Benchmark 2: IBM NLU       0.22        0.32    0.03     -0.01    0.46   0.14
Avg annotator agreement    0.45        0.49    0.21      0.24    0.61   0.31
29. Affect analysis
Classification problem
1. Feature construction
2. Handling class imbalance
3. Single label versus multilabel
4. Model selection
5. Feature selection & importance
6. Model evaluation
31. Affect analysis step 1:
Feature construction
23 feature groups, 681 features
• Originating customer email features
• Annotated customer email sentiment and emotions
• CS response email features
• Response time
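Of the groups above, the response-time feature is simple to derive from the email timestamps; a minimal sketch (function name assumed):

```python
from datetime import datetime

def response_time_hours(customer_sent, cs_replied):
    """Latency between the customer email and the CS response, in hours."""
    return (cs_replied - customer_sent).total_seconds() / 3600
```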
32. Affect analysis step 4:
Model selection
Sentiment   Best Model
Sentiment   Random Forest, no oversampling
Anger       Voting: Neural Net + Naive Bayes, with oversampling
Disgust     Naive Bayes, with oversampling
Joy         Naive Bayes, with oversampling
Sadness     RAkEL using Random Forest; second: Naive Bayes, no oversampling
33. Affect analysis step 5:
Feature selection & importance
Significant features:
Anger Disgust Joy Sadness Sentiment
threadItem X X
CS - char_Tfidf X
CS - dayOfWeek X
CS - lengthMessage X X
CS - word_Tfidf X X
Cust. - Emotions X X
Cust. - Sentiment X X
35. Conclusion
1. What sentiment can be detected in customer emails?
   Sentiment, Anger, Disgust, Joy
2. Can a domain-specific sentiment detection machine learning model outperform a general model for sentiment?
   Domain-specific model significantly better than IBM NLU & NRC
3. Do CS response email features have predictive value for the sentiment of a customer?
   Sentiment, Joy, Sadness significant; overall low performance
36. Future work
• Test performance on other domains
• Improve annotator agreement
• Increase amount of training data
• Increase number of features
• Directly measure customer emotion
37. The takeaway message
JULY 6, 2017
Stanford computer scientists develop an algorithm that diagnoses heart arrhythmias with cardiologist-level accuracy
JULY 13, 2017
VU data scientist develops an algorithm that identifies emotions in email with human-level accuracy
Mohammad (2016). A practical guide to sentiment annotation: Challenges and solutions.
The method proposed by Cohen (1960) to calculate expected agreement Ae in his κ coefficient assumes that random assignment of categories to items is governed by prior distributions that are unique to each coder, and which reflect individual annotator bias.
Recognizing emotions in text is difficult for humans.
Exact feature impact not known