8. Thesis subject: Predicting customer affect after an email conversation
20-7-2017 8
[Diagram: email exchange between Customer and Customer Support]

9. Thesis subject: Predicting customer affect after an email conversation
[Diagram: email exchange between Customer and Customer Support]
10. Research questions
What is the impact of a Customer Service email response on customer affect?
1. What sentiment can be detected in customer emails?
2. Can a domain-specific sentiment detection machine learning model outperform a general model for sentiment?
3. Do CS response email features have predictive value for the sentiment of a customer?
19. Annotation step 5: Agreement results
• Significant differences between annotators
• Best agreement on Sentiment, Anger and Joy
                      Sentiment   Emotions
                                  Anger   Disgust   Fear   Joy    Sadness
Avg annotator kappa   0.45        0.49    0.21      0.24   0.61   0.31
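The slide's values are averaged over three annotators; a minimal pairwise sketch of Cohen's kappa (the agreement measure used above, with expected agreement taken from each coder's own label distribution) could look like:

```python
from collections import Counter

def cohen_kappa(a, b):
    """Cohen's kappa between two annotators' label sequences."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Expected chance agreement from per-coder label priors (Cohen 1960).
    ca, cb = Counter(a), Counter(b)
    expected = sum((ca[label] / n) * (cb[label] / n) for label in set(a) | set(b))
    return (observed - expected) / (1 - expected)
```

For three annotators, the slide's per-category figure is the mean kappa over the three annotator pairs.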
20. Annotation step 6: Combine annotations
Merge 3 annotator results into 1:
• Majority vote
• Full agreement
• Average scoring
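The three merge strategies above can be sketched with the standard library (function names are illustrative, not from the thesis):

```python
from collections import Counter
from statistics import mean

def majority_vote(labels):
    # The most frequent label among the three annotators wins.
    return Counter(labels).most_common(1)[0][0]

def full_agreement(labels):
    # Keep the label only if all annotators agree; otherwise discard the item.
    return labels[0] if len(set(labels)) == 1 else None

def average_score(scores):
    # For numeric intensity annotations, take the mean.
    return mean(scores)
```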
21. Sentiment analysis
Classification problem
1. Feature construction
2. Handling class imbalance
3. Single label versus multilabel
4. Model selection
5. Feature selection & importance
6. Model evaluation
22. Sentiment analysis step 1:
Feature construction
20 feature groups, 342 features
• Simple features:
  • Number of words
  • Average length of a word
  • Day of the week (Monday, Tuesday, …)
  • …
• Advanced features:
  • Ratio of correctly spelled words to total number of words
  • TF-IDF
  • Doc2Vec
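The simple feature group is straightforward to compute; a sketch with assumed names (TF-IDF and Doc2Vec would come from libraries such as scikit-learn and gensim and are not shown):

```python
def simple_features(text, sent_weekday):
    """Illustrative 'simple' feature group for one email (names hypothetical)."""
    words = text.split()
    return {
        "n_words": len(words),
        "avg_word_len": sum(len(w) for w in words) / max(len(words), 1),
        "day_of_week": sent_weekday,   # 0 = Monday ... 6 = Sunday
        "n_exclamations": text.count("!"),
    }
```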
23. Sentiment analysis step 2:
Handling class imbalance
Percentage of emails with emotion:

                      Anger   Disgust   Fear   Joy    Sadness
Annotator consensus   22.9    11.7      4.9    19.3   23.7

Percentage of emails with certain sentiment:

                      None    Neg     Pos     Mix
Annotator consensus   33.4    38.6    23.2    4.7

• Imbalance may not be an issue
• Oversampling versus no sampling
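Random oversampling, the strategy compared against no sampling above, can be sketched as duplicating minority-class items until every class matches the majority count (a minimal stdlib version, not the thesis implementation):

```python
import random

def oversample(items, labels, seed=0):
    """Randomly duplicate minority-class items up to the majority class size."""
    rng = random.Random(seed)
    by_label = {}
    for item, label in zip(items, labels):
        by_label.setdefault(label, []).append(item)
    target = max(len(group) for group in by_label.values())
    resampled = []
    for label, group in by_label.items():
        extras = [rng.choice(group) for _ in range(target - len(group))]
        resampled += [(item, label) for item in group + extras]
    return resampled
```

Oversampling must be applied only to the training split, never before cross-validation, or duplicates leak into the test folds.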
24. Sentiment analysis step 3:
Single label versus Multilabel
Single label, 5 models (one binary target per emotion):

  Feats.   Feats.   Anger
  …        …        1

Multilabel, 1 model (all five targets at once):

  Feats.   Feats.   Anger   Disgust   Fear   Joy   Sadness
  …        …        1       1         0      0     0
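The multilabel target of the second table is a binary indicator matrix; building it from per-email emotion sets can be sketched as:

```python
EMOTIONS = ["Anger", "Disgust", "Fear", "Joy", "Sadness"]

def to_indicator_rows(emails_emotions):
    # One row per email, one 0/1 column per emotion, as in the multilabel table.
    return [[int(e in present) for e in EMOTIONS] for present in emails_emotions]
```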
25. Sentiment analysis step 4:
Model selection
Models:
• Naive Bayes
• Support Vector Machine
• Neural Net
• Random Forest
• RAkEL
• Soft Voting ensemble
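The soft voting ensemble averages the member models' class probabilities and picks the highest; a minimal sketch (scikit-learn's VotingClassifier with voting='soft' does this in practice):

```python
def soft_vote(prob_dicts):
    """Average class probabilities from several models and return the top class."""
    classes = prob_dicts[0].keys()
    avg = {c: sum(p[c] for p in prob_dicts) / len(prob_dicts) for c in classes}
    return max(avg, key=avg.get)
```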
26. Sentiment analysis step 4:
Model selection
Oversampling of minority class(es)
Majority vote to combine annotations
Sentiment   Best Model
Sentiment   Voting: Neural Net + Random Forest + SVM
Anger       Voting: Neural Net + Random Forest
Disgust     Voting: Neural Net + Random Forest
Joy         Voting: Neural Net + Random Forest
27. Sentiment analysis step 5:
Feature selection & importance
Significant features:
Anger Disgust Joy Sentiment
char_Tfidf (100) X X X
countExclamation X
countNRC (7) X
Doc2Vec (100) X X X
word_Tfidf (100) X X
28. Sentiment analysis step 6:
Model evaluation
                           Sentiment   Emotions (kappa)
                           (kappa)     Anger   Disgust   Fear    Joy    Sadness
Domain specific model      0.43        0.51    0.43      0.13    0.61   0.36
Benchmark 1: NRC lexicon   0.09        0.33    0.31      0.02    0.06   0.27
Benchmark 2: IBM NLU       0.22        0.32    0.03     -0.01    0.46   0.14
Avg annotator agreement    0.45        0.49    0.21      0.24    0.61   0.31
29. Affect analysis
Classification problem
1. Feature construction
2. Handling class imbalance
3. Single label versus multilabel
4. Model selection
5. Feature selection & importance
6. Model evaluation
31. Affect analysis step 1:
Feature construction
23 feature groups, 681 features
• Originating customer email features
• Annotated customer email sentiment and emotions
• CS response email features
• Response time
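Of the groups above, the response-time feature is simple to derive from the email timestamps; a minimal sketch (function name assumed):

```python
from datetime import datetime

def response_time_hours(customer_sent, cs_replied):
    """Latency between the customer email and the CS response, in hours."""
    return (cs_replied - customer_sent).total_seconds() / 3600
```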
32. Affect analysis step 4:
Model selection
Sentiment   Best Model
Sentiment   Random Forest, no oversampling
Anger       Voting: Neural Net + Naive Bayes, with oversampling
Disgust     Naive Bayes, with oversampling
Joy         Naive Bayes, with oversampling
Sadness     RAkEL using Random Forest; second: Naive Bayes, no oversampling
33. Affect analysis step 5:
Feature selection & importance
Significant features:
Anger Disgust Joy Sadness Sentiment
threadItem X X
CS - char_Tfidf X
CS - dayOfWeek X
CS - lengthMessage X X
CS - word_Tfidf X X
Cust. - Emotions X X
Cust. - Sentiment X X
35. Conclusion
1. What sentiment can be detected in customer emails?
   Sentiment, Anger, Disgust, Joy
2. Can a domain-specific sentiment detection machine learning model outperform a general model for sentiment?
   Domain-specific model significantly better than IBM NLU & NRC
3. Do CS response email features have predictive value for the sentiment of a customer?
   Sentiment, Joy, Sadness significant; overall low performance
36. Future work
• Test performance on other domains
• Improve annotator agreement
• Increase amount of training data
• Increase number of features
• Directly measure customer emotion
37. The takeaway message
JULY 6, 2017
Stanford computer scientists develop an algorithm that diagnoses heart arrhythmias with cardiologist-level accuracy
JULY 13, 2017
VU data scientist develops an algorithm that identifies emotions in email with human-level accuracy
Mohammad (2016). A practical guide to sentiment annotation: Challenges and solutions.
The method proposed by Cohen (1960) to calculate expected agreement Ae in his κ coefficient assumes that random assignment of categories to items is governed by prior distributions that are unique to each coder, and which reflect individual annotator bias.
Recognizing emotions in text is difficult for humans.
Exact feature impact not known