In this paper, we analyze the quality of several commercial tools for sentiment detection. All tools are tested on nearly 30’000 short texts from various sources, such as tweets, news, reviews etc. In addition to the quality analysis (measured by various metrics), we also investigate the effect of increasing text length on the performance. Finally, we show that combining all tools using machine learning techniques increases the overall performance significantly.
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Potential and Limitations of Commercial Sentiment Detection Tools
1. Zurich Universitiy
of Applied Sciences
Potential and Limitations of
Commercial Sentiment Detection Tools
Fatih Uzdilli
joint work with Mark Cieliebak and Oliver Dürr
03.12.2013 @ ESSEM’13
2. Zurich Universitiy
of Applied Sciences
About Me
Fatih Uzdilli
Institute of Applied Information Technology (InIT)
ZHAW, Winterthur, Switzerland
Email:
, more about me: home.zhaw.ch/~uzdi
Research Interest
Information Retrieval, Machine Learning, Sentiment Analysis
Background
Software Engineer, Social Media Monitoring, Search Technologies
03.12.2013
Fatih Uzdilli
2
3. Zurich Universitiy
of Applied Sciences
Abstract
Evaluation of 9 commercial sentiment tools on approx.
30'000 short texts.
Best commercial tools have accuracy of only 60%.
Combining all tools using Random Forest improved the
accuracy.
03.12.2013
Fatih Uzdilli
3
4. Zurich Universitiy
of Applied Sciences
Motivation
• Scientific results for sentiment detection:
«very good performance: > 80% accuracy»
C
• Blog posts about commercial tools:
«very poor quality, unusable»
D
03.12.2013
Fatih Uzdilli
4
5. Zurich Universitiy
of Applied Sciences
Motivation
• Scientific results for sentiment detection:
«very good performance: > 80% accuracy»
C
• Blog posts about commercial tools:
«very poor quality, unusable»
D
03.12.2013
Fatih Uzdilli
5
6. Zurich Universitiy
of Applied Sciences
How good is commercial
Sentiment Detection?
source: http://www.commute.com/images/schools_evaluation.jpg
03.12.2013
Is there potential for
improvement?
source: http://3.bp.blogspot.com/-u3acK_WjaLU/ULYv51mHEhI/
AAAAAAAAARY/ DIZqOfxuswc/s1600/IcebergQ1.jpg
Fatih Uzdilli
6
7. Zurich Universitiy
of Applied Sciences
Evaluation Setup
• 9 Commercial APIs
• 7 Public Text Corpora
– Stand-alone
– Free for this evaluation
– Arbitrary Text
– Single Statements
– Different Media Types
• Tweet, News, Review,
Speech Transcript
– Total: 28653 Texts
POSITIVE
NEGATIVE
OTHER
( neutral / mixed )
03.12.2013
Fatih Uzdilli
7
8. Zurich Universitiy
of Applied Sciences
Tool Accuracy
Avg.
Best Tool per Corpus
Accuracy
52%
Worst Tool per Corpus
0.7
61%
Average of All Tools
0.8
40%
0.6
0.5
0.4
0.3
0.2
03.12.2013
Fatih Uzdilli
8
9. Zurich Universitiy
of Applied Sciences
Tool Accuracy
Best Tool per Corpus
Average of All Tools
Worst Tool per Corpus
Overall Best Tool
Overall Worst Tool
0.8
Accuracy
0.7
Avg.
61%
52%
40%
59%
45%
0.6
0.5
0.4
0.3
0.2
03.12.2013
Fatih Uzdilli
9
10. Zurich Universitiy
of Applied Sciences
Further Findings
• Longer texts are hard to classify
• Corpus annotations might be erroneous
03.12.2013
Fatih Uzdilli
10
11. Zurich Universitiy
of Applied Sciences
Can a Meta-Classifier do better?
• 1st Approach: Majority Classifier
– Sentiment with most votes chosen
Illustration:
api1 api2 api3 api4 api5 api6 api7
Text 1
+
+
o
+
o
Text 2
Text 3
Text n
03.12.2013
o
+
o
o
+
+
+
+
o
+
-
Fatih Uzdilli
+
o
o
Majority
+
+
o
11
12. Zurich Universitiy
of Applied Sciences
Tool Accuracy
Best Tool per Corpus
0.8
Average of All Tools
Worst Tool per Corpus
Accuracy
0.7
0.6
0.5
0.4
0.3
0.2
03.12.2013
Fatih Uzdilli
12
13. Majority Classifier beats Average
Best Tool per Corpus
Average of All Tools
Worst Tool per Corpus
Majority Classifier
0.8
0.7
Accuracy
Zurich Universitiy
of Applied Sciences
0.6
0.5
0.4
0.3
0.2
03.12.2013
Fatih Uzdilli
13
14. Zurich Universitiy
of Applied Sciences
2nd Approach: Random-Forest
api1 api2 api3 … api n annotation
Text 1 +
+ … o
+
Train
Text 2
+
o … +
Train
Text 3
o
- … +
Train
Text 4 +
Train
o
+ …
+
Text 5
Text 6
Text 7
Text 8
Text 9
03.12.2013
+
+
+
+
o
o
o
+
-
+
o
+
o
+
…
…
…
…
…
o
o
o
Fatih Uzdilli
o
Train
Train
o
unknown Predict
unknown Predict
unknown Predict
Random
Forest
Classifier
+
+
o
14
15. Zurich Universitiy
of Applied Sciences
Before Random Forest
Best Tool per Corpus
Average of All Tools
Worst Tool per Corpus
Majority Classifier
0.8
Accuracy
0.7
0.6
0.5
0.4
0.3
0.2
03.12.2013
Fatih Uzdilli
15
16. Random Forest Beats Best Single Tool
Best Tool per Corpus
Average of All Tools
Worst Tool per Corpus
Majority Classifier
Random Forest Classifier
0.8
0.7
Accuracy
Zurich Universitiy
of Applied Sciences
0.6
0.5
0.4
0.3
0.2
03.12.2013
Fatih Uzdilli
16
17. Zurich Universitiy
of Applied Sciences
Summary
• Best Tool: 59% Accuracy
• Random Forest combination: Up to 9% improvement
<=9%
03.12.2013
Fatih Uzdilli
17