Sentiment Knowledge Discovery in Twitter Streaming Data
Albert Bifet and Eibe Frank
University of Waikato
Hamilton, New Zealand
Canberra, 7 October 2010
Discovery Science 2010
Twitter: A Massive Data Stream
Web 2.0
Micro-blogging service
Built to discover what is happening at any moment in time, anywhere in the world.
106 million registered users
600 million search queries per day
3 billion requests a day via its API.
2 / 26
Outline
1 Twitter Streaming Data
2 Twitter Sentiment Classification: Metrics and Methods
3 Empirical results
3 / 26
Data stream classification cycle
1 Process an example at a time, and inspect it only once (at most)
2 Use a limited amount of memory
3 Work in a limited amount of time
4 Be ready to predict at any point
5 / 26
Data stream classification cycle
Evaluation procedures for Data Streams
Holdout
Interleaved Test-Then-Train ("Prequential" Evaluation)
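The interleaved test-then-train procedure can be sketched as below. This is a minimal illustration, not the MOA implementation: the function name `prequential_accuracy` and the toy `MajorityClass` learner are hypothetical names introduced only for this example.

```python
def prequential_accuracy(stream, predict, learn):
    """Interleaved test-then-train: each example is used to test, then to train."""
    correct = 0
    seen = 0
    for x, y in stream:
        if predict(x) == y:   # test first, before the model sees the label
            correct += 1
        learn(x, y)           # then train on the same example, exactly once
        seen += 1
    return correct / seen if seen else 0.0


class MajorityClass:
    """Toy learner (hypothetical): predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = {}

    def predict(self, x):
        if not self.counts:
            return None       # nothing learned yet
        return max(self.counts, key=self.counts.get)

    def learn(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1


model = MajorityClass()
stream = [("a", "+"), ("b", "+"), ("c", "-"), ("d", "+")]
accuracy = prequential_accuracy(stream, model.predict, model.learn)
```

Every example is inspected once and the model can be queried at any point, which matches the requirements of the data stream classification cycle.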
5 / 26
Twitter Streaming API
Twitter APIs
Streaming API
Two discrete REST APIs
Real-time access to Tweets
  sampled form
  filtered form
HTTP based: GET, POST, DELETE
6 / 26
Sentiment Analysis on Twitter
Sentiment analysis
Classifying messages into two categories depending on whether they convey positive or negative feelings
Emoticons are visual cues associated with emotional states, which can be used to define class labels for sentiment classification
Positive Emoticons   Negative Emoticons
:)                   :(
:-)                  :-(
: )                  : (
:D
=)
Table: List of positive and negative emoticons.
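A minimal sketch of how the emoticons in the table could be turned into class labels. The function name `emoticon_label` is hypothetical, and skipping tweets that contain no emoticon or both kinds is an assumption of this example, not something stated on the slide.

```python
from typing import Optional

# Emoticon lists taken from the table above.
POSITIVE = (":)", ":-)", ": )", ":D", "=)")
NEGATIVE = (":(", ":-(", ": (")


def emoticon_label(text: str) -> Optional[str]:
    """Return 'positive' or 'negative', or None when no clear label exists."""
    has_pos = any(e in text for e in POSITIVE)
    has_neg = any(e in text for e in NEGATIVE)
    if has_pos == has_neg:
        # Assumption: tweets with neither or both kinds are left unlabeled.
        return None
    return "positive" if has_pos else "negative"


label = emoticon_label("finally weekend :)")
```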
7 / 26
Outline
1 Twitter Streaming Data
2 Twitter Sentiment Classification: Metrics and Methods
3 Empirical results
8 / 26
Streaming Data Evaluation with Unbalanced Classes
                 Predicted Class+   Predicted Class-   Total
Correct Class+   75                 8                  83
Correct Class-   7                  10                 17
Total            82                 18                 100
Table: Simple confusion matrix example

                 Predicted Class+   Predicted Class-   Total
Correct Class+   68.06              14.94              83
Correct Class-   13.94              3.06               17
Total            82                 18                 100
Table: Confusion matrix for chance predictor
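The chance-predictor table follows from the row and column totals of the first table: each cell is (row total × column total) / grand total. A short sketch, with `chance_matrix` as a hypothetical helper name:

```python
def chance_matrix(confusion):
    """Expected counts for a predictor that keeps the same marginals but guesses at random."""
    total = sum(sum(row) for row in confusion)
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    return [[r * c / total for c in col_totals] for r in row_totals]


observed = [[75, 8], [7, 10]]          # rows: correct class, columns: predicted class
expected = chance_matrix(observed)     # approximately [[68.06, 14.94], [13.94, 3.06]]
```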
9 / 26
Streaming Data Evaluation with Unbalanced Classes
Kappa Statistic
$p_0$: classifier’s prequential accuracy
$p_c$: probability that a chance classifier makes a correct prediction
κ statistic:
$$\kappa = \frac{p_0 - p_c}{1 - p_c}$$
κ = 1 if the classifier is always correct
κ = 0 if the predictions coincide with the correct ones as often as those of the chance classifier
Forgetting mechanism for estimating prequential kappa
Sliding window of size w with the most recent observations
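A sliding-window estimate of kappa can be sketched as follows. The class name `SlidingWindowKappa` is hypothetical, and returning 0 when the chance accuracy equals 1 is a convention chosen for this example. With the confusion matrix from the previous slide, the observed accuracy is 0.85, the chance accuracy is 0.7112, and kappa comes out at roughly 0.48.

```python
from collections import Counter, deque


class SlidingWindowKappa:
    """Kappa statistic over the w most recent (true, predicted) label pairs."""

    def __init__(self, window_size):
        # A bounded deque forgets the oldest observation automatically.
        self.window = deque(maxlen=window_size)

    def add(self, true_label, predicted_label):
        self.window.append((true_label, predicted_label))

    def kappa(self):
        n = len(self.window)
        if n == 0:
            return 0.0
        # p0: fraction of correct predictions inside the window.
        p0 = sum(t == p for t, p in self.window) / n
        # pc: agreement expected by chance from the window's marginals.
        true_counts = Counter(t for t, _ in self.window)
        pred_counts = Counter(p for _, p in self.window)
        pc = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
        if pc == 1.0:
            return 0.0  # assumption: kappa is undefined here, report 0
        return (p0 - pc) / (1 - pc)


estimator = SlidingWindowKappa(window_size=100)
pairs = [("+", "+")] * 75 + [("+", "-")] * 8 + [("-", "+")] * 7 + [("-", "-")] * 10
for true_label, predicted_label in pairs:
    estimator.add(true_label, predicted_label)
window_kappa = estimator.kappa()
```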
10 / 26
Data Stream Mining Methods
Multinomial Naïve Bayes
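Considers a document as a bag-of-words.
Estimates P(w|c), the probability of observing word w given class c, and the prior probability P(c)
Probability of class c given a test document d, where $n_{wd}$ is the number of times word w occurs in d:
$$P(c \mid d) = \frac{P(c) \prod_{w \in d} P(w \mid c)^{n_{wd}}}{P(d)}$$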
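An incremental version of this classifier only needs per-class counters, which suits the streaming setting. The sketch below is illustrative rather than the implementation used in the experiments: the class name `StreamingMultinomialNB` is hypothetical, Laplace (add-one) smoothing is an assumption, and P(d) is dropped because it is the same for every class.

```python
import math
from collections import Counter, defaultdict


class StreamingMultinomialNB:
    """Multinomial Naive Bayes updated one document at a time."""

    def __init__(self):
        self.class_counts = Counter()            # documents seen per class
        self.word_counts = defaultdict(Counter)  # word occurrences per class
        self.total_words = Counter()             # total word occurrences per class
        self.vocabulary = set()

    def learn(self, words, label):
        self.class_counts[label] += 1
        for w in words:
            self.word_counts[label][w] += 1
            self.total_words[label] += 1
            self.vocabulary.add(w)

    def log_scores(self, words):
        """log P(c) + sum over words of n_wd * log P(w|c), for each class c."""
        n_docs = sum(self.class_counts.values())
        v = max(len(self.vocabulary), 1)
        scores = {}
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / n_docs)
            for w, n_wd in Counter(words).items():
                # Assumption: add-one smoothing for unseen words.
                p_w = (self.word_counts[c][w] + 1) / (self.total_words[c] + v)
                score += n_wd * math.log(p_w)
            scores[c] = score
        return scores

    def predict(self, words):
        scores = self.log_scores(words)
        return max(scores, key=scores.get) if scores else None


nb = StreamingMultinomialNB()
nb.learn(["good", "happy"], "pos")
nb.learn(["bad", "sad"], "neg")
prediction = nb.predict(["good"])
```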
11 / 26
Data Stream Mining Methods
Stochastic Gradient Descent
Vanilla stochastic gradient descent with a fixed learning rate
Optimizes the hinge loss with an L2 penalty, as commonly applied to SVMs
Loss function to optimize:
$$\frac{\lambda}{2} \lVert w \rVert^2 + \sum [1 - (y x w + b)]_+$$
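One SGD step on this loss might look like the sketch below. It is not the authors' code: the class name `HingeSGD` and the default values for the learning rate and λ are placeholders, labels are assumed to be in {-1, +1}, and the margin is read as y(w·x + b), the usual SVM convention, which is an assumption about the intended form of the loss.

```python
class HingeSGD:
    """Linear model trained by SGD on hinge loss with an L2 penalty."""

    def __init__(self, learning_rate=0.1, lam=0.0001):
        self.lr = learning_rate   # fixed learning rate
        self.lam = lam            # L2 regularization strength (lambda)
        self.w = {}               # sparse weights: feature name -> coefficient
        self.b = 0.0

    def margin(self, x):
        return sum(self.w.get(f, 0.0) * v for f, v in x.items()) + self.b

    def predict(self, x):
        return 1 if self.margin(x) >= 0 else -1

    def learn(self, x, y):
        """One update on example x (dict of feature values) with label y in {-1, +1}."""
        z = y * self.margin(x)
        # Gradient of the L2 penalty: shrink every weight slightly.
        for f in self.w:
            self.w[f] *= 1.0 - self.lr * self.lam
        # The hinge loss contributes a gradient only when the margin is below 1.
        if z < 1:
            for f, v in x.items():
                self.w[f] = self.w.get(f, 0.0) + self.lr * y * v
            self.b += self.lr * y


sgd = HingeSGD()
for _ in range(20):
    sgd.learn({"good": 1.0}, 1)
    sgd.learn({"bad": 1.0}, -1)
```

Because the weights are stored per feature, the learned coefficient of a word can be read directly from `sgd.w`, which is what makes it possible to track how a term's coefficient changes over the stream.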
12 / 26
Data Stream Mining Methods
Hoeffding Tree
Incremental decision tree for data streams.
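Strategy based on the Hoeffding bound
$$\epsilon = \sqrt{\frac{R^2 \ln(1/\delta)}{2n}}$$
where R is the range of the observed quantity, n is the number of observations, and 1 − δ is the confidence level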
A node is expanded by splitting as soon as there is
sufficient statistical evidence
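The bound and the resulting split test can be sketched as follows. The function names are hypothetical, and comparing the best and second-best split gains against ε follows the usual description of Hoeffding trees rather than anything spelled out on the slide.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """epsilon = sqrt(R^2 * ln(1/delta) / (2n))."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_best_gain, value_range, delta, n):
    """Split once the gap between the two best attributes exceeds epsilon."""
    return (best_gain - second_best_gain) > hoeffding_bound(value_range, delta, n)


epsilon = hoeffding_bound(value_range=1.0, delta=1e-7, n=200)   # roughly 0.2
decision = should_split(0.30, 0.05, value_range=1.0, delta=1e-7, n=200)
```

Since ε shrinks as n grows, a node that cannot be split yet may become splittable after more examples arrive.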
13 / 26
Outline
1 Twitter Streaming Data
2 Twitter Sentiment Classification: Metrics and Methods
3 Empirical results
14 / 26
What is MOA?
Massive Online Analysis (MOA) is a framework for mining data streams.
Based on experience with Weka and VFML
Focussed on classification trees, but lots of active development: clustering, item set and sequence mining, regression
Easy to extend
Easy to design and run experiments
15 / 26
MOA: the bird
The Moa (another native NZ bird) is not only flightless, like the Weka, but also extinct.
16 / 26
Twitter Sentiment Corpora
Twitter Sentiment Corpus
twittersentiment.appspot.com
Alec Go, Richa Bhayani, Karthik Raghunathan, and Lei Huang
Website to research the sentiment for a brand, product, or topic.
Training dataset with messages between April 2009 and June 25, 2009
  800,000 tweets with positive emoticons
  800,000 tweets with negative emoticons
Test dataset manually annotated
  177 negative tweets
  182 positive ones
17 / 26
Twitter Sentiment Corpora
Edinburgh Corpus
http://demeter.inf.ed.ac.uk
Sasa Petrovic, Miles Osborne, and Victor Lavrenko
97 million tweets (14 GB)
Each tweet contains
  timestamp of the tweet
  anonymized user name
  the tweet’s text
  the posting method that was used
Collected between November 11th 2009 and February 1st 2010, using Twitter’s streaming API.
18 / 26
Twitter Empirical Evaluation
Sliding Window Prequential Accuracy
[Plot: accuracy (%, axis 30 to 100) against millions of instances (0.01 to 1.55); series: NB Multinomial, SGD, Hoeffding Tree, Class Distribution]
Figure: Accuracy and Kappa Statistic on twittersentiment corpus
19 / 26
Twitter Empirical Evaluation
Sliding Window Kappa Statistic
[Plot: kappa statistic (axis 0 to 80) against millions of instances (0.01 to 1.55); series: NB Multinomial, SGD, Hoeffding Tree, Class Distribution]
Figure: Accuracy and Kappa Statistic on twittersentiment corpus
19 / 26
Twitter Empirical Evaluation
Sliding Window Prequential Accuracy
[Plot: accuracy (%, axis 75 to 95) against millions of instances (0.01 to 2.08); series: NB Multinomial, SGD, Hoeffding Tree, Class Distribution]
Figure: Accuracy and Kappa Statistic on Edinburgh corpus
20 / 26
Twitter Empirical Evaluation
Sliding Window Kappa Statistic
[Plot: kappa statistic (axis 0 to 100) against millions of instances (0.01 to 2.08); series: NB Multinomial, SGD, Hoeffding Tree, Class Distribution]
Figure: Accuracy and Kappa Statistic on Edinburgh corpus
20 / 26
twittersentiment Corpus
Prequential Accuracy and Kappa
                          Accuracy   Kappa    Time
Multinomial Naïve Bayes   75.05%     50.10%   116.62 sec.
SGD                       82.80%     62.60%   219.54 sec.
Hoeffding Tree            73.11%     46.23%   5525.51 sec.
Total prequential accuracy and Kappa measured on the twittersentiment data stream
21 / 26
Edinburgh Corpus
Prequential Accuracy and Kappa
                          Accuracy   Kappa    Time
Multinomial Naïve Bayes   86.11%     36.15%   173.28 sec.
SGD                       86.26%     31.88%   293.98 sec.
Hoeffding Tree            84.76%     20.40%   6151.51 sec.
Total prequential accuracy and Kappa obtained on the Edinburgh corpus data stream.
22 / 26
SGD coefficient variations on the Edinburgh corpus
            Middle of Stream   End of Stream
Tags        Coefficient        Coefficient     Variation
apple        0.3                0.7             0.4
microsoft   -0.4               -0.1             0.3
facebook    -0.3                0.4             0.7
mcdonalds    0.5                0.1            -0.4
google       0.3                0.6             0.3
disney       0.0                0.0             0.0
bmw          0.0               -0.2            -0.2
pepsi        0.1               -0.6            -0.7
dell         0.2                0.0            -0.2
gucci       -0.4                0.6             1.0
amazon      -0.1               -0.4            -0.3
23 / 26
Summary
Twitter is a new “what’s-happening-right-now” tool
Twitter as a stream mining dataset for real-time predictions
Sliding window Kappa statistic
Recommend SGD-based model
24 / 26
twittersentiment Corpus
Hold-out Accuracy and Kappa
                          Accuracy   Kappa
Multinomial Naïve Bayes   82.45%     64.89%
SGD                       78.55%     57.23%
Hoeffding Tree            69.36%     38.73%
Accuracy and Kappa for the test dataset obtained from twittersentiment
25 / 26
Edinburgh Corpus
Hold-out Accuracy and Kappa
                          Accuracy   Kappa
Multinomial Naïve Bayes   73.81%     47.28%
SGD                       67.41%     34.23%
Hoeffding Tree            60.72%     20.59%
Accuracy and Kappa for the test dataset obtained from twittersentiment using the Edinburgh corpus as training data stream.
26 / 26
