Twitter Sentiment Analysis - final - no personal

PARENTAL ADVISORY
EXPLICIT CONTENT

What Makes a Good Model?
Team Grant
dammnit I'm lit.
&dammnit I kn0
ders b0ut2be kiLLer
traFFic! & ya d0nt
even kn0 h0w haPPy I
am dats its back2sch00l
Or, Twitter Sentiment Analysis:
using models to classify tweets
so you don’t have to

a good model is
1. Valuable
2. Accurate
3. Sophisticated
4. Agile

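1
a good model is valuable

$$
\text{Value per Sentiment} = \frac{\sum_{i,j} \text{monkeys}_i \times \text{typewriters}_j \times \text{Prob}(\text{Duck} + \text{Plane} + \text{Forest})}{1 + \frac{1}{\text{spellcheck}} + 2 \times \text{Understand}(\text{Tweet} \rightarrow \text{Offensive} + \text{Nonsensical})}
$$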

2
a good model is accurate
(and the limits of that accuracy understood)

77%*
*Performance on 20% hold-out sample.
It’s a hell of a lot better on the training sample.
(Obviously.)

22 points better than a 55% hosted sentiment classifier
2 points better than a 75% trained sentiment classifier
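The 20% hold-out caveat can be made concrete with a minimal sketch. It assumes labelled tweets are stored as (tweet, label) pairs; `holdout_split` and `accuracy` are hypothetical helper names, not the team's code, and the fixed seed only makes the split reproducible.

```python
import random

def holdout_split(examples, holdout_frac=0.2, seed=0):
    """Shuffle a copy of the labelled examples and split off a hold-out set.

    Returns (train, holdout). The model is fit only on `train`; its score on
    `holdout` estimates performance on tweets it has never seen.
    """
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_frac)
    return shuffled[n_holdout:], shuffled[:n_holdout]

def accuracy(predictions, labels):
    """Fraction of predictions that match the human labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("need equal-length, non-empty prediction and label lists")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Usage with placeholder data: 10 examples -> 8 for training, 2 held out.
train, holdout = holdout_split([(f"tweet {k}", "+") for k in range(10)])
print(len(train), len(holdout))  # 8 2
```

Scoring the same model on `train` instead would overstate its accuracy, which is exactly what the footnote warns about.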

of the 23% the model got wrong…
model error 41%
neutral 30%
human error 15%
other 13%
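Scaling the breakdown back to the whole hold-out sample is a quick check, assuming the four shares are fractions of the 23% of tweets the model got wrong (they sum to 99%, presumably from rounding):

$$
\begin{aligned}
\text{model error:} &\quad 0.23 \times 0.41 \approx 0.094 \\
\text{neutral + human error + other:} &\quad 0.23 \times (0.30 + 0.15 + 0.13) = 0.23 \times 0.58 \approx 0.133
\end{aligned}
$$

So roughly 9% of all hold-out tweets are outright model mistakes, while about 13% are neutral, mislabelled by the human coder, or fall under "other".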
You ever have those days
where you feel like you = FAIL.
Yeah. It's one of those days.
Model + / Human -
UP is intense! i cried
and laughed
Model - / Human +
Sorry, typo -
Environmentalism.
Model - / Human +
@Zee It's good, but buggy
like a motherfucker.
Model + / Human -
I really hate twitter... i don't
know what i'm doing here
Model - / Human +
so tierd could drop
DEAD x
Model - / Human +
ActiveRecord::HasManyThroughSourceAssociationMacroError: Invalid source reflection
macro :has_one for has_many ->
http://bit.ly/135UWH
Model + / Human -
@Dichenlachman I like that you
abbreviated bathrooms to b'throoms when
b'throoms is the same no. of letters as
bathrooms... Bathrooms
Model - / Human +

bootstrapped hold-out performance
μ = 0.768
±1σ: 0.761 to 0.775
±2σ: 0.755 to 0.781
+3σ: 0.788
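A bootstrap like this can be sketched as follows: resample the per-tweet hit/miss results with replacement many times and look at the spread of the resampled accuracies. The 1,000 correctness flags below are made up to match the slide's mean, the resample count and seed are arbitrary, and the deck does not describe the team's actual procedure. The width of the σ bands depends on how many hold-out tweets there are, so this toy sample will not reproduce the slide's spread.

```python
import random
import statistics

def bootstrap_accuracy(correct, n_resamples=1000, seed=0):
    """Bootstrap the hold-out accuracy.

    `correct` holds one flag per hold-out tweet (1 = classified correctly,
    0 = wrong). Each resample draws len(correct) flags with replacement;
    returns the mean and standard deviation of the resampled accuracies.
    """
    rng = random.Random(seed)
    n = len(correct)
    scores = [sum(rng.choices(correct, k=n)) / n for _ in range(n_resamples)]
    return statistics.mean(scores), statistics.stdev(scores)

# Hypothetical hold-out results: 768 hits and 232 misses.
correct = [1] * 768 + [0] * 232
mu, sigma = bootstrap_accuracy(correct)
for k in (-2, -1, 0, 1, 2, 3):
    print(f"{k:+d} sigma: {mu + k * sigma:.3f}")
```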

3
a good model is sophisticated
(but not too sophisticated)

classification process
raw tweets → NLP & features → model specification → training → analysis

1 raw tweet
MADD-E. it huuuurt like a MOTHERFUCKER fml & i’m not that short & yr tall & i did grow some balls & date night tonight!1! http://bit.ly/Nos9D

2 expand contractions, social media lexicon, corrected XML, repeat replace
MADD-E. it hurt like a MOTHERFUCKER fuck my life & I am not that short & yr tall & i did grow some balls & date night tonight!1! http://bit.ly/Nos9D

3 spellcheck, remove punctuation, remove numbers, all lowercase
made it hurt like a motherfucker fuck my life & i am not that short & your tall & i did grow some balls & date night tonight htp bit ly/nos

4 uni-grams & bi-grams: { made, it, made it, … }

5 vectorize: [ 0 0 1 0 0 … 0 0 1 0 0 1 ]

6 model
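Steps 2 to 5 can be sketched in Python under heavy assumptions: the lookup tables are tiny hypothetical stand-ins for the team's social-media lexicon, contraction list and spellchecker; "corrected XML" is assumed to mean decoding entities such as `&amp;` (done here with the standard library's `html.unescape`); URLs are simply dropped; and lowercasing comes first, with spellcheck running after punctuation is stripped, so that the lookups see clean tokens.

```python
import html
import re

# Hypothetical lookup tables; stand-ins for real lexicons and a spellchecker.
SOCIAL_LEXICON = {"fml": "fuck my life"}
CONTRACTIONS = {"i'm": "i am", "i’m": "i am"}
SPELLING = {"madde": "made", "yr": "your"}

def normalize(tweet):
    """Turn one raw tweet into a list of cleaned tokens."""
    text = html.unescape(tweet)                   # "corrected XML": &amp; -> &
    text = re.sub(r"https?://\S+", " ", text)     # drop URLs (simplification)
    text = text.lower()                           # all lowercase
    text = re.sub(r"([a-z])\1{2,}", r"\1", text)  # repeat replace: huuuurt -> hurt
    tokens = []
    for word in text.split():
        word = CONTRACTIONS.get(word, word)       # expand contractions
        word = SOCIAL_LEXICON.get(word, word)     # social media lexicon
        for part in word.split():
            part = re.sub(r"[^a-z]", "", part)    # remove punctuation and numbers
            part = SPELLING.get(part, part)       # spellcheck (toy lookup)
            if part:
                tokens.append(part)
    return tokens

def ngrams(tokens):
    """Uni-grams followed by bi-grams: made, it, ..., made it, ..."""
    return tokens + [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]

def vectorize(features, vocabulary):
    """Binary bag-of-n-grams vector: 1 if the vocabulary term occurs."""
    present = set(features)
    return [1 if term in present else 0 for term in vocabulary]

tokens = normalize("MADD-E. it huuuurt like a MOTHERFUCKER fml & i’m not that short")
print(tokens[:3])  # ['made', 'it', 'hurt']
print(vectorize(ngrams(tokens), ["made", "it", "made it", "love"]))  # [1, 1, 1, 0]
```

A binary presence vector is one common choice; counts or TF-IDF weights are alternatives, and the deck does not say which the team used.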

why didn’t we do other cool NLP stuff?
[Chart: accuracy (axis 0.74 to 0.78) for tweets, what we did, English only, remove Twitter symbols, remove stopwords, stem]

why does that happen?
raw → spellcheck → normalize case → stem / lemmatize
LOVIN’ → LOVING → loving → love
merged at each step: LOVIN + LOVING (spellcheck), Loving + loving (normalize case), love + loved (stem / lemmatize)
fewer dimensions (good)
less information (bad)
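The trade-off shows up on the slide's own example words. This toy sketch applies each step in turn and counts the distinct tokens left; the spellcheck map and lemma table are hypothetical one-line stand-ins, not a real spellchecker or lemmatizer.

```python
# Hypothetical stand-ins for a spellchecker and a stemmer/lemmatizer.
SPELL_FIXES = {"LOVIN’": "LOVING", "LOVIN": "LOVING"}
LEMMAS = {"loving": "love", "loved": "love"}

STEPS = [
    ("raw", lambda tok: tok),
    ("spellcheck", lambda tok: SPELL_FIXES.get(tok, tok)),
    ("normalize case", str.lower),
    ("stem / lemmatize", lambda tok: LEMMAS.get(tok, tok)),
]

def vocab_sizes(tokens):
    """Apply each step cumulatively; record how many distinct tokens remain."""
    sizes = []
    for name, step in STEPS:
        tokens = [step(tok) for tok in tokens]
        sizes.append((name, len(set(tokens))))
    return sizes

words = ["LOVIN’", "LOVIN", "LOVING", "Loving", "loving", "love", "loved"]
print(vocab_sizes(words))
# [('raw', 7), ('spellcheck', 5), ('normalize case', 3), ('stem / lemmatize', 1)]
```

Seven features collapse into one: the model has fewer dimensions to learn, but can no longer tell an all-caps "LOVIN’" from a plain "love", a difference that may carry sentiment intensity.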

team Grant model specification
[Chart: model specification comparison, axis 0.45 to 0.75]
analysis

how do they work?
linear SVM
naïve Bayes
random forest
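Briefly: a linear SVM learns one weight per feature so that a hyperplane separates positive from negative tweets with the widest margin; a random forest trains many decision trees on bootstrap samples, considering a random subset of features at each split, and lets them vote; naïve Bayes scores each class by its prior times the probability of each word given that class. Naïve Bayes is the simplest to show in full. This is a minimal from-scratch sketch of multinomial naïve Bayes with add-one smoothing, assuming tweets are already token lists; the deck does not say which library or settings the team used.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)            # documents per class
        self.word_counts = defaultdict(Counter)        # word counts per class
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {c: sum(cnt.values()) for c, cnt in self.word_counts.items()}
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / n_docs)             # log prior
            for w in tokens:
                if w in self.vocab:                    # skip words never seen in training
                    score += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best

docs = [["love", "it"], ["so", "happy"], ["hate", "this"], ["so", "tired"]]
labels = ["+", "+", "-", "-"]
nb = NaiveBayes().fit(docs, labels)
print(nb.predict(["love", "happy"]))  # +
print(nb.predict(["hate"]))           # -
```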

consensus
76% / 76% / 74% (individual models)
DEMOCRACY! 77%
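Assuming "consensus" means a simple majority vote, the idea fits in a few lines. A 2-of-3 vote fixes any tweet that only one model gets wrong, which is how an ensemble can edge past each of its members when their mistakes do not fully overlap. Function names and the predictions below are made up; with an even number of models or more than two labels ties are possible, and here they go to the label seen first.

```python
from collections import Counter

def consensus(votes):
    """Majority label among one tweet's votes (ties: first label seen wins)."""
    return Counter(votes).most_common(1)[0][0]

def consensus_all(model_outputs):
    """model_outputs: one list of predictions per model, aligned by tweet."""
    return [consensus(votes) for votes in zip(*model_outputs)]

svm = ["+", "-", "+", "-"]
bayes = ["+", "+", "-", "-"]
forest = ["-", "+", "+", "-"]
print(consensus_all([svm, bayes, forest]))  # ['+', '+', '+', '-']
```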

4
a good model is agile

1. genetically diverse
2. ensemble can handle more libraries / classifiers
3. modular design
a) NLP
b) feature detection
c) models
4. sequential checks
5. quick enough to classify the firehose
6. easily incorporate new cases for re-training
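Items 3 and 6 describe an architecture more than an algorithm. A minimal sketch, with every name hypothetical: NLP steps, feature detection and models are plain callables that can be swapped independently, the models vote as on the consensus slide, and human-labelled cases queue up for re-training. The keyword "models" are toy stand-ins for trained classifiers.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

@dataclass
class Pipeline:
    """Hypothetical modular classifier: each stage is a swappable callable."""
    nlp_steps: Sequence[Callable[[str], str]]        # a) NLP
    featurize: Callable[[str], List[str]]            # b) feature detection
    models: Sequence[Callable[[List[str]], str]]     # c) models
    new_cases: List[Tuple[str, str]] = field(default_factory=list)

    def classify(self, tweet: str) -> str:
        for step in self.nlp_steps:
            tweet = step(tweet)
        features = self.featurize(tweet)
        votes = [model(features) for model in self.models]
        return Counter(votes).most_common(1)[0][0]

    def add_case(self, tweet: str, label: str) -> None:
        """Queue a human-labelled tweet for the next re-training run."""
        self.new_cases.append((tweet, label))

pipeline = Pipeline(
    nlp_steps=[str.strip, str.lower],
    featurize=str.split,
    models=[
        lambda f: "+" if "love" in f else "-",
        lambda f: "+" if "happy" in f else "-",
        lambda f: "-" if "hate" in f else "+",
    ],
)
print(pipeline.classify("  I LOVE this  "))  # +
pipeline.add_case("so tierd could drop DEAD x", "-")
```

Swapping a spellchecker, a feature extractor or a new classifier then means changing one argument rather than rewriting the pipeline.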
