PARENTAL ADVISORY: EXPLICIT CONTENT
What Makes a Good Model?
Team Grant
dammnit I'm lit. &dammnit I kn0 ders b0ut2be kiLLer traFFic! & ya d0nt even kn0 h0w haPPy I am dats its back2sch00l
Or, Twitter Sentiment Analysis: using models to classify tweets so you don’t have to
a good model is
1. Valuable
2. Accurate
3. Sophisticated
4. Agile
1. a good model is valuable
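[equation terms: $\sum_{i,j} \text{monkeys}_i \times \text{typewriters}_j$, $\text{Prob}(\text{Duck} + \text{Plane} + \text{Forest})$, $\text{Value per Sentiment}$, $1 + \frac{1}{\text{spellcheck}}$, $2 \times \text{Understand Tweet} \to \text{Offensive} + \text{Nonsensical}$]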
2. a good model is accurate
(and the limits of that accuracy understood)
77%*
*Performance on 20% hold-out sample. It’s a hell of a lot better on the training sample. (Obviously.)
*2% better than the trained sentiment classifier (75%); the hosted sentiment classifier scored 55%
confusion matrix (share of hold-out tweets):
39% 11%
12% 38%
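The four percentages above form a 2×2 confusion matrix (the editor's notes call it that), so the headline accuracy can be read off its diagonal. A minimal Python sketch, assuming rows are human labels and columns are model predictions; the slide does not say which axis is which, and the accuracy is the same either way:

```python
# Confusion matrix from the slide, as shares of the hold-out sample.
# Assumption: rows = human label, columns = model prediction.
confusion = [
    [0.39, 0.11],
    [0.12, 0.38],
]

# Accuracy is the sum of the diagonal (model and human agree).
accuracy = sum(confusion[i][i] for i in range(2))

# Error is the sum of the off-diagonal cells.
error = confusion[0][1] + confusion[1][0]

# Near-equal off-diagonal cells suggest no strong bias toward one class.
imbalance = abs(confusion[0][1] - confusion[1][0])

print(f"accuracy={accuracy:.2f} error={error:.2f} imbalance={imbalance:.2f}")
```

This reproduces the 77% accuracy and the 23% error rate quoted elsewhere in the deck; the two kinds of mistake differ by only one percentage point.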
of the 23% the model got wrong…
model error 41%
  "You ever have those days where you feel like you = FAIL. Yeah. It's one of those days." (Model + / Human -)
  "UP is intense! i cried and laughed" (Model - / Human +)
neutral 30%
  "Sorry, typo - Environmentalism." (Model - / Human +)
  "@Zee It's good, but buggy like a motherfucker." (Model + / Human -)
human error 15%
  "I really hate twitter... i don't know what i'm doing here" (Model - / Human +)
  "so tierd could drop DEAD x" (Model - / Human +)
other 13%
  "ActiveRecord::HasManyThroughSourceAssociationMacroError: Invalid source reflection macro :has_one for has_many -> http://bit.ly/135UWH" (Model + / Human -)
  "@Dichenlachman I like that you abbreviated bathrooms to b'throoms when b'throoms is the same no. of letters as bathrooms... Bathrooms" (Model - / Human +)
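To express the breakdown as a share of the whole hold-out sample, each category can be multiplied by the 23% error rate. A small Python sketch of that arithmetic; the category names and percentages are copied from the slide and nothing else is assumed:

```python
error_rate = 0.23  # share of hold-out tweets the model got wrong

# Breakdown of those errors as reported on the slide (sums to 99% due to rounding).
breakdown = {
    "model error": 0.41,
    "neutral": 0.30,
    "human error": 0.15,
    "other": 0.13,
}

# Share of ALL hold-out tweets that falls into each category.
share_of_all = {name: error_rate * share for name, share in breakdown.items()}

for name, share in share_of_all.items():
    print(f"{name:12s} {share:.1%} of all hold-out tweets")
```

Read this way, outright model mistakes account for roughly 9% of all hold-out tweets; the rest of the 23% is ambiguous, mislabelled, or unclassifiable input.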
bootstrapped hold-out performance (accuracy)
-2σ 0.755 | -1σ 0.761 | μ 0.768 | +1σ 0.775 | +2σ 0.781 | +3σ 0.788
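The deck does not show how the bootstrap was run. A common approach is to resample the hold-out set with replacement and recompute accuracy each time, then report the mean and standard deviation. A minimal sketch under that assumption; `bootstrap_accuracy` and the synthetic labels are illustrative and are not the team's code or data:

```python
import random
import statistics

def bootstrap_accuracy(y_true, y_pred, n_resamples=1000, seed=0):
    """Return (mean, std) of accuracy over bootstrap resamples of the hold-out set."""
    rng = random.Random(seed)
    n = len(y_true)
    # 1 where the prediction matches the label, 0 otherwise.
    correct = [int(t == p) for t, p in zip(y_true, y_pred)]
    accuracies = []
    for _ in range(n_resamples):
        # Draw n hold-out items with replacement and score them.
        sample = [correct[rng.randrange(n)] for _ in range(n)]
        accuracies.append(sum(sample) / n)
    return statistics.mean(accuracies), statistics.stdev(accuracies)

# Synthetic stand-in data (NOT the deck's tweets): 1000 labels, about 77% predicted correctly.
rng = random.Random(42)
y_true = [rng.randint(0, 1) for _ in range(1000)]
y_pred = [t if rng.random() < 0.77 else 1 - t for t in y_true]

mu, sigma = bootstrap_accuracy(y_true, y_pred)
print(f"mean={mu:.3f}  -2 sigma={mu - 2 * sigma:.3f}  +2 sigma={mu + 2 * sigma:.3f}")
```

The spread narrows as the hold-out set grows, which is why the interval matters as much as the headline number.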
3. a good model is sophisticated
(but not too sophisticated)
classification process: raw tweets → NLP & features → model specification → training → analysis
1 raw tweet
  MADD-E. it huuuurt like a MOTHERFUCKER fml & i’m not that short & yr tall & i did grow some balls & date night tonight!1! http://bit.ly/Nos9D
  → expand contractions, social media lexicon, corrected XML, repeat replace
2
  MADD-E. it hurt like a MOTHERFUCKER fuck my life & I am not that short & yr tall & i did grow some balls & date night tonight!1! http://bit.ly/Nos9D
  → spellcheck, remove punctuation, remove numbers, all lowercase
3
  made it hurt like a motherfucker fuck my life & i am not that short & your tall & i did grow some balls & date night tonight htp bit ly/nos
4 uni-grams, bi-grams: { made, it, made it, … }
5 vectorize: [ 0 0 1 0 0 … 0 0 1 0 0 1 ]
6 model
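The steps on this slide can be sketched in a few lines of Python. This is only an illustration: the lookup tables, the sample tweet, and the vocabulary are hypothetical, the spellcheck step is left out because it needs a dictionary, and collapsing repeated characters to a single character is a simplification of whatever "repeat replace" did in the original pipeline.

```python
import re

# Tiny illustrative lookup tables (hypothetical; the deck's real lists are not shown).
CONTRACTIONS = {"i'm": "i am", "don't": "do not", "can't": "cannot"}
LEXICON = {"omg": "oh my god", "gr8": "great"}

def clean(tweet):
    """Apply the cleaning steps in roughly the order the slide lists them."""
    text = tweet.replace("&amp;", "&")           # undo XML escaping
    words = []
    for word in text.split():
        key = word.lower().replace("’", "'")
        word = CONTRACTIONS.get(key, word)       # expand contractions
        word = LEXICON.get(word.lower(), word)   # social media lexicon
        words.append(word)
    text = " ".join(words)
    text = re.sub(r"(.)\1{2,}", r"\1", text)     # repeat replace: "huuuurt" -> "hurt"
    # (spellcheck would go here; omitted in this sketch)
    text = re.sub(r"[^\w\s]", " ", text)         # remove punctuation
    text = re.sub(r"\d+", " ", text)             # remove numbers
    return " ".join(text.lower().split())        # all lowercase, tidy spaces

def ngrams(tokens):
    """Uni-grams plus bi-grams, e.g. ['made', 'it'] -> ['made', 'it', 'made it']."""
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return tokens + bigrams

def vectorize(features, vocabulary):
    """Binary presence vector over a fixed vocabulary."""
    present = set(features)
    return [1 if term in present else 0 for term in vocabulary]

tweet = "omg i'm soooo happy 2day!!!"                   # made-up example tweet
tokens = clean(tweet).split()
features = ngrams(tokens)
vocabulary = ["happy", "sad", "so happy", "not happy"]  # hypothetical vocabulary
print(clean(tweet))                      # oh my god i am so happy day
print(vectorize(features, vocabulary))   # [1, 0, 1, 0]
```

Keeping each step as its own function mirrors the modular design the deck argues for later: any stage can be swapped out and the effect on accuracy measured.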
why didn’t we do other cool NLP stuff?
[bar chart: accuracy (axis 0.74 to 0.78) for: what we did, English-only tweets, remove Twitter symbols, remove stopwords, stem]
why does that happen?
raw → spellcheck → normalize case → stem / lemmatize
LOVIN’ → LOVING → loving → love
LOVIN, LOVING, Loving, loving, love, loved
fewer dimensions (good), less information (bad)
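The trade-off can be made concrete by counting distinct features before and after normalization. The sketch below uses a toy suffix stripper as a stand-in for a real stemmer or lemmatizer; `normalize` is hypothetical, and its stem ("lov") differs from the "love" shown on the slide.

```python
def normalize(word):
    """Lowercase, then strip a few suffixes (a toy stand-in for a real stemmer)."""
    word = word.lower().rstrip("'’")
    for suffix in ("ing", "in", "ed", "e"):
        # Only strip when at least three characters of stem remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

variants = ["LOVIN’", "LOVING", "Loving", "loving", "love", "loved"]
raw_vocab = set(variants)
stemmed_vocab = {normalize(w) for w in variants}

print(len(raw_vocab), "->", len(stemmed_vocab), stemmed_vocab)
```

Six distinct features collapse into one. That is fewer dimensions for the model to fit, but signals such as tense or shouted capitals, which may carry sentiment, are gone.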
team Grant model specification
[charts for the analysis stage: accuracy axis from 0.45 to 0.75]
how do they work?
linear SVM, naïve Bayes, random forest
individual classifiers: 76%, 76%, 74%
consensus (DEMOCRACY!): 77%
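The deck does not say how the consensus is formed; "DEMOCRACY!" suggests a simple majority vote across the three classifiers. A minimal sketch under that assumption, with made-up predictions standing in for real classifier output:

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by most classifiers for one tweet."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical per-tweet predictions from three classifiers ("+" or "-").
svm_preds    = ["+", "-", "+", "-"]
bayes_preds  = ["+", "+", "-", "-"]
forest_preds = ["-", "-", "+", "-"]

# Zip groups the three votes cast for each tweet.
consensus = [majority_vote(votes) for votes in zip(svm_preds, bayes_preds, forest_preds)]
print(consensus)  # ['+', '-', '+', '-']
```

With three voters and two labels there is never a tie, and a vote can only beat its members when they make different mistakes, which is one reading of the "genetically diverse" point on the next slide.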
4. a good model is agile
1. genetically diverse
2. ensemble can handle more libraries / classifiers
3. modular design
a) NLP
b) feature detection
c) models
4. sequential checks
5. quick enough to classify the firehose
6. easily incorporate new cases for re-training
Twitter Sentiment Analysis - final - no personal

Editor's Notes

  1. Stas
  2. Stas What it says.
  3. Stas Order is important.
  4. Stas
  5. Stas
  6. Stas
  7. Qahir
  8. Qahir
  9. Qahir The data went through Google’s pre-trained sentiment detector: it was terrible. A classifier was then trained using Google’s algorithms: better, but not as good as our model.
  10. Qahir Confusion matrix. Reiterating results. Rounding errors. Symmetry indicates a lack of bias.
  11. Chad [slide can be modified to be all percentages if required] We got our hands dirty and looked at the 23% of tweets the model got wrong. 41% were due to model error: [first tweet] makes sense, strong negative sentiment; [second tweet] the model failed. 30% were neutral: [first tweet] depending on your political views…; [second tweet] contains two distinct sentiments. 15% were grad student error: [first tweet] clear negative sentiment, labelled positive; [second tweet] clear negative sentiment again, labelled positive. 13% were for other reasons: [first tweet] some sort of lookup error; [second tweet] uh…
  12. Chad
  13. Chad
  14. Chad
  15. Chad Read raw tweet. Expanded contractions, like I’m to I am. Social media lexicon dealt with acronyms like fml. Corrected XML extraction errors. Dealt with repeated characters. Then spellcheck. Removed punctuation and numbers. Normalized case. Created one- and two-word features. Finally, vectorised features into ones and zeroes.
  16. Keyi
  17. Keyi
  18. Keyi
  19. Keyi
  20. Francisco
  21. Francisco
  22. Francisco
  23. Francisco
  24. Francisco
  25. Keyi
  26. Keyi