SlideShare a Scribd company logo
Between Custom and
Off-the-shelf NLP
Yves Peirsman
NLP LANDSCAPE
Cloud APIs:
train your own
models
+
Libraries:
pre-trained
models
Cloud APIs:
pre-trained
models
Libraries:
train your own
models
2
SENTIMENT ANALYSIS
“The process of computationally
identifying and categorizing opinions
expressed in a piece of text, especially
in order to determine whether the
writer's attitude towards a particular
topic, product, etc., is positive,
negative, or neutral.”
3
SENTIMENT ANALYSIS: APPLICATIONS
http://tarsier.monkeylearn.com
4
SENTIMENT ANALYSIS: APPLICATIONS
http://varianceexplained.org/r/trump-tweets/
5
SENTIMENT ANALYSIS: APPLICATIONS
http://www.stockfluence.com/
6
OVERVIEW
Off-the-shelf NLP DIY NLP Conclusions
7
NLP LANDSCAPE
Cloud APIs:
train your own
models
+
Libraries:
pre-trained
models
Cloud APIs:
pre-trained
models
Libraries:
train your own
models
8
DATA SETS
Domain Source Categories Baseline
Movie reviews rottentomatoes.com positive, negative 50.0%
Baby products Amazon
positive (****, *****),
negative (*, **)
81.5%
Android apps Amazon
positive (****, *****),
negative (*, **)
72.7%
Android apps Amazon, Wikipedia
positive (****, *****),
negative (*, **), neutral
51.7%
Hotels, restaurants Yelp
positive (****, *****),
negative (*, **)
70.8%
9
DATA SETS: EXAMPLES
Positive
If you're looking for
something scary,
this is the first great
horror film of the
spooky season.
Neutral
Avernum is a series of
demoware role-playing
video games by Jeff Vogel
of Spiderweb Software
available for Macintosh and
Windows-based computers.
Several are available for
iPad and Android tablet.
Negative
It's Starbucks only
with bad customer
service. Baristas with
attitude that don't
know their own
product. If I'm paying
$7.00 for a coffee at
least drop the 'tude.
10
OFF-THE-SHELF NLP
Is it OK to be lazy?
11
1.
Variables
OFF-THE-SHELF MODELS
Data
12
OFF-THE-SHELF MODELS: MOVIE REVIEWS 13
Indico 76.8%
IBM AlchemyAPI 73.4%
Stanford CoreNLP 71.9%
OFF-THE-SHELF MODELS: BABY PRODUCTS 14
Indico 92.6%
MonkeyLearn
(Product)
87.5%
TextBlob Pattern 82.5%
OFF-THE-SHELF MODELS: YELP REVIEWS 15
Indico 92.9%
Google 91.0%
IBM AlchemyAPI 90.4%
OFF-THE-SHELF MODELS: ANDROID APPS (2-WAY) 16
Indico 90.6%
Google 90.5%
MonkeyLearn
(Product)
87.1%
OFF-THE-SHELF MODELS: ANDROID APPS (3-WAY) 17
Indico 80.0%
HavenOnDemand 79.1%
Google 77.8%
OFF-THE-SHELF MODELS: CONCLUSIONS
There is enormous
variation between and
within off-the-shelf
solutions.
High quality is
possible, but not
guaranteed.
Comparing available
solutions on your
data is crucial.
18
DIY MODELS
Are you better off building your own custom models?
19
2.
DIY MODELS: PROCESS 20
Data Library Model
DIY MODELS: PROCESS 21
DIY MODELS: BABY PRODUCTS 22
DIY SVM 93.4%
Best off-the-shelf 92.6%
DIY Naive Bayes 86.3%
DIY MODELS: ANDROID APPS 23
DIY SVM 93.1%
Best off-the-shelf 90.6%
DIY Naive Bayes 90.3%
DIY MODELS: YELP REVIEWS 24
DIY SVM 95.6%
Best off-the-shelf 92.9%
DIY Naive Bayes 89.2%
DIY MODELS: BABY PRODUCTS 25
DIY MODELS: ANDROID APPS 26
DIY MODELS: CONCLUSIONS
You need sufficient
relevant data to build
a good model.
DIY models built with
sufficient data will
typically outperform
off-the-shelf
solutions.
DIY models may or
may not be worth the
effort.
27
Conclusions
28
3.
CONCLUSIONS
Off-the-shelf
no data
little effort
good quality possible, but
not guaranteed
no control
DIY
lots of data
more effort
superior quality
full control
29
BUT SOME ARE
30
WRONG
ALL MODELS ARE
USEFUL
- George Box
31
Any questions?
You can find me at:
» @yvespeirsman
» yves@nlp.town
32THANKS!

More Related Content

Similar to LT-Accelerate 2016: Between Custom and Off-the-shelf NLP

DeKnowledge - Try us
DeKnowledge - Try usDeKnowledge - Try us
DeKnowledge - Try us
Bob Pinto
 
Ruby codebases in an entropic universe
Ruby codebases in an entropic universeRuby codebases in an entropic universe
Ruby codebases in an entropic universe
Niranjan Paranjape
 
Django in the Real World
Django in the Real WorldDjango in the Real World
Django in the Real World
Jacob Kaplan-Moss
 
Infrastructure is development
Infrastructure is developmentInfrastructure is development
Infrastructure is development
stahnma
 
Fixing the program my computer learned: End-user debugging of machine-learned...
Fixing the program my computer learned: End-user debugging of machine-learned...Fixing the program my computer learned: End-user debugging of machine-learned...
Fixing the program my computer learned: End-user debugging of machine-learned...
City University London
 
PHP, AWS, and Sleep - Hampton Roads DevFest 2016
PHP, AWS, and Sleep - Hampton Roads DevFest 2016PHP, AWS, and Sleep - Hampton Roads DevFest 2016
PHP, AWS, and Sleep - Hampton Roads DevFest 2016
Guillermo A. Fisher
 
Recommender Systems at Scale
Recommender Systems at ScaleRecommender Systems at Scale
Recommender Systems at Scale
Eoin Hurrell, PhD
 
API Athens Meetup - API standards 25-6-2014
API Athens Meetup - API standards 25-6-2014API Athens Meetup - API standards 25-6-2014
API Athens Meetup - API standards 25-6-2014
openi_ict
 
API Athens Meetup - API standards 25-6-2014
API Athens Meetup - API standards   25-6-2014API Athens Meetup - API standards   25-6-2014
API Athens Meetup - API standards 25-6-2014
Michael Petychakis
 
Mining apps for anomalies
Mining apps for anomaliesMining apps for anomalies
Mining apps for anomalies
Ahmed Kamel Taha
 
Quality of Bug Reports in Open Source
Quality of Bug Reports in Open SourceQuality of Bug Reports in Open Source
Quality of Bug Reports in Open Source
Thomas Zimmermann
 
Atmosphere Conference 2015: The 10 Myths of DevOps
Atmosphere Conference 2015: The 10 Myths of DevOpsAtmosphere Conference 2015: The 10 Myths of DevOps
Atmosphere Conference 2015: The 10 Myths of DevOps
PROIDEA
 
APIdays Paris 2019 Backend is the new frontend by Antoine Cheron
APIdays Paris 2019 Backend is the new frontend by Antoine CheronAPIdays Paris 2019 Backend is the new frontend by Antoine Cheron
APIdays Paris 2019 Backend is the new frontend by Antoine Cheron
apidays
 
Machine Learning Model for Gender Detection
Machine Learning Model for Gender DetectionMachine Learning Model for Gender Detection
Machine Learning Model for Gender Detection
TecnoIncentive
 
Defend against adversarial AI using Adversarial Robustness Toolbox
Defend against adversarial AI using Adversarial Robustness Toolbox Defend against adversarial AI using Adversarial Robustness Toolbox
Defend against adversarial AI using Adversarial Robustness Toolbox
Animesh Singh
 
Leveraging Open Source Automated Data Science Tools
Leveraging Open Source Automated Data Science ToolsLeveraging Open Source Automated Data Science Tools
Leveraging Open Source Automated Data Science Tools
Domino Data Lab
 
Cinci ug-january2011-anti-patterns
Cinci ug-january2011-anti-patternsCinci ug-january2011-anti-patterns
Cinci ug-january2011-anti-patterns
Steven Smith
 
Web And App Design
Web And App DesignWeb And App Design
Web And App Design
Outreach Digital
 
Web & Mobile App Design for Non-Coders with Bubble.is
Web & Mobile App Design for Non-Coders with Bubble.isWeb & Mobile App Design for Non-Coders with Bubble.is
Web & Mobile App Design for Non-Coders with Bubble.is
James Eckhardt
 
Walter api
Walter apiWalter api
Walter api
Nicholas Schiller
 

Similar to LT-Accelerate 2016: Between Custom and Off-the-shelf NLP (20)

DeKnowledge - Try us
DeKnowledge - Try usDeKnowledge - Try us
DeKnowledge - Try us
 
Ruby codebases in an entropic universe
Ruby codebases in an entropic universeRuby codebases in an entropic universe
Ruby codebases in an entropic universe
 
Django in the Real World
Django in the Real WorldDjango in the Real World
Django in the Real World
 
Infrastructure is development
Infrastructure is developmentInfrastructure is development
Infrastructure is development
 
Fixing the program my computer learned: End-user debugging of machine-learned...
Fixing the program my computer learned: End-user debugging of machine-learned...Fixing the program my computer learned: End-user debugging of machine-learned...
Fixing the program my computer learned: End-user debugging of machine-learned...
 
PHP, AWS, and Sleep - Hampton Roads DevFest 2016
PHP, AWS, and Sleep - Hampton Roads DevFest 2016PHP, AWS, and Sleep - Hampton Roads DevFest 2016
PHP, AWS, and Sleep - Hampton Roads DevFest 2016
 
Recommender Systems at Scale
Recommender Systems at ScaleRecommender Systems at Scale
Recommender Systems at Scale
 
API Athens Meetup - API standards 25-6-2014
API Athens Meetup - API standards 25-6-2014API Athens Meetup - API standards 25-6-2014
API Athens Meetup - API standards 25-6-2014
 
API Athens Meetup - API standards 25-6-2014
API Athens Meetup - API standards   25-6-2014API Athens Meetup - API standards   25-6-2014
API Athens Meetup - API standards 25-6-2014
 
Mining apps for anomalies
Mining apps for anomaliesMining apps for anomalies
Mining apps for anomalies
 
Quality of Bug Reports in Open Source
Quality of Bug Reports in Open SourceQuality of Bug Reports in Open Source
Quality of Bug Reports in Open Source
 
Atmosphere Conference 2015: The 10 Myths of DevOps
Atmosphere Conference 2015: The 10 Myths of DevOpsAtmosphere Conference 2015: The 10 Myths of DevOps
Atmosphere Conference 2015: The 10 Myths of DevOps
 
APIdays Paris 2019 Backend is the new frontend by Antoine Cheron
APIdays Paris 2019 Backend is the new frontend by Antoine CheronAPIdays Paris 2019 Backend is the new frontend by Antoine Cheron
APIdays Paris 2019 Backend is the new frontend by Antoine Cheron
 
Machine Learning Model for Gender Detection
Machine Learning Model for Gender DetectionMachine Learning Model for Gender Detection
Machine Learning Model for Gender Detection
 
Defend against adversarial AI using Adversarial Robustness Toolbox
Defend against adversarial AI using Adversarial Robustness Toolbox Defend against adversarial AI using Adversarial Robustness Toolbox
Defend against adversarial AI using Adversarial Robustness Toolbox
 
Leveraging Open Source Automated Data Science Tools
Leveraging Open Source Automated Data Science ToolsLeveraging Open Source Automated Data Science Tools
Leveraging Open Source Automated Data Science Tools
 
Cinci ug-january2011-anti-patterns
Cinci ug-january2011-anti-patternsCinci ug-january2011-anti-patterns
Cinci ug-january2011-anti-patterns
 
Web And App Design
Web And App DesignWeb And App Design
Web And App Design
 
Web & Mobile App Design for Non-Coders with Bubble.is
Web & Mobile App Design for Non-Coders with Bubble.isWeb & Mobile App Design for Non-Coders with Bubble.is
Web & Mobile App Design for Non-Coders with Bubble.is
 
Walter api
Walter apiWalter api
Walter api
 

Recently uploaded

GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
DianaGray10
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
Neo4j
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
SOFTTECHHUB
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
Claudio Di Ciccio
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
Neo4j
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 

Recently uploaded (20)

GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 

LT-Accelerate 2016: Between Custom and Off-the-shelf NLP