Small Data for Big Problems
Practical Transfer Learning for NLP
Who Am I?
• Founder & CTO, Indico
• Research done at Olin College of Engineering
• Indico focuses on Intelligent Process Automation for Unstructured Content
• Leverages Indico innovation in Transfer Learning for text and image content
Agenda
• Overview of Traditional Approaches to Feature Engineering in NLP
• Introduction to Transfer Learning and Text Embeddings
• Word Embeddings vs. Text Embeddings
• Takeaways and Resources
Assumed Knowledge
• Traditional NLP Basics (e.g. tf-idf vectors)
• Traditional Data Science Basics (e.g. Logistic Regression)
• Generic Math Background (e.g. Vector Spaces)
The Problem With Text
John Malkovitch plays tennis in Winchester. He has been reporting soreness in his elbow. His 60th birthday is in two weeks. After he returns from his birthday trip to Casablanca we will recommend a steroid shot to reduce inflammation.
Feature(s)
• Name
The Problem With Text
(example text as above)
Feature(s)
Name
Traditional Solution(s)
• Tf-idf
• Soundex/NYSIIS encoding
• Ignore – low algorithmic value
The Problem With Text
(example text as above)
Feature(s)
Name
Issue(s)
• Out of Vocabulary
Traditional Solution(s)
• Tf-idf
• Soundex/NYSIIS encoding
• Ignore – low algorithmic value
The Problem With Text
(example text as above)
Feature(s)
• Gender
• Location
• Age
The Problem With Text
(example text as above)
Feature(s)
• Gender
• Location
• Age
Traditional Solution(s)
• Tf-idf
• Hand-coded features (e.g. gender)
• Location dictionary
The Problem With Text
(example text as above)
Feature(s)
• Gender
• Location
• Age
Issue(s)
• Local Context: His birthday vs
his daughter’s birthday
• Brittle gender detection
• Location detection
Traditional Solution(s)
• Tf-idf
• Hand-coded features (e.g. gender)
• Location dictionary
The Problem With Text
(example text as above)
Feature(s)
• Activity
• Prior Affliction/Treatment
• Travel
The Problem With Text
(example text as above)
Feature(s)
• Activity
• Prior Affliction/Treatment
• Travel
Traditional Solution(s)
• Tf-idf
• Parse trees (soreness → elbow)
• Domain-specific lexicon
The Problem With Text
(example text as above)
Feature(s)
• Activity
• Prior Affliction/Treatment
• Travel
Issue(s)
• Linguistic Context (Semantics)
• Error-prone parse trees
• Maintaining the lexicon
Traditional Solution(s)
• Tf-idf
• Parse trees (soreness → elbow)
• Domain-specific lexicon
The Problem With Text
Problem: Linguistic Context
• Traditional Solutions: Stemming, Synonym sets, Lexicons
• Traditional Problems: Brittle, Labor-intensive, Messy real-world data
Problem: Local Context
• Traditional Solutions: Parse trees, N-grams, Phrase lexicon
• Traditional Problems: Inaccurate parsing, Limited context, Messy real-world data
Problem: Out of Vocabulary Issues
• Traditional Solutions: Lemmatization, Expanded vocabulary, Ignore
• Traditional Problems: Computationally expensive, Diminishing returns, Messy real-world data
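The tf-idf baseline that recurs in the solutions above can be sketched in a few lines of pure Python. This is a minimal illustration; the toy corpus and tokenization are hypothetical, and real pipelines add smoothing, normalization, and sparse storage.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: tf-idf weight} dict per tokenized document."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(t for doc in docs for t in set(doc))
    out = []
    for doc in docs:
        tf = Counter(doc)
        # Term frequency scaled by inverse document frequency.
        out.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return out

# Hypothetical toy corpus loosely based on the running example.
docs = [["steroid", "shot", "elbow"], ["elbow", "soreness"], ["birthday", "trip"]]
weights = tfidf(docs)
```

Terms unique to one document (like "steroid") receive a higher idf than "elbow", which appears in two documents, so tf-idf down-weights common terms exactly as the slides assume.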
Problems with
Small Data
Add Linguistic Context (Semantics)
Add Local Context
Prevent Out of Vocabulary Issues
Enter Embeddings: Transfer Learning
What is an Embedding?
Text Space (e.g. English) → Embedding Method (e.g. Word2Vec) → Embedding Space (e.g. ℝ^300)
[0.1, 0.2, 0.8, 0.1, 0.3, 0.6, 0.8, 0.3, …]
What is an Embedding?
Text Space (e.g. English) → Embedding Method (e.g. Word2Vec) → Embedding Space (e.g. ℝ^300)
[0.1, 0.2, 0.8, 0.1, 0.3, 0.6, 0.8, 0.3, …]
Linguistic Context (e.g. Wikipedia) feeds the Embedding Method
Pitfalls
• Sufficient, Diverse Linguistic Context
• Clean Test/Train Splits
• The Curse of Dimensionality
• Effective Benchmarking
King − man + woman ≈ Queen (Royalty)
How do Embeddings Work?
• Meaning is “encoded” into the
embedding space
• Individual dimensions are not
human interpretable
• Embedding method learns by
examining large corpora of
generic language
• Goal is accurate language
representation as a proxy for
downstream performance
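The king/queen analogy above can be reproduced with plain vector arithmetic and cosine similarity. The 4-dimensional vectors here are made-up stand-ins for real learned embeddings (e.g. 300-d GloVe vectors), chosen only so the analogy works:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical toy vectors; real embeddings are learned from large corpora.
vecs = {
    "king":  np.array([0.9, 0.8, 0.1, 0.2]),
    "queen": np.array([0.9, 0.1, 0.8, 0.2]),
    "man":   np.array([0.1, 0.9, 0.1, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9, 0.1]),
}

# king - man + woman lands nearest to queen.
target = vecs["king"] - vecs["man"] + vecs["woman"]
best = max((w for w in vecs if w != "king"), key=lambda w: cosine(target, vecs[w]))
```

No individual dimension here "means" royalty or gender; the relationship is encoded in the geometry, which is the point of the slide.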
“Word” Embeddings
Examples
• Word2vec
• GloVe
• fastText
“Word” Embeddings
Token Value
“great” [0.1, 0.3, …]
… …
Examples In Practice
• Word2vec
• GloVe
• fastText
“Word” Embeddings
Token Value
“great” [0.1, 0.3, …]
… …
Examples In Practice
Training
• CBOW: The quick brown fox _____ over the lazy dog (predict the center word from its context)
• Skip-gram: ___ ___ ____ ___ jumps ___ __ ___ ___ (predict the context from the center word)
• Word2vec
• GloVe
• fastText
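The skip-gram objective above trains on (center, context) pairs drawn from a sliding window. A minimal sketch of how those pairs are generated (the window size is a free parameter; real implementations also subsample frequent words):

```python
def skipgram_pairs(tokens, window=2):
    """Generate (center, context) training pairs for a skip-gram model."""
    pairs = []
    for i, center in enumerate(tokens):
        # Every token within `window` positions of the center is a context word.
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

sent = "the quick brown fox jumps over the lazy dog".split()
pairs = skipgram_pairs(sent)
# "jumps" is paired with its four neighbours: brown, fox, over, the
```

CBOW simply inverts each pair: the model sees the whole context window and predicts the center word instead.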
Do They Really Preserve Algorithmic Value?
• Embeddings generally
outperform raw text at low data
volumes
• Leveraging large, generic text
corpora improves
generalizability
• This is 4-year-old tech; embeddings have improved drastically since, while raw text features have not.
Reported numbers are the average of 5 runs of randomly sampled test/train splits, each reporting the average of a 5-fold CV within which Logistic Regression hyperparameters are optimized. Generated using Enso.
[Chart: GloVe Benchmark (Movie Review Sentiment Analysis) — Accuracy (0.5–0.9) vs. Number of Data Points (50–500); series: tf-idf, GloVe]
Problems with
Small Data
Add Linguistic Context (Semantics)
Add Local Context
Prevent Out of Vocabulary Issues
Text Embeddings
Examples
• Doc2vec
• ELMo
• ULMFiT
Text Embeddings
Examples
In Practice
Often built on top of pre-trained word embeddings
• Doc2vec
• ELMo
• ULMFiT
Text Embeddings
Examples In Practice
Training
"The quick brown fox jumps over the lazy" → sequence of per-token embedding vectors →
• Language-model objective: predict the next word ("dog")
• Supervised objective: predict the task label (True)
Often built on top of pre-trained word embeddings
• Doc2vec
• ELMo
• ULMFiT
Text Embeddings
CNN-Style
"The quick brown fox jumps over the lazy" → token-embedding matrix → convolution + max-pooling → Prediction
https://arxiv.org/pdf/1408.5882.pdf
Example
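A minimal NumPy sketch of the CNN-style encoder: filters slide over windows of consecutive token vectors and the results are max-pooled into a fixed-size feature vector. Random vectors stand in for pre-trained embeddings, and the filters here are random rather than learned as in the linked paper:

```python
import numpy as np

rng = np.random.default_rng(0)
tokens = "the quick brown fox jumps over the lazy".split()
d, k, n_filters = 8, 3, 4          # embedding dim, filter width, filter count

# Hypothetical random embeddings standing in for pre-trained word vectors.
emb = {w: rng.standard_normal(d) for w in set(tokens)}
x = np.stack([emb[w] for w in tokens])                    # (8 tokens, d)

filters = rng.standard_normal((n_filters, k, d))

# Apply each filter to every window of k consecutive tokens,
# then max-pool over positions to get one feature per filter.
windows = np.stack([x[i:i + k] for i in range(len(tokens) - k + 1)])  # (6, k, d)
feature_maps = np.einsum('wkd,fkd->wf', windows, filters)             # (6, n_filters)
features = feature_maps.max(axis=0)                       # (n_filters,) pooled features
```

Max-pooling is what makes the output length-independent: however many tokens come in, the classifier always sees `n_filters` numbers.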
Text Embeddings
RNN-Style
"The quick brown fox jumps over the lazy" → token vectors fed in sequence → Memory state (initialized to zeros) updated through a σ unit at each step → Output → Prediction
https://arxiv.org/pdf/1802.05365.pdf
Example
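The memory-update loop above can be sketched as a vanilla RNN step in NumPy. Random weights and embeddings are hypothetical, and real models such as ELMo use LSTM cells with gating, not this plain σ update:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
tokens = "the quick brown fox jumps over the lazy".split()
d, h = 8, 5                          # embedding dim, hidden ("memory") size

emb = {w: rng.standard_normal(d) for w in set(tokens)}   # hypothetical vectors
W_x = rng.standard_normal((h, d)) * 0.1                  # input-to-hidden weights
W_h = rng.standard_normal((h, h)) * 0.1                  # hidden-to-hidden weights

state = np.zeros(h)                  # memory starts at all zeros, as in the diagram
for w in tokens:                     # feed the token vectors in order
    state = sigmoid(W_x @ emb[w] + W_h @ state)

# `state` now summarizes the whole sequence and would feed a prediction layer.
```

Because each update mixes the current token with the previous state, the final vector carries local context from the entire sentence, which is what the word-embedding table alone cannot do.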
Add Linguistic Context (Semantics)
Add Local Context
Prevent Out of Vocabulary Issues
Problems with
Small Data
The Power of Context
We used a bytepair encoding (BPE) vocabulary…
significantly improving upon the state of the art in 9 out of
the 12 tasks studied
- Improving Language Understanding by Generative Pre-Training*
* https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf
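Byte-pair encoding itself is simple to sketch: repeatedly merge the most frequent adjacent symbol pair into a new vocabulary symbol. A character-level toy version (the word frequencies are hypothetical, and production BPE as used in the quoted paper differs in details like end-of-word markers):

```python
from collections import Counter

def bpe_merges(words, n_merges):
    """Learn byte-pair-encoding merges from a toy word-frequency dict."""
    vocab = {tuple(w): c for w, c in words.items()}   # word as symbol tuple -> count
    merges = []
    for _ in range(n_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for sym, c in vocab.items():
            for a, b in zip(sym, sym[1:]):
                pairs[(a, b)] += c
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word with the winning pair fused into one symbol.
        new_vocab = {}
        for sym, c in vocab.items():
            out, i = [], 0
            while i < len(sym):
                if i + 1 < len(sym) and (sym[i], sym[i + 1]) == best:
                    out.append(sym[i] + sym[i + 1])
                    i += 2
                else:
                    out.append(sym[i])
                    i += 1
            new_vocab[tuple(out)] = c
        vocab = new_vocab
    return merges

merges = bpe_merges({"lower": 5, "lowest": 2, "newer": 6}, n_merges=2)
```

Because every word decomposes into known subword symbols, BPE sidesteps the out-of-vocabulary problem from the earlier slides while keeping the vocabulary small.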
Problems with
Small Data
Add Linguistic Context (Semantics)
Add Local Context
Prevent Out of Vocabulary Issues
Do They Really Preserve Algorithmic Value?
• Newer transfer learning techniques have made deep learning at low data volumes tractable
• Even when operating on top of byte-pair encodings, sufficient context is retained to achieve state-of-the-art performance
• 4x error reduction over tf-idf
Reported numbers are the average of 5 runs of randomly sampled test/train splits
each reporting the average of a 5-fold cv, within which Logistic Regression
hyperparameters are optimized. Generated using Enso
[Chart: Finetune Benchmark (Movie Review Sentiment Analysis) — Accuracy (0.5–0.9) vs. Number of Data Points (50–500); series: tf-idf, GloVe, Finetune]
Takeaways
• At low data volumes, embeddings drastically improve accuracy via transfer learning
• The transfer learning space moves very quickly: GloVe adoption is still low, yet GloVe is already out of date
• This is just basic framing; practical use of embeddings is more complex. See our session at DSS to learn more
Resources
• GitHub library – Finetune (https://github.com/indicodatasolutions/finetune)
• GitHub library – Enso (https://github.com/indicodatasolutions/enso)
• Indico Machine learning newsletter
(indico.io)
• Deep Learning Book
(https://www.deeplearningbook.org/)
Questions?
• slater@indico.io
• Quora: https://www.quora.com/profile/Slater-Ryan-Victoroff
The Real Problem With Text
Select Features → Optimize Hyperparameters → Test/Train Split → Train Model → Evaluate Errors and View Test Error
Feature Engineering?
Standard Data Science?
The Real Problem With Text
(workflow as above)
Overfitting
Test/Train Contamination
The Real Problem With Text
(workflow and issues as above)
Manual feature engineering leads to inaccurate perceptions of performance
