SlideShare a Scribd company logo
RECURRENT NEURAL
NETWORKS FOR TEXT
ANALYSIS
Alec Radford
O P E N
D A T A
S C I E N C E
C O N F E R E N C E_
BOSTON 2015
@opendatasci
Recurrent Neural
Networks for text analysis
From idea to practice
ALEC RADFORD
Follow Along
Slides at: http://goo.gl/WLsUWv
How ML
-0.15, 0.2, 0, 1.5
A, B, C, D
The cat sat on the
mat.
Numerical, great!
Categorical, great!
Uhhh…….
How text is dealt with
(ML perspective)
Text
Features
(bow, TFIDF, LSA, etc...)
Linear Model
(SVM, softmax)
Structure is important!
The cat sat on the mat.
sat the on mat cat the
● Certain tasks, structure is essential:
○ Humor
○ Sarcasm
● Certain tasks, ngrams can get you a
long way:
○ Sentiment Analysis
○ Topic detection
● Specific words can be strong indicators
○ useless, fantastic (sentiment)
○ hoop, green tea, NASDAQ (topic)
Structure is hard
Ngrams is typical way of preserving some structure.
sat
the on
mat
cat
the cat cat sat sat on
on thethe mat
Beyond bi or tri-grams occurrences become very rare and
dimensionality becomes huge (1, 10 million + features)
Structure is hard
How text is dealt with
(ML perspective)
Text
Features
(bow, TFIDF, LSA, etc...)
Linear Model
(SVM, softmax)
How text should be dealt with?
Text RNN
Linear Model
(SVM, softmax)
How an RNN works
the cat sat on the mat
How an RNN works
the cat sat on the mat
input to hidden
How an RNN works
the cat sat on the mat
input to hidden
hidden to hidden
How an RNN works
the cat sat on the mat
input to hidden
hidden to hidden
How an RNN works
the cat sat on the mat
projections
(activities x weights)
activities
(vectors of values)
input to hidden
hidden to hidden
How an RNN works
the cat sat on the mat
projections
(activities x weights)
activities
(vectors of values)
Learned representation of
sequence.
input to hidden
hidden to hidden
How an RNN works
the cat sat on the mat
projections
(activities x weights)
activities
(vectors of values)
cat
hidden to output
input to hidden
hidden to hidden
From text to RNN input
the cat sat on the mat
“The cat sat on the mat.”
Tokenize .
Assign index 0 1 2 3 0 4 5
String input
Embedding lookup 2.5 0.3 -1.2 0.2 -3.3 0.7 -4.1 1.6 2.8 1.1 5.7 -0.2 2.5 0.3 -1.2 1.4 0.6 -3.9 -3.8 1.5 0.1
2.5 0.3 -1.2
0.2 -3.3 0.7
-4.1 1.6 2.8
1.1 5.7 -0.2
1.4 0.6 -3.9
-3.8 1.5 0.1
Learned matrix
You can stack them too
the cat sat on the mat
cat
hidden to output
input to hidden
hidden to hidden
But aren’t RNNs unstable?
Simple RNNs trained with SGD are unstable/difficult to learn.
But modern RNNs with various tricks blow up much less often!
● Gating Units
● Gradient Clipping
● Steeper gates
● Better initialization
● Better optimizers
● Bigger datasets
Simple Recurrent Unit
ht-1
xt
+ ht
xt+1
+ ht+1
+ Element wise addition
Activation function
Routes information can propagate along
Involved in modifying information flow and
values
⊙
⊙⊙
Gated Recurrent Unit - GRU
xt
r
htht-1 ht
z
+
~
1-z z
+ Element wise addition
⊙ Element wise multiplication
Routes information can propagate along
Involved in modifying information flow and
values
Gated Recurrent Unit - GRU
⊙
⊙⊙
xt
r
htht-1
z
+
~
1-z z
⊙
⊙⊙
xt+1
r
ht+1ht
z
+
~
1-z z
ht+1
Gating is important
For sentiment analysis of longer
sequences of text (paragraph or so)
a simple RNN has difficulty learning
at all while a gated RNN does so
easily.
Which One?
There are two types of gated RNNs:
● Gated Recurrent Units (GRU) by
K. Cho, recently introduced and
used for machine translation and
speech recognition tasks.
● Long short term memory (LSTM)
by S. Hochreiter and J.
Schmidhuber has been around
since 1997 and has been used
far more. Various modifications
to it exist.
Which One?
GRU is simpler, faster, and optimizes
quicker (at least on sentiment).
Because it only has two gates
(compared to four) approximately 1.5-
1.75x faster for theano
implementation.
If you have a huge dataset and don’t
mind waiting LSTM may be better in
the long run due to its greater
complexity - especially if you add
peephole connections.
Exploding Gradients?
Exploding gradients are a major problem
for traditional RNNs trained with SGD.
One of the sources of the reputation of
RNNs being hard to train.
In 2012, R Pascanu and T. Mikolov
proposed clipping the norm of the gradient
to alleviate this.
Modern optimizers don’t seem to have this
problem - at least for classification text
analysis.
Better Gating Functions
Interesting paper at NIPS workshop (Q. Lyu, J. Zhu) - make the gates “steeper” so
they change more rapidly from “off” to “on” so model learns to use them quicker.
Better Initialization
Andrew Saxe last year showed that initializing weight matrices with random
orthogonal matrices works better than random gaussian (or uniform) matrices.
In addition, Richard Socher (and more recently Quoc Le) have used identity
initialization schemes which work great as well.
Understanding Optimizers
2D moons dataset
courtesy of scikit-learn
Comparing Optimizers
Adam (D. Kingma) combines the
early optimization speed of
Adagrad (J. Duchi) with the better
later convergence of various other
methods like Adadelta (M. Zeiler)
and RMSprop (T. Tieleman).
Warning: Generalization
performance of Adam seems
slightly worse for smaller datasets.
It adds up
Up to 10x more efficient training once you
add all the tricks together compared to a
naive implementation - much more stable
- rarely diverges.
Around 7.5x faster, the various tricks add
a bit of computation time.
Too much? - Overfitting
RNNs can overfit very well as we will
see. As they continue to fit to training
dataset, their performance on test data
will plateau or even worsen.
Keep track of it using a validation set,
save model at each iteration over
training data and pick the earliest, best,
validation performance.
The Showdown
Model #1 Model #2
+ 512 dim
embedding
512 dim
hidden state
output
Using bigrams and grid search on min_df for
vectorizer and regularization coefficient for model.
Using whatever I tried that worked :)
Adam, GRU, steeper sigmoid gates, ortho/identity
Sentiment & Helpfulness
Effect of Dataset Size
● RNNs have poor generalization properties on small
datasets.
○ 1K labeled examples 25-50% worse than linear model…
● RNNs have better generalization properties on large
datasets.
○ 1M labeled examples 0-30% better than linear model.
● Crossovers between 10K and 1M examples
○ Depends on dataset.
The Thing we don’t talk about
For 1 million paragraph sized text examples to converge:
● Linear model takes 30 minutes on a single CPU core.
● RNN takes 90 minutes on a Titan X.
● RNN takes five days on a single CPU core.
RNN is about 250x slower on CPU than linear model…
This is why we use GPUs
Visualizing
representations of
words learned via
sentiment
TSNE - L.J.P. van derIndividual words colored by average sentiment
Negative
Positive
Model learns to separate negative and positive words, not too surprising
Quantities of TimeQualifiers
Product nouns
Punctuation
Much cooler, model also begins to learn components of language from only binary sentiment labels
The library - Passage
● Tiny RNN library built on top of Theano
● https://github.com/IndicoDataSolutions/Passage
● Still alpha - we’re working on it!
● Supports simple, LSTM, and GRU recurrent layers
● Supports multiple recurrent layers
● Supports deep input to and deep output from hidden layers
○ no deep transitions currently
● Supports embedding and onehot input representations
● Can be used for both regression and classification problems
○ Regression needs preprocessing for stability - working on it
● Much more in the pipeline
An example
Sentiment analysis of movie reviews - 25K labeled examples
RNN imports
RNN imports
preprocessing
RNN imports
preprocessing
load training data
RNN imports
preprocessing
tokenize data
load training data
RNN imports
preprocessing
configure model
tokenize data
load training data
RNN imports
preprocessing
make and train model
tokenize data
load training data
configure model
RNN imports
preprocessing
load test data
make and train model
tokenize data
load training data
configure model
RNN imports
preprocessing
predict on test data
load test data
make and train model
tokenize data
load training data
configure model
The results
Top 10! - barely :)
Summary
● RNNs look to be a competitive tool in certain situations
for text analysis.
● Especially if you have a large 1M+ example dataset
o A GPU or great patience is essential
● Otherwise it can be difficult to justify over linear models
o Speed
o Complexity
o Poor generalization with small datasets
Contact
alec@indico.io
We’re hiring!
● Data Engineer
● Infrastructure Engineer
● Interested?
o contact@indico.io (or talk-to/email me after pres.)
Questions?

More Related Content

What's hot

Introduction to Recurrent Neural Network
Introduction to Recurrent Neural NetworkIntroduction to Recurrent Neural Network
Introduction to Recurrent Neural Network
Yan Xu
 
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
Simplilearn
 
Pytorch
PytorchPytorch
Pytorch
ehsan tr
 
Introduction to Recurrent Neural Network
Introduction to Recurrent Neural NetworkIntroduction to Recurrent Neural Network
Introduction to Recurrent Neural Network
Knoldus Inc.
 
Deep Learning: Recurrent Neural Network (Chapter 10)
Deep Learning: Recurrent Neural Network (Chapter 10) Deep Learning: Recurrent Neural Network (Chapter 10)
Deep Learning: Recurrent Neural Network (Chapter 10)
Larry Guo
 
Notes on attention mechanism
Notes on attention mechanismNotes on attention mechanism
Notes on attention mechanism
Khang Pham
 
What Is A Neural Network? | How Deep Neural Networks Work | Neural Network Tu...
What Is A Neural Network? | How Deep Neural Networks Work | Neural Network Tu...What Is A Neural Network? | How Deep Neural Networks Work | Neural Network Tu...
What Is A Neural Network? | How Deep Neural Networks Work | Neural Network Tu...
Simplilearn
 
Deep learning
Deep learningDeep learning
Deep learning
Ratnakar Pandey
 
LSTM Basics
LSTM BasicsLSTM Basics
LSTM Basics
Akshay Sehgal
 
Introduction to PyTorch
Introduction to PyTorchIntroduction to PyTorch
Introduction to PyTorch
Jun Young Park
 
Deep learning.pptx
Deep learning.pptxDeep learning.pptx
Deep learning.pptx
MdMahfoozAlam5
 
Introduction to CNN
Introduction to CNNIntroduction to CNN
Introduction to CNN
Shuai Zhang
 
Transformer Introduction (Seminar Material)
Transformer Introduction (Seminar Material)Transformer Introduction (Seminar Material)
Transformer Introduction (Seminar Material)
Yuta Niki
 
Deep learning - A Visual Introduction
Deep learning - A Visual IntroductionDeep learning - A Visual Introduction
Deep learning - A Visual Introduction
Lukas Masuch
 
Intro to Neural Networks
Intro to Neural NetworksIntro to Neural Networks
Intro to Neural Networks
Dean Wyatte
 
Rnn and lstm
Rnn and lstmRnn and lstm
Rnn and lstm
Shreshth Saxena
 
Seq2Seq (encoder decoder) model
Seq2Seq (encoder decoder) modelSeq2Seq (encoder decoder) model
Seq2Seq (encoder decoder) model
佳蓉 倪
 
Convolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular ArchitecturesConvolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular Architectures
ananth
 
Deep Learning for Computer Vision: Data Augmentation (UPC 2016)
Deep Learning for Computer Vision: Data Augmentation (UPC 2016)Deep Learning for Computer Vision: Data Augmentation (UPC 2016)
Deep Learning for Computer Vision: Data Augmentation (UPC 2016)
Universitat Politècnica de Catalunya
 

What's hot (20)

Introduction to Recurrent Neural Network
Introduction to Recurrent Neural NetworkIntroduction to Recurrent Neural Network
Introduction to Recurrent Neural Network
 
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
 
Pytorch
PytorchPytorch
Pytorch
 
Introduction to Recurrent Neural Network
Introduction to Recurrent Neural NetworkIntroduction to Recurrent Neural Network
Introduction to Recurrent Neural Network
 
Deep Learning: Recurrent Neural Network (Chapter 10)
Deep Learning: Recurrent Neural Network (Chapter 10) Deep Learning: Recurrent Neural Network (Chapter 10)
Deep Learning: Recurrent Neural Network (Chapter 10)
 
Notes on attention mechanism
Notes on attention mechanismNotes on attention mechanism
Notes on attention mechanism
 
What Is A Neural Network? | How Deep Neural Networks Work | Neural Network Tu...
What Is A Neural Network? | How Deep Neural Networks Work | Neural Network Tu...What Is A Neural Network? | How Deep Neural Networks Work | Neural Network Tu...
What Is A Neural Network? | How Deep Neural Networks Work | Neural Network Tu...
 
Recurrent neural network
Recurrent neural networkRecurrent neural network
Recurrent neural network
 
Deep learning
Deep learningDeep learning
Deep learning
 
LSTM Basics
LSTM BasicsLSTM Basics
LSTM Basics
 
Introduction to PyTorch
Introduction to PyTorchIntroduction to PyTorch
Introduction to PyTorch
 
Deep learning.pptx
Deep learning.pptxDeep learning.pptx
Deep learning.pptx
 
Introduction to CNN
Introduction to CNNIntroduction to CNN
Introduction to CNN
 
Transformer Introduction (Seminar Material)
Transformer Introduction (Seminar Material)Transformer Introduction (Seminar Material)
Transformer Introduction (Seminar Material)
 
Deep learning - A Visual Introduction
Deep learning - A Visual IntroductionDeep learning - A Visual Introduction
Deep learning - A Visual Introduction
 
Intro to Neural Networks
Intro to Neural NetworksIntro to Neural Networks
Intro to Neural Networks
 
Rnn and lstm
Rnn and lstmRnn and lstm
Rnn and lstm
 
Seq2Seq (encoder decoder) model
Seq2Seq (encoder decoder) modelSeq2Seq (encoder decoder) model
Seq2Seq (encoder decoder) model
 
Convolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular ArchitecturesConvolutional Neural Networks : Popular Architectures
Convolutional Neural Networks : Popular Architectures
 
Deep Learning for Computer Vision: Data Augmentation (UPC 2016)
Deep Learning for Computer Vision: Data Augmentation (UPC 2016)Deep Learning for Computer Vision: Data Augmentation (UPC 2016)
Deep Learning for Computer Vision: Data Augmentation (UPC 2016)
 

Similar to Recurrent Neural Networks for Text Analysis

deepnet-lourentzou.ppt
deepnet-lourentzou.pptdeepnet-lourentzou.ppt
deepnet-lourentzou.ppt
yang947066
 
Distributed deep learning_over_spark_20_nov_2014_ver_2.8
Distributed deep learning_over_spark_20_nov_2014_ver_2.8Distributed deep learning_over_spark_20_nov_2014_ver_2.8
Distributed deep learning_over_spark_20_nov_2014_ver_2.8
Vijay Srinivas Agneeswaran, Ph.D
 
Deep learning for NLP and Transformer
 Deep learning for NLP  and Transformer Deep learning for NLP  and Transformer
Deep learning for NLP and Transformer
Arvind Devaraj
 
presentation.ppt
presentation.pptpresentation.ppt
presentation.ppt
MadhuriChandanbatwe
 
Deep Learning for Developers (Advanced Workshop)
Deep Learning for Developers (Advanced Workshop)Deep Learning for Developers (Advanced Workshop)
Deep Learning for Developers (Advanced Workshop)
Amazon Web Services
 
Deep Dive on Deep Learning (June 2018)
Deep Dive on Deep Learning (June 2018)Deep Dive on Deep Learning (June 2018)
Deep Dive on Deep Learning (June 2018)
Julien SIMON
 
Chatbot ppt
Chatbot pptChatbot ppt
Chatbot ppt
Manish Mishra
 
Machine Learning with Spark
Machine Learning with SparkMachine Learning with Spark
Machine Learning with Spark
elephantscale
 
Corinna Cortes, Head of Research, Google, at MLconf NYC 2017
Corinna Cortes, Head of Research, Google, at MLconf NYC 2017Corinna Cortes, Head of Research, Google, at MLconf NYC 2017
Corinna Cortes, Head of Research, Google, at MLconf NYC 2017
MLconf
 
Training at AI Frontiers 2018 - Lukasz Kaiser: Sequence to Sequence Learning ...
Training at AI Frontiers 2018 - Lukasz Kaiser: Sequence to Sequence Learning ...Training at AI Frontiers 2018 - Lukasz Kaiser: Sequence to Sequence Learning ...
Training at AI Frontiers 2018 - Lukasz Kaiser: Sequence to Sequence Learning ...
AI Frontiers
 
AI and Deep Learning
AI and Deep Learning AI and Deep Learning
AI and Deep Learning
Subrat Panda, PhD
 
Accelerating stochastic gradient descent using adaptive mini batch size3
Accelerating stochastic gradient descent using adaptive mini batch size3Accelerating stochastic gradient descent using adaptive mini batch size3
Accelerating stochastic gradient descent using adaptive mini batch size3
muayyad alsadi
 
Predicting Tweet Sentiment
Predicting Tweet SentimentPredicting Tweet Sentiment
Predicting Tweet Sentiment
Lucinda Linde
 
B4UConference_machine learning_deeplearning
B4UConference_machine learning_deeplearningB4UConference_machine learning_deeplearning
B4UConference_machine learning_deeplearning
Hoa Le
 
TensorfLow_Basic.pptx
TensorfLow_Basic.pptxTensorfLow_Basic.pptx
TensorfLow_Basic.pptx
TMUb202109065
 
BERT- Pre-training of Deep Bidirectional Transformers for Language Understand...
BERT- Pre-training of Deep Bidirectional Transformers for Language Understand...BERT- Pre-training of Deep Bidirectional Transformers for Language Understand...
BERT- Pre-training of Deep Bidirectional Transformers for Language Understand...
Kyuri Kim
 
Deep Learning and Watson Studio
Deep Learning and Watson StudioDeep Learning and Watson Studio
Deep Learning and Watson Studio
Sasha Lazarevic
 
Deeplearning on Hadoop @OSCON 2014
Deeplearning on Hadoop @OSCON 2014Deeplearning on Hadoop @OSCON 2014
Deeplearning on Hadoop @OSCON 2014
Adam Gibson
 
Georgia Tech cse6242 - Intro to Deep Learning and DL4J
Georgia Tech cse6242 - Intro to Deep Learning and DL4JGeorgia Tech cse6242 - Intro to Deep Learning and DL4J
Georgia Tech cse6242 - Intro to Deep Learning and DL4J
Josh Patterson
 
Deep Learning Enabled Question Answering System to Automate Corporate Helpdesk
Deep Learning Enabled Question Answering System to Automate Corporate HelpdeskDeep Learning Enabled Question Answering System to Automate Corporate Helpdesk
Deep Learning Enabled Question Answering System to Automate Corporate Helpdesk
Saurabh Saxena
 

Similar to Recurrent Neural Networks for Text Analysis (20)

deepnet-lourentzou.ppt
deepnet-lourentzou.pptdeepnet-lourentzou.ppt
deepnet-lourentzou.ppt
 
Distributed deep learning_over_spark_20_nov_2014_ver_2.8
Distributed deep learning_over_spark_20_nov_2014_ver_2.8Distributed deep learning_over_spark_20_nov_2014_ver_2.8
Distributed deep learning_over_spark_20_nov_2014_ver_2.8
 
Deep learning for NLP and Transformer
 Deep learning for NLP  and Transformer Deep learning for NLP  and Transformer
Deep learning for NLP and Transformer
 
presentation.ppt
presentation.pptpresentation.ppt
presentation.ppt
 
Deep Learning for Developers (Advanced Workshop)
Deep Learning for Developers (Advanced Workshop)Deep Learning for Developers (Advanced Workshop)
Deep Learning for Developers (Advanced Workshop)
 
Deep Dive on Deep Learning (June 2018)
Deep Dive on Deep Learning (June 2018)Deep Dive on Deep Learning (June 2018)
Deep Dive on Deep Learning (June 2018)
 
Chatbot ppt
Chatbot pptChatbot ppt
Chatbot ppt
 
Machine Learning with Spark
Machine Learning with SparkMachine Learning with Spark
Machine Learning with Spark
 
Corinna Cortes, Head of Research, Google, at MLconf NYC 2017
Corinna Cortes, Head of Research, Google, at MLconf NYC 2017Corinna Cortes, Head of Research, Google, at MLconf NYC 2017
Corinna Cortes, Head of Research, Google, at MLconf NYC 2017
 
Training at AI Frontiers 2018 - Lukasz Kaiser: Sequence to Sequence Learning ...
Training at AI Frontiers 2018 - Lukasz Kaiser: Sequence to Sequence Learning ...Training at AI Frontiers 2018 - Lukasz Kaiser: Sequence to Sequence Learning ...
Training at AI Frontiers 2018 - Lukasz Kaiser: Sequence to Sequence Learning ...
 
AI and Deep Learning
AI and Deep Learning AI and Deep Learning
AI and Deep Learning
 
Accelerating stochastic gradient descent using adaptive mini batch size3
Accelerating stochastic gradient descent using adaptive mini batch size3Accelerating stochastic gradient descent using adaptive mini batch size3
Accelerating stochastic gradient descent using adaptive mini batch size3
 
Predicting Tweet Sentiment
Predicting Tweet SentimentPredicting Tweet Sentiment
Predicting Tweet Sentiment
 
B4UConference_machine learning_deeplearning
B4UConference_machine learning_deeplearningB4UConference_machine learning_deeplearning
B4UConference_machine learning_deeplearning
 
TensorfLow_Basic.pptx
TensorfLow_Basic.pptxTensorfLow_Basic.pptx
TensorfLow_Basic.pptx
 
BERT- Pre-training of Deep Bidirectional Transformers for Language Understand...
BERT- Pre-training of Deep Bidirectional Transformers for Language Understand...BERT- Pre-training of Deep Bidirectional Transformers for Language Understand...
BERT- Pre-training of Deep Bidirectional Transformers for Language Understand...
 
Deep Learning and Watson Studio
Deep Learning and Watson StudioDeep Learning and Watson Studio
Deep Learning and Watson Studio
 
Deeplearning on Hadoop @OSCON 2014
Deeplearning on Hadoop @OSCON 2014Deeplearning on Hadoop @OSCON 2014
Deeplearning on Hadoop @OSCON 2014
 
Georgia Tech cse6242 - Intro to Deep Learning and DL4J
Georgia Tech cse6242 - Intro to Deep Learning and DL4JGeorgia Tech cse6242 - Intro to Deep Learning and DL4J
Georgia Tech cse6242 - Intro to Deep Learning and DL4J
 
Deep Learning Enabled Question Answering System to Automate Corporate Helpdesk
Deep Learning Enabled Question Answering System to Automate Corporate HelpdeskDeep Learning Enabled Question Answering System to Automate Corporate Helpdesk
Deep Learning Enabled Question Answering System to Automate Corporate Helpdesk
 

More from odsc

Understanding the Chief Data Officer
Understanding the Chief Data Officer Understanding the Chief Data Officer
Understanding the Chief Data Officer
odsc
 
Machine-In-The-Loop for Knowledge Discovery
Machine-In-The-Loop for Knowledge DiscoveryMachine-In-The-Loop for Knowledge Discovery
Machine-In-The-Loop for Knowledge Discovery
odsc
 
API Driven Development
API Driven Development API Driven Development
API Driven Development
odsc
 
Mobile technology Usage by Humanitarian Programs: A Metadata Analysis
Mobile technology Usage by Humanitarian Programs: A Metadata AnalysisMobile technology Usage by Humanitarian Programs: A Metadata Analysis
Mobile technology Usage by Humanitarian Programs: A Metadata Analysis
odsc
 
Productionizing Deep Learning From the Ground Up
Productionizing Deep Learning From the Ground UpProductionizing Deep Learning From the Ground Up
Productionizing Deep Learning From the Ground Up
odsc
 
Big Data Infrastructure: Introduction to Hadoop with MapReduce, Pig, and Hive
Big Data Infrastructure: Introduction to Hadoop with MapReduce, Pig, and HiveBig Data Infrastructure: Introduction to Hadoop with MapReduce, Pig, and Hive
Big Data Infrastructure: Introduction to Hadoop with MapReduce, Pig, and Hive
odsc
 
Think Breadth, Not Depth
Think Breadth, Not DepthThink Breadth, Not Depth
Think Breadth, Not Depth
odsc
 
Data Science at Dow Jones: Monetizing Data, News and Information
Data Science at Dow Jones: Monetizing Data, News and InformationData Science at Dow Jones: Monetizing Data, News and Information
Data Science at Dow Jones: Monetizing Data, News and Information
odsc
 
Spark, Python and Parquet
Spark, Python and Parquet Spark, Python and Parquet
Spark, Python and Parquet
odsc
 
Building a Predictive Analytics Solution with Azure ML
Building a Predictive Analytics Solution with Azure MLBuilding a Predictive Analytics Solution with Azure ML
Building a Predictive Analytics Solution with Azure ML
odsc
 
Beyond Names
Beyond NamesBeyond Names
Beyond Names
odsc
 
How Woman are Conquering the S&P 500
How Woman are Conquering the S&P 500How Woman are Conquering the S&P 500
How Woman are Conquering the S&P 500
odsc
 
Domain Expertise and Unstructured Data
Domain Expertise and Unstructured DataDomain Expertise and Unstructured Data
Domain Expertise and Unstructured Data
odsc
 
Kaggle The Home of Data Science
Kaggle The Home of Data ScienceKaggle The Home of Data Science
Kaggle The Home of Data Science
odsc
 
Open Source Tools & Data Science Competitions
Open Source Tools & Data Science Competitions Open Source Tools & Data Science Competitions
Open Source Tools & Data Science Competitions
odsc
 
Machine Learning with scikit-learn
Machine Learning with scikit-learnMachine Learning with scikit-learn
Machine Learning with scikit-learn
odsc
 
Bridging the Gap Between Data and Insight using Open-Source Tools
Bridging the Gap Between Data and Insight using Open-Source ToolsBridging the Gap Between Data and Insight using Open-Source Tools
Bridging the Gap Between Data and Insight using Open-Source Tools
odsc
 
Top 10 Signs of the Textpocalypse
Top 10 Signs of the TextpocalypseTop 10 Signs of the Textpocalypse
Top 10 Signs of the Textpocalypse
odsc
 
The Art of Data Science
The Art of Data Science The Art of Data Science
The Art of Data Science
odsc
 
Frontiers of Open Data Science Research
Frontiers of Open Data Science ResearchFrontiers of Open Data Science Research
Frontiers of Open Data Science Research
odsc
 

More from odsc (20)

Understanding the Chief Data Officer
Understanding the Chief Data Officer Understanding the Chief Data Officer
Understanding the Chief Data Officer
 
Machine-In-The-Loop for Knowledge Discovery
Machine-In-The-Loop for Knowledge DiscoveryMachine-In-The-Loop for Knowledge Discovery
Machine-In-The-Loop for Knowledge Discovery
 
API Driven Development
API Driven Development API Driven Development
API Driven Development
 
Mobile technology Usage by Humanitarian Programs: A Metadata Analysis
Mobile technology Usage by Humanitarian Programs: A Metadata AnalysisMobile technology Usage by Humanitarian Programs: A Metadata Analysis
Mobile technology Usage by Humanitarian Programs: A Metadata Analysis
 
Productionizing Deep Learning From the Ground Up
Productionizing Deep Learning From the Ground UpProductionizing Deep Learning From the Ground Up
Productionizing Deep Learning From the Ground Up
 
Big Data Infrastructure: Introduction to Hadoop with MapReduce, Pig, and Hive
Big Data Infrastructure: Introduction to Hadoop with MapReduce, Pig, and HiveBig Data Infrastructure: Introduction to Hadoop with MapReduce, Pig, and Hive
Big Data Infrastructure: Introduction to Hadoop with MapReduce, Pig, and Hive
 
Think Breadth, Not Depth
Think Breadth, Not DepthThink Breadth, Not Depth
Think Breadth, Not Depth
 
Data Science at Dow Jones: Monetizing Data, News and Information
Data Science at Dow Jones: Monetizing Data, News and InformationData Science at Dow Jones: Monetizing Data, News and Information
Data Science at Dow Jones: Monetizing Data, News and Information
 
Spark, Python and Parquet
Spark, Python and Parquet Spark, Python and Parquet
Spark, Python and Parquet
 
Building a Predictive Analytics Solution with Azure ML
Building a Predictive Analytics Solution with Azure MLBuilding a Predictive Analytics Solution with Azure ML
Building a Predictive Analytics Solution with Azure ML
 
Beyond Names
Beyond NamesBeyond Names
Beyond Names
 
How Woman are Conquering the S&P 500
How Woman are Conquering the S&P 500How Woman are Conquering the S&P 500
How Woman are Conquering the S&P 500
 
Domain Expertise and Unstructured Data
Domain Expertise and Unstructured DataDomain Expertise and Unstructured Data
Domain Expertise and Unstructured Data
 
Kaggle The Home of Data Science
Kaggle The Home of Data ScienceKaggle The Home of Data Science
Kaggle The Home of Data Science
 
Open Source Tools & Data Science Competitions
Open Source Tools & Data Science Competitions Open Source Tools & Data Science Competitions
Open Source Tools & Data Science Competitions
 
Machine Learning with scikit-learn
Machine Learning with scikit-learnMachine Learning with scikit-learn
Machine Learning with scikit-learn
 
Bridging the Gap Between Data and Insight using Open-Source Tools
Bridging the Gap Between Data and Insight using Open-Source ToolsBridging the Gap Between Data and Insight using Open-Source Tools
Bridging the Gap Between Data and Insight using Open-Source Tools
 
Top 10 Signs of the Textpocalypse
Top 10 Signs of the TextpocalypseTop 10 Signs of the Textpocalypse
Top 10 Signs of the Textpocalypse
 
The Art of Data Science
The Art of Data Science The Art of Data Science
The Art of Data Science
 
Frontiers of Open Data Science Research
Frontiers of Open Data Science ResearchFrontiers of Open Data Science Research
Frontiers of Open Data Science Research
 

Recently uploaded

Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
Fwdays
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 

Recently uploaded (20)

Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 

Recurrent Neural Networks for Text Analysis

  • 1. RECURRENT NEURAL NETWORKS FOR TEXT ANALYSIS Alec Radford O P E N D A T A S C I E N C E C O N F E R E N C E_ BOSTON 2015 @opendatasci
  • 2. Recurrent Neural Networks for text analysis From idea to practice ALEC RADFORD
  • 3. Follow Along Slides at: http://goo.gl/WLsUWv
  • 4. How ML -0.15, 0.2, 0, 1.5 A, B, C, D The cat sat on the mat. Numerical, great! Categorical, great! Uhhh…….
  • 5. How text is dealt with (ML perspective) Text Features (bow, TFIDF, LSA, etc...) Linear Model (SVM, softmax)
  • 6. Structure is important! The cat sat on the mat. sat the on mat cat the ● Certain tasks, structure is essential: ○ Humor ○ Sarcasm ● Certain tasks, ngrams can get you a long way: ○ Sentiment Analysis ○ Topic detection ● Specific words can be strong indicators ○ useless, fantastic (sentiment) ○ hoop, green tea, NASDAQ (topic)
  • 7. Structure is hard Ngrams is typical way of preserving some structure. sat the on mat cat the cat cat sat sat on on thethe mat Beyond bi or tri-grams occurrences become very rare and dimensionality becomes huge (1, 10 million + features)
  • 9. How text is dealt with (ML perspective) Text Features (bow, TFIDF, LSA, etc...) Linear Model (SVM, softmax)
  • 10. How text should be dealt with? Text RNN Linear Model (SVM, softmax)
  • 11. How an RNN works the cat sat on the mat
  • 12. How an RNN works the cat sat on the mat input to hidden
  • 13. How an RNN works the cat sat on the mat input to hidden hidden to hidden
  • 14. How an RNN works the cat sat on the mat input to hidden hidden to hidden
  • 15. How an RNN works the cat sat on the mat projections (activities x weights) activities (vectors of values) input to hidden hidden to hidden
  • 16. How an RNN works the cat sat on the mat projections (activities x weights) activities (vectors of values) Learned representation of sequence. input to hidden hidden to hidden
  • 17. How an RNN works the cat sat on the mat projections (activities x weights) activities (vectors of values) cat hidden to output input to hidden hidden to hidden
  • 18. From text to RNN input the cat sat on the mat “The cat sat on the mat.” Tokenize . Assign index 0 1 2 3 0 4 5 String input Embedding lookup 2.5 0.3 -1.2 0.2 -3.3 0.7 -4.1 1.6 2.8 1.1 5.7 -0.2 2.5 0.3 -1.2 1.4 0.6 -3.9 -3.8 1.5 0.1 2.5 0.3 -1.2 0.2 -3.3 0.7 -4.1 1.6 2.8 1.1 5.7 -0.2 1.4 0.6 -3.9 -3.8 1.5 0.1 Learned matrix
  • 19. You can stack them too the cat sat on the mat cat hidden to output input to hidden hidden to hidden
  • 20. But aren’t RNNs unstable? Simple RNNs trained with SGD are unstable/difficult to learn. But modern RNNs with various tricks blow up much less often! ● Gating Units ● Gradient Clipping ● Steeper gates ● Better initialization ● Better optimizers ● Bigger datasets
  • 21. Simple Recurrent Unit ht-1 xt + ht xt+1 + ht+1 + Element wise addition Activation function Routes information can propagate along Involved in modifying information flow and values
  • 22. ⊙ ⊙⊙ Gated Recurrent Unit - GRU xt r htht-1 ht z + ~ 1-z z + Element wise addition ⊙ Element wise multiplication Routes information can propagate along Involved in modifying information flow and values
  • 23. Gated Recurrent Unit - GRU ⊙ ⊙⊙ xt r htht-1 z + ~ 1-z z ⊙ ⊙⊙ xt+1 r ht+1ht z + ~ 1-z z ht+1
  • 24. Gating is important For sentiment analysis of longer sequences of text (paragraph or so) a simple RNN has difficulty learning at all while a gated RNN does so easily.
  • 25. Which One? There are two types of gated RNNs: ● Gated Recurrent Units (GRU) by K. Cho, recently introduced and used for machine translation and speech recognition tasks. ● Long short term memory (LSTM) by S. Hochreiter and J. Schmidhuber has been around since 1997 and has been used far more. Various modifications to it exist.
  • 26. Which One? GRU is simpler, faster, and optimizes quicker (at least on sentiment). Because it only has two gates (compared to four) approximately 1.5- 1.75x faster for theano implementation. If you have a huge dataset and don’t mind waiting LSTM may be better in the long run due to its greater complexity - especially if you add peephole connections.
  • 27. Exploding Gradients? Exploding gradients are a major problem for traditional RNNs trained with SGD. One of the sources of the reputation of RNNs being hard to train. In 2012, R Pascanu and T. Mikolov proposed clipping the norm of the gradient to alleviate this. Modern optimizers don’t seem to have this problem - at least for classification text analysis.
  • 28. Better Gating Functions Interesting paper at NIPS workshop (Q. Lyu, J. Zhu) - make the gates “steeper” so they change more rapidly from “off” to “on” so model learns to use them quicker.
  • 29. Better Initialization Andrew Saxe last year showed that initializing weight matrices with random orthogonal matrices works better than random gaussian (or uniform) matrices. In addition, Richard Socher (and more recently Quoc Le) have used identity initialization schemes which work great as well.
  • 30. Understanding Optimizers 2D moons dataset courtesy of scikit-learn
  • 31. Comparing Optimizers Adam (D. Kingma) combines the early optimization speed of Adagrad (J. Duchi) with the better later convergence of various other methods like Adadelta (M. Zeiler) and RMSprop (T. Tieleman). Warning: Generalization performance of Adam seems slightly worse for smaller datasets.
  • 32. It adds up Up to 10x more efficient training once you add all the tricks together compared to a naive implementation - much more stable - rarely diverges. Around 7.5x faster, the various tricks add a bit of computation time.
  • 33. Too much? - Overfitting RNNs can overfit very well as we will see. As they continue to fit to training dataset, their performance on test data will plateau or even worsen. Keep track of it using a validation set, save model at each iteration over training data and pick the earliest, best, validation performance.
  • 34. The Showdown Model #1 Model #2 + 512 dim embedding 512 dim hidden state output Using bigrams and grid search on min_df for vectorizer and regularization coefficient for model. Using whatever I tried that worked :) Adam, GRU, steeper sigmoid gates, ortho/identity
  • 36. Effect of Dataset Size ● RNNs have poor generalization properties on small datasets. ○ 1K labeled examples 25-50% worse than linear model… ● RNNs have better generalization properties on large datasets. ○ 1M labeled examples 0-30% better than linear model. ● Crossovers between 10K and 1M examples ○ Depends on dataset.
  • 37. The Thing we don’t talk about For 1 million paragraph sized text examples to converge: ● Linear model takes 30 minutes on a single CPU core. ● RNN takes 90 minutes on a Titan X. ● RNN takes five days on a single CPU core. RNN is about 250x slower on CPU than linear model… This is why we use GPUs
  • 38. Visualizing representations of words learned via sentiment TSNE - L.J.P. van derIndividual words colored by average sentiment
  • 39. Negative Positive Model learns to separate negative and positive words, not too surprising
  • 40. Quantities of TimeQualifiers Product nouns Punctuation Much cooler, model also begins to learn components of language from only binary sentiment labels
  • 41. The library - Passage ● Tiny RNN library built on top of Theano ● https://github.com/IndicoDataSolutions/Passage ● Still alpha - we’re working on it! ● Supports simple, LSTM, and GRU recurrent layers ● Supports multiple recurrent layers ● Supports deep input to and deep output from hidden layers ○ no deep transitions currently ● Supports embedding and onehot input representations ● Can be used for both regression and classification problems ○ Regression needs preprocessing for stability - working on it ● Much more in the pipeline
  • 42. An example Sentiment analysis of movie reviews - 25K labeled examples
  • 43.
  • 49. RNN imports preprocessing make and train model tokenize data load training data configure model
  • 50. RNN imports preprocessing load test data make and train model tokenize data load training data configure model
  • 51. RNN imports preprocessing predict on test data load test data make and train model tokenize data load training data configure model
  • 52. The results Top 10! - barely :)
  • 53. Summary ● RNNs look to be a competitive tool in certain situations for text analysis. ● Especially if you have a large 1M+ example dataset o A GPU or great patience is essential ● Otherwise it can be difficult to justify over linear models o Speed o Complexity o Poor generalization with small datasets
  • 55. We’re hiring! ● Data Engineer ● Infrastructure Engineer ● Interested? o contact@indico.io (or talk-to/email me after pres.)