A Survey of Current Neural Network Architectures for NLP
Márton Miháltz
Meltwater Group
Hungarian NLP Meetup
Outline
• Introduction
• Short intro to NN concepts
• Recurrent neural networks
  • Long Short-Term Memory, Gated Recurrent Unit
• Recursive neural networks
  • Applications to sentiment analysis: Socher et al. 2013; Tai et al. 2015
• Convolutional neural networks
  • Applications to text classification: Kim 2014
• Some more recent architectures
  • Memory networks, attention models, hybrid architectures
• Tools
  • Theano, Torch, TensorFlow, Caffe, Keras
Very Short Intro to Modern Neural Networks
• Feed-forward neural network (minimal sketch below)
• Activation fn: tanh, ReLU, Leaky/Parametric ReLU, SoftPlus, …
• Logistic regression or softmax function for the classification layer
• Loss functions (objectives): categorical cross-entropy, neg. log likelihood, …
• Training (optimizers): Gradient Descent, SGD, Mini-batch GD, RMSprop, Ada, Adagrad, Adam, Adamax, Nesterov Momentum, L-BFGS, …
• Input embeddings
  • 1-hot encoding
  • Random vectors
  • Pre-trained vectors, e.g. from distributional similarity
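To make the pieces above concrete, here is a minimal feed-forward text-classifier sketch, assuming the tensorflow.keras API; the layer sizes, the averaged-embedding input and the dummy data are illustrative, not from the talk:

```python
# Minimal feed-forward sketch: embeddings -> hidden layer (ReLU) -> softmax,
# trained with categorical cross-entropy and the Adam optimizer.
import numpy as np
from tensorflow.keras import layers, models

vocab_size, embed_dim, num_classes = 10000, 100, 2   # illustrative sizes

model = models.Sequential([
    layers.Embedding(vocab_size, embed_dim),            # input embeddings
    layers.GlobalAveragePooling1D(),                     # average word vectors into one vector
    layers.Dense(64, activation="relu"),                 # hidden layer with ReLU activation
    layers.Dense(num_classes, activation="softmax"),     # softmax classification layer
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# Dummy batch just to show the expected shapes: (batch, sequence of word ids).
x = np.random.randint(0, vocab_size, size=(32, 50))
y = np.eye(num_classes)[np.random.randint(0, num_classes, size=32)]
model.fit(x, y, epochs=1, verbose=0)
```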
Further Reading (DL for NLP)
● Tutorials, Blogs
  ○ Denny Britz's blog (RNNs, CNNs for NLP, code etc.) -- code in Theano, TensorFlow
  ○ Christopher Olah's blog (architectures, DL for NLP etc.)
  ○ Andrej Karpathy's fun blog post about RNNs: generating Shakespeare, Paul Graham text, LaTeX source, C code etc., with nice LSTM activity visualizations
  ○ Deeplearning.net Tutorial -- code in Theano (Python)
● Courses
  ○ Richard Socher's Stanford course Deep Learning for Natural Language Processing -- code in TensorFlow
  ○ Stanford Unsupervised Feature Learning and Deep Learning Tutorial -- code in Matlab
  ○ Stanford course Convolutional Neural Networks for Visual Recognition (Andrej Karpathy)
● Other sources
  ○ The Deep Learning book (Goodfellow, Bengio & Courville)
Why Deep Learning for NLP?
• Powerful apparatus for learning complex functions for ML
• Better at certain NLP tasks than previous methods
• Pre-trained distributed representation vectors
  • Word2vec, GloVe, Gensim, doc2vec, skip-thought vectors etc.
  • Vector space properties: similarity, analogies, compositionality etc. (example below)
• Less feature engineering needed
• Network learns abstract representations
• Transfer learning / domain adaptation
• Joint learning/execution of NLP steps possible
• Easy to go multimodal
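As a small illustration of the vector-space properties listed above, a Gensim sketch; the model file name is a placeholder for any word2vec-format binary:

```python
# Querying pre-trained word2vec vectors with Gensim: similarity and analogies.
from gensim.models import KeyedVectors

# Placeholder path: any word2vec-format binary (e.g. the GoogleNews vectors) works here.
vectors = KeyedVectors.load_word2vec_format("word2vec.bin", binary=True)

print(vectors.similarity("good", "great"))           # cosine similarity of two words
print(vectors.most_similar("movie", topn=5))         # nearest neighbours in vector space
# The classic analogy: king - man + woman ~ queen
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```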
Recurrent Neural Networks
● About RNNs
  ○ Internal state depends on the state of the previous step (sketch below)
  ○ Good for sequential input
  ○ Backpropagation Through Time (BPTT) training
● Applications
  ○ Language modeling (e.g. in machine translation)
  ○ Sequential labeling
  ○ Text generation (e.g. image description generation, together w/ a CNN)
● Problems with RNNs
  ○ Long sentences, long-term dependencies
  ○ Exponentially shrinking gradients ("vanishing gradients")
  ○ Solutions:
    ■ Initialization of weights; regularization; using the ReLU activation fn.
    ■ RNN variations: bidirectional RNN, deep RNN etc.
    ■ Gated RNNs: LSTM, GRU
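A bare-bones NumPy sketch of the recurrence described above (an illustrative Elman-style cell, not code from the talk): the state at step t depends on the input and on the previous step's state, which is exactly what BPTT has to unroll through.

```python
# Plain recurrent cell: h_t = tanh(W_xh @ x_t + W_hh @ h_{t-1} + b)
import numpy as np

input_dim, hidden_dim, steps = 50, 64, 10            # illustrative sizes
rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)

xs = rng.normal(size=(steps, input_dim))             # a dummy input sequence
h = np.zeros(hidden_dim)                             # initial state
for x_t in xs:
    h = np.tanh(W_xh @ x_t + W_hh @ h + b)           # new state depends on previous state
print(h.shape)                                       # (64,) -- final hidden state
```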
LSTMs and GRUs
• Long Short-Term Memory Networks
  • A special recurrent network
  • Has a memory cell (internal memory) (c)
  • 3 gates: input, forget, output -- sigmoid layers with a pointwise multiplication operation (vector of values in [0, 1])
  • The LSTM is able to remove or add information to the cell state, regulated by the gates, which optionally let information through
• Gated Recurrent Units
  • Another RNN variant
  • No internal memory separate from the internal state
  • 2 gates: reset, update (z) (sketch below)
  • Reset gate: how to combine the new input with the previous state; update gate: how much of the previous state to keep
[Figure: LSTM and GRU cell diagrams from Chung et al. 2014, with red labels added by the author]
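A NumPy sketch of a single GRU step with the reset and update gates named above; the weights, sizes and omitted biases are illustrative assumptions, not the talk's code:

```python
# One GRU step (Chung et al. 2014 style, biases omitted):
#   z = sigmoid(W_z x + U_z h_prev)          update gate: how much old state to keep
#   r = sigmoid(W_r x + U_r h_prev)          reset gate: how to mix input with old state
#   h_tilde = tanh(W_h x + U_h (r * h_prev))
#   h = (1 - z) * h_prev + z * h_tilde
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x, h_prev, params):
    W_z, U_z, W_r, U_r, W_h, U_h = params
    z = sigmoid(W_z @ x + U_z @ h_prev)
    r = sigmoid(W_r @ x + U_r @ h_prev)
    h_tilde = np.tanh(W_h @ x + U_h @ (r * h_prev))
    return (1.0 - z) * h_prev + z * h_tilde

d_in, d_h = 50, 64
rng = np.random.default_rng(1)
# Alternate input-to-hidden (d_h x d_in) and hidden-to-hidden (d_h x d_h) matrices.
params = [rng.normal(scale=0.1, size=(d_h, d_in if i % 2 == 0 else d_h)) for i in range(6)]
h = gru_step(rng.normal(size=d_in), np.zeros(d_h), params)
print(h.shape)  # (64,)
```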
LSTMs and GRUs
• Overcome RNNs' long-term dependency limitations & the vanishing gradients problem
• Very hip in current NLP applications, e.g. SOTA in MT
• More complex architectures:
  • Bi-directional LSTM
  • Stacked (deep) (B-)LSTM/GRU layers (sketch below)
  • Another extension: Grid-LSTM (Kalchbrenner et al. 2015)
• Still evolving!
  • Whether LSTM or GRU is better: the jury is still out
  • GRU has fewer parameters, may be faster to train
  • LSTM may be better with more data
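A short sketch of the stacked bidirectional LSTM pattern, assuming the tensorflow.keras API; sizes are illustrative:

```python
# Two stacked bidirectional LSTM layers over an embedded token sequence,
# followed by a softmax classifier -- the "stacked (B-)LSTM" pattern above.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Embedding(10000, 100),                                   # token ids -> vectors
    layers.Bidirectional(layers.LSTM(64, return_sequences=True)),   # layer 1, full sequence out
    layers.Bidirectional(layers.LSTM(64)),                          # layer 2, last states only
    layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.build(input_shape=(None, 50))    # batches of 50-token sequences
model.summary()
```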
Recursive Networks
• About recursive NNs
  • Hierarchical architecture
  • Shared weights
  • A plausible approach for modeling linguistic structures
• Sentiment Analysis with Recursive Networks (Socher et al. 2013)
  • Compositional processing of parsed input (e.g. able to handle negation)
  • Performs sentence-level sentiment classification: Rotten Tomatoes dataset (Pang & Lee 2005), 11K movie review sentences, pos or neg
  • 85.5% accuracy on the binary-class subset, 45.7% on 5-class
  • No longer the SOTA score, but it was the first to go over 80% in 7 years
  • Sentiment Treebank for training
Recursive Neural Tensor Network
• Sentence words: embedding layer w/ random initial vectors (d=25..35)
• Parse nodes: a compositionality function computes their representations, recursively
• Softmax classifier: pos-neg (or 5-class) label for each word & each parse node
Sentiment Analysis with RNTN
● Weight tensor V
● Intuition: each slice of the tensor captures a specific type of composition (sketch below)
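A NumPy sketch of the RNTN composition step in the spirit of Socher et al. 2013 (a bilinear tensor term plus a standard layer); the dimensions, initialisation and omitted bias are illustrative:

```python
# RNTN composition of two child vectors a, b into a parent vector p:
#   p = tanh( [a;b]^T V [a;b] + W [a;b] )
# Each of the d slices V[k] is a (2d x 2d) matrix -- one "type of composition" per slice.
import numpy as np

d = 25                                                # word vector dimension (slides use d = 25..35)
rng = np.random.default_rng(2)
V = rng.normal(scale=0.01, size=(d, 2 * d, 2 * d))    # weight tensor
W = rng.normal(scale=0.01, size=(d, 2 * d))           # standard composition matrix

def compose(a, b):
    ab = np.concatenate([a, b])                       # stacked children, shape (2d,)
    tensor_term = np.array([ab @ V[k] @ ab for k in range(d)])
    return np.tanh(tensor_term + W @ ab)

a, b = rng.normal(size=d), rng.normal(size=d)         # e.g. vectors for "not" and "good"
p = compose(a, b)                                     # parent node representation, shape (d,)
print(p.shape)
```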
Tree-LSTMs for Sentiment Analysis (Tai et al. 2015)
• Tree-LSTM
  • Uses constituency parsing
  • Uses GloVe word vectors, updated during training
  • Idea: sum the hidden states of a tree node's children (sketch below)
  • Each child has its own forget gate
  • Polarity softmax classifiers on tree nodes
• Improves on Socher et al. 2013
  • Fine-grained sentence sentiment: 51.0% vs. 45.7%
  • Binary sentence sentiment: 88.0% vs. 85.4%
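A NumPy sketch of one Child-Sum Tree-LSTM node update in the spirit of Tai et al. 2015: child hidden states are summed, and each child gets its own forget gate. Sizes and the omitted biases are illustrative.

```python
# Child-Sum Tree-LSTM node update (biases omitted):
#   h_sum = sum_k h_k
#   i = sigmoid(W_i x + U_i h_sum)          input gate
#   f_k = sigmoid(W_f x + U_f h_k)          one forget gate per child k
#   o = sigmoid(W_o x + U_o h_sum)          output gate
#   u = tanh(W_u x + U_u h_sum)
#   c = i * u + sum_k f_k * c_k
#   h = o * tanh(c)
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def tree_lstm_node(x, child_h, child_c, W, U):
    h_sum = np.sum(child_h, axis=0)
    i = sigmoid(W["i"] @ x + U["i"] @ h_sum)
    o = sigmoid(W["o"] @ x + U["o"] @ h_sum)
    u = np.tanh(W["u"] @ x + U["u"] @ h_sum)
    f = [sigmoid(W["f"] @ x + U["f"] @ h_k) for h_k in child_h]
    c = i * u + sum(f_k * c_k for f_k, c_k in zip(f, child_c))
    return o * np.tanh(c), c                           # node's hidden state and memory cell

d_in, d_h = 300, 150                                   # e.g. GloVe input size, hidden size
rng = np.random.default_rng(3)
W = {g: rng.normal(scale=0.05, size=(d_h, d_in)) for g in "ifou"}
U = {g: rng.normal(scale=0.05, size=(d_h, d_h)) for g in "ifou"}
child_h = [rng.normal(size=d_h) for _ in range(2)]     # two children of a parse node
child_c = [rng.normal(size=d_h) for _ in range(2)]
h, c = tree_lstm_node(rng.normal(size=d_in), child_h, child_c, W, U)
print(h.shape, c.shape)
```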
Convolutional Neural Networks
• CNNs (ConvNets) widely used in image processing
  • Location invariance
  • Compositionality
  • Fast
• Convolution layers
  • "Sliding window" over the input representation: filter/kernel/feature generator (sketch below)
  • Local connectivity
  • Shared weights
• Hyperparameters
  • Wide vs. narrow convolution (padding)
  • Filter size (width, height, depth)
  • Number of filters per layer
  • Stride size
  • Channels (R, G, B)
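A tiny NumPy sketch of the "sliding window" idea applied to text: one filter of height h slides over the sentence's word-vector matrix and produces one feature per window (narrow convolution, stride 1). Everything here is illustrative.

```python
# Narrow 1D convolution over a sentence matrix (n words x d dims) with one filter
# covering h consecutive words (an "n-gram detector"), stride 1, ReLU activation.
import numpy as np

n, d, h = 9, 50, 3                           # sentence length, embedding dim, window size
rng = np.random.default_rng(4)
sentence = rng.normal(size=(n, d))           # stacked word vectors
filt = rng.normal(size=(h, d))               # one filter = one feature generator
bias = 0.0

features = np.array([
    max(0.0, np.sum(sentence[i:i + h] * filt) + bias)   # ReLU(w . x_window + b)
    for i in range(n - h + 1)                            # narrow: no zero padding
])
print(features.shape)                         # (n - h + 1,) = (7,) feature map
```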
CNNs for Text Classification
● Intuition: filter windows over sentence words <-> n-grams
● Advantage over Recursive NN/Tree-LSTM: does not require parsing
● Becoming a standard baseline for new text classification architectures
● Easy to parallelize on GPUs
CNN for Sentiment Analysis (Kim 2014)
• Sentence polarity classification (RT dataset / Sentiment Treebank)
  • 88.1% on binary sentiment classification
• Uses word2vec vectors
  • Sentences: concatenated word vectors
  • 2 channels: static word2vec vectors & vectors tuned via backprop
• Multiple window sizes (h=3,4,5) and multiple filters (e.g. 100) (sketch below)
• Apply max-pooling to each feature map
  • Selects the most important feature from the feature map
• Penultimate layer: final feature vector
  • Concatenates all pooled features
• Final layer: softmax classifier (pos/neg sentiment)
• Regularization: dropout on the penultimate layer
  • Randomly sets some of the feature values to 0
  • Prevents co-adaptation of hidden units during forward propagation (overfitting)
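A hedged sketch of this architecture assuming the tensorflow.keras functional API, with one trainable embedding channel instead of Kim's two channels; hyperparameters are only indicative:

```python
# Kim (2014)-style sentence CNN: parallel convolutions with window sizes 3, 4, 5,
# max-over-time pooling, concatenation, dropout, then a softmax classifier.
from tensorflow.keras import layers, models

vocab_size, embed_dim, maxlen = 20000, 300, 60          # illustrative sizes

tokens = layers.Input(shape=(maxlen,), dtype="int32")
emb = layers.Embedding(vocab_size, embed_dim)(tokens)   # Kim initialises this from word2vec

pooled = []
for h in (3, 4, 5):                                     # multiple window sizes
    conv = layers.Conv1D(100, h, activation="relu")(emb)    # 100 filters per size
    pooled.append(layers.GlobalMaxPooling1D()(conv))        # max-over-time pooling

features = layers.Concatenate()(pooled)                 # penultimate feature vector
features = layers.Dropout(0.5)(features)                # dropout regularization
output = layers.Dense(2, activation="softmax")(features)    # pos/neg

model = models.Model(tokens, output)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```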
Adaptation of Word Vectors
Summary
• Recursive NNs
  • Linguistically plausible, applicable to grammatical structures; needs parsing
• Recurrent NNs
  • Engineered for sequential input; current improvements come from gated RNNs (LSTM, GRU etc.)
• Convolutional NNs
  • Exceptionally good for classification; unclear how to incorporate phrase-level structures; hard to interpret; needs zero padding; good for GPUs
Some Recent Work
• Memory Networks
  • MemN2N (Sukhbaatar et al. 2015): Facebook's bAbI Question Answering tasks 90-90%
  • Dynamic Memory Networks (Kumar, Irsoy et al. 2015): sentiment on the RT dataset 88.6%; episodic memory: input sequences, questions, reasoning about answers
• Attention models
  • Parsing (Vinyals & Hinton et al. 2015); Machine Translation (Bahdanau & Bengio et al. 2016)
  • Relation extraction with LSTM + attention (Zhou et al. 2016)
  • Sentence embeddings with an attention model (Wang et al. 2016)
• Hybrid architectures
  • NER with BLSTM-CNN (Chiu & Nichols 2016): 91.62% CoNLL, 86.28% OntoNotes
  • Sequential labeling with BLSTM-CNN-CRF (Ma & Hovy 2016): 97.55% PoS, 91.21% NER
  • Sentiment Analysis using CNN-LSTM (Wang et al. 2016)
• Joint learning of NLP tasks
  • POS tagging, chunking and CCG supertagging with one network (Søgaard & Goldberg 2016)
  • JEDI: joint learning of NER and RE (Kirschnick et al. 2016)
Tools for Hacking
● CUDA, cuDNN
  ○ You need these drivers installed to utilize the GPU (NVIDIA)
● Theano
  ○ Low level of abstraction; you define symbolic variables & functions; Python (sketch below)
● TensorFlow
  ○ Low level of abstraction; you define data flow graphs; C++, Python
● Torch
  ○ High abstraction level; very easy C interfacing; Lua
● Caffe
  ○ Very high level, simple declarative config files (prototxt), little versatility, most useful with convnets (C++/Python to extend)
● High-level wrappers
  ○ Keras: can bind to either TensorFlow or Theano; Python
  ○ SkFlow: wrapper around TensorFlow for those familiar with scikit-learn; Python
  ○ Pretty Tensor, TensorFlow Slim: high-level wrapper functions for TensorFlow; Python
  ○ DIGITS: supports Caffe and Torch
● More
  ○ nice overview here
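To illustrate the "symbolic variables & functions" style Theano uses, the classic logistic example from the Theano tutorials (purely illustrative):

```python
# Theano: declare symbolic variables, build an expression graph, compile a function.
import numpy as np
import theano
import theano.tensor as T

x = T.dmatrix("x")                    # symbolic matrix of doubles
s = 1.0 / (1.0 + T.exp(-x))           # symbolic expression: element-wise logistic
logistic = theano.function([x], s)    # compile the graph into a callable

print(logistic(np.array([[0.0, 1.0], [-1.0, -2.0]])))
```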
Thank you!