SlideShare a Scribd company logo
Deep Learning 

for NLP
An Introduction to Neural Word Embeddings*
Roelof Pieters

PhD candidate KTH/CSC
CIO/CTO Feeda AB
Feeda KTH, December 4, 2014
roelof@kth.se www.csc.kth.se/~roelof/ @graphific
*and some more fun stuff…
2. NLP: WORD EMBEDDINGS
1. DEEP LEARNING
1. DEEP LEARNING
2. NLP: WORD EMBEDDINGS
A couple of headlines… [all November ’14]
Improving some task T
based on experience E with
respect to performance
measure P.
Deep Learning = Machine Learning
Learning denotes changes in the
system that are adaptive in the sense
that they enable the system to do the
same task (or tasks drawn from a
population of similar tasks) more
effectively the next time.
— H. Simon 1983 

"Why Should Machines Learn?” in Mitchell 1997
— T. Mitchell 1997
Representation learning
Attempts to automatically learn
good features or
representations
Deep learning
Attempt to learn multiple levels
of representation of increasing
complexity/abstraction
Deep Learning: What?
ML: Traditional Approach
1. Gather as much LABELED data as you can get
2. Throw some algorithms at it (mainly put in an SVM and
keep it at that)
3. If you actually have tried more algos: Pick the best
4. Spend hours hand engineering some features / feature
selection / dimensionality reduction (PCA, SVD, etc)
5. Repeat…
For each new problem/question::
History
• Perceptron (’57-69…)
• Multi-Layered Perceptrons (’86)
• SVMs (popularized 00s)
• RBM (‘92+)
• “2006”
Rosenblatt 1957 vs Minsky & Papert
History
• Perceptron (’57-69…)
• Multi-Layered Perceptrons (’86)
• SVMs (popularized 00s)
• RBM (‘92+)
• “2006”
(Rumelhart, Hinton & Williams, 1986)
Backprop Renaissance
• Multi-Layered Perceptrons (’86)
• Uses Backpropagation (Bryson & Ho 1969):
back-propagates the error signal computed at the
output layer to get derivatives for learning, in
order to update the weight vectors until
convergence is reached
Backprop Renaissance
Forward Propagation
• Sum inputs, produce activation, feed-forward
Backprop Renaissance
Back Propagation (of error)
• Calculate total error at the top
• Calculate contributions to error at each step going
backwards
• Compute gradient of example-wise loss wrt
parameters
• Simply applying the derivative chain rule wisely 





• If computing the loss (example, parameters) is O(n)
computation, then so is computing the gradient
Backpropagation
Simple Chain Rule
History
• Perceptron (’57-69…)
• Multi-Layered Perceptrons (’86)
• SVMs (popularized 00s)
• RBM (‘92+)
• “2006”
(Cortes & Vapnik 1995)
Kernel SVM
History
• Perceptron (’57-69…)
• Multi-Layered Perceptrons (’86)
• SVMs (popularized 00s)
• RBM (‘92+)
• “2006”
• Form of log-linear
Markov Random
Field 

(MRF)
• Bipartite graph, with
no intra-layer
connections
• Energy Function
RBM: Structure
• Training Function:



• often by contrastive divergence (CD) (Hinton 1999;
Hinton 2000)
• Gibbs sampling
• Gradient Descent
• Goal: compute weight updates
RBM: Training
more info: Geoffrey Hinton (2010). A Practical Guide to Training Restricted Boltzmann Machines
History
• Perceptron (’57-69…)
• Multi-Layered Perceptrons (’86)
• SVMs (popularized 00s)
• RBM (‘92+)
• “2006”
1. More labeled data
(“Big Data”)
2. GPU’s
3. “layer-wise
unsupervised
feature learning”
Stacking Single Layer Learners
One of the big ideas from 2006: layer-wise
unsupervised feature learning
- Stacking Restricted Boltzmann Machines (RBM) ->
Deep Belief Network (DBN)
- Stacking regularized auto-encoders -> deep neural
nets
• Stacked RBM
• Introduced by Hinton et al. (2006)
• 1st RBM hidden layer == 2th RBM input layer
• Feature Hierarchy
Deep Belief Network (DBN)
Biological Justification
Deep Learning = Brain “inspired”

Audio/Visual Cortex has multiple stages == Hierarchical
Biological Justification
Deep Learning = Brain “inspired”

Audio/Visual Cortex has multiple stages == Hierarchical
“Brainiacs” “Pragmatists”vs
Biological Justification
Deep Learning = Brain “inspired”

Audio/Visual Cortex has multiple stages == Hierarchical
• Computational Biology • CVAP
“Brainiacs” “Pragmatists”vs
Biological Justification
Deep Learning = Brain “inspired”

Audio/Visual Cortex has multiple stages == Hierarchical
• Computational Biology • CVAP
• Jorge Dávila-Chacón
• “that guy”
“Brainiacs” “Pragmatists”vs
Different Levels of Abstraction
Hierarchical Learning
• Natural progression
from low level to high
level structure as seen
in natural complexity
• Easier to monitor what
is being learnt and to
guide the machine to
better subspaces
• A good lower level
representation can be
used for many distinct
tasks
Different Levels of Abstraction
Feature Representation
• Shared Low Level
Representations
• Multi-Task Learning
• Unsupervised Training
• Partial Feature Sharing
• Mixed Mode Learning
• Composition of
Functions
Generalizable Learning
Classic Deep Architecture
Input layer
Hidden layers
Output layer
Modern Deep Architecture
Input layer
Hidden layers
Output layer
Modern Deep Architecture
Input layer
Hidden layers
Output layer
movie time:
http://www.cs.toronto.edu/~hinton/adi/index.htm
Deep Learning
Hierarchies
Efficient
Generalization
Distributed
Sharing
Unsupervised*
Black Box
Training Time
Major PWNAGE!
Much Data
Why go Deep ?
No More Handcrafted Features !
— Andrew Ng
“I’ve worked all my life in
Machine Learning, and I’ve
never seen one algorithm knock
over benchmarks like Deep
Learning”
Deep Learning: Why?
Deep Learning: Why?
Beat state of the art in many areas:
• Language Modeling (2012, Mikolov et al)
• Image Recognition (Krizhevsky won
2012 ImageNet competition)
• Sentiment Classification (2011, Socher et
al)
• Speech Recognition (2010, Dahl et al)
• MNIST hand-written digit recognition
(Ciresan et al, 2010)
One Model rules them all ?



DL approaches have been successfully applied to:
Deep Learning: Why for NLP ?
Automatic summarization Coreference resolution Discourse analysis
Machine translation Morphological segmentation Named entity recognition (NER)
Natural language generation
Natural language understanding
Optical character recognition (OCR)
Part-of-speech tagging
Parsing
Question answering
Relationship extraction
sentence boundary disambiguation
Sentiment analysis
Speech recognition
Speech segmentation
Topic segmentation and recognition
Word segmentation
Word sense disambiguation
Information retrieval (IR)
Information extraction (IE)
Speech processing
2. NLP: WORD EMBEDDINGS
1. DEEP LEARNING
• NLP treats words mainly (rule-based/statistical
approaches at least) as atomic symbols:

• or in vector space:

• also known as “one hot” representation.
• Its problem ?
Word Representation
Love Candy Store
[0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 …]
• NLP treats words mainly (rule-based/statistical
approaches at least) as atomic symbols:

• or in vector space:

• also known as “one hot” representation.
• Its problem ?
Word Representation
Love Candy Store
[0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 …]
Candy [0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 …] AND
Store [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 …] = 0 !
Distributional representations
“You shall know a word by the company it keeps”

(J. R. Firth 1957)
One of the most successful ideas of modern
statistical NLP!
these words represent banking
• Hard (class based) clustering models:
• Soft clustering models
• Word Embeddings (Bengio et al, 2001;
Bengio et al, 2003) based on idea of
distributed representations for symbols
(Hinton 1986)
• Neural Word embeddings (Mnih and Hinton
2007, Collobert & Weston 2008, Turian et al
2010; Collobert et al. 2011, Mikolov et al.
2011)
Language Modeling
Neural distributional representations
• Neural word embeddings
• Combine vector space semantics with the prediction
of probabilistic models
• Words are represented as a dense vector:
Candy =
Word Embeddings: SocherVector Space Model
Figure (edited) from Bengio, “Representation Learning and Deep Learning”, July, 2012, UCLA
In a perfect world:
Word Embeddings: SocherVector Space Model
Figure (edited) from Bengio, “Representation Learning and Deep Learning”, July, 2012, UCLA
In a perfect world:
input: 

- the country of my birth
- the place where I was born
Word Embeddings: SocherVector Space Model
Figure (edited) from Bengio, “Representation Learning and Deep Learning”, July, 2012, UCLA
In a perfect world:
the country of my birth
input: 

- the country of my birth
- the place where I was born
Word Embeddings: SocherVector Space Model
Figure (edited) from Bengio, “Representation Learning and Deep Learning”, July, 2012, UCLA
In a perfect world:
the country of my birth
the place where I was born
input: 

- the country of my birth
- the place where I was born
Word Embeddings: SocherVector Space Model
Figure (edited) from Bengio, “Representation Learning and Deep Learning”, July, 2012, UCLA
In a perfect world:
the country of my birth
the place where I was born ?
…
• Recursive Tensor (Neural) Network (RTNT) 

(Socher et al. 2011; Socher 2014)
• Top-down hierarchical net (vs feed forward)
• NLP!
• Sequence based classification, windows of several
events, entire scenes (rather than images), entire
sentences (rather than words)
• Features = Vectors
• A tensor = multi-dimensional matrix, or multiple matrices
of the same size
Recursive Neural (Tensor) Network
Recursive Neural Tensor Network
Recursive Neural Tensor Network
Compositionality
Principle of compositionality:
the “meaning (vector) of a
complex expression
(sentence) is determined by:
— Gottlob Frege 

(1848 - 1925)
- the meanings of its
constituent expressions
(words) and
- the rules (grammar) used
to combine them”
NP
PP/IN
NP
DT NN PRP$ NN
Parse Tree
Compositionality
NP
PP/IN
NP
DT NN PRP$ NN
Parse Tree
INDT NN PRP NN
Compositionality
NP
IN
NP
PRP NN
Parse Tree
DT NN
Compositionality
NP
IN
NP
DT NN PRP NN
PP
NP (S / ROOT)
Compositionality
NP
IN
NP
DT NN PRP NN
PP
NP (S / ROOT)
“rules”
Compositionality
NP
IN
NP
DT NN PRP NN
PP
NP (S / ROOT)
“rules” “meanings”
Compositionality
Vector Space + Word Embeddings: Socher
Vector Space + Word Embeddings: Socher
code & info: http://metaoptimize.com/projects/wordreprs/
Word Embeddings: Turian
t-SNE visualizations of word embeddings. Left: Number Region; Right:
Jobs Region. From Turian et al. 2011
Word Embeddings: Turian
• Recurrent Neural Network (Mikolov et al. 2010;
Mikolov et al. 2013a)
W(‘‘woman")−W(‘‘man") ≃ W(‘‘aunt")−W(‘‘uncle")
W(‘‘woman")−W(‘‘man") ≃ W(‘‘queen")−W(‘‘king")
Figures from Mikolov, T., Yih, W., & Zweig, G. (2013). Linguistic
Regularities in Continuous Space Word Representations
Word Embeddings: Mikolov
• Mikolov et al. 2013b
Figures from Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013b).
Efficient Estimation of Word Representations in Vector Space
Word Embeddings: Mikolov
• cuda-convnet2 (Alex Krizhevsky, Toronto) (c++/
CUDA, optimized for GTX 580) 

https://code.google.com/p/cuda-convnet2/
• Caffe (Berkeley) (Cuda/OpenCL, Theano, Python)

http://caffe.berkeleyvision.org/
• OverFeat (NYU) 

http://cilvr.nyu.edu/doku.php?id=code:start
Wanna Play ?
• Theano - CPU/GPU symbolic expression compiler in
python (from LISA lab at University of Montreal). http://
deeplearning.net/software/theano/
• Pylearn2 - library designed to make machine learning
research easy. http://deeplearning.net/software/pylearn2/
• Torch - Matlab-like environment for state-of-the-art
machine learning algorithms in lua (from Ronan Collobert,
Clement Farabet and Koray Kavukcuoglu) http://torch.ch/
• more info: http://deeplearning.net/software links/
Wanna Play ?
Wanna Play ?
as PhD candidate KTH/CSC:
“Always interested in discussion

Machine Learning, Deep
Architectures, 

Graphs, and NLP”
Wanna Play with Me ?
roelof@kth.se
www.csc.kth.se/~roelof/
Internship / EntrepeneurshipAcademic/Research
as CIO/CTO Feeda:
“Always looking for additions to our 

brand new R&D team”



[Internships upcoming on 

KTH exjobb website…]
roelof@feeda.com
www.feeda.com
Feeda
Were Hiring!
roelof@feeda.com
www.feeda.com
Feeda
• Software Developers
• Data Scientists
Appendum
Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Chris Manning, Andrew Ng and Chris Potts. 2013. Recursive
Deep Models for Semantic Compositionality Over a Sentiment Treebank. EMNLP 2013
code & demo: http://nlp.stanford.edu/sentiment/index.html
Appendum
Eric H. Huang, Richard Socher, Christopher D. Manning, Andrew Y. Ng 

Improving Word Representations via Global Context and Multiple Word Prototypes

More Related Content

What's hot

Tutorial on Coreference Resolution
Tutorial on Coreference Resolution Tutorial on Coreference Resolution
Tutorial on Coreference Resolution
Anirudh Jayakumar
 
AI: Logic in AI
AI: Logic in AIAI: Logic in AI
AI: Logic in AI
DataminingTools Inc
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
Yogendra Tamang
 
NLP Bootcamp
NLP BootcampNLP Bootcamp
NLP Bootcamp
Anuj Gupta
 
Natural language processing (nlp)
Natural language processing (nlp)Natural language processing (nlp)
Natural language processing (nlp)
Kuppusamy P
 
Introduction to natural language processing
Introduction to natural language processingIntroduction to natural language processing
Introduction to natural language processing
Minh Pham
 
Word embedding
Word embedding Word embedding
Word embedding
ShivaniChoudhary74
 
Non Deterministic and Deterministic Problems
Non Deterministic and Deterministic Problems Non Deterministic and Deterministic Problems
Non Deterministic and Deterministic Problems
Scandala Tamang
 
Summary of Multilingual Natural Language Processing Applications: From Theory...
Summary of Multilingual Natural Language Processing Applications: From Theory...Summary of Multilingual Natural Language Processing Applications: From Theory...
Summary of Multilingual Natural Language Processing Applications: From Theory...
iwan_rg
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
Aanchal Chaurasia
 
Word embeddings
Word embeddingsWord embeddings
Word embeddings
Ajay Taneja
 
Nlp toolkits and_preprocessing_techniques
Nlp toolkits and_preprocessing_techniquesNlp toolkits and_preprocessing_techniques
Nlp toolkits and_preprocessing_techniques
ankit_ppt
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
National Institute of Technology Durgapur
 
Fuzzy logic and neural networks
Fuzzy logic and neural networksFuzzy logic and neural networks
Fuzzy logic and neural networks
qazi
 
BERT introduction
BERT introductionBERT introduction
BERT introduction
Hanwha System / ICT
 
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language UnderstandingBERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Minh Pham
 
Introduction to prolog
Introduction to prologIntroduction to prolog
Introduction to prolog
Harry Potter
 
Logic programming (1)
Logic programming (1)Logic programming (1)
Logic programming (1)
Nitesh Singh
 
What is word2vec?
What is word2vec?What is word2vec?
What is word2vec?
Traian Rebedea
 
NLP State of the Art | BERT
NLP State of the Art | BERTNLP State of the Art | BERT
NLP State of the Art | BERT
shaurya uppal
 

What's hot (20)

Tutorial on Coreference Resolution
Tutorial on Coreference Resolution Tutorial on Coreference Resolution
Tutorial on Coreference Resolution
 
AI: Logic in AI
AI: Logic in AIAI: Logic in AI
AI: Logic in AI
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
NLP Bootcamp
NLP BootcampNLP Bootcamp
NLP Bootcamp
 
Natural language processing (nlp)
Natural language processing (nlp)Natural language processing (nlp)
Natural language processing (nlp)
 
Introduction to natural language processing
Introduction to natural language processingIntroduction to natural language processing
Introduction to natural language processing
 
Word embedding
Word embedding Word embedding
Word embedding
 
Non Deterministic and Deterministic Problems
Non Deterministic and Deterministic Problems Non Deterministic and Deterministic Problems
Non Deterministic and Deterministic Problems
 
Summary of Multilingual Natural Language Processing Applications: From Theory...
Summary of Multilingual Natural Language Processing Applications: From Theory...Summary of Multilingual Natural Language Processing Applications: From Theory...
Summary of Multilingual Natural Language Processing Applications: From Theory...
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
Word embeddings
Word embeddingsWord embeddings
Word embeddings
 
Nlp toolkits and_preprocessing_techniques
Nlp toolkits and_preprocessing_techniquesNlp toolkits and_preprocessing_techniques
Nlp toolkits and_preprocessing_techniques
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
Fuzzy logic and neural networks
Fuzzy logic and neural networksFuzzy logic and neural networks
Fuzzy logic and neural networks
 
BERT introduction
BERT introductionBERT introduction
BERT introduction
 
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language UnderstandingBERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
 
Introduction to prolog
Introduction to prologIntroduction to prolog
Introduction to prolog
 
Logic programming (1)
Logic programming (1)Logic programming (1)
Logic programming (1)
 
What is word2vec?
What is word2vec?What is word2vec?
What is word2vec?
 
NLP State of the Art | BERT
NLP State of the Art | BERTNLP State of the Art | BERT
NLP State of the Art | BERT
 

Viewers also liked

Deep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word EmbeddingsDeep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word Embeddings
Roelof Pieters
 
ggplot for python
ggplot for pythonggplot for python
ggplot for python
Austin Ogilvie
 
Deep Learning: a birds eye view
Deep Learning: a birds eye viewDeep Learning: a birds eye view
Deep Learning: a birds eye view
Roelof Pieters
 
Machine learning with scikitlearn
Machine learning with scikitlearnMachine learning with scikitlearn
Machine learning with scikitlearn
Pratap Dangeti
 
Neural networks and deep learning
Neural networks and deep learningNeural networks and deep learning
Neural networks and deep learning
Jörgen Sandig
 
Deep learning
Deep learningDeep learning
Deep learning
Pratap Dangeti
 
Artificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep LearningArtificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep Learning
Sujit Pal
 
Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015
Association for Computational Linguistics
 
Hackathon 2014 NLP Hack
Hackathon 2014 NLP HackHackathon 2014 NLP Hack
Hackathon 2014 NLP Hack
Roelof Pieters
 
Advanced Node.JS Meetup
Advanced Node.JS MeetupAdvanced Node.JS Meetup
Advanced Node.JS Meetup
LINAGORA
 
Deep Learning & NLP: Graphs to the Rescue!
Deep Learning & NLP: Graphs to the Rescue!Deep Learning & NLP: Graphs to the Rescue!
Deep Learning & NLP: Graphs to the Rescue!
Roelof Pieters
 
Blockchain Smartnetworks: Bitcoin and Blockchain Explained
Blockchain Smartnetworks: Bitcoin and Blockchain ExplainedBlockchain Smartnetworks: Bitcoin and Blockchain Explained
Blockchain Smartnetworks: Bitcoin and Blockchain Explained
Melanie Swan
 
Deep Learning for Chatbot (3/4)
Deep Learning for Chatbot (3/4)Deep Learning for Chatbot (3/4)
Deep Learning for Chatbot (3/4)
Jaemin Cho
 
Vectorland: Brief Notes from Using Text Embeddings for Search
Vectorland: Brief Notes from Using Text Embeddings for SearchVectorland: Brief Notes from Using Text Embeddings for Search
Vectorland: Brief Notes from Using Text Embeddings for Search
Bhaskar Mitra
 
Construisons ensemble le chatbot bancaire dedemain !
Construisons ensemble le chatbot bancaire dedemain !Construisons ensemble le chatbot bancaire dedemain !
Construisons ensemble le chatbot bancaire dedemain !
LINAGORA
 
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Association for Computational Linguistics
 
Neural Models for Document Ranking
Neural Models for Document RankingNeural Models for Document Ranking
Neural Models for Document Ranking
Bhaskar Mitra
 
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
Association for Computational Linguistics
 
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopSatoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Association for Computational Linguistics
 
Deep Learning for Chatbot (1/4)
Deep Learning for Chatbot (1/4)Deep Learning for Chatbot (1/4)
Deep Learning for Chatbot (1/4)
Jaemin Cho
 

Viewers also liked (20)

Deep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word EmbeddingsDeep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word Embeddings
 
ggplot for python
ggplot for pythonggplot for python
ggplot for python
 
Deep Learning: a birds eye view
Deep Learning: a birds eye viewDeep Learning: a birds eye view
Deep Learning: a birds eye view
 
Machine learning with scikitlearn
Machine learning with scikitlearnMachine learning with scikitlearn
Machine learning with scikitlearn
 
Neural networks and deep learning
Neural networks and deep learningNeural networks and deep learning
Neural networks and deep learning
 
Deep learning
Deep learningDeep learning
Deep learning
 
Artificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep LearningArtificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep Learning
 
Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015
 
Hackathon 2014 NLP Hack
Hackathon 2014 NLP HackHackathon 2014 NLP Hack
Hackathon 2014 NLP Hack
 
Advanced Node.JS Meetup
Advanced Node.JS MeetupAdvanced Node.JS Meetup
Advanced Node.JS Meetup
 
Deep Learning & NLP: Graphs to the Rescue!
Deep Learning & NLP: Graphs to the Rescue!Deep Learning & NLP: Graphs to the Rescue!
Deep Learning & NLP: Graphs to the Rescue!
 
Blockchain Smartnetworks: Bitcoin and Blockchain Explained
Blockchain Smartnetworks: Bitcoin and Blockchain ExplainedBlockchain Smartnetworks: Bitcoin and Blockchain Explained
Blockchain Smartnetworks: Bitcoin and Blockchain Explained
 
Deep Learning for Chatbot (3/4)
Deep Learning for Chatbot (3/4)Deep Learning for Chatbot (3/4)
Deep Learning for Chatbot (3/4)
 
Vectorland: Brief Notes from Using Text Embeddings for Search
Vectorland: Brief Notes from Using Text Embeddings for SearchVectorland: Brief Notes from Using Text Embeddings for Search
Vectorland: Brief Notes from Using Text Embeddings for Search
 
Construisons ensemble le chatbot bancaire dedemain !
Construisons ensemble le chatbot bancaire dedemain !Construisons ensemble le chatbot bancaire dedemain !
Construisons ensemble le chatbot bancaire dedemain !
 
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
 
Neural Models for Document Ranking
Neural Models for Document RankingNeural Models for Document Ranking
Neural Models for Document Ranking
 
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
 
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopSatoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
 
Deep Learning for Chatbot (1/4)
Deep Learning for Chatbot (1/4)Deep Learning for Chatbot (1/4)
Deep Learning for Chatbot (1/4)
 

Similar to Deep Learning for NLP: An Introduction to Neural Word Embeddings

Deep Learning for Information Retrieval
Deep Learning for Information RetrievalDeep Learning for Information Retrieval
Deep Learning for Information Retrieval
Roelof Pieters
 
Deep Learning, an interactive introduction for NLP-ers
Deep Learning, an interactive introduction for NLP-ersDeep Learning, an interactive introduction for NLP-ers
Deep Learning, an interactive introduction for NLP-ers
Roelof Pieters
 
Engineering Intelligent NLP Applications Using Deep Learning – Part 2
Engineering Intelligent NLP Applications Using Deep Learning – Part 2 Engineering Intelligent NLP Applications Using Deep Learning – Part 2
Engineering Intelligent NLP Applications Using Deep Learning – Part 2
Saurabh Kaushik
 
Beyond the Symbols: A 30-minute Overview of NLP
Beyond the Symbols: A 30-minute Overview of NLPBeyond the Symbols: A 30-minute Overview of NLP
Beyond the Symbols: A 30-minute Overview of NLP
MENGSAYLOEM1
 
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
Karthik Murugesan
 
Tomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLPTomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLP
Machine Learning Prague
 
Deep Learning - A Literature survey
Deep Learning - A Literature surveyDeep Learning - A Literature survey
Deep Learning - A Literature survey
Akshay Hegde
 
Natural Language Processing, Techniques, Current Trends and Applications in I...
Natural Language Processing, Techniques, Current Trends and Applications in I...Natural Language Processing, Techniques, Current Trends and Applications in I...
Natural Language Processing, Techniques, Current Trends and Applications in I...
RajkiranVeluri
 
Deep learning for natural language embeddings
Deep learning for natural language embeddingsDeep learning for natural language embeddings
Deep learning for natural language embeddings
Roelof Pieters
 
Deep learning for nlp
Deep learning for nlpDeep learning for nlp
Deep learning for nlp
Viet-Trung TRAN
 
Deep learning: the future of recommendations
Deep learning: the future of recommendationsDeep learning: the future of recommendations
Deep learning: the future of recommendations
Balázs Hidasi
 
A Panorama of Natural Language Processing
A Panorama of Natural Language ProcessingA Panorama of Natural Language Processing
A Panorama of Natural Language Processing
Ted Xiao
 
Dialogare con agenti artificiali
Dialogare con agenti artificiali  Dialogare con agenti artificiali
Dialogare con agenti artificiali
Agnese Augello
 
Semantic, Cognitive and Perceptual Computing -Deep learning
Semantic, Cognitive and Perceptual Computing -Deep learning Semantic, Cognitive and Perceptual Computing -Deep learning
Semantic, Cognitive and Perceptual Computing -Deep learning
Artificial Intelligence Institute at UofSC
 
Natural Language Processing (NLP)
Natural Language Processing (NLP)Natural Language Processing (NLP)
Natural Language Processing (NLP)
Yuriy Guts
 
Multimodal Learning Analytics
Multimodal Learning AnalyticsMultimodal Learning Analytics
Multimodal Learning Analytics
Xavier Ochoa
 
LiDeng-BerlinOct2015-ASR-GenDisc-4by3.pptx
LiDeng-BerlinOct2015-ASR-GenDisc-4by3.pptxLiDeng-BerlinOct2015-ASR-GenDisc-4by3.pptx
LiDeng-BerlinOct2015-ASR-GenDisc-4by3.pptx
VishnuRajuV
 
asdrfasdfasdf
asdrfasdfasdfasdrfasdfasdf
asdrfasdfasdf
SwayattaDaw1
 
Core Methods In Educational Data Mining
Core Methods In Educational Data MiningCore Methods In Educational Data Mining
Core Methods In Educational Data Mining
ebelani
 
The Tipping Point
The Tipping PointThe Tipping Point
The Tipping Point
Andrzej Zydroń MBCS
 

Similar to Deep Learning for NLP: An Introduction to Neural Word Embeddings (20)

Deep Learning for Information Retrieval
Deep Learning for Information RetrievalDeep Learning for Information Retrieval
Deep Learning for Information Retrieval
 
Deep Learning, an interactive introduction for NLP-ers
Deep Learning, an interactive introduction for NLP-ersDeep Learning, an interactive introduction for NLP-ers
Deep Learning, an interactive introduction for NLP-ers
 
Engineering Intelligent NLP Applications Using Deep Learning – Part 2
Engineering Intelligent NLP Applications Using Deep Learning – Part 2 Engineering Intelligent NLP Applications Using Deep Learning – Part 2
Engineering Intelligent NLP Applications Using Deep Learning – Part 2
 
Beyond the Symbols: A 30-minute Overview of NLP
Beyond the Symbols: A 30-minute Overview of NLPBeyond the Symbols: A 30-minute Overview of NLP
Beyond the Symbols: A 30-minute Overview of NLP
 
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
 
Tomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLPTomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLP
 
Deep Learning - A Literature survey
Deep Learning - A Literature surveyDeep Learning - A Literature survey
Deep Learning - A Literature survey
 
Natural Language Processing, Techniques, Current Trends and Applications in I...
Natural Language Processing, Techniques, Current Trends and Applications in I...Natural Language Processing, Techniques, Current Trends and Applications in I...
Natural Language Processing, Techniques, Current Trends and Applications in I...
 
Deep learning for natural language embeddings
Deep learning for natural language embeddingsDeep learning for natural language embeddings
Deep learning for natural language embeddings
 
Deep learning for nlp
Deep learning for nlpDeep learning for nlp
Deep learning for nlp
 
Deep learning: the future of recommendations
Deep learning: the future of recommendationsDeep learning: the future of recommendations
Deep learning: the future of recommendations
 
A Panorama of Natural Language Processing
A Panorama of Natural Language ProcessingA Panorama of Natural Language Processing
A Panorama of Natural Language Processing
 
Dialogare con agenti artificiali
Dialogare con agenti artificiali  Dialogare con agenti artificiali
Dialogare con agenti artificiali
 
Semantic, Cognitive and Perceptual Computing -Deep learning
Semantic, Cognitive and Perceptual Computing -Deep learning Semantic, Cognitive and Perceptual Computing -Deep learning
Semantic, Cognitive and Perceptual Computing -Deep learning
 
Natural Language Processing (NLP)
Natural Language Processing (NLP)Natural Language Processing (NLP)
Natural Language Processing (NLP)
 
Multimodal Learning Analytics
Multimodal Learning AnalyticsMultimodal Learning Analytics
Multimodal Learning Analytics
 
LiDeng-BerlinOct2015-ASR-GenDisc-4by3.pptx
LiDeng-BerlinOct2015-ASR-GenDisc-4by3.pptxLiDeng-BerlinOct2015-ASR-GenDisc-4by3.pptx
LiDeng-BerlinOct2015-ASR-GenDisc-4by3.pptx
 
asdrfasdfasdf
asdrfasdfasdfasdrfasdfasdf
asdrfasdfasdf
 
Core Methods In Educational Data Mining
Core Methods In Educational Data MiningCore Methods In Educational Data Mining
Core Methods In Educational Data Mining
 
The Tipping Point
The Tipping PointThe Tipping Point
The Tipping Point
 

More from Roelof Pieters

Speculations in anthropology and tech for an uncertain future
Speculations in anthropology and tech for an uncertain futureSpeculations in anthropology and tech for an uncertain future
Speculations in anthropology and tech for an uncertain future
Roelof Pieters
 
AI assisted creativity
AI assisted creativity AI assisted creativity
AI assisted creativity
Roelof Pieters
 
Creativity and AI: 
Deep Neural Nets "Going Wild"
Creativity and AI: 
Deep Neural Nets "Going Wild"Creativity and AI: 
Deep Neural Nets "Going Wild"
Creativity and AI: 
Deep Neural Nets "Going Wild"
Roelof Pieters
 
Deep Neural Networks 
that talk (Back)… with style
Deep Neural Networks 
that talk (Back)… with styleDeep Neural Networks 
that talk (Back)… with style
Deep Neural Networks 
that talk (Back)… with style
Roelof Pieters
 
Building a Deep Learning (Dream) Machine
Building a Deep Learning (Dream) MachineBuilding a Deep Learning (Dream) Machine
Building a Deep Learning (Dream) Machine
Roelof Pieters
 
Multi-modal embeddings: from discriminative to generative models and creative ai
Multi-modal embeddings: from discriminative to generative models and creative aiMulti-modal embeddings: from discriminative to generative models and creative ai
Multi-modal embeddings: from discriminative to generative models and creative ai
Roelof Pieters
 
Multi modal retrieval and generation with deep distributed models
Multi modal retrieval and generation with deep distributed modelsMulti modal retrieval and generation with deep distributed models
Multi modal retrieval and generation with deep distributed models
Roelof Pieters
 
Creative AI & multimodality: looking ahead
Creative AI & multimodality: looking aheadCreative AI & multimodality: looking ahead
Creative AI & multimodality: looking ahead
Roelof Pieters
 
Python for Image Understanding: Deep Learning with Convolutional Neural Nets
Python for Image Understanding: Deep Learning with Convolutional Neural NetsPython for Image Understanding: Deep Learning with Convolutional Neural Nets
Python for Image Understanding: Deep Learning with Convolutional Neural Nets
Roelof Pieters
 
Explore Data: Data Science + Visualization
Explore Data: Data Science + VisualizationExplore Data: Data Science + Visualization
Explore Data: Data Science + Visualization
Roelof Pieters
 
Deep Learning as a Cat/Dog Detector
Deep Learning as a Cat/Dog DetectorDeep Learning as a Cat/Dog Detector
Deep Learning as a Cat/Dog Detector
Roelof Pieters
 
Graph, Data-science, and Deep Learning
Graph, Data-science, and Deep LearningGraph, Data-science, and Deep Learning
Graph, Data-science, and Deep Learning
Roelof Pieters
 
Learning to understand phrases by embedding the dictionary
Learning to understand phrases by embedding the dictionaryLearning to understand phrases by embedding the dictionary
Learning to understand phrases by embedding the dictionary
Roelof Pieters
 
Zero shot learning through cross-modal transfer
Zero shot learning through cross-modal transferZero shot learning through cross-modal transfer
Zero shot learning through cross-modal transfer
Roelof Pieters
 
Visual-Semantic Embeddings: some thoughts on Language
Visual-Semantic Embeddings: some thoughts on LanguageVisual-Semantic Embeddings: some thoughts on Language
Visual-Semantic Embeddings: some thoughts on Language
Roelof Pieters
 
Recommender Systems, Matrices and Graphs
Recommender Systems, Matrices and GraphsRecommender Systems, Matrices and Graphs
Recommender Systems, Matrices and Graphs
Roelof Pieters
 

More from Roelof Pieters (16)

Speculations in anthropology and tech for an uncertain future
Speculations in anthropology and tech for an uncertain futureSpeculations in anthropology and tech for an uncertain future
Speculations in anthropology and tech for an uncertain future
 
AI assisted creativity
AI assisted creativity AI assisted creativity
AI assisted creativity
 
Creativity and AI: 
Deep Neural Nets "Going Wild"
Creativity and AI: 
Deep Neural Nets "Going Wild"Creativity and AI: 
Deep Neural Nets "Going Wild"
Creativity and AI: 
Deep Neural Nets "Going Wild"
 
Deep Neural Networks 
that talk (Back)… with style
Deep Neural Networks 
that talk (Back)… with styleDeep Neural Networks 
that talk (Back)… with style
Deep Neural Networks 
that talk (Back)… with style
 
Building a Deep Learning (Dream) Machine
Building a Deep Learning (Dream) MachineBuilding a Deep Learning (Dream) Machine
Building a Deep Learning (Dream) Machine
 
Multi-modal embeddings: from discriminative to generative models and creative ai
Multi-modal embeddings: from discriminative to generative models and creative aiMulti-modal embeddings: from discriminative to generative models and creative ai
Multi-modal embeddings: from discriminative to generative models and creative ai
 
Multi modal retrieval and generation with deep distributed models
Multi modal retrieval and generation with deep distributed modelsMulti modal retrieval and generation with deep distributed models
Multi modal retrieval and generation with deep distributed models
 
Creative AI & multimodality: looking ahead
Creative AI & multimodality: looking aheadCreative AI & multimodality: looking ahead
Creative AI & multimodality: looking ahead
 
Python for Image Understanding: Deep Learning with Convolutional Neural Nets
Python for Image Understanding: Deep Learning with Convolutional Neural NetsPython for Image Understanding: Deep Learning with Convolutional Neural Nets
Python for Image Understanding: Deep Learning with Convolutional Neural Nets
 
Explore Data: Data Science + Visualization
Explore Data: Data Science + VisualizationExplore Data: Data Science + Visualization
Explore Data: Data Science + Visualization
 
Deep Learning as a Cat/Dog Detector
Deep Learning as a Cat/Dog DetectorDeep Learning as a Cat/Dog Detector
Deep Learning as a Cat/Dog Detector
 
Graph, Data-science, and Deep Learning
Graph, Data-science, and Deep LearningGraph, Data-science, and Deep Learning
Graph, Data-science, and Deep Learning
 
Learning to understand phrases by embedding the dictionary
Learning to understand phrases by embedding the dictionaryLearning to understand phrases by embedding the dictionary
Learning to understand phrases by embedding the dictionary
 
Zero shot learning through cross-modal transfer
Zero shot learning through cross-modal transferZero shot learning through cross-modal transfer
Zero shot learning through cross-modal transfer
 
Visual-Semantic Embeddings: some thoughts on Language
Visual-Semantic Embeddings: some thoughts on LanguageVisual-Semantic Embeddings: some thoughts on Language
Visual-Semantic Embeddings: some thoughts on Language
 
Recommender Systems, Matrices and Graphs
Recommender Systems, Matrices and GraphsRecommender Systems, Matrices and Graphs
Recommender Systems, Matrices and Graphs
 

Recently uploaded

FREE A4 Cyber Security Awareness Posters-Social Engineering part 3
FREE A4 Cyber Security Awareness  Posters-Social Engineering part 3FREE A4 Cyber Security Awareness  Posters-Social Engineering part 3
FREE A4 Cyber Security Awareness Posters-Social Engineering part 3
Data Hops
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
Public CyberSecurity Awareness Presentation 2024.pptx
Public CyberSecurity Awareness Presentation 2024.pptxPublic CyberSecurity Awareness Presentation 2024.pptx
Public CyberSecurity Awareness Presentation 2024.pptx
marufrahmanstratejm
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
ssuserfac0301
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their MainframeDigital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Precisely
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
Tatiana Kojar
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
saastr
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Tosin Akinosho
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
Hiroshi SHIBATA
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
SAP S/4 HANA sourcing and procurement to Public cloud
SAP S/4 HANA sourcing and procurement to Public cloudSAP S/4 HANA sourcing and procurement to Public cloud
SAP S/4 HANA sourcing and procurement to Public cloud
maazsz111
 
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - HiikeSystem Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
Hiike
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
Antonios Katsarakis
 

Recently uploaded (20)

FREE A4 Cyber Security Awareness Posters-Social Engineering part 3
FREE A4 Cyber Security Awareness  Posters-Social Engineering part 3FREE A4 Cyber Security Awareness  Posters-Social Engineering part 3
FREE A4 Cyber Security Awareness Posters-Social Engineering part 3
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
Public CyberSecurity Awareness Presentation 2024.pptx
Public CyberSecurity Awareness Presentation 2024.pptxPublic CyberSecurity Awareness Presentation 2024.pptx
Public CyberSecurity Awareness Presentation 2024.pptx
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their MainframeDigital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
SAP S/4 HANA sourcing and procurement to Public cloud
SAP S/4 HANA sourcing and procurement to Public cloudSAP S/4 HANA sourcing and procurement to Public cloud
SAP S/4 HANA sourcing and procurement to Public cloud
 
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - HiikeSystem Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
 

Deep Learning for NLP: An Introduction to Neural Word Embeddings

  • 1. Deep Learning 
 for NLP An Introduction to Neural Word Embeddings* Roelof Pieters
 PhD candidate KTH/CSC CIO/CTO Feeda AB Feeda KTH, December 4, 2014 roelof@kth.se www.csc.kth.se/~roelof/ @graphific *and some more fun stuff…
  • 2. 2. NLP: WORD EMBEDDINGS 1. DEEP LEARNING
  • 3. 1. DEEP LEARNING 2. NLP: WORD EMBEDDINGS
  • 4. A couple of headlines… [all November ’14]
  • 5. Improving some task T based on experience E with respect to performance measure P. Deep Learning = Machine Learning Learning denotes changes in the system that are adaptive in the sense that they enable the system to do the same task (or tasks drawn from a population of similar tasks) more effectively the next time. — H. Simon 1983 
 "Why Should Machines Learn?” in Mitchell 1997 — T. Mitchell 1997
  • 6. Representation learning Attempts to automatically learn good features or representations Deep learning Attempt to learn multiple levels of representation of increasing complexity/abstraction Deep Learning: What?
  • 7. ML: Traditional Approach 1. Gather as much LABELED data as you can get 2. Throw some algorithms at it (mainly put in an SVM and keep it at that) 3. If you actually have tried more algos: Pick the best 4. Spend hours hand engineering some features / feature selection / dimensionality reduction (PCA, SVD, etc) 5. Repeat… For each new problem/question::
  • 8. History • Perceptron (’57-69…) • Multi-Layered Perceptrons (’86) • SVMs (popularized 00s) • RBM (‘92+) • “2006” Rosenblatt 1957 vs Minsky & Papert
  • 9. History • Perceptron (’57-69…) • Multi-Layered Perceptrons (’86) • SVMs (popularized 00s) • RBM (‘92+) • “2006” (Rumelhart, Hinton & Williams, 1986)
  • 10. Backprop Renaissance • Multi-Layered Perceptrons (’86) • Uses Backpropagation (Bryson & Ho 1969): back-propagates the error signal computed at the output layer to get derivatives for learning, in order to update the weight vectors until convergence is reached
  • 11. Backprop Renaissance Forward Propagation • Sum inputs, produce activation, feed-forward
  • 12. Backprop Renaissance Back Propagation (of error) • Calculate total error at the top • Calculate contributions to error at each step going backwards
  • 13. • Compute gradient of example-wise loss wrt parameters • Simply applying the derivative chain rule wisely 
 
 
 • If computing the loss (example, parameters) is O(n) computation, then so is computing the gradient Backpropagation
  • 15. History • Perceptron (’57-69…) • Multi-Layered Perceptrons (’86) • SVMs (popularized 00s) • RBM (‘92+) • “2006” (Cortes & Vapnik 1995) Kernel SVM
  • 16. History • Perceptron (’57-69…) • Multi-Layered Perceptrons (’86) • SVMs (popularized 00s) • RBM (‘92+) • “2006” • Form of log-linear Markov Random Field 
 (MRF) • Bipartite graph, with no intra-layer connections
  • 18. • Training Function:
 
 • often by contrastive divergence (CD) (Hinton 1999; Hinton 2000) • Gibbs sampling • Gradient Descent • Goal: compute weight updates RBM: Training more info: Geoffrey Hinton (2010). A Practical Guide to Training Restricted Boltzmann Machines
  • 19. History • Perceptron (’57-69…) • Multi-Layered Perceptrons (’86) • SVMs (popularized 00s) • RBM (‘92+) • “2006” 1. More labeled data (“Big Data”) 2. GPU’s 3. “layer-wise unsupervised feature learning”
  • 20. Stacking Single Layer Learners One of the big ideas from 2006: layer-wise unsupervised feature learning - Stacking Restricted Boltzmann Machines (RBM) -> Deep Belief Network (DBN) - Stacking regularized auto-encoders -> deep neural nets
  • 21. • Stacked RBM • Introduced by Hinton et al. (2006) • 1st RBM hidden layer == 2th RBM input layer • Feature Hierarchy Deep Belief Network (DBN)
  • 22.
  • 23.
  • 24.
  • 25.
  • 26. Biological Justification Deep Learning = Brain “inspired”
 Audio/Visual Cortex has multiple stages == Hierarchical
  • 27. Biological Justification Deep Learning = Brain “inspired”
 Audio/Visual Cortex has multiple stages == Hierarchical “Brainiacs” “Pragmatists”vs
  • 28. Biological Justification Deep Learning = Brain “inspired”
 Audio/Visual Cortex has multiple stages == Hierarchical • Computational Biology • CVAP “Brainiacs” “Pragmatists”vs
  • 29. Biological Justification Deep Learning = Brain “inspired”
 Audio/Visual Cortex has multiple stages == Hierarchical • Computational Biology • CVAP • Jorge Dávila-Chacón • “that guy” “Brainiacs” “Pragmatists”vs
  • 30. Different Levels of Abstraction
  • 31. Hierarchical Learning • Natural progression from low level to high level structure as seen in natural complexity • Easier to monitor what is being learnt and to guide the machine to better subspaces • A good lower level representation can be used for many distinct tasks Different Levels of Abstraction Feature Representation
  • 32. • Shared Low Level Representations • Multi-Task Learning • Unsupervised Training • Partial Feature Sharing • Mixed Mode Learning • Composition of Functions Generalizable Learning
  • 33. Classic Deep Architecture Input layer Hidden layers Output layer
  • 34. Modern Deep Architecture Input layer Hidden layers Output layer
  • 35. Modern Deep Architecture Input layer Hidden layers Output layer movie time: http://www.cs.toronto.edu/~hinton/adi/index.htm
  • 38. No More Handcrafted Features !
  • 39. — Andrew Ng “I’ve worked all my life in Machine Learning, and I’ve never seen one algorithm knock over benchmarks like Deep Learning” Deep Learning: Why?
  • 40. Deep Learning: Why? Beat state of the art in many areas: • Language Modeling (2012, Mikolov et al) • Image Recognition (Krizhevsky won 2012 ImageNet competition) • Sentiment Classification (2011, Socher et al) • Speech Recognition (2010, Dahl et al) • MNIST hand-written digit recognition (Ciresan et al, 2010)
  • 41. One Model rules them all ?
 
 DL approaches have been successfully applied to: Deep Learning: Why for NLP ? Automatic summarization Coreference resolution Discourse analysis Machine translation Morphological segmentation Named entity recognition (NER) Natural language generation Natural language understanding Optical character recognition (OCR) Part-of-speech tagging Parsing Question answering Relationship extraction sentence boundary disambiguation Sentiment analysis Speech recognition Speech segmentation Topic segmentation and recognition Word segmentation Word sense disambiguation Information retrieval (IR) Information extraction (IE) Speech processing
  • 42. 2. NLP: WORD EMBEDDINGS 1. DEEP LEARNING
  • 43. • NLP treats words mainly (rule-based/statistical approaches at least) as atomic symbols:
 • or in vector space:
 • also known as “one hot” representation. • Its problem ? Word Representation Love Candy Store [0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 …]
  • 44. • NLP treats words mainly (rule-based/statistical approaches at least) as atomic symbols:
 • or in vector space:
 • also known as “one hot” representation. • Its problem ? Word Representation Love Candy Store [0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 …] Candy [0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 …] AND Store [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 …] = 0 !
  • 45. Distributional representations “You shall know a word by the company it keeps”
 (J. R. Firth 1957) One of the most successful ideas of modern statistical NLP! these words represent banking • Hard (class based) clustering models: • Soft clustering models
  • 46. • Word Embeddings (Bengio et al, 2001; Bengio et al, 2003) based on idea of distributed representations for symbols (Hinton 1986) • Neural Word embeddings (Mnih and Hinton 2007, Collobert & Weston 2008, Turian et al 2010; Collobert et al. 2011, Mikolov et al. 2011) Language Modeling
  • 47. Neural distributional representations • Neural word embeddings • Combine vector space semantics with the prediction of probabilistic models • Words are represented as a dense vector: Candy =
  • 48. Word Embeddings: SocherVector Space Model Figure (edited) from Bengio, “Representation Learning and Deep Learning”, July, 2012, UCLA In a perfect world:
  • 49. Word Embeddings: SocherVector Space Model Figure (edited) from Bengio, “Representation Learning and Deep Learning”, July, 2012, UCLA In a perfect world: input: 
 - the country of my birth - the place where I was born
  • 50. Word Embeddings: SocherVector Space Model Figure (edited) from Bengio, “Representation Learning and Deep Learning”, July, 2012, UCLA In a perfect world: the country of my birth input: 
 - the country of my birth - the place where I was born
  • 51. Word Embeddings: SocherVector Space Model Figure (edited) from Bengio, “Representation Learning and Deep Learning”, July, 2012, UCLA In a perfect world: the country of my birth the place where I was born input: 
 - the country of my birth - the place where I was born
  • 52. Word Embeddings: SocherVector Space Model Figure (edited) from Bengio, “Representation Learning and Deep Learning”, July, 2012, UCLA In a perfect world: the country of my birth the place where I was born ? …
  • 53. • Recursive Tensor (Neural) Network (RTNT) 
 (Socher et al. 2011; Socher 2014) • Top-down hierarchical net (vs feed forward) • NLP! • Sequence based classification, windows of several events, entire scenes (rather than images), entire sentences (rather than words) • Features = Vectors • A tensor = multi-dimensional matrix, or multiple matrices of the same size Recursive Neural (Tensor) Network
  • 56. Compositionality Principle of compositionality: the “meaning (vector) of a complex expression (sentence) is determined by: — Gottlob Frege 
 (1848 - 1925) - the meanings of its constituent expressions (words) and - the rules (grammar) used to combine them”
  • 57. NP PP/IN NP DT NN PRP$ NN Parse Tree Compositionality
  • 58. NP PP/IN NP DT NN PRP$ NN Parse Tree INDT NN PRP NN Compositionality
  • 59. NP IN NP PRP NN Parse Tree DT NN Compositionality
  • 60. NP IN NP DT NN PRP NN PP NP (S / ROOT) Compositionality
  • 61. NP IN NP DT NN PRP NN PP NP (S / ROOT) “rules” Compositionality
  • 62. NP IN NP DT NN PRP NN PP NP (S / ROOT) “rules” “meanings” Compositionality
  • 63. Vector Space + Word Embeddings: Socher
  • 64. Vector Space + Word Embeddings: Socher
  • 65. code & info: http://metaoptimize.com/projects/wordreprs/ Word Embeddings: Turian
  • 66. t-SNE visualizations of word embeddings. Left: Number Region; Right: Jobs Region. From Turian et al. 2011 Word Embeddings: Turian
  • 67. • Recurrent Neural Network (Mikolov et al. 2010; Mikolov et al. 2013a) W(‘‘woman")−W(‘‘man") ≃ W(‘‘aunt")−W(‘‘uncle") W(‘‘woman")−W(‘‘man") ≃ W(‘‘queen")−W(‘‘king") Figures from Mikolov, T., Yih, W., & Zweig, G. (2013). Linguistic Regularities in Continuous Space Word Representations Word Embeddings: Mikolov
  • 68. • Mikolov et al. 2013b Figures from Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013b). Efficient Estimation of Word Representations in Vector Space Word Embeddings: Mikolov
  • 69.
  • 70. • cuda-convnet2 (Alex Krizhevsky, Toronto) (c++/ CUDA, optimized for GTX 580) 
 https://code.google.com/p/cuda-convnet2/ • Caffe (Berkeley) (Cuda/OpenCL, Theano, Python)
 http://caffe.berkeleyvision.org/ • OverFeat (NYU) 
 http://cilvr.nyu.edu/doku.php?id=code:start Wanna Play ?
  • 71. • Theano - CPU/GPU symbolic expression compiler in python (from LISA lab at University of Montreal). http:// deeplearning.net/software/theano/ • Pylearn2 - library designed to make machine learning research easy. http://deeplearning.net/software/pylearn2/ • Torch - Matlab-like environment for state-of-the-art machine learning algorithms in lua (from Ronan Collobert, Clement Farabet and Koray Kavukcuoglu) http://torch.ch/ • more info: http://deeplearning.net/software links/ Wanna Play ? Wanna Play ?
  • 72. as PhD candidate KTH/CSC: “Always interested in discussion
 Machine Learning, Deep Architectures, 
 Graphs, and NLP” Wanna Play with Me ? roelof@kth.se www.csc.kth.se/~roelof/ Internship / EntrepeneurshipAcademic/Research as CIO/CTO Feeda: “Always looking for additions to our 
 brand new R&D team”
 
 [Internships upcoming on 
 KTH exjobb website…] roelof@feeda.com www.feeda.com Feeda
  • 74. Appendum Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Chris Manning, Andrew Ng and Chris Potts. 2013. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. EMNLP 2013 code & demo: http://nlp.stanford.edu/sentiment/index.html
  • 75. Appendum Eric H. Huang, Richard Socher, Christopher D. Manning, Andrew Y. Ng 
 Improving Word Representations via Global Context and Multiple Word Prototypes