Deep Learning for
Natural Language Processing
Sihem Romdhani
Agenda
What is deep learning?
Why is this important?
How to apply deep learning to
natural language processing?
About Me
Sihem Romdhani
Software engineer at Veeva
MEng in Computer Engineering
MSc in ECE at University of Waterloo
	- Deep learning
	 - Speech/image recognition
What is machine learning?
[Diagram: training data (documents, images, sounds, ...) and their labels are converted into feature vectors and fed to a machine learning algorithm.]
[Diagram, continued: the algorithm produces a predictive model; new text (documents, images, sounds, ...) is converted into a feature vector, and the model predicts the expected label.]
What is deep learning?
[Diagram: a deep neural network (DNN) transforms an input image through layers of abstraction between the input layer and the output layer: pixels → edges → object parts → object models.]
Deep learning in action:
- Speech technology: Apple’s Siri, Amazon’s Alexa
- DeepMind beats the best at Go
- Nvidia drives cars with the Drive PX 2 supercomputer
Quiz Time!
Which is human? Which is machine?
I’m in a love affair I can’t share it ain’t fair
Haha I’m just playin’ ladies you know I love you.
I know my love is true and I know you love me too
Girl I’m down for whatever cause my love is true
SONG A
For a chance at romance I would love to enhance
But everything I love has turned to a tedious task
One day we gonna have to leave our love in the past
I love my fans but no one ever puts a grasp
SONG B
DeepBeat.org: an online rap-lyrics generator
Natural Language Processing (NLP)
understanding and generating human language
Human language is
difficult!
- variability: kitty vs. cat
- ambiguity: blue jays vs. Blue Jays
- vague and latent meaning: “Eating babies can be messy.”
How can we represent words?
[Diagram: audio is represented as a spectrogram, an image as pixels, and a word as a vector of real numbers, e.g. ‘CAT’ → (0, 0.5, 0, 0.2, 0.1, 0, 0.1, 0, 0.3, 0, 0.4, 0.1). How similar are the embeddings of ‘CAT’ and ‘KITTY’?]
Context defines similarity: infer the meaning of a word from the company it keeps.
The cat/kitty purrs.
The cat/kitty hunts mice.
Word to Vector
[Diagram: in ‘The little cat hunts mice’, the embedding of ‘cat’ is used to predict the words in its context window.]
Word2Vec (Google) for learning embeddings
1. Collect a large corpus of text.
2. Randomly initialize the embeddings: each word is mapped to a fixed-size embedding with random values.
3. Set up the prediction task: for each word, pick a random word from its context and treat it as the target (label).
	INPUT WORD → TARGET WORD, e.g. ‘cat’ → ‘hunt’
4. Learn the embeddings: train a model (logistic classifier) to predict the target (a neighbour of each word), and update the embeddings.
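As a concrete illustration of step 3, here is a minimal Python sketch that generates (input word, target word) training pairs from a toy corpus; the corpus, window size, and variable names are illustrative assumptions:

    import random

    corpus = "the little cat hunts mice in the garden".split()
    window = 2  # number of context words taken on each side

    # For each word, pick a random word from its context window as the target.
    pairs = []
    for i, word in enumerate(corpus):
        context = corpus[max(0, i - window):i] + corpus[i + 1:i + 1 + window]
        pairs.append((word, random.choice(context)))

    print(pairs)  # e.g. [('the', 'little'), ('little', 'cat'), ('cat', 'hunts'), ...]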
Word2Vec Process
[Diagram: the embedding of ‘cat’ is fed to a linear model, w·V_cat + b, which outputs a predicted target; the prediction is compared against the real target ‘hunt’, encoded as a one-hot vector over the vocabulary (a single 1 in the position of ‘hunt’, 0 everywhere else).]
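A minimal numpy sketch of this prediction-and-compare step, assuming a toy vocabulary and randomly initialized parameters (all sizes and names are illustrative):

    import numpy as np

    vocab = ["the", "little", "cat", "hunts", "mice"]
    word_to_id = {w: i for i, w in enumerate(vocab)}
    rng = np.random.default_rng(0)

    E = rng.normal(size=(len(vocab), 4))  # embeddings, randomly initialized
    W = rng.normal(size=(4, len(vocab)))  # linear model weights
    b = np.zeros(len(vocab))              # linear model bias

    v_cat = E[word_to_id["cat"]]                       # embedding lookup for 'cat'
    logits = v_cat @ W + b                             # w·V_cat + b
    predicted = np.exp(logits) / np.exp(logits).sum()  # softmax over the vocabulary

    real = np.zeros(len(vocab))
    real[word_to_id["hunts"]] = 1.0                    # one-hot real target
    loss = -np.log(predicted[word_to_id["hunts"]])     # compare: cross-entropy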
Similar words are nearby in the learned embedding space.
[Diagram: ‘cat’, ‘kitten’, ‘tiger’, and ‘dog’ cluster together, while ‘home’ sits apart.]
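Nearness is usually measured with cosine similarity; a minimal sketch using made-up 2-dimensional vectors (real embeddings would come out of the training above):

    import numpy as np

    def cosine(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

    # Hypothetical embeddings: 'cat' should score closer to 'kitten' than to 'home'.
    cat = np.array([0.9, 0.1])
    kitten = np.array([0.8, 0.2])
    home = np.array([0.1, 0.9])

    print(cosine(cat, kitten))  # high similarity
    print(cosine(cat, home))    # low similarity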
Let’s discuss a real problem.
A Pharma Company’s Market Expansion
New product: a pancreatic cancer treatment!
Which oncologists should be informed about it?
Web scraping: different data-extraction scripts for different web pages, consolidated into one record.
Extracting addresses from web pages is hard:
- abbreviations: Avenue - Ave - Ave.
- ambiguity in names: main vs. Main Street
- missing elements
- misspellings
- misordered elements
Demo
Solution architecture
WEB PAGE (HTML) → TOKENIZATION → WORD REPRESENTATION / WORD EMBEDDING → DEEP NEURAL NETWORK CLASSIFIER → POST-PROCESSING → OUTPUT: EXTRACTED ADDRESS
TensorFlow for word embedding
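A minimal sketch of learning word embeddings with TensorFlow 1.x, along the lines of the word2vec tutorial; the vocabulary size, embedding dimension, and the use of NCE loss are illustrative assumptions:

    import tensorflow as tf

    vocab_size, embed_dim, num_sampled = 50000, 128, 64

    # Step 2: randomly initialized embedding matrix, one row per vocabulary word.
    embeddings = tf.Variable(tf.random_uniform([vocab_size, embed_dim], -1.0, 1.0))

    train_inputs = tf.placeholder(tf.int32, shape=[None])     # input word ids
    train_labels = tf.placeholder(tf.int32, shape=[None, 1])  # target word ids

    # Look up the embedding of each input word.
    embed = tf.nn.embedding_lookup(embeddings, train_inputs)

    # Prediction layer trained with noise-contrastive estimation (NCE), which
    # avoids computing a full softmax over the whole vocabulary.
    nce_weights = tf.Variable(tf.truncated_normal([vocab_size, embed_dim], stddev=0.1))
    nce_biases = tf.Variable(tf.zeros([vocab_size]))

    loss = tf.reduce_mean(tf.nn.nce_loss(
        weights=nce_weights, biases=nce_biases,
        labels=train_labels, inputs=embed,
        num_sampled=num_sampled, num_classes=vocab_size))

    optimizer = tf.train.GradientDescentOptimizer(1.0).minimize(loss)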
Training a DNN classifier
[Diagram: for ‘123 Main Street, Florida 32209’, the context words around ‘Main’ (123, Street, Florida, 32209) are concatenated and passed through hidden layers 1-3; the output labels the central word A (address) or O (other, non-address).]
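A minimal sketch of such a classifier in TensorFlow 1.x; the window width, layer sizes, and learning rate are illustrative assumptions:

    import tensorflow as tf

    embed_dim, window, num_classes = 128, 5, 2  # classes: A (address), O (other)

    # Input: the embeddings of a context window, concatenated into one vector.
    x = tf.placeholder(tf.float32, shape=[None, window * embed_dim])
    y = tf.placeholder(tf.int32, shape=[None])  # 0 = A, 1 = O

    h1 = tf.layers.dense(x, 256, activation=tf.nn.relu)   # hidden layer 1
    h2 = tf.layers.dense(h1, 128, activation=tf.nn.relu)  # hidden layer 2
    h3 = tf.layers.dense(h2, 64, activation=tf.nn.relu)   # hidden layer 3
    logits = tf.layers.dense(h3, num_classes)             # scores for A vs. O

    loss = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits))
    train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)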
Address extraction algorithm
Input: trained DNN model and an unlabeled web page
Output: extracted address
1. Load the trained DNN model
2. For each new web page do
3. 	Pre-process the web page to remove HTML tags, then tokenize it
4. 	Replace each token (word) with its embedding vector
5. 	Create the DNN input by concatenating the embeddings of the context
6. 	The DNN labels the central word of the context as A (ADDRESS) or O (OTHER)
7. 	Extract the sequences of tokens with the A label
8. 	If the total number of tokens is within range then
9. 		Output the token block as the extracted address
10. 	End if
11. End for
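A minimal Python sketch of steps 7-10, collecting runs of A-labelled tokens and keeping only blocks whose length is within range; predict_label is assumed to wrap the trained DNN, and the length bounds are illustrative:

    def extract_addresses(tokens, predict_label, min_len=4, max_len=12):
        # predict_label(tokens, i) embeds the context window around position i
        # and returns 'A' (address) or 'O' (other) using the trained DNN.
        labels = [predict_label(tokens, i) for i in range(len(tokens))]
        addresses, block = [], []
        for token, label in zip(tokens, labels):
            if label == "A":
                block.append(token)
                continue
            if min_len <= len(block) <= max_len:   # step 8: length check
                addresses.append(" ".join(block))  # step 9: output the block
            block = []
        if min_len <= len(block) <= max_len:       # flush a trailing block
            addresses.append(" ".join(block))
        return addresses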
Applications of NLP
- Information extraction
- Machine translation
- Social media analytics
- Question answering
- Natural language generation
- Speech recognition
Useful Tools
Questions?
Slides available on Slack channel #veeva
Stop by our booth for questions and cool swag!
