Bryan Hang Zhang
Natural Language Processing
Almost From Scratch
• Presents a deep neural network architecture for NLP tasks
• Reports results comparable to the state of the art on 4 NLP tasks:
– Part-of-Speech tagging
– Chunking
– Named Entity Recognition
– Semantic Role Labeling
• Presents word embeddings learned from a large unlabelled corpus and shows that using these features improves results
• Presents results of joint training for the above tasks.
Motivation
• Propose a unified neural network architecture and learning algorithm that can be applied to various NLP tasks
• Instead of hand-crafting features, acquire task-specific features (internal representations) from large amounts of labelled and unlabelled training data.
Task Introduction
• Part-of-Speech Tagging
– Automatically assign a part-of-speech tag to each word in a text sequence.
• Chunking
– Also called shallow parsing: identifying short, non-recursive phrases such as noun phrases and verb phrases, typically building on part-of-speech information.
• Named Entity Recognition
– Classify elements of the text into predefined categories such as person, location, etc.
Semantic Role Labeling
• SRL, sometimes also called shallow semantic parsing, consists of detecting the semantic arguments associated with the predicate (verb) of a sentence and classifying them into their specific roles.
e.g. 1. Mark sold the car to Mary.
Mark (agent) sold (predicate) the car (theme) to Mary (recipient)
State-of-the-art systems: experiment setup
Benchmark Systems
Network Architecture
The Networks
• Traditional approach:
– hand-designed features
• New approach:
– multi-layer neural networks.
• Transforming Words into Feature Vectors
• Extracting Higher Level Features from Word Feature Vector
• Training
• Benchmark Result
Neural Network
Two Approaches Overview
Window approach network vs. sentence approach network
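Lookup Tables
• Each of the K discrete features has its own matrix, used as a lookup table: the feature's value (an index) selects the corresponding column of the matrix as that feature's vector, $LT_{W^k}(i) = \langle W^k \rangle^1_i$ with $W^k \in \mathbb{R}^{d^k \times |\mathcal{D}^k|}$.
• A word is represented by concatenating the outputs of its K lookup tables.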
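A minimal sketch of the lookup-table idea, assuming each matrix is stored column-wise as a plain Python list of vectors. The class `LookupTable`, the function `word_features` and the toy tables are hypothetical names chosen for illustration, not the paper's code.

```python
from typing import List, Sequence


class LookupTable:
    """LT_W(i): return column i of a d x |D| matrix W (stored here as a list of columns)."""

    def __init__(self, columns: List[List[float]]):
        self.columns = columns

    def __call__(self, index: int) -> List[float]:
        return self.columns[index]


def word_features(tables: Sequence[LookupTable], indices: Sequence[int]) -> List[float]:
    """Concatenate the K lookup-table outputs for one word; indices[k] is the value of feature k."""
    if len(tables) != len(indices):
        raise ValueError("need exactly one index per discrete feature")
    features: List[float] = []
    for table, index in zip(tables, indices):
        features.extend(table(index))
    return features


# Hypothetical toy tables: 3 words with 2-dim vectors, 2 caps options with 1-dim vectors
word_table = LookupTable([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
caps_table = LookupTable([[1.0], [2.0]])
print(word_features([word_table, caps_table], [2, 1]))  # [0.5, 0.6, 2.0]
```

Because a lookup is just indexing, training the table amounts to updating only the columns that were selected.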
Words to Features: Window Approach
• Window size: for example, 5
• Raw text features:
– lower-case word
– capitalisation feature
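Window Approach
Sentence: My Name is Bryan
Padded with two PADDING tokens on each side: PADDING PADDING My Name is Bryan PADDING PADDING
One window of size 5 per word, centred on that word:
PADDING PADDING My Name is
PADDING My Name is Bryan
My Name is Bryan PADDING
Name is Bryan PADDING PADDING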
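The padding-and-sliding procedure above can be sketched as follows, assuming an odd window size so that every window has a centre word. The function name `windows` is hypothetical; the `PADDING` token is the one used on the slide.

```python
from typing import List

PADDING = "PADDING"


def windows(words: List[str], win_size: int = 5) -> List[List[str]]:
    """Return one window per word, centred on that word, padded at both sentence ends."""
    if win_size % 2 == 0:
        raise ValueError("window size must be odd so each window has a centre word")
    half = win_size // 2  # d_win: number of neighbours on each side
    padded = [PADDING] * half + words + [PADDING] * half
    # Window i covers padded[i : i + win_size]; its centre padded[i + half] is words[i].
    return [padded[i:i + win_size] for i in range(len(words))]


for w in windows("My Name is Bryan".split()):
    print(" ".join(w))
```

Running it prints the four windows listed above, one per word of "My Name is Bryan".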
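Words to Features
• Each word (e.g. "My") is mapped to two indices:
– word index → Word Lookup Table (vocabulary size 130,000; 50-dimensional vectors)
– caps index → Caps Lookup Table (5 options; 5-dimensional vectors)
• The two vectors are concatenated, giving 50 + 5 = 55 features per word.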
Words to Features
Window: PADDING PADDING My Name is
Each of the 5 words contributes a 55-dimensional feature vector; concatenated, the window is represented by a 5 × 55 = 275-dimensional vector.
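A sketch of how the 275-dimensional window vector is assembled, using the slide's sizes (50-dim word vectors, 5-dim caps vectors). The toy five-word vocabulary, the random initial values and the caps indices passed in are hypothetical placeholders; the real word table has 130,000 entries.

```python
import random
from typing import Dict, List

WORD_DIM, CAPS_DIM = 50, 5  # embedding sizes given on the slide

rng = random.Random(0)


def random_vector(dim: int) -> List[float]:
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


# Hypothetical toy vocabulary, lower-cased as in pre-processing.
word_table: Dict[str, List[float]] = {
    w: random_vector(WORD_DIM) for w in ["padding", "my", "name", "is", "bryan"]
}
# 5 caps options, each with a 5-dim vector; the meaning of each option is not modelled here.
caps_table: List[List[float]] = [random_vector(CAPS_DIM) for _ in range(5)]


def word_vector(word: str, caps: int) -> List[float]:
    """55-dim vector of one word: word embedding followed by caps embedding."""
    return word_table[word.lower()] + caps_table[caps]


def window_vector(window: List[str], caps: List[int]) -> List[float]:
    """Concatenate the feature vectors of all words in the window."""
    vector: List[float] = []
    for word, c in zip(window, caps):
        vector.extend(word_vector(word, c))
    return vector


# The caps indices below are arbitrary placeholders.
v = window_vector(["PADDING", "PADDING", "My", "Name", "is"], [0, 0, 1, 1, 0])
print(len(v))  # 5 * (50 + 5) = 275
```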
Extracting Higher Level Features from Word Feature Vectors
Any feed-forward neural network with L layers can be seen as a composition of functions $f^l_\theta(\cdot)$, one for each layer l:
$$f_\theta(\cdot) = f^L_\theta(f^{L-1}_\theta(\ldots f^1_\theta(\cdot) \ldots))$$
where $\theta$ denotes the parameters.
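The composition view can be sketched directly as function composition. The helper `compose` and the two toy layers are hypothetical; real layers would be the lookup, linear and HardTanh layers described next.

```python
from functools import reduce
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def compose(layers: Sequence[Layer]) -> Layer:
    """Return f(x) = f_L(f_{L-1}(... f_1(x) ...)) for layers = [f_1, ..., f_L]."""
    return lambda x: reduce(lambda h, f: f(h), layers, x)


double: Layer = lambda v: [2 * a for a in v]  # toy layer f_1
shift: Layer = lambda v: [a + 1 for a in v]   # toy layer f_2
net = compose([double, shift])
print(net([1.0, 2.0]))  # shift(double(x)) = [3.0, 5.0]
```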
Window approach (t = 3, $d_{win}$ = 2)
The window vector stacks the feature vectors of the 5 words in the window, $[w]_{t-d_{win}}^{t+d_{win}}$:
$$(w^1_1, w^1_2, w^1_3, \ldots, w^5_{K-1}, w^5_K)^\top$$
For the window PADDING PADDING My Name is, this is a 275-dimensional vector.
Linear Layer (window approach)
The first layer outputs the concatenated lookup-table columns of the words in the window:
$$f^1_\theta = \langle LT_W([w]_1^T) \rangle_t^{d_{win}} = \left( \langle W \rangle^1_{[w]_{t-d_{win}}}, \ldots, \langle W \rangle^1_{[w]_t}, \ldots, \langle W \rangle^1_{[w]_{t+d_{win}}} \right)^\top$$
The fixed-size vector $f^1_\theta$ can be fed to one or several standard neural network layers which perform affine transformations over their inputs:
$$f^l_\theta = W^l f^{l-1}_\theta + b^l,$$
where $W^l \in \mathbb{R}^{n^l_{hu} \times n^{l-1}_{hu}}$ and $b^l \in \mathbb{R}^{n^l_{hu}}$ are the parameters to be trained. The hyper-parameter $n^l_{hu}$ is usually called the number of hidden units of the l-th layer.
• Linear layers are stacked, interleaved with a non-linearity function, to extract highly non-linear features. Without the non-linearity, the network would be just a linear model.
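A small check of the last point: two stacked affine layers with no non-linearity in between collapse into a single affine layer with $W = W^2 W^1$ and $b = W^2 b^1 + b^2$. The helpers `linear` and `matmul` and the toy weights are hypothetical, written in plain Python to stay self-contained.

```python
from typing import List

Matrix = List[List[float]]  # row-major, shape (n_out, n_in)
Vector = List[float]


def linear(W: Matrix, b: Vector, x: Vector) -> Vector:
    """Affine layer f^l = W^l f^{l-1} + b^l, with W^l of shape (n_hu^l, n_hu^{l-1})."""
    assert len(W) == len(b) and all(len(row) == len(x) for row in W)
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(a * B[k][j] for k, a in enumerate(row)) for j in range(len(B[0]))] for row in A]


# Two stacked linear layers (toy weights), no non-linearity in between ...
W1, b1 = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]], [0.5, 0.0, 1.0]  # 2 inputs -> 3 hidden units
W2, b2 = [[1.0, 0.0, 2.0]], [-1.0]                                # 3 hidden units -> 1 output
x = [1.0, 2.0]
stacked = linear(W2, b2, linear(W1, b1, x))

# ... equal one affine map with W = W2 W1 and b = W2 b1 + b2.
W = matmul(W2, W1)
b = linear(W2, b2, b1)
print(stacked, linear(W, b, x))  # [8.5] [8.5]
```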
HardTanh Layer (window approach)
• Adds a non-linear feature transformation after a linear layer.
• HardTanh is used instead of the hyperbolic tangent because it makes the computation cheaper.
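HardTanh as defined in the paper is $-1$ for $x < -1$, $x$ for $-1 \le x \le 1$, and $1$ for $x > 1$. A minimal element-wise sketch follows; the function names are hypothetical. It only needs two comparisons per unit, whereas `math.tanh` evaluates exponentials.

```python
from typing import List


def hard_tanh(x: float) -> float:
    """HardTanh(x) = -1 if x < -1, x if -1 <= x <= 1, 1 if x > 1."""
    return max(-1.0, min(1.0, x))


def hard_tanh_layer(v: List[float]) -> List[float]:
    # Applied element-wise to the output of the preceding linear layer.
    return [hard_tanh(a) for a in v]


print(hard_tanh_layer([-3.0, -0.25, 0.0, 0.7, 2.5]))  # [-1.0, -0.25, 0.0, 0.7, 1.0]
```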
Window Approach Remark
• The window approach works well for most NLP tasks. However, it fails with Semantic Role Labeling.
Reason: the tag of a word depends on the verb (predicate) chosen beforehand in the sentence. If the verb falls outside the window, one cannot expect this word to be tagged correctly. This calls for the sentence approach.
Convolutional Layer: Sentence Approach
• A generalisation of the window approach: all windows in the sequence are taken into consideration.
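In the paper's notation the convolutional layer applies one shared affine map to every window, $[f^l_\theta]_t = W^l [f^{l-1}_\theta]_t^{d_{win}} + b^l$ for all t. A minimal sketch, assuming the input sequence has already been padded with $d_{win}$ columns at each end; `convolution`, `linear` and the toy data are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(W: Matrix, b: Vector, x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def convolution(seq: List[Vector], W: Matrix, b: Vector, d_win: int) -> List[Vector]:
    """Apply the same affine map to every window of 2 * d_win + 1 consecutive columns.

    seq is assumed to be padded with d_win columns at each end, so the output has
    one column per original word.
    """
    out = []
    for t in range(d_win, len(seq) - d_win):
        # Concatenate the columns t - d_win .. t + d_win into one window vector.
        window = [a for col in seq[t - d_win:t + d_win + 1] for a in col]
        out.append(linear(W, b, window))
    return out


# Toy example: 1-dim features, 3 words zero-padded on both sides, d_win = 1, one hidden unit.
seq = [[0.0], [1.0], [2.0], [3.0], [0.0]]
W, b = [[1.0, 1.0, 1.0]], [0.0]  # sums each window of 3
print(convolution(seq, W, b, d_win=1))  # [[3.0], [6.0], [5.0]]
```

The weights are shared across positions, so the number of parameters does not depend on sentence length.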
Neural Network Architecture
Words → Lookup Table → Convolution → Max Over Time → Linear Layer → HardTanh → Linear Layer
Convolutional NN
Time Delay Neural Network
Max Layer: Sentence Approach
• For each hidden unit, take the max over all time steps t:
$$[f^l_\theta]_i = \max_t \, [f^{l-1}_\theta]_{i,t}, \quad 1 \le i \le n^{l-1}_{hu}$$
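A minimal sketch of the max-over-time layer, assuming the previous layer's output is given as a list of columns (one per time step). The function name is hypothetical. The output size equals the number of hidden units, whatever the sentence length.

```python
from typing import List


def max_over_time(columns: List[List[float]]) -> List[float]:
    """[f^l]_i = max_t [f^{l-1}]_{i,t}: for each hidden unit i, keep its largest value over t."""
    if not columns:
        raise ValueError("need at least one time step")
    # zip(*columns) regroups the values by hidden unit across all time steps.
    return [max(unit) for unit in zip(*columns)]


# Three time steps, two hidden units
print(max_over_time([[0.1, -2.0], [0.7, -1.0], [0.3, -3.0]]))  # [0.7, -1.0]
```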
Tagging Schemes
Training
• Maximise the log-likelihood with respect to $\theta$:
$$\theta \mapsto \sum_{(x,\,y) \in \mathcal{T}} \log p(y \mid x, \theta)$$
where $\mathcal{T}$ is the training set.
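Training: Word-Level Log-Likelihood
• Softmax over all tags:
$$p(i \mid x, \theta) = \frac{e^{[f_\theta]_i}}{\sum_j e^{[f_\theta]_j}}, \qquad \log p(y \mid x, \theta) = [f_\theta]_y - \operatorname{logadd}_j [f_\theta]_j$$
• This is the cross-entropy criterion. It is not ideal, because there are dependencies between the tag of a word in a sentence and its neighbouring tags.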
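The word-level criterion can be sketched with a numerically stable logadd, $\operatorname{logadd}_j z_j = \log \sum_j e^{z_j}$. The function names and the toy scores are hypothetical.

```python
import math
from typing import List


def logadd(scores: List[float]) -> float:
    """log(sum_j exp(z_j)), computed stably by factoring out the maximum."""
    m = max(scores)
    return m + math.log(sum(math.exp(z - m) for z in scores))


def word_log_likelihood(scores: List[float], y: int) -> float:
    """log p(y | x, theta) = [f_theta]_y - logadd_j [f_theta]_j (softmax over all tags)."""
    return scores[y] - logadd(scores)


scores = [2.0, 0.5, -1.0]  # toy network outputs for 3 tags at one word
probs = [math.exp(word_log_likelihood(scores, i)) for i in range(len(scores))]
print(sum(probs))  # 1.0 up to rounding: the softmax is normalised
```

Each word is scored independently here, which is exactly why this criterion ignores dependencies between neighbouring tags.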
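Training: Sentence-Level Log-Likelihood
• $[A]_{k,i}$: transition score to jump from tag k to tag i
• Score of a sentence $[x]_1^T$ along a tag path $[i]_1^T$:
$$s([x]_1^T, [i]_1^T, \tilde\theta) = \sum_{t=1}^{T} \left( [A]_{[i]_{t-1},[i]_t} + [f_\theta]_{[i]_t,t} \right)$$
where $\tilde\theta = \theta \cup \{[A]_{k,i}\ \forall k, i\}$.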
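Training: Sentence-Level Log-Likelihood
• Conditional likelihood, obtained by normalising w.r.t. all possible paths:
$$\log p([y]_1^T \mid [x]_1^T, \tilde\theta) = s([x]_1^T, [y]_1^T, \tilde\theta) - \operatorname{logadd}_{\forall [j]_1^T} s([x]_1^T, [j]_1^T, \tilde\theta)$$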
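A brute-force sketch of the sentence score and its normalisation, feasible only for tiny inputs because it enumerates every tag path. It assumes a separate start-score vector `A0` for the first tag (the t = 1 term). `f[t][i]` is the network score of tag i at word t; all names and numbers are hypothetical toy values.

```python
import itertools
import math
from typing import List, Sequence


def sentence_score(f: List[List[float]], A: List[List[float]], A0: List[float],
                   path: Sequence[int]) -> float:
    """s = sum_t (A[i_{t-1}][i_t] + f[t][i_t]); A0 scores the first tag (an assumption)."""
    s = A0[path[0]] + f[0][path[0]]
    for t in range(1, len(path)):
        s += A[path[t - 1]][path[t]] + f[t][path[t]]
    return s


def logadd(values: List[float]) -> float:
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sentence_log_likelihood(f, A, A0, y: Sequence[int]) -> float:
    """log p(y | x) = s(x, y) - logadd over all tag paths j of s(x, j), by enumeration."""
    all_paths = itertools.product(range(len(A0)), repeat=len(f))
    return sentence_score(f, A, A0, y) - logadd([sentence_score(f, A, A0, p) for p in all_paths])


f = [[1.0, 0.0], [0.0, 2.0], [0.5, 0.5]]  # 3 words, 2 tags
A = [[0.5, -0.5], [-1.0, 1.0]]            # A[k][i]: jump from tag k to tag i
A0 = [0.0, 0.0]
total = sum(math.exp(sentence_log_likelihood(f, A, A0, p))
            for p in itertools.product(range(2), repeat=3))
print(total)  # close to 1.0: the path probabilities are normalised
```

The number of paths grows as (number of tags) to the power T, which is what the forward algorithm avoids.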
Training
• The logadd over all paths is computed with the recursive forward algorithm.
• Inference: Viterbi algorithm (replace logadd by max).
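A sketch of both recursions, assuming the same conventions as the sentence score (`f[t][i]` network scores, `A[k][i]` transitions from tag k to tag i, and an assumed start-score vector `A0`). The forward pass computes $\delta_t(i) = \operatorname{logadd}_k(\delta_{t-1}(k) + [A]_{k,i}) + [f_\theta]_{i,t}$; Viterbi uses max instead and keeps back-pointers. Function names and toy values are hypothetical.

```python
import itertools
import math
from typing import List, Tuple

Scores = List[List[float]]  # f[t][i]: network score of tag i at word t


def logadd(values: List[float]) -> float:
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_logadd(f: Scores, A: List[List[float]], A0: List[float]) -> float:
    """logadd of the sentence score over all tag paths, in O(T * n_tags^2).

    delta[i] holds the logadd of the scores of all partial paths ending in tag i at word t."""
    n = len(A0)
    delta = [A0[i] + f[0][i] for i in range(n)]
    for t in range(1, len(f)):
        delta = [logadd([delta[k] + A[k][i] for k in range(n)]) + f[t][i] for i in range(n)]
    return logadd(delta)


def viterbi(f: Scores, A: List[List[float]], A0: List[float]) -> Tuple[List[int], float]:
    """Same recursion with logadd replaced by max; back-pointers recover the best path."""
    n = len(A0)
    delta = [A0[i] + f[0][i] for i in range(n)]
    back: List[List[int]] = []
    for t in range(1, len(f)):
        prev = [max(range(n), key=lambda k: delta[k] + A[k][i]) for i in range(n)]
        delta = [delta[prev[i]] + A[prev[i]][i] + f[t][i] for i in range(n)]
        back.append(prev)
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for prev in reversed(back):  # walk back from the last word to the first
        path.append(prev[path[-1]])
    path.reverse()
    return path, delta[last]


f = [[1.0, 0.0], [0.0, 2.0], [0.5, 0.5]]  # toy scores: 3 words, 2 tags
A = [[0.5, -0.5], [-1.0, 1.0]]
A0 = [0.0, 0.0]
# Sanity check against brute-force enumeration of all 2**3 tag paths
brute = logadd([A0[p[0]] + f[0][p[0]] + sum(A[p[t - 1]][p[t]] + f[t][p[t]] for t in range(1, len(p)))
                for p in itertools.product(range(len(A0)), repeat=len(f))])
print(abs(forward_logadd(f, A, A0) - brute) < 1e-9)  # True
print(viterbi(f, A, A0))                              # ([1, 1, 1], 4.5)
```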
Pre-processing
• Use lower-case words in the dictionary.
• Add a 'caps' feature to words that have at least one non-initial capital letter.
• Numbers within a word are replaced with the string 'Number'.
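A minimal sketch of the three pre-processing rules. It assumes each run of digits is replaced by the slide's 'Number' string and treats 'caps' as a single binary flag; the caps lookup table described earlier has 5 options, so this is a simplification. The function name `preprocess` is hypothetical.

```python
import re
from typing import Tuple

NUMBER_TOKEN = "Number"  # replacement string quoted on the slide


def preprocess(token: str) -> Tuple[str, bool]:
    """Return (dictionary form, caps flag) for one token."""
    # 'caps' flag: at least one capital letter after the first character
    has_noninitial_cap = any(c.isupper() for c in token[1:])
    # lower-case the word, then replace each run of digits (assumption) with the placeholder
    word = re.sub(r"[0-9]+", NUMBER_TOKEN, token.lower())
    return word, has_noninitial_cap


print(preprocess("iPhone"))  # ('iphone', True)
print(preprocess("Bryan"))   # ('bryan', False)
print(preprocess("abc123"))  # ('abcNumber', False)
```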
Hyper-parameters
Benchmark Result
Sentences with similar words should be
tagged in the same way.
e.g.
The cat sat on the mat.
The feline sat on the mat.
Neighbouring Words
• Word embeddings in the word lookup table of an SRL neural network trained from scratch: the 10 nearest neighbours of each word, using the Euclidean metric.
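A sketch of the nearest-neighbour query, assuming embeddings are held in a plain dict from word to vector. The function name and the 2-dim toy vectors are hypothetical; `math.dist` (Python 3.8+) computes the Euclidean distance.

```python
import math
from typing import Dict, List


def nearest_neighbours(query: str, embeddings: Dict[str, List[float]], k: int = 10) -> List[str]:
    """Return the k words whose vectors are closest to `query` in Euclidean distance."""
    q = embeddings[query]
    others = [w for w in embeddings if w != query]
    return sorted(others, key=lambda w: math.dist(q, embeddings[w]))[:k]


# Hypothetical 2-dim embeddings
emb = {"cat": [1.0, 0.0], "feline": [0.9, 0.1], "dog": [0.7, 0.4], "mat": [-1.0, 0.5]}
print(nearest_neighbours("cat", emb, k=2))  # ['feline', 'dog']
```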
Word Embeddings
• The lookup table can also be trained on unlabelled data by optimising it to learn a language model.
• This gives word features that map semantically similar words to similar vectors.
Sentence Embedding
Document Embedding
Word Embedding
Ranking Language Model
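A sketch of a pairwise ranking criterion, assuming the hinge form used by Collobert et al.: $\sum_{x} \sum_{w \in \mathcal{D}} \max\{0,\ 1 - f_\theta(x) + f_\theta(x^{(w)})\}$, where $x^{(w)}$ is window x with its centre word replaced by w. The scorer below is a hypothetical stand-in for the network.

```python
from typing import Callable, List, Sequence

Window = List[str]


def ranking_loss(score: Callable[[Window], float], windows: Sequence[Window],
                 dictionary: Sequence[str]) -> float:
    """Sum over windows x and words w of max(0, 1 - f(x) + f(x^(w)))."""
    total = 0.0
    for x in windows:
        centre = len(x) // 2
        fx = score(x)
        for w in dictionary:
            corrupted = x[:centre] + [w] + x[centre + 1:]
            # When w is the true centre word, the term is the constant 1, which has no gradient.
            total += max(0.0, 1.0 - fx + score(corrupted))
    return total


# Hypothetical scorer: only looks at the centre word.
toy_scores = {"sat": 3.0, "feline": 2.5}
score = lambda x: toy_scores.get(x[len(x) // 2], 0.0)
print(ranking_loss(score, [["the", "cat", "sat", "on", "the"]], ["sat", "mat", "feline"]))  # 1.5
```

The real window is pushed to score at least a margin of 1 above every corrupted window; here "feline" still violates that margin by 0.5.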
Lots of Unlabelled Data
• Two window-approach networks (window size 11, 100 hidden units) trained on two corpora
• LM1
– Wikipedia: 631M words
– dictionary words ordered by frequency
– dictionary size increased progressively: 5,000, 10,000, 30,000, 50,000, 100,000
– 4 weeks of training
• LM2
– Wikipedia + Reuters: 631M + 221M = 852M words
– initialised with LM1; dictionary size is 130,000
– 30,000 additional most frequent Reuters words
– 3 additional weeks of training
Word Embeddings
Neighbouring words
Benchmark Performance
Multitask Learning
Natural Language Processing (almost) from Scratch
[Figure 5: for each task, Lookup Tables ($LT_{W^1}, \ldots, LT_{W^K}$) → Linear ($M^1$, $n^1_{hu}$ hidden units) → HardTanh → task-specific Linear ($M^2_{(t_1)}$ for Task 1, $M^2_{(t_2)}$ for Task 2), with $n^2_{hu,(t)}$ = #tags]
Figure 5: Example of multitasking with NN. Task 1 and Task 2 are two tasks trained with
the window approach architecture presented in Figure 1. Lookup tables as well as the first
hidden layer are shared. The last layer is task specific. The principle is the same with more
than two tasks.
5.2 Multi-Task Benchmark Results
Table 9 reports results obtained by jointly trained models for the POS, CHUNK, NER and
SRL tasks using the same setup as Section 4.5. We trained jointly POS, CHUNK and NER
using the window approach network. As we mentioned earlier, SRL can be trained only
with the sentence approach network, due to long-range dependencies related to the verb
Joint Training
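A sketch of the parameter sharing in Figure 5: one lookup table and first linear layer shared by all tasks, plus one output layer per task. The class `MultiTaskTagger`, the task names and the toy weights are hypothetical; training (alternating examples from each task) is not shown.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def linear(W: Matrix, b: Vector, x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def hard_tanh(v: Vector) -> Vector:
    return [max(-1.0, min(1.0, a)) for a in v]


class MultiTaskTagger:
    """Window-approach taggers sharing the lookup table and first linear layer (M1)."""

    def __init__(self, lookup: Dict[str, Vector], M1: Matrix, b1: Vector,
                 heads: Dict[str, Tuple[Matrix, Vector]]):
        self.lookup = lookup       # shared across tasks
        self.M1, self.b1 = M1, b1  # shared across tasks
        self.heads = heads         # task name -> (M2, b2), one output row per tag of that task

    def hidden(self, window: List[str]) -> Vector:
        x = [a for word in window for a in self.lookup[word]]
        return hard_tanh(linear(self.M1, self.b1, x))

    def scores(self, task: str, window: List[str]) -> Vector:
        M2, b2 = self.heads[task]  # task-specific last layer
        return linear(M2, b2, self.hidden(window))


tagger = MultiTaskTagger(
    lookup={"my": [1.0], "name": [0.5], "is": [-0.5]},
    M1=[[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]], b1=[0.0, 0.0],
    heads={"POS": ([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, 0.0]),
           "NER": ([[2.0, 0.0], [0.0, 3.0]], [0.0, 1.0])},
)
window = ["my", "name", "is"]
print(tagger.scores("POS", window))  # [1.0, 0.0, 1.0]
print(tagger.scores("NER", window))  # [2.0, 1.0]
```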
Multitask Learning
Temptation
The Temptation
• Suffix features
– use the last two characters of a word as a feature
• Gazetteers
– 8,000 locations, person names, organisations and miscellaneous entries from CoNLL 2003
• POS
– use POS as a feature for CHUNK and NER
• CHUNK
– use CHUNK as a feature for SRL
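A sketch of how the suffix and gazetteer features could be turned into discrete inputs. It assumes lower-casing before taking the suffix and treats gazetteer membership as a single-word binary flag; real gazetteer entries can span several words. All names and toy data are hypothetical.

```python
from typing import Dict, Iterable, Set


def suffix_feature(word: str) -> str:
    """Last two characters of the lower-cased word (shorter words are used whole)."""
    return word.lower()[-2:]


def build_suffix_index(words: Iterable[str]) -> Dict[str, int]:
    """Give each distinct suffix an index so it can feed a lookup table like any discrete feature."""
    index: Dict[str, int] = {}
    for word in words:
        index.setdefault(suffix_feature(word), len(index))
    return index


def in_gazetteer(word: str, gazetteer: Set[str]) -> bool:
    """Binary gazetteer feature: is the lower-cased word listed?"""
    return word.lower() in gazetteer


print(build_suffix_index(["Running", "sang", "cats", "hats"]))  # {'ng': 0, 'ts': 1}
print(in_gazetteer("Paris", {"paris", "london"}))              # True
```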
Ensembles
• 10 neural networks
• Voting ensemble: the outputs of the ten networks are combined by voting on a per-tag basis.
• Joined ensemble: the parameters of a combining layer are trained on the existing training set while the ten networks are kept fixed.
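A sketch of the voting ensemble only (the joined ensemble's trained combining layer is not shown). Each network's predicted tag sequence is combined by majority vote at every word; ties go to the tag counted first, which follows from `Counter.most_common` ordering. The function name and tag strings are hypothetical, and three networks are used instead of ten to keep the example short.

```python
from collections import Counter
from typing import List, Sequence


def vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Per-tag majority vote: predictions[n][t] is network n's tag for word t."""
    n_words = len(predictions[0])
    if any(len(p) != n_words for p in predictions):
        raise ValueError("all networks must tag the same number of words")
    return [Counter(p[t] for p in predictions).most_common(1)[0][0] for t in range(n_words)]


preds = [["B-PER", "O", "O"],
         ["B-PER", "B-LOC", "O"],
         ["O", "B-LOC", "O"]]
print(vote(preds))  # ['B-PER', 'B-LOC', 'O']
```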
Conclusion
• Achievements
– "All-purpose" neural network architecture for NLP tagging
– Limits task-specific engineering
– Relies on very large unlabelled datasets
– We do not plan to stop here
• Criticisms
– Why forget NLP expertise in favour of neural network training skills?
• NLP goals are not limited to existing NLP tasks
• Excessive task-specific engineering is not desirable
– Why neural networks?
• Scale on massive datasets
• Discover hidden representations
• Most of the neural network technology already existed in 1997 (Bottou, 1997)
Thank you!

Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAbdelrhman abooda
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 

Recently uploaded (20)

Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 

NLP from scratch

  • 1. Bryan Hang Zhang: Natural Language Processing (Almost) from Scratch
  • 2. Presents a deep neural network architecture for NLP tasks • Presents results comparable to the state of the art on four NLP tasks: part-of-speech tagging, chunking, named entity recognition, and semantic role labeling • Presents word embeddings learned from a large unlabelled corpus and shows that using these features improves results • Presents results of joint training on the above tasks
  • 3. Motivation • Propose a unified neural network architecture and learning algorithm that can be applied to various NLP tasks • Instead of hand-crafting features, acquire task-specific features (internal representations) from large amounts of labelled and unlabelled training data
  • 4. Task Introduction • Part-of-speech tagging: automatically assign a part-of-speech tag to each word in a text sequence • Chunking: also called shallow parsing; it identifies parts of speech and short phrases (such as noun phrases) • Named entity recognition: classify elements of the text into predefined categories such as person and location
  • 5. Semantic Role Labeling • SRL, sometimes also called shallow semantic parsing, consists of detecting the semantic arguments associated with the predicate (verb) of a sentence and classifying them into their specific roles • Example 1: "Mark sold the car to Mary." Mark = agent, sold = predicate, the car = theme, Mary = recipient • Example 2: [figure]
  • 9. The Networks • Traditional approach: hand-designed features • New approach: multi-layer neural networks
  • 10. Bullet • Transforming Words into Feature Vectors • Extracting Higher Level Features from Word Feature Vectors • Training • Benchmark Results
  • 11. Bullet • Transforming Words into Feature Vectors • Extracting Higher Level Features from Word Feature Vectors • Training • Benchmark Results
  • 13. Two Approaches Overview • Window approach network • Sentence approach network
  • 14. Lookup Tables • Each of the K discrete features has its own matrix, which serves as a lookup table
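A minimal sketch of the lookup-table idea. It assumes that a feature's index selects one row of that feature's matrix and that the K selected rows are concatenated into the word's feature vector. The table sizes and values below are hypothetical toy numbers, not the paper's.

```python
from typing import List

Matrix = List[List[float]]  # one row per possible value of a discrete feature


def lookup(table: Matrix, index: int) -> List[float]:
    """Return the row of `table` selected by a discrete feature index."""
    return list(table[index])


def word_features(tables: List[Matrix], indices: List[int]) -> List[float]:
    """Concatenate the rows chosen by each of the K discrete features of one word."""
    if len(tables) != len(indices):
        raise ValueError("need exactly one index per lookup table")
    features: List[float] = []
    for table, index in zip(tables, indices):
        features.extend(lookup(table, index))
    return features


# Toy tables (hypothetical values): K = 2 features.
word_table = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # 3 words x 2 dimensions
caps_table = [[-1.0], [1.0]]                        # 2 caps options x 1 dimension

print(word_features([word_table, caps_table], [2, 1]))  # [0.5, 0.6, 1.0]
```

During training the selected rows would be updated by backpropagation like any other parameter. This sketch only shows the forward lookup.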
  • 15. Words to Features: Window Approach • Window size: for example, 5 • Raw text features: lowercased word; capitalisation feature
  • 16. Window Approach • Sentence: My Name is Bryan • Padded: PADDING PADDING My Name is Bryan PADDING PADDING • Windows of size 5, one centred on each word: [PADDING PADDING My Name is], [PADDING My Name is Bryan], [My Name is Bryan PADDING], [Name is Bryan PADDING PADDING]
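The padded windows on this slide can be generated mechanically. This is a minimal sketch that assumes an odd window size and a literal "PADDING" token at both sentence edges, as on the slide.

```python
from typing import List

PAD = "PADDING"


def windows(tokens: List[str], size: int = 5) -> List[List[str]]:
    """Return one window per token, centred on that token, padded with PAD at both ends."""
    if size % 2 == 0:
        raise ValueError("window size must be odd so each window has a centre word")
    half = size // 2
    padded = [PAD] * half + tokens + [PAD] * half
    # Window i covers padded[i : i + size]; its centre word is tokens[i].
    return [padded[i:i + size] for i in range(len(tokens))]


for w in windows(["My", "Name", "is", "Bryan"]):
    print(" ".join(w))
# PADDING PADDING My Name is
# PADDING My Name is Bryan
# My Name is Bryan PADDING
# Name is Bryan PADDING PADDING
```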
  • 17. Words to Features • Each word (e.g. "My") is mapped to a word index and a caps index • Word lookup table: vocabulary size 130,000, 50 dimensions • Caps lookup table: 5 options, 5 dimensions
  • 18. Words to Features • The window PADDING PADDING My Name is becomes a single 275-dimensional vector
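The 275 follows from the dimensions on the previous slide. This assumes the window vector is a plain concatenation, for each of the 5 words, of its 50-dimensional word embedding and its 5-dimensional caps embedding:

```latex
\underbrace{5}_{\text{words per window}} \times \bigl(\underbrace{50}_{\text{word dims}} + \underbrace{5}_{\text{caps dims}}\bigr) = 5 \times 55 = 275
```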
  • 19. Bullet • Transforming Words into Feature Vectors • Extracting Higher Level Features from Word Feature Vectors • Training • Benchmark Results
  • 20. Extracting Higher Level Features • Word feature vectors are fed into an L-layer neural network • Any feed-forward neural network with L layers can be seen as a composition of functions $f_\theta^l$, one per layer $l$: $f_\theta(\cdot) = f_\theta^L(f_\theta^{L-1}(\ldots f_\theta^1(\cdot) \ldots))$, where $\theta$ denotes the parameters
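The composition $f_\theta^L(\ldots f_\theta^1(\cdot)\ldots)$ can be expressed directly as a fold over a list of layer functions. In this sketch the two toy "layers" are hypothetical stand-ins, not trained network layers.

```python
from functools import reduce
from typing import Callable, List, Sequence

Layer = Callable[[List[float]], List[float]]


def compose(layers: Sequence[Layer]) -> Layer:
    """Return x -> f^L(...f^2(f^1(x))...); layers[0] plays the role of f^1 and is applied first."""
    return lambda x: reduce(lambda acc, layer: layer(acc), layers, x)


# Two toy layers standing in for f^1 and f^2.
def double(v: List[float]) -> List[float]:
    return [2 * a for a in v]


def shift(v: List[float]) -> List[float]:
    return [a + 1 for a in v]


net = compose([double, shift])
print(net([1.0, -0.5]))  # [3.0, 0.0]
```

The order matters: `compose([shift, double])` computes `double(shift(x))`, which is a different function.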
  • 21. Window Approach • For the word at position $t = 3$, with a half-window of 2 words on each side (window size 5), the window vector stacks the K features of each of the five words: $(w_1^1, w_1^2, \ldots, w_1^K, \ldots, w_5^{K-1}, w_5^K)$ • Example: PADDING PADDING My Name is → a 275-dimensional window vector
  • 22. Linear Layer (Window Approach) • The fixed-size window vector $f_\theta^1$ is fed to one or several standard layers that apply affine transformations to their inputs: $f_\theta^l = W^l f_\theta^{l-1} + b^l$, where $W^l \in \mathbb{R}^{n_{hu}^l \times n_{hu}^{l-1}}$ and $b^l \in \mathbb{R}^{n_{hu}^l}$ are the parameters to be trained, and $n_{hu}^l$ is usually called the number of hidden units of the $l$-th layer • Several linear layers are often stacked, interleaved with a non-linearity, to extract highly non-linear features; without the non-linearity, the network would be just a linear model
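A small sketch of the affine layer $f_\theta^l = W^l f_\theta^{l-1} + b^l$. It also checks the slide's last point numerically: two stacked linear layers without a non-linearity equal a single linear layer with $W = W^2 W^1$ and $b = W^2 b^1 + b^2$. The parameter values are hypothetical toy numbers.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def linear(W: Matrix, b: Vector, x: Vector) -> Vector:
    """Affine layer f^l = W^l f^{l-1} + b^l; W has n_hu^l rows and n_hu^{l-1} columns."""
    if len(W) != len(b) or any(len(row) != len(x) for row in W):
        raise ValueError("shape mismatch between W, b and x")
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]


# Toy parameters (hypothetical) for two stacked linear layers with no non-linearity.
W1, b1 = [[1.0, 2.0], [0.0, 1.0]], [0.5, -0.5]
W2, b2 = [[1.0, -1.0]], [0.0]
x = [1.0, 1.0]

stacked = linear(W2, b2, linear(W1, b1, x))
# The same map as ONE linear layer: W = W2 W1 = [[1, 1]], b = W2 b1 + b2 = [1].
collapsed = linear([[1.0, 1.0]], [1.0], x)
print(stacked, collapsed)  # [3.0] [3.0]
```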
  • 23. HardTanh Layer (Window Approach) • Adds a non-linearity to the features • HardTanh is used instead of the hyperbolic tangent because it is cheaper to compute
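A sketch of the HardTanh non-linearity, using its usual piecewise-linear definition (clip to $[-1, 1]$). It needs only comparisons, whereas $\tanh$ needs exponentials. The loop prints both functions side by side for a few inputs.

```python
import math


def hard_tanh(x: float) -> float:
    """HardTanh(x) = -1 if x < -1, x if -1 <= x <= 1, 1 if x > 1."""
    if x < -1.0:
        return -1.0
    if x > 1.0:
        return 1.0
    return x


# Compare with the smooth hyperbolic tangent on a few inputs.
for v in (-3.0, -0.5, 0.0, 0.5, 3.0):
    print(v, hard_tanh(v), round(math.tanh(v), 3))
```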
  • 24. Window Approach Remark • The window approach works well for most NLP tasks; however, it fails on semantic role labelling • Reason: the tag of a word depends on a verb (predicate) chosen beforehand in the sentence. If the verb falls outside the window, the word cannot be expected to be tagged correctly, which calls for the sentence approach
  • 25. Convolutional Layer: Sentence Approach • A generalisation of the window approach: all the windows in the sentence are taken into consideration
  • 28. Time Delay Neural Network
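The convolution over a sentence can be sketched as the same affine map applied to the window around every position, which is the time-delay idea. This sketch assumes odd-sized windows, zero-style padding vectors at the edges, and toy one-dimensional word features. It shows the technique, not the paper's exact layer.

```python
from typing import List

Vector = List[float]


def conv_layer(words: List[Vector], W: List[Vector], b: Vector,
               k: int, pad: Vector) -> List[Vector]:
    """Apply one shared affine map (W, b) to the window of k word vectors around every position.

    words: one feature vector per word of the sentence (length T).
    pad:   feature vector used for positions beyond the sentence edges.
    Returns T output vectors, one per position t.
    """
    if k % 2 == 0:
        raise ValueError("k must be odd so every window has a centre word")
    half = k // 2
    padded = [pad] * half + words + [pad] * half
    outputs = []
    for t in range(len(words)):
        window = [v for vec in padded[t:t + k] for v in vec]  # concatenate k vectors
        outputs.append([sum(w * v for w, v in zip(row, window)) + bias
                        for row, bias in zip(W, b)])
    return outputs


# Toy example: 1-dimensional word features, one hidden unit that sums a window of 3.
print(conv_layer([[1.0], [2.0], [3.0]], W=[[1.0, 1.0, 1.0]], b=[0.0], k=3, pad=[0.0]))
# [[3.0], [6.0], [5.0]]
```

Because W and b are shared across positions, the number of parameters does not depend on sentence length. Only the number of outputs does.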
  • 29. Max Layer: Sentence Approach • For each hidden unit, take the maximum over all positions $t$ of the sentence: $[f_\theta^l]_i = \max_t [f_\theta^{l-1}]_{i,t}$
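A sketch of the max layer. It turns a variable-length sequence of hidden vectors into one fixed-size vector by taking, for each hidden unit $i$, the maximum over positions $t$. The input values are hypothetical.

```python
from typing import List

Vector = List[float]


def max_over_time(hidden: List[Vector]) -> Vector:
    """hidden[t][i] is hidden unit i at position t; return max_t hidden[t][i] for every unit i."""
    if not hidden:
        raise ValueError("the sentence must contain at least one position")
    return [max(unit_over_time) for unit_over_time in zip(*hidden)]


# Three positions, two hidden units: the output has one value per unit, whatever T is.
print(max_over_time([[3.0, -1.0], [6.0, -4.0], [5.0, 0.5]]))  # [6.0, 0.5]
```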
  • 31. Bullet • Transforming Words into Feature Vectors • Extracting Higher Level Features from Word Feature Vectors • Training • Benchmark Results
  • 32. Training • Maximise the log-likelihood with respect to $\theta$: $\theta \mapsto \sum_{(x, y) \in \mathcal{T}} \log p(y \mid x, \theta)$, where $\mathcal{T}$ is the training set
  • 33. Training: Word-Level Log-Likelihood • A softmax over all tags gives $\log p(y \mid x, \theta) = [f_\theta]_y - \operatorname{logadd}_j [f_\theta]_j$, i.e. the cross-entropy criterion • This is not ideal, because the tag of a word in a sentence also depends on the tags of its neighbouring words
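A sketch of the word-level criterion: $\log p(y \mid x, \theta) = [f_\theta]_y - \operatorname{logadd}_j [f_\theta]_j$, where $\operatorname{logadd}_j z_j = \log \sum_j e^{z_j}$ is computed in the numerically stable way. The three scores are hypothetical network outputs for one word.

```python
import math
from typing import List


def logadd(scores: List[float]) -> float:
    """Numerically stable log(sum_j exp(scores[j]))."""
    m = max(scores)
    return m + math.log(sum(math.exp(s - m) for s in scores))


def word_log_likelihood(scores: List[float], gold: int) -> float:
    """log p(gold | x, theta) = [f]_gold - logadd_j [f]_j, i.e. a log-softmax over tags."""
    return scores[gold] - logadd(scores)


scores = [2.0, 1.0, 0.1]  # hypothetical network outputs for three tags
probs = [math.exp(word_log_likelihood(scores, i)) for i in range(len(scores))]
print(probs, sum(probs))  # the probabilities sum to 1 (up to rounding)
# Training minimises the cross-entropy, i.e. -word_log_likelihood(scores, gold).
```

Each word is scored independently here, which is the limitation the slide points out.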
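  • 34. Training: Sentence-Level Log-Likelihood • $[A]_{k,l}$: transition score for jumping from tag $k$ to tag $l$ • Sentence score for a tag path $[i]_1^T$: $s([x]_1^T, [i]_1^T, \tilde{\theta}) = \sum_{t=1}^{T} \left( [A]_{[i]_{t-1}, [i]_t} + [f_\theta]_{[i]_t, t} \right)$, where $\tilde{\theta} = \theta \cup \{[A]_{k,l}\}$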
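  • 35. Training: Sentence-Level Log-Likelihood • The conditional likelihood is obtained by normalising with respect to all possible tag paths: $\log p([y]_1^T \mid [x]_1^T, \tilde{\theta}) = s([x]_1^T, [y]_1^T, \tilde{\theta}) - \operatorname{logadd}_{\forall [j]_1^T} s([x]_1^T, [j]_1^T, \tilde{\theta})$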
  • 36. Training • Training: the normalisation term is computed with the recursive forward algorithm • Inference: Viterbi algorithm (replace logadd by max)
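A sketch of the forward and Viterbi recursions for the sentence score $s = \sum_t ([A]_{[i]_{t-1},[i]_t} + [f_\theta]_{[i]_t,t})$. Assumptions: `f[t][i]` holds the network score of tag `i` at word `t`. A hypothetical `start` vector plays the role of the transition from the initial state into the first tag. All numbers are toy values. The brute-force check over all $N^T$ paths confirms both recursions on this tiny example.

```python
import itertools
import math
from typing import List, Tuple

Matrix = List[List[float]]


def logadd(xs: List[float]) -> float:
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def path_score(f: Matrix, A: Matrix, start: List[float], path: List[int]) -> float:
    """start[i_1] + f[0][i_1] + sum over t >= 1 of (A[i_{t-1}][i_t] + f[t][i_t])."""
    s = start[path[0]] + f[0][path[0]]
    for t in range(1, len(path)):
        s += A[path[t - 1]][path[t]] + f[t][path[t]]
    return s


def forward(f: Matrix, A: Matrix, start: List[float]) -> float:
    """logadd of path_score over all N^T tag paths, in O(T * N^2) time."""
    n = len(start)
    delta = [start[i] + f[0][i] for i in range(n)]  # logadd over paths ending in tag i
    for t in range(1, len(f)):
        delta = [logadd([delta[k] + A[k][i] for k in range(n)]) + f[t][i] for i in range(n)]
    return logadd(delta)


def viterbi(f: Matrix, A: Matrix, start: List[float]) -> Tuple[float, List[int]]:
    """Same recursion with logadd replaced by max, plus back-pointers to recover the best path."""
    n = len(start)
    delta = [start[i] + f[0][i] for i in range(n)]
    backpointers = []
    for t in range(1, len(f)):
        best_prev = [max(range(n), key=lambda k: delta[k] + A[k][i]) for i in range(n)]
        delta = [delta[best_prev[i]] + A[best_prev[i]][i] + f[t][i] for i in range(n)]
        backpointers.append(best_prev)
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for best_prev in reversed(backpointers):
        path.append(best_prev[path[-1]])
    return delta[last], path[::-1]


# Toy scores (hypothetical): 3 words, 2 tags.
f = [[1.0, 0.0], [0.0, 2.0], [0.5, 0.5]]
A = [[0.5, -1.0], [0.0, 0.3]]
start = [0.0, 0.0]
gold = [0, 1, 1]

log_p = path_score(f, A, start, gold) - forward(f, A, start)  # sentence-level log-likelihood
print(log_p, viterbi(f, A, start))

# Brute-force check over every path (only feasible for tiny examples).
all_scores = [path_score(f, A, start, list(p)) for p in itertools.product(range(2), repeat=3)]
assert abs(forward(f, A, start) - logadd(all_scores)) < 1e-9
assert abs(viterbi(f, A, start)[0] - max(all_scores)) < 1e-9
```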
  • 37. Bullet • Transforming Words into Feature Vectors • Extracting Higher Level Features from Word Feature Vectors • Training • Benchmark Results
  • 38. Pre-processing • Use lowercase words in the dictionary • Add a 'caps' feature to words that have at least one non-initial capital letter • Numbers within a word are replaced with the string 'Number'
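A sketch of the three pre-processing rules, applied literally. Assumptions, not taken from the slides: the caps flag is a single boolean, and each maximal run of digits becomes one "Number". `preprocess` is a hypothetical helper name.

```python
import re
from typing import Tuple


def preprocess(word: str) -> Tuple[str, bool]:
    """Return (dictionary key, caps flag) for one token (hypothetical helper).

    - the caps flag is True when any character after the first is upper-case;
    - the dictionary key is lowercased, with each run of digits replaced by "Number".
    """
    caps = any(c.isupper() for c in word[1:])
    key = re.sub(r"\d+", "Number", word.lower())
    return key, caps


for token in ("Bryan", "NLP", "iPhone7", "2016"):
    print(token, preprocess(token))
```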
  • 40. Benchmark Result • Sentences with similar words should be tagged in the same way, e.g. "The cat sat on the mat." and "The feline sat on the mat."
  • 41. Neighbouring Words • Word embeddings in the word lookup table of an SRL neural network trained from scratch: the 10 nearest neighbours using the Euclidean metric
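A sketch of the nearest-neighbour query behind this slide: rank every other word by the Euclidean distance between its embedding and the query word's embedding. The 2-dimensional embeddings are hypothetical toy values; real lookup-table rows are much larger (50 dimensions on slide 17).

```python
import math
from typing import Dict, List


def nearest_neighbours(word: str, embeddings: Dict[str, List[float]], k: int = 10) -> List[str]:
    """Return the k words whose embeddings are closest to `word`'s under the Euclidean metric."""
    target = embeddings[word]
    candidates = [w for w in embeddings if w != word]
    return sorted(candidates, key=lambda w: math.dist(target, embeddings[w]))[:k]


# Hypothetical 2-dimensional embeddings, only to illustrate the query.
toy = {"cat": [0.0, 0.0], "feline": [0.1, 0.0], "dog": [0.5, 0.2], "mat": [3.0, 3.0]}
print(nearest_neighbours("cat", toy, k=2))  # ['feline', 'dog']
```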
  • 42. Word Embeddings • The lookup table can also be trained on unlabelled data by optimising it to learn a language model • This gives word features that map semantically similar words to similar vectors
  • 44. Ranking Language Model
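The slide gives only the title. This sketch assumes a pairwise ranking criterion: a window $x$ taken from real text should score at least 1 higher than the same window $x^{(w)}$ whose centre word is replaced by another word $w$, giving the hinge loss $\sum_w \max(0, 1 - f(x) + f(x^{(w)}))$. `toy_score` is a hypothetical stand-in for the window network. In practice one would likely sample corrupting words rather than loop over the whole dictionary.

```python
from typing import Callable, List, Sequence

Scorer = Callable[[List[str]], float]


def ranking_loss(window: List[str], vocab: Sequence[str], score: Scorer) -> float:
    """Sum over words w of max(0, 1 - score(x) + score(x^(w))), where x^(w) is the
    window x with its centre word replaced by w. The true centre word is skipped,
    since it would only add the constant 1."""
    centre = len(window) // 2
    true_score = score(window)
    loss = 0.0
    for w in vocab:
        if w == window[centre]:
            continue
        corrupted = window[:centre] + [w] + window[centre + 1:]
        loss += max(0.0, 1.0 - true_score + score(corrupted))
    return loss


# Hypothetical scorer standing in for the window network: it simply likes "sat" in the centre.
def toy_score(x: List[str]) -> float:
    return 2.0 if x[len(x) // 2] == "sat" else 0.0


vocab = ["sat", "mat", "ran"]
print(ranking_loss(["The", "cat", "sat", "on", "the"], vocab, toy_score))  # 0.0
print(ranking_loss(["The", "cat", "mat", "on", "the"], vocab, toy_score))  # 4.0
```

No labels are needed: the "positive" windows come from raw text and the "negative" ones are made by corruption. This is what lets the lookup table be trained on unlabelled corpora.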
  • 45. Tremendous Unlabelled Data • Two window-approach networks (window size 11, 100 hidden units) trained on two corpora • LM1: Wikipedia, 631M words; dictionary words ordered by frequency; dictionary size increased in steps: 5,000, 10,000, 30,000, 50,000, 100,000; 4 weeks of training • LM2: Wikipedia + Reuters = 631M + 221M = 852M words; initialised with LM1; dictionary size 130,000 (30,000 additional most frequent Reuters words); 3 additional weeks of training
  • 49. Joint Training • [Figure 5: two stacks, each with lookup tables $LT_{W^1}, \ldots, LT_{W^K}$ → Linear $M^1$ ($n_{hu}^1$ units) → HardTanh → a task-specific Linear layer ($M^{2,(t_1)}$ for Task 1, $M^{2,(t_2)}$ for Task 2), each with #tags output units] • Figure 5: Example of multitasking with NN. Task 1 and Task 2 are two tasks trained with the window approach architecture presented in Figure 1. Lookup tables as well as the first hidden layer are shared. The last layer is task specific. The principle is the same with more than two tasks. • 5.2 Multi-Task Benchmark Results: Table 9 reports results obtained by jointly trained models for the POS, CHUNK, NER and SRL tasks using the same setup as Section 4.5. We trained jointly POS, CHUNK and NER using the window approach network. As we mentioned earlier, SRL can be trained only with the sentence approach network, due to long-range dependencies related to the verb
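A sketch of the parameter sharing in Figure 5. One shared first layer (Linear $M^1$ followed by HardTanh) feeds one task-specific output layer per task, each with one output per tag. Assumptions: the input is the window vector already produced by the shared lookup tables. The class name and all weights are hypothetical toy values.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[Vector]
Layer = Tuple[Matrix, Vector]  # (W, b)


def affine(layer: Layer, x: Vector) -> Vector:
    W, b = layer
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]


def hard_tanh(x: Vector) -> Vector:
    return [max(-1.0, min(1.0, v)) for v in x]


class MultiTaskWindowNet:
    """Shared first hidden layer (M^1 + HardTanh) with one task-specific output layer per task.

    The input is assumed to be the window vector already produced by the shared lookup tables.
    """

    def __init__(self, shared: Layer, heads: Dict[str, Layer]):
        self.shared = shared  # parameters updated by every task
        self.heads = heads    # task name -> (W, b) with one output row per tag

    def forward(self, task: str, window_vector: Vector) -> Vector:
        hidden = hard_tanh(affine(self.shared, window_vector))
        return affine(self.heads[task], hidden)


net = MultiTaskWindowNet(
    shared=([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
    heads={
        "POS": ([[1.0, 1.0], [1.0, -1.0], [0.0, 1.0]], [0.0, 0.0, 0.0]),  # 3 toy tags
        "CHUNK": ([[2.0, 0.0], [0.0, 2.0]], [0.5, 0.0]),                  # 2 toy tags
    },
)
print(net.forward("POS", [0.5, 3.0]))    # [1.5, -0.5, 1.0]
print(net.forward("CHUNK", [0.5, 3.0]))  # [1.5, 2.0]
```

Training examples from any task update `shared`, while only that task's entry in `heads` changes. This is how the tasks can help one another.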
  • 52. The Temptation • Suffix features: use the last two characters as a feature • Gazetteers: 8,000 locations, person names, organizations and miscellaneous entries from CoNLL 2003 • POS: use POS as a feature for CHUNK and NER • CHUNK: use CHUNK as a feature for SRL
  • 54. Ensembles • Ten neural networks are combined in two ways • Voting ensemble: vote over the ten networks' outputs on a per-tag basis • Joined ensemble: the parameters of a combining layer are trained on the existing training set while the networks are kept fixed
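A sketch of the voting ensemble: for every word, take the tag predicted by the most networks. Ties are broken by the ordering of `Counter.most_common`. The three predicted tag sequences are hypothetical. The joined ensemble would instead train a combining layer on top of the fixed networks, which this sketch does not cover.

```python
from collections import Counter
from typing import List


def vote(predictions: List[List[str]]) -> List[str]:
    """predictions[m][t] is the tag network m assigns to word t; return the majority tag per word."""
    n_words = len(predictions[0])
    return [Counter(p[t] for p in predictions).most_common(1)[0][0] for t in range(n_words)]


# Three hypothetical networks tagging a two-word sentence.
print(vote([["B-NP", "I-NP"], ["B-NP", "O"], ["O", "O"]]))  # ['B-NP', 'O']
```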
  • 55. Conclusion • Achievements – "All-purpose" neural network architecture for NLP tagging – Limit task-specific engineering – Rely on very large unlabelled datasets – We do not plan to stop here • Critics – Why forget NLP expertise in favour of neural network training skills? • NLP goals are not limited to existing NLP tasks • Excessive task-specific engineering is not desirable – Why neural networks? • Scale on massive datasets • Discover hidden representations • Most of the neural network technology existed in 1997 (Bottou, 1997)