A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks
2017/12/13
M2 Masahiro Kaneko
Introduction
• The potential for leveraging multiple levels of representation
has been demonstrated in various ways in the field of Natural
Language Processing
• For example, Part-Of-Speech (POS) tags are used for
syntactic parsers
• These systems are often pipelines and not trained end-to-
end
Problems
• Deep NLP models have yet to show benefits from predicting many
increasingly complex tasks, each at a successively deeper layer
• Existing models often ignore linguistic hierarchies by
predicting different tasks either entirely separately or at the
same depth
The Joint Many-Task Model (JMT)
• Unlike traditional pipeline systems, our
single JMT model can be trained end-
to-end for POS tagging, chunking,
dependency parsing, semantic
relatedness, and textual entailment, by
considering linguistic hierarchies.
• We propose an adaptive training and
regularization strategy to grow this
model in its depth.
Word Representations
• Word embeddings
• We use Skip-gram to train word embeddings.
• Character embeddings
• Character n-gram embeddings are trained by the same Skip-gram
objective.
• Each word is represented as the concatenation of its corresponding
word and character n-gram embeddings; this representation is shared
across the tasks (a sketch follows).
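A minimal PyTorch sketch (not the authors' released code) of the shared input representation; all class and argument names are hypothetical:

import torch
import torch.nn as nn

class SharedWordRepresentation(nn.Module):
    # Concatenates a word embedding with the average of the word's
    # character n-gram embeddings (hypothetical re-implementation).
    def __init__(self, vocab_size, ngram_vocab_size, dim):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, dim)
        self.ngram_emb = nn.Embedding(ngram_vocab_size, dim)

    def forward(self, word_id, ngram_ids):
        w = self.word_emb(word_id)                 # (dim,)
        c = self.ngram_emb(ngram_ids).mean(dim=0)  # average over the word's n-grams
        return torch.cat([w, c], dim=-1)           # shared input for all tasks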
Word-Level Task: POS Tagging
• For predicting the POS tag of w_t, we use the concatenation of the
forward and backward states of a one-layer bi-LSTM corresponding to
the t-th word.
• Each hidden state h_t is then fed into a standard softmax classifier
with a single ReLU hidden layer, which outputs the probability vector
y^(1) over the POS tags (sketched below).
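A rough PyTorch sketch of this tagger, assuming the shared input representation above; layer sizes and names are illustrative, not the paper's exact configuration:

import torch
import torch.nn as nn

class POSTagger(nn.Module):
    # One bi-LSTM layer, then a softmax classifier with a single ReLU hidden layer.
    def __init__(self, input_dim, hidden_dim, num_tags):
        super().__init__()
        self.bilstm = nn.LSTM(input_dim, hidden_dim,
                              bidirectional=True, batch_first=True)
        self.hidden = nn.Linear(2 * hidden_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, num_tags)

    def forward(self, x):                 # x: (batch, seq_len, input_dim)
        h, _ = self.bilstm(x)             # concatenated forward/backward states h_t
        y = self.out(torch.relu(self.hidden(h)))
        return torch.softmax(y, dim=-1)   # y^(1): probability vector over POS tags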
Word-Level Task: Chunking
• Chunking is also a word-level classification task which
assigns a chunking tag (B-NP, I-VP, etc.) for each word.
• Chunking is performed in the second bi-LSTM layer on top of
the POS layer.
• We define the weighted label embedding y^(pos) as the
probability-weighted sum of the POS label embeddings (reconstructed below).
• For the chunking classifier we employ the same strategy as in POS tagging.
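The equation itself was an image on the slide; a plausible reconstruction of Eq. (2), assuming C POS classes with label embeddings \ell(j):

y_t^{(pos)} = \sum_{j=1}^{C} p\bigl(y_t^{(1)} = j \mid h_t^{(1)}\bigr)\, \ell(j)

That is, the label embedding fed to the chunking layer is a sum of POS label embeddings weighted by the tagger's predicted probabilities, so no hard (and possibly wrong) POS decision has to be committed to.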
Syntactic Task: Dependency Parsing
• Dependency parsing identifies syntactic relations (such as an
adjective modifying a noun) between word pairs in a
sentence.
• We use the third bi-LSTM layer to classify relations between
all pairs of words.
• The weighted chunking label embedding y^(chk) is computed in a
similar fashion as the POS label embedding in Eq. (2).
• We predict the parent node (head) for each word. Then a
dependency label is predicted for each child-parent pair.
• We greedily select the parent node and the dependency label for
each word (a toy sketch follows after this list).
• When the parsing result is not a well-formed tree, we apply the
first-order Eisner’s algorithm to obtain a well-formed tree from it.
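As a toy illustration only (not the paper's decoder), greedy head selection over a pairwise matching-score matrix can look like this; the Eisner fallback is noted but not implemented here:

import numpy as np

def greedy_heads(scores):
    # scores: (n, n + 1) array; scores[t, j] is the score that word j - 1
    # (or the virtual root, j = 0) is the head of word t.
    n = scores.shape[0]
    heads = np.empty(n, dtype=int)
    for t in range(n):
        cand = scores[t].copy()
        cand[t + 1] = -np.inf        # a word cannot be its own head
        heads[t] = int(np.argmax(cand))
    # If `heads` does not form a well-formed tree (e.g. it contains a
    # cycle), fall back to first-order Eisner decoding over the same scores.
    return heads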
Semantic Task
• The first task measures the semantic relatedness between
two sentences.
• The output is a real-valued relatedness score for the input
sentence pair.
• The second task is textual entailment, which requires one to
determine whether a premise sentence entails a hypothesis
sentence.
• There are typically three classes: entailment, contradiction,
and neutral.
Semantic Task: Semantic relatedness
• We compute the sentence-level representation h_s^(4) as the
element-wise maximum values across all of the word-level
representations in the fourth layer.
• Then the feature vector d1 (reconstructed below) is fed into a softmax
classifier with a single Maxout hidden layer to output a relatedness
score (from 1 to 5 in our case).
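The definitions of the sentence representation and of d1 were images on the slide; a plausible reconstruction based on the paper's Eq. (5):

h_s^{(4)} = \max_{1 \le t \le |s|} h_t^{(4)} \quad (\text{element-wise max over the words of } s)

d_1(s, s') = \Bigl[\, \bigl| h_s^{(4)} - h_{s'}^{(4)} \bigr| \,;\; h_s^{(4)} \odot h_{s'}^{(4)} \,\Bigr]

The absolute difference and the element-wise product are symmetric in the two sentences, which fits relatedness as an order-independent score.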
Semantic Task: Textual entailment
• For entailment classification, we also use the max-pooling
technique as in the semantic relatedness task.
• To classify the premise-hypothesis pair (s, s') into one of the
three classes, we compute the feature vector d2(s, s') as in Eq. (5),
except that we do not use the absolute values of the element-wise
subtraction, because we need to identify which sentence is the premise
(and which the hypothesis); see the reconstruction below.
• Then d2(s, s’) is fed into a softmax classifier.
Pre-Training Word Representations
• We pre-train word embeddings using the Skip-gram model
with negative sampling.
• We also pre-train the character n-gram embeddings using
Skip-gram.
• The only difference is that each input word embedding is
replaced with its corresponding average character n-gram
embedding.
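For illustration only (the slides do not name a toolkit), Skip-gram with negative sampling can be pre-trained with gensim; the corpus and the n-gram range below are placeholders:

from gensim.models import Word2Vec

# sentences: an iterable of token lists from the unlabeled corpus (toy data here).
sentences = [["the", "cat", "sat"], ["dogs", "bark", "loudly"]]

# sg=1 selects Skip-gram; negative=10 enables negative sampling.
model = Word2Vec(sentences, vector_size=100, sg=1, negative=10, min_count=1)

def char_ngrams(word, n_min=1, n_max=4):
    # All character n-grams of a word, including boundary markers; these
    # replace the word itself when pre-training the n-gram embeddings.
    padded = "#" + word + "#"
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]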
Training of each layer
• POS tagging (POS)
• Chunking (Chk)
• Dependency parsing (Dep)
Training of each layer (continued)
• Semantic relatedness (Rel)
• Textual entailment (Ent); a reconstruction of the layer-wise objectives follows
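The per-layer loss equations were images on these slides; a hedged reconstruction of the POS-layer objective with successive regularization (the other four layers follow the same pattern with their own task losses):

J_1(\theta_{POS}) = -\sum_{s} \sum_{t} \log p\bigl(y_t^{(1)} = \alpha_t \mid h_t^{(1)}\bigr) + \lambda \lVert W_{POS} \rVert^2 + \delta \lVert \theta_e - \theta_e' \rVert^2

Here \alpha_t is the gold tag, \lambda \lVert W_{POS} \rVert^2 is ordinary L2 regularization, and the successive-regularization term \delta \lVert \theta_e - \theta_e' \rVert^2 keeps the shared embedding parameters \theta_e close to their values \theta_e' from the previously trained task, so later tasks do not erase what earlier layers learned.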
Datasets
• POS tagging: Wall Street Journal (WSJ)
• Chunking: CoNLL 2000
• Dependency parsing: WSJ
• Semantic relatedness: SICK
• Textual entailment: SICK
Results
• What is the main takeaway here?
Shortcut connections & Output label embeddings
LE (label embeddings): POS, chunking, and relatedness
Depth
Successive Regularization & Vertical Connections
Character n-gram embeddings
Conclusion
• We presented a joint many-task model to handle multiple
NLP tasks with growing depth in a single end-to-end model.
• Our model is successively trained by considering linguistic
hierarchies, directly feeding word representations into all
layers, explicitly using low-level predictions, and applying
successive regularization.
• In experiments on five NLP tasks, our single model achieves
state-of-the-art or competitive results on chunking, dependency
parsing, semantic relatedness, and textual entailment.