SlideShare a Scribd company logo
1 of 13
Download to read offline
How Deep Learning Quietly
Revolutionized NLP
Łukasz Kaiser, Google Brain
What NLP tasks are we talking about?
● Part Of Speech Tagging Assign part-of-speech to each word.
● Parsing Create a grammar tree
given a sentence.
● Named Entity Recognition Recognize people, places, etc. in a
sentence.
● Language Modeling Generate natural sentences.
● Translation Translate a sentence into another
language.
● Sentence Compression Remove words to summarize a sentence.
● Abstractive Summarization Summarize a paragraph in new words.
Can deep learning solve these tasks?
● Inputs and outputs have variable size, how can neural networks handle it?
● Recurrent Neural Networks can do it, but how do we train them?
● Long Short-Term Memory [Hochreiter et al., 1997], but how to compose it?
● Encoder-Decoder (sequence-to-sequence) architectures
[Sutskever et al., 2014; Bahdanau et al., 2014; Cho et al., 2014]
Advanced sequence-to-sequence LSTM
Parsing with sequence-to-sequence LSTMs
(1) Represent the tree as a sequence.
(2) Generate data and train a sequence-to-sequence LSTM model.
(3) Results: 92.8 F1 score vs 92.4 previous best [Vinyals & Kaiser et al., 2014]
Language modeling with LSTMs
Language model performance is measured in perplexity (lower is better).
● Kneser-Ney 5-gram: 67.6 [Chelba et al.,
2013]
● RNN-1024 + 9-gram: 51.3 [Chelba et al.,
2013]
● LSTM-512-512: 54.1
[Józefowicz et al., 2016]
● 2-layer LSTM-8192-1024: 30.6 [Józefowicz et al.,
2016]
Language modeling with LSTMs: Examples
Raw (not hand-selected) sampled sentences: [Józefowicz et al.,
2016]
About 800 people gathered at Hever Castle on Long Beach from noon to 2pm ,
three to four times that of the funeral cortege .
It is now known that coffee and cacao products can do no harm on the body .
Yuri Zhirkov was in attendance at the Stamford Bridge at the start of the second
half but neither Drogba nor Malouda was able to push on through the Barcelona
Sentence compression with LSTMs
Example:
Input: State Sen. Stewart Greenleaf discusses his proposed human
trafficking bill at Calvery Baptist Church in Willow Grove Thursday night.
Output: Stewart Greenleaf discusses his human trafficking bill.
Results: readability
informativeness
MIRA (previous best): 4.31 3.55
LSTM [Filippova et al., 2015]: 4.51 3.78
Translation with LSTMs
Translation performance is measured in BLEU scores (higher is better, EnDe):
● Phrase-Based MT: 20.7 [Durrani et al., 2014]
● Early LSTM model: 19.4 [Sébastien et al., 2015]
● DeepAtt (large LSTM): 20.6 [Zhou et al., 2016]
● GNMT (large LSTM): 24.9 [Wu et al., 2016]
● GNMT+MoE: 26.0 [Shazeer &
Mirhoseini et al., 2016]
Again, model size and tuning seem to be the decisive factor.
Translation with LSTMs: Examples
German:
Probleme kann man niemals mit derselben Denkweise lösen, durch die sie
entstanden sind.
PBMT Translate: GNMT Translate:
No problem can be solved from Problems can never be
solved
the same consciousness that with the same way of
thinking
Translation with LSTMs: How good is it?
PBMT GNMT Human Relative improvement
English → Spanish 4.885 5.428 5.504 87%
English → French 4.932 5.295 5.496 64%
English → Chinese 4.035 4.594 4.987 58%
Spanish → English 4.872 5.187 5.372 63%
French → English 5.046 5.343 5.404 83%
Chinese → English 3.694 4.263 4.636 60%
Google Translate production data, median score by human evaluation on the scale 0-6. [Wu et al.,
‘16]
What are the problems with LSTMs?
Speed is a problem for sequence-to-sequence LSTM models.
● These models are large, we need more computing power.
Solution: new hardware trading less precision for more computing.
● These model are sequential, can be slow even at small sizes.
Solution: new, more parallel models (Neural GPU, ByteNet).
Sequence-to-sequence LSTMs require a lot of data.
● Attention and other new architectures increase data efficiency.
● Use regularizers like dropout, confidence penalty and layer normalization.
Thank You
- Deep learning has profoundly
changed the field of NLP.
- Sequence-to-sequence LSTMs
yield state-of-the-art results on
many NLP tasks.
- Google Translate uses LSTMs in
production. Big improvements in
translation quality.
- New models are addressing
problems with LSTMs.

More Related Content

What's hot

Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing
Mustafa Jarrar
 

What's hot (20)

End-to-End Task-Completion Neural Dialogue Systems
End-to-End Task-Completion Neural Dialogue SystemsEnd-to-End Task-Completion Neural Dialogue Systems
End-to-End Task-Completion Neural Dialogue Systems
 
Visual-Semantic Embeddings: some thoughts on Language
Visual-Semantic Embeddings: some thoughts on LanguageVisual-Semantic Embeddings: some thoughts on Language
Visual-Semantic Embeddings: some thoughts on Language
 
Recent Advances in NLP
  Recent Advances in NLP  Recent Advances in NLP
Recent Advances in NLP
 
Towards End-to-End Reinforcement Learning of Dialogue Agents for Information ...
Towards End-to-End Reinforcement Learning of Dialogue Agents for Information ...Towards End-to-End Reinforcement Learning of Dialogue Agents for Information ...
Towards End-to-End Reinforcement Learning of Dialogue Agents for Information ...
 
End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Lan...
End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Lan...End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Lan...
End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Lan...
 
Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing
 
Multi modal retrieval and generation with deep distributed models
Multi modal retrieval and generation with deep distributed modelsMulti modal retrieval and generation with deep distributed models
Multi modal retrieval and generation with deep distributed models
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language Processing
 
NLP & Machine Learning - An Introductory Talk
NLP & Machine Learning - An Introductory Talk NLP & Machine Learning - An Introductory Talk
NLP & Machine Learning - An Introductory Talk
 
Deep Learning for Dialogue Modeling - NTHU
Deep Learning for Dialogue Modeling - NTHUDeep Learning for Dialogue Modeling - NTHU
Deep Learning for Dialogue Modeling - NTHU
 
Deep Learning for Dialogue Systems
Deep Learning for Dialogue SystemsDeep Learning for Dialogue Systems
Deep Learning for Dialogue Systems
 
Engineering Intelligent NLP Applications Using Deep Learning – Part 2
Engineering Intelligent NLP Applications Using Deep Learning – Part 2 Engineering Intelligent NLP Applications Using Deep Learning – Part 2
Engineering Intelligent NLP Applications Using Deep Learning – Part 2
 
Deep learning for natural language embeddings
Deep learning for natural language embeddingsDeep learning for natural language embeddings
Deep learning for natural language embeddings
 
Deep Learning for Natural Language Processing
Deep Learning for Natural Language ProcessingDeep Learning for Natural Language Processing
Deep Learning for Natural Language Processing
 
Natural Language Processing: L01 introduction
Natural Language Processing: L01 introductionNatural Language Processing: L01 introduction
Natural Language Processing: L01 introduction
 
Natural Language Processing for Games Research
Natural Language Processing for Games ResearchNatural Language Processing for Games Research
Natural Language Processing for Games Research
 
An Introduction to Recent Advances in the Field of NLP
An Introduction to Recent Advances in the Field of NLPAn Introduction to Recent Advances in the Field of NLP
An Introduction to Recent Advances in the Field of NLP
 
Anthiil Inside workshop on NLP
Anthiil Inside workshop on NLPAnthiil Inside workshop on NLP
Anthiil Inside workshop on NLP
 
Deep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word EmbeddingsDeep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word Embeddings
 
Ilya Sutskever at AI Frontiers : Progress towards the OpenAI mission
Ilya Sutskever at AI Frontiers : Progress towards the OpenAI missionIlya Sutskever at AI Frontiers : Progress towards the OpenAI mission
Ilya Sutskever at AI Frontiers : Progress towards the OpenAI mission
 

Similar to Lukasz Kaiser at AI Frontiers: How Deep Learning Quietly Revolutionized NLP

BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
Karthik Murugesan
 
EXPLORING NATURAL LANGUAGE PROCESSING (1).pptx
EXPLORING NATURAL LANGUAGE PROCESSING (1).pptxEXPLORING NATURAL LANGUAGE PROCESSING (1).pptx
EXPLORING NATURAL LANGUAGE PROCESSING (1).pptx
AtulKumarUpadhyay4
 
13. Constantin Orasan (UoW) Natural Language Processing for Translation
13. Constantin Orasan (UoW) Natural Language Processing for Translation13. Constantin Orasan (UoW) Natural Language Processing for Translation
13. Constantin Orasan (UoW) Natural Language Processing for Translation
RIILP
 

Similar to Lukasz Kaiser at AI Frontiers: How Deep Learning Quietly Revolutionized NLP (20)

Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
Thomas Wolf "An Introduction to Transfer Learning and Hugging Face"
 
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
BIng NLP Expert - Dl summer-school-2017.-jianfeng-gao.v2
 
NeurIPS_2018_ConvAI2_ParticipantSlides.pptx
NeurIPS_2018_ConvAI2_ParticipantSlides.pptxNeurIPS_2018_ConvAI2_ParticipantSlides.pptx
NeurIPS_2018_ConvAI2_ParticipantSlides.pptx
 
[KDD 2018 tutorial] End to-end goal-oriented question answering systems
[KDD 2018 tutorial] End to-end goal-oriented question answering systems[KDD 2018 tutorial] End to-end goal-oriented question answering systems
[KDD 2018 tutorial] End to-end goal-oriented question answering systems
 
Machine Learning in NLP
Machine Learning in NLPMachine Learning in NLP
Machine Learning in NLP
 
The NLP Muppets revolution!
The NLP Muppets revolution!The NLP Muppets revolution!
The NLP Muppets revolution!
 
SLSP 2017 presentation - Attentional Parallel RNNs for Generating Punctuation...
SLSP 2017 presentation - Attentional Parallel RNNs for Generating Punctuation...SLSP 2017 presentation - Attentional Parallel RNNs for Generating Punctuation...
SLSP 2017 presentation - Attentional Parallel RNNs for Generating Punctuation...
 
Conversational Agents in Portuguese: A Study Using Deep Learning
Conversational Agents in Portuguese: A Study Using Deep LearningConversational Agents in Portuguese: A Study Using Deep Learning
Conversational Agents in Portuguese: A Study Using Deep Learning
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Openbar Leuven // Less is more. Working with less data in NLP by Yves Peirsman
Openbar Leuven // Less is more. Working with less data in NLP by Yves PeirsmanOpenbar Leuven // Less is more. Working with less data in NLP by Yves Peirsman
Openbar Leuven // Less is more. Working with less data in NLP by Yves Peirsman
 
[GAN by Hung-yi Lee]Part 3: The recent research of my group
[GAN by Hung-yi Lee]Part 3: The recent research of my group[GAN by Hung-yi Lee]Part 3: The recent research of my group
[GAN by Hung-yi Lee]Part 3: The recent research of my group
 
EXPLORING NATURAL LANGUAGE PROCESSING (1).pptx
EXPLORING NATURAL LANGUAGE PROCESSING (1).pptxEXPLORING NATURAL LANGUAGE PROCESSING (1).pptx
EXPLORING NATURAL LANGUAGE PROCESSING (1).pptx
 
Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...
Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...
Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...
 
Introduction to natural language processing, history and origin
Introduction to natural language processing, history and originIntroduction to natural language processing, history and origin
Introduction to natural language processing, history and origin
 
Hi I am Ram.pptx
Hi I am Ram.pptxHi I am Ram.pptx
Hi I am Ram.pptx
 
Trends of ICASSP 2022
Trends of ICASSP 2022Trends of ICASSP 2022
Trends of ICASSP 2022
 
13. Constantin Orasan (UoW) Natural Language Processing for Translation
13. Constantin Orasan (UoW) Natural Language Processing for Translation13. Constantin Orasan (UoW) Natural Language Processing for Translation
13. Constantin Orasan (UoW) Natural Language Processing for Translation
 
NLG, Training, Inference & Evaluation
NLG, Training, Inference & Evaluation NLG, Training, Inference & Evaluation
NLG, Training, Inference & Evaluation
 
Tomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLPTomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLP
 
Seq2seq Model to Tokenize the Chinese Language
Seq2seq Model to Tokenize the Chinese LanguageSeq2seq Model to Tokenize the Chinese Language
Seq2seq Model to Tokenize the Chinese Language
 

More from AI Frontiers

Arnaud Thiercelin at AI Frontiers : AI in the Sky
Arnaud Thiercelin at AI Frontiers : AI in the SkyArnaud Thiercelin at AI Frontiers : AI in the Sky
Arnaud Thiercelin at AI Frontiers : AI in the Sky
AI Frontiers
 

More from AI Frontiers (20)

Divya Jain at AI Frontiers : Video Summarization
Divya Jain at AI Frontiers : Video SummarizationDivya Jain at AI Frontiers : Video Summarization
Divya Jain at AI Frontiers : Video Summarization
 
Training at AI Frontiers 2018 - LaiOffer Data Session: How Spark Speedup AI
Training at AI Frontiers 2018 - LaiOffer Data Session: How Spark Speedup AI Training at AI Frontiers 2018 - LaiOffer Data Session: How Spark Speedup AI
Training at AI Frontiers 2018 - LaiOffer Data Session: How Spark Speedup AI
 
Training at AI Frontiers 2018 - LaiOffer Self-Driving-Car-Lecture 1: Heuristi...
Training at AI Frontiers 2018 - LaiOffer Self-Driving-Car-Lecture 1: Heuristi...Training at AI Frontiers 2018 - LaiOffer Self-Driving-Car-Lecture 1: Heuristi...
Training at AI Frontiers 2018 - LaiOffer Self-Driving-Car-Lecture 1: Heuristi...
 
Training at AI Frontiers 2018 - LaiOffer Self-Driving-Car-lecture 2: Incremen...
Training at AI Frontiers 2018 - LaiOffer Self-Driving-Car-lecture 2: Incremen...Training at AI Frontiers 2018 - LaiOffer Self-Driving-Car-lecture 2: Incremen...
Training at AI Frontiers 2018 - LaiOffer Self-Driving-Car-lecture 2: Incremen...
 
Training at AI Frontiers 2018 - Udacity: Enhancing NLP with Deep Neural Networks
Training at AI Frontiers 2018 - Udacity: Enhancing NLP with Deep Neural NetworksTraining at AI Frontiers 2018 - Udacity: Enhancing NLP with Deep Neural Networks
Training at AI Frontiers 2018 - Udacity: Enhancing NLP with Deep Neural Networks
 
Training at AI Frontiers 2018 - LaiOffer Self-Driving-Car-Lecture 3: Any-Angl...
Training at AI Frontiers 2018 - LaiOffer Self-Driving-Car-Lecture 3: Any-Angl...Training at AI Frontiers 2018 - LaiOffer Self-Driving-Car-Lecture 3: Any-Angl...
Training at AI Frontiers 2018 - LaiOffer Self-Driving-Car-Lecture 3: Any-Angl...
 
Percy Liang at AI Frontiers : Pushing the Limits of Machine Learning
Percy Liang at AI Frontiers : Pushing the Limits of Machine LearningPercy Liang at AI Frontiers : Pushing the Limits of Machine Learning
Percy Liang at AI Frontiers : Pushing the Limits of Machine Learning
 
Mark Moore at AI Frontiers : Uber Elevate
Mark Moore at AI Frontiers : Uber ElevateMark Moore at AI Frontiers : Uber Elevate
Mark Moore at AI Frontiers : Uber Elevate
 
Mario Munich at AI Frontiers : Consumer robotics: embedding affordable AI in ...
Mario Munich at AI Frontiers : Consumer robotics: embedding affordable AI in ...Mario Munich at AI Frontiers : Consumer robotics: embedding affordable AI in ...
Mario Munich at AI Frontiers : Consumer robotics: embedding affordable AI in ...
 
Arnaud Thiercelin at AI Frontiers : AI in the Sky
Arnaud Thiercelin at AI Frontiers : AI in the SkyArnaud Thiercelin at AI Frontiers : AI in the Sky
Arnaud Thiercelin at AI Frontiers : AI in the Sky
 
Anima Anandkumar at AI Frontiers : Modern ML : Deep, distributed, Multi-dimen...
Anima Anandkumar at AI Frontiers : Modern ML : Deep, distributed, Multi-dimen...Anima Anandkumar at AI Frontiers : Modern ML : Deep, distributed, Multi-dimen...
Anima Anandkumar at AI Frontiers : Modern ML : Deep, distributed, Multi-dimen...
 
Wei Xu at AI Frontiers : Language Learning in an Interactive and Embodied Set...
Wei Xu at AI Frontiers : Language Learning in an Interactive and Embodied Set...Wei Xu at AI Frontiers : Language Learning in an Interactive and Embodied Set...
Wei Xu at AI Frontiers : Language Learning in an Interactive and Embodied Set...
 
Sumit Gupta at AI Frontiers : AI for Enterprise
Sumit Gupta at AI Frontiers : AI for EnterpriseSumit Gupta at AI Frontiers : AI for Enterprise
Sumit Gupta at AI Frontiers : AI for Enterprise
 
Yuandong Tian at AI Frontiers : Planning in Reinforcement Learning
Yuandong Tian at AI Frontiers : Planning in Reinforcement LearningYuandong Tian at AI Frontiers : Planning in Reinforcement Learning
Yuandong Tian at AI Frontiers : Planning in Reinforcement Learning
 
Alex Ermolaev at AI Frontiers : Major Applications of AI in Healthcare
Alex Ermolaev at AI Frontiers : Major Applications of AI in HealthcareAlex Ermolaev at AI Frontiers : Major Applications of AI in Healthcare
Alex Ermolaev at AI Frontiers : Major Applications of AI in Healthcare
 
Long Lin at AI Frontiers : AI in Gaming
Long Lin at AI Frontiers : AI in GamingLong Lin at AI Frontiers : AI in Gaming
Long Lin at AI Frontiers : AI in Gaming
 
Melissa Goldman at AI Frontiers : AI & Finance
Melissa Goldman at AI Frontiers : AI & FinanceMelissa Goldman at AI Frontiers : AI & Finance
Melissa Goldman at AI Frontiers : AI & Finance
 
Li Deng at AI Frontiers : From Modeling Speech/Language to Modeling Financial...
Li Deng at AI Frontiers : From Modeling Speech/Language to Modeling Financial...Li Deng at AI Frontiers : From Modeling Speech/Language to Modeling Financial...
Li Deng at AI Frontiers : From Modeling Speech/Language to Modeling Financial...
 
Ashok Srivastava at AI Frontiers : Using AI to Solve Complex Economic Problems
Ashok Srivastava at AI Frontiers : Using AI to Solve Complex Economic ProblemsAshok Srivastava at AI Frontiers : Using AI to Solve Complex Economic Problems
Ashok Srivastava at AI Frontiers : Using AI to Solve Complex Economic Problems
 
Rohit Tripathi at AI Frontiers : Using intelligent connectivity and AI to tra...
Rohit Tripathi at AI Frontiers : Using intelligent connectivity and AI to tra...Rohit Tripathi at AI Frontiers : Using intelligent connectivity and AI to tra...
Rohit Tripathi at AI Frontiers : Using intelligent connectivity and AI to tra...
 

Recently uploaded

Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Peter Udo Diehl
 
Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for Success
UXDXConf
 

Recently uploaded (20)

Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomSalesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
 
IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024
 
Powerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara LaskowskaPowerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara Laskowska
 
Demystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyDemystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John Staveley
 
Connecting the Dots in Product Design at KAYAK
Connecting the Dots in Product Design at KAYAKConnecting the Dots in Product Design at KAYAK
Connecting the Dots in Product Design at KAYAK
 
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi IbrahimzadeFree and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
 
A Business-Centric Approach to Design System Strategy
A Business-Centric Approach to Design System StrategyA Business-Centric Approach to Design System Strategy
A Business-Centric Approach to Design System Strategy
 
Top 10 Symfony Development Companies 2024
Top 10 Symfony Development Companies 2024Top 10 Symfony Development Companies 2024
Top 10 Symfony Development Companies 2024
 
ECS 2024 Teams Premium - Pretty Secure
ECS 2024   Teams Premium - Pretty SecureECS 2024   Teams Premium - Pretty Secure
ECS 2024 Teams Premium - Pretty Secure
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdfIntroduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
 
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdfThe Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
 
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdfHow Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
 
WebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceWebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM Performance
 
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
 
Optimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityOptimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through Observability
 
Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for Success
 
AI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekAI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří Karpíšek
 
AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101
 

Lukasz Kaiser at AI Frontiers: How Deep Learning Quietly Revolutionized NLP

  • 1. How Deep Learning Quietly Revolutionized NLP Łukasz Kaiser, Google Brain
  • 2. What NLP tasks are we talking about? ● Part Of Speech Tagging Assign part-of-speech to each word. ● Parsing Create a grammar tree given a sentence. ● Named Entity Recognition Recognize people, places, etc. in a sentence. ● Language Modeling Generate natural sentences. ● Translation Translate a sentence into another language. ● Sentence Compression Remove words to summarize a sentence. ● Abstractive Summarization Summarize a paragraph in new words.
  • 3. Can deep learning solve these tasks? ● Inputs and outputs have variable size, how can neural networks handle it? ● Recurrent Neural Networks can do it, but how do we train them? ● Long Short-Term Memory [Hochreiter et al., 1997], but how to compose it? ● Encoder-Decoder (sequence-to-sequence) architectures [Sutskever et al., 2014; Bahdanau et al., 2014; Cho et al., 2014]
  • 5. Parsing with sequence-to-sequence LSTMs (1) Represent the tree as a sequence. (2) Generate data and train a sequence-to-sequence LSTM model. (3) Results: 92.8 F1 score vs 92.4 previous best [Vinyals & Kaiser et al., 2014]
  • 6. Language modeling with LSTMs Language model performance is measured in perplexity (lower is better). ● Kneser-Ney 5-gram: 67.6 [Chelba et al., 2013] ● RNN-1024 + 9-gram: 51.3 [Chelba et al., 2013] ● LSTM-512-512: 54.1 [Józefowicz et al., 2016] ● 2-layer LSTM-8192-1024: 30.6 [Józefowicz et al., 2016]
  • 7. Language modeling with LSTMs: Examples Raw (not hand-selected) sampled sentences: [Józefowicz et al., 2016] About 800 people gathered at Hever Castle on Long Beach from noon to 2pm , three to four times that of the funeral cortege . It is now known that coffee and cacao products can do no harm on the body . Yuri Zhirkov was in attendance at the Stamford Bridge at the start of the second half but neither Drogba nor Malouda was able to push on through the Barcelona
  • 8. Sentence compression with LSTMs Example: Input: State Sen. Stewart Greenleaf discusses his proposed human trafficking bill at Calvery Baptist Church in Willow Grove Thursday night. Output: Stewart Greenleaf discusses his human trafficking bill. Results: readability informativeness MIRA (previous best): 4.31 3.55 LSTM [Filippova et al., 2015]: 4.51 3.78
  • 9. Translation with LSTMs Translation performance is measured in BLEU scores (higher is better, EnDe): ● Phrase-Based MT: 20.7 [Durrani et al., 2014] ● Early LSTM model: 19.4 [Sébastien et al., 2015] ● DeepAtt (large LSTM): 20.6 [Zhou et al., 2016] ● GNMT (large LSTM): 24.9 [Wu et al., 2016] ● GNMT+MoE: 26.0 [Shazeer & Mirhoseini et al., 2016] Again, model size and tuning seem to be the decisive factor.
  • 10. Translation with LSTMs: Examples German: Probleme kann man niemals mit derselben Denkweise lösen, durch die sie entstanden sind. PBMT Translate: GNMT Translate: No problem can be solved from Problems can never be solved the same consciousness that with the same way of thinking
  • 11. Translation with LSTMs: How good is it? PBMT GNMT Human Relative improvement English → Spanish 4.885 5.428 5.504 87% English → French 4.932 5.295 5.496 64% English → Chinese 4.035 4.594 4.987 58% Spanish → English 4.872 5.187 5.372 63% French → English 5.046 5.343 5.404 83% Chinese → English 3.694 4.263 4.636 60% Google Translate production data, median score by human evaluation on the scale 0-6. [Wu et al., ‘16]
  • 12. What are the problems with LSTMs? Speed is a problem for sequence-to-sequence LSTM models. ● These models are large, we need more computing power. Solution: new hardware trading less precision for more computing. ● These model are sequential, can be slow even at small sizes. Solution: new, more parallel models (Neural GPU, ByteNet). Sequence-to-sequence LSTMs require a lot of data. ● Attention and other new architectures increase data efficiency. ● Use regularizers like dropout, confidence penalty and layer normalization.
  • 13. Thank You - Deep learning has profoundly changed the field of NLP. - Sequence-to-sequence LSTMs yield state-of-the-art results on many NLP tasks. - Google Translate uses LSTMs in production. Big improvements in translation quality. - New models are addressing problems with LSTMs.