SlideShare a Scribd company logo
Successes and Challenges
Emily Pitler, Google AI
Representations
from Natural
Language Data
State-of-the-art in Natural Language Understanding in 2017
→ → Custom Recurrent Architectures
P 2
Oct. 2018: One Model with Task-specific Tuning in Minutes
P 3
BERT: Bidirectional Encoder Representations from Transformers
P 4
https://ai.googleblog.com/2017/08/transformer-novel-neural-network.html
Transformers: Attention is All You Need
P 5Vaswani, Shazeer, Parmar, Uszkoreit, Jones, Gomez, Kaiser, Polosukhin, NIPS 2017
From One-Hot Vectors to Word Embeddings &
Self-Attention
P 6
animal...street...it
0000…10001…01000…0
one-hot
1.4…3.74.9…6.42.5…8.0
embedding
The Annotated Transformer, The Illustrated Transformer, The Illustrated BERT
From One-Hot Vectors to Word Embeddings &
Self-Attention
P 7
animal...street...it
0000…10001…01000…0
one-hot
1.4…3.74.9…6.42.5…8.0
embedding
The Annotated Transformer, The Illustrated Transformer, The Illustrated BERT
query, key, value
From One-Hot Vectors to Word Embeddings &
Self-Attention
P 8
animal...street...it
0000…10001…01000…0
one-hot
1.4…3.74.9…6.42.5…8.0
embedding
The Annotated Transformer, The Illustrated Transformer, The Illustrated BERT
query, key, value
From One-Hot Vectors to Word Embeddings &
Self-Attention
P 9
animal...street...it
0000…10001…01000…0
one-hot
0.1
0.2
0.7
(self-)attention
1.4…3.74.9…6.42.5…8.0
embedding
The Annotated Transformer, The Illustrated Transformer, The Illustrated BERT
query, key, value
From One-Hot Vectors to Word Embeddings &
Self-Attention
P 10
animal...street...it
0000…10001…01000…0
one-hot
0.1
0.2
0.7
(self-)attention
1.4…3.74.9…6.42.5…8.0
embedding
The Annotated Transformer, The Illustrated Transformer, The Illustrated BERT
OpenAI: Generative Pretraining
The animal tired Acceptable
<s> the … too <s> the … tired
P 11
Transformer Transformer Transformer
Transformer TransformerTransformer
Transformer Transformer Transformer
Transformer TransformerTransformer
Understanding Can Need “Future” Information
How far is Jacksonville from Miami?
Jacksonville is in the First Coast region of northeast Florida and is centered on the
banks of the St. Johns River, about 25 miles (40 km) south of the Georgia state line
and about 340 miles (550 km) north of Miami.
VERB NOUN
Mark which area you want to distress. Mark, which area do you want to distress?
P 12
Naive Bidirectionality: Words Can “See Themselves”
The animal tired The animal tired
<s> the … too <s> the … too
P 13
Transformer Transformer Transformer
Transformer TransformerTransformer
Transformer Transformer Transformer
Transformer TransformerTransformer
Training BERT
Masked Language Model (Fill-in-the-blank)
Deep learning (also [MASK] [MASK] deep structured learning or [MASK]
learning) is part of a broader family of machine learning methods
[MASK] on [MASK] data representations, as opposed to task-specific
algorithms.
[MASK] is allergic to peaches. Is
P 14https://en.wikipedia.org/wiki/Deep_learning
https://en.wikipedia.org/wiki/Daniel_Tiger%27s_Neighborhood
BooksCorpus: Zhu, Kiros, Zemel, Salakhutdinov, Urtasun, Torralba, Fidler, CVPR 2015
Basic BERT Recipe
P 15
Basic BERT Recipe
P 16
Basic BERT Recipe
P 17
SWAG: Zellers, Bisk, Schwartz, Choi, EMNLP 2018 SQuAD: Rajpurkar, Zhang, Lopyrev, Liang, EMNLP 2016
Results: Commonsense Reasoning and Question Answering
P 18
MRPC: Dolan and Brockett, IWP 2005
Pretraining Tasks Matter...and Bigger = Better *
P 19
Do I Need Full BERT Models for All My Tasks?
P 20Houlsby, Giurgiu, Jastrzebski, Morrone, de Laroussilhe, Gesmundo, Attariyan, Gelly, arxiv Feb 2019
Try It Out, Get Faster Training with TPUs
P 21
Mismatches between Training and
Realistic Inputs
Two Case Studies: Mixed Language Text and Identifying Commands
P 22
P 23
Multiple Languages: Frequent “In-the-Wild”, Rare in Training
P 24
Multiple Languages: Frequent “In-the-Wild”, Rare in Training
“A Fast, Compact, Accurate Model for Language Identification
of Codemixed Text”
Zhang, Riesa , Gillick , Bakalov, Baldridge, Weiss, EMNLP 2018 P 25
Accuracy and Speed of Token-level Language Id
P 26
Accuracy and Speed of Token-level Language Id
P 27
Useful Preprocessing Step Across Tasks
P 28
Mismatches between Training and
Realistic Inputs
Two Case Studies: Mixed Language Text and Identifying Commands
P 29
Noun-Verb Ambiguity
“lives” / Noun → /laIvz/
“lives” / Verb → /lIvz/
flies
NOUN
Mark VERB
P 30Elkahky, Webster, Andor, Pitler, EMNLP 2018
Certain insects can damage plumerias, such as mites, flies, or aphids. NOUN
Mark which area you want to distress. VERB
P 31
“A Challenge Set and Methods for Noun-Verb Ambiguity”,
EMNLP 2018
Accuracy on Noun-Verb Disambiguation
P 32
Pronunciation of Homographs Accuracy
P 33
Mark VERB
Webster, Recasens, Axelrod, Baldridge, TACL 2019 Kwiatkowski, Palomaki, Redfield, Collins, Parikh, Alberti, Epstein, Polosukhin, Kelcey, Devlin, Lee, Toutanova, Jones, Chang, Dai, Uszkoreit, Le, Petrov, TACL 2019
Released Datasets with “In-the-Wild” Natural Challenges
P 34
Summary
P 35

More Related Content

Similar to Emily Pitler - Representations from Natural Language Data: Successes and Challenges

ESR10 Joachim Daiber - EXPERT Summer School - Malaga 2015
ESR10 Joachim Daiber - EXPERT Summer School - Malaga 2015ESR10 Joachim Daiber - EXPERT Summer School - Malaga 2015
ESR10 Joachim Daiber - EXPERT Summer School - Malaga 2015
RIILP
 
ICDM 2019 Tutorial: Speech and Language Processing: New Tools and Applications
ICDM 2019 Tutorial: Speech and Language Processing: New Tools and ApplicationsICDM 2019 Tutorial: Speech and Language Processing: New Tools and Applications
ICDM 2019 Tutorial: Speech and Language Processing: New Tools and Applications
Forward Gradient
 
Word2vec: From intuition to practice using gensim
Word2vec: From intuition to practice using gensimWord2vec: From intuition to practice using gensim
Word2vec: From intuition to practice using gensim
Edgar Marca
 
MEBI 591C/598 – Data and Text Mining in Biomedical Informatics
MEBI 591C/598 – Data and Text Mining in Biomedical InformaticsMEBI 591C/598 – Data and Text Mining in Biomedical Informatics
MEBI 591C/598 – Data and Text Mining in Biomedical Informaticsbutest
 
State of NLP and Amazon Comprehend
State of NLP and Amazon ComprehendState of NLP and Amazon Comprehend
State of NLP and Amazon Comprehend
Egor Pushkin
 
Biemann ibm cog_comp_jan2015_noanim
Biemann ibm cog_comp_jan2015_noanimBiemann ibm cog_comp_jan2015_noanim
Biemann ibm cog_comp_jan2015_noanim
diannepatricia
 
ELKL 5 Language documentation for linguistics and technology
ELKL 5 Language documentation for linguistics and technologyELKL 5 Language documentation for linguistics and technology
ELKL 5 Language documentation for linguistics and technology
Dafydd Gibbon
 
Portuguese Linguistic Tools: What, Why and How
Portuguese Linguistic Tools: What, Why and HowPortuguese Linguistic Tools: What, Why and How
Portuguese Linguistic Tools: What, Why and How
Valeria de Paiva
 
Babak Rasolzadeh: The importance of entities
Babak Rasolzadeh: The importance of entitiesBabak Rasolzadeh: The importance of entities
Babak Rasolzadeh: The importance of entities
Zoltan Varju
 
Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)
Bhaskar Mitra
 
Recent Advances in Natural Language Processing
Recent Advances in Natural Language ProcessingRecent Advances in Natural Language Processing
Recent Advances in Natural Language Processing
Seth Grimes
 
More on Indexing Text Operations (1).pptx
More on Indexing  Text Operations (1).pptxMore on Indexing  Text Operations (1).pptx
More on Indexing Text Operations (1).pptx
Mahsadelavari
 
Big Data Spain 2017 - Deriving Actionable Insights from High Volume Media St...
Big Data Spain 2017  - Deriving Actionable Insights from High Volume Media St...Big Data Spain 2017  - Deriving Actionable Insights from High Volume Media St...
Big Data Spain 2017 - Deriving Actionable Insights from High Volume Media St...
Apache OpenNLP
 
Lausanne 2019 #3
Lausanne 2019 #3Lausanne 2019 #3
Lausanne 2019 #3
Arthur Charpentier
 
Named Entity Recognition for Twitter Microposts (only) using Distributed Word...
Named Entity Recognition for Twitter Microposts (only) using Distributed Word...Named Entity Recognition for Twitter Microposts (only) using Distributed Word...
Named Entity Recognition for Twitter Microposts (only) using Distributed Word...
fgodin
 
Contemporary Models of Natural Language Processing
Contemporary Models of Natural Language ProcessingContemporary Models of Natural Language Processing
Contemporary Models of Natural Language Processing
Katerina Vylomova
 
Assessment of english language learners final
Assessment of english language learners finalAssessment of english language learners final
Assessment of english language learners finalcswstyle
 
NLP in Practice - Part I
NLP in Practice - Part INLP in Practice - Part I
NLP in Practice - Part I
Delip Rao
 
Lidia Pivovarova
Lidia PivovarovaLidia Pivovarova
Lidia Pivovarova
Lidia Pivovarova
 

Similar to Emily Pitler - Representations from Natural Language Data: Successes and Challenges (20)

ESR10 Joachim Daiber - EXPERT Summer School - Malaga 2015
ESR10 Joachim Daiber - EXPERT Summer School - Malaga 2015ESR10 Joachim Daiber - EXPERT Summer School - Malaga 2015
ESR10 Joachim Daiber - EXPERT Summer School - Malaga 2015
 
ICDM 2019 Tutorial: Speech and Language Processing: New Tools and Applications
ICDM 2019 Tutorial: Speech and Language Processing: New Tools and ApplicationsICDM 2019 Tutorial: Speech and Language Processing: New Tools and Applications
ICDM 2019 Tutorial: Speech and Language Processing: New Tools and Applications
 
Word2vec: From intuition to practice using gensim
Word2vec: From intuition to practice using gensimWord2vec: From intuition to practice using gensim
Word2vec: From intuition to practice using gensim
 
MEBI 591C/598 – Data and Text Mining in Biomedical Informatics
MEBI 591C/598 – Data and Text Mining in Biomedical InformaticsMEBI 591C/598 – Data and Text Mining in Biomedical Informatics
MEBI 591C/598 – Data and Text Mining in Biomedical Informatics
 
State of NLP and Amazon Comprehend
State of NLP and Amazon ComprehendState of NLP and Amazon Comprehend
State of NLP and Amazon Comprehend
 
Biemann ibm cog_comp_jan2015_noanim
Biemann ibm cog_comp_jan2015_noanimBiemann ibm cog_comp_jan2015_noanim
Biemann ibm cog_comp_jan2015_noanim
 
ELKL 5 Language documentation for linguistics and technology
ELKL 5 Language documentation for linguistics and technologyELKL 5 Language documentation for linguistics and technology
ELKL 5 Language documentation for linguistics and technology
 
Portuguese Linguistic Tools: What, Why and How
Portuguese Linguistic Tools: What, Why and HowPortuguese Linguistic Tools: What, Why and How
Portuguese Linguistic Tools: What, Why and How
 
Babak Rasolzadeh: The importance of entities
Babak Rasolzadeh: The importance of entitiesBabak Rasolzadeh: The importance of entities
Babak Rasolzadeh: The importance of entities
 
Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)
 
Recent Advances in Natural Language Processing
Recent Advances in Natural Language ProcessingRecent Advances in Natural Language Processing
Recent Advances in Natural Language Processing
 
More on Indexing Text Operations (1).pptx
More on Indexing  Text Operations (1).pptxMore on Indexing  Text Operations (1).pptx
More on Indexing Text Operations (1).pptx
 
Big Data Spain 2017 - Deriving Actionable Insights from High Volume Media St...
Big Data Spain 2017  - Deriving Actionable Insights from High Volume Media St...Big Data Spain 2017  - Deriving Actionable Insights from High Volume Media St...
Big Data Spain 2017 - Deriving Actionable Insights from High Volume Media St...
 
Parameter setting
Parameter settingParameter setting
Parameter setting
 
Lausanne 2019 #3
Lausanne 2019 #3Lausanne 2019 #3
Lausanne 2019 #3
 
Named Entity Recognition for Twitter Microposts (only) using Distributed Word...
Named Entity Recognition for Twitter Microposts (only) using Distributed Word...Named Entity Recognition for Twitter Microposts (only) using Distributed Word...
Named Entity Recognition for Twitter Microposts (only) using Distributed Word...
 
Contemporary Models of Natural Language Processing
Contemporary Models of Natural Language ProcessingContemporary Models of Natural Language Processing
Contemporary Models of Natural Language Processing
 
Assessment of english language learners final
Assessment of english language learners finalAssessment of english language learners final
Assessment of english language learners final
 
NLP in Practice - Part I
NLP in Practice - Part INLP in Practice - Part I
NLP in Practice - Part I
 
Lidia Pivovarova
Lidia PivovarovaLidia Pivovarova
Lidia Pivovarova
 

More from MLconf

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
MLconf
 
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
MLconf
 
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
MLconf
 
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushIgor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
MLconf
 
Josh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceJosh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious Experience
MLconf
 
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
MLconf
 
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
MLconf
 
Meghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMeghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the Cheap
MLconf
 
Noam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionNoam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data Collection
MLconf
 
June Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLJune Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of ML
MLconf
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
MLconf
 
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
MLconf
 
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI World
MLconf
 
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
MLconf
 
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
MLconf
 
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
MLconf
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to code
MLconf
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
MLconf
 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better Software
MLconf
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime Changes
MLconf
 

More from MLconf (20)

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
 
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
 
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
 
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushIgor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
 
Josh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceJosh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious Experience
 
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
 
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
 
Meghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMeghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the Cheap
 
Noam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionNoam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data Collection
 
June Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLJune Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of ML
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
 
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
 
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI World
 
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
 
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
 
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to code
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better Software
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime Changes
 

Recently uploaded

Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
g2nightmarescribd
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 

Recently uploaded (20)

Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 

Emily Pitler - Representations from Natural Language Data: Successes and Challenges

  • 1. Successes and Challenges Emily Pitler, Google AI Representations from Natural Language Data
  • 2. State-of-the-art in Natural Language Understanding in 2017 → → Custom Recurrent Architectures P 2
  • 3. Oct. 2018: One Model with Task-specific Tuning in Minutes P 3
  • 4. BERT: Bidirectional Encoder Representations from Transformers P 4
  • 5. https://ai.googleblog.com/2017/08/transformer-novel-neural-network.html Transformers: Attention is All You Need P 5Vaswani, Shazeer, Parmar, Uszkoreit, Jones, Gomez, Kaiser, Polosukhin, NIPS 2017
  • 6. From One-Hot Vectors to Word Embeddings & Self-Attention P 6 animal...street...it 0000…10001…01000…0 one-hot 1.4…3.74.9…6.42.5…8.0 embedding The Annotated Transformer, The Illustrated Transformer, The Illustrated BERT
  • 7. From One-Hot Vectors to Word Embeddings & Self-Attention P 7 animal...street...it 0000…10001…01000…0 one-hot 1.4…3.74.9…6.42.5…8.0 embedding The Annotated Transformer, The Illustrated Transformer, The Illustrated BERT
  • 8. query, key, value From One-Hot Vectors to Word Embeddings & Self-Attention P 8 animal...street...it 0000…10001…01000…0 one-hot 1.4…3.74.9…6.42.5…8.0 embedding The Annotated Transformer, The Illustrated Transformer, The Illustrated BERT
  • 9. query, key, value From One-Hot Vectors to Word Embeddings & Self-Attention P 9 animal...street...it 0000…10001…01000…0 one-hot 0.1 0.2 0.7 (self-)attention 1.4…3.74.9…6.42.5…8.0 embedding The Annotated Transformer, The Illustrated Transformer, The Illustrated BERT
  • 10. query, key, value From One-Hot Vectors to Word Embeddings & Self-Attention P 10 animal...street...it 0000…10001…01000…0 one-hot 0.1 0.2 0.7 (self-)attention 1.4…3.74.9…6.42.5…8.0 embedding The Annotated Transformer, The Illustrated Transformer, The Illustrated BERT
  • 11. OpenAI: Generative Pretraining The animal tired Acceptable <s> the … too <s> the … tired P 11 Transformer Transformer Transformer Transformer TransformerTransformer Transformer Transformer Transformer Transformer TransformerTransformer
  • 12. Understanding Can Need “Future” Information How far is Jacksonville from Miami? Jacksonville is in the First Coast region of northeast Florida and is centered on the banks of the St. Johns River, about 25 miles (40 km) south of the Georgia state line and about 340 miles (550 km) north of Miami. VERB NOUN Mark which area you want to distress. Mark, which area do you want to distress? P 12
  • 13. Naive Bidirectionality: Words Can “See Themselves” The animal tired The animal tired <s> the … too <s> the … too P 13 Transformer Transformer Transformer Transformer TransformerTransformer Transformer Transformer Transformer Transformer TransformerTransformer
  • 14. Training BERT Masked Language Model (Fill-in-the-blank) Deep learning (also [MASK] [MASK] deep structured learning or [MASK] learning) is part of a broader family of machine learning methods [MASK] on [MASK] data representations, as opposed to task-specific algorithms. [MASK] is allergic to peaches. Is P 14https://en.wikipedia.org/wiki/Deep_learning https://en.wikipedia.org/wiki/Daniel_Tiger%27s_Neighborhood BooksCorpus: Zhu, Kiros, Zemel, Salakhutdinov, Urtasun, Torralba, Fidler, CVPR 2015
  • 18. SWAG: Zellers, Bisk, Schwartz, Choi, EMNLP 2018 SQuAD: Rajpurkar, Zhang, Lopyrev, Liang, EMNLP 2016 Results: Commonsense Reasoning and Question Answering P 18
  • 19. MRPC: Dolan and Brockett, IWP 2005 Pretraining Tasks Matter...and Bigger = Better * P 19
  • 20. Do I Need Full BERT Models for All My Tasks? P 20Houlsby, Giurgiu, Jastrzebski, Morrone, de Laroussilhe, Gesmundo, Attariyan, Gelly, arxiv Feb 2019
  • 21. Try It Out, Get Faster Training with TPUs P 21
  • 22. Mismatches between Training and Realistic Inputs Two Case Studies: Mixed Language Text and Identifying Commands P 22
  • 23. P 23 Multiple Languages: Frequent “In-the-Wild”, Rare in Training
  • 24. P 24 Multiple Languages: Frequent “In-the-Wild”, Rare in Training
  • 25. “A Fast, Compact, Accurate Model for Language Identification of Codemixed Text” Zhang, Riesa , Gillick , Bakalov, Baldridge, Weiss, EMNLP 2018 P 25
  • 26. Accuracy and Speed of Token-level Language Id P 26
  • 27. Accuracy and Speed of Token-level Language Id P 27
  • 28. Useful Preprocessing Step Across Tasks P 28
  • 29. Mismatches between Training and Realistic Inputs Two Case Studies: Mixed Language Text and Identifying Commands P 29
  • 30. Noun-Verb Ambiguity “lives” / Noun → /laIvz/ “lives” / Verb → /lIvz/ flies NOUN Mark VERB P 30Elkahky, Webster, Andor, Pitler, EMNLP 2018
  • 31. Certain insects can damage plumerias, such as mites, flies, or aphids. NOUN Mark which area you want to distress. VERB P 31 “A Challenge Set and Methods for Noun-Verb Ambiguity”, EMNLP 2018
  • 32. Accuracy on Noun-Verb Disambiguation P 32
  • 34. Mark VERB Webster, Recasens, Axelrod, Baldridge, TACL 2019 Kwiatkowski, Palomaki, Redfield, Collins, Parikh, Alberti, Epstein, Polosukhin, Kelcey, Devlin, Lee, Toutanova, Jones, Chang, Dai, Uszkoreit, Le, Petrov, TACL 2019 Released Datasets with “In-the-Wild” Natural Challenges P 34