Moshe Wasserblat
Intel AI Lab
NLP MeetUp, Aug. 2020
BIO
● NICE Systems
  ● Led the Speech & Text Analytics research group
  ● First company to productize Speech2Text, ED, and Voice Biometrics in the call center
● INTEL
  ● Innovate for our products
  ● Collaborate with top academics
  ● Explore compute features that disrupt our HW
AGENDA
● Efficiency
  ● Large model intro
  ● Inference efficiency: models with lower computational complexity
  ● Examples
  ● SustaiNLP Workshop at EMNLP, Nov. 2020
● Data challenges
  ● Extensibility: address a new domain with limited data and minimal supervision
  ● Weakly-supervised ABSA example
The advantages of BERT
1. Efficient transfer learning.
Leverage a large model that was pre-trained on a generic task with a large amount of data, then fine-tune it for a specific task with a small amount of data. The result is high accuracy with a smaller amount of data.
2. Context embeddings.
Produces vectors that represent each word in the context of its sentence, e.g. “bank” in “river bank” vs. “investment bank”.
[Figure: input sentence → 12/24 stacked transformer encoder layers (110M/330M parameters) → context embeddings → task-specific classifier → task output]
6
Pre-trained LMs have become extremely large and deep
[Chart: number of parameters (billions, axis 2.5 to 15) per pre-trained LM; the labeled point is T5 at 11B. Source: HuggingFace]
• Heavy computation
• Large memory footprint
• Hard to train/fine-tune
• Hard to deploy
How should we put these monsters in production?
Year 2020: from accuracy to efficiency
[Chart: number of parameters (millions, axis 0 to 120) for BERT, ALBERT, MobileBERT, DistilBERT, and TinyBERT]
Vectors for optimization
• Quantization of weights to int8 or another lower-precision representation
• Pruning of individual weights and of whole structures (complete layers, self-attention heads)
• Early prediction of samples using predictors attached to shallow layers
• Sharing the weights of the self-attention and feed-forward modules across all model blocks
• Training smaller models using distillation and other novel techniques
• Replacing Transformer modules and searching for the best architecture using Neural Architecture Search
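Early prediction, one of the directions listed above, can be illustrated with a short sketch. This is a minimal illustration rather than any specific published method: the layers and per-layer predictors are hypothetical stand-in callables, and the confidence threshold of 0.9 is an assumed value.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def early_exit_predict(
    x: Vector,
    layers: Sequence[Callable[[Vector], Vector]],
    predictors: Sequence[Callable[[Vector], List[float]]],
    threshold: float = 0.9,  # assumed confidence cut-off
) -> Tuple[int, int]:
    """Run layers in order and stop at the first confident predictor.

    Returns (predicted_class, number_of_layers_used).
    """
    if not layers or len(layers) != len(predictors):
        raise ValueError("need one predictor per layer and at least one layer")
    hidden = x
    probs: List[float] = []
    depth = 0
    for depth, (layer, predictor) in enumerate(zip(layers, predictors), start=1):
        hidden = layer(hidden)
        probs = predictor(hidden)
        if max(probs) >= threshold:
            break  # confident enough: skip the remaining, deeper layers
    return probs.index(max(probs)), depth


# Toy stand-ins (hypothetical): each "layer" doubles the input, and the
# predictor grows more confident in class 1 as the signal gets larger.
def toy_layer(h: Vector) -> Vector:
    return [2.0 * v for v in h]


def toy_predictor(h: Vector) -> List[float]:
    p1 = min(0.99, abs(h[0]) / 10.0)
    return [1.0 - p1, p1]


layers = [toy_layer] * 4
predictors = [toy_predictor] * 4
easy = early_exit_predict([3.0], layers, predictors)  # exits after 2 layers
hard = early_exit_predict([0.7], layers, predictors)  # needs all 4 layers
```

The saving comes from "easy" inputs leaving the network early, so the average cost per sample drops while hard inputs still use the full depth.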
Quantization
• Quantization of BERT models to 16/8-bit weights: up to 4x compression (32-bit to 8-bit) with minimal loss in accuracy
Example: “We Scaled Bert To Serve 1+ Billion Daily Requests on CPUs”
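To make the 4x figure concrete, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. It is an illustration under simplifying assumptions (one shared scale per tensor, no calibration, hypothetical function names) and is not the scheme of any particular library.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Map float weights to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the int8 codes."""
    return [v * scale for v in q]


weights = [0.50, -1.27, 0.03, 1.00]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

# Storage: 4 bytes per float32 weight vs. 1 byte per int8 weight
# (the single float scale per tensor is ignored here).
compression = (4 * len(weights)) / (1 * len(weights))
```

The rounding error per weight is at most half the scale, which is why accuracy often degrades only slightly when the weight range is well behaved.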
Pruning
It is possible, for some tasks, to prune up to 9 of the top layers from a 12-layer model without degrading performance by more than 3%.
Source: “Poor Man's BERT: Smaller and Faster Transformer Models”
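Dropping top layers is structurally simple, as the sketch below shows. It treats the encoder as a plain list of layer objects (string stand-ins here) and uses a hypothetical helper name; in practice the shortened model would then be fine-tuned on the target task.

```python
from typing import List, Sequence, TypeVar

Layer = TypeVar("Layer")


def drop_top_layers(layers: Sequence[Layer], num_to_drop: int) -> List[Layer]:
    """Keep the bottom layers and discard the ones closest to the output."""
    if not 0 <= num_to_drop < len(layers):
        raise ValueError("must keep at least one layer")
    return list(layers[: len(layers) - num_to_drop])


encoder = [f"layer_{i}" for i in range(12)]  # stand-in for 12 encoder blocks
pruned = drop_top_layers(encoder, 9)         # keep only the 3 bottom layers
fraction_removed = 1 - len(pruned) / len(encoder)
```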
Distillation
[Diagram: teacher → student (a small BERT such as TinyBERT, MobileBERT, or DistilBERT). The loss can match the teacher's hard labels, probabilities/logits, embeddings, and attentions]
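Naïve approach (Thieves on Sesame Street, Krishna et al., ICLR 2020)
[Diagram: the teacher assigns pseudo labels to unlabeled examples; the student, with a feed-forward classifier for fine-tuning, is trained with a task loss on these pseudo labels together with labeled examples and their annotated labels. Example sentences: “Mulan is highly recommended” and “The movie was good as the book”, both tagged Sent: POS]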
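The data flow of this naïve approach can be sketched as follows. The teacher here is a hypothetical keyword rule standing in for a fine-tuned model, and the assignment of the two slide sentences to the labeled and unlabeled sets is an assumption made for illustration.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (sentence, label)


def build_student_training_set(
    labeled: List[Example],
    unlabeled: List[str],
    teacher: Callable[[str], str],
) -> List[Example]:
    """Combine annotated examples with teacher-labeled (pseudo-labeled) ones."""
    pseudo_labeled = [(text, teacher(text)) for text in unlabeled]
    return labeled + pseudo_labeled


# Hypothetical stand-in for a fine-tuned teacher model.
def toy_teacher(text: str) -> str:
    return "POS" if "recommended" in text or "good" in text else "NEG"


labeled = [("The movie was good as the book", "POS")]
unlabeled = ["Mulan is highly recommended"]
train_set = build_student_training_set(labeled, unlabeled, toy_teacher)
```

The student is then trained on `train_set` with an ordinary task loss, so it never sees the teacher's probabilities, only its hard decisions.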
Distillation*: mimic the teacher's output probabilities
[Diagram: the teacher scores unlabeled examples; the student, with a feed-forward classifier for fine-tuning, is trained on a total loss that combines the task loss with a distillation loss (MSE**)]
• Works surprisingly well
• Great for low-resource tasks
* Hinton et al. ** Tang et al.
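A minimal sketch of such a combined loss for a single example is shown below. The weighted sum with `alpha = 0.5` is an assumed choice for illustration, and the function names are hypothetical; the distillation term is the mean squared error between student and teacher logits.

```python
import math
from typing import List, Optional


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(student_logits: List[float], label: int) -> float:
    """Task loss against an annotated label."""
    return -math.log(softmax(student_logits)[label])


def mse(student_logits: List[float], teacher_logits: List[float]) -> float:
    """Distillation loss: how far the student's logits are from the teacher's."""
    diffs = [(s - t) ** 2 for s, t in zip(student_logits, teacher_logits)]
    return sum(diffs) / len(diffs)


def total_loss(
    student_logits: List[float],
    teacher_logits: List[float],
    label: Optional[int] = None,
    alpha: float = 0.5,  # assumed weighting between task and distill loss
) -> float:
    distill = mse(student_logits, teacher_logits)
    if label is None:  # unlabeled example: only the teacher signal is available
        return distill
    return alpha * cross_entropy(student_logits, label) + (1 - alpha) * distill


student, teacher = [1.0, -1.0], [2.0, -2.0]
loss_unlabeled = total_loss(student, teacher)
loss_labeled = total_loss(student, teacher, label=0)
```

Because the distillation term needs no annotation, every unlabeled example scored by the teacher contributes a training signal, which is what makes the approach attractive in low-resource settings.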
BERT 2 BERT
Note: performance is cited from the original paper
Can we do more?
LSTM/CNN: >100x, or CBOW: >1000x
Real use-case example
• Named Entity Recognition (NER) is a widely used information extraction task in many industrial applications and use cases
• Ramping up on a new domain can be difficult
§ Lots of unlabeled data, but little or no labeled data, often not enough to train a model with good performance
Solution A
• Hire a linguist or data scientist to tune/build a model
• Hire annotators to label more data, or buy a similar dataset
• Time/compute resource limitations
Solution B
• Pre-trained language models such as BERT, GPT, and ELMo are great in low-resource scenarios
• They require large compute and memory resources and suffer from high inference latency
• Deploying such models in production or on edge devices is a major issue
Named Entity Recognition (CoNLL-2003)
[Chart: accuracy (axis 65 to 95) vs. number of samples (150, 300, 750, 3000) for BERT, Distil LSTM, and Distil ID-CNN]
Model:            BERT   Distil LSTM   Distil ID-CNN
Compression rate: x1     x36           x36
• Train a small LSTM/CNN model using BERT
• Utilize unlabeled data via the teacher
• The student is competitive with the teacher
Peter et al. NeurIPS19
Text Classification
[Chart: accuracy (axis 78 to 94) on AG News (0.4K samples), Dair's Emotions (16K samples), IMDB (1K samples), and SST-2 (7K samples) for BERT, Distill LSTM, and Distill CBOW]
Model:            BERT   Distill LSTM   Distill CBOW
Compression rate: x1     x100           x1500
• Train a small CBOW model using BERT
• Utilize unlabeled data via the teacher
• The student is competitive with the teacher on specific datasets
Wasserblat, more details coming soon
Takeaways
• Compact models can perform as well as pre-trained LMs in low-resource scenarios, with superior inference speed and a high compression rate
• Practical tips:
• Set a simpler classifier as a baseline
• Fine-tune DistilBERT/BERT on your task
• Plenty of labeled data:
Go with DistilBERT or another compact pre-trained model
• Little labeled data:
Distill BERT into a simpler NN and compare it to BERT
• Data and training efficiency: models requiring less training data and/or fewer computational resources and/or less time
• Inference efficiency: models with lower computational complexity of prediction/inference
https://sites.google.com/view/sustainlp2020
AGENDA
● Efficiency
  ● Large model intro
  ● Inference efficiency: models with lower computational complexity
  ● Examples
  ● SustaiNLP Workshop 2020
● Data challenges
  ● Extensibility: address a new domain with limited data and minimal supervision
  ● Weakly-supervised ABSA example
NLP today
● A separate model is created for each individual task and domain
● This needs a large team of domain experts and a large amount of labeled data, and is very time-consuming
● Hard to scale and to adapt solutions across different domains
● No adaptation to the business environment
ABSA example and usage
Example: “the owner is super friendly and service is fast”
Annotation: aspect (ASP) terms “owner” and “service”; opinion terms “friendly” and “fast”
The advantages of the algorithm
Aspect-based SA: produces knowledge about specific aspects, which enables targeted business insight.
Unsupervised, domain adaptive: an unsupervised method that does not require costly manually tagged data for training.
Explainable AI: displaying the relation between opinion terms and aspects makes the results interpretable.
• ABSA was recommended among the Top 10 ML Code Examples on Azure and included by MSFT in their NLP Recipes
• Published at EMNLP19
• ABSA was used by the University of British Columbia and the British Columbia CDC to analyze COVID-19-related tweets in North America. See Jang et al., 2020.
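The kind of output ABSA produces can be shown with a deliberately simple toy. This is not the published algorithm: the lexicons, the "POS" polarity tags, and the rule of attaching each opinion term to the closest preceding aspect term are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple


def pair_opinions_with_aspects(
    sentence: str,
    aspect_lexicon: Set[str],
    opinion_lexicon: Dict[str, str],
) -> List[Tuple[str, str, str]]:
    """Return (aspect, opinion, polarity) triples for one sentence.

    Toy heuristic: an opinion term is attached to the most recent
    aspect term seen before it in the sentence.
    """
    pairs: List[Tuple[str, str, str]] = []
    current_aspect: Optional[str] = None
    for token in sentence.lower().split():
        if token in aspect_lexicon:
            current_aspect = token
        elif token in opinion_lexicon and current_aspect is not None:
            pairs.append((current_aspect, token, opinion_lexicon[token]))
    return pairs


aspects = {"owner", "service"}                 # hypothetical aspect lexicon
opinions = {"friendly": "POS", "fast": "POS"}  # hypothetical opinion lexicon
result = pair_opinions_with_aspects(
    "the owner is super friendly and service is fast", aspects, opinions
)
```

Each triple links an opinion term to the aspect it describes, which is the relation that makes the sentiment result explainable.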
Efficient Deep Learning in Natural Language Processing Production, with Moshe Wasserblat, Intel AI

More Related Content

Similar to Efficient Deep Learning in Natural Language Processing Production, with Moshe Wasserblat, Intel AI

USING FACTORY DESIGN PATTERNS IN MAP REDUCE DESIGN FOR BIG DATA ANALYTICS
USING FACTORY DESIGN PATTERNS IN MAP REDUCE DESIGN FOR BIG DATA ANALYTICSUSING FACTORY DESIGN PATTERNS IN MAP REDUCE DESIGN FOR BIG DATA ANALYTICS
USING FACTORY DESIGN PATTERNS IN MAP REDUCE DESIGN FOR BIG DATA ANALYTICS
HCL Technologies
 
ODSC East: Effective Transfer Learning for NLP
ODSC East: Effective Transfer Learning for NLPODSC East: Effective Transfer Learning for NLP
ODSC East: Effective Transfer Learning for NLP
indico data
 
BSSML16 L5. Summary Day 1 Sessions
BSSML16 L5. Summary Day 1 SessionsBSSML16 L5. Summary Day 1 Sessions
BSSML16 L5. Summary Day 1 Sessions
BigML, Inc
 
Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality Reduction
Saad Elbeleidy
 
Transfer Learning and Domain Adaptation - Ramon Morros - UPC Barcelona 2018
Transfer Learning and Domain Adaptation - Ramon Morros - UPC Barcelona 2018Transfer Learning and Domain Adaptation - Ramon Morros - UPC Barcelona 2018
Transfer Learning and Domain Adaptation - Ramon Morros - UPC Barcelona 2018
Universitat Politècnica de Catalunya
 
Enterprise deep learning lessons bodkin o reilly ai sf 2017
Enterprise deep learning lessons bodkin o reilly ai sf 2017Enterprise deep learning lessons bodkin o reilly ai sf 2017
Enterprise deep learning lessons bodkin o reilly ai sf 2017
Ron Bodkin
 
Challenges in Large Scale Machine Learning
Challenges in Large Scale  Machine LearningChallenges in Large Scale  Machine Learning
Challenges in Large Scale Machine Learning
Sudarsun Santhiappan
 
Unlocking the Power of Integer Programming
Unlocking the Power of Integer ProgrammingUnlocking the Power of Integer Programming
Unlocking the Power of Integer Programming
Florian Wilhelm
 
Winning data science competitions, presented by Owen Zhang
Winning data science competitions, presented by Owen ZhangWinning data science competitions, presented by Owen Zhang
Winning data science competitions, presented by Owen Zhang
Vivian S. Zhang
 
Domain Driven Design Introduction
Domain Driven Design IntroductionDomain Driven Design Introduction
Domain Driven Design Introduction
wojtek_s
 
Practical Artificial Intelligence: Deep Learning Beyond Cats and Cars
Practical Artificial Intelligence: Deep Learning Beyond Cats and CarsPractical Artificial Intelligence: Deep Learning Beyond Cats and Cars
Practical Artificial Intelligence: Deep Learning Beyond Cats and Cars
Alexey Rybakov
 
Refactoring, Therapeutic Attitude to Programming.
Refactoring, Therapeutic Attitude to Programming.Refactoring, Therapeutic Attitude to Programming.
Refactoring, Therapeutic Attitude to Programming.
Amin Shahnazari
 
Optimization Problems Solved by Different Platforms Say Optimum Tool Box (Mat...
Optimization Problems Solved by Different Platforms Say Optimum Tool Box (Mat...Optimization Problems Solved by Different Platforms Say Optimum Tool Box (Mat...
Optimization Problems Solved by Different Platforms Say Optimum Tool Box (Mat...
IRJET Journal
 
Winning Data Science Competitions (Owen Zhang) - 2014 Boston Data Festival
Winning Data Science Competitions (Owen Zhang)  - 2014 Boston Data FestivalWinning Data Science Competitions (Owen Zhang)  - 2014 Boston Data Festival
Winning Data Science Competitions (Owen Zhang) - 2014 Boston Data Festival
freshdatabos
 
Winning data science competitions
Winning data science competitionsWinning data science competitions
Winning data science competitions
Owen Zhang
 
How to Get the Most Out of LiDAR Data
How to Get the Most Out of LiDAR DataHow to Get the Most Out of LiDAR Data
How to Get the Most Out of LiDAR Data
Safe Software
 
VSSML17 L6. Time Series and Deepnets
VSSML17 L6. Time Series and DeepnetsVSSML17 L6. Time Series and Deepnets
VSSML17 L6. Time Series and Deepnets
BigML, Inc
 
“An Industry Standard Performance Benchmark Suite for Machine Learning,” a Pr...
“An Industry Standard Performance Benchmark Suite for Machine Learning,” a Pr...“An Industry Standard Performance Benchmark Suite for Machine Learning,” a Pr...
“An Industry Standard Performance Benchmark Suite for Machine Learning,” a Pr...
Edge AI and Vision Alliance
 
IBMHadoopofferingTechline-Systems2015
IBMHadoopofferingTechline-Systems2015IBMHadoopofferingTechline-Systems2015
IBMHadoopofferingTechline-Systems2015Daniela Zuppini
 
Software Design Principles and Best Practices - Satyajit Dey
Software Design Principles and Best Practices - Satyajit DeySoftware Design Principles and Best Practices - Satyajit Dey
Software Design Principles and Best Practices - Satyajit Dey
Cefalo
 

Similar to Efficient Deep Learning in Natural Language Processing Production, with Moshe Wasserblat, Intel AI (20)

USING FACTORY DESIGN PATTERNS IN MAP REDUCE DESIGN FOR BIG DATA ANALYTICS
USING FACTORY DESIGN PATTERNS IN MAP REDUCE DESIGN FOR BIG DATA ANALYTICSUSING FACTORY DESIGN PATTERNS IN MAP REDUCE DESIGN FOR BIG DATA ANALYTICS
USING FACTORY DESIGN PATTERNS IN MAP REDUCE DESIGN FOR BIG DATA ANALYTICS
 
ODSC East: Effective Transfer Learning for NLP
ODSC East: Effective Transfer Learning for NLPODSC East: Effective Transfer Learning for NLP
ODSC East: Effective Transfer Learning for NLP
 
BSSML16 L5. Summary Day 1 Sessions
BSSML16 L5. Summary Day 1 SessionsBSSML16 L5. Summary Day 1 Sessions
BSSML16 L5. Summary Day 1 Sessions
 
Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality Reduction
 
Transfer Learning and Domain Adaptation - Ramon Morros - UPC Barcelona 2018
Transfer Learning and Domain Adaptation - Ramon Morros - UPC Barcelona 2018Transfer Learning and Domain Adaptation - Ramon Morros - UPC Barcelona 2018
Transfer Learning and Domain Adaptation - Ramon Morros - UPC Barcelona 2018
 
Enterprise deep learning lessons bodkin o reilly ai sf 2017
Enterprise deep learning lessons bodkin o reilly ai sf 2017Enterprise deep learning lessons bodkin o reilly ai sf 2017
Enterprise deep learning lessons bodkin o reilly ai sf 2017
 
Challenges in Large Scale Machine Learning
Challenges in Large Scale  Machine LearningChallenges in Large Scale  Machine Learning
Challenges in Large Scale Machine Learning
 
Unlocking the Power of Integer Programming
Unlocking the Power of Integer ProgrammingUnlocking the Power of Integer Programming
Unlocking the Power of Integer Programming
 
Winning data science competitions, presented by Owen Zhang
Winning data science competitions, presented by Owen ZhangWinning data science competitions, presented by Owen Zhang
Winning data science competitions, presented by Owen Zhang
 
Domain Driven Design Introduction
Domain Driven Design IntroductionDomain Driven Design Introduction
Domain Driven Design Introduction
 
Practical Artificial Intelligence: Deep Learning Beyond Cats and Cars
Practical Artificial Intelligence: Deep Learning Beyond Cats and CarsPractical Artificial Intelligence: Deep Learning Beyond Cats and Cars
Practical Artificial Intelligence: Deep Learning Beyond Cats and Cars
 
Refactoring, Therapeutic Attitude to Programming.
Refactoring, Therapeutic Attitude to Programming.Refactoring, Therapeutic Attitude to Programming.
Refactoring, Therapeutic Attitude to Programming.
 
Optimization Problems Solved by Different Platforms Say Optimum Tool Box (Mat...
Optimization Problems Solved by Different Platforms Say Optimum Tool Box (Mat...Optimization Problems Solved by Different Platforms Say Optimum Tool Box (Mat...
Optimization Problems Solved by Different Platforms Say Optimum Tool Box (Mat...
 
Winning Data Science Competitions (Owen Zhang) - 2014 Boston Data Festival
Winning Data Science Competitions (Owen Zhang)  - 2014 Boston Data FestivalWinning Data Science Competitions (Owen Zhang)  - 2014 Boston Data Festival
Winning Data Science Competitions (Owen Zhang) - 2014 Boston Data Festival
 
Winning data science competitions
Winning data science competitionsWinning data science competitions
Winning data science competitions
 
How to Get the Most Out of LiDAR Data
How to Get the Most Out of LiDAR DataHow to Get the Most Out of LiDAR Data
How to Get the Most Out of LiDAR Data
 
VSSML17 L6. Time Series and Deepnets
VSSML17 L6. Time Series and DeepnetsVSSML17 L6. Time Series and Deepnets
VSSML17 L6. Time Series and Deepnets
 
“An Industry Standard Performance Benchmark Suite for Machine Learning,” a Pr...
“An Industry Standard Performance Benchmark Suite for Machine Learning,” a Pr...“An Industry Standard Performance Benchmark Suite for Machine Learning,” a Pr...
“An Industry Standard Performance Benchmark Suite for Machine Learning,” a Pr...
 
IBMHadoopofferingTechline-Systems2015
IBMHadoopofferingTechline-Systems2015IBMHadoopofferingTechline-Systems2015
IBMHadoopofferingTechline-Systems2015
 
Software Design Principles and Best Practices - Satyajit Dey
Software Design Principles and Best Practices - Satyajit DeySoftware Design Principles and Best Practices - Satyajit Dey
Software Design Principles and Best Practices - Satyajit Dey
 

More from Seth Grimes

Recent Advances in Natural Language Processing
Recent Advances in Natural Language ProcessingRecent Advances in Natural Language Processing
Recent Advances in Natural Language Processing
Seth Grimes
 
Creating an AI Startup: What You Need to Know
Creating an AI Startup: What You Need to KnowCreating an AI Startup: What You Need to Know
Creating an AI Startup: What You Need to Know
Seth Grimes
 
NLP 2020: What Works and What's Next
NLP 2020: What Works and What's NextNLP 2020: What Works and What's Next
NLP 2020: What Works and What's Next
Seth Grimes
 
From Customer Emotions to Actionable Insights, with Peter Dorrington
From Customer Emotions to Actionable Insights, with Peter DorringtonFrom Customer Emotions to Actionable Insights, with Peter Dorrington
From Customer Emotions to Actionable Insights, with Peter Dorrington
Seth Grimes
 
Intro to Deep Learning for Medical Image Analysis, with Dan Lee from Dentuit AI
Intro to Deep Learning for Medical Image Analysis, with Dan Lee from Dentuit AIIntro to Deep Learning for Medical Image Analysis, with Dan Lee from Dentuit AI
Intro to Deep Learning for Medical Image Analysis, with Dan Lee from Dentuit AI
Seth Grimes
 
Emotion AI
Emotion AIEmotion AI
Emotion AI
Seth Grimes
 
Text Analytics Market Trends
Text Analytics Market TrendsText Analytics Market Trends
Text Analytics Market Trends
Seth Grimes
 
Text Analytics for NLPers
Text Analytics for NLPersText Analytics for NLPers
Text Analytics for NLPers
Seth Grimes
 
Our FinTech Future – AI’s Opportunities and Challenges?
Our FinTech Future – AI’s Opportunities and Challenges? Our FinTech Future – AI’s Opportunities and Challenges?
Our FinTech Future – AI’s Opportunities and Challenges?
Seth Grimes
 
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...
Seth Grimes
 
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
Seth Grimes
 
Fairness in Machine Learning and AI
Fairness in Machine Learning and AIFairness in Machine Learning and AI
Fairness in Machine Learning and AI
Seth Grimes
 
Classification with Memes–Uber case study
Classification with Memes–Uber case studyClassification with Memes–Uber case study
Classification with Memes–Uber case study
Seth Grimes
 
Aspect Detection for Sentiment / Emotion Analysis
Aspect Detection for Sentiment / Emotion AnalysisAspect Detection for Sentiment / Emotion Analysis
Aspect Detection for Sentiment / Emotion Analysis
Seth Grimes
 
Content AI: From Potential to Practice
Content AI: From Potential to PracticeContent AI: From Potential to Practice
Content AI: From Potential to Practice
Seth Grimes
 
Text Analytics Market Insights: What's Working and What's Next
Text Analytics Market Insights: What's Working and What's NextText Analytics Market Insights: What's Working and What's Next
Text Analytics Market Insights: What's Working and What's Next
Seth Grimes
 
An Industry Perspective on Subjectivity, Sentiment, and Social
An Industry Perspective on Subjectivity, Sentiment, and SocialAn Industry Perspective on Subjectivity, Sentiment, and Social
An Industry Perspective on Subjectivity, Sentiment, and Social
Seth Grimes
 
The Insight Value of Social Sentiment
The Insight Value of Social SentimentThe Insight Value of Social Sentiment
The Insight Value of Social Sentiment
Seth Grimes
 
Text Analytics 2014: User Perspectives on Solutions and Providers
Text Analytics 2014: User Perspectives on Solutions and ProvidersText Analytics 2014: User Perspectives on Solutions and Providers
Text Analytics 2014: User Perspectives on Solutions and Providers
Seth Grimes
 
Text Analytics Today
Text Analytics TodayText Analytics Today
Text Analytics Today
Seth Grimes
 

More from Seth Grimes (20)

Recent Advances in Natural Language Processing
Recent Advances in Natural Language ProcessingRecent Advances in Natural Language Processing
Recent Advances in Natural Language Processing
 
Creating an AI Startup: What You Need to Know
Creating an AI Startup: What You Need to KnowCreating an AI Startup: What You Need to Know
Creating an AI Startup: What You Need to Know
 
NLP 2020: What Works and What's Next
NLP 2020: What Works and What's NextNLP 2020: What Works and What's Next
NLP 2020: What Works and What's Next
 
From Customer Emotions to Actionable Insights, with Peter Dorrington
From Customer Emotions to Actionable Insights, with Peter DorringtonFrom Customer Emotions to Actionable Insights, with Peter Dorrington
From Customer Emotions to Actionable Insights, with Peter Dorrington
 
Intro to Deep Learning for Medical Image Analysis, with Dan Lee from Dentuit AI
Intro to Deep Learning for Medical Image Analysis, with Dan Lee from Dentuit AIIntro to Deep Learning for Medical Image Analysis, with Dan Lee from Dentuit AI
Intro to Deep Learning for Medical Image Analysis, with Dan Lee from Dentuit AI
 
Emotion AI
Emotion AIEmotion AI
Emotion AI
 
Text Analytics Market Trends
Text Analytics Market TrendsText Analytics Market Trends
Text Analytics Market Trends
 
Text Analytics for NLPers
Text Analytics for NLPersText Analytics for NLPers
Text Analytics for NLPers
 
Our FinTech Future – AI’s Opportunities and Challenges?
Our FinTech Future – AI’s Opportunities and Challenges? Our FinTech Future – AI’s Opportunities and Challenges?
Our FinTech Future – AI’s Opportunities and Challenges?
 
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...
 
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
 
Fairness in Machine Learning and AI
Fairness in Machine Learning and AIFairness in Machine Learning and AI
Fairness in Machine Learning and AI
 
Classification with Memes–Uber case study
Classification with Memes–Uber case studyClassification with Memes–Uber case study
Classification with Memes–Uber case study
 
Aspect Detection for Sentiment / Emotion Analysis
Aspect Detection for Sentiment / Emotion AnalysisAspect Detection for Sentiment / Emotion Analysis
Aspect Detection for Sentiment / Emotion Analysis
 
Content AI: From Potential to Practice
Content AI: From Potential to PracticeContent AI: From Potential to Practice
Content AI: From Potential to Practice
 
Text Analytics Market Insights: What's Working and What's Next
Text Analytics Market Insights: What's Working and What's NextText Analytics Market Insights: What's Working and What's Next
Text Analytics Market Insights: What's Working and What's Next
 
An Industry Perspective on Subjectivity, Sentiment, and Social
An Industry Perspective on Subjectivity, Sentiment, and SocialAn Industry Perspective on Subjectivity, Sentiment, and Social
An Industry Perspective on Subjectivity, Sentiment, and Social
 
The Insight Value of Social Sentiment
The Insight Value of Social SentimentThe Insight Value of Social Sentiment
The Insight Value of Social Sentiment
 
Text Analytics 2014: User Perspectives on Solutions and Providers
Text Analytics 2014: User Perspectives on Solutions and ProvidersText Analytics 2014: User Perspectives on Solutions and Providers
Text Analytics 2014: User Perspectives on Solutions and Providers
 
Text Analytics Today
Text Analytics TodayText Analytics Today
Text Analytics Today
 

Recently uploaded

Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
javier ramirez
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
dwreak4tg
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
2023240532
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
AnirbanRoy608946
 

Recently uploaded (20)

Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
Efficient Deep Learning in Natural Language Processing Production, with Moshe Wasserblat, Intel AI

  • 1. Moshe Wasserblat Intel AI Lab NLP MeetUp, Aug. 2020
  • 2. BIO
    ● NICE Systems
    ● Led the Speech & Text Analytics research group
    ● First company to productize Speech2Text, ED, and Voice Biometrics in the call center
    ● Intel
    ● Innovate for our products
    ● Collaborate with top academics
    ● Explore compute features that disrupt our HW
  • 3. AGENDA
    ● Efficiency
    ● Large model intro
    ● Inference efficiency: models with lower computational complexity
    ● Examples
    ● SustaiNLP Workshop at EMNLP, Nov. 2020
    ● Data challenges
    ● Extensibility: address a new domain with limited data and minimal supervision
    ● Weakly-supervised ABSA example
  • 5. The advantages of BERT
    1. Efficient transfer learning: leverage a large model that was pre-trained on a generic task with a large amount of data, then adapt it to a specific task with a small amount of data. The result is high accuracy with less task data.
    2. Context embeddings: produces vectors that represent each word in the context of its sentence, e.g. "bank" in “river bank” vs. “investment bank”.
    [Diagram: input sentence → 12/24 stacked transformer encoder layers (110/330M parameters) → context embeddings → task-specific classifier → task output]
  • 6. Pre-trained LMs have become extremely large and deep
    [Chart: number of parameters (billions) per model, up to T5 at 11B. Source: HuggingFace]
  • 7. How should we put these monsters in production?
    • Heavy computation
    • Large memory footprint
    • Hard to train/fine-tune
    • Hard to deploy
  • 8. Year 2020: from accuracy to efficiency
    [Chart: number of parameters (millions, axis 0 to 120) for BERT, ALBERT, MobileBERT, DistilBERT, and TinyBERT]
  • 9. Vectors for optimization
    • Quantization of weights to int8 or another lower-precision representation
    • Pruning of individual weights and of whole structures (complete layers, self-attention heads)
    • Early prediction of samples using predictors attached to shallow layers
    • Sharing the weights of the self-attention and feed-forward (FF) modules across all model blocks
    • Training smaller models using distillation and other novel techniques
    • Replacing Transformer modules and searching for the best architecture using Neural Architecture Search
  • 10. Quantization
    • Quantization of BERT models to 16/8-bit weights
    • Up to 4x compression (8-bit weights relative to 32-bit floats), with minimal loss in accuracy
    Reference: We Scaled Bert To Serve 1+ Billion Daily Requests on CPUs
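To make the 8-bit case on the quantization slide concrete, here is a minimal sketch that assumes symmetric per-tensor quantization. The slide does not say which scheme was used, so this is only one common option. Storing each 32-bit float as one 8-bit integer plus a single shared scale is where a 4x size reduction comes from. The names `quantize_int8` and `dequantize` are illustrative and not taken from any library.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One shared scale for the whole tensor (assumption: per-tensor, symmetric).
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the int8 values."""
    return [q * scale for q in quantized]


# Toy weights, made up for illustration.
weights = [0.42, -1.27, 0.003, 0.9, -0.55]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The rounding error per weight is at most half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q, scale, max_error)
```

Small weights such as 0.003 collapse to zero here, which illustrates why the accuracy loss depends on how the weight values are distributed.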
  • 11. Pruning
    • It is possible, for some tasks, to prune up to 9 of the top layers from a 12-layer model without degrading performance by more than 3%.
    Reference: Poor Man's BERT: Smaller and Faster Transformer Models
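The top-layer pruning described on the pruning slide can be sketched as a simple truncation of the encoder stack. This is a toy illustration under stated assumptions: the layers are stand-in functions rather than real transformer blocks, and the helper names `drop_top_layers`, `run_encoder`, and `make_toy_layer` are hypothetical. A real pruned model would normally still be fine-tuned on the target task.

```python
def drop_top_layers(layers, num_to_drop):
    """Return the encoder stack with the top `num_to_drop` layers removed.

    `layers` is ordered from bottom (index 0, closest to the input) to top.
    """
    if not 0 <= num_to_drop < len(layers):
        raise ValueError("must keep at least one layer")
    return layers[: len(layers) - num_to_drop]


def run_encoder(layers, hidden):
    """Apply each remaining layer, in order, to the hidden state."""
    for layer in layers:
        hidden = layer(hidden)
    return hidden


def make_toy_layer(i):
    """Toy stand-in for a transformer layer: adds i to every value."""
    return lambda hidden: [h + i for h in hidden]


full_stack = [make_toy_layer(i) for i in range(12)]        # 12-layer model
pruned_stack = drop_top_layers(full_stack, num_to_drop=9)  # keep the bottom 3

print(len(full_stack), "->", len(pruned_stack), "layers")
print(run_encoder(pruned_stack, [0.0, 1.0]))
```

Because the bottom layers are kept unchanged, their pre-trained weights can be reused as-is and only the shortened stack needs to be evaluated at inference time.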
  • 13. Naïve approach (Thieves on Sesame Street, Krishna et al., ICLR 2020)
    [Diagram: a teacher with an FF classifier for fine-tuning uses labeled examples (annotated labels) and assigns pseudo labels to unlabeled examples; the student is trained with the task loss. Example sentences: “Mulan is highly recommended” and “The movie was good as the book”, both Sent: POS]
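The pseudo-labeling flow in the naïve approach can be sketched as follows. This is an illustration only: `toy_teacher` is a keyword rule standing in for a fine-tuned teacher model, `build_student_training_set` is a hypothetical helper, and the assignment of the slide's sentences to the labeled and unlabeled sets is an assumption made for the example.

```python
def build_student_training_set(labeled, unlabeled, teacher_predict):
    """Combine annotated examples with teacher-generated pseudo labels."""
    pseudo_labeled = [(text, teacher_predict(text)) for text in unlabeled]
    return list(labeled) + pseudo_labeled


def toy_teacher(text):
    """Stand-in teacher: a keyword rule instead of a fine-tuned model."""
    positive_words = {"good", "recommended", "great"}
    words = set(text.lower().split())
    return "POS" if positive_words & words else "NEG"


# Annotated labels come from humans; pseudo labels come from the teacher.
labeled = [("The movie was good as the book", "POS")]
unlabeled = ["Mulan is highly recommended", "The plot was dull"]

train_set = build_student_training_set(labeled, unlabeled, toy_teacher)
for text, label in train_set:
    print(label, "|", text)
```

The student would then be trained on `train_set` with an ordinary task loss, treating pseudo labels exactly like annotated ones.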
  • 14. Distillation*: mimic the teacher's output probabilities
    • Distill loss: MSE**
    • Total loss combines the task loss and the distill loss
    • Works surprisingly well
    • Great for low-resource tasks
    [Diagram: teacher with an FF classifier for fine-tuning, student, unlabeled examples]
    *Hinton et al. **Tang et al.
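The loss terms named on the distillation slide can be sketched in plain Python. The MSE is computed between student and teacher logits, and the task loss is a standard cross-entropy. The slide does not specify how the two terms are combined into the total loss, so the weight `alpha` below is an assumption, as are all function names.

```python
import math


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def task_loss(student_logits, label):
    """Cross-entropy of the student against an annotated label index."""
    return -math.log(softmax(student_logits)[label])


def distill_loss(student_logits, teacher_logits):
    """Mean squared error between student and teacher logits."""
    diffs = [(s - t) ** 2 for s, t in zip(student_logits, teacher_logits)]
    return sum(diffs) / len(diffs)


def total_loss(student_logits, teacher_logits, label=None, alpha=0.5):
    """Weighted sum of task and distill loss (alpha is an assumed weight).

    Unlabeled examples (label=None) contribute only the distill loss.
    """
    distill = distill_loss(student_logits, teacher_logits)
    if label is None:
        return distill
    return alpha * task_loss(student_logits, label) + (1 - alpha) * distill


# Toy logits for a two-class sentiment task, made up for illustration.
student = [1.0, 0.0]
teacher = [2.0, -1.0]
print(distill_loss(student, teacher))
print(total_loss(student, teacher, label=0))
print(total_loss(student, teacher))  # unlabeled example
```

Because the distill loss needs no annotated label, unlabeled examples can be used for training as long as the teacher can score them.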
  • 15. BERT 2 BERT. Note: performance is cited from the original paper
  • 16. Can we do more? LSTM/CNN: >100x, or CBOW: >1000x
  • 17. Real use-case example
    • Named Entity Recognition (NER) is a widely used information extraction task in many industrial applications and use cases
    • Ramping up on a new domain can be difficult
      • Lots of unlabeled data, but little or no labeled data, and what exists is often not good enough to train a model with good performance
    Solution A
      • Hire a linguist or data scientist to tune or build a model
      • Hire annotators to label more data, or buy a similar dataset
      • Time and compute resource limitations
    Solution B
      • Pre-trained language models such as BERT, GPT, and ELMo are great in low-resource scenarios
      • They require large compute and memory resources and suffer from high inference latency
      • Deploying such models in production or on edge devices is a major issue
  • 18. Named Entity Recognition (CoNLL-2003)
    [Chart: accuracy (axis 65 to 95) vs. #samples (150, 300, 750, 3000) for BERT, Distil LSTM, and Distil ID-CNN]
    Compression rate: BERT x1, Distil LSTM x36, Distil ID-CNN x36
    • Train a small LSTM/CNN model using BERT
    • Utilize unlabeled data via the teacher
    • Student is competitive with the teacher
    Peter et al. NeurIPS19
  • 19. Text Classification
    [Chart: accuracy (axis 78 to 94) for BERT, Distill LSTM, and Distill CBOW on AG News (0.4K samples), Dair's Emotions (16K samples), IMDB (1K samples), and SST-2 (7K samples)]
    Compression rate: BERT x1, Distill LSTM x100, Distill CBOW x1500
    • Train a small CBOW model using BERT
    • Utilize unlabeled data via the teacher
    • Student is competitive with the teacher on specific datasets
    Wasserblat, more details coming soon
  • 20. Takeaways
    • Compact models perform as well as pre-trained LMs in low-resource scenarios, with superior inference speed and a high compression rate
    • Practical tips:
      • Set a simpler classifier as the baseline
      • Fine-tune DistilBERT/BERT on your task
      • Plenty of labeled data: go with DistilBERT or another compact pre-trained model
      • Little labeled data: distill BERT into a simpler NN and compare it to BERT
  • 21. https://sites.google.com/view/sustainlp2020
    • Data and training efficiency: models requiring less training data and/or fewer computational resources and/or less time
    • Inference efficiency: models with lower computational complexity of prediction/inference
  • 22. AGENDA
    ● Efficiency
    ● Large model intro
    ● Inference efficiency: models with lower computational complexity
    ● Examples
    ● SustaiNLP Workshop 2020
    ● Data challenges
    ● Extensibility: address a new domain with limited data and minimal supervision
    ● Weakly-supervised ABSA example
  • 23. NLP today
    ● A model is created for each individual task and domain
    ● Needs a large team of domain experts and a large amount of labeled data, and is very time-consuming
    ● Hard to scale and adapt solutions across different domains
    ● No adaptation to the business environment
  • 24. ABSA example and usage
    [Diagram: the sentence “the owner is super friendly and service is fast” annotated with aspect (ASP) and opinion labels; the opinion terms are “friendly” and “fast”]
  • 25. The advantages of the algo.
    • Aspect-based SA: produces knowledge about specific aspects, which enables targeted business insight.
    • Unsupervised, domain adaptive: an unsupervised method that does not require costly manually tagged data for training.
    • Explainable AI: displaying the relation between opinion terms and aspects makes the results interpretable.
    • ABSA was recommended among the Top 10 ML Code Examples on Azure and included by MSFT in their NLP Recipes
    • Published at EMNLP19
    • ABSA was used by the University of British Columbia and the British Columbia CDC to analyze COVID-19 related tweets in North America. See Jang et al., 2020.