Moshe Wasserblat, Intel AI, presents on Efficient Deep Learning in Natural Language Processing Production to an online NLP meetup audience, August 3, 2020. Visit https://www.meetup.com/NY-NLP for the New York NLP meetup.
Roman Nikitchenko
Big Data solutions architect at V.I.Tech. A specialist with more than 20 years of experience in telecom and embedded systems who moved into Java Enterprise. Thanks to that background, he quickly became one of the leading Big Data architects in Ukraine.
When your clients need only a small database for a personal music library and some kind of HTTP interface to it, everything looks nice and you can use lots of bright frameworks and trusted approaches in your application. But what changes when you step beyond existing solutions to build things like population health management?
Let's talk about our Big Data experience and meaningful framework usage:
- What makes the difference when you go Big Data and Hadoop.
- Frameworks and big data: hamsters vs hipsters.
- Reality matters. Frameworks cost. How much?
- What framework is good for you?
- Making your own frameworks.
Production-Ready BIG ML Workflows - from zero to hero
Daniel Marcous
Data science isn't an easy task to pull off.
You start with exploring data and experimenting with models.
Finally, you find some amazing insight!
What now?
How do you transform a little experiment to a production ready workflow? Better yet, how do you scale it from a small sample in R/Python to TBs of production data?
Building a BIG ML Workflow - from zero to hero is about the work process you need to follow in order to get a production-ready workflow up and running.
Covering:
* Small - Medium experimentation (R)
* Big data implementation (Spark Mllib /+ pipeline)
* Setting Metrics and checks in place
* Ad hoc querying and exploring your results (Zeppelin)
* Pain points & Lessons learned the hard way (is there any other way?)
Optimize your AI / Deep Learning models and pipelines.
Cut cost on infrastructure, deployment time and inference time.
Pruning, Compression, Retraining, Loss Targets, Quantization, TensorRT, Tensorflow
10 Limitations of Large Language Models and Mitigation Options
Mihai Criveti
10 limitations of Large Language Models and ways to overcome them: dealing with hallucinations, performance, costs, stale training data, injecting private data, token limits and contextual memory, text conversion, lack of transparency, ethical concerns, and training costs.
USING FACTORY DESIGN PATTERNS IN MAP REDUCE DESIGN FOR BIG DATA ANALYTICS
HCL Technologies
Though insights from Big Data offer a breakthrough for making better business decisions, Big Data poses its own set of challenges. This paper addresses the Variety problem and suggests a way to handle data processing seamlessly even when the data type or processing algorithm changes. It explores various MapReduce design patterns and arrives at a unified working solution (library). The library can 'adapt' itself to any data processing need achievable with MapReduce, saving many man-hours and enforcing good practices in code.
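The "adapt by data type" idea the abstract describes might be sketched as a simple factory registry that picks a map function per data type, so new formats plug in without touching the driver. All names here are illustrative (not from the HCL library), and Python stands in for Java/MapReduce for brevity:

```python
# Hypothetical sketch of a mapper factory: the driver asks the factory for a
# mapper by data type; supporting a new format only means registering it.
from typing import Callable, Dict, Iterable, Tuple

# A "mapper" turns one input record into (key, value) pairs.
Mapper = Callable[[str], Iterable[Tuple[str, int]]]

def csv_mapper(record: str) -> Iterable[Tuple[str, int]]:
    # Emit (first column, field count) for each CSV row.
    fields = record.split(",")
    yield fields[0], len(fields)

def log_mapper(record: str) -> Iterable[Tuple[str, int]]:
    # Emit a count for each log-level token seen in the line.
    for token in record.split():
        if token in ("INFO", "WARN", "ERROR"):
            yield token, 1

class MapperFactory:
    _registry: Dict[str, Mapper] = {}

    @classmethod
    def register(cls, data_type: str, mapper: Mapper) -> None:
        cls._registry[data_type] = mapper

    @classmethod
    def create(cls, data_type: str) -> Mapper:
        return cls._registry[data_type]

MapperFactory.register("csv", csv_mapper)
MapperFactory.register("log", log_mapper)

mapper = MapperFactory.create("log")
pairs = list(mapper("2024-01-01 ERROR disk full"))  # -> [("ERROR", 1)]
```

The driver code never mentions a concrete mapper class, which is the "seamless handling of change" the paper is after.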
ODSC East: Effective Transfer Learning for NLP
indico data
Presented by indico co-founder Madison May at ODSC East.
Abstract: Transfer learning, the practice of applying knowledge gained on one machine learning task to aid the solution of a second task, has seen historic success in the field of computer vision. The output representations of generic image classification models trained on ImageNet have been leveraged to build models that detect the presence of custom objects in natural images. Image classification tasks that would typically require hundreds of thousands of images can be tackled with mere dozens of training examples per class thanks to the use of these pretrained representations. The field of natural language processing, however, has seen more limited gains from transfer learning, with most approaches limited to the use of pretrained word representations. In this talk, we explore parameter- and data-efficient mechanisms for transfer learning on text, and show practical improvements on real-world tasks. In addition, we demo the use of Enso, a newly open-sourced library designed to simplify benchmarking of transfer learning methods on a variety of target tasks. Enso provides tools for the fair comparison of varied feature representations and target task models as the amount of training data made available to the target model is incrementally increased.
https://telecombcn-dl.github.io/2018-dlai/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks, and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles of deep learning from both algorithmic and computational perspectives.
When data size grows in terms of sample count, feature count and model parameter count, things go crazy. The slideshow presents an overview of what to expect and how to handle them.
In this presentation, you will be introduced to the concept of Integer Programming and its application in conference scheduling. We will delve into the fundamentals of Integer Programming and its practical utilization in optimizing the allocation of talks to specific time slots and rooms within a conference program. By the conclusion of the talk, attendees will gain a clearer comprehension of the potential of this powerful tool in creating a conference schedule that is both efficient and effective, ultimately maximizing attendee satisfaction. Whether you are involved in conference organization or simply curious about optimization algorithms, this presentation is tailored to meet your interests.
Winning data science competitions, presented by Owen Zhang
Vivian S. Zhang
Meetup event hosted by NYC Open Data Meetup, NYC Data Science Academy. Speaker: Owen Zhang. Event info: http://www.meetup.com/NYC-Open-Data/events/219370251/
Practical Artificial Intelligence: Deep Learning Beyond Cats and Cars
Alexey Rybakov
Developing a Real-life DNN-based Embedded Vision Product
for Agriculture, Construction, Medical, or Retail.
What does it take to succeed in real-life development of a DNN-based embedded vision product? You have your hardware and software building blocks – what's next? Learn how to plan and design for deep learning, how to select and cascade algorithms, where to get the training data and how much is enough, and how to optimize and troubleshoot your product.
By now we know very well how to design and train a neural network to recognize cats, dogs, and cars. But what about real projects: agriculture, construction, medical, retail? This how-to talk provides an overview of what it takes to design, train, and fine-tune a real-life DNN-based embedded vision solution. The presentation explores algorithmic, dataset, training, and optimization decisions that take you from proof-of-concept to solid, reliable, and highly optimized systems. This material is based on our own successes, failures, and other lessons we learned while implementing embedded vision solutions over the past few years.
Alexey Rybakov is Senior Director with Luxoft, and manages software R&D, consulting and optimization services in artificial intelligence, deep learning, computer vision, and video processing.
Winning Data Science Competitions (Owen Zhang) - 2014 Boston Data Festival
freshdatabos
Owen Zhang is no stranger to data science competitions. He has competed in and won several high-profile challenges, and is currently ranked 1st out of a community of 200,000 data scientists on Kaggle. This is an opportunity to learn the tips, tricks, and techniques Owen employs in building world-class predictive analytic solutions.
Maximize the possibilities of your LiDAR data with FME. Through demos, you’ll learn how to extract the full value of point clouds by quickly processing and combining them with other data sources. We’ll also show you real-world examples using LiDAR for 3D city modelling & viewshed analysis, with specific takeaways that can be applied to your own data. Plus, find out how to integrate command-line programs like LAStools into your FME workflow.
Valencian Summer School in Machine Learning 2017 - Day 2
Lecture 6: Time Series and Deepnets. By Charles Parker (BigML).
https://bigml.com/events/valencian-summer-school-in-machine-learning-2017
For the full video of this presentation, please visit:
https://www.edge-ai-vision.com/2020/08/once-for-all-dnns-simplifying-design-of-efficient-models-for-diverse-hardware-a-presentation-from-mit/
For more information about edge AI and vision, please visit:
http://www.edge-ai-vision.com
Christine Cheng, co-chair of the inference benchmark working group at MLPerf and a senior machine learning optimization engineer at Intel, delivers the presentation “MLPerf: An Industry Standard Performance Benchmark Suite for Machine Learning” at the Edge AI and Vision Alliance’s July 2020 Edge AI and Vision Innovation Forum. Cheng explains how MLPerf’s inference benchmark suite for evaluating processor performance works and is evolving.
Creating an AI Startup: What You Need to Know
Seth Grimes
Seth Grimes presented "Creating an AI Startup: What You Need to Know," at a May 20, 2021 Launch Annapolis + Maryland AI (https://www.meetup.com/MarylandAI) program, focusing on opportunity and resources for Maryland tech entrepreneurs.
More Related Content
Similar to Efficient Deep Learning in Natural Language Processing Production, with Moshe Wasserblat, Intel AI
From Customer Emotions to Actionable Insights, with Peter Dorrington
Seth Grimes
From Customer Emotions to Actionable Insights -- A presentation by Peter Dorrington, founder, XMplify Consulting, at the 2020 CX Emotion conference (https://cx-emotion.com), July 22, 2020.
Intro to Deep Learning for Medical Image Analysis, with Dan Lee from Dentuit AI
Seth Grimes
Dan Lee from Dentuit AI presented an Intro to Deep Learning for Medical Image Analysis at the Maryland AI meetup (https://www.meetup.com/Maryland-AI), May 27, 2020. Visit https://www.youtube.com/watch?v=xl8i7CGDQi0 for video.
Emotion AI refers to a set of technologies -- natural language processing, voice tech, facial coding, neuroscience, and behavioral analytics -- applied to interactions to extract, convey, and induce emotion. Emotion AI is a presentation by Seth Grimes at AI for Human Language, March 5, 2020 in Tel Aviv.
Text Analytics for NLPers, a presentation by Seth Grimes, created for the December 2, 2019 Natural Language Processing-New York (NYC-NLP) meetup, https://www.meetup.com/NLP-NY/events/266093296/
Our FinTech Future – AI’s Opportunities and Challenges?
Seth Grimes
"Our FinTech Future – AI’s Opportunities and Challenges?" is a presentation by Jim Kyung-Soo Liew, Ph.D. to the Artificial Intelligence Maryland (MD-AI) meetup (https://www.meetup.com/Maryland-AI/), November 20, 2019. Dr. Liew is Co-Founder of SoKat.co and Associate Professor at Johns Hopkins Carey Business School.
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...
Seth Grimes
Presentation by Nathan Schneider, Assistant Professor of Linguistics and Computer Science at Georgetown University, to the Washington DC Natural Language Processing meetup, October 14, 2019 (https://www.meetup.com/DC-NLP/events/264894589/).
The Ins and Outs of Preposition Semantics: Challenges in Comprehensive Corpu...
Seth Grimes
Presentation by Nathan Schneider, Georgetown University, to the Washington DC Natural Language Processing meetup, October 14, 2019, https://www.meetup.com/DC-NLP/events/264894589/.
Nick Schmidt of BLDS, LLC to the Maryland AI meetup, June 4, 2019 (https://www.meetup.com/Maryland-AI). Nick discusses ideas of fairness and how they apply to machine learning. He explores recent academic work on identifying and mitigating bias, and how his work in lending and employment can be applied to other industries. Nick explains how to measure whether an algorithm is fair and also demonstrate the techniques that model builders can use to ameliorate bias when it is found.
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables ranks to be calculated in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by a large submission of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Found
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers present on related topics such as vector databases, LLMs, and managing data at scale. The intended audience includes machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly the Milvus Meetup, and is sponsored by Zilliz, maintainers of Milvus.
The Building Blocks of QuestDB, a Time Series Database
javier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever growing datasets while keeping performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone through over the past two years to deal with late and unordered data, non-blocking writes, read replicas, and faster batch ingestion.
Techniques to optimize the PageRank algorithm usually fall into two categories. One is to try reducing the work per iteration, and the other is to try reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, with the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance; the final ranks of chain nodes can then be computed directly. This could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order. This could help reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in PageRank computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
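One of the ideas above, skipping computation on already-converged vertices, can be sketched in a toy power-iteration PageRank. This is a simplified illustration, not the STICD implementation, and the thresholds are arbitrary:

```python
import numpy as np

def pagerank_skip_converged(adj, d=0.85, tol=1e-10, eps=1e-8, iters=100):
    """Toy PageRank that freezes vertices whose per-iteration rank change
    falls below eps, one of the work-reduction ideas described above."""
    n = adj.shape[0]
    out_deg = adj.sum(axis=1)
    r = np.full(n, 1.0 / n)
    active = np.ones(n, dtype=bool)              # vertices still being updated
    for _ in range(iters):
        contrib = np.where(out_deg > 0, r / np.maximum(out_deg, 1), 0.0)
        new_r = (1 - d) / n + d * (adj.T @ contrib)
        delta = np.abs(new_r - r)
        r[active] = new_r[active]                # only update active vertices
        active &= delta > eps                    # freeze converged vertices
        if delta.max() < tol:
            break
    return r / r.sum()

# A 3-node directed cycle is symmetric, so all ranks should be equal.
A = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]], dtype=float)
ranks = pagerank_skip_converged(A)
```

A production version would combine this with the other tricks (chain short-circuiting, component-wise topological processing), which this sketch omits.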
Quantitative Data Analysis: Reliability Analysis (Cronbach Alpha), Common Method...
2023240532
Quantitative data Analysis
Overview
Reliability Analysis (Cronbach Alpha)
Common Method Bias (Harman Single Factor Test)
Frequency Analysis (Demographic)
Descriptive Analysis
2. BIO
● NICE Systems
● Led the Speech & Text Analytics research group
● First company to productize Speech2Text, ED, and Voice Biometrics in the call center
● INTEL
● Innovate for our products
● Collaborate with top academics
● Explore compute features that disrupt our HW
3. AGENDA
● Efficiency
● Large model intro
● Inference efficiency: models with lower computational complexity
● Examples
● SustaiNLP Workshop at EMNLP, Nov. 2020
● Data challenges
● Extensibility: address new domains with limited data and minimal supervision
● Weakly-supervised ABSA example
5. The advantages of BERT
1. Efficient transfer learning
Leverage a large model that was pre-trained on a generic task with a large amount of data, then adapt it to a specific task using a small amount of data: high accuracy with a smaller amount of data.
2. Context embeddings
Produces vectors that represent each word in the context of a sentence, e.g. "bank" in "river bank" vs. "investment bank".
[Diagram: input sentence → 12/24 stacked transformer encoder layers (110/330M parameters) → context embeddings → task-specific classifier → task output]
6. Pre-trained LMs have become extremely large and deep
[Chart: parameter counts of pre-trained LMs in billions, up to T5 at 11B parameters. Source: HuggingFace]
7.
• Heavy computation
• Large memory footprint
• Hard to train/fine-tune
• Hard to deploy
How should we put these monsters in production?
9. Vectors for optimization
• Quantization of weights to int8 or other lower-precision representations
• Pruning of weights, both unstructured and structural (complete layers, self-attention heads)
• Early prediction of samples using predictors attached to shallow layers
• Sharing the weights of the self-attention and feed-forward modules across all model blocks
• Training smaller models using distillation and other novel techniques
• Replacing Transformer modules and searching for the best architecture using Neural Architecture Search
10. Quantization
• Quantization of BERT models to 16/8-bit weights: 4x compression, minimal loss in accuracy
• "We Scaled Bert To Serve 1+ Billion Daily Requests on CPUs"
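As a rough sketch of what int8 weight quantization does, here is a generic symmetric linear quantizer in NumPy. This is an illustration of the idea only, not the specific scheme used by Intel or in the article above:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric linear quantization of a float weight tensor to int8."""
    scale = np.abs(w).max() / 127.0                      # largest weight -> 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)     # fake weight matrix
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# int8 storage is 4x smaller than float32; the rounding error per weight is
# bounded by half the quantization step.
max_err = np.abs(w - w_hat).max()
```

Real deployments typically use per-channel scales and quantize activations too, which this sketch omits.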
11. Pruning
For some tasks, it is possible to prune up to 9 of the top layers of a 12-layer model without degrading performance by more than 3%.
Poor Man's BERT: Smaller and Faster Transformer Models
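The layer-dropping idea can be illustrated with a toy encoder represented as a list of layer functions. This is purely illustrative; with a real Hugging Face transformers model one would slice the model's encoder layer list instead:

```python
# Toy illustration of top-layer pruning: model a 12-layer encoder as a list of
# layer functions and keep only the bottom k, as in Poor Man's BERT.

def make_layer(i):
    # Stand-in for a transformer encoder layer; here each "layer" just
    # increments its input so we can see how many layers ran.
    return lambda x: x + 1

full_encoder = [make_layer(i) for i in range(12)]

def prune_top_layers(layers, keep):
    """Keep only the bottom `keep` layers, discarding the top ones."""
    return layers[:keep]

pruned = prune_top_layers(full_encoder, keep=3)   # 12 layers -> 3 layers

def forward(layers, x):
    for layer in layers:
        x = layer(x)
    return x
```

After pruning, the task head is fine-tuned on top of the remaining layers; the slide's point is that for some tasks the top layers contribute little task-specific signal.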
13. Naïve approach (Thieves on Sesame Street, Krishna et al., ICLR 2020)
[Diagram: a fine-tuned teacher (BERT with an FF classifier head for fine-tuning) assigns pseudo labels to unlabeled examples (e.g. "Mulan is highly recommended" → Sent: POS); the student is trained with the task loss on these pseudo labels together with the annotated labels of the labeled examples (e.g. "The movie was as good as the book" → Sent: POS).]
14. Distillation: mimic the teacher's output probabilities (Hinton et al.; Tang et al.)
[Diagram: the fine-tuned teacher (BERT with an FF classifier head) produces soft targets on unlabeled examples; the student is trained with a total loss that combines the task loss with a distillation loss, e.g. MSE on the logits (Tang et al.).]
• Surprisingly, it works well
• Great for low-resource tasks
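The total loss on this slide, a task loss on labeled data plus a distillation term matching the teacher's outputs (MSE on logits, in the style of Tang et al.), might be sketched as follows. The equal alpha weighting is an assumption; the slide does not specify how the two losses are combined:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, labels):
    # Supervised task loss on the labeled examples.
    p = softmax(logits)
    return -np.log(p[np.arange(len(labels)), labels]).mean()

def distill_mse(student_logits, teacher_logits):
    # Tang et al.-style distillation: MSE between student and teacher logits.
    return ((student_logits - teacher_logits) ** 2).mean()

def total_loss(student_logits, labels, teacher_logits, alpha=0.5):
    """Weighted sum of the task loss and the distillation loss."""
    return (alpha * cross_entropy(student_logits, labels)
            + (1 - alpha) * distill_mse(student_logits, teacher_logits))

rng = np.random.default_rng(0)
s = rng.standard_normal((4, 2))   # student logits: 4 examples, 2 classes
t = rng.standard_normal((4, 2))   # teacher logits on the same examples
y = np.array([0, 1, 1, 0])        # gold labels for the labeled subset
loss = total_loss(s, y, t)
```

In training, the distillation term is what lets the student exploit unlabeled data: the teacher's logits stand in for labels on examples that have none.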
16. Can we do more?
• LSTM/CNN: >100x
• Or CBOW: >1000x
17. Real use-case example
• Named Entity Recognition (NER) is a widely used information extraction task in many industrial applications and use cases
• Ramping up on a new domain can be difficult: lots of unlabeled data, little or no labeled data, and often not enough to train a model with good performance
Solution A
• Hire a linguist or data scientist to tune/build a model
• Hire annotators to label more data, or buy a similar dataset
• Time/compute resource limitations
Solution B
• Pre-trained language models such as BERT, GPT, and ELMo are great in low-resource scenarios
• They require large compute and memory resources and suffer from high latency at inference
• Deploying such models in production or on edge devices is a major issue
18. [Chart: accuracy vs. number of training samples (150 to 3000) for Named Entity Recognition (CoNLL-2003), comparing BERT, a distilled LSTM, and a distilled ID-CNN; accuracy ranges roughly from 65% to 95%]
Compression rate: BERT x1, Distil LSTM x36, Distil ID-CNN x36
• Train a small LSTM/CNN model using BERT
• Utilize unlabeled data via the teacher
• The student is competitive with the teacher
Peter et al., NeurIPS 2019
19. [Chart: text classification accuracy (roughly 78% to 94%) on Agnews (0.4K samples), Dair's Emotions (16K samples), IMDB (1K samples), and STS-2 (7K samples), comparing BERT, a distilled LSTM, and a distilled CBOW]
Compression rate: BERT x1, Distill LSTM x100, Distill CBOW x1500
• Train a small CBOW model using BERT
• Utilize unlabeled data via the teacher
• The student is competitive with the teacher on specific datasets
Wasserblat, more details coming soon
20. Takeaways
• Compact models perform as well as pre-trained LMs in low-resource scenarios, with superior inference speed and a high compression rate
• Practical tips:
• Set a simple classifier as the baseline
• Fine-tune DistilBERT/BERT on your task
• High resources for labeled data: go with DistilBERT or other compact pre-trained models
• Low resources for labeled data: distill BERT into a simpler NN and compare it to BERT
21. SustaiNLP 2020
• Data and training efficiency: models requiring less training data and/or less computational resources and/or time
• Inference efficiency: models with lower computational complexity of prediction/inference
https://sites.google.com/view/sustainlp2020
22. AGENDA
● Efficiency
● Large model intro
● Inference efficiency: models with lower computational complexity
● Examples
● SustaiNLP Workshop 2020
● Data challenges
● Extensibility: address new domains with limited data and minimal supervision
● Weakly-supervised ABSA example
23. The NLP today
● Create a model per individual task and domain
● Needs a large team of domain experts and a large amount of labeled data, and is very time consuming
● Hard to scale and adapt solutions across different domains
● No adaptation to the business environment
24. ABSA example and usage
[Diagram: "the owner is super friendly and service is fast"; "owner" and "service" are aspect (ASP) terms, "friendly" and "fast" are opinion terms]
25. The advantages of the algorithm
Aspect-Based SA: produces knowledge about specific aspects, enabling targeted business insight.
Unsupervised, Domain Adaptive: an unsupervised method that does not require costly manually tagged data for training.
Explainable AI: displaying the relation between opinion terms and aspects makes the results interpretable.
• ABSA recommended among the Top 10 ML Code Examples on Azure and included by MSFT in their NLP Recipes
• Published at EMNLP 2019
• ABSA used by the University of British Columbia and the British Columbia CDC to analyze COVID-19-related tweets in North America. See Jang et al., 2020.