About Me
● Senior Director of Engineering, Avalara
● Previously
○ Co-Founder @ Indix (acquired by Avalara in Feb 2019)
○ Tech Lead, GoCD @ Thoughtworks - an open-source CI/CD tool
● Scale By The Bay
○ 2016 - Data Pipelines Panelist
○ 2017 - Continuous Delivery For Machine Learning
● Based in Chennai, India
Talk Focus
● Working on NLP problems since early 2012
● Two domains now
○ Indix - E-Commerce
■ Evolution across 7 years
○ Avalara - Tax Compliance
■ Learnings from e-commerce
■ Newer techniques from last couple of years
● Share what we learnt
Embeddings
Word Embeddings
Word2Vec (CBOW / Skip-Gram), GloVe, FastText
Dense vector representations of words, learned from a large unlabelled corpus
● Why?
○ Machines do not understand text, need a numerical representation
● Part of feature engineering
● Learned Embeddings
○ Capture notion of similarity
○ Popularized by Word2Vec (Mikolov et al., 2013)
○ GloVe and FastText are alternative implementations
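The talk itself shows no code here; as a minimal sketch of how such embeddings are learned, using gensim (the toy product titles are invented, and the gensim 4.x API is assumed):

```python
from gensim.models import Word2Vec

# Toy corpus of tokenized product titles (invented examples).
corpus = [
    ["apple", "iphone", "64gb", "black"],
    ["apple", "cider", "vinegar", "500ml"],
    ["samsung", "galaxy", "128gb", "silver"],
    ["organic", "apple", "juice", "1l"],
]

# Skip-Gram (sg=1) with 100-dimensional vectors; CBOW would be sg=0.
model = Word2Vec(corpus, vector_size=100, window=5, min_count=1, sg=1)

vector = model.wv["apple"]             # dense representation, shape (100,)
print(model.wv.most_similar("apple"))  # nearest neighbours in vector space
```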
Embeddings - Useful Properties
● Word embeddings capture certain relations between words, e.g. king - man + woman ≈ queen
Source - https://www.tensorflow.org/tutorials/text/word_embeddings
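A quick way to reproduce such a relation (not from the talk) is with pre-trained GloVe vectors via gensim's downloader:

```python
import gensim.downloader as api

# Pre-trained 50-dimensional GloVe vectors (downloaded on first use).
glove = api.load("glove-wiki-gigaword-50")

# king - man + woman ≈ queen
print(glove.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```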
Taxes for Specific Types of Clothing - Tax Rules in New York
[Slide graphic: New York sales-tax rules for specific types of clothing]
Source - https://www.avalara.com/us/en/learn/whitepapers/the-trials-and-tribulations-of-sales-tax-in-the-united-states.html
Product Classification - Tax Domain
Each product in a transaction (e.g. “BB Lip Balm - 4pc”) is mapped to a Tax Code for US sales tax and an HS Code for customs, and each code determines the applicable tax rules.

Tax Code: Skin Care (SK0001)

Tax Code   US State   Rate (%)
SK0001     Alabama    3
SK0001     New York   8
SK0001     Texas      0
SK0001     Oklahoma   5

HS Code: Beauty / Makeup (33041000)

HS Code    Country of Origin   Customs Rate (%)
33041000   Canada              5
33041000   India               2
33041000   Singapore           7
33041000   Australia           5
Product Classification - Tax Domain
● Should be easy?
○ “Similar” to Product Classification in E-Commerce Domain
● However, there are challenges
○ Classification Taxonomy is different - 2000 vs 5000 leaf nodes
○ Lack of labelled data for training (small data or low data problem)
○ Vocabulary is different
■ E.g. abbreviations used in product transaction data
○ Data is noisy
● Decent baseline
○ Re-use the embedding layer from e-commerce and re-train the tax classifier on top (a sketch follows below)
● Can we do better?
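One way to implement that baseline, sketched in Keras; the vocabulary size, layer widths and file path are assumptions rather than Avalara's actual setup:

```python
import numpy as np
from tensorflow import keras

VOCAB_SIZE, EMBED_DIM, NUM_TAX_CODES = 50_000, 100, 2_000  # assumed sizes

# Embedding weights previously learned on the e-commerce corpus
# ("ecommerce_embeddings.npy" is a placeholder path).
ecommerce_embeddings = np.load("ecommerce_embeddings.npy")  # (VOCAB_SIZE, EMBED_DIM)

model = keras.Sequential([
    # Re-used e-commerce embedding layer; trainable=False keeps it frozen.
    keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM,
                           weights=[ecommerce_embeddings], trainable=False),
    keras.layers.GlobalAveragePooling1D(),
    keras.layers.Dense(256, activation="relu"),
    # Re-trained classification head over the ~2,000 tax-code leaf nodes.
    keras.layers.Dense(NUM_TAX_CODES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```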
Yes, we can
● Transfer Learning
● Weak Supervision
● Data Augmentation & Synthesis
Transfer Learning - History
● Computer Vision
○ 2012 - ImageNet competition
○ AlexNet cut the runner-up's error rate by roughly 41% (relative); fine-tuning its pre-trained weights soon became the standard transfer-learning recipe in vision
● 2018 - Watershed moment in NLP
○ Transfer learning using pre-trained language models
○ ELMo, ULMFiT, BERT, GPT, GPT-2
“It only seems to be a question of time until pretrained word embeddings will be dethroned and replaced by pretrained language models in the toolbox of every NLP practitioner.”
- Sebastian Ruder (Researcher, DeepMind) - https://thegradient.pub/nlp-imagenet/
Transfer Learning - Using ULMFiT
Step 1 - Pre-Training on the Source Dataset
1. Dataset (Huge) - e-commerce product data
2. Objective function - predict the next word
Architecture: AWD-LSTM - an embedding layer, hidden layers (3 stacked LSTMs) and a softmax layer
Output: a pre-trained language model
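A rough sketch of this step using fastai v1, the library that implements ULMFiT; the file layout, column names and hyperparameters are assumptions:

```python
from fastai.text import TextLMDataBunch, language_model_learner, AWD_LSTM

# Huge unlabelled e-commerce corpus ("products.csv" is a placeholder).
data_lm = TextLMDataBunch.from_csv("data/", "products.csv", text_cols="title")

# AWD-LSTM language model; pretrained=False since we pre-train from scratch.
learn = language_model_learner(data_lm, AWD_LSTM, pretrained=False, drop_mult=0.3)
learn.fit_one_cycle(10, 1e-2)            # objective: predict the next word
learn.save_encoder("ecommerce_encoder")  # reusable pre-trained encoder
```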
Transfer Learning - Using ULMFiT
Step 2 - Fine-Tuning on the Domain Dataset
1. Dataset (Large) - tax-domain product data (unlabelled)
2. Objective function - predict the next word
Starting from the pre-trained language model, all layers (embedding matrix, hidden-layer weight matrices) are fine-tuned.
Transfer Learning - Using ULMFiT
Step 2 - Fine-Tuning Tricks - Freezing + Gradual Unfreezing
For the first few epochs of training, freeze the LSTM weights to prevent catastrophic forgetting of what was learned from the source dataset; then unfreeze layer groups gradually.
Transfer Learning - Using ULMFiT
Step 2 - Fine-Tuning Tricks - Slanted Triangular Learning Rates
A short, steep increase in the learning rate to quickly converge to a suitable region of the parameter space, followed by a long decay period to precisely fine-tune the weights (see the sketch below).
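Continuing the fastai sketch, both tricks map onto fastai's freeze/unfreeze and one-cycle APIs (fit_one_cycle closely resembles slanted triangular learning rates; vocabulary alignment between the two corpora is assumed):

```python
from fastai.text import TextLMDataBunch, language_model_learner, AWD_LSTM

# Unlabelled tax-domain product data (placeholder file and column names).
data_lm = TextLMDataBunch.from_csv("data/", "tax_products.csv",
                                   text_cols="description")

learn = language_model_learner(data_lm, AWD_LSTM, pretrained=False, drop_mult=0.3)
learn.load_encoder("ecommerce_encoder")  # start from the Step 1 encoder

# Freezing + gradual unfreezing to avoid catastrophic forgetting.
learn.freeze()                # train only the last layer group first
learn.fit_one_cycle(1, 1e-2)  # one-cycle ≈ slanted triangular learning rates
learn.freeze_to(-2)           # unfreeze one more layer group
learn.fit_one_cycle(1, 5e-3)
learn.unfreeze()              # finally fine-tune all layers
learn.fit_one_cycle(2, 1e-3)
learn.save_encoder("tax_encoder")
```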
Transfer Learning - Using ULMFiT
Step 3 - Domain Task Classifier
1. Dataset (Small) - labelled tax-domain data
2. Objective function - cross entropy
A classifier head (ReLU block + softmax) is stacked on top of the fine-tuned language model.
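Continuing the same sketch, the classifier stage might look like this (file and column names remain placeholders):

```python
from fastai.text import TextClasDataBunch, text_classifier_learner, AWD_LSTM

# Small labelled dataset: product descriptions with tax-code labels;
# the LM vocab from the Step 2 sketch is shared with the classifier.
data_clas = TextClasDataBunch.from_csv(
    "data/", "tax_labelled.csv",
    text_cols="description", label_cols="tax_code", vocab=data_lm.vocab)

learn = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5)
learn.load_encoder("tax_encoder")  # fine-tuned language model from Step 2

# The classifier head (linear -> ReLU -> linear -> softmax) is trained with
# cross-entropy; gradual unfreezing applies here as well.
learn.fit_one_cycle(1, 2e-2)
learn.freeze_to(-2)
learn.fit_one_cycle(1, 1e-2)
learn.unfreeze()
learn.fit_one_cycle(2, 1e-3)
```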
Tax Domain - Tax Rate Changes
Extracting structured fields from a rate-change notice:

Key              Value
Effective Date   May 1, 2019
Jurisdiction     Georgetown County
New Rate         7%
Tax Type         General

Techniques: Semantic Role Labeling (slot filling), coreference resolution, and ELMo without fine-tuning.
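The talk shows no code for this step; purely as an illustration, AllenNLP ships pre-trained SRL predictors whose output can be mapped onto these slots (the model path below is a placeholder, not Avalara's system):

```python
from allennlp.predictors.predictor import Predictor

# Placeholder path: substitute a pre-trained AllenNLP SRL model archive.
srl = Predictor.from_path("<path-or-url-to-srl-model.tar.gz>")

notice = ("Effective May 1, 2019, the general sales tax rate in "
          "Georgetown County increases to 7%.")
result = srl.predict(sentence=notice)

# Each verb yields a frame whose tagged arguments (ARG0, ARG1, ARGM-TMP, ...)
# can be mapped onto slots such as Effective Date, Jurisdiction and New Rate.
for frame in result["verbs"]:
    print(frame["description"])
```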
Transfer Learning - Why does it work?
● Many NLP tasks share common knowledge about language
○ Linguistic Representations, Structural Similarities
● Tasks can inform each other
○ Syntax, Semantics
● Labeled data is rare but unlabeled data is abundant in every domain
Pre-Trained Language Models vs Word Embeddings
● Entire network vs a single layer
● Pre-trained language models are able to capture context
○ Example - the word “Apple” in:
■ Apple iPhone 64 GB Black
■ Apple Cider Vinegar
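The models named in the talk are ELMo-era, but the effect is easy to demonstrate with any pre-trained contextual model, e.g. BERT via Hugging Face transformers (illustrative sketch only):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

def word_vector(sentence: str, word: str) -> torch.Tensor:
    """Contextual embedding of `word` (assumed to be a single BERT token)."""
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = bert(**enc).last_hidden_state[0]  # (seq_len, 768)
    idx = enc.input_ids[0].tolist().index(tok.convert_tokens_to_ids(word))
    return hidden[idx]

a = word_vector("apple iphone 64 gb black", "apple")
b = word_vector("apple cider vinegar", "apple")
# A static embedding gives one vector per word; contextual vectors differ.
print(torch.cosine_similarity(a, b, dim=0).item())
```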
Weak Supervision
● AKA Data Programming
○ Libraries - Snorkel, Snuba
● Steps
○ Create “domain heuristics” or “labeling functions”, guided by a small labelled dataset
■ Snorkel needs humans to write these; Snuba can generate them automatically
○ Learn a generative model that denoises these heuristics and can emit probabilistic labels
○ Run this model on the entire unlabelled dataset to get probabilistic labels
○ You now have a “good enough” large training dataset (see the sketch below)
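A minimal Snorkel sketch of this workflow; the labeling functions, keywords and labels below are invented for illustration:

```python
import pandas as pd
from snorkel.labeling import labeling_function, PandasLFApplier
from snorkel.labeling.model import LabelModel

SKINCARE, OTHER, ABSTAIN = 1, 0, -1  # hypothetical tax-code labels

# Labeling functions encode domain heuristics; these keywords are invented.
@labeling_function()
def lf_keyword_lip(x):
    return SKINCARE if "lip balm" in x.title.lower() else ABSTAIN

@labeling_function()
def lf_keyword_moisturizer(x):
    return SKINCARE if "moisturizer" in x.title.lower() else ABSTAIN

@labeling_function()
def lf_keyword_cable(x):
    return OTHER if "cable" in x.title.lower() else ABSTAIN

df = pd.DataFrame({"title": ["BB Lip Balm - 4pc", "HDMI Cable 2m",
                             "Night Moisturizer"]})
L = PandasLFApplier([lf_keyword_lip, lf_keyword_moisturizer,
                     lf_keyword_cable]).apply(df)

# The generative label model denoises the overlapping, conflicting heuristics
# and emits probabilistic labels for the whole unlabelled dataset.
label_model = LabelModel(cardinality=2)
label_model.fit(L)
probs = label_model.predict_proba(L)
```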
Data Augmentation & Synthesis
● Back Translation
○ Source language -> pivot language -> source language (e.g. English -> French -> English)
● Synonym Replacement
● Make sure the augmentations preserve the semantic structure and the meaning of the original (a sketch follows below)
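Back translation needs a translation system, so here is just a minimal synonym-replacement sketch using WordNet via NLTK; it is naive and does not filter replacements that change the meaning, which is exactly why the caveat above matters:

```python
import random
from nltk.corpus import wordnet  # requires a one-time nltk.download("wordnet")

def synonym_replace(sentence: str, n: int = 1) -> str:
    """Replace up to n words with a random WordNet synonym."""
    words = sentence.split()
    candidates = list(range(len(words)))
    random.shuffle(candidates)
    replaced = 0
    for i in candidates:
        synonyms = {lemma.name().replace("_", " ")
                    for synset in wordnet.synsets(words[i])
                    for lemma in synset.lemmas()} - {words[i]}
        if synonyms:
            words[i] = random.choice(sorted(synonyms))
            replaced += 1
        if replaced >= n:
            break
    return " ".join(words)

print(synonym_replace("organic apple juice"))
```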
NLP Stack - Tax Domain @ Avalara
● Use Cases: Product Classification | Rates/Rules Extraction | <Future Use Case>
● Technology
○ Algorithms: Classification | SRL | CoReference Resolution
○ Featurization / NLU: Embeddings (Character/Word) | Pre-Trained Language Models (using Transfer Learning)
○ Pre-Processing: Tokenizers | Lemmatizers | POS Tagging | Language Detection
○ Extraction: Parsers (HTML) | Parsers (PDF) | OCR | <TBD>
○ Data: Training Data (Labeled) | Raw Data (UnLabeled)
Conclusion
● NLP Pipelines need to be domain-specific
○ Libraries, Infrastructure and Techniques can be re-used across domains
○ Having good-quality, labelled, domain-specific data is of utmost importance
● Domain-Specific Data
○ Large labelled data
■ Most techniques will work out of the box
○ Use unlabelled data from your domain to your advantage
○ Small/low labelled data
■ Transfer Learning using pre-trained language models gives you a strong baseline
■ Techniques like Weak Supervision and Data Augmentation will help too