ADAM (for 5th Elephant)
ATTRIBUTE DETECTION AND ANNOTATION MODULE (pitched this name because we have an “EVE” building up too)
What is ADAM?
ADAM is an advanced Named Entity Recognition module for domain-specific data, designed for situations where no labelled data is available.
It leverages concepts from weak labelling, deep sequence models and active learning.
WHAT IS NAMED ENTITY RECOGNITION?
Sundar Pichai is the CEO of Google.
Person: Sundar Pichai
Position: CEO
Organisation: Google
WHAT CAN NER BE IN DOMAIN-SPECIFIC DATA?
Hospital Records…
“You will need to have your uterine bleeding evaluated. This continued
agitation may be caused by intra-parenchymal hemorrhage.”
Symptoms: uterine bleeding, continued agitation
Diagnosis: intra-parenchymal hemorrhage
WHAT CAN NER BE IN DOMAIN-SPECIFIC DATA?
Product Description…
“Pedigree is a complete and balanced food for dogs. Pedigree is rich in proteins and nutrition.”
Brand: Pedigree
Type: food for dogs
Nutrients: proteins
What will this session cover?
• Motivation for this problem.
• Why off-the-shelf solutions didn’t work for us.
• An approach to entity extraction from product-title-like text in the absence of labelled data.
• Comparison with other models
• And takeaways
PROBLEM STATEMENT
Product Titles often have references to attributes which play a crucial role in
driving the use-cases we will describe later.
Therefore, there is a need to extract these attributes from the product titles.
Hence ADAM!!!
What will ADAM Do?
resolute black vodka 180ml
resolute/Bb black/Ib vodka/Bc 180ml
(Tagged product title)
A.D.A.M.
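To make the tagged title concrete, here is a minimal Python sketch that groups tagged tokens into attribute spans. It assumes, since the slides do not spell this out, that the first tag letter is a B/I position marker (begin/inside) and the second letter names the attribute, with b for brand and c for category. Untagged tokens carry None. The function name and the tag-to-attribute mapping are hypothetical.

```python
from typing import List, Optional, Tuple

# Assumed reading of the tag letters; the slides do not define them.
ATTRIBUTE_NAMES = {"b": "brand", "c": "category"}


def group_attributes(tagged: List[Tuple[str, Optional[str]]]) -> List[Tuple[str, str]]:
    """Merge B*/I* tagged tokens into (attribute, text) spans."""
    spans: List[Tuple[str, List[str]]] = []
    span_open = False  # True while the previous token was tagged
    for token, tag in tagged:
        if tag is None:
            span_open = False  # an untagged token closes any open span
            continue
        position, kind = tag[0], tag[1]
        name = ATTRIBUTE_NAMES.get(kind, kind)
        if position == "I" and span_open and spans[-1][0] == name:
            spans[-1][1].append(token)  # continue the open span
        else:
            spans.append((name, [token]))  # start a new span
        span_open = True
    return [(name, " ".join(tokens)) for name, tokens in spans]


title = [("resolute", "Bb"), ("black", "Ib"), ("vodka", "Bc"), ("180ml", None)]
print(group_attributes(title))
```

Under these assumptions the example title yields a brand span "resolute black" and a category span "vodka".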
MOTIVATION
PRODUCT CATALOGUE
• Universal Product Catalogue
with the ability of enhanced
semantic search
AGGREGATION AND MARKET ANALYSIS
• Lift Insight
• Demand Insight
• Market Penetration
• Predictive Customer Analytics
EVOLVING KNOWLEDGE GRAPH
• A graph with entities (individual products or their attributes) as nodes, and edges describing the relationships between them.
• Given a seed graph, the idea is to evolve it from the data-set.
Challenges
• Domain specific data (Product Titles in our case)
• Zero ground truth and no training data available whatsoever.
• Multiple sources of data, and therefore very high variance:
• KESHKANTI HAIR CLEASNER SILK & SHINE 8 ML [1*960 PC] …….. A well described
stock-item
• N Deo Whitening Talc Touch100gm ……….. A not so well described stock-item
• B. M. W. 2L ………… Could have been a good stock-item desc.
• Short and extremely noisy representations:
• A H F SHAMPOO400mlM220 ………. Very hard for a machine to understand
• HW Sandrop Hert 5 Ltr.
Some related previous work…
• Traditional Algorithms:
• Information extraction using a CRF trained on hand-crafted features.
(Citation: Ajinkya More, Attribute Extraction from Product Titles in eCommerce)
• Information extraction using weak labelling with the help of a knowledge base, on Twitter data.
(Citation: Alan Ritter, Sam Clark, Mausam and Oren Etzioni, Named Entity Recognition in Tweets: An Experimental Study)
• Deep Learning Models:
• Lample et al., Neural Architectures for Named Entity Recognition
• Xuezhe Ma and Eduard Hovy, End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF
Some related previous work…
• Off-the-shelf Tools:
• NLTK
• Google’s train and play deep learning models
• CRF-suite
Why didn’t they work?
• These approaches leveraged either a large knowledge base or a large amount of labelled data (neither of which is generally available for industry applications).
• These models were trained on rather clean data-sets with much smaller variance than what we needed to deal with.
• Some of these approaches used hand-crafted features, which couldn’t scale to a data-set provided by millions of sources.
• Most pretrained tools, like NLTK, are trained on natural-language data-sets, and product titles are simply not natural language.
• The attributes that they can predict are not relevant to us.
Limitations of off-the-shelf tools
• They were trained on natural languages (i.e. languages with proper grammatical structure), so they work only on that type of sentence.
Barack/NNP Obama/NNP is/VBZ the/DT next/JJ president/NN
Person -> Barack Obama, Org. -> president
Nestle/NNP Maggi/NNP Noodles/NNP 100gm/CD
Person -> Nestle, Person -> Maggi Noodles
Limitations of hand-crafted features
• Worked well on the Walmart data-set.
• Was a disaster when tried on our data-set (because of its high variance).
CRF Results on Our Data-set
CRF trained on hand-crafted features, using labelled data crawled from various websites
• shri hanuman blended mustard oil 15kg jar
• tata tea elaichi 250gm
• relive fruity jelly war jar 50gm
It did well on popular products:
• maggi tomato ketchup 200g
• veet hair removal cream 100g
Why ADAM is better.
• Leverages weak labelled data.
• Leverages active learning.
• Immune to noise.
• Does not require any hand-crafted features as input.
A DEEPER LOOK
AT ADAM
PROBLEMS FACED AND HOW WE SOLVED THEM
ARCHITECTURE
The 3 Main Components of ADAM…
• Weak Label Generation
• State of the Art Sequence Tagging Model
• Active Learning approach
Weak Label Generation
• How we used an existing knowledge graph.
• How we improved on it using information from other sources, like the Amazon catalogue.
• In addition, how we leveraged the structure of the data-set (Stock-item and Stock-group).
Knowledge Graph Examples
• Fig. 1: Our knowledge graph
• Fig. 2: A unit of the Amazon data-set
EXAMPLE OF OUR DATA SET…
• Columns: Id, Seg., Stock-Group, Stock-item
Weak Labelling Algorithm
• We used a complicated rule-based string-matching algorithm, which annotates the tokens present in a stock-item using our knowledge base.
• Then, once again using a bunch of rules, we apply some constraints to pick the good-quality annotated stock-items.
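The two steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual rule set: the toy knowledge base, the greedy longest-match strategy, the B/I tag scheme (b for brand, c for category) and the quality rule (keep a title only if it has both a brand and a category) are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Toy knowledge base (hypothetical entries): phrase -> attribute letter.
KNOWLEDGE_BASE: Dict[Tuple[str, ...], str] = {
    ("tata", "tea"): "b",
    ("maggi",): "b",
    ("ketchup",): "c",
    ("tea",): "c",
}
MAX_PHRASE_LEN = max(len(phrase) for phrase in KNOWLEDGE_BASE)

Tagged = List[Tuple[str, Optional[str]]]


def weak_label(title: str) -> Tagged:
    """Annotate tokens by greedy longest-match lookup in the knowledge base."""
    tokens = title.lower().split()
    tags: List[Optional[str]] = [None] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate phrase first.
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 0, -1):
            kind = KNOWLEDGE_BASE.get(tuple(tokens[i:i + n]))
            if kind is not None:
                tags[i] = "B" + kind
                for j in range(i + 1, i + n):
                    tags[j] = "I" + kind
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return list(zip(tokens, tags))


def passes_quality_check(tagged: Tagged) -> bool:
    """Assumed rule: keep a title only if it has a brand and a category."""
    kinds = {tag[1] for _, tag in tagged if tag is not None}
    return {"b", "c"} <= kinds


for title in ["Maggi tomato ketchup 200g", "tata tea elaichi 250gm"]:
    labelled = weak_label(title)
    print(labelled, passes_quality_check(labelled))
```

In this toy run the second title is rejected: the greedy match consumes "tea" as part of the brand phrase, so no category is found. A filter like this trades away most of the data for cleaner labels.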
Label Generation
(Columns: STOCK_GROUP, STOCK_ITEM, Quality)
STOCK_GROUP: PICKWICK
STOCK_ITEM: PICKWICK/Bb WAFOBIX/Bb PINEAPPLE/Bc
STOCK_GROUP: Wood
STOCK_ITEM: Wood/Bc T-shirt/Bc Man/Bc 1000/Bb
Seed Data-set Count
• We passed around 0.7 million stock-items through this process.
• Of those, only about 8 thousand (roughly 1%) passed the quality check.
• These became our seed data-set.
SEQUENTIAL MODEL ARCHITECTURE
WORD EMBEDDING LAYER
• Why couldn’t we use an existing embedding space or a pretrained set of vectors for this?
• How did we create our own? What data did we use?
• Used “Skip-Gram” with “hierarchical softmax” optimization (Mikolov et al.)
SKIP GRAM ALGORITHM
Data-set we used:
• Stock-items provided by Tally’s product
• Product titles from Amazon’s catalogue and GS1 data
• Product titles crawled from various websites
• Around 13 million titles were used in total
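As a rough illustration of what Skip-Gram trains on, the sketch below generates (centre, context) pairs from one product title. It shows only the pair generation, not the hierarchical softmax objective or the actual training setup; the function name and the window size are assumptions made for the example.

```python
from typing import Iterator, List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> Iterator[Tuple[str, str]]:
    """Yield (centre, context) pairs for every token within the window."""
    for i, centre in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:
                yield centre, tokens[j]


pairs = list(skipgram_pairs("tata tea elaichi 250gm".split(), window=1))
for centre, context in pairs:
    print(centre, "->", context)
```

The model learns vectors such that each centre token predicts its context tokens, which is why tokens that appear in similar titles end up close together. In practice a library would do the training; gensim's `Word2Vec`, for example, accepts `sg=1` and `hs=1` to select Skip-Gram with hierarchical softmax.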
BI-LSTM LAYER
Sequential Training Model
• A word-level encoder which leverages sequential information to encode the tokens of the sentence.
• Why is another word-level encoder used? What more information does it encode?
CONDITIONAL RANDOM FIELDS
• Like the other layers, the CRF uses context from neighbouring tokens, but it also uses context from neighbouring labels.
• The Bi-LSTM only leverages input context.
• The CRF is the only layer that leverages output label context.
THE TRANSITION MATRIX
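A small sketch of how a transition matrix changes the decoded labels: Viterbi decoding picks the label sequence with the best total of emission scores (per token) and transition scores (per label pair). All numbers and the label set (Bb, Ib, O) are made up for illustration; they are not learned values from ADAM.

```python
from typing import Dict, List


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the highest-scoring label path (emission + transition scores)."""
    labels = list(emissions[0])
    best = [dict(emissions[0])]          # best[t][y]: best path score ending in y
    back: List[Dict[str, str]] = []      # back-pointers for each step t >= 1
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: best[-1][p] + transitions[p][y])
            scores[y] = best[-1][prev] + transitions[prev][y] + emissions[t][y]
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical scores for a two-token title such as "kesh kanti".
emissions = [
    {"Bb": 1.0, "Ib": 0.0, "O": 1.2},
    {"Bb": 0.0, "Ib": 2.0, "O": 0.5},
]
# transitions[previous][current]; O -> Ib is an invalid move, so it is penalised.
transitions = {
    "Bb": {"Bb": 0.0, "Ib": 1.0, "O": 0.0},
    "Ib": {"Bb": 0.0, "Ib": 0.0, "O": 0.0},
    "O": {"Bb": 0.0, "Ib": -10.0, "O": 0.0},
}

greedy = [max(scores, key=scores.get) for scores in emissions]
print("per-token argmax:", greedy)
print("with transitions:", viterbi(emissions, transitions))
```

Picking the best label per token independently gives the inconsistent sequence O, Ib, while decoding with the transition scores gives Bb, Ib. This is the output label context that only the CRF layer sees.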
SOME RESULTS WITH BI-LSTM AS FINAL
LAYER
• brit cow ghee 1ltr
• patanjali kesh kanti natural 200ml
• dabur chawanprash 500gm mrp
• smirnoff green apple triple distilled vodka 750ml
• eveready torch
IMPROVED RESULTS BECAUSE OF CRF
• brit cow ghee 1ltr
• patanjali kesh kanti natural 200ml
• dabur chawanprash 500gm mrp
• smirnoff green apple triple distilled vodka 750ml
• eveready torch
LOSS FUNCTION
Parameter    Description
f()          Function that generates the emission score and the transition score
i            i-th time-step, i.e. the i-th token
lambda       Trainable parameter
Z(x)         Normalisation score
MINIMIZING TOTAL ENERGY OF THE
SEQUENCE
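The equation from this slide did not survive extraction. For reference, the standard linear-chain CRF formulation that matches the symbols listed (f, i, lambda, Z(x)) is sketched below; the feature index k and the energy notation E(x, y) are assumptions introduced here, and the slide's exact form may differ.

```latex
\[
p(y \mid x) = \frac{1}{Z(x)} \exp\Big( \sum_{i} \sum_{k} \lambda_k \, f_k(y_{i-1}, y_i, x, i) \Big),
\qquad
Z(x) = \sum_{y'} \exp\Big( \sum_{i} \sum_{k} \lambda_k \, f_k(y'_{i-1}, y'_i, x, i) \Big)
\]
\[
\mathcal{L} = -\log p(y \mid x) = E(x, y) + \log Z(x),
\qquad
E(x, y) = -\sum_{i} \sum_{k} \lambda_k \, f_k(y_{i-1}, y_i, x, i)
\]
```

Read this way, minimising the loss lowers the energy of the correct label sequence relative to all other sequences, which are summed over in Z(x).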
BASELINE RESULTS, AND WHY THEY WEREN’T ENOUGH
On a hold out set of 1321 data points
Surface format Match
• Brand: 50.1%
• Category: 44.4%
Complete Sequence Match: 30.2%
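The slides do not define these metrics, so the sketch below shows one plausible reading, labelled as an assumption. A per-attribute match counts a title as correct when the predicted text for that attribute equals the gold text. A complete sequence match requires every tag in the title to be correct. The data and function names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# (tokens, gold tags, predicted tags); tags use the assumed B/I + b/c scheme.
Example = Tuple[Sequence[str], Sequence[Optional[str]], Sequence[Optional[str]]]


def attribute_text(tokens: Sequence[str], tags: Sequence[Optional[str]], kind: str) -> str:
    """Join the tokens whose tag belongs to the given attribute letter."""
    return " ".join(tok for tok, tag in zip(tokens, tags)
                    if tag is not None and tag[1] == kind)


def evaluate(examples: List[Example]) -> Dict[str, float]:
    n = len(examples)
    brand = sum(attribute_text(t, g, "b") == attribute_text(t, p, "b")
                for t, g, p in examples)
    category = sum(attribute_text(t, g, "c") == attribute_text(t, p, "c")
                   for t, g, p in examples)
    sequence = sum(list(g) == list(p) for _, g, p in examples)
    return {"brand": brand / n, "category": category / n, "sequence": sequence / n}


holdout: List[Example] = [
    (["brit", "cow", "ghee", "1ltr"], ["Bb", None, "Bc", None], ["Bb", None, "Bc", None]),
    (["eveready", "torch"], ["Bb", "Bc"], ["Bb", None]),
]
print(evaluate(holdout))
```

Under this reading the complete sequence match can never exceed the lowest per-attribute score, which is consistent with it being the lowest of the three numbers reported above.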
ACTIVE LEARNING
• The first 2 parts give us a good baseline model. But why isn’t it good enough?
• One limitation of our automated weak label generation process is that it is constrained by the completeness and quality of the knowledge base.
• So we need to get some data-points manually labelled.
• Hence, the aim is to deliberately choose samples which will lead to maximal improvement in the model with minimal labelling effort.
WHAT WE DID…
• Extrinsic sampling (a diversity-based sampling technique).
• Why didn’t uncertainty-based sampling work?
• Manually labelled those samples.
• Retrained the model on the augmented data-points using SGD and fewer epochs.
• Tested the model on the hold-out set. If the model has improved, repeat until it reaches maturity.
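The loop described above can be sketched generically as below. The callbacks (`fit`, `evaluate`, `select`, `annotate`) and the toy stand-ins used in the demo are hypothetical placeholders: in ADAM they would correspond to retraining the sequence model, scoring on the hold-out set, extrinsic sampling and manual labelling. Stopping at the first round without improvement is one simple interpretation of "maturity".

```python
from typing import Callable, List, Tuple


def active_learning_loop(
    seed: List[str],
    pool: List[str],
    fit: Callable[[List[str]], object],
    evaluate: Callable[[object], float],
    select: Callable[[object, List[str], int], List[str]],
    annotate: Callable[[str], str],
    batch_size: int = 2,
    max_rounds: int = 10,
) -> Tuple[object, List[float]]:
    """Sample, label, retrain and test until the hold-out score stops improving."""
    train, pool = list(seed), list(pool)
    model = fit(train)
    history = [evaluate(model)]
    for _ in range(max_rounds):
        if not pool:
            break
        batch = select(model, pool, batch_size)           # choose samples to label
        pool = [item for item in pool if item not in batch]
        candidate_train = train + [annotate(item) for item in batch]
        candidate = fit(candidate_train)                   # retrain on augmented data
        score = evaluate(candidate)                        # test on the hold-out set
        if score <= history[-1]:
            break                                          # no improvement: stop
        train, model = candidate_train, candidate
        history.append(score)
    return model, history


# Toy stand-ins: the "model" is just the set of titles it has seen.
holdout_titles = ["veet", "tata", "maggi", "zzz"]
model, history = active_learning_loop(
    seed=["maggi"],
    pool=["veet", "tata", "dabur"],
    fit=lambda train: set(train),
    evaluate=lambda m: sum(t in m for t in holdout_titles) / len(holdout_titles),
    select=lambda m, pool, k: pool[:k],
    annotate=lambda item: item,
)
print(history)
```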
ACTIVE LEARNING : TRAINING ITERATION
EXTRINSIC SAMPLING
EXTRINSIC SAMPLING
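The slides describe extrinsic sampling only as a diversity-based technique, and the details were on figures that did not survive. As a generic illustration of diversity-based sampling, and explicitly not the authors' method, the sketch below uses greedy farthest-point selection: each pick is the pool item farthest from everything already labelled or chosen. The 2-D vectors stand in for title embeddings and are made up.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def farthest_point_sample(pool: List[Vector], labelled: List[Vector], k: int) -> List[int]:
    """Greedily pick k pool indices that are far from labelled and chosen points.

    Assumes `labelled` is non-empty.
    """
    # Distance from each pool item to its nearest labelled point.
    min_dist = [min(math.dist(v, l) for l in labelled) for v in pool]
    chosen: List[int] = []
    for _ in range(min(k, len(pool))):
        remaining = [i for i in range(len(pool)) if i not in chosen]
        idx = max(remaining, key=lambda i: min_dist[i])
        chosen.append(idx)
        # A newly chosen point also counts as covered territory.
        for i, v in enumerate(pool):
            min_dist[i] = min(min_dist[i], math.dist(v, pool[idx]))
    return chosen


labelled = [(0.0, 0.0)]
pool = [(0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (-4.0, 0.0)]
print(farthest_point_sample(pool, labelled, k=2))
```

Here the two near-duplicate points around (5, 5) are not both selected; the second pick goes to a different region. That avoids spending labelling effort on redundant samples, which is the usual argument for diversity over pure uncertainty.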
ITERATIVE IMPROVEMENT IN TRAINING
DATA
ADAM TRAINING PIPELINE
SOME OPTIMISTIC FINAL RESULTS
• n relive fruity jelly box 70ml
• zee citric acid 20g
• shri hanuman blended mustard oil 15kg jar
• shaving cream good morning 70gm
• Sandisc flash harddrive 100TB model TTB200
SOME NOT SO OPTIMISTIC RESULTS
• hawkins induction ltr heavy base pr ih
• mixed pickle 200g freshy 36pcs
• nutrela m oil 5ltr
METRICS
All the results below were generated on a hold-out set, i.e. a sample set whose entities were never part of the training data.
• Baseline accuracy:
• Iterative accuracy:
• Precision and recall:
(Plots over training iterations: Iter 0, Iter 1, Iter 2, …, Iter n)
TAG FLIPS (MODEL STABILITY
IMPROVEMENTS)
IMPROVEMENT GRAPHS
• < Will be added shortly>
COMPARISON WITH CRF-SUITE
• CRF-suite was very positionally biased and couldn’t generalise to relative positioning.
• The results on the hold-out set (a data-set with entities which were not present during training) were poor with CRF-suite, while there was a great improvement when we switched to ADAM.
• Scaling was impossible with CRF-suite.
• <Will update some comparison metrics shortly>
OUR CONTRIBUTION
• We propose a novel entity extraction model for domain-specific data – short
and noisy – and in the absence of pre-labelled data.
• Our model builds upon a state-of-the-art model based on deep learning and
CRF and further leverages weak labelling and active learning techniques.
• We propose a novel extrinsic sampling technique for active learning (which
performs better than uncertainty sampling for this task)
TAKE-AWAYS:
• How an industry-grade information extraction model can be built for domain-specific data.
• How to tackle the noise problem in textual data.
• Why a deep NN model plays an important role in generalisation.
• Why active learning is a really important concept for dealing with the problem of zero labelled data.
FUTURE SCOPE OF ADAM
• Higher order attribute extraction.
• Good Night Advance Mat 12 pack.
• Britannia Goodday cashew and nuts cookies 50 gm.
• Relationship Extraction
CONCLUSION
Deep learning is surely making a mark in the field of NLP, but its industrialisation is still an open problem, mostly because the quality of real-world textual data-sets is not well suited for the models to learn from.
Active learning is an interesting concept for tackling this situation. The same concept can be generalised beyond NLP to other domains of machine learning and AI as well.
ACKNOWLEDGMENTS
• Mikolov et al., word2vec
• https://arxiv.org/pdf/1707.05928v2.pdf, for active learning.
• My teammates: Abishek Ahluwalia (Data Scientist II), Deepak Sharma (Lead Data Scientist), Ashish Anand Kulkarni (Director, Data Science)
THANK YOU

5th e le_revised

  • 1. ADAM ……for 5th Elephant ATTRIBUTE DETECTION AND ANNOTATION MODULE….. (pitched this name because we have an “EVE” building up too)
  • 2. What is ADAM? ADAM is an advanced Named Entity Recognition module for domain specific data designed to deal with the situation of absence of labelled data. It leverages concepts from weak labelling, deep sequence models and active learning
  • 3. WHAT IS NAMED ENTITY RECOGNITION? Sundar Pichai is the CEO of Google. Person: Sundar Pichai Position: CEO Organisation: Google
  • 4. WHAT NER CAN BE IN DOMAIN SPECIFIC DATA? Hospital Records… “You will need to have your uterine bleeding evaluated. This continued agitation may be caused by intra-parenchymal hemorrhage.” Symptoms: uterine bleeding, continued agitation. Diagnosis: intra-parenchymal hemorrhage.
  • 5. WHAT NER CAN BE IN DOMAIN SPECIFIC DATA? Product Description… “Pedigree is a complete and balanced food for dogs. Pedigree is rich in proteins and nutrition.” Brand: Pedigree Type: food for dogs Nutrients: proteins
  • 6. What will this session cover? • Motivation for this problem. • Why off-the-shelf solutions didn’t work for us. • An approach to entity extraction from product-title-like text in the absence of labelled data. • Comparison with other models. • Takeaways.
  • 7. PROBLEM STATEMENT Product Titles often have references to attributes which play a crucial role in driving the use-cases we will describe later. Therefore, there is a need to extract these attributes from the product titles. Hence ADAM!!!
  • 8. What will ADAM Do? resolute black vodka 180ml resolute/Bb black/Ib vodka/Bc 180ml (Tagged product title) A.D.A.M.
  • 10. PRODUCT CATALOGUE • Universal Product Catalogue with the ability of enhanced semantic search
  • 11. AGGREGATION AND MARKET ANALYSIS Aggregation and Market Analysis • Lift Insight • Demand Insight • Market Penetration • Predictive Customer Analytics
  • 12. EVOLVING KNOWLEDGE GRAPH • A graph with entities (individual products or their attributes) as nodes and edges describing the relationships between them. • Given a seed graph, the idea is to evolve it from the data-set.
  • 13. Challenges • Domain-specific data (product titles in our case). • Zero ground truth and no training data available whatsoever. • Multiple sources of data, hence very high variance: • KESHKANTI HAIR CLEASNER SILK & SHINE 8 ML [1*960 PC] …….. A well described stock-item • N Deo Whitening Talc Touch100gm ……….. A not so well described stock-item • B. M. W. 2L ………… Could have been a good stock-item desc. • Short and extremely noisy representations: • A H F SHAMPOO400mlM220 ………. Very hard for a machine to understand • HW Sandrop Hert 5 Ltr.
  • 14. Some relatable previous works… • Traditional Algorithms: • Information extraction using a CRF trained on hand-crafted features. (Citation: Ajinkya More, Attribute Extraction from Product Titles in eCommerce) • Information extraction on Twitter data using weak labelling with the help of a knowledge base. (Citation: Alan Ritter, Sam Clark, Mausam, Oren Etzioni, Named Entity Recognition in Tweets: An Experimental Study) • Deep Learning Models: • Lample et al., Neural Architectures for Named Entity Recognition • Xuezhe Ma and Eduard Hovy, End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF
  • 15. Some relatable previous works… • Off-the-shelf Tools: • Stanford’s NLTK • Google’s train and play deep learning models • CRF-suite
  • 16. Why they didn’t work? • These approaches leveraged either a large knowledge base or a large amount of labelled data, both of which are generally not available for industry applications. • These models were trained on rather clean data-sets with much smaller variance than what we needed to deal with. • Some of these approaches used hand-crafted features, which couldn’t scale to a data-set provided by millions of sources. • Most pretrained tools, such as NLTK, are trained on natural-language data-sets, which product titles simply are not. • The attributes that they can predict are not relevant to us.
  • 17. Off-the-shelf tools: limitation • They were trained on natural languages (i.e. languages with proper grammatical structure), hence they work only on those types of sentences. • Barrack/NNP Obama/NNP is/VBZ the/DT next/JJ president/NN Person -> Barrack Obama , Org. -> president • Nestle/NNP Maggi/NNP Noodles/NNP 100gm/CD Person -> Nestle, Person -> Maggi Noodles
  • 18. Hand-crafted features: limitation • Worked well on the Walmart data-set. • Was a disaster when tried on our data-set (because of high variance).
  • 19. CRF Results on Our Data-set CRF trained on hand-crafted features, using labelled data crawled from various websites: • shri hanuman blended mustard oil 15kg jar • tata tea elaichi 250gm • relive fruity jelly war jar 50gm It did well on popular products: • maggi tomato ketchup 200g • veet hair removal cream 100g
  • 20. Why ADAM is better. • Leverages weak labelled data. • Leverages active learning. • Immune to noise. • Does not require any hand-crafted features as input.
  • 21. A DEEPER LOOK AT ADAM PROBLEMS FACED AND HOW WE SOLVED THEM
  • 22. ARCHITECTURE The 3 Main Components of ADAM… • Weak Label Generation • State of the Art Sequence Tagging Model • Active Learning approach
  • 23. Weak Label Generation • How we used an existing knowledge Graph. • How we improvised using the information from other sources like Amazon catalogue. • In addition, How we leveraged the structure of dataset (Stock-item and Stock-group).
  • 24. Knowledge Graph Examples • Fig. 1 Our Knowledge Graph • Fig. 2 A unit of amazon data-set
  • 25. EXAMPLE OF OUR DATA SET… • Id Seg. Stock-Group Stock-item
  • 26. Weak Labelling Algorithm • We used a complicated rule-based string-matching algorithm, which annotates the tokens present in a stock-item using our knowledge base. • We then apply some constraints, again expressed as a bunch of rules, to pick the good-quality annotated stock-items.
  • 27. Label Generation STOCK_GROUP STOCK_ITEM Quality PICKWICK PICKWICK WAFOBIX PINEAPPLE Bb Bb Bc Wood Wood T-shirt Man 1000 Bc Bc Bc Bb
  • 28. Seed Data-set Count • We passed around 0.7 million stock-items through this process. • Of those, only 8 thousand passed the quality check. • These became our seed data-set.
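The weak-labelling step and the quality check described in the slides above can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the toy knowledge base, the reading of the tags as Bb/Ib = begin/inside brand and Bc = begin category, the `O` tag for unmatched tokens, and the single quality rule. The actual rules used in ADAM are not given in the slides.

```python
import re

# Hypothetical toy knowledge base: surface form -> attribute type.
# "b" = brand, "c" = category (assumed reading of the Bb/Ib/Bc tags).
KNOWLEDGE_BASE = {
    "resolute black": "b",
    "vodka": "c",
    "tata tea": "b",
}

def weak_label(title, kb=KNOWLEDGE_BASE, max_len=3):
    """Greedy longest-match tagging; unmatched tokens get 'O' (assumed)."""
    tokens = re.findall(r"\S+", title.lower())
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        # Try the longest phrase starting at token i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in kb:
                attr = kb[phrase]
                tags[i] = "B" + attr
                for j in range(i + 1, i + n):
                    tags[j] = "I" + attr
                i += n
                break
        else:
            i += 1  # no phrase matched at this position
    return tokens, tags

def passes_quality(tags):
    """Illustrative constraint: exactly one brand span and one category span."""
    return tags.count("Bb") == 1 and tags.count("Bc") == 1

tokens, tags = weak_label("resolute black vodka 180ml")
print(tags)                  # ['Bb', 'Ib', 'Bc', 'O']
print(passes_quality(tags))  # True
```

A title such as "tata tea elaichi 250gm" gets a brand span but no category span under this toy knowledge base, so the illustrative quality rule would drop it from the seed set.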
  • 31. WORD EMBEDDING LAYER • Why couldn’t we use an existing embedding space or a pretrained set of vectors for this? • How did we create our own? What data did we use? • Used “Skip-Gram” with “hierarchical softmax” optimization (Mikolov et al.)
  • 32. SKIP GRAM ALGORITHM Data-sets we used: • Stock-items provided by Tally’s Product • Product titles from Amazon’s Catalogue and GS1 data • Product titles crawled from various websites. • In total, around 13 million titles were used.
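As a rough illustration of what skip-gram training consumes, the sketch below turns a product title into (centre, context) pairs. The digit-stripping rule follows the speaker note about removing digits, but its exact form, the window size and both function names are assumptions. Actual vector training with hierarchical softmax would normally be left to a library (for instance gensim's `Word2Vec` with `sg=1` and `hs=1`) and is not shown.

```python
import re

def normalise(title):
    """Lower-case a title and strip digits (assumed rule, e.g. '250gm' -> 'gm')."""
    return re.sub(r"\d+", "", title.lower()).split()

def skipgram_pairs(tokens, window=2):
    """Return (centre, context) pairs for every token within the window."""
    pairs = []
    for i, centre in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((centre, tokens[j]))
    return pairs

tokens = normalise("Tata Tea Elaichi 250gm")
print(tokens)  # ['tata', 'tea', 'elaichi', 'gm']
print(skipgram_pairs(tokens, window=1))
```

Stripping digits makes units such as "gm" or "ml" share one embedding regardless of the quantity attached to them, which is one plausible reason for that preprocessing step.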
  • 34. BI-LSTM LAYER Sequential Training Model • A word-level encoder which leverages sequential information to encode the tokens of the sentence. • Why is another word-level encoder used? What more information does it encode?
  • 35. CONDITIONAL RANDOM FIELDS • Like the other layers, this one uses context from neighbouring tokens, but it also uses context from neighbouring labels. • Bi-LSTM only leverages input context. • CRF is the only layer that leverages output label context.
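To make the difference between input context and output label context concrete, here is a small self-contained Viterbi decoder. It is a sketch, not the ADAM implementation: the tag set, the emission scores (standing in for Bi-LSTM outputs) and the transition scores are all invented for the example.

```python
TAGS = ["O", "Bb", "Ib", "Bc", "Ic"]

def greedy(emissions, tags=TAGS):
    """Pick the best tag per token independently (no label context)."""
    return [max(tags, key=lambda t: e.get(t, 0.0)) for e in emissions]

def viterbi(emissions, transitions, tags=TAGS):
    """Best tag sequence under emission + transition scores (missing scores = 0)."""
    score = {t: emissions[0].get(t, 0.0) for t in tags}
    path = {t: [t] for t in tags}
    for emit in emissions[1:]:
        new_score, new_path = {}, {}
        for cur in tags:
            # Best previous tag for ending in `cur` at this position.
            prev = max(tags, key=lambda p: score[p] + transitions.get((p, cur), 0.0))
            new_score[cur] = (score[prev] + transitions.get((prev, cur), 0.0)
                              + emit.get(cur, 0.0))
            new_path[cur] = path[prev] + [cur]
        score, path = new_score, new_path
    return path[max(tags, key=lambda t: score[t])]

# Invented scores for the tokens: resolute, black, vodka
emissions = [{"Bb": 2.0}, {"Ic": 1.0, "Ib": 0.8}, {"Bc": 2.0}]
transitions = {("Bb", "Ic"): -5.0, ("Bb", "Ib"): 1.0}

print(greedy(emissions))                # ['Bb', 'Ic', 'Bc']
print(viterbi(emissions, transitions))  # ['Bb', 'Ib', 'Bc']
```

The per-token choice produces an inside-category tag directly after a brand start, which is not a valid sequence; the transition scores penalise that pair, so the decoded sequence keeps the brand span intact.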
  • 37. SOME RESULTS WITH BI-LSTM AS FINAL LAYER • brit cow ghee 1ltr • patanjali kesh kanti natural 200ml • dabur chawanprash 500gm mrp • smirnoff green apple triple distilled vodka 750ml • eveready torch
  • 38. IMPROVED RESULTS BECAUSE OF CRF • brit cow ghee 1ltr • patanjali kesh kanti natural 200ml • dabur chawanprash 500gm mrp • smirnoff green apple triple distilled vodka 750ml • eveready torch
  • 39. LOSS FUNCTION Parameters and their descriptions: • f(): function that generates the emission score and the transition score • i: the i-th time-step, i.e. the i-th token • lambda: trainable parameter • Z(x): normalisation score
  • 40. MINIMIZING TOTAL ENERGY OF THE SEQUENCE
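The formula itself did not survive in the transcript. For orientation only, the textbook linear-chain CRF objective, written with the symbols listed on the loss-function slide (f, i, lambda, Z(x)), is sketched below. The feature index k and the energy symbol E are assumptions added here, and the exact formulation used in ADAM may differ.

```latex
\begin{align}
p(y \mid x) &= \frac{1}{Z(x)} \exp\Big( \sum_{i=1}^{n} \sum_{k} \lambda_k f_k(y_{i-1}, y_i, x, i) \Big) \\
Z(x) &= \sum_{y'} \exp\Big( \sum_{i=1}^{n} \sum_{k} \lambda_k f_k(y'_{i-1}, y'_i, x, i) \Big) \\
E(x, y) &= -\sum_{i=1}^{n} \sum_{k} \lambda_k f_k(y_{i-1}, y_i, x, i) \\
\mathcal{L}(x, y) &= -\log p(y \mid x) = E(x, y) + \log Z(x)
\end{align}
```

Under this reading, minimising the loss lowers the energy of the correct tag sequence relative to all other sequences, which is one way to interpret "minimizing total energy of the sequence".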
  • 41. BASELINE RESULTS, AND WHY IT WASN’T ENOUGH On a hold-out set of 1321 data points: • Surface format match: Brand 50.1%, Category 44.4% • Complete sequence match: 30.2%
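The slides do not define these two metrics, so the following is only one plausible reading: a per-attribute match counts a title when the predicted spans for that attribute equal the gold spans exactly, and a complete sequence match requires the whole tag sequence to be identical. The function names and the B/I tag reading are assumptions.

```python
def spans(tags):
    """Extract (start, end, attribute) spans from B*/I* tags; end is exclusive."""
    out, start, attr = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last span
        continues = tag.startswith("I") and attr == tag[1:]
        if start is not None and not continues:
            out.add((start, i, attr))
            start, attr = None, None
        if tag.startswith("B"):
            start, attr = i, tag[1:]
    return out

def match_rates(gold, pred):
    """Per-attribute span match and complete sequence match over a hold-out set."""
    n = len(gold)
    if n == 0:
        raise ValueError("empty hold-out set")

    def attr_match(a):
        hits = 0
        for g, p in zip(gold, pred):
            g_spans = {s for s in spans(g) if s[2] == a}
            p_spans = {s for s in spans(p) if s[2] == a}
            hits += g_spans == p_spans
        return hits / n

    complete = sum(list(g) == list(p) for g, p in zip(gold, pred)) / n
    return {"brand": attr_match("b"), "category": attr_match("c"), "complete": complete}

gold = [["Bb", "Ib", "Bc", "O"], ["Bb", "Bc", "O"]]
pred = [["Bb", "Ib", "Bc", "O"], ["Bb", "O", "O"]]
print(match_rates(gold, pred))  # {'brand': 1.0, 'category': 0.5, 'complete': 0.5}
```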
  • 42. ACTIVE LEARNING. • The first 2 parts give us a good baseline model. But why isn’t it good enough? • One limitation of our automated weak label generation process is that it is constrained by the completeness and quality of the knowledge base. • So we need to get some data-points manually labelled. • Hence, the aim is to select samples deliberately, so that they lead to maximal improvement in the model with minimal labelling effort.
  • 43. WHAT WE DID… • Extrinsic sampling (a diversity-based sampling technique). • Why didn’t uncertainty-based sampling work? • Manually labelled those samples. • Retrained the model on the augmented data-points using SGD and fewer epochs. • Tested the model on the hold-out set. If the model has improved, repeat until it reaches maturity.
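The slides name the sampling strategy but do not describe how it works. As a generic stand-in, the sketch below shows one common diversity-based selection: greedy farthest-first sampling with Jaccard distance over title tokens. This is not the authors' extrinsic sampling technique; the distance measure and function names are assumptions chosen to keep the example dependency-free.

```python
def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| over whitespace tokens of two titles."""
    a, b = set(a.split()), set(b.split())
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def diverse_sample(pool, labelled, k):
    """Greedily pick k pool titles, each as far as possible from the
    already labelled titles and from the titles picked so far."""
    chosen = []
    reference = list(labelled)
    candidates = list(pool)
    while candidates and len(chosen) < k:
        def min_dist(title):
            return min((jaccard_distance(title, r) for r in reference), default=1.0)
        best = max(candidates, key=min_dist)
        chosen.append(best)
        reference.append(best)
        candidates.remove(best)
    return chosen

labelled = ["tata tea elaichi 250gm"]
pool = [
    "tata tea gold 250gm",
    "veet hair removal cream 100g",
    "veet hair removal cream 50g",
    "maggi tomato ketchup 200g",
]
print(diverse_sample(pool, labelled, k=2))
# ['veet hair removal cream 100g', 'maggi tomato ketchup 200g']
```

The second pick skips the near-duplicate "veet" title because it is close to one already chosen, which is the behaviour a diversity-based sampler is meant to have: each labelling request should cover a different region of the data.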
  • 44. ACTIVE LEARNING : TRAINING ITERATION
  • 47. ITERATIVE IMPROVEMENT IN TRAINING DATA
  • 49. SOME OPTIMISTIC FINAL RESULTS • n relive fruity jelly box 70ml • zee citric acid 20g • shri hanuman blended mustard oil 15kg jar • shaving cream good morning 70gm • Sandisc flash harddrive 100TB model TTB200
  • 50. SOME NOT SO OPTIMISTIC RESULTS • hawkins induction ltr heavy base pr ih • mixed pickle 200g freshy 36pcs • nutrela m oil 5ltr
  • 51. METRICS All the results below were generated on a hold-out set, i.e. a sample set whose entities were never part of the training data. • Baseline accuracy: • Iterative accuracy: • Precision and recall: (table columns: Iter 0, Iter 1, Iter 2, …, Iter n)
  • 52. TAG FLIPS (MODEL STABILITY IMPROVEMENTS)
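The slide gives only the heading, so the definition below is an assumption: a tag flip is counted whenever the tag predicted for a token on the hold-out set changes between two consecutive training iterations. A falling flip rate would then indicate a more stable model.

```python
def tag_flip_rate(prev_preds, curr_preds):
    """Fraction of tokens whose predicted tag changed between two iterations."""
    flips = total = 0
    for prev_tags, curr_tags in zip(prev_preds, curr_preds):
        if len(prev_tags) != len(curr_tags):
            raise ValueError("predictions must cover the same tokens")
        total += len(prev_tags)
        flips += sum(p != c for p, c in zip(prev_tags, curr_tags))
    return flips / total if total else 0.0

iter_0 = [["Bb", "Ic", "Bc"], ["Bb", "O"]]
iter_1 = [["Bb", "Ib", "Bc"], ["Bb", "O"]]
print(tag_flip_rate(iter_0, iter_1))  # 0.2
```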
  • 53. IMPROVEMENT GRAPHS • < Will be added shortly>
  • 54. COMPARISON WITH CRF-SUITE • CRF-suite was heavily biased towards token position and couldn’t generalise to relative positioning. • CRF-suite performed poorly on the hold-out set (a data-set with entities which were not present during training), whereas switching to ADAM brought a great improvement. • Scaling was impossible with CRF-suite. • <Will update some comparison metrics shortly>
  • 55. OUR CONTRIBUTION • We propose a novel entity extraction model for domain-specific data – short and noisy – and in the absence of pre-labelled data. • Our model builds upon a state-of-the-art model based on deep learning and CRF and further leverages weak labelling and active learning techniques. • We propose a novel extrinsic sampling technique for active learning (which performs better than uncertainty sampling for this task)
  • 56. TAKE-AWAYS: • How an industry-grade information extraction model can be built for domain-specific data. • How to tackle the noise problem in textual data. • Why a deep NN model plays an important role in generalisation. • Why active learning is a really important concept for dealing with the problem of zero labelled data.
  • 57. FUTURE SCOPE OF ADAM • Higher order attribute extraction. • Good Night Advance Mat 12 pack. • Britannia Goodday cashew and nuts cookies 50 gm. • Relationship Extraction
  • 58. CONCLUSION Deep Learning is surely making a mark in the field of NLP, but its industrialisation is still an open problem, mostly because the quality of textual data-sets is often too poor for the models to learn from. Active Learning is an interesting concept for tackling this situation. The same concept can be generalised beyond NLP to other domains of Machine Learning and AI as well.
  • 59. ACKNOWLEDGMENTS • Mikolov et al., word2vec. • https://arxiv.org/pdf/1707.05928v2.pdf for active learning. • My teammates: Abishek Ahluwalia (Data Scientist II), Deepak Sharma (Lead Data Scientist), Ashish Anand Kulkarni (Director Data Science)

Editor's Notes

  1. 1) Why did we have to solve the NER problem again, even though it has been solved in many different ways? What are the real-life applications of ADAM? Why does its existence matter, and why is it helpful not only at Clustr but for many other businesses too? 2) Ditto 3) Will explain through and through why we did what we did, and why and how we achieved our end goal. 4) Will show some results that justify our approach. 5) How this approach can inspire others in the ML domain.
  2. 1) We at Clustr deal with product titles. Like you see in any e-commerce website 2) <read through>
  3. A good catalogue is defined by its coverage and advanced search and select ability. Attributes of those product titles can be a great filter for complicated searches. ADAM can help automate them
  4. Market penetration Localisation Trend flow All can be automated with the help of ADAM
  5. Obviously, such an ontology needs extracted information to fill in its nodes and edges. Hence, if we can automate the extraction, we can build an ontology that evolves automatically.
  6. 1) a) Walmart published a paper which showed attribute extraction using the CRF algorithm with hand-crafted features, which I will describe later. 1) b) There have been a few papers that generate weakly labelled data automatically using a healthy knowledge base and a strict string-matching algorithm. 2) A few good off-the-shelf tools, such as NLTK, Google deep-learning models and CRF-suite, are good with natural-language data but not with ours.
  7. <Read it through.> Refer to the OpenTag paper and how they used a very specific domain to test their algorithm. Since our data-set is produced by millions of users, there is no specific way of writing or set of rules that they follow. Hence no hand-crafted features.
  8. We have a special way to leverage a small knowledge base and the structure of the data to generate labelled data (seed data, to be exact). We leverage the concept of active learning, whose aim is to minimise the amount of manual labelling needed for training; details follow later. Why this approach can adapt to noise (both because of the model and because of active learning). No need to generate hand-crafted features.
  9. Put only the picture in the slide and basically focus on the 3 components
  10. Describe why stock groups are important
  11. Use the term resolution. How we use our knowledge base to resolve and talk about constraint Tell them the numbers to get the idea of constraints
  12. Explain the whole architecture
  13. Introduce the data that we use and how we removed the digits, and demonstrate the embedding space. Mikolov et al.: introduce citations.
  14. It is an advanced form of recurrent neural network. <Describe the diagram> Talk about its important enhancements, e.g. the forget gate. Talk about an example of differentiating apple as a category when gm is present and apple as a brand when something else is. Why was this needed? Since it takes the representation of the surrounding tokens into account, it can differentiate very well between ambiguous words, which a word embedding alone cannot do.
  15. Bi-LSTM would’ve only used the sequential context of the symbols to make the decision, while the CRF uses the context of sequential states. Explain the concepts of transfer energy and emission energy that you are trying to minimise.
  16. {'surface_format_match_brand': 663, 'hypothised_entity_match_brand': 419, 'wrong_surface_boundary_category': 533, 'assign_wrong_surface_boundary_and_attribute_brand': 150, 'hypothised_entity_match_category': 50, 'surface_format_match_category': 587, 'assign_wrong_surface_boundary_and_attribute_category': 424, 'wrong_surface_boundary_brand': 166, 'assign_wrong_attribute_brand': 157, 'complete_match': 387, 'assign_wrong_attribute_category': 170}
  17. Decide on whether to inform about intrinsic sampling
  18. <Numbers will be added shortly>