5 Lessons I’ve Learned Tackling Product Matching for E-commerce
Govind Chandrasekhar
@govind201
Hello!
1. Unsupervised Content Extraction
HTML→ structured attributes
2. Categorization
Dell LED Monitor →
Electronics | Computers & Accessories | Monitors
3. Feature Enhancement
Hip rubber insole Air Jordans available in black →
{"model" : "Air Jordan", "color" : "Black", "insole_material" : "Rubber"}
Product Matching
Challenges - Variations
Challenges - Subtleties
Challenges - Branding Themes
5 Stories
● Goal
● Problem
● Cause
● Lessons
Story #1
Build a good dataset for training and validation
● Matches (1): Manually curated by humans
● Non-Matches (0): Semi-automated heuristic-based generation
Goal
Same Website
Goal
Highly Rated
Low Edit Distance
=> Not a Match
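The non-match heuristic above can be sketched roughly as follows. This is a minimal illustration, not the pipeline from the talk: the record fields (`site`, `name`), the `max_distance` threshold and the sample listings are all assumptions, and the "Highly Rated" condition is left out because the slide does not say what it measures.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def generate_non_matches(products, max_distance=4):
    """Yield (name_a, name_b, 0) for near-identical names on the same site.

    Assumption: one site does not list the same product twice, so two
    distinct listings on it are different products however alike they look.
    """
    for a, b in combinations(products, 2):
        if a["site"] != b["site"]:
            continue
        if 0 < edit_distance(a["name"], b["name"]) <= max_distance:
            yield (a["name"], b["name"], 0)


# Hypothetical listings, for illustration only.
products = [
    {"site": "hello.com", "name": "Apple iPhone 7 - 128GB - Black"},
    {"site": "hello.com", "name": "Apple iPhone 7 - 32GB - Black"},
    {"site": "other.com", "name": "Apple iPhone 7 - 128GB - Black"},
]
pairs = list(generate_non_matches(products))
```

Only the two hello.com listings are paired; the other.com listing is skipped because cross-site pairs may well be true matches.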
Problem
Training & Validation: ≈90%
Real World Testing: ≈70%
Same Distinct Signal
{
  "name" : "Apple iPhone 7 - 128GB - Black",
  "image" : "sem3-idn/image.jpg",
  "description" : "This sleek …",
  "features" : {
    "Hello.com SKU": "B123",
  }, ...
}
Cause
{
  "name" : "Apple iPhone 7 - 32GB - Black",
  "image" : "sem3-idn/image.jpg",
  "description" : "This sleek …",
  "features" : {
    "Hello.com SKU": "B456",
  }, ...
}
Same Website
=> Not a Match
Cause
Same Background Color
Same Website
=> Not a Match
Cause
Different Background Color
Different Website
=> Match
➔ Watch out for quirks in your training dataset, especially causal
vs. incidental relationships.
➔ Models don’t care about your problem; they only care about
minimizing loss.
➔ When working on your own custom problems, you can’t make
the assumption that your dataset is flawless (vis-a-vis
peer-reviewed standardized datasets).
Lessons
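One way to look for incidental relationships like the ones in this story is to ask how well a single suspicious feature predicts the label on its own. The sketch below is an assumed diagnostic, not something shown in the talk; the pair fields (`site_a`, `site_b`, `label`) and the sample pairs are hypothetical.

```python
def shortcut_agreement(pairs, feature):
    """Fraction of pairs whose label is predicted by one boolean feature alone.

    A value near 1.0 (or near 0.0, for an inverted rule) suggests the feature
    is a shortcut the model can exploit instead of learning the real task.
    """
    hits = sum(1 for pair in pairs if feature(pair) == bool(pair["label"]))
    return hits / len(pairs)


def different_site(pair):
    return pair["site_a"] != pair["site_b"]


# Hypothetical training pairs: every non-match (0) comes from a single site
# and every match (1) spans two sites, mirroring the quirk in this story.
training_pairs = [
    {"site_a": "hello.com", "site_b": "hello.com", "label": 0},
    {"site_a": "hello.com", "site_b": "hello.com", "label": 0},
    {"site_a": "hello.com", "site_b": "other.com", "label": 1},
    {"site_a": "other.com", "site_b": "third.com", "label": 1},
]

score = shortcut_agreement(training_pairs, different_site)
print(f"'different site' alone predicts the label for {score:.0%} of pairs")
```

If a feature with no causal link to the task scores this high, the dataset probably needs non-matches drawn across sites and matches drawn within one.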
➔ “Automated Inference on Criminality using Face Images” [Wu &
Zhang - Nov 2016]
➔ Identified criminality with 90% accuracy (AlexNet)
“[…] the angle θ from nose tip to two mouth corners is on average 19.6% smaller for criminals
than for non-criminals and has a larger variance. Also, the upper lip curvature ρ is on average
23.4% larger for criminals than for noncriminals. On the other hand, the distance d between
two eye inner corners for criminals is slightly narrower (5.6%) than for non-criminals.”
Aside
➔ Teardown (Link)
◆ Bias towards collared shirts?
◆ Bias against younger people?
◆ Bias towards likelihood of conviction or criminality?
Aside
Story #2
Multiple signals on offer
➔ Text
➔ Images
➔ Identifiers (UPC/Model …)
➔ HTML
Goal
FOCUS
Problem
Find the odd one out:
1. Map of Arizona
2. Map of AR
3. Map of Arkansas
Cause
No underlying rule here
➔ Sift out knowledge-based tasks from logic-based tasks.
➔ “Never mind a neural network; can a human with no prior
knowledge, educated on nothing but a diet of your training
dataset, solve the problem?”
➔ Spending hours poring over your dataset can be rewarding.
Lessons
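The "Map of AR" puzzle can be made concrete with a small sketch. Nothing in the strings themselves says which state "AR" stands for, so the mapping has to be supplied as knowledge. The lookup table and helper names below are illustrative assumptions, not part of the talk.

```python
import re

# Knowledge, not logic: the postal code "AR" means Arkansas, while Arizona
# is "AZ". No rule learned from the characters alone would recover this.
STATE_ABBREVIATIONS = {"AR": "Arkansas", "AZ": "Arizona"}


def expand_state_abbreviations(title: str) -> str:
    """Replace standalone two-letter state codes with full state names."""
    def replace(match):
        return STATE_ABBREVIATIONS.get(match.group(0), match.group(0))
    return re.sub(r"\b[A-Z]{2}\b", replace, title)


titles = ["Map of Arizona", "Map of AR", "Map of Arkansas"]
expanded = [expand_state_abbreviations(title) for title in titles]

# The odd one out is the title whose expanded form nobody else shares.
odd_one_out = [title for title, full in zip(titles, expanded)
               if expanded.count(full) == 1]
```

Once the lookup is applied, "Map of AR" and "Map of Arkansas" collapse to the same string, leaving "Map of Arizona" as the odd one out.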
Story #3
Combine multiple models built on individual signals into a single
multimodal model
Goal
DECISION
Problem
Combined model only slightly better than the best individual model
Model | Accuracy
Text Only | X %
Image Only | Y %
Image + Text | max(X, Y) + %
➔ Combined model had learned to only consider unimodal
features / the stronger of the two signals.
➔ It had failed to learn correlations between images and text.
➔ Since our text and image models had been pre-trained
separately, they’d learned isolated, unrelated representations.
Cause
How do we learn shared representations?
Lessons
Multimodal Deep Learning by Ngiam, Khosla, Kim, Nam, Lee & Ng
➔ Check if your multimodal models have been able to learn
meaningful correlations / shared representations.
➔ If you want your network to develop a characteristic, explicitly
set an objective to achieve this goal (autoencoder example).
Lessons
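A simple way to check whether a combined model leans on one signal only is a modality ablation: shuffle one modality across examples and see how much accuracy drops. This is a swapped-in diagnostic, not the autoencoder approach from the cited paper, and it measures reliance on a modality rather than shared representations directly. The `predict(text, image)` interface and the scalar stand-ins for feature vectors are assumptions.

```python
import random


def accuracy(predict, examples):
    correct = sum(1 for ex in examples
                  if predict(ex["text"], ex["image"]) == ex["label"])
    return correct / len(examples)


def ablation_drop(predict, examples, modality, seed=0):
    """Accuracy lost when one modality is shuffled across examples.

    Shuffling keeps that modality's value distribution intact but breaks
    its link to the label and to the other modality.
    """
    rng = random.Random(seed)
    values = [ex[modality] for ex in examples]
    rng.shuffle(values)
    shuffled = [dict(ex, **{modality: value})
                for ex, value in zip(examples, values)]
    return accuracy(predict, examples) - accuracy(predict, shuffled)


# Hypothetical examples; scalars stand in for text and image feature vectors.
examples = [
    {"text": 0.9, "image": 0.8, "label": 1},
    {"text": 0.1, "image": 0.2, "label": 0},
    {"text": 0.8, "image": 0.7, "label": 1},
    {"text": 0.2, "image": 0.1, "label": 0},
]


def text_only_model(text, image):
    """Toy 'combined' model that silently ignores the image signal."""
    return int(text > 0.5)


image_drop = ablation_drop(text_only_model, examples, "image")
text_drop = ablation_drop(text_only_model, examples, "text")
```

A drop of zero for one modality, as with `image_drop` here, indicates the model has collapsed onto the other signal.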
Story #4
➔ Make a case to the team for replacing our hand-crafted
heuristic-based model with a machine-learning model.
➔ But in benchmark tests, for certain pockets of data, the simplistic
heuristic-based approach performed better!
Goal & Problem
For these pockets of data, one or more of the following was at play:
➔ Our training data wasn’t rich enough.
➔ Our model hadn’t been perfectly tuned.
➔ Our older hand-crafted features were surprisingly good.
Cause
Lessons
DECISION
FEATURE SET 1
FEATURE SET 2
Accuracy went up by ≈3%!
Lessons
➔ Hand-crafted feature engineering is a potent tool. Critical for
best-in-class solutions for image retrieval, tagging and more.
➔ It can be cheaper & quicker than architecture engineering. You
can’t deep-learn your way out of everything.
➔ A good way to think up features is to retrace your own
intermediary cognitive steps.
➔ Find data-scientists who are willing to do last-mile grunt work.
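As an illustration of retracing your own cognitive steps, the sketch below encodes two checks a person might make when comparing listings: do the stated storage capacities conflict, and how many words do the names share? The feature choices, function names and the sample learned values are assumptions for illustration; the talk does not list the actual features used.

```python
import re

CAPACITY_PATTERN = re.compile(r"(\d+)\s*(GB|TB)\b", re.IGNORECASE)


def storage_capacities(name: str) -> set:
    """Extract normalized storage capacities such as '128GB' from a name."""
    return {f"{number}{unit.upper()}"
            for number, unit in CAPACITY_PATTERN.findall(name)}


def handcrafted_features(name_a: str, name_b: str) -> list:
    """Features mirroring how a person compares two listings by eye."""
    caps_a, caps_b = storage_capacities(name_a), storage_capacities(name_b)
    tokens_a = set(name_a.lower().split())
    tokens_b = set(name_b.lower().split())
    union = tokens_a | tokens_b
    return [
        # 1.0 when both names state a capacity and the capacities disagree.
        float(bool(caps_a) and bool(caps_b) and caps_a != caps_b),
        # Jaccard overlap of whitespace-separated tokens.
        len(tokens_a & tokens_b) / len(union) if union else 0.0,
    ]


def combined_features(learned: list, name_a: str, name_b: str) -> list:
    """Concatenate learned features with the hand-crafted ones."""
    return learned + handcrafted_features(name_a, name_b)


features = handcrafted_features("Apple iPhone 7 - 128GB - Black",
                                "Apple iPhone 7 - 32GB - Black")
# The learned values below are placeholders, not real model outputs.
model_input = combined_features([0.12, 0.87],
                                "Apple iPhone 7 - 128GB - Black",
                                "Apple iPhone 7 - 32GB - Black")
```

The capacity-conflict flag fires for the 128GB and 32GB listings even though their names overlap heavily, which is the kind of distinction a purely learned similarity may miss.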
Story #5
Goal & Problem
➔ Package our model as an AI-as-a-service product offering.
➔ Load the model behind a metered firewall, and we’re good to
go. Right ...?
➔ The service worked well for some customers. Failed miserably
for the others.
Cause
Expectation of inference of missing data
iPhone 7 128GB Black vs. Apple iPhone 7 128GB Black
Cause
Significant variety of quality in our inputs.
{
  "name" : "Apple iPhone 7 128 gig Black",
  "image" : undef
}
vs.
{
  "name" : "Apple iPhone 7 - 128GB - Black Unlocked",
  "image" : "sem3-idn/image.jpg",
  "description" : "This sleek …",
  "features" : {
    "RAM Memory": "2 GB",
    "Has Touchscreen": "Y",
    "Has Bluetooth": "Y",
    "Processor Type": "A10 Fusion",
    "Has Flash": "Y",
    "Display Resolution": "3840 x 2160",
    "Display Technology": "LCD",
    "Backlight Type": "LED",
    "Operating System": "iOS 10",
    "Screen Size": "5.5\"",
    "Assembled Product Weight": "6.63 oz",
  }, ...
}
➔ Moving from ML model → ML product isn’t easy. Algorithmic APIs
are “non-deterministic” (vis-a-vis Stripe/Facebook APIs).
➔ Product design is crucial; PMs take note.
➔ Setting customer expectations is crucial; UX designers take note.
➔ Building (multiple) models resilient to different types of data in the
last mile is crucial; data-scientists take note.
Lessons
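One possible reading of "multiple models resilient to different types of data" is to normalize each incoming record and route it to a model variant that matches the fields actually present. The sketch below assumes that reading; the variant names, field names and normalization rule are hypothetical.

```python
import re


def normalize_name(name: str) -> str:
    """Rewrite informal capacity spellings ('128 gig') as '128GB'."""
    return re.sub(r"(\d+)\s*(?:gigs?|gb)\b", r"\1GB", name,
                  flags=re.IGNORECASE)


def choose_model(record: dict) -> str:
    """Pick a (hypothetical) model variant from the fields that are present."""
    has_image = bool(record.get("image"))
    has_features = bool(record.get("features"))
    if has_image and has_features:
        return "text+image+features"
    if has_image:
        return "text+image"
    return "text-only"


sparse = {"name": "Apple iPhone 7 128 gig Black", "image": None}
rich = {
    "name": "Apple iPhone 7 - 128GB - Black Unlocked",
    "image": "sem3-idn/image.jpg",
    "features": {"Processor Type": "A10 Fusion"},
}

sparse_route = choose_model(sparse)
rich_route = choose_model(rich)
sparse_name = normalize_name(sparse["name"])
```

Routing this way lets a sparse record be scored by a model trained on sparse inputs, instead of feeding missing fields to a model that expects them.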
Fin!
Contact
semantics3.com/blog
govindc.com
medium.com/@govind201
twitter.com/@govind201
