Product matching is the challenge of examining two different representations of retail products (think items that you see on e-commerce websites) and determining whether they both refer to the same product. Tackling this problem requires a mix of NLP (to deal with text data), computer vision (to deal with product images), ontology management and more (to ingest a host of other signals on offer).
I’ve been working on this problem in various capacities for a few years now at Semantics3. During this period, I’ve made a fair number of mistakes which in turn have taught me useful lessons about applying deep/machine learning in an industry setting.
During this talk, I’d like to walk you through 5 scenarios in which I set out to achieve a particular goal in the context of product matching, but ran into an unexpected problem that threw a spanner in the works. For each one, I’ll discuss the root cause behind the problem and the lesson I learned from uncovering it. Where relevant, I’ll bring in examples from outside the retail domain to broaden the perspective offered.
The goal of the talk isn’t to provide a guidebook for solving the product matching problem; rather, it is to give you insight into the ups and downs of working through a specific data-science problem and, in the process, to deliver packaged lessons that you can draw on in your own field of work.
9. Build a good dataset for training and validation
● Matches (1): Manually curated by humans
● Non-Matches (0): Semi-automated heuristic-based generation
Goal
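One common way to generate non-matches semi-automatically is to pair distinct products that share a category, yielding hard negatives rather than trivially dissimilar pairs. The sketch below is illustrative, not the talk’s actual pipeline; the field names and pairing rule are assumptions.

```python
# Sketch of heuristic non-match generation (field names are illustrative).
# Positives come from human curation; negatives are synthesized by pairing
# each product with a *different* product from the same category, so the
# model sees hard negatives rather than trivially dissimilar pairs.
import itertools

products = [
    {"id": "a1", "category": "shoes", "title": "Nike Air Zoom Pegasus 38"},
    {"id": "a2", "category": "shoes", "title": "Adidas Ultraboost 21"},
    {"id": "b1", "category": "laptops", "title": "MacBook Air M1 256GB"},
    {"id": "b2", "category": "laptops", "title": "Dell XPS 13 512GB"},
]

def generate_non_matches(products):
    """Pair distinct products within a category and label them 0 (non-match)."""
    pairs = []
    for p, q in itertools.combinations(products, 2):
        if p["category"] == q["category"] and p["id"] != q["id"]:
            pairs.append((p["id"], q["id"], 0))  # label 0 = non-match
    return pairs

print(generate_non_matches(products))
```

In practice such heuristics need auditing: a pairing rule that is too easy (e.g. cross-category pairs) teaches the model a shortcut rather than the matching task itself, which is exactly the kind of dataset quirk discussed next.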
15. ➔ Watch out for quirks in your training dataset, especially causal vs. incidental relationships.
➔ Models don’t care about your problem; they only care about minimizing loss.
➔ When working on your own custom problems, you can’t assume your dataset is flawless (unlike peer-reviewed, standardized datasets).
Lessons
16. ➔ “Automated Inference on Criminality using Face Images” [Wu & Zhang, Nov 2016]
➔ Reported ~90% accuracy in identifying “criminality” (AlexNet)
“[…] the angle θ from nose tip to two mouth corners is on average 19.6% smaller for criminals
than for non-criminals and has a larger variance. Also, the upper lip curvature ρ is on average
23.4% larger for criminals than for noncriminals. On the other hand, the distance d between
two eye inner corners for criminals is slightly narrower (5.6%) than for non-criminals.”
Aside
17. ➔ Teardown (Link)
◆ Bias towards collared shirts?
◆ Bias against younger people?
◆ Bias towards likelihood of conviction or criminality?
Aside
21. Find the odd one out:
1. Map of Arizona
2. Map of AR
3. Map of Arkansas
No underlying rule here
Cause
22. ➔ Sift out knowledge-based tasks from logic-based tasks.
➔ “Never mind a neural network; can a human with no prior knowledge, educated on nothing but a diet of your training dataset, solve the problem?”
➔ Spending hours poring over your dataset can be rewarding.
Lessons
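The Arizona/AR/Arkansas example is knowledge-based: no amount of training on surface strings can teach a model that “AR” denotes Arkansas. A hedged sketch of injecting that knowledge directly, with a deliberately truncated lookup table (the table and function names are illustrative):

```python
# Minimal sketch: injecting domain knowledge the training data cannot supply.
# A model seeing only surface strings has no way to learn that "AR" denotes
# Arkansas; a lookup table resolves the ambiguity before matching.
STATE_ABBREVIATIONS = {"AZ": "arizona", "AR": "arkansas"}  # truncated for brevity

def normalize_title(title):
    """Lowercase tokens and expand known state abbreviations."""
    tokens = title.split()
    expanded = [STATE_ABBREVIATIONS.get(t, t.lower()) for t in tokens]
    return " ".join(expanded)

# "Map of AR" now normalizes to the same string as "Map of Arkansas",
# while "Map of Arizona" stays distinct.
print(normalize_title("Map of AR"))  # map of arkansas
```

After normalization, the odd one out becomes a plain string-equality question, i.e. a logic-based task a model (or a human with no prior knowledge) can actually solve.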
24. Combine multiple models built on individual signals into a single multimodal model
Goal
25. Problem
Combined model only slightly better than the best individual model

Model         Accuracy
Text only     X%
Image only    Y%
Image + Text  max(X, Y) + ε%
26. ➔ Combined model had learned to only consider unimodal features / the stronger of the two signals.
➔ It had failed to learn correlations between images and text.
➔ Since our text and image models had been pre-trained separately, they’d learned isolated, unrelated representations.
Cause
How do we learn shared representations?
29. ➔ Check if your multimodal models have been able to learn meaningful correlations / shared representations.
➔ If you want your network to develop a characteristic, explicitly set an objective to achieve this goal (autoencoder example).
Lessons
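One cheap way to run that check is a modality-ablation probe: zero out one modality’s features and measure how much the prediction moves. The sketch below uses a dummy model (all names and values are illustrative) that deliberately ignores its image features, reproducing the failure mode described above.

```python
# Diagnostic sketch: does a "multimodal" model actually use both modalities?
# Ablate (zero out) each modality in turn and measure the prediction shift.

def dummy_combined_model(text_feats, image_feats):
    # A degenerate fusion model that learned to rely on text alone.
    return sum(text_feats) / len(text_feats)

def modality_sensitivity(model, text_feats, image_feats):
    """Return how much the prediction moves when each modality is zeroed."""
    base = model(text_feats, image_feats)
    no_text = model([0.0] * len(text_feats), image_feats)
    no_image = model(text_feats, [0.0] * len(image_feats))
    return {"text": abs(base - no_text), "image": abs(base - no_image)}

sens = modality_sensitivity(dummy_combined_model, [0.8, 0.6], [0.9, 0.1])
print(sens)  # image sensitivity is 0.0 -> the image pathway is dead weight
```

A near-zero sensitivity for one modality is a red flag that the network has collapsed onto the stronger signal; an auxiliary objective (e.g. reconstructing one modality from the other, autoencoder-style) is one way to force a shared representation.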
31. ➔ Make a case to the team for replacing our hand-crafted, heuristic-based model with a machine-learning model.
➔ But in benchmark tests, for certain pockets of data, the simplistic heuristic-based approach performed better!
Goal & Problem
32. For these pockets of data, one or more of the following was at play:
➔ Our training data wasn’t rich enough.
➔ Our model hadn’t been perfectly tuned.
➔ Our older hand-crafted features were surprisingly good.
Cause
34. Lessons
➔ Hand-crafted feature engineering is a potent tool. Critical for best-in-class solutions for image retrieval, tagging and more.
➔ It can be cheaper and quicker than architecture engineering. You can’t deep-learn your way out of everything.
➔ A good way to think up features is to retrace your own intermediary cognitive steps.
➔ Find data scientists who are willing to do last-mile grunt work.
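Retracing your own cognitive steps might look like the sketch below: when a human compares two product listings, they check identifiers, scan title overlap, and sanity-check prices. These features are illustrative assumptions, not the talk’s actual feature set.

```python
# Hedged sketch of hand-crafted features for product matching, mirroring the
# intermediary steps a human annotator takes (identifier check, title overlap,
# price sanity check). Field names are illustrative.

def handcrafted_features(a, b):
    # 1. Do both listings carry the same UPC (when present)?
    upc_match = 1.0 if a.get("upc") and a.get("upc") == b.get("upc") else 0.0
    # 2. How much do the titles overlap? (Jaccard similarity of tokens)
    ta, tb = set(a["title"].lower().split()), set(b["title"].lower().split())
    title_jaccard = len(ta & tb) / len(ta | tb) if ta | tb else 0.0
    # 3. Are the prices in the same ballpark? (ratio of cheaper to pricier)
    lo, hi = sorted([a["price"], b["price"]])
    price_ratio = lo / hi if hi else 0.0
    return {"upc_match": upc_match,
            "title_jaccard": title_jaccard,
            "price_ratio": price_ratio}

a = {"upc": "0123", "title": "Nike Pegasus 38 Black", "price": 120.0}
b = {"upc": "0123", "title": "Nike Air Pegasus 38", "price": 110.0}
print(handcrafted_features(a, b))
```

Features like these can feed a simple classifier alongside (or instead of) learned embeddings, which is often cheaper than engineering a new architecture to learn the same cues.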
36. Goal & Problem
➔ Package our model as an AI-as-a-service product offering.
➔ Load the model behind a metered firewall, and we’re good to go. Right ...?
➔ The service worked well for some customers, but failed miserably for others.
39. ➔ Moving from ML model → ML product isn’t easy. Algorithmic APIs are “non-deterministic” (unlike Stripe/Facebook APIs).
➔ Product design is crucial; PMs take note.
➔ Setting customer expectations is crucial; UX designers take note.
➔ Building (multiple) models resilient to different types of data in the last mile is crucial; data scientists take note.
Lessons
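One product-design pattern that helps set customer expectations is confidence gating: instead of always returning a hard yes/no, the API returns an explicit "needs review" outcome when the model is unsure. The thresholds and response shape below are illustrative assumptions, not a real API.

```python
# Sketch: gating an ML-backed API on model confidence so customers see an
# explicit "needs_review" outcome instead of a silently wrong answer.
# Thresholds and the response shape are illustrative.

def match_response(score, threshold_match=0.9, threshold_no_match=0.1):
    """Map a raw match score to an API decision with an abstention band."""
    if score >= threshold_match:
        return {"decision": "match", "score": score}
    if score <= threshold_no_match:
        return {"decision": "no_match", "score": score}
    return {"decision": "needs_review", "score": score}

print(match_response(0.95))  # confident match
print(match_response(0.5))   # routed to human review / fallback heuristic
```

The abstention band is also where fallback heuristics or specialist models for awkward pockets of data can be plugged in, addressing the last-mile resilience lesson above.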