Data Science workflow
Ingest → Transform → Model → Deploy
What is product matching?
• In 2016, global e-commerce sales are expected to reach
• Online retailers and price comparison sites curate product
catalogues by aggregating from multiple sources.
• Product matching is the task of keeping these catalogues
free of duplicates, full of attributes per product, and
consistent across different sites.
Thor, Andreas. "Toward an adaptive String Similarity Measure for Matching Product Offers." GI Jahrestagung (1). 2010.
• Ironically, there are similar names for very similar tasks:
• Entity resolution
• Record linking
• Reference reconciliation
• Data matching
• and more…
• In GraphLab Create we distinguish between Record
Linkage and De-duplication.
• Record Linkage refers to matching structured query records
to a fixed set of reference records with the same schema.
• De-duplication refers to assigning an entity label to each
row. Records with the same label are likely to correspond to
the same real-world entity.
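The distinction between the two tasks can be sketched in plain Python. This is only an illustration under stated assumptions, not the GraphLab Create API: the function names, the sample records, the word-level Jaccard similarity, and the 0.5 threshold are all hypothetical choices made for the example.

```python
# Illustrative sketch only: plain Python, not the GraphLab Create API.
# All names, sample records, and the 0.5 threshold are hypothetical.

def tokens(record):
    """Lower-cased set of words across all field values of a record (a dict)."""
    return set(" ".join(str(v) for v in record.values()).lower().split())

def jaccard(a, b):
    """Jaccard similarity between the word sets of two records."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def link_records(queries, references):
    """Record linkage: for each query record, return the index of the
    most similar record in a fixed reference set with the same schema."""
    return [
        max(range(len(references)), key=lambda i: jaccard(q, references[i]))
        for q in queries
    ]

def deduplicate(records, threshold=0.5):
    """De-duplication: assign an entity label to each row. Rows whose
    similarity reaches the threshold are merged with union-find, so
    rows sharing a label likely describe the same real-world entity."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent pointers to the root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if jaccard(records[i], records[j]) >= threshold:
                parent[find(j)] = find(i)
    return [find(i) for i in range(len(records))]

# Hypothetical sample data.
references = [
    {"name": "Canon EOS 70D DSLR camera", "brand": "Canon"},
    {"name": "Nikon D7100 DSLR camera", "brand": "Nikon"},
]
queries = [{"name": "canon eos 70d camera body", "brand": "canon"}]
offers = references[:1] + queries + references[1:]

matches = link_records(queries, references)  # one reference index per query
labels = deduplicate(offers)                 # one entity label per row
```

Record linkage returns, for each query, a pointer into the fixed reference set. De-duplication has no reference set and instead labels every row of a single table. The all-pairs comparison here is quadratic in the number of rows, so a real catalogue would need some form of blocking or nearest-neighbour search.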
Product matching demo – using real public
• Product matching is at the heart of e-commerce.
• Many relevant similar problems with similar solutions.
• Easy exploration, modeling, and evaluation using
Our machine learning course