Webinar - Product Matching - Palombo (20160428)

Presented by Alon Palombo

  1. Dato Confidential. Fraud Detection Webinar. Product Matching Webinar. Alon Palombo, Data Scientist, alon@dato.com
  2. Agenda
     • Who is Dato?
     • Data science workflow
     • What is product matching?
     • Demo using real public data
     • Questions
  3. Dato: We Intelligent Applications. 45+ and growing fast!
  4. Customers
  5. Data Science workflow (diagram labels: Unstructured Data; Ingest, Transform, Model, Deploy)
  6. What is product matching?
     • In 2016, global e-commerce sales are expected to reach $1.92 trillion.
     • Online retailers and price comparison sites curate product catalogues by aggregating from multiple sources.
     • Product matching is the task of keeping these catalogues free of duplicates, full of attributes per product, and consistent across different sites.
  7. Difficulty (figure labels: Structured Attributes, Reviews, Images, Description; Aggregate Multiple Sources). Source: Thor, Andreas. "Toward an adaptive String Similarity Measure for Matching Product Offers." GI Jahrestagung (1). 2010.
  8. Definition
     • Ironically, there are similar names for very similar problems:
     • Entity resolution
     • Record linking
     • De-duplication
     • Reference reconciliation
     • Data matching
     • and more…
  9. Definition
     • In GraphLab Create we distinguish between Record Linkage and De-duplication.
     • Record Linkage refers to matching structured query records to a fixed set of reference records with the same schema.
     • De-duplication refers to assigning an entity label to each row. Records with the same label are likely to correspond to the same real-world entity.
  10. Product matching demo – using real public data
  11. Summary
     • Product matching is at the heart of e-commerce.
     • Many relevant similar problems with similar solutions.
     • Easy exploration, modeling, and evaluation using GraphLab Create.
  12. Our machine learning course: https://www.coursera.org/learn/ml-foundations
  13. Questions? alon@dato.com
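
The distinction between Record Linkage and De-duplication drawn in the Definition slides can be illustrated with a small sketch. It is not the GraphLab Create API and not the code from the demo: the function names `similarity`, `link_records`, and `deduplicate`, the 0.8 threshold, and the use of Python's standard-library `difflib` as the string similarity measure are all assumptions made for illustration. Products are reduced to a single title string, whereas real catalogues would compare several attributes.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1] (difflib ratio).

    Assumption: a stand-in for whatever distance a real toolkit would use.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_records(queries, references, threshold=0.8):
    """Record linkage: match each query to a fixed set of reference records.

    Returns a dict mapping every query to its most similar reference,
    or to None when no reference reaches the threshold.
    """
    links = {}
    for query in queries:
        best = max(references, key=lambda ref: similarity(query, ref))
        links[query] = best if similarity(query, best) >= threshold else None
    return links


def deduplicate(records, threshold=0.8):
    """De-duplication: assign an entity label to every row.

    Greedy single pass: a record takes the label of the first earlier
    representative it resembles closely enough, otherwise it starts a
    new entity. Rows sharing a label are likely the same real-world item.
    """
    labels = []
    representatives = []  # one example record per entity label
    for record in records:
        for label, rep in enumerate(representatives):
            if similarity(record, rep) >= threshold:
                labels.append(label)
                break
        else:
            representatives.append(record)
            labels.append(len(representatives) - 1)
    return labels


if __name__ == "__main__":
    catalogue = ["Apple iPhone 6 16GB", "Canon EOS 70D"]
    offers = ["CANON EOS 70D", "Nikon D3300"]
    print(link_records(offers, catalogue))
    print(deduplicate(catalogue + offers))
```

Record linkage needs a trusted reference set and leaves unmatched queries unlinked, while de-duplication works on a single table and labels every row. The greedy pass above is order-dependent and compares each record against every representative, so it is only suitable for small examples.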
