Amadeus Magrabi

Data Scientist
MACHINE LEARNING PROJECTS @
COMMERCETOOLS
All Rights Reserved © 2017 2
Overview
• Category Recommendations
• Detecting Similar Products (duplicate detection)
• Attribute Normalization
• Detecting Anomalous Orders (fraud detection)
• Detecting Missing Data (attributes, prices, images)
Category Recommendations
• Goal: Predict which categories fit a given product based on product images,
names or descriptions.
• Two API versions:
• Project-specific API recommends only categories defined in a particular
commercetools project.
• General API recommends from a broad set of categories for any image, name or
description.
• Tech: Convolutional neural networks (tensorflow), transfer learning (Inception v3),
natural language processing (spacy), tf-idf (scikit-learn), word2vec (gensim), logistic
regression (scikit-learn), microservices (flask), Google Cloud Compute Engine
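The text-based part of such a recommender can be sketched with two of the tools listed above, tf-idf and logistic regression from scikit-learn. This is a minimal illustration only, not the production setup: the product names, the category labels and the `recommend_categories` helper are all made up for the example, and the image models (Inception v3) are left out.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data: product names with their known categories.
names = [
    "red cotton t-shirt",
    "white cotton t-shirt",
    "blue denim jeans",
    "slim fit jeans",
    "leather running shoes",
    "canvas sneakers shoes",
]
categories = ["shirts", "shirts", "jeans", "jeans", "shoes", "shoes"]

# tf-idf turns each name into a sparse vector; logistic regression
# then learns one set of weights per category.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(names, categories)


def recommend_categories(name, top_k=2):
    """Return the top_k (category, probability) pairs for a product name."""
    probabilities = model.predict_proba([name])[0]
    classes = model.named_steps["logisticregression"].classes_
    ranked = sorted(zip(classes, probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


recommendations = recommend_categories("black denim jeans")
print(recommendations)
```

Returning ranked probabilities rather than a single label lets a user interface show several suggestions and leave the final choice to the merchant.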
Category Recommendations
General API
Category Recommendations
Project-specific API
Category Recommendations
User Interface Integration
Product Similarity
• Goal: Identify the most similar products, either within a project or between two
projects.
• Use cases:
• Detect and clean up duplicate products.
• Product matching: Check whether a product in one project already exists in
another project.
• Use information about product similarity to improve search engine optimization
(e.g. make product descriptions more unique).
• Analyze how similar two projects are.
• Tech: Convolutional neural networks (keras, ResNet), cosine similarity (numpy),
numeric scaling (scikit-learn), string matching (fuzzywuzzy), tf-idf (scikit-learn)
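For the name-based similarity measure, a rough sketch with tf-idf (scikit-learn) and cosine similarity (numpy) could look as follows. The product names are invented for illustration, and the image, numeric and fuzzy string measures mentioned above are not covered here.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical product names; the first two describe the same product.
product_names = [
    "Apple iPhone 7 32GB black",
    "iPhone 7 Apple black 32GB",
    "Samsung Galaxy S8 64GB",
    "Wooden dining table",
]

vectors = TfidfVectorizer().fit_transform(product_names).toarray()

# Cosine similarity: normalize each row to unit length, then take dot products.
unit_vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
similarity = unit_vectors @ unit_vectors.T

# Ignore self-similarity so each product is matched to a different product.
np.fill_diagonal(similarity, -1.0)
most_similar = similarity.argmax(axis=1)

for index, match in enumerate(most_similar):
    print(product_names[index], "->", product_names[match], round(similarity[index, match], 2))
```

Pairs with a similarity close to 1 are duplicate candidates; a threshold for flagging them would have to be tuned per project.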
Product Similarity
/similarities/products/example-store-name?region=EU&staged=true&similarityMeasures=name
Product Similarity
/similarities/products/example-store-name-2?region=EU&staged=true&similarityMeasures=name,image,variantNumber
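The request paths on these two slides follow one pattern: the project key in the path, followed by `region`, `staged` and a comma-separated `similarityMeasures` list in the query string. A small helper for building such a path might look like this; `similarity_path` is a hypothetical name, and the host and authentication are deliberately left out because the slides do not show them.

```python
from urllib.parse import urlencode


def similarity_path(project_key, measures, region="EU", staged=True):
    """Build the request path for the product similarity endpoint shown on the slides."""
    query = urlencode(
        {
            "region": region,
            "staged": str(staged).lower(),
            "similarityMeasures": ",".join(measures),
        },
        safe=",",  # keep the commas between measures readable
    )
    return f"/similarities/products/{project_key}?{query}"


path = similarity_path("example-store-name-2", ["name", "image", "variantNumber"])
print(path)
```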
Attribute Normalization
• Goal:
• Attribute values can be quite inconsistent when projects have low data quality
(e.g. mixed lowercase and uppercase spelling, occasional spelling mistakes,
inconsistent use of abbreviations).
• This API predicts how attribute values can be normalized to match a cleanly
defined set.

• Tech: tf-idf (scikit-learn), cosine similarity (numpy), affinity propagation (scikit-learn)
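One simple way to map messy values onto a clean set is to compare character n-grams with tf-idf and cosine similarity, then pick the closest clean value. The sketch below shows only that nearest-match step; it does not use affinity propagation, and the value lists are invented for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical clean target set and messy values found in a project.
clean_values = ["small", "medium", "large"]
raw_values = ["Small", "MEDIUM", "larg", "medum"]

# Character n-grams are robust to typos; lowercasing removes case differences.
vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 3), lowercase=True)
clean_vectors = vectorizer.fit_transform(clean_values).toarray()
raw_vectors = vectorizer.transform(raw_values).toarray()

# TfidfVectorizer L2-normalizes rows by default, so the dot product
# of two rows is their cosine similarity.
similarity = raw_vectors @ clean_vectors.T

normalized = {
    raw: clean_values[best]
    for raw, best in zip(raw_values, similarity.argmax(axis=1))
}
print(normalized)
```

In practice a minimum similarity would be needed so that values with no sensible match are left for manual review instead of being forced onto the nearest clean value.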
Attribute Normalization
/normalizations/attributes/example-store?attributeName=sizes&attributeValueSet=xs,s,m,l,xl,xxl
Missing Data Analysis
• Goal: Direct merchants' attention to products with a lot of missing data.
• Currently supported:
• how many attributes of the corresponding product type are covered in a
product and whether they contain valid attribute values
• whether product images are missing (takes into account how many images per
product are common in a project)
• whether prices are defined and still valid for selected time frames

• Planned extension:
• Not just detect missing data, but also automatically recommend how it should
be filled.
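The attribute coverage check can be illustrated in a few lines of plain Python. The product data, the list of expected attributes and the `attribute_coverage` helper are assumptions made for this sketch; validity checks on the values, images and prices are not included.

```python
def attribute_coverage(product, expected_attributes):
    """Return the share of expected attributes that have a non-empty value."""
    if not expected_attributes:
        return 1.0
    filled = sum(
        1
        for name in expected_attributes
        if product.get(name) not in (None, "", [])
    )
    return filled / len(expected_attributes)


# Hypothetical product type definition and products.
expected = ["color", "size", "material", "brand"]
products = {
    "shirt-1": {"color": "red", "size": "m", "material": "cotton", "brand": "acme"},
    "shirt-2": {"color": "blue", "size": ""},
    "shirt-3": {},
}

coverage = {key: attribute_coverage(product, expected) for key, product in products.items()}

# Products with the lowest coverage come first, so merchants see them first.
worst_first = sorted(coverage, key=coverage.get)
print(worst_first, coverage)
```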
Missing Data Analysis
/missing-data/attributes/example-store?staged=false&region=EU&productSetLimit=5000&limit=2
Order Anomalies
• Goal: Detect any unusual orders that should be checked for potential fraud.
• Currently supported cases:
• Unusual total cost of an order
• Unusual number of products in an order
• Unusual time between orders of the same user
• Unusual number of orders of the same user
• Machine learning makes sure that the context of individual projects is
automatically taken into account when checking for unusual cases (e.g. orders in a
grocery store and a luxury jewelry store naturally have a very different pattern).

• Tech: IsolationForest (scikit-learn)
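A minimal sketch of how scikit-learn's IsolationForest can flag unusual orders is shown below. The order features (total cost, number of items), the synthetic data and the contamination rate are assumptions for illustration. Fitting one model per project is what lets the notion of "unusual" follow each project's own order pattern.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Hypothetical order features for one project: [total cost, number of items].
typical_orders = np.column_stack(
    [rng.normal(50, 10, 500), rng.integers(1, 6, 500)]
)
suspicious_order = np.array([[5000.0, 40.0]])
orders = np.vstack([typical_orders, suspicious_order])

# contamination is the assumed share of anomalous orders in the data.
forest = IsolationForest(n_estimators=100, contamination=0.01, random_state=0)
labels = forest.fit_predict(orders)  # -1 marks an anomaly, 1 a normal order

anomalous_indices = np.flatnonzero(labels == -1)
print(anomalous_indices)
```

The flagged orders are candidates for a manual fraud check, not a verdict; the time between orders and the order count per user could be added as further feature columns.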
Order Anomalies
/anomalies/orders/example-store?region=EU&orderSetLimit=10000
Thank you!
techblog.commercetools.com
amadeus.magrabi@commercetools.com
www.commercetools.com
