Boosting Product Categorization with Machine Learning Models

Amadeus Magrabi
@amadeusmagrabi

…
11/2017 2
Company:
Customers: People who want to sell something online
…
Main product: REST API to manage online shops

User Interface
Company:

Machine Learning for Category Recommendations
Goal: Use machine learning to automatically recommend categories for products
[Figure: example category tree with the nodes Fashion, Men, Women, Sports, Business, Shoes and Pants]

Challenges
Challenge: Every online store has a different category structure.
[Figure: category trees of two example stores (Store 1 and Store 2), with the nodes Fashion, Men, Women, Jeans, Clothing, Pants, Shirts and Shoes]

Option 1: multiple store-specific models
[Figure: Store 1, Store 2 and Store 3 each have their own model (Model 1, Model 2, Model 3) that predicts the categories of that store]

Option 2: one general model
[Figure: a single model (Model 1) predicts general categories, which are then matched to the categories of Store 1, Store 2 and Store 3]

Advantages of Option 1 (multiple store-specific models):
• Better accuracy for stores with very specific categories
• No category matching necessary
Advantages of Option 2 (one general model):
• More data per model
• More flexible
• Easier to deploy
• Also works for stores with little data
• Can also recommend categories that are not yet defined in the store

Challenges
Challenge: Product data is diverse and unbalanced, which complicates feature selection.
Available product attributes:
• Product names
• Images
• Prices
• Descriptions
• Sizes
• Brands
• Colors
• Expiration Dates
• …
Approach:
→ Focus on names, images and descriptions as features, because they
• carry most information
• are available for most products

Challenges
Challenge: Very large class set
• Amazon and eBay list 50,000+ categories
• Tradeoff: coverage vs. accuracy
Approach:
→ Select broad model categories to cover the main use cases
→ Rely on the category matching procedure to catch more specialized categories
→ The current version has a selection of 723 model categories

Overview of Approach
[Figure: the Name Model, Description Model and Image Model each predict one of the 723 general categories, which are then matched to store categories]

Model for Product Images
• Model: Convolutional Neural Network (Deep Learning)
• Similar to mechanisms in the brain: builds complex representations by combining simple representations
• Trained via transfer learning on the well-known image recognition network Inception v3 (TensorFlow, Google Cloud ML Engine)

Model for Product Names
Preprocessing: (spacy, re, gensim, Google Translate, pyenchant)
• spellchecker (smartwathc → smartwatch)
• translation (German → English)
• tokenization (complete names → words)
• normalization (lowercasing, deleting special characters)
• lemmatization (apples → apple)
• phrasing (louis vuitton → louis_vuitton)
• word removal (stop words, blacklist)
Examples:
“Mens Heavyweight 6.1-ounce, 100% cotton T-Shirts in Regular, Big and Tall Sizes”
“Gala Apples Fresh Fruit, 3 LB Bag”
“Carhartt Men's Maddock Pocket T-Shirt Size M”
“Samsung SM-G900V - Galaxy S5 - 16GB Android Smartphone Verizon + GSM - Black”
Models: (scikit-learn, xgboost)
• Logistic Regression
• Naive Bayes
• Random Forest
• XGBoost
• Support Vector Machine
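As one possible instance of the models listed above, the sketch below combines tf-idf features with logistic regression in scikit-learn. The training names and category labels are made-up toy data, and the default settings of `TfidfVectorizer` and `LogisticRegression` are an assumption, not the configuration from the talk.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical, already preprocessed product names with toy category labels.
names = [
    "samsung galaxy smartphone",
    "apple iphone smartphone",
    "gala apples fresh fruit",
    "organic bananas fruit",
    "cotton t_shirt men",
    "pocket t_shirt size m",
]
labels = ["Electronics", "Electronics", "Food", "Food", "Fashion", "Fashion"]

# Vectorize the names and fit the classifier in one pipeline.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(names, labels)

# Predict categories and class probabilities for unseen names.
predictions = model.predict(["android smartphone 16gb", "fresh fruit bag"])
probabilities = model.predict_proba(["android smartphone 16gb"])[0]
```

`predict_proba` returns one probability per category, which is the kind of output that can later be combined with the predictions of other models.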

Vectorization methods: (text → numbers) 
bag-of-words:
• Simple approach, but sparse representation and blind to context
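A minimal bag-of-words sketch in plain Python may make the two drawbacks concrete. The helper names and the toy product names are assumptions for illustration.

```python
from collections import Counter


def build_vocabulary(tokenized_names):
    """Map every distinct token to a fixed vector index."""
    tokens = sorted({tok for name in tokenized_names for tok in name})
    return {tok: i for i, tok in enumerate(tokens)}


def bag_of_words(tokens, vocabulary):
    """Count how often each vocabulary word occurs in one name."""
    vector = [0] * len(vocabulary)
    for tok, count in Counter(tokens).items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = count
    return vector


# Hypothetical tokenized product names.
names = [
    ["gala", "apple", "fruit"],
    ["apple", "iphone", "case"],
    ["case", "iphone", "apple"],  # same words as above, different order
]
vocabulary = build_vocabulary(names)
vectors = [bag_of_words(name, vocabulary) for name in names]
```

With a realistic vocabulary most vector entries are zero (sparse), and the second and third names get identical vectors because word order is ignored (blind to context).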

tf-idf:
• Similar to bag-of-words, but weights words higher when they occur less frequently in the dataset
• Intuition: “the” has less predictive value than “iPhone”
• TF(w) = (number of times word appears in name) / (total number of words in name)
• IDF(w) = log_e(total number of names / number of names with word w in it)
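The two formulas can be written out directly in plain Python. This sketch multiplies TF and IDF to obtain the final weight, which is the usual convention. The function names and the toy names are assumptions, and library implementations such as scikit-learn use a slightly different, smoothed IDF.

```python
import math
from collections import Counter


def term_frequencies(tokens):
    """TF(w): occurrences of w in the name / total number of words in the name."""
    counts = Counter(tokens)
    return {w: c / len(tokens) for w, c in counts.items()}


def inverse_document_frequencies(tokenized_names):
    """IDF(w): natural log of (number of names / number of names containing w)."""
    n = len(tokenized_names)
    doc_counts = Counter(w for name in tokenized_names for w in set(name))
    return {w: math.log(n / d) for w, d in doc_counts.items()}


def tf_idf(tokenized_names):
    """Return one {word: TF * IDF} dictionary per name."""
    idf = inverse_document_frequencies(tokenized_names)
    return [
        {w: tf * idf[w] for w, tf in term_frequencies(name).items()}
        for name in tokenized_names
    ]


# Hypothetical tokenized product names.
names = [
    ["the", "iphone", "case"],
    ["the", "gala", "apple"],
    ["the", "cotton", "shirt"],
]
weights = tf_idf(names)
```

Because "the" appears in every name, its IDF is log(3/3) = 0 and its weight vanishes, while "iphone" appears in only one name and keeps a positive weight.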

word2vec:
• Trains a two-layer neural network that predicts the context words of a word
• Results in a dense and context-sensitive representation
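Predicting the context words of a word corresponds to the skip-gram variant of word2vec. The sketch below shows only how (word, context word) training pairs might be generated from a tokenized name. The network and its training are omitted, and in practice a library such as gensim would handle both. The function name and the window size are assumptions.

```python
def skipgram_pairs(tokens, window=2):
    """Pair every word with the words at most `window` positions away."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical tokenized product name.
tokens = ["samsung", "galaxy", "android", "smartphone"]
pairs = skipgram_pairs(tokens, window=1)
```

Words that share many context words across the dataset end up with similar vectors, which is what makes the representation context-sensitive.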

Model for Product Names
Model for Product Descriptions

Category Matching
Model categories are matched to store-specific categories via a word2vec model trained on a news dataset
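The matching step can be sketched as a nearest-neighbour search by cosine similarity between word vectors. The three-dimensional vectors below are made-up stand-ins for real word2vec embeddings, the function names are hypothetical, and only single-word category names are handled.

```python
import math

# Hypothetical toy vectors; a real system would load word2vec embeddings.
TOY_VECTORS = {
    "pants": [1.0, 0.0, 0.0],
    "jeans": [0.9, 0.1, 0.0],
    "shoes": [0.0, 1.0, 0.0],
    "sneakers": [0.1, 0.9, 0.0],
    "shirts": [0.0, 0.0, 1.0],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def match_category(model_category, store_categories, vectors=TOY_VECTORS):
    """Return the store category most similar to the model category."""
    query = vectors[model_category]
    return max(
        store_categories,
        key=lambda c: cosine_similarity(query, vectors[c]),
    )


store_categories = ["pants", "shirts", "shoes"]
jeans_match = match_category("jeans", store_categories)
sneakers_match = match_category("sneakers", store_categories)
```

With these toy vectors the model category "jeans" is matched to the store category "pants", and "sneakers" to "shoes".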
[Figure: pipeline diagram with the labels Image Model, predict, averaging class probabilities, match and word2vec similarity]
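Assuming that averaging class probabilities refers to combining the outputs of the name, description and image models, the step might look like the sketch below. The probability values and the function name are made up for illustration.

```python
def average_probabilities(predictions):
    """Average per-category probabilities over several models."""
    categories = predictions[0].keys()
    return {
        c: sum(p[c] for p in predictions) / len(predictions)
        for c in categories
    }


# Hypothetical outputs of the three models for one product.
name_model = {"Shoes": 0.6, "Pants": 0.4}
description_model = {"Shoes": 0.5, "Pants": 0.5}
image_model = {"Shoes": 0.2, "Pants": 0.8}

averaged = average_probabilities([name_model, description_model, image_model])
best_category = max(averaged, key=averaged.get)
```

In this toy case the image model is confident enough to outvote the name model, so the averaged prediction is "Pants".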

REST API
General API

REST API
Store-Specific API

Thank you!
Amadeus Magrabi
@amadeusmagrabi
amadeus.magrabi@commercetools.com
[Figure: pipeline summary. Image model: TensorFlow, Inception. Name and description models: tf-idf, LogReg.]
