Machine Learning (ML) for
eCommerce and Retail
Dr. Andrei Lopatenko
Director of Engineering,
Recruit Institute of Technology
Recruit Holdings
formerly Walmart Labs, Google (twice), Apple (twice)
andrei@recruit.ai
ML for eCommerce
• Search and browse for commerce sites and
applications
• Help users to find and discover items they
will purchase
• Maximize revenue/profit per user session
Search
Search - ranking
ranking
Search - LHN (Left Hand Navigation)
Search spell correction
Search type ahead
Browse
Search data size
• Catalogue items
• 8 M items now, compared with ~400 M at
Amazon / eBay
• Expected to grow 10x in the near future
• 2 K text description per item + images
• Several hundred structured attributes
per catalog
Search – user searches
• Tens of millions per day
• Tens of billions of sessions per year
• Online sales: 13.2 B per year
(http://fortune.com/2015/11/17/walmart-ecommerce/)
• 500 B per year in offline store sales (8% of
the US economy) in ~11 K stores
• Number of transactions: ~10 B (public
data)
ML addressable problems
• Learning to rank
• Given a query, what is the list of items
with the highest probability of conversion
(purchase), ATC (add to cart), or page view?
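A minimal sketch of the ranking step, assuming some upstream model already estimates the three probabilities named above for each query-item pair. The `Candidate` fields and the `WEIGHTS` values are hypothetical and chosen only for illustration; the slides do not say how the signals are combined.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    item_id: str
    p_purchase: float  # estimated P(purchase | query, item)
    p_atc: float       # estimated P(add to cart | query, item)
    p_view: float      # estimated P(page view | query, item)


# Hypothetical weights: a purchase is valued more than an ATC or a page view.
WEIGHTS = {"purchase": 1.0, "atc": 0.3, "view": 0.05}


def score(c: Candidate) -> float:
    """Weighted sum of the three predicted engagement probabilities."""
    return (WEIGHTS["purchase"] * c.p_purchase
            + WEIGHTS["atc"] * c.p_atc
            + WEIGHTS["view"] * c.p_view)


def rank(candidates, k=10):
    """Return the top-k candidates, highest score first."""
    return sorted(candidates, key=score, reverse=True)[:k]
```

In practice the weights (or the whole scoring function) would be learned from click and purchase logs rather than set by hand.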
ML addressable problems
• Typeahead
• Given a sequence of characters typed by the
user, what are the most probable completions,
and what are the most probable items the user
wants to buy?
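One simple baseline for the completion part, assuming a log of past queries is available: suggest the most frequent past queries that start with the typed prefix. The function names and the log format are assumptions for illustration only.

```python
from collections import Counter


def build_index(query_log):
    """Count how often each normalized query appears in the log."""
    return Counter(q.strip().lower() for q in query_log)


def complete(prefix, counts, k=3):
    """Return up to k past queries starting with prefix, most frequent first."""
    prefix = prefix.lower()
    matches = [(q, n) for q, n in counts.items() if q.startswith(prefix)]
    # Sort by descending frequency, then alphabetically for stable ties.
    matches.sort(key=lambda qn: (-qn[1], qn[0]))
    return [q for q, _ in matches[:k]]
```

A linear scan is fine for a sketch; a production system would more likely use a trie or a precomputed prefix table to stay within latency limits.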
ML addressable problems
• Spell correction
• Given a user query, what is the query the
user actually wanted to type?
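A minimal sketch of one classic approach, assuming a vocabulary with frequency counts: pick the known word with the smallest Levenshtein distance to the typed token, breaking ties by frequency. The `max_dist` threshold and the vocabulary are hypothetical; this is a baseline, not the method used in the talk.

```python
def edit_distance(a, b):
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]


def correct(token, vocab_counts, max_dist=2):
    """Return the closest vocabulary word, or the token itself if none is close."""
    if not vocab_counts or token in vocab_counts:
        return token
    # Smallest distance first; among equals, the more frequent word wins.
    dist, _, best = min((edit_distance(token, w), -n, w)
                        for w, n in vocab_counts.items())
    return best if dist <= max_dist else token
```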
ML addressable problems
• Cold start
• Given a new item with its set of
attributes and no history of sales or
exposure on the site, predict the item's sales
and its sales per query
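One way to illustrate the idea, assuming existing catalog items have known sales: estimate the new item's sales from the items whose attributes overlap most with it. The Jaccard similarity, the `k` nearest neighbours and the data layout are all assumptions made for this sketch.

```python
def jaccard(a, b):
    """Jaccard similarity between two attribute dicts (as sets of key/value pairs)."""
    pairs_a, pairs_b = set(a.items()), set(b.items())
    union = pairs_a | pairs_b
    return len(pairs_a & pairs_b) / len(union) if union else 0.0


def predict_sales(new_attrs, catalog, k=2):
    """Similarity-weighted average of sales over the k most similar items.

    catalog is a list of (attributes_dict, observed_sales) pairs.
    """
    if not catalog:
        raise ValueError("catalog must not be empty")
    neighbours = sorted(((jaccard(new_attrs, attrs), sales)
                         for attrs, sales in catalog),
                        key=lambda p: p[0], reverse=True)[:k]
    weight = sum(sim for sim, _ in neighbours)
    if weight == 0:
        # Nothing similar: fall back to the catalog-wide mean.
        return sum(sales for _, sales in catalog) / len(catalog)
    return sum(sim * sales for sim, sales in neighbours) / weight
```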
ML addressable problems
• Prediction of LHN
• Given a user query, what is the best set of
facets and facet values, i.e. the set with the
highest probability of users interacting with
them and finally buying an item?
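A rough sketch of a counting baseline, assuming logged events for one query of the form (facet shown, user interacted, user purchased). The event format and the smoothing `prior` are hypothetical.

```python
from collections import defaultdict


def rank_facets(events, k=3, prior=1.0):
    """Rank facets by the smoothed rate of 'interacted and then purchased'.

    events: iterable of (facet, interacted, purchased) tuples for one query.
    prior is added to the denominator so rarely shown facets are not overrated.
    """
    shown = defaultdict(int)
    success = defaultdict(int)
    for facet, interacted, purchased in events:
        shown[facet] += 1
        if interacted and purchased:
            success[facet] += 1
    rate = {f: success[f] / (shown[f] + prior) for f in shown}
    return sorted(rate, key=lambda f: (-rate[f], f))[:k]
```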
ML addressable problems
• Query understanding
• Given a query, build a semantic parse of
the query and tag tokens with attributes: blue
tshirts for teenagers -> blue:color
tshirts:type for:opt
teenagers:agerestriction10-20
• Classification: blue tshirts for teenagers ->
type:apparel, price preference: 10-30,
releaseyearpreference: 2014-2016
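To make the expected output of token tagging concrete, here is a toy tagger that reproduces the example above with a hand-written lexicon. The `LEXICON` is hypothetical; a real system would more likely learn the tags with a sequence model, and the lookup only stands in for it.

```python
# Hypothetical lexicon covering just the example query from the slide.
LEXICON = {
    "blue": "color",
    "tshirts": "type",
    "for": "opt",
    "teenagers": "agerestriction10-20",
}


def tag_query(query, lexicon=LEXICON):
    """Tag each whitespace-separated token; unknown tokens get 'unknown'."""
    return [(tok, lexicon.get(tok, "unknown")) for tok in query.lower().split()]


def format_parse(tagged):
    """Render tags in the token:attribute notation used on the slide."""
    return " ".join(f"{tok}:{tag}" for tok, tag in tagged)
```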
ML addressable problems
• Related searches
• Given a query, which queries are
either semantically close to it or
represent coinciding user interests?
• Nike shoes -> adidas shoes, sport shoes
• Coffee mugs -> travel mugs, photo coffee
mugs, cappuccino cups
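A minimal sketch of the "coinciding interests" side, assuming search sessions are available as lists of queries: two queries are related if they often appear in the same session. The session data and function names are assumptions; semantic closeness would need a separate signal such as embeddings.

```python
from collections import Counter, defaultdict
from itertools import permutations


def cooccurrence(sessions):
    """Count, for each query, how often every other query shares a session with it."""
    co = defaultdict(Counter)
    for session in sessions:
        # set() so a query repeated within one session is counted once.
        for a, b in permutations(set(session), 2):
            co[a][b] += 1
    return co


def top_related(query, co, k=3):
    """Return the k queries that co-occur most often with the given query."""
    ranked = sorted(co.get(query, {}).items(), key=lambda qn: (-qn[1], qn[0]))
    return [q for q, _ in ranked[:k]]
```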
ML addressable problems
• Product discovery
• Help users explore the product assortment
• Drive users to diverse products
• Reduce the risk of selecting irrelevant items
• Help to find price, quality, brand, etc.
alternatives
• Reduce pigeonhole risk
• Provide relevant data to make a decision
ML addressable problems
• Image similarity
• Given images of an item, return other
items whose images are visually appealing
to users who like the original item
(appealing by shape? color? texture?),
leading to high conversion in
recommendations
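One common way to frame this, assuming each item image has already been mapped to an embedding vector by some vision model: recommend the items whose vectors have the highest cosine similarity to the original. The embeddings here are made-up two-dimensional vectors for illustration; whether similarity in this space tracks shape, color or texture depends on the model.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def most_similar(item_id, embeddings, k=2):
    """Return the k other items closest to item_id by cosine similarity."""
    query = embeddings[item_id]
    others = [(cosine(query, vec), other)
              for other, vec in embeddings.items() if other != item_id]
    others.sort(key=lambda p: (-p[0], p[1]))
    return [other for _, other in others[:k]]
```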
ML addressable problems
• Voice search
• Given voice input, reply with a list of the
best items
• “what are the cheapest samsung tvs in the
store”
• “what is best deal on queen bed today?”
ML addressable problems
• Extraction of item attributes
• Given an item, what are its attributes:
brand, color, size (wheel, screen, height,
S/M/XL, Queen/Twin/King/Full), gender,
pattern, shape, features?
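As a rule-based baseline for comparison with a learned extractor, a few regular expressions can pull some of these attributes out of an item title. The patterns, attribute names and color list are hypothetical and far from complete; for example, "Full" would also match "Full HD".

```python
import re

# Hypothetical patterns covering a handful of the attributes listed above.
SCREEN_SIZE = re.compile(r"(\d+(?:\.\d+)?)[\s-]*inch", re.IGNORECASE)
BED_SIZE = re.compile(r"\b(queen|twin|king|full)\b", re.IGNORECASE)
COLOR = re.compile(r"\b(black|white|blue|red)\b", re.IGNORECASE)


def extract_attributes(title):
    """Return a dict of the attributes that the patterns find in the title."""
    attrs = {}
    match = SCREEN_SIZE.search(title)
    if match:
        attrs["screen_size_in"] = float(match.group(1))
    match = BED_SIZE.search(title)
    if match:
        attrs["bed_size"] = match.group(1).lower()
    match = COLOR.search(title)
    if match:
        attrs["color"] = match.group(1).lower()
    return attrs
```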
ML addressable problems
• Representations of users: actions on
websites/apps -> searches, clicks,
browsing behaviour; products -> purchase
preferences, reviews, ratings, return rates
ML addressable problems
• Title generation: how to generate the title
that will yield the maximum conversion
rate
• Which product attributes to select for the
title?
What makes a good title?
Limits
• Most models should be served in
production
• 50 ms per prediction
• Part of a big system, memory limit ~10 G
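A small sketch of how the latency limit could be checked around any prediction function. `LATENCY_BUDGET_MS` mirrors the 50 ms figure above; `timed_predict` and `model_fn` are hypothetical names, not part of any serving framework.

```python
import time

LATENCY_BUDGET_MS = 50.0  # prediction budget taken from the slide


def timed_predict(model_fn, features, budget_ms=LATENCY_BUDGET_MS):
    """Run model_fn(features) and report elapsed time against the budget.

    Returns (prediction, elapsed_ms, within_budget).
    """
    start = time.perf_counter()
    prediction = model_fn(features)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return prediction, elapsed_ms, elapsed_ms <= budget_ms
```

Measuring a single call is only indicative; tail latency over many requests is usually what matters for a production limit.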
Retail
Retail
• Key directions that require machine
learning:
• discounting tools
• coupons and rewards
• loyalty
• inventory management
Inventory management
• Customers want to buy products
• Customers have diverse needs
• Products should be in stock, ideally in
warehouses close to customers
• but it’s expensive to store products
• Problem: how many products of each type
should be stored, and when should product
supply be refilled?
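For reference, the textbook (non-ML) baseline for these two questions is the economic order quantity, sqrt(2DS/H), together with a reorder point of demand over the lead time plus safety stock. The sketch below assumes constant, known demand, which is exactly the assumption an ML demand forecast would replace; the parameter names are illustrative.

```python
import math


def economic_order_quantity(annual_demand, order_cost, holding_cost_per_unit):
    """Classic EOQ: the order size that balances ordering and holding costs."""
    return math.sqrt(2.0 * annual_demand * order_cost / holding_cost_per_unit)


def reorder_point(daily_demand, lead_time_days, safety_stock=0.0):
    """Stock level at which to reorder: demand during lead time plus a buffer."""
    return daily_demand * lead_time_days + safety_stock
```

For example, with an annual demand of 1200 units, an order cost of 100 and a holding cost of 6 per unit per year, the order quantity is 200 units.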
Questions?
• andrei@recruit.ai
