
Challenges of hierarchical classification in e-commerce



With deep learning, we can now build embeddings from our data. An embedding is a vector of numbers that represents an image, a text, or a sound. This is particularly relevant to e-commerce, because the likeness of two products can be inferred from the similarity between their embedding vectors, and this likeness is a crucial component of recommender systems. E-commerce data also has a taxonomy: products are grouped into categories and subcategories. This hierarchical structure is information in itself that classifiers and machine learning systems could use, but more often than not do not. We will discuss common issues in e-commerce data and possible ways to alleviate some of them.
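The abstract does not say which similarity measure is used to compare two embeddings. Cosine similarity is a common choice, so here is a minimal sketch assuming it; the 4-dimensional vectors and the names `running_shoe`, `trail_shoe` and `paperback` are hypothetical stand-ins for real embeddings (which would have ~100 dimensions).

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two embeddings (1.0 means same direction)."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical embeddings, for illustration only.
running_shoe = [0.9, 0.1, 0.0, 0.3]
trail_shoe = [0.8, 0.2, 0.1, 0.4]
paperback = [0.0, 0.9, 0.7, 0.1]

print(cosine_similarity(running_shoe, trail_shoe))  # close to 1: similar products
print(cosine_similarity(running_shoe, paperback))   # much lower: dissimilar products
```

A recommender built on this would rank catalog items by their similarity to a product the user just bought.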

Published in: Technology

  1. Pablo Montalvo, Rakuten Institute of Technology Paris. Rakuten Technology Conference, October 27, 2018.
  2. An international perspective: Taiwan, Brazil, France (.fr), Japan, Germany (.de), U.S.A. (.com). Image acquisition date: Sept. 29, 2018.
  3. Catalog taxonomy: the Catalog is split into Families (e.g. SPORTS, PHONES), then Categories (e.g. Sportswear, Shoes), then Sub-categories (e.g. Running pants, Running shoes, Mountain shoes).
  4. Categories on .fr differ enormously in size: more than 8 million books, around 200,000 shoes, fewer than 1,000 DVD players. *Numbers chosen for illustrative purposes.
  5. The taxonomy tree is unbalanced (.fr categories): READING (~10 million), Books (8 million), Brochures (400), Comics (3 million), Textbooks (1 million), Literature (few). *Numbers chosen for illustrative purposes.
  6. Consequences of imbalance. For classification: a user-uploaded image of a common product is predicted as shoes, while a user-uploaded image of an uncommon product is predicted as book, one of the dominant categories. For recommender systems: it is harder to recommend relevant items after the purchase of a rare one, and the system will sometimes recommend something completely random, confusing the user.
  7. From pixels to embeddings: an image (red, green and blue channels, i.e. a vector of ~millions of numbers) goes through a convolutional neural network, which extracts low-level features and outputs an embedding, a low-dimensional vector of ~100 numbers.
  8. Representation space: when a user purchases a product, recommend a similar (close) product; when an unknown image is uploaded, compute its embedding and classify it the same as its neighbors (e.g. Shoe). The distances in the embedding space are representative of the differences or similarities between images.
  9. We also want the inclusions in the embedding space to reflect the taxonomy of the data, e.g. regions for CLOTHING, SHOES and BAGS, and an ELECTRONICS region that contains PHONES and LAPTOPS. Sampling is explicit, which helps with imbalance.
  10. Hierarchical encapsulation trained on the CIFAR-100 dataset (low-resolution images in 20 classes with 5 subclasses each): the resulting clustering visibly respects the taxonomy.
  11. • Data is generally unbalanced, and the taxonomy can be messy in e-commerce.
      • Sampling heuristics (oversampling, frequency-weighting) help, but they make training less convenient and are resource-heavy.
      • Taxonomy IS information. Take care of it! (Beware of taxonomies that are messy or unbalanced by design.)
      • Predicting products hierarchically is an unused inductive bias that can help all derived applications (classification, recommendation).
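The imbalance slides (4 and 5) and the sampling heuristics in the conclusion can be made concrete with the illustrative .fr category sizes. This is a sketch, not the deck's method: it assumes the common inverse-frequency weight N / (K · n_c) (the formula scikit-learn uses for `class_weight="balanced"`) and naive oversampling up to the largest class; the helper names are hypothetical.

```python
def inverse_frequency_weights(counts: dict[str, int]) -> dict[str, float]:
    """Per-class loss weight N / (K * n_c): rare classes get large weights."""
    total = sum(counts.values())
    num_classes = len(counts)
    return {label: total / (num_classes * n) for label, n in counts.items()}


def oversampling_factors(counts: dict[str, int]) -> dict[str, float]:
    """How many times each class must be repeated to match the largest class."""
    largest = max(counts.values())
    return {label: largest / n for label, n in counts.items()}


# Illustrative category sizes from the slides (not real catalog figures).
counts = {"books": 8_000_000, "shoes": 200_000, "dvd_players": 1_000}

weights = inverse_frequency_weights(counts)
factors = oversampling_factors(counts)
oversampled_size = sum(counts[c] * factors[c] for c in counts)

for label in counts:
    print(f"{label:12s} weight={weights[label]:10.2f} oversample x{factors[label]:.0f}")
print(f"dataset size after oversampling: {oversampled_size:,.0f}")
```

Oversampling every category to the size of the largest one turns ~8.2 million examples into 24 million, which illustrates why the conclusion calls these heuristics resource-heavy. Frequency-weighting avoids the extra data but gives each DVD-player example thousands of times the weight of a book example.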
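Slide 8 describes two uses of the representation space: recommending the products closest to a purchase, and classifying an uploaded image "the same as its neighbors". A minimal sketch of both, assuming Euclidean distance and a majority vote over the k nearest neighbors; the `Product` type, the 2-dimensional embeddings and the product IDs are hypothetical.

```python
import math
from collections import Counter
from typing import NamedTuple


class Product(NamedTuple):
    product_id: str
    embedding: tuple[float, ...]
    category: str


def nearest_neighbors(query, catalog, k):
    """The k catalog products whose embeddings are closest to `query`."""
    return sorted(catalog, key=lambda p: math.dist(query, p.embedding))[:k]


def recommend_similar(purchased, catalog, k=2):
    """Recommend the k products closest to a purchased one (excluding itself)."""
    others = [p for p in catalog if p.product_id != purchased.product_id]
    return nearest_neighbors(purchased.embedding, others, k)


def classify_by_neighbors(query, catalog, k=3):
    """Label an unknown embedding with the majority category of its k neighbors."""
    votes = Counter(p.category for p in nearest_neighbors(query, catalog, k))
    return votes.most_common(1)[0][0]


# Hypothetical catalog with toy 2-D embeddings.
catalog = [
    Product("shoe-1", (0.90, 0.10), "shoes"),
    Product("shoe-2", (0.80, 0.20), "shoes"),
    Product("shoe-3", (0.85, 0.05), "shoes"),
    Product("book-1", (0.10, 0.90), "books"),
    Product("book-2", (0.20, 0.80), "books"),
]

uploaded = (0.88, 0.12)  # embedding computed for an unknown uploaded image
print(classify_by_neighbors(uploaded, catalog))                          # shoes
print([p.product_id for p in recommend_similar(catalog[0], catalog)])    # ['shoe-3', 'shoe-2']
```

With a heavily unbalanced catalog, the neighborhood of a rare product can be dominated by a frequent category, which is one way the misclassification on slide 6 can arise.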
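Slide 9 asks for inclusions in the embedding space that mirror the taxonomy. The deck does not specify the geometry, so the following sketch assumes each category is a Euclidean ball (center, radius); a child ball lies inside its parent exactly when the distance between centers plus the child radius is at most the parent radius. All regions, coordinates and helper names are hypothetical.

```python
import math
from typing import NamedTuple


class Region(NamedTuple):
    center: tuple[float, ...]
    radius: float


def contains_point(region: Region, point) -> bool:
    return math.dist(region.center, point) <= region.radius


def contains_region(outer: Region, inner: Region) -> bool:
    # Ball `inner` lies inside ball `outer` iff |c_outer - c_inner| + r_inner <= r_outer.
    return math.dist(outer.center, inner.center) + inner.radius <= outer.radius


def taxonomy_violations(regions, parent_of):
    """(child, parent) pairs whose regions are not nested as the taxonomy requires."""
    return [(child, parent) for child, parent in parent_of.items()
            if not contains_region(regions[parent], regions[child])]


def categories_containing(point, regions):
    """All categories whose region contains the point, i.e. its taxonomy path."""
    return [name for name, region in regions.items() if contains_point(region, point)]


# Hypothetical 2-D regions; BAGS deliberately sticks out of CLOTHING.
regions = {
    "ELECTRONICS": Region((0.0, 0.0), 5.0),
    "PHONES": Region((1.0, 1.0), 2.0),
    "LAPTOPS": Region((-2.0, 0.0), 2.5),
    "CLOTHING": Region((10.0, 0.0), 4.0),
    "SHOES": Region((11.0, 0.0), 2.0),
    "BAGS": Region((8.0, 0.0), 3.0),
}
parent_of = {"PHONES": "ELECTRONICS", "LAPTOPS": "ELECTRONICS",
             "SHOES": "CLOTHING", "BAGS": "CLOTHING"}

print(taxonomy_violations(regions, parent_of))        # [('BAGS', 'CLOTHING')]
print(categories_containing((1.5, 1.0), regions))     # ['ELECTRONICS', 'PHONES']
```

When the nesting holds, the set of regions containing an embedding directly reads out a consistent path in the taxonomy.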
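The last conclusion, predicting products hierarchically, can be sketched as greedy top-down prediction over the taxonomy from slide 3: at each level, pick the best-scoring child of the node chosen at the level above. The parent of each sub-category (Running pants under Sportswear, the shoe types under Shoes) is our reading of the slide diagram, and the per-node scores stand in for some hypothetical classifier's output.

```python
# Taxonomy read from slide 3 (parent assignments are an assumption):
# Catalog -> Families -> Categories -> Sub-categories.
TAXONOMY = {
    "Catalog": ["SPORTS", "PHONES"],
    "SPORTS": ["Sportswear", "Shoes"],
    "Sportswear": ["Running pants"],
    "Shoes": ["Running shoes", "Mountain shoes"],
}


def predict_top_down(score, taxonomy, root="Catalog"):
    """Greedy top-down prediction: only children of the chosen node are considered."""
    path = []
    node = root
    while node in taxonomy:  # nodes without listed children are leaves
        node = max(taxonomy[node], key=score)
        path.append(node)
    return path


# Hypothetical per-node scores for one product image.
scores = {
    "SPORTS": 0.8, "PHONES": 0.2,
    "Sportswear": 0.3, "Shoes": 0.7,
    "Running pants": 1.0, "Running shoes": 0.6, "Mountain shoes": 0.4,
}

print(predict_top_down(lambda n: scores[n], TAXONOMY))
# ['SPORTS', 'Shoes', 'Running shoes']
```

A flat argmax over the leaf scores would have picked Running pants, a leaf outside the chosen Shoes category; the top-down walk always returns a path that is consistent with the taxonomy. The trade-off is that an error at the family level cannot be corrected further down.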