Challenges of hierarchical classification in e-commerce

Pablo Montalvo
Rakuten Institute of Technology Paris
Rakuten Technology Conference
October 27, 2018

2
An international perspective from
Taiwan
Brazil (.com.br)
France (.fr)
Japan (.co.jp)
Germany (.de)
U.S.A. (.com)
Image acquisition date: Sept. 29, 2018

3
Catalog
Families: SPORTS, PHONES
Categories: Sportswear, Shoes
Sub-categories: Running pants, Running shoes, Mountain shoes

4
Categories (.fr)
More than 8 million books
Fewer than 1,000 DVD players
Around 200,000 shoes
*Numbers chosen for illustrative purposes.

5
Categories (.fr)
Taxonomy tree is unbalanced
READING (~10 million)
Books (8 million), Brochures (400)
Comics (3 million), Textbooks (1 million), Literature (few)
*Numbers chosen for illustrative purposes.
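One common way to put a number on this kind of imbalance, and to compensate for it, is inverse-frequency class weighting. The sketch below is only an illustration: it uses the slide's illustrative counts for the two sibling categories Books and Brochures, and the function names are hypothetical, not taken from the talk.

```python
# Illustrative sibling counts copied from the slide (not real catalog data).
counts = {"Books": 8_000_000, "Brochures": 400}


def imbalance_ratio(counts):
    """Ratio between the largest and the smallest sibling category."""
    return max(counts.values()) / min(counts.values())


def inverse_frequency_weights(counts):
    """Weight each class by total / (n_classes * count).

    Rare classes get large weights, frequent classes get small ones,
    and every class ends up with the same total weighted mass.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {name: total / (n_classes * c) for name, c in counts.items()}


ratio = imbalance_ratio(counts)
weights = inverse_frequency_weights(counts)
print(ratio)
print(weights)
```

With these weights, one Brochures example counts about as much in a weighted loss as 20,000 Books examples, which is exactly the imbalance ratio.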

6
For classification
- Common product (user upload). Prediction: shoes
- Uncommon product (user upload). Prediction: book
For recommender systems
- Harder to recommend relevant items after purchase of rare ones
- Will sometimes recommend something completely random, confusing the user

7
Input: vector of ~millions of numbers (pixels; red, green and blue)
Convolutional Neural Network: low-level features
Embedding: low-dimensional vector (~100 numbers)

8
Representation Space
The distances in the embedding space are representative of the differences or similarities between images.
- User purchases a product: recommend a similar (close) product
- Unknown image is uploaded: compute its embedding, then classify it the same as its neighbors (here: shoe)
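The two uses of the representation space described on this slide can be sketched with a plain nearest-neighbor search. This is a minimal sketch under stated assumptions: the 2-D vectors are made-up toy embeddings (real ones would have around 100 dimensions and come from the network), Euclidean distance is assumed as the metric, and the function names are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(query, labeled_embeddings, k=3):
    """Classify `query` by majority vote among its k nearest neighbors.

    `labeled_embeddings` is a list of (vector, label) pairs.
    """
    nearest = sorted(labeled_embeddings,
                     key=lambda item: euclidean(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def most_similar(query, labeled_embeddings):
    """Return the closest catalog entry, e.g. to recommend after a purchase."""
    return min(labeled_embeddings, key=lambda item: euclidean(query, item[0]))


# Toy catalog of already-labeled embeddings (hypothetical values).
catalog = [
    ((0.9, 1.0), "shoe"),
    ((1.1, 0.8), "shoe"),
    ((1.0, 1.2), "shoe"),
    ((5.0, 5.1), "book"),
    ((5.2, 4.9), "book"),
]

unknown = (1.0, 1.0)  # embedding of a newly uploaded image
predicted = knn_classify(unknown, catalog, k=3)
recommended = most_similar(unknown, catalog)
print(predicted, recommended)
```

The same distance function serves both tasks: classification takes a vote among the neighbors, recommendation returns the neighbor itself.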

9
We also want the inclusions in the embedding space to reflect the taxonomy of the data
Sampling is explicit, which helps with imbalance
CLOTHING: SHOES, BAGS
ELECTRONICS: PHONES, LAPTOPS
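The slide does not spell out how the sampling works. One plausible reading, shown here purely as an assumption, is to walk the taxonomy when drawing training examples: pick a family uniformly, then a category within it, then an item. The catalog sizes and all names below are hypothetical.

```python
import random
from collections import Counter

# Hypothetical toy catalog: family -> category -> list of item ids.
# Category sizes are deliberately very unbalanced.
catalog = {
    "CLOTHING": {
        "SHOES": [f"shoe-{i}" for i in range(1000)],
        "BAGS": [f"bag-{i}" for i in range(10)],
    },
    "ELECTRONICS": {
        "PHONES": [f"phone-{i}" for i in range(100)],
        "LAPTOPS": [f"laptop-{i}" for i in range(5)],
    },
}


def sample_item(catalog, rng):
    """Draw one item by descending the taxonomy with uniform choices."""
    family = rng.choice(sorted(catalog))
    category = rng.choice(sorted(catalog[family]))
    item = rng.choice(catalog[family][category])
    return family, category, item


rng = random.Random(0)  # fixed seed so the sketch is reproducible
draws = [sample_item(catalog, rng) for _ in range(4000)]
category_counts = Counter(category for _, category, _ in draws)
print(category_counts)
```

Each category is drawn about equally often even though SHOES has 200 times more items than LAPTOPS. The cost is that items in small categories are repeated many times, which is a form of oversampling.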

10
Hierarchical encapsulation trained on the CIFAR-100 dataset
- Low-resolution images in 20 classes, 5 subclasses each
- Visible clustering that respects the taxonomy

11
• Data is generally unbalanced and taxonomy can be messy in e-commerce
• Sampling heuristics help (oversampling, frequency-weighting), but they are inconvenient to train with and resource-heavy
• Taxonomy IS information. Take care of it! (Beware of taxonomies that are messy or unbalanced by design)
• Predicting products hierarchically is an unused inductive bias that can help all derived applications (classification, recommendation)
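To illustrate the last point, here is a minimal sketch of top-down prediction compared with flat prediction. The talk does not specify how hierarchical prediction is implemented, so this is only one simple possibility: the taxonomy reuses category names from the deck's examples, the scores are made up, and the function names are hypothetical.

```python
# Hypothetical two-level taxonomy: family -> categories.
taxonomy = {
    "CLOTHING": ["SHOES", "BAGS"],
    "ELECTRONICS": ["PHONES", "LAPTOPS"],
}


def predict_flat(category_scores):
    """Ignore the taxonomy: pick the highest-scoring category overall."""
    return max(category_scores, key=category_scores.get)


def predict_top_down(family_scores, category_scores, taxonomy):
    """Pick the best family first, then the best category among its children."""
    family = max(taxonomy, key=lambda f: family_scores.get(f, 0.0))
    category = max(taxonomy[family], key=lambda c: category_scores.get(c, 0.0))
    return family, category


# Made-up classifier scores for one uploaded product.
family_scores = {"CLOTHING": 0.8, "ELECTRONICS": 0.2}
category_scores = {"SHOES": 0.30, "BAGS": 0.25, "PHONES": 0.35, "LAPTOPS": 0.10}

flat = predict_flat(category_scores)
hier = predict_top_down(family_scores, category_scores, taxonomy)
print(flat, hier)
```

In this toy case the flat prediction is PHONES, while the top-down prediction is SHOES because the family-level decision rules out ELECTRONICS. The constraint keeps a wrong answer inside the right family, which is usually a less confusing error for classification and recommendation alike.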
