Multi Task DPP for Basket Completion by Romain WARLOP, Fifty Five

Tensorized DPP for Basket
Completion
September, 27th 2018Romain WARLOP
With Jérémie Mary (Criteo) and Mike Gartrell (Criteo)

The objective of basket completion is to suggest to a user one or several items
according to items already in her cart
Associative Classifier DPP
Definition (Confidence rule)
𝑐𝑜𝑛𝑓 𝐴 → 𝐵 =
𝑠𝑢𝑝𝑝(𝐴 ∪ 𝐵)
𝑠𝑢𝑝𝑝(𝐴)
Definition (Lift rule)
𝑙𝑖𝑓𝑡 𝐴 → 𝐵 =
𝑠𝑢𝑝𝑝(𝐴 ∪ 𝐵)
𝑠𝑢𝑝𝑝 𝐴 𝑠𝑢𝑝𝑝(𝐵)
Let 𝐴, 𝐵 be set of items
Past baskets are then analyzed to compute all possible
confidence and lift.
A minimum support threshold, confidence threshold and
lift threshold are then define. All rules 𝐴 → 𝐵 that satisfy
three condition are selected for recommendation
Very heavy computation
Not scalable to large catalog
Kernel matrix
containing
item-item
similarity
item catalog
itemcatalog
𝐿 =
Let 𝐿 be the kernel matrix associated with the DPP
𝐿 defined a discrete DPP such that the probability to
observe the set 𝐴 is proportional to det 𝐿 𝐴 with 𝐿 𝐴 the
principal submatrix of 𝐿 indexed by items in 𝐴
𝑝 𝐴 =
det 𝐿 𝐴
det(𝐿 + 𝐼)
DPP are suitable to model co-
purchase probability
For a long time, associative classifiers have been the state-of-the-art for basket completion until Determinantal Point
Processes (DPPs) show significant improvement. One can also add constraints (e.g. lower price, different category) to classic CF
solutions.

Multiple reasons make DPP relevant for basket completion
Quadratic number of parameters in the number of items 𝑝 while the number of sets grows
exponentially with 𝑝
Entries of the matrix measure similarity between items
Enforce diversity in the sampled set
det 𝐿{1,2} = det
𝐿11 𝐿12
𝐿21 𝐿22
= 𝐿11 𝐿22 − 𝐿12
2
item 1 popularity
correlation between
items 1 and 2

Assuming a low-rank constraint on the kernel matrix allows fast training
[Gartrell et al., 2017]
Let 𝒑 be the number of items in the catalog. We assume that the kernel matrix associated with the DPP is
low-rank of rank 𝐾.
Thus there exist a matrix 𝑽 ∈ ℝ 𝒑×𝑲
such that
Learning
Let ℬ = ℬ1, ⋯ , ℬ 𝑀 a collection of observed baskets – that is subsets of items.
Maximizing the regularized log-likelihood by gradient descent allows to estimate matrix 𝑉
𝑓 𝑉 =
𝑚=1
𝑀
log 𝑝(ℬ 𝑚|𝑉) −
𝛼
2
𝑖=1
𝑝
𝜆𝑖 𝑉𝑖
2
=
𝑚=1
𝑀
log det(𝐿[𝑚]) − 𝑀 log det(𝐿 + 𝐼) −
𝛼
2
𝑖=1
𝑝
𝜆𝑖 𝑉𝑖
2
inversely proportional to
item popularity
𝐿 = 𝑉𝑉 𝑇

Pros
• Efficient learning
• Low memory (𝑝 × 𝐾 coefficient to store)
• Fast prediction
• Scalable to large datasets
Cons
• Baskets larger than 𝐾 have probability 0 by construction
• Model the probability to buy a set of product together
instead of the relevance of the additional item

1. Each target item, noted 𝜏, is model by its own kernel 𝑳 𝝉 ∈ ℝ 𝒑×𝒑
2. Item bias captured in a separate diagonal matrix
3. All those kernels form a cubic tensor 𝑳 ∈ ℝ 𝑝×𝑝×𝑝
which is assumed to be low rank
4. Conversion probability is obtained by applying a logistic like function
We introduce a logistic tensorized extension to low-rank DPP
Goal Directly model the relevance of buying an additional product instead of global coherence of the set

𝐿 𝜏 = 𝑉𝑅 𝜏
2
𝑉 𝑇
+ D2
We introduce a logistic tensorized extension to low-rank DPP
insure to have a valid kernel
Target 𝜏 kernel DPP Basket items latent
factors, common to
all tasks
Basket items biasTarget 𝜏
latent factors
𝑝(𝑦𝜏|ℬ) = 𝜙 ℬ y 𝜏 1 − 𝜙 ℬ
1−𝑦 𝜏
𝜙 ℬ = 1 − 𝑒−𝑤 𝑑𝑒𝑡 𝐿ℬ
= 𝜎(𝑤 𝑑𝑒𝑡 𝐿ℬ)
scaling parameter
Goal Directly model the relevance of buying an additional product instead of global coherence of the set

fifty-five confidential and proprietary 8
We validated our approach on four real world datasets
Unordered baskets Ordered baskets
• Amazon Baby Registries
• Diaper category: 100 products, 10k baskets, 2.4
products/basket
• Diaper+Apparel+Feedings: 3 disjoints categories,
300 products, 17k baskets, 2.6 products/basket
• Belgian Retail Supermarket
• 16,470 products, 88k baskets, 9.6 products/basket
• UK Retail
• 4,071 products, 22k baskets, 18.5 products/basket
• Some basket contains more than 100 products
• Instacart
• Ordered baskets
• Online grocery shopping dataset
• More than 200k users, 50k products, 3M baskets split
over three datasets: train, test, prior
• Filter out test and prior datasets, baskets with less
than 2 products and products that appeared less than
15 times
• Result: 10,531 products, 700k baskets

We adopt different testing protocols according to the type of baskets
one item is
remove at
random
training set: 70% of baskets
test set: remove one item at random, apply model on left
items. Compute performance on the removed item.
Three protocols
1
Remove one item at random.
For tensorized DPP, the removed item is the target
and is removed at random in both training and
test set.
2
Remove last added item.
For tensorized DPP, the removed item is the target
and is removed in both training and test set.
3
Only tensorized DPP
Training set: target chosen at random
Test set: target is the last added item
Unordered baskets Ordered baskets

And compared it with several baselines
• Our models
• Logistic DPP
• Multi Task DPP without bias (𝐷 ≡ 0)
• Multi Task DPP
• Baselines
• Poisson Factorization (PF): [Gopalan et al., 2013] is a probabilistic
matrix factorization model generally used for recommendation
applications with implicit feedback. one basket = one user.
• Recurrent Neural Network (RNN): [Hidasi et al., 2016] adapted for
session-based recommendations.
• Factorization Machines (FM): [Rendle, 2010] is a general approach
that models 𝑑th-order interactions using low-rank assumptions.
Usually 𝑑 = 2. one basket = one user.
• Low-Rank DPP: [Gartrell et al., 2017].
• Bayesian Low-Rank DPP: [Gartrell et al., 2016] Bayesian learning
of the low-rank DPP model.
• Associative Classifier

Model performance is evaluated according to Mean Percentile Rank
and Precision@k
=
All items in catalog,
and not in the
basket, sorted from
the most likely to the
less likely
Precision@kMean Percentile Rank
Percentile
rank of left
item
Percentile
rank of left
item
…
Averaged over all test set. The higher the better.
1 if in top 𝑘
0 otherwise
1 if in top 𝑘
0 otherwise
…
Averaged over all test set. The higher the better.
𝑘
𝑘

Unordered baskets | Performance result on Amazon Diaper dataset
model r MPR Precision@5 Precision@10 Precision@20
Associative Classifier - - 4.16 4.16 4.16
Poisson Factorization 40 50.3 4.78 10.03 19.9
Factorization Machines 60 67.92 24.01 32.62 46.25
Low Rank DPP 30 71.65 25.48 35.80 49.98
Bayesian Low Rank DPP 30 72.38 26.31 36.21 51.51
Logistic DPP 50 71.08 23.7 34.01 48.44
Multi Task DPP no bias 50 77.5 32.7 45.77 61.0
Multi Task DPP 50 78.41 34.73 47.42 62.58
MPR Precision@5 Precision@10 Precision@20
Multi Task DPP vs Low Rank DPP 9.43% 36.28% 32.47% 25.2%

Unordered baskets | Performance result on Amazon Diaper + Apparel +
Feedings dataset
Associative Classifier - - 16.66 16.66 16.66
Low Rank DPP 30 70.10 13.10 18.59 26.92
Logistic DPP 60 69.61 12.65 19.8 27.86
Multi Task DPP no bias 60 88.77 18.33 28.0 43.57
Multi Task DPP 60 89.80 20.53 30.86 45.79
Multi Task DPP vs Low Rank DPP 28.11% 56.71% 66.01% 70.11%

Ordered baskets | Performance result on Instacart
model Protocol MPR Precision@5 Precision@10 Precision@20
Factorization Machines (1) 61.10 4.55 6.3 7.67
Low Rank DPP (1) 76.46 7.37 8.07 9.23
Multi Task DPP (1) 80.46 4.62 7.23 10.51
Factorization Machines (2) 62.47 9.35 10.66 11.92
Low Rank DPP (2) 61.16 7.49 8.05 8.8
RNN (2) 73.31 1.08 1.99 3.2
Multi Task DPP (2) 90.07 9.91 13.67 19.97
Multi Task DPP (3) 80.65 5.23 6.05 9.72
𝑟 = 80 except for FM for which 𝑟 = 5
1
Remove one item at random.
For multi-task DPP, the removed item is the target
and is removed at random in both training and test
set.
2
Remove last added item.
For multi-task DPP, the removed item is the target
and is removed in both training and test set.
3
Only multi-task DPP
Training set: target chosen at random
Test set: target is the last added item

Contributions summary of Multi Task Logistic DPP
• Extension of low rank DPP to model effectively classification
problem on discrete data
• Showed effectiveness on the basket completion task
• Model can scale to large catalog thanks to the tensor low rank
formulation
• Training can be parallelized using mini batch gradient descent

Paris• London• Hong Kong•NewYork•Shanghai
Thank you for your attention
Do you have any questions?
www.fifty-five.com | romain@fifty-five.com

Unordered baskets | Performance result on Belgian Retail dataset
Associative Classifier - - X X X
Low Rank DPP 76 88.52 21.48 23.29 25.19
Logistic DPP 76 87.35 21.17 23.11 25.77
MultiTask DPP no bias 76 87.42 21.02 23.35 25.13
MultiTask DPP 76 87.72 21.46 23.37 25.57
MultiTask DPP vs Low Rank DPP -0.9% -0.1% 0.34% 1.52%

Unordered baskets | Performance result on UK Retail dataset
Associative Classifier - - X X X
Low Rank DPP 100 82.74 3.07 4.75 7.6
Logistic DPP 100 75.23 3.18 4.99 7.83
MultiTask DPP no bias 100 77.67 3.82 5.98 9.11
MultiTask DPP 100 78.25 4.0 6.2 9.4
MultiTask DPP vs Low Rank DPP -5.43% 30.29% 30.53% 23.68%

Multi Task DPP for Basket Completion by Romain WARLOP, Fifty Five

Recommended

Recommended

More Related Content

Similar to Multi Task DPP for Basket Completion by Romain WARLOP, Fifty Five

Similar to Multi Task DPP for Basket Completion by Romain WARLOP, Fifty Five (20)

More from recsysfr

More from recsysfr (20)

Recently uploaded

Recently uploaded (20)

Multi Task DPP for Basket Completion by Romain WARLOP, Fifty Five