Determinantal point processes (DPPs) have received significant attention in recent years as an elegant model for a variety of machine learning tasks, due to their ability to jointly model set diversity and item quality or popularity. Recent work has shown that DPPs can be effective models for product recommendation and basket completion tasks. We present an enhanced DPP model specialized for the task of basket completion, the multi-task DPP. We view basket completion as a multi-class classification problem, and leverage ideas from tensor factorization and multi-class classification to design the multi-task DPP model. We evaluate our model on several real-world datasets, and find that the multi-task DPP provides significantly better predictive quality than a number of state-of-the-art models.
Multi Task DPP for Basket Completion by Romain WARLOP, Fifty Five
1. Tensorized DPP for Basket Completion
September 27th, 2018
Romain WARLOP (fifty-five)
With Jérémie Mary (Criteo) and Mike Gartrell (Criteo)
2. The objective of basket completion is to suggest one or several items to a user based on the items already in her cart
Associative Classifier

Let 𝐴, 𝐵 be sets of items.

Definition (Confidence rule): 𝑐𝑜𝑛𝑓(𝐴 → 𝐵) = 𝑠𝑢𝑝𝑝(𝐴 ∪ 𝐵) / 𝑠𝑢𝑝𝑝(𝐴)

Definition (Lift rule): 𝑙𝑖𝑓𝑡(𝐴 → 𝐵) = 𝑠𝑢𝑝𝑝(𝐴 ∪ 𝐵) / (𝑠𝑢𝑝𝑝(𝐴) 𝑠𝑢𝑝𝑝(𝐵))

Past baskets are analyzed to compute the confidence and lift of all possible rules. Minimum support, confidence, and lift thresholds are then defined, and every rule 𝐴 → 𝐵 that satisfies the three conditions is selected for recommendation.

This requires very heavy computation and does not scale to large catalogs.
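As an illustration, a minimal sketch of rule mining with the definitions above, on a tiny hypothetical basket collection (item names and all helper names are invented for this sketch):

```python
from itertools import permutations

# Tiny hypothetical basket collection (item names invented).
baskets = [
    {"milk", "bread"},
    {"milk", "diapers"},
    {"milk", "bread", "diapers"},
    {"bread", "diapers"},
]


def supp(items, baskets):
    """Fraction of baskets containing every item in `items`."""
    return sum(items <= basket for basket in baskets) / len(baskets)


def confidence(a, b, baskets):
    return supp(a | b, baskets) / supp(a, baskets)


def lift(a, b, baskets):
    return supp(a | b, baskets) / (supp(a, baskets) * supp(b, baskets))


# Enumerating every candidate rule is what makes the approach heavy:
# the number of candidate rules grows combinatorially with the catalog size.
catalog = set().union(*baskets)
rules = {(a, b): (confidence({a}, {b}, baskets), lift({a}, {b}, baskets))
         for a, b in permutations(catalog, 2)}
```

Even restricted to single-item rules, the rule table grows quadratically with the catalog, and general rules 𝐴 → 𝐵 over item sets grow combinatorially.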
DPP

Let 𝐿 be the kernel matrix associated with the DPP: a matrix indexed by the item catalog (rows and columns), whose entries contain item-item similarities.

𝐿 defines a discrete DPP such that the probability of observing the set 𝐴 is proportional to det(𝐿_𝐴), with 𝐿_𝐴 the principal submatrix of 𝐿 indexed by the items in 𝐴:

𝑝(𝐴) = det(𝐿_𝐴) / det(𝐿 + 𝐼)

DPPs are therefore suitable to model co-purchase probability.
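The probability formula above can be checked numerically; a minimal sketch with a small hypothetical 3-item kernel (the normalization det(𝐿 + 𝐼) makes subset probabilities sum to 1):

```python
import numpy as np
from itertools import chain, combinations

# Small hypothetical 3-item kernel (symmetric, positive semi-definite).
L = np.array([
    [1.0, 0.4, 0.1],
    [0.4, 1.0, 0.2],
    [0.1, 0.2, 1.0],
])


def dpp_prob(L, A):
    """p(A) = det(L_A) / det(L + I), with L_A the principal submatrix for A."""
    L_A = L[np.ix_(A, A)]
    return np.linalg.det(L_A) / np.linalg.det(L + np.eye(len(L)))


# Sanity check: probabilities of all subsets (including the empty set,
# whose 0x0 determinant is 1) sum to 1.
subsets = chain.from_iterable(combinations(range(3), r) for r in range(4))
total = sum(dpp_prob(L, list(A)) for A in subsets)
```

The check relies on the identity Σ_𝐴 det(𝐿_𝐴) = det(𝐿 + 𝐼), which is exactly why det(𝐿 + 𝐼) appears as the normalizer.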
For a long time, associative classifiers were the state of the art for basket completion, until Determinantal Point Processes (DPPs) showed significant improvements. One can also add constraints (e.g. lower price, different category) to classic collaborative filtering (CF) solutions.
3. Multiple reasons make DPP relevant for basket completion
• Quadratic number of parameters in the number of items 𝑝, while the number of sets grows exponentially with 𝑝
• Entries of the matrix measure similarity between items
• Enforces diversity in the sampled set
det 𝐿_{1,2} = det [𝐿₁₁ 𝐿₁₂; 𝐿₂₁ 𝐿₂₂] = 𝐿₁₁𝐿₂₂ − 𝐿₁₂²

where the diagonal entry 𝐿₁₁ captures item 1's popularity and the off-diagonal entry 𝐿₁₂ the correlation between items 1 and 2.
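As a quick numerical check of the determinant formula above, with hypothetical popularity and correlation values: the more correlated two items are, the lower the score of observing them together, which is the diversity effect.

```python
import numpy as np


def pair_score(pop1, pop2, corr):
    """det of the 2x2 principal submatrix: L11*L22 - L12**2."""
    return np.linalg.det(np.array([[pop1, corr], [corr, pop2]]))


# Equal popularity, different correlations (hypothetical values):
similar_pair = pair_score(1.0, 1.0, 0.9)  # 1 - 0.81 = 0.19
diverse_pair = pair_score(1.0, 1.0, 0.1)  # 1 - 0.01 = 0.99
```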
4. Assuming a low-rank constraint on the kernel matrix allows fast training
[Gartrell et al., 2017]

Let 𝑝 be the number of items in the catalog. We assume that the kernel matrix associated with the DPP is low-rank, of rank 𝐾. Thus there exists a matrix 𝑉 ∈ ℝ^{𝑝×𝐾} such that 𝐿 = 𝑉𝑉ᵀ.

Learning
Let ℬ = {ℬ₁, ⋯, ℬ_𝑀} be a collection of observed baskets, that is, subsets of items. Maximizing the regularized log-likelihood by gradient descent allows us to estimate the matrix 𝑉:

𝑓(𝑉) = Σ_{𝑚=1}^{𝑀} log 𝑝(ℬ_𝑚|𝑉) − (𝛼/2) Σ_{𝑖=1}^{𝑝} 𝜆_𝑖 ‖𝑉_𝑖‖²
     = Σ_{𝑚=1}^{𝑀} log det(𝐿_{[𝑚]}) − 𝑀 log det(𝐿 + 𝐼) − (𝛼/2) Σ_{𝑖=1}^{𝑝} 𝜆_𝑖 ‖𝑉_𝑖‖²

where 𝜆_𝑖 is inversely proportional to the popularity of item 𝑖.
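A minimal sketch of evaluating the objective 𝑓(𝑉) above, assuming a tiny hypothetical catalog and uniform item popularity (the gradient-descent step itself is omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
p, K = 6, 3                          # hypothetical catalog size and rank
V = rng.normal(size=(p, K))          # factor matrix, L = V V^T
baskets = [[0, 2], [1, 3, 4], [0, 5]]
lam = np.ones(p)                     # lambda_i, inversely proportional to popularity


def objective(V, baskets, alpha=0.1):
    """Regularized DPP log-likelihood f(V) from the slide."""
    L = V @ V.T
    log_Z = np.linalg.slogdet(L + np.eye(p))[1]   # log det(L + I)
    log_liks = sum(np.linalg.slogdet(L[np.ix_(B, B)])[1] for B in baskets)
    reg = 0.5 * alpha * np.sum(lam * np.sum(V**2, axis=1))
    return log_liks - len(baskets) * log_Z - reg
```

Using `slogdet` instead of `det` keeps the computation numerically stable for larger kernels.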
5. Pros
• Efficient learning
• Low memory (𝑝 × 𝐾 coefficients to store)
• Fast prediction
• Scalable to large datasets
Cons
• Baskets larger than 𝐾 have probability 0 by construction
• Models the probability of buying a set of products together instead of the relevance of the additional item
6. 1. Each target item, noted 𝜏, is modeled by its own kernel 𝐿^𝜏 ∈ ℝ^{𝑝×𝑝}
2. Item bias is captured in a separate diagonal matrix
3. All these kernels form a cubic tensor 𝐿 ∈ ℝ^{𝑝×𝑝×𝑝}, which is assumed to be low-rank
4. The conversion probability is obtained by applying a logistic-like function
We introduce a logistic tensorized extension to the low-rank DPP
Goal: directly model the relevance of buying an additional product instead of the global coherence of the set
7. We introduce a logistic tensorized extension to the low-rank DPP

𝐿^𝜏 = 𝑉𝑅_𝜏²𝑉ᵀ + 𝐷²

where 𝑉 contains the basket-item latent factors (common to all tasks), 𝑅_𝜏 the latent factors of target 𝜏, and 𝐷 the basket-item biases. Squaring 𝑅_𝜏 and 𝐷 ensures a valid (positive semi-definite) kernel.

𝑝(𝑦_𝜏|ℬ) = 𝜙(ℬ)^{𝑦_𝜏} (1 − 𝜙(ℬ))^{1−𝑦_𝜏}
𝜙(ℬ) = 1 − 𝑒^{−𝑤 det 𝐿_ℬ} = 𝜎(𝑤 det 𝐿_ℬ)

with 𝑤 a scaling parameter.

Goal: directly model the relevance of buying an additional product instead of the global coherence of the set
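A minimal sketch of the multi-task kernel and 𝜙(ℬ) above, with hypothetical dimensions and random factors (all variable names are this sketch's, not the talk's):

```python
import numpy as np

rng = np.random.default_rng(1)
p, K = 5, 3                      # hypothetical catalog size and rank
V = rng.normal(size=(p, K))      # basket-item latent factors, shared across tasks
R = rng.normal(size=(p, K))      # row tau = latent factors of target item tau
D = rng.normal(size=p)           # basket-item biases
w = 1.0                          # scaling parameter


def phi(basket, tau):
    """phi(B) = 1 - exp(-w det(L^tau_B)) for target item tau."""
    # Squaring R_tau and D keeps the kernel positive semi-definite.
    L_tau = V @ np.diag(R[tau] ** 2) @ V.T + np.diag(D**2)
    L_B = L_tau[np.ix_(basket, basket)]
    return 1.0 - np.exp(-w * np.linalg.det(L_B))
```

Because 𝐿^𝜏 is positive semi-definite, det(𝐿_ℬ) ≥ 0, so 𝜙(ℬ) always lies in [0, 1) and can serve as a conversion probability.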
8. fifty-five confidential and proprietary 8
We validated our approach on four real-world datasets

Unordered baskets
• Amazon Baby Registries
  • Diaper category: 100 products, 10k baskets, 2.4 products/basket
  • Diaper+Apparel+Feeding: 3 disjoint categories, 300 products, 17k baskets, 2.6 products/basket
• Belgian Retail Supermarket
  • 16,470 products, 88k baskets, 9.6 products/basket
• UK Retail
  • 4,071 products, 22k baskets, 18.5 products/basket
  • Some baskets contain more than 100 products

Ordered baskets
• Instacart
  • Online grocery shopping dataset
  • More than 200k users, 50k products, 3M baskets split over three datasets: train, test, prior
  • From the test and prior datasets, we filter out baskets with fewer than 2 products and products that appear fewer than 15 times
  • Result: 10,531 products, 700k baskets
9. We adopt different testing protocols according to the type of baskets

Training set: 70% of baskets. Test set: one item is removed at random from each basket; the model is applied to the remaining items, and performance is computed on the removed item.

Three protocols:
1. Unordered baskets: remove one item at random. For the tensorized DPP, the removed item is the target, and it is removed at random in both the training and test sets.
2. Ordered baskets: remove the last added item. For the tensorized DPP, the removed item is the target, and it is removed in both the training and test sets.
3. Ordered baskets, tensorized DPP only: in the training set the target is chosen at random; in the test set the target is the last added item.
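The shared split-and-hold-out step can be sketched as follows (hypothetical helper, not code from the talk), here for the random-removal case:

```python
import random


def split_baskets(baskets, train_frac=0.7, seed=0):
    """70/30 split; in each test basket, hold out one random item."""
    rng = random.Random(seed)
    shuffled = baskets[:]
    rng.shuffle(shuffled)
    cut = int(train_frac * len(shuffled))
    train = shuffled[:cut]
    test = []
    for basket in shuffled[cut:]:
        remaining = basket[:]
        held_out = remaining.pop(rng.randrange(len(remaining)))
        test.append((remaining, held_out))  # (observed items, item to rank)
    return train, test
```

For the ordered-basket protocols, `pop(rng.randrange(...))` would simply be replaced by popping the last added item.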
10. We compared them with several baselines
• Our models
  • Logistic DPP
  • Multi-task DPP without bias (𝐷 ≡ 0)
  • Multi-task DPP
• Baselines
  • Poisson Factorization (PF) [Gopalan et al., 2013]: a probabilistic matrix factorization model generally used for recommendation applications with implicit feedback. One basket = one user.
  • Recurrent Neural Network (RNN) [Hidasi et al., 2016]: adapted for session-based recommendations.
  • Factorization Machines (FM) [Rendle, 2010]: a general approach that models 𝑑th-order interactions using low-rank assumptions, usually with 𝑑 = 2. One basket = one user.
  • Low-Rank DPP [Gartrell et al., 2017].
  • Bayesian Low-Rank DPP [Gartrell et al., 2016]: Bayesian learning of the low-rank DPP model.
  • Associative Classifier
11. Model performance is evaluated according to Mean Percentile Rank and Precision@k

For each test basket, all catalog items not already in the basket are sorted from most likely to least likely according to the model.

Mean Percentile Rank (MPR): the percentile rank of the held-out item in this sorted list, averaged over the test set. The higher the better.

Precision@k: 1 if the held-out item appears in the top 𝑘, 0 otherwise, averaged over the test set. The higher the better.
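Both metrics can be sketched as follows, assuming items are indexed by integers and the model outputs one score per catalog item (function names are this sketch's own):

```python
def percentile_rank(scores, basket, held_out):
    """Percentile rank (0-100) of the held-out item among non-basket items."""
    candidates = [i for i in range(len(scores)) if i not in basket]
    ranked = sorted(candidates, key=lambda i: -scores[i])
    return 100.0 * (1.0 - ranked.index(held_out) / (len(ranked) - 1))


def precision_at_k(scores, basket, held_out, k):
    """1 if the held-out item is ranked in the top k, 0 otherwise."""
    candidates = [i for i in range(len(scores)) if i not in basket]
    ranked = sorted(candidates, key=lambda i: -scores[i])
    return 1.0 if held_out in ranked[:k] else 0.0
```

In practice both values are averaged over every (basket, held-out item) pair in the test set.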
12. Unordered baskets | Performance results on the Amazon Diaper dataset

model | r | MPR | Precision@5 | Precision@10 | Precision@20
Associative Classifier | - | - | 4.16 | 4.16 | 4.16
Poisson Factorization | 40 | 50.3 | 4.78 | 10.03 | 19.9
Factorization Machines | 60 | 67.92 | 24.01 | 32.62 | 46.25
Low-Rank DPP | 30 | 71.65 | 25.48 | 35.80 | 49.98
Bayesian Low-Rank DPP | 30 | 72.38 | 26.31 | 36.21 | 51.51
Logistic DPP | 50 | 71.08 | 23.7 | 34.01 | 48.44
Multi-task DPP, no bias | 50 | 77.5 | 32.7 | 45.77 | 61.0
Multi-task DPP | 50 | 78.41 | 34.73 | 47.42 | 62.58

Relative improvement, Multi-task DPP vs. Low-Rank DPP: MPR +9.43%, Precision@5 +36.28%, Precision@10 +32.47%, Precision@20 +25.2%
14. Ordered baskets | Performance results on Instacart

model | Protocol | MPR | Precision@5 | Precision@10 | Precision@20
Factorization Machines | (1) | 61.10 | 4.55 | 6.3 | 7.67
Low-Rank DPP | (1) | 76.46 | 7.37 | 8.07 | 9.23
Multi-task DPP | (1) | 80.46 | 4.62 | 7.23 | 10.51
Factorization Machines | (2) | 62.47 | 9.35 | 10.66 | 11.92
Low-Rank DPP | (2) | 61.16 | 7.49 | 8.05 | 8.8
RNN | (2) | 73.31 | 1.08 | 1.99 | 3.2
Multi-task DPP | (2) | 90.07 | 9.91 | 13.67 | 19.97
Multi-task DPP | (3) | 80.65 | 5.23 | 6.05 | 9.72

𝑟 = 80, except for FM, for which 𝑟 = 5.

Protocols: (1) remove one item at random; for the multi-task DPP, the removed item is the target and is removed at random in both the training and test sets. (2) remove the last added item; for the multi-task DPP, the removed item is the target and is removed in both sets. (3) multi-task DPP only: the target is chosen at random in the training set and is the last added item in the test set.
15. Contributions summary of the Multi-Task Logistic DPP
• An extension of the low-rank DPP to effectively model classification problems on discrete data
• Demonstrated effectiveness on the basket completion task
• The model scales to large catalogs thanks to the low-rank tensor formulation
• Training can be parallelized using mini-batch gradient descent
16. Paris • London • Hong Kong • New York • Shanghai
Thank you for your attention
Do you have any questions?
www.fifty-five.com | romain@fifty-five.com
17. Unordered baskets | Performance results on the Belgian Retail dataset

model | r | MPR | Precision@5 | Precision@10 | Precision@20
Associative Classifier | - | - | X | X | X
Poisson Factorization | 40 | 87.02 | 21.46 | 23.06 | 23.90
Factorization Machines | 10 | 65.08 | 20.85 | 21.10 | 21.37
Low-Rank DPP | 76 | 88.52 | 21.48 | 23.29 | 25.19
Bayesian Low-Rank DPP | 76 | 89.08 | 21.43 | 23.10 | 25.12
Logistic DPP | 76 | 87.35 | 21.17 | 23.11 | 25.77
Multi-task DPP, no bias | 76 | 87.42 | 21.02 | 23.35 | 25.13
Multi-task DPP | 76 | 87.72 | 21.46 | 23.37 | 25.57

Relative improvement, Multi-task DPP vs. Low-Rank DPP: MPR -0.9%, Precision@5 -0.1%, Precision@10 +0.34%, Precision@20 +1.52%
18. Unordered baskets | Performance results on the UK Retail dataset

model | r | MPR | Precision@5 | Precision@10 | Precision@20
Associative Classifier | - | - | X | X | X
Poisson Factorization | 100 | 73.12 | 1.77 | 2.31 | 3.01
Factorization Machines | 5 | 56.91 | 0.47 | 0.83 | 1.5
Low-Rank DPP | 100 | 82.74 | 3.07 | 4.75 | 7.6
Bayesian Low-Rank DPP | 100 | 61.31 | 1.07 | 1.91 | 3.25
Logistic DPP | 100 | 75.23 | 3.18 | 4.99 | 7.83
Multi-task DPP, no bias | 100 | 77.67 | 3.82 | 5.98 | 9.11
Multi-task DPP | 100 | 78.25 | 4.0 | 6.2 | 9.4

Relative improvement, Multi-task DPP vs. Low-Rank DPP: MPR -5.43%, Precision@5 +30.29%, Precision@10 +30.53%, Precision@20 +23.68%