Large Scale Recommendation: a view from the Trenches
1. Large scale recommendation: a view from the trenches
Anne-Marie Tousch
Senior Research Scientist
51èmes Journées de Statistiques de la SFdS
2. Outline
1. Context & problem setting,
2. One large-scale solution,
3. Open problems.
Large scale recommendation: a view from the trenches JDS'19 2 / 26
3. Context
What Criteo does: online personalized advertising.
5. Personalized advertising
We buy ad placements.
We recommend products.
We sell clicks that lead to sales.
6. Context
Daily: 300B bid requests; 4B displays.
Worldwide: 3 billion shoppers; 1 billion products.
8. Recommendation
A user = timeline of products browsed.
Task: find products she wants to buy
9. Recommendation
A user = timeline of products browsed on catalog A.
Task: find products she wants to buy, in catalog B.
11. Large-scale recommender systems
Co-event counters + nearest neighbors: easy, strong baseline.
Matrix factorization (MF) now scales.
Neural networks are state-of-the-art, but how do they scale?
12. Matrix factorization
Classical recommender system setting:
Product set P, m = |P|; a product v_j, j ∈ [m].
User u_i = {v_{j_1}, …, v_{j_i}}.
Interaction matrix A_{i,j} = δ[v_j ∈ u_i] (or ratings, or counts), A ∈ R^{n×m}.
Factorize A with truncated SVD to obtain user and product embeddings of dimension k ≪ min(m, n):
A ≈ U · Σ · V*
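As a minimal numerical sketch of this setting (toy data and variable names are ours, not from the slides), truncated SVD of a small binary interaction matrix yields k-dimensional user and product embeddings:

```python
import numpy as np

# Toy interaction matrix: n users x m products, A[i, j] = 1 if user i browsed product j.
rng = np.random.default_rng(0)
A = (rng.random((8, 6)) < 0.3).astype(float)

# Truncated SVD: keep only the top-k singular triplets.
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
user_emb = U[:, :k] * s[:k]      # user embeddings, shape (n, k)
item_emb = Vt[:k, :].T           # product embeddings, shape (m, k)

# Rank-k reconstruction A ≈ U_k Σ_k V_k*
A_k = user_emb @ item_emb.T
print(A_k.shape)
```

By Eckart-Young, this rank-k product is the best rank-k approximation of A in Frobenius norm.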
15. Large-scale MF
What if m ≈ n ≈ 10^7–10^9?
Idea: use sketching.
Johnson-Lindenstrauss lemma, 1984
Let ε ∈ (0, 1) and A be a set of n points in R^d. Let k be an integer with k = O(ε⁻² log n). Then there exists a mapping f : R^d → R^k such that for any a, b ∈ A:
(1 − ε)‖a − b‖² ≤ ‖f(a) − f(b)‖² ≤ (1 + ε)‖a − b‖²
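One mapping that achieves this guarantee with high probability is a scaled Gaussian random projection, f(x) = xR/√k with i.i.d. N(0, 1) entries in R. A quick numerical check (dimensions and seed are illustrative choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 100, 2000, 500
X = rng.normal(size=(n, d))           # n points in R^d
R = rng.normal(size=(d, k))
Y = X @ R / np.sqrt(k)                # projected points in R^k

# Distortion of one pairwise squared distance: should be close to 1.
a, b = X[0] - X[1], Y[0] - Y[1]
ratio = (b @ b) / (a @ a)
print(round(ratio, 2))
```

The relative distortion concentrates around 1 with standard deviation roughly √(2/k), so larger sketches give tighter distance preservation.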
18. Randomized SVD¹
Stage A: compute an approximate basis for the range of the input matrix A. In other words, we require a matrix Q for which Q has orthonormal columns and A ≈ QQ*A.
Stage B: use Q to help compute a standard factorization (QR, SVD, etc.) of A:
Form the matrix B = Q*A.
Compute an SVD of the small matrix: B = ŨΣV*.
Form the orthonormal matrix U = QŨ.
¹ Halko, Martinsson, and Tropp, “Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions”.
19. Randomized SVD
Draw an n × ℓ standard Gaussian matrix Ω.
Form Y_0 = AΩ and compute its QR factorization Y_0 = Q_0 R_0.
For j = 1, 2, …, q:
  Form Ỹ_j = A* Q_{j−1} and compute its QR factorization Ỹ_j = Q̃_j R̃_j.
  Form Y_j = A Q̃_j and compute its QR factorization Y_j = Q_j R_j.
Q = Q_q.
Apply stage B:
B := QᵀA; Bᵀ = Q̃R̃ = Q̃(V̂ŜÛᵀ); U := QÛ.
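The two-stage scheme above can be sketched compactly in numpy (the helper name, oversampling parameter, and test matrix are our illustrative choices, not from the slides):

```python
import numpy as np

def randomized_svd(A, k, q=2, oversample=10):
    """Randomized SVD with q subspace (power) iterations, in the style of Halko et al."""
    rng = np.random.default_rng(0)
    m, n = A.shape
    ell = k + oversample
    # Stage A: range finder with re-orthonormalization after each multiply.
    Q, _ = np.linalg.qr(A @ rng.normal(size=(n, ell)))
    for _ in range(q):
        Qt, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Qt)
    # Stage B: SVD of the small matrix B = Q* A.
    B = Q.T @ A
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    return U[:, :k], s[:k], Vt[:k, :]

# Exactly rank-5 test matrix: the rank-5 approximation should be near-exact.
rng = np.random.default_rng(1)
A = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 300))
U, s, Vt = randomized_svd(A, k=5)
err = np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A)
print(err < 1e-6)
```

The intermediate QR factorizations play the role of the Q̃_j/Q_j steps above: they keep the iterates well conditioned, which is the point of subspace iteration.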
20. Randomized decomposition
Draw an n × ℓ standard Gaussian matrix Ω.
Form Y_0 = AΩ and compute its QR factorization Y_0 = Q_0 R_0.
For j = 1, 2, …, q:
  Normalize the rows of Q_{j−1}.
  Form Ỹ_j = A* Q_{j−1} and compute its QR factorization Ỹ_j = Q̃_j R̃_j.
  Normalize the rows of Q̃_j.
  Form Y_j = A Q̃_j and compute its QR factorization Y_j = Q_j R_j.
Q = Q_q.
Skip stage B.
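Our reading of this modified iteration, as a sketch (function name and parameters are ours; the row normalization makes each user/product direction unit-norm before the next multiply, and Q itself serves as the embedding since stage B is skipped):

```python
import numpy as np

def normalized_range_finder(A, ell, q=2):
    """Range finder with row normalization between multiplies; stage B skipped."""
    rng = np.random.default_rng(0)

    def rownorm(Q):
        return Q / np.maximum(np.linalg.norm(Q, axis=1, keepdims=True), 1e-12)

    Q, _ = np.linalg.qr(A @ rng.normal(size=(A.shape[1], ell)))
    for _ in range(q):
        Qt, _ = np.linalg.qr(A.T @ rownorm(Q))
        Q, _ = np.linalg.qr(A @ rownorm(Qt))
    return Q  # used directly as user embeddings

A = np.random.default_rng(1).normal(size=(100, 40))
Q = normalized_range_finder(A, ell=10)
print(Q.shape)
```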
21. Matrix factorization vs. Word2Vec
“For a negative-sampling value of k = 1, the Skip-Gram objective is factorizing a word-context matrix in which the association between a word and its context is measured by f(w, c) = PMI(w, c).”²
We approximate Skip-Gram by factorizing a PMI matrix with:
P = A*A ∈ R^{m×m}
PMI_{i,j} := log [ ( P_{i,j} · Σ_{i′,j′} P_{i′,j′} ) / ( Σ_{j′} P_{i,j′} · Σ_{i′} P_{j,i′} ) ]
² Omer Levy and Yoav Goldberg. “Neural word embedding as implicit matrix factorization”. In: Advances in Neural Information Processing Systems. 2014, pp. 2177–2185.
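This PMI matrix is straightforward to form from the interaction matrix; a small sketch (toy data and the positive-PMI clipping are our illustrative choices):

```python
import numpy as np

# A is the (user x product) interaction matrix; P = A^T A counts product
# co-occurrences within user timelines.
rng = np.random.default_rng(0)
A = (rng.random((50, 8)) < 0.4).astype(float)
P = A.T @ A

total = P.sum()
row = P.sum(axis=1, keepdims=True)   # Σ_{j'} P_{i,j'}
col = P.sum(axis=0, keepdims=True)   # Σ_{i'} P_{j,i'} (P is symmetric)
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log(P * total / (row * col))

# Positive PMI is the usual practical variant: clip -inf and negative values to 0.
ppmi = np.where(np.isfinite(pmi) & (pmi > 0), pmi, 0.0)
print(ppmi.shape)
```

Factorizing this (e.g. with the randomized SVD above) gives product embeddings that approximate the Skip-Gram objective.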
22. Approximate nearest neighbors
Project the user into the embedding space,
recommend the top-k nearest neighbors to the user in product space.
Problem: if the different catalogs are not aligned, the nearest neighbors are almost always the same.
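The retrieval step itself is simple; a brute-force version (at the scale discussed here one would use an approximate nearest-neighbor index instead; data and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
k = 3
products = rng.normal(size=(1000, 16))   # product embeddings
user = rng.normal(size=(16,))            # user projected into the same space

# Exact top-k by Euclidean distance.
dist = np.linalg.norm(products - user, axis=1)
topk = np.argsort(dist)[:k]              # indices of the k closest products
print(len(topk))
```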
24. Open questions
Pb1: popularity biases
E.g., recommending high-frequency items is a strong baseline strategy.
⇒ fairness and diversity issues.
Also: high-frequency users, big vs. small advertisers, ...
28. Open questions
Pb2: the organic traffic bias
Metric: predict next item?
But: we want to predict incremental sales. Had we not recommended this product, would the user still have bought it?
Idea: learn embeddings to optimize individual treatment effects³.
³ Bonner and Vasile, “Causal embeddings for recommendation”.
29. Open questions
Pb2: the organic traffic bias
Simulation environment⁴: https://github.com/criteo-research/reco-gym
⁴ David Rohde et al. “RecoGym: A Reinforcement Learning Environment for the problem of Product Recommendation in Online Advertising”. In: arXiv preprint arXiv:1808.00720 (2018).
30. Open questions
Pb3: the unbounded number of products
Large-scale neural networks: a variational auto-encoder example
“[Use...] function fθ(·) ∈ R^I to produce a probability distribution over m items π(z_u) ...”ᵃ
What if I = 10^7 or 10^9?
ᵃ Dawen Liang et al. “Variational autoencoders for collaborative filtering”. In: Proceedings of the 2018 World Wide Web Conference on World Wide Web. International World Wide Web Conferences Steering Committee. 2018, pp. 689–698.
32. Open questions
Pb3: the unbounded number of products
Idea: use a group testing scheme with a binary p × m matrix H:
h(y) = H ∨ y
⇒ work with p pseudo-items instead of m products.
“Theorem: Suppose we wish to recover a k-sparse binary vector y ∈ R^m. A random binary {0, 1} matrix A where each entry is 1 with probability ρ = 1/k recovers a 1 − ε proportion of the support of y correctly with high probability, for any ε > 0, with p = O(k log m). This matrix will also detect e = Ω(p) errors.”⁵
Question: can we do better knowing that the item frequency follows a power law?
⁵ Ubaru and Mazumdar, “Multilabel classification with group testing and codes”.
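The group-testing idea can be sketched end to end: compress a k-sparse label vector y into p boolean "pseudo-item" measurements z = H ∨ y, then recover the support with a naive decoder (the constant in p and the decoder are our illustrative choices, not from the cited theorem):

```python
import numpy as np

rng = np.random.default_rng(0)
m, k = 1000, 5
p = 8 * k * int(np.log(m))                  # p = O(k log m) measurements
H = rng.random((p, m)) < 1.0 / k            # each entry 1 with probability ρ = 1/k

# k-sparse binary label vector y.
y = np.zeros(m, dtype=bool)
support = rng.choice(m, size=k, replace=False)
y[support] = True

# Boolean OR measurement z = H ∨ y.
z = (H.astype(int) @ y.astype(int)) > 0

# Naive decoder: item j is a candidate iff every test containing j fired.
candidates = [j for j in range(m) if np.all(z[H[:, j]])]
print(set(support) <= set(candidates))
```

True support items always survive this decoder; false positives must fire in every one of their ~p/k tests, which becomes exponentially unlikely as p grows.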
33. Thanks! Questions?
Reach out to me at:
am.tousch@criteo.com or on Twitter @amy8492
34. References
Bonner, Stephen and Flavian Vasile. “Causal embeddings for recommendation”. In: Proceedings of the 12th ACM Conference on Recommender Systems. ACM. 2018, pp. 104–112.
Halko, Nathan, Per-Gunnar Martinsson, and Joel A Tropp. “Finding structure with
randomness: Probabilistic algorithms for constructing approximate matrix
decompositions”. In: SIAM review 53.2 (2011), pp. 217–288.
Levy, Omer and Yoav Goldberg. “Neural word embedding as implicit matrix
factorization”. In: Advances in neural information processing systems. 2014,
pp. 2177–2185.
Liang, Dawen et al. “Variational autoencoders for collaborative filtering”. In:
Proceedings of the 2018 World Wide Web Conference on World Wide Web.
International World Wide Web Conferences Steering Committee. 2018, pp. 689–698.
Rohde, David et al. “RecoGym: A Reinforcement Learning Environment for the
problem of Product Recommendation in Online Advertising”. In: arXiv preprint
arXiv:1808.00720 (2018).
Ubaru, Shashanka and Arya Mazumdar. “Multilabel classification with group testing and codes”. In: Proceedings of the 34th International Conference on Machine Learning, Volume 70. JMLR.org. 2017, pp. 3492–3501.