SlideShare a Scribd company logo
1 of 27
Download to read offline
Efficient Similarity Computation
for Collaborative Filtering in Dynamic Environments
Olivier Jeunen1
, Koen Verstrepen2
, Bart Goethals1,2
September 18th, 2019
1Adrem Data Lab, University of Antwerp
2Froomle
olivier.jeunen@uantwerp.be
1
Introduction & Motivation
Setting the scene




















u1 i1 t1
u1 i2 t2
u1 i3 t3
u2 i4 t4
u2 i2 t5
u3 i1 t6
u2 i5 t7
u2 i7 t8
u3 i6 t9
. . . . . . . . .




















We deal with implicit feedback: a set of (user, item,
timestamp)-triplets, representing clicks, views, sales,
…
Suppose we have a set of pageviews of this form.
2
Problem statement
In neighbourhood-based collaborative
filtering1, we need to compute similarity
between pairs of items.
Items are represented as sparse,
high-dimensional columns in the
user-item matrix P.




















0 0 0 . . . 0 1 0
1 0 0 . . . 0 0 1
0 0 0 . . . 1 0 0
0 0 1 . . . 0 0 0
. . . . . . . . . . . . . . . . . . . . .
0 1 0 . . . 0 0 0
0 0 0 . . . 0 1 0
0 1 1 . . . 0 0 0
0 0 0 . . . 1 0 0
1 0 1 . . . 0 1 0




















1
Still a very competitive baseline, but often deemed unscalable
3
A need for speed
Typically, the model is periodically recomputed.
For ever-growing datasets, these iterative updates can become very
time-consuming and model recency is often sacrificed.
0 20 40 60 80 100 120 140 160
time
0
10
20
30
40
50
60
runtime
∆t
∆t
...
Iterative model updates over time
4
Previous work
Existing approaches tend to speed up computations through
• Approximation.
• Parallellisation.
• Incremental computation.
But currently existing exact solutions do not exploit the sparsity that is
inherent to implicit-feedback data streams.
5
Contribution & Methodology
Incremental Similarity Computation
In the binary setting, cosine-similarity simplifies to the number of users
that have seen both items, divided by the square root of their individual
numbers.
cos(i, j) =
|Ui ∩ Uj|
|Ui| |Uj|
=
Mi,j
√
Ni Nj
As such, we can compute these building blocks incrementally instead of
recomputing the entire similarity with every update:
N ∈ Nn : Ni = |Ui| and M ∈ Nn×n : Mi,j = |Ui ∩ Uj|.
6
Dynamic Index
Existing approaches tend to build inverted indices in a preprocessing
step… we do this on-the-fly!
Initialise a simple inverted index for every user, to hold their
histories.
For every pageview (u, i):
1. Increment item co-occurence for i and other items seen by u.
2. Update the item’s count.
3. Add the item to the user’s inverted index.
7
Online Learning
As Dynamic Index consists of a single for-loop over the pageviews,
it can naturally handle streaming data.
0 1 2 3 4 5 6
|P| ×103
0
1
2
3
4
runtime(s)
×107
|∆P|
∆t
ti ti+1
Impact of Online Learning
8
Parallellisation Procedure
We adopt a MapReduce-like parallellisation framework:
• Mapping is the Dynamic Index algorithm.
• Reducing two models M = {M, N, L} and M = {M , N , L } is:
1. Summing up M, M and N, N
2. Cross-referencing (u, i)-pairs from L[u] with (u, j)-pairs from L [u].
Step 2 is obsolete if M and M are computed on disjoint sets of users!
9
Recommendability
Often, the set of items that should be considered as recommendations is
constrained by recency, stock, licenses, seasonality, … We denote Rt as
the set of recommendable items at time t, and argue that it is often
much smaller than the full item collection.
Rt I
As such, we only need an up-to-date similarity sim(i, j) if either i or j is
recommendable:
i ∈ Rt ∨ j ∈ Rt
To keep up-to-date with recommendability updates:
add a second inverted index for every user. 10
Experimental Results
Datasets
Table 1: Experimental dataset characteristics.
Movielens* Netflix* News Outbrain
# “events” 20e6 100e6 96e6 200e6
# users 138e3 480e3 5e6 113e6
# items 27e3 18e3 297e3 1e6
mean items per user 144.41 209.25 18.29 1.76
mean users per item 747.84 5654.50 242.51 184.50
sparsity user-item matrix 99.46% 98.82% 99.99% 99.99%
sparsity item-item matrix 59.90% 0.22% 99.83% 99.98%
11
RQ1: Are we more efficient than the baselines?
0.0 0.5 1.0 1.5 2.0
×107
0.0
0.2
0.4
0.6
0.8
1.0
1.2
runtime(s)
×103 Movielens
0.00 0.25 0.50 0.75 1.00
×108
0.00
0.15
0.30
0.45
0.60
0.75
0.90
1.05
×104 Netflix
0.00 0.25 0.50 0.75 1.00
|P| ×108
0
1
2
3
4
5
6
7
runtime(s)
×103 News
0.0 0.5 1.0 1.5 2.0
|P| ×108
0.0
0.2
0.4
0.6
0.8
1.0
1.2
×103 Outbrain
Sparse Baseline Dynamic Index 12
RQ1: Are we more efficient than the baselines?
Observations
• More efficient if M is sparse
• More efficient if users have shorter histories
• Average number of processed interactions per second
ranges from 14 500 to 834 000
13
RQ2: How effective is parallellisation?
0.00 0.25 0.50 0.75 1.00 1.25 1.50 1.75 2.00
×107
0.0
0.2
0.4
0.6
0.8
1.0
1.2
1.4runtime(s) ×103 Movielens
0.0 0.2 0.4 0.6 0.8 1.0
×108
0
1
2
3
4
5
6
7
8
×103 Netflix
0.0 0.2 0.4 0.6 0.8 1.0
|P| ×108
0
1
2
3
4
runtime(s)
×103 News
0.00 0.25 0.50 0.75 1.00 1.25 1.50 1.75 2.00
|P| ×108
0.0
0.5
1.0
1.5
2.0
2.5
3.0
×102 Outbrain
n = 1 n = 2 n = 4 n = 8 14
RQ2: How effective is parallellisation?
Observations
• Speedup factor of > 4 for Netflix and News datasets with 8 cores
• Incremental updates complicate reduce procedure:
• For sufficiently large batches, performance gains are tangible.
• For small batches, single-core updates are preferred.
15
RQ3: What is the effect of constrained recommendability?
101
102
103
runtime(s)
News (n = 8)
0.00 0.25 0.50 0.75 1.00 1.25 1.50 1.75
time (h) ×102
103
104
105
|Rt|
δ = 6h
δ = 12h
δ = 18h
δ = 24h
δ = 48h
δ = 96h
δ = 168h
δ = ∞ 16
RQ3: What is the effect of constrained recommendability?
Observations
• Clear efficiency gains for lower values of δ:
• 48h only needs < 10% of the runtime needed without
restrictions.
• 24h < 5%
• 6h 1.6%
• Slope of increasing runtime with more data is flattened,
improving scalability.
17
Conclusion & Future Work
Conclusion
We introduce Dynamic Index, which:
• is faster than the state-of-the art in exact similarity computation
for sparse and high-dimensional data.
18
Conclusion
We introduce Dynamic Index, which:
• is faster than the state-of-the art in exact similarity computation
for sparse and high-dimensional data.
• computes incrementally by design.
18
Conclusion
We introduce Dynamic Index, which:
• is faster than the state-of-the art in exact similarity computation
for sparse and high-dimensional data.
• computes incrementally by design.
• is easily parallellisable.
18
Conclusion
We introduce Dynamic Index, which:
• is faster than the state-of-the art in exact similarity computation
for sparse and high-dimensional data.
• computes incrementally by design.
• is easily parallellisable.
• naturally handles and exploits recommendability of items.
18
Questions?
Source code is available:
Academics hire too!
PhD students + Post-docs
19
Future Work
• More advanced similarity measures:
• Jaccard index, Pointwise Mutual Information (PMI), Pearson correlation,…
are all dependent on the co-occurrence matrix M.
• Beyond item-item collaborative filtering:
• With relatively straightforward extensions…
(e.g. including a value in the inverted indices to allow for non-binary data)
…we can tackle more general Information Retrieval use-cases.
20

More Related Content

What's hot

Study on Application of Ensemble learning on Credit Scoring
Study on Application of Ensemble learning on Credit ScoringStudy on Application of Ensemble learning on Credit Scoring
Study on Application of Ensemble learning on Credit Scoringharmonylab
 
The jackknife and bootstrap
The jackknife and bootstrapThe jackknife and bootstrap
The jackknife and bootstrapPaul Gardner
 
Naive Bayes Classifier Tutorial | Naive Bayes Classifier Example | Naive Baye...
Naive Bayes Classifier Tutorial | Naive Bayes Classifier Example | Naive Baye...Naive Bayes Classifier Tutorial | Naive Bayes Classifier Example | Naive Baye...
Naive Bayes Classifier Tutorial | Naive Bayes Classifier Example | Naive Baye...Edureka!
 
Monte Carlo methods
Monte Carlo methodsMonte Carlo methods
Monte Carlo methodsPaul Gardner
 
safe and efficient off policy reinforcement learning
safe and efficient off policy reinforcement learningsafe and efficient off policy reinforcement learning
safe and efficient off policy reinforcement learningRyo Iwaki
 
Workshop - Introduction to Machine Learning with R
Workshop - Introduction to Machine Learning with RWorkshop - Introduction to Machine Learning with R
Workshop - Introduction to Machine Learning with RShirin Elsinghorst
 
A Combination of Simple Models by Forward Predictor Selection for Job Recomme...
A Combination of Simple Models by Forward Predictor Selection for Job Recomme...A Combination of Simple Models by Forward Predictor Selection for Job Recomme...
A Combination of Simple Models by Forward Predictor Selection for Job Recomme...David Zibriczky
 
Causality without headaches
Causality without headachesCausality without headaches
Causality without headachesBenoît Rostykus
 
Recsys 2016: Modeling Contextual Information in Session-Aware Recommender Sys...
Recsys 2016: Modeling Contextual Information in Session-Aware Recommender Sys...Recsys 2016: Modeling Contextual Information in Session-Aware Recommender Sys...
Recsys 2016: Modeling Contextual Information in Session-Aware Recommender Sys...Bartlomiej Twardowski
 
Confidence Intervals––Exact Intervals, Jackknife, and Bootstrap
Confidence Intervals––Exact Intervals, Jackknife, and BootstrapConfidence Intervals––Exact Intervals, Jackknife, and Bootstrap
Confidence Intervals––Exact Intervals, Jackknife, and BootstrapFrancesco Casalegno
 
方策勾配型強化学習の基礎と応用
方策勾配型強化学習の基礎と応用方策勾配型強化学習の基礎と応用
方策勾配型強化学習の基礎と応用Ryo Iwaki
 
Anomaly detection
Anomaly detectionAnomaly detection
Anomaly detection철 김
 
Machine Learning : why we should know and how it works
Machine Learning : why we should know and how it worksMachine Learning : why we should know and how it works
Machine Learning : why we should know and how it worksKevin Lee
 
(141205) Masters_Thesis_Defense_Sundong_Kim
(141205) Masters_Thesis_Defense_Sundong_Kim(141205) Masters_Thesis_Defense_Sundong_Kim
(141205) Masters_Thesis_Defense_Sundong_KimSundong Kim
 
Usage of Generative Adversarial Networks (GANs) in Healthcare
Usage of Generative Adversarial Networks (GANs) in HealthcareUsage of Generative Adversarial Networks (GANs) in Healthcare
Usage of Generative Adversarial Networks (GANs) in HealthcareGlobalLogic Ukraine
 
Introduction
IntroductionIntroduction
Introductionbutest
 
Spakov.2011.comparison of gaze to-objects mapping algorithms
Spakov.2011.comparison of gaze to-objects mapping algorithmsSpakov.2011.comparison of gaze to-objects mapping algorithms
Spakov.2011.comparison of gaze to-objects mapping algorithmsmrgazer
 
Multi-Armed Bandit: an algorithmic perspective
Multi-Armed Bandit: an algorithmic perspectiveMulti-Armed Bandit: an algorithmic perspective
Multi-Armed Bandit: an algorithmic perspectiveGabriele Sottocornola
 
Introduction of “Fairness in Learning: Classic and Contextual Bandits”
Introduction of “Fairness in Learning: Classic and Contextual Bandits”Introduction of “Fairness in Learning: Classic and Contextual Bandits”
Introduction of “Fairness in Learning: Classic and Contextual Bandits”Kazuto Fukuchi
 

What's hot (20)

Study on Application of Ensemble learning on Credit Scoring
Study on Application of Ensemble learning on Credit ScoringStudy on Application of Ensemble learning on Credit Scoring
Study on Application of Ensemble learning on Credit Scoring
 
The jackknife and bootstrap
The jackknife and bootstrapThe jackknife and bootstrap
The jackknife and bootstrap
 
Naive Bayes Classifier Tutorial | Naive Bayes Classifier Example | Naive Baye...
Naive Bayes Classifier Tutorial | Naive Bayes Classifier Example | Naive Baye...Naive Bayes Classifier Tutorial | Naive Bayes Classifier Example | Naive Baye...
Naive Bayes Classifier Tutorial | Naive Bayes Classifier Example | Naive Baye...
 
Monte Carlo methods
Monte Carlo methodsMonte Carlo methods
Monte Carlo methods
 
Ml ppt at
Ml ppt atMl ppt at
Ml ppt at
 
safe and efficient off policy reinforcement learning
safe and efficient off policy reinforcement learningsafe and efficient off policy reinforcement learning
safe and efficient off policy reinforcement learning
 
Workshop - Introduction to Machine Learning with R
Workshop - Introduction to Machine Learning with RWorkshop - Introduction to Machine Learning with R
Workshop - Introduction to Machine Learning with R
 
A Combination of Simple Models by Forward Predictor Selection for Job Recomme...
A Combination of Simple Models by Forward Predictor Selection for Job Recomme...A Combination of Simple Models by Forward Predictor Selection for Job Recomme...
A Combination of Simple Models by Forward Predictor Selection for Job Recomme...
 
Causality without headaches
Causality without headachesCausality without headaches
Causality without headaches
 
Recsys 2016: Modeling Contextual Information in Session-Aware Recommender Sys...
Recsys 2016: Modeling Contextual Information in Session-Aware Recommender Sys...Recsys 2016: Modeling Contextual Information in Session-Aware Recommender Sys...
Recsys 2016: Modeling Contextual Information in Session-Aware Recommender Sys...
 
Confidence Intervals––Exact Intervals, Jackknife, and Bootstrap
Confidence Intervals––Exact Intervals, Jackknife, and BootstrapConfidence Intervals––Exact Intervals, Jackknife, and Bootstrap
Confidence Intervals––Exact Intervals, Jackknife, and Bootstrap
 
方策勾配型強化学習の基礎と応用
方策勾配型強化学習の基礎と応用方策勾配型強化学習の基礎と応用
方策勾配型強化学習の基礎と応用
 
Anomaly detection
Anomaly detectionAnomaly detection
Anomaly detection
 
Machine Learning : why we should know and how it works
Machine Learning : why we should know and how it worksMachine Learning : why we should know and how it works
Machine Learning : why we should know and how it works
 
(141205) Masters_Thesis_Defense_Sundong_Kim
(141205) Masters_Thesis_Defense_Sundong_Kim(141205) Masters_Thesis_Defense_Sundong_Kim
(141205) Masters_Thesis_Defense_Sundong_Kim
 
Usage of Generative Adversarial Networks (GANs) in Healthcare
Usage of Generative Adversarial Networks (GANs) in HealthcareUsage of Generative Adversarial Networks (GANs) in Healthcare
Usage of Generative Adversarial Networks (GANs) in Healthcare
 
Introduction
IntroductionIntroduction
Introduction
 
Spakov.2011.comparison of gaze to-objects mapping algorithms
Spakov.2011.comparison of gaze to-objects mapping algorithmsSpakov.2011.comparison of gaze to-objects mapping algorithms
Spakov.2011.comparison of gaze to-objects mapping algorithms
 
Multi-Armed Bandit: an algorithmic perspective
Multi-Armed Bandit: an algorithmic perspectiveMulti-Armed Bandit: an algorithmic perspective
Multi-Armed Bandit: an algorithmic perspective
 
Introduction of “Fairness in Learning: Classic and Contextual Bandits”
Introduction of “Fairness in Learning: Classic and Contextual Bandits”Introduction of “Fairness in Learning: Classic and Contextual Bandits”
Introduction of “Fairness in Learning: Classic and Contextual Bandits”
 

Similar to Efficient Similarity Computation for Collaborative Filtering in Dynamic Environments

Enabling Power-Efficient AI Through Quantization
Enabling Power-Efficient AI Through QuantizationEnabling Power-Efficient AI Through Quantization
Enabling Power-Efficient AI Through QuantizationQualcomm Research
 
Data-Driven Recommender Systems
Data-Driven Recommender SystemsData-Driven Recommender Systems
Data-Driven Recommender Systemsrecsysfr
 
Machine Learning Essentials Demystified part2 | Big Data Demystified
Machine Learning Essentials Demystified part2 | Big Data DemystifiedMachine Learning Essentials Demystified part2 | Big Data Demystified
Machine Learning Essentials Demystified part2 | Big Data DemystifiedOmid Vahdaty
 
Python + Tensorflow: how to earn money in the Stock Exchange with Deep Learni...
Python + Tensorflow: how to earn money in the Stock Exchange with Deep Learni...Python + Tensorflow: how to earn money in the Stock Exchange with Deep Learni...
Python + Tensorflow: how to earn money in the Stock Exchange with Deep Learni...ETS Asset Management Factory
 
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic...
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic...Quantization and Training of Neural Networks for Efficient Integer-Arithmetic...
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic...Ryo Takahashi
 
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETSFAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETScsandit
 
A scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clusteringA scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clusteringAllenWu
 
08 neural networks
08 neural networks08 neural networks
08 neural networksankit_ppt
 
Ultrasound Nerve Segmentation
Ultrasound Nerve Segmentation Ultrasound Nerve Segmentation
Ultrasound Nerve Segmentation Sneha Ravikumar
 
Machine_Learning_Trushita
Machine_Learning_TrushitaMachine_Learning_Trushita
Machine_Learning_TrushitaTrushita Redij
 
Introduction to computing Processing and performance.pdf
Introduction to computing Processing and performance.pdfIntroduction to computing Processing and performance.pdf
Introduction to computing Processing and performance.pdfTulasiramKandula1
 
[PR12] PR-036 Learning to Remember Rare Events
[PR12] PR-036 Learning to Remember Rare Events[PR12] PR-036 Learning to Remember Rare Events
[PR12] PR-036 Learning to Remember Rare EventsTaegyun Jeon
 
A Study of Wearable Accelerometers Layout for Human Activity Recognition(Asia...
A Study of Wearable Accelerometers Layout for Human Activity Recognition(Asia...A Study of Wearable Accelerometers Layout for Human Activity Recognition(Asia...
A Study of Wearable Accelerometers Layout for Human Activity Recognition(Asia...sugiuralab
 
Performance evaluation of IR models
Performance evaluation of IR modelsPerformance evaluation of IR models
Performance evaluation of IR modelsNisha Arankandath
 
Parallel Patterns for Window-based Stateful Operators on Data Streams: an Alg...
Parallel Patterns for Window-based Stateful Operators on Data Streams: an Alg...Parallel Patterns for Window-based Stateful Operators on Data Streams: an Alg...
Parallel Patterns for Window-based Stateful Operators on Data Streams: an Alg...Tiziano De Matteis
 

Similar to Efficient Similarity Computation for Collaborative Filtering in Dynamic Environments (20)

Enabling Power-Efficient AI Through Quantization
Enabling Power-Efficient AI Through QuantizationEnabling Power-Efficient AI Through Quantization
Enabling Power-Efficient AI Through Quantization
 
Data-Driven Recommender Systems
Data-Driven Recommender SystemsData-Driven Recommender Systems
Data-Driven Recommender Systems
 
Machine Learning Essentials Demystified part2 | Big Data Demystified
Machine Learning Essentials Demystified part2 | Big Data DemystifiedMachine Learning Essentials Demystified part2 | Big Data Demystified
Machine Learning Essentials Demystified part2 | Big Data Demystified
 
Python + Tensorflow: how to earn money in the Stock Exchange with Deep Learni...
Python + Tensorflow: how to earn money in the Stock Exchange with Deep Learni...Python + Tensorflow: how to earn money in the Stock Exchange with Deep Learni...
Python + Tensorflow: how to earn money in the Stock Exchange with Deep Learni...
 
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic...
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic...Quantization and Training of Neural Networks for Efficient Integer-Arithmetic...
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic...
 
Fa19_P1.pptx
Fa19_P1.pptxFa19_P1.pptx
Fa19_P1.pptx
 
report
reportreport
report
 
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETSFAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
 
Dssg talk CNN intro
Dssg talk CNN introDssg talk CNN intro
Dssg talk CNN intro
 
A scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clusteringA scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clustering
 
08 neural networks
08 neural networks08 neural networks
08 neural networks
 
Ultrasound Nerve Segmentation
Ultrasound Nerve Segmentation Ultrasound Nerve Segmentation
Ultrasound Nerve Segmentation
 
Machine_Learning_Trushita
Machine_Learning_TrushitaMachine_Learning_Trushita
Machine_Learning_Trushita
 
Introduction to computing Processing and performance.pdf
Introduction to computing Processing and performance.pdfIntroduction to computing Processing and performance.pdf
Introduction to computing Processing and performance.pdf
 
[PR12] PR-036 Learning to Remember Rare Events
[PR12] PR-036 Learning to Remember Rare Events[PR12] PR-036 Learning to Remember Rare Events
[PR12] PR-036 Learning to Remember Rare Events
 
A Study of Wearable Accelerometers Layout for Human Activity Recognition(Asia...
A Study of Wearable Accelerometers Layout for Human Activity Recognition(Asia...A Study of Wearable Accelerometers Layout for Human Activity Recognition(Asia...
A Study of Wearable Accelerometers Layout for Human Activity Recognition(Asia...
 
Performance evaluation of IR models
Performance evaluation of IR modelsPerformance evaluation of IR models
Performance evaluation of IR models
 
AA_Unit 1_part-I.pptx
AA_Unit 1_part-I.pptxAA_Unit 1_part-I.pptx
AA_Unit 1_part-I.pptx
 
Python faster for loop
Python faster for loopPython faster for loop
Python faster for loop
 
Parallel Patterns for Window-based Stateful Operators on Data Streams: an Alg...
Parallel Patterns for Window-based Stateful Operators on Data Streams: an Alg...Parallel Patterns for Window-based Stateful Operators on Data Streams: an Alg...
Parallel Patterns for Window-based Stateful Operators on Data Streams: an Alg...
 

Recently uploaded

(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknowmakika9823
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...Suhani Kapoor
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 

Recently uploaded (20)

(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 

Efficient Similarity Computation for Collaborative Filtering in Dynamic Environments

  • 1. Efficient Similarity Computation for Collaborative Filtering in Dynamic Environments Olivier Jeunen1 , Koen Verstrepen2 , Bart Goethals1,2 September 18th, 2019 1Adrem Data Lab, University of Antwerp 2Froomle olivier.jeunen@uantwerp.be 1
  • 3. Setting the scene                     u1 i1 t1 u1 i2 t2 u1 i3 t3 u2 i4 t4 u2 i2 t5 u3 i1 t6 u2 i5 t7 u2 i7 t8 u3 i6 t9 . . . . . . . . .                     We deal with implicit feedback: a set of (user, item, timestamp)-triplets, representing clicks, views, sales, … Suppose we have a set of pageviews of this form. 2
  • 4. Problem statement In neighbourhood-based collaborative filtering1, we need to compute similarity between pairs of items. Items are represented as sparse, high-dimensional columns in the user-item matrix P.                     0 0 0 . . . 0 1 0 1 0 0 . . . 0 0 1 0 0 0 . . . 1 0 0 0 0 1 . . . 0 0 0 . . . . . . . . . . . . . . . . . . . . . 0 1 0 . . . 0 0 0 0 0 0 . . . 0 1 0 0 1 1 . . . 0 0 0 0 0 0 . . . 1 0 0 1 0 1 . . . 0 1 0                     1 Still a very competitive baseline, but often deemed unscalable 3
  • 5. A need for speed Typically, the model is periodically recomputed. For ever-growing datasets, these iterative updates can become very time-consuming and model recency is often sacrificed. 0 20 40 60 80 100 120 140 160 time 0 10 20 30 40 50 60 runtime ∆t ∆t ... Iterative model updates over time 4
  • 6. Previous work Existing approaches tend to speed up computations through • Approximation. • Parallellisation. • Incremental computation. But currently existing exact solutions do not exploit the sparsity that is inherent to implicit-feedback data streams. 5
  • 8. Incremental Similarity Computation In the binary setting, cosine-similarity simplifies to the number of users that have seen both items, divided by the square root of their individual numbers. cos(i, j) = |Ui ∩ Uj| |Ui| |Uj| = Mi,j √ Ni Nj As such, we can compute these building blocks incrementally instead of recomputing the entire similarity with every update: N ∈ Nn : Ni = |Ui| and M ∈ Nn×n : Mi,j = |Ui ∩ Uj|. 6
  • 9. Dynamic Index Existing approaches tend to build inverted indices in a preprocessing step… we do this on-the-fly! Initialise a simple inverted index for every user, to hold their histories. For every pageview (u, i): 1. Increment item co-occurence for i and other items seen by u. 2. Update the item’s count. 3. Add the item to the user’s inverted index. 7
  • 10. Online Learning As Dynamic Index consists of a single for-loop over the pageviews, it can naturally handle streaming data. 0 1 2 3 4 5 6 |P| ×103 0 1 2 3 4 runtime(s) ×107 |∆P| ∆t ti ti+1 Impact of Online Learning 8
  • 11. Parallellisation Procedure We adopt a MapReduce-like parallellisation framework: • Mapping is the Dynamic Index algorithm. • Reducing two models M = {M, N, L} and M = {M , N , L } is: 1. Summing up M, M and N, N 2. Cross-referencing (u, i)-pairs from L[u] with (u, j)-pairs from L [u]. Step 2 is obsolete if M and M are computed on disjoint sets of users! 9
  • 12. Recommendability Often, the set of items that should be considered as recommendations is constrained by recency, stock, licenses, seasonality, … We denote Rt as the set of recommendable items at time t, and argue that it is often much smaller than the full item collection. Rt I As such, we only need an up-to-date similarity sim(i, j) if either i or j is recommendable: i ∈ Rt ∨ j ∈ Rt To keep up-to-date with recommendability updates: add a second inverted index for every user. 10
  • 14. Datasets Table 1: Experimental dataset characteristics. Movielens* Netflix* News Outbrain # “events” 20e6 100e6 96e6 200e6 # users 138e3 480e3 5e6 113e6 # items 27e3 18e3 297e3 1e6 mean items per user 144.41 209.25 18.29 1.76 mean users per item 747.84 5654.50 242.51 184.50 sparsity user-item matrix 99.46% 98.82% 99.99% 99.99% sparsity item-item matrix 59.90% 0.22% 99.83% 99.98% 11
  • 15. RQ1: Are we more efficient than the baselines? 0.0 0.5 1.0 1.5 2.0 ×107 0.0 0.2 0.4 0.6 0.8 1.0 1.2 runtime(s) ×103 Movielens 0.00 0.25 0.50 0.75 1.00 ×108 0.00 0.15 0.30 0.45 0.60 0.75 0.90 1.05 ×104 Netflix 0.00 0.25 0.50 0.75 1.00 |P| ×108 0 1 2 3 4 5 6 7 runtime(s) ×103 News 0.0 0.5 1.0 1.5 2.0 |P| ×108 0.0 0.2 0.4 0.6 0.8 1.0 1.2 ×103 Outbrain Sparse Baseline Dynamic Index 12
  • 16. RQ1: Are we more efficient than the baselines? Observations • More efficient if M is sparse • More efficient if users have shorter histories • Average number of processed interactions per second ranges from 14 500 to 834 000 13
  • 17. RQ2: How effective is parallellisation? 0.00 0.25 0.50 0.75 1.00 1.25 1.50 1.75 2.00 ×107 0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4runtime(s) ×103 Movielens 0.0 0.2 0.4 0.6 0.8 1.0 ×108 0 1 2 3 4 5 6 7 8 ×103 Netflix 0.0 0.2 0.4 0.6 0.8 1.0 |P| ×108 0 1 2 3 4 runtime(s) ×103 News 0.00 0.25 0.50 0.75 1.00 1.25 1.50 1.75 2.00 |P| ×108 0.0 0.5 1.0 1.5 2.0 2.5 3.0 ×102 Outbrain n = 1 n = 2 n = 4 n = 8 14
  • 18. RQ2: How effective is parallellisation? Observations • Speedup factor of > 4 for Netflix and News datasets with 8 cores • Incremental updates complicate reduce procedure: • For sufficiently large batches, performance gains are tangible. • For small batches, single-core updates are preferred. 15
  • 19. RQ3: What is the effect of constrained recommendability? 101 102 103 runtime(s) News (n = 8) 0.00 0.25 0.50 0.75 1.00 1.25 1.50 1.75 time (h) ×102 103 104 105 |Rt| δ = 6h δ = 12h δ = 18h δ = 24h δ = 48h δ = 96h δ = 168h δ = ∞ 16
  • 20. RQ3: What is the effect of constrained recommendability? Observations • Clear efficiency gains for lower values of δ: • 48h only needs < 10% of the runtime needed without restrictions. • 24h < 5% • 6h 1.6% • Slope of increasing runtime with more data is flattened, improving scalability. 17
  • 22. Conclusion We introduce Dynamic Index, which: • is faster than the state-of-the art in exact similarity computation for sparse and high-dimensional data. 18
  • 23. Conclusion We introduce Dynamic Index, which: • is faster than the state-of-the art in exact similarity computation for sparse and high-dimensional data. • computes incrementally by design. 18
  • 24. Conclusion We introduce Dynamic Index, which: • is faster than the state-of-the art in exact similarity computation for sparse and high-dimensional data. • computes incrementally by design. • is easily parallellisable. 18
  • 25. Conclusion We introduce Dynamic Index, which: • is faster than the state-of-the art in exact similarity computation for sparse and high-dimensional data. • computes incrementally by design. • is easily parallellisable. • naturally handles and exploits recommendability of items. 18
  • 26. Questions? Source code is available: Academics hire too! PhD students + Post-docs 19
  • 27. Future Work • More advanced similarity measures: • Jaccard index, Pointwise Mutual Information (PMI), Pearson correlation,… are all dependent on the co-occurrence matrix M. • Beyond item-item collaborative filtering: • With relatively straightforward extensions… (e.g. including a value in the inverted indices to allow for non-binary data) …we can tackle more general Information Retrieval use-cases. 20