GRU4Rec v2
Recurrent neural networks with top-k gains for
session-based recommendations
Balázs Hidasi
@balazshidasi
At a glance
• Improvements on GRU4Rec [Hidasi et al., 2015]
 Session-based recommendations with RNNs
• General
 Negative sampling strategies
 Loss function design
• Specific
 Constrained embeddings
 Implementation details
• Offline tests: up to 35% improvement
• Online tests & observations
GRU4Rec overview
Context: session-based recommendations
• User identification
 Feasibility
 Privacy
 Regulations
• User intent
 Disjoint sessions
o Need
o Situation (context)
o "Irregularities"
• Session-based recommendations
• Permanent user cold-start
Preliminaries
• Next click prediction
• Top-N recommendation (ranking)
• Implicit feedback
Recurrent Neural Networks
• Basics
 Input: sequential information {x_t}_{t=1}^T
 Hidden state (h_t):
o representation of the sequence so far
o influenced by every element of the sequence up to t
 h_t = f(W x_t + U h_{t−1} + b)
• Gated RNNs (GRU, LSTM & others)
 Basic RNN is subject to the exploding/vanishing gradient problem
 Use h_t = h_{t−1} + Δ_t instead of rewriting the hidden state
 Information flow is controlled by gates
• Gated Recurrent Unit (GRU)
 Update gate (z)
 Reset gate (r)
 z_t = σ(W_z x_t + U_z h_{t−1} + b_z)
 r_t = σ(W_r x_t + U_r h_{t−1} + b_r)
 ĥ_t = tanh(W x_t + r_t ∘ (U h_{t−1}) + b)
 h_t = z_t ∘ h_{t−1} + (1 − z_t) ∘ ĥ_t
[Diagram: GRU cell with reset (r) and update (z) gates controlling the flow from IN to the hidden state h and OUT]
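A minimal NumPy sketch of one GRU step following the equations above (this is an illustration, not the actual GRU4Rec code; the parameter dictionary layout is my assumption):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, p):
    """One GRU step; p holds the weight matrices and biases of the slide's equations."""
    z = sigmoid(p["Wz"] @ x_t + p["Uz"] @ h_prev + p["bz"])          # update gate
    r = sigmoid(p["Wr"] @ x_t + p["Ur"] @ h_prev + p["br"])          # reset gate
    h_cand = np.tanh(p["W"] @ x_t + r * (p["U"] @ h_prev) + p["b"])  # candidate state
    return z * h_prev + (1.0 - z) * h_cand                           # gated interpolation
```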
GRU4Rec
• GRU trained on session data, adapted to the
recommendation task
 Input: current item ID
 Hidden state: session representation
 Output: likelihood of being the next item
• Session-parallel mini-batches
 Mini-batch is defined over sessions
 Update with one step BPTT
o Lots of sessions are very short
o 2D mini-batching, updating on longer sequences (with
or without padding) didn’t improve accuracy
• Output sampling
 Computing scores for all items (100K – 1M) in every
step is slow
 One positive item (target) + several negative samples
 Fast solution: scores on mini-batch targets
o Target items of the other examples in the mini-batch are
negative samples for the current example
• Loss functions: cross-entropy, BPR, TOP1
[Diagram: network architecture: the ItemID as a one-hot vector feeds the GRU layer; f() on the weighted output yields scores on items, trained against the next ItemID as a one-hot vector]
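A sketch of session-parallel mini-batching, assuming `sessions` is a list of item-ID lists with at least two events each (a simplification: the reset of the hidden state for replaced sessions is omitted here):

```python
import numpy as np

def session_parallel_batches(sessions, batch_size):
    """Yield (input, target) ID arrays; finished sessions are replaced by new ones."""
    nxt = batch_size                      # index of the next session to load
    active = list(range(min(batch_size, len(sessions))))
    pos = [0] * len(active)               # per-session position pointers
    while active:
        inp = np.array([sessions[s][p] for s, p in zip(active, pos)])
        tgt = np.array([sessions[s][p + 1] for s, p in zip(active, pos)])
        yield inp, tgt                    # one-step BPTT update happens per yield
        for k in range(len(active) - 1, -1, -1):
            pos[k] += 1
            if pos[k] + 1 >= len(sessions[active[k]]):   # session finished
                if nxt < len(sessions):                  # replace it with a new one
                    active[k], pos[k] = nxt, 0
                    nxt += 1
                else:                                    # no sessions left
                    del active[k]
                    del pos[k]
```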
Negative sampling & loss
function design
Negative sampling
• Training step
 Score all items
 Push target items forward (modify model parameters)
• Many training steps & many items
 Not scalable
 O(S_I · N⁺), with S_I items and N⁺ training events
• Sample negative examples instead → O(K · N⁺)
• Sampling probability
 Must be quick to calculate
 Two popular choices
o Uniform → many unnecessary steps
o Proportional to support → better in practice, fast start; some
relations are not examined enough
 Optimal choice
o Data dependent
o Changing during training might help
Mini-batch based negative sampling
• Target items of other examples from the mini-batch → used as
negative samples
• Pros
 Efficient & simple implementation
on GPU
 Sampling probability proportional
to support
• Cons
 Number of samples is tied to the
batch size
o Mini-batch training favours smaller batches
o Negative sampling favours larger batches
 Sampling probability is always the
same
[Diagram: session-parallel mini-batches: the items of Sessions 1-5 are laid out in parallel; the input matrix holds the current items and the target matrix the next items. For the targets i_1, i_5, i_8, score vectors y^(1), y^(2), y^(3) are computed, and one-hot rows mark each example's own target among the scored items]
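The figure's score matrix takes only a couple of lines; a toy sketch (random weights, made-up sizes) of how scoring only the mini-batch targets turns every other example's target into a negative sample:

```python
import numpy as np

rng = np.random.default_rng(42)
B, hidden, n_items = 3, 4, 10
H = rng.normal(size=(B, hidden))         # GRU outputs for the mini-batch
Wy = rng.normal(size=(n_items, hidden))  # output item matrix
targets = np.array([1, 5, 8])            # target item IDs (i_1, i_5, i_8)

scores = H @ Wy[targets].T               # B x B instead of B x n_items
# scores[k, k] is example k's target score; the off-diagonal entries are its
# scores on the other examples' targets, i.e. the in-batch negative samples
```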
Solution: add more samples!
• Add extra samples
• Shared between mini-batches
• Sampling probability: p_i ∼ supp_i^α
 α = 0 → uniform
 α = 1 → popularity based
• Implementation trick
 Sampling interrupts GPU computations
 More efficient in parallel
 Sample store (cache)
o Precompute 10-100M samples
o Resample if we used them all
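A sketch of the sample store with p_i ∼ supp_i^α (the class and method names are illustrative, not the actual GRU4Rec API):

```python
import numpy as np

class SampleStore:
    """Cache millions of pre-drawn negative samples with p_i ~ supp_i**alpha."""
    def __init__(self, supports, alpha, cache_size=10_000_000):
        p = np.asarray(supports, dtype=np.float64) ** alpha
        self.p = p / p.sum()
        self.cache_size = cache_size
        self._refill()

    def _refill(self):
        # one large draw instead of many small ones keeps the GPU pipeline busy
        self.cache = np.random.choice(len(self.p), size=self.cache_size, p=self.p)
        self.pos = 0

    def next_samples(self, n):
        if self.pos + n > self.cache_size:
            self._refill()                # resample if we used them all
        out = self.cache[self.pos:self.pos + n]
        self.pos += n
        return out
```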
Another look at the loss functions
• Listwise losses on the target+negative samples
 Cross-entropy + softmax
o Cross-entropy in itself is pointwise
o Target should have the maximal score
o XE = −log s_i, where s_i = e^{r_i} / Σ_{j=1}^{K} e^{r_j}
o Unstable in previous implementation
– Rounding errors
– Fixed
 Average BPR-score
o BPR in itself is pairwise
o Target should have higher score than all negative samples
o BPR = −(1/K) Σ_{j=1}^{K} log σ(r_i − r_j)
 TOP1 score
o Heuristic loss, idea is similar to the average BPR
o Score regularization part
o TOP1 = (1/K) Σ_{j=1}^{K} [σ(r_j − r_i) + σ(r_j^2)]
• Earlier results
 Similar performance
 TOP1 is slightly better
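The three losses for a single example, following the formulas on this slide (r_i is the target score, r_neg the scores of the K negative samples; helper names are mine):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def xe_loss(r_i, r_neg):
    scores = np.concatenate(([r_i], r_neg))
    scores -= scores.max()            # subtract the max for numerical stability
    s = np.exp(scores)
    return -np.log(s[0] / s.sum())    # -log softmax(target)

def bpr_loss(r_i, r_neg):
    return -np.mean(np.log(sigmoid(r_i - r_neg)))

def top1_loss(r_i, r_neg):
    return np.mean(sigmoid(r_neg - r_i) + sigmoid(r_neg ** 2))
```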
Unexpected behaviour of pairwise losses
• Unexpected behaviour
 Several negative samples → good results
 Many negative samples → bad results
• Gradient (BPR wrt. the target score)
 ∂L_i/∂r_i = −(1/K) Σ_{j=1}^{K} (1 − σ(r_i − r_j))
• Irrelevant negative sample
 Whose score is already lower than r_i
o Changes during training
 Doesn't contribute to optimization: 1 − σ(r_i − r_j) ∼ 0
 Number of irrelevant samples increases as training progresses
• Averaging over many negative samples → the gradient vanishes
 Target will be pushed up for a while
 Slows down as it approaches the top
 More samples: slows down earlier
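A toy numeric illustration of the dilution (the scores are made up): one relevant negative sample scored near the target, K−1 irrelevant ones far below it; the averaged gradient magnitude shrinks as K grows:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

r_i = 3.0
for K in (8, 128, 2048):
    # 1 relevant sample (score 2.9) diluted by K-1 irrelevant ones (score -5)
    r_neg = np.concatenate(([2.9], np.full(K - 1, -5.0)))
    grad = np.mean(1.0 - sigmoid(r_i - r_neg))   # averaged gradient magnitude
    print(K, grad)                               # shrinks as K grows
```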
Pairwise-max loss functions
• The target score should be higher than the maximum
score amongst the negative samples
• BPR-max
 P(r_i > r_MAX | θ) = Σ_{j=1}^{K} P(r_i > r_j | r_j = r_MAX, θ) · P(r_j = r_MAX | θ)
 Use continuous approximations
o P(r_i > r_j | r_j = r_MAX, θ) = σ(r_i − r_j)
o P(r_j = r_MAX | θ) = softmax(r)_j = e^{r_j} / Σ_{k=1}^{K} e^{r_k} = s_j
– Softmax over the negative samples only
 Minimize the negative log probability
 Add ℓ2 score regularization
• L_i = −log(Σ_{j=1}^{K} s_j σ(r_i − r_j)) + λ Σ_{j=1}^{K} r_j^2
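A sketch of the BPR-max loss for one example, following the formula above (λ is the score-regularization weight; the small epsilon guarding the log is my addition):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bpr_max_loss(r_i, r_neg, lam=1.0):
    w = np.exp(r_neg - r_neg.max())           # softmax over the negative samples only
    s = w / w.sum()
    core = -np.log(np.sum(s * sigmoid(r_i - r_neg)) + 1e-24)
    return core + lam * np.sum(r_neg ** 2)    # l2 score regularization
```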
Gradient of pairwise max losses
• BPR-max (wrt. r_i)
 ∂L_i/∂r_i = −[Σ_{j=1}^{K} s_j σ(r_i − r_j)(1 − σ(r_i − r_j))] / [Σ_{j=1}^{K} s_j σ(r_i − r_j)]
 Weighted average of BPR gradients
o Relative importance of samples:
s_j σ(r_i − r_j) / [s_k σ(r_i − r_k)] = (e^{r_j} + e^{r_j + r_k − r_i}) / (e^{r_k} + e^{r_j + r_k − r_i})
o Smoothed softmax
– If r_i ≫ max(r_j, r_k) → behaves like softmax
– Stronger smoothing otherwise
– Uniform → softmax as r_i is pushed to the top
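A quick numeric check of the smoothing behaviour (a sketch; the weight function is my restatement of the relative-importance formula above):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bpr_max_weights(r_i, r_neg):
    """Relative importance of each negative sample in the BPR-max gradient."""
    s = np.exp(r_neg - r_neg.max())
    s /= s.sum()
    w = s * sigmoid(r_i - r_neg)
    return w / w.sum()

r_neg = np.array([2.0, 1.0, -1.0])
print(bpr_max_weights(10.0, r_neg))   # target far above: ~softmax of r_neg
print(bpr_max_weights(-10.0, r_neg))  # target far below: ~uniform (1/3 each)
```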
Number of samples: training times &
performance
• Training times on GPU
don’t increase until the
parallelization limit is
reached
• These two points are around the same place
• Significant improvements
up to a certain point
• Diminishing returns after
The effect of the α parameter
• Data & loss dependent
 Cross-entropy: favours lower values
 BPR-max: 0.5 is usually a good choice (data dependent)
 There is always a popularity based part of the samples
o Original mini-batch examples
o Removing these will result in a higher optimal α
• CLASS dataset (cross-entropy, BPR-max)
Constrained embeddings &
offline tests
Unified item representations in GRU4Rec (1/2)
• Hidden state × W_y → scores
 One vector for each item → "item feature matrix"
• Embedding
 A vector for each item → another "item feature matrix"
 Can be used as the input of the GRU layer instead of the one-hot
vector
 Slightly decreases offline metrics
• Constrained embeddings (unified item representations)
 Use the same matrix for both the input and the output embedding
 Unified representations → faster convergence
 Embedding size is tied to the size of the last hidden state
Unified item representations in GRU4Rec (2/2)
• Model size: the largest components scale with the number of items
 One-hot input: X
o W_x^0, W_r^0, W_z^0, W_y → S_I × S_H
o U_h^0, U_r^0, U_z^0 → S_H × S_H
 Embedding input: X/2
o E, W_y → S_I × S_H
o W_x^0, W_r^0, W_z^0 → S_E × S_H
o U_h^0, U_r^0, U_z^0 → S_H × S_H
 Constrained embedding: X/4
o W_y → S_I × S_H
o W_x^0, W_r^0, W_z^0, U_h^0, U_r^0, U_z^0 → S_H × S_H
[Diagrams: three input variants, each computing scores as dot(H, W_y) on the GRU layer's hidden state H: (1) one-hot ItemID input; (2) separate embedding input E[ItemIDs]; (3) constrained embedding input W_y[ItemIDs]]
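A sketch of the core difference between the variants, with made-up sizes; in the constrained case the output matrix W_y doubles as the input embedding table:

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, S_H = 1000, 100
Wy = rng.normal(scale=0.01, size=(n_items, S_H))  # output item matrix
item_ids = np.array([3, 42, 7])                   # current items of the batch

x = Wy[item_ids]              # constrained: input embedding = rows of Wy
H = np.tanh(x)                # stand-in for the GRU layer's hidden state
scores = H @ Wy.T             # scores on all items from the same matrix
```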
Offline results
• Over item-kNN
 +25-52% in recall@20
 +35-55% in MRR@20
• Over the original GRU4Rec
 +18-35% in recall@20
 +27-37% in MRR@20
• BPR-max vs. (fixed) cross-entropy
 +2-6% improvement in 2 of 4 cases
 No statistically significant difference in the other 2 cases
• Constrained embedding
 Most cases: slightly worse MRR & better recall
 Huge improvements on the CLASS dataset (+18.74% in recall, +29.44% in MRR)
Online A/B tests
A/B test - video service (1/3)
• Setup
 Video page
 Recommended videos on a strip
 Autoplay functionality
 Recommendations are NOT recomputed if the user clicks on any of
the recommended videos or autoplay loads a new video
 User based A/B split
• Algorithms
 Original algorithm: previous recommendation logic
 GRU4Rec next best: N guesses for the next item
 GRU4Rec sequence: sequence of length N as the continuation of
the current session
o Greedy generation
A/B test - video service (2/3)
• Technical details
 User based A/B split
 GRU4Rec serving from a single GPU using a single thread
o Score computations for ~500K items in 1-2ms (next best)
 Constant retraining
o GRU4Rec: ~1.5 hours on ~30M events (including data collection and
preprocessing)
o Original logic: different parts with different frequency
 GRU4Rec falls back to the original logic if it can’t recommend or times
out
• KPIs (relative to the number of recommendation requests)
 Watch time
 Videos played
 Recommendations clicked
• Bots and power users are excluded from the KPI computations
A/B test - video service (3/3)
A/B test – long term effects (1/3)
• Watch time
[Charts: relative improvement in watch time per recommendation request (range 0.96-1.06) over time for next best and sequence, and watch time per recommendation for the baseline, next best, and sequence groups]
A/B test – long term effects (2/3)
• GRU4Rec: strong generalization capabilities
 Finds hidden gems
 Unlike counting based approaches
o Not obvious in offline only testing
• Feedback loop
 The baseline's training also sees the feedback generated for the recommendations of the other groups
 Learns how to recommend hidden gems
• GRU4Rec maintains some lead
 New items are constantly uploaded
• Comparison of different countries
[Chart: increase in watch time per recommendation, by country]
A/B test – long term effects (3/3)
• Videos played
 Next best and sequence switched places
 Sequence mode: great for episodic content, can suffer
otherwise
 Next best mode: more diverse, better for non-episodic
content
 Feedback loop: next best learns some of the episodic
recommendations
[Chart: relative improvement in videos played per recommendation request (range 1.00-1.06) over time for next best and sequence]
A/B test - online marketplace
• Differences in setup
 On home page
 Next best mode only
 KPI: CTR
• Items have limited lifespan
 Will GRU4Rec keep its 19-20% lead?
[Chart: CTR over time for the baseline and GRU4Rec]
Thank you!
Q&A
Check out the code (free for research):
https://github.com/hidasib/GRU4Rec
Read the preprint: https://arxiv.org/abs/1706.03847
More Related Content

What's hot
• Recommender system
• Recommendation Systems (Robin Reni)
• Deep Learning for Personalized Search and Recommender Systems (Benjamin Le)
• Session-Based Recommender Systems
• Interactive Recommender Systems (Roelof van Zwol)
• Learning to Rank for Recommender Systems - ACM RecSys 2013 tutorial (Alexandros Karatzoglou)
• Recommender system introduction (Liang Xiang)
• Past, present, and future of Recommender Systems: an industry perspective (Xavier Amatriain)
• Shallow and Deep Latent Models for Recommender System (Anoop Deoras)
• Introduction to Recommendation Systems (Trieu Nguyen)
• Session-based recommendations with recurrent neural networks (Zimin Park)
• Sequential Decision Making in Recommendations (Jaya Kawale)
• Recent advances in deep recommender systems (NAVER Engineering)
• Tutorial on sequence aware recommender systems - UMAP 2018 (Paolo Cremonesi)
• Recommendation System Explained (Crossing Minds)
• Engagement, Metrics & Personalisation at Scale (Mounia Lalmas-Roelleke)
• Deep Learning for Recommender Systems (Yves Raimond)
• An introduction to Recommender Systems (David Zibriczky)
• Deep Learning for Recommender Systems RecSys2017 Tutorial (Alexandros Karatzoglou)
• Tutorial on Deep Learning in Recommender System, Lars summer school 2019 (Anoop Deoras)
Similar to GRU4Rec v2: Improved Recurrent Neural Networks for Session-Based Recommendations

• Deep learning to the rescue - solving long standing problems of recommender ... (Balázs Hidasi)
• A deep learning approach for twitter spam detection lijie zhou (Anne(Lijie) Zhou)
• Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works... (Universitat Politècnica de Catalunya)
• Training DNN Models - II.pptx (PrabhuSelvaraj15)
• BKK16-300 Benchmarking 102 (Linaro)
• Bridging the Gap: Machine Learning for Ubiquitous Computing -- Evaluation (Thomas Ploetz)
• An Introduction to Deep Learning (milad abbasi)
• Introduction to Deep Learning (Mehrnaz Faraz)
• Deep Q-learning from Demonstrations DQfD (Ammar Rashed)
• Spark Summit EU talk by Ram Sriharsha and Vlad Feinberg (Spark Summit)
• Week 12 Dimensionality Reduction Bagian 1 (khairulhuda242)
• Online learning with structured streaming, spark summit brussels 2016 (Ram Sriharsha)
• Foundations: Artificial Neural Networks (ananth)
• Parallel DNA Sequence Alignment (Giuliana Carullo)
• Practical Tools for Measurement Systems Analysis (Gabor Szabo, CQE)
• Artificial Intelligence Course: Linear models (ananth)
• DeepLearningLecture.pptx (ssuserf07225)
• Design and Analysis of Algorithms.pptx (Syed Zaid Irshad)
• Online Stochastic Tensor Decomposition for Background Subtraction in Multispe... (ActiveEon)
• deep reinforcement learning with double q learning (SeungHyeok Baek)

More from Balázs Hidasi

• Egyedi termék kreatívok tömeges gyártása generatív AI segítségével
• The Effect of Third Party Implementations on Reproducibility
• Deep Learning in Recommender Systems - RecSys Summer School 2017
• Parallel Recurrent Neural Network Architectures for Feature-rich Session-base...
• Context aware factorization methods for implicit feedback based recommendatio...
• Deep learning: the future of recommendations
• Context-aware preference modeling with factorization
• Approximate modeling of continuous context in factorization algorithms (CaRR1...
• Utilizing additional information in factorization methods (research overview,...
• Az implicit ajánlási probléma és néhány megoldása (BME TMIT szeminárium előad...
• Context-aware similarities within the factorization framework (CaRR 2013 pres...
• iTALS: implicit tensor factorization for context-aware recommendations (ECML/...
• Initialization of matrix factorization (CaRR 2012 presentation)
• ShiftTree: model alapú idősor-osztályozó (VK 2009 előadás)
• ShiftTree: model alapú idősor-osztályozó (ML@BP előadás, 2012)
• ShiftTree: model based time series classifier (ECML/PKDD 2011 presentation)