Probabilistic Models in Recommender Systems: Time Variant Models

2015-12-10
Eliezer de Souza da Silva (State-space models, Dynamic PMF via HDP)
Tomasz Kuśmierczyk (Tensor factorization)
Session 3: Time variant models
Tensor factorization
State-space models
Dynamic Bayesian PMF (via HDP)
Approximate and Scalable Inference for Complex
Probabilistic Models in Recommender Systems
Part 1: Models and Representations

Literature / Sources
● Xiong, L., Chen, X., Huang, T. K., Schneider, J. G., & Carbonell, J. G. (2010). Temporal Collaborative Filtering with Bayesian Probabilistic Tensor Factorization. SDM Proceedings.
● Sun, J. Z., Varshney, K. R., & Subbian, K. (2012). Dynamic Matrix Factorization: A State-Space Approach. ICASSP.
● Chatzis, S. P. (2014). Dynamic Bayesian Probabilistic Matrix Factorization. AAAI.

Temporal Collaborative Filtering
with
Bayesian Probabilistic Tensor Factorization

Matrix Factorization (previous cases)
[Figure: normalized ratings matrix, N users x M items, with latent dimensions 1 … D]

Matrix Factorization (previous cases)
- Users: (N x D)
- Items: (M x D)

Tensor generalization (multi-way data)
- P-mode tensor of dimensions M_1 x … x M_P (example: observations x measurements x time x equipment).
- Multiple relationships between multidimensional variables
- Focus on 3-way (canonical decomposition or parallel factor analysis - CP)

CP Tensor Factorization (current case: 3-way analysis)
[Figure: ratings tensor, N users x M items x K contexts, with latent dimensions 1 … D]

CP Tensor Factorization (current case)
- Users: (N x D)
- Items: (M x D)
- Context values: (K x D)
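As a rough illustration of the CP model on this slide: each rating is approximated by summing, over the D latent dimensions, the product of the matching user, item and context factors. The toy sizes, the random factor matrices and the helper name `predict` are assumptions made for this sketch, not part of the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, K, D = 4, 5, 3, 2  # users, items, contexts, latent dimensions (toy sizes)

U = rng.normal(size=(N, D))  # user factors (N x D)
V = rng.normal(size=(M, D))  # item factors (M x D)
T = rng.normal(size=(K, D))  # context factors (K x D)

# Full reconstruction: R_hat[i, j, k] = sum_d U[i, d] * V[j, d] * T[k, d]
R_hat = np.einsum("id,jd,kd->ijk", U, V, T)


def predict(i, j, k):
    """Predicted rating of user i for item j in context k (hypothetical helper)."""
    return float(np.sum(U[i] * V[j] * T[k]))
```

With K = 1 and the single context factor set to all ones, this reduces to the matrix factorization of the previous cases.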

Temporal ...
● One additional type of context: time
(3D tensor instead of the 2D matrix R)
● In practice:
○ ECCO sales: two context values per season (early/late season)
○ Netflix, MovieLens: one context value per month

MAP Approach: what’s new compared to PMF

MAP Approach
\arg\max_{U,V,T,T_0} \log p(U, V, T, T_0 \mid R)
= \arg\max_{U,V,T,T_0} \left[ \log p(R \mid U, V, T, T_0) + \log p(U, V, T, T_0) \right]

MAP Approach
● Four regularization parameters (lambdas)
● SGD
● Block Coordinate Descent
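A minimal sketch of how the SGD option could look, assuming Gaussian likelihood and priors so that the MAP problem becomes a regularized squared error: one lambda each for the user factors, the item factors, the smoothness of consecutive time factors, and the initial time factor T0. The exact weighting, the zero prior mean for T0 and the function names are assumptions of this sketch, not taken from the slides.

```python
import numpy as np


def objective(obs, U, V, T, T0, lam_U, lam_V, lam_dT, lam_0):
    """Regularized squared error (assumed form; prior mean of T0 taken as zero)."""
    sq = sum((r - np.sum(U[i] * V[j] * T[k])) ** 2 for i, j, k, r in obs)
    T_prev = np.vstack([T0, T[:-1]])  # predecessor of each time factor
    smooth = np.sum((T - T_prev) ** 2)
    return 0.5 * (sq + lam_U * np.sum(U ** 2) + lam_V * np.sum(V ** 2)
                  + lam_dT * smooth + lam_0 * np.sum(T0 ** 2))


def sgd_epoch(obs, U, V, T, T0, lam_U, lam_V, lam_dT, lam_0, lr=0.01):
    """One in-place SGD pass over observed (user, item, time, rating) tuples."""
    for i, j, k, r in obs:
        err = r - np.sum(U[i] * V[j] * T[k])
        prev = T[k - 1] if k > 0 else T0
        # Gradients are computed from the current values before any update.
        g_U = -err * V[j] * T[k] + lam_U * U[i]
        g_V = -err * U[i] * T[k] + lam_V * V[j]
        g_T = -err * U[i] * V[j] + lam_dT * (T[k] - prev)
        if k + 1 < len(T):  # coupling with the next time slice
            g_T += lam_dT * (T[k] - T[k + 1])
        U[i] -= lr * g_U
        V[j] -= lr * g_V
        T[k] -= lr * g_T
    # T0 appears only in the prior terms, so it is updated once per pass.
    T0 -= lr * (lam_dT * (T0 - T[0]) + lam_0 * T0)


# Toy usage: 3 users, 3 items, 2 time slices, 2 latent dimensions.
rng = np.random.default_rng(0)
U0 = rng.normal(scale=0.5, size=(3, 2))
V0 = rng.normal(scale=0.5, size=(3, 2))
Tm = rng.normal(scale=0.5, size=(2, 2))
t0 = np.zeros(2)
toy_obs = [(0, 0, 0, 1.0), (1, 2, 1, 0.5), (2, 1, 0, -0.5)]
sgd_epoch(toy_obs, U0, V0, Tm, t0, 0.1, 0.1, 0.1, 0.1)
```

Applying the full regularizer gradient at every observation is a common simplification; it weights the penalties more heavily than the batch objective does when a factor appears in many observations.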

Bayesian approach: Expectation over posterior distribution
- Predictions for unobserved ratings: integrate over all parameters
- The integral weights each prediction by the a posteriori distribution of the parameters given the observed evidence
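The predictive distribution described here can be written out as follows. This is a sketch in the usual Bayesian PMF notation: the rating precision \alpha and the hyperparameters \Theta are assumed symbols that do not appear in the extracted slide text.

```latex
p(\hat{R}_{ijk} \mid R)
  = \int p(\hat{R}_{ijk} \mid U_i, V_j, T_k, \alpha)\;
         p(U, V, T, \alpha, \Theta \mid R)\;
         \mathrm{d}\{U, V, T, \alpha, \Theta\}
```

The first factor is the prediction for an unobserved rating under fixed parameters; the second is the posterior over parameters given the observed ratings R.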

Bayesian approach: MCMC estimate
- Sample from the posterior distribution
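Once posterior samples are available, the expectation can be approximated by averaging the per-sample predictions. The sketch below assumes each sample is a tuple of factor matrices (U, V, T) and that the prediction is the CP inner product; the sampler itself (for example Gibbs sampling) is not shown, and `mc_predict` is a hypothetical name.

```python
import numpy as np


def mc_predict(samples, i, j, k):
    """Monte Carlo estimate of the rating of user i for item j at time k.

    `samples` is a list of (U, V, T) tuples, assumed to be draws from the
    posterior; this function only averages the per-sample CP predictions.
    """
    preds = [np.sum(U[i] * V[j] * T[k]) for U, V, T in samples]
    return float(np.mean(preds))


# Stand-in "samples" for demonstration only (NOT real posterior draws).
rng = np.random.default_rng(0)
fake_samples = [
    (rng.normal(size=(3, 2)), rng.normal(size=(4, 2)), rng.normal(size=(2, 2)))
    for _ in range(10)
]
estimate = mc_predict(fake_samples, 0, 1, 1)
```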

Linear state-space approach
- User latent factors are time dependent
- Gaussian assumptions for the dynamics allow exact inference

- User latent factors are hidden states in a state-space system
- Item latent factors are stationary
- Ratings are time dependent and observed
[Figure: time-dependent user features, stationary item factors, time-dependent ratings]

Kalman filters: combining new information

System dynamics
Prediction
Kalman gain
Update
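The four steps named on this slide correspond to the textbook Kalman filter recursion. The sketch below is a generic version under the usual linear-Gaussian assumptions; the function name and argument layout are illustrative. In the recommender setting, x would be one user's latent factors, the rows of H the factors of the items rated at that time step, and y the corresponding ratings.

```python
import numpy as np


def kalman_step(x, P, y, A, H, Q, R):
    """One predict/update cycle of a standard Kalman filter.

    x, P : previous state mean and covariance
    y    : new observation vector
    A, H : transition and measurement matrices (system dynamics)
    Q, R : process and measurement noise covariances
    """
    # Prediction: propagate the state through the dynamics.
    x_pred = A @ x
    P_pred = A @ P @ A.T + Q
    # Kalman gain: how much to trust the new observation.
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    # Update: correct the prediction with the observed residual.
    x_new = x_pred + K @ (y - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```

For a scalar example with x = 0, P = 1, A = H = 1, Q = 0, R = 1 and observation y = 2, the gain is 0.5, so the updated mean is 1 and the updated variance is 0.5.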

PMF meets Kalman
- Parameters are time-independent
- Initial states are i.i.d. zero-mean Gaussian for all users, with a shared scale of preferences σ_U
- Process noise (time evolution of user preferences) and measurement noise (estimation of the rating from user and item latent factors) are i.i.d. zero-mean Gaussian, with scales σ_Q and σ_R
- The transitions (A) and the measurements (item latent factors H) can be estimated by maximizing the log-likelihood.
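Written out, these assumptions give a linear-Gaussian state-space model for each user i. The symbols x (user latent factors), y (observed ratings), w and v (noise terms), and the reading of the σ values as standard deviations of isotropic Gaussians, are assumptions of this sketch.

```latex
\begin{aligned}
x_{i,0} &\sim \mathcal{N}(0,\ \sigma_U^2 I) \\
x_{i,t} &= A\, x_{i,t-1} + w_{i,t}, & w_{i,t} &\sim \mathcal{N}(0,\ \sigma_Q^2 I) \\
y_{i,t} &= H_{i,t}\, x_{i,t} + v_{i,t}, & v_{i,t} &\sim \mathcal{N}(0,\ \sigma_R^2 I)
\end{aligned}
```

Here H_{i,t} stacks the latent factors of the items that user i rated at time t, so the same stationary item factors are shared across users and time steps.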

PMF meets Kalman: learning the parameters
- EM, maximizing the expected joint likelihood
- Other approaches: minimizing the residual prediction error, maximizing the prediction likelihood, maximizing the measurement likelihood, optimizing the performance after smoothing.

Dynamic Bayesian Probabilistic Matrix Factorization

- User patterns change over time
- Groups of users share latent structure (clustering of user features)
- Capture the dynamics of the generative process of the group structure
- dHDP: dynamic hierarchical Dirichlet process

Dirichlet process
- Distribution over distributions (each draw is a discrete distribution with infinitely many atoms)
- Clustering effect: the rich get richer
- Chinese restaurant process
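The rich-get-richer effect is easy to see by simulating the Chinese restaurant process: each new customer joins an existing table with probability proportional to its current size, or opens a new table with probability proportional to the concentration parameter alpha. The function below is a minimal sketch; its name and return format are illustrative.

```python
import numpy as np


def chinese_restaurant_process(n, alpha, rng):
    """Seat n customers; return (table index per customer, table sizes)."""
    counts = []  # number of customers at each table
    tables = []  # table chosen by each customer, in arrival order
    for c in range(n):
        # Existing table k: counts[k] / (c + alpha); new table: alpha / (c + alpha).
        probs = np.array(counts + [alpha], dtype=float) / (c + alpha)
        k = int(rng.choice(len(probs), p=probs))
        if k == len(counts):
            counts.append(1)  # open a new table
        else:
            counts[k] += 1    # join an existing table
        tables.append(k)
    return tables, counts


tables, counts = chinese_restaurant_process(50, 1.0, np.random.default_rng(0))
```

Larger values of alpha tend to produce more tables; in the clustering view used here, tables play the role of user groups.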

Hierarchical Dirichlet Process (HDP)

[Figure: dHDP over groups of users, combined with Bayesian PMF]
