Transfer Learning for Collective Link Prediction
Slide 1. Cao et al., ICML 2010. Presented by Danushka Bollegala.
Slide 2.
• Predict links (relations) between entities:
  ▪ Recommend items to users (MovieLens, Amazon)
  ▪ Recommend users to users (social recommendation)
  ▪ Similarity search (suggest similar web pages)
  ▪ Query suggestion (suggest related queries issued by other users)
• Collective Link Prediction (CLP)
  ▪ Perform multiple prediction tasks for the same set of users simultaneously, e.g., predict/recommend multiple item types (books and movies)
• Pros
  ▪ Prediction tasks might not be independent; one can benefit from another (books vs. movies vs. food)
  ▪ Less affected by data sparseness (the cold-start problem)
Slide 3.
• Transfer Learning + Collective Link Prediction (this paper)
• Gaussian Process Regression (GPR) (PRML Sec. 6.4)
• Link prediction = matrix factorization
• Probabilistic Principal Component Analysis (PPCA) (Tipping & Bishop, 1999; PRML Chapter 12)
• Probabilistic non-linear matrix factorization (Lawrence & Urtasun, ICML 2009)
• Task similarity matrix T
Slide 4.
• Link matrix X (x_ij is the rating given by user i to item j)
• x_ij is modeled by f(u_i, v_j, ε)
  ▪ f: link function
  ▪ u_i: latent representation of user i
  ▪ v_j: latent representation of item j
  ▪ ε: noise term
• Generalized matrix approximation (sketch below)
• Assumption: E is Gaussian noise N(0, σ^2 I)
• Use Y = f^{-1}(X); then Y follows a multivariate Gaussian distribution.
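A minimal numpy sketch of this generative view (the toy sizes, the identity link f, and the noise scale are assumptions for illustration, not the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 5, 4, 2                       # assumed toy sizes

U = rng.normal(size=(n_users, k))                   # latent user representations u_i
V = rng.normal(size=(n_items, k))                   # latent item representations v_j
E = rng.normal(scale=0.1, size=(n_users, n_items))  # Gaussian noise N(0, sigma^2 I)

f = lambda y: y                                     # identity link, a stand-in for f
X = f(U @ V.T + E)                                  # observed link/rating matrix
Y = X                                               # Y = f^{-1}(X) is Gaussian here
```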
Slide 5. Revision (PRML Section 6.4)
Slide 6.
• We can view a function as an infinite-dimensional vector: f(x) -> (f(x_1), f(x_2), ...)^T
  ▪ Each point in the domain is mapped by f to one dimension of the vector
• In machine learning we must find functions (e.g., linear predictors) that map input values to their corresponding output values.
• We must also avoid over-fitting.
• This can be visualized as sampling from a distribution over functions with certain properties.
  ▪ A preference bias (cf. restriction bias)
Slide 7.
• Linear regression model: y(x) = w^T φ(x)
• We get different output functions y for different weight vectors w.
• Let us impose a Gaussian prior over w: p(w) = N(w | 0, α^{-1} I)
• Training dataset: {(x_1, y_1), ..., (x_N, y_N)}
• Targets: y = (y_1, ..., y_N)^T
• Design matrix Φ, with Φ_{nk} = φ_k(x_n), so that y = Φw
Slide 8.
• When we impose a Gaussian prior over the weight vector, the target vector y = Φw is also Gaussian: E[y] = 0 and cov[y] = α^{-1} Φ Φ^T = K (numerical check below).
• K: kernel (Gram) matrix, with K_{nm} = k(x_n, x_m) = α^{-1} φ(x_n)^T φ(x_m)
• k: kernel function
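A quick numerical check of that identity, as a sketch; the polynomial basis, the sample count, and α are assumed values:

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, N, M = 2.0, 4, 3                        # prior precision, #inputs, #basis fns
x = np.linspace(0.0, 1.0, N)
Phi = np.vander(x, M)                          # design matrix (polynomial basis)

w = rng.normal(scale=alpha ** -0.5, size=(M, 200_000))  # w ~ N(0, alpha^{-1} I)
y = Phi @ w                                    # each column is one sample of y = Phi w
K = Phi @ Phi.T / alpha                        # kernel matrix K = alpha^{-1} Phi Phi^T
print(np.allclose(np.cov(y), K, atol=0.05))    # empirical cov[y] matches K -> True
```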
Slide 9.
• A Gaussian process is defined as a probability distribution over functions y(x) such that the set of values y(x) evaluated at an arbitrary set of points x_1, ..., x_N jointly has a Gaussian distribution.
• That is, p(y(x_1), ..., y(x_N)) is Gaussian.
• Often the mean is set to zero (a non-informative prior); then the kernel function fully defines the GP (sampling sketch below).
• Gaussian kernel: k(x, x') = exp(-||x - x'||^2 / (2σ^2))
• Exponential kernel: k(x, x') = exp(-θ |x - x'|)
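For intuition, a short sketch that draws sample functions from a zero-mean GP prior with the Gaussian kernel (the grid, the length-scale sigma, and the jitter term are assumed values):

```python
import numpy as np

def gaussian_kernel(x1, x2, sigma=0.2):
    # k(x, x') = exp(-(x - x')^2 / (2 sigma^2))
    return np.exp(-(x1 - x2) ** 2 / (2.0 * sigma ** 2))

x = np.linspace(0.0, 1.0, 100)
K = gaussian_kernel(x[:, None], x[None, :])    # Gram matrix over the input grid
jitter = 1e-8 * np.eye(len(x))                 # tiny diagonal for numerical stability

rng = np.random.default_rng(2)
samples = rng.multivariate_normal(np.zeros(len(x)), K + jitter, size=3)
# each row of `samples` is one function y(x) drawn from the zero-mean GP prior
```

A smaller sigma yields wigglier sample functions, which is one way to see how the kernel alone determines the prior over functions.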
Slide 10.
• Predict outputs with noise: t_n = y_n + ε_n, so p(t | y) = N(t | y, β^{-1} I_N) and marginally p(t) = N(t | 0, C) with C = K + β^{-1} I_N (prediction sketch below).
(Figure: graphical model relating inputs x, latent function values y, noise ε, and observed targets t.)
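A sketch of the resulting predictive equations (the kernel, the toy data, and β are assumptions; a linear solve replaces the explicit matrix inverse):

```python
import numpy as np

def kern(x1, x2, sigma=0.2):
    # Gaussian kernel k(x, x') = exp(-(x - x')^2 / (2 sigma^2))
    return np.exp(-(x1 - x2) ** 2 / (2.0 * sigma ** 2))

def gp_predict(x_train, t, x_test, beta):
    # C = K + beta^{-1} I_N; predictive mean k*^T C^{-1} t and
    # predictive variance c - k*^T C^{-1} k*
    C = kern(x_train[:, None], x_train[None, :]) + np.eye(len(x_train)) / beta
    k_star = kern(x_train[:, None], x_test[None, :])   # shape (N, M)
    c = kern(x_test, x_test) + 1.0 / beta              # prior variance at test inputs
    solved = np.linalg.solve(C, k_star)                # C^{-1} k*, via a linear solve
    return solved.T @ t, c - np.sum(k_star * solved, axis=0)

x_train = np.array([0.1, 0.4, 0.7])
t = np.sin(2 * np.pi * x_train)                        # toy targets
mean, var = gp_predict(x_train, t, np.array([0.3, 0.5]), beta=100.0)
```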
Slide 11.
• PMF can be seen as a Gaussian process with latent variables (GP-LVM) [Lawrence & Urtasun, ICML 2009]
• Generalized matrix approximation model: Y = f^{-1}(X) follows a multivariate Gaussian distribution
• A Gaussian prior is set on U (numerical check below)
• Probabilistic PCA model by Tipping & Bishop (1999)
• Non-linear version
• Mapping back to X
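As a hedged numerical check of the PPCA connection: if u_i ~ N(0, I) and Y = U V^T + E with Gaussian noise, each row of Y is Gaussian with covariance V V^T + σ^2 I, which is exactly the PPCA marginal. A sketch with assumed toy sizes:

```python
import numpy as np

rng = np.random.default_rng(3)
n_items, k, sigma, n_samples = 4, 2, 0.1, 200_000
V = rng.normal(size=(n_items, k))              # latent item matrix (assumed toy size)

U = rng.normal(size=(n_samples, k))            # many user vectors u_i ~ N(0, I)
rows = U @ V.T + rng.normal(scale=sigma, size=(n_samples, n_items))  # rows of Y
emp_cov = np.cov(rows, rowvar=False)           # empirical covariance of a row of Y
print(np.allclose(emp_cov, V @ V.T + sigma**2 * np.eye(n_items), atol=0.1))
```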
Slide 12.
• A GP model for each task
• A single model for all tasks
Slide 13.
• Known as the Kronecker product of two matrices (e.g., numpy.kron(a, b); example below)
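A minimal numpy.kron illustration (the matrices T and K are assumed toy values; the reading of the result as a joint covariance over (task, user) pairs follows the multi-task construction above):

```python
import numpy as np

T = np.array([[1.0, 0.8],
              [0.8, 1.0]])        # task similarity matrix (2 tasks, assumed values)
K = np.array([[1.0, 0.5],
              [0.5, 1.0]])        # kernel over users (2 users, assumed values)

Sigma = np.kron(T, K)             # Kronecker product: block (s, t) is T[s, t] * K
print(Sigma.shape)                # (4, 4): one row/column per (task, user) pair
```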
Slide 14.
• Each task might have a different rating distribution.
• c, α, b are parameters that must be estimated from the data.
• We can relax the constraint α > 0 if we have no prior knowledge regarding the negativity of the skewness of the rating distribution.
Slide 15.
• Similar to GPR prediction
• Predicting y = g(x)
• Predicting x
Slide 16.
• Compute the likelihood of the dataset
• Use stochastic gradient descent (SGD) for optimization (sketch below)
• Non-convex optimization
• Sensitive to initial conditions
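An illustrative SGD loop on a simplified squared-error surrogate (a sketch only: the actual CLP-GP objective is the GP likelihood, whose gradients also involve the kernel, task-similarity, and link-function parameters):

```python
import numpy as np

rng = np.random.default_rng(4)
n_users, n_items, k, lr = 100, 50, 5, 0.01

# synthetic observed ratings (i, j, x) generated from assumed ground-truth factors
U_true = rng.normal(size=(n_users, k))
V_true = rng.normal(size=(n_items, k))
obs = [(i, j, U_true[i] @ V_true[j] + rng.normal(scale=0.1))
       for i in range(n_users)
       for j in rng.choice(n_items, size=5, replace=False)]

U = rng.normal(scale=0.1, size=(n_users, k))   # random init: the objective is
V = rng.normal(scale=0.1, size=(n_items, k))   # non-convex, so results depend on it
for epoch in range(20):
    rng.shuffle(obs)                           # stochastic order of updates
    for i, j, x in obs:
        err = x - U[i] @ V[j]                  # residual at the observed entry
        U[i], V[j] = U[i] + lr * err * V[j], V[j] + lr * err * U[i]
```

Because the objective is non-convex, different random initializations of U and V can reach different local optima, which is the sensitivity noted above.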
Slide 17.
• Setting: use each dataset and predict multiple item types
• Datasets
  ▪ MovieLens: 100,000 ratings on a 1-5 scale, 943 users, 1,682 movies, 5 popular genres
  ▪ Book-Crossing: 56,148 ratings on a 1-10 scale, 28,503 users, 9,909 books, 4 most general Amazon book categories
  ▪ Douban: a social network-based recommendation service; 10,000 users, 200,000 items; movies, books, and music
Slide 18.
• Evaluation measure: Mean Absolute Error (MAE; sketch below)
• Baselines
  ▪ I-GP: independent link prediction using GP
  ▪ CMF: collective matrix factorization (non-GP, classical NMF)
  ▪ M-GP: joint link prediction using a multi-relational GP (does not consider the similarity between tasks)
• Proposed method: CLP-GP
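MAE is the average absolute deviation between predicted and true ratings; a minimal sketch:

```python
import numpy as np

def mae(predicted, actual):
    # Mean Absolute Error: mean of |prediction - truth| over observed ratings
    return float(np.mean(np.abs(np.asarray(predicted) - np.asarray(actual))))

print(mae([3.5, 2.0, 4.1], [4, 2, 5]))   # toy example -> approximately 0.467
```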
Slide 19. (Results table.) Note: (1) smaller values are better; (2) (+)/(-) indicates with/without the link function.
Slide 20. (Results figure.)
Slide 21.
• Romance and Drama are very similar.
• Action and Comedy are very dissimilar.
Slide 22.
• Pros
  ▪ Elegant model and well-written paper
  ▪ Few parameters need to be specified (only the latent-space dimension k); all other parameters can be learnt
  ▪ Applicable to a wide range of tasks
• Cons
  ▪ Computational complexity: predictions require inverting the kernel matrix
  ▪ SGD updates might not converge; the problem is non-convex