Transfer learning for CLP
Transcript

  • 1. Cao et al. ICML 2010 Presented by Danushka Bollegala.
  • 2.
      • Predict links (relations) between entities
      • Recommend items for users (MovieLens, Amazon)
      • Recommend users for users (social recommendation)
      • Similarity search (suggest similar web pages)
      • Query suggestion (suggest related queries by other users)
      • Collective Link Prediction (CLP)
      • Perform multiple prediction tasks for the same set of users simultaneously
          ▪ Predict/recommend multiple item types (books and movies)
      • Pros
      • Prediction tasks might not be independent; one can benefit from another (books vs. movies vs. food)
      • Less affected by data sparseness (cold start problem)
  • 3.
      • Transfer learning + collective link prediction (this paper)
      • Gaussian process for regression (GPR) (PRML Sec. 6.4)
      • Link prediction = matrix factorization
      • Probabilistic principal component analysis (PPCA) (Tipping & Bishop, 1999; PRML Chapter 12)
      • Probabilistic non-linear matrix factorization (Lawrence & Urtasun, ICML 2009)
      • Task similarity matrix, T
  • 4.
      • Link matrix X (xi,j is the rating given by user i to item j)
      • xi,j is modeled by f(ui, vj, ε)
      • f: link function
      • ui: latent representation of user i
      • vj: latent representation of item j
      • ε: noise term
      • Generalized matrix approximation
      • Assumption: the noise E is Gaussian, N(0, σ²I)
      • Use Y = f⁻¹(X)
      • Then Y follows a multivariate Gaussian distribution.
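
The generalized matrix approximation on slide 4 can be illustrated numerically. This is only a minimal sketch, not the paper's model: the exponential link `f`, the matrix sizes, and the noise level `sigma` are all assumptions chosen for the example. It shows that the observed matrix X is non-Gaussian, while Y = f⁻¹(X) recovers the Gaussian latent scale.

```python
import numpy as np

rng = np.random.default_rng(0)

n_users, n_items, k = 4, 5, 2          # hypothetical sizes
U = rng.normal(size=(n_users, k))      # latent user representations u_i
V = rng.normal(size=(n_items, k))      # latent item representations v_j
sigma = 0.1                            # assumed noise standard deviation
E = rng.normal(scale=sigma, size=(n_users, n_items))  # Gaussian noise

def f(z):
    """Hypothetical monotone link function (an assumption, not from the paper)."""
    return np.exp(z)

def f_inv(x):
    """Inverse of the hypothetical link function."""
    return np.log(x)

Z = U @ V.T + E        # Gaussian on the latent scale
X = f(Z)               # observed link matrix (non-Gaussian)
Y = f_inv(X)           # mapped back to the Gaussian scale
```

Any strictly monotone `f` would work in this sketch; the only property used is that it has an inverse.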
  • 5. Revision (PRML Section 6.4)
  • 6.
      • We can view a function as an infinite-dimensional vector
      • f(x): (f(x1), f(x2), ...)ᵀ
      • Each point in the domain is mapped by f to a dimension in the vector
      • In machine learning we must find functions (e.g. linear predictors) that map input values to their corresponding output values
      • We must also avoid over-fitting
      • This can be visualized as sampling from a distribution over functions with certain properties
      • Preference bias (cf. restriction bias)
  • 7.
      • Linear regression model
      • We get different output functions y for different weight vectors w.
      • Let us impose a Gaussian prior over w
      • Training dataset: {(x1, y1), ..., (xN, yN)}
      • Targets: y = (y1, ..., yN)ᵀ
      • Design matrix
  • 8.
      • When we impose a Gaussian prior over the weight vector, the target y is also Gaussian.
      • K: kernel matrix (Gram matrix)
      • k: kernel function
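
Slides 7 and 8 can be checked numerically. Following PRML Sec. 6.4.1, if y = Φw and the prior is p(w) = N(0, α⁻¹I), then y is zero-mean Gaussian with covariance K = (1/α)ΦΦᵀ. The sketch below is only an illustration: the polynomial basis, the inputs, and the value of `alpha` are assumptions, not taken from the slides.

```python
import numpy as np

def design_matrix(x, degree=3):
    """Rows are feature vectors phi(x_n) = (1, x_n, x_n^2, ...); hypothetical basis."""
    return np.vander(x, N=degree + 1, increasing=True)

alpha = 2.0                               # assumed prior precision of w
x = np.linspace(-1.0, 1.0, 6)             # assumed training inputs
Phi = design_matrix(x)                    # shape (N, M)
K = Phi @ Phi.T / alpha                   # Gram matrix: K_nm = phi(x_n)^T phi(x_m) / alpha

# Monte Carlo check: sample w from the prior and compare the covariance of y.
rng = np.random.default_rng(0)
W = rng.normal(scale=alpha ** -0.5, size=(200_000, Phi.shape[1]))
Y = W @ Phi.T                             # each row is one sampled y = Phi w
K_empirical = Y.T @ Y / len(Y)            # should approach K as samples grow
```

The empirical covariance of the sampled outputs agrees with K up to Monte Carlo error, which is the sense in which the kernel matrix summarizes the prior over functions.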
  • 9.
      • A Gaussian process is defined as a probability distribution over functions y(x) such that the set of values y(x) evaluated at an arbitrary set of points x1, ..., xN jointly have a Gaussian distribution.
      • p(y(x1), ..., y(xN)) is Gaussian.
      • Often the mean is set to zero
      • Non-informative prior
      • Then the kernel function fully defines the GP.
      • Gaussian kernel (PRML form): k(x, x') = exp(−‖x − x'‖² / (2σ²))
      • Exponential kernel (PRML form): k(x, x') = exp(−θ|x − x'|)
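
To make slide 9 concrete, the sketch below draws one function from a zero-mean GP prior for each of the two kernels. It assumes the PRML forms of the kernels, k(x, x') = exp(−‖x − x'‖² / (2σ²)) and k(x, x') = exp(−θ|x − x'|); the parameter values, the input grid, and the `jitter` term are assumptions added for the example.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=0.5):
    """k(x, x') = exp(-|x - x'|^2 / (2 sigma^2)) for 1-D inputs."""
    d = a[:, None] - b[None, :]
    return np.exp(-d ** 2 / (2.0 * sigma ** 2))

def exponential_kernel(a, b, theta=1.0):
    """k(x, x') = exp(-theta |x - x'|) for 1-D inputs."""
    d = np.abs(a[:, None] - b[None, :])
    return np.exp(-theta * d)

x = np.linspace(0.0, 1.0, 20)
rng = np.random.default_rng(0)
jitter = 1e-6 * np.eye(len(x))   # small diagonal term for numerical stability

# Each draw is one function from the zero-mean GP prior, evaluated at x.
K_gauss = gaussian_kernel(x, x)
K_exp = exponential_kernel(x, x)
sample_gauss = rng.multivariate_normal(np.zeros(len(x)), K_gauss + jitter)
sample_exp = rng.multivariate_normal(np.zeros(len(x)), K_exp + jitter)
```

Plotting the two samples would show the usual contrast: the Gaussian kernel gives smooth functions, the exponential kernel gives rough ones.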
  • 10.
      • Predict outputs with noise
      • [Diagram: input x, function value y, noise ε, observed target t]
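
Prediction with noisy outputs can be sketched with the standard GP regression equations from PRML Sec. 6.4.2: with C = K + β⁻¹I, the predictive mean is kᵀC⁻¹t and the predictive variance is c − kᵀC⁻¹k, where c = k(x*, x*) + β⁻¹. The kernel choice, the noise precision `beta`, and the toy data below are assumptions for illustration only.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=0.3):
    """Gaussian kernel for 1-D inputs (kernel choice is an assumption)."""
    d = a[:, None] - b[None, :]
    return np.exp(-d ** 2 / (2.0 * sigma ** 2))

def gp_predict(x_train, t_train, x_test, beta=100.0, kernel=gaussian_kernel):
    """Predictive mean and variance of noisy targets t at x_test."""
    C = kernel(x_train, x_train) + np.eye(len(x_train)) / beta
    k = kernel(x_train, x_test)                    # shape (N, N_test)
    c = np.diag(kernel(x_test, x_test)) + 1.0 / beta
    mean = k.T @ np.linalg.solve(C, t_train)
    var = c - np.sum(k * np.linalg.solve(C, k), axis=0)
    return mean, var

x_train = np.linspace(0.0, 1.0, 8)
t_train = np.sin(2 * np.pi * x_train)    # toy targets (assumption)
x_test = np.array([0.25, 0.5, 5.0])      # the last point is far from all training data
mean, var = gp_predict(x_train, t_train, x_test)
```

Far from the training inputs the prediction falls back to the prior: the mean returns to zero and the variance to k(x*, x*) + β⁻¹.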
  • 11.
      • PMF can be seen as a Gaussian process with latent variables (GP-LVM) [Lawrence & Urtasun, ICML 2009]
      • Generalized matrix approximation model
      • Y = f⁻¹(X) follows a multivariate Gaussian distribution
      • A Gaussian prior is set on U
      • Probabilistic PCA model by Tipping & Bishop (1999)
      • Non-linear version
      • Mapping back to X
  • 12.
      • GP model for each task
      • A single model for all tasks
  • 13.
      • Known as the Kronecker product of two matrices (e.g., numpy.kron(a, b))
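
A small example of the Kronecker product with `numpy.kron`. The two matrices are hypothetical: `T` stands in for a task-similarity matrix over two tasks and `K` for a kernel matrix over three points. Combining them this way is an assumption about how the single joint model is built, not a formula taken from the slides.

```python
import numpy as np

T = np.array([[1.0, 0.5],
              [0.5, 1.0]])          # hypothetical task-similarity matrix (2 tasks)
K = np.array([[1.0, 0.2, 0.0],
              [0.2, 1.0, 0.3],
              [0.0, 0.3, 1.0]])     # hypothetical kernel matrix (3 points)

joint = np.kron(T, K)               # shape (2*3, 2*3); block (a, b) equals T[a, b] * K
```

Each 3×3 block of `joint` is a copy of `K` scaled by the similarity between the corresponding pair of tasks, so dissimilar tasks share less covariance.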
  • 14.
      • Each task might have a different rating distribution.
      • c, α, b are parameters that must be estimated from the data.
      • We can relax the constraint α > 0 if we have no prior knowledge about whether the rating distribution is negatively skewed.
  • 15.
      • Similar to GPR prediction
      • Predicting y = g(x)
      • Predicting x
  • 16.
      • Compute the likelihood of the dataset
      • Use stochastic gradient descent (SGD) for optimization
      • Non-convex optimization
      • Sensitive to initial conditions
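
As a rough illustration of "compute the likelihood of the dataset", the sketch below evaluates the standard GP regression log marginal likelihood from PRML Sec. 6.4.3, ln p(t) = −½ ln|C| − ½ tᵀC⁻¹t − (N/2) ln 2π with C = K + β⁻¹I. This is not the paper's full objective, and the SGD loop is not shown; the kernel, `beta`, and the toy data are assumptions.

```python
import numpy as np

def gp_log_marginal_likelihood(K, t, beta):
    """ln p(t) = -1/2 ln|C| - 1/2 t^T C^{-1} t - N/2 ln(2 pi), with C = K + I / beta."""
    n = len(t)
    C = K + np.eye(n) / beta
    L = np.linalg.cholesky(C)                 # C = L L^T
    a = np.linalg.solve(L, t)                 # so that a^T a = t^T C^{-1} t
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * log_det - 0.5 * (a @ a) - 0.5 * n * np.log(2.0 * np.pi)

x = np.linspace(0.0, 1.0, 5)
K = np.exp(-np.abs(x[:, None] - x[None, :]))   # exponential kernel, theta = 1 (assumption)
t = np.sin(2 * np.pi * x)                      # toy targets (assumption)
ll = gp_log_marginal_likelihood(K, t, beta=10.0)
```

A gradient-based optimizer would maximize this quantity with respect to the kernel and noise parameters; because the objective is non-convex, different starting points can lead to different optima.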
  • 17.
      • Setting
      • For each dataset, predict multiple item types
      • Datasets
      • MovieLens
          ▪ 100,000 ratings, 1-5 scale, 943 users, 1,682 movies, 5 popular genres
      • Book-Crossing
          ▪ 56,148 ratings, 1-10 scale, 28,503 users, 9,909 books, 4 most general Amazon book categories
      • Douban
          ▪ A social network-based recommendation service
          ▪ 10,000 users, 200,000 items
          ▪ Movies, books, music
  • 18.
      • Evaluation measure
      • Mean Absolute Error (MAE)
      • Baselines
      • I-GP: independent link prediction using GP
      • CMF: collective matrix factorization
          ▪ Non-GP, classical NMF
      • M-GP: joint link prediction using multi-relational GP
          ▪ Does not consider the similarity between tasks
      • Proposed method = CLP-GP
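
For reference, MAE is the average absolute difference between true and predicted ratings. A minimal sketch follows; the function name and the toy ratings are illustrative and do not come from the paper.

```python
import numpy as np

def mean_absolute_error(true_ratings, predicted_ratings):
    """MAE = mean of |x_true - x_pred| over the held-out ratings."""
    true_ratings = np.asarray(true_ratings, dtype=float)
    predicted_ratings = np.asarray(predicted_ratings, dtype=float)
    return float(np.mean(np.abs(true_ratings - predicted_ratings)))

mae = mean_absolute_error([5, 3, 4, 1], [4, 3, 2, 2])   # toy ratings, not from the paper
```

Here the absolute errors are 1, 0, 2 and 1, so the MAE is 1.0.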
  • 19. Note: (1) smaller values are better; (2) (+) and (−) denote results with and without the link function.
  • 20. Good
  • 21.
      • Romance and Drama are very similar
      • Action and Comedy are very dissimilar
  • 22.
      • Elegant model and well-written paper
      • Few parameters (latent space dimension k) need to be specified
      • All other parameters can be learnt
      • Applicable to a wide range of tasks
      • Cons:
      • Computational complexity
          ▪ Predictions require kernel matrix inversion
          ▪ SGD updates might not converge
          ▪ The problem is non-convex...
