SemanticSVD++: Incorporating Semantic Taste Evolution for Predicting Ratings
Presentation slides from the International Conference on Web Intelligence 2014.

Transcript

  • 1. SemanticSVD++: Incorporating Semantic Taste Evolution for Predicting Ratings. Dr. Matthew Rowe, School of Computing and Communications. @mrowebot | m.rowe@lancaster.ac.uk. International Conference on Web Intelligence 2014, Warsaw, Poland.
  • 2. Predicting Ratings. Induce a model from the observed user-item rating matrix, then predict the missing ratings. [Slide figure: a user-item star-rating matrix with an unknown rating marked '?']
  • 3. Latent Factor Models: the Factor Consistency Problem. With F latent factors fixed a priori, factorisations induced at different points in time cannot be 'accurately' aligned with one another, so we cannot tell how users' tastes have evolved. [Slide figure: two factorisations of the rating matrix at different times, with unknown correspondence between their factors]
  • 4. Solution: Semantic Categories. Map each item i to a URI and its set of SKOS categories, and model the preference for category c at time s; here c denotes the dimensionality of the category space. [Slide figure: the rating matrix approximated over the semantic category space]
  • 5. Semantic Alignment of Datasets. For each movie item, producing pairs {(ItemID, <URI>)}: (i) SPARQL query for candidate URIs from the movie's title; (ii) get the semantic categories of each candidate; (iii) disambiguate based on the movie's year.

[Paper Fig. 1: distribution of reviews per day across the MovieLens and MovieTweetings datasets. The first dashed (blue) line indicates the cutoff point for the training set, and the dashed red line the cutoff for the test set, i.e. every rating after that point is placed in the test set; the validation set contains the ratings between the two lines.]

For Alien, released in 1979, which serves as a running example, the following categories are found:

    <http://dbpedia.org/resource/Alien_(film)>
        dcterms:subject category:Alien_(franchise)_films ;
        dcterms:subject category:1979_horror_films .

In this work DBpedia URIs are used.
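The three alignment steps on this slide can be sketched in Python. The helper names (`candidate_uri_query`, `disambiguate_by_year`), the query template and the candidate list are our own illustrative assumptions, not the authors' pipeline; a real run would issue the query against the DBpedia SPARQL endpoint.

```python
def candidate_uri_query(title: str) -> str:
    """Build a SPARQL query for candidate DBpedia URIs whose label matches
    the movie's title, returning their categories too (illustrative query)."""
    return f"""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX dcterms: <http://purl.org/dc/terms/>
    SELECT ?film ?category WHERE {{
        ?film rdfs:label ?label ;
              dcterms:subject ?category .
        FILTER (regex(str(?label), "{title}", "i"))
    }}"""

def disambiguate_by_year(candidates, year: int):
    """Pick the candidate whose categories mention the release year,
    e.g. category:1979_horror_films for Alien (1979)."""
    for uri, categories in candidates:
        if any(str(year) in c for c in categories):
            return uri
    return None

# Hypothetical candidate set, as would be returned by the SPARQL query:
candidates = [
    ("dbpedia:Alien_(film)", ["Alien_(franchise)_films", "1979_horror_films"]),
    ("dbpedia:Alien_(law)", ["Immigration_law"]),
]
print(disambiguate_by_year(candidates, 1979))  # dbpedia:Alien_(film)
```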
  • 6. Reduced Recommendation Datasets. Semantic alignment leaves fewer elements. The time-ordered datasets were split for experiments: 80%/10%/10% for training/validation/testing.

Table I. Statistics of the revised review datasets used for the analysis and experiments; reductions over the original datasets are shown in parentheses.

    Dataset        | #Users       | #Items         | #Ratings
    MovieLens      | 5,390 (-11%) | 3,231 (-12.1%) | 841,602 (-6.7%)
    MovieTweetings | 2,357 (-89%) | 7,913 (-30.8%) | 73,397 (-38.2%)
    Total          | 7,747        | 11,144         | 914,999

The reduction in the number of ratings is not as great as the reduction in users and items, which suggests two things: (i) mapped items are popular, and thus dominate the ratings; and (ii) obscure items are present within the data. As Table I suggests, certain more 'obscure' movies have no DBpedia URI; despite the use of the most recent DBpedia datasets (version 3.9), coverage is still limited in certain places, largely because such films have no Wikipedia page. For instance, in the MovieLens dataset the movies 'Never Met Picasso', 'Diebinnen' and 'Follow the Bitch' fail to map: despite having IMDb pages, they have no Wikipedia page and hence no DBpedia entry. In the MovieTweetings dataset, 'Summer Coda' fails to map, among others. The Hipster Dilemma: occurs when obscure movie items cannot be aligned to semantic web URIs!
  • 7. Forming Semantic Taste Profiles. Split the user's training ratings into 5 lifecycle stages; for each stage, derive the user's average rating per semantic category, then calculate the probability P_s^u(c) of the user rating the category highly.

Notation: u, v denote users and i, j denote items; r denotes a known rating value (r ∈ [1, 5]) and r̂ a predicted rating value. Ratings are provided as quadruples (u, i, r, t), where t denotes the time of the rating, segmented into training (D_train), validation (D_valid) and test (D_test) sets by the cutoffs above; c denotes a semantic category that an item has been mapped to, and cats(i) is a convenience function returning the set of semantic categories of item i.

For category c at time period s ∈ S (where S is the set of 5 lifecycle stages), first define two sets: D_train^{u,s,c}, the ratings by u during stage s for items from category c, and D_train^{u,s}, all ratings by u during s, hence D_train^{u,s,c} ⊆ D_train^{u,s}:

    D_train^{u,s,c} = {(u, i, r, t) : (u, i, r, t) ∈ D_train, t ∈ s, c ∈ cats(i)}   (1)
    D_train^{u,s}   = {(u, i, r, t) : (u, i, r, t) ∈ D_train, t ∈ s}                (2)

The function avrating derives the average rating value from all rating quadruples in a given set:

    avrating(D_train^{u,s}) = (1 / |D_train^{u,s}|) Σ_{(u,i,r,t) ∈ D_train^{u,s}} r   (3)

From these definitions, the discrete probability distribution of the user rating category c favourably is derived, with C_train^{u,s} defined as the set of unique categories of items rated by u in stage s:

    Pr(c | D_train^{u,s}) = avrating(D_train^{u,s,c}) / Σ_{c' ∈ C_train^{u,s}} avrating(D_train^{u,s,c'})   (4)

In implementing this approach, only the categories that item URIs are directly mapped to are considered, i.e. those connected to the URI by the dcterms:subject predicate. Prior work by Ostuni et al. [8] mapped grandparent categories to URIs; parent categories are chosen here to open up the possibility of other mappings in the future, e.g. via linked-data node vertex kernels.

B. User Taste Evolution from Prior Taste Profiles. Taste profiles describe the preferences that a user has at a given time for given semantic categories; understanding how a profile at one point in time depends on the profile at an earlier point reveals whether taste evolution has taken place (an assessment demonstrated in the context of review platforms by McAuley and Leskovec [5]). Since each lifecycle-stage-specific taste profile is a probability distribution, information-theoretic measures based on information entropy can be applied; one such measure is conditional entropy, which assesses the user's ratings distribution per semantic category within the allotted time window (the lifecycle stage denotes a closed interval, i.e. s = [t, t'], t < t').
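As a rough illustration of Eqs. 1-4, the following sketch builds one stage's taste profile from a user's (item, rating) pairs within that stage. The function and variable names are ours; `cats` stands in for the item-to-category mapping obtained from DBpedia.

```python
from collections import defaultdict

def stage_profile(ratings, cats):
    """Sketch of Eqs. 1-4: per-stage taste profile for one user.
    `ratings` lists (item, rating) pairs falling in stage s; `cats` maps
    each item to its set of semantic categories (assumed inputs)."""
    by_cat = defaultdict(list)
    for item, r in ratings:          # partition ratings by category (Eq. 1)
        for c in cats[item]:
            by_cat[c].append(r)
    # average rating per category (Eq. 3, restricted to D^{u,s,c}):
    av = {c: sum(rs) / len(rs) for c, rs in by_cat.items()}
    total = sum(av.values())
    # normalise into a discrete probability distribution (Eq. 4):
    return {c: v / total for c, v in av.items()}

cats = {"alien": {"horror", "scifi"}, "blade_runner": {"scifi"}}
profile = stage_profile([("alien", 4), ("blade_runner", 5)], cats)
# probabilities proportional to the per-category average ratings
print(profile)
```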
  • 8. Taste Evolution from Taste Profiles.

[Paper Figs. 2 and 3: conditional entropy and transfer entropy between consecutive lifecycle stages (e.g. H(P_2|P_3)) across the MovieLens and MovieTweetings datasets, together with the bounds of the 95% confidence interval for the derived means.]

Prior tastes comparison: computed the conditional entropy between consecutive profiles; an increase indicates divergence from prior tastes, and users in both datasets diverge from their prior tastes.

Global influence: computed the transfer entropy of how global tastes have influenced users' tastes; a decrease indicates that global tastes have a stronger influence than prior tastes, and the two datasets differ in the role global influence plays. Let Y_s be a random variable describing the local categories reviewed at the current stage, Y_{s-1} a random variable of local categories at the previous stage, and X_{s-1} a random variable of global categories at the previous stage (global categories being those rated by all users who posted ratings within the time interval of stage s). The transfer entropy from one lifecycle stage to another is then defined as [11]:

    T_{X→Y} = H(Y_s | Y_{s-1}) - H(Y_s | Y_{s-1}, X_{s-1})   (6)
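A minimal sketch of the entropy machinery used on this slide: `conditional_entropy` computes H(Y|X) from a joint distribution (the dict representation is our assumption), and `transfer_entropy` simply applies Eq. 6 once the two conditional-entropy estimates are available.

```python
from math import log2

def conditional_entropy(joint):
    """H(Y|X) = -sum_{x,y} p(x,y) log2 p(y|x), with the joint p(x,y)
    given as a dict {(x, y): probability} (an assumed representation)."""
    px = {}
    for (x, _), p in joint.items():   # marginalise out y to get p(x)
        px[x] = px.get(x, 0.0) + p
    return -sum(p * log2(p / px[x]) for (x, _), p in joint.items() if p > 0)

def transfer_entropy(h_y_given_yprev, h_y_given_yprev_xprev):
    """Eq. 6: T_{X->Y} = H(Y_s|Y_{s-1}) - H(Y_s|Y_{s-1}, X_{s-1})."""
    return h_y_given_yprev - h_y_given_yprev_xprev

# If Y is fully determined by X, the conditional entropy is zero:
det = {(0, 0): 0.5, (1, 1): 0.5}
print(conditional_entropy(det))  # 0.0
```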
  • 9. Putting it all together: SemanticSVD++! A modified version of SVD++, extending Koren et al.'s earlier SVD++ model [2], with: user taste evolution captured in semantic category biases; and a semantic personalisation component with a latent factor vector z_c for each of the categories rated by the user. The predictive function of the model is:

    r̂_ui =  μ + b_i + b_u                                        (static biases)
           + α_i b_{i,cats(i)} + α_u b_{u,cats(i)}                 (category biases)
           + q_i^T ( p_u + |R(u)|^{-1/2} Σ_{j ∈ R(u)} y_j
                    + |cats(R(u))|^{-1/2} Σ_{c ∈ cats(R(u))} z_c )  (personalisation component)   (8)
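The predictive function (Eq. 8) can be sketched directly; all biases and factor vectors are assumed to be already learnt, and every name here is our own illustrative choice rather than the authors' implementation.

```python
import numpy as np

def predict(mu, b_i, b_u, alpha_i, alpha_u, b_icat, b_ucat,
            q_i, p_u, y, z, rated_items, rated_cats):
    """Sketch of Eq. 8: static biases + weighted category biases +
    the semantic personalisation component."""
    implicit = sum(y[j] for j in rated_items) / np.sqrt(len(rated_items))
    categories = sum(z[c] for c in rated_cats) / np.sqrt(len(rated_cats))
    return (mu + b_i + b_u                         # static biases
            + alpha_i * b_icat + alpha_u * b_ucat  # category biases
            + q_i @ (p_u + implicit + categories)) # personalisation

# Toy parameters with f = 2 latent factors (all values illustrative):
f = 2
y = {0: np.zeros(f)}
z = {"scifi": np.zeros(f)}
r_hat = predict(3.5, 0.2, -0.1, 1.0, 1.0, 0.1, 0.05,
                np.ones(f), np.zeros(f), y, z, [0], ["scifi"])
print(r_hat)  # with zero factor vectors, only the biases contribute: 3.75
```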
  • 10. Incorporating Taste Evolution with Biases.

A. Static Biases. The static biases include the general bias of the given dataset (μ), the mean rating score across all ratings, plus the item bias b_i and user bias b_u.

1) General category biases. Let Q_s(c) be the global probability of category c being rated highly at stage s. The average change rate Δ_c is taken over the stages from lifecycle period k onward, where k is the stage at which a monotonic increase or decrease in the probability of rating category c highly began:

    Δ_c = (1 / (4 - k)) Σ_{s=k}^{4} (Q_{s+1}(c) - Q_s(c)) / Q_s(c)   (9)

From this, the conditional probability of a given category being rated highly, accounting for the change rate of rating preference for the category, is:

    Pr(+|c) = Q_5(c) + Δ_c Q_5(c)      (prior rating + change rate)   (10)

By averaging this over all categories of item i, the evolving item bias can be calculated from the provided training segment:

    b_{i,cats(i)} = (1 / |cats(i)|) Σ_{c ∈ cats(i)} Pr(+|c)   (11)

2) User biases towards categories. The per-user discrete probability distributions P_s^u(c) capture the probability of user u rating category c highly during lifecycle stage s. Given that users' tastes evolve, the goal is to estimate the probability of the user rating an item highly given its categories, by capturing how the user's preference for each category has changed in the past (decaying or growing). The average change rate Δ_c^u is derived over the k lifecycle periods coming before the final lifecycle stage in the training set, where k is the number of stages back in the training segment from which a monotonic increase or decrease in the probability of rating category c began. The global influence factor γ^u models the average change in the user's transfer entropy over time, based on the proportional change in transfer entropy starting from lifecycle period k that produced a monotonic increase or decrease:

    γ^u = (1 / (4 - k)) Σ_{s=k}^{4} (T_{Q→P}^{s+1|s} - T_{Q→P}^{s|s-1}) / T_{Q→P}^{s|s-1}   (13)

Combining the average change rate Δ_c^u of the user highly rating category c with the global influence factor γ^u gives the conditional probability of the user rating the category highly, where P_5^u denotes the taste profile of the user observed for the final lifecycle stage (5):

    Pr(+|c, u) = P_5^u(c) + Δ_c^u P_5^u(c) + γ^u Q_5(c)      (prior rating + change rate + global influence)   (14)

Given that a single item can be linked to many categories on the web of linked data, the average across all categories is taken as the bias of the user given the categories of the item:

    b_{u,cats(i)} = (1 / |cats(i)|) Σ_{c ∈ cats(i)} Pr(+|c, u)   (15)

Other schemes for calculating the biases towards categories (both item and user) could be used, e.g. choosing the maximum bias; the average is used as an initial scheme.

3) Weighting category biases. The above category biases are derived as static features within the recommendation model (Eq. 8), mined from the provided training portion; however, each user may be influenced by these factors in different ways when performing their ratings. Two weights are therefore included, one per category bias, defined as α_i and α_u for the item and user biases towards categories respectively. These weights are learnt during the training phase of inducing the model.

C. Personalisation component. This component builds on the existing SVD++ model [2]. The modified model has four latent factor vectors: p_u, the f latent factors associated with user u; q_i, the f latent factors associated with item i; y_j, the f latent factors for item j from the set of items rated by user u, R(u); and z_c ∈ R^f, which captures the latent factor vector for a given semantic category c. This additional component is termed the category factors.
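The bias derivations above can be sketched as follows, assuming `Q[s]` holds the global probability of the category being rated highly at stage s (1 to 5); the function names and the worked values are our own.

```python
def change_rate(Q, k):
    """Eq. 9: average proportional change over stages s = k..4,
    Q[s] being the stage-s probability of the category being rated highly."""
    return sum((Q[s + 1] - Q[s]) / Q[s] for s in range(k, 5)) / (4 - k)

def pr_high(Q, k):
    """Eq. 10: prior rating Q_5(c) plus the change-rate correction."""
    return Q[5] + change_rate(Q, k) * Q[5]

def pr_high_user(P5c, delta_uc, gamma_u, Q5c):
    """Eq. 14: user-side probability with the global-influence term."""
    return P5c + delta_uc * P5c + gamma_u * Q5c

# A category whose rating probability rises monotonically from stage 1 on:
Q = {1: 0.2, 2: 0.25, 3: 0.3, 4: 0.35, 5: 0.4}
print(pr_high(Q, 2))       # Q_5 scaled up by the average growth rate
print(pr_high_user(0.3, 0.1, 0.05, 0.4))
```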
  • 11. Evaluation Setup.
- Tested three models, trained using stochastic gradient descent: SVD++ (baseline); SB-SVD++ (SVD++ with semantic category biases); and S-SVD++ (SB-SVD++ with the personalisation component).
- Tuned hyperparameters over the validation splits.
- Model testing: trained the models with the tuned hyperparameters using both the training and validation splits, then applied them to the held-out final 10% of reviews.
- Evaluation measure: Root Mean Square Error (RMSE).
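The evaluation measure can be stated in a few lines; this is the standard RMSE definition over (predicted, actual) rating pairs, not code from the paper.

```python
from math import sqrt

def rmse(pairs):
    """Root Mean Square Error over (predicted, actual) rating pairs."""
    return sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))

# Example: one prediction off by 0.5 stars, one exact:
print(rmse([(3.5, 4.0), (2.0, 2.0)]))  # sqrt(0.125) ~= 0.354
```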
  • 12. Evaluation Results. Both semantic models significantly outperformed the SVD++ baseline. On MovieLens, the full model (S-SVD++) produces significantly superior performance; on MovieTweetings, the difference between SB-SVD++ and S-SVD++ is marginal.

Table III. Root Mean Square Error (RMSE) of the three models across the two datasets. The best model per dataset is S-SVD++; the p-value from the Mann-Whitney test against the next best model is shown in parentheses.

    Model     | MovieLens         | MovieTweetings
    SVD++     | 1.520             | 0.969
    SB-SVD++  | 1.517             | 0.963
    S-SVD++   | 1.513 (p < 0.001) | 0.963 (p < 0.1)
  • 13. Conclusions.
- Semantic taste profiles can track users' tastes: this overcomes the factor consistency problem, enables modelling of global taste influence, and SemanticSVD++ boosts recommendation performance.
- Semantic categories are limited, however: the hipster dilemma, and cold-start categories.
  • 14. Cold-start Categories. Unrated categories, linked to an item via dcterms:subject but never rated by the user, are addressed in follow-up work: Transferring Semantic Categories with Vertex Kernels: Recommendations with SemanticSVD++. M. Rowe. To appear in the proceedings of the International Semantic Web Conference, Trentino, Italy (2014). [Slide figure: an item linked via dcterms:subject to categories dbpedia:c1-c5, some rated (5*, 4*) and some unrated (?)]
  • 15. Questions? @mrowebot | m.rowe@lancaster.ac.uk | http://www.lancaster.ac.uk/staff/rowem/
