[UMAP2013] Recommendation with Differential Context Weighting



Context-aware recommender systems (CARS) adapt their recommendations to users’ specific situations. In many recommender systems, particularly those based on collaborative filtering, the contextual constraints may lead to sparsity: fewer matches between the current user context and previous situations. Our earlier work proposed an approach called differential context relaxation (DCR), in which different subsets of contextual features were applied in different components of a recommendation algorithm. In this paper, we expand on our previous work on DCR, proposing a more general approach — differential context weighting (DCW), in which contextual features are weighted. We compare DCR and DCW on two real-world datasets, and DCW demonstrates improved accuracy over DCR with comparable coverage. We also show that particle swarm optimization (PSO) can be used to efficiently determine the weights for DCW.



  1. Recommendation with Differential Context Weighting
     Yong Zheng, Robin Burke, Bamshad Mobasher
     Center for Web Intelligence, DePaul University, Chicago, IL, USA
     Conference on UMAP, June 12, 2013
  2. Overview
     • Introduction (RS and Context-aware RS)
     • Sparsity of Contexts and Relevant Solutions
     • Differential Context Relaxation & Weighting
     • Experimental Results
     • Conclusion and Future Work
  3. Introduction
     • Recommender Systems
     • Context-aware Recommender Systems
  4. Recommender Systems (RS)
     • Information Overload → Recommendations
  5. Context-aware RS (CARS)
     • Traditional RS: Users × Items → Ratings
     • Context-aware RS: Users × Items × Contexts → Ratings
     Examples of contexts in different domains:
     • Food: time (lunch, dinner), occasion (business lunch, family dinner)
     • Movie: time (weekend, weekday), location (home, cinema), etc.
     • Music: time (morning, evening), activity (study, sports, party), etc.
     • Book: a book as a gift for kids or for one’s mother, etc.
     Recommendation cannot stand alone without considering contexts.
  6. Research Problems
     • Sparsity of Contexts
     • Relevant Solutions
  7. Sparsity of Contexts
     • Assumption of context-aware RS: it is better to use preferences expressed in the same contexts when making predictions.
     • Same contexts? What about multiple contexts and sparsity?
     An example in the movie domain: are there rating profiles in the contexts <Weekday, Home, Sister>?

     User  Movie    Time     Location  Companion   Rating
     U1    Titanic  Weekend  Home      Girlfriend  4
     U2    Titanic  Weekday  Home      Girlfriend  5
     U3    Titanic  Weekday  Cinema    Sister      4
     U1    Titanic  Weekday  Home      Sister      ?
  8. Relevant Solutions
     Context matching → the same contexts <Weekday, Home, Sister>?
     1. Context selection → use the influential dimensions only
     2. Context relaxation → use a relaxed set of dimensions, e.g. time
     3. Context weighting → we can use all dimensions, but measure how similar the contexts are! (to be continued later)
     Differences between context selection and context relaxation:
     • Context selection is conducted by surveys or statistics;
     • Context relaxation is directed at optimizing predictions;
     • Optimal context relaxation/weighting is a learning process!

     User  Movie    Time     Location  Companion   Rating
     U1    Titanic  Weekend  Home      Girlfriend  4
     U2    Titanic  Weekday  Home      Girlfriend  5
     U3    Titanic  Weekday  Cinema    Sister      4
     U1    Titanic  Weekday  Home      Sister      ?
  9. DCR and DCW
     • Differential Context Relaxation (DCR)
     • Differential Context Weighting (DCW)
     • Particle Swarm Intelligence as Optimizer
  10. Differential Context Relaxation
      Differential Context Relaxation (DCR) is our first attempt to alleviate the sparsity of contexts, and differential context weighting (DCW) is a finer-grained improvement over DCR.
      • There are two notions in DCR:
        • “Differential” part → algorithm decomposition: separate one algorithm into different functional components; apply appropriate context constraints to each component; maximize the global contextual effects together.
        • “Relaxation” part → context relaxation: we use a set of relaxed dimensions instead of all of them.
      • References
        • Y. Zheng, R. Burke, B. Mobasher. "Differential Context Relaxation for Context-aware Travel Recommendation". In EC-WEB, 2012
        • Y. Zheng, R. Burke, B. Mobasher. "Optimal Feature Selection for Context-Aware Recommendation using Differential Relaxation". In RecSys Workshop on CARS, 2012
  11. DCR: Algorithm Decomposition
      Take user-based collaborative filtering (UBCF) as an example.

          Pirates of the Caribbean 4  Kung Fu Panda 2  Harry Potter 6  Harry Potter 7
      U1  4                           4                2               2
      U2  3                           4                2               1
      U3  2                           2                4               4
      U4  4                           4                1               ?

      Standard process in UBCF (top-K UserKNN, with K = 1 as an example):
      1) Find neighbors based on user-user similarity
      2) Aggregate the neighbors’ contributions
      3) Make the final prediction
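The three UBCF steps above can be sketched on the slide's rating table. This is only an illustration: the slide does not say which similarity measure or prediction formula is used, so Pearson correlation over co-rated items and a mean-centered (Resnick-style) prediction are assumptions, and the names `pearson` and `predict` are hypothetical.

```python
import math

# Rating table from the slide; None marks the rating to be predicted.
RATINGS = {
    "U1": [4, 4, 2, 2],
    "U2": [3, 4, 2, 1],
    "U3": [2, 2, 4, 4],
    "U4": [4, 4, 1, None],
}


def mean(values):
    return sum(values) / len(values)


def pearson(a, b):
    """Pearson correlation over the items both users rated (assumed measure)."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if len(pairs) < 2:
        return 0.0
    mean_a = mean([x for x, _ in pairs])
    mean_b = mean([y for _, y in pairs])
    num = sum((x - mean_a) * (y - mean_b) for x, y in pairs)
    den = math.sqrt(sum((x - mean_a) ** 2 for x, _ in pairs)) * math.sqrt(
        sum((y - mean_b) ** 2 for _, y in pairs)
    )
    return num / den if den else 0.0


def predict(user, item_index, k=1):
    # 1) Neighbor selection: users who rated the item, ranked by similarity.
    candidates = [
        (pearson(RATINGS[user], ratings), name)
        for name, ratings in RATINGS.items()
        if name != user and ratings[item_index] is not None
    ]
    neighbors = sorted(candidates, reverse=True)[:k]
    # 2) Neighbor contribution: similarity-weighted deviation from each neighbor's mean.
    num = den = 0.0
    for sim, name in neighbors:
        rated = [r for r in RATINGS[name] if r is not None]
        num += sim * (RATINGS[name][item_index] - mean(rated))
        den += abs(sim)
    # 3) Final prediction: the user's own baseline plus the aggregated deviation.
    baseline = mean([r for r in RATINGS[user] if r is not None])
    return baseline + num / den if den else baseline


print(predict("U4", 3))  # prediction for U4 on Harry Potter 7
```

Under these assumptions U1 is the single nearest neighbor of U4, so the prediction is U4's mean rating shifted by U1's deviation on Harry Potter 7.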
  12. DCR: Algorithm Decomposition
      Take user-based collaborative filtering (UBCF) as an example.
      Components: 1. Neighbor selection, 2. Neighbor contribution, 3. User baseline, 4. User similarity
      All components contribute to the final predictions, and we assume that appropriate contextual constraints can leverage the contextual effect in each algorithm component, e.g. by using neighbors who rated in the same contexts.
  13. DCR: Context Relaxation

      User  Movie    Time     Location  Companion   Rating
      U1    Titanic  Weekend  Home      Girlfriend  4
      U2    Titanic  Weekday  Home      Girlfriend  5
      U3    Titanic  Weekday  Cinema    Sister      4
      U1    Titanic  Weekday  Home      Sister      ?

      Notion of context relaxation:
      • Use {Time, Location, Companion} → 0 records matched!
      • Use {Time, Location} → 1 record matched!
      • Use {Time} → 2 records matched!
      In DCR, we choose an appropriate context relaxation for each component.
      The aim is a balance between the number of matched ratings and the best performance with the least noise.
  14. DCR: Context Relaxation
      Components: 1. Neighbor selection, 2. Neighbor contribution, 3. User baseline, 4. User similarity
      c is the original context, e.g. <Weekday, Home, Sister>.
      C1, C2, C3, C4 are the relaxed contexts.
      The selection is modeled by a binary vector, e.g. <1, 0, 0> denotes that we selected only the first context dimension.
      Take neighbor selection as an example: originally, neighbors are selected from the users who rated the same item. DCR further filters those neighbors by the contextual constraint C1, i.e. C1 = <1, 0, 0> → Time = Weekday → u must have rated i on weekdays.
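A minimal sketch of relaxed context matching with a binary selection vector, using the Titanic records from the slides. The function names `matches` and `relaxed_matches` are hypothetical and only illustrate the filtering idea, not the authors' implementation.

```python
# Known ratings from the Titanic example: (user, (time, location, companion), rating).
RECORDS = [
    ("U1", ("Weekend", "Home", "Girlfriend"), 4),
    ("U2", ("Weekday", "Home", "Girlfriend"), 5),
    ("U3", ("Weekday", "Cinema", "Sister"), 4),
]
# Context of the rating to be predicted.
TARGET = ("Weekday", "Home", "Sister")


def matches(context, target, mask):
    """True if context equals target on every dimension selected by the binary mask."""
    return all(c == t for c, t, m in zip(context, target, mask) if m == 1)


def relaxed_matches(records, target, mask):
    """Keep only the records whose context matches the target under the mask."""
    return [(user, rating) for user, context, rating in records
            if matches(context, target, mask)]


for mask in [(1, 1, 1), (1, 1, 0), (1, 0, 0)]:
    print(mask, len(relaxed_matches(RECORDS, TARGET, mask)))
```

The three masks correspond to {Time, Location, Companion}, {Time, Location}, and {Time}, and reproduce the 0, 1, and 2 matched records listed in the context relaxation example.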
  15. DCR: Drawbacks
      Components: 1. Neighbor selection, 2. Neighbor contribution, 3. User baseline, 4. User similarity
      1. Context relaxation is still strict, especially when data is sparse.
      2. Components are dependent. For example, neighbor contribution depends on neighbor selection: if neighbors are selected by C1: Location = Cinema, there is no guarantee that a neighbor has ratings under the contexts C2: Time = Weekend.
      A finer-grained solution is required! → Differential Context Weighting
  16. Differential Context Weighting

      User  Movie    Time     Location  Companion   Rating
      U1    Titanic  Weekend  Home      Girlfriend  4
      U2    Titanic  Weekday  Home      Girlfriend  5
      U3    Titanic  Weekday  Cinema    Sister      4
      U1    Titanic  Weekday  Home      Sister      ?

      • Goal: use all dimensions, but measure the similarity of contexts.
      • Assumption: the more similar two contexts are, the more useful the ratings may be for calculating predictions.
      c and d are two contexts (highlighted as two red regions in the table on the original slide).
      σ is the weighting vector <w1, w2, w3> for the three dimensions.
      Assume equal weights, w1 = w2 = w3 = 1:
      J(c, d, σ) = # of matched dimensions / # of all dimensions = 2/3
      The similarity of contexts is measured by weighted Jaccard similarity.
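A short sketch of the weighted Jaccard similarity between two contexts. The slide only gives the equal-weight case; the general form used here (sum of the weights of matched dimensions divided by the sum of all weights) is an assumption that reduces to that case, and `weighted_jaccard` is a hypothetical name. Which two rows were highlighted on the slide is not recoverable, so the pair below is just one pair that gives 2/3.

```python
def weighted_jaccard(c, d, weights):
    """Assumed form: weights of the matching dimensions over the total weight."""
    total = sum(weights)
    if total == 0:
        return 0.0
    matched = sum(w for x, y, w in zip(c, d, weights) if x == y)
    return matched / total


c = ("Weekday", "Home", "Girlfriend")  # context of a known rating
d = ("Weekday", "Home", "Sister")      # context of the rating to predict

print(weighted_jaccard(c, d, (1, 1, 1)))        # equal weights: 2 of 3 dimensions match
print(weighted_jaccard(c, d, (0.2, 0.2, 0.6)))  # a heavier Companion weight lowers the score
```

With non-uniform weights, a mismatch on a heavily weighted dimension costs more than a mismatch on a lightly weighted one, which is what lets the weights be learned per component.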
  17. Differential Context Weighting
      Components: 1. Neighbor selection, 2. Neighbor contribution, 3. User baseline, 4. User similarity
      1. “Differential” part → the components are all the same as in DCR.
      2. “Context weighting” part (for each individual component):
         • σ is the weighting vector.
         • ϵ is a threshold for the similarity of contexts, i.e., only records with similar enough (≥ ϵ) contexts can be included.
      3. In the calculations, the similarities of contexts serve as the weights, for example in the neighbor contribution component. The calculation is similar for the other components.
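The slide's formula for the neighbor contribution is not preserved in this transcript, so the following is only a sketch of the idea described above: keep records whose context similarity reaches the threshold, then average their ratings using the similarities as weights. The function names, the sample records, and the exact averaging form are assumptions.

```python
def weighted_jaccard(c, d, weights):
    """Assumed similarity: weights of the matching dimensions over the total weight."""
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w for x, y, w in zip(c, d, weights) if x == y) / total


def context_weighted_rating(records, target, weights, eps):
    """Similarity-weighted average of ratings whose context similarity is >= eps.

    records: list of (context, rating) pairs; returns None if nothing qualifies.
    """
    num = den = 0.0
    for context, rating in records:
        sim = weighted_jaccard(context, target, weights)
        if sim >= eps:
            num += sim * rating
            den += sim
    return num / den if den > 0 else None


# Hypothetical ratings of one item given in different contexts.
records = [
    (("Weekend", "Home", "Girlfriend"), 4),
    (("Weekday", "Home", "Girlfriend"), 5),
    (("Weekday", "Cinema", "Sister"), 4),
]
target = ("Weekday", "Home", "Sister")

print(context_weighted_rating(records, target, (1, 1, 1), eps=0.5))
print(context_weighted_rating(records, target, (1, 1, 1), eps=1.0))
```

A higher threshold keeps fewer but more similar records; when no record qualifies the function returns None, which corresponds to a prediction that cannot be covered.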
  18. Particle Swarm Optimization (PSO)
      The remaining work is to find optimal context relaxation vectors for DCR and context weighting vectors for DCW. PSO is derived from swarm intelligence, which achieves a goal through collaborative behavior (pictured on the slide: fish, birds, bees).
      Why PSO?
      1) It is easy to implement as a non-linear optimizer;
      2) It has been used in weighted CF before, and was demonstrated to work better than other non-linear optimizers, e.g. genetic algorithms;
      3) Our previous work successfully applied binary PSO (BPSO) to DCR.
  19. Particle Swarm Optimization (PSO)
      • Swarm = a group of birds
      • Particle = each bird ≈ each run in the algorithm
      • Vector = the bird’s position in the space ≈ the vectors we need
      • Goal = the location of the pizza ≈ lower prediction error
      So, how does the swarm find the goal?
      1. Looking for the pizza: assume a machine can tell the distance.
      2. Each iteration is an attempt or move.
      3. Cognitive learning from the particle itself: am I closer to the pizza compared with my “best” locations in previous history?
      4. Social learning from the swarm: “Hey, my distance is 1 mile. It is the closest! Follow me!” Then the other birds move towards that location.
      • DCR: feature selection, modeled by binary vectors → Binary PSO
      • DCW: feature weighting, modeled by real-number vectors → PSO
      How does it work? Take DCR and Binary PSO as an example:
      • Assume there are 4 components and 3 contextual dimensions.
      • Thus there are 4 binary vectors, one for each component.
      • We merge the vectors into a single one; the vector size is 3 × 4 = 12.
      • This single vector is the particle’s position vector in the PSO process.
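A minimal real-valued PSO sketch for the DCW case, searching a merged 12-dimensional weight vector (4 components × 3 context dimensions) with weights clipped to [0, 1]. Everything specific here is an assumption for illustration: the inertia and learning coefficients, the function names, and above all `toy_error`, which is a made-up stand-in for the prediction error that a real run would compute from the recommender.

```python
import random

N_COMPONENTS = 4
N_DIMENSIONS = 3
VECTOR_SIZE = N_COMPONENTS * N_DIMENSIONS  # merged position vector, 3 * 4 = 12


def toy_error(position):
    """Stand-in for RMSE: squared distance to an arbitrary made-up optimum."""
    return sum((x - 0.3) ** 2 for x in position)


def pso(fitness, size, n_particles=3, iterations=100,
        inertia=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    pos = [[rng.random() for _ in range(size)] for _ in range(n_particles)]
    vel = [[0.0] * size for _ in range(n_particles)]
    pbest = [p[:] for p in pos]               # each particle's best position so far
    pbest_fit = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]  # the swarm's best so far
    for _ in range(iterations):
        for i in range(n_particles):
            for j in range(size):
                r1, r2 = rng.random(), rng.random()
                vel[i][j] = (inertia * vel[i][j]
                             + c1 * r1 * (pbest[i][j] - pos[i][j])   # cognitive learning
                             + c2 * r2 * (gbest[j] - pos[i][j]))     # social learning
                pos[i][j] = min(1.0, max(0.0, pos[i][j] + vel[i][j]))  # keep in [0, 1]
            fit = fitness(pos[i])
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


def split_by_component(position):
    """Split the merged vector back into one weight vector per component."""
    return [position[k * N_DIMENSIONS:(k + 1) * N_DIMENSIONS]
            for k in range(N_COMPONENTS)]


best, best_fit = pso(toy_error, VECTOR_SIZE)
for component_weights in split_by_component(best):
    print([round(w, 2) for w in component_weights])
```

For DCR the same merged-vector layout applies, but positions would be binary and a binary PSO variant would be used instead of the real-valued update shown here.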
  20. Experimental Results
      • Data Sets
      • Predictive Performance
      • Performance of Optimizer
  21. Context-aware Data Sets

                       AIST Food Data            Movie Data
      # of Ratings     6360                      1010
      # of Users       212                       69
      # of Items       20                        176
      Contexts         Real hunger               Time (weekend, weekday)
                       (full/normal/hungry)      Location (home, cinema)
                       Virtual hunger            Companions (friends, alone, etc.)
      Other features   User gender               User gender
                       Food genre, Food style    Year of the movie
                       Food stuff
      Density          Dense                     Sparse

      Context-aware data sets are usually difficult to obtain. These two data sets were collected from surveys.
  22. Evaluation Protocols
      Metrics: root-mean-square error (RMSE) and coverage, which denotes the percentage of predictions for which we can find neighbors.
      Our goal: improve RMSE (i.e. fewer errors) within a decent coverage. We allow a decline in coverage, because applying contextual constraints usually brings low coverage (i.e. the sparsity of contexts!).
      Baselines:
      • Context-free CF, i.e. the original UBCF
      • Contextual pre-filtering CF, which applies the contextual constraints only to the neighbor selection component and not to the other components used in DCR and DCW
      Other settings in DCR & DCW:
      • K = 10 for UserKNN, evaluated with 5-fold cross-validation
      • T = 100 as the maximum number of iterations in the PSO process
      • Weights range within [0, 1]
      • We use the same similarity threshold for each component, varied from 0.0 to 1.0 in increments of 0.1 in DCW
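The two metrics can be sketched as follows. It is assumed here that RMSE is computed only over the predictions that could actually be made, and that a missing prediction is represented by None; the slides do not spell out either detail, and `rmse_and_coverage` is a hypothetical name.

```python
import math


def rmse_and_coverage(pairs):
    """pairs: list of (actual, predicted); predicted is None when no neighbors were found."""
    covered = [(a, p) for a, p in pairs if p is not None]
    coverage = len(covered) / len(pairs) if pairs else 0.0
    if not covered:
        return None, coverage
    rmse = math.sqrt(sum((a - p) ** 2 for a, p in covered) / len(covered))
    return rmse, coverage


# Hypothetical test ratings: one prediction could not be made.
pairs = [(4, 4.0), (5, 4.0), (3, None), (2, 3.0)]
rmse, coverage = rmse_and_coverage(pairs)
print(rmse, coverage)
```

Tighter contextual constraints tend to lower coverage, which is why the two numbers are reported together.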
  23. Predictive Performance
      [Figure: blue bars are RMSE values, red lines are coverage curves.]
      Findings:
      1) DCW works better than DCR and the two baselines;
      2) A significance t-test shows that DCW’s improvement is significant on the movie data, whereas DCR was not significantly better than the two baselines; DCW can further alleviate the sparsity of contexts and compensate for DCR;
      3) DCW offers better coverage than the baselines!
  24. Performance of Optimizer
      Running time is in seconds.
      Using 3 particles is the best configuration for the two data sets here!
      Factors influencing the running performance:
      • More particles: quicker convergence, but probably higher cost;
      • Number of contextual variables: more contexts, probably slower;
      • Density of the data set: denser data means more calculations in DCW.
      Typically DCW costs more than DCR, because it uses all contextual dimensions and the calculation of the similarity of contexts is time-consuming, especially for dense data such as the Food data.
  25. Other Results (Optional)
      1. The optimal threshold for the similarity of contexts:
         • For the Food data set, it is 0.6;
         • For the Movie data set, it is 0.1.
      2. The optimal weighting vectors (e.g. Movie data)
         Note: darker → smaller weights; lighter → larger weights
  26. It is gonna end…
      • Conclusions
      • Future Work
  27. Conclusions
      • We propose DCW, which is a finer-grained improvement over DCR;
      • It can further improve predictive accuracy within decent coverage;
      • PSO is demonstrated to be an efficient optimizer;
      • We identified underlying factors influencing the running time of the optimizer.
      Stay tuned: DCR and DCW are general frameworks (DCM, i.e. differential context modeling, is the name of this framework), and they can be applied to any recommendation algorithm that can be decomposed into multiple components. We have successfully extended its application to item-based collaborative filtering and the Slope One recommender.
      Reference
      • Y. Zheng, R. Burke, B. Mobasher. "Differential Context Modeling in Collaborative Filtering". In SOCRS-2013, Chicago, IL, USA, 2013
  28. Acknowledgement
      Student travel support from the US NSF (UMAP Platinum Sponsor)
      Future Work
      • Try other similarity measures for contexts instead of the simple Jaccard one;
      • Introduce semantics into the similarity of contexts to further alleviate the sparsity of contexts, e.g., Rome is closer to Florence than to Paris;
      • Use parallel PSO, or run PSO on MapReduce, to speed up the optimizer.
      See u later…
      The 19th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), Chicago, IL, USA, Aug 11-14, 2013
  29. Thank You!
      Center for Web Intelligence, DePaul University, Chicago, IL, USA