Context Aware User Modeling for Recommendation


Tutorial at the 21st Conference on User Modeling, Adaptation and Personalization (UMAP 2013). Rome, Italy, June 10-14, 2013.



  1. Context-Aware User Modeling for Recommendation. Bamshad Mobasher, Center for Web Intelligence, School of Computing, DePaul University, Chicago. mobasher@cs.depaul.edu. UMAP 2013, Rome, Italy.
  2. Special Thanks To: Yong Zheng, DePaul University, especially for help with material related to item/user splitting, differential context modeling, and statistics related to context-aware recommendation in related conferences. Make sure you go to his talk: Wednesday afternoon session. Negar Hariri, DePaul University, especially for material related to the context-aware song recommendation approach and query-driven context-aware recommendation. Francesco Ricci, Free University of Bozen-Bolzano, especially for material related to the InCarMusic recommender system, and other assorted material.
  3. Context in Recommendation. Recommendation scenario: Steve's purchases on Amazon include mystery-detective fiction, "Da Vinci Code" (for himself), "Python Programming" (for work), and "Green Eggs and Ham" (a gift for his daughter). How should we represent Steve's interest in books? The system needs to know the difference between children's books and computer books, i.e., the contexts in which Steve interacts with the system. What should be recommended if Steve is reading reviews for a book on Perl scripting?
  4. Anatomy of a Famous Example. Jack buys a book on pregnancy for his friend Jill, who is expecting. The purchase becomes part of Jack's profile, and in subsequent visits he gets recommendations about baby clothes, child rearing, etc. Amazon's approach: distinguish between the task of gift buying and buying items for oneself.
  5. Anatomy of a Famous Example. The goal of identifying gifts is to exclude them from the profile, not to change the context; once excluded, no context is used for the actual user. Is this a problem of context? Or a problem in user profiling? Are they the same thing? Even if "gift" were taken as a context, it would have to be handled differently for each recipient.
  6. Context in Recommendation. Some forms of recommendation may be more contextual than others.
  7. Factors Influencing the Holiday Decision. Internal to the tourist: personal motivators, personality, disposable income, health, family commitments, work commitments, past experience, hobbies and interests, knowledge of potential holidays, lifestyle, attitudes, opinions and perceptions. External to the tourist: availability of products, advice of travel agents, information obtained from tourism organizations and media, word-of-mouth recommendations, political restrictions (visa, terrorism), health problems, special promotions and offers, climate. [Swarbrooke & Horner, 2006] (Slide from F. Ricci, UMAP 2012)
  8. Relevant Questions. "... it is difficult to find a relevant definition satisfying in any discipline. Is context a frame for a given object? Is it the set of elements that have any influence on the object? Is it possible to define context a priori or just state the effects a posteriori? Is it something static or dynamic? Some approaches emerge now in Artificial Intelligence. In Psychology, we generally study a person doing a task in a given situation. Which context is relevant for our study? The context of the person? The context of the task? The context of the interaction? The context of the situation? When does a context begin and where does it stop? What are the real relationships between context and cognition?" (Bazire and Brezillon, 2005)
  9. Context in Recommendation. Recommendation scenario, revisited: Steve's purchases on Amazon include mystery-detective fiction, "Da Vinci Code" (for himself), "Python Programming" (for work), and "Green Eggs and Ham" (a gift for his daughter). Context seems to be tied to a particular interaction of the user with the system. The utility of an item in a given context may conflict with the overall preferences of the user. Context may change during one visit or interaction (e.g., when context is tied to task, location, etc.). The system needs to distinguish between longer-term preferences and possible short-term interests.
  10. Outline. General views of context and their relevance to the recommendation problem: representational versus interactional views; representation and acquisition of context; system knowledge about context. Key concepts in context-aware recommendation: general background on recommender systems; architectures for integrating context in recommender systems. Highlighted approaches: item/user splitting; approaches based on matrix factorization; differential context modeling. Implementation frameworks and example implementations: representational (multi-dimensional recommendation); interactional (a framework based on human memory).
  11. What is Context? By example: location, time, identities of nearby users, etc. By synonym: situation, environment, circumstance. By dictionary [WordNet]: the set of facts or circumstances that surround a situation or event. Problems: new situations don't fit the examples, and it is unclear how to use such definitions in practice.
  12. Types of Context. Physical context: time, position, and activity of the user; weather, light, temperature, etc. Social context: the presence and role of other people around the user. Interaction media context: the device used to access the system and the type of media that are browsed and personalized (text, music, images, movies, etc.). Modal context: the state of mind of the user, the user's goals, mood, experience, and cognitive capabilities. [Fling, 2009]
  13. Defining Context. Entities interact with their environment through "situated actions." "Any information that can be used to characterise the situation of entities." (Dey et al., 2001): the context of an entity exists independently and outside of the entity's actions. "Everything that affects computation except its explicit input and output." (Lieberman and Selker, 2000). Intensionality versus extensionality.
  14. Different Views of Context. Dourish (2004) distinguishes between two views of context: the representational view and the interactional view. The representational view makes four key assumptions: context is a form of information, and it is delineable, stable, and separable from the activity. Implications: context is information that can be described using a set of "appropriate" attributes that can be observed; these attributes do not change and are clearly distinguishable from features describing the underlying activity undertaken by the user within the context; no "situated action."
  15. Different Views of Context. Interactional view of context (Dourish, 2004): contextuality is a relational property, i.e., some information may or may not be relevant to some activity. The scope of contextual features is defined dynamically, and is occasioned rather than static. Rather than assuming that context defines the situation within which an activity occurs, there is a cyclical relationship between context and activity: context gives rise to the activity, and the activity changes the context.
  16. Representational View: Assumptions and Implications. Context can be represented as an explicit, enumerated set of static attributes (i.e., it is "extensional"). Typically, the attributes are predefined based on the characteristics of the domain and environment, e.g., time, date, location, mood, task, device, etc. A contextual variable can have associated structure, e.g., Sunday < Weekend. Implications: contextual information must be identified and acquired as part of data collection before actual recommendations are made; relevant contextual variables (and their structures) must be identified at the design stage. Drawback: the "qualification problem," similar to the outstanding problem from AI.
  17. Example of Contextual Variables. Temporal context: a temporal hierarchy with multiple temporal relationships of varying granularity, e.g., Time (2008.10.19 11:59:59pm) → Date (2008.10.19) → Year (2008); Time (2008.10.19 11:59:59pm) → Hour (11pm) → TimeOfDay (evening); Date (2008.10.19) → Month (October) → Season (Fall); Date (2008.10.19) → DayOfWeek (Sunday) → TimeOfWeek (Weekend). Purchase context K. [Adomavicius, Tuzhilin, 2008]
  18. Representing Contextual Variables. Formally, contextual information can be defined as a vector of contextual variables c = (c1, …, cn), where ci ∈ Ci. Sometimes we refer to c as a contextual condition. C = C1 × … × Cn denotes the space of possible values for a given context. Each component Ci may have additional structure (e.g., a tree): it can be defined as a hierarchical set of nodes (concepts); if ci ∈ Ci, then ci represents one of the nodes in the hierarchy Ci. Example: C = PurchaseContext × TemporalContext; c = (work, weekend), i.e., purchasing something for work on a weekend is the contextual condition.
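The representation above can be made concrete in a few lines. This is a minimal sketch, not code from the tutorial: the domains, the hierarchy entries, and the function names (`is_valid_condition`, `roll_up`) are all illustrative.

```python
# A contextual condition is a tuple from the cross product
# C = PurchaseContext x TemporalContext; each component may carry a
# hierarchy (child -> parent) used later for generalization
# (e.g., Saturday -> weekend).

PURCHASE_CONTEXT = {"personal", "work", "gift"}
TEMPORAL_CONTEXT = {"weekday", "weekend"}

# Hierarchy per contextual variable: child node -> parent node.
TEMPORAL_HIERARCHY = {"saturday": "weekend", "sunday": "weekend",
                      "monday": "weekday", "weekend": "anytime",
                      "weekday": "anytime"}

def is_valid_condition(c):
    """Check that c = (c1, c2) lies in C = C1 x C2."""
    purchase, temporal = c
    return purchase in PURCHASE_CONTEXT and temporal in TEMPORAL_CONTEXT

def roll_up(node, hierarchy):
    """Return the path from a node up to the root of its hierarchy."""
    path = [node]
    while path[-1] in hierarchy:
        path.append(hierarchy[path[-1]])
    return path

print(is_valid_condition(("work", "weekend")))  # True
print(roll_up("saturday", TEMPORAL_HIERARCHY))  # ['saturday', 'weekend', 'anytime']
```

The `roll_up` path is what a pre-filter would later use to generalize an over-specific condition (Saturday → weekend).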
  19. Interactional View: Assumptions and Implications. Properties of context: context gives rise to a behavior that is observable, though context itself may not be observable (it is "intensional"). Context exists (usually implicitly) in relation to the ongoing interaction of the user with the system; it is not static. It can be derived: a stochastic process with d states {c1, c2, …, cd} representing different contextual conditions. Context-aware recommendation: explicit representation of context may not be as important as recognizing the behavior arising from the context and adapting to the needs of the user within the context. Drawback: ability to explain recommendations.
  20. Obtaining Context. Explicitly specified by the user, e.g., "I want to watch a movie at home with my parents." Observed or deduced by the system: time (from the system clock); location (from GPS); deduced from the user's behavior (e.g., shopping for business or pleasure); etc. There is a significant research literature on obtaining, inferring, and predicting context (e.g., for mobile computing); we will only discuss it as part of the description of some highlighted implementations.
  21. Relevance of Contextual Information. Not all contextual information is relevant for generating recommendations. E.g., which contextual information is relevant when recommending a book? For what purpose is the book bought? (Work, leisure, …) When will the book be read? (Weekday, weekend, …) Where will the book be read? (At home, at school, on a plane, …) How is the stock market doing at the time of the purchase? Determining the relevance of contextual information: manually, e.g., using the domain knowledge of the recommender system's designer; automatically, e.g., using feature selection procedures or statistical tests based on existing data.
  22. What the System Knows About Context. See: Adomavicius, Mobasher, Ricci, and Tuzhilin. Context-Aware Recommender Systems. AI Magazine, Fall 2011.
  23. Recommender Systems Basics
  24. The Recommendation Task. Basic formulation as a prediction problem: given a profile Pu for a user u and a target item it, predict the preference score of user u on item it. Typically, the profile Pu contains preference scores by u on some other items {i1, …, ik} different from it; these preference scores may have been obtained explicitly (e.g., movie ratings) or implicitly (e.g., time spent on a product page or a news article).
  25. Recommendation as Rating Prediction. Two types of entities: users and items. The utility of item i for user u is represented by some rating r (where r ∈ Rating). Each user typically rates a subset of items. The recommender system then tries to estimate the unknown ratings, i.e., to extrapolate the rating function R based on the known ratings: R: Users × Items → Rating, i.e., a two-dimensional recommendation framework. Recommendations to each user are made by offering his/her highest-rated items.
  26. Traditional Recommendation Approaches. Collaborative filtering: give recommendations to a user based on the preferences of "similar" users. Content-based filtering: give recommendations to a user based on items with "similar" content to the items in the user's profile. Rule-based (knowledge-based) filtering: provide recommendations to users based on predefined (or learned) rules, e.g., age(x, 25-35) and income(x, 70-100K) and children(x, >=3) → recommend(x, Minivan). Combined or hybrid approaches.
  27. Content-Based Recommenders. Predictions for unseen (target) items are computed based on their similarity (in terms of content) to items in the user profile. E.g., given the items in user profile Pu, some target items are recommended highly and others only "mildly."
  28. Content-Based Recommenders: More Examples. Music recommendations; playlist generation. Example: Pandora.
  29. Collaborative Recommenders. Predictions for unseen (target) items are computed based on other users with similar ratings on the items in user u's profile, i.e., users with similar tastes (aka "nearest neighbors"). This requires computing correlations between user u and other users according to interest scores or ratings: the k-nearest-neighbor (kNN) strategy. User-based collaborative filtering. Item-based collaborative filtering: item-item similarities are computed in the space of ratings; two items are similar if they are rated similarly by many users; the prediction for a target item is computed based on user u's own ratings on the most similar items.
  30. Example Collaborative System. A ratings table over Item 1 through Item 6 for Alice and Users 1-7, with each user's correlation with Alice (User 1: -1.00, User 2: 0.33, User 3: 0.90, User 4: 0.19, User 5: -1.00, User 6: 0.65, User 7: -1.00). Using k-nearest neighbor with k = 1, the best match (User 3, correlation 0.90) is used to make the prediction for Alice's unrated item.
  31. Item-Based Collaborative Filtering. The same ratings table, but with item-item similarities computed between the target item and the other items (0.76, 0.79, 0.60, 0.71, 0.75); the best-matching items are used to compute the prediction for Alice's unrated item.
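The item-based kNN prediction described above can be sketched in a few lines. This is an illustrative toy (the rating dictionary, cosine-over-co-rated-users similarity, and function names are all assumptions, not the tutorial's code): item-item similarity is computed in the space of ratings, and the prediction is a similarity-weighted average of the user's own ratings on the most similar items.

```python
from math import sqrt

ratings = {  # user -> {item: rating}; toy data, not the slide's table
    "Alice": {"i1": 5, "i2": 2, "i3": 3},
    "U1":    {"i1": 2, "i2": 4, "i3": 4, "i4": 1},
    "U2":    {"i1": 4, "i2": 2, "i3": 3, "i4": 2},
}

def item_vector(item):
    """Column of the rating matrix for one item: {user: rating}."""
    return {u: r[item] for u, r in ratings.items() if item in r}

def cosine(a, b):
    """Cosine similarity over co-rated users only."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    num = sum(a[u] * b[u] for u in common)
    den = sqrt(sum(a[u] ** 2 for u in common)) * sqrt(sum(b[u] ** 2 for u in common))
    return num / den

def predict(user, target, k=2):
    """Similarity-weighted average of the user's own ratings on the
    k items most similar to the target item."""
    rated = ratings[user]
    sims = sorted(((cosine(item_vector(target), item_vector(i)), i)
                   for i in rated), reverse=True)[:k]
    num = sum(s * rated[i] for s, i in sims)
    den = sum(s for s, _ in sims)
    return num / den if den else None

print(round(predict("Alice", "i4"), 2))  # 4.06
```

A user-based variant would transpose the roles: compute correlations between user rows instead of item columns, then average the neighbors' ratings on the target item.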
  32. Collaborative Recommender Systems
  33. Collaborative Recommender Systems
  34. Collaborative Recommender Systems
  35. Context in Recommender Systems
  36. Context-Aware RS (CARS). Traditional RS: Users × Items → Ratings. Contextual RS: Users × Items × Contexts → Ratings. There is always a context; recommendations are not "usable" apart from context.
  37. Statistics on CARS-Related Papers, by conference and by year, 2004-2013 (2013 only includes UMAP, WWW, EC-Web). (charts)
  38. Statistics on CARS-Related Papers, by application domain and by context type. (charts)
  39. Statistics on CARS-Related Papers, by evaluation metric and by algorithm. (charts)
  40. CARS Architectural Models. Three types of architecture for using context in recommendation (Adomavicius, Tuzhilin, 2008). Contextual pre-filtering: context information is used to select relevant portions of the data. Contextual post-filtering: contextual information is used to filter/constrain/re-rank the final set of recommendations. Contextual modeling: context information is used directly as part of learning preference models. Variants and combinations of these are possible. Originally introduced based on the representational view, though these architectures are also generally applicable in the interactional view.
  41. CARS Architectural Models. From Adomavicius, Tuzhilin, 2008.
  42. Contextual Pre-Filtering. Pre-filtering: using contextual information to select the most relevant data for generating recommendations. Context c serves as a query to select relevant ratings from Data(User, Item, Rating, Context), i.e., SELECT User, Item, Rating FROM Data WHERE Context = c. Example: if a person wants to see a movie on Saturday, only the Saturday rating data is used to recommend movies.
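The SELECT-style query above is straightforward to express in code. A minimal sketch, with made-up data and names; the point is only that the context slices the ratings before any 2D recommender is trained.

```python
data = [
    # (user, item, rating, context)
    ("u1", "m1", 4, "saturday"),
    ("u1", "m2", 2, "weekday"),
    ("u2", "m1", 5, "saturday"),
    ("u2", "m3", 3, "weekday"),
]

def prefilter(data, context):
    """SELECT user, item, rating FROM data WHERE context = c."""
    return [(u, i, r) for (u, i, r, c) in data if c == context]

# Only the Saturday slice is passed on to the 2D recommender.
saturday_ratings = prefilter(data, "saturday")
print(saturday_ratings)  # [('u1', 'm1', 4), ('u2', 'm1', 5)]
```

In practice the equality test `c == context` is where the generalization challenge of the next slide lives: matching on a rolled-up condition (weekend) rather than the exact one (Saturday) trades specificity for data density.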
  43. Contextual Pre-Filtering Challenges. Context over-specification: using an exact context may be too narrow, e.g., watching a movie with a girlfriend in a movie theater on Saturday. Certain aspects of the overly specific context may not be significant (e.g., Saturday vs. weekend), and an overly specified context may not have enough training examples for accurate prediction (the sparsity problem). Pre-filter generalization, different approaches: "roll up" to higher-level concepts in the context hierarchies (e.g., Saturday → weekend, or movie theater → any location); use latent factor models or dimensionality reduction approaches (matrix factorization, LDA, etc.).
  44. Contextual Post-Filtering. Ignore context in the data selection and modeling phases, but filter (or re-rank) recommendations based on contextual information. Example: context = watching a movie with family. Suppose the user generally watches comedies and dramas when going to the theater with her family: first, generate recommendations using a standard recommender; then filter out action movies from the recommendation list.
  45. Contextual Post-Filtering. Contextual post-filtering is generally heuristic in nature. Basic idea: treat the context as an additional constraint. Many different approaches are possible. Example: filtering based on context similarity. The context can be represented as a set of features commonly associated with the specified context; the recommendation list is adjusted by favoring those items that have more of the relevant features, a similarity-based approach (though the space of features may be different from the one describing the items). Example: filtering based on a social/collaborative context representation. Mine social features (e.g., annotations, tags, tweets, reviews, etc.) associated with the items and users in a given context C, and promote items with frequently occurring social features from C.
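The context-similarity heuristic described above can be sketched as a simple re-ranking pass. Everything here is illustrative (the feature sets, the boosting rule, and the names are assumptions): contextless recommendation scores are boosted by how many features an item shares with the features associated with the current context.

```python
# Item and context feature sets: invented for illustration.
item_features = {
    "comedy1": {"comedy", "family-friendly"},
    "action1": {"action", "violence"},
    "drama1":  {"drama", "family-friendly"},
}

context_features = {"family": {"comedy", "drama", "family-friendly"}}

def postfilter(recommendations, context):
    """Re-rank an (item, score) list by overlap with the context's features."""
    relevant = context_features[context]
    def boosted(entry):
        item, score = entry
        # One possible heuristic: scale the score by 1 + feature overlap.
        return score * (1 + len(item_features[item] & relevant))
    return sorted(recommendations, key=boosted, reverse=True)

# Output of a standard, context-free recommender:
recs = [("action1", 0.9), ("comedy1", 0.7), ("drama1", 0.6)]
print(postfilter(recs, "family"))  # comedy1 and drama1 now outrank action1
```

A filtering (rather than re-ranking) variant would simply drop items whose overlap with the context features falls below a threshold.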
  46. Contextual Modeling. Using contextual information directly in the model learning phase. Multi-dimensional recommendation models: contextual variables are added as dimensions D1, …, Dn in the feature space, in addition to the Users and Items dimensions: R: U × I × D1 × … × Dn → Rating. Example dimensions for a movie recommendation application: user; movie; time (weekday, weekend); company (alone, partner, family, etc.); place (movie theater, at home).
  47. Two-Dimensional Model: users × items → ratings, with user features and item features.
  48. Multidimensional Model. [Adomavicius et al., 2005]
  49. Contextual Modeling. Many different approaches in recent years, in both representational and interactional frameworks. Extensions of standard collaborative filtering: CF after item/user splitting pre-filters; differential context relaxation. Heuristic distance-based approaches: extend item-item and user-user similarities to the contextual dimensions; this requires (possibly domain-specific) similarity/distance metrics for the various contextual dimensions. Approaches based on matrix/tensor factorization: model the data as a tensor and apply higher-order factorization techniques (HOSVD, PARAFAC, HyPLSA, etc.) to model context in a latent space; context-aware matrix factorization. Probabilistic latent variable context models.
  50. Highlighted Approach: Item/User Splitting
  51. Context-Aware Splitting Approaches. Generally based on contextual pre-filtering; may be combined with contextual modeling techniques. Goal: produce a 2D data set that incorporates the context information associated with preference scores. Advantage: a variety of well-known traditional recommendation algorithms can be used in the modeling phase. Disadvantages: determining the variables on which to split; may lead to too much sparsity. There are three approaches to splitting: item splitting (Baltrunas et al., 2009, RecSys); user splitting (Baltrunas et al., 2009, CARS); UI splitting (Zheng et al., 2013).
  52. Item Splitting and User Splitting. Item splitting assumption: the nature of an item, from the user's point of view, may change under different contextual conditions (values of the contextual variables); hence we may consider the item as multiple items, one for each contextual condition. User splitting: it may be useful to consider one user as multiple users if he or she demonstrates significantly different preferences in different contexts. There is a good deal of recent work on these approaches: L. Baltrunas and F. Ricci. Context-based splitting of item ratings in collaborative filtering. RecSys 2009. L. Baltrunas and X. Amatriain. Towards time-dependent recommendation based on implicit feedback. RecSys 2009 Workshop on CARS. A. Said, E. W. De Luca, and S. Albayrak. Inferring contextual user profiles: improving recommender performance. RecSys 2011 Workshop on CARS.
  53. Example: Item Splitting.

      User  Movie  Rating  Time     Location  Companion
      U1    M1     3       Weekend  Home      Friend
      U1    M1     5       Weekend  Theater   Spouse
      U1    M1     ?       Weekday  Home      Family

      Assume Location (Home vs. Theater) is the best split condition:

      User  Item  Rating
      U1    M11   3
      U1    M12   5
      U1    M11   ?

      M11 = M1 seen at home; M12 = M1 seen not at home.

  54. Example: User Splitting.

      User  Movie  Rating  Time     Location  Companion
      U1    M1     3       Weekend  Home      Friend
      U1    M1     5       Weekend  Theater   Alone
      U1    M1     ?       Weekday  Home      Family

      Assume Companion (Family vs. Non-Family) is the best split condition:

      User  Item  Rating
      U12   M1    3
      U12   M1    5
      U11   M1    ?

      U11 = U1 watching with family; U12 = U1 watching alone or with a friend.
  55. User-Item (UI) Splitting. A new approach combining user and item splitting: the process is simply an application of item splitting followed by user splitting on the resulting output. Y. Zheng, B. Mobasher, R. Burke. Splitting approaches for context-aware recommendation. (To appear.) Using the same conditions as the previous examples:

      Item splitting:       User splitting:       UI splitting:
      User  Item  Rating    User  Item  Rating    User  Item  Rating
      U1    M11   3         U12   M1    3         U12   M11   3
      U1    M12   5         U12   M1    5         U12   M12   5
      U1    M11   ?         U11   M1    ?         U11   M11   ?
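The three splitting transforms on the toy example above can be sketched directly; the split conditions follow the slides (Location: Home vs. not, Companion: Family vs. not), but the code and its naming scheme are illustrative, not the authors' implementation.

```python
rows = [  # (user, item, rating, location, companion) from the example
    ("U1", "M1", 3, "Home",    "Friend"),
    ("U1", "M1", 5, "Theater", "Spouse"),
]

def item_split(rows):
    """Rename each item per contextual condition: M1 -> M11 (Home) / M12 (not Home)."""
    return [(u, i + ("1" if loc == "Home" else "2"), r, loc, comp)
            for (u, i, r, loc, comp) in rows]

def user_split(rows):
    """Rename each user per condition: U1 -> U11 (Family) / U12 (not Family)."""
    return [(u + ("1" if comp == "Family" else "2"), i, r, loc, comp)
            for (u, i, r, loc, comp) in rows]

def ui_split(rows):
    """UI splitting = item splitting followed by user splitting."""
    return user_split(item_split(rows))

print([(u, i, r) for (u, i, r, _, _) in ui_split(rows)])
# [('U12', 'M11', 3), ('U12', 'M12', 5)]
```

The output drops the context columns entirely: the resulting 2D (user, item, rating) data can be fed to any conventional recommender, which is the whole appeal of the splitting family.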
  56. Determining the Best Conditions for Splitting: Impurity Criteria. Statistical criteria are used to evaluate whether items are rated significantly differently under alternative contextual conditions, e.g., location: home vs. not-home. Commonly used criteria: t_mean (t-test), t_prop (z-test), t_chi (chi-square test), t_IG (information gain). Thresholds: the p-value is used to judge significance.
  57. Splitting Criteria Example: t_mean. Uses the two-sample t-test and computes how significantly different the means of the ratings in the two rating subsets are when the split on contextual condition c is used:

      t_mean = |mean_c - mean_c'| / sqrt(s_c^2/n_c + s_c'^2/n_c')

  where s is the rating standard deviation and n is the number of ratings in the given contextual condition; c and c' denote the alternative conditions. The bigger the t value of the test, the more likely the difference of the means in the two partitions is significant. This process is iterated over all contextual conditions.
  58. Splitting Criteria Example. t_mean with the condition time = weekend vs. not weekend:

      User  Item  Rating  Time     Location  Companion
      U1    T1    3       Weekend  Home      Sister
      U1    T1    5       Weekend  Cinema    Girlfriend
      U2    T1    4       Weekday  Home      Family
      U3    T1    2       Weekday  Home      Sister

      mean1 = 4, mean2 = 3, s1 = 1, s2 = 1, n1 = 2, n2 = 2.
      Impurity criterion: t_mean = (4 - 3)/1 = 1.

  The p-value of the t-test is used to determine significance (0.05 as the threshold).
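The worked numbers above can be reproduced directly. A sketch, with one assumption worth flagging: to match the slide's s1 = s2 = 1 on the samples {3, 5} and {4, 2}, the variance here is the population variance (divided by n, not n - 1); with that choice, the statistic is the usual Welch-style two-sample t.

```python
from math import sqrt

def t_mean(ratings_c, ratings_not_c):
    """Two-sample t statistic comparing mean ratings under condition c
    versus the alternative condition c'."""
    n1, n2 = len(ratings_c), len(ratings_not_c)
    m1 = sum(ratings_c) / n1
    m2 = sum(ratings_not_c) / n2
    # Population variances (divide by n), matching the slide's s1 = s2 = 1.
    v1 = sum((x - m1) ** 2 for x in ratings_c) / n1
    v2 = sum((x - m2) ** 2 for x in ratings_not_c) / n2
    return abs(m1 - m2) / sqrt(v1 / n1 + v2 / n2)

# Weekend ratings {3, 5} vs. weekday ratings {4, 2} from the slide:
print(t_mean([3, 5], [4, 2]))  # 1.0
```

In a real splitter this score would be converted to a p-value (e.g., against the t distribution) and compared to the 0.05 threshold before the split is accepted.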
  59. Other Splitting Criteria. t_prop uses the two-proportion z-test and determines whether there is a significant difference between the proportions of high and low ratings when a contextual condition is used; usually rating = 3 is used as the threshold dividing the proportions if the rating scale is 1-5. t_chi is similar to t_prop, but uses the chi-square test instead of the z-test. t_IG also uses the high/low proportions, but measures information gain.
  60. Example Results: Data Sets.

      Food Data: 6360 ratings, 212 users, 20 items. Contexts: real hunger (full/normal/hungry), virtual hunger. Context-linked features: user gender; food genre, food style, food stuff. Dense in contexts.
      Movie Data: 1010 ratings, 69 users, 176 items. Contexts: time (weekend, weekday), location (home, cinema), companions (friends, alone, etc.). Context-linked features: user gender, year of the movie. Sparse in contexts.

  61. Example Results (RMSE). Results based on splitting followed by matrix factorization (discussed next). (chart)
  62. Highlighted Approach: Contextual Modeling Using Matrix Factorization
  63. Contextual Modeling via Factorization. Recent approaches to contextual modeling attempt to fit the data using various regression models. Prominent example: tensor factorization (TF). Tensor factorization extends the two-dimensional matrix factorization problem into a multi-dimensional version of the same problem: the multi-dimensional matrix is factored into a lower-dimensional representation, where the user, the item, and each contextual dimension are represented by a lower-dimensional feature vector. Problem: TF can introduce a huge number of model parameters that must be learned from the training data; the number of model parameters grows exponentially with the number of contextual factors. Simpler models, with fewer parameters, can often perform as well. One solution: context-aware matrix factorization.
  64. Matrix Factorization: Summary. From SVD to MF: in SVD, A is a rating matrix that can be decomposed into U, S, V^T. Standard SVD requires filling the empty entries with zeros, which adds noise and may distort the data.
  65. Matrix Factorization: Summary. In MF, A is decomposed into two factor matrices: r_ui = q_i^T p_u, where p_u is the user-factor vector and q_i is the item-factor vector. Typically, the user and item vectors are initialized and then learned from the non-zero entries in the rating matrix. The optimization is usually performed using stochastic gradient descent (SGD) or alternating least squares (ALS).
  66. Matrix Factorization: Summary. Learning the factor vectors: minimize the errors on the known ratings (a least-squares problem):

      min_{q,p} Σ_{(u,i) known} (r_ui - q_i^T p_u)^2

  where r_ui is the actual rating by user u on item i and q_i^T p_u is the predicted rating. This is typically done after standardization, i.e., after removing the global mean from the rating data.
  67. Matrix Factorization: Summary. Learning the factor vectors: user and item biases are added to the factor model, and regularization is used to prevent over-fitting:

      min_{q,p,b} Σ_{(u,i) known} (r_ui - m - b_u - b_i - q_i^T p_u)^2 + λ (||q_i||^2 + ||p_u||^2 + b_u^2 + b_i^2)

  where r_ui is the actual rating of user u on item i; m is the training rating average; b_u is the user bias of user u; b_i is the item bias of item i; q_i is the latent factor vector of item i; p_u is the latent factor vector of user u; and λ is the regularization parameter.
  68. Matrix Factorization: Summary. Standard MF vs. BiasMF. Standard MF: Predict(u, i) = q_i^T p_u. BiasMF: Predict(u, i) = q_i^T p_u + µ + b_u + b_i, where b_u and b_i are the user bias and item bias respectively, and µ is the global mean rating. The reason to add biases: standard MF tries to capture the interactions between users and items that produce the different rating values; however, much of the observed variation in rating values is due to effects associated with either users or items, known as biases, independent of any interaction. Adding user and item biases can help explain the rating values better than relying only on the interaction term q_i^T p_u.
  69. Basic Steps in MF-Based Models. Construct the user-item matrix (a sparse data structure). Define the factorization model and cost function: take out the global mean; decide which parameters are in the model (item or user bias, preference factors, time-dependent factors, context, etc.). Minimize the cost function (model fitting): stochastic gradient descent or alternating least squares. Assemble the predictions. Evaluate the predictions (RMSE, MAE, etc.). Continue to tune the model.
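The steps above can be sketched end to end with BiasMF trained by SGD. A minimal, illustrative implementation on toy data (the data set, hyperparameters, and variable names are assumptions): take out the global mean, update biases and factor vectors against the prediction error with regularization, then assemble and evaluate the predictions.

```python
import random

random.seed(0)
ratings = [("u1", "i1", 5), ("u1", "i2", 3), ("u2", "i1", 4), ("u2", "i2", 2)]
users = {u for u, _, _ in ratings}
items = {i for _, i, _ in ratings}
d, lr, reg, epochs = 2, 0.05, 0.02, 200   # illustrative hyperparameters

mu = sum(r for _, _, r in ratings) / len(ratings)   # global mean, taken out first
bu = {u: 0.0 for u in users}                        # user biases
bi = {i: 0.0 for i in items}                        # item biases
p = {u: [random.gauss(0, 0.1) for _ in range(d)] for u in users}  # user factors
q = {i: [random.gauss(0, 0.1) for _ in range(d)] for i in items}  # item factors

def predict(u, i):
    """BiasMF: mu + b_u + b_i + q_i^T p_u."""
    return mu + bu[u] + bi[i] + sum(pf * qf for pf, qf in zip(p[u], q[i]))

for _ in range(epochs):          # SGD over the known ratings
    for u, i, r in ratings:
        e = r - predict(u, i)    # prediction error on this example
        bu[u] += lr * (e - reg * bu[u])
        bi[i] += lr * (e - reg * bi[i])
        for f in range(d):
            puf, qif = p[u][f], q[i][f]
            p[u][f] += lr * (e * qif - reg * puf)
            q[i][f] += lr * (e * puf - reg * qif)

rmse = (sum((r - predict(u, i)) ** 2 for u, i, r in ratings) / len(ratings)) ** 0.5
print(round(rmse, 3))  # training RMSE; small after fitting
```

The same loop becomes CAMF (next slides) by replacing the single `bi[i]` term with one bias per (item, contextual condition) pair and updating each active condition's bias on every example.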
  70. Context-Aware Matrix Factorization. Recall the difference between MF and BiasMF: standard MF: Predict(u, i) = q_i^T p_u; BiasMF: Predict(u, i) = q_i^T p_u + µ + b_u + b_i, where b_u and b_i are the user and item biases and µ is the global mean rating. CAMF replaces the simple item bias b_i with the interaction between the item and the contextual conditions. The predicted rating is now also a function of the contextual conditions c1, …, ck giving rise to a particular context: the item bias is modeled by how that item is rated in different contexts, i.e., as a sum of biases for the given item across the contextual conditions c1, …, ck. Different levels of granularity in aggregating the item biases lead to different variants of CAMF.
  71. Context-Aware MF (CAMF) (Baltrunas, Ludwig, Ricci, RecSys 2011). Three models are derived based on the interaction of context and items. CAMF-C (global) assumes that each contextual condition has a global influence on the ratings, independently of the item. CAMF-CI (item) introduces one parameter per contextual condition and item pair. CAMF-CC (genre) introduces one model parameter for each contextual condition and item category (e.g., music genre). (Slide from F. Ricci, UMAP 2012)
  72. Context-Aware MF (CAMF) (Baltrunas, Ludwig, Ricci, RecSys 2011). The prediction takes the form

      r̂(u, i, c1, …, ck) = q_i^T v_u + ī + b_u + Σ_j B_ijcj

  where v_u and q_i are d-dimensional real-valued vectors representing the user u and the item i; ī is the average of item i's ratings; and b_u is a baseline parameter for user u. CAMF-C: only one parameter per contextual condition. CAMF-CI: in B_ijcj, i denotes the item, j is the jth context dimension, and c_j is the condition in that dimension. CAMF-CC considers the item category; that is, if items i and f fall into the same category, then B_ijcj = B_fjcj.
  73. CAMF Example.

      User  Item  Rating  Time     Location  Companion
      U1    T1    3       Weekend  Home      Alone
      U1    T2    5       Weekend  Cinema    Friend
      U2    T2    4       Weekday  Home      Family
      U3    T1    2       Weekday  Home      Alone

  CAMF-C: only one parameter per context condition; e.g., time as a context dimension has two conditions, weekend and weekday, and the item bias equals the bias of the context condition, so the interactions <T1, weekend> and <T2, weekend> share the same bias value. CAMF-CI: the finest-grained interaction modeling; <T1, weekend> and <T2, weekend> may take different values, learned in the matrix factorization process. CAMF-CC: considers the item category; if items i and f fall into the same category, they share the same context-condition bias.
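The CAMF-CI prediction can be sketched as a single function. The parameter values here are invented for illustration (in practice every entry of `B`, the factors, and the user baseline come out of SGD training): the single item bias of BiasMF is replaced by one bias per (item, context dimension, condition) triple, summed over the active context.

```python
# Learned (item, dimension, condition) biases: toy values, illustrative only.
B = {
    ("T1", "time", "weekend"): 0.4,
    ("T1", "companion", "alone"): -0.2,
}

def camf_ci_predict(vu, qi, item_avg, bu, item, context):
    """CAMF-CI: q_i^T v_u + item average + user baseline + sum of the
    item's biases for each active contextual condition.
    context is {dimension: condition}; unknown pairs contribute 0 bias."""
    interaction = sum(a * b for a, b in zip(vu, qi))
    ctx_bias = sum(B.get((item, dim, cond), 0.0) for dim, cond in context.items())
    return interaction + item_avg + bu + ctx_bias

score = camf_ci_predict(
    vu=[0.1, 0.2], qi=[0.3, -0.1], item_avg=2.5, bu=0.5,
    item="T1", context={"time": "weekend", "companion": "alone"},
)
print(round(score, 3))  # 3.21
```

CAMF-C would collapse `B` to one value per condition regardless of item, and CAMF-CC would key `B` on the item's category instead of the item itself, which is exactly the granularity spectrum the slide describes.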
  74. CAMF: Example Results (Baltrunas, Ludwig, Ricci, RecSys 2011). (chart)
  75. Optimization in CAMF. Similar to BiasMF: optimization based on SGD, with the model parameters updated by gradient steps on each training example.
  76. Example Implementation: InCarMusic. Detecting the relevant contextual factors, based on a user survey (expected utility). Acquiring ratings in context. Generating rating predictions with context-aware matrix factorization. (Slide from F. Ricci, UMAP 2012)
  77. Android Application. [Baltrunas et al., 2011] (Slide from F. Ricci, UMAP 2012)
  78. Methodological Approach. 1. Identify potentially relevant contextual factors (heuristics, consumer behavior literature). 2. Rank the contextual factors based on subjective evaluations (a "what if" scenario). 3. Measure the dependency of the ratings on the contextual conditions and the users (users rate items in imagined contexts). 4. Model the rating dependency on context (an extended matrix factorization model). 5. Learn the prediction model (stochastic gradient descent). 6. Deliver context-aware rating predictions and item recommendations. (Slide from F. Ricci, UMAP 2012)
  79. Contextual Factors. Driving style (DS): relaxed driving, sport driving. Road type (RT): city, highway, serpentine. Landscape (L): coast line, country side, mountains/hills, urban. Sleepiness (S): awake, sleepy. Traffic conditions (TC): free road, many cars, traffic jam. Mood (M): active, happy, lazy, sad. Weather (W): cloudy, snowing, sunny, rainy. Natural phenomena (NP): day time, morning, night, afternoon. (Slide from F. Ricci, UMAP 2012)
  80. Determining Context Relevance. A web-based application; 2436 evaluations from 59 users. (Slide from F. Ricci, UMAP 2012)
  81. User Study Results. Normalized mutual information (MI) of each contextual condition with the influence variable (1/0/-1); the higher the MI, the larger the influence.

      Blues: driving style 0.32, road type 0.22, sleepiness 0.14, traffic conditions 0.12, natural phenomena 0.11, landscape 0.11, weather 0.09, mood 0.06.
      Classical: driving style 0.77, sleepiness 0.21, weather 0.09, natural phenomena 0.09, mood 0.09, landscape 0.06, road type 0.02, traffic conditions 0.02.
      Country: sleepiness 0.47, driving style 0.36, weather 0.19, mood 0.13, landscape 0.11, road type 0.11, traffic conditions 0.10, natural phenomena 0.04.
      Disco: mood 0.18, weather 0.17, sleepiness 0.15, traffic conditions 0.13, driving style 0.10, road type 0.06, natural phenomena 0.05, landscape 0.05.
      Hip Hop: traffic conditions 0.19, mood 0.15, sleepiness 0.11, natural phenomena 0.11, weather 0.07, landscape 0.05, driving style 0.05, road type 0.01.

  (Slide from F. Ricci, UMAP 2012)
82. 82. In-Context Ratings
Contextual conditions are sampled with probability proportional to the MI of the contextual factor for the music genre.
Slide from F. Ricci, UMAP 2012
83. 83. Influence on the Average Rating (no-context vs. context)
In the no-context condition, users evaluate ratings in the default context. The default context is the context in which consuming the item makes sense (the best context).
Slide from F. Ricci, UMAP 2012
84. 84. Predictive Model
- v_u and q_i are d-dimensional real-valued vectors representing the user u and the item i
- the item-average term is the mean of item i's ratings
- b_u is a baseline parameter for user u
- b_{g_i, j, c} is the baseline of the contextual condition c_j (factor j) for the genre g_i of item i; this assumes that context influences all tracks of a given genre uniformly
- If a contextual factor is unknown, i.e., c_j = 0, the corresponding baseline b_{g_i, j, c} is set to 0
Slide from F. Ricci, UMAP 2012
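The prediction rule described above can be sketched in code (a minimal sketch; the `genre_bias` dictionary and its `(factor, condition)` keying are illustration-only assumptions, not the authors' implementation):

```python
def predict_rating(v_u, q_i, item_avg, b_u, context, genre_bias):
    """Extended-MF prediction sketch: user-item dot product plus the
    item's average rating, a user baseline, and one contextual baseline
    per known contextual condition for the item's genre."""
    score = sum(a * b for a, b in zip(v_u, q_i)) + item_avg + b_u
    for j, c_j in enumerate(context):
        if c_j != 0:  # c_j == 0 means the factor is unknown: add nothing
            score += genre_bias.get((j, c_j), 0.0)
    return score
```

All parameters (v_u, q_i, the baselines) would be learned by stochastic gradient descent, as the methodology slide states.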
85. 85. Predicting Expected Utility in Context
Components: item average, global, item, and genre baselines. [Baltrunas et al., 2011]
86. 86. InCarMusic Approach Limitations
- The framework is personalized, using the extended MF model
- But it requires users' ratings in different contextual conditions for training
Alternative approach:
- Use relationships between items and the user's context drawn from other information sources
- Semantic information associated with items and contexts
- Social cues relevant to items and users (e.g., tags)
We'll discuss some examples later.
87. 87. Some Additional References for MF-Based Approaches
- Y. Koren. Factorization meets the neighborhood: a multifaceted collaborative filtering model. KDD 2008
- Y. Koren. Collaborative filtering with temporal dynamics. KDD 2009
- A. Karatzoglou, X. Amatriain, L. Baltrunas, N. Oliver. Multiverse recommendation: n-dimensional tensor factorization for context-aware collaborative filtering. RecSys 2010
- L. Baltrunas, B. Ludwig, F. Ricci. Matrix factorization techniques for context aware recommendation. RecSys 2011
- L. Baltrunas, M. Kaminskas, B. Ludwig, O. Moling, F. Ricci, A. Aydin, K.-H. Luke, R. Schwaiger. InCarMusic: Context-aware music recommendations in a car. ECWeb 2011
- S. Rendle, Z. Gantner, C. Freudenthaler, L. Schmidt-Thieme. Fast context-aware recommendations with factorization machines. SIGIR 2011
88. 88. Highlighted Approach: Differential Context Modeling
89. 89. Differential Context Modeling
Basic assumption of many context-aware RS: it is better to use preferences within the same context to make predictions relevant to that context.
Problem: sparsity, especially when multiple contextual variables define a contextual condition. Are there rating profiles in the context <Weekday, Home, Friend>?
User  Movie    Time     Location  Companion  Rating
U1    Titanic  Weekend  Home      Family     4
U2    Titanic  Weekday  Home      Family     5
U3    Titanic  Weekday  Cinema    Friend     4
U1    Titanic  Weekday  Home      Friend     ?
90. 90. Possible Solutions
- Context matching: use only the exact context <Weekday, Home, Friend>
- Context selection: use only the most relevant contexts
- Context relaxation: relax the set of constraints (dimensions) defining the context, e.g., use only time
- Context weighting: use all dimensions, but weight them according to co-occurrence relationships among contexts
User  Movie    Time     Location  Companion  Rating
U1    Titanic  Weekend  Home      Family     4
U2    Titanic  Weekday  Home      Family     5
U3    Titanic  Weekday  Cinema    Friend     4
U1    Titanic  Weekday  Home      Friend     ?
91. 91. Differential Context Modeling
There are two parts in DCM.
"Differential" part:
- Separate one algorithm into different functional components
- Apply differential context constraints to each component
- Maximize the global contextual effects across the algorithm components
"Modeling" part: can be performed by context relaxation or context weighting
- Differential Context Relaxation (DCR)
- Differential Context Weighting (DCW)
92. 92. DCR – Algorithm Decomposition
Example: user-based collaborative filtering (UBCF). Standard process in UBCF:
1) Find neighbors based on user-user similarity
2) Aggregate neighbors' contributions
3) Make final predictions
     Pirates of the Caribbean 4   Kung Fu Panda 2   Harry Potter 6   Harry Potter 7
U1   4                            4                 1                2
U2   3                            4                 2                1
U3   2                            2                 4                4
U4   4                            4                 1                ?
93. 93. DCR – Algorithm Decomposition
User-based collaborative filtering predictions are decomposed into four components:
1. Neighbor selection
2. Neighbor contribution
3. User baseline
4. User similarity
Standard UBCF: all components are assumed to range over the same set of contextual conditions.
UBCF with DCR: find the appropriate contextual condition for each component.
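The four-component decomposition can be sketched as follows (a sketch under assumed data structures: `ratings` as (user, item, rating, context) tuples, `sim` a user-user similarity, and `filters` mapping each component name to a hypothetical per-component context predicate):

```python
def ubcf_predict(target, item, ratings, sim, filters):
    """DCR-style decomposed UBCF: each component may apply its own
    contextual constraint via filters[component](context)."""
    def mean(user, keep):
        vals = [r for (u, i, r, c) in ratings if u == user and keep(c)]
        return sum(vals) / len(vals) if vals else 0.0

    baseline = mean(target, filters["user_baseline"])          # component 3
    num = den = 0.0
    for (v, i, r, c) in ratings:
        if i != item or v == target:
            continue
        if not filters["neighbor_selection"](c):               # component 1
            continue
        w = sim(target, v)                                     # component 4
        num += w * (r - mean(v, filters["neighbor_contribution"]))  # comp. 2
        den += abs(w)
    return baseline + (num / den if den else 0.0)
```

With all filters set to accept every context, this reduces to standard UBCF; DCR would instead search for the best constraint per component.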
94. 94. DCR – Context Relaxation Example
Context relaxation:
- Use {Time, Location, Companion}: 0 records matched!
- Use {Time, Location}: 1 record matched!
- Use {Time}: 2 records matched!
In DCR, we choose an appropriate context relaxation for each component.
User  Movie    Time     Location  Companion  Rating
U1    Titanic  Weekend  Home      Family     4
U2    Titanic  Weekday  Home      Family     5
U3    Titanic  Weekday  Cinema    Friend     4
U1    Titanic  Weekday  Home      Friend     ?
95. 95. DCR – Context Relaxation Example
c is the original context, e.g. <Weekday, Home, Friend>. C1, C2, C3, C4 are the relaxed contexts used by the four components (1. neighbor selection, 2. neighbor contribution, 3. user baseline, 4. user similarity). They can be the full c or partial constraints drawn from c. The selection is modeled by a binary vector; e.g., <1, 0, 0> denotes that only the first context dimension is selected. The optimal relaxations are found through optimization.
96. 96. DCR – Drawbacks
- Context relaxation may still be too strict when data is too sparse
- Components are dependent; e.g., neighbor contribution depends on neighbor selection
- C1: Location = Cinema is not guaranteed, but a neighbor may have ratings under context C2: Time = Weekend
- Need a finer-grained solution: Differential Context Weighting
97. 97. Differential Context Weighting
Goal: use all dimensions, but weight them based on the similarity of contexts.
Assumption: the more similar two contexts are, the more similar the ratings will be in those contexts.
Similarity can be measured by weighted Jaccard similarity. Example: c and d are two contexts (the two highlighted rows in the table); σ is the weighting vector <w1, w2, w3> for the three dimensions. Assuming equal weights w1 = w2 = w3 = 1:
J(c, d, σ) = # of matched dimensions / # of all dimensions = 2/3
User  Movie    Time     Location  Companion  Rating
U1    Titanic  Weekend  Home      Friend     4
U2    Titanic  Weekday  Home      Friend     5
U3    Titanic  Weekday  Cinema    Family     4
U1    Titanic  Weekday  Home      Family     ?
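The weighted Jaccard similarity above is straightforward to compute (a minimal sketch; with unequal weights it generalizes to matched weight mass over total weight mass):

```python
def weighted_jaccard(c, d, sigma):
    """Weighted Jaccard similarity of two contexts: the total weight of
    the dimensions on which c and d agree, divided by the total weight
    of all dimensions."""
    matched = sum(w for w, a, b in zip(sigma, c, d) if a == b)
    return matched / sum(sigma)
```

For the table's target context <Weekday, Home, Family> versus U2's <Weekday, Home, Friend> with equal weights, two of three dimensions match, giving 2/3.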
98. 98. Differential Context Weighting
"Differential" part: the components are the same as in DCR (1. neighbor selection, 2. neighbor contribution, 3. user baseline, 4. user similarity).
"Context weighting" part: σ is the weighting vector, and ε is a threshold on the similarity of contexts; i.e., only records with similar enough contexts (≥ ε) are included in the calculations. The optimal weighting vectors must be learned.
99. 99. Particle Swarm Optimization (PSO)
PSO is derived from swarm intelligence: a goal is achieved through the collaborative work of a swarm (fish, birds, bees).
100. 100. Particle Swarm Optimization (PSO)
Analogy: swarm = a flock of birds; particle = each bird (each run in the algorithm); vector = a bird's position in the space (the DCR/DCW vectors); goal = the location of the pizza (minimizing RMSE).
How does the swarm find the goal?
1. Looking for the pizza: assume a machine can tell each bird its distance
2. Each iteration is an attempt or move
3. Cognitive learning from the particle itself: am I closer to the pizza than at my "best" locations in previous steps?
4. Social learning from the swarm: "Hey, my distance is 1 mile. It is the closest! Follow me!" The other birds then move toward that position.
DCR is feature selection, modeled by binary vectors, so it uses binary PSO. DCW is feature weighting, modeled by real-number vectors, so it uses standard PSO.
Binary PSO example for DCR: assume 3 components and 4 contextual dimensions. There are then 3 binary vectors, one per component; integrating them yields a single vector of size 3 x 4 = 12, which is the particle's position vector in the PSO process.
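The swarm dynamics above can be sketched as a minimal continuous PSO, as used for DCW's real-valued weights (a sketch only: the inertia/cognitive/social constants are common textbook defaults, not values from the slides, and the objective here is a stand-in for the RMSE of a candidate weighting):

```python
import random

def pso(objective, dim, particles=30, iters=100, seed=0):
    """Minimal PSO: each particle keeps its personal best (cognitive
    learning); the swarm shares a global best (social learning)."""
    rng = random.Random(seed)
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia, cognitive, social constants
    pos = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(particles):
            for k in range(dim):
                vel[i][k] = (w * vel[i][k]
                             + c1 * rng.random() * (pbest[i][k] - pos[i][k])
                             + c2 * rng.random() * (gbest[k] - pos[i][k]))
                pos[i][k] += vel[i][k]
            val = objective(pos[i])  # e.g., RMSE under a candidate weighting
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

Binary PSO for DCR follows the same scheme but squashes velocities through a sigmoid to sample 0/1 positions.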
101. 101. Predictive Performance
Blue bars are RMSE values; red lines are coverage curves.
Findings:
1) DCW works better than DCR and the two baselines
2) t-tests show DCW is significantly better on the movie data
3) but DCR was not significantly better than the two baselines
4) DCW can further alleviate the sparsity of contexts
5) DCW offers better coverage than the baselines
102. 102. Performance of the PSO Optimizer
Running time in seconds. Factors influencing running time:
- More particles: quicker convergence, but probably more cost
- Number of contextual variables: more contexts, slower
- Density of the data set: denser data, more calculations
Typically DCW costs more than DCR, because it uses all contextual dimensions and must compute the similarity of contexts; this is especially time-consuming for dense data, like the Food data.
103. 103. DCR in Item-Based CF (IBCF)
Algorithm decomposition and context relaxation (DCR): the IBCF components are placed under DCR constraints.
104. 104. DCW in Item-Based CF (IBCF)
Algorithm decomposition and context weighting (DCW): the IBCF components are placed under DCW. σ is the weighting vector, and ε is a threshold on the similarity of contexts; i.e., only records with similar enough contexts (≥ ε) are included in the predictions.
105. 105. Predictive Performance (RMSE)
DCW works best; pre-filtering beats DCR but has very low coverage.
106. 106. Differential Context Modeling
Relevant work:
- Y. Zheng, R. Burke, B. Mobasher. "Differential Context Relaxation for Context-aware Travel Recommendation." EC-Web 2012 [DCR]
- Y. Zheng, R. Burke, B. Mobasher. "Optimal Feature Selection for Context-Aware Recommendation using Differential Relaxation." ACM RecSys Workshop on CARS, 2012 [DCR + optimizer]
- Y. Zheng, R. Burke, B. Mobasher. "Recommendation with Differential Context Weighting." UMAP 2013 [DCW]
- Y. Zheng, R. Burke, B. Mobasher. "Differential Context Modeling in Collaborative Filtering." SOCRS-2013, DePaul University, Chicago, IL, May 31, 2013 [DCM]
Future work:
- Try other context-similarity measures instead of simple Jaccard
- Introduce semantics into the similarity of contexts to further alleviate context sparsity (e.g., Rome is closer to Florence than to Paris)
- Parallelize PSO, or run PSO on MapReduce, to speed up the optimizer
- Apply DCR or DCW to additional recommendation algorithms
107. 107. An Interactional Architecture for Context-Aware Recommendation
108. 108. An Interactional Model for Contextual Recommendation
[Architecture diagram: a cue generator derives contextual cues from the short-term memory (the active interaction and user profiles); retrieved preference models, drawn from collaborative, semantic, and behavioral memory objects in the long-term memory plus domain knowledge, are merged with STM data to generate recommendations; the LTM is then updated.]
Inspired by Atkinson and Shiffrin's model of human memory. [Anand and Mobasher, 2007]
109. 109. Contextual Recommendation Generation
- Explicit or implicit preferences for items from the active interaction are stored in the STM
- Contextual cues are derived from this data and used to retrieve relevant preference models from LTM (relevant = belonging to the same context as the active interaction)
- These are merged with STM preferences and used to predict preferences for unseen items
- New observations are used to update the preference models in LTM
Many variations are possible:
- LTM objects can be organized based on ontological or semantic relationships
- LTM preference models may be aggregate objects based on similarities among users
- Identifying relevant LTM objects can be done in a variety of ways (typically using appropriate similarity functions or probabilistic approaches)
110. 110. Retrieving Preference Models Using Contextual Cues
The task of retrieving memory objects from LTM can be viewed as estimating:

Pr(L_i | STM) = (1 / Pr(STM)) * Σ_j Pr(L_i | CC_j) Pr(STM | CC_j) Pr(CC_j)

where the L_i are memory objects stored in LTM and the CC_j are contextual cues generated from the STM. The calculation of Pr(STM | CC_j) is highly dependent on the particular type of cue being used; it may be estimated from collaborative, semantic, or behavioral observations. E.g., Pr(STM | CC_j) could be a weight associated with a concept (such as the fraction of positive ratings in STM associated with items in a given category).
111. 111. Types of Contextual Cues
Collaborative cues:
- Represent items as vectors over user ratings
- LTM memory objects whose preference models exceed a similarity threshold are retrieved and used in recommendation generation
Semantic cues:
- Retrieve LTM preference models based on semantic similarity with the user preference model from the active interaction
- Assume the existence of an item knowledge base (or a textual feature space for documents) and use item semantics to compute similarity between items
Behavioral cues:
- Use various implicit metrics of user preference; similarity between these metrics for the active interaction and for LTM preference models is the basis for retrieving objects from LTM
- Another approach is to extract latent factors that drive user choice, e.g., impact values associated with item attributes extracted from an ontology, or factors derived from user actions representing various tasks or topics
112. 112. Characteristics of the Framework
Different from, but not in contradiction with, the three architectural models for contextual filtering.
The framework emphasizes:
- The distinction between local, transient preference models in STM and the long-term established models in LTM
- The importance of the user's interaction with the system in deriving contextual cues
- The mutually reinforcing relationship between user activity and the context model; this, in turn, emphasizes the dynamic nature of context
It does not emphasize:
- Explicit knowledge-based representation of contextual attributes
- A rigid formulation of contextual modeling approaches
It is a very general framework, and many implementations are possible (we will look at several next).
113. 113. Example Implementations: Contextual Collaborative Models
- Inclusive memory model: uses all ratings in the LTM and STM of u_a to define the neighborhood
- Temporal memory model: uses ratings from STM and the last k ratings from LTM
- Contextual memory model: uses ratings from STM and those LTM ratings made within the same context as the current one
114. 114. Example Implementation: Item Knowledge Bases and Context
Given user behavioral data and an item knowledge base, discover different user behaviors that can be associated with different user interaction contexts.
115. 115. A High-Level View
One visitor may have multiple such profiles; if they are distinct enough, they represent different contexts for the user's visits. Clustering these profiles identified 27 distinct clusters (contexts) within 15,000 user visits.
[Diagram: an ontological profile generator maps a user visit to weighted ontology concepts.]
116. 116. Measuring Impact
Defined based on an observed distribution f(x) and an expected distribution g(x) of instances of the concept: the greater the (Kullback-Leibler) divergence between these distributions, the greater the impact. This version assumes g(x) is uniform: all instances of the concept are equally likely to be viewed by the user. Here s is the number of unique instances of the concept and H(f(x)) is the entropy of f(x).
117. 117. Measuring Impact (II)
impI instead assumes g(x) is the likelihood of instances of the concept being viewed within a random sample. This is simulated either using the item knowledge base, assuming each item has an equal probability of being selected, or using the popularity of the item across all users (which takes temporal factors into account).
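Under the uniform-expectation variant, the KL divergence simplifies to log(s) minus the entropy of the observed distribution; a sketch of this reading (one plausible interpretation of the slide, not necessarily the authors' exact formula):

```python
import math

def impact_uniform(f):
    """Impact of a concept as the KL divergence of the observed instance
    distribution f from a uniform distribution over its s instances:
    KL(f || uniform) = log(s) - H(f)."""
    s = len(f)
    entropy = -sum(p * math.log(p) for p in f if p > 0)
    return math.log(s) - entropy
```

A uniformly viewed concept thus has zero impact, while a concept whose views concentrate on one instance reaches the maximum log(s).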
118. 118. Some Evaluation Results
119. 119. Next: Focus on Several Other Implementations of the Interactional Model
120. 120. Example Implementation: Concepts as Context Models
- An ontological user profile is an instance of a reference ontology (e.g., Amazon's book taxonomy); each concept is annotated with an interest score
- The system maintains and updates the ontological profiles based on user behavior and the ongoing interaction
- An interactional/representational hybrid: the representational part is due to the existence of a pre-specified ontology
- Profile normalization: the relative importance of concepts in the profile reflects the changing interests and varied information contexts of the user
Sieg, Mobasher, Burke, 2007
121. 121. Updating User Context by Spreading Activation
Interest score: indicates the importance of a concept for the user; incremented or decremented based on the user's behavior over many interactions.
Spreading activation:
- The ontological user profile is treated as a semantic network; interest scores are updated based on activation values
- An initial set of concepts is assigned initial activation values based on similarity to the user's short-term interests
- Other concepts are activated through a set of weighted relations; the relationship between adjacent concepts is determined by their degree of overlap
- The result is a set of concepts and their respective activations
122. 122. Profile Updating Illustrated
123. 123. Augmenting Collaborative Filtering with the Context Model
Collaborative filtering with ontological profiles:
- User similarities are computed based on their interest scores across ontology concepts, instead of their ratings on individual items
- This also helps broaden the recommendations and alleviates typical problems with CF: "cold start," "diversity," "serendipity"
- Additional filtering is performed by selecting only neighbors that have significant interest in the concept of the "target item"; this helps identify the relevant "information access context" and improves accuracy
124. 124. Ontology-Based Collaborative Recommendation
Semantic neighborhood generation: compare the ontological user profiles of each pair of users to form semantic neighborhoods, using Euclidean distance:

distance(u, v) = sqrt( Σ_{C_j ∈ C} (IS(C_j, u) - IS(C_j, v))^2 )

where C is the set of all concepts in the reference ontology, IS(C_j, u) is the interest score of concept C_j for target user u, and IS(C_j, v) the score for user v. The distance is normalized, and similarity is calculated as the inverse of the normalized distance.
125. 125. Ontology-Based Collaborative Recommendation
Prediction computation: to compute the prediction for item i for target user u, select the k most similar neighbors, with concept-based filtering on the neighbors. A variation of Resnick's standard prediction formula is used, with concept-based mean ratings for the target user and each neighbor:

p(u, i) = rbar_u + [ Σ_{v ∈ V} sim(u, v) * (r(v, i) - rbar_v) ] / Σ_{v ∈ V} sim(u, v)

where V is the set of the k most similar users, and rbar_u, rbar_v are concept-based mean ratings; the prediction is thus in fact a function of user, item, and concept.
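The concept-based Resnick variant can be sketched as follows (a sketch; `concept_mean(user, concept)`, the data layout, and the neighbor cutoff `k` are illustration-only assumptions):

```python
def predict(target, item, concept, ratings, concept_mean, sim, k=2):
    """Resnick-style prediction with concept-based means:
    p(u, i) = rbar(u, concept)
              + sum_v sim(u, v) * (r(v, i) - rbar(v, concept)) / sum_v sim(u, v)
    over the k neighbors most similar to the target that rated the item.
    ratings maps item -> list of (user, rating) pairs."""
    cands = [(v, r) for (v, r) in ratings.get(item, []) if v != target]
    cands.sort(key=lambda vr: sim(target, vr[0]), reverse=True)
    num = den = 0.0
    for v, r in cands[:k]:
        w = sim(target, v)
        num += w * (r - concept_mean(v, concept))
        den += w
    return concept_mean(target, concept) + (num / den if den else 0.0)
```

Here `sim` would be the inverse normalized Euclidean distance over interest scores from the previous slide, and the concept is that of the target item.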
126. 126. Experimental Setting
Reference ontology: Amazon's book taxonomy, with ISBN as a unique identifier for each book, plus category, title, URL, and editorial reviews; 4,093 concepts and 75,646 distinct books.
Evaluation using the book ratings collected by Ziegler: a 4-week crawl of the BookCrossing community; 72,582 book ratings belonging to users with 20 or more ratings. Training data was used for spreading activation and test data for predicting ratings, with k-fold cross-validation (k = 5).
127. 127. Experimental Results
Mean absolute error, k = 200. ANOVA significance test with a 99% confidence interval; p-value < 0.01 (6.9E-11).
128. 128. Experimental Results: Recommendation Diversity
Personalization: uniqueness of different users' recommendation lists, based on inter-list distance:

d_ij(N) = 1 - q_ij(N) / N

where q_ij is the number of common items in the top-N recommendations for two given users i and j.
Surprisal: unexpectedness of a recommended item relative to its overall popularity:

I_i = log2(1 / frequency_i)

where i is a given item in a user's recommendation list and frequency_i is the number of overall positive ratings for i divided by the total number of users.
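Both diversity metrics above are one-liners in code (a minimal sketch of the two formulas as stated):

```python
import math

def inter_list_distance(list_i, list_j, n):
    """Personalization: d_ij(N) = 1 - q_ij / N, where q_ij is the number
    of items shared by the two users' top-N recommendation lists."""
    q = len(set(list_i) & set(list_j))
    return 1.0 - q / n

def surprisal(frequency):
    """Self-information of a recommended item: log2(1 / frequency),
    where frequency is the item's share of positive ratings."""
    return math.log2(1.0 / frequency)
```

Identical lists give distance 0, disjoint lists give 1; rarer items yield higher surprisal.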
129. 129. Experimental Results: Recommendation Diversity
Improved personalization and improved surprisal.
130. 130. Example Implementation: Latent Variable Context Models
A generative approach to modeling user context. Basic assumption: users' interactions involve a relatively small set of contextual states that can "explain" their behavior; this is useful for applications in which users perform informational or functional tasks.
Contexts correspond to tasks and are derived as latent factors from the observational data collected in the short-term memory. Probabilistic Latent Semantic Analysis (PLSA) can be used to automatically learn and characterize these tasks, as well as the relationships between tasks and items or users. An algorithm based on Bayesian updating discovers individual users' task-transition patterns and generates task-level user models.
Jin, Zhou, Mobasher, 2005
131. 131. Latent Variable Models
Assume the existence of a set of latent (unobserved) variables, or factors, which "explain" the underlying relationships between two sets of observed variables.
[Diagram: items p1 ... pn and user profiles u1 ... um connected through latent factors z1 ... zk.]
Advantage of PLSA: probabilistically determine the association between each latent factor and the items, or between each factor and the users. In navigational data, the latent factors correspond to distinguishable patterns usually associated with performing certain informational or functional tasks. Context = task!
132. 132. PLSA Model – Behavioral Observations
Represented as a matrix UP (m x n); each entry UP_ij corresponds to the weight of item j within user interaction i. The weight can be binary, or based on various implicit or explicit measures of interest.
        p1  p2  p3  p4  p5 ...
User 1   1   0   0   1   1 ...
User 2   0   1   1   0   1 ...
...
Note: similar models can be built using other types of observation data, e.g., <users, query terms>, <pages, keywords>, etc.
133. 133. PLSA Model
Consider each single observation (u_i, p_j) as a generative process:
1. Select a user profile u_i with Pr(u_i)
2. Select a latent factor z_k associated with u_i with Pr(z_k | u_i)
3. Given the factor z_k, pick an item p_j with Pr(p_j | z_k)
Each observation is then represented as:

Pr(u_i, p_j) = Pr(u_i) Σ_k Pr(z_k | u_i) Pr(p_j | z_k)

Using Bayes' rule, we can also obtain:

Pr(u_i, p_j) = Σ_k Pr(z_k) Pr(u_i | z_k) Pr(p_j | z_k)
134. 134. Model Fitting
Maximize the likelihood of the user observations:

L(U, P) = Σ_{u_i} Σ_{p_j} UP_ij log Pr(u_i, p_j) = Σ_{u_i} Σ_{p_j} UP_ij log Σ_{z_k} Pr(z_k) Pr(u_i | z_k) Pr(p_j | z_k)

Parameters are estimated using the Expectation-Maximization (EM) algorithm. The output of EM is Pr(u_i | z_k), Pr(p_j | z_k), and Pr(z_k); using Bayes' rule, we can also obtain Pr(z_k | u_i) and Pr(z_k | p_j).
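Once EM has produced Pr(z), Pr(u|z), and Pr(p|z), the joint and the posteriors follow directly; a sketch (lists indexed by factor/user/item, an assumed layout):

```python
def joint(u, p, pz, pu_z, pp_z):
    """Pr(u, p) = sum_k Pr(z_k) Pr(u | z_k) Pr(p | z_k), assembled from
    the EM outputs pz[k], pu_z[k][u], pp_z[k][p]."""
    return sum(pz[k] * pu_z[k][u] * pp_z[k][p] for k in range(len(pz)))

def posterior_z_given_u(u, pz, pu_z):
    """Pr(z_k | u) by Bayes' rule: Pr(z_k) Pr(u | z_k) normalized over
    all factors."""
    scores = [pz[k] * pu_z[k][u] for k in range(len(pz))]
    total = sum(scores)
    return [s / total for s in scores]
```

The posterior Pr(z | u) is exactly what the next slide uses to identify a given user's context.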
135. 135. Context-Based Patterns
Identify user segments: for each context z_k, find the top users with the highest Pr(u | z_k) as a user segment. Applications: collaborative recommendation; market segmentation.
Identify characteristic items or users w.r.t. each task:
- Characteristic items: {p_ch : Pr(p_ch | z_k) * Pr(z_k | p_ch) ≥ a}
- Characteristic user profiles: those whose Pr(u_ch | z_k) * Pr(z_k | u_ch) exceeds a threshold
Applications: task/context-based search; user or item classification.
Identify a given user's context: for each user (interaction) u, find the tasks with the highest Pr(z | u). This allows higher-level behavioral analysis (based on discovered tasks or contexts).
136. 136. Methodological Note
Two Web navigational data sets:
- CTI: about 21,000 sessions, 700 pages, 30 (manually defined) tasks, user history length 4+, 20,000+ features
- Realty data: about 5,000 sessions, 300 properties, user history length 4+, 8,000+ features
Experimental methodology: measure the accuracy of the recommendation system and compare it to a standard recommender based on a first-order Markov model, using "hit ratio" as the accuracy measure.
Hit ratio: given a test session, use the first k items to generate a top-N recommendation set. If this set contains the (k+1)-th item of the test session, we count it as a hit. HitRatio = totalHits / totalSessions (averaged over 10 runs in 10-fold cross-validation).
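The hit-ratio protocol above can be sketched directly (a sketch; `recommend(history, n)` stands in for whichever recommender is under test):

```python
def hit_ratio(test_sessions, recommend, k, n):
    """For each test session, use its first k items to get a top-N
    recommendation set; count a hit if the set contains the session's
    (k+1)-th item. Returns totalHits / totalSessions."""
    hits = 0
    for session in test_sessions:
        if len(session) <= k:
            continue  # no (k+1)-th item to predict
        if session[k] in recommend(session[:k], n):
            hits += 1
    return hits / len(test_sessions)
```

In the tutorial's setup this would be averaged over 10 runs of 10-fold cross-validation.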
137. 137. Examples of Inferred Tasks
A real user session (pages listed in the order visited):
1. Admission main page
2. Welcome information (Chinese version)
3. Admission info for international students
4. Admission requirements
5. Admission: mail request
6. Admission: orientation info
7. Admission: F1 visa and I-20 info
8. Application: status check
9. Online application: start
10. Online application: step 1
11. Online application: step 2
12. Online application: finish
13. Department main page
Top tasks for this user, by Pr(task | user): Task 10: 0.4527; Task 21: 0.3994; Task 3: 0.0489; Task 26: 0.0458.
Characteristic pages for the top two tasks: one task covers the department main page, admission requirements, admission main page, admission costs, programs, online application step 1, and admission for international students; the other covers online application start, steps 1 and 2, finish, submit, and the department main page.
138. 138. Distribution of Learned Tasks
[Stacked bar chart: for each learned task LT0-LT29, its distribution over the manually defined tasks MT0-MT13 plus "others".]
MT0-MT13 were actual tasks, commonly performed by users on the Web site, selected manually by domain experts.
139. 139. Task-Level User Tracking
A sliding window W (with |W| = 4) moves from the beginning to the end of a user session. Shown is a real user session involving Task 10 (Admissions Info) and Task 21 (Online Application); at each window position, the top 2 tasks and the corresponding values of Pr(task | W) are displayed (e.g., T10: 0.2 / T21: 0.8 at one position, T10: 0.8 / T21: 0.2 at another), so the dominant task shifts over the course of the session.
140. 140. Task/Context Prediction
[Chart: hit ratio vs. number of recommendations, comparing task-level and item-level prediction accuracy.]
141. 141. Contextual Modeling: The Maximum Entropy Framework
A statistical model widely used in language learning and text mining (Rosenfeld 1994; Berger et al. 1996). It estimates a probability distribution from the data:
- Labeled training data is used to derive a set of constraints for the model that characterize class-specific expectations of the distribution
- Constraints are represented as expected values of "features," which are real-valued functions of examples
- Goal: find the probability distribution that satisfies all the constraints imposed on the data while maintaining maximum entropy
Advantage: integration of multiple sources of knowledge or multiple constraints without subjective assumptions or intervention. In this case we use maximum entropy to integrate learned contextual information into the preference modeling process.
142. 142. Using the Maximum Entropy Model
The basic maximum entropy framework: first, identify a set of feature functions that will be useful for the desired task (e.g., prediction or classification); then, for each feature, measure its expected value over the training data and take this expected value to be a constraint on the model distribution.
In our model, we define two sets of features:
1. Features based on item-level transitions in user interactions
2. Features based on task-level transitions in user interactions
The maximum entropy framework uses both sets of features to compute Pr(p_d | H(u)).
143. 143. Experimental Results
[Charts: hit ratio vs. top-N recommendations on the CTI data (maximum entropy vs. first-order Markov model) and on the Realty data (maximum entropy, first-order Markov, item-based CF, user-based CF).]
144. 144. Example Implementation: Inferring Latent Contexts from Social Annotation
Context-aware music recommendation based on latent topic sequential patterns. In the domain of music recommendation:
- Context is usually not fully observable
- Contextual information is dynamic and should be inferred from users' interactions with the system, such as liking/disliking playing songs, adding/skipping songs, or creating different stations by selecting different track seeds or artists
Different applications: song recommendation, playlist generation, playlist recommendation.
Hariri, Mobasher, Burke, RecSys 2012
145. 145. Song Recommendation
Context is reflected in the sequence of songs liked/disliked or played by the user in her current interaction with the system.
146. 146. Playlist Generation
1. The user selects an initial sequence of songs for the playlist.
2. The system infers the user's context and recommends a set of songs.
3. The user adds one of the recommendations (or a new song outside the recommendation set) to the playlist.
4. The system updates its knowledge about the user's preferences before the next interaction.
147. 147. Topic Modeling for Song Context Representation
An LDA topic modeling approach is used to map a user's interaction sequence to a sequence of latent topics that capture more general trends in the user's interests. The latent topics are generated from the most frequent tags associated with songs, obtained from social tagging Web sites; tags may characterize song features, the user's situation, mood, etc. Songs are treated as documents and tags as words. After fitting the topic model for K topics, the probability distribution over topics can be inferred for any given song; for each song, the set of dominant topics with probabilities above a specified threshold is selected.
148. 148. Top Most Frequent Tags for a Sample of Topics
149. 149. Music Era Visualization by Topic-Based Co-Occurrence Analysis
150. 150. Music Genre Visualization by Topic-Based Co-Occurrence Analysis
151. 151. [Architecture diagram: a topic modeling module maps song sequences, with their top tags (drawn from users' past preferences and human-compiled playlists), to topic-based sequences; a sequential pattern miner produces topic-based sequential patterns; a topic prediction module applies them to the user's active session to predict topics; and the context-aware recommender combines the predicted topics with neighborhood information.]
152. 152. Sequential Pattern Mining and Topic Prediction
- Used to capture typical changes in contextual states over time
- Using a training set of human-compiled playlists, sequential patterns are mined over the set of latent topics; each pattern represents a frequent sequence of transitions between topics/contexts
- Given a user's current interaction as the sequence of the last w songs, the discovered patterns are used to predict the context of the next song
- The predicted context is then used to post-filter and re-rank the recommendations produced from the whole history of the user's preferences
153. 153. Evaluation
Dataset: 28,963 user-contributed playlists from the Art of the Mix website, January 2003, comprising 218,261 distinct songs by 48,169 distinct artists. Top tags were retrieved from the website for about 71,600 songs in our database.
The last w = 7 songs were selected as the user's active session; the last song was removed, and the dominant topics associated with that song were used as the target set.
154. 154. Topic Prediction Precision
155. 155. Topic Prediction Recall
156. 156. Song Recommendation Performance
157. 157. Example Implementation: Query-Driven Context-Aware Recommendation
In some applications, context is represented as a subset of an item feature space. Contextual information is not pre-specified; it can be acquired explicitly, by directly asking users, or implicitly, from user behavior or the user's environment.
Example scenario: a music recommender system in which the user can specify a current interest in a specific genre by providing that information in the form of a query (the context). Both the queried context and the established user profile are used to recommend songs that best match the user's short-term situational needs as well as his or her long-term interests.
Hariri, Mobasher, Burke, 2013
158. 158. Example Playlist
Artist           Song                  Popular Tags
Luce             Good Day              rock, 2000s, happy, playful, feel good music, music from the OC
Ben Howard       Keep Your Head Up     acoustic, British, folk, indie, singer-songwriter, 2000s, alternative rock
Nada Surf        Always Love           alternative, indie rock, 2000s, happy
Vega 4           Life Is Beautiful     alternative, 2000s, indie rock, Grey's Anatomy, happy
Imagine Dragons  On Top of the World   alternative, indie rock, 10s, happy, cheerful, chill, easy listening
State Radio      Right Me Up           reggae-rock, chill, indie, happiness, fun, mellow, summer
If the user specifies "80s" as the context, ideal recommendations should match the user's previous interests (happy and rock music) as well as the queried context (80s music).
  159. 159. A Probabilistic Model
A unified probabilistic model for contextual recommendation
The model integrates user profiles, item descriptions, and contextual information.
It assumes that the contextual information can be represented using the same feature space underlying the representation of items
Our model is an extension of Latent Dirichlet Allocation (LDA). Each user profile is modeled as a mixture of the latent topics.
These topics capture the feature associations as well as the item co-occurrence relations in user profiles.
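The "user profile as a topic mixture" idea can be sketched with a small, numpy-only generative process in the LDA style. This is a hedged illustration, not the authors' model: the vocabulary, topic-feature distributions `phi`, and hyperparameter `alpha` are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["rock", "indie", "happy", "metal", "industrial"]
K = 2                                        # number of latent topics
phi = np.array([[0.4, 0.3, 0.3, 0.0, 0.0],   # p(feature | topic 0)
                [0.0, 0.1, 0.0, 0.5, 0.4]])  # p(feature | topic 1)

def generate_profile(n_features, alpha=0.5):
    """Generate one user profile under an LDA-style process:
    draw a per-user topic mixture, then draw each observed
    profile feature from its assigned topic."""
    theta = rng.dirichlet([alpha] * K)       # per-user topic mixture
    feats = []
    for _ in range(n_features):
        z = rng.choice(K, p=theta)           # topic assignment
        feats.append(vocab[rng.choice(len(vocab), p=phi[z])])
    return theta, feats

theta, profile = generate_profile(6)
```

The fitted model runs this process in reverse: given observed profiles, it infers each user's `theta` and the shared topic-feature distributions.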
  160. 160. Graphical representation of the context-aware recommendation model
  161. 161. Context-Aware Recommendation Model
The model defines a generative process for user profiles. The process for generating user p is as follows:
  162. 162. Inference
As with similar LDA models, exact inference is not possible in this model
We use variational message passing (VMP) for approximate inference and parameter estimation
VMP carries out variational inference using local computations and message passing on the graphical model.
We implemented our model using the Infer.NET library, which provides implementations of different inference algorithms, including VMP
  163. 163. Contextual Recommendation Algorithm
The unified contextual modeling framework enables us to systematically optimize the recommendations for the given context
Given a user's profile P and context Q, the model computes the recommendation score for each item I as p(I | P, Q) and ranks items based on these scores.
The high dimensionality of this feature space can be a problem for many of the popular context-aware recommendation methods
One advantage of our recommendation method is that the associations between features are captured by the latent factors and are therefore an integral part of the trained model.
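The ranking step can be sketched as follows: score each item I by p(I | P, Q) ≈ Σ_z p(z | P, Q) · p(I | z). Combining the profile and query topic distributions by element-wise product and renormalization is an assumption made for illustration, not necessarily the exact inference the authors use; the toy distributions below are invented.

```python
import numpy as np

def rank_items(theta_user, theta_query, item_given_topic):
    """Rank item indices by their score under the combined topic mixture.

    theta_user, theta_query: (K,) topic distributions for profile P and query Q
    item_given_topic:        (K, n_items) matrix of p(item | topic)
    """
    mix = theta_user * theta_query
    mix = mix / mix.sum()            # renormalized p(z | P, Q)
    scores = mix @ item_given_topic  # (K,) @ (K, n_items) -> (n_items,)
    return np.argsort(-scores)       # best-scoring item first

theta_user = np.array([0.9, 0.1])    # profile leans heavily toward topic 0
theta_query = np.array([0.5, 0.5])   # query is topic-neutral
phi = np.array([[0.8, 0.2],          # p(item | topic 0)
                [0.1, 0.9]])         # p(item | topic 1)
ranking = rank_items(theta_user, theta_query, phi)
```

With a neutral query, the profile dominates and item 0 (favored by topic 0) is ranked first.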
  164. 164. Datasets
Art of the Mix dataset containing more than 28,963 user-compiled playlists.
The dataset contains 218,261 songs.
Top tags were retrieved for 73,045 songs in the dataset.
8,769 playlists, each containing more than 10 songs that have tags. The selected playlists contain 86,262 unique songs.
CiteULike dataset
This dataset consists of the set of articles posted by each user, and the tags that the user used to post them
Data pruned to contain only users with more than 4 postings and articles with a frequency of at least 4
The number of users and articles after pruning: 11,533 and 67,335, respectively
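The CiteULike pruning rule (keep users with more than 4 postings and articles posted at least 4 times) can be sketched as below. Applying the filter repeatedly until the data stops shrinking is an assumption about how the pruning was done, since removing one side can invalidate the other.

```python
from collections import Counter

def prune(postings, min_user=5, min_article=4):
    """Iteratively filter (user, article) posting pairs until stable:
    users need >= min_user postings, articles >= min_article postings."""
    while True:
        users = Counter(u for u, _ in postings)
        arts = Counter(a for _, a in postings)
        kept = [(u, a) for u, a in postings
                if users[u] >= min_user and arts[a] >= min_article]
        if len(kept) == len(postings):   # fixed point reached
            return kept
        postings = kept
```

For example, with four users each posting the same five articles plus one extra user with a single posting, the extra user's pair is dropped and the rest survive.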
  165. 165. Sample Topics for the Playlists Dataset

Topic#1     | Topic#2     | Topic#3     | Topic#4     | Topic#5     | Topic#6
Electronic  | Alternative | Soul        | Rock        | Metal       | Rock
Electronica | Indie       | Rnb         | Comedy      | Rock        | Britpop
Alternative | Rock        | Pop         | Alternative | Industrial  | British
Experimental| Shoegaze    | Dance       | Funny       | Hardcore    | Indie
Ambient     | Upbeat      | Electronica | Punk        | Alternative | Jazz
Indie       | Nostalgia   | Rap         | Pop         | Metalcore   | Swing
Rock        | Amazing     | Funk        | Indie       | Dance       | Alternative
Chillout    | Pop         | Chillout    | Silly       | German      | Oldies
Psychedelic | Punk        | Jazz        | Quirky      | Pop         | Beatles
  166. 166. Experiments
5-fold cross-validation experiment
In each run, 80% of the songs in each playlist were selected for training the model, and the remaining 20% were selected for testing and removed from the users' profiles.
Given the users' queries, competing algorithms provide a ranked list of songs.
Need user profiles, song descriptions, and user queries
Simulating the queries
  167. 167. Hit Ratio for Different Numbers of Recommendations (CiteULike Dataset)
  168. 168. Hit Ratio for Different Numbers of Recommendations (Playlists Dataset)
  169. 169. Conclusions
  170. 170. Conclusions
Incorporating context in recommendation generation can improve the effectiveness of recommender systems
What does it take?
In representational models: careful selection of relevant contextual attributes for the specific domain (the classic knowledge engineering task) & effective (but ad hoc) ways of dealing with the qualification problem
In interactional models: effective methods for extraction of contextual cues from user behavior & ways of coping with domains that don't lend themselves to user interactions
Work on interactional models suggests:
Observable behavior is "conditioned" on the underlying context
The context can be inferred (and predicted) effectively in certain kinds of applications
The integration of semantic knowledge and user activity can be particularly effective in contextual user modeling
  171. 171. Still many unanswered questions…?