Active Learning in Recommender Systems
Presentation given by Neil Rubens at the Centre for Database and Information Systems (Prof. Ricci), Free University of Bozen-Bolzano

For more information see http://activeintelligence.org/research/al-rs/


Uploaded as an Apple Keynote presentation. Usage rights: © All Rights Reserved.

Speaker Notes

  • Thank you to Prof. Ricci for his kind invitation. Today I would like to convince you that active learning is something of value, and that it is particularly well suited for recommender systems.
  • It seems that data mining may offer some relief, so why do we need to care about obtaining data of high quality?
  • Death Of A Pushy Salesman, Business Week, 2006. http://www.businessweek.com/magazine/content/06_27/b3991084.htm
  • Often the system tries to sell you something without trying to find out what you like. It is a rather greedy approach that tries to optimize the immediate payoff. Some people may get turned off by bad recommendations and never come back to the system. Well, unless I am into cross-dressing, these items are not of much use to me; although the RS may have a 50% success rate with the above strategy.
  • The goal of recommender systems is to personalize recommendations, so it really would not hurt to spend some time trying to find out what your interests are. It may not pay off in the short term, but it may pay off quite well in the long term.
  • Luckily, recommender systems are starting to try to learn more about their users.
  • We consider an exaggerated example, in which we can ask the user to watch a movie and rate it.
  • Let me start by giving an example of something that is not useful.
  • This strategy may be efficient in the short term, but not so much in the long term.

Active Learning in Recommender Systems: Presentation Transcript

  • Active Learning in Recommender Systems. Neil Rubens, Active Intelligence Lab, University of Electro-Communications.
  • N. Rubens, D. Kaplan, M. Sugiyama. Active Learning in Recommender Systems. In: Recommender Systems Handbook (eds. P.B. Kantor, F. Ricci, L. Rokach, B. Shapira). Springer, 2011. http://activeintelligence.org/research/al-rs/
  • Passive Intelligence vs. Active Intelligence. Passive: data is given; the model is given; the task is to learn the model's parameters. Active: the premise is that the given information is insufficient; active data acquisition; self-adaptation/reconfiguration.
  • Why Need Useful Data? "If you put into the machine wrong figures, will the right answers come out? I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question." (Charles Babbage) Garbage In, Garbage Out (the GIGO principle, George Fuechsel).
  • What about Data Mining? We can sniff through the data and try to find something of value. Assumptions: a lot of data is available; some of the data is useful. http://www.qualitydigest.com/sept06/articles/04_article.shtml
  • Obtaining Data Could Be "Costly". Medicine: diagnosis (pain, time, $), drug discovery ($$$, time). User interaction: effort, time. Expertise elicitation: $, time. Active Learning (AL) goal: estimate the "usefulness" of data before it is acquired.
  • Limitation of Traditional Recommender Systems: Exploitation. RS often just tries to tell you what you want!
  • Exploration: find out what the user's interests are. http://www.flickr.com/photos/luisorlando/2688548978
  • What is Useful depends on the Objective
  • Settings
  • Not Useful: limited information.
  • User Satisfaction: ratings (positive vs. negative). Drawbacks: the user sees little variety and may get bored; the system gains only limited knowledge.
  • Coverage. Drawback: the user is exposed to items of no interest.
  • Prediction Accuracy [Settles, 2009]. Figure 2: an illustrative example of pool-based active learning. (a) A toy data set of 400 instances, evenly sampled from two class Gaussians and represented as points in a 2D feature space. (b) A logistic regression model trained with 30 labeled instances randomly drawn from the problem domain (random sampling); the line represents the decision boundary of the classifier (70% accuracy). (c) A logistic regression model trained with 30 actively queried instances using uncertainty sampling (90%). Figure 1 illustrates the pool-based active learning cycle.
  • Usefulness/Objectives (objectives may overlap): allow the user to explore his/her interests; prediction accuracy (for a user or item); maximize profit; maximize the number of visits / time spent; minimize acquisition cost (# of ratings, implicit/explicit); maximize system utility; minimize uncertainty; make it fun for the user; etc.
  • Doesn't Have to Be Bothersome
  • Active/Passive Learning. Passive learning: training data is simply given to the supervised learning algorithm, which outputs an approximated function. Active learning: the learner additionally requests training data from the user before learning the approximated function.
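The request loop just described can be sketched generically. This is a minimal illustration, not code from the talk; the helper names (`ask_user`, `retrain`, `usefulness`) are hypothetical placeholders:

```python
import random

def active_learning_loop(pool, ask_user, retrain, usefulness, rounds):
    """Active-learning cycle: score unrated items, query the user for the
    seemingly most useful one, retrain, repeat.  Passive learning would
    skip the loop and train once on whatever data happens to exist."""
    train = []                                   # (item, rating) pairs so far
    model = retrain(train)
    for _ in range(rounds):
        if not pool:
            break
        best = max(pool, key=lambda item: usefulness(model, item))
        pool.remove(best)
        train.append((best, ask_user(best)))     # the "request" arrow
        model = retrain(train)
    return model, train

# Toy usage: the "user" likes even-numbered items; the usefulness
# criterion is a random placeholder (later slides refine it).
rng = random.Random(0)
model, train = active_learning_loop(
    pool=list(range(10)),
    ask_user=lambda i: 1 if i % 2 == 0 else 0,
    retrain=lambda t: dict(t),                   # "model" = lookup table
    usefulness=lambda m, i: rng.random(),
    rounds=3)
```

The point of the sketch is the feedback arrow: the current model drives which rating is requested next, which is exactly what distinguishes the active from the passive diagram.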
  • AL Categories. Item-based AL: analyze the items and select items that seem useful. Model-based AL: analyze the model and select items that seem useful.
  • Item-based AL: the 3R Properties. Represented: is the item already represented by the existing training set? (e.g., (b) is already represented). Representative: is it representative of other items? (e.g., (a) is not). Results: does it result in achieving the objective? (e.g., (d) -> max coverage). [Rubens & Kaplan, 2010]
  • Item Properties. Popular [Rashid 2002]: rated by many users. High variance in ratings [Rashid 2002]: items that people either like or hate. Best/Worst [Leino & Raiha 2007]: ask the user which items s/he likes most/least. Influential [Rubens & Sugiyama 2007]: items on which the ratings of many other items depend (representative + not represented).
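The first two heuristics (popularity and rating variance) are straightforward to compute from a log of (user, item, rating) triples; a minimal sketch (the function name is ours):

```python
from collections import defaultdict
from statistics import pvariance

def item_stats(ratings):
    """Per-item popularity (# of ratings) and rating variance, the two
    item-selection heuristics attributed above to [Rashid 2002]."""
    by_item = defaultdict(list)
    for _user, item, r in ratings:
        by_item[item].append(r)
    return {item: {"popularity": len(rs),
                   "variance": pvariance(rs) if len(rs) > 1 else 0.0}
            for item, rs in by_item.items()}

ratings = [(1, "a", 5), (2, "a", 1), (3, "a", 5),   # controversial item
           (1, "b", 3), (2, "b", 3)]                # consensual item
stats = item_stats(ratings)
```

A high-variance item like "a" is the kind people either like or hate, so its rating tells the system more about the user than a consensual item like "b".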
  • Model-based AL: initial model; improve the margin; improve the orientation.
  • Model-error AL. Notation: g is the optimal function (in the solution space); f is the learned function; the fi's are functions learned from slightly different training sets. The expected generalization error decomposes as EG = B + V + C, where B = (E f(x) − g(x))^2 is the bias (hard to estimate, but assumed to vanish asymptotically), V = E (f(x) − E f(x))^2 is the variance (estimate and minimize), and C is the model error (constant, and is ignored).
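Of the three terms, only the variance V is actually estimated. One simple way to do that, a sketch under our own assumptions (using bootstrap resamples of the training set as the "slightly different training sets" the fi's are learned from), is:

```python
import numpy as np

def variance_term(X_tr, y_tr, X_te, fit, predict, n_models=20, seed=0):
    """Estimate V = E[(f(x) - E f(x))^2] by training models on bootstrap
    resamples and measuring how much their predictions disagree.
    (B needs the unknown g and is assumed to vanish; C is constant.)"""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X_tr), len(X_tr))   # bootstrap resample
        preds.append(predict(fit(X_tr[idx], y_tr[idx]), X_te))
    return float(np.mean(np.var(np.array(preds), axis=0)))

# Toy linear model: least-squares fit on (feature, bias) rows.
fit = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
predict = lambda w, X: X @ w
rng = np.random.default_rng(1)
X = np.hstack([rng.normal(size=(30, 1)), np.ones((30, 1))])
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=30)
v = variance_term(X[:20], y[:20], X[20:], fit, predict)
```

An AL criterion of this family would then prefer the training input whose addition most reduces this estimated variance over the points of interest.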
  • Parameter-Variance AL
  • Model Complexity: as the number of training points increases, more complex models tend to fit the data better.
  • Model Selection: (a) under-fit, (b) over-fit, (c) appropriate fit. Figure 8: dependence between model complexity and accuracy.
  • Model-Points Dependency: (a) under-fit, (b) over-fit, (c) appropriate fit. Training input points that are good for learning one model are not necessarily good for another. Objective: minimize G(X^(Train)) over X^(Train).
  • Black Box Settings: we may not have information about, or an understanding of, the model or the training points. http://www.sps.ele.tue.nl/members/b.vries/research/research.html
  • Black Box Settings [Evgeniou et al., 2000; Schuurmans, 1997]. The system is too complex and is constantly changing, e.g., the RS at Amazon or Netflix: 10,000's of lines of code, continuously changed by multiple teams. References: T. Evgeniou, M. Pontil, and T. Poggio. Regularization networks and support vector machines. Advances in Computational Mathematics, 13(1):1–50, 2000. D. Schuurmans. A new metric-based approach to model selection. In Proceedings of the Fourteenth National Conference on Artificial Intelligence (AAAI), pages 552–558, 1997.
  • "Information is a difference which makes a difference." Gregory Bateson (anthropologist). Proposed approach: select training points based on their expected influence on the output estimates (the only value accessible in black-box settings). (a) Adding a training point causes many output estimates to change. (b) Adding a training point causes few output estimates to change.
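With a simple least-squares learner, this influence idea can be sketched directly: for each candidate point, retrain with plausible labels around the current estimate and sum the squared change of the output estimates over the pool. This is a naive sketch (full retrain per candidate, names ours); the actual method in the slides is more refined:

```python
import numpy as np

def influence_scores(X_tr, y_tr, X_pool, deltas=(-1.0, 1.0)):
    """Expected change of the pool output estimates caused by adding each
    candidate pool point, averaged over plausible labels y_hat + delta."""
    fit = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
    base = X_pool @ fit(X_tr, y_tr)               # current output estimates
    scores = []
    for i, x in enumerate(X_pool):
        X2 = np.vstack([X_tr, x])
        change = 0.0
        for d in deltas:                          # labels near the estimate
            w2 = fit(X2, np.append(y_tr, base[i] + d))
            change += np.sum((X_pool @ w2 - base) ** 2)
        scores.append(change / len(deltas))
    return np.array(scores)

# Rows are (feature, bias).  A far-away pool point shifts more output
# estimates than one sitting inside the training data.
X_tr = np.array([[1.0, 1.0], [2.0, 1.0], [3.0, 1.0]])
y_tr = np.array([1.0, 2.0, 3.0])
X_pool = np.array([[2.0, 1.0], [10.0, 1.0]])
scores = influence_scores(X_tr, y_tr, X_pool)
```

Only model outputs are touched, which is what makes the criterion usable in the black-box settings of the previous slides.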
  • Validity of Assumptions (is a change in the output estimates good?) [Empirical]. Changes in the estimate of an output value when a new training point is added: (a) the estimate of the true output value deteriorates: relatively infrequent (16%, and the expected deterioration is small); (b) the estimate of the true output value improves: the most frequent case (84%); (c) the estimate of the true output value overshoots.
  • Criterion Accuracy: high values of the criterion correspond to high improvements in accuracy (plot of ΔG against (y_t − y_{t+1})^2).
  • Interpretation
  • Representative
  • Not Represented
  • Evaluation: mean squared error vs. training set size (2–10) for the Proposed, A-optimal, D-optimal, E-optimal, Transductive, Random, and Optimal selection strategies. Limitations: the system needs to be robust with respect to outliers; incremental re-training needs to be fast.