Active Learning in Recommender Systems

Presentation given by Neil Rubens at the Centre for Database and Information Systems (Prof. Ricci), Free University of Bozen-Bolzano

For more information see http://activeintelligence.org/research/al-rs/

Published in: Education, Technology

  1. Active Learning in Recommender Systems. Neil Rubens, Active Intelligence Lab, University of Electro-Communications. Image: http://4.bp.blogspot.com/_qFju91K89HM/SxRpABd1DTI/AAAAAAAABjw/6LaSJfjfk-I/s1600/Unexpected_Guests.jpg
  2. N. Rubens, D. Kaplan, M. Sugiyama. Active Learning in Recommender Systems. In: Recommender Systems Handbook (eds. P. B. Kantor, F. Ricci, L. Rokach, B. Shapira). Springer, 2011. http://activeintelligence.org/research/al-rs/
  3. Passive Intelligence: data is given; model is given; task: learn model's parameters. Active Intelligence: premise: given info is insufficient; active data acquisition; self adaptation/reconfiguration.
  4. Why Do We Need Useful Data? “If you put into the machine wrong figures, will the right answers come out? I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.” (Charles Babbage). Garbage In, Garbage Out (GIGO Principle) (George Fuechsel).
  5. What about Data Mining? We can sniff through the data and try to find something of value. Assumptions: a lot of data is available; some of the data is useful. Image: http://www.qualitydigest.com/sept06/articles/04_article.shtml
  6. Obtaining Data could be “COSTLY”. Medicine: diagnosis: pain, time, $; drug discovery: $$$, time. User Interaction: effort, time. Expertise Elicitation: $, time. Active Learning (AL) Goal: Estimate ‘Usefulness’ of the data before data is acquired.
  7. Limitation of Traditional Recommender Systems: Exploitation. RS often just tries to tell you what you want!!! Image: http://misspinkslip.files.wordpress.com/2009/07/used-car-salesman.jpg
  8. Exploration: Find out what your interests are. Image: http://www.flickr.com/photos/luisorlando/2688548978
  10. What is Useful depends on the Objective
  11. Settings
  12. Not Useful: limited information. (Figure axes: X1, X2)
  13. User Satisfaction. Ratings: positive, negative. Drawback: user: not much variety, may get bored; system: limited knowledge. (Figure axes: X1, X2)
  14. Coverage. Drawback: user: exposed to items of no interest. (Figure axes: X1, X2)
  15. Prediction Accuracy [Settles, 2009]. Panels: (a) Actual; (b) Model, Prediction Accuracy (Random Sampling); (c) Model, Prediction Accuracy (Active Learning). Figure 2: An illustrative example of pool-based active learning. (a) A toy data set of 400 instances, evenly sampled from two class Gaussians. The instances are represented as points in a 2D feature space. (b) A logistic regression model trained with 30 labeled instances randomly drawn from the problem domain. The line represents the decision boundary of the classifier (70% accuracy). (c) A logistic regression model trained with 30 actively queried instances using uncertainty sampling (90%). Drawback: user: exposed to items of no interest.
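
The pool-based setting in the [Settles, 2009] figure can be sketched roughly as follows. This is a minimal illustration, not the code behind the figure: the pool generator `make_pool`, the class centers, the learning rate, and the hand-written logistic regression are all assumptions made here to keep the example dependency-free. The query rule is plain uncertainty sampling: label next the pool point whose predicted probability is closest to 0.5.

```python
import math
import random


def sigmoid(z):
    # Clamp the score so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logreg(X, y, epochs=300, lr=0.1):
    """Fit w0 + w1*x1 + w2*x2 by batch gradient descent (illustrative only)."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for (x1, x2), t in zip(X, y):
            err = sigmoid(w[0] + w[1] * x1 + w[2] * x2) - t
            grad[0] += err
            grad[1] += err * x1
            grad[2] += err * x2
        w = [wi - lr * gi / len(X) for wi, gi in zip(w, grad)]
    return w


def predict_proba(w, x):
    return sigmoid(w[0] + w[1] * x[0] + w[2] * x[1])


def most_uncertain(w, pool, labeled_idx):
    """Index of the unlabeled pool point whose P(class 1) is closest to 0.5."""
    candidates = [i for i in range(len(pool)) if i not in labeled_idx]
    return min(candidates, key=lambda i: abs(predict_proba(w, pool[i]) - 0.5))


def make_pool(n_per_class=200, seed=0):
    """Hypothetical toy pool: two 2D Gaussians, one per class."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in ((0, -2.0), (1, 2.0)):
        for _ in range(n_per_class):
            X.append((rng.gauss(center, 1.0), rng.gauss(0.0, 1.0)))
            y.append(label)
    return X, y


def active_learn(X, y, budget=30, seed=0):
    """Query labels one at a time until `budget` points are labeled."""
    rng = random.Random(seed)
    # Start from one labeled point per class so the first fit is defined.
    labeled = {rng.choice([i for i, t in enumerate(y) if t == 0]),
               rng.choice([i for i, t in enumerate(y) if t == 1])}
    while len(labeled) < budget:
        idx = sorted(labeled)
        w = fit_logreg([X[i] for i in idx], [y[i] for i in idx])
        labeled.add(most_uncertain(w, X, labeled))
    idx = sorted(labeled)
    return fit_logreg([X[i] for i in idx], [y[i] for i in idx]), labeled


def accuracy(w, X, y):
    hits = sum((predict_proba(w, x) >= 0.5) == (t == 1) for x, t in zip(X, y))
    return hits / len(X)
```

Calling `active_learn(*make_pool())` and then `accuracy` on the full pool gives one run to compare against a model fitted on 30 randomly chosen points; the accuracies quoted on the slide come from the cited figure, not from this sketch.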
  16. Usefulness / Objectives: • allow user to explore his/her interests • prediction accuracy (for user or item) • maximize profit • maximize number of visits / time spent • minimize acquisition cost (# of ratings, implicit/explicit) • max system utility • minimize uncertainty • make it fun for the user • etc. Objectives may overlap.
  17. Doesn’t have to be Bothersome
  18. Active/Passive Learning. Passive Learning: training data -> supervised learning -> approximated function. Active Learning: request -> user -> training data -> supervised learning -> approximated function.
  19. AL Categories. Item-based AL: analyze items and select items that seem useful. Model-based AL: analyze model and select items that seem useful.
  20. Item-based AL: 3R Properties. Represented by the existing training set? e.g. (b) is already represented. Representative of others? e.g. (a) is not. Results in achieving objective? e.g. (d) -> max coverage. [Rubens & Kaplan, 2010]
  21. Item Properties: • Popular [Rashid 2002] (rated by many users) • High Variance in ratings [Rashid 2002] (item that people either like or hate) • Best/Worst [Leino & Raiha 2007] (ask user which items s/he likes most/least) • Influential [Rubens & Sugiyama 2007] (items on which ratings of many other items depend; Representative + Not Represented)
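
As a rough illustration of the first two item properties (popular, high variance), the sketch below scores items from a small ratings table and ranks the ones a user has not rated yet. The ratings data, the function names, and the tie-breaking by item name are hypothetical choices made for this example; they are not taken from the cited papers.

```python
from statistics import pvariance

# Hypothetical toy data: user -> {item: rating on a 1-5 scale}.
ratings = {
    "u1": {"A": 5, "B": 1, "C": 3},
    "u2": {"A": 4, "B": 5},
    "u3": {"A": 5, "B": 1, "D": 2},
    "u4": {"A": 4, "B": 5, "C": 3},
}


def ratings_by_item(ratings):
    """Regroup the table as item -> list of ratings it received."""
    by_item = {}
    for user_ratings in ratings.values():
        for item, r in user_ratings.items():
            by_item.setdefault(item, []).append(r)
    return by_item


def popularity(by_item):
    """Popular: number of users who rated the item."""
    return {item: len(rs) for item, rs in by_item.items()}


def rating_variance(by_item):
    """High variance: items that people either like or hate."""
    return {item: pvariance(rs) for item, rs in by_item.items()}


def rank_items(scores, already_rated=()):
    """Items to ask about next, best score first, ties broken by name."""
    candidates = [i for i in scores if i not in already_rated]
    return sorted(candidates, key=lambda i: (-scores[i], i))
```

Here item "B" is as popular as "A" but splits opinion (ratings 1 and 5), so it comes first under the variance score, while the popularity score alone cannot tell the two apart.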
  22. Model-based AL: Initial; Improve Margin; Improve Orientation. (Figure axis: X1)
  23. Model-error AL. $g$: optimal function (in the solution space); $\hat{f}$: learned function; $\hat{f}_i$'s: learned functions from a slightly different training set. $EG = B + V + C$, where $B = (E\hat{f}(x) - g(x))^2$, $V = (\hat{f}(x) - E\hat{f}(x))^2$, $C = (g(x) - f(x))^2$. Model Error $C$: constant and is ignored. Bias $B$: hard to estimate, but is assumed to vanish (asymptotically). Variance $V$: estimate and minimize.
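
One way to make the variance term concrete is to refit the model on slightly different training sets and measure how far the resulting predictions spread around their mean. The sketch below does this with leave-one-out refits of a straight-line fit; both the leave-one-out scheme and the linear model are assumptions chosen for brevity, not the estimator used in the talk.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def prediction_variance(xs, ys, x_eval):
    """Average spread of leave-one-out predictions around their mean.

    Each refit drops one training point, giving the 'slightly different
    training sets'. Needs at least three points with distinct x values.
    """
    models = [fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
              for i in range(len(xs))]
    total = 0.0
    for x in x_eval:
        preds = [a + b * x for a, b in models]
        mean = sum(preds) / len(preds)
        total += sum((p - mean) ** 2 for p in preds) / len(preds)
    return total / len(x_eval)
```

With noisy data the estimate is typically larger far from the training inputs than between them, which is the kind of signal a variance-minimizing criterion would use when deciding where to ask for the next rating.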
  24. Parameter-Variance AL
  25. Model Complexity: as the number of training points increases, more complex models tend to fit data better.
  26. Model Selection. (a) under-fit (b) over-fit (c) appropriate fit. Figure 8: Dependence between model complexity and accuracy.
  27. Model-Points Dependency. (a) under-fit (b) over-fit (c) appropriate fit. Figure 8: Dependence between model complexity and accuracy. Training input points that are good for learning one model are not necessarily good for the other. $\min_{X^{(Train)}} G(X^{(Train)})$.
  28. Black Box Settings. May not have information/understanding about: model. Image: http://www.sps.ele.tue.nl/members/b.vries/research/research.html
  29. Black Box Settings [Evgeniou et al., 2000, Schuurmans, 1997]. Diagram: input $x$ -> $f(x)$ -> output $y_x$; example model $y_x = \beta \cdot x$. The system is too complex (and is constantly changing), e.g. RS at Amazon, NetFlix: 10,000’s of lines of code, continuously changed by multiple teams. References: T. Evgeniou, M. Pontil, and T. Poggio. Regularization networks and support vector machines. Advances in Computational Mathematics, 13(1):1–50, 2000. D. Schuurmans. A new metric-based approach to model selection. In Proceedings of the Fourteenth National Conference on Artificial Intelligence (AAAI), pages 552–558, 1997.
  30. “Information is a difference which makes a difference” (Gregory Bateson, anthropologist). Proposed Approach: select training points based on their expected influence on the output estimates (the only value accessible in Black-Box Settings). Figure: output estimates $y_t$ and $y_{t+1}$ plotted against input index. a) Adding training point causes many output estimates to change. b) Adding training point causes few output estimates to change.
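
The selection rule on this slide can be sketched as follows, treating the learner purely as a black box that maps training data to a prediction function. Everything concrete here is an assumption for illustration: the 1-nearest-neighbour stand-in learner `fit_1nn`, the use of squared change summed over a set of reference inputs, and the averaging over a list of possible labels (the true label of a candidate is unknown before it is acquired).

```python
def fit_1nn(xs, ys):
    """Stand-in black box: 1-nearest-neighbour regressor on scalar inputs."""
    def predict(x):
        nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
        return ys[nearest]
    return predict


def output_change(fit, train_x, train_y, candidate, possible_labels, reference_x):
    """Expected squared change in output estimates if `candidate` is added.

    Only the model's outputs are used, never its internals.
    """
    base = fit(train_x, train_y)
    before = [base(x) for x in reference_x]
    total = 0.0
    for label in possible_labels:
        model = fit(train_x + [candidate], train_y + [label])
        after = [model(x) for x in reference_x]
        total += sum((a - b) ** 2 for a, b in zip(after, before))
    return total / len(possible_labels)


def select_point(fit, train_x, train_y, candidates, possible_labels, reference_x):
    """Pick the candidate whose addition is expected to change outputs most."""
    return max(candidates,
               key=lambda c: output_change(fit, train_x, train_y, c,
                                           possible_labels, reference_x))
```

With training inputs at 0.0 and 1.0 and reference inputs spread out to 11.0, a candidate at 10.0 moves several output estimates (case a on the slide), while a candidate at 0.5 moves none of them (case b), so `select_point` prefers the former.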
  31. Validity of Assumptions (is change in the output estimates good?). Changes in the estimates of the output values with regard to a new training point [Empirical]: a) the estimate of the true output value deteriorates: relatively infrequent (16%, expected deterioration is small); b) the estimate of the true output value improves: most frequent case (84%); c) the estimate of the true output value overshoots. Figure: $P(y_{t+1})$ plotted over $y_{t+1}$.
  32. Criterion Accuracy. High values of criterion correspond to high improvements in accuracy. Figure: $\Delta G$ plotted against $\|y_t - y_{t+1}\|^2$.
  33. Interpretation (equations not recoverable)
  34. Representative (equations not recoverable)
  35. Not Represented (equations not recoverable)
  36. Evaluation. Figure: Mean Squared Error plotted against Training Set Size (2 to 10) for Proposed, A-optimal, D-optimal, E-optimal, Transductive, Random, and Optimal. Limitations: • system needs to be robust with respect to outliers • incremental re-training needs to be fast
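
For readers unfamiliar with the A-, D- and E-optimal baselines named in the evaluation, the sketch below computes the three classical design criteria for a straight-line model with design rows (1, x), where the information matrix $M = X^\top X$ is 2x2 and everything has a closed form. The function names and the restriction to a single scalar input are simplifications assumed for this example; the talk does not say how the baselines were implemented.

```python
import math


def information_matrix(xs):
    """Entries (m00, m01, m11) of M = X^T X for design rows (1, x)."""
    n = float(len(xs))
    s1 = float(sum(xs))
    s2 = float(sum(x * x for x in xs))
    return n, s1, s2


def d_criterion(xs):
    """D-optimality: det(M); larger is better."""
    m00, m01, m11 = information_matrix(xs)
    return m00 * m11 - m01 * m01


def a_criterion(xs):
    """A-optimality: trace(M^-1); smaller is better (inf if M is singular)."""
    m00, m01, m11 = information_matrix(xs)
    det = m00 * m11 - m01 * m01
    if det <= 0.0:
        return math.inf
    # For a 2x2 matrix, trace(M^-1) = trace(M) / det(M).
    return (m00 + m11) / det


def e_criterion(xs):
    """E-optimality: smallest eigenvalue of M; larger is better."""
    m00, m01, m11 = information_matrix(xs)
    half_trace = (m00 + m11) / 2.0
    root = math.sqrt(((m00 - m11) / 2.0) ** 2 + m01 * m01)
    return half_trace - root
```

All three criteria prefer training inputs that are spread out: the design `[-1, -1, 1, 1]` scores better than `[-0.1, -0.1, 0.1, 0.1]` on each of them, and a design with all inputs equal is singular.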
