Active Learning in Recommender Systems


http://4.bp.blogspot.com/_qFju91K89HM/SxRpABd1DTI/AAAAAAAABjw/6LaSJfjfk-I/s1600/Unexpected_Guests.jpg

Neil Rubens
Active Intelligence Lab
University of Electro-Communications

http://activeintelligence.org/research/al-rs/
N. Rubens, D. Kaplan, M. Sugiyama.
Recommender Systems Handbook: Active Learning in Recommender Systems (eds. P.B. Kantor, F. Ricci, L. Rokach, B. Shapira). Springer, 2011.


Passive Intelligence
data is given
model is given
task: learn model's parameters

Active Intelligence
Premise: given info is insufficient
active data acquisition
self adaptation/reconfiguration

Why Do We Need Useful Data?

“If you put into the machine wrong figures, will the right answers come out?
I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.”
Charles Babbage

Garbage In, Garbage Out
(GIGO Principle)

George Fuechsel

What about Data Mining?

We can sniff through the data and try to find something of value.

Assumptions
a lot of data is available
some of the data is useful

http://www.qualitydigest.com/sept06/articles/04_article.shtml

Obtaining Data could be “COSTLY”
Medicine:
diagnosis: pain, time, $
drug discovery: $$$, time

User Interaction:
effort, time

Expertise Elicitation:
$, time

Active Learning (AL)
Goal: Estimate ‘Usefulness’ of the data
before data is acquired

Limitation of Traditional Recommender Systems

Exploitation
http://misspinkslip.files.wordpress.com/2009/07/used-car-salesman.jpg

RS often just tries to tell you what you want!!!

Exploration
Find out what your interests are

http://www.flickr.com/photos/luisorlando/2688548978


What is Useful depends on the Objective

Settings


Not Useful
[Figure: items in feature space (X1, X2)]
limited information

User Satisfaction

[Figure: items in feature space (X1, X2); ratings marked as positive or negative]

Drawback
user: not much variety, may get bored
system: limited knowledge

Coverage

[Figure: items in feature space (X1, X2)]

Drawback user: exposed to items of no interest

Prediction Accuracy [Settles, 2009]

[Figure, three panels: (a) Actual Model; (b) Prediction Accuracy (Random Sampling); (c) Prediction Accuracy (Active Learning)]

Figure 2: An illustrative example of pool-based active learning. (a) A toy data set of 400 instances, evenly sampled from two class Gaussians. The instances are represented as points in a 2D feature space. (b) A logistic regression model trained with 30 labeled instances randomly drawn from the problem domain. The line represents the decision boundary of the classifier (70% accuracy). (c) A logistic regression model trained with 30 actively queried instances using uncertainty sampling (90%).

Drawback user: exposed to items of no interest
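
The pool-based uncertainty sampling described in the figure caption can be sketched roughly as follows. This is a minimal illustration and not the code behind the figure: the toy data, the hand-written logistic regression, and all names (`fit_logistic`, `uncertainty_sampling`, the step size and regularization values) are assumptions chosen for the example.

```python
import numpy as np

def add_bias(X):
    # Append a constant column so the model learns an intercept.
    return np.hstack([X, np.ones((len(X), 1))])

def fit_logistic(X, y, steps=500, lr=0.5, l2=1e-2):
    # Plain gradient descent on L2-regularized logistic loss (illustrative only).
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * (Xb.T @ (p - y) / len(y) + l2 * w)
    return w

def predict_proba(w, X):
    return 1.0 / (1.0 + np.exp(-add_bias(X) @ w))

def uncertainty_sampling(X_pool, y_pool, n_labels=30, n_seed=2, seed=0):
    # y_pool plays the role of the oracle: a label is read only when queried.
    rng = np.random.default_rng(seed)
    labeled = [int(i) for i in rng.choice(len(X_pool), size=n_seed, replace=False)]
    while len(labeled) < n_labels:  # n_labels counts the random seed points too
        w = fit_logistic(X_pool[labeled], y_pool[labeled])
        p = predict_proba(w, X_pool)
        score = -np.abs(p - 0.5)    # closest to 0.5 means most uncertain
        score[labeled] = -np.inf    # never query the same instance twice
        labeled.append(int(np.argmax(score)))
    return labeled

# Hypothetical toy pool: 400 instances from two 2D Gaussians.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([-2, 0], 1.0, (200, 2)),
               rng.normal([2, 0], 1.0, (200, 2))])
y = np.repeat([0.0, 1.0], 200)

queried = uncertainty_sampling(X, y)
w = fit_logistic(X[queried], y[queried])
accuracy = float(np.mean((predict_proba(w, X) > 0.5) == y))
```

Replacing the `argmax` with a random draw from the unlabeled pool would give the random sampling baseline that the caption compares against.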

Usefulness/Objectives
• allow user to explore his/her interests
• prediction accuracy for (user or item)
• maximize profit
• maximize number of visits / time spent
• minimize acquisition cost (# of ratings, implicit/explicit)
• max system utility
• minimize uncertainty
• make it fun for the user
• etc.
objectives may overlap

Active/Passive Learning

Passive Learning
user -> training data -> supervised learning -> approximated function

Active Learning
user -> training data -> supervised learning -> approximated function
supervised learning -> request -> user

AL Categories

Item-based AL
analyze items and select items that seem useful

Model-based AL
analyze model and select items that seem useful

Item-based AL

3R Properties

Represented by the existing training set?
e.g. (b) is already represented

Representative of others?
e.g. (a) is not

Results in achieving objective?
e.g. (d) -> max coverage

[Rubens & Kaplan, 2010]

Item Properties
• Popular [Rashid 2002]
(rated by many users)
• High Variance in ratings [Rashid 2002]
item that people either like or hate
• Best/Worst [Leino & Raiha 2007]
ask user which items s/he likes most/least
• Influential [Rubens & Sugiyama 2007]
items on which ratings of many other items depend
(Representative + Not Represented)
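
As a rough illustration of the first two item properties, popularity and rating variance can be computed directly from a ratings matrix. The matrix `R` below is made-up data, and the function names are assumptions for this sketch; it is not the procedure from the cited papers.

```python
import numpy as np

# Hypothetical ratings matrix: rows are users, columns are items, NaN = not rated.
R = np.array([
    [5.0, np.nan, 1.0, 4.0],
    [4.0, 2.0,    5.0, np.nan],
    [5.0, np.nan, 1.0, np.nan],
    [4.0, np.nan, 5.0, 3.0],
])

def popularity(R):
    # Number of users who rated each item.
    return np.sum(~np.isnan(R), axis=0)

def rating_variance(R):
    # Per-item variance over the observed ratings only.
    rated = ~np.isnan(R)
    counts = np.maximum(rated.sum(axis=0), 1)  # avoid division by zero
    mean = np.where(rated, R, 0.0).sum(axis=0) / counts
    return np.where(rated, (R - mean) ** 2, 0.0).sum(axis=0) / counts

pop = popularity(R)
var = rating_variance(R)

# Items to ask a new user about, best first (stable sort keeps ties in index order).
by_popularity = np.argsort(-pop, kind="stable")
by_variance = np.argsort(-var, kind="stable")
```

Here item 2 is both popular and divisive (ratings of 1 and 5), so it ranks first by variance, while item 1 has a single rating and ranks last under both properties.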

Model-based AL

[Figure: three panels over feature X1: Initial, Improve Margin, Improve Orientation]
Model-error AL

g: optimal function (in the solution space)
f: learned function
f_i's: learned functions from slightly different training sets

$EG = B + V + C$

Bias: $B = (E f(x) - g(x))^2$
Hard to estimate, but is assumed to vanish (asymptotically).

Variance: $V = (f(x) - E f(x))^2$
Estimate and minimize.

Model Error: $C = (g(x) - f(x))^2$
Constant and is ignored.
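
One way to estimate the variance term is to train the f_i's on slightly different training sets and measure how much their predictions disagree. The sketch below uses leave-one-out subsets and a ridge-regularized linear model; both choices, the data, and the function names are assumptions for illustration, not the method from the slides.

```python
import numpy as np

def fit_linear(X, y, ridge=1e-3):
    # Ridge-regularized least squares with an intercept column.
    Xb = np.hstack([X, np.ones((len(X), 1))])
    A = Xb.T @ Xb + ridge * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)

def predict(beta, X):
    return np.hstack([X, np.ones((len(X), 1))]) @ beta

def estimate_variance(X_train, y_train, X_eval):
    # Each f_i is trained with one training point left out; the variance of
    # the f_i predictions at a point approximates V at that point.
    preds = np.array([
        predict(fit_linear(np.delete(X_train, i, axis=0),
                           np.delete(y_train, i)), X_eval)
        for i in range(len(X_train))
    ])
    return preds.var(axis=0)

# Hypothetical 1D data: training inputs lie in [0, 1.5].
X_train = np.array([[0.0], [0.5], [1.0], [1.5]])
y_train = np.array([0.1, 0.4, 1.2, 1.4])
candidates = np.array([[1.0], [2.0], [4.0]])

V = estimate_variance(X_train, y_train, candidates)
query = int(np.argmax(V))  # query where the learned functions disagree most
```

In this toy setting the disagreement grows with distance from the training inputs, so the candidate at x = 4.0 is selected.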

Model Complexity

as the number of training points increases, more complex models tend to fit data better

Model Selection

(a) under-fit (b) over-fit (c) appropriate fit

Figure 8: Dependence between model complexity and accuracy.

Model-Points Dependency

Training input points that are good for learning one model are not necessarily good for the other.

$\min_{X^{(Train)}} G(X^{(Train)})$

Black Box Settings

May not have information/understanding about:
Model
Points

http://www.sps.ele.tue.nl/members/b.vries/research/research.html

[Figure: an input x is mapped to an output y_x, either by a known model such as y_x = β · x or by an opaque f(x)]

The system is too complex (and is constantly changing)
e.g. RS at Amazon, NetFlix:
10,000's of lines of code
continuously changed by multiple teams

[Evgeniou et al., 2000, Schuurmans, 1997]

References
T. Evgeniou, M. Pontil, and T. Poggio. Regularization networks and support vector machines. Advances in Computational Mathematics, 13(1):1–50, 2000.
D. Schuurmans. A new metric-based approach to model selection. In Proceedings of the Fourteenth National Conference on Artificial Intelligence (AAAI), pages 552–558, 1997.

“Information is a difference which makes a difference”
Gregory Bateson (anthropologist)

Proposed Method

Select training points based on their expected influence on the output estimates
(the only value accessible in Black-Box Settings).

[Figure, two panels: output estimates y_t and y_{t+1} plotted against input index]
a) Adding training point causes many output estimates to change.
b) Adding training point causes few output estimates to change.
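
The selection rule can be sketched as follows: for each candidate point, retrain the system with every possible rating and measure how much the output estimates move. Everything concrete here is an assumption for illustration: `black_box_fit_predict` is a ridge-regression stand-in for the opaque system, the ratings are averaged with a uniform weight, and the data is made up.

```python
import numpy as np

def black_box_fit_predict(X_train, y_train, X_all, ridge=1e-2):
    # Stand-in for an opaque system: only its output estimates are used below.
    A = X_train.T @ X_train + ridge * np.eye(X_train.shape[1])
    beta = np.linalg.solve(A, X_train.T @ y_train)
    return X_all @ beta

def expected_output_change(X_train, y_train, x_new, X_all, ratings):
    y_t = black_box_fit_predict(X_train, y_train, X_all)
    changes = []
    for r in ratings:  # the rating is unknown before it is acquired
        y_t1 = black_box_fit_predict(np.vstack([X_train, x_new]),
                                     np.append(y_train, r), X_all)
        changes.append(np.sum((y_t - y_t1) ** 2))
    return float(np.mean(changes))  # uniform weighting is an assumption

def select_point(X_train, y_train, candidates, X_all, ratings=(1, 2, 3, 4, 5)):
    scores = [expected_output_change(X_train, y_train, x, X_all, ratings)
              for x in candidates]
    return int(np.argmax(scores)), scores

# Hypothetical data: eight existing ratings all describe the same kind of item.
X_train = np.tile([1.0, 0.0], (8, 1))
y_train = np.full(8, 4.0)
X_all = np.array([[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.1, 0.9], [0.2, 0.8]])
candidates = np.array([[1.0, 0.0],   # already represented by the training set
                       [0.0, 1.0]])  # not represented, resembles many other items

best, scores = select_point(X_train, y_train, candidates, X_all)
```

With this data the second candidate changes far more output estimates than the first, which matches panel a) versus panel b). Retraining once per candidate and rating is expensive, so a practical system would need fast incremental re-training.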

Validity of Assumptions (is change in the output estimates good?)

Changes in the estimates of the output values with regard to a new training point [Empirical]:

a) the estimate of the true output value deteriorates
relatively infrequent (16%, expected deterioration is small)
b) the estimate of the true output value improves
most frequent case (84%)
c) the estimate of the true output value overshoots

[Figure: empirical distribution P(y_{t+1}), vertical axis from 0 to 0.4]

Criterion Accuracy

High values of criterion correspond to high improvements in accuracy

[Figure: ∆G (vertical axis, −2 to 10) plotted against the criterion, the squared difference between y_t and y_{t+1} (horizontal axis, 0 to 3.5)]

Interpretation
[Derivation in terms of δ and β; the equations did not survive extraction]

Representative
[Equations not recoverable; surviving symbols: δ, α, ϕ]

Not Represented
[Equations not recoverable; surviving symbols: δ, α, ϕ]

Evaluation

[Figure: Mean Squared Error (vertical axis, 2 to 9) against Training Set Size (horizontal axis, 2 to 10). Methods compared: Proposed, A-optimal, D-optimal, E-optimal, Transductive, Random, Optimal]

Limitations
• system needs to be robust with respect to outliers
• incremental re-training needs to be fast
