Most large-scale commercial and social websites recommend options, such as products or people to connect with, to users. Recommendation engines sort through massive amounts of data to identify potential user preferences. This presentation, the first in a two-part series, explains the ideas behind recommendation systems and introduces you to the algorithms that power them. In Part 2, learn about some open source recommendation engines you can put to work.
5. Even more …
♣ Personalized search
♣ "Computational advertising"
6. About the speaker
♣ Ramzi Alqrainy
- Ramzi is one of the well-recognized experts in the Artificial Intelligence and Information Retrieval fields in the Middle East. Active researcher and technology blogger with a focus on information retrieval.
7. Agenda
♣ What are recommender systems for?
- Introduction
♣ How do they work (Part I)?
- Collaborative Filtering
♣ How to measure their success?
- Evaluation techniques
♣ How do they work (Part II)?
- Content-based Filtering
- Knowledge-Based Recommendations
- Hybridization Strategies
♣ Advanced topics
- Explanations
- Human decision making
9. Why use Recommender Systems?
♣ Value for the customer
- Find things that are interesting
- Narrow down the set of choices
- Help me explore the space of options
- Discover new things
- Entertainment
- …
♣ Value for the provider
- Additional and probably unique personalized service for the customer
- Increase trust and customer loyalty
- Increase sales, click-through rates, conversion etc.
- Opportunities for promotion, persuasion
- Obtain more knowledge about customers
- …
10. Real-world check
♣ Myths from industry
- Amazon.com generates X percent of their sales through the recommendation lists (30 < X < 70)
- Netflix (DVD rental and movie streaming) generates X percent of their sales through the recommendation lists (30 < X < 70)
♣ There must be some value in it
- See recommendation of groups, jobs or people on LinkedIn
- Friend recommendation and ad personalization on Facebook
- Song recommendation at last.fm
- News recommendation at Forbes.com (plus 37% CTR)
♣ Academia
- A few studies exist that show the effect
♣ increased sales, changes in sales behavior
11. Problem domain
♣ Recommendation systems (RS) help to match users with items
- Ease information overload
- Sales assistance (guidance, advisory, persuasion, …)
"RS are software agents that elicit the interests and preferences of individual consumers […] and make recommendations accordingly. They have the potential to support and improve the quality of the decisions consumers make while searching for and selecting products online." [Xiao & Benbasat, MISQ, 2007]
♣ Different system designs / paradigms
- Based on availability of exploitable data
- Implicit and explicit user feedback
- Domain characteristics
12. Recommender systems
♣ RS seen as a function [AT05]
♣ Given:
- User model (e.g. ratings, preferences, demographics, situational context)
- Items (with or without description of item characteristics)
♣ Find:
- Relevance score. Used for ranking.
♣ Finally:
- Recommend items that are assumed to be relevant
♣ But:
- Remember that relevance might be context-dependent
- Characteristics of the list itself might be important (diversity)
13. Paradigms of recommender systems
Recommender systems reduce information overload by estimating relevance
18. Paradigms of recommender systems
Hybrid: combinations of various inputs and/or composition of different mechanisms
19. Recommender systems: basic techniques
♣ Collaborative
- Pros: no knowledge-engineering effort, serendipity of results, learns market segments
- Cons: requires some form of rating feedback, cold start for new users and new items
♣ Content-based
- Pros: no community required, comparison between items possible
- Cons: content descriptions necessary, cold start for new users, no surprises
♣ Knowledge-based
- Pros: deterministic recommendations, assured quality, no cold start, can resemble sales dialogue
- Cons: knowledge engineering effort to bootstrap, basically static, does not react to short-term trends
21. Collaborative Filtering (CF)
♣ The most prominent approach to generate recommendations
- used by large, commercial e-commerce sites
- well-understood, various algorithms and variations exist
- applicable in many domains (books, movies, DVDs, ..)
♣ Approach
- use the "wisdom of the crowd" to recommend items
♣ Basic assumption and idea
- Users give ratings to catalog items (implicitly or explicitly)
- Customers who had similar tastes in the past will have similar tastes in the future
22. User-based nearest-neighbor collaborative filtering (1)
♣ The basic technique:
- Given an "active user" (Alice) and an item i not yet seen by Alice
- The goal is to estimate Alice's rating for this item, e.g., by
♣ finding a set of users (peers) who liked the same items as Alice in the past and who have rated item i
♣ using, e.g., the average of their ratings to predict whether Alice will like item i
♣ doing this for all items Alice has not seen and recommending the best-rated

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1
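A minimal sketch of these steps in Python, assuming the peer set has already been determined (here User1 and User2, the two users whose past ratings agree most closely with Alice's; how that agreement is measured is the topic of the next slides):

```python
# Rating matrix from the slide (rows: users, columns: Item1..Item5, None = unrated)
ratings = {
    "Alice": [5, 3, 4, 4, None],
    "User1": [3, 1, 2, 3, 3],
    "User2": [4, 3, 4, 3, 5],
    "User3": [3, 3, 1, 5, 4],
    "User4": [1, 5, 5, 2, 1],
}

# Assumption for this sketch: the peer set is User1 and User2.
peers = ["User1", "User2"]

ITEM5 = 4  # column index of the unseen item
# Predict Alice's rating for Item5 as the plain average of the peers' ratings
prediction = sum(ratings[u][ITEM5] for u in peers) / len(peers)
print(prediction)  # 4.0
```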
23. User-based nearest-neighbor collaborative filtering (2)
♣ Some first questions
- How do we measure similarity?
- How many neighbors should we consider?
- How do we generate a prediction from the neighbors' ratings?

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1
24. Measuring user similarity
♣ A popular similarity measure in user-based CF: Pearson correlation

sim(a, b) = Σ_{p∈P} (r_{a,p} − r̄_a)(r_{b,p} − r̄_b) / ( √(Σ_{p∈P} (r_{a,p} − r̄_a)²) · √(Σ_{p∈P} (r_{b,p} − r̄_b)²) )

a, b: users
r_{a,p}: rating of user a for item p
P: set of items rated both by a and b
r̄_a, r̄_b: the users' average ratings
Possible similarity values between -1 and 1

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3     sim = 0.85
User2     4      3      4      3      5     sim = 0.70
User3     3      3      1      5      4
User4     1      5      5      2      1     sim = -0.79
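The computation can be sketched directly from the rating matrix; the helper below reproduces the similarity values shown on the slide (up to rounding of the truncated 0.70):

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation over the items rated by both users."""
    common = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    mean_a = sum(x for x, _ in common) / len(common)
    mean_b = sum(y for _, y in common) / len(common)
    num = sum((x - mean_a) * (y - mean_b) for x, y in common)
    den = sqrt(sum((x - mean_a) ** 2 for x, _ in common)) * \
          sqrt(sum((y - mean_b) ** 2 for _, y in common))
    return num / den

alice = [5, 3, 4, 4, None]
user1 = [3, 1, 2, 3, 3]
user2 = [4, 3, 4, 3, 5]
user4 = [1, 5, 5, 2, 1]

print(round(pearson(alice, user1), 2))  # 0.85
print(round(pearson(alice, user2), 2))  # 0.71 (the slide truncates to 0.70)
print(round(pearson(alice, user4), 2))  # -0.79
```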
25. Pearson correlation
♣ Takes differences in rating behavior into account
[Chart: ratings (0–6) of Alice, User1 and User4 on Item1–Item4, illustrating users who rate on different scales]
♣ Works well in usual domains, compared with alternative measures
- such as cosine similarity
26. Making predictions
♣ A common prediction function:

pred(a, p) = r̄_a + Σ_{b∈N} sim(a, b) · (r_{b,p} − r̄_b) / Σ_{b∈N} |sim(a, b)|

♣ Calculate whether the neighbors' ratings for the unseen item i are higher or lower than their average
♣ Combine the rating differences
- use the similarity as a weight
♣ Add/subtract the neighbors' bias from the active user's average and use this as a prediction
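A sketch of this prediction scheme for Alice and Item5, using the similarity values from the preceding slide (0.85 for User1, 0.70 for User2) and taking each neighbor's average over all of their ratings:

```python
# Alice's mean rating over her four rated items
alice_mean = (5 + 3 + 4 + 4) / 4          # 4.0

neighbors = [
    # (similarity to Alice, rating for Item5, neighbor's mean rating)
    (0.85, 3, (3 + 1 + 2 + 3 + 3) / 5),   # User1, mean 2.4
    (0.70, 5, (4 + 3 + 4 + 3 + 5) / 5),   # User2, mean 3.8
]

# Similarity-weighted sum of the neighbors' deviations from their own means,
# normalized by the total similarity mass, then added to Alice's mean.
weighted = sum(sim * (r - mean) for sim, r, mean in neighbors)
norm = sum(abs(sim) for sim, _, _ in neighbors)
prediction = alice_mean + weighted / norm
print(round(prediction, 2))  # 4.87
```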
27. Making recommendations
♣ Making predictions is typically not the ultimate goal
♣ Usual approach (in academia)
- Rank items based on their predicted ratings
♣ However
- This might lead to the inclusion of (only) niche items
- In practice also: take item popularity into account
♣ Approaches
- "Learning to rank"
♣ Optimize according to a given rank evaluation metric (see later)
28. Improving the metrics / prediction function
♣ Not all neighbor ratings might be equally "valuable"
- Agreement on commonly liked items is not as informative as agreement on controversial items
- Possible solution: give more weight to items that have a higher variance
♣ Value of the number of co-rated items
- Use "significance weighting", e.g., by linearly reducing the weight when the number of co-rated items is low
♣ Case amplification
- Intuition: give more weight to "very similar" neighbors, i.e., where the similarity value is close to 1.
♣ Neighborhood selection
- Use a similarity threshold or a fixed number of neighbors
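Two of these tweaks can be sketched in one small helper; the co-rating threshold `gamma` and the amplification exponent `rho` are hypothetical values chosen for illustration, not prescribed by the slide:

```python
def adjusted_weight(sim, n_corated, gamma=50, rho=2.5):
    """Combine significance weighting and case amplification.
    gamma (co-rating threshold) and rho (amplification exponent)
    are illustrative values, not fixed by the method."""
    # Significance weighting: linearly damp similarities that are
    # based on only a few co-rated items.
    significance = min(n_corated, gamma) / gamma
    # Case amplification: raise |sim| to a power rho > 1 so that
    # near-1 similarities dominate; keep the original sign.
    amplified = (abs(sim) ** rho) * (1 if sim >= 0 else -1)
    return amplified * significance

print(round(adjusted_weight(0.85, 4), 3))    # 0.053 (few co-rated items: heavily damped)
print(round(adjusted_weight(0.85, 100), 3))  # 0.666 (enough co-ratings: only amplification applies)
```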
29. Memory-based and model-based approaches
♣ User-based CF is said to be "memory-based"
- the rating matrix is directly used to find neighbors / make predictions
- does not scale for most real-world scenarios
- large e-commerce sites have tens of millions of customers and millions of items
♣ Model-based approaches
- based on an offline pre-processing or "model-learning" phase
- at run-time, only the learned model is used to make predictions
- models are updated / re-trained periodically
- large variety of techniques used
- model-building and updating can be computationally expensive
30. Item-based collaborative filtering
♣ Basic idea:
- Use the similarity between items (and not users) to make predictions
♣ Example:
- Look for items that are similar to Item5
- Take Alice's ratings for these items to predict the rating for Item5

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1
31. The cosine similarity measure
♣ Produces better results in item-to-item filtering
- for some datasets; no consistent picture in the literature
♣ Ratings are seen as vectors in n-dimensional space
♣ Similarity is calculated based on the angle between the vectors

sim(a, b) = (a · b) / (|a| · |b|)

♣ Adjusted cosine similarity
- take average user ratings into account, transform the original ratings
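Both variants can be sketched on the running rating matrix, comparing Item1 and Item5 over the four users who rated Item5 (Alice's missing rating is left out):

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two item rating vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return num / den

# Column vectors of the rating matrix, restricted to User1..User4
item1 = [3, 4, 3, 1]
item5 = [3, 5, 4, 1]
print(round(cosine(item1, item5), 2))  # 0.99

# Adjusted cosine: subtract each user's mean rating first, so that
# differences in individual rating scales are compensated.
user_means = [2.4, 3.8, 3.2, 2.8]  # means of User1..User4 over all their ratings
adj1 = [r - m for r, m in zip(item1, user_means)]
adj5 = [r - m for r, m in zip(item5, user_means)]
print(round(cosine(adj1, adj5), 2))  # 0.8
```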
32. Pre-processing for item-based filtering
♣ Item-based filtering does not solve the scalability problem by itself
♣ Pre-processing approach by Amazon.com (in 2003)
- Calculate all pair-wise item similarities in advance
- The neighborhood to be used at run-time is typically rather small, because only items that the user has rated are taken into account
- Item similarities are supposed to be more stable than user similarities
♣ Memory requirements
- Up to N² pair-wise similarities to be memorized (N = number of items) in theory
- In practice, this is significantly lower (items with no co-ratings)
- Further reductions possible
♣ Minimum threshold for co-ratings (items which are rated by at least n users)
♣ Limit the size of the neighborhood (might affect recommendation accuracy)
33. More on ratings
♣ Pure CF-based systems rely only on the rating matrix
♣ Explicit ratings
- Most commonly used (1 to 5, 1 to 7 Likert response scales)
- Research topics
♣ "Optimal" granularity of scale; indication that a 10-point scale is better accepted in the movie domain
♣ Multidimensional ratings (multiple ratings per movie)
- Challenge
♣ Users are not always willing to rate many items; sparse rating matrices
♣ How to stimulate users to rate more items?
♣ Implicit ratings
- clicks, page views, time spent on some page, demo downloads …
- Can be used in addition to explicit ones; question of correctness of interpretation
34. Data sparsity problems
♣ Cold start problem
- How to recommend new items? What to recommend to new users?
♣ Straightforward approaches
- Ask/force users to rate a set of items
- Use another method (e.g., content-based, demographic or simply non-personalized) in the initial phase
♣ Alternatives
- Use better algorithms (beyond nearest-neighbor approaches)
- Example:
♣ In nearest-neighbor approaches, the set of sufficiently similar neighbors might be too small to make good predictions
♣ Assume "transitivity" of neighborhoods
35. Example algorithms for sparse datasets
♣ Recursive CF
- Assume there is a very close neighbor n of u who, however, has not rated the target item i yet.
- Idea:
♣ Apply the CF method recursively and predict a rating for item i for the neighbor
♣ Use this predicted rating instead of the rating of a more distant direct neighbor

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      ?     sim = 0.85 → predict the rating for User1 first
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1
36. Graph-based methods
♣ "Spreading activation" (sketch)
- Idea: Use paths of lengths > 3 to recommend items
- Length 3: Recommend Item3 to User1
- Length 5: Item1 also recommendable
37. More model-based approaches
♣ A plethora of different techniques has been proposed in recent years, e.g.,
- Matrix factorization techniques, statistics
♣ singular value decomposition, principal component analysis
- Association rule mining
♣ compare: shopping basket analysis
- Probabilistic models
♣ clustering models, Bayesian networks, probabilistic Latent Semantic Analysis
- Various other machine learning approaches
♣ Costs of pre-processing
- Usually not discussed
- Incremental updates possible?
38. Matrix factorization
• SVD: M_k = U_k × Σ_k × V_k^T

U_k       Dim1   Dim2
Alice     0.47  -0.30
Bob      -0.44   0.23
Mary      0.70  -0.06
Sue       0.31   0.93

Σ_k       Dim1   Dim2
Dim1      5.63   0
Dim2      0      3.23

V_k^T    (one column per item)
Dim1    -0.44  -0.57   0.06   0.38   0.57
Dim2     0.58  -0.66   0.26   0.18  -0.36

• Prediction: r̂_ui = r̄_u + U_k(Alice) × Σ_k × V_k^T(EPL) = 3 + 0.84 = 3.84
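The slide's numbers can be reproduced with the truncated factors shown above; it is assumed here that "EPL" denotes the fourth item column, which matches the slide's result up to rounding:

```python
import numpy as np

# Factor matrices for k = 2, copied (truncated) from the slide.
U_alice = np.array([0.47, -0.30])
Sigma_k = np.diag([5.63, 3.23])
V_k_T = np.array([
    [-0.44, -0.57, 0.06, 0.38, 0.57],   # Dim1, one entry per item
    [ 0.58, -0.66, 0.26, 0.18, -0.36],  # Dim2
])

# Prediction r̂_ui = r̄_u + U_k(u) · Σ_k · V_k^T(i).
# Assumption: "EPL" is the fourth item (column index 3).
r_u_mean = 3
item = 3
pred = r_u_mean + U_alice @ Sigma_k @ V_k_T[:, item]
print(round(pred, 2))  # ≈ 3.83 (slide shows 3.84, computed from unrounded factors)
```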
39. Association rule mining
♣ Commonly used for shopping behavior analysis
- aims at the detection of rules such as "If a customer purchases baby-food then he also buys diapers in 70% of the cases"
♣ Association rule mining algorithms
- can detect rules of the form X => Y (e.g., baby-food => diapers) from a set of sales transactions D = {t1, t2, … tn}
- measures of quality: support, confidence
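Support and confidence can be sketched over a toy transaction set (the transactions below are made up for illustration, not from the slide):

```python
# Toy transaction set D = {t1, ..., tn}
transactions = [
    {"baby-food", "diapers", "milk"},
    {"baby-food", "diapers"},
    {"baby-food", "bread"},
    {"beer", "diapers"},
]

def support(itemset):
    """Fraction of transactions that contain the whole itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(x, y):
    """Confidence of the rule X => Y: support(X ∪ Y) / support(X)."""
    return support(x | y) / support(x)

rule_conf = confidence({"baby-food"}, {"diapers"})
print(round(rule_conf, 2))  # 0.67: 2 of the 3 baby-food transactions also contain diapers
```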
40. Probabilistic methods
♣ Basic idea (simplistic version for illustration):
- given the user/item rating matrix
- determine the probability that user Alice will like an item i
- base the recommendation on these probabilities
♣ Calculation of rating probabilities based on Bayes' theorem
- How probable is rating value "1" for Item5 given Alice's previous ratings?
- Corresponds to the conditional probability P(Item5=1 | X), where
♣ X = Alice's previous ratings = (Item1=1, Item2=3, Item3= … )
- Can be estimated based on Bayes' theorem
♣ Usually more sophisticated methods are used
- Clustering
- pLSA
…
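A simplistic naive Bayes estimate of P(Item5 = v | X) can be sketched on a made-up mini-matrix (the training rows below are illustrative; conditional independence of the individual item ratings given the Item5 value is assumed, as in the naive Bayes model):

```python
# Tiny made-up training matrix: users who did rate Item5.
# Each row: (Item1, Item2, Item5); the active user has Item1 = 1, Item2 = 3.
train = [
    (1, 3, 1),
    (1, 2, 1),
    (2, 3, 2),
    (1, 3, 1),
]
active = (1, 3)

def posterior(v):
    """P(Item5=v) * prod_j P(Item_j = x_j | Item5 = v); the denominator
    P(X) is omitted since it is the same for every candidate value v."""
    rows = [r for r in train if r[2] == v]
    prior = len(rows) / len(train)
    likelihood = 1.0
    for j, x_j in enumerate(active):
        likelihood *= sum(r[j] == x_j for r in rows) / len(rows)
    return prior * likelihood

scores = {v: posterior(v) for v in {r[2] for r in train}}
best = max(scores, key=scores.get)
print(best)  # 1: rating value 1 is the most probable for Item5
```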
41. Summarizing recent methods
♣ Recommendation is concerned with learning from noisy observations (x, y): a function f(x) = ŷ has to be determined such that Σ (ŷ − y)² is minimal.
♣ A variety of different learning strategies have been applied in trying to estimate f(x)
- Non-parametric neighborhood models
- MF models, SVMs, Neural Networks, Bayesian Networks, …
42. Collaborative Filtering Issues
♣ Pros:
- well-understood, works well in some domains, no knowledge engineering required
♣ Cons:
- requires a user community, sparsity problems, no integration of other knowledge sources, no explanation of results
♣ What is the best CF method?
- In which situation and which domain? Inconsistent findings; always the same domains and data sets; differences between methods are often very small (1/100)
♣ How to evaluate the prediction quality?
- MAE / RMSE: What does an MAE of 0.7 actually mean?
- Serendipity: Not yet fully understood
♣ What about multi-dimensional ratings?
44. Recommender Systems in e-Commerce
♣ One Recommender Systems research question
- What should be in that list?
45. Recommender Systems in e-Commerce
♣ Another question, both in research and practice
- How do we know that these are good recommendations?
46. Recommender Systems in e-Commerce
♣ This might lead to …
- What is a good recommendation?
- What is a good recommendation strategy?
- What is a good recommendation strategy for my business?
[Image caption: "These have been in stock for quite a while now …"]
47. What is a good recommendation? What are the measures in practice?
♣ Total sales numbers
♣ Promotion of certain items
♣ …
♣ Click-through rates
♣ Interactivity on the platform
♣ …
♣ Customer return rates
♣ Customer satisfaction and loyalty