Most large-scale commercial and social websites recommend options, such as products or people to connect with, to users. Recommendation engines sort through massive amounts of data to identify potential user preferences. This presentation, the first in a two-part series, explains the ideas behind recommendation systems and introduces you to the algorithms that power them. In Part 2, learn about some open source recommendation engines you can put to work.
5. Even more …
♣ Personalized search
♣ "Computational advertising"
6. About the speaker
♣ Ramzi Alqrainy
- Ramzi is one of the well-recognized experts in the Artificial Intelligence and Information Retrieval fields in the Middle East. Active researcher and technology blogger with a focus on information retrieval.
7. Agenda
♣ What are recommender systems for?
- Introduction
♣ How do they work (Part I)?
- Collaborative Filtering
♣ How to measure their success?
- Evaluation techniques
♣ How do they work (Part II)?
- Content-based Filtering
- Knowledge-Based Recommendations
- Hybridization Strategies
♣ Advanced topics
- Explanations
- Human decision making
9. Why use Recommender Systems?
♣ Value for the customer
- Find things that are interesting
- Narrow down the set of choices
- Help me explore the space of options
- Discover new things
- Entertainment
- …
♣ Value for the provider
- Additional and probably unique personalized service for the customer
- Increase trust and customer loyalty
- Increase sales, click-through rates, conversion etc.
- Opportunities for promotion, persuasion
- Obtain more knowledge about customers
- …
10. Real-world check
♣ Myths from industry
- Amazon.com generates X percent of their sales through the recommendation lists (30 < X < 70)
- Netflix (DVD rental and movie streaming) generates X percent of their sales through the recommendation lists (30 < X < 70)
♣ There must be some value in it
- See recommendation of groups, jobs or people on LinkedIn
- Friend recommendation and ad personalization on Facebook
- Song recommendation at last.fm
- News recommendation at Forbes.com (plus 37% CTR)
♣ Academia
- A few studies exist that show the effect
♣ increased sales, changes in sales behavior
11. Problem domain
♣ Recommendation systems (RS) help to match users with items
- Ease information overload
- Sales assistance (guidance, advisory, persuasion, …)
"RS are software agents that elicit the interests and preferences of individual consumers […] and make recommendations accordingly. They have the potential to support and improve the quality of the decisions consumers make while searching for and selecting products online." [Xiao & Benbasat, MISQ, 2007]
♣ Different system designs / paradigms
- Based on availability of exploitable data
- Implicit and explicit user feedback
- Domain characteristics
12. Recommender systems
♣ RS seen as a function [AT05]
♣ Given:
- User model (e.g. ratings, preferences, demographics, situational context)
- Items (with or without description of item characteristics)
♣ Find:
- Relevance score. Used for ranking.
♣ Finally:
- Recommend items that are assumed to be relevant
♣ But:
- Remember that relevance might be context-dependent
- Characteristics of the list itself might be important (diversity)
13. Paradigms of recommender systems
Recommender systems reduce information overload by estimating relevance
18. Paradigms of recommender systems
Hybrid: combinations of various inputs and/or composition of different mechanisms
19. Recommender systems: basic techniques
♣ Collaborative
- Pros: no knowledge-engineering effort, serendipity of results, learns market segments
- Cons: requires some form of rating feedback, cold start for new users and new items
♣ Content-based
- Pros: no community required, comparison between items possible
- Cons: content descriptions necessary, cold start for new users, no surprises
♣ Knowledge-based
- Pros: deterministic recommendations, assured quality, no cold start, can resemble sales dialogue
- Cons: knowledge engineering effort to bootstrap, basically static, does not react to short-term trends
21. Collaborative Filtering (CF)
♣ The most prominent approach to generate recommendations
- used by large, commercial e-commerce sites
- well-understood, various algorithms and variations exist
- applicable in many domains (books, movies, DVDs, ..)
♣ Approach
- use the "wisdom of the crowd" to recommend items
♣ Basic assumption and idea
- Users give ratings to catalog items (implicitly or explicitly)
- Customers who had similar tastes in the past will have similar tastes in the future
22. User-based nearest-neighbor collaborative filtering (1)
♣ The basic technique:
- Given an "active user" (Alice) and an item i not yet seen by Alice
- The goal is to estimate Alice's rating for this item, e.g., by
♣ finding a set of users (peers) who liked the same items as Alice in the past and who have rated item i
♣ using, e.g., the average of their ratings to predict whether Alice will like item i
♣ doing this for all items Alice has not seen and recommending the best-rated

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1
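A minimal sketch of these steps in Python, assuming the peer set has already been determined (here User1 and User2, the two users whose past ratings agree most closely with Alice's; how that agreement is measured is the topic of the next slides):

```python
# Rating matrix from the slide (rows: users, columns: Item1..Item5, None = unrated)
ratings = {
    "Alice": [5, 3, 4, 4, None],
    "User1": [3, 1, 2, 3, 3],
    "User2": [4, 3, 4, 3, 5],
    "User3": [3, 3, 1, 5, 4],
    "User4": [1, 5, 5, 2, 1],
}

# Assumption for this sketch: the peer set is User1 and User2.
peers = ["User1", "User2"]

ITEM5 = 4  # column index of the unseen item
# Predict Alice's rating for Item5 as the plain average of the peers' ratings
prediction = sum(ratings[u][ITEM5] for u in peers) / len(peers)
print(prediction)  # 4.0
```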
23. User-based nearest-neighbor collaborative filtering (2)
♣ Some first questions
- How do we measure similarity?
- How many neighbors should we consider?
- How do we generate a prediction from the neighbors' ratings?

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1
24. Measuring user similarity
♣ A popular similarity measure in user-based CF: Pearson correlation

sim(a, b) = Σ_{p∈P} (r_{a,p} − r̄_a)(r_{b,p} − r̄_b) / ( √(Σ_{p∈P} (r_{a,p} − r̄_a)²) · √(Σ_{p∈P} (r_{b,p} − r̄_b)²) )

a, b: users
r_{a,p}: rating of user a for item p
P: set of items rated both by a and b
r̄_a, r̄_b: the users' average ratings
Possible similarity values between -1 and 1

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3     sim = 0.85
User2     4      3      4      3      5     sim = 0.70
User3     3      3      1      5      4
User4     1      5      5      2      1     sim = -0.79
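The computation can be sketched directly from the rating matrix; the helper below reproduces the similarity values shown on the slide (up to rounding of the truncated 0.70):

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation over the items rated by both users."""
    common = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    mean_a = sum(x for x, _ in common) / len(common)
    mean_b = sum(y for _, y in common) / len(common)
    num = sum((x - mean_a) * (y - mean_b) for x, y in common)
    den = sqrt(sum((x - mean_a) ** 2 for x, _ in common)) * \
          sqrt(sum((y - mean_b) ** 2 for _, y in common))
    return num / den

alice = [5, 3, 4, 4, None]
user1 = [3, 1, 2, 3, 3]
user2 = [4, 3, 4, 3, 5]
user4 = [1, 5, 5, 2, 1]

print(round(pearson(alice, user1), 2))  # 0.85
print(round(pearson(alice, user2), 2))  # 0.71 (the slide truncates to 0.70)
print(round(pearson(alice, user4), 2))  # -0.79
```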
25. Pearson correlation
♣ Takes differences in rating behavior into account
[Chart: ratings (0–6) of Alice, User1 and User4 on Item1–Item4, illustrating users who rate on different scales]
♣ Works well in usual domains, compared with alternative measures
- such as cosine similarity
26. Making predictions
♣ A common prediction function:

pred(a, p) = r̄_a + Σ_{b∈N} sim(a, b) · (r_{b,p} − r̄_b) / Σ_{b∈N} |sim(a, b)|

♣ Calculate whether the neighbors' ratings for the unseen item i are higher or lower than their average
♣ Combine the rating differences
- use the similarity as a weight
♣ Add/subtract the neighbors' bias from the active user's average and use this as a prediction
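A sketch of this prediction scheme for Alice and Item5, using the similarity values from the preceding slide (0.85 for User1, 0.70 for User2) and taking each neighbor's average over all of their ratings:

```python
# Alice's mean rating over her four rated items
alice_mean = (5 + 3 + 4 + 4) / 4          # 4.0

neighbors = [
    # (similarity to Alice, rating for Item5, neighbor's mean rating)
    (0.85, 3, (3 + 1 + 2 + 3 + 3) / 5),   # User1, mean 2.4
    (0.70, 5, (4 + 3 + 4 + 3 + 5) / 5),   # User2, mean 3.8
]

# Similarity-weighted sum of the neighbors' deviations from their own means,
# normalized by the total similarity mass, then added to Alice's mean.
weighted = sum(sim * (r - mean) for sim, r, mean in neighbors)
norm = sum(abs(sim) for sim, _, _ in neighbors)
prediction = alice_mean + weighted / norm
print(round(prediction, 2))  # 4.87
```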
27. Making recommendations
♣ Making predictions is typically not the ultimate goal
♣ Usual approach (in academia)
- Rank items based on their predicted ratings
♣ However
- This might lead to the inclusion of (only) niche items
- In practice also: take item popularity into account
♣ Approaches
- "Learning to rank"
♣ Optimize according to a given rank evaluation metric (see later)
28. Improving the metrics / prediction function
♣ Not all neighbor ratings might be equally "valuable"
- Agreement on commonly liked items is not as informative as agreement on controversial items
- Possible solution: give more weight to items that have a higher variance
♣ Value of the number of co-rated items
- Use "significance weighting", e.g., by linearly reducing the weight when the number of co-rated items is low
♣ Case amplification
- Intuition: give more weight to "very similar" neighbors, i.e., where the similarity value is close to 1.
♣ Neighborhood selection
- Use a similarity threshold or a fixed number of neighbors
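Two of these tweaks can be sketched in one small helper; the co-rating threshold `gamma` and the amplification exponent `rho` are hypothetical values chosen for illustration, not prescribed by the slide:

```python
def adjusted_weight(sim, n_corated, gamma=50, rho=2.5):
    """Combine significance weighting and case amplification.
    gamma (co-rating threshold) and rho (amplification exponent)
    are illustrative values, not fixed by the method."""
    # Significance weighting: linearly damp similarities that are
    # based on only a few co-rated items.
    significance = min(n_corated, gamma) / gamma
    # Case amplification: raise |sim| to a power rho > 1 so that
    # near-1 similarities dominate; keep the original sign.
    amplified = (abs(sim) ** rho) * (1 if sim >= 0 else -1)
    return amplified * significance

print(round(adjusted_weight(0.85, 4), 3))    # 0.053 (few co-rated items: heavily damped)
print(round(adjusted_weight(0.85, 100), 3))  # 0.666 (enough co-ratings: only amplification applies)
```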
29. Memory-based and model-based approaches
♣ User-based CF is said to be "memory-based"
- the rating matrix is directly used to find neighbors / make predictions
- does not scale for most real-world scenarios
- large e-commerce sites have tens of millions of customers and millions of items
♣ Model-based approaches
- based on an offline pre-processing or "model-learning" phase
- at run-time, only the learned model is used to make predictions
- models are updated / re-trained periodically
- large variety of techniques used
- model-building and updating can be computationally expensive
30. Item-based collaborative filtering
♣ Basic idea:
- Use the similarity between items (and not users) to make predictions
♣ Example:
- Look for items that are similar to Item5
- Take Alice's ratings for these items to predict the rating for Item5

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1
31. The cosine similarity measure
♣ Produces better results in item-to-item filtering
- for some datasets; no consistent picture in the literature
♣ Ratings are seen as vectors in n-dimensional space
♣ Similarity is calculated based on the angle between the vectors

sim(a, b) = (a · b) / (|a| · |b|)

♣ Adjusted cosine similarity
- take average user ratings into account, transform the original ratings
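Both variants can be sketched on the running rating matrix, comparing Item1 and Item5 over the four users who rated Item5 (Alice's missing rating is left out):

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two item rating vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return num / den

# Column vectors of the rating matrix, restricted to User1..User4
item1 = [3, 4, 3, 1]
item5 = [3, 5, 4, 1]
print(round(cosine(item1, item5), 2))  # 0.99

# Adjusted cosine: subtract each user's mean rating first, so that
# differences in individual rating scales are compensated.
user_means = [2.4, 3.8, 3.2, 2.8]  # means of User1..User4 over all their ratings
adj1 = [r - m for r, m in zip(item1, user_means)]
adj5 = [r - m for r, m in zip(item5, user_means)]
print(round(cosine(adj1, adj5), 2))  # 0.8
```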
32. Pre-processing for item-based filtering
♣ Item-based filtering does not solve the scalability problem by itself
♣ Pre-processing approach by Amazon.com (in 2003)
- Calculate all pair-wise item similarities in advance
- The neighborhood to be used at run-time is typically rather small, because only items that the user has rated are taken into account
- Item similarities are supposed to be more stable than user similarities
♣ Memory requirements
- Up to N² pair-wise similarities to be memorized (N = number of items) in theory
- In practice, this is significantly lower (items with no co-ratings)
- Further reductions possible
♣ Minimum threshold for co-ratings (items which are rated by at least n users)
♣ Limit the size of the neighborhood (might affect recommendation accuracy)
33. More on ratings
♣ Pure CF-based systems rely only on the rating matrix
♣ Explicit ratings
- Most commonly used (1 to 5, 1 to 7 Likert response scales)
- Research topics
♣ "Optimal" granularity of scale; indication that a 10-point scale is better accepted in the movie domain
♣ Multidimensional ratings (multiple ratings per movie)
- Challenge
♣ Users are not always willing to rate many items; sparse rating matrices
♣ How to stimulate users to rate more items?
♣ Implicit ratings
- clicks, page views, time spent on some page, demo downloads …
- Can be used in addition to explicit ones; question of correctness of interpretation
34. Data sparsity problems
♣ Cold start problem
- How to recommend new items? What to recommend to new users?
♣ Straightforward approaches
- Ask/force users to rate a set of items
- Use another method (e.g., content-based, demographic or simply non-personalized) in the initial phase
♣ Alternatives
- Use better algorithms (beyond nearest-neighbor approaches)
- Example:
♣ In nearest-neighbor approaches, the set of sufficiently similar neighbors might be too small to make good predictions
♣ Assume "transitivity" of neighborhoods
35. Example algorithms for sparse datasets
♣ Recursive CF
- Assume there is a very close neighbor n of u who, however, has not rated the target item i yet.
- Idea:
♣ Apply the CF method recursively and predict a rating for item i for the neighbor
♣ Use this predicted rating instead of the rating of a more distant direct neighbor

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      ?     sim = 0.85 → predict the rating for User1 first
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1
36. Graph-based methods
♣ "Spreading activation" (sketch)
- Idea: Use paths of lengths > 3 to recommend items
- Length 3: Recommend Item3 to User1
- Length 5: Item1 also recommendable
37. More model-based approaches
♣ A plethora of different techniques has been proposed in recent years, e.g.,
- Matrix factorization techniques, statistics
♣ singular value decomposition, principal component analysis
- Association rule mining
♣ compare: shopping basket analysis
- Probabilistic models
♣ clustering models, Bayesian networks, probabilistic Latent Semantic Analysis
- Various other machine learning approaches
♣ Costs of pre-processing
- Usually not discussed
- Incremental updates possible?
38. Matrix factorization
• SVD: M_k = U_k × Σ_k × V_k^T

U_k       Dim1   Dim2
Alice     0.47  -0.30
Bob      -0.44   0.23
Mary      0.70  -0.06
Sue       0.31   0.93

Σ_k       Dim1   Dim2
Dim1      5.63   0
Dim2      0      3.23

V_k^T    (one column per item)
Dim1    -0.44  -0.57   0.06   0.38   0.57
Dim2     0.58  -0.66   0.26   0.18  -0.36

• Prediction: r̂_ui = r̄_u + U_k(Alice) × Σ_k × V_k^T(EPL) = 3 + 0.84 = 3.84
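The slide's numbers can be reproduced with the truncated factors shown above; it is assumed here that "EPL" denotes the fourth item column, which matches the slide's result up to rounding:

```python
import numpy as np

# Factor matrices for k = 2, copied (truncated) from the slide.
U_alice = np.array([0.47, -0.30])
Sigma_k = np.diag([5.63, 3.23])
V_k_T = np.array([
    [-0.44, -0.57, 0.06, 0.38, 0.57],   # Dim1, one entry per item
    [ 0.58, -0.66, 0.26, 0.18, -0.36],  # Dim2
])

# Prediction r̂_ui = r̄_u + U_k(u) · Σ_k · V_k^T(i).
# Assumption: "EPL" is the fourth item (column index 3).
r_u_mean = 3
item = 3
pred = r_u_mean + U_alice @ Sigma_k @ V_k_T[:, item]
print(round(pred, 2))  # ≈ 3.83 (slide shows 3.84, computed from unrounded factors)
```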
39. Association rule mining
♣ Commonly used for shopping behavior analysis
- aims at the detection of rules such as "If a customer purchases baby-food then he also buys diapers in 70% of the cases"
♣ Association rule mining algorithms
- can detect rules of the form X => Y (e.g., baby-food => diapers) from a set of sales transactions D = {t1, t2, … tn}
- measures of quality: support, confidence
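Support and confidence can be sketched over a toy transaction set (the transactions below are made up for illustration, not from the slide):

```python
# Toy transaction set D = {t1, ..., tn}
transactions = [
    {"baby-food", "diapers", "milk"},
    {"baby-food", "diapers"},
    {"baby-food", "bread"},
    {"beer", "diapers"},
]

def support(itemset):
    """Fraction of transactions that contain the whole itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(x, y):
    """Confidence of the rule X => Y: support(X ∪ Y) / support(X)."""
    return support(x | y) / support(x)

rule_conf = confidence({"baby-food"}, {"diapers"})
print(round(rule_conf, 2))  # 0.67: 2 of the 3 baby-food transactions also contain diapers
```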
40. Probabilistic methods
♣ Basic idea (simplistic version for illustration):
- given the user/item rating matrix
- determine the probability that user Alice will like an item i
- base the recommendation on these probabilities
♣ Calculation of rating probabilities based on Bayes' theorem
- How probable is rating value "1" for Item5 given Alice's previous ratings?
- Corresponds to the conditional probability P(Item5=1 | X), where
♣ X = Alice's previous ratings = (Item1=1, Item2=3, Item3= … )
- Can be estimated based on Bayes' theorem
♣ Usually more sophisticated methods are used
- Clustering
- pLSA
…
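A simplistic naive Bayes estimate of P(Item5 = v | X) can be sketched on a made-up mini-matrix (the training rows below are illustrative; conditional independence of the individual item ratings given the Item5 value is assumed, as in the naive Bayes model):

```python
# Tiny made-up training matrix: users who did rate Item5.
# Each row: (Item1, Item2, Item5); the active user has Item1 = 1, Item2 = 3.
train = [
    (1, 3, 1),
    (1, 2, 1),
    (2, 3, 2),
    (1, 3, 1),
]
active = (1, 3)

def posterior(v):
    """P(Item5=v) * prod_j P(Item_j = x_j | Item5 = v); the denominator
    P(X) is omitted since it is the same for every candidate value v."""
    rows = [r for r in train if r[2] == v]
    prior = len(rows) / len(train)
    likelihood = 1.0
    for j, x_j in enumerate(active):
        likelihood *= sum(r[j] == x_j for r in rows) / len(rows)
    return prior * likelihood

scores = {v: posterior(v) for v in {r[2] for r in train}}
best = max(scores, key=scores.get)
print(best)  # 1: rating value 1 is the most probable for Item5
```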
41. Summarizing recent methods
♣ Recommendation is concerned with learning from noisy observations (x, y): a function f(x) = ŷ has to be determined such that Σ (ŷ − y)² is minimal.
♣ A variety of different learning strategies have been applied in trying to estimate f(x)
- Non-parametric neighborhood models
- MF models, SVMs, Neural Networks, Bayesian Networks, …
42. Collaborative Filtering Issues
♣ Pros:
- well-understood, works well in some domains, no knowledge engineering required
♣ Cons:
- requires a user community, sparsity problems, no integration of other knowledge sources, no explanation of results
♣ What is the best CF method?
- In which situation and which domain? Inconsistent findings; always the same domains and data sets; differences between methods are often very small (1/100)
♣ How to evaluate the prediction quality?
- MAE / RMSE: What does an MAE of 0.7 actually mean?
- Serendipity: Not yet fully understood
♣ What about multi-dimensional ratings?
44. Recommender Systems in e-Commerce
♣ One Recommender Systems research question
- What should be in that list?
45. Recommender Systems in e-Commerce
♣ Another question, both in research and practice
- How do we know that these are good recommendations?
46. Recommender Systems in e-Commerce
♣ This might lead to …
- What is a good recommendation?
- What is a good recommendation strategy?
- What is a good recommendation strategy for my business?
[Image caption: "These have been in stock for quite a while now …"]
47. What is a good recommendation? What are the measures in practice?
♣ Total sales numbers
♣ Promotion of certain items
♣ …
♣ Click-through rates
♣ Interactivity on the platform
♣ …
♣ Customer return rates
♣ Customer satisfaction and loyalty