Bootstrapping Recommendations with Neo4j

Big Data TechCon

About Me
• Max De Marzi - Neo4j Field Engineer
• My Blog: http://maxdemarzi.com
• Find me on Twitter: @maxdemarzi
• Email me: maxdemarzi@gmail.com
• GitHub: http://github.com/maxdemarzi

Big Data - What is it good for?
• Absolutely Nothing!
• Benchmarks
  Is this performing better than that? Yes, why? Uh.
• Recommendations
  You should buy this right now.
• Predictions
  You will probably buy this.

Top 10 Recommendations
• Popularity
  The naive approach
  One size fits most
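A minimal sketch of the popularity approach, assuming purchase events arrive as (user, item) pairs. The function name and the sample data are invented for illustration and are not from the talk.

```python
from collections import Counter

def top_n_popular(purchases, n=10):
    """Return the n most frequently purchased items, ignoring who bought them."""
    counts = Counter(item for _user, item in purchases)
    return [item for item, _count in counts.most_common(n)]

# Hypothetical purchase log: every user gets the same list back.
purchases = [
    ("ann", "toy story"), ("bob", "toy story"), ("cat", "toy story"),
    ("ann", "titanic"), ("bob", "titanic"),
    ("cat", "terminator"),
]
top_items = top_n_popular(purchases, n=2)
print(top_items)
```

Because the user is never consulted, everyone receives the same list, which is the "one size fits most" limitation.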

Naive Approach
I’m getting little Timmy some “Cards Against Humanity”

Content Based Recommendations
• Step 1: Collect Item Characteristics
• Step 2: Find similar Items
• Step 3: Recommend Similar Items
• Example: Similar Movie Genres
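The three steps above can be sketched in a few lines. This version assumes each movie is described by a set of genres and uses Jaccard overlap as the similarity measure. Both the measure and the sample catalog ("Movie A" to "Movie C" and all genre assignments) are assumptions made for illustration.

```python
def jaccard(a, b):
    """Share of genres two items have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def similar_items(target, catalog, k=3):
    """Steps 2 and 3: rank the other items by genre overlap with the target."""
    scored = [
        (jaccard(catalog[target], genres), title)
        for title, genres in catalog.items()
        if title != target
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for score, title in scored[:k] if score > 0]

# Step 1: item characteristics (hypothetical data).
catalog = {
    "Toy Story": {"Animation", "Comedy", "Family"},
    "Movie A": {"Animation", "Comedy", "Family"},
    "Movie B": {"Comedy"},
    "Movie C": {"Drama", "Romance"},
}
recs = similar_items("Toy Story", catalog, k=2)
print(recs)
```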

There is more to life than Romantic Zombie-coms

Collaborative Filtering Recommendations
• Step 1: Collect User Behavior
• Step 2: Find similar Users
• Step 3: Recommend Behavior taken by similar users
• Example: People with similar musical tastes
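A rough sketch of those three steps, assuming behavior is recorded as a set of liked genres per user and that "similar" simply means "shares the most likes". The user names, the data, and the overlap measure are all illustrative assumptions. The talk itself moves on to cosine similarity over ratings.

```python
from collections import Counter

def recommend(user, likes, k=2):
    """Recommend items liked by the k users whose likes overlap most with `user`."""
    mine = likes[user]
    # Step 2: rank the other users by how many likes they share with `user`.
    neighbors = sorted(
        (other for other in likes if other != user and mine & likes[other]),
        key=lambda other: (-len(mine & likes[other]), other),
    )[:k]
    # Step 3: count what the neighbors like that `user` has not tried yet.
    votes = Counter(item for other in neighbors for item in likes[other] - mine)
    return [item for item, _ in sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))]

# Step 1: collected behavior (hypothetical data).
likes = {
    "ann": {"jazz", "blues", "soul"},
    "bob": {"jazz", "blues", "funk"},
    "cat": {"jazz", "funk", "disco"},
    "dan": {"metal"},
}
recs = recommend("ann", likes)
print(recs)
```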

You are so original!

Using Relationships for Recommendations
Content-based filtering
Recommend items based on what users have liked in the past
Collaborative filtering
Predict what users like based on the similarity of their behaviors, activities and preferences to others
[Diagram: (Person)-[:RATED {rating: 7}]->(Movie) and (Person)-[:SIMILARITY {value: .92}]-(Person)]

Hybrid Recommendations
• Combine the two for better results
• Like Peanut Butter and Jelly

Benefits of Real-Time Recommendations
Online Retail
• Suggest related products and services
• Increase revenue and engagement
Media and Broadcasting
• Create an engaging experience
• Produce personalized content and offers
Logistics
• Recommend optimal routes
• Increase network efficiency

Challenges for Real-Time Recommendations
Make effective real-time recommendations
• Timing is everything in point-of-touch applications
• Base recommendations on current data, not last night’s batch load
Process large amounts of data and relationships for context
• Relevance is king: Make the right connections
• Drive traffic: Get users to do more with your application
Accommodate new data and relationships continuously
• Systems get richer with new data and relationships
• Recommendations become more relevant

Relational vs. Graph Models
[Figure: the relational model uses Person, Movie and Ratings tables; the graph model links MAX by RATED relationships to Terminator, Toy Story and Titanic]

Cypher Query Language
    MATCH (:Person {name: "Dan"})-[:KNOWS]->(:Person {name: "Ann"})
[Diagram: node Dan -KNOWS-> node Ann; in each node pattern, Person is the label and name is the property]

Express Complex Queries Easily with Cypher
Find all direct reports and how many people they manage, up to 3 levels down
Cypher Query
    MATCH (boss)-[:MANAGES*0..3]->(sub),
          (sub)-[:MANAGES*1..3]->(report)
    WHERE boss.name = "John Doe"
    RETURN sub.name AS Subordinate,
           count(report) AS Total
[SQL Query: shown on the slide for comparison, not captured in this transcript]

Hello World Recommendation

Cypher Query: Movie Recommendation
    MATCH (watched:Movie {title: "Toy Story"})<-[r1:RATED]-()-[r2:RATED]->(unseen:Movie)
    WHERE r1.rating > 7 AND r2.rating > 7
      AND watched.genres = unseen.genres
      AND NOT( (:Person {username: "maxdemarzi"})-[:RATED|WATCHED]->(unseen) )
    RETURN unseen.title, COUNT(*)
    ORDER BY COUNT(*) DESC
    LIMIT 25
What are the Top 25 Movies
• that I haven't seen
• with the same genres as Toy Story
• given high ratings
• by people who liked Toy Story

Let’s try k-nearest neighbors (k-NN)
Cosine Similarity
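The slide's formula did not survive in this transcript. For reference, the standard definition of cosine similarity is given below, where x_i and y_i are assumed to be the ratings two users gave to movie i, summed over the movies both have rated.

```latex
\[
\operatorname{sim}(x, y) = \cos\theta
  = \frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert}
  = \frac{\sum_{i} x_i \, y_i}{\sqrt{\sum_{i} x_i^{2}} \; \sqrt{\sum_{i} y_i^{2}}}
\]
```

With non-negative ratings the value lies between 0 and 1, and 1 means the two users rated their shared movies in exactly the same proportions.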

Cypher Query: Ratings of Two Users
    MATCH (p1:Person {name: 'Michael Sherman'})-[r1:RATED]->(m:Movie),
          (p2:Person {name: 'Michael Hunger'})-[r2:RATED]->(m:Movie)
    RETURN m.name AS Movie,
           r1.rating AS `M. Sherman's Rating`,
           r2.rating AS `M. Hunger's Rating`
What are the Movies these 2 users have both rated

Cypher Query: Ratings of Two Users
Calculating Cosine Similarity

Cypher Query: Cosine Similarity
    MATCH (p1:Person)-[x:RATED]->(m:Movie)<-[y:RATED]-(p2:Person)
    WITH SUM(x.rating * y.rating) AS xyDotProduct,
         SQRT(REDUCE(xDot = 0.0, a IN COLLECT(x.rating) | xDot + a^2)) AS xLength,
         SQRT(REDUCE(yDot = 0.0, b IN COLLECT(y.rating) | yDot + b^2)) AS yLength,
         p1, p2
    MERGE (p1)-[s:SIMILARITY]-(p2)
    SET s.similarity = xyDotProduct / (xLength * yLength)
Calculate it for all Person nodes with at least one Movie between them
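To make the arithmetic in the query easier to follow, here is a small Python sketch of the same calculation for a single pair of users. Like the query, it only looks at movies both users rated. The function name and the ratings are invented for illustration, and it assumes ratings are positive so that neither vector length is zero.

```python
import math

def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity over the movies both users rated, or None if they share none."""
    shared = set(ratings_a) & set(ratings_b)
    if not shared:
        return None  # the Cypher MATCH produces no row for such a pair
    dot = sum(ratings_a[m] * ratings_b[m] for m in shared)       # xyDotProduct
    len_a = math.sqrt(sum(ratings_a[m] ** 2 for m in shared))    # xLength
    len_b = math.sqrt(sum(ratings_b[m] ** 2 for m in shared))    # yLength
    return dot / (len_a * len_b)

# Hypothetical ratings: the two users agree exactly on the movies they share.
user_a = {"Toy Story": 8, "Titanic": 6, "Terminator": 9}
user_b = {"Toy Story": 8, "Titanic": 6}
sim = cosine_similarity(user_a, user_b)
print(sim)
```

Note that "Terminator" does not affect the result, because only one of the two users rated it.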

Cypher Query: Your nearest neighbors
    MATCH (p1:Person {name: 'Grace Andrews'})-[s:SIMILARITY]-(p2:Person)
    WITH p2, s.similarity AS sim
    ORDER BY sim DESC
    LIMIT 5
    RETURN p2.name AS Neighbor, sim AS Similarity
Who are the
• top 5 Persons and their similarity score
• ordered by similarity in descending order
• for Grace Andrews

Your nearest neighbors

Cypher Query: k-NN Recommendation
    MATCH (m:Movie)<-[r:RATED]-(b:Person)-[s:SIMILARITY]-(p:Person {name: 'Zoltan Varju'})
    WHERE NOT( (p)-[:RATED]->(m) )
    WITH m, s.similarity AS similarity, r.rating AS rating
    ORDER BY m.name, similarity DESC
    WITH m.name AS movie, COLLECT(rating)[0..3] AS ratings
    WITH movie, REDUCE(s = 0, i IN ratings | s + i)*1.0 / LENGTH(ratings) AS recommendation
    ORDER BY recommendation DESC
    RETURN movie, recommendation
    LIMIT 25
What are the Top 25 Movies
• that Zoltan Varju has not seen
• using the average rating
• by his top 3 neighbors
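The same k-NN logic, sketched in Python over plain dictionaries so each step of the query can be traced by hand. The data structures, the function name, and every rating and similarity value here are assumptions for illustration. As in the query, "top 3 neighbors" means the three most similar people who actually rated that movie.

```python
def knn_recommend(target, ratings, similarity, k=3, limit=25):
    """Score each unseen movie by the mean rating of its k most similar raters."""
    seen = set(ratings[target])
    candidates = {}  # movie -> list of (similarity, rating) from neighbors
    for neighbor, sim in similarity[target].items():
        for movie, rating in ratings.get(neighbor, {}).items():
            if movie not in seen:
                candidates.setdefault(movie, []).append((sim, rating))
    scored = []
    for movie, pairs in candidates.items():
        pairs.sort(key=lambda p: -p[0])             # most similar neighbors first
        top = [rating for _sim, rating in pairs[:k]]
        scored.append((sum(top) / len(top), movie))
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [(movie, score) for score, movie in scored[:limit]]

# Hypothetical data: n4 is too dissimilar to make the top 3 for "Titanic".
ratings = {
    "zoltan": {"Toy Story": 9},
    "n1": {"Toy Story": 8, "Titanic": 6, "Terminator": 10},
    "n2": {"Titanic": 8, "Terminator": 9},
    "n3": {"Titanic": 10},
    "n4": {"Titanic": 1},
}
similarity = {"zoltan": {"n1": 0.9, "n2": 0.8, "n3": 0.7, "n4": 0.1}}
recs = knn_recommend("zoltan", ratings, similarity)
print(recs)
```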

Recommendations over Searching/Browsing

Recommend Jobs to Job Seekers
What connects them?
• location
• skills
• education
• experience

Cypher Query: Job Recommendation
What are the Top 10 Jobs for me
• that are in the same location I’m in
• for which I have the necessary qualifications
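The slide's query is not part of this transcript, so the following is only a guess at the logic, written in Python. It assumes a job lists a location and a set of required skills, and it ranks jobs in the candidate's location by the share of requirements the candidate meets. All field names, job titles and data are hypothetical.

```python
def recommend_jobs(candidate, jobs, limit=10):
    """Rank same-location jobs by the fraction of required skills the candidate has."""
    matches = []
    for job in jobs:
        if job["location"] != candidate["location"] or not job["requires"]:
            continue
        have = job["requires"] & candidate["skills"]
        score = len(have) / len(job["requires"])
        if score > 0:
            matches.append((score, job["title"]))
    matches.sort(key=lambda m: (-m[0], m[1]))
    return [(title, score) for score, title in matches[:limit]]

# Hypothetical candidate and job postings.
candidate = {"location": "Chicago", "skills": {"java", "neo4j", "sql"}}
jobs = [
    {"title": "Graph Developer", "location": "Chicago", "requires": {"java", "neo4j"}},
    {"title": "Backend Engineer", "location": "Chicago", "requires": {"java", "scala"}},
    {"title": "Java Developer", "location": "Boston", "requires": {"java"}},
    {"title": "Mainframe Analyst", "location": "Chicago", "requires": {"cobol"}},
]
job_recs = recommend_jobs(candidate, jobs)
print(job_recs)
```

A score of 1.0 corresponds to a 100% match on the listed requirements.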

Job Recommendation Results
Perfect Candidate for 100% matches
• missing qualifications can be added quickly
• might encourage exaggerated resumes

Just one tiny itsy bitsy problem
Job Boards get paid by
• Number of Applicants to a Job
• Wholesale Resume sales
• Selling your data

Recommend Love
Find your soulmate in the graph
• Are they energetic?
• Do they like dogs?
• Have a good sense of humor?
• Neat and tidy, but not crazy about it?
What are the Top 10 Potential Mates for me
• that are in the same location
• are sexually compatible
• have traits I want
• want traits I have
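The four criteria above describe a two-way match. The sketch below is one possible reading in Python, not the query from the slide, which is not in this transcript. It assumes each person record carries a location, a gender, the genders they are seeking, the traits they have and the traits they want. All field names and sample people are hypothetical.

```python
def recommend_mates(me, people, limit=10):
    """Rank people who match in both directions by the number of matched traits."""
    results = []
    for other in people:
        if other["name"] == me["name"] or other["location"] != me["location"]:
            continue
        compatible = other["gender"] in me["seeking"] and me["gender"] in other["seeking"]
        if not compatible:
            continue
        they_have = me["wants"] & other["has"]   # traits I want that they have
        i_have = other["wants"] & me["has"]      # traits they want that I have
        if they_have and i_have:
            results.append((len(they_have) + len(i_have), other["name"]))
    results.sort(key=lambda r: (-r[0], r[1]))
    return [name for _score, name in results[:limit]]

me = {"name": "max", "location": "Chicago", "gender": "m", "seeking": {"f"},
      "has": {"tidy", "funny"}, "wants": {"energetic", "likes dogs"}}
people = [
    {"name": "ann", "location": "Chicago", "gender": "f", "seeking": {"m"},
     "has": {"energetic", "likes dogs"}, "wants": {"funny"}},
    {"name": "bea", "location": "Chicago", "gender": "f", "seeking": {"m"},
     "has": {"energetic"}, "wants": {"tidy"}},
    {"name": "cyd", "location": "Boston", "gender": "f", "seeking": {"m"},
     "has": {"energetic"}, "wants": {"funny"}},
    {"name": "dee", "location": "Chicago", "gender": "f", "seeking": {"m"},
     "has": {"energetic"}, "wants": {"rich"}},
]
mates = recommend_mates(me, people)
print(mates)
```

Requiring a match in both directions is what separates this from the job example, where only the candidate's qualifications were checked.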

Cypher Query: Love Recommendation

Love Recommendation Results

Linked Data
Connect to the Semantic Web
graphipedia
https://github.com/mirkonasato/graphipedia
neo4j-dbpedia-importer
https://github.com/kbastani/neo4j-dbpedia-importer

Named Entity Recognition
Automatically find
• names of people
• places and locations
• products
• and organizations

Hacker News for Example
• What are the kids in Silicon Valley talking about?

Let’s find out
• They have an API!
• Get some data:
  Stories
  Users
  Authors
  Commenters

Hacker News Recommendations
• Which stories should I read?
• Which users should I follow?
• What else should I be interested in?
• Who seems to know a lot about X?
• Etc.

GraphAware Recommendation Framework
• Ability to trade off recommendation quality for speed
• Ability to pre-compute recommendations
• Built-in algorithms and functions
• Ability to measure recommendation quality
• Ability to easily run in A/B test environments

Real-Time Recommendations with Neo4j
Social Recommendations
Products and Services
Content
Routing

Walmart
BUSINESS CASE
World’s largest company by revenue
World’s largest retailer and private employer
SF-based global e-commerce division manages several websites
Founded in 1969 in Bentonville, Arkansas
• Needed online customer recommendations to keep pace with competition
• Data connections provided predictive context, but were not in a usable format
• Solution had to serve many millions of customers and products while maintaining superior scalability and performance

Walmart
SOLUTION
• Brings customers, preferences, purchases, products and locations into a graph model
• Uses connections to make product recommendations
• Solution deployed across Walmart divisions and websites

Global Courier
BUSINESS CASE
World’s largest courier
480,000 employees
€55 billion in revenue
Needed new B2C and B2B parcel routing system for its logistics practice
Legacy system supported neither the full network nor the shift to online demands
Needed to replace aging B2B and B2C parcel routing system whose requirements include:
• 24x7 availability
• Peak loads of 5M parcels per day, 3K per second
• Support for complex and diverse software stack
• Predictable performance with linear scalability
• Daily changes to logistics networks
• Route from any point to any point
• Single point of truth for entire network

Global Courier
SOLUTION
Neo4j provides the ideal domain fit since a logistics network is a graph
• High availability and performance via Neo4j clustering
• Greatly simplified Cypher queries for routing versus relational SQL queries
• Flexible data model that reflects the real logistics world far better than relational
• Easy-to-grasp whiteboard-friendly model

eBay
BUSINESS CASE
C2C and B2C retail network
Full e-commerce functionality for individuals and businesses
Integrated with logistics vendors for product deliveries
• Needed an offering to compete with Amazon Prime
• Enable customer-selected delivery inside 90 minutes
• Calculate best route option in real-time
• Scale to enable a variety of services
• Offer more predictable delivery times

eBay Now
SOLUTION
• Acquired UK-based Shutl, a leader in same-day delivery
• Used Neo4j to create eBay Now
• 1000 times faster than the prior MySQL-based solution
• Faster time-to-market
• Improved code quality with 10 to 100 times less query code

Classmates
BUSINESS CASE
Online yearbook connecting friends from school, work and military in US and Canada
Founded as Memory Lane in Seattle
Develop new social networking capabilities to monetize yearbook-related offerings
• Show all the people I know in a yearbook
• Show yearbooks my friends appear in most often
• Show sections of a yearbook that my friends appear most in
• Show me other schools my friends attended

Classmates
SOLUTION
Neo4j provides a robust and scalable graph database solution
• 3-instance cluster with cache sharding and disaster-recovery
• 18ms response time for top 4 queries
• 100M nodes and 600M relationships in initial graph, including people, images, schools, yearbooks and pages
• Projected to grow to 1B nodes and 6B relationships

National Geographic
BUSINESS CASE
Non-profit scientific and educational institution founded in 1888
Covers geography, archaeology, natural science, environment and historical conservation
Journals, online media, radio, TV, documentaries, live events and consumer content and goods
• Improve poor performance of PostgreSQL app
• Increase user engagement by linking to 100+ years of multimedia content
• Improve targeting by understanding subscribers’ interests better
• Recommend content and services to users based on their interests

National Geographic
SOLUTION
• Enabled complex real-time analytics across eight million users and a century of content
• Delivered robust performance by eliminating triple-nested SQL joins
• Cross-refers users among content, live events, travel, goods and causes
• Neo4j solution much less cumbersome and easier to maintain than previous SQL system

Curaspan
BUSINESS CASE
Leader in patient management for discharges and referrals
Manages patient referrals for 4600+ health care facilities
Connects providers and payers via web-based patient management platform
Founded in 1999 in Newton, Massachusetts
• Improve poor performance of Oracle solution
• Support more complexity including granular, role-based access control
• Satisfy complex Graph Search queries by discharge nurses and intake coordinators
Find a skilled nursing facility within n miles of a given location, belonging to health care group XYZ, offering speech therapy and cardiac care, and optionally Italian language services

Curaspan
SOLUTION
• Met fast, real-time performance demands
• Supported queries that span multiple hierarchies including provider and employee-permissions graphs
• Improved data model to handle adding more dimensions to the data such as insurance networks, service areas and care organizations
• Greatly simplified queries, reducing multi-page SQL statements to one Neo4j function

FiftyThree
BUSINESS CASE
Maker of Paper, one of the top apps in Apple’s App Store, with millions of users
Based in New York City
• Add social capabilities to digital-paper app
• Support social collaboration across millions of users in new Mix app
• Enable seamless interaction between social and content-asset networks
• Ensure new apps are robust, scalable and fast

FiftyThree
SOLUTION
• Neo4j data model ideal for social network, content management and access control
• Users create, publish and share designs simply
• Easy to develop and evolve Neo4j-based app
• Integrates well with FiftyThree EC2 architecture
See the Neo4j solution in action
Betting the Company (Literally) on a Graph Database
http://aseemk.com/talks/neo4j-lessons-learned#/
App Store Editor’s Choice
2012 iPad App of the Year
Apple Best Apps of 2014

Questions
• How does Neo4j fit into my existing infrastructure?
  As a Service.
• Will Neo4j scale?
  Yes.
