Neo4j - Graph database for recommendations
30.5.2013 Jakub Kříž, Ondrej Proksa
Summary
 Graph databases
 Working with Neo4j and Ruby (On Rails)
 Plugins and algorithms – live demos
Document similarity
Movie recommendation
Recommendation from subgraph
 TeleVido.tv
Why Graphs?
 Graphs are everywhere!
Natural way to model almost everything
“Whiteboard friendly”
Even the internet is a graph
Why Graph Databases?
 Relational databases are not so great for storing graph structures
Unnatural m:n relations
Expensive joins
Expensive lookups during graph traversals
 Graph databases fix this
Efficient storage
Direct pointers = no joins
Neo4j
 The World's Leading Graph Database
 www.neo4j.org
 NoSQL database
 Open source - github.com/neo4j
 ACID
 Brief history
Official v1.0 – 2010
Current version 1.9
2.0 coming soon
Querying Neo4j
 Query languages
Structurally similar to SQL
Based on graph traversal
 Most often used
Gremlin – generic graph query language
Cypher – graph query language for Neo4j
SPARQL – generic query language for data in RDF format
Cypher Example
CREATE (n {name: {value}})
CREATE (n)-[r:KNOWS]->(m)
START
[MATCH]
[WHERE]
RETURN [ORDER BY] [SKIP] [LIMIT]
Cypher Example (2)
 Friend of a friend
START n=node(0)
MATCH (n)--()--(f)
RETURN f
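To make the pattern concrete, here is a minimal plain-Ruby sketch of what the two-hop match does, assuming an in-memory adjacency list instead of a real Neo4j store. The sample data and the name `friends_of_friends` are hypothetical. Cypher returns one row per matching path, so a node reachable over two paths would appear twice there; this sketch deduplicates.

```ruby
require "set"

# Undirected "knows" graph as adjacency lists (hypothetical sample data).
FRIENDS = {
  "Ann" => ["Bob", "Cid"],
  "Bob" => ["Ann", "Dan"],
  "Cid" => ["Ann", "Dan", "Eve"],
  "Dan" => ["Bob", "Cid"],
  "Eve" => ["Cid"]
}.freeze

# Follows two hops from the start node, like MATCH (n)--()--(f).
# The start node itself is skipped, because a path may not reuse the
# relationship it arrived on.
def friends_of_friends(graph, start)
  found = Set.new
  graph.fetch(start, []).each do |friend|
    graph.fetch(friend, []).each do |fof|
      found << fof unless fof == start
    end
  end
  found.to_a.sort
end

p friends_of_friends(FRIENDS, "Ann")
```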
Working with Neo4j
 REST API => wrappers
Neography for Ruby
py2neo for Python
…
Your own wrapper
 Java API
Direct access in JVM based applications
neo4j.rb
Neography – API wrapper example
# create nodes and properties
n1 = Neography::Node.create("age" => 31, "name" => "Max")
n2 = Neography::Node.create("age" => 33, "name" => "Roel")
n1.weight = 190
# create relationships
new_rel = Neography::Relationship.create(:coding_buddies, n1, n2)
n1.outgoing(:coding_buddies) << n2
# get nodes related by outgoing friends relationship
n1.outgoing(:friends)
# get n1 and nodes related by friends and friends of friends
n1.outgoing(:friends).depth(2).include_start_node
Neo4j.rb – JRuby gem example
class Person < Neo4j::Rails::Model
  property :name
  property :age, :index => :exact # :fulltext
  has_n(:friends).to(Person).relationship(Friend)
end
class Friend < Neo4j::Rails::Relationship
  property :as
end
mike = Person.new(:name => 'Mike', :age => 24)
john = Person.new(:name => 'John', :age => 27)
mike.friends << john
mike.save
Our Approach
 Relational databases are not so bad
Good for basic data storage
Widely used for web applications
Well supported in Rails via ActiveRecord
Performance issues with Neo4j
 However, we need a graph database
We model the domain as a graph
Our recommendation is based on graph traversal
Our Approach (2)
 Hybrid model using both MySQL and Neo4j
 MySQL contains basic information about entities
 Neo4j contains only relationships
 Paired via identifiers (neo4j_id)
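The pairing can be sketched in plain Ruby, with in-memory structures standing in for both stores. Only the `neo4j_id` column comes from the slide; the `Movie` struct, the sample rows and `related_movies` are hypothetical.

```ruby
# Stand-in for a MySQL row; neo4j_id is the pairing identifier.
Movie = Struct.new(:id, :title, :year, :neo4j_id)

# Hypothetical sample rows, indexed by neo4j_id for fast pairing.
MOVIES = [
  Movie.new(1, "The Matrix", 1999, 101),
  Movie.new(2, "Speed", 1994, 102),
  Movie.new(3, "Inception", 2010, 103)
].freeze
MOVIES_BY_NEO4J_ID = MOVIES.to_h { |movie| [movie.neo4j_id, movie] }.freeze

# Stand-in for the graph side: node ids and relationships only.
RELATED = { 101 => [103, 102], 102 => [101], 103 => [101] }.freeze

# Asks the "graph" for related node ids, then loads the full records
# from the "relational" side through neo4j_id.
def related_movies(neo4j_id)
  RELATED.fetch(neo4j_id, []).map { |node_id| MOVIES_BY_NEO4J_ID[node_id] }.compact
end

p related_movies(101).map(&:title)
```

The graph side never needs titles or years, and the relational side never needs to know how entities are connected.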
Our Approach (3)
 Recommendation algorithms
Made as plugins to Neo4j
Written in Java
Embedded into Neo4j API
 Rails application uses a custom-made wrapper
Creates and modifies nodes and relationships via API calls
Handles recommendation requests
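The authors' wrapper is not shown, so the following is only a sketch of what a thin REST wrapper might look like. It assumes the Cypher endpoint of the Neo4j 1.x REST API (`/db/data/cypher`) on the default local port. The class name `GraphClient` and its methods are hypothetical, and calls to the custom recommendation plugins are left out because their endpoints are not given.

```ruby
require "json"
require "net/http"
require "uri"

# Minimal REST wrapper sketch. Class name and default URL are assumptions.
class GraphClient
  def initialize(base_url = "http://localhost:7474/db/data")
    @base = URI(base_url)
  end

  # Builds (but does not send) a POST request carrying a Cypher query.
  def cypher_request(query, params = {})
    request = Net::HTTP::Post.new("#{@base.path}/cypher",
                                  "Content-Type" => "application/json",
                                  "Accept" => "application/json")
    request.body = JSON.generate("query" => query, "params" => params)
    request
  end

  # Sends a prepared request to the server and parses the JSON response.
  def execute(request)
    response = Net::HTTP.start(@base.host, @base.port) { |http| http.request(request) }
    JSON.parse(response.body)
  end
end

client = GraphClient.new
request = client.cypher_request("START n=node({id}) MATCH (n)--()--(f) RETURN f", { "id" => 0 })
puts request.path
puts request.body
```

Building the request separately from sending it keeps the wrapper easy to test without a running server.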
Graph Algorithms
 Built-in algorithms
Shortest path
All shortest paths
Dijkstra’s algorithm
 Custom algorithms
Depth first search
Breadth first search
Spreading activation
Flows, pairing, etc.
Document Similarity
 Task: find similarities between documents
 Documents data model:
Each document is made of sentences
Each sentence can be divided into n-grams
N-grams are connected with relationships
Example sentence: "Neo4j is a graph database in Java"
(Neo4j, graph) – (graph, database) – (database, Java)
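The bigram split shown above can be sketched as follows. The stop-word list is an assumption: the slide only shows that short function words such as "is" and "in" are absent from the result. The name `ngrams` is hypothetical.

```ruby
# Words ignored when building n-grams (assumed list, not from the slides).
STOP_WORDS = %w[a an the is in of].freeze

# Splits a sentence into words, drops stop words and returns the
# overlapping n-grams (bigrams by default).
def ngrams(sentence, size = 2)
  words = sentence.scan(/[[:alnum:]]+/).reject { |word| STOP_WORDS.include?(word.downcase) }
  words.each_cons(size).to_a
end

p ngrams("Neo4j is graph database in Java")
```

Consecutive n-grams share a word, which is what lets them be chained by relationships in the graph.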
Document Similarity (2)
 Detecting similar documents in our graph model
Shortest path between documents
Number of paths shorter than some distance
Weighting relationships
 How about a custom plugin?
Spreading activation
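As an illustration of the shortest-path idea, here is a breadth-first search over a tiny hypothetical graph in which document nodes link directly to the n-gram nodes they contain. The layout is a simplification of the model described above, and the node names and helper functions are assumptions. A shorter path suggests more closely related documents.

```ruby
# Hypothetical edges: document nodes linked to the n-grams they contain.
EDGES = [
  ["doc:A", "graph database"], ["doc:A", "database Java"],
  ["doc:B", "graph database"], ["doc:B", "query language"],
  ["doc:C", "query language"]
].freeze

# Builds undirected adjacency lists from the edge list.
def adjacency(edges)
  edges.each_with_object(Hash.new { |hash, key| hash[key] = [] }) do |(a, b), graph|
    graph[a] << b
    graph[b] << a
  end
end

# Breadth-first search; returns the number of relationships on the
# shortest path, or nil when the nodes are not connected.
def shortest_path_length(graph, from, to)
  distance = { from => 0 }
  queue = [from]
  until queue.empty?
    node = queue.shift
    return distance[node] if node == to
    graph.fetch(node, []).each do |neighbour|
      next if distance.key?(neighbour)
      distance[neighbour] = distance[node] + 1
      queue << neighbour
    end
  end
  nil
end

graph = adjacency(EDGES)
puts shortest_path_length(graph, "doc:A", "doc:B")
puts shortest_path_length(graph, "doc:A", "doc:C")
```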
Document Similarity (3)
Live Demo…
Movie Recommendation
 Task: recommend movies based on what we like
 We like some entities, let’s call them initial nodes
Movies
People (actors, directors etc.)
Genres
 We want to recommend nodes based on this input
 Find nodes which are
The closest to initial nodes
The most relevant to initial nodes
Movie Recommendation
 165k nodes
Movies
People
Genres
 870k relationships
Movies – People
Movies – Genres
 Easy to add more entities
Tags, mood, period, etc.
 Will it be fast? We need 1-2 seconds
Movie Recommendation (2)
Movie Recommendation (3)
 Breadth first search
Union Colors
Mixing Colors
 Modified Dijkstra
Weighted relationships between entities
 Spreading activation (energy)
Each initial node gets same starting energy
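The slides do not give the exact spreading rule, so the following is a sketch of one common variant, not the authors' plugin. It assumes each node passes a decayed share of its energy equally to all neighbours for a fixed number of steps, and nodes are ranked by total energy received. The graph data, the `decay` and `steps` parameters and the function name are hypothetical.

```ruby
# Undirected graph as adjacency lists (hypothetical sample data).
GRAPH = {
  "The Matrix"   => ["Keanu Reeves", "Sci-Fi"],
  "Speed"        => ["Keanu Reeves", "Action"],
  "Inception"    => ["Sci-Fi"],
  "Keanu Reeves" => ["The Matrix", "Speed"],
  "Sci-Fi"       => ["The Matrix", "Inception"],
  "Action"       => ["Speed"]
}.freeze

# Every initial node starts with the same energy. At each step a node
# sends energy * decay, split equally, to its neighbours. Returns
# [node, received energy] pairs, highest first, without the initial nodes.
def spread_activation(graph, initial, energy: 100.0, decay: 0.5, steps: 2)
  received = Hash.new(0.0)
  frontier = initial.to_h { |node| [node, energy] }
  steps.times do
    next_frontier = Hash.new(0.0)
    frontier.each do |node, node_energy|
      neighbours = graph.fetch(node, [])
      next if neighbours.empty?
      share = node_energy * decay / neighbours.size
      neighbours.each do |neighbour|
        next_frontier[neighbour] += share
        received[neighbour] += share
      end
    end
    frontier = next_frontier
  end
  received.reject { |node, _| initial.include?(node) }
          .sort_by { |node, total| [-total, node] }
end

spread_activation(GRAPH, ["The Matrix"]).each do |node, total|
  puts format("%-14s %.2f", node, total)
end
```

Nodes reachable from several initial nodes accumulate energy from each of them, which is what favours nodes relevant to everything the user likes.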
Recommendation Algorithms
Union Colors
Mixing Colors
Spreading Activation (Energy)
[Figure sequence, five steps: energy spreading step by step from the initial nodes (100.0 each) to neighbouring nodes]
Recommendation - Evaluation
 Experimental evaluation
Which algorithm is the best (rating on scale 1-5)
30 users / 168 scenarios
[Bar chart: average user rating (axis 0 to 3.5) for Union Colors, Mixing Colors, Spreading Energy and Dijkstra]
Live Demo…
Movie Recommendation (4)
Movie Recommendation – User Model
 Spreading energy
Each initial node gets different starting energy
Based on user’s interests and feedback
 Improves the recommendation!
Recommendation from subgraph
 Recommend movies which are currently in cinemas
 Recommend movies which are currently on TV
 How?
Algorithm will traverse normally
Creates a subgraph from which it returns nodes
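One way to read the steps above is: traverse the full graph as usual, then return only nodes that belong to the subgraph of interest. The sketch below uses a plain breadth-first traversal as a stand-in for the actual recommendation algorithm. The sample data, the `IN_CINEMAS` set and the function name are hypothetical.

```ruby
require "set"

# Undirected graph as adjacency lists (hypothetical sample data).
GRAPH = {
  "The Matrix"     => ["Keanu Reeves", "Sci-Fi"],
  "Keanu Reeves"   => ["The Matrix", "Man of Tai Chi"],
  "Sci-Fi"         => ["The Matrix", "Inception", "Oblivion"],
  "Man of Tai Chi" => ["Keanu Reeves"],
  "Inception"      => ["Sci-Fi"],
  "Oblivion"       => ["Sci-Fi"]
}.freeze

# Hypothetical subgraph: movies currently in cinemas.
IN_CINEMAS = Set["Oblivion", "Man of Tai Chi"].freeze

# Traverses the whole graph breadth-first from the liked nodes, then
# returns only nodes inside the allowed subgraph, nearest first.
def recommend_from_subgraph(graph, liked, allowed)
  distance = liked.to_h { |node| [node, 0] }
  queue = liked.dup
  until queue.empty?
    node = queue.shift
    graph.fetch(node, []).each do |neighbour|
      next if distance.key?(neighbour)
      distance[neighbour] = distance[node] + 1
      queue << neighbour
    end
  end
  distance.select { |node, _| allowed.include?(node) && !liked.include?(node) }
          .sort_by { |node, hops| [hops, node] }
          .map(&:first)
end

p recommend_from_subgraph(GRAPH, ["The Matrix"], IN_CINEMAS)
```

Because the traversal still uses the full graph, movies outside the subgraph (and people or genres) can act as bridges to the ones that are returned.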
Live Demo…
Recommendation from subgraph (2)
TeleVido.tv
 Media content recommendation using Neo4j
Movie recommendation
Recommendation of movies in cinemas
Recommendation of TV programs and schedules
Summary
 Graph databases
 Working with Neo4j and Ruby (On Rails)
 Plugins and algorithms
Document similarity
Movie recommendation
Recommendation from subgraph
 TeleVido.tv
