Neo4j - graph database for recommendations


Representing the relationships between entities as a graph is an increasingly common approach. Neo4j is a NoSQL graph database that allows fast and effective queries on connected data. You can also implement your own algorithms to extend the functionality of the built-in API. We use the graph database to model and recommend movies and other media content.

Published in: Technology

  1. Neo4j - Graph database for recommendations. Jakub Kříž, Ondrej Proksa, 30.5.2013
  2. Summary. Graph databases; working with Neo4j and Ruby (on Rails); plugins and algorithms with live demos: document similarity, movie recommendation, recommendation from subgraph.
  3. Why Graphs? Graphs are everywhere! They are a natural way to model almost everything and are "whiteboard friendly". Even the internet is a graph.
  4. Why Graph Databases? Relational databases are not so great for storing graph structures: unnatural m:n relations, expensive joins, and expensive lookups during graph traversals. Graph databases fix this: efficient storage, and direct pointers mean no joins.
  5. Neo4j. "The World's Leading Graph Database". NoSQL database; open source; ACID. Brief history: official v1.0 in 2010; current version 1.9; 2.0 coming soon.
  6. Querying Neo4j. Querying languages: structurally similar to SQL; based on graph traversal. Most often used: Gremlin (generic graph querying language), Cypher (graph querying language for Neo4j), SPARQL (generic querying language for data in RDF format).
  7. Cypher Example:
        CREATE (n {name: {value}})
        CREATE (n)-[r:KNOWS]->(m)
        START [MATCH] [WHERE] RETURN [ORDER BY] [SKIP] [LIMIT]
  8. Cypher Example (2). Friend of a friend:
        START n=node(0)
        MATCH (n)--()--(f)
        RETURN f
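To make the two-hop pattern in the query above concrete, here is a minimal in-memory sketch in plain Ruby. It does not talk to Neo4j; the `GRAPH` data and the `friends_of_friends` helper are hypothetical names introduced only for illustration, and skipping the start node is an assumption of this sketch rather than a statement about Cypher's exact matching semantics.

```ruby
require "set"

# Hypothetical undirected friendship graph stored as an adjacency list.
GRAPH = {
  "ann" => ["bob", "cid"],
  "bob" => ["ann", "dan"],
  "cid" => ["ann", "dan", "eve"],
  "dan" => ["bob", "cid"],
  "eve" => ["cid"]
}.freeze

# Follows two relationships from `start` and collects the nodes reached,
# skipping the start node itself (an assumption of this sketch).
def friends_of_friends(graph, start)
  result = Set.new
  graph.fetch(start, []).each do |friend|
    graph.fetch(friend, []).each do |fof|
      result << fof unless fof == start
    end
  end
  result.to_a.sort
end

p friends_of_friends(GRAPH, "ann")
```

The nested loop mirrors the `(n)--()--(f)` pattern: the outer loop is the anonymous middle node, the inner loop is `f`.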
  9. Working with Neo4j. REST API => wrappers: Neography for Ruby, py2neo for Python, …, your own wrapper. Java API: direct access in JVM-based applications; neo4j.rb.
  10. Neography – API wrapper example:
        # create nodes and properties
        n1 = Neography::Node.create("age" => 31, "name" => "Max")
        n2 = Neography::Node.create("age" => 33, "name" => "Roel")
        n1.weight = 190

        # create relationships
        new_rel = Neography::Relationship.create(:coding_buddies, n1, n2)
        n1.outgoing(:coding_buddies) << n2

        # get nodes related by outgoing friends relationship
        n1.outgoing(:friends)

        # get n1 and nodes related by friends and friends of friends
        n1.outgoing(:friends).depth(2).include_start_node
  11. Neo4j.rb – JRuby gem example:
        class Person < Neo4j::Rails::Model
          property :name
          property :age, :index => :exact # :fulltext
          has_n(:friends).to(Person).relationship(Friend)
        end

        class Friend < Neo4j::Rails::Relationship
          property :as
        end

        # the constructor calls and the final operand are assumed (Person.new, john)
        mike = Person.new(:name => 'Mike', :age => 24)
        john = Person.new(:name => 'John', :age => 27)
        mike.friends << john
  12. Our Approach. Relational databases are not so bad: good for basic data storage, widely used for web applications, and well supported in Rails via ActiveRecord. There are performance issues with Neo4j. However, we need a graph database: we model the domain as a graph, and our recommendation is based on graph traversal.
  13. Our Approach (2). Hybrid model using both MySQL and Neo4j: MySQL contains basic information about entities; Neo4j contains only relationships; the two are paired via identifiers (neo4j_id).
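The pairing described above can be sketched as follows. This is a toy illustration only: plain Ruby hashes stand in for both the MySQL table and the Neo4j store, and everything except the `neo4j_id` field (table contents, `GRAPH_EDGES`, `related_movies`) is a hypothetical name, not the authors' schema or wrapper.

```ruby
# Stand-in for a MySQL table: descriptive data plus the paired neo4j_id.
MOVIES_TABLE = {
  1 => { title: "Movie A", year: 1999, neo4j_id: 101 },
  2 => { title: "Movie B", year: 2005, neo4j_id: 102 },
  3 => { title: "Movie C", year: 2010, neo4j_id: 103 }
}.freeze

# Stand-in for the graph store: node ids and relationships only.
GRAPH_EDGES = { 101 => [102, 103], 102 => [101], 103 => [101] }.freeze

# Looks up the graph node for a relational row, asks the "graph" for its
# neighbours, then maps the neighbour ids back to relational rows.
def related_movies(movie_id)
  node_id = MOVIES_TABLE.fetch(movie_id)[:neo4j_id]
  neighbour_ids = GRAPH_EDGES.fetch(node_id, [])
  MOVIES_TABLE.values.select { |row| neighbour_ids.include?(row[:neo4j_id]) }
end

puts related_movies(1).map { |row| row[:title] }
```

The point of the split is that the graph side never needs titles or years, and the relational side never needs to traverse relationships.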
  14. Our Approach (3). Recommendation algorithms: made as plugins to Neo4j, written in Java, embedded into the Neo4j API. The Rails application uses a custom-made wrapper, which creates and modifies nodes and relationships via API calls and handles recommendation requests.
  15. Graph Algorithms. Built-in algorithms: shortest path, all shortest paths, Dijkstra's algorithm. Custom algorithms: depth-first search, breadth-first search, spreading activation, flows, pairing, etc.
  16. Document Similarity. Task: find similarities between documents. Document data model: each document is made of sentences; each sentence can be divided into n-grams; n-grams are connected with relationships. Example: "Neo4J is graph database in Java" becomes (Neo4j, graph) – (graph, database) – (database, Java).
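The n-gram step of the data model above might look roughly like this for bigrams. The stop-word list and the `bigrams` helper are assumptions made for this sketch; the slide's example suggests that words such as "is" and "in" are dropped, but it does not say how the authors tokenize or filter.

```ruby
# Assumed stop-word list; the slides do not specify one.
STOP_WORDS = %w[is in a an the of].freeze

# Splits a sentence into lowercase words, drops stop words, and pairs
# each remaining word with its successor.
def bigrams(sentence)
  words = sentence.downcase.scan(/[a-z0-9]+/).reject { |w| STOP_WORDS.include?(w) }
  words.each_cons(2).to_a
end

p bigrams("Neo4J is graph database in Java")

# Bigrams shared by two sentences hint at overlapping content.
p bigrams("Neo4J is graph database in Java") & bigrams("A graph database in Ruby")
```

Consecutive bigrams share a word, which is what lets them be chained with relationships in the graph.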
  17. Document Similarity (2)
  18. Document Similarity (3). Detecting similar documents in our graph model: shortest path between documents; number of paths shorter than some distance; weighting relationships. How about a custom plugin? Spreading activation.
  19. Document Similarity (4). Live demo…
  20. Movie Recommendation. Task: recommend movies based on what we like. We like some entities, let's call them initial: movies, people (actors, directors, etc.), genres. We want recommended nodes from the input: find the nodes which are the closest to the initial nodes and the most relevant to the initial nodes.
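One simple reading of "closest to the initial nodes" is hop distance, which a multi-source breadth-first search computes directly. The sketch below is an assumption-laden illustration in plain Ruby, not the authors' plugin: the `EDGES` data, the `movie:`/`person:`/`genre:` id prefixes, and the helper names are all hypothetical.

```ruby
# Hypothetical graph: movies linked to people and genres (undirected).
EDGES = [
  ["movie:Alien", "person:Scott"],
  ["movie:Alien", "genre:scifi"],
  ["movie:Blade Runner", "person:Scott"],
  ["movie:Blade Runner", "genre:scifi"],
  ["movie:Gladiator", "person:Scott"],
  ["movie:Gladiator", "genre:history"],
  ["movie:Troy", "genre:history"]
].freeze

# Builds an undirected adjacency list from edge pairs.
def adjacency(edges)
  adj = Hash.new { |hash, key| hash[key] = [] }
  edges.each do |a, b|
    adj[a] << b
    adj[b] << a
  end
  adj
end

# Multi-source BFS: each node's distance to its nearest initial node.
# Returns up to `limit` movie nodes, nearest first, excluding the input.
def closest_movies(adj, initial, limit)
  dist = {}
  queue = []
  initial.each do |node|
    dist[node] = 0
    queue << node
  end
  until queue.empty?
    node = queue.shift
    adj[node].each do |nb|
      next if dist.key?(nb)
      dist[nb] = dist[node] + 1
      queue << nb
    end
  end
  dist.select { |node, _| node.start_with?("movie:") && !initial.include?(node) }
      .sort_by { |node, d| [d, node] }
      .first(limit)
      .map(&:first)
end

p closest_movies(adjacency(EDGES), ["movie:Alien"], 3)
```

Hop distance alone ignores relevance, which is why the weighted and energy-based approaches are also considered.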
  21. Movie Recommendation (2). 165k nodes: movies, people, genres. 870k relationships: movies – people, movies – genres. Easy to add more entities: tags, mood, period, etc. Will it be fast? We need 1-2 seconds.
  22. Movie Recommendation (3)
  23. Recommendation Algorithms. Breadth-first search: Union Colors, Mixing Colors. Modified Dijkstra: weighted relationships between entities. Spreading activation (energy): each initial node gets the same starting energy.
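The slides do not say how the authors modified Dijkstra's algorithm, so the sketch below shows only the standard algorithm on weighted relationships, as a baseline for what "weighted" buys over plain breadth-first search. The `WEIGHTED` graph, its weights (lower meaning a stronger relationship), and the `dijkstra` helper are hypothetical.

```ruby
# Hypothetical weighted graph; lower weight = stronger relationship (assumed).
WEIGHTED = {
  "liked"   => { "actor" => 1.0, "genre" => 3.0 },
  "actor"   => { "liked" => 1.0, "movie_a" => 1.0, "movie_b" => 2.0 },
  "genre"   => { "liked" => 3.0, "movie_b" => 1.0, "movie_c" => 1.0 },
  "movie_a" => { "actor" => 1.0 },
  "movie_b" => { "actor" => 2.0, "genre" => 1.0 },
  "movie_c" => { "genre" => 1.0 }
}.freeze

# Standard Dijkstra with a linear scan for the next node
# (fine for a small example, too slow for a large graph).
def dijkstra(graph, source)
  dist = Hash.new(Float::INFINITY)
  dist[source] = 0.0
  unvisited = graph.keys
  until unvisited.empty?
    node = unvisited.min_by { |n| dist[n] }
    break if dist[node] == Float::INFINITY # the rest is unreachable
    unvisited.delete(node)
    graph[node].each do |nb, weight|
      candidate = dist[node] + weight
      dist[nb] = candidate if candidate < dist[nb]
    end
  end
  dist
end

distances = dijkstra(WEIGHTED, "liked")
ranked = distances.select { |node, _| node.start_with?("movie_") }.sort_by { |node, d| [d, node] }
p ranked
```

Here `movie_a` ranks first because it is reached through the strong actor relationship, even though all three movies are two hops away.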
  24. Union Colors
  25. Mixing Colors
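  26. Spreading Activation (Energy). [Figure: node energies 100.0, 100.0, 100.0, 100.0]
  27. Spreading Activation (Energy). [Figure: node energies 100.0, 100.0, 100.0, 100.]
  28. Spreading Activation (Energy). [Figure: node energies 0.0, 100.0, 100.0, 100.]
  29. Spreading Activation (Energy). [Figure: node energies 0.0, 0.0, 100.0, 100.; 8.0, 8.0]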
  30. Spreading Activation (Energy)
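The spreading-activation slides above can be approximated with a short sketch. The exact rule the authors use is not given, so this version makes explicit assumptions: every initial node starts with energy 100.0, each node passes a decayed share of its energy equally to all neighbours, and the process runs for a fixed number of steps. `ADJ`, `spread_activation`, and its `decay` and `steps` parameters are hypothetical.

```ruby
# Hypothetical undirected graph as an adjacency list.
ADJ = {
  "a" => ["x", "y"],
  "b" => ["y"],
  "x" => ["a"],
  "y" => ["a", "b", "z"],
  "z" => ["y"]
}.freeze

# Spreads energy outward from the initial nodes. In each step a node hands
# `decay` times its energy, split equally, to its neighbours (assumed rule).
# Returns [node, total received energy] pairs, highest first, without the
# initial nodes.
def spread_activation(adj, initial, start_energy: 100.0, decay: 0.5, steps: 2)
  received = Hash.new(0.0)
  frontier = initial.map { |node| [node, start_energy] }.to_h
  steps.times do
    next_frontier = Hash.new(0.0)
    frontier.each do |node, energy|
      neighbours = adj.fetch(node, [])
      next if neighbours.empty?
      share = energy * decay / neighbours.size
      neighbours.each do |nb|
        received[nb] += share
        next_frontier[nb] += share
      end
    end
    frontier = next_frontier
  end
  received.reject { |node, _| initial.include?(node) }
          .sort_by { |node, energy| [-energy, node] }
end

p spread_activation(ADJ, ["a", "b"])
```

Node `y` scores highest because it receives energy from both initial nodes, which is the intuition behind ranking by accumulated energy rather than by distance.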
  31. Recommendation - Evaluation. Experimental evaluation: which algorithm is the best (rated on a scale of 1-5); 30 users, 168 scenarios. [Bar chart: rating axis from 0 to 3.5; bars for Union Colors, Mixing Colors, Spreading Activation (energy), and Dijkstra]
  32. Movie Recommendation (4). Live demo…
  33. Movie Recommendation – User Model. Spreading energy: each initial node gets a different starting energy, based on the user's interests and feedback. This improves the recommendation!
  34. Recommendation from subgraph. Recommend movies which are currently in cinemas; recommend movies which are currently on TV. How? The algorithm traverses the graph normally and creates a subgraph from which it returns nodes.
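The idea above, traverse the whole graph but return only nodes from a chosen subgraph, can be sketched with a breadth-first traversal and a set of allowed nodes. This is an illustrative assumption about how the filtering might work, not the authors' plugin; `FULL_GRAPH`, `recommend_from_subgraph`, and the node names are hypothetical.

```ruby
require "set"

# Hypothetical full graph: the traversal may pass through any node.
FULL_GRAPH = {
  "liked_movie" => ["actor_1", "genre_1"],
  "actor_1"     => ["liked_movie", "old_movie", "new_movie"],
  "genre_1"     => ["liked_movie", "new_movie", "tv_movie"],
  "old_movie"   => ["actor_1"],
  "new_movie"   => ["actor_1", "genre_1"],
  "tv_movie"    => ["genre_1"]
}.freeze

# Breadth-first traversal over the full graph that only reports nodes
# belonging to the `allowed` subgraph, in the order they are reached.
def recommend_from_subgraph(graph, start, allowed)
  visited = Set[start]
  queue = [start]
  hits = []
  until queue.empty?
    node = queue.shift
    hits << node if allowed.include?(node) && node != start
    graph.fetch(node, []).each do |nb|
      queue << nb if visited.add?(nb) # add? returns nil when already seen
    end
  end
  hits
end

in_cinemas = Set["new_movie"]
on_tv = Set["tv_movie", "old_movie"]
p recommend_from_subgraph(FULL_GRAPH, "liked_movie", in_cinemas)
p recommend_from_subgraph(FULL_GRAPH, "liked_movie", on_tv)
```

Traversing the full graph matters: a movie in cinemas may be reachable only through actors or genres that are themselves outside the subgraph.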
  35. Recommendation from subgraph (2). Live demo…
  36. Media content recommendation using Neo4j: movie recommendation; recommendation of movies in cinemas; recommendation of TV programs and schedules.
  37. Summary. Graph databases; working with Neo4j and Ruby (on Rails); plugins and algorithms: document similarity, movie recommendation, recommendation from subgraph.