Representing relationships between entities as a graph is increasingly common. Neo4j is a NoSQL graph database that allows fast and effective queries on connected data. It also lets users implement their own algorithms, which can extend the functionality of the built-in API. We use the graph database to model and recommend movies and other media content.
Neo4j - Graph database for recommendations Jakub Kříž, Ondrej Proksa 30.5.2013
Summary Graph databases Working with Neo4j and Ruby (On Rails) Plugins and algorithms – live demos Document similarity Movie recommendation Recommendation from subgraph TeleVido.tv
Why Graphs? Graphs are everywhere! Natural way to model almost everything “Whiteboard friendly” Even the internet is a graph
Why Graph Databases? Relational databases are not so great for storing graph structures Unnatural m:n relations Expensive joins Expensive lookups during graph traversals Graph databases fix this Efficient storage Direct pointers = no joins
Neo4j The World's Leading Graph Database www.neo4j.org NoSQL database Open source - github.com/neo4j ACID Brief history Official v1.0 – 2010 Current version 1.9 2.0 coming soon
Querying Neo4j Querying languages Structurally similar to SQL Based on graph traversal Most often used Gremlin – generic graph querying language Cypher – graph querying language for Neo4j SPARQL – generic querying language for data in RDF format
Cypher Example (2) Friend of a friend
    START n=node(0)
    MATCH (n)--()--(f)
    RETURN f
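To make the two-hop pattern concrete, here is a minimal plain-Ruby sketch of the same friend-of-a-friend idea. It does not talk to Neo4j: `GRAPH` and `friends_of_friends` are hypothetical names, and the graph is an assumed in-memory adjacency hash. Unlike the Cypher query, the sketch removes duplicates and leaves out the start node.

```ruby
require 'set'

# Hypothetical in-memory undirected graph: node id => neighbour ids.
GRAPH = {
  0 => [1, 2],
  1 => [0, 3],
  2 => [0, 3, 4],
  3 => [1, 2],
  4 => [2]
}.freeze

# Nodes reachable from `start` in exactly two hops, excluding `start` itself.
def friends_of_friends(graph, start)
  result = Set.new
  graph.fetch(start, []).each do |friend|
    graph.fetch(friend, []).each do |fof|
      result << fof unless fof == start
    end
  end
  result.to_a.sort
end

p friends_of_friends(GRAPH, 0) # prints [3, 4]
```

Node 3 is reached through both node 1 and node 2, which is why the sketch collects results in a Set.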
Working with Neo4j REST API => wrappers Neography for Ruby py2neo for Python … Your own wrapper Java API Direct access in JVM based applications neo4j.rb
Neography – API wrapper example
    require 'neography'

    # create nodes and properties
    n1 = Neography::Node.create("age" => 31, "name" => "Max")
    n2 = Neography::Node.create("age" => 33, "name" => "Roel")
    n1.weight = 190

    # create relationships
    new_rel = Neography::Relationship.create(:coding_buddies, n1, n2)
    n1.outgoing(:coding_buddies) << n2

    # get nodes related by outgoing friends relationship
    n1.outgoing(:friends)

    # get n1 and nodes related by friends and friends of friends
    n1.outgoing(:friends).depth(2).include_start_node
Our Approach Relational databases are not so bad Good for basic data storage Widely used for web applications Well supported in Rails via ActiveRecord Performance issues with Neo4j However, we need a graph database We model the domain as a graph Our recommendation is based on graph traversal
Our Approach (2) Hybrid model using both MySQL and Neo4j MySQL contains basic information about entities Neo4j contains only relationships Paired via identifiers (neo4j_id)
Our Approach (3) Recommendation algorithms Made as plugins to Neo4j Written in Java Embedded into Neo4j API Rails application uses custom made wrapper Creates and modifies nodes and relationships via API calls Handles recommendation requests
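The pairing via `neo4j_id` can be illustrated with a small plain-Ruby sketch. Everything here is an assumption for illustration: `Movie`, `MOVIES`, `RELATED`, and `related_movies` are hypothetical stand-ins for the MySQL rows and the Neo4j relationships, not the actual application code.

```ruby
# Hypothetical stand-in for a MySQL row: basic entity data plus the paired graph id.
Movie = Struct.new(:id, :title, :neo4j_id)

MOVIES = [
  Movie.new(1, "Movie A", 101),
  Movie.new(2, "Movie B", 102),
  Movie.new(3, "Movie C", 103)
].freeze

# Hypothetical stand-in for Neo4j: relationships only, keyed by graph node id.
RELATED = { 101 => [102, 103], 102 => [101], 103 => [101] }.freeze

# Index the relational records by their graph identifier once.
MOVIES_BY_NEO4J_ID = MOVIES.map { |m| [m.neo4j_id, m] }.to_h.freeze

# Ask the "graph" for related node ids, then load the full records by neo4j_id.
def related_movies(movie)
  RELATED.fetch(movie.neo4j_id, []).map { |nid| MOVIES_BY_NEO4J_ID.fetch(nid) }
end

p related_movies(MOVIES.first).map(&:title) # prints ["Movie B", "Movie C"]
```

The graph side never stores titles, and the relational side never stores relationships. The identifier is the only thing both sides share.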
Graph Algorithms Built-in algorithms Shortest path All shortest paths Dijkstra’s algorithm Custom algorithms Depth first search Breadth first search Spreading activation Flows, pairing, etc.
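As a reminder of what the simplest of these algorithms does, below is a minimal plain-Ruby sketch of an unweighted shortest path found by breadth first search. It is not Neo4j's built-in implementation. `SAMPLE_GRAPH` and `shortest_path` are hypothetical names, and weighted graphs would need Dijkstra's algorithm instead.

```ruby
# Hypothetical undirected graph as adjacency lists.
SAMPLE_GRAPH = {
  a: [:b, :c],
  b: [:a, :d],
  c: [:a, :d],
  d: [:b, :c, :e],
  e: [:d],
  z: []
}.freeze

# Breadth first search; returns the node list of one shortest path, or nil.
def shortest_path(graph, source, target)
  return [source] if source == target

  previous = { source => nil } # also serves as the visited set
  queue = [source]
  until queue.empty?
    node = queue.shift
    graph.fetch(node, []).each do |neighbour|
      next if previous.key?(neighbour)

      previous[neighbour] = node
      if neighbour == target
        # Walk the predecessor links back to the source.
        path = [target]
        path.unshift(previous[path.first]) while previous[path.first]
        return path
      end
      queue << neighbour
    end
  end
  nil
end

p shortest_path(SAMPLE_GRAPH, :a, :e) # prints [:a, :b, :d, :e]
p shortest_path(SAMPLE_GRAPH, :a, :z) # prints nil
```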
Document Similarity Task: find similarities between documents Documents data model: Each document is made of sentences Each sentence can be divided into n-grams N-grams are connected with relationships Example: “Neo4j is graph database in Java” becomes (Neo4j, graph) – (graph, database) – (database, Java)
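The n-gram example on this slide can be reproduced with a short plain-Ruby sketch. The stop word list is an assumption (the slide only shows that “is” and “in” disappear), and `bigrams` and `bigram_relationships` are hypothetical helper names.

```ruby
# Assumed stop word list; the slide only implies that "is" and "in" are dropped.
STOP_WORDS = %w[is in a an the of].freeze

# Split a sentence into words, drop stop words, and pair up neighbours.
def bigrams(sentence)
  words = sentence.scan(/[[:alnum:]]+/)
                  .reject { |w| STOP_WORDS.include?(w.downcase) }
  words.each_cons(2).to_a
end

# Consecutive bigrams are linked, giving the relationships between n-gram nodes.
def bigram_relationships(sentence)
  bigrams(sentence).each_cons(2).to_a
end

p bigrams("Neo4j is graph database in Java")
# prints [["Neo4j", "graph"], ["graph", "database"], ["database", "Java"]]
```

Each bigram would become a node, and each pair returned by `bigram_relationships` would become a relationship between two such nodes.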
Document Similarity (3) Detecting similar documents in our graph model Shortest path between documents Number of paths shorter than some distance Weighing relationships How about a custom plugin? Spreading activation
Movie Recommendation Task: recommend movies based on what we like We like some entities, let’s call them initial Movies People (actors, directors etc.) Genres We want recommended nodes from input Find nodes which are The closest to initial nodes The most relevant to initial nodes
Movie Recommendation (2) 165k nodes Movies People Genre 870k relationships Movies – People Movies – Genres Easy to add more entities Tags, mood, period, etc. Will it be fast? We need 1-2 seconds
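One of the measures listed here, the number of paths shorter than some distance, can be sketched in plain Ruby as follows. The graph is a simplified, hypothetical one in which documents link directly to the n-grams they contain. `DOC_GRAPH` and `count_paths` are illustrative names, and this is not the plugin used in the demo.

```ruby
# Hypothetical graph: documents (d*) linked to the n-grams (g*) they contain.
DOC_GRAPH = {
  "d1" => %w[g1 g2],
  "d2" => %w[g1 g2 g3],
  "d3" => %w[g3],
  "g1" => %w[d1 d2],
  "g2" => %w[d1 d2],
  "g3" => %w[d2 d3]
}.freeze

# Counts simple paths (no repeated nodes) from `from` to `to`
# that use at most `max_edges` relationships.
def count_paths(graph, from, to, max_edges, visited = [from])
  return 1 if from == to
  return 0 if max_edges.zero?

  graph.fetch(from, []).sum do |nxt|
    if visited.include?(nxt)
      0
    else
      count_paths(graph, nxt, to, max_edges - 1, visited + [nxt])
    end
  end
end

p count_paths(DOC_GRAPH, "d1", "d2", 2) # prints 2
p count_paths(DOC_GRAPH, "d1", "d3", 2) # prints 0
p count_paths(DOC_GRAPH, "d1", "d3", 4) # prints 2
```

Under this measure d1 is closer to d2 than to d3, because two short paths connect d1 and d2 while d3 is only reachable through longer ones.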
Recommendation - Evaluation Experimental evaluation Which algorithm is the best (rating on scale 1-5) 30 users / 168 scenarios [Chart: rating per algorithm, axis from 0 to 3.5; algorithms compared: Joining colors, Mixing colors, Spreading energy, Dijkstra]
Movie Recommendation – User Model Spreading energy Each initial node gets different starting energy Based on user’s interests and feedback Improves the recommendation!
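A generic spreading-activation variant with per-node starting energy can be sketched in plain Ruby as below. The slides do not give the exact update rule of our plugin, so the even split between neighbours, the `decay` factor, the number of `steps`, and all names (`MEDIA_GRAPH`, `spread_energy`) are assumptions for illustration.

```ruby
# Hypothetical graph of movies, people and genres (undirected adjacency lists).
MEDIA_GRAPH = {
  "Genre: Sci-Fi" => ["Movie A", "Movie B"],
  "Actor X"       => ["Movie B", "Movie C"],
  "Movie A"       => ["Genre: Sci-Fi"],
  "Movie B"       => ["Genre: Sci-Fi", "Actor X"],
  "Movie C"       => ["Actor X"]
}.freeze

# initial: node => starting energy (for example from the user's interests).
# In each step every active node passes `decay` of its energy on,
# split evenly among its neighbours. Returns [node, energy] pairs,
# highest energy first, without the initial nodes.
def spread_energy(graph, initial, steps: 2, decay: 0.5)
  total = Hash.new(0.0)
  frontier = initial.dup
  steps.times do
    next_frontier = Hash.new(0.0)
    frontier.each do |node, energy|
      neighbours = graph.fetch(node, [])
      next if neighbours.empty?

      share = energy * decay / neighbours.size
      neighbours.each { |n| next_frontier[n] += share }
    end
    next_frontier.each { |n, e| total[n] += e }
    frontier = next_frontier
  end
  total.reject { |node, _| initial.key?(node) }
       .sort_by { |node, energy| [-energy, node.to_s] }
end

# The user cares about "Actor X" twice as much as about the genre.
p spread_energy(MEDIA_GRAPH, { "Genre: Sci-Fi" => 1.0, "Actor X" => 2.0 })
# prints [["Movie B", 0.75], ["Movie C", 0.5], ["Movie A", 0.25]]
```

With equal starting energies Movie A and Movie C would tie. The higher energy on "Actor X" is what pushes Movie C above Movie A, which is the effect of the user model.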
Recommendation from subgraph Recommend movies which are currently in cinemas Recommend movies which are currently on TV How? Algorithm will traverse normally Creates a subgraph from which it returns nodes
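The idea of traversing the whole graph but returning nodes only from a subgraph can be sketched in plain Ruby. For simplicity the sketch ranks by hop distance rather than by energy. `CINEMA_GRAPH`, `IN_CINEMAS`, `distances_from`, and `recommend_from_subgraph` are hypothetical names, and the data is made up for illustration.

```ruby
require 'set'

# Hypothetical graph; only two of the movies are currently in cinemas.
CINEMA_GRAPH = {
  "Liked Movie"    => ["Actor X", "Genre Y"],
  "Actor X"        => ["Liked Movie", "Cinema Movie 1"],
  "Genre Y"        => ["Liked Movie", "Old Movie"],
  "Old Movie"      => ["Genre Y", "Director Z"],
  "Director Z"     => ["Old Movie", "Cinema Movie 2"],
  "Cinema Movie 1" => ["Actor X"],
  "Cinema Movie 2" => ["Director Z"]
}.freeze

IN_CINEMAS = Set["Cinema Movie 1", "Cinema Movie 2"].freeze

# Plain breadth first traversal over the whole graph: node => hop distance.
def distances_from(graph, start)
  dist = { start => 0 }
  queue = [start]
  until queue.empty?
    node = queue.shift
    graph.fetch(node, []).each do |neighbour|
      next if dist.key?(neighbour)

      dist[neighbour] = dist[node] + 1
      queue << neighbour
    end
  end
  dist
end

# Traverse normally, then return only nodes that belong to the allowed subgraph.
def recommend_from_subgraph(graph, start, allowed)
  distances_from(graph, start)
    .select { |node, _| allowed.include?(node) && node != start }
    .sort_by { |node, distance| [distance, node] }
    .map(&:first)
end

p recommend_from_subgraph(CINEMA_GRAPH, "Liked Movie", IN_CINEMAS)
# prints ["Cinema Movie 1", "Cinema Movie 2"]
```

"Cinema Movie 2" is only reachable through "Old Movie", which is not in cinemas. This is why the traversal itself is not restricted to the subgraph.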