Graph Databases
Published in: Technology, Business
  • * graph db usage poll
  • * Six degrees game
    * Relational databases can't easily answer certain types of questions
  • * first pass using a relational database
    * cast table: actor_name, movie_title
    * hard to visualize the solution
    * In order to do this, you need to do multiple passes or joins
  • * Each degree adds a join
    * Increases complexity
    * Decreases performance
    * Stop when the actor you're looking for is in the list
  • * this problem highlights the ugly truth about RDBs
    * they weren't designed to handle these types of problems.
    * RDB relationships join data, but are not data in themselves
  • * Gather everything in the set that matches these criteria, then tell me if this thing is in the set
    * 1 set, no problem
    * 2nd set no problem
    * 3rd set not related to 1st
    * 4th not related to 2nd
    * 5th related to 1st and 4th
    * etc.
    * Relationships are only available between overlapping sets
  • * disjoint sets
  • * Graphs
    * Not X-Y
    * Computer Science definition of graphs
  • * graph theory
  • * Nodes can have arbitrary properties
    * Relationships can have arbitrary properties
    * Paths are found using traversal algorithms
    * Indexes help find starting points
  • * This is how graph dbs solve the problems that RDBs can't
  • * Tree data-structures
    * Networks
    * Maps
    * vehicles on streets == packets through network
  • * Make each record a node
    * Make every foreign key a relationship
    * RDB indexes are usually stored in a tree structure
    * Trees are graphs
    * Why not use RDBs?
    * The trouble with RDBs is how they are stored in memory and queried
    * Require a translation step from memory blocks to graph structure
    * Relationships not first-class citizens
    * Many problem domains map poorly to rows/tables
  • * Actors are nodes
    * Movies are nodes
    * Relationship: Actor is IN a movie
    * pseudo-code shortened for brevity
    * Compare to degree selection join queries
  • * Social networking - friends of friends of friends of friends
    * Assembly/Manufacturing - 1 widget contains 3 gadgets each contain 2 gizmos
    * Map directions - starting at my house find a route to the office that goes past the pub
    * Multi-tenancy - root node per tenant
    * all queries start at root
    * No overlap between graphs = no accidental data spillage
    * Fraud: track transactions back to origination
    * Pretty much anything that can be drawn on a whiteboard
  • * Example: retail system
    * Customer makes Order
    * Store sells Order
    * Order contains Items
    * Supplier supplied Items
    * Customer rates Items
    * Did this customer rank supplier X highly?
    * Which suppliers sell the highest rated items?
    * Does item A get rated higher when ordered with Item B?
    * All can be answered with RDBs as well
    * Not as elegant
    * Not as performant
  • * Recreate Google+
  • * billions of nodes and relationships in a single instance
    * cluster replication
    * transactions
    * native bindings for Ruby, Python, and any language that can run on the JVM
    * Licensing
    * Neo4jPHP - Josh's REST client, not affiliated with Neo Technology
  • * Index can be saved separately
    * Or it is saved on `add`
    * Note that indexes don't have to be on real properties or values
  • * This is where the power of graph dbs comes from
    * Paths - find any relationship chain between A and B
    * Traversal - filter out paths that don't meet criteria
    * Queries - Here is what I want, find it however you can
  • * Paths deal with two known nodes
    * start and end point
    * This is the Kevin Bacon example, but with multiple datatypes
    * Path can be treated as an array of nodes or relationships
    * findPathsTo() returns a PathFinder which can have further restrictions placed on it
  • * Written in Javascript
    * plugins provide other languages: Groovy, Python
    * Anything that runs on JVM
    * Path object, check apidocs
    * inline edit/update/delete
    * explicit prune evaluator of maxDepth = 1 unless overridden
    * built in prune: none
    * built in return: all or all-but-start
    * Prune: should we continue down this path? Return: Should we return the entity at this position?
    * You can return things and still continue traversing
    * Pros: expressive, powerful, complex search behaviors, in-line edit/update
    * Cons: complex to write, complex to understand (query languages make this better)
  • * Not very familiar with it
    * Just mentioning it's out there
  • * Cypher is "what to find"
    * describe the "shape" of the thing you're looking for
    * Very white-board friendly
    * Pros: easy to understand, query looks like domain model
    * Cons: not as powerful, not fully featured (YET)
    * result set is an array of arrays
  • * Three parts
    ** Where to start
    ** Shape to find
    ** possibly qualifiers
    ** What to return
  • * If there could be more than one relationship type, could further constrain by ratings
  • * Webadmin built into neo4j server
  • * RDBs are really good at data aggregation
    * Set math, duh
    * Have to traverse the whole graph in order to do aggregation
    * Truly tabular means not a lot of relationships between the data types
  • Transcript

    • 1. Graph Databases Josh Adell <josh.adell@gmail.com> 20110719
    • 2. Who am I?
        • Software developer: PHP, Javascript, SQL
        • http://www.dunnwell.com
        • Fan of using the right tool for the job
    • 3. The Problem
    • 4. The Solution?
      • > -- Given "Keanu Reeves" find a connection to "Kevin Bacon"
      • > SELECT ??? FROM cast WHERE ???
      • +---------------------------------------------------------------------+
      • | actor_name                 | movie_title                            |
      • +============================+========================================+
      • | Jennifer Connelly          | Higher Learning                        |
      • +----------------------------+----------------------------------------+
      • | Laurence Fishburne         | Mystic River                           |
      • +----------------------------+----------------------------------------+
      • | Laurence Fishburne         | Higher Learning                        |
      • +----------------------------+----------------------------------------+
      • | Kevin Bacon                | Mystic River                           |
      • +----------------------------+----------------------------------------+
      • | Keanu Reeves               | The Matrix                             |
      • +----------------------------+----------------------------------------+
      • | Laurence Fishburne         | The Matrix                             |
      • +----------------------------+----------------------------------------+
    • 5. Find Every Actor at Each Degree
      • > -- First degree
      • > SELECT actor_name FROM cast WHERE movie_title IN (SELECT DISTINCT movie_title FROM cast WHERE actor_name='Kevin Bacon')
      • > -- Second degree
      • > SELECT actor_name FROM cast WHERE movie_title IN (SELECT DISTINCT movie_title FROM cast WHERE actor_name IN (SELECT actor_name FROM cast WHERE movie_title IN (SELECT DISTINCT movie_title FROM cast WHERE actor_name='Kevin Bacon')))
      • > -- Third degree
      • > SELECT actor_name FROM cast WHERE movie_title IN (SELECT DISTINCT movie_title FROM cast WHERE actor_name IN (SELECT actor_name FROM cast WHERE movie_title IN (SELECT DISTINCT movie_title FROM cast WHERE actor_name IN (SELECT actor_name FROM cast WHERE movie_title IN (SELECT DISTINCT movie_title FROM cast WHERE actor_name='Kevin Bacon')))))
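Each extra degree wraps the previous query in two more subqueries. The following is a minimal sketch of that growth, not something from the original talk: it loads the six actor/movie pairs from slide 4 into an in-memory SQLite database and builds the nested query for any degree. The helper names `degree_query` and `actors_within` are hypothetical.

```python
import sqlite3

# The six actor/movie pairs from the slide 4 table.
CAST_ROWS = [
    ("Jennifer Connelly", "Higher Learning"),
    ("Laurence Fishburne", "Mystic River"),
    ("Laurence Fishburne", "Higher Learning"),
    ("Kevin Bacon", "Mystic River"),
    ("Keanu Reeves", "The Matrix"),
    ("Laurence Fishburne", "The Matrix"),
]


def degree_query(degree):
    """Build the nested SQL listing every actor within `degree` hops of a target."""
    actors = "SELECT ?"  # innermost set: the target actor, bound as a parameter
    for _ in range(degree):
        # One hop = "movies of those actors", then "actors in those movies".
        movies = f'SELECT DISTINCT movie_title FROM "cast" WHERE actor_name IN ({actors})'
        actors = f'SELECT actor_name FROM "cast" WHERE movie_title IN ({movies})'
    return actors


def actors_within(conn, target, degree):
    """Run the generated query and return the actor names as a set."""
    rows = conn.execute(degree_query(degree), (target,)).fetchall()
    return {name for (name,) in rows}


conn = sqlite3.connect(":memory:")
# "cast" is quoted because CAST is also an SQL keyword.
conn.execute('CREATE TABLE "cast" (actor_name TEXT, movie_title TEXT)')
conn.executemany('INSERT INTO "cast" VALUES (?, ?)', CAST_ROWS)

for degree in (1, 2):
    print(degree, sorted(actors_within(conn, "Kevin Bacon", degree)))
```

The caller still has to loop over degrees and stop once the actor being searched for appears, and the result only says that a connection exists, not which movies form it.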
    • 6. The Truth
      • Relational databases aren't very good with relationships
      • [Diagram labels: Data, RDBMs]
    • 7. RDBs Use Set Math
    • 8. The Real Problem
      • Finding relationships across multiple degrees of separation
      •     ...and across multiple data types
      •     ...and where you don't even know there is a relationship
    • 9. The Real Solution
    • 10. Computer Science Definition
      • A graph is an ordered pair G = (V, E) where V is a set of vertices and E is a set of edges, which are pairs of vertices.
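The definition translates almost directly into code. This is a small sketch assuming vertices are plain strings and edges are tuples; `make_graph` and `neighbours` are illustrative names, not part of any library mentioned in the talk.

```python
def make_graph(vertices, edges):
    """Return the pair (V, E), checking that every edge joins two known vertices."""
    V = frozenset(vertices)
    E = frozenset(tuple(edge) for edge in edges)
    for a, b in E:
        if a not in V or b not in V:
            raise ValueError(f"edge {(a, b)!r} uses a vertex that is not in V")
    return V, E


def neighbours(graph, vertex):
    """Vertices that share an edge with `vertex`, ignoring edge direction."""
    _, E = graph
    found = set()
    for a, b in E:
        if a == vertex:
            found.add(b)
        if b == vertex:
            found.add(a)
    return found


G = make_graph({"a", "b", "c"}, {("a", "b"), ("b", "c")})
print(sorted(neighbours(G, "b")))
```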
    • 11. Some Graph DB Vocabulary
        • Node : vertex
        • Relationship : edge
        • Property : meta-datum attached to a node or relationship
        • Path : an ordered list of nodes and relationships
        • Index : node or relationship lookup table
    • 12. Relationships are First-Class Citizens
        • Have a type
        • Have properties
        • Have a direction
          • Domain semantics
          • Traversable in any direction
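To make "first-class" concrete, here is a rough in-memory sketch in which a relationship is its own object with a type, properties and a direction, yet can be followed from either end. The class and method names are assumptions made for illustration and do not mirror the Neo4j API.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity comparison: two nodes with equal properties stay distinct
class Node:
    properties: dict = field(default_factory=dict)
    relationships: list = field(default_factory=list)

    def relate_to(self, other, rel_type, **properties):
        """Create a directed relationship self -> other and register it on both ends."""
        rel = Relationship(self, other, rel_type, properties)
        self.relationships.append(rel)
        if other is not self:
            other.relationships.append(rel)
        return rel

    def neighbours(self, rel_type=None, direction="both"):
        """Follow relationships outward ("out"), inward ("in") or either way ("both")."""
        found = []
        for rel in self.relationships:
            if rel_type is not None and rel.type != rel_type:
                continue
            if direction in ("out", "both") and rel.start is self:
                found.append(rel.end)
            if direction in ("in", "both") and rel.end is self:
                found.append(rel.start)
        return found


@dataclass(eq=False)
class Relationship:
    start: Node
    end: Node
    type: str
    properties: dict = field(default_factory=dict)


josh = Node({"name": "Josh"})
item = Node({"item_number": "Q32-ESM"})
rated = josh.relate_to(item, "RATED", rating=5)
```

The direction carries the domain meaning (Josh rated the item, not the reverse), but `item.neighbours("RATED", "in")` still reaches Josh without any extra bookkeeping.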
    • 13. Graph Examples
    • 14. Relational Databases are Graphs!
    • 15. New Solution to the Bacon Problem
      • $keanu = $actorIndex->find('name', 'Keanu Reeves');
      • $kevin = $actorIndex->find('name', 'Kevin Bacon');
      • $path = $keanu->findPathTo($kevin);
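What a path finder does behind such a call can be approximated with a breadth-first search. The sketch below is an assumption about the general technique, not the Neo4j implementation: actors and movies from the slide 4 table are both nodes, and each casting row is a relationship between them.

```python
from collections import deque

CAST = [
    ("Jennifer Connelly", "Higher Learning"),
    ("Laurence Fishburne", "Mystic River"),
    ("Laurence Fishburne", "Higher Learning"),
    ("Kevin Bacon", "Mystic River"),
    ("Keanu Reeves", "The Matrix"),
    ("Laurence Fishburne", "The Matrix"),
]


def build_graph(cast):
    """Adjacency sets in which actors and movies are both nodes."""
    graph = {}
    for actor, movie in cast:
        graph.setdefault(actor, set()).add(movie)
        graph.setdefault(movie, set()).add(actor)
    return graph


def find_path(graph, start, goal):
    """Breadth-first search; returns a shortest path as a list of nodes, or None."""
    if start not in graph or goal not in graph:
        return None
    previous = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbour in sorted(graph[current]):
            if neighbour not in previous:
                previous[neighbour] = current
                queue.append(neighbour)
    return None


graph = build_graph(CAST)
print(" -> ".join(find_path(graph, "Keanu Reeves", "Kevin Bacon")))
```

Unlike the per-degree SQL, a single call returns the connecting chain itself, with movies included.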
    • 16. Some Graph Use Cases
        • Social networking
        • Manufacturing
        • Map directions
        • Fraud detection
        • Multi-tenancy
    • 17. Modelling a Domain with Graphs
        • Graphs are "whiteboard-friendly"
        • Nouns become nodes
        • Verbs become relationships
        • Properties are adjectives and adverbs
    • 18. Audience Participation!
    • 19. Neo4j
        • Neo Technology
        • http://neo4j.org
        • Embedded in Java applications
        • Standalone server via REST
        • Plugins: spatial, lucene, rdf
        • http://github.com/jadell/Neo4jPHP
    • 20. Using the REST client
      • $client = new Client(new Transport());
      • $customer = new Node($client);
      • $customer->setProperty('name', 'Josh')->save();
      • $store = new Node($client);
      • $store->setProperty('name', 'Home Despot')
      •       ->setProperty('location', 'Durham, NC')->save();
      • $order = new Node($client);
      • $order->save();
      • $item = new Node($client);
      • $item->setProperty('item_number', 'Q32-ESM')->save();
      • $order->relateTo($item, 'CONTAINS')->save();
      • $customer->relateTo($order, 'BOUGHT')->save();
      • $store->relateTo($order, 'SOLD')->save();
      • $customerIndex = new Index($client, Index::TypeNode, 'customers');
      • $customerIndex->add($customer, 'name', $customer->getProperty('name'));
      • $customerIndex->add($customer, 'rating', 'A++');
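Conceptually an index is just a lookup table from a key/value pair to nodes, which is why an entry such as 'rating' => 'A++' can be added even though it is not a property stored on the node. A minimal sketch of that idea follows; the `Index` class here is a stand-in written for illustration and is not the Neo4jPHP class of the same name.

```python
from collections import defaultdict


class Index:
    """Lookup table from (key, value) pairs to the nodes filed under them."""

    def __init__(self, name):
        self.name = name
        self._entries = defaultdict(list)

    def add(self, node, key, value):
        bucket = self._entries[(key, value)]
        if not any(existing is node for existing in bucket):
            bucket.append(node)

    def find(self, key, value):
        """All nodes filed under this key/value pair (possibly empty)."""
        return list(self._entries.get((key, value), []))

    def find_one(self, key, value):
        """The first matching node, or None when nothing matches."""
        matches = self.find(key, value)
        return matches[0] if matches else None


customer = {"name": "Josh"}
customers = Index("customers")
customers.add(customer, "name", customer["name"])
customers.add(customer, "rating", "A++")  # indexed, but not a property of the node
```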
    • 21. Graph Mining
        • Paths
        • Traversals
        • Ad-hoc Queries
    • 22. Path Finding
        • Find any connection from node A to node B
        • Limit by relationship types and/or direction
        • Path finding algorithms: all, simple, shortest, Dijkstra
      • $customer = $customerIndex->findOne('name', 'Josh');
      • $item = $itemIndex->findOne('item_number', 'Q32-ESM');
      • $path = $item->findPathsTo($customer)
      •               ->setMaxDepth(2)
      •               ->getSinglePath();
      • foreach ($path as $node) {
      •     echo $node->getId() . "\n";
      • }
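The two restrictions listed on this slide, a maximum depth and a set of allowed relationship types, can be sketched as a depth-limited search over the retail example. The triple format, the node labels and the `find_paths` helper are assumptions for illustration; relationships are followed in either direction, and each path alternates nodes and relationship types.

```python
# (start node, relationship type, end node), mirroring the slide 20 data.
RELATIONSHIPS = [
    ("customer:Josh", "BOUGHT", "order:1"),
    ("store:Home Despot", "SOLD", "order:1"),
    ("order:1", "CONTAINS", "item:Q32-ESM"),
]


def find_paths(relationships, start, goal, max_depth, rel_types=None):
    """Yield every simple path from start to goal using at most max_depth relationships."""

    def walk(node, path, depth):
        if node == goal:
            yield path
            return
        if depth == max_depth:
            return
        for a, rel, b in relationships:
            if rel_types is not None and rel not in rel_types:
                continue
            if a == node:
                nxt = b
            elif b == node:
                nxt = a
            else:
                continue
            if nxt in path[::2]:  # even positions hold nodes; never revisit one
                continue
            yield from walk(nxt, path + [rel, nxt], depth + 1)

    yield from walk(start, [start], 0)


for path in find_paths(RELATIONSHIPS, "item:Q32-ESM", "customer:Josh", max_depth=2):
    print(path)
```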
    • 23. Traversal
        • Complex/Custom path finding
        • Base next decision on previous path
      • $traversal = new Traversal($client);
      • $traversal
      •     ->setOrder(Traversal::OrderDepthFirst)
      •     ->setUniqueness(Traversal::UniquenessNodeGlobal)
      •     ->setPruneEvaluator('javascript', '(function traverse(pos) {
      •         if (pos.length() == 1 && pos.lastRelationship.getType() == "CONTAINS") {
      •             return false;
      •         } else if (pos.length() == 2 && pos.lastRelationship.getType() == "BOUGHT") {
      •             return false;
      •         }
      •         return true;
      •     })(position)')
      •     ->setReturnFilter('javascript',
      •         "return position.endNode().getProperty('type') == 'Customer';");
      • $customers = $traversal->getResults($item, Traversal::ReturnTypeNode);
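The interplay of a prune evaluator and a return filter is easier to see without the REST layer. The sketch below is a simplified stand-in, not the Neo4j traversal framework: `prune` is assumed to answer "stop expanding here?", `include` answers "return the node at this position?", and every node is visited at most once, similar to global node uniqueness.

```python
from dataclasses import dataclass

RELATIONSHIPS = [
    ("customer:Josh", "BOUGHT", "order:1"),
    ("store:Home Despot", "SOLD", "order:1"),
    ("order:1", "CONTAINS", "item:Q32-ESM"),
]


@dataclass(frozen=True)
class Position:
    nodes: tuple      # nodes visited so far, start node first
    rel_types: tuple  # relationship types followed, in order

    @property
    def length(self):
        return len(self.rel_types)

    @property
    def end_node(self):
        return self.nodes[-1]


def traverse(relationships, start, prune, include):
    """Depth-first traversal driven by two caller-supplied predicates."""
    results, seen = [], {start}

    def visit(pos):
        if include(pos):  # a node can be returned...
            results.append(pos.end_node)
        if prune(pos):    # ...whether or not the walk continues past it
            return
        for a, rel, b in relationships:
            if a == pos.end_node:
                nxt = b
            elif b == pos.end_node:
                nxt = a
            else:
                continue
            if nxt not in seen:
                seen.add(nxt)
                visit(Position(pos.nodes + (nxt,), pos.rel_types + (rel,)))

    visit(Position((start,), ()))
    return results


EXPECTED = ("CONTAINS", "BOUGHT")


def off_pattern(pos):
    """Prune as soon as the path stops matching CONTAINS followed by BOUGHT."""
    return pos.rel_types != EXPECTED[:pos.length]


def is_customer(pos):
    return pos.end_node.startswith("customer:")


customers = traverse(RELATIONSHIPS, "item:Q32-ESM", off_pattern, is_customer)
print(customers)
```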
    • 24. Gremlin
        • Uses mathematical notation approach
        • Complex traversal behaviors, including backtracking
        • https://github.com/tinkerpop/gremlin/wiki
      • m = [:]
      • g.v(1).out('likes').in('likes').out('likes').groupCount(m)
      • m.sort{a,b -> a.value <=> b.value}
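Read step by step, the pipeline starts at vertex 1, follows 'likes' out, follows 'likes' back in to everyone who likes the same things, follows 'likes' out again, and counts how often each end vertex is reached. A plain-Python approximation of that walk, over made-up sample data, might look like this; it does not use Gremlin itself, and the `LIKES` edges are hypothetical.

```python
from collections import Counter

# Hypothetical (person, thing) 'likes' edges.
LIKES = [(1, "a"), (1, "b"), (2, "a"), (2, "c"), (3, "b"), (3, "c"), (3, "d")]


def out_likes(person):
    """Things this person likes: the out('likes') step."""
    return [thing for who, thing in LIKES if who == person]


def in_likes(thing):
    """People who like this thing: the in('likes') step."""
    return [who for who, liked in LIKES if liked == thing]


def co_liked(start):
    """Count how many walks out -> in -> out end at each thing."""
    counts = Counter()
    for thing in out_likes(start):
        for fan in in_likes(thing):  # includes the start person, as in the pipeline
            for other in out_likes(fan):
                counts[other] += 1
    return counts


m = co_liked(1)
# Ascending by count, like m.sort{a,b -> a.value <=> b.value}
print(sorted(m.items(), key=lambda entry: entry[1]))
```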
    • 25. Cypher
        • "What to find" vs. "How to find"
      • $query = 'START item=(1) MATCH (item)<-[:CONTAINS]-(order)<-[:BOUGHT]-(customer) RETURN customer';
      • $cypher = new CypherQuery($client, $query);
      • $customers = $cypher->getResultSet();
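One way to read the MATCH pattern is as a description of which relationship chains qualify, with no instructions on how to walk them. The sketch below spells out an equivalent filter over a list of triples; the data (including the customers Ann and Bob) and the `match_customers` helper are hypothetical, and a real query engine is free to evaluate the pattern differently.

```python
# Hypothetical (start, type, end) relationships.
RELATIONSHIPS = [
    ("Josh", "BOUGHT", "order-1"),
    ("Ann", "BOUGHT", "order-2"),
    ("Bob", "BOUGHT", "order-3"),
    ("order-1", "CONTAINS", "Q32-ESM"),
    ("order-2", "CONTAINS", "Q32-ESM"),
    ("order-3", "CONTAINS", "Z99"),
]


def match_customers(relationships, item):
    """Rows for: (item)<-[:CONTAINS]-(order)<-[:BOUGHT]-(customer) RETURN customer."""
    return [
        [customer]  # one row per match; each row is a list of returned columns
        for order, rel_in, contained in relationships
        if rel_in == "CONTAINS" and contained == item
        for customer, rel_out, bought in relationships
        if rel_out == "BOUGHT" and bought == order
    ]


rows = match_customers(RELATIONSHIPS, "Q32-ESM")
print(rows)
```

The result has the same "array of arrays" shape as a Cypher result set: a list of rows, each holding one value per returned column.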
    • 26. Cypher Syntax
      • START item = (1)                        START item = (1,2,3)
      • START item = (items, 'name:Q32*')       START item = (1), customer = (2,3)
      • MATCH (item)<--(order)                  MATCH (order)-->(item)
      • MATCH (order)-[r]->(item)               MATCH ()--(item)
      • MATCH
      •     (supplier)-[s:SUPPLIES]->(item)<-[:CONTAINS]-(order),
      •     (customer)-[:RATED]->(item)
      • WHERE customer.name = 'Josh' and s.coupon = 'freewidget'
      • RETURN item, order                      RETURN customer, item, r.rating
      • RETURN r~TYPE                           RETURN COUNT(*)
      • ORDER BY customer.name DESC             RETURN AVG(r.rating)
      • LIMIT 3 SKIP 2
    • 27. Cypher - All Together Now
      • // Find the top 10 `widget` ratings by customers who bought AND rated
      • // `widgets`, and the supplier
      • START item = (items, 'name:widget')
      • MATCH (item)<--(order)<--(customer)-[r:RATED]->(item)<--(supplier)
      • RETURN customer, r.rating, supplier ORDER BY r.rating DESC LIMIT 10
    • 28. Tools
        • Neoclipse
        • Webadmin
    • 29. Are RDBs Useful At All?
        • Aggregation
        • Ordered data
        • Truly tabular data
        • Few or clearly defined relationships
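As a counterpoint, aggregation over tabular data is a single statement in SQL. A small sketch using an in-memory SQLite database; the `ratings` table and its rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ratings (customer TEXT, item TEXT, rating INTEGER)")
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?, ?)",
    [("Josh", "widget", 5), ("Ann", "widget", 3), ("Josh", "gadget", 4)],
)

# One pass over the table: average rating and number of ratings per item.
summary = conn.execute(
    "SELECT item, AVG(rating), COUNT(*) FROM ratings GROUP BY item ORDER BY item"
).fetchall()

for item, average, count in summary:
    print(item, average, count)
```

In a graph store the same summary would typically require visiting every rating relationship, which is the trade-off this slide points at.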
    • 30. Questions?
    • 31. Resources
        • http://neo4j.org
        • http://docs.neo4j.org
        • http://www.youtube.com/watch?v=UodTzseLh04
          • Emil Eifrem (Neo Tech. CEO) webinar
          • Check out around the 54 minute mark
        • http://github.com/jadell/Neo4jPHP
        • http://joshadell.com
        • [email_address]
        • @josh_adell
        • Google+, Facebook, LinkedIn
