Graph Databases

Published in: Technology, Business

  1. 1. Graph Databases
       Josh Adell <josh.adell@gmail.com>
       2011-07-19
  2. 2. Who am I?
       • Software developer: PHP, JavaScript, SQL
       • http://www.dunnwell.com
       • Fan of using the right tool for the job
  3. 3. The Problem
  4. 4. The Solution?
       > -- Given "Keanu Reeves" find a connection to "Kevin Bacon"
       > SELECT ??? FROM cast WHERE ???
       +---------------------------------------------------------------------+
       | actor_name                 | movie_title                            |
       +============================+========================================+
       | Jennifer Connelly          | Higher Learning                        |
       +----------------------------+----------------------------------------+
       | Laurence Fishburne         | Mystic River                           |
       +----------------------------+----------------------------------------+
       | Laurence Fishburne         | Higher Learning                        |
       +----------------------------+----------------------------------------+
       | Kevin Bacon                | Mystic River                           |
       +----------------------------+----------------------------------------+
       | Keanu Reeves               | The Matrix                             |
       +----------------------------+----------------------------------------+
       | Laurence Fishburne         | The Matrix                             |
       +----------------------------+----------------------------------------+
  5. 5. Find Every Actor at Each Degree
       > -- First degree
       > SELECT actor_name FROM cast WHERE movie_title IN
       >   (SELECT DISTINCT movie_title FROM cast WHERE actor_name = 'Kevin Bacon')
       > -- Second degree
       > SELECT actor_name FROM cast WHERE movie_title IN
       >   (SELECT DISTINCT movie_title FROM cast WHERE actor_name IN
       >     (SELECT actor_name FROM cast WHERE movie_title IN
       >       (SELECT DISTINCT movie_title FROM cast WHERE actor_name = 'Kevin Bacon')))
       > -- Third degree
       > SELECT actor_name FROM cast WHERE movie_title IN
       >   (SELECT DISTINCT movie_title FROM cast WHERE actor_name IN
       >     (SELECT actor_name FROM cast WHERE movie_title IN
       >       (SELECT DISTINCT movie_title FROM cast WHERE actor_name IN
       >         (SELECT actor_name FROM cast WHERE movie_title IN
       >           (SELECT DISTINCT movie_title FROM cast WHERE actor_name = 'Kevin Bacon')))))
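The nested subqueries on slide 5 gain two levels for every extra degree. For comparison, here is a minimal Python sketch (not from the slides) that expands one degree per loop iteration over the `cast` rows from slide 4. The function name `actors_by_degree` is made up for illustration, and unlike the SQL it reports each actor only at the degree where they are first reached.

```python
from collections import defaultdict

# (actor_name, movie_title) rows, copied from the cast table on slide 4.
CAST = [
    ("Jennifer Connelly", "Higher Learning"),
    ("Laurence Fishburne", "Mystic River"),
    ("Laurence Fishburne", "Higher Learning"),
    ("Kevin Bacon", "Mystic River"),
    ("Keanu Reeves", "The Matrix"),
    ("Laurence Fishburne", "The Matrix"),
]


def actors_by_degree(cast, start, max_degree):
    """Return {degree: set of actors first reached at that degree}."""
    movies_of = defaultdict(set)
    actors_in = defaultdict(set)
    for actor, movie in cast:
        movies_of[actor].add(movie)
        actors_in[movie].add(actor)

    seen = {start}
    frontier = {start}
    result = {}
    for degree in range(1, max_degree + 1):
        # Everyone who shares a movie with someone in the current frontier.
        reached = set()
        for actor in frontier:
            for movie in movies_of[actor]:
                reached |= actors_in[movie]
        frontier = reached - seen
        if not frontier:
            break  # nobody new: the connected component is exhausted
        result[degree] = frontier
        seen |= frontier
    return result


degrees = actors_by_degree(CAST, "Kevin Bacon", 3)
```

With the six sample rows, the loop stops after the second degree because no new actors remain.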
  6. 6. The Truth
       • Relational databases aren't very good with relationships
       [Figure labels: Data, RDBMs]
  7. 7. RDBs Use Set Math
  8. 8. The Real Problem
       • Finding relationships across multiple degrees of separation
       •     ...and across multiple data types
       •     ...and where you don't even know there is a relationship
  9. 9. The Real Solution
  10. 10. Computer Science Definition
        • A graph is an ordered pair G = (V, E), where V is a set of vertices and E is a set of edges, each of which is a pair of vertices.
  11. 11. Some Graph DB Vocabulary
        • Node: vertex
        • Relationship: edge
        • Property: meta-datum attached to a node or relationship
        • Path: an ordered list of nodes and relationships
        • Index: node or relationship lookup table
  12. 12. Relationships are First-Class Citizens
        • Have a type
        • Have properties
        • Have a direction
            • Domain semantics
            • Traversable in any direction
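The definition on slide 10 maps directly onto two sets. A minimal Python sketch, with made-up vertex labels and hypothetical helper names (`is_valid_graph`, `neighbours`):

```python
# V is a set of vertices; E is a set of edges, each a pair of vertices.
V = {"a", "b", "c", "d"}
E = {("a", "b"), ("b", "c"), ("c", "a")}
G = (V, E)


def is_valid_graph(graph):
    """Every edge must connect two vertices that belong to V."""
    vertices, edges = graph
    return all(u in vertices and v in vertices for u, v in edges)


def neighbours(graph, vertex):
    """Vertices that share an edge with `vertex`, ignoring edge direction."""
    _, edges = graph
    return {v for u, v in edges if u == vertex} | {u for u, v in edges if v == vertex}
```

Vertex `d` belongs to V but appears in no edge, which the definition allows.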
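One way to picture a relationship as a first-class citizen is to give it its own record with a type, properties, and a start and end node. The Python sketch below is an illustration only; the class and method names are assumptions, not the API of any graph database.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node          # direction: from start ...
    end: Node            # ... to end
    type: str
    properties: dict = field(default_factory=dict)


class Graph:
    def __init__(self):
        self.relationships = []

    def relate(self, start, end, rel_type, **properties):
        rel = Relationship(start, end, rel_type, properties)
        self.relationships.append(rel)
        return rel

    def outgoing(self, node, rel_type):
        """Follow relationships along their direction."""
        return [r.end for r in self.relationships
                if r.start is node and r.type == rel_type]

    def incoming(self, node, rel_type):
        """Follow the same relationships against their direction."""
        return [r.start for r in self.relationships
                if r.end is node and r.type == rel_type]


josh = Node({"name": "Josh"})
widget = Node({"name": "widget"})
graph = Graph()
rated = graph.relate(josh, widget, "RATED", rating=5)
```

The direction carries the domain meaning (Josh rated the widget, not the reverse), yet the single stored relationship can be walked from either end.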
  13. 13. Graph Examples
  14. 14. Relational Databases are Graphs!
  15. 15. New Solution to the Bacon Problem
        $keanu = $actorIndex->find('name', 'Keanu Reeves');
        $kevin = $actorIndex->find('name', 'Kevin Bacon');
        $path = $keanu->findPathTo($kevin);
  16. 16. Some Graph Use Cases
        • Social networking
        • Manufacturing
        • Map directions
        • Fraud detection
        • Multi-tenancy
  17. 17. Modelling a Domain with Graphs
        • Graphs are "whiteboard-friendly"
        • Nouns become nodes
        • Verbs become relationships
        • Properties are adjectives and adverbs
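The slides do not show what a path finder does internally. As a rough idea, a breadth-first search over the slide 4 data finds a shortest actor-movie-actor chain. This Python sketch is an assumption-laden illustration: `find_path` is a hypothetical helper, and it treats actors and movies as nodes in one graph, assuming no actor shares a name with a movie.

```python
from collections import defaultdict, deque

# (actor_name, movie_title) rows from the cast table on slide 4.
CAST = [
    ("Jennifer Connelly", "Higher Learning"),
    ("Laurence Fishburne", "Mystic River"),
    ("Laurence Fishburne", "Higher Learning"),
    ("Kevin Bacon", "Mystic River"),
    ("Keanu Reeves", "The Matrix"),
    ("Laurence Fishburne", "The Matrix"),
]


def find_path(cast, start, goal):
    """Breadth-first search; returns an alternating actor/movie list, or None."""
    neighbours = defaultdict(set)
    for actor, movie in cast:
        neighbours[actor].add(movie)
        neighbours[movie].add(actor)

    previous = {start: None}   # also serves as the visited set
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            # Walk the back-pointers to rebuild the path.
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for nxt in sorted(neighbours[current]):
            if nxt not in previous:
                previous[nxt] = current
                queue.append(nxt)
    return None


path = find_path(CAST, "Keanu Reeves", "Kevin Bacon")
```

Here `path` runs Keanu Reeves, The Matrix, Laurence Fishburne, Mystic River, Kevin Bacon.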
  18. 18. Audience Participation!
  19. 19.
        • Neo Technology
        • http://neo4j.org
        • Embedded in Java applications
        • Standalone server via REST
        • Plugins: spatial, Lucene, RDF
        • http://github.com/jadell/Neo4jPHP
  20. 20. Using the REST client
        // Classes come from Neo4jPHP's Everyman\Neo4j namespace
        use Everyman\Neo4j\Client, Everyman\Neo4j\Transport,
            Everyman\Neo4j\Node, Everyman\Neo4j\Index;

        // Create the nodes
        $client = new Client(new Transport());
        $customer = new Node($client);
        $customer->setProperty('name', 'Josh')->save();
        $store = new Node($client);
        $store->setProperty('name', 'Home Despot')
              ->setProperty('location', 'Durham, NC')->save();
        $order = new Node($client);
        $order->save();
        $item = new Node($client);
        $item->setProperty('item_number', 'Q32-ESM')->save();

        // Relate them to each other
        $order->relateTo($item, 'CONTAINS')->save();
        $customer->relateTo($order, 'BOUGHT')->save();
        $store->relateTo($order, 'SOLD')->save();

        // Index the customer for later lookup
        $customerIndex = new Index($client, Index::TypeNode, 'customers');
        $customerIndex->add($customer, 'name', $customer->getProperty('name'));
        $customerIndex->add($customer, 'rating', 'A++');
  21. 21. Graph Mining
        • Paths
        • Traversals
        • Ad-hoc Queries
  22. 22. Path Finding
        • Find any connection from node A to node B
        • Limit by relationship types and/or direction
        • Path finding algorithms: all, simple, shortest, Dijkstra
        $customer = $customerIndex->findOne('name', 'Josh');
        $item = $itemIndex->findOne('item_number', 'Q32-ESM');
        $path = $item->findPathsTo($customer)
                     ->setMaxDepth(2)
                     ->getSinglePath();
        foreach ($path as $node) {
            echo $node->getId() . "\n";
        }
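Of the algorithms listed on slide 22, Dijkstra is the one that takes relationship costs into account. A minimal Python sketch of the textbook algorithm follows; the `ROADS` data and the function name are invented for illustration, and this is not how any particular graph database implements it.

```python
import heapq


def dijkstra(edges, start, goal):
    """edges: {node: [(neighbour, cost), ...]} with non-negative costs.

    Returns (total_cost, path) for a cheapest path, or None if unreachable.
    """
    queue = [(0, start, [start])]   # (cost so far, node, path so far)
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in settled:
            continue  # a cheaper route to this node was already processed
        settled.add(node)
        for neighbour, step in edges.get(node, []):
            if neighbour not in settled:
                heapq.heappush(queue, (cost + step, neighbour, path + [neighbour]))
    return None


# Hypothetical directed road map with travel costs.
ROADS = {
    "A": [("B", 7), ("C", 2)],
    "C": [("B", 3), ("D", 8)],
    "B": [("D", 1)],
}
result = dijkstra(ROADS, "A", "D")
```

The cheapest route from A to D goes through C and B at a total cost of 6, even though it uses more hops than A, C, D.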
  23. 23. Traversal
        • Complex/Custom path finding
        • Base next decision on previous path
        $traversal = new Traversal($client);
        $traversal
            ->setOrder(Traversal::OrderDepthFirst)
            ->setUniqueness(Traversal::UniquenessNodeGlobal)
            ->setPruneEvaluator('javascript',
                '(function traverse(pos) {
                    if (pos.length() == 1 && pos.lastRelationship().getType() == "CONTAINS") {
                        return false;
                    } else if (pos.length() == 2 && pos.lastRelationship().getType() == "BOUGHT") {
                        return false;
                    }
                    return true;
                })(position)')
            ->setReturnFilter('javascript',
                "return position.endNode().getProperty('type') == 'Customer';");
        $customers = $traversal->getResults($item, Traversal::ReturnTypeNode);
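The idea behind a prune evaluator and a return filter can be sketched without a database: walk depth-first, ask one callback whether to stop expanding a path, and ask another whether to return its end node. The Python below is a simplified illustration with invented data and function names; it does not reproduce the server's traversal framework.

```python
def traverse(start, expand, prune, include):
    """Depth-first traversal that visits each node at most once.

    expand(node)  -> iterable of (relationship_type, next_node)
    prune(path)   -> True to stop expanding beyond this path
    include(path) -> True to return the path's end node
    A path is a list alternating node, relationship type, node, ...
    """
    visited = {start}
    results = []
    stack = [[start]]
    while stack:
        path = stack.pop()
        if include(path):
            results.append(path[-1])
        if prune(path):
            continue
        for rel_type, nxt in expand(path[-1]):
            if nxt not in visited:
                visited.add(nxt)
                stack.append(path + [rel_type, nxt])
    return results


# Hypothetical (start, type, end) triples mirroring the order example.
TRIPLES = [
    ("order-1", "CONTAINS", "item-1"),
    ("josh", "BOUGHT", "order-1"),
    ("store-1", "SOLD", "order-1"),
]
CUSTOMERS = {"josh"}


def expand(node):
    # Relationships are followed in either direction.
    for start, rel_type, end in TRIPLES:
        if start == node:
            yield rel_type, end
        elif end == node:
            yield rel_type, start


def prune(path):
    # Keep going only along CONTAINS at depth 1 and BOUGHT at depth 2.
    depth = len(path) // 2
    if depth == 0:
        return False
    expected = {1: "CONTAINS", 2: "BOUGHT"}
    return path[-2] != expected.get(depth)


def include(path):
    return path[-1] in CUSTOMERS


customers = traverse("item-1", expand, prune, include)
```

Starting from the item, the walk reaches the order, discards the SOLD branch to the store, and returns the one customer.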
  24. 24.
        • Uses mathematical notation approach
        • Complex traversal behaviors, including backtracking
        • https://github.com/tinkerpop/gremlin/wiki
        m = [:]
        g.v(1).out('likes').in('likes').out('likes').groupCount(m)
        m.sort{a,b -> a.value <=> b.value}
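The Gremlin one-liner on slide 24 walks from a vertex to what it likes, to everyone else who likes those things, to what they like, and counts how often each endpoint is reached. A plain Python sketch of that same walk, over invented `LIKES` data and with hypothetical helper names, could look like this:

```python
from collections import Counter

# Hypothetical "likes" edges: (person, thing).
LIKES = [
    ("alice", "jazz"), ("alice", "blues"),
    ("bob", "jazz"), ("bob", "rock"), ("bob", "folk"),
    ("carol", "blues"), ("carol", "rock"),
]


def out_likes(person):
    return [thing for p, thing in LIKES if p == person]


def in_likes(thing):
    return [p for p, t in LIKES if t == thing]


def group_count(person):
    counts = Counter()
    for thing in out_likes(person):             # out('likes')
        for other in in_likes(thing):           # in('likes')
            for candidate in out_likes(other):  # out('likes')
                counts[candidate] += 1          # groupCount(m)
    return counts


m = group_count("alice")
# Ascending by count, like the sort on the slide.
ranked = sorted(m.items(), key=lambda kv: kv[1])
```

As in the original expression, nothing filters out the starting person or the things they already like, so a real recommender would add that step.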
  25. 25. Cypher
        • "What to find" vs. "How to find"
        $query = 'START item=(1) MATCH (item)<-[:CONTAINS]-(order)<-[:BOUGHT]-(customer) RETURN customer';
        $cypher = new CypherQuery($client, $query);
        $customers = $cypher->getResultSet();
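To make the "what" versus "how" contrast concrete, the sketch below spells out by hand the "how" that the MATCH pattern on slide 25 leaves to the query engine. It is plain Python over invented triples, with hypothetical helper names, and says nothing about how Cypher is actually executed.

```python
# Hypothetical (start, type, end) relationship triples.
TRIPLES = [
    ("order-1", "CONTAINS", "item-1"),
    ("order-2", "CONTAINS", "item-1"),
    ("order-2", "CONTAINS", "item-2"),
    ("josh", "BOUGHT", "order-1"),
    ("sam", "BOUGHT", "order-2"),
    ("store-1", "SOLD", "order-1"),
]


def sources(rel_type, target):
    """Nodes with an outgoing `rel_type` relationship pointing at `target`."""
    return [s for s, t, e in TRIPLES if t == rel_type and e == target]


def customers_who_bought(item):
    # (item)<-[:CONTAINS]-(order)<-[:BOUGHT]-(customer)
    return [
        customer
        for order in sources("CONTAINS", item)
        for customer in sources("BOUGHT", order)
    ]
```

Each arrow in the pattern becomes one nested loop here; the declarative query states only the shape and leaves the looping to the database.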
  26. 26. Cypher Syntax
        START item = (1)                        START item = (1,2,3)
        START item = (items, 'name:Q32*')       START item = (1), customer = (2,3)
        MATCH (item)<--(order)                  MATCH (order)-->(item)
        MATCH (order)-[r]->(item)               MATCH ()--(item)
        MATCH
            (supplier)-[:SUPPLIES]->(item)<-[:CONTAINS]-(order),
            (customer)-[:RATED]->(item)
        WHERE customer.name = 'Josh' and supplier.coupon = 'freewidget'
        RETURN item, order                      RETURN customer, item, r.rating
        RETURN r~TYPE                           RETURN COUNT(*)
        ORDER BY customer.name DESC             RETURN AVG(r.rating)
        LIMIT 3 SKIP 2
  27. 27. Cypher - All Together Now
        // Find the top 10 `widget` ratings by customers who bought AND rated
        // `widgets`, and the supplier
        START item = (items, 'name:widget')
        MATCH (item)<--(order)<--(customer)-[r:RATED]->(item)<--(supplier)
        RETURN customer, r.rating, supplier ORDER BY r.rating DESC LIMIT 10
  28. 28. Tools
        • Neoclipse
        • Webadmin
  29. 29. Are RDBs Useful At All?
        • Aggregation
        • Ordered data
        • Truly tabular data
        • Few or clearly defined relationships
  30. 30. Questions?
  31. 31. Resources
        • http://neo4j.org
        • http://docs.neo4j.org
        • http://www.youtube.com/watch?v=UodTzseLh04
            • Emil Eifrem (Neo Tech. CEO) webinar
            • Check out around the 54 minute mark
        • http://github.com/jadell/Neo4jPHP
        • http://joshadell.com
        • [email_address]
        • @josh_adell
        • Google+, Facebook, LinkedIn
