Graph db: time for serious stuff @ codemotion 23/03/2012

Graph databases are not widespread in development communities, although they are a Swiss Army knife for problems that the relational model can't handle well. In this talk we'll spend a few minutes on graph theory, see how to easily solve a few relational anti-patterns with graph databases, and show how to integrate them in your next project. At the end we will take a practical look at OrientDB, the "next big thing" of the NoSQL ecosystem, through its PHP Data Mapper, "Orient".

Published in: Technology

  1. 1. Graph databases: time for serious stuff. David Funaro, Alessandro Nadalin
  2. 2. Agenda: • Theory • When to use a graph? • Why graphDB? • The graphDB community • OrientDB • Orient PHP library
  3. 3. Essential (Theory): G = (V, E), where G is the graph, V the vertices and E the edges
  4. 4. Binary Relation: A (Itchy) hates B (Scratchy)
  5. 5. Binary Relation: A and B are vertices, the relation between them is an edge
  6. 6. Graph (diagram with vertices A, B, D, E, F, G)
  7. 7. Undirected Graph (diagram with vertices A, B, D, E, F). Example: Friendship
  8. 8. Directed Edge: vertices A and B joined by a directed edge
  9. 9. Directed Graph (diagram with vertices A, B, D, F). Example: Followee
  10. 10. Path (diagram with vertices A, B, D, E, F, G)
  11. 11. Path: A, B, D, G, E, F
  12. 12. Graph -> GraphDB: a GraphDB is a database that uses the graph as its primary data structure
  13. 13. ... when to use a graph?
  14. 14. Recommendations (graph diagram; nodes: John, Rome, Milan, Cinema A, Cinema B, Cinema C, Mr Bean, Se7en, Fun, Thriller; edge labels: lives in, likes, shows, type, location)
  15. 15. Recommendations (same graph diagram, with ✓ and x marks placed on it)
  16. 16. Your data is a graph
  17. 17. a tree is a graph
  18. 18. parent_id is a graph
  19. 19. Solve decision problems
  20. 20. Maximum flow
  21. 21. Given a dataset, calculate how to best organize it: maximum flow
  22. 22. travelling salesman problem
  23. 23. The pizza guy needs to deliver to A, B, C.
  24. 24. Decision based on distance, traffic, time and so on.
  25. 25. Shortest path
  26. 26. Identify "special" nodes of the graph
  27. 27. Given your dataset, organize some clusters. Are there some nodes which cannot belong to a cluster? They probably have some properties different from the average.
  28. 28. Given your dataset, organize some clusters. Are there some nodes which cannot belong to a cluster? They probably have some properties different from the average. ACHTUNG! TERRORISTEN!
  29. 29. but ... why graphDB?
  30. 30. Representing a Graph in: ✓ Relational Database (MySQL, Oracle) ✓ Document Oriented DB (MongoDB, CouchDB) ✓ XML Database (MarkLogic, eXist-db). Source: http://www.slideshare.net/slidarko/problemsolving-using-graph-traversals-searching-scoring-ranking-and-recommendation
  31. 31. where is the difference?
  32. 32. GraphDB: a graph database is any storage system that provides index-free adjacency. http://www.slideshare.net/slidarko/problemsolving-using-graph-traversals-searching-scoring-ranking-and-recommendation
  33. 33. Step by step example: given a list of people, find their homepages
  34. 34. Tree-based DB WAY (three-step diagram): David Funaro is put in the Search Engine, which finds http://davidfunaro.com
  35. 35. Tree-based DB WAY (same diagram): the cost to find a single friend's homepage (HP) grows as the friends' homepage table grows
  36. 36. GraphDB WAY: 1) get the embedded information (index): www.odino.org. It's as if the GraphDB had an additional piece of information (the anchor <a>)
  37. 37. GraphDB WAY: the anchor works as a local index to reach the document = index-free adjacency. <a href="http://odino.org">Alessandro Nadalin</a>
  38. 38. Local cost: the local cost is O(k) = constant
  39. 39. Local cost: the local cost is O(k) = constant
  40. 40. Local cost: thus, as the graph grows in size, the cost of a local step remains the same
  41. 41. any database can implicitly represent a graph, BUT only a graph database makes the graph structure explicit
  42. 42. Benchmark: MySQL vs Neo4j; 1 million vertices, 4 million edges, scale-free topology, both hash and BTree. Depth / RDBMS / Graph: 1 / 100ms / 30ms; 2 / 1000ms / 500ms; 3 / 10000ms / 3000ms; 4 / 100000ms / 50000ms; 5 / N/A / 100000ms. Source: http://markorodriguez.com/2011/02/18/mysql-vs-neo4j-on-a-large-scale-graph-traversal/
  43. 43. How? PREFIX geospecies: <http://rdf.geospecies.org/ont/geospecies#> PREFIX lycopodiophyta: <http://lod.geospecies.org/phyla/Pc2> PREFIX door_county: <http://sws.geonames.org/5250768/> PREFIX dcterms: <http://purl.org/dc/terms/> SELECT DISTINCT ?family_name ?canonicalName ?commonName ?identifier ?wikipedia_url WHERE { ?x geospecies:hasFamilyName ?family_name; geospecies:hasCanonicalName ?canonicalName; geospecies:hasCommonName ?commonName; dcterms:identifier ?identifier; geospecies:inPhylum lycopodiophyta:; geospecies:isUSDA_ExpectedIn door_county:. OPTIONAL { ?x geospecies:hasCommonName ?commonName; geospecies:hasWikipediaArticle ?wikipedia_url } } ORDER BY ?family_name ?canonicalName
  44. 44.
  45. 45. NoSPARQL http://blog.acaro.org/entry/somebody-is-going-to-hate-me-nosparql
  46. 46. The community that is building and feeding the GraphDB ecosystem: NoSPARQL, TinkerPop Stack, Databases
  47. 47. data model and their implementation: Blueprints is a collection of interfaces, implementations, ouplementations, and test suites for the property graph data model. Blueprints is analogous to JDBC, but for graph databases. https://github.com/tinkerpop/blueprints/wiki/
  48. 48. a data flow framework using process graphs: provides a collection of "pipes" that are connected together to form processing pipelines
  49. 49. a graph-based programming language: a Turing-complete graph-based programming language that compiles Gremlin syntax down to Pipes
  50. 50. a RESTful graph shell: allows Blueprints graphs to be exposed through a RESTful API (HTTP)
  51. 51. What's hot
  52. 52. OrientDB
  53. 53. Glossary: RID <10:05> = <Cluster:Position>; CLASS
  54. 54. Main features
  55. 55. Inheritance
  56. 56. class Vehicle; class Car; class Bike
  57. 57. class Vehicle; class Car; class Bike. SELECT FROM Vehicle WHERE owner = 1:1
  58. 58. class Vehicle; class Car; class Bike. The query can return records of class Bike or Car
  59. 59. Traversal
  60. 60. SELECT FROM fellas WHERE any() traverse(0,-1) ( @rid = [Michelle @rid] )
  61. 61. SELECT FROM fellas WHERE any() traverse(0,2) ( @rid = [Michelle @rid] )
  62. 62. SQL syntax
  63. 63. beyond SQL
  64. 64. SELECT FROM authors WHERE book.title = ...
  65. 65. ACID
  66. 66. speaks JSON
  67. 67. { "schema": { "name": "Address" }, "result": [{ "@type": "d", "@rid": "#13:0", "@version": 6, "@class": "Address", "type": "Residence", "street": "Piazza Navona, 1", "city": "#14:0", "nick": "Luca2" }, { ... ...
  68. 68. Double Protocol
  69. 69. HTTP: Universal
  70. 70. HTTP: Easy to interact with
  71. 71. binary: Blazing fast
  72. 72. on-record SELECTs
  73. 73. SELECT FROM cats
  74. 74. SELECT FROM 11:0
  75. 75. SELECT FROM [11:0,11:1]
  76. 76. SELECT FROM [11:0,12:0]
  77. 77. stress-free setup
  78. 78. 2 Mb
  79. 79. ./orient/bin/server.sh
  80. 80. in-memory DB
  81. 81. or disk-persisted
  82. 82. Supports standards
  83. 83. OrientDB: • Inheritance • Traversal • SQL-like syntax • ACID • Speaks JSON • Double protocol • on-record SELECT • TinkerPop compliant
  84. 84. Language Bindings http://code.google.com/p/orient/wiki/ProgrammingLanguageBindings
  85. 85. Orient = PHP Library to work with OrientDB https://github.com/congow/Orient
  86. 86. Data Mapper, Query Builder, HTTP Binding
  87. 87. HTTP Binding
  88. 88. use Congow\Orient; use Congow\Orient\Foundation\Binding; $driver = new Orient\Http\Client\Curl(); $orient = new Binding($driver, '127.0.0.1', 2480, 'admin', 'admin', 'demo'); $response = $orient->query("SELECT FROM Address"); $output = json_decode($response->getBody()); foreach ($output->result as $address) { var_dump($address->street); }
  89. 89. apart from ->query($SQL)
  90. 90. ->get|delete|postClass($class)
  91. 91. ->post|delete|put|getDocument($rid)
  92. 92. ...and much more!(connect, disconnect, ...)
  93. 93. Query Builder
  94. 94. use Congow\Orient\Query; $query = new Query(); $query->from(array('users'))->where('username = ?', "admin"); echo $query->getRaw(); // SELECT FROM users WHERE username = "admin"
  95. 95. $query->select(array('name', 'username', 'email'), false)->from(array('12:0', '12:1'), false)->where('any() traverse ( any() like "%danger%" )')->orWhere("1 = ?", 1)->andWhere("links = ?", 1)->limit(20)->orderBy('username')->orderBy('name', true, true)->range("12:0", "12:1"); // SELECT name, username, email FROM [12:0, 12:1] WHERE any() traverse ( any() like "%danger%" ) OR 1 = "1" AND links = "1" ORDER BY name, username LIMIT 20 RANGE 12:0 12:1
  96. 96. Data Mapper
  97. 97. namespace PolandPHPCon\Entity; use Congow\Orient\ODM\Mapper\Annotations as ODM; /** @ODM\Document(class="Person") */ class Speaker { /** @ODM\Property(type="string") */ protected $name; public function setName($name) { $this->name = $name; }
  98. 98. Domain Driven Design
  99. 99. { "schema": { "name": "Speaker" }, "result": [{ "@type": "d", "@rid": "#1:0", "@version": 6, "@class": "Speaker", "name": "David Coallier" }, { ... ...
  100. 100. { "schema": { "name": "Speaker" }, "result": [{ "@type": "d", "@rid": "#1:0", "@version": 6, "@class": "Speaker", "name": "David Coallier" }, { ... ... $david = $mapper->hydrate(json_decode($speaker));
  101. 101. { "schema": { "name": "Speaker" }, "result": [{ "@type": "d", "@rid": "#1:0", "@version": 6, "@class": "Speaker", "name": "Martin Fowler" }, { ... ... $david instanceof PolandPHPCon\Entity\Speaker
  102. 102. Repository Pattern: $repo = $manager->getRepository('Speaker');
  103. 103. $speakers = $repo->findAll();
  104. 104. $speaker = $repo->find($rid);
  105. 105. $criteria = array('Name' => 'Martin'); $lornas = $repo->findBy($criteria);
  106. 106. $criteria = array('Name' => 'Martin', 'last_name' => 'Fowler'); $lornaJ = $repo->findOneBy($criteria);
  107. 107. https://github.com/doctrine/common/tree/master/lib/Doctrine/Common/Persistence
  108. 108.
  109. 109. That's all, folks! David Funaro (@ingdavidino, http://davidfunaro.com), Alessandro Nadalin (@_odino_, http://odino.org)
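
The index-free adjacency idea (slides 32 to 40) and the depth-bounded traversal (slides 60 and 61) can be illustrated without any database at all. What follows is only a minimal Python sketch under stated assumptions, not how OrientDB or the Orient library work internally: the `Vertex` class, the `traverse` function and the sample names are all hypothetical. Each vertex keeps direct references to its neighbours, so a local step never consults a global index, and a depth limit of -1 is taken to mean "no limit", loosely mirroring `traverse(0,-1)`.

```python
from collections import deque


class Vertex:
    """A vertex holding direct references to its neighbours (no global index)."""

    def __init__(self, name):
        self.name = name
        self.out = []  # adjacent vertices, reachable without a lookup table

    def link(self, other):
        self.out.append(other)


def traverse(start, max_depth):
    """Breadth-first walk returning {vertex name: depth} up to max_depth hops.

    A max_depth of -1 means "no limit" (an assumption made for this sketch).
    """
    seen = {start.name: 0}
    queue = deque([(start, 0)])
    while queue:
        vertex, depth = queue.popleft()
        if max_depth != -1 and depth == max_depth:
            continue  # do not expand beyond the requested depth
        for neighbour in vertex.out:  # local step: only this vertex's edges are read
            if neighbour.name not in seen:
                seen[neighbour.name] = depth + 1
                queue.append((neighbour, depth + 1))
    return seen


# Hypothetical sample data: a small chain of "follows" edges.
john, anna, luca, michelle = (Vertex(n) for n in ("John", "Anna", "Luca", "Michelle"))
john.link(anna)
anna.link(luca)
luca.link(michelle)

within_two = traverse(john, 2)   # stops two hops away from John
unbounded = traverse(john, -1)   # walks the whole reachable graph
```

With this sample data, Michelle is three hops from John, so she appears only in the unbounded walk. The cost of each local step depends on the number of edges of the current vertex, not on the total size of the graph.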
