NoSQL with Graphs

A talk about graph databases and their usage.

Published in: Technology

Transcript of "NoSQL with Graphs"

  1. NoSQL with Graphs: mining graphs for fun & profit
     Claudio Martella, NoSQLDay 2011, Saturday, March 26, 2011
  2. Outline
     Semantic Web, Documents, Tinkerpop, GraphDBs, Tools, NoSQL, Table, Graphs, O(1), Why, Apps, Query, RDBMS, Recommendation
  3. Who am I?
     • PhD in Distributed Graphs @ UniBZ
     • Analyst @ TIS Innovation Park
     • Topics: Data / Text Mining with Graphs
     • Technology: Hadoop, NoSQL, GraphDBs
     • Writing Graffiti
  4. Surrounded by graphs
     • the Web Graph
     • Semantic Web
     • Social Networks
     • Natural Sciences
     • GIS
  5. Property Graph
     • A Graph is composed of Vertices and Edges
     • Vertices are connected by Edges
     • An Edge has a Label and a Direction
     • Edges and Vertices have Properties
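The four bullets above map onto a small data structure. What follows is a minimal sketch in Python, not the API of any product mentioned in the talk; the class names, the `connect` helper, and the `since` property are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality avoids recursing through edges
class Vertex:
    id: int
    properties: dict = field(default_factory=dict)
    out_edges: list = field(default_factory=list)  # edges leaving this vertex
    in_edges: list = field(default_factory=list)   # edges arriving at this vertex


@dataclass(eq=False)
class Edge:
    label: str      # an edge has a label ...
    tail: Vertex    # ... and a direction: tail -> head
    head: Vertex
    properties: dict = field(default_factory=dict)


def connect(tail, label, head, **properties):
    """Create a directed, labeled edge and register it on both endpoints."""
    edge = Edge(label, tail, head, properties)
    tail.out_edges.append(edge)
    head.in_edges.append(edge)
    return edge


claudio = Vertex(1, {"name": "Claudio"})
cirpo = Vertex(2, {"name": "Cirpo"})
edge = connect(claudio, "follows", cirpo, since=2011)  # "since" is a made-up property
```

Each vertex holds direct references to its own edges, so reaching a neighbour from a vertex needs no global index.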
  6. Who am I?
     [figure: property graph around the vertex "Me" (name: claudio, surname: martella, email: claudio.martella@gmail.com); other vertices: GraphDB, NoSQL, Hadoop, Graffiti, TIS, UniBZ; edge labels: belongs to, likes, works with, author, works at, studies at]
  7. A graph in RDBMS
     ID   Name      Follower  Followee
     1    Claudio   1         2
     2    Cirpo     1         3
     3    Okram     1         4
     4    Spinoza   2         5
     ...  ...       ...       ...
  8. BTree Index 101
     • Lookup costs Log(N)
     • Where N is the global size of the data structure
     • Updating the index is also not for free
     [figure: index over the keys Cirpo, Claudio, Okram, Spinoza]
  9. A lookup (RDBMS)
     • Look for Claudio’s ID [ Log(N) ]
     • Look for K Followees [ Log(N) ]
     • Get their names [ K*Log(N) ]
     [figure: the table from slide 7, with column headers abbreviated to I, Name, Fr, Fe]
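The three lookup steps can be sketched in Python. This is an illustration under stated assumptions: sorted lists searched with `bisect` stand in for B-tree indexes (both give logarithmic lookups), the function names are made up, and the slide's 2 → 5 row is left out because user 5 is elided on the slide.

```python
from bisect import bisect_left

users = [(1, "Claudio"), (2, "Cirpo"), (3, "Okram"), (4, "Spinoza")]
follows = [(1, 2), (1, 3), (1, 4)]  # (follower, followee)

# Three "indexes", each a sorted list standing in for a B-tree.
name_index = sorted((name, uid) for uid, name in users)
id_index = sorted(users)
follower_index = sorted(follows)


def id_by_name(name):
    """Step 1: one index lookup, O(log N)."""
    i = bisect_left(name_index, (name,))
    if i < len(name_index) and name_index[i][0] == name:
        return name_index[i][1]
    return None


def followees(uid):
    """Step 2: one index lookup, then a scan over the K matching rows."""
    i = bisect_left(follower_index, (uid,))
    found = []
    while i < len(follower_index) and follower_index[i][0] == uid:
        found.append(follower_index[i][1])
        i += 1
    return found


def name_by_id(uid):
    """Step 3 runs this once per followee: K lookups of O(log N) each."""
    i = bisect_left(id_index, (uid,))
    if i < len(id_index) and id_index[i][0] == uid:
        return id_index[i][1]
    return None


def followee_names(name):
    uid = id_by_name(name)
    if uid is None:
        return []
    return [name_by_id(f) for f in followees(uid)]


print(followee_names("Claudio"))
```

Every step goes back to an index whose size depends on the whole table, which is why each followee name costs another Log(N).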
  10. A graph in NoSQL
      ID       F1     F2     F3       ...
      Cirpo    ...    ...    ...      ...
      Claudio  Cirpo  Okram  Spinoza  ...
      Okram    ...    ...    ...      ...
      Spinoza  ...    ...    ...      ...
      ...      ...    ...    ...      ...
  11. A lookup (NoSQL)
      • Look for Claudio’s ID [ Log(N) ]
      • Look for Followees [ O(K) ]
      [figure: the table from slide 10]
  12. A graph in GraphDB
      [figure: vertices 1 (name: Claudio), 2 (name: Cirpo), 3 (name: Okram), 4 (name: Spinoza), connected by three edges labeled "follows"]
  13. A lookup (Graph)
      • Look for Claudio’s ID [ Log(N) ]
      • Look for Followees [ O(K) ]
      [figure: the graph from slide 12]
  14. What about Friends (of Friends)*?
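"Friends (of Friends)*" is a traversal of unbounded depth. A minimal Python sketch of the idea, using breadth-first search with a depth limit, follows. The names come from the slides, but every edge in this `follows` dictionary is an assumption made for the example, as is the function name.

```python
from collections import deque

# Hypothetical adjacency lists: person -> people they follow.
follows = {
    "Claudio": ["Cirpo", "Okram", "Spinoza"],
    "Cirpo": ["Caprazzi"],
    "Okram": ["Claudio"],
    "Spinoza": [],
    "Caprazzi": ["Okram"],
}


def within_depth(graph, start, max_depth):
    """Return {person: depth} for everyone reachable in at most max_depth hops."""
    seen = {start}
    reached = {}
    frontier = deque([(start, 0)])
    while frontier:
        person, depth = frontier.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the requested depth
        for friend in graph.get(person, ()):
            if friend not in seen:
                seen.add(friend)
                reached[friend] = depth + 1
                frontier.append((friend, depth + 1))
    return reached


print(within_depth(follows, "Claudio", 1))
print(within_depth(follows, "Claudio", 2))
```

Each extra level only touches the adjacency lists of the people already reached; in a relational layout the same step would be one more self-join on the follows table.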
  15. A benchmark
      • 1 Million Vertices
      • 4 Million Edges
      • Scale-Free Topology
      • Postgres VS Neo4J
      • Both Hash and BTree

      Depth  RDBMS      Graph
      1      100ms      30ms
      2      1000ms     500ms
      3      10000ms    3000ms
      4      100000ms   50000ms
      5      N/A        100000ms

      Ref: http://markorodriguez.com/2011/02/18/mysql-vs-neo4j-on-a-large-scale-graph-traversal/
  16. A benchmark
      • 50 friends on average
      • Look if there’s a path connecting two people

      DB     #    Time
      RDBMS  1K   2000ms
      Graph  1K   2ms
      Graph  1M   2ms
      RDBMS  1M   N/A

      Ref: http://www.slideshare.net/thobe/nosqleu-graph-databases-and-neo4j
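The query in this benchmark, "is there a path connecting two people", can be sketched as a breadth-first search that stops as soon as the target is found. This is only an illustration in Python: the benchmark's actual implementation is not shown on the slide, and the `friends` data and `find_path` name are assumptions.

```python
from collections import deque

# Hypothetical undirected friendships, stored in both directions.
friends = {
    "Claudio": ["Cirpo", "Okram"],
    "Cirpo": ["Claudio", "Spinoza"],
    "Okram": ["Claudio"],
    "Spinoza": ["Cirpo"],
    "Caprazzi": [],
}


def find_path(graph, source, target):
    """Return a shortest path from source to target as a list, or None."""
    if source == target:
        return [source]
    parents = {source: None}  # doubles as the visited set
    queue = deque([source])
    while queue:
        person = queue.popleft()
        for friend in graph.get(person, ()):
            if friend in parents:
                continue
            parents[friend] = person
            if friend == target:
                # Walk the parent links back to the source.
                path = [friend]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(friend)
    return None


print(find_path(friends, "Claudio", "Spinoza"))
print(find_path(friends, "Claudio", "Caprazzi"))
```

The work done depends on how many people are visited before the target turns up, not on the total number of people stored.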
  17. A Graph Database allows O(1) access to adjacent Vertices
      Ref: The Graph Traversal Pattern, Marko A. Rodriguez and Peter Neubauer
  18-21. Example: Queries
      [figure, repeated unchanged on four slides: graph linking the films Ocean 11, Ocean 12, Ocean 13, The Departed, and Se7en to Steven Soderbergh and Brad Pitt through "director", "actor", and "producer" edges, and to the genres Action, Thriller, Crime, and Drama through "genre" edges]
  22-30. Example: Recommendations
      [figure, repeated over nine slides with the layout rearranged: Claudio, Cirpo, and Caprazzi are connected by "likes" edges to Graph Runner, The Lord of the Graphs, Javatar, and PHP I love You, which are connected by "tagged" edges to Sci-Fi, Trilogy, Adventure, Geeky, and Boring]
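One way to turn a likes/tagged graph into recommendations is to walk from a user to the items they like, on to the tags of those items, and back to other items carrying the same tags. The Python sketch below does that. The transcript does not preserve which edges the figure contains, so the `likes` and `tagged` data, the scoring rule, and the function name are all assumptions.

```python
from collections import Counter

# Hypothetical edges; only the vertex names are taken from the slides.
likes = {
    "Claudio": {"Graph Runner", "The Lord of the Graphs"},
    "Cirpo": {"Javatar", "PHP I love You"},
    "Caprazzi": {"The Lord of the Graphs", "Javatar"},
}
tagged = {
    "Graph Runner": {"Sci-Fi"},
    "The Lord of the Graphs": {"Trilogy", "Adventure"},
    "Javatar": {"Sci-Fi", "Adventure", "Geeky"},
    "PHP I love You": {"Boring"},
}


def recommend(user):
    """Rank items the user has not liked by the tags they share with liked items."""
    liked = likes[user]
    # user -likes-> item -tagged-> tag: how often each tag is reached
    tag_weight = Counter(tag for item in liked for tag in tagged[item])
    scores = Counter()
    # tag <-tagged- item: score every other item by the weight of its tags
    for item, tags in tagged.items():
        if item in liked:
            continue
        score = sum(tag_weight[tag] for tag in tags)
        if score:
            scores[item] = score
    return scores.most_common()


print(recommend("Claudio"))
```

With this made-up data, Javatar is suggested to Claudio because it shares the Sci-Fi and Adventure tags with items he already likes, while PHP I love You shares none and is left out.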
  31. Graph Mining: How are they connected?
      Ref: Programming the Semantic Web - O’Reilly
  32. Graph Mining
      Ref: Programming the Semantic Web - O’Reilly
  33. Graph Mining
  34. Other Applications
      • Community Analysis
      • Fraud Detection
      • Planning
      • Text Processing
      • Reasoning
  35. as you can’t get rid of logicians
  36. there’s an SQL also for Graphs
  37. Triplestores
      [figure: graph around Tom Cruise with the edges actor → Top Gun, married → Katie Holmes, advocate → Scientology, lives → Hollywood, born → July 3, 1962]
  38. Triplestores
      Subject      Predicate   Object
      Tom Cruise   actor       Top Gun
      Tom Cruise   married     Katie Holmes
      Tom Cruise   advocate    Scientology
      Tom Cruise   lives       Hollywood
      Tom Cruise   born        July 3, 1962
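Querying a triplestore boils down to matching patterns in which some positions are fixed and others are left open. A minimal Python sketch over the triples of this slide follows; the `match` function and its use of `None` as a wildcard are assumptions for illustration, not the interface of any real triplestore.

```python
# The five triples from the slide, as (subject, predicate, object).
triples = [
    ("Tom Cruise", "actor", "Top Gun"),
    ("Tom Cruise", "married", "Katie Holmes"),
    ("Tom Cruise", "advocate", "Scientology"),
    ("Tom Cruise", "lives", "Hollywood"),
    ("Tom Cruise", "born", "July 3, 1962"),
]


def match(subject=None, predicate=None, obj=None):
    """Return all triples matching the pattern; None matches anything."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


print(match(predicate="married"))
print(match(obj="Hollywood"))
```

A query language for triples chains several such patterns together and joins them on shared variables.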
  39. SPARQL
      PREFIX ged: <http://www.daml.org/2001/01/gedcom/gedcom#>
      SELECT ?name ?marriedOn
      FROM <http://www.daml.org/2001/01/gedcom/royal92.daml>
      WHERE {
        ?royal ged:title "Princess" .
        ?royal ged:name ?name .
        ?royal ged:spouseIn ?family .
        ?family ged:marriage ?marriage .
        ?marriage ged:date ?marriedOn .
      }
      ORDER BY ASC(?name)
  40. what if Internet was your GraphDB?
  41. [slide without text]
  42. what about a NoSPARQL?
  43. Tinkerpop
  44. • Blueprints is like the JDBC of the graph database community.
      • Provides a Java-based interface API for the property graph data model: Graph, Vertex, Edge, Index.
      • Provides implementations of the interfaces for TinkerGraph, Neo4j, OrientDB, Sails (e.g. AllegroSail, Neo4jSail), and soon (hopefully) others such as InfiniteGraph, InfoGrid, Sones, and HyperGraphDB
  45. [Pipes]
      • A dataflow framework with support for Blueprints-based graph processing.
      • Provides a collection of “pipes” (implement Iterable and Iterator)
        ✴ Filters: ComparisonFilterPipe, RandomFilterPipe, etc.
        ✴ Traversal: VertexEdgePipe, EdgeVertexPipe, PropertyPipe, etc.
        ✴ Splitting/Merging: CopySplitPipe, RobinMergePipe, etc.
        ✴ Logic: OrPipe, AndPipe, etc.
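The dataflow idea, small lazy steps that each consume one stream and emit another, can be sketched with Python generators. This is not the Pipes API: the function names below are invented and only loosely mirror the traversal, property, and filter roles listed on the slide. The sample edges follow the table on slide 7.

```python
names = {1: "Claudio", 2: "Cirpo", 3: "Okram", 4: "Spinoza"}
edges = [(1, "follows", 2), (1, "follows", 3), (1, "follows", 4)]  # (tail, label, head)


def out_edges(vertices, label):
    """Traversal step: vertex stream -> stream of their outgoing edges."""
    for vertex in vertices:
        for edge in edges:
            if edge[0] == vertex and edge[1] == label:
                yield edge


def head_vertex(edge_stream):
    """Traversal step: edge stream -> stream of the vertices they point to."""
    for _tail, _label, head in edge_stream:
        yield head


def property_of(vertices, table):
    """Property step: vertex stream -> stream of property values."""
    for vertex in vertices:
        yield table[vertex]


def keep(stream, predicate):
    """Filter step: pass through only the elements satisfying the predicate."""
    for element in stream:
        if predicate(element):
            yield element


# Compose the steps; nothing is computed until the result is iterated.
pipeline = keep(
    property_of(head_vertex(out_edges([1], "follows")), names),
    lambda name: name != "Okram",
)
result = list(pipeline)
print(result)
```

Because every step is lazy, elements flow through the whole chain one at a time and no intermediate collection is materialized.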
  46. [Gremlin]
      • A Turing-complete, graph-based programming language that compiles Gremlin syntax down to Pipes (implements JSR 223).
      • Builds on top of Groovy
      • Supports various language constructs: :=, foreach, while, repeat, if/else, function and path definitions, etc.
      An example of “Amazon’s” recommender:
          m = [:]
          g.v(1).outE(purchased).inV.inE(purchased).outV.groupCount(m);
          m.sort{ a,b -> a.value <=> b.value }
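Read step by step, the traversal above goes from user 1 to the items that user purchased, back along "purchased" edges to everyone who bought those items, and counts how often each buyer is reached. A plain Python rendering of the same counting logic follows; the purchase data is invented for the example, and this is a paraphrase of the traversal, not Gremlin itself.

```python
from collections import Counter

# Hypothetical purchases: user id -> items bought.
purchased = {
    1: ["book", "lamp"],
    2: ["book", "pen"],
    3: ["lamp", "book"],
    4: ["pen"],
}


def co_purchasers(user):
    """Count, per user, how many of the given user's items they also bought."""
    counts = Counter()
    for item in purchased[user]:                # outE(purchased).inV
        for other, items in purchased.items():  # inE(purchased).outV
            if item in items:
                counts[other] += 1              # groupCount(m)
    return counts


m = co_purchasers(1)
ranking = sorted(m.items(), key=lambda entry: entry[1])  # ascending, like m.sort
print(ranking)
```

As in the Gremlin version, the starting user is counted too, since the walk back from each item also reaches user 1.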
  47. [Rexster]
      • Allows Blueprints graphs to be exposed through a RESTful API (HTTP)
      • Supports stored traversals written in raw Pipes or Gremlin.
      • Supports ad hoc traversals represented in Gremlin.
      • Provides “helper classes” for performing search-, score-, and rank-based traversal algorithms which, in concert, support recommendation.
  48. Sample Stack
      • HTTP Request arrives
      • Converts REST to Gremlin
      • Gremlin “compiles” to Pipes
      • Pipes makes Blueprints calls
      • Store provides the data
  49. Neo4J
      • Engine: Graph
      • License: AGPLv3
      • Language: Java
      • Transactions: ACID
      • Distributed: HA, Master-Slave Cache Sharding, Domain-Specific
      • Features: Embeddable, REST, many plugins
  50. OrientDB
      • Engine: Document-Graph
      • License: Apache 2.0
      • Language: Java
      • Transactions: ACID
      • Distributed: HA through Replication
      • Features: Embeddable, REST, SQL-like
  51. HypergraphDB
      • Engine: HyperGraph
      • License: LGPL
      • Language: Java
      • Transactions: ACID
      • Distributed: P2P distribution and replication
      • Features: Hyperedges, Java OODB, storage on BerkeleyDB
  52. InfiniteGraph
      • Engine: Graph
      • License: Commercial
      • Language: Java
      • Transactions: ACID
      • Distributed: Graph Partitioning, Federation on Objectivity
      • Features: Distributed lock management, scales to Exabytes
  53. Where do I go now?
      Tinkerpop: http://www.tinkerpop.com
      Neo4J: http://neo4j.org
      OrientDB: http://www.orientechnologies.com/orient-db.htm
      InfoGrid: http://infogrid.org
      InfiniteGraph: http://www.infinitegraph.com
      Sones: http://developers.sones.de
      AllegroGraph: http://www.franz.com/agraph/allegrograph
      HypergraphDB: http://www.kobrix.com/hgdb.jsp
  54. Questions?
      claudio.martella@gmail.com
      http://blog.acaro.org
      http://github.com/claudiomartella/
      @claudiomartella
      http://joind.in/2946