NoSQL with Graphs

A talk about graph databases and their usage.

Transcript

  • 1. NoSQL with Graphs: mining graphs for fun & profit. Claudio Martella, NoSQLDay 2011, Saturday, March 26, 2011
  • 2. Outline Semantic Web Documents Tinkerpop GraphDBs Tools NoSQL Table Graphs O(1) Why Apps Query RDBMS Recommendation
  • 3. Who am I? • PhD in Distributed Graphs @ UniBZ • Analyst @ TIS Innovation Park • Topics: Data / Text Mining with Graphs • Technology: Hadoop, NoSQL, GraphDBs • Writing Graffiti
  • 4. Surrounded by graphs • the Web Graph • Semantic Web • Social Networks • Natural Sciences • GIS
  • 5. Property Graph • A Graph is composed of Vertices and Edges • Vertices are connected by Edges • An Edge has a Label and a Direction • Edges and Vertices have Properties
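The property graph model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `Vertex` and `Edge`, the helper `out_neighbours`, and the choice of sample edges (taken loosely from the "Who am I?" figure, with the `degree` edge property added for illustration) are not from any graph database API.

```python
class Vertex:
    """A vertex with an id, free-form properties, and its incident edges."""

    def __init__(self, vid, **properties):
        self.id = vid
        self.properties = dict(properties)
        self.out_edges = []  # edges leaving this vertex
        self.in_edges = []   # edges arriving at this vertex


class Edge:
    """A directed, labelled edge (tail -> head) with its own properties."""

    def __init__(self, tail, label, head, **properties):
        self.tail = tail
        self.label = label
        self.head = head
        self.properties = dict(properties)
        # Register the edge on both endpoints so each vertex knows its neighbours.
        tail.out_edges.append(self)
        head.in_edges.append(self)


def out_neighbours(vertex, label):
    """Vertices reached from `vertex` by following outgoing edges with `label`."""
    return [e.head for e in vertex.out_edges if e.label == label]


# Sample data: vertex and edge names are assumptions based on the talk's figure.
me = Vertex(1, name="claudio", surname="martella")
tis = Vertex(2, name="TIS")
unibz = Vertex(3, name="UniBZ")
Edge(me, "works at", tis)
Edge(me, "studies at", unibz, degree="PhD")  # edge property, illustrative only

workplaces = [v.properties["name"] for v in out_neighbours(me, "works at")]
```

Here `workplaces` holds the names of the vertices that `me` reaches over "works at" edges, and the "studies at" edge carries its own property, which is what distinguishes a property graph from a plain labelled graph.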
  • 6. Who am I? [figure: property graph centered on the vertex Me (name: claudio, surname: martella, email: claudio.martella@gmail.com); other vertices: GraphDB, NoSQL, Hadoop, Graffiti, TIS, UniBZ; edge labels: belongs to, likes, works with, author, works at, studies at]
  • 7. A graph in RDBMS
    ID   Name      Follower  Followee
    1    Claudio   1         2
    2    Cirpo     1         3
    3    Okram     1         4
    4    Spinoza   2         5
    ...  ...       ...       ...
  • 8. BTree Index 101 • Lookup costs Log(N) • Where N is the global size of the data structure • Updating the index is also not free [figure: BTree index over the keys Cirpo, Claudio, Okram, Spinoza]
  • 9. A lookup (RDBMS) • Look for Claudio’s ID [ Log(N) ] • Look for K Followees [ Log(N) ] • Get their names [ K*Log(N) ]
    I    Name      Fr   Fe
    1    Claudio   1    2
    2    Cirpo     1    3
    3    Okram     1    4
    4    Spinoza   2    5
    ...  ...       ...  ...
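The three lookup steps can be sketched in Python, with sorted lists and binary search standing in for BTree indexes. This is an assumption-laden illustration, not how any particular RDBMS works: the split into a user table and a follower/followee table, and all function names, are hypothetical.

```python
from bisect import bisect_left

# Rows shown on the slides. Each sorted list stands in for a BTree index,
# so every probe below is a binary search, i.e. O(log N).
users_by_id = [(1, "Claudio"), (2, "Cirpo"), (3, "Okram"), (4, "Spinoza")]
users_by_name = sorted((name, uid) for uid, name in users_by_id)
follows = [(1, 2), (1, 3), (1, 4), (2, 5)]  # (follower, followee), sorted


def id_of(name):
    """Step 1: find a user's ID through the name index, O(log N)."""
    i = bisect_left(users_by_name, (name,))
    if i < len(users_by_name) and users_by_name[i][0] == name:
        return users_by_name[i][1]
    return None


def followees_of(uid):
    """Step 2: find the first matching row, O(log N), then scan K rows."""
    i = bisect_left(follows, (uid,))
    result = []
    while i < len(follows) and follows[i][0] == uid:
        result.append(follows[i][1])
        i += 1
    return result


def name_of(uid):
    """Step 3 (run K times): resolve one ID back to a name, O(log N) each."""
    i = bisect_left(users_by_id, (uid,))
    if i < len(users_by_id) and users_by_id[i][0] == uid:
        return users_by_id[i][1]
    return None  # the row may be among those elided on the slide


def names_followed_by(name):
    uid = id_of(name)
    if uid is None:
        return []
    return [name_of(f) for f in followees_of(uid)]
```

The last step is the expensive one: every followee costs another index probe, which is where the K*Log(N) term comes from.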
  • 10. A graph in NoSQL
    ID       F1     F2     F3       ...
    Cirpo    ...    ...    ...      ...
    Claudio  Cirpo  Okram  Spinoza  ...
    Okram    ...    ...    ...      ...
    Spinoza  ...    ...    ...      ...
    ...      ...    ...    ...      ...
  • 11. A lookup (NoSQL) • Look for Claudio’s ID [ Log(N) ] • Look for Followees [ O(K) ]
    ID       F1     F2     F3     ...
    Cirpo    ...    ...    ...    ...
    Claudio  ...    ...    ...    ...
    Okram    ...    ...    ...    ...
    Spinoza  ...    ...    ...    ...
    ...      ...    ...    ...    ...
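A rough Python sketch of the wide-row layout: one row per user, with the followees stored as columns of that row. The sorted key list and the function name are assumptions made for illustration; rows whose contents are elided on the slide are left empty here.

```python
from bisect import bisect_left

# Row keys kept sorted, as in a store that orders rows by key.
row_keys = ["Cirpo", "Claudio", "Okram", "Spinoza"]
# One list of followee columns per row; only Claudio's row is shown on the slide.
row_columns = [
    [],                             # Cirpo: elided on the slide
    ["Cirpo", "Okram", "Spinoza"],  # Claudio
    [],                             # Okram: elided on the slide
    [],                             # Spinoza: elided on the slide
]


def followees_of(name):
    """Find the row by key, O(log N), then read its K columns, O(K)."""
    i = bisect_left(row_keys, name)
    if i == len(row_keys) or row_keys[i] != name:
        return []
    return list(row_columns[i])
```

Compared with the relational layout, the names are already in the row, so there is no per-followee index probe after the row has been found.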
  • 12. A graph in GraphDB [figure: vertices 1 (name: Claudio), 2 (name: Cirpo), 3 (name: Okram), 4 (name: Spinoza), connected by three follows edges]
  • 13. A lookup (Graph) • Look for Claudio’s ID [ Log(N) ] • Look for Followees [ O(K) ] [figure: vertices 1 (name: Claudio), 2 (name: Cirpo), 3 (name: Okram), 4 (name: Spinoza), connected by three follows edges]
  • 14. What about Friends (of Friends)*?
  • 15. A benchmark • 1 Million Vertices • 4 Million Edges • Scale-Free Topology • Postgres VS Neo4J • Both Hash and BTree
    Depth  RDBMS     Graph
    1      100ms     30ms
    2      1000ms    500ms
    3      10000ms   3000ms
    4      100000ms  50000ms
    5      N/A       100000ms
    Ref: http://markorodriguez.com/2011/02/18/mysql-vs-neo4j-on-a-large-scale-graph-traversal/
  • 16. A benchmark • 50 friends on average • Look if there’s a path connecting two people
    DB     #   Time
    RDBMS  1K  2000ms
    Graph  1K  2ms
    Graph  1M  2ms
    RDBMS  1M  N/A
    Ref: http://www.slideshare.net/thobe/nosqleu-graph-databases-and-neo4j
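The "is there a path connecting two people" query is a bounded breadth-first search. The sketch below is a generic Python version, not the code used in the referenced benchmarks; the function name, the `max_depth` parameter, and the tiny sample graph (the follower pairs from the earlier table) are assumptions.

```python
from collections import deque


def path_exists(adjacency, source, target, max_depth=4):
    """Breadth-first search: True if target is within max_depth hops of source."""
    if source == target:
        return True
    visited = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbour in adjacency.get(node, ()):
            if neighbour == target:
                return True
            if neighbour not in visited:
                visited.add(neighbour)
                frontier.append((neighbour, depth + 1))
    return False


# Follower -> followees, using the pairs from the relational example.
follows = {1: [2, 3, 4], 2: [5]}
```

Each expansion only touches the neighbours of the current vertex, so the cost depends on the part of the graph actually explored rather than on the total number of people stored.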
  • 17. A Graph Database allows O(1) access to adjacent Vertices. Ref: The Graph Traversal Pattern, Marko A. Rodriguez and Peter Neubauer
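One way to picture O(1) access to adjacent vertices is to let each vertex hold direct references to its neighbours, so that a hop never consults a global index. The Python sketch below illustrates that idea only; the `Person` class, the `index` dictionary, and `names_at_depth` are hypothetical names, and real graph databases implement adjacency at the storage level.

```python
class Person:
    def __init__(self, name):
        self.name = name
        self.follows = []  # direct references to adjacent Person objects


claudio, cirpo, okram, spinoza = (
    Person(n) for n in ("Claudio", "Cirpo", "Okram", "Spinoza")
)
claudio.follows.extend([cirpo, okram, spinoza])

# A global index is needed once, to find the starting vertex.
index = {p.name: p for p in (claudio, cirpo, okram, spinoza)}


def names_at_depth(start_name, depth):
    """Names reached after exactly `depth` hops along follows references."""
    frontier = [index[start_name]]  # the only index lookup
    for _ in range(depth):
        # Every hop just follows stored references; no index is touched.
        frontier = [nb for person in frontier for nb in person.follows]
    return [person.name for person in frontier]
```

After the start vertex is found, the work per hop is proportional to the number of neighbours visited, independent of how many vertices the graph contains.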
  • 18-21. Example: Queries [figure: movie graph; movies: Ocean 11, Ocean 12, Ocean 13, The Departed, Se7en; people: Steven Soderbergh, Brad Pitt; genres: Action, Thriller, Crime, Drama; edge labels: genre, director, actor, producer]
  • 22-30. Example: Recommendations [figure: graph in which people (Claudio, Cirpo, Caprazzi) are connected by likes edges to movies (Graph Runner, The Lord of the Graphs, Javatar, PHP I love You), and movies are connected by tagged edges to tags (Sci-Fi, Trilogy, Adventure, Geeky, Boring)]
  • 31. Graph Mining: How are they connected? Ref: Programming the Semantic Web - O’Reilly
  • 32. Graph Mining Ref: Programming the Semantic Web - O’Reilly
  • 33. Graph Mining
  • 34. Other Applications • Community Analysis • Fraud Detection • Planning • Text Processing • Reasoning
  • 35. as you can’t get rid of logicians
  • 36. there’s also an SQL for Graphs
  • 37. Triplestores [figure: graph centered on Tom Cruise with edges actor → Top Gun, married → Katie Holmes, advocate → Scientology, lives → Hollywood, born → July 3, 1962]
  • 38. Triplestores
    Subject     Predicate  Object
    Tom Cruise  actor      Top Gun
    Tom Cruise  married    Katie Holmes
    Tom Cruise  advocate   Scientology
    Tom Cruise  lives      Hollywood
    Tom Cruise  born       July 3, 1962
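A triplestore can be approximated in Python as a list of (subject, predicate, object) tuples queried by pattern, with `None` acting as a wildcard. This is a minimal sketch: the `match` function is a hypothetical helper, and real triplestores add indexes over the three positions.

```python
# The five triples from the table above.
triples = [
    ("Tom Cruise", "actor", "Top Gun"),
    ("Tom Cruise", "married", "Katie Holmes"),
    ("Tom Cruise", "advocate", "Scientology"),
    ("Tom Cruise", "lives", "Hollywood"),
    ("Tom Cruise", "born", "July 3, 1962"),
]


def match(pattern, store=triples):
    """Return every triple that agrees with the pattern; None matches anything."""
    return [
        triple
        for triple in store
        if all(p is None or p == value for p, value in zip(pattern, triple))
    ]


# "Whom is Tom Cruise married to?" as a pattern with an unbound object.
spouses = [obj for _, _, obj in match(("Tom Cruise", "married", None))]
```

A pattern with unbound positions plays the same role as a single line with variables in a query's WHERE clause.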
  • 39. SPARQL
    PREFIX ged: <http://www.daml.org/2001/01/gedcom/gedcom#>
    SELECT ?name ?marriedOn
    FROM <http://www.daml.org/2001/01/gedcom/royal92.daml>
    WHERE {
      ?royal ged:title "Princess".
      ?royal ged:name ?name.
      ?royal ged:spouseIn ?family.
      ?family ged:marriage ?marriage.
      ?marriage ged:date ?marriedOn.
    }
    ORDER BY ASC(?name)
  • 40. what if the Internet was your GraphDB?
  • 41. [image only]
  • 42. what about a NoSPARQL?
  • 43. Tinkerpop
  • 44. • Blueprints is like the JDBC of the graph database community. • Provides a Java-based interface API for the property graph data model: Graph, Vertex, Edge, Index. • Provides implementations of the interfaces for TinkerGraph, Neo4j, OrientDB, Sails (e.g. AllegroSail, Neo4jSail), and soon (hopefully) others such as InfiniteGraph, InfoGrid, Sones, and HyperGraphDB
  • 45. Pipes • A dataflow framework with support for Blueprints-based graph processing. • Provides a collection of “pipes” (which implement Iterable and Iterator) ✴ Filters: ComparisonFilterPipe, RandomFilterPipe, etc. ✴ Traversal: VertexEdgePipe, EdgeVertexPipe, PropertyPipe, etc. ✴ Splitting/Merging: CopySplitPipe, RobinMergePipe, etc. ✴ Logic: OrPipe, AndPipe, etc.
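The pipe idea, lazy iterators chained into a dataflow, can be imitated with Python generators. The functions below are loose analogues written for illustration and are not the Pipes Java API; the graph layout and the single "likes" edge are hypothetical.

```python
# vertex -> list of (label, target) pairs. The "likes" edge is a hypothetical
# addition so that the filter stage has something to drop.
graph = {
    "Claudio": [
        ("follows", "Cirpo"),
        ("follows", "Okram"),
        ("follows", "Spinoza"),
        ("likes", "Graph Runner"),
    ],
}


def vertex_edge_pipe(vertices, graph):
    """Traversal stage: emit each outgoing edge of every incoming vertex."""
    for vertex in vertices:
        for label, target in graph.get(vertex, ()):
            yield (vertex, label, target)


def label_filter_pipe(edges, label):
    """Filter stage: let through only the edges carrying the given label."""
    for edge in edges:
        if edge[1] == label:
            yield edge


def edge_vertex_pipe(edges):
    """Traversal stage: emit the vertex each incoming edge points to."""
    for _, _, target in edges:
        yield target


# Stages are composed first; nothing runs until the pipeline is consumed.
pipeline = edge_vertex_pipe(
    label_filter_pipe(vertex_edge_pipe(iter(["Claudio"]), graph), "follows")
)
followees = list(pipeline)
```

Because each stage pulls one element at a time from the stage before it, a long traversal can be expressed without materialising intermediate result sets.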
  • 46. Gremlin • A Turing-complete, graph-based programming language that compiles Gremlin syntax down to Pipes (implements JSR 223). • Builds on top of Groovy • Supports various language constructs: :=, foreach, while, repeat, if/else, function and path definitions, etc. An example of “Amazon’s” recommender:
    m = [:]
    g.v(1).outE(purchased).inV.inE(purchased).outV.groupCount(m);
    m.sort{ a,b -> a.value <=> b.value }
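For readers who do not know Gremlin, the traversal above can be paraphrased in plain Python: start from one user, walk to the items they purchased, walk back to everyone who purchased those items, and count how often each user is reached. The purchase data and the function name are hypothetical; only the shape of the traversal follows the slide.

```python
from collections import Counter

# user -> purchased items. Entirely hypothetical sample data.
purchased = {
    "u1": ["a", "b"],
    "u2": ["a", "b", "c"],
    "u3": ["b"],
}


def co_purchasers(user):
    """Count how many of the user's items each user bought, ascending by count."""
    # Invert the mapping so that item -> buyers can be followed "backwards".
    buyers_of = {}
    for buyer, items in purchased.items():
        for item in items:
            buyers_of.setdefault(item, []).append(buyer)

    counts = Counter()
    for item in purchased[user]:        # outE(purchased).inV
        for buyer in buyers_of[item]:   # inE(purchased).outV
            counts[buyer] += 1          # groupCount(m)
    # Same ordering as the slide's m.sort: smallest counts first.
    return sorted(counts.items(), key=lambda kv: kv[1])


ranking = co_purchasers("u1")
```

As in the traversal on the slide, the starting user is counted too, since walking back from their own purchases reaches them again; a practical recommender would filter that entry out and continue one step further to the items those users bought.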
  • 47. Rexster • Allows Blueprints graphs to be exposed through a RESTful API (HTTP) • Supports stored traversals written in raw Pipes or Gremlin. • Supports ad hoc traversals represented in Gremlin. • Provides “helper classes” for performing search-, score-, and rank-based traversal algorithms, which together support recommendation.
  • 48. Sample Stack • HTTP Request arrives • Converts REST to Gremlin • Gremlin “compiles” to Pipes • Pipes makes Blueprints calls • Store provides the data
  • 49. Neo4J • Engine: Graph • License: AGPLv3 • Language: Java • Transactions: ACID • Distributed: HA, Master-Slave Cache Sharding, Domain-Specific • Features: Embeddable, REST, many plugins
  • 50. OrientDB • Engine: Document-Graph • License: Apache 2.0 • Language: Java • Transactions: ACID • Distributed: HA through Replication • Features: Embeddable, REST, SQL-like
  • 51. HypergraphDB • Engine: HyperGraph • License: LGPL • Language: Java • Transactions: ACID • Distributed: P2P distribution and replication • Features: Hyperedges, Java OODB, storage on BerkeleyDB
  • 52. InfiniteGraph • Engine: Graph • License: Commercial • Language: Java • Transactions: ACID • Distributed: Graph Partitioning, Federation on Objectivity • Features: Distributed lock management, scales to Exabytes
  • 53. Where do I go now?
    Tinkerpop: http://www.tinkerpop.com
    Neo4J: http://neo4j.org
    OrientDB: http://www.orientechnologies.com/orient-db.htm
    InfoGrid: http://infogrid.org
    InfiniteGraph: http://www.infinitegraph.com
    Sones: http://developers.sones.de
    AllegroGraph: http://www.franz.com/agraph/allegrograph
    HypergraphDB: http://www.kobrix.com/hgdb.jsp
  • 54. Questions? claudio.martella@gmail.com http://blog.acaro.org http://github.com/claudiomartella/ @claudiomartella http://joind.in/2946