3. Who am I?
• PhD in Distributed Graphs @ UniBZ
• Analyst @ TIS Innovation Park
• Topics: Data / Text Mining with Graphs
• Technology: Hadoop, NoSQL, GraphDBs
• Writing Graffiti
Saturday, March 26, 2011
4. Surrounded by graphs
• the Web Graph
• Semantic Web
• Social Networks
• Natural Sciences
• GIS
5. Property Graph
• A Graph is composed of Vertices and Edges
• Vertices are connected by Edges
• An Edge has a Label and a Direction
• Edges and Vertices have Properties
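The four rules above map directly onto a small data structure. The following is a minimal sketch in Python; the class and function names (`Vertex`, `Edge`, `add_edge`) are hypothetical and are not taken from any graph database API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(eq=False)  # identity-based equality and hashing, like vertices in a store
class Vertex:
    id: int
    properties: Dict[str, Any] = field(default_factory=dict)
    # repr=False avoids infinite recursion when printing vertex <-> edge cycles
    out_edges: List["Edge"] = field(default_factory=list, repr=False)  # edges leaving
    in_edges: List["Edge"] = field(default_factory=list, repr=False)   # edges arriving


@dataclass(eq=False)
class Edge:
    label: str           # every edge has a label...
    out_vertex: Vertex   # ...and a direction: out_vertex -> in_vertex
    in_vertex: Vertex
    properties: Dict[str, Any] = field(default_factory=dict)


def add_edge(tail: Vertex, label: str, head: Vertex, **props: Any) -> Edge:
    """Connect two vertices with a directed, labelled edge that carries properties."""
    edge = Edge(label, tail, head, dict(props))
    tail.out_edges.append(edge)
    head.in_edges.append(edge)
    return edge


# Two vertices from the "Who am I?" slide, connected by a 'works at' edge.
me = Vertex(1, {"name": "claudio", "surname": "martella"})
tis = Vertex(2, {"name": "TIS"})
add_edge(me, "works at", tis)
```

Keeping both outgoing and incoming edge lists on each vertex lets a traversal follow an edge in either direction without consulting a global index.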
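6. Who am I?
[Figure: the "Who am I?" slide drawn as a property graph. The vertex "Me" (name: claudio, surname: martella, email: claudio.martella@gmail.com) has works at and studies at edges to TIS and UniBZ, and likes, works with and author edges to GraphDB, Hadoop and Graffiti, which are linked to NoSQL by belongs to edges.]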
7. A graph in RDBMS
Users table:
ID    Name
1     Claudio
2     Cirpo
3     Okram
4     Spinoza
...   ...

Follows table:
Follower   Followee
1          2
1          3
1          4
2          5
...        ...
8. BTree Index 101
• Lookup costs O(log N), where N is the global size of the data structure
• Updating the index is not free either
[Figure: a B-tree over the keys Cirpo, Claudio, Okram, Spinoza]
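To make the O(log N) lookup cost concrete, here is a minimal sketch that counts the probes of a binary search over sorted keys. A sorted Python list stands in for the B-tree (an assumption for brevity): a real B-tree has a much larger fan-out per node, so it is shallower, but its cost still grows logarithmically with N. The key format `user0000000` is made up.

```python
import math
from typing import List, Optional, Tuple


def indexed_lookup(sorted_keys: List[str], key: str) -> Tuple[Optional[int], int]:
    """Binary search over sorted keys; returns (position or None, number of probes)."""
    lo, hi, probes = 0, len(sorted_keys) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1  # one node/key inspected per halving step
        if sorted_keys[mid] == key:
            return mid, probes
        if sorted_keys[mid] < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, probes


# Zero-padded keys, so lexicographic order equals numeric order (already sorted).
keys = [f"user{i:07d}" for i in range(100_000)]
pos, probes = indexed_lookup(keys, "user0042424")
print(pos, probes, math.ceil(math.log2(len(keys))))
```

The number of probes never exceeds floor(log2 N) + 1, whichever key is searched, which is exactly why every index lookup on the following slides is charged log N.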
9. A lookup (RDBMS)
• Look for Claudio’s ID [ O(log N) ]
• Look for his K followees [ O(log N) ]
• Get their names [ K · O(log N) ]
[Table: the same ID/Name and Follower/Followee rows as on slide 7]
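The three steps above can be sketched in Python, with sorted lists and `bisect` standing in for the RDBMS's B-tree indexes (an assumption; a real database uses on-disk B-trees). The rows are the ones shown on slide 7, with the "..." rows omitted; `followee_names` is a hypothetical helper name.

```python
from bisect import bisect_left
from typing import List, Tuple

# Sorted lists play the role of B-tree indexes: each bisect costs O(log N).
users_by_name: List[Tuple[str, int]] = sorted(
    [("Claudio", 1), ("Cirpo", 2), ("Okram", 3), ("Spinoza", 4)])
users_by_id: List[Tuple[int, str]] = sorted((uid, name) for name, uid in users_by_name)
follows: List[Tuple[int, int]] = sorted([(1, 2), (1, 3), (1, 4), (2, 5)])


def followee_names(name: str) -> List[str]:
    # 1. Look for the user's ID: one index lookup, O(log N).
    i = bisect_left(users_by_name, (name,))
    if i == len(users_by_name) or users_by_name[i][0] != name:
        return []
    uid = users_by_name[i][1]
    # 2. Look for the K followees: locate the range in the Follows index,
    #    O(log N) to find it plus K to read it.
    lo = bisect_left(follows, (uid,))
    hi = bisect_left(follows, (uid + 1,))
    followee_ids = [followee for _, followee in follows[lo:hi]]
    # 3. Get their names: one more index lookup per followee, K * O(log N).
    names = []
    for fid in followee_ids:
        j = bisect_left(users_by_id, (fid,))
        if j < len(users_by_id) and users_by_id[j][0] == fid:
            names.append(users_by_id[j][1])
    return names
```

Step 3 is the expensive one: the join pays a full index lookup for every followee, so the cost grows with both K and N.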
10. A graph in NoSQL
ID        F1      F2      F3        ...
Cirpo     ...     ...     ...       ...
Claudio   Cirpo   Okram   Spinoza   ...
Okram     ...     ...     ...       ...
Spinoza   ...     ...     ...       ...
...       ...     ...     ...       ...
11. A lookup (NoSQL)
• Look for Claudio’s ID [ O(log N) ]
• Look for his followees [ O(K) ]
[Table: the same wide rows as on slide 10]
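A minimal sketch of the wide-row lookup, assuming a store that keeps its row keys sorted (HBase does, for example); a sorted Python list plays that row index here. Only Claudio's row is filled in, since the other rows are elided on the slide, and `followees` is a hypothetical helper name.

```python
from bisect import bisect_left
from typing import Dict, List

# Wide rows: the row key is the user name and each column F1, F2, F3, ...
# holds one followee. Only Claudio's row is known from the slide.
rows: Dict[str, Dict[str, str]] = {
    "Claudio": {"F1": "Cirpo", "F2": "Okram", "F3": "Spinoza"},
}
row_keys: List[str] = sorted(rows)  # stand-in for the store's sorted row index


def followees(user: str) -> List[str]:
    # 1. Look for the row: O(log N) on the sorted row keys.
    i = bisect_left(row_keys, user)
    if i == len(row_keys) or row_keys[i] != user:
        return []
    # 2. Read the K columns of that one row: O(K). No further index lookups
    #    are needed, because the followees' names are stored in the row itself.
    return list(rows[row_keys[i]].values())
```

Compared with the relational version, the third step (one index lookup per followee) disappears because the data is denormalized into the row.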
12. A graph in GraphDB
[Figure: vertex 1 (name: Claudio) with follows edges to vertices 2 (name: Cirpo), 3 (name: Okram) and 4 (name: Spinoza)]
13. A lookup (Graph)
• Look for Claudio’s ID [ O(log N) ]
• Look for his followees [ O(K) ]
[Figure: the same graph, with vertex 1 (Claudio) following vertices 2 (Cirpo), 3 (Okram) and 4 (Spinoza)]
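In a graph database the index is only used to find the start vertex; after that, each vertex holds direct references to its neighbours. A minimal sketch, with hypothetical class and function names. For brevity the name index is a Python dict (a hash index); with a B-tree index, as on the slide, that first step costs O(log N).

```python
from typing import Dict, List


class Vertex:
    """A vertex that stores direct references to its neighbours (index-free adjacency)."""

    def __init__(self, vid: int, name: str) -> None:
        self.id = vid
        self.name = name
        self.follows: List["Vertex"] = []  # outgoing 'follows' edges as direct pointers


# The graph from the slide: Claudio (1) follows Cirpo (2), Okram (3), Spinoza (4).
claudio, cirpo, okram, spinoza = (Vertex(1, "Claudio"), Vertex(2, "Cirpo"),
                                  Vertex(3, "Okram"), Vertex(4, "Spinoza"))
claudio.follows.extend([cirpo, okram, spinoza])

# Index used only to find the start vertex.
name_index: Dict[str, Vertex] = {v.name: v for v in (claudio, cirpo, okram, spinoza)}


def followee_names(name: str) -> List[str]:
    start = name_index.get(name)  # 1. one index lookup for the start vertex
    if start is None:
        return []
    return [v.name for v in start.follows]  # 2. O(K): follow K pointers, no index
```

Unlike the RDBMS version, the names come for free: the followee objects are reached directly, so there is no per-followee index lookup.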
15. A benchmark
• 1 Million Vertices
• 4 Million Edges
• Scale-Free Topology
• Postgres vs Neo4j
• Both hash and B-tree indexes

Depth   RDBMS        Graph
1       100 ms       30 ms
2       1,000 ms     500 ms
3       10,000 ms    3,000 ms
4       100,000 ms   50,000 ms
5       N/A          100,000 ms
Ref: http://markorodriguez.com/2011/02/18/mysql-vs-neo4j-on-a-large-scale-graph-traversal/
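The slide does not spell out the traversal code. As an illustration of why cost climbs with depth, here is a hypothetical depth-limited expansion over adjacency lists that counts the walks of exactly `depth` steps from a start vertex, without deduplication (the way chained `out` steps in Gremlin behave). The toy graph and the function name are made up; this is not the benchmark's code.

```python
from typing import Dict, List

Graph = Dict[int, List[int]]  # adjacency lists: vertex -> out-neighbours


def walks_at_depth(graph: Graph, start: int, depth: int) -> int:
    """Count the walks of exactly `depth` steps leaving `start` (duplicates kept)."""
    frontier = [start]
    for _ in range(depth):
        # Each step fans out along adjacency lists. With average out-degree k
        # the frontier can grow roughly k-fold per level, so work rises
        # steeply with depth.
        frontier = [n for v in frontier for n in graph.get(v, [])]
    return len(frontier)


toy: Graph = {1: [2, 3], 2: [3, 4], 3: [4], 4: [1]}
```

In a graph store each fan-out step costs O(1) per neighbour; in an RDBMS every step is another join, which is consistent with the gap in the table widening at greater depths.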
16. A benchmark
• 50 friends on average
• Check whether there is a path connecting two people

DB      # People   Time
RDBMS   1K         2,000 ms
Graph   1K         2 ms
Graph   1M         2 ms
RDBMS   1M         N/A
Ref: http://www.slideshare.net/thobe/nosqleu-graph-databases-and-neo4j
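Path existence is a breadth-first search over adjacency lists. In the sketch below the work depends on the neighbourhood explored before the target is found, not on the total number of people stored, which is one way to read the flat Graph timings from 1K to 1M. The friendship data and the `path_exists` name are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional

Graph = Dict[str, List[str]]


def path_exists(graph: Graph, source: str, target: str,
                max_depth: Optional[int] = None) -> bool:
    """Breadth-first search; True if target is reachable within max_depth hops."""
    if source == target:
        return True
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        vertex, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(vertex, []):  # O(1) per neighbour
            if neighbour == target:
                return True
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    return False


friends: Graph = {"Claudio": ["Cirpo", "Okram"], "Cirpo": ["Spinoza"],
                  "Okram": [], "Spinoza": []}
```

The `seen` set keeps each person from being expanded twice, which matters in social graphs where many short cycles exist.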
17. A Graph Database
allows O(1) access to each adjacent Vertex (index-free adjacency)
Ref: The Graph Traversal Pattern: Marko A. Rodriguez and Peter Neubauer
18. Example: Queries
[Figure: a movie graph. Steven Soderbergh is connected by director edges to Ocean 11, Ocean 12 and Ocean 13; Brad Pitt is connected by actor edges to the films and by a producer edge to The Departed; the films (including Se7en) have genre edges to Action, Thriller, Crime and Drama.]
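A query on this graph is a pair of traversals whose results are intersected. The edge list below is a hypothetical, partial reconstruction of the figure (genre edges are left out because their exact endpoints are not recoverable), and `out` scans the whole list only to keep the sketch short; a graph database would follow adjacency lists instead.

```python
from typing import List, Set, Tuple

# (tail, label, head) edges: a hypothetical partial reconstruction of the figure.
edges: List[Tuple[str, str, str]] = [
    ("Steven Soderbergh", "director", "Ocean 11"),
    ("Steven Soderbergh", "director", "Ocean 12"),
    ("Steven Soderbergh", "director", "Ocean 13"),
    ("Brad Pitt", "actor", "Ocean 11"),
    ("Brad Pitt", "actor", "Ocean 12"),
    ("Brad Pitt", "actor", "Ocean 13"),
    ("Brad Pitt", "actor", "Se7en"),
    ("Brad Pitt", "producer", "The Departed"),
]


def out(vertex: str, label: str) -> Set[str]:
    """Heads of the edges with this label leaving `vertex` (linear scan for brevity)."""
    return {head for tail, lbl, head in edges if tail == vertex and lbl == label}


# "Which films directed by Steven Soderbergh have Brad Pitt as an actor?"
# Traverse from both start vertices and intersect the film sets.
answer = sorted(out("Steven Soderbergh", "director") & out("Brad Pitt", "actor"))
```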
22. Example: Recommendations
[Figure: users Claudio, Caprazzi and Cirpo are connected by likes edges to the items Graph Runner, The Lord of the Graphs, Javatar and PHP I love You, which are connected by tagged edges to the tags Sci-Fi, Trilogy, Adventure, Geeky and Boring.]
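One recommendation this graph supports is tag-based: follow user -likes-> item -tagged-> tag <-tagged- other item, and rank the other items by how many tags they share with what the user already likes. The edges below are hypothetical (they reuse the figure's labels but are not its exact wiring), and `recommend` is an illustrative name.

```python
from collections import Counter
from typing import Dict, List, Set

# Hypothetical edges using the figure's labels; not the slide's exact graph.
likes: Dict[str, Set[str]] = {
    "Claudio": {"Graph Runner", "The Lord of the Graphs"},
    "Cirpo": {"PHP I love You"},
}
tagged: Dict[str, Set[str]] = {
    "Graph Runner": {"Sci-Fi", "Geeky"},
    "The Lord of the Graphs": {"Trilogy", "Adventure"},
    "Javatar": {"Sci-Fi", "Adventure", "Geeky"},
    "PHP I love You": {"Boring"},
}


def recommend(user: str) -> List[str]:
    """user -likes-> item -tagged-> tag <-tagged- other item, ranked by shared tags."""
    liked = likes.get(user, set())
    scores: Counter = Counter()
    for item in liked:
        for tag in tagged.get(item, set()):
            # Scanning `tagged` stands in for following incoming 'tagged' edges.
            for other, tags in tagged.items():
                if other not in liked and tag in tags:
                    scores[other] += 1
    # Highest score first; ties broken alphabetically for a deterministic order.
    return sorted(scores, key=lambda item: (-scores[item], item))
```

Each path found through a shared tag adds one vote, so items reachable along many paths rise to the top.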
31. Graph Mining
How are they connected?
Ref: Programming the Semantic Web - O’Reilly
32. Graph Mining
Ref: Programming the Semantic Web - O’Reilly
34. Other Applications
• Community Analysis
• Fraud Detection
• Planning
• Text Processing
• Reasoning
35. as you can’t get rid of logicians
36. there’s also an SQL for Graphs
37. Triplestores
[Figure: Tom Cruise connected to Top Gun (actor), Katie Holmes (married), Scientology (advocate), Hollywood (lives) and July 3, 1962 (born)]
38. Triplestores
Subject      Predicate   Object
Tom Cruise   actor       Top Gun
Tom Cruise   married     Katie Holmes
Tom Cruise   advocate    Scientology
Tom Cruise   lives       Hollywood
Tom Cruise   born        July 3, 1962
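A triplestore answers queries by matching triple patterns in which any position may be left open. A minimal sketch over the triples in the table, with `None` as the wildcard; the `match` name is illustrative. Production triplestores typically keep several sorted indexes over subject/predicate/object permutations, while this linear scan only shows the matching semantics.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# The triples from the table above.
triples: List[Triple] = [
    ("Tom Cruise", "actor", "Top Gun"),
    ("Tom Cruise", "married", "Katie Holmes"),
    ("Tom Cruise", "advocate", "Scientology"),
    ("Tom Cruise", "lives", "Hollywood"),
    ("Tom Cruise", "born", "July 3, 1962"),
]


def match(s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return the triples matching a (subject, predicate, object) pattern; None is a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]
```

For example, `match(p="married")` returns the single marriage triple, and `match(s="Tom Cruise")` returns all five rows.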
39. SPARQL
PREFIX ged: <http://www.daml.org/2001/01/gedcom/gedcom#>
SELECT ?name ?marriedOn
FROM <http://www.daml.org/2001/01/gedcom/royal92.daml>
WHERE
{
?royal ged:title "Princess".
?royal ged:name ?name.
?royal ged:spouseIn ?family.
?family ged:marriage ?marriage.
?marriage ged:date ?marriedOn.
}
ORDER BY ASC(?name)
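The WHERE clause above is a basic graph pattern: each triple pattern narrows a set of variable bindings, and shared variables (`?royal`, `?family`, `?marriage`) act as join keys. A minimal evaluator sketch follows. The data is made up and shaped like the gedcom example with prefixes dropped, and the function names are illustrative, not from any SPARQL engine.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    """Terms starting with '?' are variables, as in SPARQL."""
    return term.startswith("?")


def extend(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Match one triple pattern against one triple, consistently with an existing binding."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if is_var(term):
            if new.setdefault(term, value) != value:  # variable already bound differently
                return None
        elif term != value:  # constants must match exactly
            return None
    return new


def evaluate(patterns: List[Triple], data: List[Triple]) -> List[Binding]:
    """Evaluate a basic graph pattern by joining the patterns left to right."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for binding in bindings:
            for triple in data:
                extended = extend(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Made-up data shaped like the gedcom example (prefixes dropped for brevity).
data: List[Triple] = [
    ("r1", "title", "Princess"), ("r1", "name", "Alice"), ("r1", "spouseIn", "f1"),
    ("f1", "marriage", "m1"), ("m1", "date", "1900"),
    ("r2", "title", "King"), ("r2", "name", "Bob"),
]
query: List[Triple] = [
    ("?royal", "title", "Princess"),
    ("?royal", "name", "?name"),
    ("?royal", "spouseIn", "?family"),
    ("?family", "marriage", "?marriage"),
    ("?marriage", "date", "?marriedOn"),
]
# SELECT ?name ?marriedOn ... ORDER BY ASC(?name)
rows = sorted((b["?name"], b["?marriedOn"]) for b in evaluate(query, data))
```

Each pattern is a traversal step expressed declaratively: `?royal` binds to a vertex, then the query walks spouseIn -> marriage -> date from it.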
44. • Blueprints is like the JDBC of the graph database community.
• Provides a Java interface API for the property graph data model: Graph, Vertex, Edge, Index.
• Provides implementations of the interfaces for TinkerGraph, Neo4j, OrientDB, Sails (e.g. AllegroSail, Neo4jSail), and soon (hopefully) others such as InfiniteGraph, InfoGrid, Sones, and HyperGraphDB
45. • A dataflow framework with support for Blueprints-based graph processing.
• Provides a collection of “pipes” (each implements Iterable and Iterator):
✴ Filters: ComparisonFilterPipe, RandomFilterPipe, etc.
✴ Traversal: VertexEdgePipe, EdgeVertexPipe, PropertyPipe, etc.
✴ Splitting/Merging: CopySplitPipe, RobinMergePipe, etc.
✴ Logic: OrPipe, AndPipe, etc.
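The dataflow idea can be sketched with Python generators: each pipe consumes the iterator produced by the previous one and lazily yields its own output. The functions below only mirror the roles of the Java pipes listed above (traversal, property, filter); their names, signatures and the toy graph are hypothetical and are not the Pipes API.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

Edge = Tuple[int, str, int]  # (out vertex, label, in vertex)

# A toy graph; ids, names and edges are made up for the example.
names: Dict[int, str] = {1: "Claudio", 2: "Cirpo", 3: "Okram", 4: "Spinoza"}
out_edges: Dict[int, List[Edge]] = {
    1: [(1, "follows", 2), (1, "follows", 3), (1, "likes", 4)],
    2: [(2, "follows", 4)],
}


def vertex_edge_pipe(label: str) -> Callable[[Iterable[int]], Iterator[Edge]]:
    """Traversal: vertex -> its outgoing edges with the given label."""
    def pipe(vertices: Iterable[int]) -> Iterator[Edge]:
        for v in vertices:
            yield from (e for e in out_edges.get(v, []) if e[1] == label)
    return pipe


def edge_vertex_pipe(edges: Iterable[Edge]) -> Iterator[int]:
    """Traversal: edge -> its head (incoming) vertex."""
    for _, _, head in edges:
        yield head


def property_pipe(vertices: Iterable[int]) -> Iterator[str]:
    """Vertex -> its 'name' property."""
    for v in vertices:
        yield names[v]


def comparison_filter_pipe(predicate: Callable[[Any], bool]) -> Callable[[Iterable[Any]], Iterator[Any]]:
    """Filter: let through only the objects satisfying the predicate."""
    def pipe(objects: Iterable[Any]) -> Iterator[Any]:
        return (o for o in objects if predicate(o))
    return pipe


def pipeline(source: Iterable[Any], *pipes: Callable[[Iterable[Any]], Iterator[Any]]) -> Iterator[Any]:
    """Chain pipes lazily: each one consumes the iterator of the previous one."""
    stream: Iterator[Any] = iter(source)
    for p in pipes:
        stream = p(stream)
    return stream


# Names of the people vertex 1 follows, excluding "Okram".
result = list(pipeline([1], vertex_edge_pipe("follows"), edge_vertex_pipe,
                       property_pipe, comparison_filter_pipe(lambda n: n != "Okram")))
```

Because every stage is lazy, nothing is computed until `list(...)` pulls objects through the chain, one at a time.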
46. • A Turing-complete, graph-based programming language that compiles Gremlin syntax down to Pipes (implements JSR 223).
• Builds on top of Groovy
• Supports various language constructs: :=, foreach, while, repeat, if/else, function and path definitions, etc.
An example of an “Amazon-style” recommender:
// vertex 1 is assumed to be a customer
m = [:]
// customer -> purchased products -> other buyers -> their purchases, counted in m
g.v(1).outE('purchased').inV.inE('purchased').outV.outE('purchased').inV.groupCount(m)
// most frequently co-purchased products first
m.sort{ a,b -> b.value <=> a.value }
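For readers without a Gremlin console, here is a Python sketch of the co-purchase counting the example is after: customer -> purchased products -> other buyers of those products -> what they purchased, with a counter playing the role of `groupCount(m)`. The purchase data and function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set

# Hypothetical purchase graph: customer -> products bought.
purchased: Dict[str, Set[str]] = {
    "c1": {"p1", "p2"},
    "c2": {"p1", "p3"},
    "c3": {"p2", "p3", "p4"},
}


def buyers(product: str) -> List[str]:
    """Incoming 'purchased' edges: who bought this product (a scan, for brevity)."""
    return [c for c, items in purchased.items() if product in items]


def co_purchase_counts(customer: str) -> Counter:
    """Count products reached via customer -> product <- buyer -> product paths."""
    m: Counter = Counter()
    for product in purchased[customer]:    # the customer's purchases
        for other in buyers(product):      # everyone who bought the same product
            for item in purchased[other]:  # everything those buyers purchased
                m[item] += 1               # one vote per path, like groupCount
    return m


counts = co_purchase_counts("c1")
# Most frequent first; unlike the one-line query, also drop products c1 already owns.
ranked = sorted((p for p in counts if p not in purchased["c1"]),
                key=lambda p: (-counts[p], p))
```

The raw counts include the customer's own products (paths that loop back through the customer), which is why a practical recommender filters them out before ranking.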
47. • Allows Blueprints graphs to be exposed through a RESTful API (HTTP)
• Supports stored traversals written in raw Pipes or Gremlin.
• Supports ad hoc traversals represented in Gremlin.
• Provides “helper classes” for performing search-, score-, and rank-based traversal algorithms, which together support recommendation.
48. Sample Stack
• An HTTP request arrives
• The REST request is converted to Gremlin
• Gremlin “compiles” to Pipes
• Pipes make Blueprints calls
• The store provides the data
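The layering above can be sketched end to end in a few functions, one per layer. The layer roles follow the slide (REST, Gremlin, Pipes, Blueprints, store), but every class, function, URL shape and the toy `out(label)` query syntax are hypothetical; none of it is the API of the actual TinkerPop projects.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple
from urllib.parse import parse_qs, urlparse

# Store layer: raw adjacency data (made up).
STORE: Dict[int, List[Tuple[str, int]]] = {1: [("follows", 2), ("follows", 3)],
                                           2: [("follows", 3)]}


class Graph:
    """Blueprints-like layer: a small interface that hides the store from upper layers."""

    def out(self, vertex: int, label: str) -> List[int]:
        return [head for lbl, head in STORE.get(vertex, []) if lbl == label]


def out_pipe(graph: Graph, label: str) -> Callable[[Iterable[int]], Iterator[int]]:
    """Pipes layer: a lazy step that makes one Graph call per incoming vertex."""
    def pipe(vertices: Iterable[int]) -> Iterator[int]:
        for v in vertices:
            yield from graph.out(v, label)
    return pipe


def compile_query(graph: Graph, query: str) -> List[Callable[[Iterable[int]], Iterator[int]]]:
    """Gremlin-like layer: compile a toy syntax such as 'out(follows).out(follows)' to pipes."""
    pipes = []
    for step in query.split("."):
        if not (step.startswith("out(") and step.endswith(")")):
            raise ValueError(f"unsupported step: {step}")
        pipes.append(out_pipe(graph, step[4:-1]))
    return pipes


def handle_request(url: str, graph: Graph) -> Dict[str, object]:
    """REST layer: turn e.g. /vertices/1?q=out(follows) into a traversal and run it."""
    parsed = urlparse(url)
    start = int(parsed.path.rstrip("/").split("/")[-1])
    query = parse_qs(parsed.query)["q"][0]
    stream: Iterable[int] = [start]
    for pipe in compile_query(graph, query):
        stream = pipe(stream)
    return {"start": start, "results": list(stream)}
```

Each layer only talks to the one below it, so the store could be swapped without touching the REST or query layers; that separation is the point of putting Blueprints in the middle.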