Graphs are a very popular data structure to store relations like
friendship or web pages and their links. Therefore graph databases
have become popular recently and some of them even allow sharding,
i.e. automatic distribution of the data across multiple machines.
On the other hand, very computation-intensive algorithms for graphs are known and used in practice, and they often access very large data sets, which leads to heavy communication loads.
Therefore, it is an obvious idea to run such graph algorithms on the database servers, close to the data, making use of the computational power of the storage nodes.
Google's Pregel framework lets many graph algorithms be implemented in one general system; it plays a role similar to the map-reduce skeleton, but for graphs.
In this talk I will explain the framework and describe its implementation in the multi-model database ArangoDB.
3. About
about me
Max Neunhöffer (@neunhoef) working for ArangoDB
about the talk
different kinds of graph algorithms
Pregel example
Pregel mind set
ArangoDB implementation
6. Graph Algorithms
Pattern matching
Search through the entire graph
Identify similar components
⇒ Touch all vertices and their neighbourhoods
Traversals
Define a specific start point
Iteratively explore the graph
⇒ History of steps is known
Global measurements
Compute one value for the graph, based on all its vertices or edges
Compute one value for each vertex or edge
⇒ Often require a global view on the graph
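To make the difference between a traversal and a global measurement concrete, here is a minimal single-machine sketch. The toy graph and the function names `traverse` and `in_degrees` are made up for illustration and are not part of any database API.

```python
from collections import deque

# Hypothetical toy graph: vertex key -> list of outbound neighbours.
GRAPH = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": [],
    "e": ["a"],
}

def traverse(graph, start):
    """Traversal: start at one vertex and remember the path to each vertex found."""
    paths = {start: [start]}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        for neighbour in graph[vertex]:
            if neighbour not in paths:
                paths[neighbour] = paths[vertex] + [neighbour]
                queue.append(neighbour)
    return paths

def in_degrees(graph):
    """Global measurement: one value per vertex, computed from every edge."""
    degrees = {vertex: 0 for vertex in graph}
    for neighbours in graph.values():
        for neighbour in neighbours:
            degrees[neighbour] += 1
    return degrees
```

The traversal only touches what is reachable from its start point, whereas the in-degree computation has to look at every edge of the graph, which is the kind of workload Pregel targets.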
7. Pregel
A framework to query distributed, directed graphs.
Known as “Map-Reduce” for graphs
Uses same phases
Has several iterations
Aims at:
Operate all servers at full capacity
Reduce network traffic
Good at calculations touching all vertices
Bad at calculations touching a very small number of vertices
23. Worker ≙ Map
“Map” a user-defined algorithm over all vertices
Output: set of messages to other vertices
Available parameters:
The current vertex and its outbound edges
All incoming messages
Global values
Allow modifications on the vertex:
Attach a result to this vertex and its outgoing edges
Delete the vertex and its outgoing edges
Deactivate the vertex
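A worker step could look roughly like the following sketch. It is not ArangoDB's API: the `Vertex` class, the function `min_label_worker` and the `global_values["step"]` counter are assumptions made for illustration. The example algorithm propagates the smallest vertex key along outbound edges, and it assumes (as in Google's original Pregel description) that a deactivated vertex is woken up again when a message arrives for it.

```python
from dataclasses import dataclass

@dataclass
class Vertex:
    key: str
    out_edges: list          # keys of the outbound neighbours
    result: object = None    # value attached by the worker
    active: bool = True

def min_label_worker(vertex, messages, global_values):
    """One worker call for one vertex; returns (target_key, payload) messages."""
    candidate = min(messages) if messages else None
    if global_values["step"] == 0:
        # First round: every vertex starts with its own key as label.
        vertex.result = vertex.key
        changed = True
    elif candidate is not None and candidate < vertex.result:
        # A smaller label arrived: attach it as the new result.
        vertex.result = candidate
        changed = True
    else:
        changed = False
    # Deactivate the vertex (assumption: a later message reactivates it).
    vertex.active = False
    if not changed:
        return []
    return [(target, vertex.result) for target in vertex.out_edges]
```

For a vertex `b` with outbound edges to `c` and `d`, the first call sends `b` to both neighbours; a later call with incoming messages `["a", "z"]` lowers the result to `a` and forwards it.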
24. Combine ≙ Reduce
“Reduce” all generated messages
Output: An aggregated message for each vertex.
Executed on sender as well as receiver.
Available parameters:
One new message for a vertex
The stored aggregate for this vertex
Typical combiners are SUM, MIN or MAX
Reduces network traffic
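A combiner in this sense takes one new message plus the aggregate stored so far and returns the new aggregate. The sketch below uses hypothetical names (`min_combiner`, `sum_combiner`, `combine_outbox`) and represents "no aggregate yet" as `None`, which is an assumption of this example.

```python
def min_combiner(new_message, stored_aggregate):
    """MIN combiner: keep the smaller of the new message and the aggregate."""
    if stored_aggregate is None:
        return new_message
    return min(new_message, stored_aggregate)

def sum_combiner(new_message, stored_aggregate):
    """SUM combiner: add the new message to the aggregate."""
    if stored_aggregate is None:
        return new_message
    return new_message + stored_aggregate

def combine_outbox(messages, combiner):
    """Fold (target, payload) pairs into one aggregate per target vertex."""
    aggregates = {}
    for target, payload in messages:
        aggregates[target] = combiner(payload, aggregates.get(target))
    return aggregates
```

Running `combine_outbox` on the sender before shipping means that four messages addressed to two vertices leave the machine as only two aggregated messages; the receiver can apply the same combiner again to merge aggregates from several senders.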
25. Activity ≙ Termination
Execute several rounds of Map/Reduce
Count active vertices and messages
Start next round if one of the following is true:
At least one vertex is active
At least one message was sent
Terminate if no vertex is active and no messages were sent
Store all non-deleted vertices and edges as resulting graph
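The round structure and the termination rule can be sketched as a small single-process simulation. Everything here is illustrative and assumed: the names `run_rounds`, `min_label_worker` and `min_combiner`, the worker's return convention, and the `max_rounds` safety limit. A real implementation distributes the vertices over several servers.

```python
def min_combiner(new_message, stored_aggregate):
    """MIN combiner; None stands for 'no aggregate yet'."""
    if stored_aggregate is None:
        return new_message
    return min(new_message, stored_aggregate)

def min_label_worker(key, out_edges, result, message, round_number):
    """Returns (new result, messages to send, whether the vertex stays active)."""
    if round_number == 0:
        new_result = key
    elif message is not None and message < result:
        new_result = message
    else:
        return result, [], False   # nothing changed: send nothing, deactivate
    return new_result, [(target, new_result) for target in out_edges], False

def run_rounds(graph, worker, combiner, max_rounds=100):
    """graph: key -> outbound neighbour keys. Returns (results, rounds executed)."""
    results = {key: None for key in graph}
    active = set(graph)            # every vertex starts active
    inbox = {}                     # target key -> combined message
    rounds = 0
    # Start another round while a vertex is active or a message was sent.
    while (active or inbox) and rounds < max_rounds:
        outbox, next_active = {}, set()
        for key in sorted(active | set(inbox)):
            results[key], sent, stays_active = worker(
                key, graph[key], results[key], inbox.get(key), rounds)
            if stays_active:
                next_active.add(key)
            for target, payload in sent:
                outbox[target] = combiner(payload, outbox.get(target))
        active, inbox = next_active, outbox
        rounds += 1
    return results, rounds
```

On a graph with the two cycles `a → b → c → a` and `d → e → d`, this labels the first cycle with `a` and the second with `d`, and stops as soon as a round ends with no active vertex and no message in flight.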
29. The Multi-Model Approach
Multi-model database
A multi-model database combines a document store with a
graph database and is at the same time a key/value store,
with a common query language for all three data models.
Important:
is able to compete with specialised products on their turf
allows for polyglot persistence using a single database
40. ArangoDB
is a multi-model database (document store & graph database),
is open source and free (Apache 2 license),
offers convenient queries (via HTTP/REST and AQL),
including joins between different collections,
offers configurable consistency guarantees using transactions,
is memory efficient thanks to shape detection,
uses JavaScript throughout (Google’s V8 built into server),
API extensible by JS code in the Foxx Microservice Framework,
offers many drivers for a wide range of languages,
is easy to use with web front end and good documentation,
and enjoys good community as well as professional support.
44. Extensible through JavaScript
The Foxx Microservice Framework
Allows you to extend the HTTP/REST API by your own
routes, which you implement in JavaScript running on the
database server, with direct access to the C++ DB engine.
Unprecedented possibilities for data-centric services:
custom-made complex queries or authorizations
schema validation
push feeds, etc.
45. Pregel in ArangoDB
Started as a side project in free hack time
Experimental, running on an operational database
Implemented as an alternative to traversals
Make use of the flexibility of JavaScript:
No strict type system
No pre-compilation, on-the-fly queries
Native JSON documents
Really fast development