Processing large-scale graphs
with Google™ Pregel
Max Neunhöffer
Big data technology and applications, 25 March 2015
www.arangodb.com
About
about me
Max Neunhöffer (@neunhoef) working for ArangoDB
about the talk
different kinds of graph algorithms
Pregel example
Pregel mind set
ArangoDB implementation
Graph Algorithms
Pattern matching
Search through the entire graph
Identify similar components
⇒ Touch all vertices and their neighbourhoods
Traversals
Define a specific start point
Iteratively explore the graph
⇒ History of steps is known
Global measurements
Compute one value for the graph, based on all its vertices
or edges
Compute one value for each vertex or edge
⇒ Often require a global view on the graph
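The difference between a traversal and a per-vertex measurement can be illustrated with a minimal sketch. The toy graph and the function names `traverse` and `inDegrees` are assumptions made for this illustration and do not come from the talk. The traversal starts at one vertex and keeps the history of steps. The measurement has to look at every edge of the graph.

```javascript
// Hypothetical toy graph as an adjacency list (directed edges).
const graph = {
  a: ["b", "c"],
  b: ["d"],
  c: ["d"],
  d: [],
  e: ["a"],
};

// Traversal: start at one vertex and explore iteratively (breadth first).
// Every reached vertex remembers the path (history of steps) leading to it.
function traverse(graph, start) {
  const paths = { [start]: [start] };
  const queue = [start];
  while (queue.length > 0) {
    const current = queue.shift();
    for (const next of graph[current]) {
      if (!(next in paths)) {
        paths[next] = paths[current].concat(next);
        queue.push(next);
      }
    }
  }
  return paths;
}

// Measurement: one value per vertex, computed by touching every edge.
function inDegrees(graph) {
  const degrees = {};
  for (const v of Object.keys(graph)) degrees[v] = 0;
  for (const v of Object.keys(graph)) {
    for (const target of graph[v]) degrees[target] += 1;
  }
  return degrees;
}

const paths = traverse(graph, "a");
const degrees = inDegrees(graph);
console.log(paths.d);  // path from "a" to "d"
console.log(degrees);  // number of inbound edges per vertex
```

The traversal never sees vertex "e", because it is not reachable from the start point. The measurement visits it anyway, which is why such computations tend to need a view of the whole graph.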
Pregel
A framework to query distributed, directed graphs.
Known as “Map-Reduce” for graphs
Uses same phases
Has several iterations
Aims at:
Operate all servers at full capacity
Reduce network traffic
Good at calculations touching all vertices
Bad at calculations touching a very small number of vertices
Example – Connected Components
[Figure: animated walkthrough on a graph with seven vertices. Legend: active vertex, inactive vertex, forward message, backward message. Every vertex starts with its own ID as label (1 to 7). Over the supersteps the smallest ID spreads through each component. In the final frame vertices 1 to 4 carry label 1 and vertices 5 to 7 carry label 5.]
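The label propagation animated on these slides can be sketched as a small in-memory simulation. This is a sketch under stated assumptions, not the ArangoDB implementation: the edge list is hypothetical (the slides do not give the edges), and the names `connectedComponents`, `inbound` and `send` are made up for this example. Each vertex starts with its own ID, sends it forward along its outbound edges, and answers backward to senders that still hold a larger label.

```javascript
// Hypothetical directed graph: vertex id -> outbound neighbours.
const outEdges = {
  1: [2],
  2: [3],
  3: [4],
  4: [2],
  5: [6],
  6: [],
  7: [6],
};

function connectedComponents(outEdges) {
  const ids = Object.keys(outEdges).map(Number);
  const label = {};
  const inbound = {}; // vertices each vertex has received messages from
  for (const id of ids) {
    label[id] = id;
    inbound[id] = new Set();
  }
  const send = (box, from, to, value) => {
    (box[to] = box[to] || []).push({ from, label: value });
  };

  // Superstep 0: every vertex is active and sends its label forward.
  let inbox = {};
  for (const id of ids) {
    for (const target of outEdges[id]) send(inbox, id, target, label[id]);
  }

  let steps = 1;
  // Later supersteps: only vertices with incoming messages wake up.
  while (Object.keys(inbox).length > 0) {
    const next = {};
    for (const key of Object.keys(inbox)) {
      const id = Number(key);
      const messages = inbox[key];
      for (const m of messages) inbound[id].add(m.from);
      const smallest = Math.min(...messages.map((m) => m.label));
      if (smallest < label[id]) {
        label[id] = smallest;
        // Forward messages along outbound edges ...
        for (const target of outEdges[id]) send(next, id, target, label[id]);
        // ... and backward messages to every known sender.
        for (const source of inbound[id]) send(next, id, source, label[id]);
      } else {
        // Label unchanged: only correct senders holding a larger label.
        for (const m of messages) {
          if (m.label > label[id]) send(next, id, m.from, label[id]);
        }
      }
    }
    inbox = next;
    steps += 1;
  }
  return { label, steps };
}

const result = connectedComponents(outEdges);
console.log(result.label);
```

With the assumed edges the two components end up labelled with their smallest vertex ID, 1 and 5, which matches the final frame of the slides. The loop stops as soon as a superstep produces no messages.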
Pregel – Sequence
Worker ≙ Map
“Map” a user-defined algorithm over all vertices
Output: set of messages to other vertices
Available parameters:
The current vertex and its outbound edges
All incoming messages
Global values
Allow modifications on the vertex:
Attach a result to this vertex and its outgoing edges
Delete the vertex and its outgoing edges
Deactivate the vertex
Combine ≙ Reduce
“Reduce” all generated messages
Output: An aggregated message for each vertex.
Executed on sender as well as receiver.
Available parameters:
One new message for a vertex
The stored aggregate for this vertex
Typical combiners are SUM, MIN or MAX
Reduces network traffic
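A combiner can be sketched as a function of one new message and the stored aggregate, as listed above. The following is a minimal illustration, assuming numeric messages. The names `combiners` and `combineOutbox` and the sample messages are hypothetical and not part of any real API.

```javascript
// Typical combiners: fold one new message into the stored aggregate.
const combiners = {
  SUM: (message, aggregate) => message + aggregate,
  MIN: (message, aggregate) => Math.min(message, aggregate),
  MAX: (message, aggregate) => Math.max(message, aggregate),
};

// Fold a batch of outgoing messages so that at most one message per
// target vertex has to cross the network.
function combineOutbox(outbox, combine) {
  const aggregated = new Map();
  for (const { target, value } of outbox) {
    aggregated.set(
      target,
      aggregated.has(target) ? combine(value, aggregated.get(target)) : value
    );
  }
  return aggregated;
}

// Hypothetical messages produced on one server during one superstep.
const outbox = [
  { target: "v1", value: 3 },
  { target: "v2", value: 7 },
  { target: "v1", value: 1 },
  { target: "v1", value: 5 },
];

const summed = combineOutbox(outbox, combiners.SUM);
const minimum = combineOutbox(outbox, combiners.MIN);
console.log(outbox.length + " messages reduced to " + summed.size);
```

Because these combiners are associative and commutative, the same function can be applied again on the receiving side to merge aggregates arriving from several senders.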
Activity ≙ Termination
Execute several rounds of Map/Reduce
Count active vertices and messages
Start next round if one of the following is true:
At least one vertex is active
At least one message is sent
Terminate if neither a vertex is active nor messages were sent
Store all non-deleted vertices and edges as resulting graph
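The termination rule can be sketched as a small driver loop. This is an illustration only: `runSupersteps`, the `worker` callback signature and the token-passing worker are assumptions made for the example. It also assumes, as in the Pregel model, that an incoming message reactivates a deactivated vertex.

```javascript
// Run supersteps until no vertex is active and no message was sent.
// worker(vertex, messages, step, send) is a hypothetical callback.
function runSupersteps(vertices, worker, maxSteps = 100) {
  let inbox = new Map();
  let step = 0;
  while (step < maxSteps) {
    const nextInbox = new Map();
    let sent = 0;
    const send = (target, value) => {
      if (!nextInbox.has(target)) nextInbox.set(target, []);
      nextInbox.get(target).push(value);
      sent += 1;
    };
    for (const vertex of vertices) {
      const messages = inbox.get(vertex.id) || [];
      // Assumption: an incoming message reactivates a vertex.
      if (messages.length > 0) vertex.active = true;
      if (vertex.active) worker(vertex, messages, step, send);
    }
    step += 1;
    const activeCount = vertices.filter((v) => v.active).length;
    // Terminate if neither a vertex is active nor messages were sent.
    if (activeCount === 0 && sent === 0) break;
    inbox = nextInbox;
  }
  return step;
}

// Hypothetical worker: pass a token down a chain of three vertices.
const chain = [1, 2, 3].map((id) => ({ id, active: true, seen: false }));
const passToken = (vertex, messages, step, send) => {
  if ((step === 0 && vertex.id === 1) || messages.length > 0) {
    vertex.seen = true;
    if (vertex.id < 3) send(vertex.id + 1, "token");
  }
  vertex.active = false; // deactivate after every step
};

const steps = runSupersteps(chain, passToken);
console.log("finished after " + steps + " supersteps");
```

All vertices deactivate in the first round, yet the run continues because a message is still in flight. It stops only after a round in which nothing is active and nothing was sent.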
The Multi-Model Approach
Multi-model database
A multi-model database combines a document store with a
graph database and is at the same time a key/value store,
with a common query language for all three data models.
Important:
is able to compete with specialised products on their turf
allows for polyglot persistence using a single database
ArangoDB
is a multi-model database (document store & graph database),
is open source and free (Apache 2 license),
offers convenient queries (via HTTP/REST and AQL),
including joins between different collections,
offers configurable consistency guarantees using transactions,
is memory efficient by shape detection,
uses JavaScript throughout (Google’s V8 built into server),
has an API extensible by JS code in the Foxx Microservice Framework,
offers many drivers for a wide range of languages,
is easy to use with web front end and good documentation,
and enjoys good community as well as professional support.
Extensible through JavaScript
The Foxx Microservice Framework
Allows you to extend the HTTP/REST API by your own
routes, which you implement in JavaScript running on the
database server, with direct access to the C++ DB engine.
Unprecedented possibilities for data centric services:
custom-made complex queries or authorizations
schema-validation
push feeds, etc.
Pregel in ArangoDB
Started as a side project in free hack time
Experimental on operational database
Implemented as an alternative to traversals
Make use of the flexibility of JavaScript:
No strict type system
No pre-compilation, on-the-fly queries
Native JSON documents
Really fast development
Cluster structure of ArangoDB
[Figure: client requests arrive at two Coordinators, which forward them to three DBservers holding the numbered shards.]
Pagerank for Giraph
public class SimplePageRankComputation extends BasicComputation<
    LongWritable, DoubleWritable, FloatWritable, DoubleWritable> {
  public static final int MAX_SUPERSTEPS = 30;

  @Override
  public void compute(
      Vertex<LongWritable, DoubleWritable, FloatWritable> vertex,
      Iterable<DoubleWritable> messages) throws IOException {
    if (getSuperstep() >= 1) {
      double sum = 0;
      for (DoubleWritable message : messages) {
        sum += message.get();
      }
      DoubleWritable vertexValue = new DoubleWritable(
          (0.15f / getTotalNumVertices()) + 0.85f * sum);
      vertex.setValue(vertexValue);
    }
    if (getSuperstep() < MAX_SUPERSTEPS) {
      long edges = vertex.getNumEdges();
      sendMessageToAllEdges(vertex,
          new DoubleWritable(vertex.getValue().get() / edges));
    } else {
      vertex.voteToHalt();
    }
  }

  public static class SimplePageRankWorkerContext extends WorkerContext {
    @Override
    public void preApplication()
        throws InstantiationException, IllegalAccessException { }
    @Override
    public void postApplication() { }
    @Override
    public void preSuperstep() { }
    @Override
    public void postSuperstep() { }
  }

  public static class SimplePageRankMasterCompute
      extends DefaultMasterCompute {
    @Override
    public void initialize()
        throws InstantiationException, IllegalAccessException {
    }
  }

  public static class SimplePageRankVertexReader extends
      GeneratedVertexReader<LongWritable, DoubleWritable, FloatWritable> {
    @Override
    public boolean nextVertex() {
      return totalRecords > recordsRead;
    }

    @Override
    public Vertex<LongWritable, DoubleWritable, FloatWritable>
        getCurrentVertex() throws IOException {
      Vertex<LongWritable, DoubleWritable, FloatWritable> vertex =
          getConf().createVertex();
      LongWritable vertexId = new LongWritable(
          (inputSplit.getSplitIndex() * totalRecords) + recordsRead);
      DoubleWritable vertexValue =
          new DoubleWritable(vertexId.get() * 10d);
      long targetVertexId = (vertexId.get() + 1) %
          (inputSplit.getNumSplits() * totalRecords);
      float edgeValue = vertexId.get() * 100f;
      List<Edge<LongWritable, FloatWritable>> edges =
          Lists.newLinkedList();
      edges.add(EdgeFactory.create(new LongWritable(targetVertexId),
          new FloatWritable(edgeValue)));
      vertex.initialize(vertexId, vertexValue, edges);
      ++recordsRead;
      return vertex;
    }
  }

  public static class SimplePageRankVertexInputFormat extends
      GeneratedVertexInputFormat<LongWritable, DoubleWritable,
          FloatWritable> {
    @Override
    public VertexReader<LongWritable, DoubleWritable, FloatWritable>
        createVertexReader(InputSplit split, TaskAttemptContext context)
        throws IOException {
      return new SimplePageRankVertexReader();
    }
  }

  public static class SimplePageRankVertexOutputFormat extends
      TextVertexOutputFormat<LongWritable, DoubleWritable,
          FloatWritable> {
    @Override
    public TextVertexWriter createVertexWriter(TaskAttemptContext context)
        throws IOException, InterruptedException {
      return new SimplePageRankVertexWriter();
    }

    public class SimplePageRankVertexWriter extends TextVertexWriter {
      @Override
      public void writeVertex(
          Vertex<LongWritable, DoubleWritable, FloatWritable> vertex)
          throws IOException, InterruptedException {
        getRecordWriter().write(
            new Text(vertex.getId().toString()),
            new Text(vertex.getValue().toString()));
      }
    }
  }
}
Pagerank for TinkerPop3
public class PageRankVertexProgram implements VertexProgram<Double> {
  private MessageType.Local messageType = MessageType.Local.of(
      () -> GraphTraversal.<Vertex>of().outE());
  public static final String PAGE_RANK =
      Graph.Key.hide("gremlin.pageRank");
  public static final String EDGE_COUNT =
      Graph.Key.hide("gremlin.edgeCount");
  private static final String VERTEX_COUNT =
      "gremlin.pageRankVertexProgram.vertexCount";
  private static final String ALPHA =
      "gremlin.pageRankVertexProgram.alpha";
  private static final String TOTAL_ITERATIONS =
      "gremlin.pageRankVertexProgram.totalIterations";
  private static final String INCIDENT_TRAVERSAL =
      "gremlin.pageRankVertexProgram.incidentTraversal";
  private double vertexCountAsDouble = 1;
  private double alpha = 0.85d;
  private int totalIterations = 30;
  private static final Set<String> COMPUTE_KEYS =
      new HashSet<>(Arrays.asList(PAGE_RANK, EDGE_COUNT));

  private PageRankVertexProgram() {}

  @Override
  public void loadState(final Configuration configuration) {
    this.vertexCountAsDouble =
        configuration.getDouble(VERTEX_COUNT, 1.0d);
    this.alpha = configuration.getDouble(ALPHA, 0.85d);
    this.totalIterations = configuration.getInt(TOTAL_ITERATIONS, 30);
    try {
      if (configuration.containsKey(INCIDENT_TRAVERSAL)) {
        final SSupplier<Traversal> traversalSupplier =
            VertexProgramHelper.deserialize(configuration,
                INCIDENT_TRAVERSAL);
        VertexProgramHelper.verifyReversibility(traversalSupplier.get());
        this.messageType =
            MessageType.Local.of((SSupplier) traversalSupplier);
      }
    } catch (final Exception e) {
      throw new IllegalStateException(e.getMessage(), e);
    }
  }

  @Override
  public void storeState(final Configuration configuration) {
    configuration.setProperty(GraphComputer.VERTEX_PROGRAM,
        PageRankVertexProgram.class.getName());
    configuration.setProperty(VERTEX_COUNT, this.vertexCountAsDouble);
    configuration.setProperty(ALPHA, this.alpha);
    configuration.setProperty(TOTAL_ITERATIONS, this.totalIterations);
    try {
      VertexProgramHelper.serialize(
          this.messageType.getIncidentTraversal(), configuration,
          INCIDENT_TRAVERSAL);
    } catch (final Exception e) {
      throw new IllegalStateException(e.getMessage(), e);
    }
  }

  @Override
  public Set<String> getElementComputeKeys() {
    return COMPUTE_KEYS;
  }

  @Override
  public void setup(final Memory memory) {

  }

  @Override
  public void execute(final Vertex vertex, Messenger<Double> messenger,
      final Memory memory) {
    if (memory.isInitialIteration()) {
      double initialPageRank = 1.0d / this.vertexCountAsDouble;
      double edgeCount = Double.valueOf(
          (Long) this.messageType.edges(vertex).count().next());
      vertex.singleProperty(PAGE_RANK, initialPageRank);
      vertex.singleProperty(EDGE_COUNT, edgeCount);
      messenger.sendMessage(this.messageType,
          initialPageRank / edgeCount);
    } else {
      double newPageRank = StreamFactory.stream(
          messenger.receiveMessages(this.messageType))
          .reduce(0.0d, (a, b) -> a + b);
      newPageRank = (this.alpha * newPageRank) +
          ((1.0d - this.alpha) / this.vertexCountAsDouble);
      vertex.singleProperty(PAGE_RANK, newPageRank);
      messenger.sendMessage(this.messageType,
          newPageRank / vertex.<Double>property(EDGE_COUNT).orElse(0.0d));
    }
  }

  @Override
  public boolean terminate(final Memory memory) {
    return memory.getIteration() >= this.totalIterations;
  }
}
Pagerank for ArangoDB
var pageRank = function (vertex, message, global) {
  var total, rank, edgeCount, send, edge, alpha, sum;
  total = global.vertexCount;
  edgeCount = vertex._outEdges.length;
  alpha = global.alpha;
  sum = 0;
  if (global.step > 0) {
    // Later steps: add up the incoming rank shares and apply damping.
    while (message.hasNext()) {
      sum += message.next().data;
    }
    rank = alpha * sum + (1 - alpha) / total;
  } else {
    // First step: every vertex starts with the same rank.
    rank = 1 / total;
  }
  vertex._setResult(rank);
  if (global.step < global.MAX_STEPS) {
    // Split the rank evenly among all outbound edges.
    send = rank / edgeCount;
    while (vertex._outEdges.hasNext()) {
      edge = vertex._outEdges.next();
      message.sendTo(edge._getTarget(), send);
    }
  } else {
    vertex._deactivate();
  }
};

// Combiner: sum up all messages addressed to the same vertex.
var combiner = function (message, oldMessage) {
  return message + oldMessage;
};

var Runner = require("org/arangodb/pregelRunner").Runner;
var runner = new Runner();
runner.setWorker(pageRank);
runner.setCombiner(combiner);
runner.setGlobal("alpha", 0.85);
runner.setGlobal("vertexCount", db.vertices.count());
runner.start("myGraph");
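The arithmetic of the worker and combiner can be checked without a database. The following is a plain JavaScript sketch that mimics the same update rule on a tiny in-memory graph. It does not use the ArangoDB Pregel runner. The graph `links` and the function `simulatePageRank` are assumptions for illustration, and the graph is chosen so that every vertex has at least one outbound edge.

```javascript
// Hypothetical toy graph: vertex -> outbound neighbours.
const links = { a: ["b", "c"], b: ["c"], c: ["a"] };

function simulatePageRank(links, alpha = 0.85, maxSteps = 30) {
  const ids = Object.keys(links);
  const total = ids.length;
  const rank = {};
  let inbox = {};
  for (let step = 0; step <= maxSteps; step++) {
    const nextInbox = {};
    for (const id of ids) {
      // Step 0 starts uniform; later steps apply the damping formula
      // to the summed incoming messages.
      rank[id] = step === 0
        ? 1 / total
        : alpha * (inbox[id] || 0) + (1 - alpha) / total;
      if (step < maxSteps) {
        const share = rank[id] / links[id].length;
        for (const target of links[id]) {
          // SUM combiner: keep one aggregated value per target.
          nextInbox[target] = (nextInbox[target] || 0) + share;
        }
      }
    }
    inbox = nextInbox;
  }
  return rank;
}

const ranks = simulatePageRank(links);
const rankSum = Object.values(ranks).reduce((a, b) => a + b, 0);
console.log(ranks, rankSum);
```

Because no vertex in this toy graph is a dead end, the ranks keep summing to 1 in every step (up to floating-point rounding). Vertex "c" ends up ranked above "b", since it receives shares from both "a" and "b".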
Pregel-type problems
page rank
single-source shortest paths (all)
maximal bipartite matching (randomized)
semi-clustering
connected components
distributed minimum spanning forest
graph coloring
Thanks
Twitter: @arangodb @neunhoef
Github: ArangoDB/ArangoDB
Google Group: arangodb
IRC: arangodb
https://www.arangodb.com

More Related Content

What's hot

guacamole: an Object Document Mapper for ArangoDB
guacamole: an Object Document Mapper for ArangoDBguacamole: an Object Document Mapper for ArangoDB
guacamole: an Object Document Mapper for ArangoDBMax Neunhöffer
 
Backbone using Extensible Database APIs over HTTP
Backbone using Extensible Database APIs over HTTPBackbone using Extensible Database APIs over HTTP
Backbone using Extensible Database APIs over HTTPMax Neunhöffer
 
Experience with C++11 in ArangoDB
Experience with C++11 in ArangoDBExperience with C++11 in ArangoDB
Experience with C++11 in ArangoDBMax Neunhöffer
 
An E-commerce App in action built on top of a Multi-model Database
An E-commerce App in action built on top of a Multi-model DatabaseAn E-commerce App in action built on top of a Multi-model Database
An E-commerce App in action built on top of a Multi-model DatabaseArangoDB Database
 
Performance comparison: Multi-Model vs. MongoDB and Neo4j
Performance comparison: Multi-Model vs. MongoDB and Neo4jPerformance comparison: Multi-Model vs. MongoDB and Neo4j
Performance comparison: Multi-Model vs. MongoDB and Neo4jArangoDB Database
 
Scaling ArangoDB on Mesosphere DCOS
Scaling ArangoDB on Mesosphere DCOSScaling ArangoDB on Mesosphere DCOS
Scaling ArangoDB on Mesosphere DCOSMax Neunhöffer
 
A Graph Database That Scales - ArangoDB 3.7 Release Webinar
A Graph Database That Scales - ArangoDB 3.7 Release WebinarA Graph Database That Scales - ArangoDB 3.7 Release Webinar
A Graph Database That Scales - ArangoDB 3.7 Release WebinarArangoDB Database
 
The CIOs Guide to NoSQL
The CIOs Guide to NoSQLThe CIOs Guide to NoSQL
The CIOs Guide to NoSQLDATAVERSITY
 
Creating data centric microservices
Creating data centric microservicesCreating data centric microservices
Creating data centric microservicesArangoDB Database
 
NoSQL and MapReduce
NoSQL and MapReduceNoSQL and MapReduce
NoSQL and MapReduceJ Singh
 
Mongodb - NoSql Database
Mongodb - NoSql DatabaseMongodb - NoSql Database
Mongodb - NoSql DatabasePrashant Gupta
 
Open Source Big Data Ingestion - Without the Heartburn!
Open Source Big Data Ingestion - Without the Heartburn!Open Source Big Data Ingestion - Without the Heartburn!
Open Source Big Data Ingestion - Without the Heartburn!Pat Patterson
 
Hands-On: Managing Slowly Changing Dimensions Using TD Workflow
Hands-On: Managing Slowly Changing Dimensions Using TD WorkflowHands-On: Managing Slowly Changing Dimensions Using TD Workflow
Hands-On: Managing Slowly Changing Dimensions Using TD WorkflowTreasure Data, Inc.
 

What's hot (20)

guacamole: an Object Document Mapper for ArangoDB
guacamole: an Object Document Mapper for ArangoDBguacamole: an Object Document Mapper for ArangoDB
guacamole: an Object Document Mapper for ArangoDB
 
Backbone using Extensible Database APIs over HTTP
Backbone using Extensible Database APIs over HTTPBackbone using Extensible Database APIs over HTTP
Backbone using Extensible Database APIs over HTTP
 
ArangoDB
ArangoDBArangoDB
ArangoDB
 
Experience with C++11 in ArangoDB
Experience with C++11 in ArangoDBExperience with C++11 in ArangoDB
Experience with C++11 in ArangoDB
 
An E-commerce App in action built on top of a Multi-model Database
An E-commerce App in action built on top of a Multi-model DatabaseAn E-commerce App in action built on top of a Multi-model Database
An E-commerce App in action built on top of a Multi-model Database
 
Performance comparison: Multi-Model vs. MongoDB and Neo4j
Performance comparison: Multi-Model vs. MongoDB and Neo4jPerformance comparison: Multi-Model vs. MongoDB and Neo4j
Performance comparison: Multi-Model vs. MongoDB and Neo4j
 
Scaling ArangoDB on Mesosphere DCOS
Scaling ArangoDB on Mesosphere DCOSScaling ArangoDB on Mesosphere DCOS
Scaling ArangoDB on Mesosphere DCOS
 
A Graph Database That Scales - ArangoDB 3.7 Release Webinar
A Graph Database That Scales - ArangoDB 3.7 Release WebinarA Graph Database That Scales - ArangoDB 3.7 Release Webinar
A Graph Database That Scales - ArangoDB 3.7 Release Webinar
 
CouchDB
CouchDBCouchDB
CouchDB
 
The CIOs Guide to NoSQL
The CIOs Guide to NoSQLThe CIOs Guide to NoSQL
The CIOs Guide to NoSQL
 
Arango DB
Arango DBArango DB
Arango DB
 
Oslo bekk2014
Oslo bekk2014Oslo bekk2014
Oslo bekk2014
 
Creating data centric microservices
Creating data centric microservicesCreating data centric microservices
Creating data centric microservices
 
NoSQL and MapReduce
NoSQL and MapReduceNoSQL and MapReduce
NoSQL and MapReduce
 
Mongodb - NoSql Database
Mongodb - NoSql DatabaseMongodb - NoSql Database
Mongodb - NoSql Database
 
Couch db
Couch dbCouch db
Couch db
 
Open Source Big Data Ingestion - Without the Heartburn!
Open Source Big Data Ingestion - Without the Heartburn!Open Source Big Data Ingestion - Without the Heartburn!
Open Source Big Data Ingestion - Without the Heartburn!
 
Hands-On: Managing Slowly Changing Dimensions Using TD Workflow
Hands-On: Managing Slowly Changing Dimensions Using TD WorkflowHands-On: Managing Slowly Changing Dimensions Using TD Workflow
Hands-On: Managing Slowly Changing Dimensions Using TD Workflow
 
Couch db
Couch dbCouch db
Couch db
 
CouchDB
CouchDBCouchDB
CouchDB
 

Viewers also liked

ArangoDB – Persistência Poliglota e Banco de Dados Multi-Modelos
ArangoDB – Persistência Poliglota e Banco de Dados Multi-ModelosArangoDB – Persistência Poliglota e Banco de Dados Multi-Modelos
ArangoDB – Persistência Poliglota e Banco de Dados Multi-ModelosHelder Santana
 
Jan Steemann: Modelling data in a schema free world (Talk held at Froscon, 2...
Jan Steemann: Modelling data in a schema free world  (Talk held at Froscon, 2...Jan Steemann: Modelling data in a schema free world  (Talk held at Froscon, 2...
Jan Steemann: Modelling data in a schema free world (Talk held at Froscon, 2...ArangoDB Database
 
CAP and the Architectural Consequences by martin Schönert
CAP and the Architectural Consequences by martin SchönertCAP and the Architectural Consequences by martin Schönert
CAP and the Architectural Consequences by martin SchönertArangoDB Database
 
Introduction and overview ArangoDB query language AQL
Introduction and overview ArangoDB query language AQLIntroduction and overview ArangoDB query language AQL
Introduction and overview ArangoDB query language AQLArangoDB Database
 
Running MRuby in a Database - ArangoDB - RuPy 2012
Running MRuby in a Database - ArangoDB - RuPy 2012 Running MRuby in a Database - ArangoDB - RuPy 2012
Running MRuby in a Database - ArangoDB - RuPy 2012 ArangoDB Database
 
Complex queries in a distributed multi-model database
Complex queries in a distributed multi-model databaseComplex queries in a distributed multi-model database
Complex queries in a distributed multi-model databaseMax Neunhöffer
 
GraphDatabases and what we can use them for
GraphDatabases and what we can use them forGraphDatabases and what we can use them for
GraphDatabases and what we can use them forMichael Hackstein
 
Hotcode 2013: Javascript in a database (Part 1)
Hotcode 2013: Javascript in a database (Part 1)Hotcode 2013: Javascript in a database (Part 1)
Hotcode 2013: Javascript in a database (Part 1)ArangoDB Database
 
Hotcode 2013: Javascript in a database (Part 2)
Hotcode 2013: Javascript in a database (Part 2)Hotcode 2013: Javascript in a database (Part 2)
Hotcode 2013: Javascript in a database (Part 2)ArangoDB Database
 
Domain Driven Design & NoSQL
Domain Driven Design & NoSQLDomain Driven Design & NoSQL
Domain Driven Design & NoSQLArangoDB Database
 
ArangoDB – A different approach to NoSQL
ArangoDB – A different approach to NoSQLArangoDB – A different approach to NoSQL
ArangoDB – A different approach to NoSQLArangoDB Database
 
Domain Driven Design & NoSQL
Domain Driven Design & NoSQLDomain Driven Design & NoSQL
Domain Driven Design & NoSQLArangoDB Database
 
Processing large-scale graphs with Google(TM) Pregel by MICHAEL HACKSTEIN at...
 Processing large-scale graphs with Google(TM) Pregel by MICHAEL HACKSTEIN at... Processing large-scale graphs with Google(TM) Pregel by MICHAEL HACKSTEIN at...
Processing large-scale graphs with Google(TM) Pregel by MICHAEL HACKSTEIN at...Big Data Spain
 
Is multi-model the future of NoSQL?
Is multi-model the future of NoSQL?Is multi-model the future of NoSQL?
Is multi-model the future of NoSQL?Max Neunhöffer
 
ArangoDB - Using JavaScript in the database
ArangoDB - Using JavaScript in the databaseArangoDB - Using JavaScript in the database
ArangoDB - Using JavaScript in the databaseArangoDB Database
 
Domain driven design @FrOSCon
Domain driven design @FrOSConDomain driven design @FrOSCon
Domain driven design @FrOSConArangoDB Database
 
Rupy2012 ArangoDB Workshop Part1
Rupy2012 ArangoDB Workshop Part1Rupy2012 ArangoDB Workshop Part1
Rupy2012 ArangoDB Workshop Part1ArangoDB Database
 
Domain Driven Design and NoSQL TLV
Domain Driven Design and NoSQL TLVDomain Driven Design and NoSQL TLV
Domain Driven Design and NoSQL TLVArangoDB Database
 
Scaling ArangoDB on Mesosphere DCOS
Scaling ArangoDB on Mesosphere DCOSScaling ArangoDB on Mesosphere DCOS
Scaling ArangoDB on Mesosphere DCOSMax Neunhöffer
 

Viewers also liked (20)

ArangoDB – Persistência Poliglota e Banco de Dados Multi-Modelos
ArangoDB – Persistência Poliglota e Banco de Dados Multi-ModelosArangoDB – Persistência Poliglota e Banco de Dados Multi-Modelos
ArangoDB – Persistência Poliglota e Banco de Dados Multi-Modelos
 
Jan Steemann: Modelling data in a schema free world (Talk held at Froscon, 2...
Jan Steemann: Modelling data in a schema free world  (Talk held at Froscon, 2...Jan Steemann: Modelling data in a schema free world  (Talk held at Froscon, 2...
Jan Steemann: Modelling data in a schema free world (Talk held at Froscon, 2...
 
CAP and the Architectural Consequences by martin Schönert
CAP and the Architectural Consequences by martin SchönertCAP and the Architectural Consequences by martin Schönert
CAP and the Architectural Consequences by martin Schönert
 
Introduction and overview ArangoDB query language AQL
Introduction and overview ArangoDB query language AQLIntroduction and overview ArangoDB query language AQL
Introduction and overview ArangoDB query language AQL
 
Running MRuby in a Database - ArangoDB - RuPy 2012
Running MRuby in a Database - ArangoDB - RuPy 2012 Running MRuby in a Database - ArangoDB - RuPy 2012
Running MRuby in a Database - ArangoDB - RuPy 2012
 
Complex queries in a distributed multi-model database
Complex queries in a distributed multi-model databaseComplex queries in a distributed multi-model database
Complex queries in a distributed multi-model database
 
GraphDatabases and what we can use them for
GraphDatabases and what we can use them forGraphDatabases and what we can use them for
GraphDatabases and what we can use them for
 
Hotcode 2013: Javascript in a database (Part 1)
Hotcode 2013: Javascript in a database (Part 1)Hotcode 2013: Javascript in a database (Part 1)
Hotcode 2013: Javascript in a database (Part 1)
 
Hotcode 2013: Javascript in a database (Part 2)
Hotcode 2013: Javascript in a database (Part 2)Hotcode 2013: Javascript in a database (Part 2)
Hotcode 2013: Javascript in a database (Part 2)
 
Domain Driven Design & NoSQL
Domain Driven Design & NoSQLDomain Driven Design & NoSQL
Domain Driven Design & NoSQL
 
ArangoDB – A different approach to NoSQL
ArangoDB – A different approach to NoSQLArangoDB – A different approach to NoSQL
ArangoDB – A different approach to NoSQL
 
Domain Driven Design & NoSQL
Domain Driven Design & NoSQLDomain Driven Design & NoSQL
Domain Driven Design & NoSQL
 
Processing large-scale graphs with Google(TM) Pregel by MICHAEL HACKSTEIN at...
 Processing large-scale graphs with Google(TM) Pregel by MICHAEL HACKSTEIN at... Processing large-scale graphs with Google(TM) Pregel by MICHAEL HACKSTEIN at...
Processing large-scale graphs with Google(TM) Pregel by MICHAEL HACKSTEIN at...
 
Is multi-model the future of NoSQL?
Is multi-model the future of NoSQL?Is multi-model the future of NoSQL?
Is multi-model the future of NoSQL?
 
ArangoDB - Using JavaScript in the database
ArangoDB - Using JavaScript in the databaseArangoDB - Using JavaScript in the database
ArangoDB - Using JavaScript in the database
 
Domain driven design @FrOSCon
Domain driven design @FrOSConDomain driven design @FrOSCon
Domain driven design @FrOSCon
 
Rupy2012 ArangoDB Workshop Part1
Rupy2012 ArangoDB Workshop Part1Rupy2012 ArangoDB Workshop Part1
Rupy2012 ArangoDB Workshop Part1
 
Wir sind aber nicht Twitter
Wir sind aber nicht TwitterWir sind aber nicht Twitter
Wir sind aber nicht Twitter
 
Domain Driven Design and NoSQL TLV
Domain Driven Design and NoSQL TLVDomain Driven Design and NoSQL TLV
Domain Driven Design and NoSQL TLV
 
Scaling ArangoDB on Mesosphere DCOS
Scaling ArangoDB on Mesosphere DCOSScaling ArangoDB on Mesosphere DCOS
Scaling ArangoDB on Mesosphere DCOS
 

Processing large-scale graphs with Google Pregel

  • 1. Processing large-scale graphs with Google™ Pregel
      Max Neunhöffer
      Big data technology and applications, 25 March 2015
      www.arangodb.com
  • 3. About
      about me
        Max Neunhöffer (@neunhoef) working for ArangoDB
      about the talk
        different kinds of graph algorithms
        Pregel example
        Pregel mind set
        ArangoDB implementation
  • 6. Graph Algorithms
      Pattern matching
        Search through the entire graph
        Identify similar components
        ⇒ Touch all vertices and their neighbourhoods
      Traversals
        Define a specific start point
        Iteratively explore the graph
        ⇒ History of steps is known
      Global measurements
        Compute one value for the graph, based on all its vertices or edges
        Compute one value for each vertex or edge
        ⇒ Often require a global view on the graph
  • 7. Pregel
      A framework to query distributed, directed graphs.
      Known as “Map-Reduce” for graphs
        Uses same phases
        Has several iterations
      Aims at:
        Operate all servers at full capacity
        Reduce network traffic
      Good at calculations touching all vertices
      Bad at calculations touching a very small number of vertices
  • 8-17. Example – Connected Components
      [Animated figure. Legend: active vertex, inactive vertex, forward message, backward message. Seven vertices start with their own IDs (1 to 7) as labels and exchange labels over several rounds. In the final frame vertices 1 to 4 carry label 1, vertices 5 to 7 carry label 5, and no messages remain.]
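The connected-components example can be sketched in plain JavaScript as minimum-label propagation. This is only an illustration under stated assumptions: the edge list is hypothetical (the transcript does not preserve the edges of the figure), edges are treated as undirected so that labels flow both forward and backward, and the intermediate rounds need not match the frames of the slides. Only the final labels (1 for vertices 1 to 4, 5 for vertices 5 to 7) agree with the last frame.

```javascript
// Minimal sketch of min-label propagation for connected components.
// The function name and the edge list are assumptions made for illustration.
function connectedComponents(vertexIds, edges) {
  // Treat every edge as undirected: labels travel forward and backward.
  const neighbours = new Map(vertexIds.map((id) => [id, []]));
  for (const [from, to] of edges) {
    neighbours.get(from).push(to);
    neighbours.get(to).push(from);
  }

  // Every vertex starts with its own ID as label.
  const label = new Map(vertexIds.map((id) => [id, id]));
  // In the first round every vertex announces its label.
  let senders = new Set(vertexIds);

  while (senders.size > 0) {
    // Deliver messages, keeping only the minimum per receiver (a MIN combiner).
    const inbox = new Map();
    for (const id of senders) {
      for (const n of neighbours.get(id)) {
        const offered = label.get(id);
        inbox.set(n, inbox.has(n) ? Math.min(inbox.get(n), offered) : offered);
      }
    }
    // A vertex that lowers its label sends again next round; all others stay silent.
    senders = new Set();
    for (const [id, smallest] of inbox) {
      if (smallest < label.get(id)) {
        label.set(id, smallest);
        senders.add(id);
      }
    }
  }
  return label;
}

const labels = connectedComponents(
  [1, 2, 3, 4, 5, 6, 7],
  [[1, 2], [2, 3], [3, 4], [5, 6], [6, 7]] // assumed edges
);
console.log(Object.fromEntries(labels));
```

The loop stops once a round produces no label change, which mirrors the rule that the computation ends when no vertex is active and no message is in flight.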
  • 23. Worker ≙ Map
      “Map” a user-defined algorithm over all vertices
      Output: set of messages to other vertices
      Available parameters:
        The current vertex and its outbound edges
        All incoming messages
        Global values
      Allow modifications on the vertex:
        Attach a result to this vertex and its outgoing edges
        Delete the vertex and its outgoing edges
        Deactivate the vertex
  • 24. Combine ≙ Reduce
      “Reduce” all generated messages
      Output: An aggregated message for each vertex.
      Executed on sender as well as receiver.
      Available parameters:
        One new message for a vertex
        The stored aggregate for this vertex
      Typical combiners are SUM, MIN or MAX
      Reduces network traffic
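A small sketch may help to show why running the combiner on the sender as well as on the receiver reduces network traffic. Everything here is hypothetical: the two "servers", the message shape `{ target, data }` and the helper names `combine` and `toMessages` are assumptions for illustration, not the API of any particular Pregel implementation.

```javascript
// A combiner folds one new message into the stored aggregate for a vertex.
const combiners = {
  SUM: (message, aggregate) => message + aggregate,
  MIN: (message, aggregate) => Math.min(message, aggregate),
  MAX: (message, aggregate) => Math.max(message, aggregate),
};

// Fold a batch of { target, data } messages into one aggregate per target.
function combine(messages, combiner) {
  const aggregates = new Map();
  for (const { target, data } of messages) {
    aggregates.set(
      target,
      aggregates.has(target) ? combiner(data, aggregates.get(target)) : data
    );
  }
  return aggregates;
}

// Turn the aggregates back into messages that can be shipped.
const toMessages = (aggregates) =>
  [...aggregates].map(([target, data]) => ({ target, data }));

// Sender side: each (hypothetical) server pre-aggregates what it wants to send.
const fromServerA = combine(
  [{ target: "v1", data: 3 }, { target: "v1", data: 4 }, { target: "v2", data: 1 }],
  combiners.SUM
);
const fromServerB = combine(
  [{ target: "v1", data: 5 }, { target: "v2", data: 2 }, { target: "v2", data: 2 }],
  combiners.SUM
);

// Only the pre-aggregated messages cross the network: 4 instead of 6.
const onWire = [...toMessages(fromServerA), ...toMessages(fromServerB)];

// Receiver side: the same combiner merges what arrives from different senders.
const received = combine(onWire, combiners.SUM);
console.log(onWire.length, Object.fromEntries(received));
```

This only works because SUM, MIN and MAX are associative and commutative, so combining in two stages yields the same aggregate as combining all messages at once.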
  • 25. Activity ≙ Termination
      Execute several rounds of Map/Reduce
      Count active vertices and messages
      Start next round if one of the following is true:
        At least one vertex is active
        At least one message is sent
      Terminate if neither a vertex is active nor messages were sent
      Store all non-deleted vertices and edges as resulting graph
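The round structure and the termination rule can be sketched as a small single-process loop. This is a toy under stated assumptions, not ArangoDB's runner: `runPregel`, the worker signature `(vertex, message, { step, send })` and the `maxSteps` safety limit are invented for this example, and the rule that an incoming message reactivates an inactive vertex is taken from the general Pregel model rather than from the slides. The worker spreads the maximum value through the graph.

```javascript
// Run rounds until no vertex is active and no message was sent.
function runPregel(vertices, worker, combiner, maxSteps = 100) {
  let inbox = new Map(); // vertex id -> combined incoming message
  let step = 0;
  for (;;) {
    const outbox = new Map();
    let messagesSent = 0;
    const send = (target, data) => {
      messagesSent += 1;
      outbox.set(target, outbox.has(target) ? combiner(data, outbox.get(target)) : data);
    };

    for (const vertex of vertices.values()) {
      const message = inbox.get(vertex.id);
      if (message !== undefined) vertex.active = true; // assumed: messages wake a vertex
      if (vertex.active) worker(vertex, message, { step, send });
    }

    const activeCount = [...vertices.values()].filter((v) => v.active).length;
    step += 1;
    if ((activeCount === 0 && messagesSent === 0) || step >= maxSteps) {
      return step; // number of rounds executed
    }
    inbox = outbox;
  }
}

// Worker: adopt a larger incoming value, forward it, then go idle.
const propagateMax = (vertex, message, { step, send }) => {
  const improved = message !== undefined && message > vertex.value;
  if (improved) vertex.value = message;
  if (step === 0 || improved) {
    for (const target of vertex.out) send(target, vertex.value);
  }
  vertex.active = false; // stay idle until a message arrives
};

const vertices = new Map(
  [
    { id: "a", value: 3, out: ["b"] },
    { id: "b", value: 6, out: ["a", "c"] },
    { id: "c", value: 2, out: ["b"] },
  ].map((v) => [v.id, { ...v, active: true }])
);

const rounds = runPregel(vertices, propagateMax, Math.max);
console.log(rounds, [...vertices.values()].map((v) => v.value));
```

In this toy graph the loop stops in the first round in which all vertices are idle and nothing was sent; by then every vertex holds the maximum value 6.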
  • 29. The Multi-Model Approach
      Multi-model database
      A multi-model database combines a document store with a graph database and is at the same time a key/value store, with a common query language for all three data models.
      Important:
        is able to compete with specialised products on their turf
        allows for polyglot persistence using a single database
  • 40. ArangoDB
      is a multi-model database (document store & graph database),
      is open source and free (Apache 2 license),
      offers convenient queries (via HTTP/REST and AQL),
      including joins between different collections,
      configurable consistency guarantees using transactions
      memory efficient by shape detection,
      uses JavaScript throughout (Google’s V8 built into server),
      API extensible by JS code in the Foxx Microservice Framework,
      offers many drivers for a wide range of languages,
      is easy to use with web front end and good documentation,
      and enjoys good community as well as professional support.
  • 44. Extensible through JavaScript
      The Foxx Microservice Framework
      Allows you to extend the HTTP/REST API by your own routes, which you implement in JavaScript running on the database server, with direct access to the C++ DB engine.
      Unprecedented possibilities for data centric services:
        custom-made complex queries or authorizations
        schema-validation
        push feeds, etc.
  • 45. Pregel in ArangoDB
      Started as a side project in free hack time
      Experimental on operational database
      Implemented as an alternative to traversals
      Make use of the flexibility of JavaScript:
        No strict type system
        No pre-compilation, on-the-fly queries
        Native JSON documents
        Really fast development
  • 46. Cluster structure of ArangoDB
      [Figure: incoming requests, two Coordinators and three DBservers.]
  • 47. Pagerank for Giraph

      public class SimplePageRankComputation
          extends BasicComputation<LongWritable, DoubleWritable, FloatWritable, DoubleWritable> {
        public static final int MAX_SUPERSTEPS = 30;

        @Override
        public void compute(Vertex<LongWritable, DoubleWritable, FloatWritable> vertex,
            Iterable<DoubleWritable> messages) throws IOException {
          if (getSuperstep() >= 1) {
            double sum = 0;
            for (DoubleWritable message : messages) {
              sum += message.get();
            }
            DoubleWritable vertexValue =
                new DoubleWritable((0.15f / getTotalNumVertices()) + 0.85f * sum);
            vertex.setValue(vertexValue);
          }
          if (getSuperstep() < MAX_SUPERSTEPS) {
            long edges = vertex.getNumEdges();
            sendMessageToAllEdges(vertex, new DoubleWritable(vertex.getValue().get() / edges));
          } else {
            vertex.voteToHalt();
          }
        }

        public static class SimplePageRankWorkerContext extends WorkerContext {
          @Override
          public void preApplication() throws InstantiationException, IllegalAccessException { }
          @Override
          public void postApplication() { }
          @Override
          public void preSuperstep() { }
          @Override
          public void postSuperstep() { }
        }

        public static class SimplePageRankMasterCompute extends DefaultMasterCompute {
          @Override
          public void initialize() throws InstantiationException, IllegalAccessException {
          }
        }

        public static class SimplePageRankVertexReader
            extends GeneratedVertexReader<LongWritable, DoubleWritable, FloatWritable> {
          @Override
          public boolean nextVertex() {
            return totalRecords > recordsRead;
          }

          @Override
          public Vertex<LongWritable, DoubleWritable, FloatWritable> getCurrentVertex()
              throws IOException {
            Vertex<LongWritable, DoubleWritable, FloatWritable> vertex = getConf().createVertex();
            LongWritable vertexId = new LongWritable(
                (inputSplit.getSplitIndex() * totalRecords) + recordsRead);
            DoubleWritable vertexValue = new DoubleWritable(vertexId.get() * 10d);
            long targetVertexId = (vertexId.get() + 1) % (inputSplit.getNumSplits() * totalRecords);
            float edgeValue = vertexId.get() * 100f;
            List<Edge<LongWritable, FloatWritable>> edges = Lists.newLinkedList();
            edges.add(EdgeFactory.create(new LongWritable(targetVertexId),
                new FloatWritable(edgeValue)));
            vertex.initialize(vertexId, vertexValue, edges);
            ++recordsRead;
            return vertex;
          }
        }

        public static class SimplePageRankVertexInputFormat
            extends GeneratedVertexInputFormat<LongWritable, DoubleWritable, FloatWritable> {
          @Override
          public VertexReader<LongWritable, DoubleWritable, FloatWritable> createVertexReader(
              InputSplit split, TaskAttemptContext context) throws IOException {
            return new SimplePageRankVertexReader();
          }
        }

        public static class SimplePageRankVertexOutputFormat
            extends TextVertexOutputFormat<LongWritable, DoubleWritable, FloatWritable> {
          @Override
          public TextVertexWriter createVertexWriter(TaskAttemptContext context)
              throws IOException, InterruptedException {
            return new SimplePageRankVertexWriter();
          }

          public class SimplePageRankVertexWriter extends TextVertexWriter {
            @Override
            public void writeVertex(Vertex<LongWritable, DoubleWritable, FloatWritable> vertex)
                throws IOException, InterruptedException {
              getRecordWriter().write(
                  new Text(vertex.getId().toString()),
                  new Text(vertex.getValue().toString()));
            }
          }
        }
      }
  • 48. PageRank for TinkerPop3
    public class PageRankVertexProgram implements VertexProgram<Double> {
      private MessageType.Local messageType = MessageType.Local.of(() -> GraphTraversal.<Vertex>of().outE());
      public static final String PAGE_RANK = Graph.Key.hide("gremlin.pageRank");
      public static final String EDGE_COUNT = Graph.Key.hide("gremlin.edgeCount");
      private static final String VERTEX_COUNT = "gremlin.pageRankVertexProgram.vertexCount";
      private static final String ALPHA = "gremlin.pageRankVertexProgram.alpha";
      private static final String TOTAL_ITERATIONS = "gremlin.pageRankVertexProgram.totalIterations";
      private static final String INCIDENT_TRAVERSAL = "gremlin.pageRankVertexProgram.incidentTraversal";
      private double vertexCountAsDouble = 1;
      private double alpha = 0.85d;
      private int totalIterations = 30;
      private static final Set<String> COMPUTE_KEYS = new HashSet<>(Arrays.asList(PAGE_RANK, EDGE_COUNT));

      private PageRankVertexProgram() {}

      @Override
      public void loadState(final Configuration configuration) {
        this.vertexCountAsDouble = configuration.getDouble(VERTEX_COUNT, 1.0d);
        this.alpha = configuration.getDouble(ALPHA, 0.85d);
        this.totalIterations = configuration.getInt(TOTAL_ITERATIONS, 30);
        try {
          if (configuration.containsKey(INCIDENT_TRAVERSAL)) {
            final SSupplier<Traversal> traversalSupplier =
                VertexProgramHelper.deserialize(configuration, INCIDENT_TRAVERSAL);
            VertexProgramHelper.verifyReversibility(traversalSupplier.get());
            this.messageType = MessageType.Local.of((SSupplier) traversalSupplier);
          }
        } catch (final Exception e) {
          throw new IllegalStateException(e.getMessage(), e);
        }
      }

      @Override
      public void storeState(final Configuration configuration) {
        configuration.setProperty(GraphComputer.VERTEX_PROGRAM, PageRankVertexProgram.class.getName());
        configuration.setProperty(VERTEX_COUNT, this.vertexCountAsDouble);
        configuration.setProperty(ALPHA, this.alpha);
        configuration.setProperty(TOTAL_ITERATIONS, this.totalIterations);
        try {
          VertexProgramHelper.serialize(this.messageType.getIncidentTraversal(), configuration, INCIDENT_TRAVERSAL);
        } catch (final Exception e) {
          throw new IllegalStateException(e.getMessage(), e);
        }
      }

      @Override
      public Set<String> getElementComputeKeys() {
        return COMPUTE_KEYS;
      }

      @Override
      public void setup(final Memory memory) {

      }

      @Override
      public void execute(final Vertex vertex, Messenger<Double> messenger, final Memory memory) {
        if (memory.isInitialIteration()) {
          double initialPageRank = 1.0d / this.vertexCountAsDouble;
          double edgeCount = Double.valueOf((Long) this.messageType.edges(vertex).count().next());
          vertex.singleProperty(PAGE_RANK, initialPageRank);
          vertex.singleProperty(EDGE_COUNT, edgeCount);
          messenger.sendMessage(this.messageType, initialPageRank / edgeCount);
        } else {
          double newPageRank = StreamFactory.stream(messenger.receiveMessages(this.messageType))
              .reduce(0.0d, (a, b) -> a + b);
          newPageRank = (this.alpha * newPageRank) + ((1.0d - this.alpha) / this.vertexCountAsDouble);
          vertex.singleProperty(PAGE_RANK, newPageRank);
          messenger.sendMessage(this.messageType,
              newPageRank / vertex.<Double>property(EDGE_COUNT).orElse(0.0d));
        }
      }

      @Override
      public boolean terminate(final Memory memory) {
        return memory.getIteration() >= this.totalIterations;
      }
    }
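The update rule that the `execute` method above applies in each iteration can be written compactly. This is a reading of the slide's code, not a formula taken from the talk. The symbols are assumptions introduced here: N is the vertex count (`vertexCountAsDouble`), α is `alpha`, and outdeg(u) is the stored `EDGE_COUNT` of vertex u.

```latex
\begin{align*}
  PR_0(v)     &= \frac{1}{N} \\
  PR_{t+1}(v) &= \alpha \sum_{u \,:\, (u,v) \in E} \frac{PR_t(u)}{\operatorname{outdeg}(u)}
                 \;+\; \frac{1-\alpha}{N}
\end{align*}
```

Each vertex sends PR_t(u)/outdeg(u) along its outgoing edges, and the receiving vertex sums whatever arrives. With the default α = 0.85 and 30 iterations the loop stops after a fixed step count rather than on a convergence test.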
  • 49. PageRank for ArangoDB
    var pageRank = function (vertex, message, global) {
      var total, rank, edgeCount, send, edge, alpha, sum;
      total = global.vertexCount;
      edgeCount = vertex._outEdges.length;
      alpha = global.alpha;
      sum = 0;
      if (global.step > 0) {
        // Later supersteps: sum the incoming rank contributions.
        while (message.hasNext()) {
          sum += message.next().data;
        }
        rank = alpha * sum + (1 - alpha) / total;
      } else {
        // Superstep 0: start from a uniform distribution.
        rank = 1 / total;
      }
      vertex._setResult(rank);
      if (global.step < global.MAX_STEPS) {
        // Split the current rank evenly across all outgoing edges.
        send = rank / edgeCount;
        while (vertex._outEdges.hasNext()) {
          edge = vertex._outEdges.next();
          message.sendTo(edge._getTarget(), send);
        }
      } else {
        vertex._deactivate();
      }
    };

    // Combiner: messages addressed to the same vertex are summed into one.
    var combiner = function (message, oldMessage) {
      return message + oldMessage;
    };

    var Runner = require("org/arangodb/pregelRunner").Runner;
    var runner = new Runner();
    runner.setWorker(pageRank);
    runner.setCombiner(combiner);
    runner.setGlobal("alpha", 0.85);
    runner.setGlobal("vertexCount", db.vertices.count());
    runner.start("myGraph");
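The superstep logic of the worker and combiner above can be tried out without a cluster. The following is a minimal single-process sketch, not the ArangoDB Pregel runner. `runPageRank`, the adjacency-object layout and the `inbox`/`outbox` maps are hypothetical names chosen for illustration, and the sketch assumes every vertex appears as a key of `outEdges`.

```javascript
// Single-process simulation of Pregel-style PageRank.
// All names here are illustrative assumptions, not part of any ArangoDB API.
function runPageRank(outEdges, alpha, maxSteps) {
  const ids = Object.keys(outEdges);
  const total = ids.length;
  const rank = {};
  let inbox = {}; // combined (summed) messages per target vertex

  for (let step = 0; step <= maxSteps; step++) {
    const outbox = {};
    for (const id of ids) {
      // Superstep 0 initialises uniformly; later steps consume the inbox.
      rank[id] = step === 0
        ? 1 / total
        : alpha * (inbox[id] || 0) + (1 - alpha) / total;

      // In the last superstep nothing is sent (the vertex deactivates).
      if (step < maxSteps) {
        const targets = outEdges[id];
        const send = rank[id] / targets.length;
        for (const target of targets) {
          // Combiner: add the message to whatever is already queued.
          outbox[target] = (outbox[target] || 0) + send;
        }
      }
    }
    // Barrier: messages become visible only in the next superstep.
    inbox = outbox;
  }
  return rank;
}

// Example: a directed 3-cycle, where symmetry keeps every rank at 1/3.
const ranks = runPageRank({ a: ["b"], b: ["c"], c: ["a"] }, 0.85, 20);
console.log(ranks);
```

Summing in the `outbox` as messages are produced plays the role of the combiner: only one number per target vertex has to cross the superstep barrier, which is the reduction in network traffic that Pregel aims for.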
  • 50. Pregel-type problems: page rank; single-source shortest paths (all); maximal bipartite matching (randomized); semi-clustering; connected components; distributed minimum spanning forest; graph coloring
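To illustrate how another problem from this list fits the same vertex-centric pattern, here is a hedged single-process sketch of single-source shortest paths. It is not taken from the talk: `shortestPaths`, the `{ to, weight }` edge shape and the `inbox`/`outbox` maps are hypothetical. The sketch assumes non-negative edge weights and that every edge target is also a key of `outEdges`.

```javascript
// Vertex-centric single-source shortest paths, simulated in one process.
// Names and data layout are illustrative assumptions only.
function shortestPaths(outEdges, source) {
  const dist = {};
  for (const id of Object.keys(outEdges)) dist[id] = Infinity;

  // Before the first superstep only the source has a message: distance 0.
  let inbox = { [source]: 0 };
  let supersteps = 0;

  // Stop once no messages are in flight, i.e. all vertices are inactive.
  while (Object.keys(inbox).length > 0) {
    const outbox = {};
    for (const id of Object.keys(inbox)) {
      const candidate = inbox[id];
      // A vertex reacts only if the message improves its known distance.
      if (candidate < dist[id]) {
        dist[id] = candidate;
        for (const { to, weight } of outEdges[id]) {
          const d = candidate + weight;
          // Combiner: keep only the smallest distance per target vertex.
          if (!(to in outbox) || d < outbox[to]) outbox[to] = d;
        }
      }
    }
    inbox = outbox; // superstep barrier
    supersteps++;
  }
  return { dist, supersteps };
}

const result = shortestPaths({
  s: [{ to: "a", weight: 4 }, { to: "b", weight: 1 }],
  a: [{ to: "c", weight: 1 }],
  b: [{ to: "a", weight: 2 }],
  c: [],
  d: [{ to: "s", weight: 1 }],
}, "s");
console.log(result.dist);
```

Compared with the PageRank worker, only two things change: the combiner takes a minimum instead of a sum, and termination comes from vertices going quiet rather than from a fixed step count.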
  • 51. Thanks. Twitter: @arangodb, @neunhoef; GitHub: ArangoDB/ArangoDB; Google Group: arangodb; IRC: arangodb; https://www.arangodb.com