SlideShare a Scribd company logo
Complex queries in a
distributed multi-model
database
Max Neunhöffer
Tech Talk Geekdom, 16 March 2015
www.arangodb.com
Documents and collections
{
"_key": "123456",
"_id": "chars/123456",
"name": "Duck",
"firstname": "Donald",
"dob": "1934-11-13",
"hobbies": ["Golf",
"Singing",
"Running"],
"home":
{"town": "Duck town",
"street": "Lake Road",
"number": 17},
"species": "duck"
}
1
Documents and collections
{
"_key": "123456",
"_id": "chars/123456",
"name": "Duck",
"firstname": "Donald",
"dob": "1934-11-13",
"hobbies": ["Golf",
"Singing",
"Running"],
"home":
{"town": "Duck town",
"street": "Lake Road",
"number": 17},
"species": "duck"
}
When I say “document”,
I mean “JSON”.
1
Documents and collections
{
"_key": "123456",
"_id": "chars/123456",
"name": "Duck",
"firstname": "Donald",
"dob": "1934-11-13",
"hobbies": ["Golf",
"Singing",
"Running"],
"home":
{"town": "Duck town",
"street": "Lake Road",
"number": 17},
"species": "duck"
}
When I say “document”,
I mean “JSON”.
A “collection” is a set of
documents in a DB.
1
Documents and collections
{
"_key": "123456",
"_id": "chars/123456",
"name": "Duck",
"firstname": "Donald",
"dob": "1934-11-13",
"hobbies": ["Golf",
"Singing",
"Running"],
"home":
{"town": "Duck town",
"street": "Lake Road",
"number": 17},
"species": "duck"
}
When I say “document”,
I mean “JSON”.
A “collection” is a set of
documents in a DB.
The DB can inspect the
values, allowing for
secondary indexes.
1
Documents and collections
{
"_key": "123456",
"_id": "chars/123456",
"name": "Duck",
"firstname": "Donald",
"dob": "1934-11-13",
"hobbies": ["Golf",
"Singing",
"Running"],
"home":
{"town": "Duck town",
"street": "Lake Road",
"number": 17},
"species": "duck"
}
When I say “document”,
I mean “JSON”.
A “collection” is a set of
documents in a DB.
The DB can inspect the
values, allowing for
secondary indexes.
Or one can just treat the
DB as a key/value store.
1
Documents and collections
{
"_key": "123456",
"_id": "chars/123456",
"name": "Duck",
"firstname": "Donald",
"dob": "1934-11-13",
"hobbies": ["Golf",
"Singing",
"Running"],
"home":
{"town": "Duck town",
"street": "Lake Road",
"number": 17},
"species": "duck"
}
When I say “document”,
I mean “JSON”.
A “collection” is a set of
documents in a DB.
The DB can inspect the
values, allowing for
secondary indexes.
Or one can just treat the
DB as a key/value store.
Sharding: the data of a
collection is distributed
between multiple servers.
1
Graphs
A
B
D
E
F
G
C
"likes"
"hates"
2
Graphs
A
B
D
E
F
G
C
"likes"
"hates"
A graph consists of vertices and
edges.
2
Graphs
A
B
D
E
F
G
C
"likes"
"hates"
A graph consists of vertices and
edges.
Graphs model relations, can be
directed or undirected.
2
Graphs
A
B
D
E
F
G
C
"likes"
"hates"
A graph consists of vertices and
edges.
Graphs model relations, can be
directed or undirected.
Vertices and edges are
documents.
2
Graphs
A
B
D
E
F
G
C
"likes"
"hates"
A graph consists of vertices and
edges.
Graphs model relations, can be
directed or undirected.
Vertices and edges are
documents.
Every edge has a _from and a _to
attribute.
2
Graphs
A
B
D
E
F
G
C
"likes"
"hates"
A graph consists of vertices and
edges.
Graphs model relations, can be
directed or undirected.
Vertices and edges are
documents.
Every edge has a _from and a _to
attribute.
The database offers queries and
transactions dealing with graphs.
2
Graphs
A
B
D
E
F
G
C
"likes"
"hates"
A graph consists of vertices and
edges.
Graphs model relations, can be
directed or undirected.
Vertices and edges are
documents.
Every edge has a _from and a _to
attribute.
The database offers queries and
transactions dealing with graphs.
For example, paths in the graph
are interesting.
2
Query 1
Fetch all documents in a collection
FOR p IN people
RETURN p
3
Query 1
Fetch all documents in a collection
FOR p IN people
RETURN p
[ { "name": "Schmidt", "firstname": "Helmut",
"hobbies": ["Smoking"]},
{ "name": "Neunhöffer", "firstname": "Max",
"hobbies": ["Piano", "Golf"]},
...
]
3
Query 1
Fetch all documents in a collection
FOR p IN people
RETURN p
[ { "name": "Schmidt", "firstname": "Helmut",
"hobbies": ["Smoking"]},
{ "name": "Neunhöffer", "firstname": "Max",
"hobbies": ["Piano", "Golf"]},
...
]
(Actually, a cursor is returned.)
3
Query 2
Use filtering, sorting and limit
FOR p IN people
FILTER p.age >= @minage
SORT p.name, p.firstname
LIMIT @nrlimit
RETURN { name: CONCAT(p.name, ", ",
p.firstname),
age : p.age }
4
Query 2
Use filtering, sorting and limit
FOR p IN people
FILTER p.age >= @minage
SORT p.name, p.firstname
LIMIT @nrlimit
RETURN { name: CONCAT(p.name, ", ",
p.firstname),
age : p.age }
[ { "name": "Neunhöffer, Max", "age": 44 },
{ "name": "Schmidt, Helmut", "age": 95 },
...
]
4
Query 3
Aggregation and functions
FOR p IN people
COLLECT a = p.age INTO L
FILTER a >= @minage
RETURN { "age": a, "number": LENGTH(L) }
5
Query 3
Aggregation and functions
FOR p IN people
COLLECT a = p.age INTO L
FILTER a >= @minage
RETURN { "age": a, "number": LENGTH(L) }
[ { "age": 18, "number": 10 },
{ "age": 19, "number": 17 },
{ "age": 20, "number": 12 },
...
]
5
Query 4
Joins
FOR p IN @@peoplecollection
FOR h IN houses
FILTER p._key == h.owner
SORT h.streetname, h.housename
RETURN { housename: h.housename,
streetname: h.streetname,
owner: p.name,
value: h.value }
6
Query 4
Joins
FOR p IN @@peoplecollection
FOR h IN houses
FILTER p._key == h.owner
SORT h.streetname, h.housename
RETURN { housename: h.housename,
streetname: h.streetname,
owner: p.name,
value: h.value }
[ { "housename": "Firlefanz",
"streetname": "Meyer street",
"owner": "Hans Schmidt", "value": 423000
},
...
]
6
Query 5
Modifying data
FOR e IN events
FILTER e.timestamp<"2014-09-01T09:53+0200"
INSERT e IN oldevents
FOR e IN events
FILTER e.timestamp<"2014-09-01T09:53+0200"
REMOVE e._key IN events
7
Query 6
Graph queries
FOR x IN GRAPH_SHORTEST_PATH(
"routeplanner", "germanCity/Cologne",
"frenchCity/Paris", {weight: "distance"} )
RETURN { begin : x.startVertex,
end : x.vertex,
distance : x.distance,
nrPaths : LENGTH(x.paths) }
8
Query 6
Graph queries
FOR x IN GRAPH_SHORTEST_PATH(
"routeplanner", "germanCity/Cologne",
"frenchCity/Paris", {weight: "distance"} )
RETURN { begin : x.startVertex,
end : x.vertex,
distance : x.distance,
nrPaths : LENGTH(x.paths) }
[ { "begin": "germanCity/Cologne",
"end" : {"_id": "frenchCity/Paris", ... },
"distance": 550,
"nrPaths": 10 },
...
]
8
Life of a query
1. Text and query parameters come from user
9
Life of a query
1. Text and query parameters come from user
2. Parse text, produce abstract syntax tree (AST)
9
Life of a query
1. Text and query parameters come from user
2. Parse text, produce abstract syntax tree (AST)
3. Substitute query parameters
9
Life of a query
1. Text and query parameters come from user
2. Parse text, produce abstract syntax tree (AST)
3. Substitute query parameters
4. First optimisation: constant expressions, etc.
9
Life of a query
1. Text and query parameters come from user
2. Parse text, produce abstract syntax tree (AST)
3. Substitute query parameters
4. First optimisation: constant expressions, etc.
5. Translate AST into an execution plan (EXP)
9
Life of a query
1. Text and query parameters come from user
2. Parse text, produce abstract syntax tree (AST)
3. Substitute query parameters
4. First optimisation: constant expressions, etc.
5. Translate AST into an execution plan (EXP)
6. Optimise one EXP, produce many, potentially better EXPs
9
Life of a query
1. Text and query parameters come from user
2. Parse text, produce abstract syntax tree (AST)
3. Substitute query parameters
4. First optimisation: constant expressions, etc.
5. Translate AST into an execution plan (EXP)
6. Optimise one EXP, produce many, potentially better EXPs
7. Reason about distribution in cluster
9
Life of a query
1. Text and query parameters come from user
2. Parse text, produce abstract syntax tree (AST)
3. Substitute query parameters
4. First optimisation: constant expressions, etc.
5. Translate AST into an execution plan (EXP)
6. Optimise one EXP, produce many, potentially better EXPs
7. Reason about distribution in cluster
8. Optimise distributed EXPs
9
Life of a query
1. Text and query parameters come from user
2. Parse text, produce abstract syntax tree (AST)
3. Substitute query parameters
4. First optimisation: constant expressions, etc.
5. Translate AST into an execution plan (EXP)
6. Optimise one EXP, produce many, potentially better EXPs
7. Reason about distribution in cluster
8. Optimise distributed EXPs
9. Estimate costs for all EXPs, and sort by ascending cost
9
Life of a query
1. Text and query parameters come from user
2. Parse text, produce abstract syntax tree (AST)
3. Substitute query parameters
4. First optimisation: constant expressions, etc.
5. Translate AST into an execution plan (EXP)
6. Optimise one EXP, produce many, potentially better EXPs
7. Reason about distribution in cluster
8. Optimise distributed EXPs
9. Estimate costs for all EXPs, and sort by ascending cost
10. Instanciate “cheapest” plan, i.e. set up execution engine
9
Life of a query
1. Text and query parameters come from user
2. Parse text, produce abstract syntax tree (AST)
3. Substitute query parameters
4. First optimisation: constant expressions, etc.
5. Translate AST into an execution plan (EXP)
6. Optimise one EXP, produce many, potentially better EXPs
7. Reason about distribution in cluster
8. Optimise distributed EXPs
9. Estimate costs for all EXPs, and sort by ascending cost
10. Instanciate “cheapest” plan, i.e. set up execution engine
11. Distribute and link up engines on different servers
9
Life of a query
1. Text and query parameters come from user
2. Parse text, produce abstract syntax tree (AST)
3. Substitute query parameters
4. First optimisation: constant expressions, etc.
5. Translate AST into an execution plan (EXP)
6. Optimise one EXP, produce many, potentially better EXPs
7. Reason about distribution in cluster
8. Optimise distributed EXPs
9. Estimate costs for all EXPs, and sort by ascending cost
10. Instanciate “cheapest” plan, i.e. set up execution engine
11. Distribute and link up engines on different servers
12. Execute plan, provide cursor API
9
Execution plans
FOR a IN collA
RETURN {x: a.x, z: b.z}
EnumerateCollection a
EnumerateCollection b
Calculation xx == b.y
Filter xx == b.y
Singleton
Calculation xx
Return {x: a.x, z: b.z}
Calc {x: a.x, z: b.z}
FILTER xx == b.y
FOR b IN collB
LET xx = a.x
Query → EXP
10
Execution plans
FOR a IN collA
RETURN {x: a.x, z: b.z}
EnumerateCollection a
EnumerateCollection b
Calculation xx == b.y
Filter xx == b.y
Singleton
Calculation xx
Return {x: a.x, z: b.z}
Calc {x: a.x, z: b.z}
FILTER xx == b.y
FOR b IN collB
LET xx = a.x
Query → EXP
Black arrows are
dependencies
10
Execution plans
FOR a IN collA
RETURN {x: a.x, z: b.z}
EnumerateCollection a
EnumerateCollection b
Calculation xx == b.y
Filter xx == b.y
Singleton
Calculation xx
Return {x: a.x, z: b.z}
Calc {x: a.x, z: b.z}
FILTER xx == b.y
FOR b IN collB
LET xx = a.x
Query → EXP
Black arrows are
dependencies
Think of a pipeline
10
Execution plans
FOR a IN collA
RETURN {x: a.x, z: b.z}
EnumerateCollection a
EnumerateCollection b
Calculation xx == b.y
Filter xx == b.y
Singleton
Calculation xx
Return {x: a.x, z: b.z}
Calc {x: a.x, z: b.z}
FILTER xx == b.y
FOR b IN collB
LET xx = a.x
Query → EXP
Black arrows are
dependencies
Think of a pipeline
Each node provides
a cursor API
10
Execution plans
FOR a IN collA
RETURN {x: a.x, z: b.z}
EnumerateCollection a
EnumerateCollection b
Calculation xx == b.y
Filter xx == b.y
Singleton
Calculation xx
Return {x: a.x, z: b.z}
Calc {x: a.x, z: b.z}
FILTER xx == b.y
FOR b IN collB
LET xx = a.x
Query → EXP
Black arrows are
dependencies
Think of a pipeline
Each node provides
a cursor API
Blocks of “Items”
travel through the
pipeline
10
Execution plans
FOR a IN collA
RETURN {x: a.x, z: b.z}
EnumerateCollection a
EnumerateCollection b
Calculation xx == b.y
Filter xx == b.y
Singleton
Calculation xx
Return {x: a.x, z: b.z}
Calc {x: a.x, z: b.z}
FILTER xx == b.y
FOR b IN collB
LET xx = a.x
Query → EXP
Black arrows are
dependencies
Think of a pipeline
Each node provides
a cursor API
Blocks of “Items”
travel through the
pipeline
What is an “item”???
10
Pipeline and items
FOR a IN collA EnumerateCollection a
EnumerateCollection b
Singleton
Calculation xx
FOR b IN collB
LET xx = a.x Items have vars a, xx
Items have no vars
Items are the thingies traveling through the pipeline.
11
Pipeline and items
FOR a IN collA EnumerateCollection a
EnumerateCollection b
Singleton
Calculation xx
FOR b IN collB
LET xx = a.x Items have vars a, xx
Items have no vars
Items are the thingies traveling through the pipeline.
An item holds values of those variables in the current frame
11
Pipeline and items
FOR a IN collA EnumerateCollection a
EnumerateCollection b
Singleton
Calculation xx
FOR b IN collB
LET xx = a.x Items have vars a, xx
Items have no vars
Items are the thingies traveling through the pipeline.
An item holds values of those variables in the current frame
Thus: Items look differently in different parts of the plan
11
Pipeline and items
FOR a IN collA EnumerateCollection a
EnumerateCollection b
Singleton
Calculation xx
FOR b IN collB
LET xx = a.x Items have vars a, xx
Items have no vars
Items are the thingies traveling through the pipeline.
An item holds values of those variables in the current frame
Thus: Items look differently in different parts of the plan
We always deal with blocks of items for performance reasons
11
Execution plans
FOR a IN collA
RETURN {x: a.x, z: b.z}
EnumerateCollection a
EnumerateCollection b
Calculation xx == b.y
Filter xx == b.y
Singleton
Calculation xx
Return {x: a.x, z: b.z}
Calc {x: a.x, z: b.z}
FILTER xx == b.y
FOR b IN collB
LET xx = a.x
12
Move filters up
FOR a IN collA
FOR b IN collB
FILTER a.x == 10
FILTER a.u == b.v
RETURN {u:a.u,w:b.w}
Singleton
EnumColl a
EnumColl b
Calc a.x == 10
Return {u:a.u,w:b.w}
Filter a.u == b.v
Calc a.u == b.v
Filter a.x == 10
13
Move filters up
FOR a IN collA
FOR b IN collB
FILTER a.x == 10
FILTER a.u == b.v
RETURN {u:a.u,w:b.w}
The result and behaviour does not
change, if the first FILTER is pulled
out of the inner FOR.
Singleton
EnumColl a
EnumColl b
Calc a.x == 10
Return {u:a.u,w:b.w}
Filter a.u == b.v
Calc a.u == b.v
Filter a.x == 10
13
Move filters up
FOR a IN collA
FILTER a.x < 10
FOR b IN collB
FILTER a.u == b.v
RETURN {u:a.u,w:b.w}
The result and behaviour does not
change, if the first FILTER is pulled
out of the inner FOR.
However, the number of items trave-
ling in the pipeline is decreased.
Singleton
EnumColl a
Return {u:a.u,w:b.w}
Filter a.u == b.v
Calc a.u == b.v
Calc a.x == 10
EnumColl b
Filter a.x == 10
13
Move filters up
FOR a IN collA
FILTER a.x < 10
FOR b IN collB
FILTER a.u == b.v
RETURN {u:a.u,w:b.w}
The result and behaviour does not
change, if the first FILTER is pulled
out of the inner FOR.
However, the number of items trave-
ling in the pipeline is decreased.
Note that the two FOR statements
could be interchanged!
Singleton
EnumColl a
Return {u:a.u,w:b.w}
Filter a.u == b.v
Calc a.u == b.v
Calc a.x == 10
EnumColl b
Filter a.x == 10
13
Remove unnecessary calculations
FOR a IN collA
LET L = LENGTH(a.hobbies)
FOR b IN collB
FILTER a.u == b.v
RETURN {h:a.hobbies,w:b.w}
Singleton
EnumColl a
Calc L = ...
EnumColl b
Calc a.u == b.v
Filter a.u == b.v
Return {...}
14
Remove unnecessary calculations
FOR a IN collA
LET L = LENGTH(a.hobbies)
FOR b IN collB
FILTER a.u == b.v
RETURN {h:a.hobbies,w:b.w}
The Calculation of L is unnecessary!
Singleton
EnumColl a
Calc L = ...
EnumColl b
Calc a.u == b.v
Filter a.u == b.v
Return {...}
14
Remove unnecessary calculations
FOR a IN collA
FOR b IN collB
FILTER a.u == b.v
RETURN {h:a.hobbies,w:b.w}
The Calculation of L is unnecessary!
(since it cannot throw an exception).
Singleton
EnumColl a
EnumColl b
Calc a.u == b.v
Filter a.u == b.v
Return {...}
14
Remove unnecessary calculations
FOR a IN collA
FOR b IN collB
FILTER a.u == b.v
RETURN {h:a.hobbies,w:b.w}
The Calculation of L is unnecessary!
(since it cannot throw an exception).
Therefore we can just leave it out.
Singleton
EnumColl a
EnumColl b
Calc a.u == b.v
Filter a.u == b.v
Return {...}
14
Use index for FILTER and SORT
FOR a IN collA
FILTER a.x > 17 &&
a.x <= 23 &&
a.y == 10
SORT a.y, a.x
RETURN a
Singleton
EnumColl a
Filter ...
Calc ...
Sort a.y, a.x
Return a
15
Use index for FILTER and SORT
FOR a IN collA
FILTER a.x > 17 &&
a.x <= 23 &&
a.y == 10
SORT a.y, a.x
RETURN a
Assume collA has a skiplist index on “y”
and “x” (in this order),
Singleton
EnumColl a
Filter ...
Calc ...
Sort a.y, a.x
Return a
15
Use index for FILTER and SORT
FOR a IN collA
FILTER a.x > 17 &&
a.x <= 23 &&
a.y == 10
SORT a.y, a.x
RETURN a
Assume collA has a skiplist index on “y”
and “x” (in this order), then we can read
off the half-open interval between
{ y: 10, x: 17 } and
{ y: 10, x: 23 }
from the skiplist index.
Singleton
Sort a.y, a.x
Return a
IndexRange a
15
Use index for FILTER and SORT
FOR a IN collA
FILTER a.x > 17 &&
a.x <= 23 &&
a.y == 10
SORT a.y, a.x
RETURN a
Assume collA has a skiplist index on “y”
and “x” (in this order), then we can read
off the half-open interval between
{ y: 10, x: 17 } and
{ y: 10, x: 23 }
from the skiplist index.
The result will automatically be sorted by
y and then by x.
Singleton
Return a
IndexRange a
15
Data distribution in a cluster
Requests
DBserver DBserver DBserver
CoordinatorCoordinator
4 2 5 3 11
The shards of a collection are distributed across the DB
servers.
16
Data distribution in a cluster
Requests
DBserver DBserver DBserver
CoordinatorCoordinator
4 2 5 3 11
The shards of a collection are distributed across the DB
servers.
The coordinators receive queries and organise their
execution
16
Scatter/gather
EnumerateCollection
17
Scatter/gather
Remote
EnumShard
Remote Remote
EnumShard
Remote
Concat/Merge
Remote
EnumShard
Remote
Scatter
17
Scatter/gather
Remote
EnumShard
Remote Remote
EnumShard
Remote
Concat/Merge
Remote
EnumShard
Remote
Scatter
17
Modifying queries
Fortunately:
There can be at most one modifying node in each query.
There can be no modifying nodes in subqueries.
18
Modifying queries
Fortunately:
There can be at most one modifying node in each query.
There can be no modifying nodes in subqueries.
Modifying nodes
The modifying node in a query
is executed on the DBservers,
18
Modifying queries
Fortunately:
There can be at most one modifying node in each query.
There can be no modifying nodes in subqueries.
Modifying nodes
The modifying node in a query
is executed on the DBservers,
to this end, we either scatter the items to all DBservers,
or, if possible, we distribute each item to the shard
that is responsible for the modification.
18
Modifying queries
Fortunately:
There can be at most one modifying node in each query.
There can be no modifying nodes in subqueries.
Modifying nodes
The modifying node in a query
is executed on the DBservers,
to this end, we either scatter the items to all DBservers,
or, if possible, we distribute each item to the shard
that is responsible for the modification.
Sometimes, we can even optimise away a gather/scatter
combination and parallelise completely.
18

More Related Content

What's hot

Poster GraphQL-LD: Linked Data Querying with GraphQL
Poster GraphQL-LD: Linked Data Querying with GraphQLPoster GraphQL-LD: Linked Data Querying with GraphQL
Poster GraphQL-LD: Linked Data Querying with GraphQL
Ruben Taelman
 
Building a modern Application with DataFrames
Building a modern Application with DataFramesBuilding a modern Application with DataFrames
Building a modern Application with DataFrames
Spark Summit
 
Deep Dive : Spark Data Frames, SQL and Catalyst Optimizer
Deep Dive : Spark Data Frames, SQL and Catalyst OptimizerDeep Dive : Spark Data Frames, SQL and Catalyst Optimizer
Deep Dive : Spark Data Frames, SQL and Catalyst Optimizer
Sachin Aggarwal
 
Spark Summit EU 2015: Spark DataFrames: Simple and Fast Analysis of Structure...
Spark Summit EU 2015: Spark DataFrames: Simple and Fast Analysis of Structure...Spark Summit EU 2015: Spark DataFrames: Simple and Fast Analysis of Structure...
Spark Summit EU 2015: Spark DataFrames: Simple and Fast Analysis of Structure...
Databricks
 
Vertica And Spark: Connecting Computation And Data
Vertica And Spark: Connecting Computation And DataVertica And Spark: Connecting Computation And Data
Vertica And Spark: Connecting Computation And Data
Spark Summit
 
Maximilian Michels – Google Cloud Dataflow on Top of Apache Flink
Maximilian Michels – Google Cloud Dataflow on Top of Apache FlinkMaximilian Michels – Google Cloud Dataflow on Top of Apache Flink
Maximilian Michels – Google Cloud Dataflow on Top of Apache Flink
Flink Forward
 
A look ahead at spark 2.0
A look ahead at spark 2.0 A look ahead at spark 2.0
A look ahead at spark 2.0
Databricks
 
Seattle Scalability Mahout
Seattle Scalability MahoutSeattle Scalability Mahout
Seattle Scalability Mahout
Jake Mannix
 
Designing and Building a Graph Database Application - Ian Robinson (Neo Techn...
Designing and Building a Graph Database Application - Ian Robinson (Neo Techn...Designing and Building a Graph Database Application - Ian Robinson (Neo Techn...
Designing and Building a Graph Database Application - Ian Robinson (Neo Techn...
jaxLondonConference
 
Robust and Scalable ETL over Cloud Storage with Apache Spark
Robust and Scalable ETL over Cloud Storage with Apache SparkRobust and Scalable ETL over Cloud Storage with Apache Spark
Robust and Scalable ETL over Cloud Storage with Apache Spark
Databricks
 
Presto At Treasure Data
Presto At Treasure DataPresto At Treasure Data
Presto At Treasure Data
Taro L. Saito
 
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink Forward 2015
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink Forward 2015Gradoop: Scalable Graph Analytics with Apache Flink @ Flink Forward 2015
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink Forward 2015
Martin Junghanns
 
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxData
 
What's new in pandas and the SciPy stack for financial users
What's new in pandas and the SciPy stack for financial usersWhat's new in pandas and the SciPy stack for financial users
What's new in pandas and the SciPy stack for financial usersWes McKinney
 
Poster Declaratively Describing Responses of Hypermedia-Driven Web APIs
Poster Declaratively Describing Responses of Hypermedia-Driven Web APIsPoster Declaratively Describing Responses of Hypermedia-Driven Web APIs
Poster Declaratively Describing Responses of Hypermedia-Driven Web APIs
Ruben Taelman
 
What's new with Apache Spark's Structured Streaming?
What's new with Apache Spark's Structured Streaming?What's new with Apache Spark's Structured Streaming?
What's new with Apache Spark's Structured Streaming?
Miklos Christine
 
Open Source Big Data Ingestion - Without the Heartburn!
Open Source Big Data Ingestion - Without the Heartburn!Open Source Big Data Ingestion - Without the Heartburn!
Open Source Big Data Ingestion - Without the Heartburn!
Pat Patterson
 
Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...
Databricks
 
Patterns and Operational Insights from the First Users of Delta Lake
Patterns and Operational Insights from the First Users of Delta LakePatterns and Operational Insights from the First Users of Delta Lake
Patterns and Operational Insights from the First Users of Delta Lake
Databricks
 
Arbitrary Stateful Aggregations using Structured Streaming in Apache Spark
Arbitrary Stateful Aggregations using Structured Streaming in Apache SparkArbitrary Stateful Aggregations using Structured Streaming in Apache Spark
Arbitrary Stateful Aggregations using Structured Streaming in Apache Spark
Databricks
 

What's hot (20)

Poster GraphQL-LD: Linked Data Querying with GraphQL
Poster GraphQL-LD: Linked Data Querying with GraphQLPoster GraphQL-LD: Linked Data Querying with GraphQL
Poster GraphQL-LD: Linked Data Querying with GraphQL
 
Building a modern Application with DataFrames
Building a modern Application with DataFramesBuilding a modern Application with DataFrames
Building a modern Application with DataFrames
 
Deep Dive : Spark Data Frames, SQL and Catalyst Optimizer
Deep Dive : Spark Data Frames, SQL and Catalyst OptimizerDeep Dive : Spark Data Frames, SQL and Catalyst Optimizer
Deep Dive : Spark Data Frames, SQL and Catalyst Optimizer
 
Spark Summit EU 2015: Spark DataFrames: Simple and Fast Analysis of Structure...
Spark Summit EU 2015: Spark DataFrames: Simple and Fast Analysis of Structure...Spark Summit EU 2015: Spark DataFrames: Simple and Fast Analysis of Structure...
Spark Summit EU 2015: Spark DataFrames: Simple and Fast Analysis of Structure...
 
Vertica And Spark: Connecting Computation And Data
Vertica And Spark: Connecting Computation And DataVertica And Spark: Connecting Computation And Data
Vertica And Spark: Connecting Computation And Data
 
Maximilian Michels – Google Cloud Dataflow on Top of Apache Flink
Maximilian Michels – Google Cloud Dataflow on Top of Apache FlinkMaximilian Michels – Google Cloud Dataflow on Top of Apache Flink
Maximilian Michels – Google Cloud Dataflow on Top of Apache Flink
 
A look ahead at spark 2.0
A look ahead at spark 2.0 A look ahead at spark 2.0
A look ahead at spark 2.0
 
Seattle Scalability Mahout
Seattle Scalability MahoutSeattle Scalability Mahout
Seattle Scalability Mahout
 
Designing and Building a Graph Database Application - Ian Robinson (Neo Techn...
Designing and Building a Graph Database Application - Ian Robinson (Neo Techn...Designing and Building a Graph Database Application - Ian Robinson (Neo Techn...
Designing and Building a Graph Database Application - Ian Robinson (Neo Techn...
 
Robust and Scalable ETL over Cloud Storage with Apache Spark
Robust and Scalable ETL over Cloud Storage with Apache SparkRobust and Scalable ETL over Cloud Storage with Apache Spark
Robust and Scalable ETL over Cloud Storage with Apache Spark
 
Presto At Treasure Data
Presto At Treasure DataPresto At Treasure Data
Presto At Treasure Data
 
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink Forward 2015
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink Forward 2015Gradoop: Scalable Graph Analytics with Apache Flink @ Flink Forward 2015
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink Forward 2015
 
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
 
What's new in pandas and the SciPy stack for financial users
What's new in pandas and the SciPy stack for financial usersWhat's new in pandas and the SciPy stack for financial users
What's new in pandas and the SciPy stack for financial users
 
Poster Declaratively Describing Responses of Hypermedia-Driven Web APIs
Poster Declaratively Describing Responses of Hypermedia-Driven Web APIsPoster Declaratively Describing Responses of Hypermedia-Driven Web APIs
Poster Declaratively Describing Responses of Hypermedia-Driven Web APIs
 
What's new with Apache Spark's Structured Streaming?
What's new with Apache Spark's Structured Streaming?What's new with Apache Spark's Structured Streaming?
What's new with Apache Spark's Structured Streaming?
 
Open Source Big Data Ingestion - Without the Heartburn!
Open Source Big Data Ingestion - Without the Heartburn!Open Source Big Data Ingestion - Without the Heartburn!
Open Source Big Data Ingestion - Without the Heartburn!
 
Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...
 
Patterns and Operational Insights from the First Users of Delta Lake
Patterns and Operational Insights from the First Users of Delta LakePatterns and Operational Insights from the First Users of Delta Lake
Patterns and Operational Insights from the First Users of Delta Lake
 
Arbitrary Stateful Aggregations using Structured Streaming in Apache Spark
Arbitrary Stateful Aggregations using Structured Streaming in Apache SparkArbitrary Stateful Aggregations using Structured Streaming in Apache Spark
Arbitrary Stateful Aggregations using Structured Streaming in Apache Spark
 

Viewers also liked

Hotcode 2013: Javascript in a database (Part 2)
Hotcode 2013: Javascript in a database (Part 2)Hotcode 2013: Javascript in a database (Part 2)
Hotcode 2013: Javascript in a database (Part 2)
ArangoDB Database
 
Backbone using Extensible Database APIs over HTTP
Backbone using Extensible Database APIs over HTTPBackbone using Extensible Database APIs over HTTP
Backbone using Extensible Database APIs over HTTP
Max Neunhöffer
 
GraphDatabases and what we can use them for
GraphDatabases and what we can use them forGraphDatabases and what we can use them for
GraphDatabases and what we can use them for
Michael Hackstein
 
Hotcode 2013: Javascript in a database (Part 1)
Hotcode 2013: Javascript in a database (Part 1)Hotcode 2013: Javascript in a database (Part 1)
Hotcode 2013: Javascript in a database (Part 1)
ArangoDB Database
 
Running MRuby in a Database - ArangoDB - RuPy 2012
Running MRuby in a Database - ArangoDB - RuPy 2012 Running MRuby in a Database - ArangoDB - RuPy 2012
Running MRuby in a Database - ArangoDB - RuPy 2012
ArangoDB Database
 
Multi-model databases and node.js
Multi-model databases and node.jsMulti-model databases and node.js
Multi-model databases and node.js
Max Neunhöffer
 
Domain Driven Design & NoSQL
Domain Driven Design & NoSQLDomain Driven Design & NoSQL
Domain Driven Design & NoSQL
ArangoDB Database
 
Extensible Database APIs and their role in Software Architecture
Extensible Database APIs and their role in Software ArchitectureExtensible Database APIs and their role in Software Architecture
Extensible Database APIs and their role in Software Architecture
Max Neunhöffer
 
ArangoDB – Persistência Poliglota e Banco de Dados Multi-Modelos
ArangoDB – Persistência Poliglota e Banco de Dados Multi-ModelosArangoDB – Persistência Poliglota e Banco de Dados Multi-Modelos
ArangoDB – Persistência Poliglota e Banco de Dados Multi-Modelos
Helder Santana
 
Domain Driven Design & NoSQL
Domain Driven Design & NoSQLDomain Driven Design & NoSQL
Domain Driven Design & NoSQL
ArangoDB Database
 
Multi model-databases
Multi model-databasesMulti model-databases
Multi model-databases
ArangoDB Database
 
Jan Steemann: Modelling data in a schema free world (Talk held at Froscon, 2...
Jan Steemann: Modelling data in a schema free world  (Talk held at Froscon, 2...Jan Steemann: Modelling data in a schema free world  (Talk held at Froscon, 2...
Jan Steemann: Modelling data in a schema free world (Talk held at Froscon, 2...
ArangoDB Database
 
Is multi-model the future of NoSQL?
Is multi-model the future of NoSQL?Is multi-model the future of NoSQL?
Is multi-model the future of NoSQL?
Max Neunhöffer
 
Processing large-scale graphs with Google(TM) Pregel by MICHAEL HACKSTEIN at...
 Processing large-scale graphs with Google(TM) Pregel by MICHAEL HACKSTEIN at... Processing large-scale graphs with Google(TM) Pregel by MICHAEL HACKSTEIN at...
Processing large-scale graphs with Google(TM) Pregel by MICHAEL HACKSTEIN at...
Big Data Spain
 
ArangoDB - Using JavaScript in the database
ArangoDB - Using JavaScript in the databaseArangoDB - Using JavaScript in the database
ArangoDB - Using JavaScript in the database
ArangoDB Database
 
Overhauling a database engine in 2 months
Overhauling a database engine in 2 monthsOverhauling a database engine in 2 months
Overhauling a database engine in 2 months
Max Neunhöffer
 
guacamole: an Object Document Mapper for ArangoDB
guacamole: an Object Document Mapper for ArangoDBguacamole: an Object Document Mapper for ArangoDB
guacamole: an Object Document Mapper for ArangoDB
Max Neunhöffer
 
Rupy2012 ArangoDB Workshop Part1
Rupy2012 ArangoDB Workshop Part1Rupy2012 ArangoDB Workshop Part1
Rupy2012 ArangoDB Workshop Part1ArangoDB Database
 
Domain driven design @FrOSCon
Domain driven design @FrOSConDomain driven design @FrOSCon
Domain driven design @FrOSCon
ArangoDB Database
 
Processing large-scale graphs with Google Pregel
Processing large-scale graphs with Google PregelProcessing large-scale graphs with Google Pregel
Processing large-scale graphs with Google Pregel
Max Neunhöffer
 

Viewers also liked (20)

Hotcode 2013: Javascript in a database (Part 2)
Hotcode 2013: Javascript in a database (Part 2)Hotcode 2013: Javascript in a database (Part 2)
Hotcode 2013: Javascript in a database (Part 2)
 
Backbone using Extensible Database APIs over HTTP
Backbone using Extensible Database APIs over HTTPBackbone using Extensible Database APIs over HTTP
Backbone using Extensible Database APIs over HTTP
 
GraphDatabases and what we can use them for
GraphDatabases and what we can use them forGraphDatabases and what we can use them for
GraphDatabases and what we can use them for
 
Hotcode 2013: Javascript in a database (Part 1)
Hotcode 2013: Javascript in a database (Part 1)Hotcode 2013: Javascript in a database (Part 1)
Hotcode 2013: Javascript in a database (Part 1)
 
Running MRuby in a Database - ArangoDB - RuPy 2012
Running MRuby in a Database - ArangoDB - RuPy 2012 Running MRuby in a Database - ArangoDB - RuPy 2012
Running MRuby in a Database - ArangoDB - RuPy 2012
 
Multi-model databases and node.js
Multi-model databases and node.jsMulti-model databases and node.js
Multi-model databases and node.js
 
Domain Driven Design & NoSQL
Domain Driven Design & NoSQLDomain Driven Design & NoSQL
Domain Driven Design & NoSQL
 
Extensible Database APIs and their role in Software Architecture
Extensible Database APIs and their role in Software ArchitectureExtensible Database APIs and their role in Software Architecture
Extensible Database APIs and their role in Software Architecture
 
ArangoDB – Persistência Poliglota e Banco de Dados Multi-Modelos
ArangoDB – Persistência Poliglota e Banco de Dados Multi-ModelosArangoDB – Persistência Poliglota e Banco de Dados Multi-Modelos
ArangoDB – Persistência Poliglota e Banco de Dados Multi-Modelos
 
Domain Driven Design & NoSQL
Domain Driven Design & NoSQLDomain Driven Design & NoSQL
Domain Driven Design & NoSQL
 
Multi model-databases
Multi model-databasesMulti model-databases
Multi model-databases
 
Jan Steemann: Modelling data in a schema free world (Talk held at Froscon, 2...
Jan Steemann: Modelling data in a schema free world  (Talk held at Froscon, 2...Jan Steemann: Modelling data in a schema free world  (Talk held at Froscon, 2...
Jan Steemann: Modelling data in a schema free world (Talk held at Froscon, 2...
 
Is multi-model the future of NoSQL?
Is multi-model the future of NoSQL?Is multi-model the future of NoSQL?
Is multi-model the future of NoSQL?
 
Processing large-scale graphs with Google(TM) Pregel by MICHAEL HACKSTEIN at...
 Processing large-scale graphs with Google(TM) Pregel by MICHAEL HACKSTEIN at... Processing large-scale graphs with Google(TM) Pregel by MICHAEL HACKSTEIN at...
Processing large-scale graphs with Google(TM) Pregel by MICHAEL HACKSTEIN at...
 
ArangoDB - Using JavaScript in the database
ArangoDB - Using JavaScript in the databaseArangoDB - Using JavaScript in the database
ArangoDB - Using JavaScript in the database
 
Overhauling a database engine in 2 months
Overhauling a database engine in 2 monthsOverhauling a database engine in 2 months
Overhauling a database engine in 2 months
 
guacamole: an Object Document Mapper for ArangoDB
guacamole: an Object Document Mapper for ArangoDBguacamole: an Object Document Mapper for ArangoDB
guacamole: an Object Document Mapper for ArangoDB
 
Rupy2012 ArangoDB Workshop Part1
Rupy2012 ArangoDB Workshop Part1Rupy2012 ArangoDB Workshop Part1
Rupy2012 ArangoDB Workshop Part1
 
Domain driven design @FrOSCon
Domain driven design @FrOSConDomain driven design @FrOSCon
Domain driven design @FrOSCon
 
Processing large-scale graphs with Google Pregel
Processing large-scale graphs with Google PregelProcessing large-scale graphs with Google Pregel
Processing large-scale graphs with Google Pregel
 

Similar to Complex queries in a distributed multi-model database

Max Neunhöffer – Joins and aggregations in a distributed NoSQL DB - NoSQL mat...
Max Neunhöffer – Joins and aggregations in a distributed NoSQL DB - NoSQL mat...Max Neunhöffer – Joins and aggregations in a distributed NoSQL DB - NoSQL mat...
Max Neunhöffer – Joins and aggregations in a distributed NoSQL DB - NoSQL mat...
NoSQLmatters
 
06. ElasticSearch : Mapping and Analysis
06. ElasticSearch : Mapping and Analysis06. ElasticSearch : Mapping and Analysis
06. ElasticSearch : Mapping and Analysis
OpenThink Labs
 
Mapping Graph Queries to PostgreSQL
Mapping Graph Queries to PostgreSQLMapping Graph Queries to PostgreSQL
Mapping Graph Queries to PostgreSQL
Gábor Szárnyas
 
Search Engine-Building with Lucene and Solr
Search Engine-Building with Lucene and SolrSearch Engine-Building with Lucene and Solr
Search Engine-Building with Lucene and Solr
Kai Chan
 
Querying and Merging Heterogeneous Data by Approximate Joins on Higher-Order ...
Querying and Merging Heterogeneous Data by Approximate Joins on Higher-Order ...Querying and Merging Heterogeneous Data by Approximate Joins on Higher-Order ...
Querying and Merging Heterogeneous Data by Approximate Joins on Higher-Order ...
Simon Price
 
Powerful Analysis with the Aggregation Pipeline
Powerful Analysis with the Aggregation PipelinePowerful Analysis with the Aggregation Pipeline
Powerful Analysis with the Aggregation Pipeline
MongoDB
 
Harnessing The Power of Search - Liferay DEVCON 2015, Darmstadt, Germany
Harnessing The Power of Search - Liferay DEVCON 2015, Darmstadt, GermanyHarnessing The Power of Search - Liferay DEVCON 2015, Darmstadt, Germany
Harnessing The Power of Search - Liferay DEVCON 2015, Darmstadt, Germany
André Ricardo Barreto de Oliveira
 
Dynamic languages, for software craftmanship group
Dynamic languages, for software craftmanship groupDynamic languages, for software craftmanship group
Dynamic languages, for software craftmanship group
Reuven Lerner
 
Analyze one year of radio station songs aired with Spark SQL, Spotify, and Da...
Analyze one year of radio station songs aired with Spark SQL, Spotify, and Da...Analyze one year of radio station songs aired with Spark SQL, Spotify, and Da...
Analyze one year of radio station songs aired with Spark SQL, Spotify, and Da...
Paul Leclercq
 
"Powerful Analysis with the Aggregation Pipeline (Tutorial)"
"Powerful Analysis with the Aggregation Pipeline (Tutorial)""Powerful Analysis with the Aggregation Pipeline (Tutorial)"
"Powerful Analysis with the Aggregation Pipeline (Tutorial)"
MongoDB
 
What/How to do with GraphQL? - Valentyn Ostakh (ENG) | Ruby Meditation 27
What/How to do with GraphQL? - Valentyn Ostakh (ENG) | Ruby Meditation 27What/How to do with GraphQL? - Valentyn Ostakh (ENG) | Ruby Meditation 27
What/How to do with GraphQL? - Valentyn Ostakh (ENG) | Ruby Meditation 27
Ruby Meditation
 
Ontop: Answering SPARQL Queries over Relational Databases
Ontop: Answering SPARQL Queries over Relational DatabasesOntop: Answering SPARQL Queries over Relational Databases
Ontop: Answering SPARQL Queries over Relational Databases
Guohui Xiao
 
Easy R
Easy REasy R
Easy R
Ajay Ohri
 
Search Intelligence & MarkLogic Search API
Search Intelligence & MarkLogic Search APISearch Intelligence & MarkLogic Search API
Search Intelligence & MarkLogic Search API
WillThompson78
 
Introduction to source{d} Engine and source{d} Lookout
Introduction to source{d} Engine and source{d} Lookout Introduction to source{d} Engine and source{d} Lookout
Introduction to source{d} Engine and source{d} Lookout
source{d}
 
Advanced MongoDB Aggregation Pipelines
Advanced MongoDB Aggregation PipelinesAdvanced MongoDB Aggregation Pipelines
Advanced MongoDB Aggregation Pipelines
Tom Schreiber
 
MongoDB Europe 2016 - Advanced MongoDB Aggregation Pipelines
MongoDB Europe 2016 - Advanced MongoDB Aggregation PipelinesMongoDB Europe 2016 - Advanced MongoDB Aggregation Pipelines
MongoDB Europe 2016 - Advanced MongoDB Aggregation Pipelines
MongoDB
 
3. python intro
3. python   intro3. python   intro
3. python intro
in4400
 
Happy Go Programming
Happy Go ProgrammingHappy Go Programming
Happy Go Programming
Lin Yo-An
 

Similar to Complex queries in a distributed multi-model database (20)

Max Neunhöffer – Joins and aggregations in a distributed NoSQL DB - NoSQL mat...
Max Neunhöffer – Joins and aggregations in a distributed NoSQL DB - NoSQL mat...Max Neunhöffer – Joins and aggregations in a distributed NoSQL DB - NoSQL mat...
Max Neunhöffer – Joins and aggregations in a distributed NoSQL DB - NoSQL mat...
 
06. ElasticSearch : Mapping and Analysis
06. ElasticSearch : Mapping and Analysis06. ElasticSearch : Mapping and Analysis
06. ElasticSearch : Mapping and Analysis
 
Mapping Graph Queries to PostgreSQL
Mapping Graph Queries to PostgreSQLMapping Graph Queries to PostgreSQL
Mapping Graph Queries to PostgreSQL
 
Search Engine-Building with Lucene and Solr
Search Engine-Building with Lucene and SolrSearch Engine-Building with Lucene and Solr
Search Engine-Building with Lucene and Solr
 
Querying and Merging Heterogeneous Data by Approximate Joins on Higher-Order ...
Querying and Merging Heterogeneous Data by Approximate Joins on Higher-Order ...Querying and Merging Heterogeneous Data by Approximate Joins on Higher-Order ...
Querying and Merging Heterogeneous Data by Approximate Joins on Higher-Order ...
 
Powerful Analysis with the Aggregation Pipeline
Powerful Analysis with the Aggregation PipelinePowerful Analysis with the Aggregation Pipeline
Powerful Analysis with the Aggregation Pipeline
 
Harnessing The Power of Search - Liferay DEVCON 2015, Darmstadt, Germany
Harnessing The Power of Search - Liferay DEVCON 2015, Darmstadt, GermanyHarnessing The Power of Search - Liferay DEVCON 2015, Darmstadt, Germany
Harnessing The Power of Search - Liferay DEVCON 2015, Darmstadt, Germany
 
Dynamic languages, for software craftmanship group
Dynamic languages, for software craftmanship groupDynamic languages, for software craftmanship group
Dynamic languages, for software craftmanship group
 
Pgbr 2013 fts
Pgbr 2013 ftsPgbr 2013 fts
Pgbr 2013 fts
 
Analyze one year of radio station songs aired with Spark SQL, Spotify, and Da...
Analyze one year of radio station songs aired with Spark SQL, Spotify, and Da...Analyze one year of radio station songs aired with Spark SQL, Spotify, and Da...
Analyze one year of radio station songs aired with Spark SQL, Spotify, and Da...
 
"Powerful Analysis with the Aggregation Pipeline (Tutorial)"
"Powerful Analysis with the Aggregation Pipeline (Tutorial)""Powerful Analysis with the Aggregation Pipeline (Tutorial)"
"Powerful Analysis with the Aggregation Pipeline (Tutorial)"
 
What/How to do with GraphQL? - Valentyn Ostakh (ENG) | Ruby Meditation 27
What/How to do with GraphQL? - Valentyn Ostakh (ENG) | Ruby Meditation 27What/How to do with GraphQL? - Valentyn Ostakh (ENG) | Ruby Meditation 27
What/How to do with GraphQL? - Valentyn Ostakh (ENG) | Ruby Meditation 27
 
Ontop: Answering SPARQL Queries over Relational Databases
Ontop: Answering SPARQL Queries over Relational DatabasesOntop: Answering SPARQL Queries over Relational Databases
Ontop: Answering SPARQL Queries over Relational Databases
 
Easy R
Easy REasy R
Easy R
 
Search Intelligence & MarkLogic Search API
Search Intelligence & MarkLogic Search APISearch Intelligence & MarkLogic Search API
Search Intelligence & MarkLogic Search API
 
Introduction to source{d} Engine and source{d} Lookout
Introduction to source{d} Engine and source{d} Lookout Introduction to source{d} Engine and source{d} Lookout
Introduction to source{d} Engine and source{d} Lookout
 
Advanced MongoDB Aggregation Pipelines
Advanced MongoDB Aggregation PipelinesAdvanced MongoDB Aggregation Pipelines
Advanced MongoDB Aggregation Pipelines
 
MongoDB Europe 2016 - Advanced MongoDB Aggregation Pipelines
MongoDB Europe 2016 - Advanced MongoDB Aggregation PipelinesMongoDB Europe 2016 - Advanced MongoDB Aggregation Pipelines
MongoDB Europe 2016 - Advanced MongoDB Aggregation Pipelines
 
3. python intro
3. python   intro3. python   intro
3. python intro
 
Happy Go Programming
Happy Go ProgrammingHappy Go Programming
Happy Go Programming
 

More from Max Neunhöffer

Deep Dive on ArangoDB
Deep Dive on ArangoDBDeep Dive on ArangoDB
Deep Dive on ArangoDB
Max Neunhöffer
 
Scaling ArangoDB on Mesosphere DCOS
Scaling ArangoDB on Mesosphere DCOSScaling ArangoDB on Mesosphere DCOS
Scaling ArangoDB on Mesosphere DCOS
Max Neunhöffer
 
Scaling ArangoDB on Mesosphere DCOS
Scaling ArangoDB on Mesosphere DCOSScaling ArangoDB on Mesosphere DCOS
Scaling ArangoDB on Mesosphere DCOS
Max Neunhöffer
 
Experience with C++11 in ArangoDB
Experience with C++11 in ArangoDBExperience with C++11 in ArangoDB
Experience with C++11 in ArangoDB
Max Neunhöffer
 
Oslo baksia2014
Oslo baksia2014Oslo baksia2014
Oslo baksia2014
Max Neunhöffer
 

More from Max Neunhöffer (6)

Deep Dive on ArangoDB
Deep Dive on ArangoDBDeep Dive on ArangoDB
Deep Dive on ArangoDB
 
Scaling ArangoDB on Mesosphere DCOS
Scaling ArangoDB on Mesosphere DCOSScaling ArangoDB on Mesosphere DCOS
Scaling ArangoDB on Mesosphere DCOS
 
Scaling ArangoDB on Mesosphere DCOS
Scaling ArangoDB on Mesosphere DCOSScaling ArangoDB on Mesosphere DCOS
Scaling ArangoDB on Mesosphere DCOS
 
Experience with C++11 in ArangoDB
Experience with C++11 in ArangoDBExperience with C++11 in ArangoDB
Experience with C++11 in ArangoDB
 
Oslo bekk2014
Oslo bekk2014Oslo bekk2014
Oslo bekk2014
 
Oslo baksia2014
Oslo baksia2014Oslo baksia2014
Oslo baksia2014
 

Recently uploaded

Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
James Anderson
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
ThomasParaiso2
 

Recently uploaded (20)

Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
 

Complex queries in a distributed multi-model database

  • 1. Complex queries in a distributed multi-model database Max Neunhöffer Tech Talk Geekdom, 16 March 2015 www.arangodb.com
  • 2. Documents and collections { "_key": "123456", "_id": "chars/123456", "name": "Duck", "firstname": "Donald", "dob": "1934-11-13", "hobbies": ["Golf", "Singing", "Running"], "home": {"town": "Duck town", "street": "Lake Road", "number": 17}, "species": "duck" } 1
  • 3. Documents and collections { "_key": "123456", "_id": "chars/123456", "name": "Duck", "firstname": "Donald", "dob": "1934-11-13", "hobbies": ["Golf", "Singing", "Running"], "home": {"town": "Duck town", "street": "Lake Road", "number": 17}, "species": "duck" } When I say “document”, I mean “JSON”. 1
  • 4. Documents and collections { "_key": "123456", "_id": "chars/123456", "name": "Duck", "firstname": "Donald", "dob": "1934-11-13", "hobbies": ["Golf", "Singing", "Running"], "home": {"town": "Duck town", "street": "Lake Road", "number": 17}, "species": "duck" } When I say “document”, I mean “JSON”. A “collection” is a set of documents in a DB. 1
  • 5. Documents and collections { "_key": "123456", "_id": "chars/123456", "name": "Duck", "firstname": "Donald", "dob": "1934-11-13", "hobbies": ["Golf", "Singing", "Running"], "home": {"town": "Duck town", "street": "Lake Road", "number": 17}, "species": "duck" } When I say “document”, I mean “JSON”. A “collection” is a set of documents in a DB. The DB can inspect the values, allowing for secondary indexes. 1
  • 6. Documents and collections { "_key": "123456", "_id": "chars/123456", "name": "Duck", "firstname": "Donald", "dob": "1934-11-13", "hobbies": ["Golf", "Singing", "Running"], "home": {"town": "Duck town", "street": "Lake Road", "number": 17}, "species": "duck" } When I say “document”, I mean “JSON”. A “collection” is a set of documents in a DB. The DB can inspect the values, allowing for secondary indexes. Or one can just treat the DB as a key/value store. 1
  • 7. Documents and collections { "_key": "123456", "_id": "chars/123456", "name": "Duck", "firstname": "Donald", "dob": "1934-11-13", "hobbies": ["Golf", "Singing", "Running"], "home": {"town": "Duck town", "street": "Lake Road", "number": 17}, "species": "duck" } When I say “document”, I mean “JSON”. A “collection” is a set of documents in a DB. The DB can inspect the values, allowing for secondary indexes. Or one can just treat the DB as a key/value store. Sharding: the data of a collection is distributed between multiple servers. 1
  • 10. Graphs A B D E F G C "likes" "hates" A graph consists of vertices and edges. Graphs model relations, can be directed or undirected. 2
  • 11. Graphs A B D E F G C "likes" "hates" A graph consists of vertices and edges. Graphs model relations, can be directed or undirected. Vertices and edges are documents. 2
  • 12. Graphs A B D E F G C "likes" "hates" A graph consists of vertices and edges. Graphs model relations, can be directed or undirected. Vertices and edges are documents. Every edge has a _from and a _to attribute. 2
  • 13. Graphs A B D E F G C "likes" "hates" A graph consists of vertices and edges. Graphs model relations, can be directed or undirected. Vertices and edges are documents. Every edge has a _from and a _to attribute. The database offers queries and transactions dealing with graphs. 2
  • 14. Graphs A B D E F G C "likes" "hates" A graph consists of vertices and edges. Graphs model relations, can be directed or undirected. Vertices and edges are documents. Every edge has a _from and a _to attribute. The database offers queries and transactions dealing with graphs. For example, paths in the graph are interesting. 2
  • 15. Query 1 Fetch all documents in a collection FOR p IN people RETURN p 3
  • 16. Query 1 Fetch all documents in a collection FOR p IN people RETURN p [ { "name": "Schmidt", "firstname": "Helmut", "hobbies": ["Smoking"]}, { "name": "Neunhöffer", "firstname": "Max", "hobbies": ["Piano", "Golf"]}, ... ] 3
  • 17. Query 1 Fetch all documents in a collection FOR p IN people RETURN p [ { "name": "Schmidt", "firstname": "Helmut", "hobbies": ["Smoking"]}, { "name": "Neunhöffer", "firstname": "Max", "hobbies": ["Piano", "Golf"]}, ... ] (Actually, a cursor is returned.) 3
  • 18. Query 2 Use filtering, sorting and limit FOR p IN people FILTER p.age >= @minage SORT p.name, p.firstname LIMIT @nrlimit RETURN { name: CONCAT(p.name, ", ", p.firstname), age : p.age } 4
  • 19. Query 2 Use filtering, sorting and limit FOR p IN people FILTER p.age >= @minage SORT p.name, p.firstname LIMIT @nrlimit RETURN { name: CONCAT(p.name, ", ", p.firstname), age : p.age } [ { "name": "Neunhöffer, Max", "age": 44 }, { "name": "Schmidt, Helmut", "age": 95 }, ... ] 4
  • 20. Query 3 Aggregation and functions FOR p IN people COLLECT a = p.age INTO L FILTER a >= @minage RETURN { "age": a, "number": LENGTH(L) } 5
  • 21. Query 3 Aggregation and functions FOR p IN people COLLECT a = p.age INTO L FILTER a >= @minage RETURN { "age": a, "number": LENGTH(L) } [ { "age": 18, "number": 10 }, { "age": 19, "number": 17 }, { "age": 20, "number": 12 }, ... ] 5
  • 22. Query 4 Joins FOR p IN @@peoplecollection FOR h IN houses FILTER p._key == h.owner SORT h.streetname, h.housename RETURN { housename: h.housename, streetname: h.streetname, owner: p.name, value: h.value } 6
  • 23. Query 4 Joins FOR p IN @@peoplecollection FOR h IN houses FILTER p._key == h.owner SORT h.streetname, h.housename RETURN { housename: h.housename, streetname: h.streetname, owner: p.name, value: h.value } [ { "housename": "Firlefanz", "streetname": "Meyer street", "owner": "Hans Schmidt", "value": 423000 }, ... ] 6
  • 24. Query 5 Modifying data FOR e IN events FILTER e.timestamp<"2014-09-01T09:53+0200" INSERT e IN oldevents FOR e IN events FILTER e.timestamp<"2014-09-01T09:53+0200" REMOVE e._key IN events 7
  • 25. Query 6 Graph queries FOR x IN GRAPH_SHORTEST_PATH( "routeplanner", "germanCity/Cologne", "frenchCity/Paris", {weight: "distance"} ) RETURN { begin : x.startVertex, end : x.vertex, distance : x.distance, nrPaths : LENGTH(x.paths) } 8
  • 26. Query 6 Graph queries FOR x IN GRAPH_SHORTEST_PATH( "routeplanner", "germanCity/Cologne", "frenchCity/Paris", {weight: "distance"} ) RETURN { begin : x.startVertex, end : x.vertex, distance : x.distance, nrPaths : LENGTH(x.paths) } [ { "begin": "germanCity/Cologne", "end" : {"_id": "frenchCity/Paris", ... }, "distance": 550, "nrPaths": 10 }, ... ] 8
  • 27. Life of a query 1. Text and query parameters come from user 9
  • 28. Life of a query 1. Text and query parameters come from user 2. Parse text, produce abstract syntax tree (AST) 9
  • 29. Life of a query 1. Text and query parameters come from user 2. Parse text, produce abstract syntax tree (AST) 3. Substitute query parameters 9
  • 30. Life of a query 1. Text and query parameters come from user 2. Parse text, produce abstract syntax tree (AST) 3. Substitute query parameters 4. First optimisation: constant expressions, etc. 9
  • 31. Life of a query 1. Text and query parameters come from user 2. Parse text, produce abstract syntax tree (AST) 3. Substitute query parameters 4. First optimisation: constant expressions, etc. 5. Translate AST into an execution plan (EXP) 9
  • 32. Life of a query 1. Text and query parameters come from user 2. Parse text, produce abstract syntax tree (AST) 3. Substitute query parameters 4. First optimisation: constant expressions, etc. 5. Translate AST into an execution plan (EXP) 6. Optimise one EXP, produce many, potentially better EXPs 9
  • 33. Life of a query 1. Text and query parameters come from user 2. Parse text, produce abstract syntax tree (AST) 3. Substitute query parameters 4. First optimisation: constant expressions, etc. 5. Translate AST into an execution plan (EXP) 6. Optimise one EXP, produce many, potentially better EXPs 7. Reason about distribution in cluster 9
  • 34. Life of a query 1. Text and query parameters come from user 2. Parse text, produce abstract syntax tree (AST) 3. Substitute query parameters 4. First optimisation: constant expressions, etc. 5. Translate AST into an execution plan (EXP) 6. Optimise one EXP, produce many, potentially better EXPs 7. Reason about distribution in cluster 8. Optimise distributed EXPs 9
  • 35. Life of a query 1. Text and query parameters come from user 2. Parse text, produce abstract syntax tree (AST) 3. Substitute query parameters 4. First optimisation: constant expressions, etc. 5. Translate AST into an execution plan (EXP) 6. Optimise one EXP, produce many, potentially better EXPs 7. Reason about distribution in cluster 8. Optimise distributed EXPs 9. Estimate costs for all EXPs, and sort by ascending cost 9
  • 36. Life of a query 1. Text and query parameters come from user 2. Parse text, produce abstract syntax tree (AST) 3. Substitute query parameters 4. First optimisation: constant expressions, etc. 5. Translate AST into an execution plan (EXP) 6. Optimise one EXP, produce many, potentially better EXPs 7. Reason about distribution in cluster 8. Optimise distributed EXPs 9. Estimate costs for all EXPs, and sort by ascending cost 10. Instanciate “cheapest” plan, i.e. set up execution engine 9
  • 37. Life of a query 1. Text and query parameters come from user 2. Parse text, produce abstract syntax tree (AST) 3. Substitute query parameters 4. First optimisation: constant expressions, etc. 5. Translate AST into an execution plan (EXP) 6. Optimise one EXP, produce many, potentially better EXPs 7. Reason about distribution in cluster 8. Optimise distributed EXPs 9. Estimate costs for all EXPs, and sort by ascending cost 10. Instanciate “cheapest” plan, i.e. set up execution engine 11. Distribute and link up engines on different servers 9
  • 38. Life of a query 1. Text and query parameters come from user 2. Parse text, produce abstract syntax tree (AST) 3. Substitute query parameters 4. First optimisation: constant expressions, etc. 5. Translate AST into an execution plan (EXP) 6. Optimise one EXP, produce many, potentially better EXPs 7. Reason about distribution in cluster 8. Optimise distributed EXPs 9. Estimate costs for all EXPs, and sort by ascending cost 10. Instanciate “cheapest” plan, i.e. set up execution engine 11. Distribute and link up engines on different servers 12. Execute plan, provide cursor API 9
  • 39. Execution plans FOR a IN collA RETURN {x: a.x, z: b.z} EnumerateCollection a EnumerateCollection b Calculation xx == b.y Filter xx == b.y Singleton Calculation xx Return {x: a.x, z: b.z} Calc {x: a.x, z: b.z} FILTER xx == b.y FOR b IN collB LET xx = a.x Query → EXP 10
  • 40. Execution plans FOR a IN collA RETURN {x: a.x, z: b.z} EnumerateCollection a EnumerateCollection b Calculation xx == b.y Filter xx == b.y Singleton Calculation xx Return {x: a.x, z: b.z} Calc {x: a.x, z: b.z} FILTER xx == b.y FOR b IN collB LET xx = a.x Query → EXP Black arrows are dependencies 10
  • 41. Execution plans FOR a IN collA RETURN {x: a.x, z: b.z} EnumerateCollection a EnumerateCollection b Calculation xx == b.y Filter xx == b.y Singleton Calculation xx Return {x: a.x, z: b.z} Calc {x: a.x, z: b.z} FILTER xx == b.y FOR b IN collB LET xx = a.x Query → EXP Black arrows are dependencies Think of a pipeline 10
  • 42. Execution plans FOR a IN collA RETURN {x: a.x, z: b.z} EnumerateCollection a EnumerateCollection b Calculation xx == b.y Filter xx == b.y Singleton Calculation xx Return {x: a.x, z: b.z} Calc {x: a.x, z: b.z} FILTER xx == b.y FOR b IN collB LET xx = a.x Query → EXP Black arrows are dependencies Think of a pipeline Each node provides a cursor API 10
  • 43. Execution plans FOR a IN collA RETURN {x: a.x, z: b.z} EnumerateCollection a EnumerateCollection b Calculation xx == b.y Filter xx == b.y Singleton Calculation xx Return {x: a.x, z: b.z} Calc {x: a.x, z: b.z} FILTER xx == b.y FOR b IN collB LET xx = a.x Query → EXP Black arrows are dependencies Think of a pipeline Each node provides a cursor API Blocks of “Items” travel through the pipeline 10
  • 44. Execution plans FOR a IN collA RETURN {x: a.x, z: b.z} EnumerateCollection a EnumerateCollection b Calculation xx == b.y Filter xx == b.y Singleton Calculation xx Return {x: a.x, z: b.z} Calc {x: a.x, z: b.z} FILTER xx == b.y FOR b IN collB LET xx = a.x Query → EXP Black arrows are dependencies Think of a pipeline Each node provides a cursor API Blocks of “Items” travel through the pipeline What is an “item”??? 10
  • 45. Pipeline and items FOR a IN collA EnumerateCollection a EnumerateCollection b Singleton Calculation xx FOR b IN collB LET xx = a.x Items have vars a, xx Items have no vars Items are the thingies traveling through the pipeline. 11
  • 46. Pipeline and items FOR a IN collA EnumerateCollection a EnumerateCollection b Singleton Calculation xx FOR b IN collB LET xx = a.x Items have vars a, xx Items have no vars Items are the thingies traveling through the pipeline. An item holds values of those variables in the current frame 11
  • 47. Pipeline and items FOR a IN collA EnumerateCollection a EnumerateCollection b Singleton Calculation xx FOR b IN collB LET xx = a.x Items have vars a, xx Items have no vars Items are the thingies traveling through the pipeline. An item holds values of those variables in the current frame Thus: Items look differently in different parts of the plan 11
  • 48. Pipeline and items FOR a IN collA EnumerateCollection a EnumerateCollection b Singleton Calculation xx FOR b IN collB LET xx = a.x Items have vars a, xx Items have no vars Items are the thingies traveling through the pipeline. An item holds values of those variables in the current frame Thus: Items look differently in different parts of the plan We always deal with blocks of items for performance reasons 11
  • 49. Execution plans FOR a IN collA RETURN {x: a.x, z: b.z} EnumerateCollection a EnumerateCollection b Calculation xx == b.y Filter xx == b.y Singleton Calculation xx Return {x: a.x, z: b.z} Calc {x: a.x, z: b.z} FILTER xx == b.y FOR b IN collB LET xx = a.x 12
  • 50. Move filters up FOR a IN collA FOR b IN collB FILTER a.x == 10 FILTER a.u == b.v RETURN {u:a.u,w:b.w} Singleton EnumColl a EnumColl b Calc a.x == 10 Return {u:a.u,w:b.w} Filter a.u == b.v Calc a.u == b.v Filter a.x == 10 13
  • 51. Move filters up FOR a IN collA FOR b IN collB FILTER a.x == 10 FILTER a.u == b.v RETURN {u:a.u,w:b.w} The result and behaviour does not change, if the first FILTER is pulled out of the inner FOR. Singleton EnumColl a EnumColl b Calc a.x == 10 Return {u:a.u,w:b.w} Filter a.u == b.v Calc a.u == b.v Filter a.x == 10 13
  • 52. Move filters up FOR a IN collA FILTER a.x < 10 FOR b IN collB FILTER a.u == b.v RETURN {u:a.u,w:b.w} The result and behaviour does not change, if the first FILTER is pulled out of the inner FOR. However, the number of items trave- ling in the pipeline is decreased. Singleton EnumColl a Return {u:a.u,w:b.w} Filter a.u == b.v Calc a.u == b.v Calc a.x == 10 EnumColl b Filter a.x == 10 13
  • 53. Move filters up FOR a IN collA FILTER a.x < 10 FOR b IN collB FILTER a.u == b.v RETURN {u:a.u,w:b.w} The result and behaviour does not change, if the first FILTER is pulled out of the inner FOR. However, the number of items trave- ling in the pipeline is decreased. Note that the two FOR statements could be interchanged! Singleton EnumColl a Return {u:a.u,w:b.w} Filter a.u == b.v Calc a.u == b.v Calc a.x == 10 EnumColl b Filter a.x == 10 13
  • 54. Remove unnecessary calculations FOR a IN collA LET L = LENGTH(a.hobbies) FOR b IN collB FILTER a.u == b.v RETURN {h:a.hobbies,w:b.w} Singleton EnumColl a Calc L = ... EnumColl b Calc a.u == b.v Filter a.u == b.v Return {...} 14
  • 55. Remove unnecessary calculations FOR a IN collA LET L = LENGTH(a.hobbies) FOR b IN collB FILTER a.u == b.v RETURN {h:a.hobbies,w:b.w} The Calculation of L is unnecessary! Singleton EnumColl a Calc L = ... EnumColl b Calc a.u == b.v Filter a.u == b.v Return {...} 14
  • 56. Remove unnecessary calculations FOR a IN collA FOR b IN collB FILTER a.u == b.v RETURN {h:a.hobbies,w:b.w} The Calculation of L is unnecessary! (since it cannot throw an exception). Singleton EnumColl a EnumColl b Calc a.u == b.v Filter a.u == b.v Return {...} 14
  • 57. Remove unnecessary calculations FOR a IN collA FOR b IN collB FILTER a.u == b.v RETURN {h:a.hobbies,w:b.w} The Calculation of L is unnecessary! (since it cannot throw an exception). Therefore we can just leave it out. Singleton EnumColl a EnumColl b Calc a.u == b.v Filter a.u == b.v Return {...} 14
  • 58. Use index for FILTER and SORT FOR a IN collA FILTER a.x > 17 && a.x <= 23 && a.y == 10 SORT a.y, a.x RETURN a Singleton EnumColl a Filter ... Calc ... Sort a.y, a.x Return a 15
  • 59. Use index for FILTER and SORT FOR a IN collA FILTER a.x > 17 && a.x <= 23 && a.y == 10 SORT a.y, a.x RETURN a Assume collA has a skiplist index on “y” and “x” (in this order), Singleton EnumColl a Filter ... Calc ... Sort a.y, a.x Return a 15
  • 60. Use index for FILTER and SORT FOR a IN collA FILTER a.x > 17 && a.x <= 23 && a.y == 10 SORT a.y, a.x RETURN a Assume collA has a skiplist index on “y” and “x” (in this order), then we can read off the half-open interval between { y: 10, x: 17 } and { y: 10, x: 23 } from the skiplist index. Singleton Sort a.y, a.x Return a IndexRange a 15
  • 61. Use index for FILTER and SORT FOR a IN collA FILTER a.x > 17 && a.x <= 23 && a.y == 10 SORT a.y, a.x RETURN a Assume collA has a skiplist index on “y” and “x” (in this order), then we can read off the half-open interval between { y: 10, x: 17 } and { y: 10, x: 23 } from the skiplist index. The result will automatically be sorted by y and then by x. Singleton Return a IndexRange a 15
  • 62. Data distribution in a cluster Requests DBserver DBserver DBserver CoordinatorCoordinator 4 2 5 3 11 The shards of a collection are distributed across the DB servers. 16
  • 63. Data distribution in a cluster Requests DBserver DBserver DBserver CoordinatorCoordinator 4 2 5 3 11 The shards of a collection are distributed across the DB servers. The coordinators receive queries and organise their execution 16
  • 67. Modifying queries Fortunately: There can be at most one modifying node in each query. There can be no modifying nodes in subqueries. 18
  • 68. Modifying queries Fortunately: There can be at most one modifying node in each query. There can be no modifying nodes in subqueries. Modifying nodes The modifying node in a query is executed on the DBservers, 18
  • 69. Modifying queries Fortunately: There can be at most one modifying node in each query. There can be no modifying nodes in subqueries. Modifying nodes The modifying node in a query is executed on the DBservers, to this end, we either scatter the items to all DBservers, or, if possible, we distribute each item to the shard that is responsible for the modification. 18
  • 70. Modifying queries Fortunately: There can be at most one modifying node in each query. There can be no modifying nodes in subqueries. Modifying nodes The modifying node in a query is executed on the DBservers, to this end, we either scatter the items to all DBservers, or, if possible, we distribute each item to the shard that is responsible for the modification. Sometimes, we can even optimise away a gather/scatter combination and parallelise completely. 18