SlideShare a Scribd company logo
Joins and aggregations in a 
distributed NoSQL DB 
NoSQLmatters, Barcelona, 22 November 2014 
Max Neunhöffer 
www.arangodb.com
Documents and collections 
{ 
"_key": "123456", 
"_id": "chars/123456", 
"name": "Duck", 
"firstname": "Donald", 
"dob": "1934-11-13", 
"hobbies": ["Golf", 
"Singing", 
"Running"], 
"home": 
{"town": "Duck town", 
"street": "Lake Road", 
"number": 17}, 
"species": "duck" 
} 
1
Documents and collections 
{ 
"_key": "123456", 
"_id": "chars/123456", 
"name": "Duck", 
"firstname": "Donald", 
"dob": "1934-11-13", 
"hobbies": ["Golf", 
"Singing", 
"Running"], 
"home": 
{"town": "Duck town", 
"street": "Lake Road", 
"number": 17}, 
"species": "duck" 
} 
When I say “document”, 
I mean “JSON”. 
1
Documents and collections 
{ 
"_key": "123456", 
"_id": "chars/123456", 
"name": "Duck", 
"firstname": "Donald", 
"dob": "1934-11-13", 
"hobbies": ["Golf", 
"Singing", 
"Running"], 
"home": 
{"town": "Duck town", 
"street": "Lake Road", 
"number": 17}, 
"species": "duck" 
} 
When I say “document”, 
I mean “JSON”. 
A “collection” is a set of 
documents in a DB. 
1
Documents and collections 
{ 
"_key": "123456", 
"_id": "chars/123456", 
"name": "Duck", 
"firstname": "Donald", 
"dob": "1934-11-13", 
"hobbies": ["Golf", 
"Singing", 
"Running"], 
"home": 
{"town": "Duck town", 
"street": "Lake Road", 
"number": 17}, 
"species": "duck" 
} 
When I say “document”, 
I mean “JSON”. 
A “collection” is a set of 
documents in a DB. 
The DB can inspect the 
values, allowing for 
secondary indexes. 
1
Documents and collections 
{ 
"_key": "123456", 
"_id": "chars/123456", 
"name": "Duck", 
"firstname": "Donald", 
"dob": "1934-11-13", 
"hobbies": ["Golf", 
"Singing", 
"Running"], 
"home": 
{"town": "Duck town", 
"street": "Lake Road", 
"number": 17}, 
"species": "duck" 
} 
When I say “document”, 
I mean “JSON”. 
A “collection” is a set of 
documents in a DB. 
The DB can inspect the 
values, allowing for 
secondary indexes. 
Or one can just treat the 
DB as a key/value store. 
1
Documents and collections 
{ 
"_key": "123456", 
"_id": "chars/123456", 
"name": "Duck", 
"firstname": "Donald", 
"dob": "1934-11-13", 
"hobbies": ["Golf", 
"Singing", 
"Running"], 
"home": 
{"town": "Duck town", 
"street": "Lake Road", 
"number": 17}, 
"species": "duck" 
} 
When I say “document”, 
I mean “JSON”. 
A “collection” is a set of 
documents in a DB. 
The DB can inspect the 
values, allowing for 
secondary indexes. 
Or one can just treat the 
DB as a key/value store. 
Sharding: the data of a 
collection is distributed 
between multiple servers. 
1
Graphs 
A 
B 
D 
E 
F 
G 
C 
"likes" 
"hates" 
2
Graphs 
A 
B 
D 
E 
F 
G 
C 
"likes" 
"hates" 
A graph consists of vertices and 
edges. 
2
Graphs 
A 
B 
D 
E 
F 
G 
C 
"likes" 
"hates" 
A graph consists of vertices and 
edges. 
Graphs model relations, can be 
directed or undirected. 
2
Graphs 
A 
B 
D 
E 
F 
G 
C 
"likes" 
"hates" 
A graph consists of vertices and 
edges. 
Graphs model relations, can be 
directed or undirected. 
Vertices and edges are 
documents. 
2
Graphs 
A 
B 
D 
E 
F 
G 
C 
"likes" 
"hates" 
A graph consists of vertices and 
edges. 
Graphs model relations, can be 
directed or undirected. 
Vertices and edges are 
documents. 
Every edge has a _from and a _to 
attribute. 
2
Graphs 
A 
B 
D 
E 
F 
G 
C 
"likes" 
"hates" 
A graph consists of vertices and 
edges. 
Graphs model relations, can be 
directed or undirected. 
Vertices and edges are 
documents. 
Every edge has a _from and a _to 
attribute. 
The database offers queries and 
transactions dealing with graphs. 
2
Graphs 
A 
B 
D 
E 
F 
G 
C 
"likes" 
"hates" 
A graph consists of vertices and 
edges. 
Graphs model relations, can be 
directed or undirected. 
Vertices and edges are 
documents. 
Every edge has a _from and a _to 
attribute. 
The database offers queries and 
transactions dealing with graphs. 
For example, paths in the graph 
are interesting. 
2
Query 1 
Fetch all documents in a collection 
FOR p IN people 
RETURN p 
3
Query 1 
Fetch all documents in a collection 
FOR p IN people 
RETURN p 
[ { "name": "Schmidt", "firstname": "Helmut", 
"hobbies": ["Smoking"]}, 
{ "name": "Neunhöffer", "firstname": "Max", 
"hobbies": ["Piano", "Golf"]}, 
... 
] 
3
Query 1 
Fetch all documents in a collection 
FOR p IN people 
RETURN p 
[ { "name": "Schmidt", "firstname": "Helmut", 
"hobbies": ["Smoking"]}, 
{ "name": "Neunhöffer", "firstname": "Max", 
"hobbies": ["Piano", "Golf"]}, 
... 
] 
(Actually, a cursor is returned.) 
3
Query 2 
Use 1ltering, sorting and limit 
FOR p IN people 
FILTER p.age >= @minage 
SORT p.name, p.firstname 
LIMIT @nrlimit 
RETURN { name: CONCAT(p.name, ", ", p.firstname), 
age : p.age } 
4
Query 2 
Use 1ltering, sorting and limit 
FOR p IN people 
FILTER p.age >= @minage 
SORT p.name, p.firstname 
LIMIT @nrlimit 
RETURN { name: CONCAT(p.name, ", ", p.firstname), 
age : p.age } 
[ { "name": "Neunhöffer, Max", "age": 44 }, 
{ "name": "Schmidt, Helmut", "age": 95 }, 
... 
] 
4
Query 3 
Aggregation and functions 
FOR p IN people 
COLLECT a = p.age INTO L 
FILTER a >= @minage 
RETURN { "age": a, "number": LENGTH(L) } 
5
Query 3 
Aggregation and functions 
FOR p IN people 
COLLECT a = p.age INTO L 
FILTER a >= @minage 
RETURN { "age": a, "number": LENGTH(L) } 
[ { "age": 18, "number": 10 }, 
{ "age": 19, "number": 17 }, 
{ "age": 20, "number": 12 }, 
... 
] 
5
Query 4 
Joins 
FOR p IN @@peoplecollection 
FOR h IN houses 
FILTER p._key == h.owner 
SORT h.streetname, h.housename 
RETURN { housename: h.housename, 
streetname: h.streetname, 
owner: p.name, 
value: h.value } 
6
Query 4 
Joins 
FOR p IN @@peoplecollection 
FOR h IN houses 
FILTER p._key == h.owner 
SORT h.streetname, h.housename 
RETURN { housename: h.housename, 
streetname: h.streetname, 
owner: p.name, 
value: h.value } 
[ { "housename": "Firlefanz", 
"streetname": "Meyer street", 
"owner": "Hans Schmidt", "value": 423000 
}, 
... 
] 
6
Query 5 
Modifying data 
FOR e IN events 
FILTER e.timestamp < "2014-09-01T09:53+0200" 
INSERT e IN oldevents 
FOR e IN events 
FILTER e.timestamp < "2014-09-01T09:53+0200" 
REMOVE e._key IN events 
7
Query 6 
Graph queries 
FOR x IN GRAPH_SHORTEST_PATH( 
"routeplanner", "germanCity/Cologne", 
"frenchCity/Paris", {weight: "distance"} ) 
RETURN { begin : x.startVertex, 
end : x.vertex, 
distance : x.distance, 
nrPaths : LENGTH(x.paths) } 
8
Query 6 
Graph queries 
FOR x IN GRAPH_SHORTEST_PATH( 
"routeplanner", "germanCity/Cologne", 
"frenchCity/Paris", {weight: "distance"} ) 
RETURN { begin : x.startVertex, 
end : x.vertex, 
distance : x.distance, 
nrPaths : LENGTH(x.paths) } 
[ { "begin": "germanCity/Cologne", 
"end" : {"_id": "frenchCity/Paris", ... }, 
"distance": 550, 
"nrPaths": 10 }, 
... 
] 8
Life of a query 
Text and query parameters come from user 
9
Life of a query 
Text and query parameters come from user 
Parse text, produce abstract syntax tree (AST) 
9
Life of a query 
Text and query parameters come from user 
Parse text, produce abstract syntax tree (AST) 
Substitute query parameters 
9
Life of a query 
Text and query parameters come from user 
Parse text, produce abstract syntax tree (AST) 
Substitute query parameters 
First optimisation: constant expressions, etc. 
9
Life of a query 
Text and query parameters come from user 
Parse text, produce abstract syntax tree (AST) 
Substitute query parameters 
First optimisation: constant expressions, etc. 
Translate AST into an execution plan (EXP) 
9
Life of a query 
Text and query parameters come from user 
Parse text, produce abstract syntax tree (AST) 
Substitute query parameters 
First optimisation: constant expressions, etc. 
Translate AST into an execution plan (EXP) 
Optimise one EXP, produce many, potentially better EXPs 
9
Life of a query 
Text and query parameters come from user 
Parse text, produce abstract syntax tree (AST) 
Substitute query parameters 
First optimisation: constant expressions, etc. 
Translate AST into an execution plan (EXP) 
Optimise one EXP, produce many, potentially better EXPs 
Reason about distribution in cluster 
9
Life of a query 
Text and query parameters come from user 
Parse text, produce abstract syntax tree (AST) 
Substitute query parameters 
First optimisation: constant expressions, etc. 
Translate AST into an execution plan (EXP) 
Optimise one EXP, produce many, potentially better EXPs 
Reason about distribution in cluster 
Optimise distributed EXPs 
9
Life of a query 
Text and query parameters come from user 
Parse text, produce abstract syntax tree (AST) 
Substitute query parameters 
First optimisation: constant expressions, etc. 
Translate AST into an execution plan (EXP) 
Optimise one EXP, produce many, potentially better EXPs 
Reason about distribution in cluster 
Optimise distributed EXPs 
Estimate costs for all EXPs, and sort by ascending cost 
9
Life of a query 
Text and query parameters come from user 
Parse text, produce abstract syntax tree (AST) 
Substitute query parameters 
First optimisation: constant expressions, etc. 
Translate AST into an execution plan (EXP) 
Optimise one EXP, produce many, potentially better EXPs 
Reason about distribution in cluster 
Optimise distributed EXPs 
Estimate costs for all EXPs, and sort by ascending cost 
Instanciate “cheapest” plan, i.e. set up execution engine 
9
Life of a query 
Text and query parameters come from user 
Parse text, produce abstract syntax tree (AST) 
Substitute query parameters 
First optimisation: constant expressions, etc. 
Translate AST into an execution plan (EXP) 
Optimise one EXP, produce many, potentially better EXPs 
Reason about distribution in cluster 
Optimise distributed EXPs 
Estimate costs for all EXPs, and sort by ascending cost 
Instanciate “cheapest” plan, i.e. set up execution engine 
Distribute and link up engines on different servers 
9
Life of a query 
Text and query parameters come from user 
Parse text, produce abstract syntax tree (AST) 
Substitute query parameters 
First optimisation: constant expressions, etc. 
Translate AST into an execution plan (EXP) 
Optimise one EXP, produce many, potentially better EXPs 
Reason about distribution in cluster 
Optimise distributed EXPs 
Estimate costs for all EXPs, and sort by ascending cost 
Instanciate “cheapest” plan, i.e. set up execution engine 
Distribute and link up engines on different servers 
Execute plan, provide cursor API 
9
Execution plans 
FOR a IN collA 
LET xx = a.x 
FOR b IN collB 
RETURN {x: a.x, z: b.z} 
Singleton 
EnumerateCollection a 
Calculation xx 
EnumerateCollection b 
Calculation xx == b.y 
Filter xx == b.y 
Calc {x: a.x, z: b.z} 
Return {x: a.x, z: b.z} 
FILTER xx == b.y 
Query ! EXP 
10
Execution plans 
FOR a IN collA 
LET xx = a.x 
FOR b IN collB 
RETURN {x: a.x, z: b.z} 
Singleton 
EnumerateCollection a 
Calculation xx 
EnumerateCollection b 
Calculation xx == b.y 
Filter xx == b.y 
Calc {x: a.x, z: b.z} 
Return {x: a.x, z: b.z} 
FILTER xx == b.y 
Query ! EXP 
Black arrows are 
dependencies 
10
Execution plans 
FOR a IN collA 
LET xx = a.x 
FOR b IN collB 
RETURN {x: a.x, z: b.z} 
Singleton 
EnumerateCollection a 
Calculation xx 
EnumerateCollection b 
Calculation xx == b.y 
Filter xx == b.y 
Calc {x: a.x, z: b.z} 
Return {x: a.x, z: b.z} 
FILTER xx == b.y 
Query ! EXP 
Black arrows are 
dependencies 
Think of a pipeline 
10
Execution plans 
FOR a IN collA 
LET xx = a.x 
FOR b IN collB 
RETURN {x: a.x, z: b.z} 
Singleton 
EnumerateCollection a 
Calculation xx 
EnumerateCollection b 
Calculation xx == b.y 
Filter xx == b.y 
Calc {x: a.x, z: b.z} 
Return {x: a.x, z: b.z} 
FILTER xx == b.y 
Query ! EXP 
Black arrows are 
dependencies 
Think of a pipeline 
Each node provides 
a cursor API 
10
Execution plans 
FOR a IN collA 
LET xx = a.x 
FOR b IN collB 
RETURN {x: a.x, z: b.z} 
Singleton 
EnumerateCollection a 
Calculation xx 
EnumerateCollection b 
Calculation xx == b.y 
Filter xx == b.y 
Calc {x: a.x, z: b.z} 
Return {x: a.x, z: b.z} 
FILTER xx == b.y 
Query ! EXP 
Black arrows are 
dependencies 
Think of a pipeline 
Each node provides 
a cursor API 
Blocks of “Items” 
travel through the 
pipeline 
10
Execution plans 
FOR a IN collA 
LET xx = a.x 
FOR b IN collB 
RETURN {x: a.x, z: b.z} 
Singleton 
EnumerateCollection a 
Calculation xx 
EnumerateCollection b 
Calculation xx == b.y 
Filter xx == b.y 
Calc {x: a.x, z: b.z} 
Return {x: a.x, z: b.z} 
FILTER xx == b.y 
Query ! EXP 
Black arrows are 
dependencies 
Think of a pipeline 
Each node provides 
a cursor API 
Blocks of “Items” 
travel through the 
pipeline 
What is an “item”??? 
10
Pipeline and items 
Singleton 
FOR a IN collA EnumerateCollection a 
LET xx = a.x Calculation xx 
Items have vars a, xx 
EnumerateCollection b 
FOR b IN collB 
Items have no vars 
Items are the thingies traveling through the pipeline. 
11
Pipeline and items 
Singleton 
FOR a IN collA EnumerateCollection a 
LET xx = a.x Calculation xx 
Items have vars a, xx 
EnumerateCollection b 
FOR b IN collB 
Items have no vars 
Items are the thingies traveling through the pipeline. 
An item holds values of those variables in the current frame 
11
Pipeline and items 
Singleton 
FOR a IN collA EnumerateCollection a 
LET xx = a.x Calculation xx 
Items have vars a, xx 
EnumerateCollection b 
FOR b IN collB 
Items have no vars 
Items are the thingies traveling through the pipeline. 
An item holds values of those variables in the current frame 
Thus: Items look differently in different parts of the plan 
11
Pipeline and items 
Singleton 
FOR a IN collA EnumerateCollection a 
LET xx = a.x Calculation xx 
Items have vars a, xx 
EnumerateCollection b 
FOR b IN collB 
Items have no vars 
Items are the thingies traveling through the pipeline. 
An item holds values of those variables in the current frame 
Thus: Items look differently in different parts of the plan 
We always deal with blocks of items for performance reasons 
11
Execution plans 
FOR a IN collA 
LET xx = a.x 
FOR b IN collB 
RETURN {x: a.x, z: b.z} 
Singleton 
EnumerateCollection a 
Calculation xx 
EnumerateCollection b 
Calculation xx == b.y 
Filter xx == b.y 
Calc {x: a.x, z: b.z} 
Return {x: a.x, z: b.z} 
FILTER xx == b.y 
12
Move 1lters up 
FOR a IN collA 
FOR b IN collB 
FILTER a.x == 10 
FILTER a.u == b.v 
RETURN {u:a.u,w:b.w} 
Singleton 
EnumColl a 
EnumColl b 
Calc a.x == 10 
Filter a.x == 10 
Calc a.u == b.v 
Filter a.u == b.v 
Return {u:a.u,w:b.w} 
13
Move 1lters up 
FOR a IN collA 
FOR b IN collB 
FILTER a.x == 10 
FILTER a.u == b.v 
RETURN {u:a.u,w:b.w} 
The result and behaviour does not 
change, if the 1rst FILTER is pulled 
out of the inner FOR. 
Singleton 
EnumColl a 
EnumColl b 
Calc a.x == 10 
Filter a.x == 10 
Calc a.u == b.v 
Filter a.u == b.v 
Return {u:a.u,w:b.w} 
13
Move 1lters up 
FOR a IN collA 
FILTER a.x < 10 
FOR b IN collB 
FILTER a.u == b.v 
RETURN {u:a.u,w:b.w} 
The result and behaviour does not 
change, if the 1rst FILTER is pulled 
out of the inner FOR. 
However, the number of items trave-ling 
in the pipeline is decreased. 
Singleton 
EnumColl a 
Calc a.x == 10 
Filter a.x == 10 
EnumColl b 
Calc a.u == b.v 
Filter a.u == b.v 
Return {u:a.u,w:b.w} 
13
Move 1lters up 
FOR a IN collA 
FILTER a.x < 10 
FOR b IN collB 
FILTER a.u == b.v 
RETURN {u:a.u,w:b.w} 
The result and behaviour does not 
change, if the 1rst FILTER is pulled 
out of the inner FOR. 
However, the number of items trave-ling 
in the pipeline is decreased. 
Note that the two FOR statements 
could be interchanged! 
Singleton 
EnumColl a 
Calc a.x == 10 
Filter a.x == 10 
EnumColl b 
Calc a.u == b.v 
Filter a.u == b.v 
Return {u:a.u,w:b.w} 
13
Remove unnecessary calculations 
FOR a IN collA 
LET L = LENGTH(a.hobbies) 
FOR b IN collB 
FILTER a.u == b.v 
RETURN {h:a.hobbies,w:b.w} 
Singleton 
EnumColl a 
Calc L = ... 
EnumColl b 
Calc a.u == b.v 
Filter a.u == b.v 
Return {...} 
14
Remove unnecessary calculations 
FOR a IN collA 
LET L = LENGTH(a.hobbies) 
FOR b IN collB 
FILTER a.u == b.v 
RETURN {h:a.hobbies,w:b.w} 
The Calculation of L is unnecessary! 
Singleton 
EnumColl a 
Calc L = ... 
EnumColl b 
Calc a.u == b.v 
Filter a.u == b.v 
Return {...} 
14
Remove unnecessary calculations 
FOR a IN collA 
FOR b IN collB 
FILTER a.u == b.v 
RETURN {h:a.hobbies,w:b.w} 
The Calculation of L is unnecessary! 
(since it cannot throw an exception). 
Singleton 
EnumColl a 
EnumColl b 
Calc a.u == b.v 
Filter a.u == b.v 
Return {...} 
14
Remove unnecessary calculations 
FOR a IN collA 
FOR b IN collB 
FILTER a.u == b.v 
RETURN {h:a.hobbies,w:b.w} 
The Calculation of L is unnecessary! 
(since it cannot throw an exception). 
Therefore we can just leave it out. 
Singleton 
EnumColl a 
EnumColl b 
Calc a.u == b.v 
Filter a.u == b.v 
Return {...} 
14
Use index for FILTER and SORT 
FOR a IN collA 
FILTER a.x > 17 && 
a.x <= 23 && 
a.y == 10 
SORT a.y, a.x 
RETURN a 
Singleton 
EnumColl a 
Calc ... 
Filter ... 
Sort a.y, a.x 
Return a 
15
Use index for FILTER and SORT 
FOR a IN collA 
FILTER a.x > 17 && 
a.x <= 23 && 
a.y == 10 
SORT a.y, a.x 
RETURN a 
Assume collA has a skiplist index on “y” 
and “x” (in this order), 
Singleton 
EnumColl a 
Calc ... 
Filter ... 
Sort a.y, a.x 
Return a 
15
Use index for FILTER and SORT 
FOR a IN collA 
FILTER a.x > 17 && 
a.x <= 23 && 
a.y == 10 
SORT a.y, a.x 
RETURN a 
Assume collA has a skiplist index on “y” 
and “x” (in this order), then we can read 
off the half-open interval between 
{ y: 10, x: 17 } and 
{ y: 10, x: 23 } 
from the skiplist index. 
Singleton 
IndexRange a 
Sort a.y, a.x 
Return a 
15
Use index for FILTER and SORT 
FOR a IN collA 
FILTER a.x > 17 && 
a.x <= 23 && 
a.y == 10 
SORT a.y, a.x 
RETURN a 
Assume collA has a skiplist index on “y” 
and “x” (in this order), then we can read 
off the half-open interval between 
{ y: 10, x: 17 } and 
{ y: 10, x: 23 } 
from the skiplist index. 
The result will automatically be sorted by 
y and then by x. 
Singleton 
IndexRange a 
Return a 
15
Data distribution in a cluster 
Requests 
Coordinator Coordinator 
DBserver DBserver DBserver 
1 4 2 5 3 1 
The shards of a collection are distributed across the DB 
servers. 
16
Data distribution in a cluster 
Requests 
Coordinator Coordinator 
DBserver DBserver DBserver 
1 4 2 5 3 1 
The shards of a collection are distributed across the DB 
servers. 
The coordinators receive queries and organise their 
execution 
16
Scatter/gather 
EnumerateCollection 
17
Scatter/gather 
Remote Remote 
EnumShard 
Remote 
EnumShard 
Remote 
Concat/Merge 
Remote 
EnumShard 
Remote 
Scatter 
17
Scatter/gather 
Remote Remote 
EnumShard 
Remote 
EnumShard 
Remote 
Concat/Merge 
Remote 
EnumShard 
Remote 
Scatter 
17
Modifying queries 
Fortunately: 
There can be at most one modifying node in each query. 
There can be no modifying nodes in subqueries. 
18
Modifying queries 
Fortunately: 
There can be at most one modifying node in each query. 
There can be no modifying nodes in subqueries. 
Modifying nodes 
The modifying node in a query 
is executed on the DBservers, 
18
Modifying queries 
Fortunately: 
There can be at most one modifying node in each query. 
There can be no modifying nodes in subqueries. 
Modifying nodes 
The modifying node in a query 
is executed on the DBservers, 
to this end, we either scatter the items to all DBservers, 
or, if possible, we distribute each item to the shard 
that is responsible for the modi1cation. 
18
Modifying queries 
Fortunately: 
There can be at most one modifying node in each query. 
There can be no modifying nodes in subqueries. 
Modifying nodes 
The modifying node in a query 
is executed on the DBservers, 
to this end, we either scatter the items to all DBservers, 
or, if possible, we distribute each item to the shard 
that is responsible for the modi1cation. 
Sometimes, we can even optimise away a gather/scatter 
combination and parallelise completely. 
18

More Related Content

What's hot

Search Engine-Building with Lucene and Solr, Part 2 (SoCal Code Camp LA 2013)
Search Engine-Building with Lucene and Solr, Part 2 (SoCal Code Camp LA 2013)Search Engine-Building with Lucene and Solr, Part 2 (SoCal Code Camp LA 2013)
Search Engine-Building with Lucene and Solr, Part 2 (SoCal Code Camp LA 2013)
Kai Chan
 
Search Engine-Building with Lucene and Solr
Search Engine-Building with Lucene and SolrSearch Engine-Building with Lucene and Solr
Search Engine-Building with Lucene and Solr
Kai Chan
 
Regular Expressions in Google Analytics
Regular Expressions in Google AnalyticsRegular Expressions in Google Analytics
Regular Expressions in Google Analytics
Shivani Singh
 
Google code search
Google code searchGoogle code search
Google code search
mona zavichi tork
 
Search Engine-Building with Lucene and Solr, Part 1 (SoCal Code Camp LA 2013)
Search Engine-Building with Lucene and Solr, Part 1 (SoCal Code Camp LA 2013)Search Engine-Building with Lucene and Solr, Part 1 (SoCal Code Camp LA 2013)
Search Engine-Building with Lucene and Solr, Part 1 (SoCal Code Camp LA 2013)
Kai Chan
 
TextMining with R
TextMining with RTextMining with R
TextMining with R
Aleksei Beloshytski
 
Aggregation Framework MongoDB Days Munich
Aggregation Framework MongoDB Days MunichAggregation Framework MongoDB Days Munich
Aggregation Framework MongoDB Days Munich
Norberto Leite
 

What's hot (7)

Search Engine-Building with Lucene and Solr, Part 2 (SoCal Code Camp LA 2013)
Search Engine-Building with Lucene and Solr, Part 2 (SoCal Code Camp LA 2013)Search Engine-Building with Lucene and Solr, Part 2 (SoCal Code Camp LA 2013)
Search Engine-Building with Lucene and Solr, Part 2 (SoCal Code Camp LA 2013)
 
Search Engine-Building with Lucene and Solr
Search Engine-Building with Lucene and SolrSearch Engine-Building with Lucene and Solr
Search Engine-Building with Lucene and Solr
 
Regular Expressions in Google Analytics
Regular Expressions in Google AnalyticsRegular Expressions in Google Analytics
Regular Expressions in Google Analytics
 
Google code search
Google code searchGoogle code search
Google code search
 
Search Engine-Building with Lucene and Solr, Part 1 (SoCal Code Camp LA 2013)
Search Engine-Building with Lucene and Solr, Part 1 (SoCal Code Camp LA 2013)Search Engine-Building with Lucene and Solr, Part 1 (SoCal Code Camp LA 2013)
Search Engine-Building with Lucene and Solr, Part 1 (SoCal Code Camp LA 2013)
 
TextMining with R
TextMining with RTextMining with R
TextMining with R
 
Aggregation Framework MongoDB Days Munich
Aggregation Framework MongoDB Days MunichAggregation Framework MongoDB Days Munich
Aggregation Framework MongoDB Days Munich
 

Similar to Max Neunhöffer – Joins and aggregations in a distributed NoSQL DB - NoSQL matters Barcelona 2014

Complex queries in a distributed multi-model database
Complex queries in a distributed multi-model databaseComplex queries in a distributed multi-model database
Complex queries in a distributed multi-model database
Max Neunhöffer
 
Powerful Analysis with the Aggregation Pipeline
Powerful Analysis with the Aggregation PipelinePowerful Analysis with the Aggregation Pipeline
Powerful Analysis with the Aggregation Pipeline
MongoDB
 
"Powerful Analysis with the Aggregation Pipeline (Tutorial)"
"Powerful Analysis with the Aggregation Pipeline (Tutorial)""Powerful Analysis with the Aggregation Pipeline (Tutorial)"
"Powerful Analysis with the Aggregation Pipeline (Tutorial)"
MongoDB
 
Doing More with MongoDB Aggregation
Doing More with MongoDB AggregationDoing More with MongoDB Aggregation
Doing More with MongoDB Aggregation
MongoDB
 
[MongoDB.local Bengaluru 2018] Tutorial: Pipeline Power - Doing More with Mon...
[MongoDB.local Bengaluru 2018] Tutorial: Pipeline Power - Doing More with Mon...[MongoDB.local Bengaluru 2018] Tutorial: Pipeline Power - Doing More with Mon...
[MongoDB.local Bengaluru 2018] Tutorial: Pipeline Power - Doing More with Mon...
MongoDB
 
06. ElasticSearch : Mapping and Analysis
06. ElasticSearch : Mapping and Analysis06. ElasticSearch : Mapping and Analysis
06. ElasticSearch : Mapping and Analysis
OpenThink Labs
 
Mapping Graph Queries to PostgreSQL
Mapping Graph Queries to PostgreSQLMapping Graph Queries to PostgreSQL
Mapping Graph Queries to PostgreSQL
Gábor Szárnyas
 
The Aggregation Framework
The Aggregation FrameworkThe Aggregation Framework
The Aggregation Framework
MongoDB
 
Analyze one year of radio station songs aired with Spark SQL, Spotify, and Da...
Analyze one year of radio station songs aired with Spark SQL, Spotify, and Da...Analyze one year of radio station songs aired with Spark SQL, Spotify, and Da...
Analyze one year of radio station songs aired with Spark SQL, Spotify, and Da...
Paul Leclercq
 
MongoDB World 2019: Aggregation Pipeline Power++: How MongoDB 4.2 Pipeline Em...
MongoDB World 2019: Aggregation Pipeline Power++: How MongoDB 4.2 Pipeline Em...MongoDB World 2019: Aggregation Pipeline Power++: How MongoDB 4.2 Pipeline Em...
MongoDB World 2019: Aggregation Pipeline Power++: How MongoDB 4.2 Pipeline Em...
MongoDB
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB
 
Harnessing The Power of Search - Liferay DEVCON 2015, Darmstadt, Germany
Harnessing The Power of Search - Liferay DEVCON 2015, Darmstadt, GermanyHarnessing The Power of Search - Liferay DEVCON 2015, Darmstadt, Germany
Harnessing The Power of Search - Liferay DEVCON 2015, Darmstadt, Germany
André Ricardo Barreto de Oliveira
 
MongoDB .local Munich 2019: Aggregation Pipeline Power++: How MongoDB 4.2 Pip...
MongoDB .local Munich 2019: Aggregation Pipeline Power++: How MongoDB 4.2 Pip...MongoDB .local Munich 2019: Aggregation Pipeline Power++: How MongoDB 4.2 Pip...
MongoDB .local Munich 2019: Aggregation Pipeline Power++: How MongoDB 4.2 Pip...
MongoDB
 
MongoDB .local Toronto 2019: Aggregation Pipeline Power++: How MongoDB 4.2 Pi...
MongoDB .local Toronto 2019: Aggregation Pipeline Power++: How MongoDB 4.2 Pi...MongoDB .local Toronto 2019: Aggregation Pipeline Power++: How MongoDB 4.2 Pi...
MongoDB .local Toronto 2019: Aggregation Pipeline Power++: How MongoDB 4.2 Pi...
MongoDB
 
MongoDB .local Chicago 2019: Aggregation Pipeline Power++: How MongoDB 4.2 Pi...
MongoDB .local Chicago 2019: Aggregation Pipeline Power++: How MongoDB 4.2 Pi...MongoDB .local Chicago 2019: Aggregation Pipeline Power++: How MongoDB 4.2 Pi...
MongoDB .local Chicago 2019: Aggregation Pipeline Power++: How MongoDB 4.2 Pi...
MongoDB
 
01 ElasticSearch : Getting Started
01 ElasticSearch : Getting Started01 ElasticSearch : Getting Started
01 ElasticSearch : Getting Started
OpenThink Labs
 
Python crush course
Python crush coursePython crush course
Python crush course
Mohammed El Rafie Tarabay
 
Dynamic languages, for software craftmanship group
Dynamic languages, for software craftmanship groupDynamic languages, for software craftmanship group
Dynamic languages, for software craftmanship group
Reuven Lerner
 
A Survey Of R Graphics
A Survey Of R GraphicsA Survey Of R Graphics
A Survey Of R Graphics
Dataspora
 
Webinar: Strongly Typed Languages and Flexible Schemas
Webinar: Strongly Typed Languages and Flexible SchemasWebinar: Strongly Typed Languages and Flexible Schemas
Webinar: Strongly Typed Languages and Flexible Schemas
MongoDB
 

Similar to Max Neunhöffer – Joins and aggregations in a distributed NoSQL DB - NoSQL matters Barcelona 2014 (20)

Complex queries in a distributed multi-model database
Complex queries in a distributed multi-model databaseComplex queries in a distributed multi-model database
Complex queries in a distributed multi-model database
 
Powerful Analysis with the Aggregation Pipeline
Powerful Analysis with the Aggregation PipelinePowerful Analysis with the Aggregation Pipeline
Powerful Analysis with the Aggregation Pipeline
 
"Powerful Analysis with the Aggregation Pipeline (Tutorial)"
"Powerful Analysis with the Aggregation Pipeline (Tutorial)""Powerful Analysis with the Aggregation Pipeline (Tutorial)"
"Powerful Analysis with the Aggregation Pipeline (Tutorial)"
 
Doing More with MongoDB Aggregation
Doing More with MongoDB AggregationDoing More with MongoDB Aggregation
Doing More with MongoDB Aggregation
 
[MongoDB.local Bengaluru 2018] Tutorial: Pipeline Power - Doing More with Mon...
[MongoDB.local Bengaluru 2018] Tutorial: Pipeline Power - Doing More with Mon...[MongoDB.local Bengaluru 2018] Tutorial: Pipeline Power - Doing More with Mon...
[MongoDB.local Bengaluru 2018] Tutorial: Pipeline Power - Doing More with Mon...
 
06. ElasticSearch : Mapping and Analysis
06. ElasticSearch : Mapping and Analysis06. ElasticSearch : Mapping and Analysis
06. ElasticSearch : Mapping and Analysis
 
Mapping Graph Queries to PostgreSQL
Mapping Graph Queries to PostgreSQLMapping Graph Queries to PostgreSQL
Mapping Graph Queries to PostgreSQL
 
The Aggregation Framework
The Aggregation FrameworkThe Aggregation Framework
The Aggregation Framework
 
Analyze one year of radio station songs aired with Spark SQL, Spotify, and Da...
Analyze one year of radio station songs aired with Spark SQL, Spotify, and Da...Analyze one year of radio station songs aired with Spark SQL, Spotify, and Da...
Analyze one year of radio station songs aired with Spark SQL, Spotify, and Da...
 
MongoDB World 2019: Aggregation Pipeline Power++: How MongoDB 4.2 Pipeline Em...
MongoDB World 2019: Aggregation Pipeline Power++: How MongoDB 4.2 Pipeline Em...MongoDB World 2019: Aggregation Pipeline Power++: How MongoDB 4.2 Pipeline Em...
MongoDB World 2019: Aggregation Pipeline Power++: How MongoDB 4.2 Pipeline Em...
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
 
Harnessing The Power of Search - Liferay DEVCON 2015, Darmstadt, Germany
Harnessing The Power of Search - Liferay DEVCON 2015, Darmstadt, GermanyHarnessing The Power of Search - Liferay DEVCON 2015, Darmstadt, Germany
Harnessing The Power of Search - Liferay DEVCON 2015, Darmstadt, Germany
 
MongoDB .local Munich 2019: Aggregation Pipeline Power++: How MongoDB 4.2 Pip...
MongoDB .local Munich 2019: Aggregation Pipeline Power++: How MongoDB 4.2 Pip...MongoDB .local Munich 2019: Aggregation Pipeline Power++: How MongoDB 4.2 Pip...
MongoDB .local Munich 2019: Aggregation Pipeline Power++: How MongoDB 4.2 Pip...
 
MongoDB .local Toronto 2019: Aggregation Pipeline Power++: How MongoDB 4.2 Pi...
MongoDB .local Toronto 2019: Aggregation Pipeline Power++: How MongoDB 4.2 Pi...MongoDB .local Toronto 2019: Aggregation Pipeline Power++: How MongoDB 4.2 Pi...
MongoDB .local Toronto 2019: Aggregation Pipeline Power++: How MongoDB 4.2 Pi...
 
MongoDB .local Chicago 2019: Aggregation Pipeline Power++: How MongoDB 4.2 Pi...
MongoDB .local Chicago 2019: Aggregation Pipeline Power++: How MongoDB 4.2 Pi...MongoDB .local Chicago 2019: Aggregation Pipeline Power++: How MongoDB 4.2 Pi...
MongoDB .local Chicago 2019: Aggregation Pipeline Power++: How MongoDB 4.2 Pi...
 
01 ElasticSearch : Getting Started
01 ElasticSearch : Getting Started01 ElasticSearch : Getting Started
01 ElasticSearch : Getting Started
 
Python crush course
Python crush coursePython crush course
Python crush course
 
Dynamic languages, for software craftmanship group
Dynamic languages, for software craftmanship groupDynamic languages, for software craftmanship group
Dynamic languages, for software craftmanship group
 
A Survey Of R Graphics
A Survey Of R GraphicsA Survey Of R Graphics
A Survey Of R Graphics
 
Webinar: Strongly Typed Languages and Flexible Schemas
Webinar: Strongly Typed Languages and Flexible SchemasWebinar: Strongly Typed Languages and Flexible Schemas
Webinar: Strongly Typed Languages and Flexible Schemas
 

More from NoSQLmatters

Nathan Ford- Divination of the Defects (Graph-Based Defect Prediction through...
Nathan Ford- Divination of the Defects (Graph-Based Defect Prediction through...Nathan Ford- Divination of the Defects (Graph-Based Defect Prediction through...
Nathan Ford- Divination of the Defects (Graph-Based Defect Prediction through...
NoSQLmatters
 
Stefan Hochdörfer - The NoSQL Store everyone ignores: PostgreSQL - NoSQL matt...
Stefan Hochdörfer - The NoSQL Store everyone ignores: PostgreSQL - NoSQL matt...Stefan Hochdörfer - The NoSQL Store everyone ignores: PostgreSQL - NoSQL matt...
Stefan Hochdörfer - The NoSQL Store everyone ignores: PostgreSQL - NoSQL matt...
NoSQLmatters
 
Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015
Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015
Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015
NoSQLmatters
 
Peter Bakas - Zero to Insights - Real time analytics with Kafka, C*, and Spar...
Peter Bakas - Zero to Insights - Real time analytics with Kafka, C*, and Spar...Peter Bakas - Zero to Insights - Real time analytics with Kafka, C*, and Spar...
Peter Bakas - Zero to Insights - Real time analytics with Kafka, C*, and Spar...
NoSQLmatters
 
Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...
Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...
Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...
NoSQLmatters
 
Mark Harwood - Building Entity Centric Indexes - NoSQL matters Dublin 2015
Mark Harwood - Building Entity Centric Indexes - NoSQL matters Dublin 2015Mark Harwood - Building Entity Centric Indexes - NoSQL matters Dublin 2015
Mark Harwood - Building Entity Centric Indexes - NoSQL matters Dublin 2015
NoSQLmatters
 
Prassnitha Sampath - Real Time Big Data Analytics with Kafka, Storm & HBase -...
Prassnitha Sampath - Real Time Big Data Analytics with Kafka, Storm & HBase -...Prassnitha Sampath - Real Time Big Data Analytics with Kafka, Storm & HBase -...
Prassnitha Sampath - Real Time Big Data Analytics with Kafka, Storm & HBase -...
NoSQLmatters
 
Akmal Chaudhri - How to Build Streaming Data Applications: Evaluating the Top...
Akmal Chaudhri - How to Build Streaming Data Applications: Evaluating the Top...Akmal Chaudhri - How to Build Streaming Data Applications: Evaluating the Top...
Akmal Chaudhri - How to Build Streaming Data Applications: Evaluating the Top...
NoSQLmatters
 
Michael Hackstein - NoSQL meets Microservices - NoSQL matters Dublin 2015
Michael Hackstein - NoSQL meets Microservices - NoSQL matters Dublin 2015Michael Hackstein - NoSQL meets Microservices - NoSQL matters Dublin 2015
Michael Hackstein - NoSQL meets Microservices - NoSQL matters Dublin 2015
NoSQLmatters
 
Chris Ward - Understanding databases for distributed docker applications - No...
Chris Ward - Understanding databases for distributed docker applications - No...Chris Ward - Understanding databases for distributed docker applications - No...
Chris Ward - Understanding databases for distributed docker applications - No...
NoSQLmatters
 
Philipp Krenn - Host your database in the cloud, they said... - NoSQL matters...
Philipp Krenn - Host your database in the cloud, they said... - NoSQL matters...Philipp Krenn - Host your database in the cloud, they said... - NoSQL matters...
Philipp Krenn - Host your database in the cloud, they said... - NoSQL matters...
NoSQLmatters
 
Lucian Precup - Back to the Future: SQL 92 for Elasticsearch? - NoSQL matters...
Lucian Precup - Back to the Future: SQL 92 for Elasticsearch? - NoSQL matters...Lucian Precup - Back to the Future: SQL 92 for Elasticsearch? - NoSQL matters...
Lucian Precup - Back to the Future: SQL 92 for Elasticsearch? - NoSQL matters...
NoSQLmatters
 
Bruno Guedes - Hadoop real time for dummies - NoSQL matters Paris 2015
Bruno Guedes - Hadoop real time for dummies - NoSQL matters Paris 2015Bruno Guedes - Hadoop real time for dummies - NoSQL matters Paris 2015
Bruno Guedes - Hadoop real time for dummies - NoSQL matters Paris 2015
NoSQLmatters
 
DuyHai DOAN - Real time analytics with Cassandra and Spark - NoSQL matters Pa...
DuyHai DOAN - Real time analytics with Cassandra and Spark - NoSQL matters Pa...DuyHai DOAN - Real time analytics with Cassandra and Spark - NoSQL matters Pa...
DuyHai DOAN - Real time analytics with Cassandra and Spark - NoSQL matters Pa...
NoSQLmatters
 
Benjamin Guinebertière - Microsoft Azure: Document DB and other noSQL databas...
Benjamin Guinebertière - Microsoft Azure: Document DB and other noSQL databas...Benjamin Guinebertière - Microsoft Azure: Document DB and other noSQL databas...
Benjamin Guinebertière - Microsoft Azure: Document DB and other noSQL databas...
NoSQLmatters
 
David Pilato - Advance search for your legacy application - NoSQL matters Par...
David Pilato - Advance search for your legacy application - NoSQL matters Par...David Pilato - Advance search for your legacy application - NoSQL matters Par...
David Pilato - Advance search for your legacy application - NoSQL matters Par...
NoSQLmatters
 
Tugdual Grall - From SQL to NoSQL in less than 40 min - NoSQL matters Paris 2015
Tugdual Grall - From SQL to NoSQL in less than 40 min - NoSQL matters Paris 2015Tugdual Grall - From SQL to NoSQL in less than 40 min - NoSQL matters Paris 2015
Tugdual Grall - From SQL to NoSQL in less than 40 min - NoSQL matters Paris 2015
NoSQLmatters
 
Gregorry Letribot - Druid at Criteo - NoSQL matters 2015
Gregorry Letribot - Druid at Criteo - NoSQL matters 2015Gregorry Letribot - Druid at Criteo - NoSQL matters 2015
Gregorry Letribot - Druid at Criteo - NoSQL matters 2015
NoSQLmatters
 
Michael Hackstein - Polyglot Persistence & Multi-Model NoSQL Databases - NoSQ...
Michael Hackstein - Polyglot Persistence & Multi-Model NoSQL Databases - NoSQ...Michael Hackstein - Polyglot Persistence & Multi-Model NoSQL Databases - NoSQ...
Michael Hackstein - Polyglot Persistence & Multi-Model NoSQL Databases - NoSQ...
NoSQLmatters
 
Rob Harrop- Key Note The God, the Bad and the Ugly - NoSQL matters Paris 2015
Rob Harrop- Key Note The God, the Bad and the Ugly - NoSQL matters Paris 2015Rob Harrop- Key Note The God, the Bad and the Ugly - NoSQL matters Paris 2015
Rob Harrop- Key Note The God, the Bad and the Ugly - NoSQL matters Paris 2015
NoSQLmatters
 

More from NoSQLmatters (20)

Nathan Ford- Divination of the Defects (Graph-Based Defect Prediction through...
Nathan Ford- Divination of the Defects (Graph-Based Defect Prediction through...Nathan Ford- Divination of the Defects (Graph-Based Defect Prediction through...
Nathan Ford- Divination of the Defects (Graph-Based Defect Prediction through...
 
Stefan Hochdörfer - The NoSQL Store everyone ignores: PostgreSQL - NoSQL matt...
Stefan Hochdörfer - The NoSQL Store everyone ignores: PostgreSQL - NoSQL matt...Stefan Hochdörfer - The NoSQL Store everyone ignores: PostgreSQL - NoSQL matt...
Stefan Hochdörfer - The NoSQL Store everyone ignores: PostgreSQL - NoSQL matt...
 
Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015
Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015
Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015
 
Peter Bakas - Zero to Insights - Real time analytics with Kafka, C*, and Spar...
Peter Bakas - Zero to Insights - Real time analytics with Kafka, C*, and Spar...Peter Bakas - Zero to Insights - Real time analytics with Kafka, C*, and Spar...
Peter Bakas - Zero to Insights - Real time analytics with Kafka, C*, and Spar...
 
Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...
Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...
Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...
 
Mark Harwood - Building Entity Centric Indexes - NoSQL matters Dublin 2015
Mark Harwood - Building Entity Centric Indexes - NoSQL matters Dublin 2015Mark Harwood - Building Entity Centric Indexes - NoSQL matters Dublin 2015
Mark Harwood - Building Entity Centric Indexes - NoSQL matters Dublin 2015
 
Prassnitha Sampath - Real Time Big Data Analytics with Kafka, Storm & HBase -...
Prassnitha Sampath - Real Time Big Data Analytics with Kafka, Storm & HBase -...Prassnitha Sampath - Real Time Big Data Analytics with Kafka, Storm & HBase -...
Prassnitha Sampath - Real Time Big Data Analytics with Kafka, Storm & HBase -...
 
Akmal Chaudhri - How to Build Streaming Data Applications: Evaluating the Top...
Akmal Chaudhri - How to Build Streaming Data Applications: Evaluating the Top...Akmal Chaudhri - How to Build Streaming Data Applications: Evaluating the Top...
Akmal Chaudhri - How to Build Streaming Data Applications: Evaluating the Top...
 
Michael Hackstein - NoSQL meets Microservices - NoSQL matters Dublin 2015
Michael Hackstein - NoSQL meets Microservices - NoSQL matters Dublin 2015Michael Hackstein - NoSQL meets Microservices - NoSQL matters Dublin 2015
Michael Hackstein - NoSQL meets Microservices - NoSQL matters Dublin 2015
 
Chris Ward - Understanding databases for distributed docker applications - No...
Chris Ward - Understanding databases for distributed docker applications - No...Chris Ward - Understanding databases for distributed docker applications - No...
Chris Ward - Understanding databases for distributed docker applications - No...
 
Philipp Krenn - Host your database in the cloud, they said... - NoSQL matters...
Philipp Krenn - Host your database in the cloud, they said... - NoSQL matters...Philipp Krenn - Host your database in the cloud, they said... - NoSQL matters...
Philipp Krenn - Host your database in the cloud, they said... - NoSQL matters...
 
Lucian Precup - Back to the Future: SQL 92 for Elasticsearch? - NoSQL matters...
Lucian Precup - Back to the Future: SQL 92 for Elasticsearch? - NoSQL matters...Lucian Precup - Back to the Future: SQL 92 for Elasticsearch? - NoSQL matters...
Lucian Precup - Back to the Future: SQL 92 for Elasticsearch? - NoSQL matters...
 
Bruno Guedes - Hadoop real time for dummies - NoSQL matters Paris 2015
Bruno Guedes - Hadoop real time for dummies - NoSQL matters Paris 2015Bruno Guedes - Hadoop real time for dummies - NoSQL matters Paris 2015
Bruno Guedes - Hadoop real time for dummies - NoSQL matters Paris 2015
 
DuyHai DOAN - Real time analytics with Cassandra and Spark - NoSQL matters Pa...
DuyHai DOAN - Real time analytics with Cassandra and Spark - NoSQL matters Pa...DuyHai DOAN - Real time analytics with Cassandra and Spark - NoSQL matters Pa...
DuyHai DOAN - Real time analytics with Cassandra and Spark - NoSQL matters Pa...
 
Benjamin Guinebertière - Microsoft Azure: Document DB and other noSQL databas...
Benjamin Guinebertière - Microsoft Azure: Document DB and other noSQL databas...Benjamin Guinebertière - Microsoft Azure: Document DB and other noSQL databas...
Benjamin Guinebertière - Microsoft Azure: Document DB and other noSQL databas...
 
David Pilato - Advance search for your legacy application - NoSQL matters Par...
David Pilato - Advance search for your legacy application - NoSQL matters Par...David Pilato - Advance search for your legacy application - NoSQL matters Par...
David Pilato - Advance search for your legacy application - NoSQL matters Par...
 
Tugdual Grall - From SQL to NoSQL in less than 40 min - NoSQL matters Paris 2015
Tugdual Grall - From SQL to NoSQL in less than 40 min - NoSQL matters Paris 2015Tugdual Grall - From SQL to NoSQL in less than 40 min - NoSQL matters Paris 2015
Tugdual Grall - From SQL to NoSQL in less than 40 min - NoSQL matters Paris 2015
 
Gregorry Letribot - Druid at Criteo - NoSQL matters 2015
Gregorry Letribot - Druid at Criteo - NoSQL matters 2015Gregorry Letribot - Druid at Criteo - NoSQL matters 2015
Gregorry Letribot - Druid at Criteo - NoSQL matters 2015
 
Michael Hackstein - Polyglot Persistence & Multi-Model NoSQL Databases - NoSQ...
Michael Hackstein - Polyglot Persistence & Multi-Model NoSQL Databases - NoSQ...Michael Hackstein - Polyglot Persistence & Multi-Model NoSQL Databases - NoSQ...
Michael Hackstein - Polyglot Persistence & Multi-Model NoSQL Databases - NoSQ...
 
Rob Harrop- Key Note The God, the Bad and the Ugly - NoSQL matters Paris 2015
Rob Harrop- Key Note The God, the Bad and the Ugly - NoSQL matters Paris 2015Rob Harrop- Key Note The God, the Bad and the Ugly - NoSQL matters Paris 2015
Rob Harrop- Key Note The God, the Bad and the Ugly - NoSQL matters Paris 2015
 

Recently uploaded

一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
eoxhsaa
 
一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理
一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理
一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理
1tyxnjpia
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
ytypuem
 
Digital Marketing Performance Marketing Sample .pdf
Digital Marketing Performance Marketing  Sample .pdfDigital Marketing Performance Marketing  Sample .pdf
Digital Marketing Performance Marketing Sample .pdf
Vineet
 
Drownings spike from May to August in children
Drownings spike from May to August in childrenDrownings spike from May to August in children
Drownings spike from May to August in children
Bisnar Chase Personal Injury Attorneys
 
REUSE-SCHOOL-DATA-INTEGRATED-SYSTEMS.pptx
REUSE-SCHOOL-DATA-INTEGRATED-SYSTEMS.pptxREUSE-SCHOOL-DATA-INTEGRATED-SYSTEMS.pptx
REUSE-SCHOOL-DATA-INTEGRATED-SYSTEMS.pptx
KiriakiENikolaidou
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
Timothy Spann
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
ihavuls
 
ML-PPT-UNIT-2 Generative Classifiers Discriminative Classifiers
ML-PPT-UNIT-2 Generative Classifiers Discriminative ClassifiersML-PPT-UNIT-2 Generative Classifiers Discriminative Classifiers
ML-PPT-UNIT-2 Generative Classifiers Discriminative Classifiers
MastanaihnaiduYasam
 
一比一原版悉尼大学毕业证如何办理
一比一原版悉尼大学毕业证如何办理一比一原版悉尼大学毕业证如何办理
一比一原版悉尼大学毕业证如何办理
keesa2
 
Data Scientist Machine Learning Profiles .pdf
Data Scientist Machine Learning  Profiles .pdfData Scientist Machine Learning  Profiles .pdf
Data Scientist Machine Learning Profiles .pdf
Vineet
 
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
hyfjgavov
 
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docxDATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
SaffaIbrahim1
 
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
ywqeos
 
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
asyed10
 
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Marlon Dumas
 
Cell The Unit of Life for NEET Multiple Choice Questions.docx
Cell The Unit of Life for NEET Multiple Choice Questions.docxCell The Unit of Life for NEET Multiple Choice Questions.docx
Cell The Unit of Life for NEET Multiple Choice Questions.docx
vasanthatpuram
 
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
uevausa
 
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCAModule 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
yuvarajkumar334
 
Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024
ElizabethGarrettChri
 

Recently uploaded (20)

一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
一比一原版多伦多大学毕业证(UofT毕业证书)学历如何办理
 
一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理
一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理
一比一原版(Sheffield毕业证书)谢菲尔德大学毕业证如何办理
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
一比一原版(曼大毕业证书)曼尼托巴大学毕业证如何办理
 
Digital Marketing Performance Marketing Sample .pdf
Digital Marketing Performance Marketing  Sample .pdfDigital Marketing Performance Marketing  Sample .pdf
Digital Marketing Performance Marketing Sample .pdf
 
Drownings spike from May to August in children
Drownings spike from May to August in childrenDrownings spike from May to August in children
Drownings spike from May to August in children
 
REUSE-SCHOOL-DATA-INTEGRATED-SYSTEMS.pptx
REUSE-SCHOOL-DATA-INTEGRATED-SYSTEMS.pptxREUSE-SCHOOL-DATA-INTEGRATED-SYSTEMS.pptx
REUSE-SCHOOL-DATA-INTEGRATED-SYSTEMS.pptx
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
 
ML-PPT-UNIT-2 Generative Classifiers Discriminative Classifiers
ML-PPT-UNIT-2 Generative Classifiers Discriminative ClassifiersML-PPT-UNIT-2 Generative Classifiers Discriminative Classifiers
ML-PPT-UNIT-2 Generative Classifiers Discriminative Classifiers
 
一比一原版悉尼大学毕业证如何办理
一比一原版悉尼大学毕业证如何办理一比一原版悉尼大学毕业证如何办理
一比一原版悉尼大学毕业证如何办理
 
Data Scientist Machine Learning Profiles .pdf
Data Scientist Machine Learning  Profiles .pdfData Scientist Machine Learning  Profiles .pdf
Data Scientist Machine Learning Profiles .pdf
 
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
 
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docxDATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
 
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
 
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
一比一原版美国帕森斯设计学院毕业证(parsons毕业证书)如何办理
 
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
 
Cell The Unit of Life for NEET Multiple Choice Questions.docx
Cell The Unit of Life for NEET Multiple Choice Questions.docxCell The Unit of Life for NEET Multiple Choice Questions.docx
Cell The Unit of Life for NEET Multiple Choice Questions.docx
 
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
 
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCAModule 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
 
Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024
 

Max Neunhöffer – Joins and aggregations in a distributed NoSQL DB - NoSQL matters Barcelona 2014

  • 1. Joins and aggregations in a distributed NoSQL DB NoSQLmatters, Barcelona, 22 November 2014 Max Neunhöffer www.arangodb.com
  • 2. Documents and collections { "_key": "123456", "_id": "chars/123456", "name": "Duck", "firstname": "Donald", "dob": "1934-11-13", "hobbies": ["Golf", "Singing", "Running"], "home": {"town": "Duck town", "street": "Lake Road", "number": 17}, "species": "duck" } 1
  • 3. Documents and collections { "_key": "123456", "_id": "chars/123456", "name": "Duck", "firstname": "Donald", "dob": "1934-11-13", "hobbies": ["Golf", "Singing", "Running"], "home": {"town": "Duck town", "street": "Lake Road", "number": 17}, "species": "duck" } When I say “document”, I mean “JSON”. 1
  • 4. Documents and collections { "_key": "123456", "_id": "chars/123456", "name": "Duck", "firstname": "Donald", "dob": "1934-11-13", "hobbies": ["Golf", "Singing", "Running"], "home": {"town": "Duck town", "street": "Lake Road", "number": 17}, "species": "duck" } When I say “document”, I mean “JSON”. A “collection” is a set of documents in a DB. 1
  • 5. Documents and collections { "_key": "123456", "_id": "chars/123456", "name": "Duck", "firstname": "Donald", "dob": "1934-11-13", "hobbies": ["Golf", "Singing", "Running"], "home": {"town": "Duck town", "street": "Lake Road", "number": 17}, "species": "duck" } When I say “document”, I mean “JSON”. A “collection” is a set of documents in a DB. The DB can inspect the values, allowing for secondary indexes. 1
  • 6. Documents and collections { "_key": "123456", "_id": "chars/123456", "name": "Duck", "firstname": "Donald", "dob": "1934-11-13", "hobbies": ["Golf", "Singing", "Running"], "home": {"town": "Duck town", "street": "Lake Road", "number": 17}, "species": "duck" } When I say “document”, I mean “JSON”. A “collection” is a set of documents in a DB. The DB can inspect the values, allowing for secondary indexes. Or one can just treat the DB as a key/value store. 1
  • 7. Documents and collections { "_key": "123456", "_id": "chars/123456", "name": "Duck", "firstname": "Donald", "dob": "1934-11-13", "hobbies": ["Golf", "Singing", "Running"], "home": {"town": "Duck town", "street": "Lake Road", "number": 17}, "species": "duck" } When I say “document”, I mean “JSON”. A “collection” is a set of documents in a DB. The DB can inspect the values, allowing for secondary indexes. Or one can just treat the DB as a key/value store. Sharding: the data of a collection is distributed between multiple servers. 1
  • 8. Graphs A B D E F G C "likes" "hates" 2
  • 9. Graphs A B D E F G C "likes" "hates" A graph consists of vertices and edges. 2
  • 10. Graphs A B D E F G C "likes" "hates" A graph consists of vertices and edges. Graphs model relations, can be directed or undirected. 2
  • 11. Graphs A B D E F G C "likes" "hates" A graph consists of vertices and edges. Graphs model relations, can be directed or undirected. Vertices and edges are documents. 2
  • 12. Graphs A B D E F G C "likes" "hates" A graph consists of vertices and edges. Graphs model relations, can be directed or undirected. Vertices and edges are documents. Every edge has a _from and a _to attribute. 2
  • 13. Graphs A B D E F G C "likes" "hates" A graph consists of vertices and edges. Graphs model relations, can be directed or undirected. Vertices and edges are documents. Every edge has a _from and a _to attribute. The database offers queries and transactions dealing with graphs. 2
  • 14. Graphs A B D E F G C "likes" "hates" A graph consists of vertices and edges. Graphs model relations, can be directed or undirected. Vertices and edges are documents. Every edge has a _from and a _to attribute. The database offers queries and transactions dealing with graphs. For example, paths in the graph are interesting. 2
  • 15. Query 1 Fetch all documents in a collection FOR p IN people RETURN p 3
  • 16. Query 1 Fetch all documents in a collection FOR p IN people RETURN p [ { "name": "Schmidt", "firstname": "Helmut", "hobbies": ["Smoking"]}, { "name": "Neunhöffer", "firstname": "Max", "hobbies": ["Piano", "Golf"]}, ... ] 3
  • 17. Query 1 Fetch all documents in a collection FOR p IN people RETURN p [ { "name": "Schmidt", "firstname": "Helmut", "hobbies": ["Smoking"]}, { "name": "Neunhöffer", "firstname": "Max", "hobbies": ["Piano", "Golf"]}, ... ] (Actually, a cursor is returned.) 3
  • 18. Query 2 Use 1ltering, sorting and limit FOR p IN people FILTER p.age >= @minage SORT p.name, p.firstname LIMIT @nrlimit RETURN { name: CONCAT(p.name, ", ", p.firstname), age : p.age } 4
  • 19. Query 2 Use 1ltering, sorting and limit FOR p IN people FILTER p.age >= @minage SORT p.name, p.firstname LIMIT @nrlimit RETURN { name: CONCAT(p.name, ", ", p.firstname), age : p.age } [ { "name": "Neunhöffer, Max", "age": 44 }, { "name": "Schmidt, Helmut", "age": 95 }, ... ] 4
  • 20. Query 3 Aggregation and functions FOR p IN people COLLECT a = p.age INTO L FILTER a >= @minage RETURN { "age": a, "number": LENGTH(L) } 5
  • 21. Query 3 Aggregation and functions FOR p IN people COLLECT a = p.age INTO L FILTER a >= @minage RETURN { "age": a, "number": LENGTH(L) } [ { "age": 18, "number": 10 }, { "age": 19, "number": 17 }, { "age": 20, "number": 12 }, ... ] 5
  • 22. Query 4 Joins FOR p IN @@peoplecollection FOR h IN houses FILTER p._key == h.owner SORT h.streetname, h.housename RETURN { housename: h.housename, streetname: h.streetname, owner: p.name, value: h.value } 6
  • 23. Query 4 Joins FOR p IN @@peoplecollection FOR h IN houses FILTER p._key == h.owner SORT h.streetname, h.housename RETURN { housename: h.housename, streetname: h.streetname, owner: p.name, value: h.value } [ { "housename": "Firlefanz", "streetname": "Meyer street", "owner": "Hans Schmidt", "value": 423000 }, ... ] 6
  • 24. Query 5 Modifying data FOR e IN events FILTER e.timestamp < "2014-09-01T09:53+0200" INSERT e IN oldevents FOR e IN events FILTER e.timestamp < "2014-09-01T09:53+0200" REMOVE e._key IN events 7
  • 25. Query 6 Graph queries FOR x IN GRAPH_SHORTEST_PATH( "routeplanner", "germanCity/Cologne", "frenchCity/Paris", {weight: "distance"} ) RETURN { begin : x.startVertex, end : x.vertex, distance : x.distance, nrPaths : LENGTH(x.paths) } 8
  • 26. Query 6 Graph queries FOR x IN GRAPH_SHORTEST_PATH( "routeplanner", "germanCity/Cologne", "frenchCity/Paris", {weight: "distance"} ) RETURN { begin : x.startVertex, end : x.vertex, distance : x.distance, nrPaths : LENGTH(x.paths) } [ { "begin": "germanCity/Cologne", "end" : {"_id": "frenchCity/Paris", ... }, "distance": 550, "nrPaths": 10 }, ... ] 8
  • 27. Life of a query Text and query parameters come from user 9
  • 28. Life of a query Text and query parameters come from user Parse text, produce abstract syntax tree (AST) 9
  • 29. Life of a query Text and query parameters come from user Parse text, produce abstract syntax tree (AST) Substitute query parameters 9
  • 30. Life of a query Text and query parameters come from user Parse text, produce abstract syntax tree (AST) Substitute query parameters First optimisation: constant expressions, etc. 9
  • 31. Life of a query Text and query parameters come from user Parse text, produce abstract syntax tree (AST) Substitute query parameters First optimisation: constant expressions, etc. Translate AST into an execution plan (EXP) 9
  • 32. Life of a query Text and query parameters come from user Parse text, produce abstract syntax tree (AST) Substitute query parameters First optimisation: constant expressions, etc. Translate AST into an execution plan (EXP) Optimise one EXP, produce many, potentially better EXPs 9
  • 33. Life of a query Text and query parameters come from user Parse text, produce abstract syntax tree (AST) Substitute query parameters First optimisation: constant expressions, etc. Translate AST into an execution plan (EXP) Optimise one EXP, produce many, potentially better EXPs Reason about distribution in cluster 9
  • 34. Life of a query Text and query parameters come from user Parse text, produce abstract syntax tree (AST) Substitute query parameters First optimisation: constant expressions, etc. Translate AST into an execution plan (EXP) Optimise one EXP, produce many, potentially better EXPs Reason about distribution in cluster Optimise distributed EXPs 9
  • 35. Life of a query Text and query parameters come from user Parse text, produce abstract syntax tree (AST) Substitute query parameters First optimisation: constant expressions, etc. Translate AST into an execution plan (EXP) Optimise one EXP, produce many, potentially better EXPs Reason about distribution in cluster Optimise distributed EXPs Estimate costs for all EXPs, and sort by ascending cost 9
  • 36. Life of a query Text and query parameters come from user Parse text, produce abstract syntax tree (AST) Substitute query parameters First optimisation: constant expressions, etc. Translate AST into an execution plan (EXP) Optimise one EXP, produce many, potentially better EXPs Reason about distribution in cluster Optimise distributed EXPs Estimate costs for all EXPs, and sort by ascending cost Instanciate “cheapest” plan, i.e. set up execution engine 9
  • 37. Life of a query Text and query parameters come from user Parse text, produce abstract syntax tree (AST) Substitute query parameters First optimisation: constant expressions, etc. Translate AST into an execution plan (EXP) Optimise one EXP, produce many, potentially better EXPs Reason about distribution in cluster Optimise distributed EXPs Estimate costs for all EXPs, and sort by ascending cost Instanciate “cheapest” plan, i.e. set up execution engine Distribute and link up engines on different servers 9
  • 38. Life of a query Text and query parameters come from user Parse text, produce abstract syntax tree (AST) Substitute query parameters First optimisation: constant expressions, etc. Translate AST into an execution plan (EXP) Optimise one EXP, produce many, potentially better EXPs Reason about distribution in cluster Optimise distributed EXPs Estimate costs for all EXPs, and sort by ascending cost Instanciate “cheapest” plan, i.e. set up execution engine Distribute and link up engines on different servers Execute plan, provide cursor API 9
  • 39. Execution plans FOR a IN collA LET xx = a.x FOR b IN collB RETURN {x: a.x, z: b.z} Singleton EnumerateCollection a Calculation xx EnumerateCollection b Calculation xx == b.y Filter xx == b.y Calc {x: a.x, z: b.z} Return {x: a.x, z: b.z} FILTER xx == b.y Query ! EXP 10
  • 40. Execution plans FOR a IN collA LET xx = a.x FOR b IN collB RETURN {x: a.x, z: b.z} Singleton EnumerateCollection a Calculation xx EnumerateCollection b Calculation xx == b.y Filter xx == b.y Calc {x: a.x, z: b.z} Return {x: a.x, z: b.z} FILTER xx == b.y Query ! EXP Black arrows are dependencies 10
  • 41. Execution plans FOR a IN collA LET xx = a.x FOR b IN collB RETURN {x: a.x, z: b.z} Singleton EnumerateCollection a Calculation xx EnumerateCollection b Calculation xx == b.y Filter xx == b.y Calc {x: a.x, z: b.z} Return {x: a.x, z: b.z} FILTER xx == b.y Query ! EXP Black arrows are dependencies Think of a pipeline 10
  • 42. Execution plans FOR a IN collA LET xx = a.x FOR b IN collB RETURN {x: a.x, z: b.z} Singleton EnumerateCollection a Calculation xx EnumerateCollection b Calculation xx == b.y Filter xx == b.y Calc {x: a.x, z: b.z} Return {x: a.x, z: b.z} FILTER xx == b.y Query ! EXP Black arrows are dependencies Think of a pipeline Each node provides a cursor API 10
  • 43. Execution plans FOR a IN collA LET xx = a.x FOR b IN collB RETURN {x: a.x, z: b.z} Singleton EnumerateCollection a Calculation xx EnumerateCollection b Calculation xx == b.y Filter xx == b.y Calc {x: a.x, z: b.z} Return {x: a.x, z: b.z} FILTER xx == b.y Query ! EXP Black arrows are dependencies Think of a pipeline Each node provides a cursor API Blocks of “Items” travel through the pipeline 10
  • 44. Execution plans FOR a IN collA LET xx = a.x FOR b IN collB RETURN {x: a.x, z: b.z} Singleton EnumerateCollection a Calculation xx EnumerateCollection b Calculation xx == b.y Filter xx == b.y Calc {x: a.x, z: b.z} Return {x: a.x, z: b.z} FILTER xx == b.y Query ! EXP Black arrows are dependencies Think of a pipeline Each node provides a cursor API Blocks of “Items” travel through the pipeline What is an “item”??? 10
  • 45. Pipeline and items Singleton FOR a IN collA EnumerateCollection a LET xx = a.x Calculation xx Items have vars a, xx EnumerateCollection b FOR b IN collB Items have no vars Items are the thingies traveling through the pipeline. 11
  • 46. Pipeline and items Singleton FOR a IN collA EnumerateCollection a LET xx = a.x Calculation xx Items have vars a, xx EnumerateCollection b FOR b IN collB Items have no vars Items are the thingies traveling through the pipeline. An item holds values of those variables in the current frame 11
  • 47. Pipeline and items Singleton FOR a IN collA EnumerateCollection a LET xx = a.x Calculation xx Items have vars a, xx EnumerateCollection b FOR b IN collB Items have no vars Items are the thingies traveling through the pipeline. An item holds values of those variables in the current frame Thus: Items look differently in different parts of the plan 11
  • 48. Pipeline and items Singleton FOR a IN collA EnumerateCollection a LET xx = a.x Calculation xx Items have vars a, xx EnumerateCollection b FOR b IN collB Items have no vars Items are the thingies traveling through the pipeline. An item holds values of those variables in the current frame Thus: Items look differently in different parts of the plan We always deal with blocks of items for performance reasons 11
  • 49. Execution plans FOR a IN collA LET xx = a.x FOR b IN collB RETURN {x: a.x, z: b.z} Singleton EnumerateCollection a Calculation xx EnumerateCollection b Calculation xx == b.y Filter xx == b.y Calc {x: a.x, z: b.z} Return {x: a.x, z: b.z} FILTER xx == b.y 12
  • 50. Move 1lters up FOR a IN collA FOR b IN collB FILTER a.x == 10 FILTER a.u == b.v RETURN {u:a.u,w:b.w} Singleton EnumColl a EnumColl b Calc a.x == 10 Filter a.x == 10 Calc a.u == b.v Filter a.u == b.v Return {u:a.u,w:b.w} 13
  • 51. Move 1lters up FOR a IN collA FOR b IN collB FILTER a.x == 10 FILTER a.u == b.v RETURN {u:a.u,w:b.w} The result and behaviour does not change, if the 1rst FILTER is pulled out of the inner FOR. Singleton EnumColl a EnumColl b Calc a.x == 10 Filter a.x == 10 Calc a.u == b.v Filter a.u == b.v Return {u:a.u,w:b.w} 13
  • 52. Move 1lters up FOR a IN collA FILTER a.x < 10 FOR b IN collB FILTER a.u == b.v RETURN {u:a.u,w:b.w} The result and behaviour does not change, if the 1rst FILTER is pulled out of the inner FOR. However, the number of items trave-ling in the pipeline is decreased. Singleton EnumColl a Calc a.x == 10 Filter a.x == 10 EnumColl b Calc a.u == b.v Filter a.u == b.v Return {u:a.u,w:b.w} 13
  • 53. Move 1lters up FOR a IN collA FILTER a.x < 10 FOR b IN collB FILTER a.u == b.v RETURN {u:a.u,w:b.w} The result and behaviour does not change, if the 1rst FILTER is pulled out of the inner FOR. However, the number of items trave-ling in the pipeline is decreased. Note that the two FOR statements could be interchanged! Singleton EnumColl a Calc a.x == 10 Filter a.x == 10 EnumColl b Calc a.u == b.v Filter a.u == b.v Return {u:a.u,w:b.w} 13
  • 54. Remove unnecessary calculations FOR a IN collA LET L = LENGTH(a.hobbies) FOR b IN collB FILTER a.u == b.v RETURN {h:a.hobbies,w:b.w} Singleton EnumColl a Calc L = ... EnumColl b Calc a.u == b.v Filter a.u == b.v Return {...} 14
  • 55. Remove unnecessary calculations FOR a IN collA LET L = LENGTH(a.hobbies) FOR b IN collB FILTER a.u == b.v RETURN {h:a.hobbies,w:b.w} The Calculation of L is unnecessary! Singleton EnumColl a Calc L = ... EnumColl b Calc a.u == b.v Filter a.u == b.v Return {...} 14
  • 56. Remove unnecessary calculations FOR a IN collA FOR b IN collB FILTER a.u == b.v RETURN {h:a.hobbies,w:b.w} The Calculation of L is unnecessary! (since it cannot throw an exception). Singleton EnumColl a EnumColl b Calc a.u == b.v Filter a.u == b.v Return {...} 14
  • 57. Remove unnecessary calculations FOR a IN collA FOR b IN collB FILTER a.u == b.v RETURN {h:a.hobbies,w:b.w} The Calculation of L is unnecessary! (since it cannot throw an exception). Therefore we can just leave it out. Singleton EnumColl a EnumColl b Calc a.u == b.v Filter a.u == b.v Return {...} 14
  • 58. Use index for FILTER and SORT FOR a IN collA FILTER a.x > 17 && a.x <= 23 && a.y == 10 SORT a.y, a.x RETURN a Singleton EnumColl a Calc ... Filter ... Sort a.y, a.x Return a 15
  • 59. Use index for FILTER and SORT FOR a IN collA FILTER a.x > 17 && a.x <= 23 && a.y == 10 SORT a.y, a.x RETURN a Assume collA has a skiplist index on “y” and “x” (in this order), Singleton EnumColl a Calc ... Filter ... Sort a.y, a.x Return a 15
  • 60. Use index for FILTER and SORT FOR a IN collA FILTER a.x > 17 && a.x <= 23 && a.y == 10 SORT a.y, a.x RETURN a Assume collA has a skiplist index on “y” and “x” (in this order), then we can read off the half-open interval between { y: 10, x: 17 } and { y: 10, x: 23 } from the skiplist index. Singleton IndexRange a Sort a.y, a.x Return a 15
  • 61. Use index for FILTER and SORT FOR a IN collA FILTER a.x > 17 && a.x <= 23 && a.y == 10 SORT a.y, a.x RETURN a Assume collA has a skiplist index on “y” and “x” (in this order), then we can read off the half-open interval between { y: 10, x: 17 } and { y: 10, x: 23 } from the skiplist index. The result will automatically be sorted by y and then by x. Singleton IndexRange a Return a 15
  • 62. Data distribution in a cluster Requests Coordinator Coordinator DBserver DBserver DBserver 1 4 2 5 3 1 The shards of a collection are distributed across the DB servers. 16
  • 63. Data distribution in a cluster Requests Coordinator Coordinator DBserver DBserver DBserver 1 4 2 5 3 1 The shards of a collection are distributed across the DB servers. The coordinators receive queries and organise their execution 16
  • 65. Scatter/gather Remote Remote EnumShard Remote EnumShard Remote Concat/Merge Remote EnumShard Remote Scatter 17
  • 66. Scatter/gather Remote Remote EnumShard Remote EnumShard Remote Concat/Merge Remote EnumShard Remote Scatter 17
  • 67. Modifying queries Fortunately: There can be at most one modifying node in each query. There can be no modifying nodes in subqueries. 18
  • 68. Modifying queries Fortunately: There can be at most one modifying node in each query. There can be no modifying nodes in subqueries. Modifying nodes The modifying node in a query is executed on the DBservers, 18
  • 69. Modifying queries Fortunately: There can be at most one modifying node in each query. There can be no modifying nodes in subqueries. Modifying nodes The modifying node in a query is executed on the DBservers, to this end, we either scatter the items to all DBservers, or, if possible, we distribute each item to the shard that is responsible for the modi1cation. 18
  • 70. Modifying queries Fortunately: There can be at most one modifying node in each query. There can be no modifying nodes in subqueries. Modifying nodes The modifying node in a query is executed on the DBservers, to this end, we either scatter the items to all DBservers, or, if possible, we distribute each item to the shard that is responsible for the modi1cation. Sometimes, we can even optimise away a gather/scatter combination and parallelise completely. 18