Copyright © ArangoDB Inc., 2018
One Engine, one Query Language.
Multiple Data Models.
About me
Hello, my name is Jan!
I work for ArangoDB Inc. in Cologne, DE
I am one of the developers of ArangoDB,
the distributed, multi-model database
Running complex queries in a distributed system
Database trade-offs
Until recently, there was a trade-off to consider when choosing an OLTP database:
● traditional relational: complex queries, joins; transactional guarantees
● “NoSQL”: highly available; scalable
Database trends
In the last few years, there has been a trend towards distributed databases adopting complex query functionality and transactions
● traditional relational: complex queries, joins; transactional guarantees
● “NoSQL”: highly available; scalable
● “NewSQL” (insert buzzword of choice): highly available; scalable; transactional guarantees; complex queries, joins
Agenda
● Distributed databases primer
● Organizing queries in a distributed database
● Distributed ACID transactions
● Q & A
Today I will only consider OLTP databases
Sorry, no Spark/Hadoop!
Distributed databases primer
Distributed databases
A distributed database is a cluster of database nodes
The overall dataset is partitioned into smaller chunks (“shards”)
Adding new nodes to the database increases its capacity (scale out)
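To make the key-to-shard-to-node mapping concrete, here is a minimal Python sketch. It assumes hash-based sharding on the document key with a fixed shard count; the CRC32 hash and the `SHARD_TO_NODE` table (mirroring the 3-node, 7-shard example on the next slide) are illustrative choices, not ArangoDB's actual implementation.

```python
import zlib

# Hypothetical layout matching the 3-node, 7-shard example: shard -> node.
SHARD_TO_NODE = {
    "S1": "A", "S2": "A",
    "S3": "B", "S4": "B",
    "S5": "C", "S6": "C", "S7": "C",
}
NUM_SHARDS = len(SHARD_TO_NODE)


def shard_for_key(key: str, num_shards: int = NUM_SHARDS) -> str:
    """Map a document key to a shard name with a stable hash.

    CRC32 gives the same result in every process, unlike Python's built-in
    hash() for strings, which is randomized per process.
    """
    return f"S{zlib.crc32(key.encode('utf-8')) % num_shards + 1}"


def node_for_key(key: str) -> str:
    """Find the node that currently stores the shard a key belongs to."""
    return SHARD_TO_NODE[shard_for_key(key)]
```

Because keys map to a fixed set of shards rather than directly to nodes, scaling out only means moving whole shards between nodes; the key-to-shard mapping itself does not change.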
Sharding example
3 nodes (A, B, C), 7 shards (S1, S2, S3, S4, S5, S6, S7)
node A: shards S1, S2
node B: shards S3, S4
node C: shards S5, S6, S7
Adding a node = increased capacity
4 nodes (A, B, C, D), 8 shards (S1, S2, S3, S4, S5, S6, S7, S8)
node A: shards S1, S2
node B: shards S3, S4
node C: shards S5, S6
node D: shards S7, S8
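The move from 3 to 4 nodes can be reproduced with a simple greedy rebalancing sketch. The least-loaded-node rule and the `add_node`/`rebalance` helpers are assumptions for illustration; real systems also weigh shard sizes, replica locations and network cost.

```python
def rebalance(placement):
    """Move shards from the fullest to the emptiest node until the shard
    counts of all nodes differ by at most one (mutates placement)."""
    while True:
        fullest = max(placement, key=lambda node: len(placement[node]))
        emptiest = min(placement, key=lambda node: len(placement[node]))
        if len(placement[fullest]) - len(placement[emptiest]) <= 1:
            return
        # In a real cluster this step copies the shard's data over the network.
        placement[emptiest].append(placement[fullest].pop())


def add_node(placement, new_node, new_shards=()):
    """Return a new placement with an extra (empty) node, any new shards put
    on the least-loaded node, and the result rebalanced."""
    result = {node: list(shards) for node, shards in placement.items()}
    result[new_node] = []
    for shard in new_shards:
        target = min(result, key=lambda node: len(result[node]))
        result[target].append(shard)
    rebalance(result)
    return result
```

Starting from the 3-node layout and adding node D with a new shard S8, the sketch places S8 on D and then moves S7 from C to D, matching the slide.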
What about data loss?
3 nodes (A, B, C), 7 shards (S1, S2, S3, S4, S5, S6, S7)
node A: shards S1, S2
node B: shards S3, S4
node C: shards S5, S6, S7
Node failure = data loss
3 nodes (A, B, C), 7 shards (S1, S2, S3, S4, S5, S6, S7)
node A: shards S1, S2
node B: shards S3, S4
node C: shards S5, S6, S7
Shards example with replicas
3 nodes (A, B, C), 7 shards (S1, S2, S3, S4, S5, S6, S7)
node A: shards S1, S2; replicas S4, S6, S7
node B: shards S3, S4; replicas S2, S5
node C: shards S5, S6, S7; replicas S1, S3
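A quick way to check a replica layout like this one is to collect, for each shard, the set of nodes holding a copy. This sketch (helper names are hypothetical) verifies that every shard lives on at least two distinct nodes, so that no single node failure loses data.

```python
# Layout from the replica example: node -> shards it leads / replicates.
LEADERS = {"A": {"S1", "S2"}, "B": {"S3", "S4"}, "C": {"S5", "S6", "S7"}}
REPLICAS = {"A": {"S4", "S6", "S7"}, "B": {"S2", "S5"}, "C": {"S1", "S3"}}


def copies_by_shard(leaders, replicas):
    """Return {shard: set of nodes holding a copy (leader or replica)}."""
    copies = {}
    for layout in (leaders, replicas):
        for node, shards in layout.items():
            for shard in shards:
                copies.setdefault(shard, set()).add(node)
    return copies


def is_fault_tolerant(leaders, replicas):
    """True if every shard's copies are spread over at least two nodes."""
    return all(len(nodes) >= 2 for nodes in copies_by_shard(leaders, replicas).values())


def lost_shards(leaders, replicas, failed_node):
    """Shards with no surviving copy if failed_node goes down."""
    return {shard for shard, nodes in copies_by_shard(leaders, replicas).items()
            if nodes <= {failed_node}}
```

Without replicas, `lost_shards(LEADERS, {}, "B")` yields S3 and S4, which is the data loss shown on the previous slide.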
Node failure with a replica setup
3 nodes (A, B, C), 7 shards (S1, S2, S3, S4, S5, S6, S7); node B fails
node A: shards S1, S2; replicas S4, S6, S7
node B (failed): shards S3, S4; replicas S2, S5
node C: shards S5, S6, S7; replicas S1, S3
Promoting replicas
2 nodes (A, C), 7 shards (S1, S2, S3, S4, S5, S6, S7)
node A: shards S1, S2, S4 (S4 promoted from replica); replicas S6, S7
node B (failed): shards S3, S4; replicas S2, S5
node C: shards S3, S5, S6, S7 (S3 promoted from replica); replicas S1
Creating new replicas
2 nodes (A, C), 7 shards (S1, S2, S3, S4, S5, S6, S7)
node A: shards S1, S2, S4; replicas S3, S5, S6, S7
node B (failed): shards S3, S4; replicas S2, S5
node C: shards S3, S5, S6, S7; replicas S1, S2, S4
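The promote-then-re-replicate sequence from the last three slides can be sketched as follows. It assumes each shard had exactly one replica on another node and picks the least-loaded surviving node for new replicas; both are illustrative assumptions. With node B failing, the sketch reproduces the layouts on the slides.

```python
def fail_over(leaders, replicas, failed):
    """Handle the loss of node `failed`.

    leaders / replicas map node -> set of shard names. Returns new
    (leaders, replicas) dicts for the surviving nodes only.
    """
    alive = [node for node in leaders if node != failed]
    new_leaders = {node: set(leaders[node]) for node in alive}
    new_replicas = {node: set(replicas.get(node, ())) for node in alive}

    # Step 1: promote the surviving replica of every shard the failed node led.
    for shard in sorted(leaders[failed]):
        holder = next(node for node in alive if shard in new_replicas[node])
        new_replicas[holder].discard(shard)
        new_leaders[holder].add(shard)

    # Step 2: give every shard that lost its replica a new one on another node.
    for shard in sorted(set().union(*new_leaders.values())):
        leader = next(node for node in alive if shard in new_leaders[node])
        others = [node for node in alive if node != leader]
        if any(shard in new_replicas[node] for node in others):
            continue  # this shard still has a replica
        # Copy the shard to the least-loaded other node (an assumed policy).
        target = min(others, key=lambda n: len(new_leaders[n]) + len(new_replicas[n]))
        new_replicas[target].add(shard)
    return new_leaders, new_replicas
```

Step 2 is where data is actually copied over the network; until it finishes, shards S2, S3, S4 and S5 are each stored on a single node only.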
Organizing queries in a distributed database
Query coordination
A typical distributed query involves multiple nodes and requires communication between them
There is normally one coordinating node per query, which is responsible for
● triggering data processing steps on the other nodes
● putting together the partial results from the other nodes
● sending the merged result back to the client
● shutting down the query on the other nodes
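The coordinator's four duties map naturally onto a try/finally block. `DataNode` and its `start_query`/`fetch`/`shutdown` methods are hypothetical in-memory stand-ins for network calls, not an ArangoDB API; the `finally` clause makes sure the query is shut down on the data nodes even if fetching fails.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Hypothetical in-memory stand-in for a data node and its shard data."""
    name: str
    rows: list
    running: set = field(default_factory=set)

    def start_query(self, query_id):
        self.running.add(query_id)

    def fetch(self, query_id):
        if query_id not in self.running:
            raise RuntimeError(f"query {query_id} not started on {self.name}")
        return list(self.rows)

    def shutdown(self, query_id):
        self.running.discard(query_id)


def coordinate(query_id, nodes):
    """Drive one query across data nodes, following the coordinator's duties."""
    try:
        for node in nodes:            # trigger processing on the other nodes
            node.start_query(query_id)
        merged = []
        for node in nodes:            # put together the partial results
            merged.extend(node.fetch(query_id))
        return merged                 # the merged result goes back to the client
    finally:
        for node in nodes:            # shut the query down everywhere, even on errors
            node.shutdown(query_id)
```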
Query coordination example
3 data nodes
Query coordinator node: fetches data from the nodes, merges the results, sends the query result to the client, shuts down the query on the nodes
Data nodes: return data of their shards
Distributed query considerations
Each inter-node communication costs a network round trip (latency++)
One of the major goals when running distributed queries is to minimize the amount of network communication, e.g. by
● restricting the query to as few shards as possible
● pushing filter conditions to the shards
● pre-aggregating data on the shards
Operations on different shards can also be executed in parallel to reduce overall latency
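The latency benefit of executing shard requests in parallel can be seen with simulated round trips. This sketch uses `time.sleep` as a stand-in for network latency (the 50 ms value is arbitrary) and `concurrent.futures.ThreadPoolExecutor` to issue the requests concurrently; `fetch_shard` is a hypothetical placeholder.

```python
import time
from concurrent.futures import ThreadPoolExecutor

SIMULATED_ROUNDTRIP = 0.05  # hypothetical network round trip per request, in seconds


def fetch_shard(shard):
    """Stand-in for one coordinator -> data node request."""
    time.sleep(SIMULATED_ROUNDTRIP)
    return [f"{shard}-row"]


def fetch_sequential(shards):
    """One round trip after the other: latency grows with the number of shards."""
    rows = []
    for shard in shards:
        rows.extend(fetch_shard(shard))
    return rows


def fetch_parallel(shards):
    """All round trips in flight at once: latency is roughly the slowest one."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        parts = list(pool.map(fetch_shard, shards))  # map keeps input order
    return [row for part in parts for row in part]
```

With n shards, the sequential version takes roughly n round trips, while the parallel one takes roughly one.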
ArangoDB query examples
The following are some example queries from ArangoDB
ArangoDB is a multi-model NoSQL database that supports documents, graphs and key-values
It can be run in single-server or distributed (cluster) mode
ArangoDB provides its own query language, AQL, which is similar to SQL but has a different syntax
Query example (filter)
A simple ArangoDB query with a filter condition:
FOR u IN users
FILTER u.active == true
RETURN u
which is equivalent to SQL’s
SELECT * FROM users u WHERE u.active = 1
The coordinator will push the filter condition to the shards, so they will only return data that satisfies the filter condition
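Filter pushdown means the predicate runs where the data lives, so only matching documents cross the network. A minimal sketch; the shard contents in `SHARDS` and the helper names are made up for illustration.

```python
# Hypothetical contents of a "users" collection spread over 3 shards.
SHARDS = [
    [{"_key": "a", "active": True}, {"_key": "b", "active": False}],
    [{"_key": "c", "active": True}],
    [{"_key": "d", "active": False}],
]


def shard_scan(shard, predicate):
    """Runs on the data node: only documents matching the filter leave the shard."""
    return [doc for doc in shard if predicate(doc)]


def coordinator_filter(shards, predicate):
    """Push the filter down to every shard and concatenate the partial results."""
    result = []
    for shard in shards:
        result.extend(shard_scan(shard, predicate))
    return result
```

`coordinator_filter(SHARDS, lambda u: u["active"])` returns users a and c, so only 2 of the 4 documents are sent to the coordinator.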
Query example (filter)
Query: FOR u IN users FILTER u.active == true RETURN u
3 data nodes
Coordinator: fetches data from all shards, merges the results
Data nodes: return filtered data of shards
Query example (filter on shard key)
Now a query using a filter on a shard key attribute:
FOR u IN users
FILTER u._key == "jsteemann"
RETURN u
which is equivalent to SQL’s
SELECT * FROM users u WHERE u._key = 'jsteemann'
The coordinator will restrict the query to the one shard the data is located on, push the filter condition to that shard, and fetch the results from there
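When the filter is an equality on the shard key, the coordinator can compute the target shard itself and skip all others. This sketch assumes CRC32-based key hashing as a stand-in for the database's real sharding function; `build_shards` and `lookup_by_key` are hypothetical helpers.

```python
import zlib

NUM_SHARDS = 3  # hypothetical shard count for the "users" collection


def shard_index(key):
    """Stable key -> shard mapping (CRC32 as a stand-in hash function)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_SHARDS


def build_shards(docs):
    """Distribute documents over shards by their _key."""
    shards = [[] for _ in range(NUM_SHARDS)]
    for doc in docs:
        shards[shard_index(doc["_key"])].append(doc)
    return shards


def lookup_by_key(shards, key):
    """Equality filter on the shard key: only the owning shard is contacted.

    Returns (matching documents, list of contacted shard indexes).
    """
    target = shard_index(key)
    matches = [doc for doc in shards[target] if doc["_key"] == key]
    return matches, [target]
```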
Query example (filter on shard key)
Query: FOR u IN users FILTER u._key == "jsteemann" RETURN u
3 data nodes
Coordinator: fetches data from a single shard
Single data node: returns filtered data of its shard
Query example (sorting)
Another ArangoDB query, now with a sort condition and a projection:
FOR u IN users
SORT u.name
RETURN u.name
which is equivalent to SQL’s
SELECT u.name FROM users u ORDER BY u.name
The coordinator will push the sort condition and the projection to all shards, and combine the locally sorted results from the shards into a totally ordered result (using merge-sort)
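The coordinator's merge step is a k-way merge of already sorted streams, which Python's `heapq.merge` implements lazily. The shard contents and helper names are made up for illustration.

```python
import heapq


def shard_sorted_names(shard):
    """Runs on each shard: apply the projection (u.name), then sort locally."""
    return sorted(doc["name"] for doc in shard)


def coordinator_merge(shards):
    """k-way merge of the per-shard sorted streams into one total order."""
    streams = [shard_sorted_names(shard) for shard in shards]
    # heapq.merge assumes each input is already sorted and never re-sorts them.
    return list(heapq.merge(*streams))
```

Because each shard sends only names (the projection), and already in order, the coordinator never has to sort the full result set itself.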
Query example (sorting)
Query: FOR u IN users SORT u.name RETURN u.name
3 data nodes
Coordinator: fetches data from all shards, merge-sorts the results
Data nodes: return sorted and projected data of shards
Query example (aggregation)
One more ArangoDB query, now using aggregation:
FOR u IN users
COLLECT year = DATE_YEAR(u.dob)
AGGREGATE count = COUNT(u.dob)
RETURN { year, count }
which is equivalent to SQL’s
SELECT YEAR(u.dob) AS year, COUNT(u.dob) AS count
FROM users u GROUP BY year
The coordinator will push the aggregation to all shards, and combine the already aggregated results from the shards into a single result
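Pre-aggregation works because COUNT is decomposable: per-shard counts can simply be summed. A sketch with hypothetical shard data; like SQL's `COUNT(u.dob)`, it skips documents without a `dob`.

```python
from collections import Counter
from datetime import date

# Hypothetical "users" shards; one user has no dob and is not counted.
SHARDS = [
    [{"dob": date(1990, 5, 1)}, {"dob": date(1985, 1, 2)}],
    [{"dob": date(1990, 7, 3)}, {"dob": None}],
    [],
]


def shard_count_by_year(shard):
    """Runs on each shard: pre-aggregate to one count per year."""
    return Counter(doc["dob"].year for doc in shard if doc.get("dob") is not None)


def coordinator_combine(partials):
    """Sum the per-shard counts; exact because COUNT is decomposable."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts
    return [{"year": year, "count": total[year]} for year in sorted(total)]
```

Not every aggregate combines this directly; an average, for example, would need each shard to return a (sum, count) pair rather than its local average.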
Query example (aggregation)
Query: FOR u IN users COLLECT ... RETURN { year, count }
3 data nodes
Coordinator: fetches data from all shards, aggregates the aggregates
Data nodes: return aggregated data of shards
Query example (join)
One final ArangoDB query, now with an equi-join:
FOR u IN users FOR p IN purchases
FILTER u._key == p.user
RETURN { user: u, purchase: p }
which is equivalent to SQL’s
SELECT u.*, p.*
FROM users u, purchases p WHERE u._key = p.user
The coordinator will query all shards of the “purchases” collection, and these will reach out to the coordinator again to get data from all shards of the “users” collection
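The data flow described above can be sketched as follows: each “purchases” shard obtains the users from all “users” shards (via the coordinator) and joins its local purchases against them. For brevity the sketch builds a hash lookup on `_key` instead of looping over all users for every purchase; that is a swapped-in technique, not necessarily what ArangoDB does, and all names are hypothetical.

```python
def users_from_all_shards(user_shards):
    """Coordinator side: gather "users" documents from every users shard."""
    return [user for shard in user_shards for user in shard]


def join_on_purchase_shard(purchase_shard, user_shards):
    """Runs per "purchases" shard: fetch users via the coordinator, then join."""
    users_by_key = {u["_key"]: u for u in users_from_all_shards(user_shards)}
    return [{"user": users_by_key[p["user"]], "purchase": p}
            for p in purchase_shard if p["user"] in users_by_key]


def coordinator_join(purchase_shards, user_shards):
    """Query every purchases shard and merge their joined results."""
    result = []
    for shard in purchase_shards:
        result.extend(join_on_purchase_shard(shard, user_shards))
    return result
```

Note that every purchases shard ends up receiving the full users data, which is exactly the kind of network cost the earlier slides recommend minimizing.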
Query example (join)
Query: FOR u IN users ... RETURN { p, u }
3 + 2 data nodes
Coordinator: fetches data from all shards of “purchases”, merges the results
Data nodes (“purchases”): fetch “users” data via the coordinator, fetch data of their “purchases” shards, join them
Coordinator: fetches data from all shards of “users”, merges the results
Data nodes (“users”): return data of shards for “users”
Distributed ACID transactions
Benefits of transactions
With transactions, complex operations on multiple data items can be executed in an all-or-nothing fashion
If something goes wrong, the database automatically cleans up partially executed operations
With transactions, the database ensures consistency of data and protects us from anomalies, even if there are other concurrent operations on the same data
Key take-away: transactions make application developers’ lives easier
Distributed databases with transactions
Some distributed databases also support ACID transactions or have plans to add them:
● Google Cloud Spanner (database as a service)
● CockroachDB
● FoundationDB
● FaunaDB (closed source)
● ...
● MongoDB (announced for future releases, with limitations)
Atomicity
While a distributed transaction is ongoing, it may make modifications on different nodes
These changes need to be ineffective (hidden) until the transaction actually commits
On commit, the transaction’s changes must become visible instantly, on all nodes at the same time
Atomicity
Distributed databases normally store the status of transactions (pending, committed, aborted) in a private section of the key space, e.g.:
Key   Value
T0    committed
T1    aborted
T2    pending
When a transaction commits, its status key is atomically updated from “pending” to “committed”
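A single-process sketch of the status-key idea: writes are tagged with their transaction id, and one update of the status entry makes all of them visible at once. `TxnStore` and its methods are hypothetical; in a real cluster the tagged writes would live on different nodes and the status key on one of them.

```python
import threading


class TxnStore:
    """Toy model: writes are tagged with a transaction id and only become
    visible once that transaction's status entry says "committed"."""

    def __init__(self):
        self.status = {}      # transaction id -> "pending" | "committed" | "aborted"
        self.versions = {}    # key -> list of (transaction id, value), oldest first
        self._lock = threading.Lock()

    def begin(self, txn_id):
        with self._lock:
            self.status[txn_id] = "pending"

    def write(self, txn_id, key, value):
        # In a cluster these writes may land on many nodes; they stay hidden.
        with self._lock:
            self.versions.setdefault(key, []).append((txn_id, value))

    def _finish(self, txn_id, new_status):
        with self._lock:
            if self.status.get(txn_id) != "pending":
                raise RuntimeError(f"{txn_id} is not pending")
            # This single update is the commit (or abort) point for all writes.
            self.status[txn_id] = new_status

    def commit(self, txn_id):
        self._finish(txn_id, "committed")

    def abort(self, txn_id):
        self._finish(txn_id, "aborted")

    def read(self, key):
        """Return the newest value whose transaction has committed, else None."""
        with self._lock:
            for txn_id, value in reversed(self.versions.get(key, [])):
                if self.status.get(txn_id) == "committed":
                    return value
            return None
```

Aborting needs no cleanup of the staged writes for correctness: readers simply skip versions whose transaction is not committed.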
Consistency: designated leaders
Databases that provide consistency normally serialize all write operations for a key on the designated “leader” node for its shard
The state of the data on the leader shard is then a consistent “source of truth” for that shard
Write operations are replicated from leaders to replicas in the same order as they were applied on the leader
Replicas are thus exact copies of the leader shards and can take over at any time
Leader-only writes
Queries: put("amount", 10), put("amount", 42)
The leader determines the order of the operations for the same key and executes them one after the other, e.g.:
1. put("amount", 10)
2. put("amount", 42)
Value of "amount": 10, then 42
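Leader-only writes plus in-order replication can be sketched with a lock that defines one order for all writes and a log index that replicas must apply strictly in sequence. `Leader` and `Replica` are hypothetical classes; replication here is synchronous and in-process for simplicity.

```python
import threading


class Replica:
    """Applies the leader's log entries strictly in order."""

    def __init__(self):
        self.data = {}
        self.applied = 0  # index of the next log entry this replica expects

    def apply(self, index, key, value):
        if index != self.applied:
            raise RuntimeError(f"entry {index} out of order, expected {self.applied}")
        self.data[key] = value
        self.applied += 1


class Leader:
    """Hypothetical shard leader: the only place where writes are accepted."""

    def __init__(self, replicas):
        self.data = {}
        self.log = []
        self.replicas = replicas
        self._lock = threading.Lock()

    def put(self, key, value):
        # The lock turns concurrent puts into one unambiguous sequence.
        with self._lock:
            index = len(self.log)
            self.log.append((key, value))
            self.data[key] = value
            # Replicate in the same order the leader applied the write.
            for replica in self.replicas:
                replica.apply(index, key, value)
            return index
```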
Shard leadership
Shard leaders can change over time, e.g. in case of node failures or planned maintenance
All nodes in the cluster must have the same view of which node is the current leader for a specific shard, and which nodes hold the shard’s current replicas
Consensus protocols
The nodes in the cluster normally use a “consensus protocol” to exchange status messages
Paxos and Raft are the most commonly used consensus protocols in distributed databases
These protocols are designed to handle network partitions and node failures, and work reliably as long as a majority of nodes is still available and can exchange messages with each other
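The majority requirement translates into simple arithmetic: a cluster of n nodes needs floor(n/2) + 1 reachable nodes, and therefore tolerates n - (floor(n/2) + 1) node failures. A sketch with hypothetical helper names:

```python
def majority(n):
    """Smallest number of nodes that forms a majority of n."""
    return n // 2 + 1


def tolerated_failures(n):
    """How many nodes may fail while the rest still form a majority."""
    return n - majority(n)


def has_quorum(reachable, n):
    """Can a group of `reachable` nodes out of n still make progress?"""
    return reachable >= majority(n)
```

Going from 3 to 4 nodes does not increase the number of tolerated failures (both tolerate one), which is one reason consensus groups usually have an odd number of members. A 4-node cluster split 2/2 by a partition has no majority on either side.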
Ordering transactions
To ensure consistency, transactions that modify the same data must be put into an unambiguous order
An unambiguous global order makes a consistent cross-node view of the data possible
This is hard to achieve because transactions can start on different nodes in parallel
Ordering transactions using timestamps
Each transaction is assigned a timestamp when it starts
This same timestamp is later used as the transaction’s commit timestamp
The timestamps of transactions are used to order them
Rule: a transaction with a lower timestamp happened before a transaction with a higher timestamp
Clock skew
Timestamps created by different nodes are not reliably comparable due to clock skew
The solution that makes them comparable in most cases is to define an “uncertainty interval” (the maximum tolerable clock skew)
If the difference between two timestamps is outside the uncertainty interval, the two timestamps are safely comparable
If the difference between two timestamps is inside the uncertainty interval, they cannot be compared safely, and their relative order is unknown
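A sketch of the uncertainty-interval rule. `max_skew` is an assumed bound on clock skew and timestamps are plain numbers (e.g. milliseconds); a difference exactly equal to the bound is conservatively treated as unknown.

```python
from enum import Enum


class Order(Enum):
    BEFORE = "before"
    AFTER = "after"
    UNKNOWN = "unknown"


def compare(ts_a, ts_b, max_skew):
    """Compare timestamps taken on different nodes, given the uncertainty interval."""
    if ts_b - ts_a > max_skew:
        return Order.BEFORE   # a definitely happened before b
    if ts_a - ts_b > max_skew:
        return Order.AFTER    # a definitely happened after b
    return Order.UNKNOWN      # difference inside the interval: order unknown
```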
Consistency using timestamps
If two transactions whose relative order is unknown could influence each other, this is an (actual or potential) read or write conflict, and one of the transactions must be aborted or restarted
A transaction restart also means assigning a new, higher timestamp
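A restart loop, sketched under the assumption that a transaction body signals a conflict by raising an exception carrying the other transaction's timestamp. `ConflictError` and `run_with_restarts` are hypothetical names; choosing the new timestamp above the rival's is one reasonable policy, not necessarily what a given database does.

```python
import itertools


class ConflictError(Exception):
    """Raised by a transaction body on a read/write conflict with the
    transaction whose timestamp is `other_ts`."""

    def __init__(self, other_ts):
        super().__init__(f"conflict with transaction at timestamp {other_ts}")
        self.other_ts = other_ts


def run_with_restarts(txn_body, clock, max_attempts=5):
    """Run txn_body(ts); on a conflict, restart it with a new, higher timestamp.

    clock is any zero-argument callable returning the node's current timestamp.
    Returns (timestamp_used, result).
    """
    ts = clock()
    for _ in range(max_attempts):
        try:
            return ts, txn_body(ts)
        except ConflictError as err:
            # New timestamp: higher than our old one and than the rival's.
            ts = max(clock(), ts + 1, err.other_ts + 1)
    raise RuntimeError("transaction kept conflicting; giving up")


if __name__ == "__main__":
    def body(ts):
        if ts <= 150:
            raise ConflictError(150)
        return "committed"

    print(run_with_restarts(body, itertools.count(100).__next__))  # prints (151, 'committed')
```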
Isolation
To ensure isolation, a running transaction must not overwrite or remove data that another ongoing transaction may still see
Write operations are stored in a multi-version data structure, which can hold multiple values for the same key at the same time
Any transaction that reads or writes a key needs to find the “correct” version of it
Isolation: multi-versioning
Key        Transaction ID   Value
"amount"   T0               10
"amount"   T1               42
"name"     T17              "test"
"page"     T2               "index.html"
"page"     T50              <removed>
Any operation can identify whether it can “see” an operation from another transaction simply by looking up the status and timestamp of the corresponding transaction
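A sketch of this visibility rule applied to the table. The transaction statuses and timestamps in `TXNS` are invented for illustration (the slide does not give them); a reader sees the newest version written by a committed transaction whose timestamp is not later than the reader's snapshot timestamp.

```python
# Versions from the table: (key, transaction id, value); None marks a removal.
VERSIONS = [
    ("amount", "T0", 10),
    ("amount", "T1", 42),
    ("name", "T17", "test"),
    ("page", "T2", "index.html"),
    ("page", "T50", None),
]

# Hypothetical transaction records: id -> (status, timestamp).
TXNS = {
    "T0": ("committed", 10),
    "T1": ("committed", 20),
    "T2": ("committed", 15),
    "T17": ("pending", 30),
    "T50": ("committed", 40),
}


def read(key, snapshot_ts, versions=VERSIONS, txns=TXNS):
    """Value of `key` as seen by a reader with snapshot timestamp snapshot_ts."""
    visible = []
    for k, txn_id, value in versions:
        status, ts = txns[txn_id]
        # Only committed versions written at or before the snapshot are visible.
        if k == key and status == "committed" and ts <= snapshot_ts:
            visible.append((ts, value))
    if not visible:
        return None
    # The newest visible version wins; None means the key was removed.
    return max(visible, key=lambda item: item[0])[1]
```

Since old versions are kept rather than overwritten, a reader with snapshot 30 still sees "page" as "index.html" even though T50 later removed it.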
Durability
To ensure durability, every write operation (and every transaction status change) needs to be persisted on multiple nodes (leader + replicas)
A commit is only considered successful if it is acknowledged by a configurable number of nodes
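A sketch of counting acknowledgements against a configurable write concern. Each node is modeled as a hypothetical callable that persists an entry and returns `True` on success; unreachable nodes raise `OSError`. Real systems send to replicas in parallel rather than one after another.

```python
def commit_with_write_concern(entry, nodes, write_concern):
    """Persist entry on every node; report success only with enough acks.

    nodes: callables that persist the entry and return True on success
    (a hypothetical interface); write_concern: required acknowledgements.
    """
    acks = 0
    for persist in nodes:
        try:
            if persist(entry):
                acks += 1
        except OSError:
            pass  # an unreachable node simply does not acknowledge
    return acks >= write_concern
```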
Thank you very much!
Any questions?
Links / credits
Please star ArangoDB on GitHub:
https://github.com/arangodb/arangodb
Participate in ArangoDB’s community survey to win a t-shirt:
https://arangodb.com/community-survey/
#arangodb | jan@arangodb.com
Icons made by Freepik (www.freepik.com) from www.flaticon.com, licensed under CC BY 3.0
ArangoDB 3.5 Feature Overview Webinar - Sept 12, 2019ArangoDB Database
 
Webinar: How native multi model works in ArangoDB
Webinar: How native multi model works in ArangoDBWebinar: How native multi model works in ArangoDB
Webinar: How native multi model works in ArangoDBArangoDB Database
 
An introduction to multi-model databases
An introduction to multi-model databasesAn introduction to multi-model databases
An introduction to multi-model databasesArangoDB Database
 
Guacamole Fiesta: What do avocados and databases have in common?
Guacamole Fiesta: What do avocados and databases have in common?Guacamole Fiesta: What do avocados and databases have in common?
Guacamole Fiesta: What do avocados and databases have in common?ArangoDB Database
 
The Computer Science Behind a modern Distributed Database
The Computer Science Behind a modern Distributed DatabaseThe Computer Science Behind a modern Distributed Database
The Computer Science Behind a modern Distributed DatabaseArangoDB Database
 
Fishing Graphs in a Hadoop Data Lake
Fishing Graphs in a Hadoop Data LakeFishing Graphs in a Hadoop Data Lake
Fishing Graphs in a Hadoop Data LakeArangoDB Database
 
An E-commerce App in action built on top of a Multi-model Database
An E-commerce App in action built on top of a Multi-model DatabaseAn E-commerce App in action built on top of a Multi-model Database
An E-commerce App in action built on top of a Multi-model DatabaseArangoDB Database
 
Creating Fault Tolerant Services on Mesos
Creating Fault Tolerant Services on MesosCreating Fault Tolerant Services on Mesos
Creating Fault Tolerant Services on MesosArangoDB Database
 
Handling Billions of Edges in a Graph Database
Handling Billions of Edges in a Graph DatabaseHandling Billions of Edges in a Graph Database
Handling Billions of Edges in a Graph DatabaseArangoDB Database
 
Introduction to Foxx by our community member Iskandar Soesman @ikandars
Introduction to Foxx by our community member Iskandar Soesman @ikandarsIntroduction to Foxx by our community member Iskandar Soesman @ikandars
Introduction to Foxx by our community member Iskandar Soesman @ikandarsArangoDB Database
 
Polyglot Persistence & Multi-Model Databases
Polyglot Persistence & Multi-Model DatabasesPolyglot Persistence & Multi-Model Databases
Polyglot Persistence & Multi-Model DatabasesArangoDB Database
 

More from ArangoDB Database (20)

ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....
ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....
ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....
 
Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022
Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022
Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022
 
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022
 
GraphSage vs Pinsage #InsideArangoDB
GraphSage vs Pinsage #InsideArangoDBGraphSage vs Pinsage #InsideArangoDB
GraphSage vs Pinsage #InsideArangoDB
 
Getting Started with ArangoDB Oasis
Getting Started with ArangoDB OasisGetting Started with ArangoDB Oasis
Getting Started with ArangoDB Oasis
 
A Graph Database That Scales - ArangoDB 3.7 Release Webinar
A Graph Database That Scales - ArangoDB 3.7 Release WebinarA Graph Database That Scales - ArangoDB 3.7 Release Webinar
A Graph Database That Scales - ArangoDB 3.7 Release Webinar
 
gVisor, Kata Containers, Firecracker, Docker: Who is Who in the Container Space?
gVisor, Kata Containers, Firecracker, Docker: Who is Who in the Container Space?gVisor, Kata Containers, Firecracker, Docker: Who is Who in the Container Space?
gVisor, Kata Containers, Firecracker, Docker: Who is Who in the Container Space?
 
Webinar: What to expect from ArangoDB Oasis
Webinar: What to expect from ArangoDB OasisWebinar: What to expect from ArangoDB Oasis
Webinar: What to expect from ArangoDB Oasis
 
ArangoDB 3.5 Feature Overview Webinar - Sept 12, 2019
ArangoDB 3.5 Feature Overview Webinar - Sept 12, 2019ArangoDB 3.5 Feature Overview Webinar - Sept 12, 2019
ArangoDB 3.5 Feature Overview Webinar - Sept 12, 2019
 
3.5 webinar
3.5 webinar 3.5 webinar
3.5 webinar
 
Webinar: How native multi model works in ArangoDB
Webinar: How native multi model works in ArangoDBWebinar: How native multi model works in ArangoDB
Webinar: How native multi model works in ArangoDB
 
An introduction to multi-model databases
An introduction to multi-model databasesAn introduction to multi-model databases
An introduction to multi-model databases
 
Guacamole Fiesta: What do avocados and databases have in common?
Guacamole Fiesta: What do avocados and databases have in common?Guacamole Fiesta: What do avocados and databases have in common?
Guacamole Fiesta: What do avocados and databases have in common?
 
The Computer Science Behind a modern Distributed Database
The Computer Science Behind a modern Distributed DatabaseThe Computer Science Behind a modern Distributed Database
The Computer Science Behind a modern Distributed Database
 
Fishing Graphs in a Hadoop Data Lake
Fishing Graphs in a Hadoop Data LakeFishing Graphs in a Hadoop Data Lake
Fishing Graphs in a Hadoop Data Lake
 
An E-commerce App in action built on top of a Multi-model Database
An E-commerce App in action built on top of a Multi-model DatabaseAn E-commerce App in action built on top of a Multi-model Database
An E-commerce App in action built on top of a Multi-model Database
 
Creating Fault Tolerant Services on Mesos
Creating Fault Tolerant Services on MesosCreating Fault Tolerant Services on Mesos
Creating Fault Tolerant Services on Mesos
 
Handling Billions of Edges in a Graph Database
Handling Billions of Edges in a Graph DatabaseHandling Billions of Edges in a Graph Database
Handling Billions of Edges in a Graph Database
 
Introduction to Foxx by our community member Iskandar Soesman @ikandars
Introduction to Foxx by our community member Iskandar Soesman @ikandarsIntroduction to Foxx by our community member Iskandar Soesman @ikandars
Introduction to Foxx by our community member Iskandar Soesman @ikandars
 
Polyglot Persistence & Multi-Model Databases
Polyglot Persistence & Multi-Model DatabasesPolyglot Persistence & Multi-Model Databases
Polyglot Persistence & Multi-Model Databases
 

Recently uploaded

Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 

Recently uploaded (20)

Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 

Running complex data queries in a distributed system

  • 1. Copyright © ArangoDB Inc., 2018. One Engine, one Query Language. Multiple Data Models.
  • 2. About me: Hello, my name is Jan! I work for ArangoDB Inc. in Cologne, Germany, and I am one of the developers of ArangoDB, the distributed, multi-model database.
  • 3. Running complex queries in a distributed system
  • 4. Database tradeoffs: Until recently, there was a tradeoff to consider when choosing an OLTP database. Traditional relational databases offered complex queries, joins and transactional guarantees, whereas “NoSQL” databases were highly available and scalable.
  • 5. Database trends: In the last few years, there has been a trend towards distributed databases adopting complex query functionality and transactions. Alongside traditional relational and “NoSQL” databases, “NewSQL” databases (insert buzzword of choice) aim to be highly available and scalable while also providing transactional guarantees and complex queries, including joins.
  • 6. Agenda: distributed databases primer; organizing queries in a distributed database; distributed ACID transactions; Q & A. Today I will only consider OLTP databases. Sorry, no Spark/Hadoop!
  • 7. Distributed databases primer
  • 8. Distributed databases: A distributed database is a cluster of database nodes. The overall dataset is partitioned into smaller chunks (“shards”). Adding new nodes to the database increases its capacity (scale out).
  • 9. Sharding example: 3 nodes (A, B, C), 7 shards (S1, S2, S3, S4, S5, S6, S7). Node A holds shards S1, S2; node B holds shards S3, S4; node C holds shards S5, S6, S7.
  • 10. Adding a node = increased capacity: 4 nodes (A, B, C, D), 8 shards (S1, S2, S3, S4, S5, S6, S7, S8). Node A holds shards S1, S2; node B holds shards S3, S4; node C holds shards S5, S6; the new node D holds shards S7, S8.
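The following Python sketch illustrates the two mappings behind these slides: which shard a document belongs to, and which node holds which shard. It assumes hash-based partitioning of document keys and a contiguous placement of shards onto nodes; `shard_for_key` and `place_shards` are hypothetical helpers, not ArangoDB's actual sharding code.

```python
import hashlib


def shard_for_key(key: str, num_shards: int) -> int:
    """Map a document key to a shard number (hypothetical hash-based scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the hash as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_shards


def place_shards(shards: list[str], nodes: list[str]) -> dict[str, list[str]]:
    """Split shards into contiguous blocks, one block per node (hypothetical policy).

    If the shards do not divide evenly, the last nodes take one extra shard each,
    which reproduces the layouts on the two slides.
    """
    base, extra = divmod(len(shards), len(nodes))
    placement: dict[str, list[str]] = {}
    start = 0
    for i, node in enumerate(nodes):
        count = base + (1 if i >= len(nodes) - extra else 0)
        placement[node] = shards[start:start + count]
        start += count
    return placement


print(place_shards([f"S{i}" for i in range(1, 8)], ["A", "B", "C"]))
# {'A': ['S1', 'S2'], 'B': ['S3', 'S4'], 'C': ['S5', 'S6', 'S7']}
print(place_shards([f"S{i}" for i in range(1, 9)], ["A", "B", "C", "D"]))
# {'A': ['S1', 'S2'], 'B': ['S3', 'S4'], 'C': ['S5', 'S6'], 'D': ['S7', 'S8']}
print(shard_for_key("jsteemann", 7))  # some shard number in range(7)
```

Keeping the key-to-shard and shard-to-node mappings separate means capacity can be added by moving whole shards to a new node.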
  • 11. What about data loss? Same setup as before: 3 nodes (A, B, C), 7 shards (S1, S2, S3, S4, S5, S6, S7). Node A holds shards S1, S2; node B holds shards S3, S4; node C holds shards S5, S6, S7.
  • 12. Node failure = data loss: in the same setup (3 nodes, 7 shards, no copies), the shards stored on a failed node are lost.
  • 13. Shards example with replicas: 3 nodes (A, B, C), 7 shards (S1, S2, S3, S4, S5, S6, S7). Node A: shards S1, S2; replicas S4, S6, S7. Node B: shards S3, S4; replicas S2, S5. Node C: shards S5, S6, S7; replicas S1, S3.
  • 14. Node failure with a replica setup: same layout as on slide 13 (node A: shards S1, S2, replicas S4, S6, S7; node B: shards S3, S4, replicas S2, S5; node C: shards S5, S6, S7, replicas S1, S3). When node B fails, its shards S3 and S4 are not lost, because replicas of them exist on nodes C and A.
  • 15. Promoting replicas: 2 nodes (A, C) remain for the 7 shards. The replicas of node B’s shards are promoted to leaders: node A now holds shards S1, S2, S4 (remaining replicas: S6, S7), and node C holds shards S3, S5, S6, S7 (remaining replica: S1).
  • 16. Creating new replicas: 2 nodes (A, C), 7 shards. New replicas are created so that every shard again has a copy on a second node: node A holds shards S1, S2, S4 and replicas S3, S5, S6, S7; node C holds shards S3, S5, S6, S7 and replicas S1, S2, S4.
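A minimal sketch of the failover steps from slides 13 to 16, assuming a replication factor of 2 (one leader plus one replica per shard) and a hypothetical layout format. `handle_node_failure` is illustrative only: it ignores copying the actual data, consensus on the new layout and concurrent writes.

```python
def handle_node_failure(layout, failed, alive, replication_factor=2):
    """Return a new layout after node `failed` drops out.

    layout maps shard -> {"leader": node, "replicas": [nodes]} (hypothetical format).
    """
    new_layout = {}
    for shard, info in layout.items():
        leader = info["leader"]
        replicas = [n for n in info["replicas"] if n != failed]
        if leader == failed:
            if not replicas:
                raise RuntimeError(f"{shard} is lost: no surviving replica")
            # Promote a surviving replica to be the new leader.
            leader = replicas.pop(0)
        # Create new replicas on surviving nodes until the replication factor is met.
        for node in alive:
            if 1 + len(replicas) >= replication_factor:
                break
            if node != leader and node not in replicas:
                replicas.append(node)
        new_layout[shard] = {"leader": leader, "replicas": replicas}
    return new_layout


# Layout from the "shards example with replicas" slide.
layout = {
    "S1": {"leader": "A", "replicas": ["C"]},
    "S2": {"leader": "A", "replicas": ["B"]},
    "S3": {"leader": "B", "replicas": ["C"]},
    "S4": {"leader": "B", "replicas": ["A"]},
    "S5": {"leader": "C", "replicas": ["B"]},
    "S6": {"leader": "C", "replicas": ["A"]},
    "S7": {"leader": "C", "replicas": ["A"]},
}
after = handle_node_failure(layout, failed="B", alive=["A", "C"])
for shard, info in after.items():
    print(shard, info["leader"], info["replicas"])
```

The resulting layout matches slide 16: node A leads S1, S2, S4 and node C leads S3, S5, S6, S7, with every shard replicated on the other node.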
  • 17. Organizing queries in a distributed database
  • 18. Query coordination: A typical distributed query involves multiple nodes and requires communication between them. There is normally one coordinating node per query, which is responsible for triggering data processing steps on the other nodes, putting together the partial results from the other nodes, sending the merged result back to the client, and shutting down the query on the other nodes.
  • 19. Query coordination example (3 data nodes): the data nodes return the data of their shards; the query coordinator node fetches the data from the nodes, merges the results, sends the result to the client and shuts down the query on the nodes.
  • 20. Distributed query considerations: Each inter-node communication costs a network roundtrip (latency++). One of the major goals when running distributed queries is therefore to minimize the amount of network communication, e.g. by restricting the query to as few shards as possible, pushing filter conditions to the shards, and pre-aggregating data on the shards. Operations on different shards can also be executed in parallel to reduce overall latency.
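To illustrate why executing operations on different shards in parallel reduces latency, here is a scatter-gather sketch in Python. The per-shard work is simulated with a sleep standing in for one network roundtrip. `scatter_gather` and `remote_count` are hypothetical names, and a real coordinator would send requests to data nodes instead of calling local functions.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, TypeVar

T = TypeVar("T")


def scatter_gather(shards: list, per_shard: Callable[[list], T]) -> list[T]:
    """Run per_shard for every shard concurrently and return the partial results."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        # map() yields results in shard order, whatever order they finish in.
        return list(pool.map(per_shard, shards))


def remote_count(shard: list) -> int:
    """Hypothetical stand-in for work on a data node; the sleep simulates a roundtrip."""
    time.sleep(0.2)
    return len(shard)


shards = [[{"_key": "a"}], [{"_key": "b"}, {"_key": "c"}], []]
start = time.perf_counter()
partials = scatter_gather(shards, remote_count)
elapsed = time.perf_counter() - start
print(partials, sum(partials))  # [1, 2, 0] 3
print(f"{elapsed:.2f}s")  # close to 0.2s rather than 0.6s, since the roundtrips overlap
```

Threads are sufficient here because the simulated work is waiting, not computing; the coordinator only pays for the slowest shard instead of the sum of all shards.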
  • 21. ArangoDB query examples: The following are some example queries in ArangoDB. ArangoDB is a multi-model NoSQL database that supports documents, graphs and key-values. It can be run in single-server or distributed (cluster) mode. ArangoDB provides its own query language, AQL, which is similar to SQL but has a different syntax.
  • 22. Query example (filter): A simple ArangoDB query with a filter condition: FOR u IN users FILTER u.active == true RETURN u, which is equivalent to SQL’s SELECT * FROM users u WHERE u.active = 1. The coordinator pushes the filter condition to the shards, so they only return data that satisfies the filter condition.
  • 23. Query example (filter), 3 data nodes. Query: FOR u IN users FILTER u.active == true RETURN u. The data nodes return the filtered data of their shards; the coordinator fetches the data from all shards and merges the results.
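The effect of pushing a filter condition to the shards can be shown by counting how many documents travel from the data nodes to the coordinator. This is a simplified in-memory sketch with hypothetical helpers (`scan_shard`, `query_active_users`); shards are plain Python lists.

```python
def is_active(user: dict) -> bool:
    # Only the boolean true matches, mirroring AQL's type-strict == (assumption).
    return user.get("active") is True


def scan_shard(shard: list[dict], pushed_filter=None) -> list[dict]:
    """Runs on a data node: apply the pushed-down filter, if any."""
    return [u for u in shard if pushed_filter is None or pushed_filter(u)]


def query_active_users(shards: list[list[dict]], push_down: bool):
    """Coordinator side of FOR u IN users FILTER u.active == true RETURN u."""
    result, shipped = [], 0
    for shard in shards:
        rows = scan_shard(shard, is_active if push_down else None)
        shipped += len(rows)  # documents sent over the network to the coordinator
        # Without pushdown, the coordinator has to filter the rows itself.
        result.extend(rows if push_down else [u for u in rows if is_active(u)])
    return result, shipped


shards = [
    [{"_key": "a", "active": True}, {"_key": "b", "active": False}],
    [{"_key": "c", "active": True}],
    [{"_key": "d", "active": False}],
]
print(query_active_users(shards, push_down=False)[1])  # 4 documents shipped
print(query_active_users(shards, push_down=True)[1])   # 2 documents shipped
```

Both variants return the same documents; only the amount of data crossing the network differs.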
  • 24. Query example (filter on shard key): Now a query using a filter on a shard key attribute: FOR u IN users FILTER u._key == "jsteemann" RETURN u, which is equivalent to SQL’s SELECT * FROM users u WHERE u._key = 'jsteemann'. The coordinator restricts the query to the one shard the data is located on, pushes the filter condition to that shard and fetches the results from there.
  • 25. Query example (filter on shard key), 3 data nodes. Query: FOR u IN users FILTER u._key == "jsteemann" RETURN u. A single data node returns the filtered data of its shard; the coordinator fetches data from that single shard only.
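A sketch of this shard pruning: when the filter is on the shard key, the coordinator can compute the one shard that may contain matching documents. It uses a hypothetical SHA-256-based key-to-shard mapping; ArangoDB's real hashing of shard keys is not shown here, and `build_shards` and `find_by_key` are illustrative names.

```python
import hashlib


def shard_for_key(key: str, num_shards: int) -> int:
    """Hypothetical hash-based mapping from a shard key value to a shard number."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def build_shards(docs: list[dict], num_shards: int) -> list[list[dict]]:
    """Distribute documents over shards by their _key (the shard key here)."""
    shards: list[list[dict]] = [[] for _ in range(num_shards)]
    for doc in docs:
        shards[shard_for_key(doc["_key"], num_shards)].append(doc)
    return shards


def find_by_key(shards: list[list[dict]], key: str):
    """FILTER u._key == key: only the one shard that can hold the key is queried."""
    target = shard_for_key(key, len(shards))
    matches = [doc for doc in shards[target] if doc["_key"] == key]
    return matches, [target]  # results, and the list of shards contacted


users = [{"_key": k} for k in ("jsteemann", "alice", "bob", "carol")]
shards = build_shards(users, num_shards=3)
matches, contacted = find_by_key(shards, "jsteemann")
print(matches, len(contacted))  # [{'_key': 'jsteemann'}] 1
```

This only works because the same function decides both where a document is stored and where the coordinator looks for it; a filter on any other attribute would still have to reach all shards.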
  • 26. Query example (sorting): Another ArangoDB query, now with a sort condition and a projection: FOR u IN users SORT u.name RETURN u.name, which is equivalent to SQL’s SELECT u.name FROM users u ORDER BY u.name. The coordinator pushes the sort condition and the projection to all shards, and combines the locally sorted results from the shards into a totally ordered result (using merge-sort).
  • 27. Query example (sorting), 3 data nodes. Query: FOR u IN users SORT u.name RETURN u.name. The data nodes return the sorted and projected data of their shards; the coordinator fetches the data from all shards and merge-sorts the results.
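The coordinator's merge step can be expressed with Python's `heapq.merge`, which combines already sorted inputs into one sorted output without re-sorting everything. The sketch assumes every `name` is a string and uses Python's default string ordering, which may differ from the database's sort order; the helper names are hypothetical.

```python
import heapq


def shard_sorted_names(shard: list[dict]) -> list[str]:
    """Runs on each data node: projection to u.name plus a local sort."""
    return sorted(u["name"] for u in shard)


def coordinator_sorted_names(shards: list[list[dict]]) -> list[str]:
    partials = [shard_sorted_names(shard) for shard in shards]
    # k-way merge of the already sorted partial results; no full re-sort needed.
    return list(heapq.merge(*partials))


shards = [
    [{"name": "Carol"}, {"name": "Alice"}],
    [{"name": "Bob"}],
    [{"name": "Dave"}, {"name": "Aaron"}],
]
print(coordinator_sorted_names(shards))
# ['Aaron', 'Alice', 'Bob', 'Carol', 'Dave']
```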
  • 28. Query example (aggregation): One more ArangoDB query, now using aggregation: FOR u IN users COLLECT year = DATE_YEAR(u.dob) AGGREGATE count = COUNT(u.dob) RETURN { year, count }, which is equivalent to SQL’s SELECT YEAR(u.dob) AS year, COUNT(u.dob) AS count FROM users u GROUP BY year. The coordinator pushes the aggregation to all shards, and combines the already aggregated results from the shards into a single result.
  • 29. Query example (aggregation), 3 data nodes. Query: FOR u IN users COLLECT ... RETURN { year, count }. The data nodes return the aggregated data of their shards; the coordinator fetches the data from all shards and aggregates the partial aggregates into the final result.
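Pre-aggregation works here because partial counts from different shards can simply be added up. A minimal sketch, assuming every user document has a `dob` attribute in ISO format (`YYYY-MM-DD`); the helper names are hypothetical.

```python
from collections import Counter
from datetime import date


def shard_count_by_year(shard: list[dict]) -> Counter:
    """Runs on each data node: local grouping by birth year with a count per group."""
    return Counter(date.fromisoformat(u["dob"]).year for u in shard)


def coordinator_count_by_year(shards: list[list[dict]]) -> list[dict]:
    total: Counter = Counter()
    for shard in shards:
        # Partial counts from different shards combine by simple addition.
        total.update(shard_count_by_year(shard))
    return [{"year": year, "count": count} for year, count in sorted(total.items())]


shards = [
    [{"dob": "1990-05-01"}, {"dob": "1985-01-02"}],
    [{"dob": "1990-12-31"}],
    [],
]
print(coordinator_count_by_year(shards))
# [{'year': 1985, 'count': 1}, {'year': 1990, 'count': 2}]
```

Counts, sums, minimums and maximums can be combined this way; an average cannot, so a shard would have to return a (sum, count) pair instead.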
  • 30. Query example (join): One final ArangoDB query, now with an equi-join: FOR u IN users FOR p IN purchases FILTER u._key == p.user RETURN { user: u, purchase: p }, which is roughly equivalent to SQL’s SELECT u.*, p.* FROM users u, purchases p WHERE u._key = p.user. The coordinator queries all shards of the “purchases” collection, and these in turn reach out to the coordinator again to get data from all shards of the “users” collection.
  • 31. Query example (join), 3 + 2 data nodes. Query: FOR u IN users ... RETURN { p, u }. The data nodes for “users” return the data of their shards, and a coordinator fetches the data from all “users” shards and merges the results. The data nodes for “purchases” fetch that merged user data from the coordinator, read the data of their own “purchases” shards and join the two. Finally, the coordinator fetches the joined data from all “purchases” shards, merges the results and returns the query result.
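A simplified sketch of the join plan described above: the coordinator collects all `users` documents, and each data node holding a `purchases` shard joins its purchases against them. Using a hash table for the per-shard join is an assumption made for this example, not necessarily the strategy ArangoDB uses, and the output order is not guaranteed to match AQL's. All function names are hypothetical.

```python
def gather_users(user_shards: list[list[dict]]) -> list[dict]:
    """Coordinator: fetch and merge the documents of all "users" shards."""
    return [u for shard in user_shards for u in shard]


def join_on_purchase_shard(purchases: list[dict], users: list[dict]) -> list[dict]:
    """Data node holding a "purchases" shard: join its purchases with the fetched users."""
    users_by_key = {u["_key"]: u for u in users}  # hash table on the join attribute
    return [
        {"user": users_by_key[p["user"]], "purchase": p}
        for p in purchases
        if p["user"] in users_by_key  # FILTER u._key == p.user (inner join)
    ]


def coordinator_join(user_shards, purchase_shards) -> list[dict]:
    users = gather_users(user_shards)
    result = []
    for shard in purchase_shards:
        # Every purchases shard receives the complete user data.
        result.extend(join_on_purchase_shard(shard, users))
    return result


user_shards = [[{"_key": "jan"}], [{"_key": "ana"}]]
purchase_shards = [
    [{"_key": "p1", "user": "jan"}],
    [{"_key": "p2", "user": "ana"}, {"_key": "p3", "user": "bob"}],
]
for row in coordinator_join(user_shards, purchase_shards):
    print(row["user"]["_key"], row["purchase"]["_key"])
# jan p1
# ana p2
```

The sketch also shows why such joins are expensive in a cluster: the complete user data is shipped to every node that holds a `purchases` shard.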
  • 32. Distributed ACID transactions
  • 33. Benefits of transactions: With transactions, complex operations on multiple data items can be executed in an all-or-nothing fashion. If something goes wrong, the database automatically cleans up partially executed operations. With transactions, the database ensures consistency of the data and protects us from anomalies, regardless of other concurrent operations on the same data. Key take-away: transactions make application developers’ lives easier.
  • 34. Distributed databases with transactions: Some distributed databases also support ACID transactions or have plans to add them: Google Cloud Spanner (database as a service), CockroachDB, FoundationDB, FaunaDB (closed source), ..., and MongoDB (announced for future releases, with limitations).
  • 35. Atomicity: While a distributed transaction is ongoing, it may make modifications on different nodes. These changes need to remain ineffective (hidden) until the transaction actually commits. On commit, the transaction’s changes must become visible instantly and on all nodes at the same time.
  • 36. Atomicity: Distributed databases normally store the status of transactions (pending, committed, aborted) in a private section of the key space, e.g. (key: value) T0: committed, T1: aborted, T2: pending. When a transaction commits, its status key is atomically updated from “pending” to “committed”.
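The status-key approach can be sketched with a toy in-memory model: the writes of a transaction are spread over several shards, but whether they are visible depends only on one status entry. `ToyCluster` is hypothetical; for simplicity it treats the most recently appended committed write as the current value, whereas real systems order versions by commit timestamp.

```python
class ToyCluster:
    """Hypothetical toy model: writes are spread over shards, txn status lives in one place."""

    def __init__(self, shard_names: list[str]) -> None:
        self.status: dict[str, str] = {}  # private key space: txn id -> status
        self.shards: dict[str, list[tuple]] = {name: [] for name in shard_names}

    def _require_pending(self, txn: str) -> None:
        if self.status.get(txn) != "pending":
            raise ValueError(f"{txn} is not pending")

    def begin(self, txn: str) -> None:
        self.status[txn] = "pending"

    def put(self, txn: str, shard: str, key: str, value) -> None:
        self._require_pending(txn)
        self.shards[shard].append((key, txn, value))  # stored, but hidden for now

    def commit(self, txn: str) -> None:
        self._require_pending(txn)
        # One status update makes all of txn's writes, on all shards, visible at once.
        self.status[txn] = "committed"

    def abort(self, txn: str) -> None:
        self._require_pending(txn)
        self.status[txn] = "aborted"

    def read(self, shard: str, key: str):
        """Newest committed value of key on shard (simplified: newest = last appended)."""
        for k, txn, value in reversed(self.shards[shard]):
            if k == key and self.status[txn] == "committed":
                return value
        return None


cluster = ToyCluster(["S1", "S2"])
cluster.begin("T1")
cluster.put("T1", "S1", "alice", 90)   # debit on one shard
cluster.put("T1", "S2", "bob", 110)    # credit on another shard
print(cluster.read("S1", "alice"), cluster.read("S2", "bob"))  # None None
cluster.commit("T1")
print(cluster.read("S1", "alice"), cluster.read("S2", "bob"))  # 90 110
```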
  • 37. Consistency (designated leaders): Databases that provide consistency normally serialize all write operations for a key on the designated “leader” node for its shard. The state of the data on the leader shard is then a consistent “source of truth” for that shard. Write operations are replicated from leaders to replicas in the same order in which they were applied on the leader. Replicas are thus exact copies of the leader shards and can take over at any time.
  • 38. Leader-only writes: two queries, put("amount", 10) and put("amount", 42), write to the same key. The leader determines the order of the operations for the same key and executes them one after the other, e.g. 1. put("amount", 10), 2. put("amount", 42), so the value of “amount” changes from 10 to 42.
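A minimal sketch of leader-only writes with in-order replication, assuming synchronous replication inside the leader's critical section and a per-shard sequence number. The `Leader` and `Replica` classes are hypothetical and ignore node failures and network delays.

```python
import threading


class Replica:
    def __init__(self) -> None:
        self.data: dict = {}
        self.applied_seq = 0

    def apply(self, seq: int, key, value) -> None:
        # Operations must arrive exactly in the leader's order, without gaps.
        if seq != self.applied_seq + 1:
            raise ValueError(f"got operation {seq}, expected {self.applied_seq + 1}")
        self.data[key] = value
        self.applied_seq = seq


class Leader:
    def __init__(self, replicas: list[Replica]) -> None:
        self.data: dict = {}
        self.seq = 0
        self.replicas = replicas
        self._lock = threading.Lock()

    def put(self, key, value) -> int:
        with self._lock:  # serializes concurrent writers: one operation at a time
            self.seq += 1
            self.data[key] = value
            for replica in self.replicas:
                replica.apply(self.seq, key, value)  # same order as on the leader
            return self.seq


replicas = [Replica(), Replica()]
leader = Leader(replicas)
leader.put("amount", 10)
leader.put("amount", 42)
print(leader.data["amount"], [r.data["amount"] for r in replicas])  # 42 [42, 42]
```

Because the sequence number is assigned under the lock, two concurrent `put` calls can never be applied in different orders on the leader and on its replicas.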
  • 39. Shard leadership: Shard leaders can change over time, e.g. in case of node failures or planned maintenance. All nodes in the cluster need to have the same view of who the current leader for a specific shard is, and which nodes hold the shard’s current replicas.
  • 40. Consensus protocols: The nodes in the cluster normally use a “consensus protocol” to exchange status messages. Paxos and Raft are the most commonly used consensus protocols in distributed databases. These protocols are designed to handle network partitions and node failures, and work reliably as long as a majority of nodes is still available and can still exchange messages with each other.
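The majority rule behind these protocols can be checked with a few lines of arithmetic. This sketch covers only the quorum math, not Paxos or Raft themselves; the function names are hypothetical.

```python
def majority(cluster_size: int) -> int:
    """Smallest number of nodes that forms a majority."""
    return cluster_size // 2 + 1


def can_make_progress(cluster_size: int, reachable: int) -> bool:
    """A group of mutually reachable nodes can only decide things if it holds a majority."""
    return reachable >= majority(cluster_size)


def tolerated_failures(cluster_size: int) -> int:
    return cluster_size - majority(cluster_size)


for n in (3, 4, 5):
    print(n, majority(n), tolerated_failures(n))
# 3 2 1
# 4 3 1
# 5 3 2

# Two sides of a network partition can never both hold a majority.
assert all(
    not (can_make_progress(n, k) and can_make_progress(n, n - k))
    for n in range(1, 10)
    for k in range(n + 1)
)
```

The output also shows why clusters usually have an odd number of nodes: going from 3 to 4 nodes does not increase the number of failures that can be tolerated.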
  • 41. Ordering transactions: To ensure consistency, transactions that modify the same data must be put into an unambiguous order. An unambiguous global order makes a consistent view of the data across nodes possible. This is hard to achieve because transactions can start on different nodes in parallel.
  • 42. Ordering transactions using timestamps: Each transaction is assigned a timestamp when it is started. This same timestamp is later used as the transaction’s commit timestamp. The timestamps of transactions are used for ordering them. Rule: a transaction with a lower timestamp happened before a transaction with a higher timestamp.
  • 43. Clock skew: Timestamps created by different nodes are not reliably comparable due to clock skew. The solution that makes them comparable in most cases is to define an “uncertainty interval” (the maximum tolerable clock skew). If the difference between two timestamps is outside the uncertainty interval, they can be compared safely. If the difference is inside the uncertainty interval, they cannot be compared safely, and their relative order is unknown.
  • 44. Consistency using timestamps: If the transactions could influence each other, this is an actual or potential read or write conflict, and one of the transactions must be aborted or restarted. Restarting a transaction also means assigning it a new, higher timestamp.
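One way to detect such conflicts is basic timestamp ordering. It is used here as a textbook illustration rather than as the specific mechanism described in the talk. In this single-version Python sketch (all names hypothetical), an operation raises `RestartRequired` if a transaction with a higher timestamp has already read or written the key. The caller then restarts with a fresh, higher timestamp. The sketch omits rolling back a transaction's earlier writes.

```python
import itertools


class RestartRequired(Exception):
    """Raised when an operation conflicts with a transaction that has a higher timestamp."""


class TimestampOrderedStore:
    def __init__(self):
        self.values = {}
        self.max_read_ts = {}   # key -> highest timestamp that has read the key
        self.max_write_ts = {}  # key -> highest timestamp that has written the key

    def read(self, ts, key):
        if ts < self.max_write_ts.get(key, 0):
            raise RestartRequired(key)  # a "later" transaction already overwrote the key
        self.max_read_ts[key] = max(self.max_read_ts.get(key, 0), ts)
        return self.values.get(key)

    def write(self, ts, key, value):
        if ts < self.max_read_ts.get(key, 0) or ts < self.max_write_ts.get(key, 0):
            raise RestartRequired(key)  # a "later" transaction already saw or wrote the key
        self.max_write_ts[key] = ts
        self.values[key] = value


clock = itertools.count(1)
store = TimestampOrderedStore()
t_old, t_new = next(clock), next(clock)  # timestamps 1 and 2

store.read(t_new, "amount")  # the younger transaction reads the key first
try:
    store.write(t_old, "amount", 10)  # the older transaction's write now conflicts
except RestartRequired:
    t_old = next(clock)  # restart with a new, higher timestamp (3)
    store.write(t_old, "amount", 10)

print(t_old, store.values["amount"])  # 3 10
```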
  • 45. Isolation: To ensure isolation, a running transaction must not overwrite or remove data that another ongoing transaction may still see. Write operations are therefore stored in a multi-version data structure, which can hold multiple values for the same key at the same time. Any transaction that reads or writes a key needs to find the "correct" version of it.
  • 46. Isolation – multi-versioning:
      Key        Transaction ID   Value
      "amount"   T0               10
      "amount"   T1               42
      "name"     T17              "test"
      "page"     T2               "index.html"
      "page"     T50              <removed>
    Any operation can determine whether it can "see" an operation from another transaction simply by looking up the status and timestamp of the corresponding transaction.
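The lookup on this slide can be sketched in Python. The keys, transaction IDs, and values come from the table. Several parts are assumptions for illustration: the transaction statuses and timestamps in `TXNS`, the hypothetical reader transactions `R25` and `R45`, and the visibility rule itself. Under that rule, a reader sees its own writes plus any versions from committed transactions whose timestamp is no higher than its own.

```python
from dataclasses import dataclass

REMOVED = object()  # tombstone standing in for <removed>


@dataclass(frozen=True)
class TxnInfo:
    status: str     # "running", "committed" or "aborted"
    timestamp: int


# Hypothetical transaction table: statuses and timestamps are invented for illustration.
TXNS = {
    "T0": TxnInfo("committed", 10),
    "T1": TxnInfo("committed", 20),
    "T2": TxnInfo("committed", 15),
    "T17": TxnInfo("committed", 30),
    "T50": TxnInfo("running", 40),
    "R25": TxnInfo("running", 25),  # hypothetical reader
    "R45": TxnInfo("running", 45),  # hypothetical reader
}

# Multi-version store from the slide: (key, writing transaction, value).
VERSIONS = [
    ("amount", "T0", 10),
    ("amount", "T1", 42),
    ("name", "T17", "test"),
    ("page", "T2", "index.html"),
    ("page", "T50", REMOVED),
]


def visible(writer_id: str, reader_id: str) -> bool:
    if writer_id == reader_id:
        return True  # a transaction always sees its own writes
    writer, reader = TXNS[writer_id], TXNS[reader_id]
    return writer.status == "committed" and writer.timestamp <= reader.timestamp


def read(key: str, reader_id: str):
    """Return the newest version of key visible to reader_id, or None."""
    candidates = [(TXNS[w].timestamp, v) for k, w, v in VERSIONS
                  if k == key and visible(w, reader_id)]
    if not candidates:
        return None
    _, value = max(candidates, key=lambda c: c[0])
    return None if value is REMOVED else value


print(read("amount", "R25"))  # 42
print(read("name", "R25"))    # None: T17 has a higher timestamp
print(read("page", "R45"))    # index.html: T50 has not committed yet
print(read("page", "T50"))    # None: T50 sees its own removal
```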
  • 47. Durability: To ensure durability, every write operation (and every transaction status change) must be persisted on multiple nodes (leader + replicas). A commit is only considered successful if it is acknowledged by a configurable number of nodes.
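Here is a synchronous Python sketch of the acknowledgement rule. It assumes hypothetical in-memory `Node` objects whose `persist` method stands in for a durable write. The record is sent to the leader and all replicas. The commit counts as successful only if at least `required_acks` nodes acknowledge it. A real system would send in parallel, wait with timeouts, and clean up after a failed commit.

```python
class Node:
    """Hypothetical database node; `log` stands in for durable storage."""

    def __init__(self, name: str, up: bool = True):
        self.name = name
        self.up = up
        self.log = []

    def persist(self, record: bytes) -> bool:
        if not self.up:
            raise ConnectionError(f"{self.name} is unreachable")
        self.log.append(record)  # a real node would fsync this to disk
        return True


def commit(record: bytes, nodes, required_acks: int) -> bool:
    """Persist record on every reachable node; succeed if enough nodes acknowledged."""
    acks = 0
    for node in nodes:
        try:
            if node.persist(record):
                acks += 1
        except ConnectionError:
            continue  # an unreachable node simply does not acknowledge
    return acks >= required_acks


leader, replica1, replica2 = Node("leader"), Node("replica1"), Node("replica2", up=False)
cluster = [leader, replica1, replica2]

print(commit(b"put amount=42", cluster, required_acks=2))  # True
print(commit(b"commit T1", cluster, required_acks=3))      # False
```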
  • 48. Database trends: In the last few years, there has been a trend towards distributed databases adopting complex query functionality and transactions. Traditional relational databases offer complex queries, joins and transactional guarantees. "NoSQL" databases are highly available and scalable. "NewSQL" (insert buzzword of choice) databases combine all four properties.
  • 49. Thank you very much! Any questions?
  • 50. Links / credits:
    Please star ArangoDB on GitHub: https://github.com/arangodb/arangodb
    Participate in ArangoDB's community survey to win a t-shirt: https://arangodb.com/community-survey/
    #arangodb | jan@arangodb.com
    Icons made by Freepik (www.freepik.com) from www.flaticon.com, licensed under CC BY 3.0