Introduction to Parallel Processing Algorithms in Shared Nothing Databases
Ofir Manor
Agenda
• Introduction
• Sample Architecture
• The optimizer and execution plans
• Examples of single table processing
• Examples of Join processing
Scaling Databases
• Scaling – expanding a system to support more data / sessions.
• Best scalability – linear, predictable.

• Scale-up (bigger server) vs. Scale-out (more servers)
• Scaling up – easier, but limited, expensive

• Most common scale-out strategy – Sharding
• Spreading the data (rows in a table) across many independent nodes
• Each node has a different subset of the data – Shared Nothing

• Processing sharded data across a shared-nothing cluster is also called
Massively Parallel Processing (MPP)
• MPP databases have existed since the 1980s (e.g. Teradata) and became popular in
the analytic space in the 2000s (e.g. Netezza, Greenplum, Vertica)
• Open source examples over Hadoop – Hive(*), Impala
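Sample MPP database architecture
• Master Node – holds the Data Dictionary, Sessions and the Optimizer
• Shards 001 … nnn – each table is distributed across all shards
• Query flow:
  1. SQL – sent from the SQL client to the master node
  2. Execution Plan – sent from the master node to the shards
  3. Parallel SQL Execution – on all shards
  4. Results – returned from the shards to the master node
  5. Results – returned from the master node to the SQL client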
Processing – Analytical vs. Operational
• With MongoDB – most operations involve a single document
• With SQL – most operations involve processing many rows, likely
across all shards
• Example, sum of sales per day per store

• Also, SQL is more expressive – it has a rich set of complex operations
(joining, aggregating, sorting etc)
• A database optimizer builds an execution plan:
• The access path per table (full scan, index scan etc)
• The order of the joins
• The type of each join (multiple algorithms)
Execution Plan - Sample Table
• Syntax and execution plans are based on Greenplum – but the lessons are general.

• We’ll start with a simple, single table, with no indexes.
• It holds data about calls
• CREATE TABLE calls
  (subscriber_id integer,
   call_date     date,
   call_length   integer)
  DISTRIBUTED BY (subscriber_id);

• We can control the sharding key (distribution key) – this will later allow
some join optimizations.
• Row Placement: shard number = hash(subscriber_id, # of shards)
• Generally, we want data to be spread equally across all shards (no skew)
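
The row placement rule above can be sketched in a few lines of Python. This is only an illustration: the shard count, the sample keys and the use of Python's built-in hash() are assumptions, and a real engine uses its own hash function.

```python
from collections import Counter

NUM_SHARDS = 4  # assumed cluster size, for illustration only


def shard_for(key: int, num_shards: int = NUM_SHARDS) -> int:
    """Map a distribution-key value to a shard number in [0, num_shards)."""
    # hash() stands in for the engine's hash function (assumption).
    return hash(key) % num_shards


# Hypothetical data: 1000 distinct subscriber ids spread evenly.
even = Counter(shard_for(subscriber_id) for subscriber_id in range(1000))

# Hypothetical data: a low-cardinality key (two city codes) causes skew,
# because every row with the same key value lands on the same shard.
city_codes = [4] * 900 + [9] * 100
skewed = Counter(shard_for(city) for city in city_codes)

print(sorted(even.items()))
print(sorted(skewed.items()))
```

With many distinct key values every shard receives a similar number of rows; with only two values, at most two shards hold any data at all.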
Single Table Execution - Plan 1
• EXPLAIN SELECT * FROM calls
  WHERE call_date BETWEEN '2013/11/01' AND '2013/11/30';
• QUERY PLAN
  -------------------------------------------------
  Gather Motion n:1
    -> Seq Scan on calls
         Filter: call_date >= '2013-11-01'::date AND
                 call_date <= '2013-11-30'::date

• Sequential Scan – a full scan of each table shard
• Filter – applied during the scan
• Gather Motion – moving the result set of each shard to the master
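
The three operators above can be mimicked with plain Python lists. This is a minimal sketch, not how the engine is implemented: the shard contents are made up, and the function names are hypothetical.

```python
from datetime import date

# Made-up shard contents: rows are (subscriber_id, call_date, call_length).
shards = [
    [(4, date(2013, 11, 3), 120), (8, date(2013, 10, 30), 45)],
    [(1, date(2013, 11, 30), 10), (5, date(2013, 12, 1), 300)],
    [(2, date(2013, 11, 15), 60)],
]


def seq_scan_with_filter(shard_rows, first_day, last_day):
    """Full scan of one shard; the filter is applied while scanning."""
    return [row for row in shard_rows if first_day <= row[1] <= last_day]


def gather_motion(per_shard_results):
    """n:1 motion: collect the result set of every shard on the master."""
    gathered = []
    for rows in per_shard_results:
        gathered.extend(rows)
    return gathered


result = gather_motion(
    seq_scan_with_filter(rows, date(2013, 11, 1), date(2013, 11, 30))
    for rows in shards
)
print(result)
```

Each shard works only on its own rows, so the scans can run at the same time; the master just concatenates what arrives.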
Single Table Execution - Plan 2
• EXPLAIN SELECT call_date, count(*)
FROM calls
WHERE call_length <= 60
GROUP BY call_date;
• Challenge – do the group by in parallel
• General case - could be millions or billions of groups

• Challenge – the rows for each group are distributed across all shards
• Conclusion – the processes in the shards need to communicate
Single Table Execution - Plan 2
(Diagram: the steps run on shards 001 … nnn, in this order)
• Process Group 1 – local scan, filter and aggregation on each shard
• Re-distributing (streaming) the result set over the cluster network (n:n)
• Process Group 2 – final aggregation of each group
• Send final results to the master
Single Table Execution - Plan 2
• QUERY PLAN
  -------------------------------------------------
  Gather Motion n:1
    -> HashAggregate
         Group By: calls.call_date
         -> Redistribute Motion n:n
              Hash Key: calls.call_date
              -> HashAggregate
                   Group By: calls.call_date
                   -> Seq Scan on calls
                        Filter: call_length <= 60
• HashAggregate – aggregates the rows of each group using a hash table
• Redistribute Motion – redistribute the data across the shards to a new set of
processes
• Send each row in the result set to shard number = hash(call_date, # of shards)
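
The two-phase aggregation in this plan can be sketched as follows. It is an illustration under stated assumptions: the data is made up, the function names are hypothetical, and call_date.toordinal() stands in for the engine's hash function.

```python
from collections import Counter
from datetime import date

NUM_SHARDS = 3  # assumed cluster size

# Made-up shard contents: rows are (subscriber_id, call_date, call_length).
shards = [
    [(1, date(2013, 11, 1), 30), (4, date(2013, 11, 2), 90), (7, date(2013, 11, 1), 60)],
    [(2, date(2013, 11, 2), 10), (5, date(2013, 11, 1), 5)],
    [(3, date(2013, 11, 3), 61), (6, date(2013, 11, 2), 20)],
]


def local_aggregate(shard_rows):
    """Process group 1: scan, filter and count per call_date on one shard."""
    counts = Counter()
    for _subscriber_id, call_date, call_length in shard_rows:
        if call_length <= 60:
            counts[call_date] += 1
    return counts


def redistribute(partials, num_shards):
    """n:n motion: route each partial count to the shard owning its call_date."""
    inboxes = [[] for _ in range(num_shards)]
    for partial in partials:
        for call_date, count in partial.items():
            target = call_date.toordinal() % num_shards  # stand-in hash
            inboxes[target].append((call_date, count))
    return inboxes


def final_aggregate(inbox):
    """Process group 2: add up the partial counts of each group."""
    totals = Counter()
    for call_date, count in inbox:
        totals[call_date] += count
    return totals


partials = [local_aggregate(rows) for rows in shards]
finals = [final_aggregate(inbox) for inbox in redistribute(partials, NUM_SHARDS)]

# Gather Motion: every call_date lives on exactly one shard, so the
# master can simply union the per-shard results.
result = {}
for totals in finals:
    result.update(totals)
print(result)
```

Aggregating locally before the redistribution means each shard sends at most one row per group over the network instead of one row per call.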
Single Table Execution - Plan 3
• EXPLAIN SELECT call_date, count(*)
FROM calls
WHERE call_length <= 60
GROUP BY call_date
ORDER BY call_date;
• QUERY PLAN
  -------------------------------------------------
  Gather Motion n:1
    Merge Key: call_date
    -> Sort
         Sort Key: partial_aggregation.call_date
         -> HashAggregate
              Group By: calls.call_date
              -> Redistribute Motion n:n
                   Hash Key: calls.call_date
                   -> HashAggregate
                        Group By: calls.call_date
                        -> Seq Scan on calls
                             Filter: call_length <= 60
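
Compared with Plan 2, each shard now sorts its own final groups, and the Gather Motion carries a Merge Key. One way to read this is that the master merges the already-sorted streams instead of sorting everything itself. A minimal sketch of that idea, with made-up per-shard results and Python's heapq.merge standing in for the merge step:

```python
import heapq

# Made-up final aggregates per shard: (call_date as ISO string, count).
per_shard = [
    [("2013-11-03", 7), ("2013-11-01", 2)],
    [("2013-11-02", 5)],
    [("2013-11-05", 1), ("2013-11-04", 9)],
]

# Sort step: runs on every shard in parallel.
sorted_streams = [sorted(rows, key=lambda row: row[0]) for rows in per_shard]

# Gather Motion with a Merge Key: the master merges the sorted streams.
merged = list(heapq.merge(*sorted_streams, key=lambda row: row[0]))
print(merged)
```

The expensive sorting is spread over the shards; the master only compares the head of each incoming stream.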
Execution Plan - A Second Table
• Let’s add a second table so we can have some joins.
• It holds details of each subscriber
• CREATE TABLE subscribers
  (subscriber_id        integer,
   subscriber_city_code integer)
  DISTRIBUTED BY (subscriber_id);

• To start with, both tables have the same distribution key
• So all the rows of any specific subscriber, from both tables, will be hosted
in the same shard.
• We can leverage this knowledge in our algorithm
• Later we will see what happens if this is not the case
Simple Join 1 – Same Distribution Key
• EXPLAIN SELECT s.subscriber_id, s.subscriber_city_code,
c.call_date, c.call_length
FROM calls c JOIN subscribers s
ON(c.subscriber_id = s.subscriber_id)
WHERE s.subscriber_city_code = 4;
• QUERY PLAN
  -------------------------------------------------
  Gather Motion n:1
    -> Hash Join
         Hash Cond: c.subscriber_id = s.subscriber_id
         -> Seq Scan on calls c
         -> Hash
              -> Seq Scan on subscribers s
                   Filter: subscriber_city_code = 4
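• Hash Join – joins two tables
• One input is processed first and its result set is loaded into a hash table keyed on the join key (here: the filtered subscribers, under the Hash node)
• The other input (here: calls) is then scanned and joined to the first using hash lookups
• Both tables share the distribution key, so each shard joins only its local rows; the only motion is the final Gather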
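
The build and probe phases of a hash join, as they would run inside a single shard, can be sketched like this. The rows are made up and the function name is hypothetical; it is an illustration of the technique, not the engine's code.

```python
def hash_join(build_rows, probe_rows):
    """Join (subscriber_id, city_code) rows with (subscriber_id, call_date, call_length) rows."""
    # Build phase: hash the smaller, already filtered input on the join key.
    hash_table = {}
    for subscriber_id, city_code in build_rows:
        hash_table.setdefault(subscriber_id, []).append(city_code)

    # Probe phase: scan the other input and look up each row's join key.
    joined = []
    for subscriber_id, call_date, call_length in probe_rows:
        for city_code in hash_table.get(subscriber_id, []):
            joined.append((subscriber_id, city_code, call_date, call_length))
    return joined


# Made-up contents of one shard. Both tables are distributed by
# subscriber_id, so matching rows are guaranteed to be local.
local_subscribers = [(1, 4), (2, 9), (3, 4)]
local_calls = [
    (1, "2013-11-01", 30),
    (2, "2013-11-01", 90),
    (3, "2013-11-02", 15),
    (1, "2013-11-03", 200),
]

filtered = [row for row in local_subscribers if row[1] == 4]
joined = hash_join(filtered, local_calls)
print(joined)
```

Every shard runs the same code on its own rows, and the master gathers the per-shard outputs.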
Simple Join 2 – Same Distribution Key
• EXPLAIN SELECT c.call_date, s.subscriber_city_code,
count (*), sum(c.call_length)
FROM calls c JOIN subscribers s
ON (c.subscriber_id = s.subscriber_id)
WHERE s.subscriber_city_code IN (9,99,999)
AND call_date BETWEEN '2012/01/04' AND '2012/01/06'
GROUP BY 1,2
ORDER BY c.call_date, sum(c.call_length) DESC;

• Nothing new – just a mix of all we’ve seen
Simple Join 2 – Same Distribution Key
QUERY PLAN
-------------------------------------------------
Gather Motion n:1
  Merge Key: call_date, sum
  -> Sort
       Sort Key: partial_aggregation.call_date, sum
       -> HashAggregate
            Group By: c.call_date, s.subscriber_city_code
            -> Redistribute Motion n:n
                 Hash Key: c.call_date, s.subscriber_city_code
                 -> HashAggregate
                      Group By: c.call_date, s.subscriber_city_code
                      -> Hash Join
                           Hash Cond: c.subscriber_id = s.subscriber_id
                           -> Seq Scan on calls c
                                Filter: call_date >= '2012-01-04'::date AND
                                        call_date <= '2012-01-06'::date
                           -> Hash
                                -> Seq Scan on subscribers s
                                     Filter: subscriber_city_code =
                                             ANY ('{9,99,999}'::integer[])
Simple Join 1 – Different Distribution Key
• What if the subscribers table were distributed differently?
• ALTER TABLE subscribers
SET DISTRIBUTED BY(subscriber_city_code);
• Now our data about subscribers is mixed
• The set of subscribers stored in shard 1 of the calls table is no longer the same as in shard 1 of the subscribers table

• How do we run the Simple Join 1 query from before?
• Now some data has to be shuffled over the network
• To minimize the work, it is better to shuffle the smaller table
• Since the join key on the calls table is also its distribution key (subscriber_id), we
can send each row from the result set of the subscribers table directly to the right shard.
Simple Join 1 – Different Distribution Key
• EXPLAIN SELECT s.subscriber_id, s.subscriber_city_code,
c.call_date, c.call_length
FROM calls c JOIN subscribers s
ON(c.subscriber_id = s.subscriber_id)
WHERE s.subscriber_city_code = 4;
• Same query as Simple Join 1!
• QUERY PLAN
  -------------------------------------------------
  Gather Motion n:1
    -> Hash Join
         Hash Cond: c.subscriber_id = s.subscriber_id
         -> Seq Scan on calls c
         -> Hash
              -> Redistribute Motion 1:n
                   Hash Key: s.subscriber_id
                   -> Seq Scan on subscribers s
                        Filter: subscriber_city_code = 4
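
The extra Redistribute Motion can be sketched as below: the filtered subscribers rows are re-sent by subscriber_id so they land next to the matching calls rows, and each shard then joins locally. The data, the shard count and the modulo stand-in for the hash function are assumptions. The 1:n in the plan presumably reflects that all rows with city code 4 now sit on a single shard, which is also what happens in this sketch.

```python
NUM_SHARDS = 3  # assumed cluster size


def shard_for(key, num_shards=NUM_SHARDS):
    """Stand-in for the engine's hash function (assumption)."""
    return key % num_shards


# Made-up data.
calls = [(1, "2013-11-01", 30), (2, "2013-11-01", 90), (4, "2013-11-02", 15),
         (4, "2013-11-03", 200), (7, "2013-11-03", 45)]
subscribers = [(1, 4), (2, 9), (4, 4), (7, 4), (8, 4)]  # (subscriber_id, city_code)

# calls is distributed by subscriber_id, subscribers by subscriber_city_code.
calls_shards = [[] for _ in range(NUM_SHARDS)]
for row in calls:
    calls_shards[shard_for(row[0])].append(row)
subs_shards = [[] for _ in range(NUM_SHARDS)]
for row in subscribers:
    subs_shards[shard_for(row[1])].append(row)

# Redistribute Motion: scan and filter subscribers where they live, then
# send each surviving row to the shard that owns its subscriber_id.
inboxes = [[] for _ in range(NUM_SHARDS)]
for shard_rows in subs_shards:
    for subscriber_id, city_code in shard_rows:
        if city_code == 4:
            inboxes[shard_for(subscriber_id)].append((subscriber_id, city_code))

# Hash Join on each shard: build from the received rows, probe with local calls.
# subscriber_id is assumed to be unique in subscribers, so a plain dict suffices.
result = []
for shard_no in range(NUM_SHARDS):
    build = dict(inboxes[shard_no])
    for subscriber_id, call_date, call_length in calls_shards[shard_no]:
        if subscriber_id in build:
            result.append((subscriber_id, build[subscriber_id], call_date, call_length))

print(sorted(result))
```

Only the small, filtered side moves over the network; the large calls table is never shuffled.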
Simple Join 2 – Different Distribution Key
• EXPLAIN SELECT c.call_date, s.subscriber_city_code,
count (*), sum(c.call_length)
FROM calls c JOIN subscribers s
ON (c.subscriber_id = s.subscriber_id)
WHERE s.subscriber_city_code IN (9,99,999)
AND call_date BETWEEN '2012/01/04' AND '2012/01/06'
GROUP BY 1,2
ORDER BY c.call_date, sum(c.call_length) DESC;

• Same query as Simple Join 2 – just different distribution
Simple Join 2 – Different Distribution Key

QUERY PLAN
-------------------------------------------------
Gather Motion n:1
  Merge Key: call_date, sum
  -> Sort
       Sort Key: partial_aggregation.call_date, sum
       -> HashAggregate
            Group By: c.call_date, s.subscriber_city_code
            -> Redistribute Motion n:n
                 Hash Key: c.call_date, s.subscriber_city_code
                 -> HashAggregate
                      Group By: c.call_date, s.subscriber_city_code
                      -> Hash Join
                           Hash Cond: c.subscriber_id = s.subscriber_id
                           -> Seq Scan on calls c
                                Filter: call_date >= '2012-01-04'::date AND
                                        call_date <= '2012-01-06'::date
                           -> Hash
                                -> Redistribute Motion n:n
                                     Hash Key: s.subscriber_id
                                     -> Seq Scan on subscribers s
                                          Filter: subscriber_city_code =
                                                  ANY ('{9,99,999}'::integer[])
Teasers
• EXPLAIN SELECT * FROM calls
ORDER BY call_length DESC
LIMIT 10;
(Easy - top 10 calls by length)
• EXPLAIN SELECT call_date, count(*)
FROM calls WHERE call_length <= 60
GROUP BY call_date
HAVING count(*) >= 1000000
ORDER BY call_date;
(Easy – all days with at least a million short calls – HAVING clause)
• EXPLAIN SELECT call_date, count(distinct subscriber_id)
FROM calls GROUP BY call_date;
(Hard – per day, the number of subscribers with calls)
• EXPLAIN SELECT call_date,
count(distinct subscriber_id),
count(distinct call_length)
FROM calls GROUP BY call_date;
(Very Hard – two DISTINCT aggregations)

  • 2. Agenda
      • Introduction
      • Sample Architecture
      • The optimizer and execution plans
      • Examples of single table processing
      • Examples of Join processing
  • 3. Scaling Databases
      • Scaling – expanding a system to support more data / sessions.
      • Best scalability – linear, predictable.
      • Scale-up (bigger server) vs. Scale-out (more servers)
      • Scaling up – easier, but limited and expensive
      • Most common scale-out strategy – Sharding
      • Spreading the data (rows in a table) across many independent nodes
      • Each node has a different subset of the data – Shared Nothing
      • Processing sharded data across a shared-nothing cluster is also called Massively Parallel Processing (MPP)
      • MPP databases have existed since the 1980s (e.g. Teradata) and became popular in the analytic space in the 2000s (e.g. Netezza, Greenplum, Vertica)
      • Open source examples over Hadoop – Hive(*), Impala
  • 4. Sample MPP database architecture (diagram)
      • SQL Client
      • Master Node – holds Data Dictionary, Sessions, Optimizer
      • Shard 001, Shard 002, Shard 003, Shard 004, Shard 005, … Shard nnn – each table is distributed across all shards
      • Flow:
          1. SQL (client to master)
          2. Execution Plan (master to shards)
          3. Parallel SQL Execution (on the shards)
          4. Results (shards to master)
          5. Results (master to client)
  • 5. Processing – Analytical vs. Operational
      • With MongoDB – most operations involve a single document
      • With SQL – most operations involve processing many rows, likely across all shards
          • Example: sum of sales per day per store
      • Also, SQL is more expressive – it has a rich set of complex operations (joining, aggregating, sorting, etc.)
      • A database optimizer builds an execution plan:
          • The access path per table (full scan, index scan, etc.)
          • The order of the joins
          • The type of each join (multiple algorithms)
  • 6. Execution Plan - Sample Table
      • Syntax and execution plans are based on Greenplum – but the lessons are general.
      • We’ll start with a simple, single table, with no indexes.
      • It holds data for calls
      • CREATE TABLE calls (subscriber_id integer, call_date date, call_length integer) DISTRIBUTED BY (subscriber_id);
      • We can control the sharding key (distribution key) – this will later allow some join optimizations.
      • Row Placement: shard number = hash(subscriber_id, # of shards)
      • Generally, we want data to be spread equally across all shards (no skew)
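The row placement rule above can be sketched in a few lines of Python. This is only an illustration: `shard_of` and `rows_per_shard` are hypothetical helpers, and `zlib.crc32` stands in for whatever hash function a real MPP database uses internally.

```python
import zlib
from collections import Counter

def shard_of(key, n_shards):
    """Map a distribution-key value to a shard number in [0, n_shards)."""
    # crc32 is an assumed stand-in hash, chosen because it is deterministic.
    return zlib.crc32(str(key).encode()) % n_shards

def rows_per_shard(keys, n_shards):
    """Count how many rows land on each shard for a list of key values."""
    counts = Counter(shard_of(k, n_shards) for k in keys)
    return [counts.get(s, 0) for s in range(n_shards)]

# Many distinct subscriber ids: every shard receives a share of the rows.
even = rows_per_shard(range(100_000), 8)
# A key with a single distinct value: every row lands on one shard (skew).
skewed = rows_per_shard([4] * 100_000, 8)
print(even)
print(skewed)
```

A distribution key with few distinct values concentrates rows on few shards, which is the skew the slide warns about.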
  • 7. Single Table Execution - Plan 1
      • EXPLAIN SELECT * FROM calls WHERE call_date BETWEEN '2013/11/01' AND '2013/11/30';
      • QUERY PLAN
          Gather Motion n:1
            -> Seq Scan on calls
                 Filter: call_date >= '2013-11-01'::date AND call_date <= '2013-11-30'::date
      • Sequential Scan – a full scan of each table shard
      • Filter – applied during the scan
      • Gather Motion – moving the result set of each shard to the master
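As a rough sketch of Plan 1, the Python below models each shard as a list of rows and runs the scan, filter and gather steps one after another. The sample rows and the function names `seq_scan` and `gather_motion` are made up for illustration; a real cluster runs the scans concurrently.

```python
from datetime import date

# Hypothetical sample data: one list of (subscriber_id, call_date, call_length) per shard.
shards = [
    [(1, date(2013, 11, 3), 40), (1, date(2013, 10, 30), 75)],
    [(2, date(2013, 11, 30), 10)],
    [(3, date(2013, 12, 1), 90), (3, date(2013, 11, 15), 5)],
]

def seq_scan(shard_rows, first_day, last_day):
    """Seq Scan + Filter: read every row of one shard, keep rows in the date range."""
    return [row for row in shard_rows if first_day <= row[1] <= last_day]

def gather_motion(per_shard_results):
    """Gather Motion n:1: the master concatenates the result sets of all shards."""
    gathered = []
    for rows in per_shard_results:
        gathered.extend(rows)
    return gathered

# Each shard filters its own rows; only matching rows travel to the master.
november = gather_motion(
    seq_scan(rows, date(2013, 11, 1), date(2013, 11, 30)) for rows in shards
)
print(november)
```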
  • 8. Single Table Execution - Plan 2
      • EXPLAIN SELECT call_date, count(*) FROM calls WHERE call_length <= 60 GROUP BY call_date;
      • Challenge – do the group by in parallel
      • General case – there could be millions or billions of groups
      • Challenge – the rows for each group are distributed across all shards
      • Conclusion – the processes in the shards need to communicate
  • 9. Single Table Execution - Plan 2 (diagram, bottom to top, across Shard 001 … Shard nnn)
      • Process Group 1 – Local Scan, Filter and Aggregation
      • Re-distributing (streaming) the result set over the cluster network (n:n)
      • Process Group 2 – Final Aggregation of each group
      • Send Final Results to the Master
  • 10. Single Table Execution - Plan 2
      • QUERY PLAN
          Gather Motion n:1
            -> HashAggregate
                 Group By: calls.call_date
                 -> Redistribute Motion n:n
                      Hash Key: calls.call_date
                      -> HashAggregate
                           Group By: calls.call_date
                           -> Seq Scan on calls
                                Filter: call_length <= 60
      • HashAggregate – runs a hash-based aggregation per group
      • Redistribute Motion – redistributes the data across the shards to a new set of processes
      • Each row in the result set is sent to shard number = hash(call_date, # of shards)
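The two-phase aggregation in this plan can be imitated in plain Python. The sketch below is a simplified single-process model, not how Greenplum is implemented: all function names and sample rows are hypothetical, and `zlib.crc32` stands in for the database's hash function.

```python
import zlib
from collections import Counter

def shard_of(key, n_shards):
    """Assumed stand-in for the database hash: key value -> shard number."""
    return zlib.crc32(str(key).encode()) % n_shards

def local_aggregate(shard_rows, max_length):
    """Process group 1: scan, filter and pre-aggregate one shard's rows."""
    counts = Counter()
    for _subscriber_id, call_date, call_length in shard_rows:
        if call_length <= max_length:
            counts[call_date] += 1
    return counts

def redistribute(partials, n_shards):
    """Redistribute Motion n:n: route each partial count by hash(call_date)."""
    inboxes = [[] for _ in range(n_shards)]
    for partial in partials:
        for call_date, count in partial.items():
            inboxes[shard_of(call_date, n_shards)].append((call_date, count))
    return inboxes

def final_aggregate(inbox):
    """Process group 2: add up the partial counts of each group."""
    totals = Counter()
    for call_date, count in inbox:
        totals[call_date] += count
    return totals

def grouped_count(shards, max_length):
    partials = [local_aggregate(rows, max_length) for rows in shards]
    inboxes = redistribute(partials, len(shards))
    result = {}
    for inbox in inboxes:  # Gather Motion n:1
        result.update(final_aggregate(inbox))
    return result

# Hypothetical rows: (subscriber_id, call_date, call_length), one list per shard.
shards = [
    [(1, "2013-11-01", 30), (1, "2013-11-02", 90), (4, "2013-11-01", 20)],
    [(2, "2013-11-01", 60), (2, "2013-11-02", 15)],
    [(3, "2013-11-02", 45), (3, "2013-11-03", 61)],
]
counts = grouped_count(shards, 60)
print(counts)
```

Because every partial count for a given call_date is routed to the same shard, each group is finalized in exactly one place, and the gather step can simply combine the per-shard results.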
  • 11. Single Table Execution - Plan 3
      • EXPLAIN SELECT call_date, count(*) FROM calls WHERE call_length <= 60 GROUP BY call_date ORDER BY call_date;
      • QUERY PLAN
          Gather Motion n:1
            Merge Key: call_date
            -> Sort
                 Sort Key: partial_aggregation.call_date
                 -> HashAggregate
                      Group By: calls.call_date
                      -> Redistribute Motion n:n
                           Hash Key: calls.call_date
                           -> HashAggregate
                                Group By: calls.call_date
                                -> Seq Scan on calls
                                     Filter: call_length <= 60
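The Sort and Merge Key steps in Plan 3 suggest that each shard sorts its own groups and the master merges the already-sorted streams. A minimal sketch of that idea, with made-up per-shard data and hypothetical function names:

```python
import heapq

# Hypothetical output of the final HashAggregate: each shard owns a disjoint set of dates.
final_groups_per_shard = [
    {"2013-11-03": 7, "2013-11-01": 3},
    {"2013-11-02": 2},
    {"2013-11-05": 1, "2013-11-04": 9},
]

def sort_on_shard(groups):
    """Sort step: each shard orders its own (call_date, count) pairs."""
    return sorted(groups.items())

def gather_merge(sorted_streams):
    """Gather Motion with a merge key: merge pre-sorted streams without re-sorting."""
    return list(heapq.merge(*sorted_streams))

ordered = gather_merge([sort_on_shard(g) for g in final_groups_per_shard])
print(ordered)
```

The master only performs a merge of n sorted inputs, so the expensive sorting work stays parallel on the shards.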
  • 12. Execution Plan - A Second Table
      • Let’s add a second table so we can have some joins.
      • It holds details of each subscriber
      • CREATE TABLE subscribers (subscriber_id integer, subscriber_city_code integer) DISTRIBUTED BY (subscriber_id);
      • To start with, both tables have the same distribution key
      • So all the rows of any specific subscriber, from both tables, will be hosted in the same shard.
      • We can leverage this knowledge in our algorithm
      • Later we will see what happens if this is not the case
  • 13. Simple Join 1 – Same Distribution Key
      • EXPLAIN SELECT s.subscriber_id, s.subscriber_city_code, c.call_date, c.call_length FROM calls c JOIN subscribers s ON (c.subscriber_id = s.subscriber_id) WHERE s.subscriber_city_code = 4;
      • QUERY PLAN
          Gather Motion n:1
            -> Hash Join
                 Hash Cond: c.subscriber_id = s.subscriber_id
                 -> Seq Scan on calls c
                 -> Hash
                      -> Seq Scan on subscribers s
                           Filter: subscriber_city_code = 4
      • Hash Join – joins two tables
      • The table under the Hash node (here subscribers, after its filter) is processed first, and its result set is hashed on the join key
      • The other table (here calls) is then scanned and joined to the first using hash lookups
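When both tables share the distribution key, each shard can join its own rows with no network traffic. The Python below is a simplified model of that co-located hash join; the helper names and sample rows are hypothetical, `zlib.crc32` stands in for the database hash, and subscriber_id is assumed to be unique in subscribers.

```python
import zlib

def shard_of(key, n_shards):
    """Assumed stand-in for the database hash: key value -> shard number."""
    return zlib.crc32(str(key).encode()) % n_shards

def place(rows, key_index, n_shards):
    """Distribute rows across shards by hashing the column at key_index."""
    shards = [[] for _ in range(n_shards)]
    for row in rows:
        shards[shard_of(row[key_index], n_shards)].append(row)
    return shards

def local_hash_join(calls_shard, subscribers_shard, city_code):
    # Build phase: hash the filtered subscribers on the join key.
    build = {sid: city for sid, city in subscribers_shard if city == city_code}
    # Probe phase: scan calls and look each row up in the hash table.
    return [
        (sid, build[sid], call_date, call_length)
        for sid, call_date, call_length in calls_shard
        if sid in build
    ]

# Hypothetical sample data.
calls = [(1, "2013-11-01", 30), (2, "2013-11-01", 60),
         (1, "2013-11-02", 90), (3, "2013-11-02", 45)]
subscribers = [(1, 4), (2, 9), (3, 4)]

N_SHARDS = 3
calls_shards = place(calls, 0, N_SHARDS)        # DISTRIBUTED BY (subscriber_id)
subs_shards = place(subscribers, 0, N_SHARDS)   # DISTRIBUTED BY (subscriber_id)

joined = []
for calls_shard, subs_shard in zip(calls_shards, subs_shards):
    joined.extend(local_hash_join(calls_shard, subs_shard, 4))  # then gather
print(sorted(joined))
```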
  • 14. Simple Join 2 – Same Distribution Key
      • EXPLAIN SELECT c.call_date, s.subscriber_city_code, count(*), sum(c.call_length) FROM calls c JOIN subscribers s ON (c.subscriber_id = s.subscriber_id) WHERE s.subscriber_city_code IN (9,99,999) AND call_date BETWEEN '2012/01/04' AND '2012/01/06' GROUP BY 1,2 ORDER BY c.call_date, sum(c.call_length) DESC;
      • Nothing new – just a mix of all we’ve seen
  • 15. Simple Join 2 – Same Distribution Key
      • QUERY PLAN
          Gather Motion n:1
            Merge Key: call_date, sum
            -> Sort
                 Sort Key: partial_aggregation.call_date, sum
                 -> HashAggregate
                      Group By: c.call_date, s.subscriber_city_code
                      -> Redistribute Motion n:n
                           Hash Key: c.call_date, s.subscriber_city_code
                           -> HashAggregate
                                Group By: c.call_date, s.subscriber_city_code
                                -> Hash Join
                                     Hash Cond: c.subscriber_id = s.subscriber_id
                                     -> Seq Scan on calls c
                                          Filter: call_date >= '2012-01-04'::date AND call_date <= '2012-01-06'::date
                                     -> Hash
                                          -> Seq Scan on subscribers s
                                               Filter: subscriber_city_code = ANY ('{9,99,999}'::integer[])
  • 16. Simple Join 1 – Different Distribution Key
      • What if the subscribers table was distributed differently?
      • ALTER TABLE subscribers SET DISTRIBUTED BY (subscriber_city_code);
      • Now our data about subscribers is mixed
      • The set of subscribers in shard 1 of the calls table is no longer the same as in shard 1 of the subscribers table
      • How do we run the Simple Join 1 query from before?
      • Now, there has to be some shuffling of data over the network
      • To minimize the work, it is better to shuffle the smaller table over the network
      • Since the join key on the calls table is the same as its distribution key (subscriber_id), we can send each row from the result set of the subscribers table directly to the right shard.
  • 17. Simple Join 1 – Different Distribution Key
      • EXPLAIN SELECT s.subscriber_id, s.subscriber_city_code, c.call_date, c.call_length FROM calls c JOIN subscribers s ON (c.subscriber_id = s.subscriber_id) WHERE s.subscriber_city_code = 4;
      • Same query as Simple Join 1!
      • QUERY PLAN
          Gather Motion n:1
            -> Hash Join
                 Hash Cond: c.subscriber_id = s.subscriber_id
                 -> Seq Scan on calls c
                 -> Hash
                      -> Redistribute Motion 1:n
                           Hash Key: s.subscriber_id
                           -> Seq Scan on subscribers s
                                Filter: subscriber_city_code = 4
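A simplified Python model of this plan: subscribers is placed by subscriber_city_code, its filtered rows are re-sent by hash(subscriber_id), and only then does each shard run its local hash join. All names and sample rows are hypothetical, and `zlib.crc32` stands in for the database hash.

```python
import zlib

def shard_of(key, n_shards):
    """Assumed stand-in for the database hash: key value -> shard number."""
    return zlib.crc32(str(key).encode()) % n_shards

def place(rows, key_index, n_shards):
    """Distribute rows across shards by hashing the column at key_index."""
    shards = [[] for _ in range(n_shards)]
    for row in rows:
        shards[shard_of(row[key_index], n_shards)].append(row)
    return shards

def redistribute_subscribers(subs_shards, city_code, n_shards):
    """Scan + filter subscribers, then send each row to hash(subscriber_id)."""
    inboxes = [[] for _ in range(n_shards)]
    for shard_rows in subs_shards:
        for sid, city in shard_rows:
            if city == city_code:
                inboxes[shard_of(sid, n_shards)].append((sid, city))
    return inboxes

def local_hash_join(calls_shard, subscribers_inbox):
    build = {sid: city for sid, city in subscribers_inbox}      # build phase
    return [(sid, build[sid], d, length)                        # probe phase
            for sid, d, length in calls_shard if sid in build]

# Hypothetical sample data.
calls = [(1, "2013-11-01", 30), (2, "2013-11-01", 60),
         (1, "2013-11-02", 90), (3, "2013-11-02", 45)]
subscribers = [(1, 4), (2, 9), (3, 4)]

N_SHARDS = 3
calls_shards = place(calls, 0, N_SHARDS)        # DISTRIBUTED BY (subscriber_id)
subs_shards = place(subscribers, 1, N_SHARDS)   # DISTRIBUTED BY (subscriber_city_code)

# All rows with city code 4 live on one shard, so only that shard has rows to send.
senders = sum(1 for rows in subs_shards if any(city == 4 for _sid, city in rows))

inboxes = redistribute_subscribers(subs_shards, 4, N_SHARDS)
joined = []
for calls_shard, inbox in zip(calls_shards, inboxes):
    joined.extend(local_hash_join(calls_shard, inbox))
print(senders, sorted(joined))
```

In this model only one shard ends up sending subscriber rows, which may be the reason the plan shows the motion as 1:n rather than n:n; that reading is an inference from the model, not something the slides state.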
  • 18. Simple Join 2 – Different Distribution Key
      • EXPLAIN SELECT c.call_date, s.subscriber_city_code, count(*), sum(c.call_length) FROM calls c JOIN subscribers s ON (c.subscriber_id = s.subscriber_id) WHERE s.subscriber_city_code IN (9,99,999) AND call_date BETWEEN '2012/01/04' AND '2012/01/06' GROUP BY 1,2 ORDER BY c.call_date, sum(c.call_length) DESC;
      • Same query as Simple Join 2 – just a different distribution
  • 19. Simple Join 2 – Different Distribution Key
      • QUERY PLAN
          Gather Motion n:1
            Merge Key: call_date, sum
            -> Sort
                 Sort Key: partial_aggregation.call_date, sum
                 -> HashAggregate
                      Group By: c.call_date, s.subscriber_city_code
                      -> Redistribute Motion n:n
                           Hash Key: c.call_date, s.subscriber_city_code
                           -> HashAggregate
                                Group By: c.call_date, s.subscriber_city_code
                                -> Hash Join
                                     Hash Cond: c.subscriber_id = s.subscriber_id
                                     -> Seq Scan on calls c
                                          Filter: call_date >= '2012-01-04'::date AND call_date <= '2012-01-06'::date
                                     -> Hash
                                          -> Redistribute Motion n:n
                                               Hash Key: s.subscriber_id
                                               -> Seq Scan on subscribers s
                                                    Filter: subscriber_city_code = ANY ('{9,99,999}'::integer[])
  • 20. Teasers
      • EXPLAIN SELECT * FROM calls ORDER BY call_length DESC LIMIT 10; (Easy – top 10 calls by length)
      • EXPLAIN SELECT call_date, count(*) FROM calls WHERE call_length <= 60 GROUP BY call_date HAVING count(*) >= 1000000 ORDER BY call_date; (Easy – all days with at least a million short calls – HAVING clause)
      • EXPLAIN SELECT call_date, count(distinct subscriber_id) FROM calls GROUP BY call_date; (Hard – per day, the number of subscribers with calls)
      • EXPLAIN SELECT call_date, count(distinct subscriber_id), count(distinct call_length) FROM calls GROUP BY call_date; (Very Hard – two DISTINCT aggregations)
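One possible approach to the first teaser (not necessarily the plan Greenplum produces) is for each shard to keep only its own top 10 rows, so the master has to rank at most 10 × n candidates. The sketch below uses a limit of 3 and made-up rows to keep the example small; the function names are hypothetical.

```python
import heapq

def local_top_n(shard_rows, n):
    """Each shard keeps only its n longest calls."""
    return heapq.nlargest(n, shard_rows, key=lambda row: row[2])

def global_top_n(shards, n):
    """The master gathers at most n rows per shard and picks the overall top n."""
    candidates = []
    for rows in shards:
        candidates.extend(local_top_n(rows, n))
    return heapq.nlargest(n, candidates, key=lambda row: row[2])

# Hypothetical rows: (subscriber_id, call_date, call_length), one list per shard.
shards = [
    [(1, "2013-11-01", 30), (1, "2013-11-02", 900)],
    [(2, "2013-11-01", 600), (2, "2013-11-02", 15)],
    [(3, "2013-11-02", 45), (3, "2013-11-03", 610)],
]
top3 = global_top_n(shards, 3)
print(top3)
```

This works because any row in the global top n must also be in the top n of its own shard.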