2. Agenda
• Introduction
• Sample Architecture
• The optimizer and execution plans
• Examples of single table processing
• Examples of Join processing
3. Scaling Databases
• Scaling – expanding a system to support more data / sessions.
• Best scalability – linear, predictable.
• Scale-up (bigger server) vs. Scale-out (more servers)
• Scaling up – easier, but limited, expensive
• Most common scale-out strategy – Sharding
• Spreading the data (rows in a table) across many independent nodes
• Each node has a different subset of the data – Shared Nothing
• Processing sharded data across a shared-nothing cluster is also called
Massively Parallel Processing (MPP)
• MPP databases have existed since the 1980s (ex: Teradata) and became popular in
the analytic space in the 2000s (ex: Netezza, Greenplum, Vertica)
• Open source examples over Hadoop – Hive(*), Impala
4. Sample MPP database architecture
• [Diagram] A SQL client talks to the master node. The master node holds the data dictionary, the sessions and the optimizer; the data lives on shards 001, 002, 003, … nnn.
• Flow: 1. SQL (client → master); 2. Execution plan (built by the master); 3. Parallel SQL execution (on all shards); 4. Results (shards → master); 5. Results (master → client).
• Each table – distributed across all shards
5. Processing – Analytical vs. Operational
• With MongoDB – most operations involve a single document
• With SQL – most operations involve processing many rows, likely
across all shards
• Example: sum of sales per day per store
• Also, SQL is more expressive – it has a rich set of complex operations
(joining, aggregating, sorting etc)
• A database optimizer builds an execution plan:
• The access path per table (full scan, index scan etc)
• The order of the joins
• The type of each join (multiple algorithms)
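The optimizer's job can be illustrated with a toy cost model. The tables are from the slides, but the access-path costs and the cost formula below are entirely invented for illustration; a real optimizer uses statistics and far richer models.

```python
# Toy sketch of cost-based planning: enumerate the alternatives and keep
# the cheapest plan. All costs here are made up for illustration.
from itertools import permutations

# Pretend per-table access paths and their costs.
access_cost = {"calls": ("seq scan", 100.0), "subscribers": ("seq scan", 10.0)}

def plan_cost(order):
    # Crude model: the cost of the scans, plus a penalty for building the
    # hash table on the first input of the hash join.
    scan = sum(access_cost[t][1] for t in order)
    build_penalty = access_cost[order[0]][1] * 0.5
    return scan + build_penalty

best = min(permutations(access_cost), key=plan_cost)
# The cheaper table wins the build side of the hash join.
assert best[0] == "subscribers"
```

Even this toy version shows the three decisions listed above: which access path each table gets, in which order the tables are joined, and (implicitly, via the penalty) which side of the join each table takes.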
6. Execution Plan – Sample Table
• Syntax and execution plans are based on Greenplum – but the lessons are general.
• We’ll start with a simple, single table, with no indexes.
• It holds data about calls
• CREATE TABLE calls
    (subscriber_id integer,
     call_date     date,
     call_length   integer)
  DISTRIBUTED BY (subscriber_id);
• We can control the sharding key (distribution key) – this will later enable
some join optimizations.
• Row Placement: shard number = hash(subscriber_id) mod (# of shards)
• Generally, we want data to be spread equally across all shards (no skew)
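The row-placement rule and the no-skew goal can be sketched in a few lines of Python. The shard count and the synthetic subscriber ids are illustrative, not Greenplum internals.

```python
# Minimal sketch of the placement rule above:
# shard number = hash(subscriber_id) mod (number of shards).
from collections import Counter

N_SHARDS = 8

def placement(subscriber_id, n_shards=N_SHARDS):
    # Deterministic: the same subscriber always lands on the same shard.
    return hash(subscriber_id) % n_shards

# Spread 10,000 synthetic subscriber ids and measure skew: with an even
# spread, every shard holds close to the 1,250-row average.
counts = Counter(placement(sid) for sid in range(10_000))
assert len(counts) == N_SHARDS
assert max(counts.values()) - min(counts.values()) <= 1
```

A badly chosen key (say, a city code with one huge city) would concentrate rows on a few shards, and the slowest shard would then dominate every parallel step.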
7. Single Table Execution - Plan 1
• EXPLAIN SELECT * FROM calls
WHERE call_date BETWEEN '2013/11/01' AND '2013/11/30';
•
QUERY PLAN
-------------------------------------------------
Gather Motion n:1
-> Seq Scan on calls
Filter: call_date >= '2013-11-01'::date AND
call_date <= '2013-11-30'::date
• Sequential Scan – a full scan of each table shard
• Filter – applied during the scan
• Gather Motion – moving the result set of each shard to the master
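The three plan nodes map directly onto a simple simulation. The data and function names below are invented for illustration; each inner list stands for the rows local to one shard.

```python
# Toy simulation of the plan: per-shard sequential scan with the filter
# applied during the scan, then an n:1 "gather motion" to the master.
import datetime as dt

# One list per shard; each tuple is (subscriber_id, call_date, call_length).
shards = [
    [(1, dt.date(2013, 11, 5), 120), (2, dt.date(2013, 10, 1), 30)],
    [(3, dt.date(2013, 11, 30), 45)],
]

def seq_scan_with_filter(rows, lo, hi):
    # Full scan of the shard's rows; no index is involved.
    return [r for r in rows if lo <= r[1] <= hi]

def gather_motion(per_shard_results):
    # n:1 motion: every shard streams its result set to the master.
    return [row for result in per_shard_results for row in result]

result = gather_motion(
    seq_scan_with_filter(s, dt.date(2013, 11, 1), dt.date(2013, 11, 30))
    for s in shards
)
assert len(result) == 2  # only the two November calls survive the filter
```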
8. Single Table Execution - Plan 2
• EXPLAIN SELECT call_date, count(*)
FROM calls
WHERE call_length <= 60
GROUP BY call_date;
• Challenge – do the group by in parallel
• General case - could be millions or billions of groups
• Challenge – the rows for each group are distributed across all shards
• Conclusion – the processes in the shards need to communicate
9. Single Table Execution - Plan 2
• [Diagram] Process Group 1 – local Scan, Filter and Aggregation on each shard (001 … nnn)
• Re-distributing (streaming) the result set over the cluster network (n:n)
• Process Group 2 – final Aggregation of each group
• Send the final results to the master
10. Single Table Execution - Plan 2
• QUERY PLAN
-------------------------------------------------
Gather Motion n:1
-> HashAggregate
Group By: calls.call_date
-> Redistribute Motion n:n
Hash Key: calls.call_date
-> HashAggregate
Group By: calls.call_date
-> Seq Scan on calls
Filter: call_length <= 60
• HashAggregate – aggregates the rows group by group, using a hash table on the grouping key
• Redistribute Motion – redistributes the data across the shards, to a new set of
processes
• Each row of the partial result set is sent to shard number = hash(call_date) mod (# of shards)
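The two-phase aggregation is easy to simulate. The per-shard data below is invented; each list holds the call_date of the rows that survived the local filter on that shard.

```python
# Sketch of the parallel GROUP BY: phase 1 aggregates locally on each
# shard, a redistribute motion routes each partial group by its hash,
# and phase 2 merges the partials - all rows of a group meet on one shard.
from collections import Counter, defaultdict

N_SHARDS = 3
shards = [
    ["2013-11-01", "2013-11-02", "2013-11-01"],  # filtered rows, shard 1
    ["2013-11-01", "2013-11-02"],                # shard 2
    ["2013-11-02"],                              # shard 3
]

# Phase 1: local HashAggregate on each shard.
partials = [Counter(rows) for rows in shards]

# Redistribute Motion n:n - route each partial group by hash(call_date).
inboxes = defaultdict(list)
for partial in partials:
    for call_date, cnt in partial.items():
        inboxes[hash(call_date) % N_SHARDS].append((call_date, cnt))

# Phase 2: final HashAggregate - sum the partial counts per group.
final = Counter()
for rows in inboxes.values():
    for call_date, cnt in rows:
        final[call_date] += cnt

assert final == {"2013-11-01": 3, "2013-11-02": 3}
```

The local phase matters: it shrinks what crosses the network from one row per call to one row per (shard, group) pair.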
11. Single Table Execution - Plan 3
• EXPLAIN SELECT call_date, count(*)
FROM calls
WHERE call_length <= 60
GROUP BY call_date
ORDER BY call_date;
• QUERY PLAN
-------------------------------------------------
Gather Motion n:1
Merge Key: call_date
-> Sort
Sort Key: partial_aggregation.call_date
-> HashAggregate
Group By: calls.call_date
-> Redistribute Motion n:n
Hash Key: calls.call_date
-> HashAggregate
Group By: calls.call_date
-> Seq Scan on calls
Filter: call_length <= 60
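The new pieces here are the Sort below the gather and the Merge Key on the gather itself: each shard sorts its own final groups, and the master merges the already-sorted streams instead of re-sorting everything. A sketch with invented data (after the n:n redistribution by call_date, each date lives on exactly one shard, so the streams are disjoint):

```python
# Sorted gather: each shard ships its groups pre-sorted by call_date,
# and the master performs an n-way merge (the "Merge Key" in the plan).
import heapq

per_shard_sorted = [
    [("2013-11-01", 10), ("2013-11-04", 7)],  # shard 1, sorted locally
    [("2013-11-02", 5)],                      # shard 2
    [("2013-11-03", 2)],                      # shard 3
]

merged = list(heapq.merge(*per_shard_sorted))
assert [d for d, _ in merged] == [
    "2013-11-01", "2013-11-02", "2013-11-03", "2013-11-04"
]
```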
12. Execution Plan – A Second Table
• Let’s add a second table so we can have some joins.
• It holds details of each subscriber
• CREATE TABLE subscribers
    (subscriber_id        integer,
     subscriber_city_code integer)
  DISTRIBUTED BY (subscriber_id);
• To start with, both tables have the same distribution key
• So, all the rows of any specific subscriber, from both tables, will be hosted
on the same shard.
• We can leverage this knowledge in our algorithm
• Later we will see what happens if this is not the case
13. Simple Join 1 – Same Distribution Key
• EXPLAIN SELECT s.subscriber_id, s.subscriber_city_code,
c.call_date, c.call_length
FROM calls c JOIN subscribers s
ON(c.subscriber_id = s.subscriber_id)
WHERE s.subscriber_city_code = 4;
• QUERY PLAN
-------------------------------------------------
Gather Motion n:1
-> Hash Join
Hash Cond: c.subscriber_id = s.subscriber_id
-> Seq Scan on calls c
-> Hash
-> Seq Scan on subscribers s
Filter: subscriber_city_code = 4
• Hash Join – joins two tables
• First table is processed, result set is hashed (based on the join key)
• Second table is scanned, joined to the first using hash lookups
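The build/probe mechanics map onto a few lines of Python. The rows below are invented; because both tables share the distribution key, this join runs entirely locally on each shard, with no motion below the join.

```python
# Minimal hash-join sketch matching the plan: build a hash table on the
# filtered subscribers rows, then probe it while scanning calls.
subscribers = [(1, 4), (2, 7), (3, 4)]   # (subscriber_id, city_code)
calls = [
    (1, "2013-11-01", 60),
    (2, "2013-11-01", 30),
    (3, "2013-11-02", 45),
]

# Build side: scan subscribers, apply the filter, hash on the join key.
build = {}
for sid, city in subscribers:
    if city == 4:                        # Filter: subscriber_city_code = 4
        build[sid] = city

# Probe side: scan calls and look each row up in the hash table.
joined = [(sid, build[sid], d, length)
          for sid, d, length in calls if sid in build]

assert joined == [(1, 4, "2013-11-01", 60), (3, 4, "2013-11-02", 45)]
```

Note the optimizer put the filtered (smaller) subscribers set on the build side, keeping the hash table small.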
14. Simple Join 2 – Same Distribution Key
• EXPLAIN SELECT c.call_date, s.subscriber_city_code,
count (*), sum(c.call_length)
FROM calls c JOIN subscribers s
ON (c.subscriber_id = s.subscriber_id)
WHERE s.subscriber_city_code IN (9,99,999)
AND call_date BETWEEN '2012/01/04' AND '2012/01/06'
GROUP BY 1,2
ORDER BY c.call_date, sum(c.call_length) DESC;
• Nothing new – just a mix of all we’ve seen
15. Simple Join 2 – Same Distribution Key
QUERY PLAN
-------------------------------------------------
Gather Motion n:1
Merge Key: call_date, sum
-> Sort
Sort Key: partial_aggregation.call_date, sum
-> HashAggregate
Group By: c.call_date, s.subscriber_city_code
-> Redistribute Motion n:n
Hash Key: c.call_date, s.subscriber_city_code
-> HashAggregate
Group By: c.call_date, s.subscriber_city_code
-> Hash Join
Hash Cond: c.subscriber_id = s.subscriber_id
-> Seq Scan on calls c
Filter: call_date >= '2012-01-04'::date AND
call_date <= '2012-01-06'::date
-> Hash
-> Seq Scan on subscribers s
Filter: subscriber_city_code =
ANY ('{9,99,999}'::integer[])
16. Simple Join 1 – Different Distribution Key
• What if the subscriber table was distributed differently?
• ALTER TABLE subscribers
SET DISTRIBUTED BY(subscriber_city_code);
• Now our data about subscribers is mixed
• The set of subscribers in shard 1 of the calls table is not the same as in shard 1 of the subscribers table
• How to run Simple Join 1 query from before?
• Now, there has to be some shuffling of data over the network
• To minimize the work, it is better to shuffle the smaller table over the network
• Since the join key of the calls table is the same as its distribution key (subscriber_id), we
can send each row from the subscribers result set directly to the right shard.
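The redistribution step can be sketched as follows. The shard count and rows are invented; the point is that each filtered subscribers row is re-sent to hash(subscriber_id) mod n, the shard that already holds that subscriber's calls, after which the hash join runs locally as before.

```python
# Sketch of the Redistribute Motion: route each filtered subscribers row
# by the JOIN key (subscriber_id), not its current distribution key.
N_SHARDS = 2

def target_shard(subscriber_id):
    # Same placement rule the calls table was distributed with.
    return hash(subscriber_id) % N_SHARDS

# Filtered subscribers rows, currently living on arbitrary shards.
filtered_subscribers = [(1, 4), (3, 4), (4, 4)]

inboxes = {i: [] for i in range(N_SHARDS)}
for sid, city in filtered_subscribers:
    inboxes[target_shard(sid)].append((sid, city))   # network shuffle

# Every row landed on the shard where calls rows with the same
# subscriber_id already live, so the join needs no further motion.
assert all(target_shard(sid) == shard
           for shard, rows in inboxes.items() for sid, _ in rows)
```

Shuffling only the small, filtered side keeps the network cost proportional to the smaller input, which is exactly the "shuffle the smaller table" rule above.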
17. Simple Join 1 – Different Distribution Key
• EXPLAIN SELECT s.subscriber_id, s.subscriber_city_code,
c.call_date, c.call_length
FROM calls c JOIN subscribers s
ON(c.subscriber_id = s.subscriber_id)
WHERE s.subscriber_city_code = 4;
• Same query as Simple Join 1!
• QUERY PLAN
-------------------------------------------------
Gather Motion n:1
-> Hash Join
Hash Cond: c.subscriber_id = s.subscriber_id
-> Seq Scan on calls c
-> Hash
-> Redistribute Motion 1:n
Hash Key: s.subscriber_id
-> Seq Scan on subscribers s
Filter: subscriber_city_code = 4
18. Simple Join 2 – Different Distribution Key
• EXPLAIN SELECT c.call_date, s.subscriber_city_code,
count (*), sum(c.call_length)
FROM calls c JOIN subscribers s
ON (c.subscriber_id = s.subscriber_id)
WHERE s.subscriber_city_code IN (9,99,999)
AND call_date BETWEEN '2012/01/04' AND '2012/01/06'
GROUP BY 1,2
ORDER BY c.call_date, sum(c.call_length) DESC;
• Same query as Simple Join 2 – just different distribution
19. Simple Join 2 – Different Distribution Key
QUERY PLAN
-------------------------------------------------
Gather Motion n:1
Merge Key: call_date, sum
-> Sort
Sort Key: partial_aggregation.call_date, sum
-> HashAggregate
Group By: c.call_date, s.subscriber_city_code
-> Redistribute Motion n:n
Hash Key: c.call_date, s.subscriber_city_code
-> HashAggregate
Group By: c.call_date, s.subscriber_city_code
-> Hash Join
Hash Cond: c.subscriber_id = s.subscriber_id
-> Seq Scan on calls c
Filter: call_date >= '2012-01-04'::date AND
call_date <= '2012-01-06'::date
-> Hash
-> Redistribute Motion n:n
Hash Key: s.subscriber_id
-> Seq Scan on subscribers s
Filter: subscriber_city_code =
ANY ('{9,99,999}'::integer[])
20. Teasers
• EXPLAIN SELECT * FROM calls
ORDER BY call_length DESC
LIMIT 10;
(Easy - top 10 calls by length)
• EXPLAIN SELECT call_date, count(*)
FROM calls WHERE call_length <= 60
GROUP BY call_date
HAVING count(*) >= 1000000
ORDER BY call_date;
(Easy – all days with at least a million short calls – HAVING clause)
• EXPLAIN SELECT call_date, count(distinct subscriber_id)
FROM calls GROUP BY call_date;
(Hard – per day, the number of subscribers with calls)
• EXPLAIN SELECT call_date,
count(distinct subscriber_id),
count(distinct call_length)
FROM calls GROUP BY call_date;
(Very Hard – two DISTINCT aggregations)