How to scale relational (OLTP) databases. Think: Sharding @C16LV

How to scale relational databases? Think: Sharding
Session ID: 4829
Prepared by: Maxym Kharchenko, Gluent.com

Whoami
• Started as a database kernel developer
• Then: ORACLE DBA for 15+ years
• Now: Developer at Gluent (past: amazon.com)
• OCM, ORACLE Ace Associate, AWS Developer
• Blog: intermediatesql.com
• Twitter: @maxymkh

The cool things that we do at Gluent
[Diagram: Gluent sits between the data sources (Oracle, Teradata, MSSQL, NoSQL, Big Data) and the applications (App X, App Y, App Z)]
We glue these worlds together!

Conventional wisdom
Relational databases are best for OLTP, because they are ACID.
Unfortunately, relational databases cannot scale.

Riding Moore’s Law
(a.k.a. “traditional” database scaling)
[Chart: timeline 2013 to 2017, marked “You are here!” and “HW overrun here!”]
Traditional database scaling
[Chart: Old System vs. New System, timeline 2013 to 2017, marked “Replace HW here!”]

But now: Data grows faster!
[Chart: Old System vs. New System, timeline 2013 to 2017]

Moore’s Law:
The future ain’t what it used to be!
[Chart: Old System vs. New System, timeline 2013 to 2017]

Let’s move the database to a bigger box

One machine is just not enough anymore

Clusters are hard
And expensive
- Need top-of-the-line HW
- And super smart engineers
- And additional license $$$$$$$$$

Solution: Shared nothing architecture
a.k.a. “Sharded”

Sharded architecture in a nutshell
[Diagram not preserved; surviving labels: “=”, “$$$$$”, “$$”]

Split your data into small independent chunks
Run each chunk on cheap commodity hardware

Sharding is, basically, partitioning

Except, each “partition” is a database

Practical table design for sharding

Let’s “shard” a simple table
CREATE TABLE books (
id number PRIMARY KEY,
title varchar2(200),
author varchar2(200)
);

CREATE TABLE books (
id number PRIMARY KEY,
title varchar2(200),
author varchar2(200)
) SHARD BY <method> (<shard_key>) (
SPLIT SIZE evenly
SPLIT LOAD evenly
PREFER SINGLE SHARD ACCESS
DISCOURAGE DATA MOVE
USING <N> DATABASES
);
(Not a “real” ORACLE command ... yet)

Hey, let’s shard it by “name” range
SHARD BY LIST (first_letter(author))
(
…
SPLIT SIZE evenly
);
A-G H-M N-T U-Z

Hey, let’s shard it by “id” range
SHARD BY RANGE (id) (
…
SPLIT LOAD evenly
);
1-100 101-200 201-300 301-400

Hashes are your friend
SHARD BY HASH (id) (
SPLIT SIZE evenly
SPLIT LOAD evenly
);
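
A rough sketch of why the hash helps, written in Python because the SHARD BY syntax is not a real command. The shard count, the id ranges and the use of CRC32 as the hash are assumptions made for illustration only.

```python
import zlib
from collections import Counter

NUM_SHARDS = 4  # assumed shard count, matching the 4 id ranges on the previous slide


def range_shard(book_id: int, ids_per_shard: int = 100) -> int:
    """Range sharding: ids 1-100 -> shard 0, 101-200 -> shard 1, and so on."""
    return min((book_id - 1) // ids_per_shard, NUM_SHARDS - 1)


def hash_shard(book_id: int) -> int:
    """Hash sharding: a stable hash of the id, modulo the shard count."""
    return zlib.crc32(str(book_id).encode("utf-8")) % NUM_SHARDS


# In OLTP the newest rows are usually the busiest ones.
newest_ids = range(301, 401)
range_load = Counter(range_shard(i) for i in newest_ids)
hash_load = Counter(hash_shard(i) for i in newest_ids)

print("range:", dict(range_load))  # {3: 100}: every recent id hits the last shard
print("hash: ", dict(hash_load))   # recent ids land on more than one shard
```

With ranges, size can be even while load is not: all of the recent traffic goes to one database. A hash on the id spreads both.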

But (especially for OLTP)
be sure to choose the right hash column
SELECT title FROM books
WHERE id = 34567876;

SELECT title FROM books
WHERE author = 'Isaac Asimov'
ORDER BY title;

SHARD BY HASH (author) (
);
0 1 2 3
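
One way to see why the hash column matters is to ask how many shards a router has to visit for each query. The sketch below is an illustration under assumptions: `target_shards` is a hypothetical helper, CRC32 is an arbitrary stable hash, and 4 shards matches the slides.

```python
import zlib

NUM_SHARDS = 4  # assumed shard count


def target_shards(shard_key: str, predicates: dict) -> list:
    """Return the shards a query must visit.

    shard_key  -- the column the table is hash-sharded by
    predicates -- equality predicates of the query, column -> value
    """
    if shard_key in predicates:
        # The router can compute the one shard that holds the rows.
        value = str(predicates[shard_key]).encode("utf-8")
        return [zlib.crc32(value) % NUM_SHARDS]
    # No predicate on the shard key: every shard may hold matching rows.
    return list(range(NUM_SHARDS))


by_id = {"id": 34567876}
by_author = {"author": "Isaac Asimov"}

print(len(target_shards("id", by_id)))          # 1: single shard access
print(len(target_shards("id", by_author)))      # 4: ask every shard, then merge and sort
print(len(target_shards("author", by_author)))  # 1: single shard access
```

If most OLTP queries filter by author, hashing by author keeps them on one shard, and the ORDER BY can run entirely inside that database.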

Think about eventual re-sharding
SHARD BY hash(author) (
USING 4 DATABASES
);
0 1 2 3

Think about eventual re-sharding
SHARD BY mod(hash(author), 4) (
);
0 1 2 3

Discourage data move
SHARD BY mod(hash_function(author), 6)(
);
0 1 2 3 4 5

Major resharding is a PITA
Hash  Mod/4  Mod/6
   1      1      1
   2      2      2
   3      3      3
   4      0      4
   5      1      5
   6      2      0
   7      3      1
   8      0      2
   9      1      3
  10      2      4
  11      3      5
  12      0      0
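
The table can be checked with a few lines of Python. This is only the arithmetic from the slide (hash values 1 to 12, going from 4 to 6 databases), not a resharding tool.

```python
OLD_SHARDS = 4
NEW_SHARDS = 6

hashes = range(1, 13)  # the 12 hash values from the table above

# A key has to move when its shard number differs between the two layouts.
moved = [h for h in hashes if h % OLD_SHARDS != h % NEW_SHARDS]
stayed = [h for h in hashes if h % OLD_SHARDS == h % NEW_SHARDS]

print("stay:", stayed)  # [1, 2, 3, 12]
print("move:", moved)   # [4, 5, 6, 7, 8, 9, 10, 11]
print(f"{len(moved)} of {len(hashes)} keys change shard")  # 8 of 12
```

Adding two databases relocates two thirds of the keys, which is why changing the modulus directly is so painful.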

Solution: Logical shards
DB 1 DB 2 DB 3 DB 4

Solution: Logical shards
DB 1 DB 2 DB 3 DB 4 DB 5
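
A minimal sketch of the idea, assuming a fixed number of logical buckets that are mapped to physical databases through a lookup table. `TOTAL_BUCKETS`, `initial_map` and `add_database` are hypothetical names, and the "take from the fullest database" rule is just one simple rebalancing policy.

```python
from collections import Counter

TOTAL_BUCKETS = 12  # assumed; chosen once and never changed


def initial_map(num_dbs: int) -> dict:
    """Spread the logical buckets over the physical databases round-robin."""
    return {bucket: bucket % num_dbs + 1 for bucket in range(TOTAL_BUCKETS)}


def add_database(bucket_map: dict, new_db: int) -> dict:
    """Give new_db a fair share of buckets, taken from the fullest databases."""
    new_map = dict(bucket_map)
    fair_share = TOTAL_BUCKETS // (len(set(new_map.values())) + 1)
    for _ in range(fair_share):
        load = Counter(new_map.values())
        donor = max(sorted(load), key=lambda db: load[db])
        bucket = max(b for b, db in new_map.items() if db == donor)
        new_map[bucket] = new_db
    return new_map


old_map = initial_map(4)          # DB 1 .. DB 4
new_map = add_database(old_map, 5)  # DB 5 joins

moved = sorted(b for b in old_map if old_map[b] != new_map[b])
print("buckets moved to DB 5:", moved)  # [8, 9]
print(f"{len(moved)} of {TOTAL_BUCKETS} buckets move")
```

The key-to-bucket function never changes; only the bucket-to-database table does, so adding DB 5 touches the data in the reassigned buckets and nothing else.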

Which database?
Hash(author)
Lookup (hash)

Executing the query
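import zlib

TOTAL_BUCKETS = 1024  # example value: the fixed number of logical shards

def shard_query(sql, binds, shard_key):
    """Execute the query in the database that owns shard_key."""
    # Python's built-in hash() is salted per process for strings,
    # so use a hash that is stable across application servers.
    shard_hash = zlib.crc32(str(shard_key).encode("utf-8"))
    logical_bucket = shard_hash % TOTAL_BUCKETS
    # memcached_get_db() and execute_query() are the application's own helpers.
    physical_db = memcached_get_db(logical_bucket)
    return execute_query(physical_db, sql, binds)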
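
To try the routing end to end without memcached or ORACLE, here is a self-contained variant. Everything around `shard_query` is a stand-in chosen for the example: two in-memory SQLite databases play the physical shards, a dict plays the lookup cache, and CRC32 is an assumed hash.

```python
import sqlite3
import zlib

TOTAL_BUCKETS = 8  # assumed number of logical shards

# Two in-memory SQLite databases stand in for the physical shards.
databases = {name: sqlite3.connect(":memory:") for name in ("db1", "db2")}
for conn in databases.values():
    conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, author TEXT)")

# Stand-in for the memcached lookup: logical bucket -> physical database.
bucket_to_db = {b: "db1" if b < TOTAL_BUCKETS // 2 else "db2" for b in range(TOTAL_BUCKETS)}


def shard_query(sql, binds, shard_key):
    """Execute the statement in the database that owns shard_key."""
    shard_hash = zlib.crc32(str(shard_key).encode("utf-8"))  # stable across processes
    logical_bucket = shard_hash % TOTAL_BUCKETS
    physical_db = databases[bucket_to_db[logical_bucket]]
    with physical_db:  # commit on success, roll back on error
        return physical_db.execute(sql, binds).fetchall()


books = [
    (1, "Foundation", "Isaac Asimov"),
    (2, "I, Robot", "Isaac Asimov"),
    (3, "Dune", "Frank Herbert"),
]
for book in books:
    shard_query("INSERT INTO books VALUES (?, ?, ?)", book, shard_key=book[2])

rows = shard_query(
    "SELECT title FROM books WHERE author = ? ORDER BY title",
    ("Isaac Asimov",),
    shard_key="Isaac Asimov",
)
print(rows)  # [('Foundation',), ('I, Robot',)]
```

Because inserts and selects use the same shard key, the query finds both Asimov titles while talking to a single database.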

Standbys
[Diagram labels: Unsharded, Standby, Shard 1, Shard 2; Apps (Read Only); “Drop non-qualifying data” on each shard]
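
One reading of this slide: every new shard starts life as a full copy of the unsharded database, and after the switch each copy deletes the rows it does not own. The sketch below imitates that with SQLite, where `Connection.backup` plays the standby. The table, the sample rows and the CRC32-based `shard_of` function are assumptions for illustration.

```python
import sqlite3
import zlib

NUM_SHARDS = 2  # the unsharded database is split in two


def shard_of(author: str) -> int:
    """Shard number that owns this author (CRC32 is an assumed hash)."""
    return zlib.crc32(author.encode("utf-8")) % NUM_SHARDS


# The original, unsharded database.
unsharded = sqlite3.connect(":memory:")
unsharded.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, author TEXT)")
unsharded.executemany(
    "INSERT INTO books VALUES (?, ?, ?)",
    [(1, "Foundation", "Isaac Asimov"), (2, "Dune", "Frank Herbert"),
     (3, "Solaris", "Stanislaw Lem"), (4, "I, Robot", "Isaac Asimov")],
)
unsharded.commit()

shards = []
for shard_no in range(NUM_SHARDS):
    # Step 1: each future shard starts as a full copy (the "standby").
    shard = sqlite3.connect(":memory:")
    unsharded.backup(shard)
    # Step 2: after the cut-over, drop the rows this shard does not own.
    shard.create_function("shard_of", 1, shard_of)
    shard.execute("DELETE FROM books WHERE shard_of(author) <> ?", (shard_no,))
    shard.commit()
    shards.append(shard)

for shard_no, shard in enumerate(shards):
    print(shard_no, shard.execute("SELECT title FROM books ORDER BY id").fetchall())
```

The copy is the slow part and can happen while the system is live; the delete is cleanup that can run after applications are already using the new shards.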

MViews
[Diagram: Apps (Read Only); Shard 1 holds Tab A; Shard 2 holds MV A, which becomes Tab A]
On Shard 2:
Create materialized view … as select … from a@shard1
Drop materialized view … preserve table

Moving “data head”
[Diagram: Apps, Shard 1, Shard 2]
Logical Shard   Physical Shard
(1,2,3,4)       1
(5,6,7,8)       2

[Diagram: Apps, Shard 1, Shard 2]
Time   Logical Shard   Physical Shard
2015   (1,2,3,4)       1
2015   (5,6,7,8)       2

[Diagram: Apps, Shard 1, Shard 2, Shard 3, Shard 4]
Time   Logical Shard   Physical Shard
2015   (1,2,3,4)       1
2015   (5,6,7,8)       2
2016   (1,2)           1
2016   (3,4)           3
2016   (5,6)           2
2016   (7,8)           4

Why shards are awesome
• (potentially) Unlimited scaling
– 100s or 1000s of shards “in range”
• Once routed in, “it’s pure ORACLE”:
– Transactions, ACID, foreign keys etc
• Better maintenance:
– Smaller data, smaller load
• Eggs not in one basket:
– Even if a shard is down, “most of the system” is still up
• “Apples to apples comparison” with other shards

Why shards are NOT so great
• More systems
– Power, rack space etc
– Needs automation … bad
– More likely to fail overall
• Some operations become difficult:
– Transactions across shards
– Foreign keys across shards
• More work:
– Applications, developers, DBAs
– High skill, DIY everything

Thank you!
Please, evaluate my session: 4829
maxym@gluent.com
Twitter: @maxymkh

Data to be “sharded” has to be simple

Your data has to be “simple”
Think this, not that
[Images not preserved]

This is also known as splitting “by Nouns” or “by Verbs”

Your data is ready for sharding
When it looks like this, or like this (at the most)
[Images not preserved]
