The Art of Database Sharding
IOUG Collaborate 2012 presentation by Maxym Kharchenko

    Presentation Transcript

    • The Art of Database Sharding. Maxym Kharchenko, Amazon.com
    • April 22-26, 2012, Mandalay Bay Convention Center, Las Vegas, Nevada, USA. www.collaborate12.org www.collaborate12.ioug.org
    • When your data grows … (Problem; New System; Old System)
    • The Big Data problem: One machine is not enough
    • Vertical Scaling
    • Scaling Up …
    • Scaling Up …
    • Scaled!
    • What you get when you scale up: 2+2=5
    • What you get when you scale up: 2+2=3
    • Scale out, not up
    • Running on >1 machines [Chart: Difficulty vs. Number of machines] Courtesy: John Rauser @amazon.com
    • Distributed computing is hard
    • Distributed System
    • Sharded System
    • Sharding is (relatively) easy
    • Split your data into small independent chunks. And run each chunk on cheap commodity hardware
    • How to split your data [Diagram: Data blocks]
    • How to split your data
    • How to split your data
    • How to split your data
    • How to split your data
    • Step 1: Split off different things
    • Vertical Partitioning
    • Vertical Partitioning
    • Vertical Partitioning
    • Step 2: Choose a sharding key and function
    • Sharding
    • Bad Sharding. Can we partition Collaborate participants by last name?
      CREATE TABLE Collaborate_Participants (
        last_name varchar2(30) PRIMARY KEY,
        signup_date date
      )
      [Chart: Last Names Distribution. Shard size by first letter of last name, A to Z, grouped into shards 1 to 4]
    • Avalanche Effect, e.g. MD5: turns a Bad Distribution into a Good Distribution
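The two slides above contrast sharding on the raw last name with sharding on a hash of it. The following is a minimal sketch of that contrast, not code from the presentation: the function names, the sample names, and the choice of 4 shards are assumptions made for illustration.

```python
import hashlib
from collections import Counter


def shard_by_first_letter(last_name: str, num_shards: int = 4) -> int:
    """Naive scheme: cut the alphabet into equal letter ranges."""
    letter_index = ord(last_name[0].upper()) - ord("A")  # 0..25
    return letter_index * num_shards // 26


def shard_by_md5(last_name: str, num_shards: int = 4) -> int:
    """Hash first, then take the remainder, so similar names scatter."""
    digest = hashlib.md5(last_name.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


# Hypothetical sample: names cluster on a few popular letters.
names = ["Smith", "Smythe", "Sanders", "Stone", "Scott", "Shaw", "Sims", "Adams"]

print(Counter(shard_by_first_letter(n) for n in names))  # every "S" name shares one shard
print(Counter(shard_by_md5(n) for n in names))           # placement no longer follows the letter
```

With only eight names the hashed counts will not be perfectly even; the point is that placement stops depending on which letters happen to be popular.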
    • Step 3: Make enough shards
    • Hashes and Buckets: Good Distribution, then MOD into buckets
    • Resharding: 3 shards, adding 4th shard: 75 % bad

          Hashed_id   Old Shard: mod(hashed_id, 3)   New Shard: mod(hashed_id, 4)
          1           1                              1
          2           2                              2
          3           0                              3
          4           1                              0
          5           2                              1
          6           0                              2
          7           1                              3
          8           2                              0
          9           0                              1
          10          1                              2
          11          2                              3
          12          0                              0
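The "75 % bad" figure on the resharding slide can be checked with a few lines of arithmetic. This is a sketch under the slide's own setup (hashed ids 1 to 12, going from 3 to 4 shards); the function name `moved_fraction` is an assumption.

```python
def moved_fraction(keys, old_shards: int, new_shards: int) -> float:
    """Fraction of keys whose shard changes when the modulus changes."""
    moved = sum(1 for k in keys if k % old_shards != k % new_shards)
    return moved / len(keys)


hashed_ids = range(1, 13)  # the twelve ids shown on the slide
print(moved_fraction(hashed_ids, 3, 4))  # 0.75
```

Only ids 1, 2 and 12 keep their shard, so three quarters of the rows would have to be moved.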
    • Logical Shards: Good Distribution, MOD into logical shards, then map logical shards to physical shards
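One way to read the logical shards slide: hash into a fixed number of logical shards, and keep a separate, changeable map from logical to physical shards. The sketch below assumes 8 logical shards numbered 1 to 8 (matching the numbering used later in the deck); the map contents and the function names are illustrative assumptions.

```python
NUM_LOGICAL_SHARDS = 8  # assumption: chosen once and never changed


def logical_shard(hashed_id: int) -> int:
    """Key to logical shard (1..8); this mapping never changes."""
    return hashed_id % NUM_LOGICAL_SHARDS + 1


def physical_shard(hashed_id: int, mapping: dict) -> int:
    """Logical shard to physical shard, via a lookup table."""
    return mapping[logical_shard(hashed_id)]


# Two physical shards, four logical shards each.
before = {1: 1, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2, 7: 2, 8: 2}

# Hypothetical growth step: relocate logical shards 3 and 4 to a new physical shard 3.
after = dict(before)
after.update({3: 3, 4: 3})

keys = range(1, 801)
moved = sum(1 for k in keys if physical_shard(k, before) != physical_shard(k, after))
print(moved / len(keys))  # 0.25
```

Only the keys in the relocated logical shards move, and they move as whole units, in contrast to changing the modulus directly.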
    • Implementing Shards: Standbys [Diagram: Apps; Unsharded; Read Only Standby; Shard 1; Shard 2]
    • Implementing Shards: Tables [Diagram: Apps; Read Only; Shard 1 and Shard 2 with Tab A and MV A]
      Create materialized view … as select … from a@shard1
      Drop materialized view … preserve table
    • Implementing Shards: Moving “data head”

          Before (Apps on Shard 1, Shard 2):
          Time   Logical Shard   Physical Shard
          2011   (1,2,3,4)       1
          2011   (5,6,7,8)       2

          After (Shard 1, Shard 2, Shard 3, Shard 4):
          Time   Logical Shard   Physical Shard
          2011   (1,2,3,4)       1
          2011   (5,6,7,8)       2
          2012   (1,2)           1
          2012   (3,4)           3
          2012   (5,6)           2
          2012   (7,8)           4
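The "data head" mapping on this slide can be expressed as a routing table keyed by time period and logical shard. The sketch below encodes exactly the rows shown on the slide; the `ROUTES` structure and the `physical_shard` function are assumptions about how an application might hold that table, not the presenter's implementation.

```python
# (time period) -> {group of logical shards: physical shard}, as on the slide.
ROUTES = {
    2011: {(1, 2, 3, 4): 1, (5, 6, 7, 8): 2},
    2012: {(1, 2): 1, (3, 4): 3, (5, 6): 2, (7, 8): 4},
}


def physical_shard(year: int, logical_shard: int) -> int:
    """Find the physical shard holding a logical shard's data for a given year."""
    for group, physical in ROUTES[year].items():
        if logical_shard in group:
            return physical
    raise KeyError(f"logical shard {logical_shard} has no route for {year}")


print(physical_shard(2011, 3))  # 1: old data stays where it was
print(physical_shard(2012, 3))  # 3: new data lands on the new shard
```

Existing 2011 rows are never moved; only the destination of new data changes, which is presumably why the slide calls it moving the "data head".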
    • Why shards are awesome
        • Small data, small load
            – Better caching, faster queries
            – Smaller load, fewer surprises
            – Faster maintenance, e.g. restores
        • Eggs not in one basket:
            – Availability redefined
            – Safer maintenance
        • Multiple points of view:
            – SQL performance
            – System load
    • Why shards are NOT so great
        • More systems
            – Power, rack space etc.
            – Needs automation … bad
            – More likely to fail overall
        • Some operations become impractical:
            – Joins across shards
            – Foreign keys across shards
        • More work:
            – Applications, developers, DBAs
            – High skill, DIY everything
    • Takeaways: More > Bigger. ORACLE is still cool
    • Thank you! Session 369. maxym@amazon.com http://intermediatesql.com
    • Bad Sharding. Example 2: Can we shard customers by meaningless sequence?
      CREATE TABLE Orders (
        order_id number PRIMARY KEY,
        customer_fname varchar2(30),
        customer_lname varchar2(30),
        order_date date
      )
      Shards by order_id range: 10000 - 20000, 20001 - 30000, 30001 - 40000, 40001 - 50000
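The slide does not spell out why range sharding on a sequence is bad. One likely problem, shown here as an assumption rather than the presenter's stated reasoning, is that a growing sequence sends every new row to the last shard. The sketch uses the four order_id ranges from the slide; `RANGE_UPPER_BOUNDS` and `shard_by_range` are hypothetical names.

```python
import bisect
from collections import Counter

# Upper bounds of the four order_id ranges from the slide.
RANGE_UPPER_BOUNDS = [20000, 30000, 40000, 50000]


def shard_by_range(order_id: int) -> int:
    """Return the 0-based index of the range that contains order_id."""
    if not 10000 <= order_id <= 50000:
        raise ValueError(f"order_id {order_id} is outside the sharded ranges")
    return bisect.bisect_left(RANGE_UPPER_BOUNDS, order_id)


# The newest 1000 orders, as handed out by an increasing sequence.
newest_orders = range(49001, 50001)
print(Counter(shard_by_range(o) for o in newest_orders))  # Counter({3: 1000})
```

All recent writes hit one shard while the other three sit idle, and looking up a customer's orders still has to visit every shard because order_id says nothing about the customer.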