MySQL Conference 2011 -- The Secret Sauce of Sharding -- Ryan Thiessen

Presentation Transcript

  • The Secret Sauce of Sharding -- Ryan Thiessen, Database Operations, April 2011
  • Agenda
    1. Sharding 101
    2. Bad Sharding
    3. Facebook’s Universal Database
    4. Re-Sharding
    5. Operational Implications
  • Sharding 101
  • Bad news: there is no single way to shard
    ▪ What is the secret sauce of anything?
    ▪ Some basic building blocks
    ▪ More about what NOT to do rather than a specific recipe
    ▪ Wide variation in implementation
  • Why not to shard your data
    ▪ Can’t do JOINs inside the RDBMS across shards
    ▪ Data denormalization has drawbacks
      ▪ Redundant storage
      ▪ Chore to keep everything in sync
    ▪ Ops & maintenance is harder
      ▪ Schema changes are more difficult
      ▪ Monitoring challenges
    ▪ You don’t do this because it’s cool, but because you have to
  • Why to shard your data
    ▪ Because you have to
    ▪ Doing joins outside of the RDBMS isn’t that bad
    ▪ Less contention on hot tables
    ▪ Continue using commodity hardware
    ▪ Single instance failure affects only a small proportion of users
  • Basic building blocks of good sharding
    ▪ Shard uniformity
      ▪ SKU, schema, queries
    ▪ Organize shards according to data access patterns
      ▪ Picking the right key to shard on
    ▪ Ability to grow, re-shard, and shed load quickly
    ▪ Achieve operational efficiencies of scale
  • Bad Sharding
  • “Sharding” by application (Bad Sharding)
    ▪ Example: each application gets its own database
    ▪ Result:
      ▪ Data distribution is non-uniform, massive hot spots
      ▪ Every data access pattern is unique
      ▪ Very little efficiency of scale
    ▪ (Diagram: one database per application -- Commerce, User, Logging, Customer, Sales, Config)
  • Fixed hashing (Bad Sharding)
    ▪ Example: you have X instances
      ▪ Hashing algorithm splits data evenly across each
    ▪ Result:
      ▪ Unbalanced load, hot spots
      ▪ What to do about data growth?
      ▪ How do you re-shard and/or shed load?
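
As a rough illustration of the fixed-hashing trap above (not from the talk), the sketch below hashes object IDs across a fixed number of instances with a plain modulo and then adds one instance: the vast majority of objects end up mapping to a different shard, which is why growth and re-sharding become painful.

    # Illustrative sketch: why fixed "id % N" hashing makes growth painful.
    def fixed_shard(object_id, num_instances):
        return object_id % num_instances

    ids = range(100_000)
    before = [fixed_shard(i, 16) for i in ids]   # 16 instances today
    after = [fixed_shard(i, 17) for i in ids]    # add one instance

    moved = sum(1 for b, a in zip(before, after) if b != a)
    print(f"{moved / len(ids):.0%} of objects would have to move")  # well over 90%
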
  • Hyper-sharding (Bad Sharding)
    ▪ Example: hash keys randomly across all instances, without any grouping
    ▪ Result:
      ▪ Every fetch has to touch many shards to fulfill the request
      ▪ Request latency becomes the max() of all shard latencies
      ▪ A single shard’s availability issue affects every request
  • How to choose a good shard key?
    ▪ Understand how your applications will access your data
      ▪ Be careful of data distribution
    ▪ Example: user ID
    ▪ Example: time grouping
    ▪ Example: random sharding
    ▪ TL;DR: use the same methodology as picking a partition key
  • Facebook’s Universal Database
  • Multiple shards per physical host (Facebook UDB)
    ▪ Multiple database shards per MySQL instance
    ▪ Multiple MySQL instances per host on different ports
    ▪ Each shard has identical schemas
    ▪ This enables web scale
  • Hashing (Facebook UDB)
    ▪ Group related objects together
      ▪ Collocate most user data on a single shard
      ▪ If an application has related objects, group them together
    ▪ When referring to objects in a remote shard, store a reference to the object in both shards
    ▪ Multiple logical hashing schemes can co-exist over the same set of physical hosts
  • Shard management service (Facebook UDB)
    ▪ Methods:
      ▪ Map object IDs to logical (shard) IDs – procedural (simple hash)
      ▪ Map shard IDs to physical instances – manual
    ▪ Use Thrift to access these methods from any language
    ▪ Distribute shard metadata close to apps to reduce request latency
      ▪ Extremely read heavy
      ▪ Updated relatively infrequently
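
A minimal sketch (not Facebook's actual code) of the two mappings described above, written as plain Python rather than a Thrift service; the 40,000 logical shards and the instance metadata come from the worked example on the next slide, everything else is made up for illustration.

    NUM_LOGICAL_SHARDS = 40_000   # from the worked example on the next slide

    # Shard -> instance metadata: maintained manually, replicated out close to the apps.
    SHARD_MAP = {
        38901: [
            {"instance": "db243:3306", "repl": "master",  "region": "A", "enabled": True},
            {"instance": "db533:3308", "repl": "replica", "region": "A", "enabled": True},
            {"instance": "db874:3306", "repl": "replica", "region": "B", "enabled": False},
            {"instance": "db983:3307", "repl": "replica", "region": "B", "enabled": True},
        ],
    }

    def object_to_shard(object_id):
        """Procedural mapping: a simple hash of the object ID."""
        return object_id % NUM_LOGICAL_SHARDS

    def shard_to_instances(shard_id):
        """Manual mapping: look up the physical instances hosting a shard."""
        return SHARD_MAP[shard_id]
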
  • Example: fetching data from a shard (Facebook UDB)
    ▪ Example: application request to get data for object ID 12345678901
      ▪ Call a function: 12345678901 % 40000 => maps to shard 38901
    ▪ Resolve shard ID 38901 to physical instances:

      Instance      Repl Type   Region   Enabled
      db243:3306    master      A        enabled
      db533:3308    replica     A        enabled
      db874:3306    replica     B        disabled
      db983:3307    replica     B        enabled

    ▪ Application is in region B and only needs reads, so prefer to return a connection to shard 38901 on instance db983:3307
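
Continuing the sketch above, a hypothetical picker that applies the preference described on this slide: a read-only request from region B gets an enabled replica in its own region, and anything else falls back to the master.

    def pick_instance(shard_id, region, read_only):
        """Prefer an enabled, same-region replica for reads; otherwise use the master."""
        candidates = [c for c in shard_to_instances(shard_id) if c["enabled"]]
        if read_only:
            local = [c for c in candidates
                     if c["region"] == region and c["repl"] == "replica"]
            if local:
                return local[0]["instance"]
        return next(c["instance"] for c in candidates if c["repl"] == "master")

    shard_id = object_to_shard(12345678901)                # 12345678901 % 40000 == 38901
    print(pick_instance(shard_id, "B", read_only=True))    # -> db983:3307
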
  • Adding nodes (Facebook UDB)
    ▪ New user pools
      ▪ List(s) of shard IDs where new objects go
      ▪ Reverse the hashing function: generate an object ID which maps to one of the new ID pool shards
    ▪ Usually new instances to add more overall capacity to the tier
      ▪ Can be existing instances to get more utilization
    ▪ Flow: app requests storage on new node -> get list of available shards, pick one -> generate ID which maps to that shard -> connect to the selected shard, save object
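
A sketch of the ID-generation trick described above, reusing NUM_LOGICAL_SHARDS from the shard-manager sketch; the new-pool shard IDs and the sequence counter are hypothetical stand-ins for whatever unique-ID allocator is really used.

    import random

    NUM_LOGICAL_SHARDS = 40_000
    NEW_SHARD_POOL = [39996, 39997, 39998, 39999]  # hypothetical "new user pool" shards
    _sequence = 0                                  # stand-in for a real unique-ID allocator

    def new_object_id():
        """Reverse the hash: emit an object ID that lands on one of the new-pool shards."""
        global _sequence
        _sequence += 1
        shard_id = random.choice(NEW_SHARD_POOL)
        # By construction, object_id % NUM_LOGICAL_SHARDS == shard_id.
        return _sequence * NUM_LOGICAL_SHARDS + shard_id

    oid = new_object_id()
    assert oid % NUM_LOGICAL_SHARDS in NEW_SHARD_POOL
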
  • Re-Sharding
  • The Easy Way: shedding load (Re-Sharding)
    ▪ Split off logical dbs from a single MySQL instance
    ▪ (Diagram: before the split, Host1:3306 holds ShardA through ShardD and replicates to Host2:3306; after the split, each host serves only its own subset of the shards)
    1. Block writes
    2. Break replication from Host1 -> Host2
    3. Drop databases
    4. Reconfigure Shard Manager to point to new instances
    5. Re-enable writes
    ▪ Splitting off instances running on different ports is easier
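
A rough sketch of the five steps above, driven through the mysql command-line client; the hosts, ports, and shard names are placeholders, and the write-blocking and shard-manager steps are only noted in comments since they live in the application and metadata layers rather than in MySQL itself.

    import subprocess

    def run_sql(host, port, sql):
        """Run a statement via the mysql CLI (credentials omitted for brevity)."""
        subprocess.run(["mysql", "-h", host, "-P", port, "-e", sql], check=True)

    MOVING = ["ShardC", "ShardD"]    # shards that host2 will keep (illustrative)
    STAYING = ["ShardA", "ShardB"]   # shards that stay on host1 (illustrative)

    # 1. Block writes to the affected shards (application / shard-manager level, not shown).
    # 2. Break replication from host1 -> host2.
    run_sql("host2", "3306", "STOP SLAVE")
    # 3. Drop the databases each host no longer owns.
    for db in MOVING:
        run_sql("host1", "3306", f"DROP DATABASE {db}")
    for db in STAYING:
        run_sql("host2", "3306", f"DROP DATABASE {db}")
    # 4. Reconfigure the shard manager so the moved shards point at host2 (not shown).
    # 5. Re-enable writes.
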
  • The Hard Way: double-write data (Re-Sharding)
    1. Create new layout on all new instances
    2. On each new write, store in both places
    3. Separate process to backfill from the legacy storage
    4. Switch over reads to the new storage
    5. Monitor the old storage for reads
    6. Stop double-writes, drop old tables
    ▪ This is I/O intensive and painful, but very possible
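
A sketch of steps 2 and 3 of the double-write migration, assuming hypothetical old_db / new_db handles that expose execute() and query(); the table and column names are made up for illustration.

    def write_object(old_db, new_db, obj_id, payload):
        """Step 2: every new write goes to both the legacy and the new layout."""
        sql = "REPLACE INTO objects (id, data) VALUES (%s, %s)"
        old_db.execute(sql, (obj_id, payload))
        new_db.execute(sql, (obj_id, payload))

    def backfill(old_db, new_db, batch_size=1000):
        """Step 3: copy pre-existing rows from the legacy storage in small batches."""
        last_id = 0
        while True:
            rows = old_db.query(
                "SELECT id, data FROM objects WHERE id > %s ORDER BY id LIMIT %s",
                (last_id, batch_size))
            if not rows:
                break
            for obj_id, payload in rows:
                # INSERT IGNORE so the backfill never overwrites a newer double-written row.
                new_db.execute(
                    "INSERT IGNORE INTO objects (id, data) VALUES (%s, %s)",
                    (obj_id, payload))
            last_id = rows[-1][0]
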
  • Operational Implications
  • Everything is harder (Operational Implications of sharding)
    ▪ Monitoring is harder
    ▪ Schema changes are harder
    ▪ Upgrades are harder
    ▪ Backups and restores are harder
    ▪ Etc. Seriously.
    ▪ “This will probably never happen” will probably happen
    ▪ 90% of your time can be spent on 10% of the shards (or less)
  • Top-N monitoring (Operational Implications)
    ▪ Problems with individual shards can get lost in the aggregate or mean
    ▪ Look at the worst “offenders”, identify outliers
    ▪ pmysql is an excellent tool for doing this quickly:

      $ cat hosts.txt | pmysql 'show status like "threads_running"' | sort -k3 -n | tail -n20
  • Uniformity of shards (Operational Implications)
    ▪ Every shard should have the same schema
    ▪ Keeps the SKUs, configurations, etc., as consistent as possible
    ▪ Don’t scale shards by migrating the worst to better hardware
      ▪ Ops will have to keep track of this in the future
  • Application gating (Operational Implications)
    ▪ Very easy for a bad application to consume all shard resources
    ▪ Limit per-shard concurrency for each application
      ▪ User limits are OK
      ▪ Admission control is better
    ▪ Log failures at both client and server levels
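
A minimal sketch of per-application, per-shard admission control as described above (not the actual implementation): each (application, shard) pair gets a bounded number of concurrent queries, and requests over the limit fail fast and are logged instead of piling up on the shard.

    import threading
    from collections import defaultdict

    MAX_CONCURRENT = 10  # illustrative per-application, per-shard limit

    _slots = defaultdict(lambda: threading.BoundedSemaphore(MAX_CONCURRENT))

    def run_gated(app, shard_id, query_fn):
        """Admission control: reject excess concurrency instead of letting it queue."""
        sem = _slots[(app, shard_id)]
        if not sem.acquire(blocking=False):
            # Client-side failure log; the server should log the rejection as well.
            print(f"rejected: {app} over concurrency limit on shard {shard_id}")
            return None
        try:
            return query_fn()
        finally:
            sem.release()
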
  • The Good News: efficiencies of scale (Operational Implications)
    ▪ The problems are hard, but there are solutions
    ▪ Fixing the problems of the worst shards usually also benefits the median shards
    ▪ Loss of a single shard is not the end of your website
    ▪ Easy to safely test changes on a small subset
    ▪ Automation and tooling mean the team can debug and fix problems with high parallelism
  • (c) 2009 Facebook, Inc. or its licensors. "Facebook" is a registered trademark of Facebook, Inc. All rights reserved. 1.0