Sharding using MySQL and PHP
 

In deploying MySQL, scale-out techniques can be used to scale out reads, but for scaling out writes, other techniques have to be used. To distribute writes over a cluster, it is necessary to shard the database and store the shards on separate servers. This session provides a brief introduction to traditional MySQL scale-out techniques in preparation for a discussion on the different sharding techniques that can be used with MySQL server and how they can be implemented with PHP. You will learn about static and dynamic sharding schemes, their advantages and drawbacks, techniques for locating and moving shards, and techniques for resharding.

    Sharding using MySQL and PHP: Presentation Transcript

    • Sharding using PHP
        Mats Kindahl (Senior Principal Software Developer)
        Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
    • About the Presentation
        After this presentation you should know what sharding is and the basic caveats surrounding sharding. You should also have an idea of what is needed to develop a sharding solution.
    • Program Agenda
        ● Why do we shard
        ● Introduction to sharding
        ● High-level sharding architecture
        ● Elements of a sharding solution
        ● Sharding planning
    • What is sharding?
        Splintering; Horizontal Partitioning
        ● Slice your database into independent data “shards”
        ● Queries execute only on one shard
        ● Shards can be stored on different servers
    • Sharding for locality
        “Big Data” close to user
    • Sharding for performance
        ● Reduced working set
        ● Database vs. cache
        ● Parallel processing
    • Sharding Limitations
        ● Auto-increment
            – Composite key
            – Distributed key generation
            – UUID?
        ● Cross-shard joins
            – Very expensive: avoid them
            – Federated tables?
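One way to read the "distributed key generation" option above is to interleave per-shard sequences so that no two shards can issue the same ID. The sketch below is only an illustration under that assumption; `next_global_id` and `shard_of_id` are hypothetical helper names, not part of the presentation. MySQL's `auto_increment_increment` and `auto_increment_offset` system variables follow a similar idea on the server side.

```php
<?php
// Hypothetical helper: shard $shardId (0-based) out of $shardCount shards
// issues only IDs that are congruent to $shardId modulo $shardCount.
function next_global_id(int $localSeq, int $shardId, int $shardCount): int
{
    if ($shardId < 0 || $shardId >= $shardCount) {
        throw new InvalidArgumentException('shard id out of range');
    }
    return $localSeq * $shardCount + $shardId;
}

// Hypothetical helper: recover the issuing shard from a generated ID.
function shard_of_id(int $globalId, int $shardCount): int
{
    return $globalId % $shardCount;
}
```

The trade-off is that the shard count is baked into every ID, so this only suits setups where the number of key-issuing shards is fixed.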
    • Developing a Sharding Solution
    • High-level Architecture
        ● Broker
            – Distributes queries
        ● Sharding Database
            – Information about the shards
            – If it goes down, all goes down
            – Needs to be HA
    • Running Example: Employees sample database
        Table          Rows
        salaries       284 404 700
        titles          44 330 800
        employees       30 002 400
        dept_emp        33 160 300
        dept_manager         2 400
        departments            900
    • Areas to cover
        Sharding: Data, Meta-Data, Query, Operations
    • Data
        ● Partition Data: Key Columns, Dependent Columns, Tables to Shard
        ● Mapping Keys: Range Mapping, Hash Mapping, List Mapping
        ● Shard Allocation: Single Shard, Multiple Shards
    • Partitioning the data
        Table          Rows
        salaries       284 404 700
        titles          44 330 800
        employees       30 002 400
        dept_emp        33 160 300
        dept_manager         2 400
        departments            900
    • Partitioning the data: sharding column(s)
        emp_no   birth_date   first_name   last_name   gender   hire_date
        4711     1989-06-13   John         Smith       M        2009-12-24
        19275    1954-11-12   Sally        Smith       F        1975-01-01
        27593    1477-05-19   Mats         Kindahl     M        2002-02-27
        587003   1830-08-28   Charles      Bell        M        2003-11-31
        ● Sharding columns dictated by queries
            – Queries should give same result before and after sharding
        ● One or more columns
            – Does not have to be primary key, but easier if it is
        ● Sharding key is needed for re-sharding
    • Partitioning the data: sharding column(s)
        ● Choice of sharding columns
            – Distribution
            – Locality
        ● Avoid non-unique keys
            – Difficult to get good distribution
            – Avoid: Country
            – Prefer: Employee ID
        (Figure labels: SE, 9 million; US, 200 million)
    • Partitioning the data: dependent columns
        Foreign keys?
        Table          Rows
        salaries       284 404 700
        titles          44 330 800
        employees       30 002 400
        dept_emp        33 160 300
        dept_manager         2 400
        departments            900
    • Partitioning the data: dependent columns
        SELECT first_name, last_name, salary
          FROM salaries JOIN employees USING (emp_no)
         WHERE emp_no = 21012
           AND CURRENT_DATE BETWEEN from_date AND to_date;
        ● Referential Integrity Constraint
            – Example query joining salaries and employees
            – Same key, same shard
        ● JOIN within a shard
    • Partitioning the data: dependent columns
        ● Referential Integrity
            – Foreign Keys
        ● Dependent rows
            – Same shard
            – Join on equality
        ● Sharding Columns
            – Follow foreign keys
        Handy query to find all dependent columns:
        mysql> SELECT table_schema, table_name, column_name
            ->   FROM information_schema.key_column_usage
            ->   JOIN information_schema.table_constraints
            ->        USING (table_schema, table_name, constraint_name)
            ->  WHERE constraint_type = 'FOREIGN KEY'
            ->    AND referenced_table_schema = 'employees'
            ->    AND referenced_table_name = 'employees'
            ->    AND referenced_column_name = 'emp_no';
        +--------------+--------------+-------------+
        | table_schema | table_name   | column_name |
        +--------------+--------------+-------------+
        | employees    | dept_emp     | emp_no      |
        | employees    | dept_manager | emp_no      |
        | employees    | salaries     | emp_no      |
        | employees    | titles       | emp_no      |
        +--------------+--------------+-------------+
        4 rows in set (0.56 sec)
    • Partitioning the data: unsharded tables
        Table          Rows
        salaries       284 404 700
        titles          44 330 800
        employees       30 002 400
        dept_emp        33 160 300
        dept_manager         2 400
        departments            900   ?
    • Partitioning the data: unsharded tables
        SELECT first_name, last_name, GROUP_CONCAT(dept_name)
          FROM employees JOIN dept_emp USING (emp_no)
               JOIN departments USING (dept_no)
         WHERE emp_no = 21012 GROUP BY emp_no;
        ● Referential Integrity Constraint
            – Join with sharded tables
            – Tables dept_emp (and dept_manager) reference two tables
        ● Shard table departments?
            – Not necessary: small table
            – Difficult to get right: keeping shards of two tables in same location
    • Partitioning the data: unsharded tables
        SELECT first_name, last_name, GROUP_CONCAT(dept_name)
          FROM employees JOIN dept_emp USING (emp_no)
               JOIN departments USING (dept_no)
         WHERE emp_no = 21012 GROUP BY emp_no;
        ● Solution: do not shard departments
            – Keep table on all shards
            – Joins will only need to address one shard
        ● You need to consider
            … how to update unsharded table
    • Data
        ● Partition Data: Key Columns, Dependent Columns, Tables to Shard
        ● Mapping Keys: Range Mapping, Hash Mapping, List Mapping
        ● Shard Allocation: Single Shard, Multiple Shards
    • Mapping Keys to Shards
        ● Given
            – Sharding key value
            – Optional other information (tables accessed, RO or RW, etc.)
        ● Provide the following
            – Shard location (host, port)
            – Shard identifier (if you have multiple shards for each server)
    • Mapping Keys to Shards
        ● Range Mapping: range of values for each shard
            – Type-dependent
        ● Hash Mapping: hash of key to find shard
            – Type-independent
            – Complicated?
        ● List Mapping: list of keys for each shard
            – Does not offer good distribution
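The three mapping schemes above can be sketched as plain lookup functions. This is a minimal in-memory illustration, not the presentation's code: the function names and the array layouts (`$lowerBounds`, `$lists`) are assumptions, and a real deployment would keep this data in the sharding database.

```php
<?php
// Range mapping: $lowerBounds maps shard id => lower bound of its range.
// The key belongs to the shard with the greatest lower bound <= key.
function range_shard(int $key, array $lowerBounds): int
{
    $best = null;
    foreach ($lowerBounds as $shardId => $lower) {
        if ($key >= $lower && ($best === null || $lower > $lowerBounds[$best])) {
            $best = $shardId;
        }
    }
    if ($best === null) {
        throw new OutOfRangeException("no range covers key $key");
    }
    return $best;
}

// Hash mapping: works for any key type that can be cast to a string.
// Only the first 7 hex digits of the SHA-1 are used so the value fits
// in an integer on 32-bit builds as well.
function hash_shard($key, int $shardCount): int
{
    return hexdec(substr(sha1((string) $key), 0, 7)) % $shardCount;
}

// List mapping: $lists maps shard id => explicit list of keys.
function list_shard($key, array $lists): int
{
    foreach ($lists as $shardId => $keys) {
        if (in_array($key, $keys, true)) {
            return $shardId;
        }
    }
    throw new OutOfRangeException('key is not assigned to any shard');
}
```

For example, `range_shard(20000, [0 => 0, 1 => 20000, 2 => 50000])` selects shard 1, because 20000 is the greatest lower bound that does not exceed the key.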
    • Data
        ● Partition Data: Key Columns, Dependent Columns, Tables to Shard
        ● Mapping Keys: Range Mapping, Hash Mapping, List Mapping
        ● Shard Allocation: Single Shard, Multiple Shards
    • Shard Allocation: Single Shard per Server
        ● Idea: there is only one shard on each server
        ● Advantage: Cross-database queries do not require rewrite
        ● Disadvantage: Expensive to balance server load
            … moving hot data from server requires re-sharding
        SELECT first_name, last_name
          FROM employees.employees JOIN expenses.reciepts USING (emp_no)
         WHERE currency = 'USD'
    • Shard Allocation: Multiple Shards per Server
        ● Idea: Keep several “virtual shards” on each server
        ● Advantages
            – Easier to balance load of servers
              … move hot virtual shards to other server
            – Improves performance
            – Increases availability
    • Shard Allocation: Multiple Shards per Server
        ● Disadvantage: cross-database queries require rewrite
            – Error-prone
            – Expensive?
        ● Queries that go to one database are not a problem
        SELECT first_name, last_name
          FROM employees.employees JOIN expenses.reciepts USING (emp_no)
         WHERE currency = 'USD'
    • Shard Allocation: Multiple Shards per Server
        ● Idea: Add suffix to database name (optionally table name)
            employees_N.employees
            employees_N.employees_N
        ● Idea: Keep substitution pattern in query string
        SELECT first_name, last_name
          FROM {employees.employees} JOIN {expenses.reciepts} USING (emp_no)
         WHERE currency = 'USD'
    • Shard Allocation: Multiple Shards per Server
        class my_mysqli extends mysqli {
          public $shard_id;
          public function query($query, $resultmode = MYSQLI_STORE_RESULT)
          {
            // Rewrite {db.table} to db_<shard_id>.table before sending the query
            $real_query = preg_replace('/\{(\w+)\.(\w+)\}/',
                                       '${1}_' . $this->shard_id . '.${2}',
                                       $query);
            return parent::query($real_query, $resultmode);
          }
        }
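To see what the `{database.table}` substitution pattern produces without needing a database connection, the rewrite step can be pulled out into a standalone function. This is a sketch only; `rewrite_for_shard` is a hypothetical name, and the regular expression assumes the placeholder always has the form `{database.table}`.

```php
<?php
// Hypothetical standalone version of the rewrite step: turn every
// {db.table} placeholder into db_<shardId>.table for the given shard.
function rewrite_for_shard(string $query, int $shardId): string
{
    return preg_replace(
        '/\{(\w+)\.(\w+)\}/',
        '${1}_' . $shardId . '.${2}',
        $query
    );
}

echo rewrite_for_shard(
    'SELECT first_name FROM {employees.employees} '
    . 'JOIN {expenses.reciepts} USING (emp_no)',
    3
), "\n";
// prints: SELECT first_name FROM employees_3.employees JOIN expenses_3.reciepts USING (emp_no)
```

Keeping the rewrite in one place means queries that touch a single database need no placeholder at all, while cross-database queries stay readable in their templated form.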
    • Areas that we need to cover
        Sharding: Data, Meta-Data, Query, Operations
    • Meta Data
        ● Mapping Methods: Static Sharding, Dynamic Sharding
        ● Shard Information: Shard ID, Shard Host, Shard Specifics*
        ● Mapping Schemes: Range Mapping, Hash Mapping, List Mapping
        * If you use multiple shards per server
    • Mapping Methods: Static Sharding
        ● Idea: Compute shard statically
        ● Advantages
            – Simple
            – No extra lookups
            – No single point of failure
        ● Disadvantage
            – Lack of flexibility
    • Mapping Methods: Static Sharding, in code
        ● Dictionary class
        ● Input: sharding key
        ● Output: connection
        class Dictionary {
          private $shardinfo;  // list of shard descriptors (host, user, passwd, db, port)
          private $emp_no;
          public function __construct() { ... }
          public function set_key($emp_no) { $this->emp_no = $emp_no; }
          public function get_connection() {
            // Static mapping: sharding key modulo the number of shards
            $i = $this->shardinfo[$this->emp_no % count($this->shardinfo)];
            return new mysqli("p:{$i->host}", $i->user, $i->passwd, $i->db, $i->port);
          }
        }
    • Mapping Methods: Static Sharding, in code
        $HIRED = <<<END_OF_QUERY
        SELECT first_name, last_name, hire_date, salary
          FROM employees AS e, salaries AS s
         WHERE s.emp_no = e.emp_no
           AND e.emp_no = ?
           AND CURRENT_DATE BETWEEN s.from_date AND s.to_date
        END_OF_QUERY;

        $DICTIONARY = new Dictionary();
        $DICTIONARY->set_key($emp_no);
        $link = $DICTIONARY->get_connection();
        if ($stmt = $link->prepare($HIRED)) {
          $stmt->bind_param('i', $emp_no);
          $stmt->execute();
          $stmt->bind_result($first, $last, $hire, $salary);
          while ($stmt->fetch())
            printf("%s %s was hired on %s and has a salary of %s\n",
                   $first, $last, $hire, $salary);
        }
    • Mapping Methods: Dynamic Sharding
        ● Idea: use a sharding database to keep track of shard locations
        ● Advantages:
            – Easy to migrate shards
            – Easy to re-shard
        ● Disadvantages:
            – Complex
        ● Performance?
    • Dynamic sharding, in code
        $FETCH_SHARD = <<<END_OF_QUERY
        shard selection query
        END_OF_QUERY;

        class Dictionary {
          private $dict;
          private $emp_no;
          public function __construct() {
            $this->dict = new mysqli('shardinfo.example.com', ...);
          }
          public function set_key($emp_no) { $this->emp_no = $emp_no; }
          public function get_connection() {
            global $FETCH_SHARD;  // the query string is defined at file scope
            $stmt = $this->dict->prepare($FETCH_SHARD);
            $stmt->bind_param('i', $this->emp_no);
            $stmt->execute();
            $stmt->bind_result($no, $host, $user, $passwd, $db, $port);
            $stmt->fetch();
            return new mysqli("p:{$host}", $user, $passwd, $db, $port);
          }
        }
    • Meta Data
        ● Mapping Methods: Static Sharding, Dynamic Sharding
        ● Shard Information: Shard ID, Shard Host, Shard Specifics*
        ● Mapping Schemes: Range Mapping, Hash Mapping, List Mapping
        * If you use multiple shards per server
    • Mapping Schemes: Range Mapping
        ● Most basic scheme
        ● One row for each range
        ● Just store lower bound
        Shard ID   Lower
        0          0
        1          20000
        2          50000
        SELECT shard_id, hostname, port
          FROM shard_ranges JOIN shard_locations USING (shard_id)
         WHERE key_id = 1 AND 2345 >= shard_ranges.lower_bound
         ORDER BY shard_ranges.lower_bound DESC LIMIT 1;
        (DESC is needed so that the row with the greatest lower bound not exceeding the key is returned.)
    • Mapping Schemes: Regular Hashing
        ● Computing a hash from the key
            ShardID = SHA1(key) mod N
        ● Adding (or removing) a shard
            … can require moving rows between many shards
            … often a lot of rows
    • Mapping Schemes: Regular Hashing
        (Figure: keys emp_no=20101, emp_no=43210, emp_no=23456 and emp_no=36912 mapped by HASH(key) mod N onto shards 0 to 4, for N and N+1 shards)
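The cost of adding a shard under regular hashing can be estimated with a small experiment. The sketch below is an illustration only: `mod_shard` and `moved_fraction` are hypothetical helpers, and the hash is truncated to 7 hex digits of the SHA-1 so that it fits in an integer on any platform.

```php
<?php
// Regular hashing as on the slide: ShardID = SHA1(key) mod N
// (truncated to the first 7 hex digits of the digest).
function mod_shard(int $key, int $shardCount): int
{
    return hexdec(substr(sha1((string) $key), 0, 7)) % $shardCount;
}

// Fraction of keys whose shard changes when going from $oldN to $newN shards.
function moved_fraction(array $keys, int $oldN, int $newN): float
{
    $moved = 0;
    foreach ($keys as $key) {
        if (mod_shard($key, $oldN) !== mod_shard($key, $newN)) {
            $moved++;
        }
    }
    return $moved / count($keys);
}

printf("%.0f%% of keys move when going from 4 to 5 shards\n",
       100 * moved_fraction(range(1, 1000), 4, 5));
```

With a well-mixed hash, a key stays put only when its hash value gives the same remainder for both shard counts, so going from 4 to 5 shards is expected to move roughly four keys out of five.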
    • Mapping Schemes: Consistent Hashing
        ● Computing a hash from the key
            SHA1(key)
        ● Adding (or removing) a shard
            … only requires moving rows from one shard to the new shard
        Shard ID   Hash
        6          08b1286ad1bebe6...
        2          1c2d4132144211a...
        4          9893238ed75cfc9...
        1          989bb9d2bc381f4...
        5          cab8c76b85c4e24...
        3          eccf30f69fe850f...
    • Mapping Schemes: Consistent Hashing
        (Figure: hash ring with shard1 to shard5 placed around it and the keys emp_no=20101, emp_no=36912, emp_no=43210 and emp_no=23456 positioned between them)
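A hash ring like the one in the figure can be sketched in a few lines. This is a minimal illustration under stated assumptions: the `HashRing` class is hypothetical, each shard gets a single ring position derived from `sha1("shard-<id>")` (the presentation does not say how positions are chosen), and a key belongs to the first shard at or after its own hash, wrapping around at the end of the ring.

```php
<?php
// Minimal consistent-hashing ring (one position per shard, no virtual nodes).
final class HashRing
{
    /** @var array<string,int> ring position (hex SHA-1) => shard id, sorted by position */
    private array $ring = [];

    public function addShard(int $shardId): void
    {
        // Assumption: the shard's position is the SHA-1 of a label built from its id.
        $this->ring[sha1("shard-$shardId")] = $shardId;
        ksort($this->ring, SORT_STRING);
    }

    public function removeShard(int $shardId): void
    {
        unset($this->ring[sha1("shard-$shardId")]);
    }

    public function shardFor($key): int
    {
        if (!$this->ring) {
            throw new RuntimeException('ring has no shards');
        }
        $hash = sha1((string) $key);
        foreach ($this->ring as $position => $shardId) {
            // Equal-length lowercase hex strings compare like the numbers they encode.
            if (strcmp($hash, (string) $position) <= 0) {
                return $shardId;
            }
        }
        return reset($this->ring); // wrap around to the first shard on the ring
    }
}
```

Adding a shard only claims the arc of the ring just before its position, so every key either keeps its shard or moves to the new one. Production rings usually place several virtual positions per shard to even out the arcs; that refinement is left out here.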
    • Areas that we need to cover
        Sharding: Data, Meta-Data, Query, Operations
    • Query Handling
        ● Query Dispatch: Mechanism, Single/Multi Cast, Handling Reads, Handling Updates, Transaction Handling
        ● Connector Caches: Time (TTL), On Error, Explicit
        ● Sharding Key: Parsing, Application provided
    • Query Dispatch: Mechanism
        ● Proxy
            – Sharding key extracted from query
            – Requires extra hop
        ● Application level
            – Application provides sharding key
            – No extra hop
    • Query Dispatch: Query Type
        ● Read Query
            – How do you ensure that it is executing on the right shard?
            – How do you ensure that it is not cross-shard?
        ● Update Query
            – Updating an unsharded table: think about consistency
    • Query Dispatch: Handling Transactions
        ● All statements of a transaction should go to the same session
            – Sharding key on start of transaction?
            – Is it a read-only or read-write transaction?
        ● Statements for different transactions can go to different sessions
            – How to detect transaction boundaries
        ● Maintaining the session state
    • Query Dispatch: Handling Transactions
        BEGIN;                                      -- Sharding key?
        SELECT salary INTO @s FROM salaries
         WHERE emp_no = 20101;                      -- Ah, there it is! Hmm... looks like a read transaction
        SET @s = 1.1 * @s;                          -- Session state?
        INSERT INTO salaries VALUES (20101, @s);    -- Oops.. it was a write transaction!
        COMMIT;                                     -- Transaction done! Clear session state?
        BEGIN;                                      -- New transaction! Different connection?
        INSERT INTO ...
        COMMIT;
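One way to sidestep the guessing shown in this example is to require the sharding key when the transaction starts and to pin every following statement to that shard until commit. The sketch below shows only the routing logic, under that assumption; `TransactionRouter` and its method names are hypothetical, and the locator callback stands in for whatever key-to-shard mapping is in use.

```php
<?php
// Hypothetical router: the application supplies the sharding key at BEGIN,
// and all statements of the transaction are routed to the same shard.
final class TransactionRouter
{
    /** @var callable maps a sharding key to a shard id */
    private $locate;
    private ?int $pinnedShard = null;

    public function __construct(callable $locate)
    {
        $this->locate = $locate;
    }

    public function begin(int $shardingKey): int
    {
        if ($this->pinnedShard !== null) {
            throw new LogicException('a transaction is already open');
        }
        $this->pinnedShard = ($this->locate)($shardingKey);
        return $this->pinnedShard;
    }

    public function shardForStatement(): int
    {
        if ($this->pinnedShard === null) {
            throw new LogicException('no open transaction');
        }
        return $this->pinnedShard;
    }

    public function commit(): void
    {
        // The next transaction may be routed to a different shard.
        $this->pinnedShard = null;
    }
}
```

Because the key arrives before the first statement, the router never has to decide from a `SELECT` whether the transaction will turn out to be read-only or read-write.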
    • Query Handling
        ● Query Dispatch: Mechanism, Single/Multi Cast, Handling Reads, Handling Updates, Transaction Handling
        ● Connector Caches: Time (TTL), On Error, Explicit
        ● Sharding Key: Parsing, Application provided
    • Extracting Sharding Key
        ● Parsing the query
            – Locating the key
            – Handling Transactions
        ● Application-provided sharding key
            – Annotating queries
            – Separate function in connector
    • Extracting Sharding Key: Parsing Query
        ● Problem: Locating the key
        ● No generic parser
            – Application specific parser
            – Constrain application developer
        ● Transactions
            – Key needed for first statement
        INSERT INTO titles(emp_no, title, from_date)
          SELECT emp_no, …, CURRENT_DATE
            FROM titles JOIN employees USING (emp_no)
           WHERE first_name = 'Keith'

        BEGIN
        SELECT …
        INSERT …
        COMMIT;
    • Extracting Sharding Key: Application Provided
        ● Idea: Provide key explicitly
        ● Annotate the statement
            /* emp_no=20101 */ BEGIN;
            SELECT …
            INSERT …
            COMMIT;
        ● Extend connection manager
            – Demonstrated previously
            …
            $DICT->set_key($key);
            $link = $DICT->get_connection();
            …
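Reading the key from an annotated statement can be as simple as matching the leading comment. The sketch below assumes the annotation always has the form `/* emp_no=<digits> */` at the very start of the statement, as in the example above; `extract_sharding_key` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: pull the sharding key out of a leading
// /* emp_no=NNN */ annotation, or return null if there is none.
function extract_sharding_key(string $sql): ?int
{
    if (preg_match('~^\s*/\*\s*emp_no\s*=\s*(\d+)\s*\*/~', $sql, $matches)) {
        return (int) $matches[1];
    }
    return null;
}
```

A broker or connector could call this on the first statement of a transaction and refuse to proceed when it returns `null`, which keeps unannotated statements from being routed by guesswork.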
    • Areas that we need to cover
        Sharding: Data, Meta-Data, Query, Operations
    • Operations: Monitoring the System
        ● Monitor load of each node
            … to see if any node gets an unfair number of queries
        ● Monitor load of each shard (multiple shards per node)
            … to see if a shard gets an unfair number of queries
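What counts as an "unfair" number of queries is a policy choice. The sketch below takes one simple view, purely as an assumption: a shard (or node) is flagged when its query count exceeds a multiple of the mean across all of them. The function name `hot_shards` and the default factor of 2.0 are illustrative, not from the presentation.

```php
<?php
// Hypothetical check: $queryCounts maps shard id => queries observed in some
// monitoring window; a shard is "hot" if it exceeds $factor times the mean.
function hot_shards(array $queryCounts, float $factor = 2.0): array
{
    if (!$queryCounts) {
        return [];
    }
    $mean = array_sum($queryCounts) / count($queryCounts);
    $hot = [];
    foreach ($queryCounts as $shardId => $count) {
        if ($count > $factor * $mean) {
            $hot[] = $shardId;
        }
    }
    return $hot;
}
```

With counts of 100, 120, 900 and 80 for shards 1 to 4, the mean is 300, so only shard 3 crosses the threshold of 600.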
    • Operations: Re-balancing the System
        ● If an instance is hot:
            – Move Shard: Move one shard to another instance
        ● If a shard is hot:
            – Split Shard: Split the shard into multiple shards
            – Move Shard: Move one of the shards to another instance
        ● If a shard is cold:
            – Merge Shard: Merge a shard with other shards
        ● Avoid it: very difficult to do on-line
    • Operations: Moving a Shard
        ● Offline (trivial)
            – Bring source and target nodes down
            – Copy shard from source to target
            – Update dictionary
        ● Online (tricky)
            – We go through it on the following slides
    • Operations: Online Move of Shard
        1. Backup shard
            – Might be multiple databases
            – Note down binary log position
                ● “Backup position”
            – Online backup
                ● mysqldump
                ● MySQL Enterprise Backup
        2. Restore backup on destination
    • Operations: Online Move of Shard
        3. Start replication
            – Source to target
            – Start replication from backup position
            – Only replicate shard?
                replicate-wild-do-table=db_1.*
    • Operations: Online Move of Shard
        4. Wait until destination is close enough
        5. Write lock on source
            LOCK TABLES
        6. Note binary log position
            – “Catch-up Position”
    • Operations: Online Move of Shard
        7. Wait for destination to reach catch-up position
            START SLAVE UNTIL
            MASTER_POS_WAIT
    • Operations: Online Move of Shard
        8. Update sharding database
            … will re-direct queries
        9. Stop replication
            RESET SLAVE
        10. Drop old shard
            … unless you just wanted a copy
    • Operations: Splitting a Shard
        ● Application dependent
            – Change sharding key?
            – Change sharding scheme?
        ● Can be expensive
        ● You will have to do it
            … eventually
    • Operations: Splitting a Shard
        (Figure: shard copied from one.example.com to two.example.com, with the steps numbered 1 to 3)
        1. Copy shard to new location
            – Use on-line move described on previous slides
        2. Update sharding database
            – Will re-direct queries
        3. Remove rows from both shards
            – Remove rows that do not belong to the shard
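Step 3 depends on the mapping scheme in use. As a sketch, assume range mapping and a split point chosen by the operator: the original shard keeps keys below the split point and the copy keeps the rest, so each side removes the other half. The helper name `rows_to_remove_after_split` and the `original`/`copy` labels are hypothetical.

```php
<?php
// Hypothetical helper for a range split: given the keys present in the shard
// before the split, list the keys each side should delete afterwards.
function rows_to_remove_after_split(array $keys, int $splitPoint): array
{
    $below = [];
    $atOrAbove = [];
    foreach ($keys as $key) {
        if ($key < $splitPoint) {
            $below[] = $key;
        } else {
            $atOrAbove[] = $key;
        }
    }
    return [
        'original' => $atOrAbove, // original shard keeps keys below the split point
        'copy'     => $below,     // copy keeps keys at or above the split point
    ];
}
```

In practice the deletion would be a range `DELETE` on each server rather than a list of keys; the list form here only makes the ownership rule explicit.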
    • Great! Let's Shard!
        Wait a minute...
    • When to shard?
        ● Inherently more complex
            – Requires careful planning
            – Application design?
        ● Alternatives?
            – Functional partitioning?
            – Archiving old data?
    • Preparations for sharding
        ● Monitor the system
            – Types of queries
                ● What are the join queries
            – Access patterns
                ● What tables are accessed
        ● Find natural partition keys
            – Robust and easy to implement
            – Watch out for cross-shard joins
    • Summary
        ● What are your goals?
        ● Do your homework
        ● Don't be too eager
        ● Plan
        ● Develop sharding solution
        ● Revise the plans
    • Thanks for attending!
        ● Questions? Comments?
        ● Download MySQL! http://dev.mysql.com
        ● Read our book!
            – Covers replication, sharding, scale-out, and much much more