Database Sharding At Netlog
An approach to horizontal database scaling. Slides from a presentation at the mySQL dev room at FOSDEM 2009. Notes and remarks to these slides will become available soon on www.jurriaanpersyn.com and netlog.com/go/developer

Transcript of "Database Sharding At Netlog"

  1. 1. DATABASE SHARDING @NETLOG
  2. 2. Tags performance, partitioning, federation, database, php, mySQL, memcached, sphinx
  3. 3. jurriaanpersyn.com lead web dev at Netlog php + mysql + frontend for 3 years
  4. 4. A Pan European Social Network Over 40 million active members Over 50 million unique visitors per month Over 5 billion page views per month Over 26 active languages and 30+ countries 6 billion online minutes per month Top 5 most active countries: Italy, Belgium, Turkey, Switzerland and Germany
  5. 5. Our reach beyond Europe
  6. 6. Visitor statistics
  7. 7. Database statistics • huge amounts of data (eg. 100+ million friendships on nl.netlog.com) • write-heavy app (1.4/1 read-write ratio) • typical db up to 3000+ queries/sec (15h-22h)
  8. 8. Performance problems • most technologies in the web stack are stateless • the only layer not being stateless is the data itself • hardest (backend) performance problems were mostly database related
  9. 9. We build Netlog on open source software • php • squid • mysql • and many more ... • apache • debian • memcached • sphinx • lighttpd
  10. 10. Topics - Netlog history of scaling - Sharding architecture - Implications - Our approach - Tackling the problems
  11. 11. Master r/w
  12. 12. Master w Slave Slave r r
  13. 13. [Diagram] A "Top" master (w) with two "Top" slaves (r); Messages and Friends split off into their own databases (r/w)
  14. 14. [Diagram] A "Top" master (w) with two "Top" slaves (r); Messages and Friends masters (w), each with two slaves (r)
  15. 15. [Diagram] Same set-up as slide 14, with the Top master returning "1040: Too many connections"
  16. 16. [Diagram] Same set-up as slide 14, with "1040: Too many connections" errors now at several servers, masters and slaves alike
  17. 17. ?
  18. 18. More vertical partitioning?
  19. 19. Master-to-master replication?
  20. 20. Caching?
  21. 21. Sharding!
  22. 22. pieces fragments horizontal part. federation
  23. 23. [Diagram] Photos split into ten shards, labelled %10 = 0 to %10 = 9, arranged around a central "Aggregation" component
  24. 24. More data? More shards!
  25. 25. A simple example • A blog • Split users across 2 databases • Users with even ID go to database 1 • Users with uneven ID go to database 2 • Query: Give me the blog messages from author with id 26
  26. 26. What was ... $db = DB::getInstance(); $db->prepare("SELECT title, message FROM BLOG_MESSAGES WHERE userid = {userID}"); $db->assignInt('userID', $userID); $db->execute();
  27. 27. Becomes ... $db = DB::getInstance($userID); $db->prepare("SELECT title, message FROM BLOG_MESSAGES WHERE userid = {userID}"); $db->assignInt('userID', $userID); $db->execute();
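The only change between the two snippets is that `DB::getInstance()` now receives the user ID. A minimal sketch of how such a call might pick a database for the two-database blog example follows; the function name `pickBlogDatabase` and the database names `blogdb1` and `blogdb2` are hypothetical and are not part of Netlog's `DB` class.

```php
<?php
// Hypothetical helper: maps a user ID to one of two databases,
// following the even/uneven rule from the blog example.
function pickBlogDatabase(int $userID): string
{
    // Even IDs go to database 1, uneven IDs go to database 2.
    return ($userID % 2 === 0) ? 'blogdb1' : 'blogdb2';
}

// Author 26 has an even ID, so this prints "blogdb1".
echo pickBlogDatabase(26), "\n";
```

The calling code keeps its query unchanged; only the connection choice depends on the sharding key.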
  28. 28. What to partition on? • “sharding key” • eg. for a blog • partition on publication date? • partition on author? • author name? • author id? • partition on category?
  29. 29. How to partition that key? • Partitioning schemes • eg. • Vertical • Range based • Key or hash based • Directory based
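To make the difference between the schemes concrete, here is a small sketch of three of them as plain lookup functions. All function names, range boundaries and shard numbers are made up for illustration. Vertical partitioning is left out because it splits by table or feature rather than by key.

```php
<?php
// Range based: consecutive ID ranges map to a shard.
// The boundaries below are invented for this example.
function rangeShard(int $userID): int
{
    if ($userID < 1000000) {
        return 1;
    }
    if ($userID < 2000000) {
        return 2;
    }
    return 3;
}

// Key or hash based: the shard follows directly from the key itself.
function hashShard(int $userID, int $shardCount): int
{
    return $userID % $shardCount;
}

// Directory based: an explicit lookup table decides, so a single user
// can be moved to another shard by changing one entry.
// Returns null when the user is not in the directory.
function directoryShard(int $userID, array $directory): ?int
{
    return $directory[$userID] ?? null;
}
```

The directory variant costs an extra lookup per request, which is the overhead and possible single point of failure that a later slide raises.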
  30. 30. The result • Several smaller tables means: • Your users love you again • You’re actually online! • Your DBA loves you again • Less DB load/machine, less crashing machines • Even maintenance queries go faster again!
  31. 31. Some implications ... • Problem: No cross-shard SQL queries • (LEFT) JOIN between shards becomes impossibly complicated • Solutions: • It’s possible to design (parts of) the application so there’s no need for cross-shard queries
  32. 32. Some implications ... • Solutions: • Parallel querying of shards • Double systems • guestbook messages: • shard on id of guestbook owner • shard on id of writer • Denormalizing data
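The "double systems" idea for guestbook messages can be sketched as writing each message twice, once sharded on the owner and once on the writer, so that both "messages in my guestbook" and "messages I wrote" stay single-shard reads. The arrays below are in-memory stand-ins for the two copies, and `shardFor` with its shard count of 4 is an assumption for illustration only.

```php
<?php
// Hypothetical in-memory stand-ins for the two copies of the data.
$byOwner  = []; // sharded on the guestbook owner's ID
$byWriter = []; // sharded on the writer's ID

// Assumed shard function; the shard count is invented for this sketch.
function shardFor(int $userID): int
{
    return $userID % 4;
}

// Stores the same message in both systems (denormalized on purpose).
function addGuestbookMessage(array &$byOwner, array &$byWriter, int $ownerID, int $writerID, string $text): void
{
    $row = ['owner' => $ownerID, 'writer' => $writerID, 'text' => $text];
    $byOwner[shardFor($ownerID)][$ownerID][]    = $row;
    $byWriter[shardFor($writerID)][$writerID][] = $row;
}

addGuestbookMessage($byOwner, $byWriter, 26, 7, 'Hi!');
```

The price is a second write per message and the need to keep both copies in sync.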
  33. 33. Some implications ... • Data consistency / referential integrity? • enforce integrity on application level • “cross server transactions” 1. start transactions on both servers 2. commit transactions on both servers • check/fix routines can become substantial part of development cost
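The two-step "cross server transaction" above might look roughly like the sketch below. It uses PDO with two in-memory SQLite databases as stand-ins for two MySQL servers, which is an assumption made only to keep the example self-contained; the function name `crossServerTransaction` and the `friends` table are hypothetical.

```php
<?php
// Runs $work inside a transaction on both connections.
// Commits both when $work succeeds, rolls both back otherwise.
// This is NOT atomic: a failure between the two commits leaves the
// servers inconsistent, hence the need for check/fix routines.
function crossServerTransaction(PDO $a, PDO $b, callable $work): bool
{
    // 1. start transactions on both servers
    $a->beginTransaction();
    $b->beginTransaction();
    try {
        $work($a, $b);
        // 2. commit transactions on both servers
        $a->commit();
        $b->commit();
        return true;
    } catch (Throwable $e) {
        if ($a->inTransaction()) {
            $a->rollBack();
        }
        if ($b->inTransaction()) {
            $b->rollBack();
        }
        return false;
    }
}

// Usage with two stand-in databases.
$a = new PDO('sqlite::memory:');
$b = new PDO('sqlite::memory:');
foreach ([$a, $b] as $pdo) {
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $pdo->exec('CREATE TABLE friends (userid INTEGER, friendid INTEGER)');
}
$ok = crossServerTransaction($a, $b, function (PDO $a, PDO $b) {
    $a->exec('INSERT INTO friends VALUES (26, 7)');
    $b->exec('INSERT INTO friends VALUES (7, 26)');
});
```

Integrity is enforced by the application here, not by the database, which is the trade-off the slide describes.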
  34. 34. Some implications ... • Balancing (eg. balanced on a user’s ID) • What about “power users”? • Differences in hardware performance? • Adding servers if you grow? • Partitioning scheme choice is important • directory based++, but: SPOF? overhead?
  35. 35. Some implications ... • Your web servers talk to more / several databases • network implications • keep connections open? close after every query? pools?
  36. 36. Existing / related solutions? • MySQL Cluster (high availability, performance, not distribution of writes) • MySQL Partitioning (not feature complete when we needed it, Netlog not 5.1 ready/compatible at that time, not directory based (?)) • HiveDB (mySQL sharding framework in Java, requires JVM, php interface in infancy state)
  37. 37. Existing / related solutions? • MySQL Proxy (used by following two options, introduces LUA) • HSCALE (not feature complete, partitions files, not yet cross-server, builds on MySQL Proxy) • Spock Proxy (fork of MySQL Proxy, atm only range based partitioning) • HyperTable (HQL) • HBase / BigTable
  38. 38. Existing / related solutions? • Hibernate Shards (whole new DB layer for us) • SQLAlchemy (for Python) • memcached from Mysql (SQL-functions or storage engine) • Oracle RAC (well, not MySQL)
  39. 39. Goals • flexible for your hardware department • no massive rewrite • incremental implementation • support for multiple sharding schemes • easy to understand • well balanced
  40. 40. Our solution • in-house • 100% php • middleware between application logic and class DB • complete caching layer built in • most sharded data carved by $userID
  41. 41. sharddbhost001 sharddb001 sharddb002 shard0001 shard0002 shard0005 shard0006 shard0003 shard0004 shard0007 shard0008
  42. 42. Overview of our Sharding implementation • Sharding Management • Lookup System (directory based) • Balancer / Manager • Sharded Tables API • Database Access Layer • Caching Layer
  43. 43. Sharding Management Directory • Lookup system translates $userID to the right db connection details • $userID to $shardID (via SQL/memcache - combination not fixed!) • $shardID to $hostname & $databasename (generated configuration files, with flags about shard being available for read/write)
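The two-step lookup can be sketched with two plain arrays: one from user to shard and one from shard to connection details. The data, the array layout and the function name `lookupConnection` are assumptions for illustration; in the slides the first mapping lives in SQL/memcache and the second in generated configuration files.

```php
<?php
// Hypothetical directory data.
$userToShard = [26 => 5, 27 => 2];
$shardToDb = [
    2 => ['host' => 'sharddbhost001', 'db' => 'sharddb001', 'read' => true, 'write' => true],
    5 => ['host' => 'sharddbhost001', 'db' => 'sharddb002', 'read' => true, 'write' => false],
];

// Translates a user ID into connection details in two steps:
// $userID to $shardID, then $shardID to host, database and flags.
function lookupConnection(int $userID, array $userToShard, array $shardToDb): array
{
    if (!isset($userToShard[$userID])) {
        throw new RuntimeException("No shard known for user $userID");
    }
    $shardID = $userToShard[$userID];
    return ['shard' => $shardID] + $shardToDb[$shardID];
}

$conn = lookupConnection(26, $userToShard, $shardToDb);
```

Because the user-to-shard mapping is data rather than a formula, moving a user only requires updating one directory entry.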
  44. 44. Sharded Tables API • All sharded records have a • “shard key” ($userID) • “$itemID” (identification of the record) • related to $userID • mostly: combined auto_increment column
  45. 45. Sharded Tables API • Example API: • An object per $tableName/$userID-combination • implementation of a class providing basic CRUD functions • typically a class for accessing database records with “a user’s items”
  46. 46. How we query the “simple” example ... Query: Give me the blog messages from author with id 26. 3 queries: 1. where is user 26? > shard 5 2. on shard 5 > give me all “blogIDs” (itemIDs) of user 26 > blogID 10, 12 & 30 3. on shard 5 > fetch details WHERE blogID IN(10,12,30) > array of title + message
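The three steps can be traced with in-memory arrays standing in for the directory and for shard 5. The sample rows, titles and the function name `blogMessagesOf` are invented for this sketch; only the flow (locate the shard, list the item IDs, fetch the details) comes from the slide.

```php
<?php
// Hypothetical directory and shard contents.
$directory = [26 => 5];
$shards = [
    5 => [
        'BLOG_MESSAGES' => [
            10 => ['userid' => 26, 'title' => 'First',  'message' => 'Hello'],
            11 => ['userid' => 31, 'title' => 'Other',  'message' => 'Not mine'],
            12 => ['userid' => 26, 'title' => 'Second', 'message' => 'Again'],
            30 => ['userid' => 26, 'title' => 'Third',  'message' => 'Bye'],
        ],
    ],
];

function blogMessagesOf(int $userID, array $directory, array $shards): array
{
    // 1. Where is the user? (directory lookup)
    $shardID = $directory[$userID];
    $table = $shards[$shardID]['BLOG_MESSAGES'];

    // 2. On that shard: collect the blogIDs (itemIDs) of the user.
    $blogIDs = [];
    foreach ($table as $blogID => $row) {
        if ($row['userid'] === $userID) {
            $blogIDs[] = $blogID;
        }
    }

    // 3. On that shard: fetch details, like WHERE blogID IN (...).
    $details = [];
    foreach ($blogIDs as $blogID) {
        $details[$blogID] = [
            'title'   => $table[$blogID]['title'],
            'message' => $table[$blogID]['message'],
        ];
    }
    return $details;
}

$messages = blogMessagesOf(26, $directory, $shards);
```

Splitting steps 2 and 3 looks wasteful with plain SQL, but it lets the ID list and the individual records be cached separately.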
  47. 47. Sharding Management: Balancing Shards • Define ‘load’ percentage for shards (#users), databases (#users, #filesize), hosts (#sql reads, #sql writes, #cpu load, #users, ...) • Balance loads and start move operations • We can do this on userID level • completely in PHP / transparent / no user downtime
  48. 48. Some implications ... • Downtime of a single databasehost affects only users on shards on that DB • a few profiles not available • communication with that profile not available
  49. 49. Tackling the problems • memcached • parallel processing • sphinx
  50. 50. Memcached • Makes it faster • much faster • much much faster • Makes some “cross shard” things possible
  51. 51. Using memcached function isObamaPresident() { $memcache = new Memcache(); $result = $memcache->get('isobamapresident'); // fetch if ($result === false) { // do some database heavy stuff $db = DB::getInstance(); $votes = $db->prepare("SELECT COUNT(*) FROM VOTES WHERE vote = 'OBAMA'")->execute(); $result = ($votes > (USA_CITIZEN_COUNT / 2)) ? 'Sure is!' : 'Nope.'; // well, ideally $memcache->set('isobamapresident', $result, 0); } return $result; }
  52. 52. Typical usage for Netlog Sharding API • “Shard key” to “shard id” is cached • Each sharded record is cached as array (key: table/userID/itemID) • Caches with lists, and caches with counts (key: where/order/...-clauses) • “Revision number” per table/shard key-combination
  53. 53. How we cache the “simple” example ... Query: Give me the blog messages from author with id 26. 3 memcache requests: 1. where is user 26? > +/- 100% cache hit ratio 2. on shard 5 > give me all “blogIDs” (itemIDs) of user 26 > list query (cache result of query) 3. on shard 5 > fetch details WHERE blogID IN(10,12,30) > details of each item +/- 100% cache hit ratio (multi get on memcache)
  54. 54. CacheRevisionNumbers • What? Cached version number to use in other cache-keys • Why? Caching of counts / lists • Example: cache key for list of a user’s latest photos (simplified): "USER_PHOTOS" . $userID . $cacheRevisionNumber . "ORDERBYDATEADDDESCLIMIT10"; • $cacheRevisionNumber is a number, bumped on every CUD-action, which clears the caches of all counts + lists; otherwise unlimited ttl. • “number” is current/cached timestamp
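The revision-number trick can be sketched as follows. A plain array stands in for memcached, and a simple counter replaces the timestamp mentioned on the slide; both are assumptions made to keep the example deterministic. The function names and the `REV_USER_PHOTOS` key prefix are hypothetical.

```php
<?php
// Hypothetical in-memory cache standing in for memcached.
$cache = [];

function revisionKey(int $userID): string
{
    return 'REV_USER_PHOTOS' . $userID;
}

// Returns the current revision number, creating it on first use.
function getRevision(array &$cache, int $userID): int
{
    $key = revisionKey($userID);
    if (!isset($cache[$key])) {
        $cache[$key] = 1;
    }
    return $cache[$key];
}

// Called on every create/update/delete action for this user's photos.
function bumpRevision(array &$cache, int $userID): void
{
    $cache[revisionKey($userID)] = getRevision($cache, $userID) + 1;
}

// The revision number is part of every list/count key, so bumping it
// makes all old list and count entries unreachable at once.
function latestPhotosKey(array &$cache, int $userID): string
{
    return 'USER_PHOTOS' . $userID . '_' . getRevision($cache, $userID)
        . '_ORDERBYDATEADDDESCLIMIT10';
}

$before = latestPhotosKey($cache, 26);
$cache[$before] = [101, 102];   // cached list of photo IDs
bumpRevision($cache, 26);       // user 26 uploads a photo
$after = latestPhotosKey($cache, 26);
```

Nothing is deleted explicitly; stale entries simply stop being requested and, in a real memcached set-up, would eventually be evicted.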
  55. 55. Some implications ... • Several caching modes: • READ_INSERT_MODE • READ_UPDATE_INSERT_MODE • More PHP processing • Needs memory • PHP-webservers scale more easily
  56. 56. Parallel processing • Split single HTTP-request into several requests and batch process (eg. Friends Of Friends feature) • Fetching friends of friends requires looping over friends (shard) (memcache makes this possible) • But: people with 1000+ friends > Process in batches of 500? • Processing 1000+ takes longer than 2*500+
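The batching idea can be sketched with `array_chunk`. The version below processes the batches one after another in a single process; handing each batch to a separate request, as the slide suggests, is not shown. The friend lists and the function name `friendsOfFriends` are hypothetical.

```php
<?php
// Collects friends of friends for $userID, working through the user's
// friend list in batches of $batchSize (must be at least 1).
// $friendLists maps a user ID to an array of friend IDs.
function friendsOfFriends(int $userID, array $friendLists, int $batchSize = 500): array
{
    $friends = $friendLists[$userID] ?? [];
    $result = [];
    foreach (array_chunk($friends, $batchSize) as $batch) {
        // Each batch could be handed to a separate request or process.
        foreach ($batch as $friendID) {
            foreach ($friendLists[$friendID] ?? [] as $candidate) {
                // Skip the user and people who are already direct friends.
                if ($candidate !== $userID && !in_array($candidate, $friends, true)) {
                    $result[$candidate] = true;
                }
            }
        }
    }
    return array_keys($result);
}

$lists = [1 => [2, 3], 2 => [1, 4], 3 => [1, 4, 5]];
$fof = friendsOfFriends(1, $lists, 1);
```

Keeping the results keyed by user ID removes duplicates when the same person is reachable through several friends.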
  57. 57. Sphinx Search • Problem: How do you give an overview of eg. latest photos from different users? (on different shards) • Solution: Check Jayme’s presentation “Sphinx search optimization”, distributed full text search. (Use it for more than searching!)
  58. 58. Tips & thoughts • First and foremost • Don’t do it, if you don’t need to! (37signals.com) • Shard early and often! (startuplessonslearned.blogspot.com)
  59. 59. Tips & thoughts • “Don’t do it, if you don’t need to!”? • Sharding isn’t that easy • There is no out of the box solution that works for every set-up/technology/... • Maintenance does get a bit tougher • Complicates your set-up • There might be cheaper and better hardware available tomorrow.
  60. 60. Tips & thoughts • “Shard early and often!”? • What is the most relevant key to shard on for your app? (eg. userID) • Do you know that key on every query you send to a database? (function calls, objects) • Design your application / database schemas so you know that key everywhere you need it. (Migrating schemas is also hard.)
  61. 61. Tips & thoughts • Don’t forget server tuning / hardware upgrade / sql optimization • can be easier than (re-)sharding • Only scale those parts of your application that require high performance • (some clutter can remain on your db's, but is it causing harm?)
  62. 62. Tips & thoughts • Migration to sharding system • Each shard reads from the master and copies data (can run in parallel, eg. from different slaves, to minimize downtime) • Usage of Sharded Tables API is possible without tables actually being sharded (transition phase).
  63. 63. Tips & thoughts • We don’t need super high end hardware for our database systems anymore. Saves $. • Backups of your data will be different. • You can run alters/backups/maintenance queries on (temp) slaves of a shard and switch master/slave to minimize downtime. • Even if read-write performance isn’t a reason to shard your data, the downtime it takes for an ALTER query on a big table may be.
  64. 64. Tips & thoughts • We didn’t have to balance user by user yet • power users balanced inactive users • “virtual shards” allowed us to split eg. 1 host with 12 db’s into 2 hosts with 6 db’s • Have a plan for scaling up. • When average load reaches x how will you add? • Goes together w/ replication+clustering
  65. 65. don’t do it if you don’t need to, but prepare for when you do the last thing you want is to be unreachable due to popularity
  66. 66. netlog.com/go/developer jurriaan@netlog.com
  67. 67. Resources • the great development and IT services team at Netlog • www.netlog.com/go/developer • www.37signals.com/svn/posts/1509-mr-moore-gets-to-punt-on-sharding • www.addsimplicity.com/adding_simplicity_an_engi/2008/08/shard-lessons.html • www.scribd.com/doc/2592098/DVPmysqlucFederation-at-Flickr-Doing-Billions-of-Queries-Per-Day • startuplessonslearned.blogspot.com/2009/01/sharding-for-startups.html • www.codefutures.com/weblog/database-sharding • www.25hoursaday.com/weblog/2009/01/16/BuildingScalableDatabasesProsAndConsOfVariousDatabaseShardingSchemes.aspx • highscalability.com • dev.mysql.com/doc/refman/5.1/en/partitioning.html • www.hibernate.org/414.html • en.wikipedia.org/wiki/SQLAlchemy
  68. 68. Resources • spockproxy.sourceforge.net • www.scribd.com/doc/3865300/Scaling-Web-Sites-by-Sharding-and-Replication • oracle2mysql.wordpress.com/2007/08/23/scale-out-notes-on-sharding-unique-keys-foreign-keys • www.flickr.com/photos/kt • Notes to the slides will become available soon at the Netlog Developer Blog and my personal blog (www.jurriaanpersyn.com)