Database Sharding
#BarCampGhent2
Tags
scaling, performance,
database, php, mySQL,
memcached, sphinx
echo “Hello, world!”;




 Jayme Rotsaert

    core developer @ Netlog

    since 2 years

 Jurriaan Persyn

    lead web ...
What is sharding?




        A technique to scale databases
Some requirements @ Netlog ...




•   serve 36+ million unique users

•   4+ billion pageviews a month

•   huge amounts ...
Master
 r/w
Master
          w




Slave            Slave
  r                r
Top
               Master
                 w

Messages                    Friends
  r/w                         r/w
      ...
Top
                     Master
                       w

   Messages                           Friends
      w           ...
1040:
               Too many      Top
              connections   Master
                              w

   Messages    ...
1040:
                  Too many      Top
                 connections   Master      1040:
                               ...
?
Vertical
partitioning?
Master-to-master
  replication?
Caching?
Sharding!
Friends
                     %10 = 2
          Friends                 Friends
          %10 = 1                 %10 = 3
F...
More data?
More shards!
Existing solutions?




 •   MySQL NDB storage engine
     (sharded, not dynamic)

 •   memcached from Mysql
     (SQL-fun...
Our solution




 • in-house
 • in php
 • middleware between application logic and
     class DB


 • typically carve shar...
sharddbhost001

    sharddb001            sharddb002

shard0001 shard0002   shard0005 shard0006



shard0003 shard0004   s...
Overview of our Sharding implementation




 • Sharding Management
  • “DNS” System (the modulo function)
  • Balancer / M...
Sharding Management “DNS”




 • “DNS” system translates      $userID   to the right db
    connection details

   •   $us...
Sharded Tables API




 • Example API:
  • An object per         /      -combination
                     $tableName $user...
Some implications ...




  • No cross-shard (i.e. cross-user) SQL queries
   •            between sharded tables
        ...
Some implications ... (2)




  •   Your DBA loves you again

      •   Smaller, thus faster tables

      •   Simpler, th...
Some implications ... (3)




  •   $itemID   will only be unique in relation to $userID

  •   Downtime of a single datab...
Sharding Management: Balancing Shards




 • Define ‘load’ percentage for shards (#users),
    databases (#users, #filesiz...
What is memcached?




General-purpose distributed memory caching
Using memcached


function isSober($user)
{
	 $memcache = new Memcache();
	 $cacheKey = 'issober_' . $user->getUserID();
	...
memcached usage for Sharding




 • Typical usage:
  • Each sharded record is cached
         (key: table/userID/itemID)

...
CacheRevisionNumbers



 • What? Cached version number to use in other
     cache-keys

 • Why? Caching of counts / lists
...
Sphinx Search



 • Problem:
    How do you give an overview of eg. latest
    photos from different users? (on different
...
netlog.com/go/developer
jayme@netlog.com - jurriaan@netlog.com
Database Sharding at Netlog
Upcoming SlideShare
Loading in...5
×

Database Sharding at Netlog

12,500

Published on

A technique we use at Netlog to keep databases scalable. How we implemented it, and how we use memcached and Sphinx to tackle it's issues.

Published in: Technology
2 Comments
26 Likes
Statistics
Notes
No Downloads
Views
Total Views
12,500
On Slideshare
0
From Embeds
0
Number of Embeds
4
Actions
Shares
0
Downloads
306
Comments
2
Likes
26
Embeds 0
No embeds

No notes for slide

Transcript of "Database Sharding at Netlog"

  1. 1. Database Sharding #BarCampGhent2
  2. 2. Tags scaling, performance, database, php, mySQL, memcached, sphinx
  3. 3. echo “Hello, world!”; Jayme Rotsaert core developer @ Netlog since 2 years Jurriaan Persyn lead web developer @ Netlog since 3 years
  4. 4. What is sharding? A technique to scale databases
  5. 5. Some requirements @ Netlog ... • serve 36+ million unique users • 4+ billion pageviews a month • huge amounts of data (eg. 100+ million friendships on nl.netlog.com) • write-heavy app (1.4/1 read-write ratio) • typical db up to 3000+ queries/sec (15h-22h)
  6. 6. Master r/w
  7. 7. Master w Slave Slave r r
  8. 8. Top Master w Messages Friends r/w r/w Top Top Slave Slave r r
  9. 9. Top Master w Messages Friends w w Top Top Slave Slave r r Msgs Msgs Frnds Frnds Slave Slave Slave Slave r r r r
  10. 10. 1040: Too many Top connections Master w Messages Friends w w Top Top Slave Slave r r Msgs Msgs Frnds Frnds Slave Slave Slave Slave r r r r
  11. 11. 1040: Too many Top connections Master 1040: w Too many 1040: connections Too many Messages connections Friends w w Top Top Slave Slave r r Msgs Msgs1040: Frnds Frnds Too many Slave Slave connections Slave Slave r r r r
  12. 12. ?
  13. 13. Vertical partitioning?
  14. 14. Master-to-master replication?
  15. 15. Caching?
  16. 16. Sharding!
  17. 17. Friends %10 = 2 Friends Friends %10 = 1 %10 = 3 Friends Friends %10 = 0 %10 = 4 Aggregation Friends Friends %10 = 9 %10 = 5 Friends Friends %10 = 8 %10 = 6 Friends %10 = 7
  18. 18. More data? More shards!
  19. 19. Existing solutions? • MySQL NDB storage engine (sharded, not dynamic) • memcached from Mysql (SQL-functions or storage engine) • Oracle RAC • HiveDB (mySQL sharding framework in Java)
  20. 20. Our solution • in-house • in php • middleware between application logic and class DB • typically carve shards by $userID
  21. 21. sharddbhost001 sharddb001 sharddb002 shard0001 shard0002 shard0005 shard0006 shard0003 shard0004 shard0007 shard0008
  22. 22. Overview of our Sharding implementation • Sharding Management • “DNS” System (the modulo function) • Balancer / Manager • Sharded Tables API • Database Access Layer • Caching Layer
  23. 23. Sharding Management “DNS” • “DNS” system translates $userID to the right db connection details • $userID to $shardID DNS (via SQL/memcache - combination not fixed!) • $shardIDto $hostname & $databasename (generated configuration files)
  24. 24. Sharded Tables API • Example API: • An object per / -combination $tableName $userID • implementation of a class providing basic CRUD functions • typically a class for accessing database records with “a user’s items”
  25. 25. Some implications ... • No cross-shard (i.e. cross-user) SQL queries • between sharded tables (LEFT) JOIN becomes impossibly complicated • It’s possible to design (parts of) application so there’s no need for cross-shard queries • Denormalize if you need $userID SELECT on other than • Data consistency
  26. 26. Some implications ... (2) • Your DBA loves you again • Smaller, thus faster tables • Simpler, thus faster queries • More atomic operations > better caching • More PHP processing • Needs memory • PHP-webservers scale more easily
  27. 27. Some implications ... (3) • $itemID will only be unique in relation to $userID • Downtime of a single databasehost affects only users on that DB
  28. 28. Sharding Management: Balancing Shards • Define ‘load’ percentage for shards (#users), databases (#users, #filesize), hosts (#sql reads, #sql writes, #cpu load, #users, ...) • Balance loads and start move operations • Done completely in PHP / transparant / no user downtime
  29. 29. What is memcached? General-purpose distributed memory caching
  30. 30. Using memcached function isSober($user) { $memcache = new Memcache(); $cacheKey = 'issober_' . $user->getUserID(); $result = $memcache->get($cacheKey); // fetch if ($result === false) { // do some database heavy stuff $result = (($user->getJobIndustry() == Industry::DEFENSE) && $location->isIn(City::get('NYC'))) ? quot;hammeredquot; : quot;soberquot;; // whatever! $memcache->set($cacheKey, $result, 0); // unlimited ttl } return $result; } var_dump(isSober(new User(quot;p.decremquot;))); // --> string(8) quot;hammeredquot;
  31. 31. memcached usage for Sharding • Typical usage: • Each sharded record is cached (key: table/userID/itemID) • Caches with lists, and caches with counts (key: where/order/...-clauses) • Several caching modes: • READ_INSERT_MODE • READ_UPDATE_INSERT_MODE
  32. 32. CacheRevisionNumbers • What? Cached version number to use in other cache-keys • Why? Caching of counts / lists • Example: cache key for list of users latest photos (simplified): ”USER_PHOTOS” . $userID . $cacheRevisionNumber . ”ORDERBYDATEADDDESCLIMIT10”; • $cacheRevisionNumberis number, bumped on every CUD-action, clears caches of all counts +lists, else unlimited ttl. • “number” is current/cached timestamp
  33. 33. Sphinx Search • Problem: How do you give an overview of eg. latest photos from different users? (on different shards) • Solution: Check Jayme’s presentation “Sphinx search optimization”, distributed full text search. (Use it for more than searching!)
  34. 34. netlog.com/go/developer jayme@netlog.com - jurriaan@netlog.com
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×