Database Sharding at Netlog
Upcoming SlideShare
Loading in...5
×

Like this? Share it with your network

Share

Database Sharding at Netlog

  • 15,004 views
Uploaded on

A technique we use at Netlog to keep databases scalable. How we implemented it, and how we use memcached and Sphinx to tackle it's issues.

A technique we use at Netlog to keep databases scalable. How we implemented it, and how we use memcached and Sphinx to tackle it's issues.

More in: Technology
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
  • As promised, here's a blog post with a full textual transcript of the info in these slides: http://www.jurriaanpersyn.com/archives/2009/02/12/databas...
    Are you sure you want to
    Your message goes here
  • An updated version of this presentation was given at FOSDEM 2009. The time slot here was bigger so I had space to go a little bit more into detail. As I have a ton of notes and remarks to the slides a blog post will follow on my personal blog (www.jurriaanpersyn.com) and the Netlog developer site at netlog.com/go/developer
    Are you sure you want to
    Your message goes here
No Downloads

Views

Total Views
15,004
On Slideshare
14,942
From Embeds
62
Number of Embeds
4

Actions

Shares
Downloads
303
Comments
2
Likes
26

Embeds 62

http://www.slideshare.net 39
http://l.lj-toys.com 12
http://lj-toys.com 7
http://www.linkedin.com 4

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. Database Sharding #BarCampGhent2
  • 2. Tags scaling, performance, database, php, mySQL, memcached, sphinx
  • 3. echo “Hello, world!”; Jayme Rotsaert core developer @ Netlog since 2 years Jurriaan Persyn lead web developer @ Netlog since 3 years
  • 4. What is sharding? A technique to scale databases
  • 5. Some requirements @ Netlog ... • serve 36+ million unique users • 4+ billion pageviews a month • huge amounts of data (eg. 100+ million friendships on nl.netlog.com) • write-heavy app (1.4/1 read-write ratio) • typical db up to 3000+ queries/sec (15h-22h)
  • 6. Master r/w
  • 7. Master w Slave Slave r r
  • 8. Top Master w Messages Friends r/w r/w Top Top Slave Slave r r
  • 9. Top Master w Messages Friends w w Top Top Slave Slave r r Msgs Msgs Frnds Frnds Slave Slave Slave Slave r r r r
  • 10. 1040: Too many Top connections Master w Messages Friends w w Top Top Slave Slave r r Msgs Msgs Frnds Frnds Slave Slave Slave Slave r r r r
  • 11. 1040: Too many Top connections Master 1040: w Too many 1040: connections Too many Messages connections Friends w w Top Top Slave Slave r r Msgs Msgs1040: Frnds Frnds Too many Slave Slave connections Slave Slave r r r r
  • 12. ?
  • 13. Vertical partitioning?
  • 14. Master-to-master replication?
  • 15. Caching?
  • 16. Sharding!
  • 17. Friends %10 = 2 Friends Friends %10 = 1 %10 = 3 Friends Friends %10 = 0 %10 = 4 Aggregation Friends Friends %10 = 9 %10 = 5 Friends Friends %10 = 8 %10 = 6 Friends %10 = 7
  • 18. More data? More shards!
  • 19. Existing solutions? • MySQL NDB storage engine (sharded, not dynamic) • memcached from Mysql (SQL-functions or storage engine) • Oracle RAC • HiveDB (mySQL sharding framework in Java)
  • 20. Our solution • in-house • in php • middleware between application logic and class DB • typically carve shards by $userID
  • 21. sharddbhost001 sharddb001 sharddb002 shard0001 shard0002 shard0005 shard0006 shard0003 shard0004 shard0007 shard0008
  • 22. Overview of our Sharding implementation • Sharding Management • “DNS” System (the modulo function) • Balancer / Manager • Sharded Tables API • Database Access Layer • Caching Layer
  • 23. Sharding Management “DNS” • “DNS” system translates $userID to the right db connection details • $userID to $shardID DNS (via SQL/memcache - combination not fixed!) • $shardIDto $hostname & $databasename (generated configuration files)
  • 24. Sharded Tables API • Example API: • An object per / -combination $tableName $userID • implementation of a class providing basic CRUD functions • typically a class for accessing database records with “a user’s items”
  • 25. Some implications ... • No cross-shard (i.e. cross-user) SQL queries • between sharded tables (LEFT) JOIN becomes impossibly complicated • It’s possible to design (parts of) application so there’s no need for cross-shard queries • Denormalize if you need $userID SELECT on other than • Data consistency
  • 26. Some implications ... (2) • Your DBA loves you again • Smaller, thus faster tables • Simpler, thus faster queries • More atomic operations > better caching • More PHP processing • Needs memory • PHP-webservers scale more easily
  • 27. Some implications ... (3) • $itemID will only be unique in relation to $userID • Downtime of a single databasehost affects only users on that DB
  • 28. Sharding Management: Balancing Shards • Define ‘load’ percentage for shards (#users), databases (#users, #filesize), hosts (#sql reads, #sql writes, #cpu load, #users, ...) • Balance loads and start move operations • Done completely in PHP / transparant / no user downtime
  • 29. What is memcached? General-purpose distributed memory caching
  • 30. Using memcached function isSober($user) { $memcache = new Memcache(); $cacheKey = 'issober_' . $user->getUserID(); $result = $memcache->get($cacheKey); // fetch if ($result === false) { // do some database heavy stuff $result = (($user->getJobIndustry() == Industry::DEFENSE) && $location->isIn(City::get('NYC'))) ? quot;hammeredquot; : quot;soberquot;; // whatever! $memcache->set($cacheKey, $result, 0); // unlimited ttl } return $result; } var_dump(isSober(new User(quot;p.decremquot;))); // --> string(8) quot;hammeredquot;
  • 31. memcached usage for Sharding • Typical usage: • Each sharded record is cached (key: table/userID/itemID) • Caches with lists, and caches with counts (key: where/order/...-clauses) • Several caching modes: • READ_INSERT_MODE • READ_UPDATE_INSERT_MODE
  • 32. CacheRevisionNumbers • What? Cached version number to use in other cache-keys • Why? Caching of counts / lists • Example: cache key for list of users latest photos (simplified): ”USER_PHOTOS” . $userID . $cacheRevisionNumber . ”ORDERBYDATEADDDESCLIMIT10”; • $cacheRevisionNumberis number, bumped on every CUD-action, clears caches of all counts +lists, else unlimited ttl. • “number” is current/cached timestamp
  • 33. Sphinx Search • Problem: How do you give an overview of eg. latest photos from different users? (on different shards) • Solution: Check Jayme’s presentation “Sphinx search optimization”, distributed full text search. (Use it for more than searching!)
  • 34. netlog.com/go/developer jayme@netlog.com - jurriaan@netlog.com