Database Sharding at Netlog
Upcoming SlideShare
Loading in...5
×
 

Database Sharding at Netlog

on

  • 14,387 views

A technique we use at Netlog to keep databases scalable. How we implemented it, and how we use memcached and Sphinx to tackle it's issues.

A technique we use at Netlog to keep databases scalable. How we implemented it, and how we use memcached and Sphinx to tackle it's issues.

Statistics

Views

Total Views
14,387
Views on SlideShare
14,325
Embed Views
62

Actions

Likes
26
Downloads
301
Comments
2

4 Embeds 62

http://www.slideshare.net 39
http://l.lj-toys.com 12
http://lj-toys.com 7
http://www.linkedin.com 4

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

CC Attribution-NonCommercial-NoDerivs LicenseCC Attribution-NonCommercial-NoDerivs LicenseCC Attribution-NonCommercial-NoDerivs License

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
  • As promised, here's a blog post with a full textual transcript of the info in these slides: http://www.jurriaanpersyn.com/archives/2009/02/12/databas...
    Are you sure you want to
    Your message goes here
    Processing…
  • An updated version of this presentation was given at FOSDEM 2009. The time slot here was bigger so I had space to go a little bit more into detail. As I have a ton of notes and remarks to the slides a blog post will follow on my personal blog (www.jurriaanpersyn.com) and the Netlog developer site at netlog.com/go/developer
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

Database Sharding at Netlog Database Sharding at Netlog Presentation Transcript

  • Database Sharding #BarCampGhent2
  • Tags scaling, performance, database, php, mySQL, memcached, sphinx
  • echo “Hello, world!”; Jayme Rotsaert core developer @ Netlog since 2 years Jurriaan Persyn lead web developer @ Netlog since 3 years View slide
  • What is sharding? A technique to scale databases View slide
  • Some requirements @ Netlog ... • serve 36+ million unique users • 4+ billion pageviews a month • huge amounts of data (eg. 100+ million friendships on nl.netlog.com) • write-heavy app (1.4/1 read-write ratio) • typical db up to 3000+ queries/sec (15h-22h)
  • Master r/w
  • Master w Slave Slave r r
  • Top Master w Messages Friends r/w r/w Top Top Slave Slave r r
  • Top Master w Messages Friends w w Top Top Slave Slave r r Msgs Msgs Frnds Frnds Slave Slave Slave Slave r r r r
  • 1040: Too many Top connections Master w Messages Friends w w Top Top Slave Slave r r Msgs Msgs Frnds Frnds Slave Slave Slave Slave r r r r
  • 1040: Too many Top connections Master 1040: w Too many 1040: connections Too many Messages connections Friends w w Top Top Slave Slave r r Msgs Msgs1040: Frnds Frnds Too many Slave Slave connections Slave Slave r r r r
  • ?
  • Vertical partitioning?
  • Master-to-master replication?
  • Caching?
  • Sharding!
  • Friends %10 = 2 Friends Friends %10 = 1 %10 = 3 Friends Friends %10 = 0 %10 = 4 Aggregation Friends Friends %10 = 9 %10 = 5 Friends Friends %10 = 8 %10 = 6 Friends %10 = 7
  • More data? More shards!
  • Existing solutions? • MySQL NDB storage engine (sharded, not dynamic) • memcached from Mysql (SQL-functions or storage engine) • Oracle RAC • HiveDB (mySQL sharding framework in Java)
  • Our solution • in-house • in php • middleware between application logic and class DB • typically carve shards by $userID
  • sharddbhost001 sharddb001 sharddb002 shard0001 shard0002 shard0005 shard0006 shard0003 shard0004 shard0007 shard0008
  • Overview of our Sharding implementation • Sharding Management • “DNS” System (the modulo function) • Balancer / Manager • Sharded Tables API • Database Access Layer • Caching Layer
  • Sharding Management “DNS” • “DNS” system translates $userID to the right db connection details • $userID to $shardID DNS (via SQL/memcache - combination not fixed!) • $shardIDto $hostname & $databasename (generated configuration files)
  • Sharded Tables API • Example API: • An object per / -combination $tableName $userID • implementation of a class providing basic CRUD functions • typically a class for accessing database records with “a user’s items”
  • Some implications ... • No cross-shard (i.e. cross-user) SQL queries • between sharded tables (LEFT) JOIN becomes impossibly complicated • It’s possible to design (parts of) application so there’s no need for cross-shard queries • Denormalize if you need $userID SELECT on other than • Data consistency
  • Some implications ... (2) • Your DBA loves you again • Smaller, thus faster tables • Simpler, thus faster queries • More atomic operations > better caching • More PHP processing • Needs memory • PHP-webservers scale more easily
  • Some implications ... (3) • $itemID will only be unique in relation to $userID • Downtime of a single databasehost affects only users on that DB
  • Sharding Management: Balancing Shards • Define ‘load’ percentage for shards (#users), databases (#users, #filesize), hosts (#sql reads, #sql writes, #cpu load, #users, ...) • Balance loads and start move operations • Done completely in PHP / transparant / no user downtime
  • What is memcached? General-purpose distributed memory caching
  • Using memcached function isSober($user) { $memcache = new Memcache(); $cacheKey = 'issober_' . $user->getUserID(); $result = $memcache->get($cacheKey); // fetch if ($result === false) { // do some database heavy stuff $result = (($user->getJobIndustry() == Industry::DEFENSE) && $location->isIn(City::get('NYC'))) ? quot;hammeredquot; : quot;soberquot;; // whatever! $memcache->set($cacheKey, $result, 0); // unlimited ttl } return $result; } var_dump(isSober(new User(quot;p.decremquot;))); // --> string(8) quot;hammeredquot;
  • memcached usage for Sharding • Typical usage: • Each sharded record is cached (key: table/userID/itemID) • Caches with lists, and caches with counts (key: where/order/...-clauses) • Several caching modes: • READ_INSERT_MODE • READ_UPDATE_INSERT_MODE
  • CacheRevisionNumbers • What? Cached version number to use in other cache-keys • Why? Caching of counts / lists • Example: cache key for list of users latest photos (simplified): ”USER_PHOTOS” . $userID . $cacheRevisionNumber . ”ORDERBYDATEADDDESCLIMIT10”; • $cacheRevisionNumberis number, bumped on every CUD-action, clears caches of all counts +lists, else unlimited ttl. • “number” is current/cached timestamp
  • Sphinx Search • Problem: How do you give an overview of eg. latest photos from different users? (on different shards) • Solution: Check Jayme’s presentation “Sphinx search optimization”, distributed full text search. (Use it for more than searching!)
  • netlog.com/go/developer jayme@netlog.com - jurriaan@netlog.com