Your SlideShare is downloading. ×
0
Database Sharding at Netlog
Database Sharding at Netlog
Database Sharding at Netlog
Database Sharding at Netlog
Database Sharding at Netlog
Database Sharding at Netlog
Database Sharding at Netlog
Database Sharding at Netlog
Database Sharding at Netlog
Database Sharding at Netlog
Database Sharding at Netlog
Database Sharding at Netlog
Database Sharding at Netlog
Database Sharding at Netlog
Database Sharding at Netlog
Database Sharding at Netlog
Database Sharding at Netlog
Database Sharding at Netlog
Database Sharding at Netlog
Database Sharding at Netlog
Database Sharding at Netlog
Database Sharding at Netlog
Database Sharding at Netlog
Database Sharding at Netlog
Database Sharding at Netlog
Database Sharding at Netlog
Database Sharding at Netlog
Database Sharding at Netlog
Database Sharding at Netlog
Database Sharding at Netlog
Database Sharding at Netlog
Database Sharding at Netlog
Database Sharding at Netlog
Database Sharding at Netlog
Database Sharding at Netlog
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

Database Sharding at Netlog

12,433

Published on

A technique we use at Netlog to keep databases scalable. How we implemented it, and how we use memcached and Sphinx to tackle it's issues.

A technique we use at Netlog to keep databases scalable. How we implemented it, and how we use memcached and Sphinx to tackle it's issues.

Published in: Technology
2 Comments
26 Likes
Statistics
Notes
No Downloads
Views
Total Views
12,433
On Slideshare
0
From Embeds
0
Number of Embeds
4
Actions
Shares
0
Downloads
306
Comments
2
Likes
26
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. Database Sharding #BarCampGhent2
  • 2. Tags scaling, performance, database, php, mySQL, memcached, sphinx
  • 3. echo “Hello, world!”; Jayme Rotsaert core developer @ Netlog since 2 years Jurriaan Persyn lead web developer @ Netlog since 3 years
  • 4. What is sharding? A technique to scale databases
  • 5. Some requirements @ Netlog ... • serve 36+ million unique users • 4+ billion pageviews a month • huge amounts of data (eg. 100+ million friendships on nl.netlog.com) • write-heavy app (1.4/1 read-write ratio) • typical db up to 3000+ queries/sec (15h-22h)
  • 6. Master r/w
  • 7. Master w Slave Slave r r
  • 8. Top Master w Messages Friends r/w r/w Top Top Slave Slave r r
  • 9. Top Master w Messages Friends w w Top Top Slave Slave r r Msgs Msgs Frnds Frnds Slave Slave Slave Slave r r r r
  • 10. 1040: Too many Top connections Master w Messages Friends w w Top Top Slave Slave r r Msgs Msgs Frnds Frnds Slave Slave Slave Slave r r r r
  • 11. 1040: Too many Top connections Master 1040: w Too many 1040: connections Too many Messages connections Friends w w Top Top Slave Slave r r Msgs Msgs1040: Frnds Frnds Too many Slave Slave connections Slave Slave r r r r
  • 12. ?
  • 13. Vertical partitioning?
  • 14. Master-to-master replication?
  • 15. Caching?
  • 16. Sharding!
  • 17. Friends %10 = 2 Friends Friends %10 = 1 %10 = 3 Friends Friends %10 = 0 %10 = 4 Aggregation Friends Friends %10 = 9 %10 = 5 Friends Friends %10 = 8 %10 = 6 Friends %10 = 7
  • 18. More data? More shards!
  • 19. Existing solutions? • MySQL NDB storage engine (sharded, not dynamic) • memcached from Mysql (SQL-functions or storage engine) • Oracle RAC • HiveDB (mySQL sharding framework in Java)
  • 20. Our solution • in-house • in php • middleware between application logic and class DB • typically carve shards by $userID
  • 21. sharddbhost001 sharddb001 sharddb002 shard0001 shard0002 shard0005 shard0006 shard0003 shard0004 shard0007 shard0008
  • 22. Overview of our Sharding implementation • Sharding Management • “DNS” System (the modulo function) • Balancer / Manager • Sharded Tables API • Database Access Layer • Caching Layer
  • 23. Sharding Management “DNS” • “DNS” system translates $userID to the right db connection details • $userID to $shardID DNS (via SQL/memcache - combination not fixed!) • $shardIDto $hostname & $databasename (generated configuration files)
  • 24. Sharded Tables API • Example API: • An object per / -combination $tableName $userID • implementation of a class providing basic CRUD functions • typically a class for accessing database records with “a user’s items”
  • 25. Some implications ... • No cross-shard (i.e. cross-user) SQL queries • between sharded tables (LEFT) JOIN becomes impossibly complicated • It’s possible to design (parts of) application so there’s no need for cross-shard queries • Denormalize if you need $userID SELECT on other than • Data consistency
  • 26. Some implications ... (2) • Your DBA loves you again • Smaller, thus faster tables • Simpler, thus faster queries • More atomic operations > better caching • More PHP processing • Needs memory • PHP-webservers scale more easily
  • 27. Some implications ... (3) • $itemID will only be unique in relation to $userID • Downtime of a single databasehost affects only users on that DB
  • 28. Sharding Management: Balancing Shards • Define ‘load’ percentage for shards (#users), databases (#users, #filesize), hosts (#sql reads, #sql writes, #cpu load, #users, ...) • Balance loads and start move operations • Done completely in PHP / transparant / no user downtime
  • 29. What is memcached? General-purpose distributed memory caching
  • 30. Using memcached function isSober($user) { $memcache = new Memcache(); $cacheKey = 'issober_' . $user->getUserID(); $result = $memcache->get($cacheKey); // fetch if ($result === false) { // do some database heavy stuff $result = (($user->getJobIndustry() == Industry::DEFENSE) && $location->isIn(City::get('NYC'))) ? quot;hammeredquot; : quot;soberquot;; // whatever! $memcache->set($cacheKey, $result, 0); // unlimited ttl } return $result; } var_dump(isSober(new User(quot;p.decremquot;))); // --> string(8) quot;hammeredquot;
  • 31. memcached usage for Sharding • Typical usage: • Each sharded record is cached (key: table/userID/itemID) • Caches with lists, and caches with counts (key: where/order/...-clauses) • Several caching modes: • READ_INSERT_MODE • READ_UPDATE_INSERT_MODE
  • 32. CacheRevisionNumbers • What? Cached version number to use in other cache-keys • Why? Caching of counts / lists • Example: cache key for list of users latest photos (simplified): ”USER_PHOTOS” . $userID . $cacheRevisionNumber . ”ORDERBYDATEADDDESCLIMIT10”; • $cacheRevisionNumberis number, bumped on every CUD-action, clears caches of all counts +lists, else unlimited ttl. • “number” is current/cached timestamp
  • 33. Sphinx Search • Problem: How do you give an overview of eg. latest photos from different users? (on different shards) • Solution: Check Jayme’s presentation “Sphinx search optimization”, distributed full text search. (Use it for more than searching!)
  • 34. netlog.com/go/developer jayme@netlog.com - jurriaan@netlog.com

×